"Superpositions and Successive Approximations in Exhaustive Vulnerability Testing" by Michal Zalewski [April 12, 2002] This is just a draft. A good chance this is all bogus. Never trust snake oil salesmen. 1) Abstract Conformance testing is a process of ensuring the validity of algorithm implementation by comparing the projected program behavior with formal functional specifications. We introduce the notion of "vulnerability testing" to denote a process of partial conformance testing intended to discover well-defined fault conditions in the code. In this model, formal specifications are replaced with pre-determined metrics of erratic behavior. Development of formal specifications for conformance testing is expensive, time consuming and prone to design problems. Almost all consumer grade systems do not have suitable specifications and do not conform to any particular design model. Market tension towards fast and cheap delivery of new features makes it impossible to even consider complete conformance testing procedures. For all the reasons mentioned, a significant value for vulnerability testing cannot be questioned. The most important problem with exhaustive conformance or vulnerability testing is that, for a typical application, the number and length of execution paths to be evaluated is excessively high. This makes the process impossible to perform. The commonly practiced approach is to sacrifice a certain amount of accuracy or to focus on a specific aspect of program conformance. Although there have been attempts to apply this principle to general purpose consumer grade software testing were successful, it typically incurs great added effort and high ratio of false positives / negatives generated. For the purpose of fully automated vulnerability testing, we assume that the presence of a vulnerability is constituted by a physical possibility of a given program to entering a defined dangerous state. We are not willing to accept false positives caused by simplified modeling of data flow or detection of potential, not actual, execution patterns. In other words, we want to perform an exhaustive search of all allowed internal states of the program, where "allowed" stands for paths that actually can be followed without causing a paradox. This paper is intended to provide an experimental framework for reducing the computational complexity of this process without compromising its accuracy. We will focus on the typical vulnerability testing process, but large portions of this work can be applied to general conformance testing tasks as well. This paper is not intended to provide a perfect time-sensitive solution that can be applied to any system. The paper will propose an approach that is sensible for most real-life scenarios and can be executed in many, but not all, cases. For this reason, presented results are usually not backed up by mathematical proofs but rather with empirical results. Keywords: conformance testing; white-box; vulnerability testing; software reliability; validation; fault specification 2) Definition of a vulnerability The common understanding of the term "vulnerability" is "a malicious condition that can be triggered by specific input to the program provided by an untrusted party". Even more narrow definitions are being used, such as the one proposed by CVE [1], limiting the meaning of this term to hostile attacks against the system. Unfortunately, such definitions are usually not suitable for software testing, being highly subjective and requiring case-by-case judgment in the context of given environment. Many security experts tend not to limit the meaning this way, and include spontaneous or indirectly caused failure to comply with the specifications. The rationale for this is that a program that is not reliable cannot be secure, as security is a superset of reliability. It is not possible to draw a line between what is dangerous and what is not without having extensive information about execution context (e.g. knowledge such as whether this application is going to run in a privileged environment, or how sensitive parts of this environment are). We use the term "vulnerability" to indicate a failure to pass a conformance test. Instead of actual functional specs, we use the specification that allows all behavior but certain constructions considered dangerous. This is done by introducing "vulnerability test" rules defined for specific locations in the code to identify such dangerous program states. Fractional conformance tests, or "vulnerability tests", are traps activated when the instruction pointer reaches a chosen point in the code. Such traps are supposed to evaluate internal state of the program and decide whether it does not match vulnerability specifications. If vulnerability specifications are met, the application fails its vulnerability tests. We operate under the assumption that both internal state parameters and the location at which to place this trap can be determined prior to running the program. For typical vulnerabilities such as buffer overflows, heap corruption, etc, this can be automated with a great success using static code analysis to find candidates for testing. Some cases of high level vulnerabilities exist only in the context of a previous execution path or other abstract concepts that are not a part of the internal state - for example, authentication bypassing can be identified only if extra information about the presence and result of a call to the authentication code is retained. This way, the check can take a different action when the routine was bypassed or never succeeded, and a different action when authentication was apparently successful. This would require inclusion of another factor in the process of judging whether the vulnerability is present - examining some characteristics of the execution path that led to this internal state. The proposed approach does not allow this, for it often operates on whole groups of execution paths, not single instances. Suggested way to deal with this problem is to insert appropriate neutral code, or "checkpoints", to set an additional bit of the internal state when some relevant point of the execution path is reached or other complex condition is met. The important information about the execution flow, as required for the analysis, is retained. To ensure that added code does not affect the execution path, and, thus, the ability to prove code conformance, "checkpoints" should operate on a separate memory area, and address space separation between the main code and the checkpoint code should be enforced by the code emulator. Note that high level vulnerability scenarios have to be evaluated on a case-to-case basis, and the necessity for adding several checkpoints in the code will not have a significant impact on feasibility of the process. The definition of a "vulnerability test" for the purpose of this publication is as follows: a well-defined set of characteristics of the internal state that have to be met at a certain location in the program in order for the code be considered not vulnerable. Note that vulnerability checks will be initially applied to a projection of possible variants of internal state, not a fully defined image - for this reason, vulnerability test cannot consist of a black-box function suitable for traditional exhaustive testing. Discussion on how the characteristics should be described and stored is beyond the scope of this paper and will be not discussed here. 3) Simplified models As for today, an exhaustive search of all the execution paths for a given program is considered highly not feasible for the reasons discussed in the next section. Because of that, code evaluations rely on simplified models, which are: a) static analysis: This approach [2] is based on the principle of finding potentially vulnerable patterns in the code without actually trying to model the execution flow. This method fails to detect any dynamic conditions, such as privilege dropping problems or virtually all conformance problems. It also tends to generate an excessive ratio of false positives for any complex code, mainly because it is impossible to determine code reachability or possible parameter domain for a function call. b) single or multiple execution path analysis: This approach [3] relies on tracing a subset of actual execution paths for the program. It is very good for debugging problems with a very limited set of input parameters, diagnosing known problems or building a partial model of program behavior. Multi-path variant is usually a version of the famous "fuzz" experiment [4], and typically provides very poor coverage of possible execution paths. It can provide a relatively good coverage of graph links, but will detect only the most trivial problems. c) graph analysis: Graph analysis methods [5] are based on building a tree of possible calls between certain procedural units (e.g. functions) in order to examine all potential execution paths. Basic assumption is that all information about internal state or input is discarded. This method is very effective for early debugging and coverage analysis or profiling, but cannot distinguish actual and potential paths in any way, thus will trigger enormous number of false positives when used for a general vulnerability pattern detection. d) simplified search: This approach [6] is based on simplified rules and certain assertions that are intended to reduce computational complexity of the execution path search. The major flaw is that certain shortcomings tend to generate false negatives in many cases, and this method cannot be use for formal proofs. e) abstract analysis: Abstract analysis [7] relies on building a high-level model of program functionality prior to performing any analysis. As with simplified search, most serious weakness of this model is that certain questionable assertions has to be made, and that high-level models are not always very susceptible to low-level analysis of basic problems. e) partial analysis: Certain methods [8] try to decrease computational complexity by limiting the scope of an analysis. This way, security is being reduced to the question of trusted data flow or a similar problem. There is usually a serious penealty, such as missing whole classes of possible flaws, but this is usually the best way to prove the conformity with some specific requirements in mission critical applications. Above methods are being combined together and have demonstrated their usability in the some phases of quality assurance and security testing. Unfortunately, all of them have one common flaw - are not exhaustive. Security vulnerabilities tend to appear in a very rare and subtle scenarios that survived the basic design and testing phases. It is possible and common for a code with 2^1000 possible execution paths that 2^30 exhibit the same potentially vulnerable construction, and only one can actually result in a real vulnerability. Methods that use a simplified approach will very likely miss the one case that is actually vulnerable, or report numerous false positives that will very likely be ignored along with the single hit. For mission-critical applications, it is feasible to redesign the application to avoid all potentially dangerous conditions, but for already designed and developed consumer grade software, we can't afford eliminating only hypothetical possibilities or reimplementing the project. Another reason why exhaustive search is a must for precise vulnerability elimination is that, in the world of security, quantitative differences are a minor issue. In other words, a program with one security vulnerability is exactly as insecure as any software with 100 vulnerabilities of comparable kind. Such differences simply affect the probability of vulnerabilities being found. Unfortunately, human-driven selective vulnerability analysis appears to be much more effective than human- or machine-driven exhaustive search, and it is easier to discover single bugs over time than to eliminate them all in a single run. In many ways, the proposed approach is actually a case of simplified search, but, unlike others, it is supposed to provide the definitive answer or no answer to the question of conformance. Traditional methods, except for partial analysis, typically give "maybe yes" and "maybe no" results in all cases. The proposed approach is based on certain computational tricks that should work well for a typical real-life code. The only consequence of the assumptions being inapplicable to a single case is coming back to excessive computational complexity, and, effectively, inability to use this model in this particular case. There is no and can be no guarantee of applicability of the proposed approach to all programs, especially ones for which there is no be easily determined termination / reset points for separable functional blocks that are supposed to be reached in finite time, as discussed earlier. The premise that there is very few vulnerabilities in the code, and only a marginal fraction of all possible execution paths lead to them, is one of foundations behind the approach discussed later in this paper. If a significant fraction of execution paths leads to a vulnerability (which can be the case in early stages of software testing), methods discussed above perform much better for debugging - at least as long as the vulnerable-to-not-vulnerable path ratio is high. 4) Computational complexity Most difficult problem with exhaustive execution path search is the computational complexity caused by the number of possible input states. The number of individual paths the program can follow is typically overwhelming and it is practically impossible to follow them all. For example, a small program that takes 128 bytes of effective input can have 179769313486231590772930519078902473361797697894230657273430081157732675 805500963132708477322407536021120113879871393357658789768814416622492847 430639474124377767893424865485276302219601246094119453082952085005768838 150682342462881473913110540827237163350510684586298239947245938479716304 835356329624224137216 possible individual tracks that have to be examined. By effective input we understand any parameters that can potentially be be read by the program, not necessarily in the same time or in the same execution path. Note that in the real world, the input poll size does not have to be bounded in any particular way. Many programs are capable of processing megabytes of streamed data coming from the network. Artificial processing limits sometimes has to be imposed on the program in order to pass the exhaustive search in finite time, but reasonable limits that can be enforced in real life do not make computations any more feasible. In the world of computer security, "input" is very often used to denote a data that comes from an untrusted source - from network, from some user-dependent sections of the system, but not, for example, system configuration. It is relatively safe to accept this rule for exhaustive testing, as most applications are not required to be immune to faulty contents coming from trusted channels. The only downside is that we end up testing a specific configuration, but typically it is of no value to discover that trusted users can misconfigure the program and make it behave erratically. In some very specific cases, variable initial parameters such as configuration can be introduced to the input poll, but it is not recommended as a general practice. Another issue is the length of an individual path. While the Halting Problem [9] can be answered for finite computers in finite time, it might be excessively time consuming to follow a single path that can, e.g., be looped but "practically never" return to the original internal state, and have some exit condition that cannot be determined in a simplified way. It is impossible to find the answer to a question about any given execution path in time shorter than time needed for this path to be executed, having the same computation capabilities. It is possible for certain cases, but not in general. Computational complexity is hardly a result of code complexity. In other words, in order for every execution path to consist of an unique code in our trivial "128 bytes" example, if every path consisted of only one instruction, and a programmer could write 100 lines per minute, it would take an army of 10,000 programmers coding non-stop for 342026852 1427541681372346253403776129410154069524936401701485562361732796908313606 0256559612330200917260295639244937853435842802285847911433190150426079551 8222541654156897803037104679606632599161820934079309371706313072074471961 342225862082459716965683695796584068299543278626214763539392761332438 years to write this code. And the resulting binary would have at least 209 27902484106783612273926739453160362527437728623703270385749772858418967283 90864244528083624405972905458345542095989892943643136117800866403237807558 31539139347026852035761434005363380124436364803792620176688964523084790378 88217888995203019681763505021868120481527671211777014946532005541417320448 terabytes. Typical programs have significantly bigger input polls and, quite obviously, are dramatically smaller. This implies very heavy code reuse across different execution paths and, typically, within every path. Because of that, graph analysis is possible and very feasible. The main reason for computational complexity is the size of input poll and resulting spectrum of internal program states. The proposed approach will try to reduce the problem of possible input combinations to the problem of potential program flow paths. While the number of execution paths is a serious issue, once we have several paths selected from the poll, we can evaluate them much easier in a reasonable time for most, but not all, projects - as long as we make some very rough functional assumptions, such as the maximum processing time allowed for the instruction pointer to leave the code that can eventually reach tagged location of a vulnerability check (point of no return). This is a common approach for single or multi-path mapping analysis. No exit from the block in given time means no definitive answer as to the presence of a vulnerability, but can be considered a fault condition itself: an excessive processing time in a single functional unit. This is trivial for many repetitive processing services, such as network services, one-shot applications, or codes where "points of no return" can be determined - but is not applicable to large continuous processing systems. Such systems can be, however, treated as many smaller functional blocks that are supposed to go to the same initial state after processing a portion of data. Few segments that do not work this way have to be evaluated by a human or proven valid in other way, but are not very common (state-machine command interpreters with potentially infinite input stream length are one of examples). Such systems are relatively self-contained, and abstract analysis [7] may work very well for them. Past this point, the block can be considered a black-box, and all valid results can be mapped to input of the program. Enumeration is preferred over direct mapping because it elliminates false positives caused by tracing input patterns that could be never generated by the black box segment. 5) Proposed approach The discussion below is based on the assumption that program is converted to a simple assembly language level representation that consists of a set of opcodes for reading input into memory, memory modification, memory arithmetics, substitution, conditional and unconditional jump. Any computer program can be translated into this simplified notation. Other, more advanced Turing-complete models can be exercised, but are not necessarily covered by this writing. First proposed paradigm is intended to decrease the computational complexity caused by input poll size. By "superposition" we understand an additional, third discrete state of uncertainty, a binary value that cannot be determined at this moment. We introduce this third state to the state machine, going from binary to three-bit computer. This increases the dimensions of internal state poll by the factor of two. Instead of exercising all possible input states, we start with all "untrusted input" bits set to the state of uncertainty. Some fixed portions of the input poll, such as configuration parameters, can be represented by a fixed binary data. Every time any "indeterminate" bit is involved directly in a computation of a new internal state, all affected bits are changed to "indeterminate". Note that indirect influence, such as by conditional expressions, is not to be considered at this point. As soon as an expression that differentiates instruction pointer based on any indeterminate value - such as a conditional construction - is reached, code execution is forked into an appropriate number of branches for all possible outcomes of the expression. Once again, very simple program representation described above allows making faster decisions in such cases, and limits us to very trivial two-branch splits where branches have different resulting instruction pointers. Note that while instruction pointers are modified to denote two possible outcomes of the conditional expression, the tested value itself is not modified and does not become determinate in any branch. There is no attempt made to differentiate the branches by defining this value. This will result in a number of "impossible" paths leading to a paradox being forked and traced over time. Certain constructions, such as calling user-provided addresses in a wide address space (such as 32- or 64-bit), are very difficult to trace this way. They are also very difficult to be translated to a simplified notation in a reasonable manner. Our assumption is that such constructions are very rare, and, generally speaking, should be considered dangerous and flawed on first sight. We assume that there is no code that directly modifies instruction pointer by setting it to some potentially modifiable variable and that all cases of pointer modification use values fixed in the code. Programs that are using variable code pointers and cannot be simply translated into fixed-pointer equivalents are to be considered not suitable for this type of analysis. The majority of programs that use variable pointers to just a small set of functions can be easily translated into their fixed pointer equivalents. As soon as any execution path enters the same state as at one of previously encountered addresses within any path, it is terminated and the end is joined with an appropriate location in the matching path. We use the notion of "branch collapse" to refer to this pattern. Because indeterminate values are retained, and, as discussed earlier, code reusability in all possible paths is typically high, this saves a significant computational effort, as we get rid of branches that can be differentiated only by the bit poll taken from the input poll as long as some outcome of the input data does not affect execution path. Only when determinable sections of internal state are different in this particular branch, it is considered a separate case and not terminated. As a result, only one path is followed as long as the input values do not influence instruction pointer changes. This saves a significant computational effort. As soon as one path results in the same map of indeterminate user-dependent data and same determinate sections of internal state at same address, it is reduced to one of already explored paths and the branch collapses. For a typical application, this reduces the number of paths to be examined very drastically because code reusability is handled effectively. There are certain constructions for which this is not true, and we will discuss them later. When an address marked as a potential vulnerability point is reached, the internal state is tested to see if it can possibly match a vulnerable state. While we do not have the exact data, we have some fixed information and several "unknowns", or wildcards, so initial pre-selection can be performed easily. As a result, based on the premise that there is very few vulnerabilities in the program and on the assumption that vulnerability check is reasonably placed in the code and is sufficiently strict, for typical code we can select just few paths that have to be actually examined in detail as potential fault scenarios. Last stage of the analysis resorts to a traditional exhaustive search of remaining paths, but the poll is very dramatically reduced. One of problems with this approach is that it can still result in an excessive number of "impossible" paths being forked in certain loop constructions that rely on user input and, at the same time, modify internal loop counters, etc. In some situations, forked branches would not return to the same state as one of previous branches too fast and will continue forking new branches from the same loop. While all branches will represent exactly the same code, the difference between all of them is caused by few bits of internal state used for the counter, etc. Only some constructions actually trigger this problem, others will have "impossible" branches that collapse back into the main branch and cause linear, not geometrical complexity. The problem can be eliminated by successive approximations of the internal state. To perform this task, we have to divide the internal state in a smaller number of "bit clusters". This less granular approach can and will result in inability to get precise information about separate bits that were used to build a cluster. Exact alignment, uniformity, continuity, state description capabilities and size of clusters can be evaluated to suit the program structure most efficiently and is beyond the scope of this paper. For now, we assume uniform clusters identified only by one of four states, "all zero", "mixed", "all one", "user-dependent". This approach will result in much more potential execution paths forked from standard not looped conditional expressions, most of them being impossible, but the number should not extremely exceed the dimensions of a complete call graph for given program because branches will collapse back almost immediately. This approach provides much more effective identification of loop conditions, as counter is being identified as a single cluster or a very small number of clusters, not, par example, 32 bits resulting in 4294967296 possible combinations. While the number of pre-selected paths will be higher because of introduced ambiguity, and it will keep on increasing with the decrease of granularity, it solves the problem of certain loop constructions resulting in absurd computational complexity. For better results, clustering can be applied only to certain bits that can be automatically identified during the process of execution path analysis as causing excessive number of very similar branches. Then, the new internal state approximation model can be automatically applied to existing paths so results can be simplified in the fly. Once again, it can be proven that this approach will not always generate an useful and feasible set of candidate paths or that it can be always performed in a feasible time. However, typically audited code, in connection with reasonable identification of functional blocks, placing vulnerability tests at possibly earliest stages and reasonable clustering of internal state seems to yield satisfying results. Improper code sequencing will result in presence of many insolvable execution paths that take excessive time to execute. Vulnerability tests located near exit conditions, far from the possible origin, will very often result in possible error propagation and excessive dimensions of the candidate set. Improper cluster selection can result in too vague or too precise internal state data, making candidate selection or path evaluation very difficult, respectively. Last stage of analysis is candidate elimination. Previously chosen execution paths should be followed in a traditional way. Every bit of input should be uniquely traced, and as much input data bits as possible should be determined in order to pass conditional constructions that are responsible for this particular execution path. Impossible paths (ones for which previously determined input bit would have to be different in order to follow this path in subsequent conditionals) should be rejected. As there is a chance that many paths will overlap in certain sections, it might be reasonable to deploy additional parallel search optimization algorithms. Once most "impossible" paths are eliminated, other methods, such as high-level analysis of the execution path to determine what input is required to trigger this condition, can be used. Generally speaking, computational complexity required to determine at least one input set or to find one possible path to the fault condition, if such path exists, should be relatively low. This algorithm is designed to deal with one untrusted, indeterminate source of data (input) at one time. For applications that deal with two different input sources of different trust levels, and the nature of vulnerability is defined as some not acceptable interaction between data from different sources, there are three possible solutions. One is to separate two segments in the code, one responsible for dealing with first source, second for another, and test them separately, evaluating the interface later. This, of course, is not always possible. Another approach is to introduce three different types of uncertainty: "from A", "from B", "mixed". This, quite unfortunately, can make computations more difficult. Another approach is to fall back to data flow analysis models [8] for assessment of this kind of vulnerabilities. Additional improvements can include further reduction of internal state poll dimensions by interpreting certain dynamic memory management commands provided by the language the program is written in. 6) Notes It should be clear that presented results are premilinary and purely experimental. Their usability for code analysis has to be evaluated and further analysis or extensions of this work are necessary. If confirmed, this model can have a significant impact on actual vulnerability testing and software security assesment. 7) References [1] Common Vulnerabilities and Exposures, terminology http://cve.mitre.org/about/terminology.html [2] ITS4: A Static Vulnerability Scanner for C and C++ Code http://citeseer.nj.nec.com/viega00its.html [3] Fenris: execution path analysis tool http://razor.bindview.com/tools/fenris/ [4] Fuzz Experiment http://epsych.msstate.edu/adaptive/Fuzz/fuzzApplet.html [5] Program Visualization at IBM Research http://www.research.ibm.com/pvres/ [6] The Feasibility of Performing Conformance Testing Using Statistical Measures of Correctness http://www.nist.gov/itl/div897/ctg/nm/statmeas.htm [7] An Abstract Analysis of the Probabilistic Termination of Programs http://www.di.ens.fr/~monniaux/biblio/Monniaux_SAS01.pdf [8] A Lattice Model of Secure Information Flow http://pag.lcs.mit.edu/6.893/readings/denning-cacm76.pdf [9] Computability and Complexity - The Halting Problem http://www.csc.liv.ac.uk/~ped/teachadmin/algor/halt.html