SCS Undergraduate Thesis Topics
|Alexander Reece||David Brumley||Evaluation of Loop Nesting Forest Algorithms|
Advanced control-flow analyses are often built on a technique known as structural analysis (SA), which iteratively transforms a control-flow graph (CFG) into a control-flow tree with leaves corresponding to CFG nodes and internal nodes corresponding to CFG regions. An iteration in SA typically proceeds by first identifying a region in the CFG that matches one of the known high-level control structures such as if-then-else and do-while, then contracting the region into a new node. In practice, however, many CFGs contain large regions that do not match any high-level control structures (e.g., due to the use of break), thus making control-flow analysis less precise. This is why researchers have recently started complementing SA with iterative refinement, which is the idea of using iteratively removing of a carefully-chosen arc to allow SA to continue and discover the finer structures embedded inside such regions. Unfortunately, their experience shows that straightforward implementations of SA with iterative refinement tend to be bug-ridden. In particular, the addition of refinements substantially complicates the control flow of the SA algorithm itself.
Our insight is that we can partition the arcs removed during refinement into two groups depending on whether an arc is a backward feedback arc or not. (A feedback arc is an arc whose removal breaks a cycle, and the arcs removed by refinement are all backward.) This naturally leads us to a new, conceptually-clean implementation strategy by handling cyclic and acyclic regions in two independent phases. The first phase preprocesses a CFG into a so-called "loop-nesting forest" (LNF) using existing loop identification algorithms in the compiler literature. The result is a hierarchical decomposition of the CFG into acyclic regions, with all candidate feedback arcs identified. The second phase applies SA in each of the acyclic regions, which can be implemented much easier since each region is guaranteed to be acyclic.
I propose to extend the CMU Binary Analysis Platform (BAP) with LNF algorithms from the literature. This will enable us to evaluate the effectiveness of the aforementioned two-phase structural analysis algorithms versus the existing mixed implementation in BAP.