Context Sensitive Analysis

A method for handling methods

Christian Gram Kalhauge

Table of Contents
  1. Why Method Calls are complicated.§1
    1. The Trivial Solution: In-lining§1.1
      1. Abstract Call Plumbing§1.2
      2. Context Sensitive Analysis§2
        1. Full Call Stack Context§2.1
          1. (k-CFA) Limited Context Sensitivity§2.2

          In this topic, we are going to take a look at how to make static analysis work on method calls.

          Why Method Calls are complicated. §1

          In our previous chapters, every time we encountered a jump instruction it was to a known location. This is called a statically known jump. If the jump is only known at run-time, we call it a dynamic jump. Method calls, or specifically returning from a method, are one of the most common cases of dynamic control flow. On a static method call the jump site is statically known, however, when we want to return, it depends on the caller. So when running a static analysis we don't know where to return the results.

          This is best illustrated by an example:

          Dynamic Control Flow

          int id(int i) { 
            return i; // Where to return to?
          } 
          
          int main() { 
            int x = id(1);
            int y = id(0);
            return x + y;
          }

          Of cause this problem gets even worse if we have dynamic dispatching, like when we call a method on an object. In this case we don't even know the call site.

          The Trivial Solution: In-lining §1.1

          The naive solution to this problem is to inline the method call. In theory, this means expanding the content of the callee-method, the method that is being called, inside the caller-method, and then continuing normal abstract interpretation.

          In-lining

          int id(int i) { 
            return i; // Where to return to?
          } 
          
          int main() { 
            int x = id(1);
            int y = id(0);
            return x + y;
          }

          Becomes

          int main() { 
            return 1 + 0;
          }

          This works very well in some situation, and is actually the preferred way to do optimizations in compiles as we can specialise the method call to the exact situation is is called in. However, the method has many drawbacks, it can be hard to do the in-lining correctly, we can not reuse the analysis across calls, and worst of all we can not handle recursive calls.

          Abstract Call Plumbing §1.2

          An alternative solution is to think about how the computer actually calls a method. When the computer calls a method, it stores a return address on the stack. When we return from a method, we use this return address as the target.

          Parameters for caller Locals of caller Return Address Parameters for callee Locals of callee Return Address Enter your text here Frame Pointer Stack Pointer top of stack Enter your text here Enter your text here Enter your text here Enter your text here Enter your text here Enter your text here Enter your text here Enter your text here Enter your text here Enter your text here Enter your text here Enter your text here Enter your text here Enter your text here

          The call stack. Notice how the return address is stored secretly among the local variables. By R. S. Shaw (modified) - Own work, Public Domain link

          So one solution to the call problem, is to keep track of a set of return address alongside the current state. When we now reach a return call, we have a set of targets. We can use this set to update the state with the return value at each of those targets.

          In theoretical terms, your state is now a map from program points to a set of return states (2ι) as well as an abstraction of the powerset of states 𝐒𝐭𝐚𝐭𝐞.

          ι2ι×𝐒𝐭𝐚𝐭𝐞

          Notice that we do not have to abstract the set of return addresses, as it is finite (and normally quite small, e.g ., less that 100).

          Abstract Call Plumbing

          By keeping track of an overapproximation of return locations we can run abstract analysis as we are used to. After running the analysis for the first call we get the following:

          int id(int i) { 
            // { main:1} x {+}
            return i;       
            // updates the results of main:1 with {+}
          } 
          
          int main() { 
            int x = id(1); // result {+} 
            int y = id(0); // yet not computed
            return x + y;  // yet not computed
          }

          And after doing the second call we get the following:

          int id(int i) { 
            // { main:1, main:2} x {0, +}
            return i;       
            // updates the results of main:1 and main:2 with {0, +}
          } 
          
          int main() { 
            int x = id(1); // result {0, +} 
            int y = id(0); // result {0, +} 
            return x + y;  // result {+}
          }

          However, as we can see this comes at a cost. Keeping only a single instance of a method, means that every use of that method leaks into every other. If we look at the caller side of the problem we get the following:

          The Single Context Nightmare

          If we change the + with a /, we can see that our analysis finds a bug, even though it can never happen:

          int id(int i) { 
            // { main:1, main:2} x {0, +}
            return i;       
            // updates the results of main:1 and main:2 with {0, +}
          } 
          
          int main() { 
            int x = id(1); // result {0, +} 
            int y = id(0); // result {0, +} 
            return x / y;  // result BUG!
          }

          The issue is that we we do not save the context of the call, which makes the result too wide.

          Context Sensitive Analysis §2

          Context Sensitive Analysis is about expanding the abstract state to be able to react to the context in which the method was called.

          In our example from the last section, it would be nice to differentiate between the two call sites a(1) and a(0), as they produce different results.

          Full Call Stack Context §2.1

          One way of doing this, is by making the state of a bytecode instruction be dependent on the call stack. We can write the total state as a map from the bytecode offsets (e.i the stack) an abstract state with return values:

          ι+2ι*×𝐒𝐭𝐚𝐭𝐞

          This has the advantage of giving us precise static information. In our example, from above the two calls would be differentiated by their call stack (e.i. they are called from two different point caller), and we can remove the faulty warning:

          Full Stack Analysis

          Consider this program. The only way to know that it does not throw an error is to track the full callstack.

          int main() {
            int x = id(0); // main:1 -> {} x {0}
            int y = id(1); // not yet computed
            return x / y;  // not yet computed
          }
          
          int id(int x) {
            // main:1,id:1 -> {main:1} x { 0 }
            return x  
          }

          But now when we analyse a new call site we get the following:

          int main() {
            int x = id(0); // main:1 -> {} x {0}
            int y = id(1); // main:1 -> {} x {+}
            return x / y;  // main:1 -> {} x {0}
          }
          
          int id(int x) {
            // Now we keep two state:
            // main:1,id:1 -> {main:1} x { 0 }
            // main:2,id:1 -> {main:2} x { + }
            return x  
          }

          (k-CFA) Limited Context Sensitivity §2.2

          The problem with the approach above is two-fold. The first problem is space, as we have to keep a version of state for each reachable call-stack, this can quickly explode in size. The second problem is recursive methods. In recursive methods, the call-stack is often dependent on the input (and might be infinite), and can make the analysis run forever.

          As with everything else in static analysis, we have to abstract. Instead of keeping the entire call-stack, we just keep the last k call-sites, this is called k-CFA (Might 2010). Essentially, keeping no more than k+1 bytecode offsets.

          In object oriented programming languages are k2 often feasible, and polynomial (Might 2010), however in functional languages it is EXPTIME-complete meaning that there exist no polynomial-time algorithm that solves the problem.

          In Java, it is often fine (and necessary) to choose k=1 to get a good analysis.

          Until next time!

          Until next time, write the best analysis you can and upload the results to Autolab, and ponder the following:

          1. Why are method calls complicated?

          2. Why would you use context sensitive analysis?

          3. What is the drawback of a context sensitive analysis?

          Bibliography

          1. Might, Matthew; Smaragdakis, Yannis; Van Horn, David (2010). Resolving and exploiting the k-CFA paradox. doi:10.1145/1809028.1806631 link

          2. Smaragdakis, Yannis; Kastrinis, George; Balatsouras, George (2014). Introspective analysis. doi:10.1145/2666356.2594320 link