Context Sensitive Analysis

Christian Gram Kalhauge

In this chapters, we are going to take a look at how to make static analysis work on method calls.

1. Why Method Calls are complicated.

Method calls are one of the most common cases of dynamic control flow. In our previous chapters, every time we encountered a jump instruction, it was to a known static location, with dynamic control flow the jump location is only known at run-time. While the jump site when calling a static method is known at compile time, the return call is unknown -- it depends on the caller.

1.1. Inlining

The naive solution is to semantically inline the method call. In theory, this means expanding the content of the callee-method inside the caller-method, and then continuing normal abstract interpretation.

This is a bad idea, for multiple reasons, it can be hard to correctly change the code, we can not resuse the analysis across calls, and worst of all we can not handle recursive calls.

1.2. Abstract Calls

The alternative solution is to think about, how the computer actually calls a method. When the computer calls a method, it stores a return position on the stack. When we return from a method, we use this return position as the target.

So one solution to the call problem is to at each state to keep track of a set of return values alongside the current state. This can be a concrete set, because there can only be a finite number of call-sites. Now, when we have step return calls we can merge the return value into each of those return states.

In theoretical terms, your state is now a map from program points to a set of return states (2ι) as well as an abstraction of the powerset of states 𝐒𝐭𝐚𝐭𝐞.

ι2ι×𝐒𝐭𝐚𝐭𝐞

The analysis becomes something like this:


# The step function, now either returns a next state, or 
# a return value that can be blended with the caller state
def step(pc, state : AbstractState) -> Iterator[(PC, NextState | ReturnValue)] :
  # Partial implementation of new things:
  match bytecode[pc_]:
    case Call(method):
      # Create a call_state from the locals with return to this pc
      call_state = AbstractState.from_locals(state.take_args(len(method.args)))
      call_state.add_return(pc)
      yield (PC(method=method, offset=0), NextState(call_state))
    case Return():
      value = state.pop_stack()
      for rp in state.returns: 
          yield (rp, ReturnValue(value))
  ...

def analyse(pc: PC, inputs: list[AbstractValue]):
  # initialize the state and the worklist
  state : dict[PC, AbstractState] = dict()
  state[pc] = AbstractState.from_locals(locals)
  worklist = [pc]


  # work your way through the worklist
  while worklist:
    pc_ = worklist.pop()
 
    # the step function now can either return a state or 
    # return a value.
    for (next_pc, next_action) in step(pc_, state[pc_]):
      match next_action:
        # We can still return a new next state as we are used to
        case NextState(next_state):
          next_state = next_state
        # But now we can also return a single value.
        case ReturnValue(value):
          # The next_pc points to the method call, so we need to pop the arguments from 
          # the stack and replace them with the return value.
          next_state = AbstractState.simulate_call(state[next_pc], value)
          # also update the pc to the next position.
          next_pc = next_pc + 1
     
        # continue abstract interpretation as usual
        new_state = state[next_pc] | next_state 
        if new_state != state[next_pc]: 
          state[next_pc] = new_state
          worklist.append(next_pc)
          

2. Context Sensitive Analysis

will be added soon