In this section, we first recap the abstract interpretation from last time. We then introduce a new integer abstraction, namely the interval abstraction, and show how to do unbounded static analysis, i.e., analysis without a bound on the number of iterations. To do this we have to introduce fixpoints, the widening operator, and finally a more efficient algorithm.
A big problem, which we have glossed over so far, is dependent variables. Dependent variables are variables whose solution spaces are constrained by each other.
Consider the case in Dependent.safeDivByN:
@Case("(0) -> ok")
@Case("(1) -> ok")
public static int safeDivByN(int n) {
if (n != 0) {
return 1 / n;
}
return 0;
}If we run our sign analysis naively on the decompiled code we get the following. The offset column shows the offset, the instruction column, shows the instruction column, and the abstract state column shows abstract state right before the instruction and at what instructions the results are put.
| Offset | Instruction | Abstract State |
|---|---|---|
| 000 | load:I 0 | locals: {0 ↦ {-, 0, +}}, stack: [] |
| 001 | ifz eq 6 | locals: {0 ↦ {-, 0, +}}, stack: [{-, 0, +}] |
| 002 | push:I 1 | locals: {0 ↦ {-, 0, +}}, stack: [] |
| 003 | load:I 0 | locals: {0 ↦ {-, 0, +}}, stack: [{+}] |
| 004 | binary:I div | locals: {0 ↦ {-, 0, +}}, stack: [{+}, {-, 0, +}] |
| 005 | return:I | locals: {0 ↦ {-, 0, +}}, stack: [{-, 0, +}] |
| 006 | push:I 0 | locals: {0 ↦ {-, 0, +}}, stack: [] |
| 007 | return:I | locals: {0 ↦ {-, 0, +}}, stack: [{0}] |
The problem with the above execution is that we report a divide-by-zero error at offset 004, even though it cannot occur. This happens because we do not capture the dependency between the integer on the stack and the integer in the locals at the branch at offset 001.
To fix this we need to add a notion of named values and constraints. In the example above, we could introduce a named value $v_n$ for the argument, with a constraint indicating that the value on the stack and the value in the locals are the same. We can keep these constraints in addition to the state:
| Offset | Instruction | Abstract State |
|---|---|---|
| 000 | load:I 0 | locals: {0 ↦ $v_n$}, stack: [], $v_n \in$ {-, 0, +} |
| 001 | ifz eq 6 | locals: {0 ↦ $v_n$}, stack: [$v_n$], $v_n \in$ {-, 0, +} |
| 002 | push:I 1 | locals: {0 ↦ $v_n$}, stack: [], $v_n \in$ {-, +} |
| 003 | load:I 0 | locals: {0 ↦ $v_n$}, stack: [{+}], $v_n \in$ {-, +} |
| 004 | binary:I div | locals: {0 ↦ $v_n$}, stack: [{+}, $v_n$], $v_n \in$ {-, +} |
| 005 | return:I | locals: {0 ↦ $v_n$}, stack: [{-, 0, +}], $v_n \in$ {-, +} |
| 006 | push:I 0 | locals: {0 ↦ $v_n$}, stack: [], $v_n \in$ {0} |
| 007 | return:I | locals: {0 ↦ $v_n$}, stack: [{0}], $v_n \in$ {0} |
Now we can see that we can constrain the state-space of the named values inside each of the branches, allowing for more precise results. In particular, at offset 004 the divisor $v_n$ can no longer be zero, so no error is reported.
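To make this concrete, here is a minimal sketch of how named values with sign constraints might be represented. The names (`Named`, `SignState`, `constrain`) are our own invention for illustration, not from the course codebase:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Named:
    # A fresh identifier; a new one is created whenever a variable is overwritten.
    id: int

@dataclass
class SignState:
    # The set of signs each named value may still take.
    signs: dict[Named, frozenset] = field(default_factory=dict)

    def constrain(self, v: Named, allowed: frozenset) -> "SignState":
        # Refining a named value narrows it everywhere it occurs: in the
        # locals, on the stack, and in any copies made of it.
        new = dict(self.signs)
        new[v] = self.signs[v] & allowed
        return SignState(new)

# At the branch `ifz eq 6` we refine the same named value differently in
# each successor, capturing the dependency between the stack and the locals.
n = Named(0)
st = SignState({n: frozenset({"-", "0", "+"})})
eq_branch = st.constrain(n, frozenset({"0"}))       # n == 0: jump to 006
ne_branch = st.constrain(n, frozenset({"-", "+"}))  # n != 0: fall through to 002
```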
A couple of things to keep in mind when using named values:

- They should not change. If a variable, like a local, is overwritten, we need to create a new named value, so that values copied out of the locals do not change when the original variable changes.
- Try to be smart about when to create named values and when to remove them again, as too many named values might make your analysis slow.
- Think about which instructions create dependencies between values. Some examples are dup and load, but array and field accesses do the same.
It is also possible to capture more interesting relationships between values than simple equality. This is especially important when the values constrain each other:
@Case("(0, 0) -> ok")
@Case("(1, 1) -> ok")
public static int safeDivAByB(int a, int b) {
if (a >= 0) {
if (b > a) {
return a / b
}
}
return 0;
}The example above contains no errors. But this would be hard for our current techniques to detect. To maintain these constraints efficiently we can use Polyhedrons (Cousot 1978) or most of the time faster Octagons (Miné 2006).
First, we are going to look at what an implementation might look like. Because we are going to step all states at the same time, we need a manystep function (Bounded Abstract Interpretation (§3)). It takes a set of states (abstracted by A) and yields all the states (or final states) which could be produced by them (The State Abstraction (§3.1)).
```python
def manystep(sts: StateSet[A]) -> Iterable[A | str]:
    ...
```

To run our bounded static analysis, we first compute the initial state from the initial method (methodid), and then simply run manystep a bounded number of times, joining the results each time.
```python
final = set()
sts = A.initialstate_from_method(methodid)
for _ in range(MAX_STEPS):
    for s in manystep(sts):
        if isinstance(s, str):
            final.add(s)
        else:
            sts |= s
log.info(f"The following final states {final} are possible within {MAX_STEPS} steps")
```

The first thing we do is ask the state set to break itself down per instruction (see The Per-Instruction Abstraction (§3.2)); this lets us process the states at the same instruction at the same time.
```python
for pc, state in sts.per_instruction():
    match bc[pc]:
        case jvm.Binary(type=jvm.Int(), operant=opr):
            ...
```

At this point we would like to step the states according to the instructions, but we are at an impasse: we need to get the variables out of the set of states.
One option (as we used in The Per-Variable Abstraction (§3.3)) is to merge all the states and create a single state with merged variables, but this comes at a loss of precision (see Dependent Values (§1)). Alternatively, we can ask the states to group themselves by the variables needed.
Here we use a group method which takes the set of states and groups them by some query. By allowing the states to choose how to group themselves, they can choose the most efficient grouping. In our case, we want to pop two integers:
```python
for ([va1, va2], after) in state.group(pop=[jvm.Int(), jvm.Int()]):
    ...
```

At this point we have two abstract integers which we then use to do our computation. The computation is done using the state.binary method, which returns all possible pairs of values and states, or failures. Even though we already have the abstract values va1 and va2, doing the operation on them might affect other parts of the state, see Dependent Values (§1). If the result is not a failure, we update the state with the new value, using an update method.
```python
for res in after.binary(opr, va1, va2):
    match res:
        case (va3, after):
            yield after.frame.update(push=[va3], pc=pc+1)
        case err:
            yield err
```

Sometimes, as with Ifz, operations can return multiple states. If we compare an abstract value with 0, it can be true that some values are smaller and some are larger. However, this might also have implications for the rest of the state. We need to return multiple states: one where the condition holds and one where it does not.
```python
case jvm.Ifz(condition=con, target=target):
    for ([va1], after) in state.group(pop=[jvm.Int()]):
        for res in after.binary(con, va1, 0):
            match res:
                case (True, after):
                    yield after.frame.update(pc=target)
                case (False, after):
                    yield after.frame.update(pc=pc+1)
                case err:
                    yield err
```

The problem with bounded static analysis is that even if we run our static analysis for 1,000,000 iterations, we have no guarantee that our program will not exhibit bad behaviour in iteration 1,000,001. Unbounded static analysis seeks to fix this by running the static analysis until a fixed point is reached.
Ideally, we want the smallest such fixed point, which we refer to as the Minimal Fixed Point.
Given a function $f : L \to L$ that is monotone over $\sqsubseteq$, the minimal fixed point is the least $x$ s.t. $f(x) = x$; it can be computed as the limit of $f^n(\bot)$.
Here $f^n(x)$ means $f(f(\dots f(x)))$, where $f$ is applied $n$ times.
Not all functions have a fixed point, but if one does, we can run the function until the result no longer changes, and then we know we don't have to continue running the analysis.
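In code, this iteration might look like the following sketch, assuming f is a monotone function on a lattice of finite height so the loop terminates:

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def fixpoint(f: Callable[[T], T], bottom: T) -> T:
    # Kleene iteration: apply f until the result no longer changes.
    x = bottom
    while (nxt := f(x)) != x:
        x = nxt
    return x
```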
For example, given our Sign abstraction we are guaranteed a fixed point in no more than 4 iterations, as the height of the lattice is 4.
However, not all abstractions form a lattice of finite height. One example of such an abstraction is the interval abstraction.
Here we represent a set of integers as an interval between two numbers.
The interval abstraction is a tuple of two numbers $(l, h)$ with $l \le h$, representing the set of integers between $l$ and $h$, both inclusive.
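For example, one standard definition of the join (least upper bound) of two non-empty intervals is:

$$[l_1, h_1] \sqcup [l_2, h_2] = [\min(l_1, l_2),\ \max(h_1, h_2)]$$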
Now implement the interval abstraction in your language of choice. Remember to test your abstraction:
```python
from hypothesis import given
from hypothesis.strategies import integers, sets

@given(sets(integers()))
def test_interval_abstraction_valid(xs):
    r = Interval.abstract(xs)
    assert all(x in r for x in xs)

@given(sets(integers()), sets(integers()))
def test_interval_abstraction_distributes(xs, ys):
    assert (Interval.abstract(xs) | Interval.abstract(ys)) == Interval.abstract(xs | ys)
```

Define the arithmetic operations you need to run your static analysis.
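As a starting point, here is a minimal sketch of such an interval domain. The representation of the empty interval (bottom) by inverted infinite bounds is our own choice; only abstraction, membership, join, and addition are shown:

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    low: int | float
    high: int | float

    @staticmethod
    def abstract(xs: set[int]) -> "Interval":
        # The empty set abstracts to the empty interval (bottom),
        # represented here by inverted infinite bounds.
        return Interval(min(xs, default=math.inf), max(xs, default=-math.inf))

    def __contains__(self, x: int) -> bool:
        return self.low <= x <= self.high

    def __or__(self, other: "Interval") -> "Interval":
        # Join: the smallest interval containing both operands.
        return Interval(min(self.low, other.low), max(self.high, other.high))

    def __add__(self, other: "Interval") -> "Interval":
        # Over-approximating addition: add the bounds pointwise.
        return Interval(self.low + other.low, self.high + other.high)
```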
You can use hypothesis to test them; remember, you want the operations to be an over-approximation:
```python
@given(sets(integers()), sets(integers()))
def test_interval_abstraction_add(xs, ys):
    r = Interval.abstract(xs) + Interval.abstract(ys)
    assert all(x + y in r for x in xs for y in ys)
```

Notice that the lattice of the interval abstraction does not have a finite height: we can always add 1 to either end of an interval.
Consider the following program:
```java
void example() {
    int i = 0;
    while (i >= 0) {
        i = i + 1;
    }
}
```

In this case, our interval analysis of i at the loop head will look something like this: $[0, 0]$, then $[0, 1]$, then $[0, 2]$, and so on.
This might continue forever, unless we find a better way.
Because we are not guaranteed to reach a fixed point with the interval abstraction, we need to force it. This is the job of the widening operator: its goal is to dramatically over-approximate joins in a way that guarantees a fixed point in a finite number of steps.
One way of adding a widening operator to the interval abstraction is to use a constant set, often referred to as $K$. This set contains all the constants in the program or file of interest. This is like the dictionaries we used in the dynamic analysis.
To define the widening operator $\nabla$, we define two operators $\lfloor \cdot \rfloor$ and $\lceil \cdot \rceil$ which, given the set of integers $K$, pick the largest and smallest constant in $K$, respectively, s.t. $\lfloor x \rfloor \le x$ and $x \le \lceil x \rceil$.
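One common way to define the resulting threshold widening on intervals (this exact formulation is ours; the literature has several variants) is:

$$[l_1, h_1] \mathbin{\nabla} [l_2, h_2] = [l', h'] \quad\text{where } l' = \begin{cases} l_1 & \text{if } l_1 \le l_2 \\ \lfloor l_2 \rfloor & \text{otherwise} \end{cases} \quad h' = \begin{cases} h_1 & \text{if } h_2 \le h_1 \\ \lceil h_2 \rceil & \text{otherwise} \end{cases}$$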
We can see that we can only apply this operator a finite number of times until we either get $-\infty$ or $\infty$.
Now, when running our analysis, we use the $\nabla$ operator instead of the $\sqcup$ (join) operator, and we can continue computing until we reach a fixed point.
Add a widening operator to your analysis, so that any chain of merges stabilizes after a finite number of steps.
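A sketch of what this might look like for the Interval class above, assuming K is a finite set of constants harvested from the program, extended with the infinities so the picks always succeed:

```python
import math

# Constants from the program of interest, plus the infinities.
K = {-math.inf, 0, 1, 100, math.inf}

def floor_k(x: int | float) -> int | float:
    # Largest constant in K that is <= x.
    return max(k for k in K if k <= x)

def ceil_k(x: int | float) -> int | float:
    # Smallest constant in K that is >= x.
    return min(k for k in K if k >= x)

def widen(old: Interval, new: Interval) -> Interval:
    # Keep stable bounds; jump unstable bounds to the next constant in K.
    low = old.low if old.low <= new.low else floor_k(new.low)
    high = old.high if new.high <= old.high else ceil_k(new.high)
    return Interval(low, high)
```

With this K, the loop in example() stabilizes at $[0, \infty]$ after a handful of widenings ($[0,0]$, $[0,1]$, $[0,100]$, $[0,\infty]$) instead of diverging.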
Currently our algorithm is a little slow, as we essentially iterate over the entire program in lockstep for every step.
This is of course wasteful, as we would much rather only handle unprocessed events. Suppose you have a stepping function from a program location and an abstract state to a set of program locations and abstract states, which need to be joined at those points.
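In code, such a stepping function might have the following shape; PC and AState are stand-ins for whatever program-counter and abstract-state types your analysis uses:

```python
PC = int        # e.g. a bytecode offset
AState = dict   # stand-in for your abstract state type

def step(pc: PC, astate: AState) -> set[tuple[PC, AState]]:
    """Step the instruction at pc, yielding successor locations and states."""
    ...
```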
Now the analysis algorithm can be written something like this:
```python
def analyse(i: tuple[PC, AState]):
    state: dict[PC, AState] = dict()
    needs_work: list[PC] = []
    # Add the initial state to the set of items that need work, and set the state.
    needs_work.append(i[0])
    state[i[0]] = i[1]
    # Iterate until the needs_work list is empty:
    while needs_work:
        next = needs_work.pop()
        for pc, astate in step(next, state[next]):
            # Merge the new state into the old using the widening operator.
            old = state.get(pc)
            state[pc] = wide(old, astate) if old is not None else astate
            # If the state has changed, append it to the work list.
            if old != state[pc]:
                needs_work.append(pc)
```

This approach uses significantly fewer instructions than running all steps every time, and still produces the same results.
However, there is still some work we can do to improve the algorithm. For example, consider what happens when you analyse the following program:
```java
void fn() {
    int i = 0;
    while (i < 100) {
        i += 1;
    }
    // Many operations on i.
}
```

There is no reason to update the states after the while loop before we have reached a fixpoint of the while loop.
The correct order to do this computation is reverse post-order. If you process locations in this order, you are guaranteed to process loop bodies before the code that follows them.
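A minimal sketch of computing a reverse post-order, assuming the control-flow successors of each location are available as a dictionary:

```python
def reverse_postorder(entry: PC, successors: dict[PC, list[PC]]) -> list[PC]:
    # Depth-first search, emitting a node only after all its successors.
    seen: set[PC] = set()
    order: list[PC] = []

    def dfs(pc: PC) -> None:
        seen.add(pc)
        for succ in successors.get(pc, []):
            if succ not in seen:
                dfs(succ)
        order.append(pc)

    dfs(entry)
    order.reverse()
    return order
```

The worklist can then be kept ordered by reverse post-order index (for example with a priority queue), so loop bodies are stabilized before the code after the loop is processed.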
Until next time, write the best analysis you can and upload the results to Autolab, and ponder the following:
What is the primary problem when doing unbounded static analysis?
Why do we need a widening operator?
Cousot, Patrick; Halbwachs, Nicolas (1978). "Automatic discovery of linear restraints among variables of a program". doi:10.1145/512760.512770
Miné, Antoine (2006). "The octagon abstract domain". doi:10.1007/s10990-006-8609-1
Nielson, Flemming; Nielson, Hanne Riis (2020). "Program Analysis (an Appetizer)".