Bounded Static Analysis
Everything Everywhere All at Once
The goal of this chapter is to ask and answer the following questions:
- Why should we use abstractions when analysing code?
- What is the nice thing about Galoi connections?
- Which cases are the Sign abstraction not able to catch?
- At which cases do your static analysis outperform your dynamic?
1. What is Bounded Static Analysis?
In bounded static analysis we want to talk about all the behaviors of the program up to some depth. There are no easy way to do this, because we have to handle a possible infinite set of traces, which by itself is hard to do.
The idea behind bounded static analysis is straight forward. Instead of selecting a single initial state and then applying the semantics on that until a trace has been made. This is what we did with dynamic analysis. We start with the set initial states and then apply the semantics to all states at once. Essentially, if dynamic analysis is depth first search, Static Analysis is breath first. While we have the downside of having to work with infinite sets of traces, we can be smart about it, and use abstractions to represent the set using a finite number of traces.
1.1. In Theory
The Bounded Static Analysis on a program can defined as the set of traces of depth : . Using a many-stepping function , that step set of traces up to length to the set of traces up to length .
In the definition of , is the transition relation defined by the single step semantics, and means appending to . Since we are always talking about the same program we'll sometimes omit it.
Now, we check for assertion errors down to depth , let's call this , by seeing if any trace ends in an assertion error:
1.2. May and Must Analyses
Sadly, it's infeasible to do computations over all traces at ones. Therefore, when designing an analysis, we either underestimate the set of traces this is called a must analysis, or overestimate the set of traces which is called a may analysis. A must analysis is called that, because we know what the program must do, and a may analysis is called that because we know what the program may do.
In the may analysis we overestimate every step, and in the must analysis we underestimate every step:
Naturally, we'll see that
The may analysis is great if we want to make sure that something newer happens. Essentially, if we want to warn about an assertion error may happen, we need a may analysis. If we want to grantee that an assertion error must happen on all runs, then we need a must analysis:
However, even with the concessions, where we are approximating the correct results we still need some extra magic, to make our analyses truly efficient.
We need abstractions!
2. Abstractions
Instead of working with sets of traces, we have to work with finite structures instead. We call these abstractions.
2.1. The Sign Analysis
Before we go down into the theory, let's start with a concrete example.
Assume that we are working with all the possible integer values that an variable has at some point in the program. It would be much easier to simply keep track of whether the set contains positive, negative or zero values.
We can represent this as a set that only contain three elements , , larger than zero and smaller than zero . This set of size , which is much easier to work with, than the set of size . In the following sections we are going to refer to this set as .
We call an abstraction, as we can covert the concrete domain of integers into the abstract domain of . We call this conversion the abstraction :
However before we continue this set, we first have to ensure that a computation in abstract domain, does not invalidate our over-approximation.
In the following sections we'll see that:
- Our abstract domains can also be ordered.
- We can get the least over-approximation of two abstract sets called the join and the greatest under-approximation of two abstract sets called the meet.
- Then, we have a formal definition of a correct over approximating or under-approximating apporximation called a Galoi connection!
- And finally, we'll cover monotone functions, which are function that preserves the order.
Make the Sign Abstraction
Implement the SignSet as an abstract set in the language of your choosing.
In Python, you can start with this:
from dataclasses import dataclass
from typing import TypeAlias, Literal
Sign : TypeAlias = Literal["+"] | Literal["-"] | Literal["0"]
@dataclass
class SignSet:
signs : set[Sign]
Fasten your seatbelts it's going to be a bumby night!
2.2. Partially Ordered Sets (Posets)
- Partially ordered set, Wikipedia.
A partially ordered set or poset is a tuple set of elements and an ordering , that uphold:
Common partially ordered sets are the integers (also in the other direction ), the booleans , and the set of ,
Check the rules
Think about if the rules apply to the posets above.
Make Sign a partially ordered set
Make your abstraction into a partially ordered set by implementing (<=
) in the language of your choice.
2.3. Lattices
- Semilattice, Wikipedia.
- Lattice (order), Wikipedia.
A lattice is partially ordered sets , with two extra operators and . is the least upper bound , meaning that
The dual is true for that is the greatest lower bound
Furthermore, this implies that there exist a least bound and a greatest bound , from which we have the following identities: , and and .
2.3.1. Hasse diagrams.
- Hasse diagram, Wikipedia.
The reason why they are called latices is that they can be drawn using Hasse digrams which gives these nice structures, which looks like a wooden lattice.
Here we only draw the imitate next elements in the order. So while we do not draw that edge because and are in the way.
Rate your favourite animals
Create a hasse diagram over your favorite animals, The order is . Where
While objectively some of the other animals like and are no worse than each other.
Is your order a lattice: Does every pair of animals have meet and a join?
2.3.2. Back to the Sign Abstraction
In our Sign abstraction, we use set inclusion as our order and use the intersection and union of the underlying set as the meet () and join () of our lattice. and .
Make the Sign a lattice
To make your abstraction into a lattice, you need to also implement the meet and join operators.
In Python, you can implement __and__
and __or__
to get the meet operator a & b
, and the join operator a | b
.
2.4. Galois Connection
- Galois connection, Wikipedia.
A Galois connection is a relationship between two ordered sets and , where we can abstract information from the concrete domain into the abstract while preserving the order if going back . explains how to abstract a value, and explains what the abstraction means.
Because the abstraction might be lossy it is not the case that if we abstract a value we get back the same concrete value . Instead a Galois connection satisfies the following rule:
It states, that if have a concrete value , and abstract it , then any value larger
than this abstraction will concertize to a value that is larger
than the original value. At first this might not seem useful, but if we satisfy this rule we get the following laws for free:
The two first rules gives us confidence that whatever abstraction we choose only lose information never create it. And the second two rules gives us confidence that we don't keep losing that information.
2.4.1. Back to the Sign Analysis
We can see that there exist a Galoi connection between our concrete integer domain and our abstract sign domain . Where the abstraction is: And is:
We can also see that we satisfy the rules, let's pick a concrete value . If we convert that to an abstract value we get: .
Since , and since and , then we get that .
Build the Sign abstraction
In your favorite language build the sign abstraction.
You need an abstract function, which takes a set and turns it into your abstract domain.
@staticmethod
def abstract(items : set[int]):
...
return SignSet(...)
Then because we are using possible infinite sets, instead of generating that set, we can just implement the contains metod:
def __contains__(self, member : int):
...
if (member == 0 and "0" in self.signs):
return true
You can use the Galoi rules to test your abstraction:
If you are using python you fuzz test this property using hypothesis.
from hypothesis import given
from hypothesis.strategies import integers, sets
@given(sets(integers()))
def test_valid_abstraction(xs):
s = SignSet.abstract(xs)
assert all(x in s for x in xs)
2.5. Abstract Operations
Finally, we have reached the fun part. We can now define operations in the abstract domain, that mimic the operations in the concrete.
Because our concrete domain is the set of integers, we can define , as the operation over the product:
Now we can also define the plus abstract domain:
And many more.
But, how do we know if we have implemented our abstract operation correctly? Since we want to over-approximate our operations, we know that computing in our abstract domain should newer under-approximate:
This is of cause not easy to test for, instead we can use use our Galoi connection to see that, if only have to show the following:
Essentially, abstracting the result of an operation, should be smaller than abstracting and doing the operation in the abstract domain.
Implement Abstract Arithmetic
Implement arithmetic for your sign abstraction.
In Python, you can build an Arithmetic
class that can handle all the abstract arithmetics. Also if you use hypothesis, you can test that we have done the conversion correctly by, testing the following:
@given(sets(integers()), sets(integers()))
def test_sign_adds(xs, ys):
assert (
SignSet.abstract({x + y for x in xs for y in ys})
<= arithmetic.binary("add", SignSet.abstract(xs), SignSet.abstract(ys))
)
For cases, where we compare like le
, we might want to return set of booleans instead.
@given(sets(integers()), sets(integers()))
def test_sign_adds(xs, ys):
assert (
{x <= y for x in xs for y in ys}
<= arithmetic.compare("le", SignSet.abstract(xs), SignSet.abstract(ys))
)
3. Bounded Abstract Interpretation
- [ 2 ]
In practice, Galoi connections allows us to map our infinite domain of may or must analyses into a more manageable abstract domain, while giving us guarantees that we are still correctly over- or under-estimating.
Let's assume we are building a may analysis. Since we want to over approximate our function, we therefore introduce an abstract domain that has an abstract stepping function . We assume the galoi connection . As with the abstract operations, we want the following to be true:
Now we can apply our knowledge about Galoi connections, which we can prove by showing the following:
We can use this new abstract stepping function to define an bounded static analysis of depth : , and we can prove by induction over that stepping in the abstract domain is in fact a may analysis .
Show that a may analysis
By induction over , in the base case you have to show that and in the induction case you need to show that but are given
3.1. The State Abstraction
The first, and most useful abstraction, is the state abstraction. We can abstract every trace as it's final state.
We can define the abstraction from a set of traces as the set of end states and the concretion of a set of states as the set of traces that end in one of the states.
This is a useful abstraction for us, because we do not need to know where the trace came from to figure out if an assertion error exist in the program, only that a trace ends in an assertion error.
We can now define as only stepping the final state of each trace:
Check that is an abstract operation of
Now let . It is now clear by induction on that . So if there exist an assertion error within steps, then there exist a trace st. and .
From our Galoi connection we can see that
Which means that if , then . Which is exactly the grantee we are looking for in a may analysis.
3.2. The Per-Instruction Abstraction
In our next abstraction we take advantage of that we treat states at with the same program counter alike.
Let's focus on a JVM without a method stack, which means that every state is . We'll get back to talk about how to handle the method stack in later sections. We can abstract this by collecting the states per . Let be a mapping from program counters to sets of states. is a lattice with partial order were one mapping is less than another if all states are smaller than the other and is pointwise of states.
Convince yourself this is a Galoi connection
Now we can write a stepping function that can step all states with the same instruction at once , and then merge the result into the other states.
Consider the following python, which does the same thing, given you have defined StateSet
as an set of states:
def many_step(states : dict[Pc, StateSet]):
next_states = states.copy()
for pc, astate in states.items():
b = bytecode[pc]
# run the operation on the state set
for (pc_, s_) in lookup(self, f"step_{b['opr']}")(b, pc, astate):
# merge the new state with the current state or bottom
next_states[pc_] = next_states.get(pc_, StateSet.bot) | s_
return next_states
3.3. The Per-Variable Abstraction
We are now faced with our first hard choice.
We would love to use Sign abstraction we build in the previous section, one way of doing that is compressing the state, so that instead of having a set of states, we can have distribute the content of the locals and stack.
Since the stack and the locals are typed at each instruction, these are actually sets of integers, which we know how to abstract using the sign abstraction. Furthermore is also a partially ordered set, by point wise comparing the set of variables, and a lattice by point-wise joining and meeting the sets of variables. We can also see that is a galoi connection.
The problem with this abstraction, and the reason this choice is hard, is that is the first that severely over-approximates our problem. More about this next time.
Define the Abstract State
Define the Abstract (per variable) State in your program analysis, remember to define the ordering as well as the meet and join operations. It might be useful to define three pseudo elements Bot
, is an element that joins with all others, Err
represent the error state and Ok
represent the Ok state.
In Python, we could define an abstract state like so:
class AbstractState:
stack: list[AbstactValues]
locals: list[AbstactValues]
Remember to define the abstract
, __le__
, __and__
, and __or__
methods.
Where the abstract values, if we only work with integers, could be the SignSet
from before.
3.4. Getting Started
To get started with static analysis we need to make some changes to our interpreter, as our input values are going to be abstractions.
Extending the interpreter
Extend or rebuild your interpreter so that it is able to handle the per-instruction abstraction.
You can use the following snip-it for inspiration:
@dataclass
class AbstractInterpreter:
bytecode : dict
states : dict[ByteCodeOffset, AbstractState]
final : set[str]
arithmetic : Arithmetic
def step(self):
next_states = self.states.copy()
for pc, astate in self.states.items():
b = bytecode[pc]
for (pc_, s_) in lookup(self, f"step_{b['opr']}")(b, pc, astate):
# if the pc_ is -1 we are "done", and s_ is the end string
if pc_ == -1:
final.add(s_)
else:
# merge the new state with the current state or bottom
next_states[pc_] = next_states.get(pc_, AbstractState.bot) | s_
self.states = next_states
# Stepping functions are now generators that can generate different
# states depending on the abstract state.
def step_ifz(self, b, pc, astate):
# depending on the defintion of the abstract_state
aval = astate.stack.pop()
# Note that the abstract value might both compare and not compare to 0
for b in self.arithmetic.compare(b["compare"], aval, 0):
if b:
yield (b["target"], astate.copy())
else:
yield (pc + 1, astate.copy())
Experiment with different Abstractions
You have been introduced to the Sign Abstraction, but next time we'll look at more different abstractions. If you have time, try to come up with your own.
References
- 1. Flemming Nielson and Hanne Riis Nielson. Semantics with Applications: An Appetizer.
- 2. Hanne Riis Nielson and Flemming Nielson. 2020. Program Analysis (an Appetizer).