Everything Everywhere All at Once
In this topic we are going to make our first static analysis. It is based on abstract interpretation.
In bounded static analysis we want to talk about all the behaviors of the program up to some depth. There are no easy way to do this, because we have to handle a possible infinite set of traces, which by itself is hard to do.
The idea behind bounded static analysis is straight forward. Instead of selecting a single initial state and then applying the semantics on that until a trace has been made. This is what we did with dynamic analysis. We start with the set initial states and then apply the semantics to all states at once. Essentially, if dynamic analysis is depth first search, Static Analysis is breath first. While we have the downside of having to work with infinite sets of traces, we can be smart about it, and use abstractions to represent the set using a finite number of traces.
The Bounded Static Analysis on a program can defined as the set of traces of depth : . We can get this set using a many-stepping function . It takes a set of traces and steps them one step.
In the definition of , is the transition relation defined by the single step semantics, and means appending to . Since we are always talking about the same program we'll sometimes omit it.
Now, we can define the set of traces of up to length , as applying to the initial state, times, and taking the union of the results.
Now, we can check for assertion errors down to depth . Let's call this , by seeing if any trace ends in an assertion error:
Sadly, it is infeasible, and sometimes impossible, to do computations over all traces at ones. Therefore, when designing an analysis, we either underestimate the set of traces. This is called a must analysis, or overestimate the set of traces which is called a may analysis. A must analysis is named like that, because we know what the program must do, and a may analysis is called that because we know what the program may do.
In a may analysis we overestimate every step, and in the must analysis we underestimate every step:
Naturally, we'll see that
The may analysis is great if we want to make sure that something newer happens. Essentially, if we want to warn about an assertion error may happen, we need a may analysis. If we want to grantee that an assertion error must happen on an run, then we need a must analysis:
However, it is unclear why allowing us to approximate the result will allow us to run the analyses at all. We need abstractions!
Instead of working with sets of traces, we have to work with finite structures instead. We call these abstractions.
Before we go down into the theory, let's start with a concrete example.
Assume that we are working with all the possible integer values that an variable has at some point in the program. It would be much easier to simply keep track of whether the set contains positive, negative or zero values.
We can represent this as a set that only contain three elements , exactly zero , larger than zero and smaller than zero . This set of size , which is much easier to work with than the set of size . In the following sections we are going to refer to this set as .
We call an abstraction, as we can covert the concrete domain of integers into the abstract domain of . We call this conversion the abstraction :
Implement the SignSet as an abstract set in the language of your choosing.
In Python, you can start with this:
from dataclasses import dataclass
from typing import TypeAlias, Literal
Sign : TypeAlias = Literal["+"] | Literal["-"] | Literal["0"]
@dataclass
class SignSet:
signs : set[Sign]However before we continue this set, we first have to ensure that a computation in abstract domain does not invalidate our over-approximation.
In the following sections we'll see that:
Our abstract domains can be ordered.
We can get the least over-approximation of two abstract sets called the join and the greatest under-approximation of two abstract sets called the meet.
Then, we have a formal definition of a correct over-approximating or under-approximating called a Galoi connection.
And finally, we'll cover monotone functions, which are function that preserves order.
A partially ordered set or poset is a tuple set of elements and an ordering , that uphold:
Common partially ordered sets are the integers (also in the other direction ), the booleans , and the set of .
Think about if the rules apply to the posets above.
Make your abstraction into a partially ordered set by implementing (<=) in the language of your choice.
A lattice is partially ordered sets , with two extra operators and . is the least upper bound , meaning that
The dual is true for that is the greatest lower bound
Furthermore, this implies that there exist a least bound and a greatest bound , from which we have the following identities: , and and .
A Hasse diagram over the poset . By I, KSmrq, CC BY-SA 3.0 (with edits)
The reason why they are called latices is that they can be drawn using Hasse digrams which gives these nice structures, which looks like a wooden lattice.
Here we only draw the imitate next elements in the order. So while we do not draw that edge because and are in the way.
Create a Hasse diagram over your favorite animals, The order is . Where
While objectively some of the other animals like and are no worse than each other.
Is your order a lattice: Does every pair of animals have meet and a join?
In our Sign abstraction, we use set inclusion as our order and use the intersection and union of the underlying set as the meet () and join () of our lattice. and .
To make your abstraction into a lattice, you need to also implement the meet and join operators.
In Python, you can implement __and__ and __or__ to get the meet operator a & b, and the join operator a | b.

(A Galois Connection is a connection between two ordered sets, with a concretion and an abstraction function.)
A Galois connection is a relationship between two ordered sets and , where we can abstract information from the concrete domain into the abstract while preserving the order if going back . explains how to abstract a value, and explains what the abstraction means.
Because the abstraction might be lossy it is not the case that if we abstract a value we get back the same concrete value . Instead a Galois connection satisfies the following rule:
It states, that if we have a concrete value , and abstract it , then any value larger
than this abstraction will concertize to a value that is larger
than the original value. At first this might not seem useful, but if we satisfy this rule we get the following laws for free:
The two first rules gives us confidence that whatever abstraction we choose only lose information never create it. And the second two rules gives us confidence that we don't keep losing that information.
Galoi connections is actually a concrete example on a mathematical construct called adjunction. Adjunctions are one of the most useful and unappreciated mathematical structures in computer science. One example is testing technique for testing data readers.
While in most cases we would expect, anything we pretty print to be parsable to the same value. That is not always the case as line numbers might differ or some data are lost in the translation. It should, however, always hold that
# Most of the time
parse(pretty(a)) == a
# Always
pretty(parse(pretty(a))) == pretty(a)
# Bonus for all strings s:
parse(pretty(parse(s))) == parse(s)We can see that there exist a Galoi connection between our concrete integer domain and our abstract sign domain . Where the abstraction and concretion are defined like this:
We can also see that we satisfy the rules, let's pick a concrete value . If we convert that to an abstract value we get: .
Since , and since and , then we get that .
You need an abstract function, which takes a set and turns it into your abstract domain.
@classmethod
def abstract(cls, items : set[int]):
signset = set()
if 0 in items:
signset.add("0")
...
return cls(signset)Then because we are using possible infinite sets, instead of generating that set, we can just implement the contains method:
def __contains__(self, member : int):
if (member == 0 and "0" in self.signs):
return True
... You can use the Galoi rules to test your abstraction:
If you are using python you fuzz test this property using hypothesis.
from hypothesis import given
from hypothesis.strategies import integers, sets
@given(sets(integers()))
def test_valid_abstraction(xs):
s = SignSet.abstract(xs)
assert all(x in s for x in xs)Finally, we have reached the fun part. We can now define operations in the abstract domain that mimic the operations in the concrete.
Because our concrete domain is the set of integers, we can define , as the operation over the product:
Now we can also define the plus abstract domain:
And many more.
But, how do we know if we have implemented our abstract operation correctly? Since we want to over-approximate our operations, we know that computing in our abstract domain should newer under-approximate:
The above equation requires us to create huge sets of values. Instead we can use our Galoi connection to rewrite the equation to see that, we only have to show the following:
Essentially, abstracting the result of an operation, should be smaller than abstracting and doing the operation in the abstract domain.
Implement arithmetic for your sign abstraction.
In Python, you can build an Arithmetic class that can handle all the abstract arithmetics. Also if you use hypothesis, you can test that we have done the conversion correctly by, testing the following:
@given(sets(integers()), sets(integers()))
def test_sign_adds(xs, ys):
assert (
SignSet.abstract({x + y for x in xs for y in ys})
<= SignSet.abstract(xs) + SignSet.abstract(ys)
)For cases, where we compare like le, we might want to return set of booleans instead.
@given(sets(integers()), sets(integers()))
def test_sign_adds(xs, ys):
assert (
{x <= y for x in xs for y in ys}
<= arithmetic.compare("le", SignSet.abstract(xs), SignSet.abstract(ys))
)In practice, Galoi connections allows us to map our infinite domain of may or must analyses into a more manageable abstract domain, while giving us guarantees that we are still correctly over- or under-estimating.
Let's assume we are building a may analysis. Since we want to over approximate our function, we therefore introduce an abstract domain that has an abstract stepping function . We assume the galoi connection . We can use this new abstract stepping function to define an bounded static analysis of depth : . We define recursively:
We can prove by induction over that stepping in the abstract domain is in fact a may analysis , given that
Using induction over , show that the following is true:
We can also show that is a stronger statement, and is sufficient to prove the above, by seeing that From , And using that gamma preserves meets we get
Finally, as it is very hard to talk about the concrete domain, as it is infinite, we can use the galoi connection again to see that we only have to show:
The first, and most useful abstraction, is the state abstraction. Here we throw away everything from the trace except the final state of the trace:
We can define the abstraction as the set of end states of the traces and the concretion as the set of traces that end in one of the states.
This is a useful abstraction for us, because we do not need to know where the trace came from to figure out if an assertion error exist in the program, only that a trace ends in an assertion error.
We can now define as only stepping the final state of each trace:
Now let . It is now clear by induction on that . So if there exist an assertion error within steps, then there exist a trace st. and .
From our Galoi connection we can see that
Which means that if , then . Which is exactly the grantee we are looking for in a may analysis.
In our next abstraction, we want to take advantage of that execute the same instruction in states with the same program counter.
Let's focus on a JVM without a method stack, which means that every state is . We'll get back to talk about how to handle the method stack in later sections. We can abstract this by collecting the states per . Let be a mapping from program counters to sets of states. is a lattice with partial order were one mapping is less than another if all states are smaller than the other and is pointwise of states.
Now we can write a stepping function that can step all states with the same instruction at once , and then merge the result into the other states.
We are now faced with our first hard choice.
We would love to use the Sign abstraction, that we build in the previous section, one way of doing that is compressing the state, so that instead of having a set of states, we distribute the content of the locals and stack. So instead of having a set of states, we have a pair of abstract locals and abstract stack. The abstract locals is not a mapping from naturals to a powerset of possible values, and the abstract stack is stack of possible values:
Since the stack and the locals are typed at each instruction, these are actually sets of integers, which we know how to abstract using the sign abstraction. Furthermore is also a partially ordered set, by point wise comparing the set of variables, and a lattice by point-wise joining and meeting the sets of variables. We can also see that is a galoi connection.
The problem with this abstraction, and the reason this choice is hard, is that is the first that severely over-approximates our problem. More about this next time.
Define the Abstract (per variable) Frame in your program analysis, remember to define the ordering as well as the meet and join operations. It might be useful to define three pseudo elements Bot, is an element that joins with all others, Err represent the error state and Ok represent the Ok state.
In Python, we could define an abstract frame like so:
@dataclass
class PerVarFrame[AV]:
locals: dict[int, V]
stack: Stack[AV]Remember to define the abstract, __le__, __and__, and __or__ methods.
Finally, we are at a point where we can use our Sign Abstraction, by using it to abstract the sets of variables.
If we can see is an abstraction of , then we also get a Galoi connection from to .
Specifically, we get:
You have been introduced to the Sign Abstraction, but next time we'll look at more different abstractions. If you have time, try to come up with your own.
One great thing about Galoi connections is that they compose. This mean we can create one long chain:
To get started with the building your first static analysis, first refactor your interpreter to handle a more abstract state.
First, don't try to do everything at once! Start with a single frame.
Then, we need to realize that we can no longer return just one state. Eg., in the case where we are handling ifz we might need to return two states one where we jump and one where we don't.
def step(state : AState) -> Iterable[AState | str]
...Secondly, do all the steps at once:
def many_step(state : dict[Pc, AState | str])
-> dict[Pc, AState | str]:
new_state = dict(state)
for k, v in state.items():
for s in step(v):
new_state[s.pc] |= s
return new_stateUntil next time, write the best analysis you can and upload the results to Autolab, and ponder the following:
Why should we use abstractions when analysing code?
What is the advantage of Galoi connections?
Which cases are the Sign abstraction not able to catch?
At which cases do your static analysis outperform your dynamic?
Cousot, Patrick; Cousot, Radhia (1977). Abstract interpretation.
doi:10.1145/512950.512973 link
Cousot, Patrick; Cousot, Radhia (1979). Systematic design of program analysis frameworks.
doi:pdf/10.1145/567752.567778 link
Nielson, Hanne Riis; Nielson, Flemming (2007). Semantics with Applications.