Bounded Static Analysis

Everything Everywhere All at Once

The goal of this chapter is to ask and answer the following questions:

Why should we use abstractions when analysing code?
What is the nice thing about Galois connections?
Which cases is the Sign abstraction not able to catch?
In which cases does your static analysis outperform your dynamic analysis?

1. What is Bounded Static Analysis?

In bounded static analysis we want to talk about all the behaviors of the program up to some depth. There is no easy way to do this, because we have to handle a possibly infinite set of traces, which is hard in itself.

The idea behind bounded static analysis is straightforward. In dynamic analysis we selected a single initial state and applied the semantics to it until a trace had been produced. Here, we instead start with the set of initial states and apply the semantics to all states at once. Essentially, if dynamic analysis is depth-first search, static analysis is breadth-first search. While we have the downside of having to work with infinite sets of traces, we can be smart about it and use abstractions to represent these sets with finite structures.

1.1. In Theory

The Bounded Static Analysis of a program $P$ can be defined as the set of traces up to depth $n$ : ${𝐁 𝐒 𝐀}_{P}^{n}$ . It uses a many-stepping function $𝐬 𝐭 𝐞 𝐩$ that steps the set of traces up to length $n$ to the set of traces up to length $n + 1$ .

\begin{matrix} {𝐬 𝐭 𝐞 𝐩}_{P} (T) = & T \cup {τ' s | τ' \in T, δ_{P} (τ'_{| τ' |}, s)} \\ {𝐁 𝐒 𝐀}_{P}^{n} = & {𝐬 𝐭 𝐞 𝐩}_{P}^{n} (I_{P}) \end{matrix}

In the definition of $𝐬 𝐭 𝐞 𝐩$ , $δ$ is the transition relation defined by the single step semantics, and $τ' s$ means appending $s$ to $τ'$ . Since we are always talking about the same program $P$ we'll sometimes omit it.

Now, we check for assertion errors down to depth $n$ , let's call this ${𝐁 𝐀 𝐄}_{P}^{n}$ , by seeing if any trace ends in an assertion error:

{𝐁 𝐀 𝐄}_{P}^{n} \equiv \exists (τ s) \in {𝐁 𝐒 𝐀}_{P}^{n} . s = err (‘𝚊𝚜𝚜𝚎𝚛𝚝𝚒𝚘𝚗 𝚎𝚛𝚛𝚘𝚛’)
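The definitions above can be tried out on a toy transition system. The following is a minimal sketch, not the course interpreter: the `delta` function (a counter that fails after reaching 3) and the names `bsa` and `bae` are assumptions made for illustration, and traces are represented as tuples of states.

```python
ERR = "assertion error"

def delta(s):
    """Hypothetical single-step semantics: count upwards, fail after state 3."""
    if s == ERR:
        return []          # an error state has no successors
    if s == 3:
        return [ERR]
    return [s + 1]

def step(traces):
    """step(T) = T ∪ { τ's | τ' ∈ T, δ(last(τ'), s) }, with traces as tuples."""
    return traces | {t + (s,) for t in traces for s in delta(t[-1])}

def bsa(initial_states, n):
    """BSA^n: apply step n times to the one-state traces of the initial states."""
    traces = {(s,) for s in initial_states}
    for _ in range(n):
        traces = step(traces)
    return traces

def bae(initial_states, n):
    """BAE^n: does some trace in BSA^n end in an assertion error?"""
    return any(t[-1] == ERR for t in bsa(initial_states, n))

print(sorted(bsa({2}, 2), key=len))
# [(2,), (2, 3), (2, 3, 'assertion error')]
print(bae({0, 2}, 1), bae({0, 2}, 2))
# False True
```

Because `step` keeps the old traces, the result for depth $n$ is always contained in the result for depth $n + 1$.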

1.2. May and Must Analyses

Sadly, it's infeasible to do computations over all traces at once. Therefore, when designing an analysis, we either underestimate the set of traces, which is called a must analysis, or overestimate the set of traces, which is called a may analysis. A must analysis is called that because it tells us what the program must do, and a may analysis is called that because it tells us what the program may do.

In the may analysis we overestimate every step, and in the must analysis we underestimate every step:

\begin{matrix} {𝐬 𝐭 𝐞 𝐩}_{𝗆 𝗎 𝗌 𝗍} (T) & \subseteq & 𝐬 𝐭 𝐞 𝐩 (T) & \subseteq & {𝐬 𝐭 𝐞 𝐩}_{𝗆 𝖺 𝗒} (T) \end{matrix}

Naturally, we'll see that

{𝐬 𝐭 𝐞 𝐩}_{𝗆 𝗎 𝗌 𝗍}^{n} (I_{P}) \subseteq {𝐬 𝐭 𝐞 𝐩}^{n} (I_{P}) \subseteq {𝐬 𝐭 𝐞 𝐩}_{𝗆 𝖺 𝗒}^{n} (I_{P})

The may analysis is great if we want to make sure that something never happens. Essentially, if we want to warn that an assertion error may happen, we need a may analysis. If we want to guarantee that a reported assertion error really does happen on some run, then we need a must analysis:

\begin{matrix} {𝐁 𝐀 𝐄}_{𝗆 𝗎 𝗌 𝗍}^{n} & ⟹ & {𝐁 𝐀 𝐄}^{n} \\ \neg {𝐁 𝐀 𝐄}^{n} & ⟸ & \neg {𝐁 𝐀 𝐄}_{𝗆 𝖺 𝗒}^{n} \end{matrix}
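The inclusions and implications above can be observed on a small example. This is only a sketch under stated assumptions: the transition system `successors`, the choice of under-approximation (follow only the first successor) and the choice of over-approximation (pretend every state might fail) are all made up for illustration.

```python
ERR = "assertion error"

def successors(s):
    """Hypothetical semantics: a state s moves to s+1 or s+2; states >= 4 fail."""
    if s == ERR:
        return []
    if s >= 4:
        return [ERR]
    return [s + 1, s + 2]

def step(T):
    """The exact step: extend every trace with every successor."""
    return T | {t + (s,) for t in T for s in successors(t[-1])}

def step_must(T):
    """Under-approximation: only follow the first successor of each trace."""
    return T | {t + (s,) for t in T for s in successors(t[-1])[:1]}

def step_may(T):
    """Over-approximation: additionally pretend every state might fail."""
    extra = {t + (ERR,) for t in T if t[-1] != ERR}
    return step(T) | extra

def iterate(f, T, n):
    """Apply f to T n times."""
    for _ in range(n):
        T = f(T)
    return T

def has_error(T):
    return any(t[-1] == ERR for t in T)

I = {(0,)}
for n in range(6):
    must, real, may = (iterate(f, I, n) for f in (step_must, step, step_may))
    assert must <= real <= may                   # the chain of inclusions
    assert not has_error(must) or has_error(real)   # BAE_must implies BAE
    assert has_error(may) or not has_error(real)    # not BAE_may implies not BAE
```

In this toy system the may analysis already raises a (false) alarm at depth 1, while the must analysis only finds the error at depth 5 even though a real failing trace exists at depth 3.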

However, even with these concessions, where we only approximate the correct result, we still need some extra magic to make our analyses truly efficient.

We need abstractions!

2. Abstractions

Instead of working with sets of traces, we have to work with finite structures. We call these abstractions.

2.1. The Sign Analysis

Before we go down into the theory, let's start with a concrete example.

Assume that we are working with all the possible integer values that a variable has at some point in the program. It would be much easier to simply keep track of whether the set contains positive, negative or zero values.

We can represent this as a set that only contains three elements ${+, -, 0}$ : zero $(0)$ , larger than zero $(+)$ and smaller than zero $(-)$ . This set has size $3$ , which is much easier to work with than a set of size $2^{32}$ . In the following sections we are going to refer to this set as $𝐒 𝐢 𝐠 𝐧$ .

𝐒 𝐢 𝐠 𝐧 = {+, 0, -}

We call $𝐒 𝐢 𝐠 𝐧$ an abstraction, as we can convert the concrete domain of integers into the abstract domain of $𝐒 𝐢 𝐠 𝐧$ . We call this conversion the abstraction $α$ :

\begin{matrix} α ({1, 2, 3}) = & {+} \\ α ({0}) = & {0} \\ α ({- 1, 3}) = & {+, -} \\ α ({- 1, 0, 3}) = & {0, +, -} \end{matrix}

However, before we continue with this set, we first have to ensure that a computation in the abstract domain does not invalidate our over-approximation.

In the following sections we'll see that:

Our abstract domains can also be ordered.
We can get the least over-approximation of two abstract sets called the join and the greatest under-approximation of two abstract sets called the meet.
Then, we have a formal definition of a correct over-approximating or under-approximating abstraction, called a Galois connection!
And finally, we'll cover monotone functions, which are functions that preserve the order.

Make the Sign Abstraction

Implement the SignSet as an abstract set in the language of your choosing.

In Python, you can start with this:

from dataclasses import dataclass
from typing import TypeAlias, Literal

Sign : TypeAlias = Literal["+"] | Literal["-"] | Literal["0"]

@dataclass
class SignSet:
  signs : set[Sign]

Fasten your seatbelts, it's going to be a bumpy night!

2.2. Partially Ordered Sets (Posets)

Partially ordered set, Wikipedia.

A partially ordered set or poset is a tuple $(L, ⊑)$ of a set of elements $L$ and an ordering $⊑$ that upholds:

\begin{matrix} reflexive & \forall a . a ⊑ a \\ antisymmetric & \forall a, b . a ⊑ b \land b ⊑ a ⟹ a = b \\ transitive & \forall a, b, c . a ⊑ b \land b ⊑ c ⟹ a ⊑ c \end{matrix}

Common partially ordered sets are the integers $(ℤ, \leq)$ (also in the other direction $(ℤ, \geq)$ ), the booleans $({𝚝 𝚝, 𝚏 𝚏}, \Rightarrow)$ , and the sets of signs $(2^{𝐒 𝐢 𝐠 𝐧}, \subseteq)$ .

Check the rules

Think about whether the rules hold for the posets above.
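For finite posets the three rules can also be checked by brute force. The sketch below is one way to do that; the helper name `is_poset` and the representation of sign sets as `frozenset`s are assumptions for illustration. For the integers only a small window is checked, which is evidence rather than a proof.

```python
from itertools import combinations

SIGNS = ("+", "-", "0")
# All 2^3 = 8 subsets of Sign, as hashable frozensets.
ELEMENTS = [frozenset(c) for r in range(4) for c in combinations(SIGNS, r)]

def is_poset(elements, leq):
    """Check reflexivity, antisymmetry and transitivity on a finite collection."""
    reflexive = all(leq(a, a) for a in elements)
    antisymmetric = all(
        a == b
        for a in elements for b in elements
        if leq(a, b) and leq(b, a)
    )
    transitive = all(
        leq(a, c)
        for a in elements for b in elements for c in elements
        if leq(a, b) and leq(b, c)
    )
    return reflexive and antisymmetric and transitive

print(is_poset(ELEMENTS, lambda a, b: a <= b))              # subset order on sign sets
print(is_poset(range(-3, 4), lambda a, b: a <= b))          # a window of the integers
print(is_poset([True, False], lambda a, b: (not a) or b))   # booleans with implication
print(is_poset(range(-3, 4), lambda a, b: a < b))           # strict order: not reflexive
```

The first three calls print `True`; the last prints `False`, because a strict order is not reflexive.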

Make Sign a partially ordered set

Make your $𝐒 𝐢 𝐠 𝐧$ abstraction into a partially ordered set by implementing $⊑$ (<=) in the language of your choice.

2.3. Lattices

Semilattice, Wikipedia.
Lattice (order), Wikipedia.

A lattice is a partially ordered set $(L, ⊑)$ with two extra operators $⊔$ and $⊓$ . $a ⊔ b$ is the least upper bound of $a$ and $b$ : it is an upper bound, $a ⊑ a ⊔ b$ and $b ⊑ a ⊔ b$ , and it is below every other upper bound, $\forall c . a ⊑ c \land b ⊑ c ⟹ a ⊔ b ⊑ c .$

The dual is true for $⊓$ , the greatest lower bound: $a ⊓ b ⊑ a$ and $a ⊓ b ⊑ b$ , and $\forall c . c ⊑ a \land c ⊑ b ⟹ c ⊑ a ⊓ b .$

Furthermore, for a finite (or, more generally, complete) lattice this implies that there exists a least element $⊥ = ⨅ L$ and a greatest element $⊤ = ⨆ L$ , from which we have the following identities: $⊤ ⊓ a = a = a ⊔ ⊥$ , $⊤ ⊔ a = ⊤$ and $a ⊓ ⊥ = ⊥$ .

2.3.1. Hasse diagrams.

Hasse diagram, Wikipedia.

Figure: A Hasse diagram over the poset $(2^{𝐒 𝐢 𝐠 𝐧}, \subseteq)$ . By I, KSmrq, CC BY-SA 3.0 (with edits)

The reason they are called lattices is that they can be drawn using Hasse diagrams, which give these nice structures that look like a wooden lattice.

Here we only draw the immediate next elements in the order. So while ${+} \subseteq {+, 0, -}$ , we do not draw that edge because ${+, 0}$ and ${+, -}$ are in the way.
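The edges of a Hasse diagram can be computed directly from the order: keep a pair only if nothing lies strictly in between. The following is a small sketch of that idea for the sign sets; the function name `hasse_edges` is an assumption for illustration.

```python
from itertools import combinations

SIGNS = ("+", "0", "-")
POWERSET = [
    frozenset(c)
    for r in range(len(SIGNS) + 1)
    for c in combinations(SIGNS, r)
]

def hasse_edges(elements, leq):
    """Pairs (a, b) with a strictly below b and no element strictly in between."""
    def lt(a, b):
        return a != b and leq(a, b)

    return {
        (a, b)
        for a in elements for b in elements
        if lt(a, b) and not any(lt(a, c) and lt(c, b) for c in elements)
    }

edges = hasse_edges(POWERSET, lambda a, b: a <= b)
print(len(edges))  # 12, the edges of a cube
```

The pair `({+}, {+, 0})` is an edge, while `({+}, {+, 0, -})` is not, because `{+, 0}` sits in between.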

Rate your favourite animals

Create a Hasse diagram over your favorite animals. The order is $(𝐀𝐧𝐢𝐦, ‘𝚒𝚜 𝚠𝚘𝚛𝚜𝚎 𝚝𝚑𝚊𝚗’)$ , where

𝐀𝐧𝐢𝐦 = {Cat, Dog, Ant, Worm, Spider, Horse, Rabbit}

While objectively $(Cat ‘𝚒𝚜 𝚠𝚘𝚛𝚜𝚎 𝚝𝚑𝚊𝚗’ Dog)$ some of the other animals like $Ant$ and $Worm$ are no worse than each other.

Is your order a lattice? Does every pair of animals have a meet and a join?

2.3.2. Back to the Sign Abstraction

In our Sign abstraction, we use set inclusion as our order $⊑_{𝐒 𝐢 𝐠 𝐧} = \subseteq$ and use the intersection and union of the underlying set as the meet ( $⊓$ ) and join ( $⊔$ ) of our lattice. $⊥ = \emptyset$ and $⊤ = {+, -, 0}$ .
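Since the sign lattice only has eight elements, the claim that union and intersection are the join and meet can be checked exhaustively. This is a sketch with hypothetical helper names (`is_join`, `is_meet`), using plain `frozenset`s instead of a dedicated class.

```python
from itertools import combinations

TOP = frozenset({"+", "-", "0"})
BOT = frozenset()
ELEMENTS = [frozenset(c) for r in range(4) for c in combinations(sorted(TOP), r)]

def leq(a, b):
    return a <= b      # set inclusion

def join(a, b):
    return a | b       # union

def meet(a, b):
    return a & b       # intersection

def is_join(elements, leq, join):
    """join(a, b) is an upper bound of a and b, and below every other upper bound."""
    return all(
        leq(a, join(a, b)) and leq(b, join(a, b))
        and all(leq(join(a, b), c) for c in elements if leq(a, c) and leq(b, c))
        for a in elements for b in elements
    )

def is_meet(elements, leq, meet):
    """meet(a, b) is a lower bound of a and b, and above every other lower bound."""
    return all(
        leq(meet(a, b), a) and leq(meet(a, b), b)
        and all(leq(c, meet(a, b)) for c in elements if leq(c, a) and leq(c, b))
        for a in elements for b in elements
    )

assert is_join(ELEMENTS, leq, join)
assert is_meet(ELEMENTS, leq, meet)
# The identities for top and bottom.
assert all(meet(TOP, a) == a == join(a, BOT) for a in ELEMENTS)
assert all(join(TOP, a) == TOP and meet(a, BOT) == BOT for a in ELEMENTS)
```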

Make the Sign a lattice

To make your $𝐒 𝐢 𝐠 𝐧$ abstraction into a lattice, you need to also implement the meet $⊓$ and join $⊔$ operators.

In Python, you can implement __and__ and __or__ to get the meet operator a & b, and the join operator a | b.

2.4. Galois Connection

Galois connection, Wikipedia.

Figure: A Galois connection is a connection between two ordered sets, with a concretion function $γ$ and an abstraction function $α$ .

A Galois connection is a relationship between two ordered sets $(C, ⊑_{C})$ and $(A, ⊑_{A})$ , where we can abstract information from the concrete domain into the abstract domain with $α : C \to A$ , while preserving the order when going back with $γ : A \to C$ . $α$ explains how to abstract a value, and $γ$ explains what the abstraction means.

Because the abstraction might be lossy, abstracting a value and concretizing it again does not in general give back the same concrete value: $γ (α (c)) \neq c$ . Instead, a Galois connection satisfies the following rule:

\forall c \in C, a \in A . c ⊑_{C} γ (a) \Leftrightarrow α (c) ⊑_{A} a

It states that if we have a concrete value $c$ and abstract it to $α (c)$ , then any abstract value $a$ larger than this abstraction will concretize to a value $γ (a)$ that is larger than the original value, and vice versa. At first this might not seem useful, but if we satisfy this rule we get the following laws for free:

\begin{matrix} c ⊑_{C} γ (α (c)) & – if we abstract we only increase in size \\ α (γ (a)) ⊑_{A} a & – if we concretize we only decrease in size \\ α (γ (α (c))) = α (c) & – we don’t keep losing information \\ γ (α (γ (a))) = γ (a) & – in both directions \end{matrix}

The first two rules give us confidence that whatever abstraction we choose only loses information and never creates it. And the last two rules give us confidence that we don't keep losing information.
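To see why the first two laws come for free, it is enough to instantiate the Galois rule with a well-chosen $a$ or $c$. A short derivation sketch, written in standard LaTeX notation:

```latex
\begin{align*}
\alpha(c) \sqsubseteq_A \alpha(c)
  &\iff c \sqsubseteq_C \gamma(\alpha(c))
  && \text{(the rule with } a := \alpha(c)\text{)} \\
\gamma(a) \sqsubseteq_C \gamma(a)
  &\iff \alpha(\gamma(a)) \sqsubseteq_A a
  && \text{(the rule with } c := \gamma(a)\text{)}
\end{align*}
```

In both lines the left-hand side holds by reflexivity of the order, so the right-hand side, which is exactly one of the laws, must hold as well.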

2.4.1. Back to the Sign Analysis

We can see that there exists a Galois connection between our concrete integer domain and our abstract sign domain $(2^{ℤ}, \subseteq) \leftrightarrow_{γ}^{α} (2^{𝐒 𝐢 𝐠 𝐧}, ⊑_{𝐒 𝐢 𝐠 𝐧})$ , where the abstraction $α$ is: $α (N) = {+ | n \in N, n > 0} \cup {- | n \in N, n < 0} \cup {0 | n \in N, n = 0}$ and $γ$ is: $γ (S) = {n | n \in ℤ, n > 0, + \in S} \cup {n | n \in ℤ, n < 0, - \in S} \cup {0 | 0 \in S}$

We can also see that we satisfy the rules. Let's pick a concrete value ${0, 1}$ . If we convert that to an abstract value we get: $α ({0, 1}) = {0, +}$ .

Since $α ({0, 1}) ⊑ {0, +} ⊑ {0, +, -}$ , and since $γ ({0, +}) = ℤ^{0, +}$ (the non-negative integers) and $γ ({0, +, -}) = ℤ^{0, +, -} = ℤ$ , we get that ${0, 1} \subseteq ℤ^{0, +} \subseteq ℤ$ .
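The Galois rule itself can be checked exhaustively on a small window of integers. The sketch below follows the earlier examples such as $α ({1, 2, 3}) = {+}$ ; the function names are assumptions for illustration, and because $γ (S)$ is infinite the code only ever asks whether a finite set is contained in it.

```python
from itertools import combinations

def sign_of(n):
    return "+" if n > 0 else "-" if n < 0 else "0"

def alpha(ns):
    """Abstract a finite set of integers into a set of signs."""
    return frozenset(sign_of(n) for n in ns)

def subset_of_gamma(ns, signs):
    """ns ⊆ γ(signs), decided without materialising the infinite set γ(signs)."""
    return all(sign_of(n) in signs for n in ns)

def subsets(items):
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def galois_rule_holds(window):
    """c ⊆ γ(a)  <=>  α(c) ⊆ a, for every c drawn from the window and every a."""
    return all(
        subset_of_gamma(c, a) == (alpha(c) <= a)
        for c in subsets(window)
        for a in subsets("+-0")
    )

assert alpha({0, 1}) == frozenset({"0", "+"})
assert galois_rule_holds(range(-2, 3))
```

A bounded check like this is of course only evidence, not a proof, but it catches mistakes such as swapping the cases for positive and negative numbers in only one of the two functions.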

Build the Sign abstraction

In your favorite language build the sign abstraction.

You need an abstract function, which takes a set and turns it into your abstract domain.

@staticmethod
def abstract(items : set[int]): 
  ...
  return SignSet(...)

Then, because we are using possibly infinite sets, instead of generating that set, we can just implement the contains method:

def __contains__(self, member : int): 
  ... 
  if member == 0 and "0" in self.signs: 
      return True
  return False

You can use the Galois rules to test your abstraction:

\forall x \in ℤ : {x} \subseteq γ (α ({x}))

If you are using Python, you can fuzz test this property using hypothesis.

from hypothesis import given
from hypothesis.strategies import integers, sets

@given(sets(integers()))
def test_valid_abstraction(xs):
  s = SignSet.abstract(xs) 
  assert all(x in s for x in xs)

2.5. Abstract Operations

Finally, we have reached the fun part. We can now define operations in the abstract domain, that mimic the operations in the concrete.

Because our concrete domain is sets of integers, we can define $+_{set}$ as the operation applied to every pair of elements from the two sets:

\begin{matrix} A +_{set} B = & {a + b | a \in A, b \in B} \\ {1, 3} +_{set} {0, 10} = & {1, 3, 11, 13} \end{matrix}

Now we can also define plus in the abstract domain:

\begin{matrix} {+} +_{𝐒 𝐢 𝐠 𝐧} {+} = & {+} & {+} +_{𝐒 𝐢 𝐠 𝐧} {-} = & {-, +, 0} \\ {+} +_{𝐒 𝐢 𝐠 𝐧} {0} = & {+} & {0} +_{𝐒 𝐢 𝐠 𝐧} {+} = & {+} \\ {0} +_{𝐒 𝐢 𝐠 𝐧} {-} = & {-} & {0} +_{𝐒 𝐢 𝐠 𝐧} {0} = & {0} \\ {-} +_{𝐒 𝐢 𝐠 𝐧} {+} = & {-, +, 0} & {-} +_{𝐒 𝐢 𝐠 𝐧} {-} = & {-} \\ {-} +_{𝐒 𝐢 𝐠 𝐧} {0} = & {-} & \dots \end{matrix}

And many more.
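The remaining rows follow from the singleton cases by taking the union over all pairs of signs. A minimal sketch of that construction, with hypothetical function names and plain `frozenset`s standing in for the abstract sets:

```python
def add_signs(a, b):
    """Abstract addition of two single signs."""
    if a == "0":
        return frozenset({b})        # 0 + x has the sign of x
    if b == "0":
        return frozenset({a})
    if a == b:
        return frozenset({a})        # (+) + (+) = (+) and (-) + (-) = (-)
    return frozenset({"+", "-", "0"})  # opposite signs: anything can happen

def sign_add(xs, ys):
    """Abstract addition of two sign sets: the union over all pairs of signs."""
    return frozenset().union(*(add_signs(a, b) for a in xs for b in ys))

assert sign_add({"+"}, {"+"}) == {"+"}
assert sign_add({"+"}, {"-"}) == {"+", "-", "0"}
assert sign_add({"0"}, {"-"}) == {"-"}
assert sign_add({"+", "0"}, {"+"}) == {"+"}
assert sign_add(set(), {"+"}) == set()   # adding to the empty set gives the empty set
```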

But how do we know if we have implemented our abstract operation correctly? Since we want to over-approximate our operations, we know that computing in our abstract domain should never under-approximate:

A +_{set} B \subseteq γ (α (A) +_{𝐒 𝐢 𝐠 𝐧} α (B))

This is of course not easy to test for. Instead, we can use our Galois connection to see that we only have to show the following:

α (A +_{set} B) \subseteq α (A) +_{𝐒 𝐢 𝐠 𝐧} α (B)

Essentially, abstracting the result of an operation should give something smaller than abstracting first and then doing the operation in the abstract domain.
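The two conditions are equivalent because they are one instance of the Galois rule, with $c := A +_{set} B$ and $a := α (A) +_{𝐒 𝐢 𝐠 𝐧} α (B)$ . Written out in standard LaTeX notation, together with the running example as a sanity check:

```latex
\begin{align*}
A +_{set} B \subseteq \gamma(\alpha(A) +_{\mathbf{Sign}} \alpha(B))
  &\iff \alpha(A +_{set} B) \subseteq \alpha(A) +_{\mathbf{Sign}} \alpha(B) \\[4pt]
\alpha(\{1, 3\} +_{set} \{0, 10\}) = \alpha(\{1, 3, 11, 13\}) &= \{+\} \\
\alpha(\{1, 3\}) +_{\mathbf{Sign}} \alpha(\{0, 10\}) = \{+\} +_{\mathbf{Sign}} \{0, +\} &= \{+\}
\end{align*}
```

Here both sides happen to be equal; in general the abstract side may be strictly larger, as for $\{1\} +_{set} \{-1\}$ , where the concrete result abstracts to $\{0\}$ but the abstract computation gives $\{-, +, 0\}$ .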

Implement Abstract Arithmetic

Implement arithmetic for your sign abstraction.

In Python, you can build an Arithmetic class that can handle all the abstract arithmetic. Also, if you use hypothesis, you can test that the conversion is correct by testing the following:

@given(sets(integers()), sets(integers()))
def test_sign_adds(xs, ys):
  assert (
    SignSet.abstract({x + y for x in xs for y in ys}) 
      <= arithmetic.binary("add", SignSet.abstract(xs), SignSet.abstract(ys))
    )

For operations that compare, like le, we might want to return a set of booleans instead.

@given(sets(integers()), sets(integers()))
def test_sign_compare_le(xs, ys):
    assert (
      {x <= y for x in xs for y in ys} 
      <= arithmetic.compare("le", SignSet.abstract(xs), SignSet.abstract(ys))
      )

3. Bounded Abstract Interpretation

[ 2 ]

In practice, Galois connections allow us to map our infinite domain of may or must analyses into a more manageable abstract domain, while giving us guarantees that we are still correctly over- or under-estimating.

Let's assume we are building a may analysis. Since we want to over-approximate our $𝐬 𝐭 𝐞 𝐩$ function, we introduce an abstract domain $(A, ⊑_{A})$ that has an abstract stepping function ${𝐬 𝐭 𝐞 𝐩}_{A}$ . We assume the Galois connection $(2^{𝐓 𝐫 𝐚 𝐜 𝐞}, \subseteq) \leftrightarrow_{γ}^{α} (A, ⊑_{A})$ . As with the abstract operations, we want the following to be true:

T \subseteq 𝐬 𝐭 𝐞 𝐩 (T) \subseteq γ ({𝐬 𝐭 𝐞 𝐩}_{A} (α (T)))

Now we can apply our knowledge about Galois connections: we can prove this by showing the following:

α (𝐬 𝐭 𝐞 𝐩 (T)) ⊑_{A} {𝐬 𝐭 𝐞 𝐩}_{A} (α (T))

We can use this new abstract stepping function to define a bounded static analysis of depth $n$ : ${𝐁 𝐒 𝐀}_{A}^{n} = {𝐬 𝐭 𝐞 𝐩}_{A}^{n} (α (I_{P}))$ , and we can prove by induction over $n$ that stepping in the abstract domain is in fact a may analysis ${𝐁 𝐒 𝐀}^{n} \subseteq γ ({𝐁 𝐒 𝐀}_{A}^{n})$ .

Show that ${𝐁 𝐒 𝐀}_{A}^{n}$ is a may analysis

By induction over $n$ : in the base case you have to show that ${𝐬 𝐭 𝐞 𝐩}^{0} (I_{P}) \subseteq γ ({𝐬 𝐭 𝐞 𝐩}_{A}^{0} (α (I_{P})))$ , and in the induction case you need to show that ${𝐬 𝐭 𝐞 𝐩}^{(n + 1)} (I_{P}) \subseteq γ ({𝐬 𝐭 𝐞 𝐩}_{A}^{(n + 1)} (α (I_{P})))$ given that ${𝐬 𝐭 𝐞 𝐩}^{n} (I_{P}) \subseteq γ ({𝐬 𝐭 𝐞 𝐩}_{A}^{n} (α (I_{P}))) .$
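One possible shape of the induction step is sketched below in standard LaTeX notation. It abbreviates $a_n = {𝐬 𝐭 𝐞 𝐩}_{A}^{n} (α (I_{P}))$ and makes one extra assumption that the text has not stated explicitly: that the abstract stepping function is monotone. The base case is just the law $c ⊑_{C} γ (α (c))$ applied to $I_{P}$ .

```latex
\begin{align*}
\mathbf{step}^{n+1}(I_P)
  &= \mathbf{step}(\mathbf{step}^{n}(I_P)) \\
  &\subseteq \mathbf{step}(\gamma(a_n))
     && \text{induction hypothesis, } \mathbf{step} \text{ is monotone} \\
  &\subseteq \gamma(\mathbf{step}_A(\alpha(\gamma(a_n))))
     && \text{soundness of } \mathbf{step}_A \text{ at } T = \gamma(a_n) \\
  &\subseteq \gamma(\mathbf{step}_A(a_n))
     && \alpha(\gamma(a_n)) \sqsubseteq_A a_n,\ \mathbf{step}_A \text{ and } \gamma \text{ monotone} \\
  &= \gamma(a_{n+1})
\end{align*}
```

The concrete stepping function is monotone by its definition, and $γ$ is monotone in any Galois connection, so the only real obligation on the designer of the abstraction is the soundness and monotonicity of the abstract stepping function.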

3.1. The State Abstraction

The first, and most useful, abstraction is the state abstraction. We can abstract every trace as its final state.

(2^{𝐓 𝐫 𝐚 𝐜 𝐞}, \subseteq) \leftrightarrow_{γ}^{α} (2^{𝐒 𝐭 𝐚 𝐭 𝐞}, \subseteq)

We can define the abstraction from a set of traces as the set of end states and the concretion of a set of states as the set of traces that end in one of the states.

\begin{matrix} α (T) = & {τ_{| τ |} | τ \in T} \\ γ (S) = & {τ | s \in S, τ \in 𝐓 𝐫 𝐚 𝐜 𝐞, τ_{| τ |} = s} \end{matrix}

This is a useful abstraction for us, because we do not need to know where the trace came from to figure out if an assertion error exists in the program, only that a trace ends in an assertion error.

We can now define ${𝐬 𝐭 𝐞 𝐩}_{𝐒 𝐭 𝐚 𝐭 𝐞}$ as only stepping the final state of each trace:

{𝐬 𝐭 𝐞 𝐩}_{𝐒 𝐭 𝐚 𝐭 𝐞} (S) = S \cup {s' | s \in S, δ (s, s')}

Check that ${𝐬 𝐭 𝐞 𝐩}_{𝐒 𝐭 𝐚 𝐭 𝐞}$ is an abstract operation of $𝐬 𝐭 𝐞 𝐩$

α (𝐬 𝐭 𝐞 𝐩 (T)) \subseteq {𝐬 𝐭 𝐞 𝐩}_{𝐒 𝐭 𝐚 𝐭 𝐞} (α (T))
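The property can be observed on a toy transition system before proving it. This is a sketch under assumptions: `delta` is a made-up counter that fails after state 3, traces are tuples, and the function names are chosen for illustration only.

```python
ERR = "assertion error"

def delta(s):
    """Hypothetical single-step semantics: count upwards, fail after state 3."""
    if s == ERR:
        return []
    if s == 3:
        return [ERR]
    return [s + 1]

def step(T):
    """Concrete step on sets of traces."""
    return T | {t + (s,) for t in T for s in delta(t[-1])}

def alpha(T):
    """State abstraction: keep only the final state of every trace."""
    return {t[-1] for t in T}

def step_state(S):
    """Abstract step on sets of states."""
    return S | {s2 for s in S for s2 in delta(s)}

T = {(0,), (2,)}
S = alpha(T)
for n in range(6):
    assert alpha(step(T)) <= step_state(S)   # the abstract step over-approximates
    assert alpha(T) <= S                     # so the whole analysis stays sound
    T, S = step(T), step_state(S)
```

The abstract side only ever stores a handful of states, while the number of traces on the concrete side keeps growing with the depth.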

Now let ${𝐁 𝐒 𝐀}_{𝐒 𝐭 𝐚 𝐭 𝐞}^{n} = {𝐬 𝐭 𝐞 𝐩}_{𝐒 𝐭 𝐚 𝐭 𝐞}^{n} (α (I_{P}))$ . It follows by induction on $n$ that ${𝐁 𝐒 𝐀}^{n} \subseteq γ ({𝐁 𝐒 𝐀}_{𝐒 𝐭 𝐚 𝐭 𝐞}^{n})$ . So if there exists an assertion error within $n$ steps, then there exists a trace $τ \in {𝐁 𝐒 𝐀}^{n}$ such that $τ_{| τ |} = err (‘𝚊𝚜𝚜𝚎𝚛𝚝𝚒𝚘𝚗 𝚎𝚛𝚛𝚘𝚛’)$ and $| τ | \leq n$ .

From our Galoi connection we can see that

\begin{matrix} {τ err (‘𝚊𝚜𝚜𝚎𝚛𝚝𝚒𝚘𝚗 𝚎𝚛𝚛𝚘𝚛’)} \subseteq {𝐁 𝐒 𝐀}^{n} \subseteq γ ({𝐁 𝐒 𝐀}_{𝐒 𝐭 𝐚 𝐭 𝐞}^{n}) \\ ⟹ & α ({τ err (‘𝚊𝚜𝚜𝚎𝚛𝚝𝚒𝚘𝚗 𝚎𝚛𝚛𝚘𝚛’)}) \subseteq {err (‘𝚊𝚜𝚜𝚎𝚛𝚝𝚒𝚘𝚗 𝚎𝚛𝚛𝚘𝚛’)} \subseteq {𝐁 𝐒 𝐀}_{𝐒 𝐭 𝐚 𝐭 𝐞}^{n} \end{matrix}

Which means that if $err (‘𝚊𝚜𝚜𝚎𝚛𝚝𝚒𝚘𝚗 𝚎𝚛𝚛𝚘𝚛’) \notin {𝐁 𝐒 𝐀}_{𝐒 𝐭 𝐚 𝐭 𝐞}^{n}$ , then ${τ err (‘𝚊𝚜𝚜𝚎𝚛𝚝𝚒𝚘𝚗 𝚎𝚛𝚛𝚘𝚛’)} ⊈ {𝐁 𝐒 𝐀}^{n}$ . Which is exactly the guarantee we are looking for in a may analysis.

3.2. The Per-Instruction Abstraction

In our next abstraction we take advantage of the fact that we treat states with the same program counter alike.

Let's focus on a JVM without a method stack, which means that every state is $⟨ σ, λ, ι ⟩$ . We'll get back to how to handle the method stack in later sections. We can abstract this by collecting the states per $ι$ . Let $𝐏 𝐜 = ι \to 2^{𝐒 𝐭 𝐚 𝐭 𝐞}$ be a mapping from program counters to sets of states. $𝐏 𝐜$ is a lattice with partial order $⊑_{𝐏 𝐜}$ , where one mapping is less than another if, for every program counter, its set of states is a subset of the other's, and $⊔_{𝐏 𝐜}$ is the pointwise $\cup$ of states.

(2^{𝐒 𝐭 𝐚 𝐭 𝐞}, \subseteq) \leftrightarrow_{γ}^{α} (𝐏 𝐜, ⊑_{𝐏 𝐜})

Convince yourself this is a Galois connection

Now we can write a stepping function that can step all states with the same instruction at once ${𝐬 𝐭 𝐞 𝐩}_{ι}$ , and then merge the result into the other states.

{𝐬 𝐭 𝐞 𝐩}_{𝐏 𝐜} (C) = C ⊔_{𝐏 𝐜} ⨆_{𝐏 𝐜} {ι' \mapsto {⟨ σ', λ', ι' ⟩} | \begin{matrix} (ι \mapsto S) \in C, S' = {𝐬 𝐭 𝐞 𝐩}_{ι} (S), \\ ⟨ σ', λ', ι' ⟩ \in S' \end{matrix}}

Consider the following Python, which does the same thing, given you have defined StateSet as a set of states:

def many_step(self, states : dict[Pc, StateSet]):
    next_states = states.copy()
    for pc, astate in states.items():
        b = self.bytecode[pc]
        # look up the stepping method for this operation and run it on the state set
        for (pc_, s_) in getattr(self, f"step_{b['opr']}")(b, pc, astate):
            # merge the new state with the current state or bottom
            next_states[pc_] = next_states.get(pc_, StateSet.bot) | s_
    return next_states

3.3. The Per-Variable Abstraction

We are now faced with our first hard choice.

We would love to use the Sign abstraction we built in the previous section. One way of doing that is compressing the state, so that instead of having a set of states, we distribute the sets over the content of the locals and the stack.

𝐏 𝐯 = ι \to {(2^{𝐕_{σ}})}^{⋆} \times {(2^{𝐕_{λ}})}^{⋆} \cup {ok, err (‘.’)}

Since the stack and the locals are typed at each instruction, these are actually sets of integers, which we know how to abstract using the sign abstraction. Furthermore, $𝐏 𝐯$ is also a partially ordered set, by point-wise comparing the sets of variables, and a lattice by point-wise joining and meeting the sets of variables. We can also see that $(𝐏 𝐜 \leftrightarrow_{γ}^{α} 𝐏 𝐯)$ is a Galois connection.

The problem with this abstraction, and the reason this choice is hard, is that it is the first one that severely over-approximates our problem. More about this next time.
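A small example shows where the over-approximation comes from: once every variable is abstracted separately, any relation between the variables is forgotten. The sketch below uses two made-up states over variables `(x, y)` and hypothetical helper names.

```python
def sign_of(n):
    return "+" if n > 0 else "-" if n < 0 else "0"

def alpha_per_variable(states):
    """Abstract a set of (x, y) states into one sign set per variable."""
    return tuple(frozenset(sign_of(v) for v in column) for column in zip(*states))

def in_gamma(state, abstract):
    """Is the concrete state described by the per-variable abstraction?"""
    return all(sign_of(v) in signs for v, signs in zip(state, abstract))

# In both concrete states x and y have opposite signs, so x + y == 0.
states = {(1, -1), (-1, 1)}
abstract = alpha_per_variable(states)

assert abstract == (frozenset({"+", "-"}), frozenset({"+", "-"}))
assert all(in_gamma(s, abstract) for s in states)   # still an over-approximation
assert in_gamma((1, 1), abstract)                   # but (1, 1) is now included
assert (1, 1) not in states                         # although it was never reachable
```

An assertion such as `x + y == 0` holds in every concrete state here, but it can no longer be shown from the abstract state, because the spurious state `(1, 1)` violates it.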

Define the Abstract State

Define the Abstract (per variable) State in your program analysis, and remember to define the ordering as well as the meet and join operations. It might be useful to define three pseudo elements: Bot is the element that leaves every other element unchanged when joined with it, Err represents the error state and Ok represents the Ok state.

In Python, we could define an abstract state like so:

class AbstractState:
  stack: list[AbstractValues]
  locals: list[AbstractValues]

Remember to define the abstract, __le__, __and__, and __or__ methods.

Where the abstract values, if we only work with integers, could be the SignSet from before.

3.4. Getting Started

To get started with static analysis we need to make some changes to our interpreter, as our input values are going to be abstractions.

Extending the interpreter

Extend or rebuild your interpreter so that it is able to handle the per-instruction abstraction.

You can use the following snippet for inspiration:

@dataclass
class AbstractInterpreter:
  bytecode : dict
  states : dict[ByteCodeOffset, AbstractState]
  final : set[str]
  arithmetic : Arithmetic

  def step(self):
    next_states = self.states.copy()
    for pc, astate in self.states.items():
        b = self.bytecode[pc]
        # look up the stepping method for this operation, e.g. step_ifz
        for (pc_, s_) in getattr(self, f"step_{b['opr']}")(b, pc, astate):
          # if the pc_ is -1 we are "done", and s_ is the end string
          if pc_ == -1:
            self.final.add(s_)
          else:
            # merge the new state with the current state or bottom
            next_states[pc_] = next_states.get(pc_, AbstractState.bot) | s_
    self.states = next_states

  # Stepping functions are now generators that can generate different
  # states depending on the abstract state.
  def step_ifz(self, b, pc, astate):
    # depending on the definition of the abstract state;
    # work on a copy so the state stored in self.states is not mutated
    astate = astate.copy()
    aval = astate.stack.pop()
    # Note that the abstract value might both compare and not compare to 0
    for res in self.arithmetic.compare(b["compare"], aval, 0):
      if res:
        yield (b["target"], astate.copy())
      else:
        yield (pc + 1, astate.copy())

Experiment with different Abstractions

You have been introduced to the Sign Abstraction, but next time we'll look at more abstractions. If you have time, try to come up with your own.

References

1. Flemming Nielson and Hanne Riis Nielson. Semantics with Applications: An Appetizer.
2. Hanne Riis Nielson and Flemming Nielson. 2020. Program Analysis (an Appetizer).