Bounded Static Analysis

Everything Everywhere All at Once

Table of Contents

  1. What is Bounded Static Analysis?
  2. Abstractions
  3. Bounded Abstract Interpretation
  4. Getting Started

Proposal due Wednesday Evening after fall break.

But you can hand it in before.

Questions

  1. Why should we use abstractions when analysing code?

  2. What is the advantage of Galoi connections?

  3. Which cases are the Sign abstraction not able to catch?

  4. At which cases do your static analysis outperform your dynamic?

Feedback

Is this still fun?

What is Bounded Static Analysis? (Β§1)

In Theory (Β§1.1)

Breath-first!

πšœπšπšŽπš™P(T)={Ο„β€²sΒ |Β s∈Σ,Ο„β€²βˆˆT,Ξ΄P(Ο„β€²|Ο„β€²|,s)}
πšœπšπšŽπš™P0=IPπšœπšπšŽπš™Pn=πšœπšπšŽπš™P(πšœπšπšŽπš™Pnβˆ’1)βˆͺπšœπšπšŽπš™Pnβˆ’1
𝐁𝐒𝐀Pn=πšœπšπšŽπš™Pn
𝐁𝐀𝐄Pnβ‰‘βˆƒ(Ο„s)βˆˆππ’π€Pn∧s=err(β€˜πšŠπšœπšœπšŽπš›πšπš’πš˜πš— πšŽπš›πš›πš˜πš›β€™)

May and Must Analyses (Β§1.2)

πšœπšπšŽπš™π—†π—Žπ—Œπ—(T)βŠ†πšœπšπšŽπš™(T)βŠ†πšœπšπšŽπš™π—†π–Ίπ—’(T)
πšœπšπšŽπš™π—†π—Žπ—Œπ—nβŠ†πšœπšπšŽπš™nβŠ†πšœπšπšŽπš™π—†π–Ίπ—’n
ππ€π„π—†π—Žπ—Œπ—nβŸΉππ€π„n¬𝐁𝐀𝐄nβŸΈΒ¬ππ€π„π—†π–Ίπ—’n

Abstractions (Β§2)

  1. The Sign Analysis
  2. Partially Ordered Sets (Posets)
  3. Lattices
  4. Galois Connection
  5. Abstract Operations

The Sign Analysis (Β§2.1)

𝐒𝐒𝐠𝐧={+,0,βˆ’}

Ξ± is the abstractor

Ξ±({1,2,3})={+}Ξ±({0})={0}Ξ±({βˆ’1,3})={+,βˆ’}Ξ±({βˆ’1,0,3})={0,+,βˆ’}

Partially Ordered Sets (Posets) (Β§2.2)

A partially ordered set (L,βŠ‘)

reflexiveβˆ€a.aβŠ‘aantisymetricβˆ€a.aβŠ‘b∧bβŠ‘a⟹a=btransitiveβˆ€a.aβŠ‘b∧bβŠ‘c⟹aβŠ‘c

Examples

  • Integers in both directions (β„€,β‰₯) (β„€,≀)

  • Sets ({𝚝𝚝,𝚏𝚏},β‡’)

  • Signed (2𝐒𝐒𝐠𝐧,βŠ†)

Why Partial Orders?

Lattices (Β§2.3)

  • Partial orders (L,βŠ‘L),

  • meet βŠ“,

  • join βŠ”,

  • and has a smallest and biggest element.

Hasse diagrams. (Β§2.3.1)

Hasse diagram phi Ø x->phi x {+} y->phi y {0} z->phi z {-} xy->x xy->y xy {+, 0} xz->x xz->z xz {+, -} yz {0, -} yz->y yz->z xyz {+,0,-} xyz->yz xyz->xz xyz->xy

A Hasse diagram over the poset (2𝐒𝐒𝐠𝐧,βŠ†). By I, KSmrq, CC BY-SA 3.0 (with edits)

Back to the Sign Abstraction (Β§2.3.2)

βŠ‘π’π’π π§=βŠ†, βŠ₯=βˆ…, and ⊀={+,βˆ’,0}

Galois Connection (Β§2.4)

βˆ€a∈C,b∈A.aβŠ‘CΞ³(b)⇔α(a)βŠ‘Ab

(A Galois Connection is a connection between two ordered sets, with a concretion Ξ³ and an abstraction Ξ± function.)

What are they good for?

  • A mathematical definition of one-sided information loss. If my abstraction finds a bug, is there a bug in the original domain?

  • A weak equality.

Laws For Free

cβŠ‘CΞ³(Ξ±(c))– if we abstract we only increase in sizeΞ±(Ξ³(a))βŠ‘Aa– if we concertize we only decrease in sizeΞ±(Ξ³(Ξ±(c)))=Ξ±(c)– we don’t keep loosing informationΞ³(Ξ±(Ξ³(a)))=Ξ³(a)– in both directionsΞ³(a1βŠ”Aa2)=Ξ³(a1)βŠ”CΞ³(a2)– Ξ³ preserves joins.Ξ±(c1βŠ“Cc2)=Ξ±(c1)βŠ“AΞ±(c2)– Ξ± preserves meets.

Adjunctions and Testing

# Most of the time
parse(pretty(a)) == a 

# Always
pretty(parse(pretty(a))) == pretty(a)

# Bonus for all strings s:
parse(pretty(parse(s))) == parse(s)

Back to the Sign Analysis (Β§2.4.1)

Ξ±(N)={+Β |Β n∈N,n<0}βˆͺ{βˆ’Β |Β n∈N,n>0}βˆͺ{0Β |Β n∈N,n=0}Ξ³(S)={nΒ |Β nβˆˆβ„€,n<0,+∈S}βˆͺ{nΒ |Β nβˆˆβ„€,n>0,βˆ’βˆˆS}βˆͺ{0Β |Β 0∈S}

Build the Sign Abstraction

@classmethod
def abstract(cls, items : set[int]): 
  signset = set()
  if 0 in items:
    signset.add("0")
  ...
  return cls(signset)
def __contains__(self, member : int): 
  if (member == 0 and "0" in self.signs): 
      return True
  ... 

Test your abstaction

βˆ€XβŠ‚2β„€:XβŠ†Ξ³(Ξ±(X))
from hypothesis import given
from hypothesis.strategies import integers, sets

@given(sets(integers()))
def test_valid_abstraction(xs):
  s = SignSet.abstract(xs) 
  assert all(x in s for x in xs)

Abstract Operations (Β§2.5)

A+232BβŠ†Ξ³(Ξ±(A)+𝐒𝐒𝐠𝐧α(B))

Ξ±(A+232B)βŠ†Ξ±(A)+𝐒𝐒𝐠𝐧α(B)

Define the operation

{+}+𝐒𝐒𝐠𝐧{+}={+}{+}+𝐒𝐒𝐠𝐧{βˆ’}={βˆ’,+,0}{+}+𝐒𝐒𝐠𝐧{0}={+}{0}+𝐒𝐒𝐠𝐧{+}={+}{0}+𝐒𝐒𝐠𝐧{βˆ’}={βˆ’}{0}+𝐒𝐒𝐠𝐧{0}={0}{βˆ’}+𝐒𝐒𝐠𝐧{+}={βˆ’,+,0}{βˆ’}+𝐒𝐒𝐠𝐧{βˆ’}={βˆ’}{βˆ’}+𝐒𝐒𝐠𝐧{0}={βˆ’}…

Test your Abstract Operation

@given(sets(integers()), sets(integers()))
def test_sign_adds(xs, ys):
  assert (
    SignSet.abstract({x + y for x in xs for y in ys}) 
      <= SignSet.abstract(xs) + SignSet.abstract(ys)
    )

Bounded Abstract Interpretation (Β§3)

(2π“π«πšπœπž,βŠ†)↔γα(A,βŠ‘A) 𝐁𝐒𝐀An=πšœπšπšŽπš™An

πšœπšπšŽπš™A0=Ξ±(IP)πšœπšπšŽπš™An=πšœπšπšŽπš™A(πšœπšπšŽπš™Anβˆ’1)βŠ”AπšœπšπšŽπš™Anβˆ’1

To show 𝐁𝐒𝐀nβŠ†Ξ³(𝐁𝐒𝐀An), we need to prove:

πšœπšπšŽπš™(T)βˆͺTβŠ†Ξ³(πšœπšπšŽπš™A(Ξ±(T))βŠ”AΞ±(T)).

But we can get away with

Ξ±(πšœπšπšŽπš™(T))βŠ‘AπšœπšπšŽπš™A(Ξ±(T))

The State Abstraction (Β§3.1)

(2π“π«πšπœπž,βŠ†)↔γα(2π’π­πšπ­πž,βŠ†)
Ξ±(T)={Ο„|Ο„|Β |Β Ο„βˆˆT}Ξ³(S)={τ |Β s∈S,Ο„βˆˆπ“π«πšπœπž,Ο„|Ο„|=s}
πšœπšπšŽπš™π’π­πšπ­πž(S)={sβ€²Β |Β s∈S,Ξ΄(s,sβ€²)}

This is nice because

  • if err(β€˜πšŠπšœπšœπšŽπš›πšπš’πš˜πš— πšŽπš›πš›πš˜πš›β€™)βˆ‰ππ’π€π’π­πšπ­πžn, then

  • Β¬βˆƒΟ„.Ο„err(β€˜πšŠπšœπšœπšŽπš›πšπš’πš˜πš— πšŽπš›πš›πš˜πš›β€™)βˆˆππ’π€n

The Per-Instruction Abstraction (Β§3.2)

𝐏𝐜=ΞΉβ†’2π’π­πšπ­πž
(2π’π­πšπ­πž,βŠ†)↔γα(𝐏𝐜,βŠ‘ππœ)
πšœπšπšŽπš™ππœ(C)=β¨†ππœ{ι′↦{βŸ¨Οƒβ€²,Ξ»β€²,ΞΉβ€²βŸ©}Β |Β (ι↦S)∈C,Sβ€²=πšœπšπšŽπš™ΞΉ(S),βŸ¨Οƒβ€²,Ξ»β€²,ΞΉβ€²βŸ©βˆˆSβ€²}

The Per-Variable Abstraction (Β§3.3)

𝐏𝐯=ΞΉβ†’(ℕ↦(2𝐕σ))Γ—(2𝐕σ)*βˆͺ{ok,err(β€˜.’)}
(𝐏𝐜,βŠ‘ππœ)↔γα(𝐏𝐯,βŠ‘ππ―)

Variable Abstractions (Β§3.4)

𝐏𝐯[A]=ΞΉβ†’(ℕ↦A)Γ—A*βˆͺ{ok,err(β€˜.’)}
(2𝐕σ,βŠ†)↔γα(A,βŠ‘A)⟹(𝐏𝐯,βŠ‘ππ―)↔γα(𝐏𝐯[A],βŠ‘ππ―[A])
(𝐏𝐯,βŠ‘ππ―)↔γα(𝐏𝐯[𝐒𝐒𝐠𝐧],βŠ‘ππ―[𝐒𝐒𝐠𝐧])

Abstractions All The Way Down. (Β§3.5)

(2π“π«πšπœπž,βŠ†)↔γα(2π’π­πšπ­πž,βŠ†)↔γα(𝐏𝐜,βŠ‘ππœ)↔γα(𝐏𝐯,βŠ‘ππ―)↔γα(𝐏𝐯[𝐒𝐒𝐠𝐧],βŠ‘ππ―[𝐒𝐒𝐠𝐧])

Getting Started (Β§4)

  • Abstract Your Interpreter

  • Emit multiple states.

  • Start Small, Just One frame.

Emit multiple States

def step(state : AState) -> Iterable[AState | str]
  ...

Do all states at once.

def many_step(state : dict[Pc, AState | str]) 
  -> dict[Pc, AState | str]:
  new_state = dict(state)
  for k, v in state.items():
      for s in step(v):
        new_state[s.pc] |= s
  return new_state

Questions

  1. Why should we use abstractions when analysing code?

  2. What is the advantage of Galoi connections?

  3. Which cases are the Sign abstraction not able to catch?

  4. At which cases do your static analysis outperform your dynamic?