Concolic Execution

Christian Gram Kalhauge

Table of Contents

  1. SMT Solving
  2. Symbolic Execution
  3. Concolic Execution

Questions

  1. What are some good uses for SMT Solving?

  2. What are the big limitations of Symbolic Execution?

  3. Why would you use Concolic Execution?

SMT Solving (§1)

  • The most useful skill of this course

ϕ1=(A¬B)Cϕ2=(¬AB)(AC)(B¬C)

check out Z3 Playground

(declare-const a Bool)
(declare-const b Bool)
(declare-const c Bool)

(assert (and (or a (not b)) (not c)))
(assert (and (or (not a) b) (or a c) (or b (not c))))

(check-sat)
(get-model)

In practice (§1.1)

  • Encode your problem as a proposition that the SMT solver can check.

  • Run the SMT solver, by checking if the proposition is satisfied.

  • If that is true, then a model (a truth assignment) is found, you should now turn that back into a (counter -) solution the user can understand.

N-Queens

import z3
import itertools


def at_most_one(props):
    conditions = []
    for p, q in itertools.permutations(props, 2):
        conditions.append(z3.Or(z3.Not(p), z3.Not(q)))
    return z3.And(*conditions)


columns = "abcd"
rows = "1234"

# Create a boolean for each possible queen location.
queen_on_square = [[z3.Bool(f"Q{c}{r}") for c in columns] for r in rows]
s = z3.Solver()

# There need to be exactly one queen per row.
for ir, r in enumerate(rows):
    row = [queen_on_square[ic][ir] for ic, c in enumerate(columns)]
    s.add(z3.Or(*row))
    s.add(at_most_one(row))

# There can be at most one queen per column.
for ic, c in enumerate(columns):
    column = [queen_on_square[ic][ir] for ir, r in enumerate(rows)]
    s.add(at_most_one(column))

# There can be no more than one queen per diagonal.
prim, secd = [[] for r in row * 2], [[] for c in column * 2]
for ir, r in enumerate(rows):
    for ic, c in enumerate(columns):
        # primary diagonals are constant over subtraction
        prim[ir - ic].append(queen_on_square[ic][ir])
        # secondary diagonals are constant over summation
        secd[ir + ic].append(queen_on_square[ic][ir])

for diag in prim + secd:
    s.add(at_most_one(diag))

# This is kind of an ugly hack.
assert str(s.check()) == "sat"

model = s.model()

for ir, r in reversed(list(enumerate(rows))):
    print(f" {r} ", end="")
    for ic, c in enumerate(columns):
        is_black = (ir + ic) % 2 == 0
        if is_black:
            print("\033[40m\033[37m ", end="")  # ]]
        else:
            print("\033[47m\033[30m ", end="")  # ]]
        has_queen = bool(model[queen_on_square[ic][ir]])
        print("♛" if has_queen else " ", end="")
        print(" \033[0m", end="")  # ]
    print()
print(f"    {'  '.join(columns)}")

Symbolic Execution (§2)

  • Running programs with symbols instead of values.

int incr(int a) { 
  int x = a;        // x := a
  x += 1;           // x := a + 1
  x += 3;           // x := a + 1 + 3 = a + 4
  return x;         // return (a + 4)
}

Symbolic Values (§2.1)

s1,s2𝐒𝐲𝐦𝐕𝐚𝐥:=v– a symbol|s1+s2– operations on the symbols|– and so on

Remember to simplify

>>> i = z3.Int("i")
>>> for r in range(10):
...     i = i + 1
... 
>>> print(i)
i + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1

>>> i = z3.Int("i")
>>> for r in range(10):
...     i = z3.simplify(i + 1)
... 
>>> print(i)
10 + i

Path Constraints (§2.2)

ψσ
int incr(int a) { 
  int x = a;          // []                    |- x := a
  if (x > 10) {       // [a > 10]              |- x := a
    x =+ 1;           // [a > 10]              |- x := a + 1
    if (x < 10)       // [a > 10, a + 1 < 10]  |- UNSAT
      assert false;   // never executed
  }
  return x;           // returns [ a  > 10 |- a + 1 
                      //         , a <= 10 |- a      ]
}
>>> import z3
>>> a = z3.Ints("a")[0]
>>> s = z3.Solver()
>>> path = []

>>> x = a
>>> path.append(x > 10)
>>> print(path, s.check(path), s.model())
[a > 10] sat [a = 11]

>>> x += 1
>>> print(x) 
a + 1
>>> path.add(x < 10)
>>> print(path, s.check(path))
[a > 10, a + 1 < 10] unsat 

But how do we handle loops?

We don't... Bound the depth of the execution

In Practice (§2.3)

stack : list[tuple[PC, SymState, SymPath, int]]
def step(pc: PC, state: SymState) 
  -> Iterable[PC, SymState, SymPath]]: ...
def analyse(pc : PC, inputs : list[tuple[str, JvmType]], max_depth : int):
  # assuming only integers, create symbolic vars for all input
  locals = z3.Ints(" ".join(i_name for i_name, _ in inputs))
  # create state.
  state = SymState.from_locals(locals)

  # Create a stack to do depth first search with.
  stack : list[tuple[PC, SymState, SymPath]] = [(pc, state, [])]

  # As long as the stack is not empty
  while stack:
    # pop the end of the stack
    (pc, state, path, n) = stack.pop(-1)

    # Check if state does something we want to warn the user about.
    if interesting(state):
      return "found interesting state"

    # otherwise create the list of next branches 
    for (pc, state_, pathc) in step(pc, state):
        # add new path constraints to the path
        path_ = path + pathc

        # check if path is reachable
        if str(z3.check(path_)) != "sat": 
          continue

        # append stack if within limits
        if n < max_depth:
          stack.append((pc, state_, path_, n + 1))

  return f"all is fine, uptil depth {max_depth}"

Limitations (§2.4)

  • Path Explosion

  • One-way Functions

  • Strings

  • Arrays

  • Environment

Concolic Execution (§3)

Concrete and Symbolic

  • Do not use the solver to check reachability.

  • Use the solver to find new inputs.

concolic(P,i)=ψ0,ψ1,,ψn
i𝐬𝐨𝐥𝐯𝐞(ψ0ψ1¬ψn)

Code Instrumentation (§3.1)

  • Don't do it... yet.

Questions

  1. What are some good uses for SMT Solving?

  2. What are the big limitations of Symbolic Execution?

  3. Why would you use Concolic Execution?