Dynamic Analysis

Let's run it and see?

Christian Gram Kalhauge

Table of Contents
  1. What is Dynamic Analysis? §1
     1. Trace Selection §1.1
     2. Trace Analysis §1.2
     3. Trace Abstraction and Prediction §1.3
        1. The Path Equivalence Class §1.3.1
        2. Coverage §1.3.2
        3. Trace Abstraction §1.3.3
  2. Running the Program §2
     1. Why Should We Run the Program? §2.1
     2. Why Shouldn't We Run the Program? §2.2
  3. Testing §3
     1. Characterization Trace Testing §3.1
     2. Assertions §3.2
  4. Selecting Input §4
     1. Random Inputs §4.1
     2. Dictionaries §4.2
     3. Coverage-Guided Fuzzing §4.3
     4. Property Based Testing §4.4
     5. The Small-Scope Hypothesis §4.5

In this topic, we are going to introduce the technique of dynamic analysis. We are going to focus on the simplest version: just running the program. First we cover the basics in What is Dynamic Analysis? (§1), then we discuss some limitations and advantages of Running the Program (§2), touch on the simplest form of dynamic analysis, namely Testing (§3), and, finally, we go over different techniques for Selecting Input (§4).

                              What is Dynamic Analysis? §1

Dynamic analysis is finding out information about a program by running it. Actually, you have been doing dynamic analysis most of your career without knowing it. When you have been writing tests for your program, you have been doing dynamic analysis. When you have inserted log statements in your program, you have been doing dynamic analysis. And, when you have searched through a log to recreate a bug in production, you have been doing dynamic analysis.

It might seem obvious that we can learn about a program by running it, but it is far from obvious what we learn and even how to run the program. In theory, dynamic analysis is about extracting information about all the behaviours of a program from a few executions (or traces).

                              Dynamic analysis consists of three phases, Trace Selection (§1.1), Trace Abstraction and Prediction (§1.3), and Trace Analysis (§1.2).

[Figure: the three steps of a dynamic analysis. First (1.1) we select a trace from the program, then (1.3) we try to predict a class of traces from which our trace came, and (1.2) we analyse the set of traces. If we did not find anything, we might repeat the analysis (2.1).]

                              Trace Selection §1.1

First, recall the definition of a Transition System from Transition System and Traces (§2.4). A program can be described as a triple (𝐒𝐭𝐚𝐭𝐞_P, δ_P, I_P), where δ_P is the transition function and I_P is the set of initial states.

In the first phase, we select a trace from the transition system. This consists of choosing an initial state σ ∈ I_P, and then executing the program to pick a trace τ:

τ ∈ 𝐒𝐞𝐥𝐞𝐜𝐭(P, σ) ⟺ τ ∈ Sem(P) ∧ τ_0 = σ
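As a minimal sketch of this selection step (assuming a deterministic program represented by a hypothetical transition function step that returns None in a terminal state), selecting a trace is just running the program while recording every state:

```python
def select(step, sigma, limit=1000):
    # Run the transition function from the initial state sigma,
    # recording every state; stop in a terminal state (signalled
    # here by step returning None) or when the step limit is hit.
    trace = [sigma]
    while len(trace) < limit:
        nxt = step(trace[-1])
        if nxt is None:
            break
        trace.append(nxt)
    return trace

# A toy deterministic program: count down to zero, then stop.
def countdown(n):
    return n - 1 if n > 0 else None

print(select(countdown, 3))  # [3, 2, 1, 0]
```

Since countdown is deterministic, each initial state yields exactly one trace, as the text notes.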

If the program is deterministic, there is only one trace per input state. If the program is non-deterministic, as is for example the case for parallel programs, each initial state might give rise to multiple traces.

                              Selecting the initial state or even executing the program is not trivial in practice (see Running the Program (§2)).

                              Trace Analysis §1.2

                              Now that we have a trace, we can make several deductions about behaviours of the program by analysing the trace. Common analyses are (slightly abusing notation):

• did it end in a failure? 𝐓𝐫𝐚𝐜𝐞_𝚎𝚛𝚛(τ) ⟺ τ_{-1} = err(‘...’), where τ_{-1} is the final state

• did any opened resource not get closed? 𝐓𝐫𝐚𝐜𝐞_𝚛𝚎𝚜(τ) ⟺ ∃f, i. 𝐎(τ_i, f) ∧ ∀j. ¬𝐂(τ_j, f) ∨ j < i

• did we execute this instruction? 𝐓𝐫𝐚𝐜𝐞_𝚌𝚘𝚟𝚎𝚛(i)(τ) ⟺ i ∈ {e.ι | e ∈ τ}
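Over a concrete trace representation, these analyses become one-liners. In the sketch below a trace is simply a list of events, each a pair of an instruction offset and an optional error tag (a simplification of the definitions above; the representation is my own):

```python
def trace_err(trace):
    # Did the trace end in a failure?
    return trace[-1][1] is not None

def trace_cover(i, trace):
    # Did we execute instruction i?
    return i in {offset for offset, _ in trace}

# Events are (instruction offset, error-or-None) pairs.
tau = [(0, None), (4, None), (9, "err: division by zero")]
print(trace_err(tau))       # True
print(trace_cover(4, tau))  # True
print(trace_cover(7, tau))  # False
```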

Any question we can answer correctly for one trace can be turned into a may analysis for the full program. If we find a trace that ends in a failure, the program might fail. If we find a trace where an opened resource is not closed, we know that the program may not close all resources. So, essentially, for any property X we can find in one trace, we can build a may analysis which can detect it:

∃σ ∈ I_P, τ ∈ 𝐒𝐞𝐥𝐞𝐜𝐭(P, σ). 𝐓𝐫𝐚𝐜𝐞_X(τ) ⟹ X(P)

Actually, we can increase the precision of the analysis by running the program multiple times. And, in the limit (assuming that we could run all traces), a dynamic analysis precisely captures any property:

∃σ ∈ I_P, τ ∈ 𝐒𝐞𝐥𝐞𝐜𝐭(P, σ). 𝐓𝐫𝐚𝐜𝐞_X(τ) ⟺ X(P)

No analysis is truly sound!

I was convinced that we could create a sound dynamic memory analysis for a company focusing on embedded devices.

                              The idea was simple, fuzz the program while using a memory sanitizer to check if all memory that was allocated was eventually deallocated.

The analysis per se was sound: every trace we warned about did contain a memory error. But the company in question had a coding style where they didn't clean up the initial memory allocation, because it was cleaned up when the device was shut off. The warnings we produced were not a problem to this company, and therefore false positives.

The lesson: no analysis is fire and forget; they all require some coding conventions to give good results.

                              Trace Abstraction and Prediction §1.3

The problem is, of course, that the number of traces in the semantics of most programs is infinite (or close to it), and it is impossible to cover them all without some trickery. Therefore, we should try to expand the number of traces we cover every time, by inferring information about other traces from one trace. We call this Trace Prediction.

Trace prediction works by mapping each trace τ from the trace space into a possibly finite set of trace classes [τ]. If we can show that an analysis over the trace class is equivalent to a trace analysis over all traces in the trace class, 𝐓𝐫𝐚𝐜𝐞_X([τ]) ⟺ ∃τ′ ∈ [τ]. 𝐓𝐫𝐚𝐜𝐞_X(τ′), we can save a lot of executions. And if we are also able to avoid any already-covered trace class in our next trace selection, we can potentially cover all traces of a program.

This step is especially useful when dealing with parallel programs. It can be very hard to run into deadlocks or data races. With trace prediction, we can infer that there exists a data race in the program even if we don't run into it (Kalhauge 2018).

                              The Path Equivalence Class §1.3.1

The most interesting trace class is the path equivalence class: [τ]_π. In this trace class, we say that all traces that follow the same path through the program are equivalent. This class is interesting because many errors are path dependent. For example, consider the following buggy program, which checks the wrong operand of the division operator:

                              int checkTheWrongThing(int a) { 
                                if (a != 0) { 
                                  return a / 0;
                                } else { 
                                  return 0;
                                }
                              }

This program contains a near-infinite number (2³²) of traces, the ones starting from 0, 1, -1, 2, and so on, but only contains two path equivalence classes: the one ending in an error and the one that does not.

In our JVM language, the path equivalence class is all traces that have the same bytecode offsets in the same order:

[τ]_π = {t ∈ 𝐓𝐫𝐚𝐜𝐞 | ∀i ∈ [1, ∞). (t_i).ι = (τ_i).ι}
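With a concrete event representation (pairs of a bytecode offset and a state sketch; the encoding is my own), checking whether two traces fall in the same path equivalence class is a one-liner:

```python
def same_path(t1, t2):
    # Path equivalence: the same bytecode offsets, in the same order.
    return [off for off, _ in t1] == [off for off, _ in t2]

# Two runs of checkTheWrongThing with nonzero arguments follow the
# same path, so they land in the same path equivalence class:
run_with_1 = [(0, 1), (3, 1), (5, "err")]
run_with_2 = [(0, 2), (3, 2), (5, "err")]
run_with_0 = [(0, 0), (8, 0)]
print(same_path(run_with_1, run_with_2))  # True
print(same_path(run_with_1, run_with_0))  # False
```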

                              Coverage §1.3.2

                              When writing a dynamic analysis, it is nice to know how much of the possible paths you have covered. We can measure coverage in many different ways. The most common are line coverage, branch coverage, and instruction coverage.

Instruction coverage of a set of traces T is the set of instructions covered by them:

#ι(T) = {τ_i.ι | i ∈ [0, ∞), τ ∈ T}

The reason instruction coverage is a nice coverage metric is that the cover of the path equivalence class of a trace is the same as the cover of a single element from the class:

#ι([τ]_π) = #ι({τ})

So if we have uncovered instructions, either we have not found the trace class that covers them, or there is dead code in our program.
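The coverage definition translates directly to code (again with traces as lists of (offset, state) events; the representation is mine):

```python
def instruction_coverage(traces):
    # The set of instruction offsets covered by a set of traces.
    return {off for trace in traces for off, _ in trace}

t1 = [(0, "a"), (3, "b"), (5, "c")]
t2 = [(0, "a"), (8, "d")]
print(sorted(instruction_coverage([t1])))      # [0, 3, 5]
print(sorted(instruction_coverage([t1, t2])))  # [0, 3, 5, 8]
```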

                              Trace Abstraction §1.3.3

Trace abstraction is more of a practical concern. Storing the full state at every step is not only expensive, it is also wasteful. In most cases we only need very little of the state to find the problem. In the case of an error, we only have to emit the final error code if it happened, and in the case of code coverage, we only have to emit the instruction offsets.

At one end of the spectrum, some trace analyses can be done completely at runtime, even allowing the system to react to the analysis; this is known as Runtime Verification.

At the other end of the spectrum, some trace analyses might be so computationally heavy, or only make sense to run after a bug (or intrusion) has been found. Then it makes a lot of sense to trace just enough to run the analysis later.

                              Running the Program §2

                              Running the program has some advantages and disadvantages.

                              Why Should We Run the Program? §2.1

                              Running the program is often a good initial analysis we can do on most programs, and it has a lot of advantages:

• Most programming languages come with a built-in interpreter. Running the program can sometimes be easier than using a syntactic analysis.

• The analysis can present you with a concrete trace that produces the error. This makes it very easy to debug the error.

                              • Since the goal of the dynamic analysis is to underestimate the set of valid traces, if it warns us about a problem, then the problem is real.

                              Why Shouldn't We Run the Program? §2.2

It might seem obvious that we can learn things about the program by running it; however, running the program is not always as easy as we would like.

• Setting up the correct environment of a program can be hard to downright impossible without running it in production.

                              • Warning that something can happen by doing it, can be problematic if the program has side-effects like deleting the database or firing the missiles.

• Running the program will never tell you if something can't happen, like the program running forever.

                              Testing §3

The simplest form of dynamic analysis is testing. In a test, we run the program with a given input and check if it succeeds. Often, in real-life scenarios, we care about other properties than the program not crashing, and those can in most languages be encoded using assertions.

                              Turn your interpreter into a dynamic analysis

                              Your interpreter is actually already a dynamic analysis, but only for cases with no inputs. To use your interpreter as a dynamic analysis do the following:

                              1. If the method takes arguments, exit early, otherwise,

                              2. Run the case with your interpreter,

3. Give the behaviour the case exhibited 100%, and all others 0%.

                              Characterization Trace Testing §3.1

In many cases, it is too complicated (or the assessment of correctness is subjective) to actually check that the output of a test is correct. Instead, we can save the output to a file, or trace the events taken to produce the output. After we have checked that the trace or output is correct, we can store it in the code repository. We call that the snapshot (or golden master). Every time we then re-run the test, if the file did not change, we know the output or trace has remained the same.

This technique is perfect at catching regressions.
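A minimal characterization test can be sketched in a few lines of Python (the snapshot directory and function name are my own, not part of any framework):

```python
from pathlib import Path

def snapshot_test(name, output, snapshot_dir=Path("snapshots")):
    # Compare output against a stored golden master: record it on
    # the first run, and fail on any later change (a regression).
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    snapshot = snapshot_dir / f"{name}.expected"
    if not snapshot.exists():
        snapshot.write_text(output)  # first run: record the snapshot
        return True
    return snapshot.read_text() == output
```

After the first run the recorded file should be inspected by hand and committed to the repository; from then on, any mismatch is flagged.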

                              Snapshot Test Your Interpreter

The benchmark suite is already perfectly set up to record snapshot tests of your interpreter. To also track that you are not making any regressions on your interpreter, you can simply run:

                              $ uv run jpamb interpret -r out.expected <your_interpreter>

                              Assertions §3.2

One specifically useful tool for improving the usefulness of testing is to use assertions to check that the behavior of the system is correct. The cool idea here is that using assertions we can catch programmer errors early and closer to the bug, instead of as part of a complicated crash.

                              Tiger Style

In the TigerBeetle database they use the Tiger Style. In this style they explicitly require two assertions per function, which test both positive space (what you expect) and negative space (what you do not expect).

For example, a function calculating the area of a rectangle could check that the height and width are both positive before doing the calculation.

public static int area(int height, int width) {
  assert height > 0;
  assert width > 0;
  return height * width;
}

                              Negative Space Programming

When writing your interpreter, at every point you have an assumption, add an assertion.

For example, to get started, it is perfectly acceptable to assume that all get operations are to assertionsDisabled, and should return false; however, that might not be the case in the more advanced examples.

                              case Get(field=field):
                                assert field.fieldid.name == "assertionsDisabled"
                                frame.stack.push(Value.int(0)) # Push False

                              Inserting these assertions by hand can be a lot of work, and a good dynamic analysis can insert some of them automatically. Since Java is a memory safe language, most out of bounds memory accesses are caught by the language at runtime. You can see this as automatically inserting assertions.

For other languages like C, there exist tools to extend the capabilities of testing by adding memory checks at runtime. These are called sanitizers.

                              Selecting Input §4

One of the most successful dynamic analysis strategies is called Fuzzing. There exist many different kinds of fuzzing, but I'm going to use the definition of Zeller (2024): fuzz testing is about automating software testing by generating tests automatically.

                              Essentially, fuzz testing is all about trace selection.

                              Random Inputs §4.1

                              The easiest way to get started with fuzzing, is to simply choose an input at random at the start of the program and then see if it can make the program fail.

This works because most traces of a program can be approximated using relatively few inputs. The hypothesis is that the input space roughly corresponds to the path equivalence classes. And this is correct for examples like this:

                              void split(int i) { 
                                if (i > 0) { 
                                  ...
                                } else {
                                  ...
                                }
                              }

Here we can clearly see there are two path equivalence classes, those with i > 0 and those with i ≤ 0. In this case, choosing i randomly gives an equal chance of hitting either branch.
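The whole strategy fits in a few lines. The program under test and the input domain below are stand-ins; any program that raises an exception on a bug would do:

```python
import random

def random_fuzz(program, gen_input, budget=10_000):
    # Run the program on randomly chosen inputs and report the
    # first input that makes it crash, if any.
    for _ in range(budget):
        x = gen_input()
        try:
            program(x)
        except Exception:
            return x  # the program may fail on this input
    return None       # nothing found; NOT a proof of correctness

def buggy(i):
    # Crashes (ZeroDivisionError) only when i == -1.
    return 1 // (i + 1)

random.seed(0)  # fixed seed so the run is reproducible
found = random_fuzz(buggy, lambda: random.randint(-10, 10))
print(found)
```

With the failing value making up roughly one in twenty-one of the domain, random sampling finds it almost immediately; the point of the next sections is what happens when it does not.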

                              Extend your dynamic analysis

Build a new analysis that uses your interpreter. To generate traces, you can choose inputs randomly from their domains and then run the interpreter.

Now, if the program exhibits a queried behavior, emit 100% for that behavior; otherwise return 50% or nothing for the rest of the behaviors.

                              Figure out a good limit for how many random inputs to use.

                              Tag your solution with random and dynamic.

                              Dictionaries §4.2

The problem is, of course, that not all branches are like that; some check for specific values.

                              void isEqualToZero(int i) { 
                                if (i == 0) { 
                                  ...
                                } else {
                                  ...
                                }
                              }

In this case, we can use a trick called dictionaries. First we use a syntactic analysis to scan the code for constants, e.g. 1 and "test". Then we add those constants to a dictionary, which we can sample from when we choose random values.
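A sketch of the idea (the dictionary here is hard-coded; in a real tool it would come from the syntactic scan):

```python
import random

def gen_with_dictionary(dictionary, p=0.5, lo=-2**31, hi=2**31 - 1):
    # With probability p, pick an interesting constant harvested
    # from the program text; otherwise pick a uniformly random value.
    if dictionary and random.random() < p:
        return random.choice(dictionary)
    return random.randint(lo, hi)

# Constants a syntactic scan of isEqualToZero would find:
dictionary = [0]

random.seed(0)  # fixed seed for reproducibility
samples = [gen_with_dictionary(dictionary) for _ in range(100)]
print(0 in samples)  # the i == 0 branch is now hit quickly
```

Without the dictionary, hitting i == 0 by uniform sampling of 32-bit integers would take billions of tries on average.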

                              A hybrid analysis

                              Can you improve your analysis by first scanning the input method for interesting values?

Tag your solution with random, dictionary, syntactic, and dynamic.

                              Coverage-Guided Fuzzing §4.3

The problem with the above-mentioned approaches is that they are not good at directing the execution down specific paths of the program. Consider the arraySpellsHello test case:

                              @Case("([C: 'h','e','l','l','o']) -> ok")
                              @Case("([C: 'x']) -> assertion error")
                              @Case("([C: ]) -> out of bounds")
                              @Tag({ ARRAY })
                              public static void arraySpellsHello(char[] array) {
                                assert array[0] == 'h'
                                    && array[1] == 'e'
                                    && array[2] == 'l'
                                    && array[3] == 'l'
                                    && array[4] == 'o';
                              }

Using random testing or the dictionary approach alone, we would have a hard time finding hello by chance. A cool approach to get around this, which is used in real applications, is Coverage-Guided Fuzzing (Fioraldi 2020).

This approach maintains a list of byte-strings called test cases. The user of the fuzzer is expected to provide a function that can parse those byte-strings into inputs of the program. And now the fun begins.

                              We start with the empty byte-string as input, then we find its coverage (the bytecode offsets it executes). We add that byte-string to a list we call interesting.

                              Now, while we have interesting test-cases:

                              • We pick a test-case, and manipulate it, by either changing a byte, removing a byte or appending a byte.

• Then we run it, and see if it adds coverage compared to the previous cases.

• If it does, add it to the interesting cases, otherwise ignore it.

                              • Repeat.
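The loop above can be sketched as follows. Real fuzzers instrument the binary; here a hypothetical run function stands in for that, returning the set of offsets a byte-string covers:

```python
import random

def mutate(bs):
    # Change, remove, or append a single byte of a test case.
    bs = bytearray(bs)
    op = random.choice(["change", "remove", "append"])
    if op == "append" or not bs:
        bs.append(random.randrange(256))
    elif op == "remove":
        del bs[random.randrange(len(bs))]
    else:
        bs[random.randrange(len(bs))] = random.randrange(256)
    return bytes(bs)

def coverage_guided_fuzz(run, budget=5000):
    interesting = [b""]      # start from the empty byte-string
    covered = run(b"")
    for _ in range(budget):
        candidate = mutate(random.choice(interesting))
        cover = run(candidate)
        if not cover <= covered:     # the candidate adds coverage
            interesting.append(candidate)
            covered |= cover
    return interesting, covered

# A stand-in for instrumentation: every matched prefix of "hello"
# reaches one more offset of arraySpellsHello.
def run_spells_hello(bs):
    n = 0
    while n < min(len(bs), 5) and bs[n] == b"hello"[n]:
        n += 1
    return set(range(n + 1))

random.seed(0)
cases, cov = coverage_guided_fuzz(run_spells_hello)
```

Each new prefix of "hello" adds coverage and is kept, so the fuzzer climbs toward the full string one byte at a time instead of having to guess all five bytes at once.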

                              A Coverage-Guided Analysis

                              Can you improve your analysis by using the coverage-guided technique?

                              Tag your solution with random, coverage, and dynamic.

                              Property Based Testing §4.4

When using random inputs to generate inputs to tests, we also have to change the way we test. Instead of testing for equalities, we now have to test for properties. This approach is called property based testing.

In Python, there is a very nice library called Hypothesis, which helps you do just that. In the following example from its website, they test that their sorting algorithm my_sort is equivalent to the builtin sort on all lists of integers and floats:

                              from hypothesis import given, strategies as st
                              
                              @given(st.lists(st.integers() | st.floats()))
                              def test_sort_correct(lst):
                                  # lst is a random list of numbers
                                  assert my_sort(lst) == sorted(lst)
                              
                              test_sort_correct()

                              The Small-Scope Hypothesis §4.5

                              As an alternative to the fuzzing technique, we have the small-scope hypothesis:

                              The Small-Scope Hypothesis

                              Most bugs in a program can be found by investigating only a small section of the inputs.

Model-checkers like Alloy (Jackson 2006) and testing frameworks like SmallCheck (Runciman 2008) use this idea by only investigating small test cases. Instead of testing random numbers, you test only a few small inputs. The good things about this approach are that

• you can be deterministic and always test the same inputs, because you can cover all small values up to a depth d, and

• if a buggy input is found, it is the smallest input that can create that error.

This technique excels at small cases, like checking for empty lists and the like:

                              void isEqualToZero(int i) { 
                                if (i == 0) { 
                                  ...
                                } else {
                                  ...
                                }
                              }

                              However, the technique does have clear disadvantages: what if the error hides behind a big input?

                              void isEqualToAMillion(int i) { 
                                if (i == 1000000) { 
                                  ...
                                } else {
                                  ...
                                }
                              }

                              In the following activity, you can seek inspiration from the SmallCheck paper Runciman (2008).

                              Make use of the small-check idea

Instead of trying random inputs, create a generator that can generate a list of all inputs smaller than some depth d.

To generate integers you can use the following Python generator:

                              def gen_int(depth): 
                                yield 0
                                for i in range(depth):
                                  yield (i + 1)
                                  yield -(i + 1)

Now use iterative deepening to slowly increase d ∈ [1, 2, 3, …, D] until some max depth D.
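Putting the generator and the deepening loop together gives a tiny SmallCheck-style driver (a sketch; the property interface is my own):

```python
def gen_int(depth):
    yield 0
    for i in range(depth):
        yield (i + 1)
        yield -(i + 1)

def smallcheck(prop, max_depth):
    # Iterative deepening: test all inputs of depth 1, then 2, ...
    # The first counterexample found is also the smallest one.
    for d in range(1, max_depth + 1):
        for x in gen_int(d):
            if not prop(x):
                return x
    return None

# isEqualToZero's special branch is found already at depth 1 ...
print(smallcheck(lambda i: i != 0, max_depth=5))        # 0
# ... but a bug hiding behind a big input is out of reach:
print(smallcheck(lambda i: i != 1000000, max_depth=5))  # None
```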

                              Tag your solution with smallcheck and dynamic.

                              Until next time!

                              Until next time, write the best analysis you can and upload the results to Autolab, and ponder the following:

1. Name cases where the choice of a dynamic analysis makes the most sense.

2. What are the primary limitations and challenges of dynamic analysis?

3. When is random sampling of inputs better than testing small values, and vice versa?

                              Bibliography

                              1. Andoni, Alexandr; Daniliuc, Dumitru; Khurshid, Sarfraz; Marinov, Darko (2003). Evaluating the Small Scope Hypothesis. link

                              2. Fioraldi, Andrea; Maier, Dominik; Eißfeldt, Heiko; Heuse, Marc (2020). AFL++: Combining incremental steps of fuzzing research. link

                              3. Jackson, Daniel (2006). Software Abstractions: Logic, Language, and Analysis.

                              4. Kalhauge, Christian Gram; Palsberg, Jens (2018). Sound deadlock prediction. doi:10.1145/3276516 link

                              5. Prikler, Liliana Marie; Wotawa, Franz (2025). Reevaluating the Small-Scope Testing Hypothesis of Answer Set Programs. doi:10.1007/978-3-031-80889-0_6 link

                              6. Runciman, Colin; Naylor, Matthew; Lindblad, Fredrik (2008). Smallcheck and lazy smallcheck. doi:10.1145/1543134.1411292 link

                              7. Zeller, Andreas; Gopinath, Rahul; Böhme, Marcel; Fraser, Gordon; Holler, Christian (2024). The Fuzzing Book. link