Introduction

Decidability

Christian Gram Kalhauge

The goal of this chapter is to ask and answer the following questions:

Were you stuck on anything in the setup phase?
What strategy gave the best results for you?
Is program analysis easy?

1. What is Program Analysis?

Program analysis, Wikipedia.
Turing completeness, Wikipedia.
Decidability (logic), Wikipedia.

To talk about program analysis, we first have to talk about what we mean when we say program. A program in the context of this course is going to be a structured object that exhibit some behavior when executed.

When we say a program $p$ is a structured object, we mean that is taken from some set of possible programs which we call a language ( $L$ ): $p \in L$ . And, when we say behavior we mean that it affects the world in some way.

Program analysis is the art of extracting facts about the structure or the possible behavior from a program. These facts could be does the program eventually come to a halt, is the program well-formatted, or will my program every exhibit a bug. In this course, we are going to investigate multiple approaches to answer these questions ranging from manual to automatic, from syntactic to semantic, and from dynamic to static.

For simple languages, it is relatively easy to figure out what they do. For example, 1 + 2 will always execute to 3. But, that program analysis, in general, is extremely hard.

1.1. A Turing-Complete Mess

We are going to focus most our energy on program from languages that are Turing complete. Turing complete languages are languages in which you can write any program executable on a machine. This class of programs is also what you are mostly familiar with. Java, C, Python, etc., are all Turing complete languages. Even Brainfuck and PowerPoint are Turing complete.

Using Turing complete languages is a double edge sword: while it is nice that it is powefull enough to do everything, it is also powerful enough to:

fire the missiles,
not fire the missiles,
never terminate,
kill grandma, and worst of all
throw a null pointer exeception!

So it would be nice be warned if any of these things might happen. However, that turns out to be very hard.

1.2. The Halting problem

Halting problem, Wikipedia.

Consider the base of all anlysis problems, the halting problem:

The halting problem

Given any program $p \in L$ from the language $L$ , decide if it is going to terminate (halt) when executed on a machine $m$ .

This problem turns out to be impossible to solve correctly for all programs of a Turing complete language. It is in fact undecidable. While for some problems it is possible to say with confidence that it does terminate, we can't build an mechanical procedure which can do this for every program.

The argument goes a little like this. Consider you have a mechanical procedure for solving the halting problem, in that case we can encode it as a program:

def does_halt(p : Program) -> bool:
    ...

Now we can use this program in another program:

def main():
    while does_halt(main):
        continue

This is a weird program: We can see that if main halts, it runs forever, and if it runs forever it will eventually halt. This is of cause impossible, which strongly suggest that does_halt cannot exist.

1.3. General Undecidability of Program Analysis

We could hope that the problem that we can't figure out if a program terminates or not is special and does not affect any of the other things we would like to know about the program. But, sadly this is not the case. Almost any interesting thing you would like to know about the behavior of the program can be reduced to the halting problem. This fact is called Rice's theorem.

Since figuring out the halting problem is impossible, we can't say if the following problem actually fires the nukes:

def main():
    something_that_might_go_forever()
    fire_the_nukes()

1.4. Sound or Complete, choose one.

Soundness, Wikipedia.

There is however hope. In most cases, you either care that something may happen, or that it must happen. In the program from before, we can quite easily say that it may fire the nukes. If we are keen on keeping nukes from not being fired, this might be a bad thing. However, we might also build a missile luncher in which we would rather be told that if we press the red button the nukes must be fired.

**Figure:** A sound analysis do not have false positivies, and a complete analysis does not have false negavies.

These kinds of analyses allows us to error one of the sides. Here we steal some nomenclature from logic. In logic soundness means that every provable statement is true, and completeness means that every true statement is provable. Translated into program analysis jargon, we say an analysis is sound if when it produce a fact, then that is a real fact, and complete if it always can find a fact about the program if it is there. Because the problem we are talking about is undecidable, we can't both be sound and complete.

WARNING: depending of the culture and essentially the underlying question, soundness and completeness can mean the same thing. A sound analysis that reject buggy programs is the same as a complete analysis finds bugs in programs.

1.5. A Relaxed Goal

dl.acm.org/doi/pdf/10.1145/2644805

Actually, it turns out that in most real language settings, it is very hard to write an program static analysis that are sound. So in this course, instead of working with trivial programs, we relax the goal, and instead of requiring our analyses to be sound or complete, we try to aim to produce the best result in the shortest amount of time.

[...], virtually all published wholeprogram analyses are unsound and omit conservative handling of common language features when applied to real programming languages. -- In Defense of Soundiness: A Manifesto

2. The Kinds of Analysis

When talking about program analyses we break them down according to their properties.

2.1. Manual vs Automatic Analysis

The difference between manual and automatic analysis, is whether we have a well-defined set procedure for analysing the code.

In many cases are manual inspection a crucial companion to automatic analysis. If taken to the extreme, we can actually require the user of the tool to prove to us that the code is correct. Now we only have to check the proof, which case we are entering the world of Program Verification.

2.2. Syntactic vs Semantic Analysis

The difference between a syntactic and a semantic analysis is whether we focus on the structure of the program or on the meaning of the program.

The structure of the program or syntax is often represented as a tree of nodes as recognized by the parser.

In contrast the meaning of the program or semantics is represented as a set of all possible traces of a program. Here trace means a sequence of states and operations possible by the program.

2.3. Dynamic vs Static Analysis

Finally we can differentiate between dynamic and static analysis. A dynamic analysis interpolates the meaning of the program from a single trace, where a static analysis tries to predict all possible behaviors.

Dynamic analysis are often just executing the programs, and then reporting any behavior it exhibits. An dynamic analysis often have proof of the bad behavior. A dynamic analysis is sound if every behavior it finds is a real behavior, and complete if can find all behaviors.

Static analyses in contrast consider the entire programs, and then reports if the program is without bugs or problems. When a good static analysis says your program is good, it probably is, however, when it finds a potential bug, it can often not prove it to you. A static analysis is sound if every program it flags do exhibit some behavior, and complete if it flags all programs that contain the behavior.

It is sometimes a great idea to do a mix of a dynamic and static analyis, in which case we call it a hybrid analysis.

3. Getting Started

The goal of this course is to be a practical introduction to program analysis. We are therefore sometimes going to skip some of the theory to make more room for implementation. However we'll try to reference relevant resources when necessary.

Throughout this course you'll find activities which we suggest you'll do to get started with the over arching problem:

The first activity

This is a sample activity, you don't have to do anything for this.

3.1. The Game

In this course, we are going to build the best analysis that we can for the JPAMB suite. The goal of the game is to write analysis which can score the highest, and do that in the shortest amount of time.

Download the benchmark suite

Download the benchmark suite:

$ git clone https://github.com/kalhauge/jpamb.git

and read the README file.

Create your first analysis!

Here is an example of a program that thinks that it is highly probable that all cases contain an assertion error.

#!/usr/bin/env python3
import sys

# ignored for now
method_name = sys.argv[1]

print("assertion error;80%")

Save it under guess.py and make it exeuctable

$ chmod +x ./guess.py

Evaluate your analysis

First create an experiment.yaml file.

# Choose a group name, remember it will be made public so choose 
# a random string if you want to be anonymous
group_name: <INSERT_COOL_GROUPNAME_HERE> 

# Answer true if the data you provide can be used anonomized for science!
for_science: true

# Underneath here add the tools which you want to run
tools:
  test: 
    technologies:
      - python
    
    executable: ./guess.py
      # if on windows you'll need to write it like this:
      # - python
      # - guess.py

# Describe your machine here (for research purposes) 
machine:
  os: macosx
  processor: "2,3 GHz Quad-Core Intel Core i7"
  memory: "16 GB 3733 MHz LPDDR4X"

Then run the experiment:

$ ./bin/evaluate.py experiement.yaml -v > result.json

Finally, you should submit it to learn.

3.2. The Rest of the Course

The rest of the course is about applying different techniques from the course to score as high as possible in the benchmark suite.