Introduction

Decidability

Christian Gram Kalhauge

Table of Contents
  1. Program Analysis§1
    1. What is a Program?§1.1
      1. What is Program Analysis?§1.2
        1. A Turing-Complete Mess§1.3
          1. The Halting problem§1.4
            1. General Undecidability of Program Analysis§1.5
              1. Sound or Complete... Choose one.§1.6
                1. A Note: Soundness Confusion§1.6.1
                2. A Relaxed Goal§1.7
                3. The Kinds of Analysis§2
                  1. Manual vs Automatic Analysis§2.1
                    1. Syntactic vs Semantic Analysis§2.2
                      1. Dynamic vs Static Analysis§2.3
                      2. Getting Started§3
                        1. The Game§3.1

                        This is the first lecture in the course, and is about program analysis at large, and specifically about how to participate in this course.

                        Program Analysis §1

                        Back in the beginning of 20th century, a group of logician where obsessed with coming up with a system for proving all things in mathematics. The problem was that a foundational crisis had emerged. Many mathematical theories had holes in them, or was able to prove theorems which are false E.g ., Russell's paradox.

                        To this end they came up with different systems for automatically proving things. David Hilbert came up with the Hilbert's program, and Alanzo Church came up with the Lambda Calculus. The underlying idea was, if we could come up with an automatic procedure for producing everything that is true, we can recursively enumerate all true things. This is known as recursively axiomatizable.

                        However, they quickly ran into a problem, how can we prove that these programs actually terminate. This is the fundamental program analysis question, and it strangely came before the program.

                        What is a Program? §1.1

                        To talk about program analysis, we first have to define what we mean when we say program. A program in the context of this course is going to be a structured object that exhibit some behavior when executed.

                        Program

                        A program is structured object pL, from a language L, with a step function from state to state:

                        𝚜𝚝𝚎𝚙:𝐒𝐭𝐚𝐭𝐞𝐒𝐭𝐚𝐭𝐞

                        When executing a program, we often want to run it until it changes the state. This is called a fixpoint, or running the program to completion.

                        What is Program Analysis? §1.2

                        Program analysis is the art of extracting facts about the structure or the possible behavior from a program. These facts could be does the program eventually come to a halt, is the program well-formatted, or will my program exhibit a bug. In this course, we are going to investigate multiple approaches to answer these questions ranging from manual to automatic, from syntactic to semantic, and from dynamic to static.

                        In summary, program analysis is:

                        Using automatic techniques to figure out facts about a computer program.

                        For simple languages, it is relatively easy to figure out what they do. For example, 1 + 2 will always execute to 3. But, program analysis, in general, is extremely hard.

                        A Turing-Complete Mess §1.3

                        We are going to focus most our energy on programs from languages that are Turing complete. Turing complete languages are languages in which you can write any program executable on a machine. This class of programs is also what you are mostly familiar with. Java, C, Python, etc ., are all Turing complete languages. Even Brainfuck and PowerPoint are Turing complete.

                        Using a Turing complete language is a double edge sword: while it is nice that it is powerful enough to do everything, it is also powerful enough to:

                        • fire the missiles,

                        • not fire the missiles,

                        • never terminate,

                        • kill grandma, and worst of all

                        • throw a null pointer exception!

                        So it would be nice be warned if any of these things might happen. However, that turns out to be very hard.

                        The Halting problem §1.4

                        Consider the base of all analysis problems, the halting problem:

                        The halting problem

                        Given any program pL from the language L, decide if it is going to terminate (halt) when executed on a state s.

                        This problem turns out to be impossible to solve correctly for all programs of a Turing complete language. It is in fact undecidable. While for some programs it is easy to say with confidence that it does terminate. We can't build an mechanical procedure which can do this for every program.

                        The argument goes a little like this. Consider you have a mechanical procedure for solving the halting problem, in that case we can encode it as a program:

                        def does_halt(p : Program) -> bool:
                            ...

                        Now we can use this program in another program:

                        def main():
                            while does_halt(main):
                                print("Running")

                        This is a weird program: We can see that if main halts, it runs forever, and if it runs forever it will eventually halt. This is of course impossible, which strongly suggest that does_halt cannot exist.

                        General Undecidability of Program Analysis §1.5

                        We could hope that the problem that we can't figure out if a program terminates or not is special and does not affect any of the other things we would like to know about the program. But, sadly this is not the case. Almost any interesting thing you would like to know about the behavior of the program can be reduced to the halting problem. This fact is called Rice's theorem.

                        For example, since figuring out the halting problem is impossible, we can't say if the following problem actually fires the nukes:

                        def main():
                            something_that_might_go_forever()
                            fire_the_nukes()

                        Sound or Complete... Choose one. §1.6

                        There is hope, however. In most cases, you either care that something may happen, or that it must happen. In the program from before, we can quite easily say that it may fire the nukes. If we do not want the nukes from not being fired, we can flag this as a bug. However, we might also build a missile luncher, in which case the nukes must be fired when press the red button.

                        These kinds of analyses allows us to error one of the sides. Here we steal some nomenclature from logic. In logic soundness means that every provable statement is true, and completeness means that every true statement is provable.

                        Soundness

                        In a sound system, we can only prove true things. Or if we can prove Φ given Σ (ΣΦ) then Φ is true given Σ (ΣΦ).

                        ΣΦΣΦ

                        Completeness

                        In a complete system, we can prove all true things. Or if Φ is true given Σ (ΣΦ) then Φ is provable given Σ (ΣΦ).

                        ΣΦΣΦ

                        Translated into program analysis jargon, we say an analysis is sound if when it produce a fact, then that is a real fact, and complete if it always can find a fact about the program if it is there. Because the problem we are talking about is undecidable, we can't both be sound and complete.

                        It is also useful to talk about how a program analysis has performed on individual programs or bugs. To do this we use nomenclature from classification.

                        FP,TN,FN vs TP

                        An individual proposition Φ is either a true positive, true negative, false positive, or false negative, following the table below:

                        ΣΦΣ⊭Φ
                        ΣΦTPFP
                        Σ⊬ΦFNTN

                        A sound analysis, therefore, has no false positives, and a complete analysis has no false negatives.

                        A Note: Soundness Confusion §1.6.1

                        One problem with the use of soundness and completeness in program analysis, is that depending of the culture and essentially the underlying question, soundness and completeness can mean the same thing. A sound analysis that reject buggy programs is the same as a complete analysis finds bugs in programs.

                        Program based questions focus on if the program is well formed. Essentially, a sound analysis must only accept correct programs, and a complete analysis accept all correct analysis. This is often used with type-systems and static analyses.

                        Language L Incomplete INCOMPLETE Unsound UNSOUND False Negative False Negative False Positive False Positive

                        Program based questions, soundness and completeness

                        Trace based questions focus on if the program contains an execution that ends in a state which contains a bug. In that case a sound analysis will only find correct traces, and complete analysis will find all traces. This is mostly used in dynamic analyses.

                        Execution Incomplete INCOMPLETE Unsound UNSOUND Danger Zone Analysis False Negative False Negative False Positive False Positive Program Execution

                        Trace based questions, soundness and completeness

                        In case of confusion, use may and must instead.

                        A Relaxed Goal §1.7

                        Actually, it turns out that in most real language settings, it is very hard to write an program static analysis that are sound. So in this course, instead of working with trivial programs, we relax the goal, and instead of requiring our analyses to be sound or complete, we try to aim to produce the best result in the shortest amount of time.

                        [...], virtually all published wholeprogram analyses are unsound and omit conservative handling of common language features when applied to real programming languages.

                        In Defense of Soundiness: A Manifesto

                        The Kinds of Analysis §2

                        When talking about program analyses we break them down according to their properties.

                        Manual vs Automatic Analysis §2.1

                        The difference between manual and automatic analysis, is whether we have a well-defined procedure for analysing the code.

                        In many cases manual inspection is a crucial companion to automatic analysis. If taken to the extreme, we can actually require the user of the tool to prove to us that the code is correct. Now we only have to check the proof, which case we are entering the world of Program Verification.

                        Syntactic vs Semantic Analysis §2.2

                        The difference between a syntactic and a semantic analysis is whether we focus on the structure of the program or on the meaning of the program.

                        The structure of the program or syntax is often represented as a tree of nodes as recognized by the parser.

                        In contrast the meaning of the program or semantics is represented as a set of all possible traces of a program. Here trace means a sequence of states and operations possible by the program.

                        Dynamic vs Static Analysis §2.3

                        Finally we can differentiate between dynamic and static analysis. A dynamic analysis interpolates the meaning of the program from a single trace, where a static analysis tries to predict all possible behaviors.

                        Dynamic analysis are often just executing the programs, and then reporting any behavior it exhibits. An dynamic analysis often have proof of the bad behavior. A dynamic analysis is sound if every behavior it finds is a real behavior, and complete if can find all behaviors.

                        Static analyses in contrast consider the entire programs, and then reports if the program is without bugs or problems. When a good static analysis says your program is good, it probably is, however, when it finds a potential bug, it can often not prove it to you. A static analysis is sound if every program it flags do exhibit some behavior, and complete if it flags all programs that contain the behavior.

                        It is sometimes a great idea to do a mix of a dynamic and static analysis, in which case we call it a hybrid analysis.

                        Getting Started §3

                        The goal of this course is to be a practical introduction to program analysis. We are therefore sometimes going to skip some of the theory to make more room for implementation. However, we'll try to reference relevant resources when necessary.

                        Throughout this course you'll find activities which we suggest you'll do to get started with the over arching problem:

                        The first activity

                        This is a sample activity, you don't have to do anything for this.

                        The Game §3.1

                        In this course, we are going to build the best analysis that we can for the JPAMB suite. The goal of the game is to write analysis which can score the highest, and do that in the shortest amount of time.

                        Download the benchmark suite

                        Download the benchmark suite:

                        $ git clone https://github.com/kalhauge/jpamb.git

                        and read the README file.

                        Create your first analysis!

                        Here is an example of a program that thinks that it is highly probable that all cases contain an assertion error.

                        import jpamb
                        
                        methodid = jpamb.getmethodid("test", "0.0", 
                            "The Rice Theorem Cookers", [], 
                            for_science=True)
                        
                        print("assertion error;80%")

                        To see that it performs correctly, save it under test.py, and then you can test it on the simple cases, like so:

                        uvx jpamb test python test.py
                        

                        Evaluate your analysis

                        $ uvx jpamb evaluate python ./guess.py > result.json

                        Finally, you should submit it to learn.

                        Until next time!

                        Until next time, write the best analysis you can and upload the results to Learn, and ponder the following:

                        1. Is program analysis easy?

                        2. In what circumstances would you want a May vs a Must analysis?

                        3. What does it mean that an analysis is sound?