Project

Christian Gram Kalhauge

Table of Contents

Choosing a Project§1
1. Proposal§1.1
2. Project Ideas§1.2
Assessment§2
Technical Contributions§3

Here is the definitive guide to the project and the assessment.

Choosing a Project §1

The first thing you should do is find an interesting topic. The topic should be centered around program analysis, and should demonstrate your understanding of the course. The normal case is to choose one more benchmark cases which you want to explore building a faster or more precise analysis for.

It is possible to choose your own custom project. Talk to the teacher, if you are interested.

Proposal §1.1

Before starting on the project, you have to hand in a project proposal. The project proposal is ~2 pages and correspond roughly to the introduction of your final paper. It should formatted using the ACM small format (see template), and will be evaluated according to the CGIE model, and graded 1-4:

Context: what is the problem?
Gap: why does the current techniques not work?
Innovation: what is your proposed solution?
Evaluation: when have you solved the problem?
Plan: what is the plan?
Presentation: is the proposal formatted correctly and presented nicely?

As an appendix to the proposal, please also submit a table of Technical Contributions (§3), as described in Project Overview Sheet (§2.2). Remember to fill in the maximal grade for each student.

You need to score at least 12 points and none of the parameters can be 1. If your proposal are rejected you have to resubmit (even after the deadline). The proposals are graded as you submit them, so you do not have to wait to the last day.

Project Ideas §1.2

As inspiration, here is selection of project ideas. Each project is described using abstracts from the CGIE Model as well as the potential technical contributions of those projects.

Analysis Survey

Developers know they should use a static analysis when developing code. However, there currently exist so many options that it is hard to chose.

In this project, we will investigate the state of the art, by implement runners for (five?) different analyses on the JPAMB suite, to illustrate their differences. We will expand the benchmark suite until we have found a clear difference between each different analyses, so we can discuss their short-comings.

ID	Name	Max
PWP	Write Paper	10
PEX	Find Illustrating Examples	4
PRW	Read and relate to recent papers	10
NAN	Integrate Analyses	15
EXB	Extend Benchmark Suite	5
Sum		44

With a maximum of 44 contribution points the suggested maximal grade is 7 (Groups of 4), 4 (Groups of 5), and 02 (Groups of 6).

Handling String

Strings are one of the basic units of computing. However, currently none of the techniques shown in class handle strings.

In this project, we extend our current analyses to handle strings. To evaluate our implementation we will expand the JPAMB suite to contain cases that show normal use cases of strings.

ID	Name	Max
PWP	Write Paper	10
PEX	Find Illustrating Examples	4
PRW	Read and relate to recent papers	10
IIN	Implement Interpreter	10
ISY	Implement Syntactic Analysis	10
IAI	Implement Abstract Interpreter	10
IAB	Implement Novel Abstractions	10
NDT	Handle Novel Datatype	10
EXB	Extend Benchmark Suite	5
Sum		79

With a maximum of 79 contribution points the suggested maximal grade is 10 (Groups of 4), 7 (Groups of 5), and 7 (Groups of 6).

Program Debloating

When writing software, developers (and now also LLMs) often add code which might be needed later, but never is. Ideally, the computer should be able to remove code which is not needed automatically. Currently, both dynamic and static analysis approaches have been used to remove unused method bodies and classes. However, current work does not remove code inside of functions.

In this project, we will use static and dynamic analysis techniques to estimate possible code-coverage on the source line level, and remove all the code not needed. To evaluate our implementation we will run our debloater on JPAMB suite and show we can remove X% of the code while all cases still can evaluate.

ID	Name	Max
PWP	Write Paper	10
PEX	Find Illustrating Examples	4
PRW	Read and relate to recent papers	10
IIN	Implement Interpreter	10
ISY	Implement Syntactic Analysis	10
IAI	Implement Abstract Interpreter	10
IAB	Implement Novel Abstractions	10
NDT	Handle Novel Datatype	10
EXB	Extend Benchmark Suite	5
NCR	Analysis informed code-rewriting	10
IBA	Implement Unbounded Static Analysis	7
Sum		96

With a maximum of 96 contribution points the suggested maximal grade is 12 (Groups of 4), 10 (Groups of 5), and 10 (Groups of 6).

Statically Improved Fuzzing

Fuzzing is one of the most used analysis techniques in production. But for deep programs it can be hard to direct the fuzzer to all parts of the program. Currently, some dynamic techniques, like coverage and in some cases concolic analysis are used to improve the efficiency. But since these techniques are dynamic, they cannot help us with sections we have missed.

In this project, we will use static analysis techniques to improve fuzzing coverage. We do this by rating branches which changes to get higher coverage higher. To evaluate our implementation we will show that our hybrid fuzzer explores cases in the JPAMB faster than pure fuzzing.

ID	Name	Max
EXB	Extend Benchmark Suite	5
IAB	Implement Novel Abstractions	10
IAI	Implement Abstract Interpreter	10
IIN	Implement Interpreter	10
ICF	Implement Coverage-based Fuzzing	10
ISY	Implement Syntactic Analysis	10
NDT	Handle Novel Datatype	10
NIN	Use Bytecode Instrumentation	10
PEX	Find Illustrating Examples	4
PWP	Write Paper	10
Sum		89

With a maximum of 89 contribution points the suggested maximal grade is 12 (Groups of 4), 10 (Groups of 5), and 7 (Groups of 6).

Assessment §2

Your final grade will be an overall assessment based on:

the Written Answers (§2.1), handed in by your feedback group
the Paper (§2.3), handed in by your project group
the individualized Project Overview Sheet (§2.2), handed in by the project groups.
and the Video Presentation (§2.4) and Oral Defence (§2.5), done in the project groups.

The work will be evaluated according to the learning objectives. Special weight will be put on that you are using the content of the course (indicated by the technical contributions), that the project is well done (easy to read, use of techniques are warranted by the problem, and claims are supported by evaluation), and finally (for extra points) if the project problem is interesting and the solution is innovative.

Written Answers §2.1

In the end of the course, your feedback group must hand in a single PDF, with written answers to of all the Questions (§2.5) asked in class. The answers should be around a paragraph. The answers will be graded on correctness, and account for no more than 10% of the final grade.

Project Overview Sheet §2.2

Technical contributions is an advisory grade, which goal is to set the expectations of the course. At the exam, your grade might be higher or lower than your calculated grade, depending on the outcome of project as a whole.

Additionally to the Paper (§2.3), your project group also must hand in a single A4 page, containing the name, student number, the feedback group and the images of each of the students in the group. Finally you should also create a table summarizing the technical contributions of each of the group members.

The table, should have a row for each technical contribution. They should contain the contribution, the points it is worth, and how the points should be distributed between the students.

Contribution	Points	sXXXXXX	sYYYYYY	sZZZZZZ	...
IIN Implement Interpreter	10	4	3	3	-
IAI Implement Abstract Interpreter	10	0	0	10	-
NAN Integrate Analyses (3)	15	8	4	7	-
EXB Extend Benchmark Suite (10)	5	2	1	0	-
-	-	-	-	-	-
Total	-	14	8	21	-
Suggested Maximal Grade	-	7	4	12	-

Finally, the numbers are tallied up and used to calculate a maximal grade, each individual student can get. The grades are calculated from this table (subject to change):

Grade	Point	Diversity
`12`	$\geq 20$	Across 5 contributions
`10`	$\geq 15$	Across 4 contributions
`7`	$\geq 10$	Across 3 contributions
`4`	$\geq 7$	Across 3 contributions
`02`	$\geq 5$	Across 3 contributions

Paper §2.3

The paper should be at most 10 pages, excluding references. It must be formatted using the ACM format in acmsmall mode. To be sure to adhere to the format.

Before you continue writing the paper, consider the advice in this lecture, by Simon Peyton Jones.

The final paper should contain the following sections (approximate page numbers in parentheses):

An abstract, describing the paper.
An introduction (1-2 pages), with
- a good motivation and description of the problem you are trying to solve (consider using examples),
- the problem stated as a research question,
- a short description of your solution, and
- a list of contributions with forward references, described in the lecture by Simon P. Jones.
An example illustrating your approach (1-2 pages).
An approach and theory section: high level description of the technique used to analyze (2-3 pages), with
- a formal description of the approach;
- a walkthrough of the theory needed to understand the technique; and
- any theoretical guarantees of the approach.
An evaluation section, describing the evaluation and the results (2-3 pages), with
- an empirical evaluation of the approach,
- compared against at least one other technique (or variations (optimizations) of your technique),
- an analysis of the results, and
- a discussion of the threats to validity.
A related work section, discussion of the technique, and alternatives techniques to solve the same problem (1 - 2 pages), with
- an explanation why other techniques from the lectures (and potentially the literature) were not used.
A conclusion (0 - 1/2 page), which is
- A short recap of the paper.

Video Presentation §2.4

The week of the exam (see LEARN) your group has to hand in a video presentation of your project. The video should be formatted like a normal oral presentation. You can (and probably should) use slides, which you record using a screen capture application (fx OBS).

Please include examples and diagrams which we can use to guide the oral exam. If you are using slides, please you slide numbers. That way we can easily navigate your video during the exam.

It can be no longer than 5 min.

Oral Defence §2.5

Finally, the projects are defended orally in a group exam.

The presentation plan is still being worked out. But expect ~30 minutes per group starting with the 5 minutes video presentation, and ending in individual questioning

The questions are to your technical contributions in the project as well as the course material in general, especially the Questions (§2.5).

Technical Contributions §3

This list is incomplete. Please send suggestions on email to get it expanded or revised.

Here is an table of technical contributions. Use the formula to calculate the number of points gained. The max points is the maximal points the group can get per contribution.

Implementation §3.1

IIN Implement Interpreter (Max: 10)

IAI Implement Abstract Interpreter (Max: 10)

ISY Implement Syntactic Analysis (Max: 10)

Formula: 5 per analysis

For example look through the bytecode for values to use in a fuzzer. Or quickly detect if there is assertions in a method.

ICF Implement Coverage-based Fuzzing (Max: 10)

IAB Implement Novel Abstractions (Max: 10)

Formula: 5 per abstraction

Implement new abstractions not covered in class. This could be machine word abstractions or polyhedras.

ISE Implement Symbolic Execution (Max: 10)

IBA Implement Unbounded Static Analysis (Max: 7)

Run your static analysis until a fixed point.

Engineering §3.2

NAB Integrate Abstractions (Max: 10)

Formula: 5 per abstraction after the first

Run two or more abstractions at the same time, letting them inform each other during execution.

NAN Integrate Analyses (Max: 15)

Formula: 5 per analysis after the first

Use the result of one analysis to improve another. Running them in parallel does not count. You do not have to have written any of the analyses.

NIN Use Bytecode Instrumentation (Max: 10)

Instrument the JVM bytcode, and make the JVM run the dynamic analysis.

NCR Analysis informed code-rewriting (Max: 10)

Using information from an analysis to automatically rewrite code.

NTS Analysis informed test synthesis (Max: 10)

Using information from an analysis to automatically build programs which can be used for testing the code.

The programs should be more complex than a simple call to a function.

NDT Handle Novel Datatype (Max: 10)

Formula: 5 per datatype

Handle datatypes like strings or doubles, which are currently not covered by the material.

Evaluation §3.3

EXB Extend Benchmark Suite (Max: 5)

Formula: 0.5 per method added

Extend the JPAMB benchmark suite to contain new methods. Pull requests are appreciated but not required.

ERC Make Analysis run on Real Code (Max: 10)

Formula: 1 per 10k lines handled

EOP Optimize the code (Max: 10)

Formula: 5 per order of magnitude faster

ENJ Support a Different Language than Java (Max: 10)

ERW Compare with existing tools. (Max: 10)

Formula: 2.5 per tool

Compare existing static analysis as part of your evaluation.

Theory §3.4

TAB Prove abstractions correct (Max: 10)

Formula: 5 per abstraction

TWI Prove termination for widening operators (Max: 10)

Formula: 5 per abstraction

Presentation §3.5

PWP Write Paper (Max: 10)

Formula: 1 per page

PEX Find Illustrating Examples (Max: 4)

Formula: 2 per example

PDI Create a Process Diagram (Max: 3)

Every tool paper deserves a diagram illustrating how the tool works.

PRW Read and relate to recent papers (Max: 10)

Formula: 1 per paper

PMG Project Management (Max: 9)

Formula: 3 per person over 3

Managing the work in a larger group is hard. This score represents the work needed to make the work flow nicely.