How to Write a Good Paper

Remember the Reader!

Christian Gram Kalhauge

In this chapter, we are going to cover how to write a good paper for this course. The material presented here is mostly a summary of the great talks by Simon Peyton Jones and Derek Dreyer.

This chapter has two parts: general advice about how to write coherently, and then specific advice about how to structure a paper.

1. How to Write Coherently

The most important thing you can do when writing a paper is to write clearly and coherently. Most readers are pressed for time, and reading your paper is rarely a priority. It is therefore important that you do not waste the reader's interest with convoluted or incoherent prose.

1.1. One Point per Paragraph

If you want to write text that is easy to read, make sure that every paragraph has exactly one main point. This point should be expressed as a sentence at the beginning of the paragraph. The rest of the paragraph should be supporting evidence for that point. If you write like this, readers are much more likely to get your point, even when skimming the paper.

1.2. Flow and Old-to-new

When writing a paragraph, you should make sure that every sentence fits into the structure of the paragraph. We call this flow. Flow is broken when a sentence introduces new information at its start, so that it seems unrelated to the sentences that come before. A good sentence will start with old information and end with new information.

Don't. Program analysis is one of the most fun disciplines in computer science. After taking this course, we hope that the students will feel that as well. Concolic execution is a great example of why people should know about program analysis. Sadly, this is not well known.

Do. Program analysis is one of the most fun disciplines in computer science. After taking this course, we hope that the students will feel that as well. One example of a program analysis that might entice students to take and enjoy the course is concolic execution. Sadly, concolic execution is not well known.

1.3. Active Voice

Active voice is when every action in a sentence is done by a subject, while passive voice is when it is not. Way too many people try to sound smart or academic by writing in passive voice. Do not do that. Instead, write in active voice, as it is much easier to read. When you did something, use "we".

Don't. A thorough investigation of the causes of the inaccuracies in the analysis was done, and no clear cause was found.

Do. We did a thorough investigation of the inaccuracies in the analysis, and we found no clear cause.

Don't. Contained in this paper, are many truths!

Do. This paper contains many truths!

1.4. Read aloud!

A very useful way to figure out whether your writing is coherent is to read it aloud. You will quickly find out if your sentences make sense.

1.5. (Extra) Orwell's 6 Rules

www.openculture.com/2016/05/george-orwells-six-rules-for-writing-clear-and-tight-prose.html

George Orwell also has these 6 rules for writing good prose.

Never use a metaphor, simile, or other figure of speech which you are used to seeing in print.
Never use a long word where a short one will do.
If it is possible to cut a word out, always cut it out.
Never use the passive where you can use the active.
Never use a foreign phrase, a scientific word, or a jargon word if you can think of an everyday English equivalent.
Break any of these rules sooner than say anything outright barbarous.

2. How to Structure Your Paper

In general, your paper should be structured something like this.

Title (1 sentence, 10,000 readers)
Abstract (1-2 paragraphs, 1,000 readers)
Introduction (1-2 pages, 100 readers)
Key Idea (2-3 pages, 50 readers)
Technical Meat (3-5 pages, 10 readers)
Related Work (1-2 pages, 100 readers)
Conclusion (0-0.5 pages, 100 readers)

Some of the parts above cover multiple sections, and these sections might be named differently. However, the structure is often the same.

2.1. Title

The title is the most important part of the paper as it is what invites people to read your paper. Try to use keywords that people recognize.

Bad examples:

Project 10 - Group 10 (Waste of space, your group name is in the author list.)
Program Analysis (Does not tell the reader anything.)
Testcase Prediction (Ambiguous, and the method is not clear.)

Good examples:

Test Case Selection using Static Call-Graph Analysis.
Testing Static Analysis using LLM-Synthesized Code.
Confirming Static Analysis Results using Directed Dynamic Analysis.

2.2. Abstract and Introduction

The abstract and introduction are the most important parts of the paper. I like to say that after reading them, I should have a good idea of what grade you would like; the rest of the paper is about proving that you deserve it. Therefore, I recommend writing the introduction before writing the rest of the paper, or even before finishing the proofs.

You should use the CGI model (Context - Gap - Innovation) when writing your abstract and introduction. First, you state the context, which is what the reader needs to know to be motivated by the problem. Then you explain the gap, which is why current approaches do not solve the problem completely. Finally, you explain the innovation, which is your key idea and how it sets you apart from the rest, including how well you perform on the problem.

In practice, your structure would be something like this:

(Context) The regression test-suites of big software projects can contain thousands of tests. Running all of these tests can take multiple days. Selecting which tests to run is therefore crucial to maintain developer productivity. (Gap) Currently, developers will manually select which tests to run after a change. This is inefficient and error-prone.

(Innovation) In this paper, we use syntactic static analysis and information logged in an initial run of the test-suite to predict which tests might have changed. Our key insight is ... In a case study, we ran our approach on Maven, which has 8323 tests. Over the last 300 commits, we correctly predicted 98 % of the changed tests and saved, on average, 23.2 min of test time per commit.

The introduction is essentially an expanded version of the abstract, but there are some notable differences. Firstly, you can, and should, make references to prior work (but not all of it; that is what the related work section is for). Secondly, you should use the extra space for a small motivating example. This will anchor your readers, so they know concretely what you want to do. Thirdly, I like the first paragraph after "In this paper" to contain a single highlighted sentence that encapsulates your research question or hypothesis. The goal of this sentence is to crystallize for the reader exactly what you want to do. Finally, I like having a bullet list of contributions with forward references at the end. The goal of this list is to make it very clear to the reader what you did and how to find it in your paper.

Good examples (but all without a research question or hypothesis):

2.3. Key Idea

In these sections, you should illustrate (a possibly limited version of) your approach, ideally using one or more examples. Write this section so that people with limited knowledge of your field can read it.

Good examples:

Sections 2 and 3 in Binary reduction of dependency graphs

2.4. Technical Meat

In these sections, you should present the technical meat of your paper. This is what people will read to really understand what you have done. It should contain a theory, implementation, and evaluation part.

2.4.1. Theory

Here you argue, using math, that your approach is sound. Depending on what you have done, this can be more or less rigorous.

2.4.2. Implementation

Here you explain how you have taken your theoretical idea and implemented it. Implementation is often uninteresting to the reader, so do not spend more than a paragraph or two on this. You should focus on how your implementation differs from the theoretical solution you outlined. Hopefully, these differences are relatively small.

2.4.3. Evaluation

The evaluation is the reason for the implementation. Here, you should show that your new technique is better on your problem.

I like to start the evaluation section with a summary of the evaluation and a list of sub-research questions, each of which tries to answer part of the main research question.

In this section, we answer our main research question:

Is static analysis better, in terms of speed and accuracy, than dynamic at detecting information flow bugs?

To evaluate this, we chose a benchmark of real Java programs with information flow problems (see Section 6.1), and implemented both a static and a dynamic version of the information flow algorithm (see Section 6.2). We then asked two questions:

How fast are the analyses? We ran both algorithms on the benchmark suite, and found that the dynamic analysis was 2.01x faster than the static analysis if limited to 10 inputs. However, if we ran the dynamic analysis until its precision matched that of the static analysis, it was 5x slower (see Section 6.3).
Can we tune the dynamic analysis? Inspired by the last experiment, we considered that the dynamic analysis could be tuned to give arbitrary performance ...

While our experiments are limited in scope (see Section 6.5), we conclude that dynamic analysis is faster, but also less precise, than static analysis at detecting information flow bugs.

The rest of the evaluation fills out those sections with the relevant information. There are a few important things to consider. Did you

answer your research question (if not change it)?
analyse your results statistically (does your result vary)?
think of threats to validity (otherwise reduce your scope)?
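
One way to check whether a result varies is to repeat each measurement and report the mean together with a measure of spread. The following is a minimal Python sketch of that idea, not a prescribed method for this course. The function name summarize and all timing values are hypothetical and made up for illustration.

```python
import math
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation, standard error) of the samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate variation")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation (divides by n - 1)
    stderr = stdev / math.sqrt(len(samples))  # standard error of the mean
    return mean, stdev, stderr


# Hypothetical running times in seconds (made-up numbers, not real results).
static_times = [10.0, 12.0, 11.0, 13.0, 14.0]
dynamic_times = [5.0, 6.0, 5.5, 6.5, 7.0]

for name, times in [("static", static_times), ("dynamic", dynamic_times)]:
    mean, stdev, stderr = summarize(times)
    print(f"{name}: mean={mean:.2f}s stdev={stdev:.2f}s stderr={stderr:.2f}s")
```

If the spread is large compared to the difference between the two means, the difference might just be noise. A proper significance test goes beyond this sketch.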

Always include a section about the possible problems of your experiment, called threats to validity. There are two kinds: internal and external validity. Internal validity concerns measurement errors in your experiment that might have caused your analysis to be wrong. External validity concerns how the scope of your experiment limits the inferences you can make.

Internal: We tried to keep our system under test as quiet as possible, but laptops are inherently noisy. We have corrected for this by taking multiple measurements over time.
External: Our experiment was run on a small set of Java benchmarks. While we tried to choose a representative set, our results might not generalize to other languages or specific use cases of Java.
External: The implementations were both done by us, and while we have done our best to implement both analyses efficiently and fairly, better implementations might exist that invalidate our results.

2.5. Related Work

In the related work, you describe how your work is different from the prior work. Do not just list related work, but spend the time explaining why you feel like this related work does not solve your problem.

You can compare with academic papers (use Google Scholar to do searches), known tools, or just techniques described during the course.

Don't. There exist other techniques that also solve our problem, like fuzz testing and symbolic execution.

Do. We could also have used fuzz testing to solve our problem. However, it has two major drawbacks compared to our technique. Firstly, it requires a ... The same limitations apply to symbolic execution.

2.6. Conclusion

A recap of your project.

3. Get writing!

Writing is Thinking!

Next time, have your introduction and abstract ready for your feedback group to give feedback on.

It's okay to make up results, like performance numbers, for this draft. Try to choose the ones you think are realistic, so that you can see if your feedback group finds the numbers interesting.