A poor mans analysis
Christian Gram Kalhauge
What are the limitations of a syntactic analysis?
When is syntactic analysis a good tool?
Trick Question: Is Java a regular language, context free or recursive enumerable?
Pattern Matching on the structure of the program.
The levels of syntactic matches can range from matching the bits, to matching the final compiled bytecode.
Preventing AND and OR
fn (int a, b, c) {
return a && b || c; # Might be a mistake
}
fn (int a, b, c) {
return (a && b) || c; # Ah, never mind.
}
No bad warnings (false negatives).
Complete is better than Sound.
Program analysis design should aim for a false-positive rate no higher than 15–20%.
— Christakis and Bird
When forced to choose between more bugs or fewer false positives, [developers] typically choose the latter.
— Christakis and Bird
The set of programs matched by a pattern is called a Language.
A language over the alphabet , is a subset of all words from the alphabet .
Sound:
Complete:
A grammar is a quadruple , where is a set of non-terminals (, , ), is the alphabet, is a set of production rules, and is the start symbol.
The language of the grammar is:
Where means applying the rules in until no non-terminal is left.
Consider the grammar , where is defined like this:
It can generate the following word in the language
Some languages actually work like this, think about LaTeX and the C macro system.
Chomsky hierarchy
The Chomsky hierarchy, with production rules.
XKCD, source: https://xkcd.com/1171
The grammar for defining regular expressions.
where is the smallest solution to the equation.
The denotational semantics for regular expressions
Write a Grammar in Backus Naur form.
Parse the unstructured text into an CST.
...
Profit
Tree-Sitter
(§3.2.1)Lambda Calculus
{
rules: {
source_file: $ => $._exp,
_exp: $ => choice($.abs, $.app, $.var, $._parens),
abs: $ => prec.right(seq('λ', $.var, '.', $._exp)),
app: $ => prec.left(seq($._exp, $._exp)),
var: $ => /[a-zA-Z]+/,
_parens: $ => choice("(", $._exp, ")"),
}
}
public class Simple {
public static int assertFalse() {
assert False;
}
public static int divideByZero() {
return 1 / 0;
}
}
Pre- (red), In- (green), and Post-order (blue) traversal of tree. source
def traverse(item: ts.Tree) -> Iterator[ts.Node]:
cursor = item.walk()
godeeper = True
while True:
node = cursor.node
if godeeper:
yield node # Pre or Post
if not cursor.goto_first_child():
godeeper = False
elif cursor.goto_next_sibling():
godeeper = True
elif cursor.goto_parent():
yield node # Pre or Post
godeeper = False
else:
break
Linters (SpotBugs and EsLint)
No good system, yet...
LLMs to the rescue