Let's give meaning to it all.
Christian Gram Kalhauge
Is analysing bytecode a semantic analysis?
Name some ways can you describe the semantics of a program?
What does it mean to interpret a piece of code?
Gentzen-style proofs
It's Meaning
Meaning by specification
The flowchart from the original paper on semantics Assigning Meaning to Programs
by Robert W. Floyd. The program computes the sum of an array.
Hoare Triplet
Example: Composition
Meaning by mapping
x + 5
Meaning by execution
But there is more...
A Transition system is a triplet where is the set of program states, is the transition relation (defined by the single step semantics) and are possible initial states.
$ javap -cp target/classes -c jpamb.cases.Simple
.methods[] | select(.name=="assertFalse") | .code.bytecode
[
{ "field": { "class": "jpamb/cases/Simple", "name": "$assertionsDisabled", "type": "boolean" }, "offset": 0, "opr": "get", "static": true },
{ "condition": "ne", "offset": 3, "opr": "ifz", "target": 6 },
{ "class": "java/lang/AssertionError", "offset": 6, "opr": "new" },
{ "offset": 9, "opr": "dup", "words": 1 },
{ "access": "special", "method": { "args": [], "is_interface": false, "name": "<init>", "ref": { "kind": "class", "name": "java/lang/AssertionError" }, "returns": null }, "offset": 10, "opr": "invoke" },
{ "offset": 13, "opr": "throw" },
{ "offset": 14, "opr": "return", "type": null }
]
in# | opr | stack | description |
---|---|---|---|
00 | get | [] | Get the assertionsDisabled boolean and put it on the stack |
01 | ifz | [bool] | if it is not equal to zero (false) jump to the 6th instruction (i.e. return) |
02 | new | [] | otherwise create a new AssertionError object. |
03 | dup | [ref] | dublicate the reference |
04 | invoke | [ref, ref] | call the init method on the AssertionErrror (consuming the top ref erence). |
05 | throw | [ref] | throw the assertion error. |
06 | return | [] | otherwise return. |
$ uv run jpamb inspect "jpamb.cases.Simple.assertBoolean:(Z)V"
--format=...
Look at the solutions/bytecoder.py
Follow along in the solutions/interpreter.py
def step(s : State) -> State | str:
...
@dataclass
class PC:
method: jvm.AbsMethodID
offset: int
def __iadd__(self, delta):
self.offset += delta
return self
def __add__(self, delta):
return PC(self.method, self.offset + delta)
def __str__(self):
return f"{self.method}:{self.offset}"
@dataclass
class Bytecode:
suite: jpamb.Suite
methods: dict[jvm.AbsMethodID, list[jvm.Opcode]]
def __getitem__(self, pc: PC) -> jvm.Opcode:
try:
opcodes = self.methods[pc.method]
except KeyError:
opcodes = list(self.suite.method_opcodes(pc.method))
self.methods[pc.method] = opcodes
return opcodes[pc.offset]
@dataclass
class Frame:
locals: dict[int, jvm.Value]
stack: Stack[jvm.Value]
pc: PC
def __str__(self):
locals = ", ".join(f"{k}:{v}" for k, v in self.locals.items())
return f"<{{{locals}}}, {self.stack}, {self.pc}>"
def step(state: State) -> State | str:
frame = state.frames.peek() # Get the current frame
match bc[frame.pc]:
case ...
case jvm.Push(value=v):
frame.stack.push(v)
frame.pc += 1
return state
case jvm.Load(type=jvm.Int(), index=n):
v = frame.locals[n]
assert v.type is jvm.Int()
frame.stack.push(v)
frame.pc += 1
return state
@dataclass
class State:
heap: dict[int, jvm.Value]
frames: Stack[Frame]
or
In Python
case jvm.Return(type=jvm.Int()):
v1 = frame.stack.pop()
state.frames.pop()
if state.frames:
frame = state.frames.peek()
frame.stack.push(v1)
frame.pc += 1
return state
else:
return "ok"
In Python
case jvm.Binary(type=jvm.Int(), operant=jvm.BinaryOpr.Div):
v2, v1 = frame.stack.pop(), frame.stack.pop()
assert v1.type is jvm.Int(), f"expected int, but got {v1}"
assert v2.type is jvm.Int(), f"expected int, but got {v2}"
if v2 == 0:
return "divide by zero"
frame.stack.push(jvm.Value.int(v1.value // v2.value))
frame.pc += 1
return state
It's now up to you!
jvm2json/CODEC.txt at kalhauge/jvm2json: the decompiled codec (search for <ByteCodeInst>
).
Chapter 4. The class File Format: the class file format. And, finally
Chapter 6. The Java Virtual Machine Instruction Set: the official specification of each instruction.
Write an interpreter
$ uv run interpreter.py "jpamb.cases.Simple.assertInteger:(I)V" "(1)"
... alot ...
... of intermediate ...
... results ...
ok
Write an interpreter
$ uv run jpamb interpret -W --filter Simple interpreter.py
Start small, one method at a time. You don't have to cover the entire language. Also you don't need a method stack or a heap for most of the cases โ do them later.
It's okay to hack some things, like getting
the $assertionsDisabled
static field. You can assume that is always be false
.
Print out the state to stderr at every step, this will help you debug.
Look at solutions/interpreter.py
for inspiration, or extend it.
Use the --stepwise
flag to run the last failed case next time.
Is analysing bytecode a semantic analysis?
Name some ways can you describe the semantics of a program?
What does it mean to interpret a piece of code?