Semantics

Let's give meaning to it all.

Christian Gram Kalhauge

Table of Contents

  1. Preliminaries: Natural Deduction
  2. What are Semantics?
  3. Java Bytecode
  4. What are the Semantics of the JVM?

Questions

  1. Is analysing bytecode a semantic analysis?

  2. Name some ways can you describe the semantics of a program?

  3. What does it mean to interpret a piece of code?

Preliminaries: Natural Deduction (ยง1)

Gentzen-style proofs

๐‘๐‘Ÿ๐‘’๐‘š๐‘–๐‘ 1โ€ฆ๐‘๐‘Ÿ๐‘’๐‘š๐‘–๐‘ n๐‘๐‘œ๐‘›๐‘๐‘™๐‘ข๐‘ ๐‘–๐‘œ๐‘›(name)
ABAโˆงB(โˆง)AAโˆจB(โˆจL)BAโˆจB(โˆจR)

What are Semantics? (ยง2)

  1. Axiomatic Semantics
  2. Denotational Semantics
  3. Operational Semantics
  4. Transition System and Traces

It's Meaning

Axiomatic Semantics (ยง2.1)

Meaning by specification

The flowchart from the original paper on semantics Assigning Meaning to Programs by Robert W. Floyd. The program computes the sum of an array.

Hoare Triplet

{P}C{Q}

Example: Composition

{P1}C1{Q1}{P2}C2{Q2}Q1โ‡’P2{P1}C1;C2{Q2}

Denotational Semantics (ยง2.2)

Meaning by mapping

A simple expression language

eโˆˆ๐”ผ๐•:=e1+e2|x|n

Mapped to Math

โ„ฐ:๐”ผ๐•โ†’(ฮฃโ†’โ„•)
โ„ฐโŸฆ๐š—โŸงฯƒ=toNat(โŸฆ๐š—โŸง)โ„ฐโŸฆ๐šกโŸงฯƒ=lookup(โŸฆ๐šกโŸง,ฯƒ)โ„ฐโŸฆ๐šŽ๐Ÿท+๐šŽ๐ŸธโŸงฯƒ=โ„ฐโŸฆ๐šŽ๐ŸทโŸงฯƒ+โ„ฐโŸฆ๐šŽ๐ŸธโŸงฯƒ

Example: x + 5

โ„ฐโŸฆ๐šก+๐ŸปโŸงฯƒ=โ„ฐโŸฆ๐šกโŸงฯƒ+โ„ฐโŸฆ๐ŸปโŸงฯƒ=lookup(โŸฆ๐šกโŸง,ฯƒ)+โ„ฐโŸฆ๐ŸปโŸงฯƒ=3+โ„ฐโŸฆ๐ŸปโŸงฯƒ=3+toNat(โŸฆ๐ŸปโŸง)=3+5=8

Operational Semantics (ยง2.3)

Meaning by execution

Natural or Big-Step

ฯˆโŠขฯƒโ†“v

Structural (SOS) or Small-Step

ฯˆโŠขฯƒโ†’ฯƒโ€พ

Beta Reduction in NOS

ฯˆโŠขe1โ†“ฮปx.e3ฯˆโŠขe2โ†“v2ฯˆ[xโ†ฆv2]โŠขe3โ†“vฯˆโŠขe1e2โ†“v(โ†“ฮฒ)

Beta Reduction in SOS

ฯˆโŠขe1โ†’eโ€พ1ฯˆโŠขe1e2โ†’eโ€พ1e2(ฮฒ1)ฯˆโŠขe2โ†’eโ€พ2ฯˆโŠข(ฯˆ1|ฮปx.e1)e2โ†’(ฯˆ1|ฮปx.e1)eโ€พ2(ฮฒ2)
ฯˆโŠข(ฯˆ1|ฮปx.e1)vโ†’(ฯˆ1[xโ†ฆv]|e1)(ฮฒ3)

But there is more...

ฯˆโ‹…ฯˆ1โŠขe1โ†’eโ€พ1ฯˆโŠข(ฯˆ1|e1)โ†’(ฯˆ1|e1โ€พ)(ฯˆ1)ฯˆโ‹…ฯˆ1โŠขe1โ†’vฯˆโŠข(ฯˆ1|e1)โ†’v(ฯˆ2)

SOS is more expressive than NOS

ฯˆโŠขฯƒโ†’ฯƒโ€พฯˆโŠขฯƒโ€พโ†“vฯˆโŠขฯƒโ†“v(step)ฯˆโŠขฯƒโ†’vฯˆโŠขฯƒโ†“v(done)

Transition System and Traces (ยง2.4)

A Transition System

A Transition system is a triplet โŸจ๐’๐ญ๐š๐ญ๐žP,ฮดP,IPโŸฉ where ๐’๐ญ๐š๐ญ๐žP is the set of program states, ฮดP is the transition relation (defined by the single step semantics) and IP are possible initial states.

๐“๐ซ๐š๐œ๐žP=๐’๐ญ๐š๐ญ๐žPโ‹†
Sem:๐๐ซ๐จ๐ ๐ซ๐š๐ฆโ†’2๐“๐ซ๐š๐œ๐žSem(P)={ฯ„โˆˆ๐’๐ญ๐š๐ญ๐žPnย |ย nโˆˆ[1,โˆž],ฯ„0โˆˆIP,โˆ€iโˆˆ[1,nโˆ’1],ฮดP(ฯ„iโˆ’1,ฯ„i)}

Example: The language of halt

โ„’halt={Pย |ย Pโˆˆโ„’,โˆ€ฯ„โˆˆSem(P).|ฯ„|โ‰ โˆž}

Java Bytecode (ยง3)

$ javap -cp target/classes -c jpamb.cases.Simple

.methods[] | select(.name=="assertFalse") | .code.bytecode

[ 
{ "field": { "class": "jpamb/cases/Simple", "name": "$assertionsDisabled", "type": "boolean" }, "offset": 0, "opr": "get", "static": true },
{ "condition": "ne", "offset": 3, "opr": "ifz", "target": 6 },
{ "class": "java/lang/AssertionError", "offset": 6, "opr": "new" },
{ "offset": 9, "opr": "dup", "words": 1 },
{ "access": "special", "method": { "args": [], "is_interface": false, "name": "<init>", "ref": { "kind": "class", "name": "java/lang/AssertionError" }, "returns": null }, "offset": 10, "opr": "invoke" },
{ "offset": 13, "opr": "throw" },
{ "offset": 14, "opr": "return", "type": null }
]
in#oprstackdescription
00get[]Get the assertionsDisabled boolean and put it on the stack
01ifz[bool]if it is not equal to zero (false) jump to the 6th instruction (i.e. return)
02new[]otherwise create a new AssertionError object.
03dup[ref]dublicate the reference
04invoke[ref, ref]call the init method on the AssertionErrror (consuming the top ref erence).
05throw[ref]throw the assertion error.
06return[]otherwise return.
$ uv run jpamb inspect "jpamb.cases.Simple.assertBoolean:(Z)V"

--format=...

Building a Bytecode Syntactic Analysis (ยง3.1)

Look at the solutions/bytecoder.py

What are the Semantics of the JVM? (ยง4)

Follow along in the solutions/interpreter.py

def step(s : State) -> State | str:
    ...

The Context and the Program Counter (ยง4.1)

๐š‹๐šŒ[ฮน]

ฮน=โŸจฮนm,ฮนoโŸฉฮน+n=โŸจฮนm,ฮนo+nโŸฉn/ฮน=โŸจฮนm,nโŸฉ
@dataclass
class PC:
    method: jvm.AbsMethodID
    offset: int

    def __iadd__(self, delta):
        self.offset += delta
        return self

    def __add__(self, delta):
        return PC(self.method, self.offset + delta)

    def __str__(self):
        return f"{self.method}:{self.offset}"


@dataclass
class Bytecode:
    suite: jpamb.Suite
    methods: dict[jvm.AbsMethodID, list[jvm.Opcode]]

    def __getitem__(self, pc: PC) -> jvm.Opcode:
        try:
            opcodes = self.methods[pc.method]
        except KeyError:
            opcodes = list(self.suite.method_opcodes(pc.method))
            self.methods[pc.method] = opcodes

        return opcodes[pc.offset]

The Values, Operator Stack, and Locals. (ยง4.2)

๐•ฯƒ:=(๐š’๐š—๐šn)|(๐š๐š•๐š˜๐šŠ๐šf)|(๐š›๐šŽ๐šr)๐•ฮท:=๐•ฯƒ|(๐š‹๐šข๐š๐šŽb)|(๐šŒ๐š‘๐šŠ๐š›c)|(๐šœ๐š‘๐š˜๐š›๐šs)|(๐šŠ๐š›๐š›๐šŠ๐šขta)|(๐š˜๐š‹๐š“๐šŽ๐šŒ๐šcnfs)

The Operator Stack

  • ฯƒโˆˆ๐•ฯƒ*

  • ฯƒ=ฯต(๐š’๐š—๐š1)(๐š’๐š—๐š2)(๐š’๐š—๐š3)

The Locals

  • ฮปโˆˆโ„•โ†’๐•ฯƒ

  • ฮป[10]

A Frame

โŸจฮป,ฯƒ,ฮนโŸฉ
@dataclass
class Frame:
    locals: dict[int, jvm.Value]
    stack: Stack[jvm.Value]
    pc: PC

    def __str__(self):
        locals = ", ".join(f"{k}:{v}" for k, v in self.locals.items())
        return f"<{{{locals}}}, {self.stack}, {self.pc}>"

The Stepping Function (ยง4.3)

๐š‹๐šŒโŠขโŸจฮป,ฯƒ,ฮนโŸฉโ†’โŸจฮปโ€พ,ฯƒโ€พ,ฮนโ€พโŸฉ
def step(state: State) -> State | str:
    frame = state.frames.peek() # Get the current frame
    match bc[frame.pc]:
      case ...

Implementing (๐š™๐šž๐šœ๐š‘:๐™ธ v)

๐š‹๐šŒ[ฮน]=(๐š™๐šž๐šœ๐š‘:๐™ธ v)๐š‹๐šŒโŠขโŸจฮป,ฯƒ,ฮนโŸฉโ†’โŸจฮป,ฯƒ(๐š’๐š—๐šv),ฮน+1โŸฉ(pushI)
case jvm.Push(value=v):
    frame.stack.push(v)
    frame.pc += 1
    return state

Implementing (๐š•๐š˜๐šŠ๐š:๐™ธ n)

๐š‹๐šŒ[ฮน]=(๐š•๐š˜๐šŠ๐š:๐™ธ n)(๐š’๐š—๐šv)=ฮป[n]๐š‹๐šŒโŠขโŸจฮป,ฯƒ,ฮนโŸฉโ†’โŸจฮป,ฯƒ(๐š’๐š—๐šv),ฮน+1โŸฉ(loadI)
case jvm.Load(type=jvm.Int(), index=n):
    v = frame.locals[n]
    assert v.type is jvm.Int()
    frame.stack.push(v)
    frame.pc += 1
    return state

The Call Stack, The Heap, and Terminating the Program (ยง4.4)

ฮผโˆผโ€ฆโŸจฮป2,ฯƒ2,ฮน2โŸฉโŸจฮป1,ฯƒ1,ฮน1โŸฉ
ฮทโˆˆโ„•โ†’๐•ฮท
โŸจฮท,ฮผโŸฉโˆˆ๐’๐ญ๐š๐ญ๐ž
@dataclass
class State:
    heap: dict[int, jvm.Value]
    frames: Stack[Frame]

Do you even lift?

๐š‹๐šŒโŠขโŸจฮท,ฮผโŸฉโ†’โŸจฮทโ€พ,ฮผโ€พโŸฉ
๐š‹๐šŒโŠขโŸจฮป,ฯƒ,ฮนโŸฉโ†’โŸจฮปโ€พ,ฯƒโ€พ,ฮนโ€พโŸฉ๐š‹๐šŒโŠขฮผโŸจฮป,ฯƒ,ฮนโŸฉโ†’ฮผโŸจฮปโ€พ,ฯƒโ€พ,ฮนโ€พโŸฉ(liftฮผ)
๐š‹๐šŒโŠขฮผโ†’ฮผโ€พ๐š‹๐šŒโŠขโŸจฮท,ฮผโŸฉโ†’โŸจฮท,ฮผโ€พโŸฉ(liftฮท)

We terminate with...

ok or err(โ€˜๐š›๐šŽ๐šŠ๐šœ๐š˜๐š—โ€™)

Terminating with ok

๐š‹๐šŒ[ฮน]=(๐š›๐šŽ๐š๐šž๐š›๐š—:๐™ธ)๐š‹๐šŒโŠขโŸจฮท,ฯตโŸจฮป,ฯƒ(๐š’๐š—๐šv),ฮนโŸฉโŸฉโ†’ok(returnฯต)
๐š‹๐šŒ[ฮน]=(๐š›๐šŽ๐š๐šž๐š›๐š—:๐™ธ)ฮผ1=โŸจฮป2,ฯƒ2,ฮน2โŸฉฮผ2=โŸจฮป,ฯƒ(๐š’๐š—๐šv),ฮนโŸฉ๐š‹๐šŒโŠขโŸจฮท,ฮผฮผ1ฮผ2โŸฉโ†’โŸจฮท,ฮผโŸจฮป2,ฯƒ2(๐š’๐š—๐šv),ฮน2+1โŸฉโŸฉ(returnฮผ)

In Python

case jvm.Return(type=jvm.Int()):
    v1 = frame.stack.pop()
    state.frames.pop()
    if state.frames:
        frame = state.frames.peek()
        frame.stack.push(v1)
        frame.pc += 1
        return state
    else:
        return "ok"

Terminating with err(โ€˜..โ€™)

๐š‹๐šŒ[ฮน]=(๐š‹๐š’๐š—๐šŠ๐š›๐šข:๐™ธ ๐š๐š’๐šŸ)v2=0๐š‹๐šŒโŠขโŸจฯƒ(๐š’๐š—๐šv1)(๐š’๐š—๐šv2),ฮนโŸฉโ†’err(โ€˜๐š๐š’๐šŸ๐š’๐š๐šŽ ๐š‹๐šข ๐šฃ๐šŽ๐š›๐š˜โ€™)(bdivI0)
๐š‹๐šŒ[ฮน]=(๐š‹๐š’๐š—๐šŠ๐š›๐šข:๐™ธ ๐š๐š’๐šŸ)v2โ‰ 0v3=v1/๐š’๐Ÿน๐Ÿธv2๐š‹๐šŒโŠขโŸจฯƒ(๐š’๐š—๐šv1)(๐š’๐š—๐šv2),ฮนโŸฉโ†’โŸจฯƒ(๐š’๐š—๐šv3),ฮน+1โŸฉ(bdivI1)

In Python

case jvm.Binary(type=jvm.Int(), operant=jvm.BinaryOpr.Div):
    v2, v1 = frame.stack.pop(), frame.stack.pop()
    assert v1.type is jvm.Int(), f"expected int, but got {v1}"
    assert v2.type is jvm.Int(), f"expected int, but got {v2}"
    if v2 == 0:
        return "divide by zero"

    frame.stack.push(jvm.Value.int(v1.value // v2.value))
    frame.pc += 1
    return state

What about the rest? (ยง4.5)

It's now up to you!

How can I figure out what everything means?

  1. jvm2json/CODEC.txt at kalhauge/jvm2json: the decompiled codec (search for <ByteCodeInst>).

  2. List of Java bytecode instructions - Wikipedia.

  3. Chapter 4. The class File Format: the class file format. And, finally

  4. Chapter 6. The Java Virtual Machine Instruction Set: the official specification of each instruction.

Write an interpreter

$ uv run interpreter.py "jpamb.cases.Simple.assertInteger:(I)V" "(1)"
... alot  ...
... of intermediate  ...
... results ...
ok

Write an interpreter

$ uv run jpamb interpret -W --filter Simple interpreter.py
  1. Start small, one method at a time. You don't have to cover the entire language. Also you don't need a method stack or a heap for most of the cases โ€” do them later.

  2. It's okay to hack some things, like getting the $assertionsDisabled static field. You can assume that is always be false.

  3. Print out the state to stderr at every step, this will help you debug.

  4. Look at solutions/interpreter.py for inspiration, or extend it.

  5. Use the --stepwise flag to run the last failed case next time.

Questions

  1. Is analysing bytecode a semantic analysis?

  2. Name some ways can you describe the semantics of a program?

  3. What does it mean to interpret a piece of code?