Module panopticon_core::il

Panopticon uses a language called RREIL to model mnemonic semantics.

Conventional disassemblers translate machine code from its binary representation into a list of mnemonics similar to the format assemblers accept. The only knowledge the disassembler has of the opcode is its textual form (for example mov) and the number and type (constant vs. register) of its operands. This information is purely "syntactic"; it is only about shape. Advanced disassemblers like distorm or IDA Pro add limited semantic information to a mnemonic, such as whether it is a jump or how executing it affects the stack pointer. This ultimately limits the scope and accuracy of the analysis a disassembler can do.

Reverse engineering is about understanding code. Most of the time the analyst interprets assembler instructions by "executing" them in his or her head. Good reverse engineers can do this faster and more accurately than others. To help human analysts with this laborious task, the disassembler needs to understand the semantics of each mnemonic.

Panopticon uses a simple and well-defined programming language (called RREIL) to model the semantics of mnemonics in a machine-readable manner. This intermediate language is emitted by the disassembler part of Panopticon and used by all analysis algorithms. This way the analysis implementation is decoupled from the details of the instruction set.

Basic structure

A RREIL program modeling the AVR "adc rd, rr" instruction looks like this:

zext carry:8, C:1
add res:8, rd:8, rr:8
add res:8, res:8, carry:8

// zero flag
cmpeq Z:1, res:8, 0:8

mov rd:8, res:8

Each RREIL program is a sequence of instructions. The first argument of each instruction is assigned its result. The remaining arguments are only read. Arguments can be constants, variables, or a special undefined value ?. Except for the undefined value, all arguments are integers with a fixed size.
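To make the listing above concrete, the following minimal sketch evaluates the adc statements one by one on concrete 8-bit inputs. It is plain Rust and does not use Panopticon; the names AdcResult and eval_adc are hypothetical, and the sketch assumes that fixed-size addition wraps around on overflow.

```rust
/// Outcome of evaluating the `adc` listing on concrete inputs.
/// Hypothetical type for illustration, not part of Panopticon.
#[derive(Debug, PartialEq)]
pub struct AdcResult {
    pub rd: u8,  // value of rd:8 after the final mov
    pub z: bool, // value of Z:1 after cmpeq
}

/// Evaluates the RREIL listing for `adc rd, rr` statement by statement.
pub fn eval_adc(rd: u8, rr: u8, c: bool) -> AdcResult {
    // zext carry:8, C:1
    let carry: u8 = c as u8;
    // add res:8, rd:8, rr:8   (assumption: 8-bit results wrap modulo 256)
    let res = rd.wrapping_add(rr);
    // add res:8, res:8, carry:8
    let res = res.wrapping_add(carry);
    // cmpeq Z:1, res:8, 0:8
    let z = res == 0;
    // mov rd:8, res:8
    AdcResult { rd: res, z }
}
```

For example, eval_adc(1, 2, true) yields rd = 4 with Z cleared, while eval_adc(255, 0, true) wraps to rd = 0 and sets Z.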

Memory in RREIL programs is modeled as an array of memory cells. These are accessed by the load and store instructions.
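The array-of-cells view can be sketched in a few lines of plain Rust. This is an illustration only: Memory, ByteOrder, store16 and load16 are hypothetical names, the 8-bit cell width is an assumption, and nothing here mirrors Panopticon's actual load and store implementation.

```rust
/// Byte order of a multi-cell access (hypothetical stand-in).
#[derive(Clone, Copy)]
pub enum ByteOrder {
    Little,
    Big,
}

/// Memory modeled as a flat array of cells (assumption: 8 bits per cell).
pub struct Memory {
    pub cells: Vec<u8>,
}

impl Memory {
    /// Creates a zero-filled memory with `size` cells.
    pub fn new(size: usize) -> Memory {
        Memory { cells: vec![0; size] }
    }

    /// Stores a 16-bit value into the two cells starting at `addr`.
    /// Panics if the access is out of bounds.
    pub fn store16(&mut self, addr: usize, value: u16, order: ByteOrder) {
        let bytes = match order {
            ByteOrder::Little => value.to_le_bytes(),
            ByteOrder::Big => value.to_be_bytes(),
        };
        self.cells[addr] = bytes[0];
        self.cells[addr + 1] = bytes[1];
    }

    /// Loads a 16-bit value from the two cells starting at `addr`.
    /// Panics if the access is out of bounds.
    pub fn load16(&self, addr: usize, order: ByteOrder) -> u16 {
        let bytes = [self.cells[addr], self.cells[addr + 1]];
        match order {
            ByteOrder::Little => u16::from_le_bytes(bytes),
            ByteOrder::Big => u16::from_be_bytes(bytes),
        }
    }
}
```

Storing 0x1234 in little-endian order places 0x34 in the first cell and 0x12 in the second; reading the same two cells in big-endian order gives 0x3412.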

Control Flow

The RREIL programs produced by the disassemblers are sequences of instructions. No jumps or optional instructions are allowed inside a mnemonic. After each mnemonic an unlimited number of jumps is allowed. Each jump is associated with a guard, which is a one-bit variable or constant. If the guard is 1, the jump is taken.
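The guard rule can be illustrated with a small sketch in plain Rust. The Jump type and the taken_targets function are hypothetical and only model the rule stated above (a jump is taken when its one-bit guard is 1); they are not Panopticon's Guard type.

```rust
/// A jump leaving a mnemonic, with its one-bit guard already evaluated.
/// Hypothetical type for illustration.
pub struct Jump {
    pub guard: bool, // true stands for a guard value of 1
    pub target: u64, // address the jump leads to
}

/// Returns the targets of all jumps whose guard is 1.
pub fn taken_targets(jumps: &[Jump]) -> Vec<u64> {
    jumps
        .iter()
        .filter(|j| j.guard)
        .map(|j| j.target)
        .collect()
}
```

A conditional branch could then be modeled as two jumps after the mnemonic, one guarded by a flag and one by its negation, so that exactly one of them is taken for any concrete flag value.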

The RREIL implemented in Panopticon has a call instruction. This instruction has a single argument that specifies the address where a new function begins. No "return" instruction exists. A function terminates when a sequence with no outgoing jumps is reached.

Generating Code

Internally, RREIL code is a Vec<_> of Statement instances, while the arguments are either Lvalue (writeable) or Rvalue (read only). To make generating RREIL easier, one can use the rreil! macro, which translates slightly modified RREIL code into a Result<Vec<Statement>> instance.

The rreil! macro expects constants to be delimited by brackets ([ and ]). Rust values can be embedded into RREIL code by enclosing them in parentheses.

The following code generates RREIL code that implements the first part of the AVR adc R0, R1 instruction.

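#[macro_use] extern crate panopticon_core;
use panopticon_core::il::{Lvalue, Rvalue};

// try! requires an enclosing function that returns a Result.
// Rust values for the two operands; (rd) and (rr) embed them in the macro input.
let rd = Lvalue::Variable{ name: "R0".into(), size: 8, subscript: None };
let rr = Rvalue::Variable{ name: "R1".into(), size: 8, subscript: None, offset: 0 };
let stmts = try!(rreil!{
    zext/8 carry:8, C:1;
    add res:8, (rd), (rr);
    add res:8, res:8, carry:8;

    // zero flag
    cmpeq Z:1, res:8, [0]:8;
});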

Structs

Statement

A single RREIL statement.

Enums

Endianess

Endianness of a memory operation.

Guard

Branch condition.

Lvalue

A writeable RREIL value.

Operation

A RREIL operation.

Rvalue

A readable RREIL value.

Functions

execute

Executes a RREIL operation returning the result.

lift

Maps the function m over all operands of op.
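The idea of mapping a function over all operands of an operation can be sketched with a toy operation type. ToyOp and toy_lift are hypothetical and deliberately named differently from Panopticon's Operation and lift, whose exact signatures are not shown here; the sketch only illustrates the pattern.

```rust
/// A toy operation, generic over its operand type.
/// Hypothetical stand-in, much smaller than a real instruction set.
#[derive(Debug, PartialEq)]
pub enum ToyOp<V> {
    Add(V, V),
    Move(V),
}

/// Applies `m` to every operand of `op` and rebuilds the same kind of operation.
pub fn toy_lift<A, B, F: Fn(&A) -> B>(op: &ToyOp<A>, m: F) -> ToyOp<B> {
    match op {
        ToyOp::Add(a, b) => ToyOp::Add(m(a), m(b)),
        ToyOp::Move(a) => ToyOp::Move(m(a)),
    }
}
```

Such a mapping lets an analysis replace each operand (for instance, a variable name) with a different representation (for instance, its currently known value) without touching the operation itself.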