OTAWA provides an abstraction layer between the actual binary program and static analyses. More...

Classes
class	otawa::sem::inst
	This structure class represents an instruction in the semantics representation of machine instruction. More...

class	otawa::sem::Printer
	Printer class for semantic instructions (resolve the generic register value to the their real platform name). More...

class	otawa::sem::PathIter
	This iterator allows easily to traverse all execution paths of a block of semantic instructions. More...

Functions
cond_t	otawa::sem::invert (cond_t cond)
	Invert the given condition. More...

int	otawa::sem::size (type_t type)
	Get the size of the given type. More...

Detailed Description

OTAWA provides an abstraction layer between the actual binary program and static analyses.

This abstraction is composed of generic information about the processed instruction: type of instruction, target for branch instruction, read and writtent registers, and so on.

To perform value or interval analysis, we need to understand the processing of instructions. To maintain this abstraction layer, OTAWA provides with some architecture plugins a way to translate instruction into a list of architecture-independent instructions to let value analyses to be independent of the actual architecture.

Semantics Instruction Set

Designing a language to describe any instruction set is may an impossible task. In the opposite, our semantics language has for goal to allow to perform analyses while retaining only important things for such a task: basic arithmetics and comparison for integer and address and integer computation, minimal flow of control to allow fast analysis.

Basically, a machine instruction I is translated into a block of n semantic instructions [i0, i1, i2, ..., in-1]. To interpret a semantic block, one has to consider a semantic instruction counter, p. At beginning of the interpretation, p = 0. According to the interpreted instructions, p is simply incremented (+1) to perform sequential execution, or changed by adding a positive quantity n to avoid executing some instructions. Notice that n >= 0 induces that no loop can appear in the interpretation of semantic block and hence no fix point is required. The execution stops as soon as p >= n.

A interpretation of a semantic i, |[i]|, can be viewed as a function taking as input (p, s) with s the state of the program and producing a new (p', s'):

|[i]|: (p, s) -> (p', s')

The state s is a an instance of possible states, that is, of functions mapping registers R, temporaries T and memory addresses A to a value in domain V:

S: R U T U A -> V

Set of registers depends on the underlying architecture but is mapped to IN. Temporaries are a subset of negative integers but A currently represents address on 32-bits. For the remaining of the document, s[i] represents the value of V in the state s that matches i, that may be a register, a temporary or an address. s[x / y] represents a new state where value of y is x, that is, s' = s[x / y] = lambda i. if i = y then x else s[i].

The algorithm to interpret a semantic block follows (s0 is the initial state):

p <- 0
s <- s0
WHILE i < n DO
   (p, s) <- |[ip]| (p, s)

The following sections describes the semantics of each instruction.

Computation Instructions

Basically, our semantic instructions have three operands and works with registers. The first operand d is the target registers to store the result and next ones represents source operands (a and b). The following arithmetics operations exists:

ADD – addition,
SUB – subtraction,
SHL – logical shift left,
SHR – logical shift right,
ASR – arithmetic shift right,
AND – binary and,
OR – binary inclusive or,
XOR – binary inclusive xor,
MUL – signed multiplication,
MULU – unsigned multiplication,
MULH – higher order bits of multiplication,
DIV – signed division,
DIVU – unsigned division,
MOD – signed integer division remainder,
MODU – unsigned integer division remainder.

The following instructions represents unary operations applied on the a register and stores the result on the d register.

NEG – sign negation,
NOT – binary complement.

Below is given the semantics of each instruction where:

a, b, c represents registers,
i represents 32-bits immediate value,
operators applies on unsigned values except if they are followed by "+".

|[add (d, a, b)]| (p, s) = (p + 1, s[s[a] +   s[b] / d])
|[sub (d, a, b)]| (p, s) = (p + 1, s[s[a] -   s[b] / d])
|[shl (d, a, b)]| (p, s) = (p + 1, s[s[a] <<  s[b] / d])
|[shr (d, a, b)]| (p, s) = (p + 1, s[s[a] >>  s[b] / d])
|[asr (d, a, b)]| (p, s) = (p + 1, s[s[a] >>+ s[b] / d])
|[and (d, a, b)]| (p, s) = (p + 1, s[s[a] &   s[b] / d])
|[or  (d, a, b)]| (p, s) = (p + 1, s[s[a] |   s[b] / d])
|[xor (d, a, b)]| (p, s) = (p + 1, s[s[a] ^   s[b] / d])
|[mul (d, a, b)]| (p, s) = (p + 1, s[s[a] *   s[b] / d])
|[mulu(d, a, b)]| (p, s) = (p + 1, s[s[a] *+  s[b] / d])
|[mulh(d, a, b)]| (p, s) = (p + 1, s[(s[a] *  s[b]) >> bitlength(d) / d])
|[div (d, a, b)]| (p, s) = (p + 1, s[s[a] /   s[b] / d])
|[divu(d, a, b)]| (p, s) = (p + 1, s[s[a] /+  s[b] / d])
|[mod (d, a, b)]| (p, s) = (p + 1, s[s[a] %   s[b] / d])
|[modu(d, a, b)]| (p, s) = (p + 1, s[s[a] %+  s[b] / d])
|[neg (d, a)   ]| (p, s) = (p + 1, s[-s[a]         / d])
|[not (d, a)   ]| (p, s) = (p + 1, s[~s[a]         / d])

Semantics Instruction Set

There are four set instructions:

to transfer content of a register to another one, set,
to get a constant 32-bits value in a register, seti,
to assign undefined T value to a register, scratch,
to set the page of an address for multi-address space architectures, setp (very rarely used).

Their semantics is described below (i is a 32-bits value):

|[set    (d, a)]| (p, s) = (p + 1, s[s[a] / d])
|[seti   (d, i)]| (p, s) = (p + 1, s[i / d])
|[scratch(d)   ]| (p, s) = (p + 1, s[T / d])
|[setp   (d, i)]| (p, s) = (p + 1, s[p :: s[d] / d])

Flow Instructions

In semantic instructions, there are two types of control flow: machine flow corresponds to PC assignment in the underlying instruction set and is supported by CFG; semantic flow that controls the execution flow of semantic instruction. The only instruction concerning the machine control flow is branch that gives the new address stored in the PC but it doesn't change the semantic instruction counter p.

The semantic control flow is handled by two instruction, cont that stops the execution and if that makes following instructions optional. The condition used in if is generated from a comparison result, a ~ b (a compared to b), generated by either a cmp instruction, or instruction. Then the if statement exploits the comparison results to apply one of the following condition:

EQ – a == b
LT – a < b
LE – a <= b
GE – a >= b
GT – a > b
NE – a != b
ANY_COND – do not know anything (recall we are performing static analysis).
ULT – a < b
ULE – a <= b
UGE – a >= b
UGT – a > b

Below is the semantics of the flow instructions (c is a condition and c[a] returns true if condition c is satisfied in register a):

|[branch(a)       ]| (p, s) = (p + 1, s)                // PC <- s[a]
|[cmp   (d, a, b) ]| (p, s) = (p + 1, s[s[a] ~+ s[b] / d])
|[cmpu  (d, a, b) ]| (p, s) = (p + 1, s[s[a] ~  s[b] / d])
|[cont            ]| (p, s) = (n, s)
|[if    (c, a, k) ]| (p, s) = (if c[a] then p + 1 else p + k, s)

Memory Access Instruction

There are only two instructions used to perform memory access:

load allows to get a value from memory using its address and its type,
store allows to store a register value to a memory address using the given type.

Notice that the types are required to realize the bytes modified in memory and to generate correct value in register (including sign extension). The types may be one of:

int8 – signed integer on 8-bits,
int16 – signed integer on 16-bits,
int32 – signed integer on 32-bits,
int64 – signed integer on 64-bits,
uint8 – unsigned integer on 8-bits,
uint16 – unsigned integer on 16-bits,
uint32 – unsigned integer on 32-bits,
uint64 – unsigned integer on 64-bits.

The semantics of these instructions follows (t is the type, s(t) represents access to state with type t):

|[load (d, a, t)]| (p, s) = (p + 1, s[s(t)[s[a] / d])

|[store(d, a, t)]| (p, s) = (p + 1, s(t)[d / s[b]])

Special Instruction

There is also a semantics instruction kind that must be used by instruction effects not tractable with the current semantics instruction set. One using the SPEC instructions must be aware that standard usual analyses will not cope with such instructions: they will need to be customized. Therefore, the semantics of spec is undefined.

Register and Temporaries

Operators used in the instruction (a, b and d) represents unique representation of registers (as returned by Register::platformNumber()) for positive number or temporaries when negative numbers are used.

Temporary values are useful when the semantics expression of an instruction is complex and requires several temporary results. To alleviate the management of temporaries, they are easy identified as they are represented as negative numbers and their maximum number is provided by the Process::tempMax(). Please, notice that the liveness of a temporary must not expand out of the semantics block of an instruction !

Building Analyses

Static analyses using instruction semantics are usually called data-flow analysis. Using Abstact Interpretation, the interpretation domain is usually an abstraction of the program variables state. The state includes usually the value of the registers (simply identified by the register platform number) and the addresses of used locations in memory. Register numbers and memory location addresses grouped together forms the address set.

So, the state becomes usually maps from addresses to value. OTAWA provides already several representations for these maps.The next step is to define the abstraction of the values: values are stored in registers and in memory and must be specialized according to the performed analysis. For example, the CLP (Cycle Linear Progression) analysis, the values are represented as triplets (b, d, n) representing a set of integers (and adresses) from the set {b + i d / 0 <= i < n}. But, it may be any value you want, adapted to your analysis. Whatever, one must remark that addresses are usually storable in registers and in memory, the value set must provides a way to represent them.

The usual map representations provide already functions to perform abstract interpretation (bottom value, initial value, update, join, etc). In the case of the value, you have also to provide function to perform abstract interpretation but also functions to interpret the different semantic instructions.

To help the developer supports the multiple execution path of a semantic bloc, one can use the class otawa::sem::PathIter that works like a usual iterator but provides also indications of the executed paths and instructions:

op, d, a, b, cst, reg, addr – access to instruction description,
pathEnd() – true if execution is at end,
isCond() – true if the current instruction is a if, meaning that two paths will start from this point.

A common way to use otawa::sem::PathIter is to maintain a stack of states where the current state is pushed when an if is found and a state is popped when an execution path ends. The different states obtained at each end of an execution path can be joined:

genstruct::Vector<State> stack;
State s = initial_state, result = bottom_state;
stack.push(s);
for(sem::PathIter i(machine_instruction); i; i++) {
        if(i.pathEnd()) {
                result = join(result, s);
                s = stack.pop();
        }
        else {
                if(i.isCond())
                stack.push(s);
        s = apply(i, s);
        }
}
return result;

A specific processing is devoted to SPEC instructions. A convenient analysis must let its user to specialize it in order to support these instructions. To achieve this goal, it must provide in the analysis a virtual function that is called each time the SPEC instruction is interpreted. It would be useful if this function takes as parameter the real instruction, the semantics instruction and the current abstract state. In the initial analysis, this function simply do nothing but it lets customizer to overload it in order to customize the interpretation.

Function Documentation

cond_t otawa::sem::invert ( cond_t cond )

Invert the given condition.

Parameters

cond	Condition to invert.

Returns: Inverted condition.

References otawa::sem::ANY_COND, otawa::se::EQ, otawa::se::GE, otawa::se::GT, otawa::se::LE, otawa::se::LT, otawa::sem::MAX_COND, otawa::se::NE, otawa::sem::NO_COND, otawa::se::UGE, otawa::se::UGT, otawa::se::ULE, and otawa::se::ULT.

int otawa::sem::size ( type_t type )

Get the size of the given type.

Parameters

type	Type to get size for.

Returns: Size in bytes.

References otawa::type().

Classes

Functions

Detailed Description

Semantics Instruction Set

Computation Instructions

Semantics Instruction Set

Flow Instructions

Memory Access Instruction

Special Instruction

Register and Temporaries

Building Analyses

Function Documentation