Otawa  0.10
Semantic Instructions

OTAWA provides an abstraction layer between the actual binary program and static analyses. More...

Classes

class  otawa::sem::inst
 This structure class represents an instruction in the semantics representation of machine instruction. More...
 
class  otawa::sem::Printer
 Printer class for semantic instructions (resolve the generic register value to the their real platform name). More...
 
class  otawa::sem::PathIter
 This iterator allows easily to traverse all execution paths of a block of semantic instructions. More...
 

Functions

cond_t otawa::sem::invert (cond_t cond)
 Invert the given condition. More...
 
int otawa::sem::size (type_t type)
 Get the size of the given type. More...
 

Detailed Description

OTAWA provides an abstraction layer between the actual binary program and static analyses.

This abstraction is composed of generic information about the processed instruction: type of instruction, target for branch instruction, read and writtent registers, and so on.

To perform value or interval analysis, we need to understand the processing of instructions. To maintain this abstraction layer, OTAWA provides with some architecture plugins a way to translate instruction into a list of architecture-independent instructions to let value analyses to be independent of the actual architecture.

Semantics Instruction Set

Designing a language to describe any instruction set is may an impossible task. In the opposite, our semantics language has for goal to allow to perform analyses while retaining only important things for such a task: basic arithmetics and comparison for integer and address and integer computation, minimal flow of control to allow fast analysis.

Basically, a machine instruction I is translated into a block of n semantic instructions [i0, i1, i2, ..., in-1]. To interpret a semantic block, one has to consider a semantic instruction counter, p. At beginning of the interpretation, p = 0. According to the interpreted instructions, p is simply incremented (+1) to perform sequential execution, or changed by adding a positive quantity n to avoid executing some instructions. Notice that n >= 0 induces that no loop can appear in the interpretation of semantic block and hence no fix point is required. The execution stops as soon as p >= n.

A interpretation of a semantic i, |[i]|, can be viewed as a function taking as input (p, s) with s the state of the program and producing a new (p', s'):

|[i]|: (p, s) -> (p', s')

The state s is a an instance of possible states, that is, of functions mapping registers R, temporaries T and memory addresses A to a value in domain V:

S: R U T U A -> V

Set of registers depends on the underlying architecture but is mapped to IN. Temporaries are a subset of negative integers but A currently represents address on 32-bits. For the remaining of the document, s[i] represents the value of V in the state s that matches i, that may be a register, a temporary or an address. s[x / y] represents a new state where value of y is x, that is, s' = s[x / y] = lambda i. if i = y then x else s[i].

The algorithm to interpret a semantic block follows (s0 is the initial state):

p <- 0
s <- s0
WHILE i < n DO
(p, s) <- |[ip]| (p, s)

The following sections describes the semantics of each instruction.

Computation Instructions

Basically, our semantic instructions have three operands and works with registers. The first operand d is the target registers to store the result and next ones represents source operands (a and b). The following arithmetics operations exists:

The following instructions represents unary operations applied on the a register and stores the result on the d register.

Below is given the semantics of each instruction where:

|[add (d, a, b)]| (p, s) = (p + 1, s[s[a] + s[b] / d])
|[sub (d, a, b)]| (p, s) = (p + 1, s[s[a] - s[b] / d])
|[shl (d, a, b)]| (p, s) = (p + 1, s[s[a] << s[b] / d])
|[shr (d, a, b)]| (p, s) = (p + 1, s[s[a] >> s[b] / d])
|[asr (d, a, b)]| (p, s) = (p + 1, s[s[a] >>+ s[b] / d])
|[and (d, a, b)]| (p, s) = (p + 1, s[s[a] & s[b] / d])
|[or (d, a, b)]| (p, s) = (p + 1, s[s[a] | s[b] / d])
|[xor (d, a, b)]| (p, s) = (p + 1, s[s[a] ^ s[b] / d])
|[mul (d, a, b)]| (p, s) = (p + 1, s[s[a] * s[b] / d])
|[mulu(d, a, b)]| (p, s) = (p + 1, s[s[a] *+ s[b] / d])
|[mulh(d, a, b)]| (p, s) = (p + 1, s[(s[a] * s[b]) >> bitlength(d) / d])
|[div (d, a, b)]| (p, s) = (p + 1, s[s[a] / s[b] / d])
|[divu(d, a, b)]| (p, s) = (p + 1, s[s[a] /+ s[b] / d])
|[mod (d, a, b)]| (p, s) = (p + 1, s[s[a] % s[b] / d])
|[modu(d, a, b)]| (p, s) = (p + 1, s[s[a] %+ s[b] / d])
|[neg (d, a) ]| (p, s) = (p + 1, s[-s[a] / d])
|[not (d, a) ]| (p, s) = (p + 1, s[~s[a] / d])

Semantics Instruction Set

There are four set instructions:

Their semantics is described below (i is a 32-bits value):

|[set (d, a)]| (p, s) = (p + 1, s[s[a] / d])
|[seti (d, i)]| (p, s) = (p + 1, s[i / d])
|[scratch(d) ]| (p, s) = (p + 1, s[T / d])
|[setp (d, i)]| (p, s) = (p + 1, s[p :: s[d] / d])

Flow Instructions

In semantic instructions, there are two types of control flow: machine flow corresponds to PC assignment in the underlying instruction set and is supported by CFG; semantic flow that controls the execution flow of semantic instruction. The only instruction concerning the machine control flow is branch that gives the new address stored in the PC but it doesn't change the semantic instruction counter p.

The semantic control flow is handled by two instruction, cont that stops the execution and if that makes following instructions optional. The condition used in if is generated from a comparison result, a ~ b (a compared to b), generated by either a cmp instruction, or instruction. Then the if statement exploits the comparison results to apply one of the following condition:

Below is the semantics of the flow instructions (c is a condition and c[a] returns true if condition c is satisfied in register a):

|[branch(a) ]| (p, s) = (p + 1, s) // PC <- s[a]
|[cmp (d, a, b) ]| (p, s) = (p + 1, s[s[a] ~+ s[b] / d])
|[cmpu (d, a, b) ]| (p, s) = (p + 1, s[s[a] ~ s[b] / d])
|[cont ]| (p, s) = (n, s)
|[if (c, a, k) ]| (p, s) = (if c[a] then p + 1 else p + k, s)

Memory Access Instruction

There are only two instructions used to perform memory access:

Notice that the types are required to realize the bytes modified in memory and to generate correct value in register (including sign extension). The types may be one of:

The semantics of these instructions follows (t is the type, s(t) represents access to state with type t):

|[load (d, a, t)]| (p, s) = (p + 1, s[s(t)[s[a] / d])
|[store(d, a, t)]| (p, s) = (p + 1, s(t)[d / s[b]])

Special Instruction

There is also a semantics instruction kind that must be used by instruction effects not tractable with the current semantics instruction set. One using the SPEC instructions must be aware that standard usual analyses will not cope with such instructions: they will need to be customized. Therefore, the semantics of spec is undefined.

Register and Temporaries

Operators used in the instruction (a, b and d) represents unique representation of registers (as returned by Register::platformNumber()) for positive number or temporaries when negative numbers are used.

Temporary values are useful when the semantics expression of an instruction is complex and requires several temporary results. To alleviate the management of temporaries, they are easy identified as they are represented as negative numbers and their maximum number is provided by the Process::tempMax(). Please, notice that the liveness of a temporary must not expand out of the semantics block of an instruction !

Building Analyses

Static analyses using instruction semantics are usually called data-flow analysis. Using Abstact Interpretation, the interpretation domain is usually an abstraction of the program variables state. The state includes usually the value of the registers (simply identified by the register platform number) and the addresses of used locations in memory. Register numbers and memory location addresses grouped together forms the address set.

So, the state becomes usually maps from addresses to value. OTAWA provides already several representations for these maps.The next step is to define the abstraction of the values: values are stored in registers and in memory and must be specialized according to the performed analysis. For example, the CLP (Cycle Linear Progression) analysis, the values are represented as triplets (b, d, n) representing a set of integers (and adresses) from the set {b + i d / 0 <= i < n}. But, it may be any value you want, adapted to your analysis. Whatever, one must remark that addresses are usually storable in registers and in memory, the value set must provides a way to represent them.

The usual map representations provide already functions to perform abstract interpretation (bottom value, initial value, update, join, etc). In the case of the value, you have also to provide function to perform abstract interpretation but also functions to interpret the different semantic instructions.

To help the developer supports the multiple execution path of a semantic bloc, one can use the class otawa::sem::PathIter that works like a usual iterator but provides also indications of the executed paths and instructions:

A common way to use otawa::sem::PathIter is to maintain a stack of states where the current state is pushed when an if is found and a state is popped when an execution path ends. The different states obtained at each end of an execution path can be joined:

genstruct::Vector<State> stack;
State s = initial_state, result = bottom_state;
stack.push(s);
for(sem::PathIter i(machine_instruction); i; i++) {
if(i.pathEnd()) {
result = join(result, s);
s = stack.pop();
}
else {
if(i.isCond())
stack.push(s);
s = apply(i, s);
}
}
return result;

A specific processing is devoted to SPEC instructions. A convenient analysis must let its user to specialize it in order to support these instructions. To achieve this goal, it must provide in the analysis a virtual function that is called each time the SPEC instruction is interpreted. It would be useful if this function takes as parameter the real instruction, the semantics instruction and the current abstract state. In the initial analysis, this function simply do nothing but it lets customizer to overload it in order to customize the interpretation.

Function Documentation

cond_t otawa::sem::invert ( cond_t  cond)

Invert the given condition.

Parameters
condCondition to invert.
Returns
Inverted condition.

References otawa::sem::ANY_COND, otawa::se::EQ, otawa::se::GE, otawa::se::GT, otawa::se::LE, otawa::se::LT, otawa::sem::MAX_COND, otawa::se::NE, otawa::sem::NO_COND, otawa::se::UGE, otawa::se::UGT, otawa::se::ULE, and otawa::se::ULT.

int otawa::sem::size ( type_t  type)

Get the size of the given type.

Parameters
typeType to get size for.
Returns
Size in bytes.

References otawa::type().