OTAWA

Properties, Features and Instructions

Summary
This lecture presents basics of OTAWA programming: features allows to require already implemented services that are represented by properties hooked to the program representation.Properties may also be used to alleviate implementation of program analyses. In the end, this applied to generate a graph representation of the control flow between instructions.

Low Level Representation of the Program

The analyses on the machine language, including WCET computation, is based on the low-level representation of the program. This representation is built by the loader, the plug-in responsible for a specific ISA ((Instruction Set Architecture)) but is viewed by the analyses as a set of interfaces hiding the details of the actual architecture. The goal is not to isolate the architecture from the analyses but to provide common interfaces allowing the analyses (a) to be performed whatever the architecture and (b) to cope with the specific feature of the architecture in a generic way.

This low-level representation is returned by the loader as an otawa::Process object that represents the image of the program during running time. It is made of the content of the executable file and possible libraries but may contain any kind of data relative to the OS or to the ISA. In addition, the otawa::Process provides several facilities to look to byte in memories, to access meta-information like debugging data and defined symbols, characteristics of the ISA, etc. Yet, the otawa::Process is not returned as is but inside an otawa::WorkSpace that is dedicated to collect the result of the different analyses. For one otawa::Process, several otawa::WorkSpace can exist and contain data of different analyses.

The otawa::Process is made of a collection of otawa::File. They may be listed using dedicated iterators, otawa::Process::FileIter.

In OTAWA, when an object contains a collection of other objects, a dedicated iterator class is provided. This iterator (a) must initialized passing the first object or copying an iterator of the same class, (b) may be passed as a condition of a loop and (c) is incremented with ”++” operator. When used as an argument or in an expression, it is automatically converted in the required type (if it can statically determined). Whatever, the ”*” can be used to get the value of the iterator.

for(MyIterator iter(container); iter; iter++)
    do_something(*iter);

In turn, there is one otawa::File for each executable and library file. They contain a collection of otawa::Symbol and of otawa::Segment. otawa::Symbol represents the symbols, variables, functions, constants, labels, etc defined in the file while the otawa::Segment represents a memory part sharing the same properties, basically, readable, writtable or executable. otawa::Segment as other object representing memory areas are defined by an address() method, a size() method and a topAddress(), address of the first byte after the area. The otawa::Segment is made of a collection of otawa::ProgItem that represent meaningful items of the memory, code or data. A frequently used specialization of otawa::ProgItem is the class otawa::Inst.

An otawa::Inst represents any kind of machine instruction and, therefore, provides a complex interface. Basically, an instruction is defined by its kind that provides information on the nature of the instruction. The kinds are defined by the constants IS_XXX where XXX being one of COND, CONTROL (CALL, RETURN, TRAP, INDIRECT), MEM (LOAD, STORE, MULTI, ATOMIC), INT, FLOAT, ALU, MUL, DIV, SHIFT, INTERN, UNKNOWN, SPECIAL.

The kind can be obtained with kind() method but also with dedicated isXXX() functions. An instruction provides also the following information:

  • read registers, method readRegs(),
  • written registers, method writtenRegs(),
  • branch target (if it is a control instruction), method target(),

And can be disassembled by putting it to an output stream.

The instruction is the initial source of information to analyse the machine code and can be handled in a lot of different ways. They may receive properties dedicated to some type of architectures to support features not provided directly supplied by the class.

There are different ways to get an instruction:

  • by its address from an otawa:Process or an otawa::WorkSpace,
  • from its container otawa::Segment,
  • using the nextInst() or prevInst() method that link instruction together.

Yet, one must keep in mind that the content of the segment is dynamic: at loading time, there is no way to know which part of the segment are instructions and which parts are data. This is found back slightly by following the execution paths or using the symbols provided in the file. To traverse in a consistent way the instructions of a program, the developer must first invoke an instruction decoding analysis with the following command:

WorkSpace *ws = MANAGER.open("...", props);
ws->require(DECODED_TEXT);

Adding Information using Properties and Features

In OTAWA, an important part of the analyses aim at building back the structure, data or control of the program. Each analysis computes information that is used by the other one and that completes the understanding of the application. This means that OTAWA computes information items that are linked to the code and data. To preserve this link, the information items may be linked to the program elements using properties: in C++ terminology, most of OTAWA classes inherit from otawa::PropList and support properties.

Basically, an otawa::PropList is a map of pairs whose first member is an identifier and the second is a value. Identifiers are global objects, of type otawa::Identifier, with a name, the type of values it denotes and a default value. Thanks to the ability of C++ to overload common operators, handling is very easy in OTAWA.

The code snippet below declares an identifier named my_identifier, supporting values of type ot::time (time in cycles) and whose default value is -1. As identifiers are shared between analyses, an identifier is often declared as external in an header file and defined as a global variable in a source file.

Identifier TIME("time", -1);

To store a property in an otawa::PropList, use the syntax below:

identifier ( proplist ) = value ;

For example, the code below attach to each instruction of the vector ”v” a time according to their kind:

elm::genstruct::Vector class is a useful ELM container aiming at storing small collection of objects. The vector can be seen as an array, supporting indexed access, whose can be expanded until 65,535 elements. It is advised to not use vector for too big collection of data because the size extension requires a copy of its content.

genstruct::Vector insts;
for(int i = 0; insts.length(); i++) {
    Inst *inst = insts[i];
    if(inst->isDiv())
        TIME(inst) = 20;
    else if(inst->isMul())
        TIME(inst) = 5;
    else if(inst->isFloat())
        TIME(inst) = 6;
    else
        TIME(inst) = 1;
}

The following syntax allows to get back the value of the property. If the property list does not contain a property matching the identifier, the default value of the identifier is returned.

identifier ( proplist )

In the example below, we are computing the total time of the instructions in the vector:

genstruct::Vector insts;
ot::time total = 0;
for(int i = 0; insts.length(); i++)
total += TIME(insts[i]);

To remove a properties, use the syntax below:

identifier ( proplist ).remove();

We can also store several times the same property with:

identifier ( proplist ).add( value to add );

If we store several properties with the same identifier, we need an iterator to retrieve all these values:

for(Identifier::Getter it( identifier, proplist ); it; it++)

For example, the code below allows to display all label linked to an instruction:

for(Identifier::Getter label(LABEL, inst); label; label++)
cout << *label << io::endl;

Unfortunately, C++ syntax does not allows to avoid to repeat the type of the identifier values. The properties is an easy to use and a flexible way to store analysis information in the workspace. Consequently, WCET computations requires hundreds of identifiers and properties. As the properties are very versatile, using properties as is would be a cumbersome task as the property user has to check every time if the property is available or not. To alleviate this, the properties must be (1) well documented and (2) grouped in so-called feature. A feature, class otawa::Feature, is a contract between the producer of the feature and the user of the feature that asserts that some properties are consistently set. Before performing an analysis, one has to ensure that the required features are already available. This is one with the method require() of otawa::WorkSpace. If the feature is available, nothing is done else the analysis associated with feature is invoked. In turn, this analysis may invoke other required analyses. This is what was done in the previous section: the DECODED_TEXT feature((The use of this feature requires to include otawa/prog/features.h)) ensures that the code has been decoded (by launching, if required, the associated analysis):

For examples of features, this page contains a non-exhaustive list of OTAWA features.

Exercise: generating an instruction flow graph

The goal of the exercise is to produce a graphical output of the control flow of the instructions of a function. The output will be textual using the GraphViz language to represents graphs. The figure below shows the graph in textual .dot, to the left, and its graphical display to the right using xdot.py.

ex1.png
digraph "main" {

node [ shape=box ];

"00001000" [ label = "my_function:\n00001000 add sp, sp, #8" ];
"00001004" [ label = "00001004 cmp r0, #0" ];
"00001008" [ label = "00001008 bgt 00001010" ];
"0000100C" [ label = "0000100C bl 00001500" ];
"00001010" [ label = "00001010 sub sp, sp, #8" ];
"00001014" [ label = "0000101C bx lr" ];

"00001000" -> "00001004";
"00001004" -> "00001008";
"00001008" -> "0000100C";
"00001008" -> "00001010" [ label = "taken" ];
"0000100C" -> "00001010";
"0000100C" -> "sin" [ label = "call" ];
"00001010" -> "00001014";
}

To produce such an output, we have to get the first instruction of the function we want to display (method ”findInstAt”() of ”otawa::Process”) and to follow every execution path ending in return instructions or in ”call” instruction (call to another function). The issue with this kind of algorithm is that the execution paths forms a graph possibly containing loops. To avoid to iterate forever, a simple technique is the use of a Boolean marker on the processed instructions: if the marker is false, the instruction has been processed, else the instruction has already been handled and we do not have to continue past this instruction. To end up, we propose to use properties to represent the marker on the instructions.

The algorithm may split in three actions:

  1. gather instructions by following executions paths of the function,
  2. generate the header of ”.dot”,
  3. generate the nodes for each instruction,
  4. generate the edges for each instructions,
  5. generate the footer of ”.dot”.

Step 1 is maybe the more complex and will be based on a working list, todo, and on the MARK property:

working_list <- insts
WHILE working_list != {} DO
    i <- working_list
    IF not MARK(i) THEN
        insts add outputs of i to working_list
    END IF
END DO

Follow the instructions below:

  1. Write a file display.cpp and a Makefile defining an OTAWA application (as done in the pervious section).
  2. Add the declaration of the MARK identifier.
  3. Implements the algorithm in th e'work method without support of label and function call display.
  4. Produce the executable.
  5. Apply it to the function, icrc1, that does not contain any function call.
  6. Add support for label and function call display.
  7. Apply it to the function, icrc, that contains function call.
  8. Apply it to the function, main.

Labels in OTAWA

otawa::Inst supports three type of labels:

The resulting program will be applied on the functions of {{:doc:train:crc.elf|crc.elf}}. To check the result, you can use the following {{:doc:train:crc.dis|disassembly of crc}}.

How to handle output of instructions?

  • most instruction have only as output the next instruction (like ARM ”add r0, r0, #1”),
  • return control instruction as ARM ”mov pc, lr” have no output (end of function),
  • unconditional branch like ”b ”label have only as output the target of the branch,
  • conditional branch like ”beq ”label have as output the next instruction and the target of the branch.

Used Classes
  • elm::genstruct::Vector
  • otawa::Feature
  • otawa::File
  • otawa::Identifier
  • otawa::Inst
  • otawa::Process
  • otawa::PropList
  • otawa::Segment
  • otawa::WorkSpace

Leave a Reply

Your email address will not be published. Required fields are marked *