OTAWA

Properties, Features and Instructions

Summary

This lecture presents basics of OTAWA programming: features allows to require already implemented services that are represented by properties hooked to the program representation.Properties may also be used to alleviate implementation of program analyses. In the end, this applied to generate a graph representation of the control flow between instructions.

Low Level Representation of the Program

The analyses on the machine language, including WCET computation, is based on the low-level representation of the program. This representation is built by the loader, the plug-in responsible for a specific ISA ((Instruction Set Architecture)) but is viewed by the analyses as a set of interfaces hiding the details of the actual architecture. The goal is not to isolate the architecture from the analyses but to provide common interfaces allowing the analyses (a) to be performed whatever the architecture and (b) to cope with the specific feature of the architecture in a generic way.

This low-level representation is returned by the loader as an otawa::Process object that represents the image of the program during running time. It is made of the content of the executable file and possible libraries but may contain any kind of data relative to the OS or to the ISA. In addition, the ##otawa::Process## provides several facilities to look to byte in memories, to access meta-information like debugging data and defined symbols, characteristics of the ISA, etc. Yet, the ##otawa::Process## is not returned as is but inside an ##otawa::WorkSpace## that is dedicated to collect the result of the different analyses. For one ##otawa::Process##, several ##otawa::WorkSpace## can exist and contain data of different analyses.

The ##otawa::Process## is made of a collection of ##otawa::File##. They may be listed using dedicated iterators, ##otawa::Process::FileIter##.

OTAWA extensively uses the enumeration form of for instruction. Therefore, to visit a collection of items in OTAWA, you have just to write:
for(const auto& x: values)
    use(x);

In turn, there is one ##otawa::File## for each executable and library file. They contain a collection of ##otawa::Symbol## and of ##otawa::Segment##. ##otawa::Symbol## represents the symbols, variables, functions, constants, labels, etc defined in the file while the ##otawa::Segment## represents a memory part sharing the same properties, basically, readable, writtable or executable. ##otawa::Segment## as other object representing memory areas are defined by an ##address()## method, a ##size()## method and a ##topAddress()##, address of the first byte after the area. The ##otawa::Segment## is made of a collection of ##otawa::ProgItem## that represent meaningful items of the memory, code or data. A frequently used specialization of ##otawa::ProgItem## is the class ##otawa::Inst##.

An ##otawa::Inst## represents any kind of machine instruction and, therefore, provides a complex interface. Basically, an instruction is defined by its kind that provides information on the nature of the instruction. The kinds are defined by the constants ##IS_##//XXX// where //XXX// being one of COND, CONTROL (CALL, RETURN, TRAP, INDIRECT), MEM (LOAD, STORE, MULTI, ATOMIC), INT, FLOAT, ALU, MUL, DIV, SHIFT, INTERN, UNKNOWN, SPECIAL.

The kind can be obtained with ##kind()## method but also with dedicated ##is##//XXX//() functions. An instruction provides also the following information:

  • read registers, method ##readRegs##(),
  • written registers, method ##writtenRegs##(),
  • branch target (if it is a control instruction), method ##target##(),

And can be disassembled by putting it to an output stream.

The instruction is the initial source of information to analyse the machine code and can be handled in a lot of different ways. They may receive properties dedicated to some type of architectures to support features not provided directly supplied by the class.

There are different ways to get an instruction:

  • by its address from an ##otawa:Process## or an ##otawa::WorkSpace##,
  • from its container ##otawa::Segment##,
  • using the ##nextInst()## or ##prevInst()## method that link instruction together.


Yet, one must keep in mind that the content of the segment is dynamic: at loading time, there is no way to know which part of the segment are instructions and which parts are data. This is found back slightly by following the execution paths or using the symbols provided in the file. To traverse in a consistent way the instructions of a program, the developer must first invoke an instruction decoding analysis with the following command:

#include <otawa/cfg/features.h>

WorkSpace *ws = MANAGER.open("...", props);
ws->require(COLLECTED_CFG_FEATURE);

Adding Information using Properties and Features

In OTAWA, an important part of the analyses aim at building back the structure, data or control of the program. Each analysis computes information that is used by the other one and that completes the understanding of the application. This means that OTAWA computes information items that are linked to the code and data. To preserve this link, the information items may be linked to the program elements using properties: in C++ terminology, most of OTAWA classes inherit from ##otawa::PropList## and support properties.

Basically, an ##otawa::PropList## is a map of pairs whose first member is an identifier and the second is a value. Identifiers are global objects, of type ##otawa::Identifier##, with a name, the type of values it denotes and a default value. Thanks to the ability of C++ to overload common operators, handling is very easy in OTAWA.

The code snippet below declares an identifier named ##my_identifier##, supporting values of type ##ot::time## (time in cycles) and whose default value is -1. As identifiers are shared between analyses, an identifier is often declared as //external// in an header file and defined as a global variable in a source file.

p::id<int> TIME("time", -1);

To store a property in an ##otawa::PropList##, use the syntax below:

> //identifier// __(__ //proplist// __)__ __=__ //value// __;__

For example, the code below attach to each instruction of the vector ”v” a time according to their kind:

Vector<Inst *> insts;
for(auto inst: insts) {
    if(inst->isDiv())
        TIME(inst) = 20;
    else if(inst->isMul())
        TIME(inst) = 5;
    else if(inst->isFloat())
        TIME(inst) = 6;
    else
        TIME(inst) = 1;
}
elm::Vector class is a useful ELM container aiming at storing small collection of objects. The vector can be seen as an array, supporting indexed access, whose can be expanded until 65,535 elements. It is advised to not use vector for too big collection of data because the size extension requires a copy of its content.

The following syntax allows to get back the value of the property. If the property list does not contain a property matching the identifier, the default value of the identifier is returned.

> //identifier// __(__ //proplist// __)__

In the example below, we are computing the total time of the instructions in the vector:

Vector<Inst *> insts;
ot::time total = 0;
for(auto inst: insts)
    total += TIME(inst);

To remove a properties, use the syntax below:

> //identifier// __(__ //proplist// __).remove();__

We can also store several times the same property with:

> //identifier// __(__ //proplist// __).add(__ //value to add// __);__

If we store several properties with the same identifier, we need an iterator to retrieve all these values:

> __for__(const auto& //it// __=__ //identifier//__,all(__//proplist//__))__ …

For example, the code below allows to display all label linked to an instruction:

for(const auto& label: LABEL.all(inst))
    cout << label << io::endl;


The properties is an easy to use and a flexible way to store analysis information in the workspace. Consequently, WCET computations requires hundreds of identifiers and properties. As the properties are very versatile, using properties as is would be a cumbersome task as the property user has to check every time if the property is available or not. To alleviate this, the properties must be (1) well documented and (2) grouped in so-called feature. A //feature//, class ##otawa::Feature##, is a contract between the producer of the feature and the user of the feature that asserts that some properties are consistently set. Before performing an analysis, one has to ensure that the required features are already available. This is one with the method ##require##() of ##otawa::WorkSpace##. If the feature is available, nothing is done else the analysis associated with feature is invoked. In turn, this analysis may invoke other required analyses. This is what was done in the previous section: the ##DECODED_TEXT## feature((The use of this feature requires to include ##otawa/prog/features.h##)) ensures that the code has been decoded (by launching, if required, the associated analysis):

{{{ lang=c++ ws->require(DECODED_TEXT); }}}

For examples of features, this [[http://www.irit.fr/recherches/ARCHI/MARCH/OTAWA//autodoc/group__features.html|page]] contains a non-exhaustive list of OTAWA features. = Exercise: generating an instruction flow graph = The goal of the exercise is to produce a graphical output of the control flow of the instructions of a function. The output will be textual using the [[http://graphviz.org/|GraphViz]] language to represents graphs. The figure below shows the graph in textual ##.dot##, to the left, and its graphical display to the right using ##xdot.py##.

digraph "main" {
    node [ shape=box ];

    "00001000" [ label = "my_function:\n00001000 add sp, sp, #8" ];
    "00001004" [ label = "00001004 cmp r0, #0" ];
    "00001008" [ label = "00001008 bgt 00001010" ];
    "0000100C" [ label = "0000100C bl 00001500" ];
    "00001010" [ label = "00001010 sub sp, sp, #8" ];
    "00001014" [ label = "0000101C bx lr" ];

    "00001000" -> "00001004";
    "00001004" -> "00001008";
    "00001008" -> "0000100C";
    "00001008" -> "00001010" [ label = "taken" ];
    "0000100C" -> "00001010";
    "0000100C" -> "sin" [ label = "call" ];
    "00001010" -> "00001014";
}

The algorithm may split in three actions:

  1. gather instructions by following executions paths of the function,
  2. generate the header of ”.dot”,
  3. generate the nodes for each instruction,
  4. generate the edges for each instructions,
  5. generate the footer of ”.dot”.

Step 1 is maybe the more complex and will be based on a working list, //todo//, and on the MARK property:

  • //todo// <- first instruction
  • WHILE //todo// != {} DO
    • //i// <- pop //todo//
    • IF not MARK(i) THEN
      • add outputs of //i// to working_list
      • work with //i//

The resulting program will be applied on the functions of crc.elf. To check the result, you can use the following {{:doc:train:crc.dis|disassembly of crc}}.

  • Write a file display.cpp and a Makefile defining an OTAWA application (as done in the pervious section).
  • Add the declaration of the MARK identifier.
  • Implements the algorithm in the work method. (labels and subprogram calls are ignored for now).
  • Compile all together.
  • Apply it to the function, icrc1, that does not contain any function call.
  • Add support for labels and function calls.
  • Apply it to the function, icrc, that contains function calls.

Labels in OTAWA

##otawa::Inst## supports three type of labels:

  • [[http://www.irit.fr/recherches/ARCHI/MARCH/OTAWA//autodoc/namespaceotawa.html#a3ebb1b90f5ad91d3f36034bb65dd6279|LABEL]]
  • [[http://www.irit.fr/recherches/ARCHI/MARCH/OTAWA//autodoc/namespaceotawa.html#afab070911ff2d9c51668a689a4b9b617|FUNCTION_LABEL]]
  • [[http://www.irit.fr/recherches/ARCHI/MARCH/OTAWA//autodoc/classotawa_1_1Symbol.html#a91e4aa8784f5ae008d9b59db823a41ca|Symbol::ID]] (non-function and function labels)

How to handle output of instructions?

  • Most instructions have only as output the next instruction (like ARM ”add r0, r0, #1”),
  • return control instruction as ARM ”mov pc, lr” have no output (end of function),
  • unconditional branch like ”b ”//label// have only as output the target of the branch,
  • conditional branch like ”beq ”//label// have as output the next instruction and the target of the branch.

Used Classes

  • elm::Vector
  • otawa::Feature
  • otawa::File
  • otawa::p::id
  • otawa::Inst
  • otawa::Process
  • otawa::PropList
  • otawa::Segment
  • otawa::WorkSpace