Content

1 Script Development

In OTAWA, a script configures a WCET computation. It contains:

step-by-step code processor invocations,
a way to let the user fine-tune the computation thanks to user parameters,
description of the involved architecture,
optional documentation about the computation (for example, to inform the user about its limitations).

Scripts are written in XML and are therefore easy for a human user to read and write. They are usually the simplest way to extend OTAWA, as they do not require understanding the internal API of the framework. In addition, scripts can rely on well-known XML-based standards such as XInclude and XSLT, which add a lot of power to the script behavior.

Scripts have basically two uses. First, they configure the computation for a particular architecture: instead of relying on the mainstream computation approach, they make it easy to perform and automate specific analyses that fine-tune a computation. Second, they provide support in OTAWA for a new micro-architecture: they can describe the components of an architecture or of a processor model (pipeline, caches, memory space) and then invoke all the analyses required to support it. Obviously, this way of describing a micro-architecture is bound to the features supported by OTAWA. For more exotic architectures, you will need to implement plugins, as described in the following sections.

In addition, a WCET computation described as a script can be used seamlessly from the OTAWA plugin for Eclipse. The plugin is able to read the script and to translate its configuration into a user interface form adapted to Eclipse.

1.1 Notation

The description of the XML files in this document merges XML textual format with EBNF.

The grammar is formed of a list of rules whose root is the first one. Each rule is made of:

an XML comment giving the name of the rule,
the matching XML element possibly containing other elements giving the shape of the rule (as below).

<!-- RULE_NAME ::= -->
<tag> ... </tag>

This description element may contain a sequence of symbols, which may be:

other elements (which, in turn, contain the same kinds of items as the rule element),
identifiers in uppercase (representing terminal symbols; see below for the list of accepted terminals),
empty XML elements referring to rules defined later (in actual files, they are replaced by their definition element),
EBNF symbols (described below).

EBNF symbols make it possible to repeat elements, to make them optional or to select among alternatives. They may be:

* – the previous symbol is repeated 0 or more times,
+ – the previous symbol is repeated 1 or more times,
? – the previous symbol is optional,
sym₁ | sym₂ | ⋯ – symbols separated by pipes are alternatives; exactly one is selected at a time,
( symbols ) – parentheses group symbols so that the previous operators can be applied to the whole group.

The attributes of XML elements are defined at their normal location, inside the opening XML tag. They conform to the usual XML notation except for their content and their activation. An attribute definition may be followed by a question mark ? to mark it as optional; otherwise, it is mandatory. The content of the attribute, between single or double quotes, supports the EBNF annotations for the contained text. Finally, an optional attribute defined as a list of alternatives may provide a default value by appending the string ; default=VALUE to the alternative list.

The terminal identifiers have the following meanings:

ID – an XML identifier (any non-blank sequence of characters),
INT – a decimal or hexadecimal (prefixed by 0[xX]) integer,
TEXT – any text,
ADDRESS – an address (synonym of INT),
BOOL – a boolean value (true or false).
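
As an illustration of the notation, the sketch below gives a hypothetical rule followed by one XML fragment that conforms to it. The element names sample, title, entry and flag and the attribute names are invented for this example and are not part of the script format.

```xml
<!-- SAMPLE ::= -->
<sample id="ID" mode="fast|safe ; default=safe"?>
	<title> TEXT </title>
	(<entry address="ADDRESS"/> | <flag> BOOL </flag>)*
</sample>

<!-- a fragment accepted by the rule above: mode is omitted (so it takes
     its default, safe) and the repeated group appears twice -->
<sample id="s1">
	<title>Hypothetical sample</title>
	<entry address="0x8000"/>
	<flag>true</flag>
</sample>
```

The first block is written in the grammar notation and is not itself valid XML; only the second block is an actual XML fragment.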

1.2 Script Format

A script is a textual XML file whose extension is usually .osx, for OTAWA Script XML. It follows the usual rules of XML, and its top-level element is called otawa-script:

<!-- OTAWA-SCRIPT ::= -->
  <?xml version="1.0"?>
  <otawa-script
    xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- DESCRIPTION -->

  <!-- IDENTIFICATION -->
  
  <!-- CONFIGURATION -->?
  
  <!-- PLATFORM -->?
  
  <!-- SCRIPT -->?

  </otawa-script>

Notice the two xmlns: namespace declarations: they are not mandatory but are very useful if you want to use XInclude or XSLT.

The script is made of 5 different parts detailed below:

DESCRIPTION provides various information targeting the human user,
IDENTIFICATION mainly contains identifiers of the hardware (architecture, model, ABI),
CONFIGURATION provides a list of items the user may tune,
PLATFORM describes the hardware,
SCRIPT details the performed computation steps.

1.2.1 Script Description

The description is made of the following items:

<!-- DESCRIPTION ::= -->
<name> TEXT </name>
<info> XHTML </info>?
<path to="PATH"/>*

The name tag is mandatory and provides the name of the script as displayed to the human user. The info element may contain complete documentation describing the script, its applications and its limitations. As it is intended to be displayed to the human user and may contain structured documentation, it is written in XHTML. Finally, the path elements are used internally by OTAWA: each to attribute contains a path used to look up the plugins involved in the WCET computation. If a relative path is given, it is resolved from the directory containing the script file.
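
A minimal sketch of a description part is given below. The script name, the documentation text and the plugins directory are hypothetical values chosen for the illustration.

```xml
<!-- name displayed to the user -->
<name>My Board WCET</name>
<!-- XHTML documentation, including a known limitation -->
<info>
	<p>Computes the WCET for the hypothetical <i>My Board</i> platform.</p>
	<p>Limitation: the data cache is not supported.</p>
</info>
<!-- relative path: resolved from the directory containing the script -->
<path to="plugins"/>
```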

1.2.2 Script Identification

The identification part has the following structure:

<!-- IDENTIFICATION ::= -->
<id>
	<arch>TEXT</arch>
	<abi>TEXT</abi>?
	<mach>TEXT</mach>?
</id>

The arch tag identifies the programming model, also called the ISA (Instruction Set Architecture), of the supported hardware. Common values include arm, powerpc, sparc, x86, etc. The abi element gives the Application Binary Interface, with common values being eabi, elf, linux, etc. Finally, the mach element precisely identifies the processor model the script is targeting. Only the arch element is mandatory: it is needed to check that the script supports the instruction set used in the processed executable file.
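
For instance, an identification part for an ARM target could look like the sketch below. The arch and abi values are taken from the common values listed above; the mach value is an assumption of this example, as the exact spelling expected for a given model is not specified here.

```xml
<id>
	<arch>arm</arch>
	<abi>eabi</abi>
	<!-- assumed spelling of the processor model -->
	<mach>lpc2138</mach>
</id>
```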

1.2.3 Platform Description

The platform part provides details about the hardware features of the targeted system.

<!-- PLATFORM ::= -->
<platform>
	<!-- PIPELINE -->?
	<!-- CACHES -->?
	<!-- MEMORY -->?
</platform>

The items found in the PLATFORM may be described directly in the script or in a separate file included with XInclude. In the latter case, the file must start with the usual XML declaration line:

<?xml version="1.0"?>

The example below uses XInclude to get the hardware description from three different external files:

<platform>
  <xi:include href="mpc5554/pipeline.xml"/>
  <xi:include href="mpc5554/cache.xml"/>
  <xi:include href="mpc5554/memory.xml"/>
</platform>

Notice that the relative paths passed in the href attribute are resolved from the XML base of the document, that is, the directory containing the script file.

As the content of these items is quite complex, each of them is described in its own section.

1.2.4 Configuration Description

The configuration lists a set of items to let the human user parameterize the computation:

<!-- CONFIGURATION ::= -->
  <configuration>
    <!-- CONFIGURATION-ITEM -->*
  </configuration>

<!-- CONFIGURATION-ITEM ::= -->
  <item
    name="TEXT"
    type="bool|int|string|range|enum"
    default="TEXT"?
    label="TEXT"?>
    	<help> <!-- TEXT --> </help>
  </item>

A configuration item is made of:

an internal name used to identify the variable containing its value in XSLT,
a label, the name of the configuration item displayed to the user,
a default value,
a type that describes the type of value,
a help sub-element that contains human-readable information to help the user understand the configuration item.

In addition, each type of item may have its own set of attributes and sub-elements.

1.2.4.1 `bool` type

The bool type is used to get boolean information from the user. Possible values are true or false. Such items are used to enable or disable specific features of the script. The example below lets the user activate, or not, prefetching from a flash memory:

<item name="flash_prefetch" type="bool" default="true" label="Flash Prefetch">
	<help>MPC5554 provides flash prefetching to improve performance.
	You may activate it or not.</help>
</item>

1.2.4.2 `int` type

The int type is used to get an integer configuration value from the user. If no default value is given, it is assumed to be 0. The value may be expressed as a decimal integer or as a hexadecimal one prefixed by 0x or 0X. An integer configuration item can pass any integer quantity, such as the number of functional units in a pipeline description, a specific address in the memory space or the size of any part of the architecture.

In the example below, the int configuration item is used to configure the number of wait states used to write to the static RAM.

<item name="ramwws" type="int" default="0" label="SRAM write wait states">
<help>Defines the number of wait states for an SRAM write. Each wait state adds one cycle.</help>
</item>

1.2.4.3 `string` type

This configuration item type is used to pass a string to the script. If no default value is given, an empty string is assumed. A common usage of this type of item is to pass a path in the file system to a specific resource used in the computation but any use of a string is supported.
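
A string item could be declared as in the sketch below. The item name flow_facts, its label and its purpose are hypothetical and only serve the illustration.

```xml
<item name="flow_facts" type="string" default="" label="Flow fact file">
	<help>Path to the file containing the flow facts of the program.</help>
</item>
```

Like any other configuration item, the value is then available in the script as the XSLT variable $flow_facts.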

1.2.4.4 `range` type

A range configuration item is similar to the int type but with bounds on the accepted values. The bounds are inclusive and given by two additional attributes, low and high. The default value must be within the bounds and, if not given, the low bound is taken as default.

The code below shows the range in action to define the pre-charge time of a dynamic RAM in the range [2, 3].

<item name="trp" type="range" default="2" label="SDRAM precharge time" low="2" high="3">
	<help>In cycles.</help>
</item>

1.2.4.5 `enum` type

The enum type supports the selection of one value from a collection of values.

The enumerated values are declared with the syntax below:

<!-- ITEM ::= -->
<item name="TEXT" type="enum" label="TEXT">
	<value label="TEXT" value="INT"  default="BOOL"?/>+
</item>

Each value is defined by a value XML element with a label, displayed to the user, and an integer value, which is the value handled by the script when this enumerated value is selected. In addition, a default attribute may be set to indicate whether the value is the default one. If no default attribute is set to true, the first enumeration value is considered the default.

The enum configuration type is useful to present a choice to the user in textual form while keeping the associated integer value hidden. The example below selects a multiplier implementation in order to perform the computation for a processor delivered as an IP.

	<item name="multiplier" type="enum" label="multiplier">
		<help>Multiplier implementation: defines the multiplier latency.</help>
		<value label="iterative" value="0" default="true"/>
		<value label="m16x16 + pipeline" value="1"/>
		<value label="m16x16" value="2"/>
		<value label="m32x8" value="3"/>
		<value label="m32x16" value="4"/>
		<value label="m32x32" value="5"/>
	</item>

1.2.5 Script Description

This element describes the script itself, that is, the analyses to apply to get the WCET. In fact, the script works directly on the code processor structure of OTAWA, which is composed of:

code processors, which implement simple analyses or may transform the program representation,
features, which are required or provided by code processors and represent information retrieved by the analyses,
properties, which are annotations representing results of analyses, hooked to the program representation and grouped in features.

Code processors, features and properties are documented in the automatic documentation of OTAWA.

In OTAWA, computing the WCET means either invoking the code processor that computes the WCET or requiring the feature provided by this code processor. In turn, this processor may require other features that will be provided by other code processors, and so on. The rule is that if a required feature is already provided, it is used as is. If it is not provided, the default processor associated with the feature is invoked.

It follows that the order of feature requirements and code processor invocations matters! To substitute an analysis A for the default analysis B of a feature F, A must be invoked first so that it provides F. When a code processor later requires F, it will use the feature provided by A, as it is already available.

Conversely, thanks to the default processor associated with each feature, it is not mandatory to provide the whole chain of analyses to perform the WCET computation. Instead, the script writer only has to focus on the particular analyses of the script.

The script part has the syntax below and is made of a sequence of steps, possibly with configuration items that apply to all steps:

<!-- SCRIPT ::= -->
    <script>
    	<!-- CONFIG -->*
    	<!-- STEP -->+
    </script>

A step may invoke a code processor (attribute processor) or require a feature (attribute require). If a step contains configuration items, they are applied only to this step and to the code processors automatically invoked from it.

<!-- STEP ::= -->
    <step processor="C++PATH"? require="C++PATH"?>
        <!-- CONFIG -->*
    </step>

The C++PATH used to identify a processor or a feature (but also a property) is the fully-qualified path of the object in the C++ implementation of OTAWA. For example, if the class implementing a code processor is MyAnalysis and is contained in namespace my, itself in namespace otawa, the matching C++PATH is otawa::my::MyAnalysis.

The configuration items provide important parameters or tune the behavior of the analysis. They match exactly the properties in OTAWA, but only some properties provide a converter from XML text to C++ values. The syntax is given below:

<!-- CONFIG ::= -->
    <config name="C++PATH" value="TEXT" add="yes|true|no|false"?/>

The matching property identifier is retrieved from its C++PATH and its value is converted according to the type of the identifier and pushed into the property list used to configure the code processor. If the add attribute is set to yes or true, the value is added to the property list, so that several values with the same identifier can be present in the configuration property list.
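
The sketch below illustrates the add attribute on a step. All the names used here (otawa::my::MyAnalysis, otawa::my::THRESHOLD, otawa::my::EXTRA_FILE and the file names) are hypothetical; an actual script must use processors and properties that exist in OTAWA or in its plugins.

```xml
<step processor="otawa::my::MyAnalysis">
	<!-- single-valued property: set once -->
	<config name="otawa::my::THRESHOLD" value="16"/>
	<!-- multi-valued property: each add="yes" item appends one more value -->
	<config name="otawa::my::EXTRA_FILE" value="first.ff" add="yes"/>
	<config name="otawa::my::EXTRA_FILE" value="second.ff" add="yes"/>
</step>
```

With this step, the property list passed to the code processor holds one value for the first identifier and two values for the second one.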

Below is a small example for the LPC2138 microprocessor:

<script>
    <step require="otawa::VIRTUALIZED_CFG_FEATURE"/>
    <step processor="otawa::lpc2138::CATMAMBuilder"/>
    <step processor="otawa::lpc2138::ARM7ParamExeGraphBBTime">
	    <config name="otawa::lpc2138::FLASH_MISS" value="56"/>
    </step>
    <step require="otawa::ipet::WCET_FEATURE"/>
</script>

This script first requires the feature otawa::VIRTUALIZED_CFG_FEATURE, which ensures that all function calls have been inlined. In fact, this requirement will cause the invocation of several analyses such as the flow fact loader, the program text decoder, the CFG builder, etc.

Then otawa::lpc2138::CATMAMBuilder analyzes the prefetcher of the LPC2138 flash memory, and the execution time of the blocks is computed by otawa::lpc2138::ARM7ParamExeGraphBBTime. In this step, a configuration parameter is passed to set the time of a flash memory miss. As will be presented below, the value is rarely a constant: it may be derived from the configuration variables.

Finally, the WCET computation is required through otawa::ipet::WCET_FEATURE: it builds the ILP system and the flow fact constraints but re-uses the block timings already provided by otawa::lpc2138::ARM7ParamExeGraphBBTime, without invoking the default computation of block timings.

1.3 File Organization and XInclude

Using the XInclude XML extension, a script can be made of several different files. An XInclude element looks like:

	<xi:include href="PATH"/>

Where PATH is a URL pointing to the file to include. The included file must be a valid XML file (starting with the <?xml ⋯?> declaration), and its top-level element will replace the xi:include element so that the application processing the resulting XML file does not have to do any further processing.

A common use of this feature is to split the script description into several files: one for the script itself, the entry point, whose name is suffixed by .osx, and one for each aspect of the hardware (pipeline, caches, memory space). The example below outlines the entry point in such an organization:

<?xml version="1.0"?>
<otawa-script
	xmlns:xi="http://www.w3.org/2001/XInclude"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

	<info> ... </info>
	<name> ... </name>
	<configuration> ... </configuration>
	<id> ... </id>

	<platform>
		<xi:include href="mpc5554/pipeline.xml"/>
		<xi:include href="mpc5554/cache.xml"/>
		<xi:include href="mpc5554/memory.xml"/>
	</platform>

	<script> ... </script>
</otawa-script>

The href attribute of the xi:include element can contain any type of path, absolute or relative, but the latter option preserves the consistency of the script. Indeed, the script interpreter considers any relative path to be relative to the directory containing the script. Therefore, if the script is moved in the file system tree, the included files will still be found as long as the relative position of the script file and the included files does not change. This also makes it easy to deliver a package containing a script without having to fix the paths of the included files.

In the example above, the script is contained in a file named mpc5554.osx and the included files are found in a directory named mpc5554 located in the same directory as the entry file mpc5554.osx. It is also advised to put the included files in a sub-directory and not to suffix them with .osx, in order not to confuse applications using the .osx files (such as the OTAWA Eclipse plugin).
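
To make the organization concrete, an included file such as mpc5554/pipeline.xml could start as in the sketch below. The arch, model and builder values are assumptions of this example, and the content of the stages and queues is left elided.

```xml
<?xml version="1.0"?>
<!-- mpc5554/pipeline.xml: its top-level element replaces the xi:include element -->
<processor class="otawa::hard::Processor">
	<arch>ppc</arch>
	<model>mpc5554</model>
	<builder>freescale</builder>
	<stages> ... </stages>
	<queues> ... </queues>
</processor>
```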

1.4 Smart Scripts and XSLT

XSLT is an XML-based language for describing templates that perform automatic transformations on XML files. In OTAWA scripts, we do not use its templating system but only the interpreter of its algorithmic components, that is, its ability to handle variables and to perform computations, with conditional structures that insert XML elements or not. This section presents the basic commands of the XSLT language; more details can be found in the XSLT documentation.

In XSLT, variables are accessed by prefixing their name with $: an element attribute whose value is "here is my $myvar !" will get as value the given string with the variable reference $myvar replaced by the actual value of myvar. Basically, the variables available when the script is processed by the XSLT interpreter are those declared by the configuration items: each is named after the name attribute of its item and its value is the one passed by the user or, failing that, the default value.

These variables may be used throughout the script file to provide more flexibility in the configuration of the script. Using the following syntax, one may change the value stored in a script element:

<element> <xsl:value-of select="XPATH expression"/> </element>

In the example below, the content of the element write_latency is replaced by the value computed by the xsl:value-of element, that is, the sum of 2, the value of variable trp and the value of variable sdrcas.

<otawa-script>
	<platform>
		<memory>
			...
			<bank>
				<write_latency><xsl:value-of select="2+$trp+$sdrcas"/></write_latency>
			</bank>
			...
		</memory>
	</platform>
</otawa-script>

XPath expressions are very versatile and provide the usual operators to perform computations:

$name to access the content of a variable,
( and ), parentheses,
the usual unary and binary operators such as +, -, *, div, mod, etc. (division is written div because / is the XPath path separator),
and a lot of common mathematical, textual and logical functions.

To set the value of an attribute of a script element, one can use the following syntax. The xsl:value-of is evaluated first to produce the VALUE and then xsl:attribute takes effect to add an attribute named NAME, with the computed VALUE, to my-element.

<my-element> <xsl:attribute name="NAME"> <xsl:value-of select="VALUE" /> </xsl:attribute> </my-element>
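
As a sketch of how this applies to a script step, the fragment below derives the value attribute of a config element from a configuration variable. The variable $flash_ws and the formula are hypothetical; only the processor and property names come from the LPC2138 example shown earlier.

```xml
<step processor="otawa::lpc2138::ARM7ParamExeGraphBBTime">
	<!-- the value attribute of this config element is computed, not constant -->
	<config name="otawa::lpc2138::FLASH_MISS">
		<xsl:attribute name="value">
			<xsl:value-of select="50 + 2 * $flash_ws"/>
		</xsl:attribute>
	</config>
</step>
```

After the XSLT evaluation, the config element carries an ordinary value attribute, exactly as if the number had been written by hand.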

XSLT can compute values but can also conditionally add XML elements thanks to xsl:if and xsl:choose. The former keeps the contained XML elements if an XPath condition evaluates to true and removes them otherwise:

<!-- xsl:if ::= -->
	<xsl:if test="xpath-test"> <!-- contained elements --> </xsl:if>

The xpath-test is any valid XPath expression; the condition holds if it evaluates to true (or to a non-zero number). The following comparison operators are available:

=, != – equality, inequality,
>, >= – greater than, greater or equal,
<, <= – less than, less or equal (inside an XML attribute, < must be written &lt;).

In addition, the comparisons may be combined with:

and – logical and,
or – logical or,
not( ) – logical negation, written as a function call.

The example below shows the use of xsl:if. Depending on the value of the configuration item $virtual, the code processor otawa::Virtualizer is either launched, because its step element is kept in the script, or skipped, because its step element is removed from the script.

<script>
	<xsl:if test="$virtual!=0">
		<step processor="otawa::Virtualizer"/>
	</xsl:if>
	<step processor="otawa::ipet::WCETComputation"/>
</script>

The latter form of conditional is xsl:choose. It supports several tests and an otherwise case:

<!-- xsl:choose ::= -->
	<xsl:choose>
  		<xsl:when test="xpath-test"> <!-- contained elements --> </xsl:when>+
  		<xsl:otherwise> <!-- contained elements --> </xsl:otherwise>?
	</xsl:choose>

In this conditional structure, the test attributes are evaluated sequentially, in the order of the XML file, and the result is the set of elements contained in the first xsl:when whose test evaluates to true. If all tests fail, the result is the set of elements contained in the xsl:otherwise node.

The example below shows that, using the value of the $processor_model variable, we can select precisely the size of the ROM memory of the processed architecture.

	<bank>
		<name>ON-CHIP NON-VOLATILE MEMORY</name>
		<address>0x00000000</address>
		<xsl:choose>
			<xsl:when test="$processor_model='lpc2131'"><size>0x8000</size></xsl:when>
			<xsl:when test="$processor_model='lpc2132'"><size>0x10000</size></xsl:when>
			<xsl:when test="$processor_model='lpc2134'"><size>0x20000</size></xsl:when>
			<xsl:when test="$processor_model='lpc2136'"><size>0x40000</size></xsl:when>
			<xsl:when test="$processor_model='lpc2138'"><size>0x80000</size></xsl:when>
		</xsl:choose>
		<type>ROM</type>
	</bank>
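
The example above has no xsl:otherwise branch, so no size element is produced when the model is unknown. The sketch below shows one possible way to add a fallback; defaulting to the smallest ROM size is a choice made for this illustration only.

```xml
<xsl:choose>
	<xsl:when test="$processor_model='lpc2138'"><size>0x80000</size></xsl:when>
	<xsl:when test="$processor_model='lpc2136'"><size>0x40000</size></xsl:when>
	<!-- no test matched: fall back to the smallest ROM size of the family -->
	<xsl:otherwise><size>0x8000</size></xsl:otherwise>
</xsl:choose>
```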

1.5 Pipeline Description

This section describes the XML format used to represent the pipeline of a microprocessor. It is used to compute the execution time of blocks of code either trivially (see the otawa::ipet::TrivialBBTime analysis), by simulation (see the otawa::ipet::BBTimeSimulator analysis) or by execution graphs (see otawa::BasicGraphBBTime).

Each of these analyses uses this information in its own way but, quite often, the format is insufficient to describe the whole complexity of an actual microprocessor pipeline. The goal of such a format could be to describe this whole complexity, but the features are so heterogeneous and complex that the right level would be VHDL, a pipeline description that is complete but inadequate for static analysis. More realistically, this format provides only the big picture of the pipeline, and the exotic features of actual processors are left to a plugin implementation.

1.5.1 Top-Level

A pipeline description can stand alone in an XML file (in this case, the file must start with the <?xml ⋯?> declaration) or be embedded in another file such as a script. The top-level element must be:

<!-- pipeline ::= -->
	<processor class="otawa::hard::Processor">
		<arch>ARCH</arch>
		<model>MODEL</model>
		<builder>BUILDER</builder>
		<stages> <!-- stage -->+ </stages>
		<queues> <!-- queue -->+ </queues>
	</processor>

ARCH gives the programming model (instruction set, registers, etc.) supported by the pipeline. Current values include arm, ppc, sparc, tricore, etc. It is mainly used to check that the processor description supports the programming model of the loaded binary program.

MODEL and BUILDER are only informative but may help to identify a pipeline description. MODEL represents the precise model of the hardware: for example, if ARCH is arm, usual models include armv5t, cortexa8. The BUILDER item gives the name of the microprocessor manufacturer, such as atmel, nxp, etc. for an arm architecture.

The stages element gives the list of stages composing the pipeline, which are detailed in the following sections. Notice that this list is ordered according to the order of the stages in the actual pipeline. The first stage must be of type fetch while the last stage must be of type commit.

The queues element contains the list of queues in the pipeline. A queue represents any pipeline feature storing a set of instructions, that is, a FIFO buffer, a reorder buffer, etc. By default, if no queue is declared between two stages, it is implicitly considered that there is a latch whose size is the width of the incoming stage. Only queues that do not match this default description must be listed here.

1.5.2 Stage Description

A stage has the following syntax:

<!-- stage ::= -->
	<stage id="ID"?>
		<name>STRING</name>?
		<width>INT</width>?
		<latency>INT</latency>?
		<type>fetch|lazy|commit|exec</type>?
	</stage>

The id attribute is not required but makes it possible to link a stage with a queue.

The name gives the name of the stage; it is only used for display to a human user. By default, the name is an empty string.

The width represents the number of instructions that are processed in parallel by the stage. If not given, the default is 1.

The latency gives the basic latency (in cycles) the stage needs to process as many instructions as its width. By default, the latency is 1 cycle.

The type is the most interesting part of the stage as it provides insight into the work of the stage. Notice that the work of a stage, from the point of view of OTAWA, is mainly its effect on the execution time. The type must be one of the enumerated values below:

fetch – always the first stage of the pipeline, it is responsible for fetching instructions from memory. Consequently, its throughput depends not only on its own properties but also on the time spent accessing the memory.
lazy – the simplest stage type, whose only effect is to spend time in the instruction execution.
commit – the last stage where the instructions go out of the pipeline.
exec – represents the stage where an instruction is executed. It is complex because (a) it is often split into different functional units (see below) and (b) it handles the data dependencies between instructions.
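
As an illustration, the sketch below declares the non-execution stages of a hypothetical 2-wide pipeline. The identifiers FI, DI and CM and the widths are choices made for this example; the latency is omitted and therefore takes its default of 1 cycle. An exec stage is left out because it needs additional elements.

```xml
<stages>
	<!-- first stage: must be of type fetch -->
	<stage id="FI">
		<name>fetch</name>
		<width>2</width>
		<type>fetch</type>
	</stage>
	<!-- a decode stage only spends time: lazy type -->
	<stage id="DI">
		<name>decode</name>
		<width>2</width>
		<type>lazy</type>
	</stage>
	<!-- last stage: must be of type commit -->
	<stage id="CM">
		<name>commit</name>
		<width>2</width>
		<type>commit</type>
	</stage>
</stages>
```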

A stage of type exec has two more elements:

<!-- exec-stage ::= -->
	...
	<dispatch> <!-- instruction match -->+ </dispatch>
	<fus> <!-- fu -->+ </fus>

The fus element gives the list of functional units (described in the next section). The dispatch element assigns instructions to functional units. It is made of a list of inst elements:

<!-- instruction match ::= -->
	<inst>
		<type>masks</type>
		<fu ref="ID"/>
	</inst>

The type element selects instructions by their kind mask. The kind mask of the otawa::Inst class is a set of flags describing the nature of the instruction. Each bit is identified by a mask named IS_xxx, and the type element supports several of these masks separated by a pipe |. To be selected, an instruction must match all the flags in the type. If no flag is given, any instruction is selected. The fu element gives the functional unit that will receive the instructions matching the type. Its ID is one of the functional unit identifiers described in the next section.

The supported flags are:

IS_COND if the instruction is conditional,
IS_CONTROL if the instruction changes the PC,
IS_CALL if the instruction performs a sub-program call,
IS_RETURN if the instruction returns from a sub-program,
IS_MEM if the instruction performs memory accesses,
IS_LOAD if the instruction loads data from memory,
IS_STORE if the instruction stores data to memory,
IS_INT if the instruction works on integer values,
IS_FLOAT if the instruction works on float values,
IS_ALU if the instruction performs arithmetic or logic operation (not address calculation),
IS_MUL if the instruction performs a multiplication,
IS_DIV if the instruction performs a division,
IS_SHIFT if the instruction performs a shift,
IS_TRAP if the instruction performs a trap (system call, exception raise, debugging, etc.),
IS_INTERN if the instruction has an effect on the internal work of the microprocessor,
IS_MULTI if the instruction performs multiple accesses to memory,
IS_SPECIAL, other types of instructions not covered by the existing flags,
IS_INDIRECT if the instruction performs an indirect branch,
IS_ATOMIC if the instruction performs atomic access to memory (in case of parallel access to memory).

Hexadecimal numbers are also accepted as flag masks to cope with the specificities of some instruction sets.

Below is the example of the Leon 3 dispatch element of the execution stage:

	<dispatch>
		<inst> <type>IS_MEM</type> <fu ref="INT"/> </inst>
		<inst> <type>IS_CONTROL</type> <fu ref="INT"/> </inst>
		<inst> <type>IS_INT</type> <fu ref="INT"/> </inst>
		<inst> <type>IS_FLOAT|IS_DIV</type> <fu ref="FDIV"/> </inst>
		<inst> <type></type> <fu ref="FPU"/> </inst>
	</dispatch>

Any memory, control or integer instruction goes to the INT functional unit: the first three instruction matches work like a disjunctive condition, as they all lead to the same functional unit. Otherwise, if the instruction works on floats and performs a division, it goes to the FDIV functional unit. Finally, it can be deduced from the instruction set of the Leon that the remaining instructions work on floating-point numbers and must go to the FPU functional unit.

1.5.3 Functional Unit Description

A functional unit is like a stage but dedicated to the execution / work of an instruction. The syntax of functional units is given below:

<!-- fu ::= -->
	<fu>
		<name>STRING</name>?
		<width>INT</width>?
		<latency>INT</latency>?
		<pipelined>BOOL</pipelined>?
	</fu>

The name associates the functional unit with a name; it is only used for the convenience of the human user and by the dispatch element described in the previous section.

The width defines the number of instances of the functional unit, that is, the number of instructions that may be executed in parallel in the functional unit. Its default value is 1 instruction.

The latency gives the number of cycles required for an instruction to traverse the functional unit. Its default value is 1 cycle.

Finally, pipelined expresses the fact that a multiple-cycle functional unit is not blocked until the end of the instruction: at each cycle, it can accept another instruction. It is considered false by default.

Below is an example of a multiple ALU functional unit in a superscalar microprocessor:

<fu>
	<name>ALU</name>
	<width>4</width>
	<latency>1</latency>
</fu>

The next example is a 4-cycle multiplier that supports pipelining of operations:

<fu>
	<name>MUL</name>
	<latency>4</latency>
	<pipelined>true</pipelined>
</fu>

The final example represents a 10-cycle floating-point division functional unit that does not support pipelining:

<fu>
	<name>FDIV</name>
	<latency>10</latency>
	<pipelined>false</pipelined>
</fu>

1.5.4 Queue Description

A queue is a small data structure that holds and buffers instructions between stages. The syntax of queues is:

<!-- queue ::= -->
<queue>
	<name>STRING</name>?
	<size>INT</size>
	<input ref="ID"/>
	<output ref="ID"/>
	<intern>  <stage ref="ID"/>+ </intern>?
</queue>

The name element allows to associate a human-readable identifier with a queue.

The size gives the maximum number of instructions the queue can contain.

The ref attribute of the input element designates the stage that deposits instructions in the queue.

The ref attribute of the output element designates the stage that extracts instructions from the queue.

The intern element is only used in the case of queues implementing a re-order buffer. In this case, an instruction can only be extracted if it has been processed by one or several calculation stages (usually execution stages). The list of stage elements gives the list of stages that will validate an instruction (that is, execute it).

Below is an example of a simple FIFO queue between a fetch stage and a decode stage:

<queue>
	<name>FETCH_QUEUE</name>
	<size>8</size>
	<input ref="FI"/>
	<output ref="DI"/>
</queue>

In this second example, a reorder buffer stores instructions until they have been executed by the EX stage:

<queue>
	<name>ROB_QUEUE</name>
	<size>16</size>
	<input ref="DI"/>
	<intern>
		<stage ref="EX"/>
	</intern>
	<output ref="CM"/>
</queue>
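
Since the intern element accepts several stage elements, a re-order buffer validated by more than one execution stage could be sketched as follows. The stage identifiers EX1 and EX2 are hypothetical and assume that two such stages are declared elsewhere in the pipeline description:

<queue>
	<name>ROB_QUEUE</name>
	<size>16</size>
	<input ref="DI"/>
	<output ref="CM"/>
	<intern>
		<stage ref="EX1"/>
		<stage ref="EX2"/>
	</intern>
</queue>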

1.6 Cache Description

A cache is a small, fast memory that stores small blocks of bytes of the main memory and gives quick access to them. Caches are a major feature for speeding up processor execution, as they avoid long accesses to the main memory.

From the point of view of caches, the main memory is divided into blocks of the same size and the cache is divided into sets. Based on its address, each memory block is assigned to a cache set that, in turn, may contain one block (direct-mapped cache) or several blocks (associative cache).

To be compliant with the binary encoding of addresses, the block size, the number of sets and the cache size are powers of 2.

There are many different ways to configure caches:

they may be split between instruction and data or unified,
they have different configurations for block size, set size, cache size,
there are different levels of cache from L1 (close to the core) to L2, L3 (close to the main memory),
there are different ways to manage the blocks stored in a set (replacement policy, write-through or write-back save model, etc).

1.6.1 Cache Configuration Level

OTAWA tries to provide a consistent and versatile representation of caches, as below:

<!-- CACHES ::= -->
<cache-config>
	(<icache ref="ID"/> | <icache> CACHE </icache>)
	(<dcache ref="ID"/> | <dcache> CACHE </dcache>)
	(<cache id="ID"/> | <cache> CACHE </cache>)*
</cache-config>

A cache configuration may:

have no cache – cache-config is empty,
contain only instruction cache – cache-config contains only an icache element,
be split (Harvard architecture) – cache-config contains an element icache and an element dcache,
or unified – cache-config contains only an element named cache.

This describes only the first level of cache, L1. If there is an L2 cache, its description is provided inside the L1 cache, in an element called next. If the L2 is unified while the L1 is split, the first next element contains the L2 unified cache description together with an id attribute, while the second next element is empty but provides a ref attribute that designates the identifier of the first next element. Of course, this scheme may be repeated as many times as needed.

1.6.2 Cache Elements

The content of a cache element, CACHE, is defined below:

<!-- CACHE ::= -->
	<block_bits>INT</block_bits>
	<row_bits>INT</row_bits>
	<set_bits>INT</set_bits>
	(<next ref="ID"/> | <next> CACHE </next>)?
	<replace>NONE|OTHER|LRU|RANDOM|FIFO|PLRU</replace>?	<!-- default: LRU -->
	<write>WRITE_THROUGH|WRITE_BACK</write>?				<!-- default: WRITE_THROUGH -->
	<allocate>BOOL</allocate>?								<!-- default: true -->

block_bits defines the size of the cache blocks: if its value is B, the block size is 2^B bytes, which means that the B least significant bits of an address designate the accessed byte in the block.

The row_bits value (see note 2), S, determines the number of sets in the cache, that is, 2^S. In the address, the bits selecting the set range from B to B + S - 1.

Finally, the set_bits value, A, determines the number of blocks in each set, that is, 2^A. As a block may go in any way of its set, no bits of the address correspond to the way. In the end, the cache size in bytes is 2^(B + S + A).
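
As a worked illustration of these formulas, with values chosen arbitrarily for the example, take B = 4, S = 7 and A = 2:

block size = 2^4 = 16 bytes (address bits 0 to 3 select the byte in the block),
number of sets = 2^7 = 128 (address bits 4 to 10 select the set),
blocks per set = 2^2 = 4,
cache size = 2^(4 + 7 + 2) = 2^13 = 8192 bytes.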

The next element allows to link the current cache at level Li to a cache at level Li+1. The cache at level Li+1 may either be described in the next element or just contain a reference to an element describing this cache.

The following elements describe how the blocks stored in a set are managed. As a set contains several blocks in no particular order, a policy is needed to choose which block to wipe out when a new cache block needs to be loaded. Notice that the replacement policy can be ignored if the value A is equal to 0, since each set then contains only one block (2^0 = 1).

replace describes the replacement policy that may be:

NONE – null value (usually unused in this format),
OTHER – unknown policy,
LRU (Least Recently Used) – the replaced block is the least recently used,
RANDOM – the replaced block is selected randomly,
FIFO (First-In First-Out) – also called Round-Robin, blocks are organized as a queue and the oldest block, that is, the first one that entered the queue, is replaced,
PLRU (Pseudo-LRU) – this policy mimics LRU but with a lower hardware cost.

This list is likely to be extended in the future.

write and allocate elements are only used with data or unified caches. write describes the write policy of the cache (when a block is modified):

WRITE_THROUGH – means that a write to a block is immediately propagated to the main memory, so that the block does not need to be written back when it is wiped out; if the allocate element is set to true and the written block is not already in the cache, it is additionally allocated and loaded.
WRITE_BACK – means that a write to a block is only performed in the cache, and the modification is propagated to the main memory only when the block is wiped out; if the block is not in the cache, it is loaded; the allocate element is not used.

1.6.3 Examples

Below is a simple configuration of an architecture with only one instruction cache (16-byte blocks, 4-way associative, 128 sets, 8 KB size):

<cache-config>
	<icache>
		<block_bits>4</block_bits>
		<row_bits>7</row_bits>
		<way_bits>2</way_bits>
		<replace>LRU</replace>
	</icache>
</cache-config>

Below is the cache structure of the ARM9, that is, split with random replacement policy (64 ways, 32 bytes per block, 16 KB size for each cache):

<cache-config>
	<icache>
		<block_bits>5</block_bits>
		<row_bits>3</row_bits>
		<way_bits>6</way_bits>
		<replace>RANDOM</replace>
	</icache>
	<dcache>
		<block_bits>5</block_bits>
		<row_bits>3</row_bits>
		<way_bits>6</way_bits>
		<replace>RANDOM</replace>
	</dcache>
</cache-config>

This example represents a unified cache of 16 KB:

<cache-config>
	<cache>
		<block_bits>5</block_bits>
		<row_bits>3</row_bits>
		<way_bits>6</way_bits>
		<replace>PLRU</replace>
	</cache>
</cache-config>
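
The examples of this section leave the write policy at its default value. As a sketch only, a split configuration whose data cache uses the write-back policy could look like the following. The cache geometry is arbitrary and simply reuses the values of the first example:

<cache-config>
	<icache>
		<block_bits>4</block_bits>
		<row_bits>7</row_bits>
		<way_bits>2</way_bits>
		<replace>LRU</replace>
	</icache>
	<dcache>
		<block_bits>4</block_bits>
		<row_bits>7</row_bits>
		<way_bits>2</way_bits>
		<replace>LRU</replace>
		<write>WRITE_BACK</write>
	</dcache>
</cache-config>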

Finally, this example shows a split L1 cache (64 KB for each of the instruction and data caches) and a unified L2 cache (256 KB):

<cache-config>
	<icache>
		<block_bits>5</block_bits>
		<row_bits>9</row_bits>
		<way_bits>2</way_bits>
		<replace>RANDOM</replace>
		<next id="L2">
			<block_bits>6</block_bits>
			<row_bits>8</row_bits>
			<way_bits>4</way_bits>
			<replace>RANDOM</replace>
		</next>
	</icache>
	<dcache>
		<block_bits>5</block_bits>
		<row_bits>9</row_bits>
		<way_bits>2</way_bits>
		<replace>RANDOM</replace>
		<next ref="L2"/>
	</dcache>
</cache-config>

1.7 Memory Space Description

The memory space description defines properties of the different areas of the address space. So, a bank element is mainly a memory area described by its base address and its size.

OTAWA can represent addresses over different address spaces. An address is made of two components:

page – identification of the address space on 32 bits,
offset – identification of a byte in an address space on 32 bits.

One must observe that the address space -1, or 0xffffffff, is used to represent the null address, that is, the abstract value representing no address.

<!-- ADDRESS ::= -->
	INT
	| (<page>INT</page>? <offset>INT</offset>)

If the page element is not given, the default address space (i.e. the page) is 0.
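
The two forms can be sketched as follows, assuming that ADDRESS appears inside an address element as in the bank grammar of section 1.7.2. The numeric values are arbitrary:

<!-- simple form: offset 4096 in the default page 0 -->
<address>4096</address>

<!-- explicit form: offset 4096 in page 1 -->
<address>
	<page>1</page>
	<offset>4096</offset>
</address>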

1.7.1 Memory Element

<!-- MEMORY ::= -->
	<memory>
		<banks>
			BANK*
		</banks>
	</memory>

A memory is just a collection of memory areas that are called banks.

1.7.2 Bank Element

<!-- BANK ::= -->
	<bank>
		<name>TEXT</name>
	  	<address>ADDRESS</address>
	  	<size>INT</size>
	  	<type>DRAM|SPM|ROM|IO</type>
	  	<latency>INT</latency>?
	  	<write_latency>INT</write_latency>?
		<cached>BOOL</cached>?		<!-- default: false -->
		<writable>BOOL</writable>?	<!-- default: true -->
	</bank>

name assigns a name to a memory area and exists only for interaction with the human user.

address defines the base address of the memory area.

size defines the memory area size in bytes. As this value is represented on 32 bits and as a size of 0 would be meaningless, a size of 0 represents a full coverage of the address space, that is, a memory of size 2^32.

The type gives hints on the nature of the memory. Possible values may be:

DRAM – usual dynamic RAM,
SPM – on-chip static RAM (usually accessed in 1 cycle),
ROM – read-only memory (EEPROM or anything else) but also NAND flash memory with random access,
IO – represents input/output register of peripherals (access time is usually high).

The latency defines the number of cycles needed to access the memory area to read a data item. If there is no write_latency, it also defines the time in cycles needed to write a data item.

The write_latency element provides the write time in cycles.

cached tells whether the memory area is accessed through the cache or directly. IO memory areas are often not cached.

The writable element tells whether a memory area is writable or not: ROMs are often considered as not writable using the classic load / store instructions of a microprocessor. For example, flash memories are readable word by word but are written block by block through dedicated IO registers.
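
To bring these elements together, here is a sketch of a complete memory description with three banks. All names, addresses, sizes and latencies are hypothetical and do not describe a real platform. Values are written in decimal, and elements that are left out keep their default values:

<memory>
	<banks>
		<!-- 256 KB of flash: cached, not writable by store instructions -->
		<bank>
			<name>FLASH</name>
			<address>0</address>
			<size>262144</size>
			<type>ROM</type>
			<latency>5</latency>
			<cached>true</cached>
			<writable>false</writable>
		</bank>
		<!-- 64 KB of scratchpad accessed in 1 cycle -->
		<bank>
			<name>SRAM</name>
			<address>268435456</address>
			<size>65536</size>
			<type>SPM</type>
			<latency>1</latency>
		</bank>
		<!-- 4 KB of peripheral registers, slower to write than to read -->
		<bank>
			<name>PERIPH</name>
			<address>1073741824</address>
			<size>4096</size>
			<type>IO</type>
			<latency>10</latency>
			<write_latency>12</write_latency>
		</bank>
	</banks>
</memory>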

1 Notice that the pipelined element is not required here, since its default value is false.

2 This name is relatively old and misleading.