Elm  2
ELM is a library providing generic data structures, OS-independent interface, plugins and XML.
Serialization / unserialization

ELM provides support for pseudo-automatic serialization / unserialization of objects.

Classes

class  ExternalSolver
 
class  Serializer
 
class  Unserializer
 
class  TextSerializer
 
class  XOMElementSerializer
 
class  XOMUnserializer
 

Detailed Description

ELM provides support for pseudo-automatic serialization / unserialization of objects.

The following C++ features are supported in serialization:

In addition, users can provide their own custom serialization code in a format independent of the serializer / unserializer.

A simple example

Serialization and unserialization are very simple to perform, using the same interface as the usual C++ input/output.

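#include <elm/serial2/serial.h>
#include <elm/serial2/TextSerializer.h>
#include <elm/serial2/collections.h>
#include "MyClass.h"
using namespace elm;
int main(void) {
    // text serializer writing to the standard output by default
    serial2::TextSerializer serializer;
    // scalar values and strings
    serializer << "Hello, World !\n" << 666 << true;
    // a custom serializable object
    MyClass my_object;
    serializer << my_object;
    // a collection of serializable objects
    Vector<MyClass> objects;
    for(int i = 0; i < 10; i++)
        objects.add(my_object);
    serializer << objects;
}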

In the example above, we show how serializable objects (scalar values, strings, MyClass and Vector) are easily serialized by a serializer producing text output. By default, the output goes to the standard output. Notice that the data produced by this example must be unserialized in the same order.

#include <elm/serial2/serial.h>
#include <elm/serial2/XOMUnserializer.h>
#include <elm/serial2/collections.h>
#include "MyClass.h"
using namespace elm;
int main(void) {
    // XML unserializer reading from the file "unser.xml"
    serial2::XOMUnserializer unser("unser.xml");
    // values are read in the order in which they were written
    string str;
    int x;
    bool boolean;
    unser >> str >> x >> boolean;
    MyClass my_object;
    unser >> my_object;
    // space for the items is allocated by the unserializer
    Vector<MyClass> objects;
    unser >> objects;
}

The unserialization above looks much like the previous serialization example and works in the same way as the input streams of the C++ standard library: the values read must be stored in variables. Notice that, in the case of the Vector, the unserializer automatically allocates enough space to store the serialized collection of objects. This example also illustrates the use of an XML unserializer taking its input from the file "unser.xml". This file is ordinary XML text that may be modified by hand and that contains very few items specific to the serialization system.
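
The ordering constraint can be illustrated without ELM. The sketch below is not ELM code: it relies only on the string streams of the C++ standard library, and the names Record, write_record and read_record are hypothetical. It shows that a stream-style reader recovers the values only when it reads them in the order in which they were written.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical record used only for this illustration.
struct Record {
    std::string word;
    int number;
    bool flag;
};

// Write the fields in a fixed order: word, number, flag.
std::string write_record(const Record& r) {
    std::ostringstream out;
    out << r.word << ' ' << r.number << ' ' << r.flag;
    return out.str();
}

// Read the fields back in exactly the same order.
Record read_record(const std::string& text) {
    std::istringstream in(text);
    Record r{"", 0, false};
    in >> r.word >> r.number >> r.flag;
    return r;
}
```

Calling read_record(write_record(r)) gives back a record equal to r as long as the word contains no whitespace; swapping two of the reads in read_record would break the round trip.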

ELM provides serialization / unserialization support for the usual types (scalars, strings and data collections). Yet, to serialize custom classes, the user has to add some information about the fields to work with. In our example, we use the following declaration for MyClass in MyClass.h:

#include <elm/serial2/macros.h>
#include <elm/serial2/collections.h>
class MyClass {
    SERIALIZABLE(MyClass, FIELD(name) & FIELD(value) & FIELD(attrs));
    string name;
    int value;
public:
    ...
};

The SERIALIZABLE macro adds to the class some RTTI information used by the serialization system. After the name of the class, this macro takes the list of fields to serialize, separated by the '&' operator and wrapped in the FIELD macro. The FIELD macro is not mandatory: it only improves the readability of textual output by providing the name of the field. The same declaration is used by both serialization and unserialization.
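
To give an intuition of how a list of fields joined by '&' can be walked, here is a minimal standalone sketch. It does not reproduce ELM's macros: NamedField, TextVisitor, SKETCH_FIELD and Point are hypothetical names, and the way the visitor enters the chain is an assumption made for the illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical named reference to a field, in the spirit of FIELD(x).
template <class T>
struct NamedField {
    const char* name;
    const T& ref;
};

// Hypothetical visitor that writes "name=value;" for each field it meets.
struct TextVisitor {
    std::ostringstream out;

    template <class T>
    TextVisitor& operator&(const NamedField<T>& f) {
        out << f.name << '=' << f.ref << ';';
        return *this;   // returning *this lets the '&' calls chain
    }
};

// Stringify the field name and capture a reference to the field.
#define SKETCH_FIELD(x) NamedField<decltype(x)>{#x, x}

struct Point {
    int x, y;
    std::string label;

    // The '&' chain visits the fields from left to right.
    void visit(TextVisitor& v) const {
        v & SKETCH_FIELD(x) & SKETCH_FIELD(y) & SKETCH_FIELD(label);
    }
};

// Produce the textual form of a point.
std::string dump(const Point& p) {
    TextVisitor v;
    p.visit(v);
    return v.out.str();
}
```

With this sketch, dump(Point{1, 2, "origin"}) yields "x=1;y=2;label=origin;". Because the fields are captured by reference, the same list could serve a reading visitor if the references were non-const.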

Making a class serializable

In ELM, making a class serializable is very easy. One has to add some RTTI information in the class declaration and add a macro in the class definition file.

The declaration ".h" file must include the serialization headers:

#include <elm/serial2/macros.h>
#include <elm/serial2/collections.h>

Then, we have to add the SERIALIZABLE macro in the class declaration, giving the name of the class and the list of fields to serialize separated by '&'.

class ClassName {
    SERIALIZABLE(ClassName, field1 & field2 & field3 & ...);
    ...
};

In the definition ".cpp" file, we just have to put the following macro, which provides the implementation of the RTTI information of the class:

SERIALIZE(ClassName)

The class name passed to the SERIALIZABLE and SERIALIZE macros must be identical and fully qualified to avoid ambiguities.

Declaring a field serializable is as easy as passing its name in the SERIALIZABLE list. Actually, a reference to the field is taken and used to read or write the serialized values. In human-readable formats (like XML or text), it may also be useful to provide the identifier of the field to the serialization system. This is easily done using the FIELD macro in place of the field name:

SERIALIZABLE(MyClass, FIELD(field1) & field2 & FIELD(field3) & ...);

Some serialization formats support optional field definitions. In this case, a default value may be provided with the DFIELD macro:

SERIALIZABLE(MyClass, DFIELD(field1, default_value) & field2 & FIELD(field3) & ...);

Finally, if the serialized class inherits from a serializable class, the base class must be added to the list of fields with the BASE macro:

class MyBaseClass {
    SERIALIZABLE(MyBaseClass, ...);
    ...
};
class MyClass: public MyBaseClass {
    SERIALIZABLE(MyClass, BASE(MyBaseClass) & field1 & ...);
    ...
};

Enumeration serialization

Depending on the serialization format (textual, XML, etc.), it may be useful to provide more readable information for the human user. This typically applies to enumeration values, and ELM already provides such a facility.

In the code below, an enumeration type is declared (in a header file) and serialization information is provided by the DECLARE_ENUM macro:

#include <elm/rtti.h>
typedef enum color_t {
    RED,
    GREEN,
    BLUE
} color_t;
DECLARE_ENUM(color_t);

Then, in the matching source file, you have to create an object describing the enumeration type and the available enumerated values:

elm::rtti::Enum color_type(elm::rtti::make("color_")
    .value("RED", RED)
    .value("GREEN", GREEN)
    .value("BLUE", BLUE));
DEFINE_ENUM(color_t, color_type);

The last macro, DEFINE_ENUM, links the enumeration type descriptor with the enumeration type itself and provides typing information to the serialization system. Additionally, a textual output of the enumerated type is also provided.
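
The role of such a descriptor can be pictured with a small standalone sketch. It is not ELM's elm::rtti::Enum: the class EnumDesc, its methods nameOf and valueOf, and the function make_color_desc are hypothetical and only show how a chained list of (name, value) pairs allows a value to be written by name and read back.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

enum color_t { RED, GREEN, BLUE };

// Hypothetical descriptor: a list of (name, value) pairs built by chaining.
class EnumDesc {
public:
    EnumDesc& value(const std::string& name, int v) {
        _values.emplace_back(name, v);
        return *this;
    }
    // Name for a value, or "" if the value is unknown.
    std::string nameOf(int v) const {
        for (const auto& p : _values)
            if (p.second == v) return p.first;
        return "";
    }
    // Value for a name, or -1 if the name is unknown.
    int valueOf(const std::string& name) const {
        for (const auto& p : _values)
            if (p.first == name) return p.second;
        return -1;
    }
private:
    std::vector<std::pair<std::string, int>> _values;
};

// Build the descriptor of color_t.
EnumDesc make_color_desc() {
    EnumDesc d;
    d.value("RED", RED).value("GREEN", GREEN).value("BLUE", BLUE);
    return d;
}
```

A textual serializer could write nameOf(GREEN), that is "GREEN", instead of the raw integer, and the matching unserializer would call valueOf("GREEN") to recover the value.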

Collection serialization

Customizing the serialization

Basically, serialization consists in passing field references back and forth to the classes Serializer (for serialization) and Unserializer (for unserialization). Depending on the operation, one of the following functions is called:

template <> void __serialize(Serializer& s, const T& v);
template <> void __unserialize(Unserializer& s, T& v);

Here, T is the type of the field to serialize. ELM provides serialization / unserialization functions for most basic types and most of its collection types. Providing __serialize() and __unserialize() with T being your own type is a first way to customize the serialization process. In this configuration, you have to use the methods provided by the classes Serializer and Unserializer to perform the actual work.

Then, one has to remark that the type T does not need to be the type of a field of the serialized object: it may be any type referring to any object. This is the case of the FIELD macro, which builds an object of type Field. The trick here is that the SERIALIZABLE macro creates a specific method, named "__visit", in which the fields are copied. This means that, when the fields are built, a specific instance of the class is available, for example through the "this" pointer, so the whole object (and its methods) may be used for customizing the serialization process.

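Hence, to customize the way an object is serialized, one has to pass in the SERIALIZABLE list an object of a dedicated type built from "this", and to provide __serialize() and __unserialize() functions for that type. This is illustrated in the example below, where the methods getID and setID are used instead of direct access to the field itself: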

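class MyClass;
// proxy standing for the "id" property of a MyClass instance
class GetSetID {
public:
    inline GetSetID(MyClass *p): _p(p) { }
    MyClass *_p;
};
class MyClass {
public:
    SERIALIZABLE(MyClass, GetSetID(this));
    // accessors used instead of a direct field access
    string getID(void) const;
    void setID(const string& id);
};
// write the identifier obtained from the getter
void __serialize(Serializer& s, const GetSetID& i) {
    s.beginField("id");
    s.onValue(i._p->getID());
    s.endField();
}
// read the identifier and store it through the setter
void __unserialize(Unserializer& s, const GetSetID& i) {
    s.beginField("id");
    string id;
    s.onValue(id);
    i._p->setID(id);
    s.endField();
}

Writing a serializer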

Writing a serializer is relatively easy in ELM: one just has to implement the Serializer interface:

#include <elm/serial2/Serializer.h>
class MySerializer: public elm::serial2::Serializer {
public:
    ...
};

Depending on the data to serialize, one or several functions of this interface will be called.

Simple types (boolean, integer, float, string) are serialized by a call to one of the Serializer::onValue() methods, with the corresponding type for its single parameter. Notice that the parameter of Serializer::onValue() is a reference to the value exactly as stored in the object or in the compound containing it, so it may be used for the pointer linkage support (described further).

Serializing an enumeration value is a bit more complex: the Serializer::onEnum() function will receive the address of the value (for pointer linkage described below), the enumerated value converted to int and a descriptor of the enumerated type (of type elm::rtti::Enum).

Collection or array types start with a call to Serializer::beginCompound() and terminate with a call to Serializer::endCompound(). Before each item, a call to Serializer::onItem() is performed. The data item itself is handled by the serializer function corresponding to its type (Serializer::onValue(), compound access, etc.).

Object serialization is surrounded by calls to Serializer::beginObject() and Serializer::endObject(). The first parameter of Serializer::beginObject() is a reference to the actual class of the object. Then each serialized field is surrounded by Serializer::beginField() and Serializer::endField(), and the field value is passed using the other serialization methods.
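
The nesting of these calls can be made visible with a tracing back-end. The sketch below is standalone and does not derive from elm::serial2::Serializer: the class TraceWriter reuses the method names of the text with simplified, assumed signatures, and Pair and write_pair are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical tracing back-end: records the order of the calls it receives.
class TraceWriter {
public:
    void beginObject(const std::string& cls) { _out << "beginObject(" << cls << ") "; }
    void endObject() { _out << "endObject "; }
    void beginField(const std::string& name) { _out << "beginField(" << name << ") "; }
    void endField() { _out << "endField "; }
    void onValue(int v) { _out << "onValue(" << v << ") "; }
    std::string trace() const { return _out.str(); }
private:
    std::ostringstream _out;
};

struct Pair { int first, second; };

// Emit one object with two fields, following the nesting described above.
void write_pair(TraceWriter& w, const Pair& p) {
    w.beginObject("Pair");
    w.beginField("first");  w.onValue(p.first);  w.endField();
    w.beginField("second"); w.onValue(p.second); w.endField();
    w.endObject();
}
```

After write_pair(w, Pair{1, 2}), w.trace() lists beginObject, then a beginField / onValue / endField triple per field, then endObject. A real back-end would turn each of these events into text, XML elements or bytes.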

Pointers are particular values for serialization. First, circularities induced by the use of pointers must be supported. This means that a pointer to an object must be stored in a map and, when an already encountered object is referenced or serialized again, a reference to it must be serialized to ensure that the same structure is rebuilt in memory. Notice that each object passed to Serializer::onValue() is passed by reference, whose address can be taken with "&", and that Serializer::beginCompound() and Serializer::beginObject() get a pointer to their object. The current implementation of serialization is not able to decide whether an object is referenced by a pointer or not: the serializer implementation has to provide an identifier for each serialized object.
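
The map-based handling of circularities can be sketched independently of ELM. In the standalone example below, the class LinkingWriter, the Node structure and the "obj#" / "ref#" output notation are hypothetical; the point is only that each address receives an identifier on its first visit and that later visits emit a reference instead of the content.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

struct Node {
    int value;
    Node* next;
};

// Hypothetical writer that gives each object an identifier the first time
// it is met and emits only a reference when it is met again.
class LinkingWriter {
public:
    void write(const Node* n) {
        if (n == nullptr) { _out << "null"; return; }
        auto found = _ids.find(n);
        if (found != _ids.end()) {          // already serialized: reference only
            _out << "ref#" << found->second;
            return;
        }
        int id = static_cast<int>(_ids.size());
        _ids[n] = id;                       // record before recursing (cycles)
        _out << "obj#" << id << '(' << n->value << ',';
        write(n->next);
        _out << ')';
    }
    std::string str() const { return _out.str(); }
private:
    std::map<const void*, int> _ids;
    std::ostringstream _out;
};

// Serialize a two-node cycle: a -> b -> a.
std::string write_cycle() {
    Node a{1, nullptr}, b{2, nullptr};
    a.next = &b;
    b.next = &a;
    LinkingWriter w;
    w.write(&a);
    return w.str();
}
```

Here write_cycle() returns "obj#0(1,obj#1(2,ref#0))": the second visit of the first node is replaced by a reference, so the traversal terminates and a reader has enough information to rebuild the cycle.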

The work of Serializer is summarized below:

Writing an unserializer

Writing an unserializer is relatively easy in ELM: one has just to implement the Unserializer interface:

#include <elm/serial2/Unserializer.h>
class MyUnserializer: public elm::serial2::Unserializer {
public:
    ...
};

Depending on the data to unserialize, one or several functions of this interface will be called.

Simple types (boolean, integer, float, string) are read by a call to one of the Unserializer::onValue() methods, with the corresponding type for its single parameter. The parameter is passed by reference to let the unserializer change the value.

Unserializing an enumeration value is a bit more complex: the Unserializer::onEnum() function receives a descriptor of the enumerated type (of type elm::rtti::Enum) and must return the enumerated value as an int.

Collection or array types start with a call to Unserializer::beginCompound() and terminate with a call to Unserializer::endCompound(). If required, the unserialized data type can call Unserializer::countItems() to get the number of items to unserialize (this is used by some fixed-size data types like AllocArray). Then, the current item is unserialized using a call to the method corresponding to its type. This is followed by a call to Unserializer::nextItem() to move to the next item.
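
The call sequence on the collection side can be sketched as follows. This is a standalone illustration, not ELM's Unserializer: the class ListReader reads from an in-memory vector, and the bool results of beginCompound() and nextItem(), used here to detect the end of the collection, are assumptions of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical reader mimicking the call sequence described above.
class ListReader {
public:
    explicit ListReader(std::vector<int> data): _data(std::move(data)), _pos(0) { }
    bool beginCompound() { _pos = 0; return !_data.empty(); }   // true if a first item exists
    std::size_t countItems() const { return _data.size(); }
    void onValue(int& v) { v = _data[_pos]; }                   // read the current item
    bool nextItem() { return ++_pos < _data.size(); }           // true if another item follows
    void endCompound() { }
private:
    std::vector<int> _data;
    std::size_t _pos;
};

// Rebuild a vector by following the protocol.
std::vector<int> read_all(ListReader& r) {
    std::vector<int> result;
    bool more = r.beginCompound();
    result.reserve(r.countItems());     // optional: pre-size the container
    while (more) {
        int v = 0;
        r.onValue(v);
        result.push_back(v);
        more = r.nextItem();
    }
    r.endCompound();
    return result;
}
```

A growable container can ignore countItems(), whereas a fixed-size one would use it to allocate its storage before reading the first item.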

Object unserialization is surrounded by Unserializer::beginObject() and Unserializer::endObject() calls. The first parameter of Unserializer::beginObject() is a reference to the actual class of the object. Then each unserialized field is surrounded by Unserializer::beginField() and Unserializer::endField(), and the field value is read using the other unserialization methods.

Pointers are particular values for unserialization. The unserializer must maintain an identifier / reference system in order to ensure that the structure that was serialized is rebuilt identically in memory. Notice also that each object passed to Unserializer::onValue() is passed by reference, whose address can be taken with "&", and that Unserializer::beginCompound() and Unserializer::beginObject() get a pointer to their object.

The work of Unserializer is summarized below:

Low-level of the serialization module

Basically, serialization and unserialization follow mostly the same process. Therefore, in the following, only the serialization process is described, but it applies symmetrically to unserialization.

A serialization starts with the following command:

S << D;

Where S is the Serializer object and D the data to serialize. At this point, operator<< is overloaded to accept a first argument of type Serializer and a second argument of parametric type T, the type of D. It simply calls the function __serialize(S, D).

This function chooses which serialization process to perform. According to the type T, it selects from_class<T>, from_enum<T>, from_type<T> or the pointer specialization. In the case of a pointer, __serialize directly calls the function Serializer::onPointer(). A first way of specializing the serialization process is to provide your own version of __serialize for your own type. This is the case of the collection data types, which specialize their serialization as compounds by overloading __serialize.

The from_type<T> specialization calls the Serializer::onValue() method (C++ overload resolution selects the right one).

The from_enum<T> specialization calls the Serializer::onEnum() method but also has to select the enumeration type descriptor. This is obtained using elm::rtti::type_of<T>().

The from_class<T> specialization has to perform several actions (Serializer::beginObject(), Serializer::beginField(), Serializer::endField() and Serializer::endObject()) but, because of inheritance in C++, it mainly has to find the actual type of the object. This is enabled by the SERIALIZABLE macro, which adds a virtual function named __getSerialClass().
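
The selection of a strategy from the static type T can be sketched with standard type traits. The code below is a standalone illustration and not ELM's implementation: the tag types, strategy_of, describe and dispatch are hypothetical, and the classification relies only on std::is_enum and std::is_class, with a separate overload for pointers.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Tags naming the three strategies (illustration only, not ELM's definitions).
struct from_class_tag { };
struct from_enum_tag { };
struct from_type_tag { };

// Pick a tag from the static type T.
template <class T>
struct strategy_of {
    typedef typename std::conditional<
        std::is_enum<T>::value, from_enum_tag,
        typename std::conditional<
            std::is_class<T>::value, from_class_tag, from_type_tag
        >::type
    >::type type;
};

// One overload per strategy; a real back-end would call the serializer here.
template <class T> std::string describe(const T&, from_class_tag) { return "class"; }
template <class T> std::string describe(const T&, from_enum_tag)  { return "enum"; }
template <class T> std::string describe(const T&, from_type_tag)  { return "value"; }

// Entry point for non-pointer data: the tag selects the overload at compile time.
template <class T>
std::string dispatch(const T& v) {
    return describe(v, typename strategy_of<T>::type());
}

// Entry point for pointers: more specialized, so it wins overload resolution.
template <class T>
std::string dispatch(T*) {
    return "pointer";
}

struct SampleClass { int field; };
enum SampleEnum { FIRST, SECOND };
```

With these definitions, dispatch applied to a SampleClass, to SECOND, to an int and to the address of an int selects the class, enum, value and pointer paths respectively. The choice is made entirely at compile time, which is why finding the dynamic type of an object needs an additional virtual function.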
