A transformation graph describes how to transform data from one form to another. A graph consists of at least three kinds of elements: Nodes (which perform various simple transformations), Edges (which connect Nodes and pass data between them) and Metadata (which describes the structure of the data at every Node and Edge).
The inputs to the transformation process are Input Nodes (those that have no incoming Edges). At the other end are Output Nodes, which store the results of the transformation, for example in data files or a database.
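To make the Input/Output Node distinction concrete, here is a minimal Python sketch that classifies nodes from a list of edges. The node names and edges are hypothetical. Input nodes are those with no incoming edges, as stated above; treating output nodes as those with no outgoing edges is an assumption of this sketch.

```python
def classify_nodes(nodes, edges):
    """Split nodes into input nodes (no incoming edges) and output nodes
    (assumed here: no outgoing edges). `edges` is a list of (from, to) pairs."""
    has_incoming = {dst for _, dst in edges}
    has_outgoing = {src for src, _ in edges}
    inputs = [n for n in nodes if n not in has_incoming]
    outputs = [n for n in nodes if n not in has_outgoing]
    return inputs, outputs


# Hypothetical three-node graph: read a feed, filter it, write it to a file.
nodes = ["Read feed", "Filter", "Write file"]
edges = [("Read feed", "Filter"), ("Filter", "Write file")]

inputs, outputs = classify_nodes(nodes, edges)
print("Input nodes:", inputs)
print("Output nodes:", outputs)
```

Note that a node with no edges at all is reported as both an input and an output node.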
Example of a transformation graph
<?xml version="1.0" encoding="UTF-8"?>
<!--
How this reads:
1. Create a new public Transformation Graph, licensed under the GPL.
2. Add a FEED_READER node named "Read digg feed" to this Graph.
3. Set the URL of the RSS or Atom document to read (http://www.digg.com/rss/index.xml).
-->
<Graph
access = "public"
author = "MixDem dev"
license = "GPL"
>
<Element>
<Node id="Read digg feed" type="FEED_READER" url="http://www.digg.com/rss/index.xml" />
</Element>
</Graph>
More information about the XML Graph definition can be found in the Tag Reference (section <Graph/>).
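Because a graph definition is plain XML, it can be inspected with any XML parser. The following sketch uses Python's standard `xml.etree.ElementTree` module to load the example graph above (without its explanatory comment) and print each Node's id, type and remaining attributes. It only reads tags and attributes that appear in the example and is not part of MixDem itself.

```python
import xml.etree.ElementTree as ET

# The example graph from above, minus its explanatory comment.
# The XML declaration must be the very first thing in the document.
GRAPH_XML = """<?xml version="1.0" encoding="UTF-8"?>
<Graph access="public" author="MixDem dev" license="GPL">
  <Element>
    <Node id="Read digg feed" type="FEED_READER" url="http://www.digg.com/rss/index.xml" />
  </Element>
</Graph>
"""

# Parse from bytes so the encoding declaration is honoured.
graph = ET.fromstring(GRAPH_XML.encode("utf-8"))
print("Graph attributes:", graph.attrib)

# .iter() walks the whole tree, so Node elements are found at any depth.
for node in graph.iter("Node"):
    extra = {k: v for k, v in node.attrib.items() if k not in ("id", "type")}
    print(f"{node.get('id')} ({node.get('type')}): {extra}")
```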
Nodes (or Components) are the most important elements of a graph. They can read, write, or transform data from one form to another. Every component has input ports, output ports, or both, and reads data from or writes data to other components through them. A component at the start of the graph has only output ports, and a component at the end of the graph has only input ports. Two components are connected by an Edge, which transmits data in one direction, from an output port to an input port; every Edge has an internal buffer that stores the data in transit.
<Node id="Read feedburner feed" type="FEED_READER" url="http://feeds.feedburner.com/BurnThisRSS2" path="channel" />
<Node id="Read CSV file" type="CSV_READER" url="http://www.abc.virginia.gov/Pricelist/text/disjan08.csv" quoted="true" delimiter="," charset="ISO-8859-15" newline="\n" skiprows="5" maxrows="15" />
<Node id="Read from mysql table" type="SQL_READER" dsn="mysql://root:root@127.0.0.1/information_schema" sqlQuery="SELECT * FROM TABLES" />
More information about the XML Node definition can be found in the Tag Reference (section <Node/>).
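The CSV_READER attributes above map naturally onto a standard CSV parser. The sketch below approximates them with Python's `csv` module; the interpretation (`skiprows` drops the first N lines, `maxrows` caps the number of records returned, `quoted` enables double-quote handling) is an assumption based on the attribute names, not MixDem's documented behaviour. The `read_csv` function and the sample data are hypothetical, and the data is inlined so the example runs offline.

```python
import csv
from itertools import islice


def read_csv(raw, delimiter=",", quoted=True, charset="ISO-8859-15",
             newline="\n", skiprows=0, maxrows=None):
    """Hypothetical approximation of the CSV_READER attributes."""
    text = raw.decode(charset)                  # charset
    # Simplification: splitting on `newline` does not support quoted
    # fields that themselves contain line breaks.
    lines = text.split(newline)[skiprows:]      # newline + skiprows
    quoting = csv.QUOTE_MINIMAL if quoted else csv.QUOTE_NONE
    reader = csv.reader(lines, delimiter=delimiter, quoting=quoting)
    rows = (row for row in reader if row)       # ignore blank lines
    return list(islice(rows, maxrows))          # maxrows (None = no limit)


# Made-up sample: five preamble lines, then a header and two records.
raw = ("skipped line 1\nskipped line 2\nskipped line 3\n"
       "skipped line 4\nskipped line 5\n"
       'code,name,price\n101,"Item, large",9.99\n102,Item small,4.50\n'
       ).encode("iso-8859-15")

rows = read_csv(raw, skiprows=5, maxrows=15)
for row in rows:
    print(row)
```

With `quoted=False` the quote characters become ordinary data, so a field such as `"Item, large"` is split at its embedded comma.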
Edges represent one-directional data flows between components. An Edge is always bound to the components' ports. Components read data records from their input ports and write them to their output ports.
Each Edge has associated metadata that describes the structure of the data coming from the port (for an input port) or of the data that can be sent through the port (for an output port).
<Edge from="Node 1" to="Node 2" metadata="mdata" />
More information about the XML Edge definition can be found in the Tag Reference (section <Edge/>).
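The port-and-buffer model described above can be sketched with a bounded queue standing in for an Edge's internal buffer. In this hypothetical Python example, a producer component writes records to its output port, the Edge buffers them, and a consumer component reads them from its input port on another thread. The `Edge` class, the buffer size and the `END` end-of-data marker are assumptions for illustration, not MixDem internals.

```python
import queue
import threading

END = object()  # hypothetical end-of-data marker


class Edge:
    """Hypothetical Edge: carries records one way, from an output port to an
    input port, through a bounded internal buffer."""

    def __init__(self, from_node, to_node, metadata, buffer_size=4):
        self.from_node = from_node
        self.to_node = to_node
        self.metadata = metadata
        self._buffer = queue.Queue(maxsize=buffer_size)

    def write(self, record):
        """Output-port side: blocks while the buffer is full."""
        self._buffer.put(record)

    def read(self):
        """Input-port side: blocks while the buffer is empty."""
        return self._buffer.get()


def producer(out_edge):
    # Upstream component: writes ten records, then signals end of data.
    for i in range(10):
        out_edge.write({"link": f"http://example.com/{i}", "title": f"Item {i}"})
    out_edge.write(END)


def consumer(in_edge, sink):
    # Downstream component: reads until the end-of-data marker arrives.
    while (record := in_edge.read()) is not END:
        sink.append(record["title"])


edge = Edge("Node 1", "Node 2", "mdata")
received = []
threads = [
    threading.Thread(target=producer, args=(edge,)),
    threading.Thread(target=consumer, args=(edge, received)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(received)
```

Because the buffer holds only four records, the producer repeatedly blocks until the consumer catches up, which is the point of giving each Edge its own buffer.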
Metadata describes the semantics of a data record, that is, how a data record is built from elementary data types (string, date, integer, numeric, long, decimal, byte, ...).
<Metadata>
  <DataRecord name="rss_metadata">
    <DataField name="link" type="string" />
    <DataField name="title" type="string" trim="true" />
    <DataField name="title" rename="description" type="string" trim="true" size="200" />
  </DataRecord>
</Metadata>
More information about the XML Metadata definition can be found in the Tag Reference (section <Metadata/>).
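To illustrate how a DataRecord definition could be applied, the sketch below parses the `rss_metadata` example with `xml.etree.ElementTree` and uses it to shape a raw record. The semantics are assumptions inferred from the attribute names: `name` selects the source field, `rename` (if present) gives the output field name, `trim="true"` strips surrounding whitespace, and `size` caps the string length. The `apply_record` helper and the sample record are hypothetical, and `type` is ignored because every field in the example is a string.

```python
import xml.etree.ElementTree as ET

METADATA_XML = """
<Metadata>
  <DataRecord name="rss_metadata">
    <DataField name="link" type="string" />
    <DataField name="title" type="string" trim="true" />
    <DataField name="title" rename="description" type="string" trim="true" size="200" />
  </DataRecord>
</Metadata>
"""


def apply_record(record_def, raw):
    """Hypothetical: build an output dict from `raw` following a DataRecord."""
    out = {}
    for field in record_def.iter("DataField"):
        value = str(raw.get(field.get("name"), ""))
        if field.get("trim") == "true":
            value = value.strip()
        if field.get("size") is not None:
            value = value[: int(field.get("size"))]  # assumed: max length
        # The output key is `rename` when given, otherwise the source name.
        out[field.get("rename", field.get("name"))] = value
    return out


metadata = ET.fromstring(METADATA_XML.strip())
rss_record = metadata.find("DataRecord[@name='rss_metadata']")

raw = {"link": "http://example.com/story", "title": "  A made-up headline  "}
print(apply_record(rss_record, raw))
```

Under these assumptions the same source field (`title`) feeds two output fields, which is how the example's second `title` entry becomes `description`.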