MixDEM Programmer's Reference Guide

MixDEM as an embedded PHP Framework

Transformation graph

A transformation is defined as a graph, which contains:
  • Metadata descriptions
  • Sequences :
    • Nodes/Transformation Components
    • Edges
A Transformation Graph can be assembled by:
  • Reading/deserializing from XML
  • Instantiating PHP classes :
    • Transformation Graph
    • DataRecordMetadata, DataFieldMetadata
    • Nodes ( DataReader, DataWriter, Filter, Sort... )

The Transformation Graph 'ETL_Graph' class

The transformation graph is both an abstraction and a class that performs the processing. The graph keeps track of all Nodes, Edges, and metadata objects. It is accompanied by a class that reads a graph definition from an XML file and builds everything dynamically.

The ETL_Graph class includes these important methods:

  • init : initializes all of the graph's components (Nodes and Edges)
  • newNode : creates a new node in the graph and returns it
  • link : creates an edge (or link) between two nodes
  • dump : prints debug information for all of the graph's components
  • run : starts data processing
  • abort : interrupts processing in an emergency
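
To make the method list concrete, here is a minimal, self-contained sketch of the lifecycle those methods suggest: create nodes, link them, initialize, then run. The SketchGraph class and its node callables are hypothetical stand-ins written for illustration. They are not MixDEM's ETL_Graph implementation, and the real signatures of newNode, link, init and run may differ.

```php
<?php
// Hypothetical stand-in for illustration only; NOT MixDEM's ETL_Graph.
final class SketchGraph
{
    /** @var array<string, callable> node name => function(array $rows): array */
    private array $nodes = [];
    /** @var array<int, array{0: string, 1: string}> list of [from, to] pairs */
    private array $edges = [];
    private bool $ready = false;

    // Register a node under a unique name (assumed analogue of newNode).
    public function newNode(string $name, callable $work): void
    {
        $this->nodes[$name] = $work;
    }

    // Connect two nodes by name (assumed analogue of link).
    public function link(string $from, string $to): void
    {
        $this->edges[] = [$from, $to];
    }

    // Check that every edge refers to known nodes (assumed analogue of init).
    public function init(): bool
    {
        foreach ($this->edges as [$from, $to]) {
            if (!isset($this->nodes[$from], $this->nodes[$to])) {
                return $this->ready = false;
            }
        }
        return $this->ready = true;
    }

    // Run the start node, then push its output along each outgoing edge.
    // This sketch assumes the graph has no cycles.
    public function run(string $start): array
    {
        if (!$this->ready) {
            throw new LogicException('call init() before run()');
        }
        $results = [$start => ($this->nodes[$start])([])];
        $queue = [$start];
        while ($queue) {
            $current = array_shift($queue);
            foreach ($this->edges as [$from, $to]) {
                if ($from === $current) {
                    $results[$to] = ($this->nodes[$to])($results[$from]);
                    $queue[] = $to;
                }
            }
        }
        return $results;
    }
}

// A reader node that emits two made-up records, and a node that trims titles.
$graph = new SketchGraph();
$graph->newNode('read', fn(array $in): array => [['title' => ' a '], ['title' => 'b']]);
$graph->newNode('trim', fn(array $in): array => array_map(
    fn(array $row): array => ['title' => trim($row['title'])],
    $in
));
$graph->link('read', 'trim');

$ok = $graph->init();
$results = $ok ? $graph->run('read') : [];
```

The point of the sketch is the ordering: the graph is fully described first, init validates it, and only then does run move data along the edges.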

Assembling a graph from components

Sample: we need to retrieve an RSS feed from news.google.com and save it to two CSV files after applying some transformations. This requires three nodes (components), one FEED_READER and two CSV_WRITERs, plus two edges to propagate the FEED_READER output to the CSV_WRITERs.
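
The fan-out in this sample, one set of records written to two CSV outputs, can be sketched in plain PHP with the built-in fputcsv function. This only illustrates the data flow and is not how MixDEM's CSV_WRITER is implemented. The writeCsv helper and the sample records are hypothetical, the output goes to in-memory streams rather than files, and the ";" and "#" delimiters mirror the XML example later in this section.

```php
<?php
// Hypothetical helper: serialize records to CSV text with a given delimiter.
function writeCsv(array $records, string $delimiter): string
{
    $stream = fopen('php://memory', 'w+');
    foreach ($records as $record) {
        // Enclosure and escape are passed explicitly to keep behavior
        // stable across PHP versions.
        fputcsv($stream, array_values($record), $delimiter, '"', '\\');
    }
    rewind($stream);
    $csv = stream_get_contents($stream);
    fclose($stream);
    return $csv;
}

// Made-up records standing in for what a feed reader might emit.
$records = [
    ['title' => 'First',  'link' => 'http://example.com/1'],
    ['title' => 'Second', 'link' => 'http://example.com/2'],
];

// Same records, two outputs with different delimiters.
$csv1 = writeCsv($records, ';');
$csv2 = writeCsv($records, '#');
```

In the framework, the two edges play the role of the two writeCsv calls: each one carries the reader's output to one writer.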

There are two ways to assemble the Transformation Graph: in PHP or in XML.

1 - Graph definition in PHP

<?php
    # include the ETL configuration
    include "../_LIB/ETL.config.php";

    # set debug mode
    $GLOBAL_DEBUG_MODE_DISPLAY_INFO  = 0;
    $GLOBAL_DEBUG_MODE_DISPLAY_DEBUG = 0;

    # include the APP and ETL main classes
    include $GLOBAL_PATH_KERNEL . "CLASS/APP.php";
    include $GLOBAL_PATH_KERNEL . "CLASS/ETL.php";

    # get the ETL singleton
    $etl   = $__S->get( 'ETL' );
    $graph = $etl->newGraph( "Graph test" );
    $node  = $graph->newNode( "Read news RSS" );

    $node->setComponent(
        array(
            "type"    => "FEED_READER",
            "attribs" => array( "url" => "http://feeds.feedburner.com/BurnThisRSS2" )
        )
    );

    # set the named node as the break node, with an offset and a number of rows
    $graph->setBreaknode( "Read news RSS", 1, 2 );

    # run the graph
    $graph->run( );

    # render the node result
    $node->component->renderDEFAULT( );
?>

 

2 - Loading graph definition from XML

This example shows how to save some work by loading the graph definition from an XML file:

<?php 
    # include the ETL configuration
    include "../_LIB/ETL.config.php";

    # set debug mode
    $GLOBAL_DEBUG_MODE_DISPLAY_INFO  = 0;
    $GLOBAL_DEBUG_MODE_DISPLAY_DEBUG = 0;

    # include the APP and ETL main classes
    include $GLOBAL_PATH_KERNEL . "CLASS/APP.php";
    include $GLOBAL_PATH_KERNEL . "CLASS/ETL.php";

    # get the ETL singleton
    $etl   = $__S->get( 'ETL' );
    $graph = $etl->newGraph( "TEST loading XML config" );

    # load the graph definition from the XML file
    $graph->loadXMLFile( "./graph.xml" );

    # stop if the graph's components cannot be initialized
    if( ! $graph->init( ) )
        die( "Graph initialization failed" );

    $graph->run( ); 
?>


This is the content of the XML file "graph.xml", which describes the graph's topology:

 
<?xml version="1.0" encoding="UTF-8"?>
<Graph>
    <Element>
        <Node id="READ GOOGLE FEED" type="FEED_READER" url="http://news.google.com/nwshp?hl=en&amp;tab=wn&amp;output=rss" />
        <Node id="WRITE CVS FILE 1" type="CSV_WRITER"  url="news1.csv" newline="\n" quoted="true" delimiter=";" charset="UTF-8" />
        <Node id="WRITE CVS FILE 2" type="CSV_WRITER"  url="news2.csv" newline="\n" delimiter="#" />

        <Edge from="READ GOOGLE FEED" to="WRITE CVS FILE 1" metadata="rss.news.1" />
        <Edge from="READ GOOGLE FEED" to="WRITE CVS FILE 2" metadata="rss.news.2" />
    </Element>                  

    <Metadata>
        <DataRecord name="rss.news.1">
            <DataField name="title"                            type="string" />
            <DataField name="title"       rename="description" type="string" />
            <DataField name="link"                             type="string" />
            <DataField name="description" rename="content"     type="string" />
        </DataRecord>

        <DataRecord name="rss.news.2">
            <DataField name="title" type="string" size="200" trim="true" />
            <DataField name="link"  type="string" size="100" />
        </DataRecord>
    </Metadata>
</Graph>
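
Before handing a file like this to the framework, it can help to sanity-check the topology. The following sketch uses PHP's SimpleXML extension (not MixDEM's own loader) to confirm that every Edge refers to a declared Node and a declared DataRecord. The checkGraphXml function is hypothetical, and the inline XML is a shortened copy of the file above.

```php
<?php
// Hypothetical checker built on SimpleXML; not part of MixDEM.
function checkGraphXml(string $xml): array
{
    $graph = simplexml_load_string($xml);
    if ($graph === false) {
        return ['XML could not be parsed'];
    }

    // Collect declared node ids and metadata record names.
    $nodeIds = [];
    foreach ($graph->Element->Node as $node) {
        $nodeIds[(string) $node['id']] = true;
    }
    $recordNames = [];
    foreach ($graph->Metadata->DataRecord as $record) {
        $recordNames[(string) $record['name']] = true;
    }

    // Report every edge that points at something undeclared.
    $errors = [];
    foreach ($graph->Element->Edge as $edge) {
        foreach (['from', 'to'] as $end) {
            $id = (string) $edge[$end];
            if (!isset($nodeIds[$id])) {
                $errors[] = "Edge $end refers to unknown node '$id'";
            }
        }
        $meta = (string) $edge['metadata'];
        if (!isset($recordNames[$meta])) {
            $errors[] = "Edge refers to unknown metadata '$meta'";
        }
    }
    return $errors;
}

$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<Graph>
    <Element>
        <Node id="READ GOOGLE FEED" type="FEED_READER" url="http://news.google.com/nwshp?hl=en&amp;tab=wn&amp;output=rss" />
        <Node id="WRITE CVS FILE 1" type="CSV_WRITER" url="news1.csv" delimiter=";" />
        <Edge from="READ GOOGLE FEED" to="WRITE CVS FILE 1" metadata="rss.news.1" />
    </Element>
    <Metadata>
        <DataRecord name="rss.news.1">
            <DataField name="title" type="string" />
        </DataRecord>
    </Metadata>
</Graph>
XML;

// An empty list means every edge resolved to declared nodes and metadata.
$errors = checkGraphXml($xml);
```

Because node ids are free-form strings with spaces, a typo in an Edge's from or to attribute is easy to make; a check like this reports it before the graph is run.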