MixDEM components - Readers

Readers are initial components of graph that reads data from input source. The source can be for example a file placed on local disk, ftp, ldap, http or database tables, etc.


XML Reader

Component type : XML_READER.
XML_READER can access and extract data from XML document or URL on the web. This data can then be converted into supported type with the results viewer or merged with other data in your Project.

Attribute Description Default
id component node identification
type component type XML_READER
url URL of the XML document to read. The list of supported protocols can be found in List of Supported Protocols/Wrappers on PHP documentation.
path You can extract just a portion of the data by listing the nested XML elements, separating each with a dot ("."). If path is not given XML_READER will start reading the XML root tag.
charset Convert character encoding of the data source. UTF-8
skiprows specifies how many records/rows should be skipped. 0
maxrows specifies how many records/rows should be read.

Example:
<Node id="Read xml - Yahoo! Answers" type="XML_READER" 
      url  = "http://answers.yahooapis.com/AnswersService/V1/questionSearch?appid=YahooDemo&amp;query=mars" 
      path = "Question" 
/>

Top

JSON Reader

Component type : JSON_READER.
JSON_READER can access and extract data from JSON (Javascript Object Notation) data sources. This data can then be converted into supported type with the results viewer or merged with other data in your Project.

Attribute Description Default
id component node identification
type component type JSON_READER
url URL of the JSON document to read. The list of supported protocols can be found in List of Supported Protocols/Wrappers on PHP documentation.
path You can extract just a portion of the data by listing the nested JSON elements, separating each with a dot ("."). If path is not given JSON_READER will start reading the JSON root tag.
charset Convert character encoding of the data source. UTF-8
skiprows specifies how many records/rows should be skipped. 0
maxrows specifies how many records/rows should be read.

Example:
<Node id="Read xml - Yahoo! Answers" type="JSON_READER" 
      url  = "http://answers.yahooapis.com/AnswersService/V1/questionSearch?appid=YahooDemo&amp;query=mars" 
      path = "Question" 
/>

Top

FEED Reader

Component type : FEED_READER.
FEED_READER can access and extract data from feeds in RSS, Atom, and RDF formats. This data can then be converted into supported type with the results viewer or merged with other data in your Project.

Attribute Description Default
id component node identification
type component type FEED_READER
url URL of the RSS or ATOM document to read. The list of supported protocols can be found in List of Supported Protocols/Wrappers on PHP documentation.
path You can extract just a portion of the data by listing the nested channel.item (items) elements or only channel elements. channel.item
charset Convert character encoding of the data source. UTF-8
skiprows specifies how many records/rows should be skipped. 0
maxrows specifies how many records/rows should be read.

Examples:
<Node id="Read channel infos fo feedburner feed" type="FEED_READER" 
     url="http://feeds.feedburner.com/BurnThisRSS2" 
    path="channel" 
/>
<Node id="Read digg items feed" type="FEED_READER" 
     url="http://www.digg.com/rss/index.xml" 
    path="channel.item" 
/>

Top

HTTP Reader

Component type : HTPP_READER.
HTPP_READER this module fetches the source of a given web page as a string. This data can then be converted into supported type with the results viewer or merged with other data in your Project. To use Fetch Page module, first enter the URL of the site you want. The module will read the page's source as a string. You can choose to only get part of the page by setting the starting point using the 'Cut content from' field and the end point by using the 'to' field. Only the part of the page between these two strings will be returned.
The output metadata of the HTPP_READER is a one string field named 'content' and can be used on Metadata semantics like :

<Metadata>
    <DataRecord name="mdata">
        <DataField name="content" type="string" /> 
    </DataRecord>
</Metadata>

AttributeDescriptionDefault
id component node identification
type component type HTPP_READER
url Url of web page to fetch. The list of supported protocols can be found in List of Supported Protocols/Wrappers on PHP documentation.
from Cut content from
to Cut content to
split Split the string from from to to by string
reSplit Split again each result
stripTags Strip HTML tags from the result FALSE
skiprows specifies how many records/rows should be skipped. 0
maxrows specifies how many records/rows should be read.

Example:
<Node id="Fetch hostgator" type="HTTP_READER" 
    url   = "http://www.hostgator.com/shared.shtml" 
    from  = "sharedtable" 
    to    = "&lt;/table&gt;" 
    split = "&lt;\/tr&gt;" 
/>

Top

CSV Reader

Component type : CSV_READER.
The comma-separated values (or CSV; also known as a comma-separated list or comma-separated variables) file format is a file type that stores tabular data. CSV is one implementation of a delimited text file, which uses a comma to separate values.

A short CSV example file :

1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""",,4900.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
Start configuring CSV_READER by entering the URL of a CSV file. Next, choose the character that separates fields in each row ...

AttributeDescriptionDefault
id component node identification
type component type CSV_READER
url URL of a CSV file to fetch. The list of supported protocols can be found in List of Supported Protocols/Wrappers on PHP documentation.
quoted field can be quoted by ' or " FALSE
delimiter comma that separate values ";"
charset Convert character encoding of the data source. UTF-8
newline newline (also known as a line break or end-of-line / EOL character) is a special character or sequence of characters signifying the end of a line of text. "\n"
skiprows specifies how many records/rows should be skipped from the source file. Good for handling files where first rows is a header not a real data. 0
maxrows specifies how many records/rows should be read from the source.

Example:
<Node id="Read CSV file" type="CSV_READER" 
    url       = "http://www.abc.virginia.gov/Pricelist/text/disjan08.csv" 
    quoted    = "true" 
    delimiter = "," 
    charset   = "ISO-8859-15" 
    newline   = "\n" 
    skiprows  = "5" 
    maxrows   = "15" 
/>

Top

SQL_READER

Component type : SQL_READER.
SQL_READER component reads data from DB. It first executes specified query on DB and then extracts all the rows returned. MixDEM analyze DB structures and create output metadata automatically.

AttributeDescriptionDefault
id component node identification
type component type SQL_READER
dsn Database Source Names, more commonly seen as the abbreviation, DSN, are data structures used to describe a connection to a database. This DSN will take the form of adapter://user:password@host:port/database so as to completely specify all parameters of the connection.

The list below explains common dsn parameters recognized by SQL_READER :

  • adapter: RDBMS server : ( IBM DB2 'DB2', MySQL 'MYSQL', Microsoft SQL Server 'MSSQL', Oracle 'ORACLE', PostgreSQL 'PGSQL', SQLite 'SQLITE' )
  • user: account identifier for authenticating a connection to the RDBMS server.
  • password: account password credential for authenticating a connection to the RDBMS server.
  • host: a string containing a hostname or IP address of the database server. If the database is running on the same host as the MixDEM, you may use 'localhost' or '127.0.0.1'.
  • port: the port parameter allow you to specify the port to which to connects, to match the port configured on the RDBMS server.
  • database: database instance name on the RDBMS server.

sqlQuery query to be sent to database.
adapter if dsn is not given, you can define adapter here. adapter is the RDBMS server : ( IBM DB2 'DB2', MySQL 'MYSQL', Microsoft SQL Server 'MSSQL', Oracle 'ORACLE', PostgreSQL 'PGSQL', SQLite 'SQLITE' )
hostname if dsn is not given, you can define hostname here. hostname is a string containing a hostname or IP address of the database server.
hostport if dsn is not given, you can define hostport here. hostport is the port parameter allow you to specify the port to which to connects, to match the port configured on the RDBMS server.
username if dsn is not given, you can define username here. username is the account identifier for authenticating a connection to the RDBMS server
password if dsn is not given, you can define password here. password is the account password credential for authenticating a connection to the RDBMS server.
database if dsn is not given, you can define database here. database is the database instance name on the RDBMS server.

fetchSize if dsn is not given, you can define hostname here.number of rows fetched from the DB engine in one step 1000
skiprows specifies how many records/rows should be skipped from the source. 0
maxrows specifies how many records/rows should be read from the source.

SQL_READER can prepares and executes an SQL statement with bound data. The bound data is an list of values to substitute for parameter placeholders in the SQL statement.
Bound data can be stored on the <Attrib> (Node attibute) :
AttributeDescriptionDefault
name name of the argument
value value of name argument
field if no value given and if there are another component connected as input node, SQL_READER will use values of field as value for name argument.

Examples:
<Node id="Read from mysql table" type="SQL_READER" 
     dsn     = "mysql://root:root@127.0.0.1/information_schema" 
    sqlQuery = "SELECT * FROM TABLES" 
/>
<Node id="Read from mysql table"   type="SQL_READER" 
dsn="mysql://root:root@127.0.0.1/db_client"
sqlQuery="SELECT id, email FROM table_client WHERE id = ? and date_add < ? "
>
<Attrib name="b1" value="2" />
<Attrib name="b2" value="2008-04-01 00:00:00" />
</Node>
Note:
SQL_READER can be used as a writer to execute insert, replace or delete SQL statement with bound data, but SQL WRITER is more adapted to deal with thoses operations.

Top