MixDEM help - Components Readers

XML Reader

Component type : XML_READER.
XML_READER can access and extract data from XML document or URL on the web. This data can then be converted into supported type with the results viewer or merged with other data in your Project.

Attribute	Description	Default
`id`	component node identification
`type`	component type	XML_READER
`url`	URL of the XML document to read. The list of supported protocols can be found in List of Supported Protocols/Wrappers on PHP documentation.
`path`	You can extract just a portion of the data by listing the nested XML elements, separating each with a dot ("."). If `path` is not given `XML_READER` will start reading the XML root tag.
`charset`	Convert character encoding of the data source.	UTF-8
`skiprows`	specifies how many records/rows should be skipped.	0
`maxrows`	specifies how many records/rows should be read.

Example:

<Node id="Read xml - Yahoo! Answers" type="XML_READER" 
      url  = "http://answers.yahooapis.com/AnswersService/V1/questionSearch?appid=YahooDemo&amp;query=mars" 
      path = "Question" 
/>

JSON Reader

Component type : JSON_READER.
JSON_READER can access and extract data from JSON (Javascript Object Notation) data sources. This data can then be converted into supported type with the results viewer or merged with other data in your Project.

Attribute	Description	Default
`id`	component node identification
`type`	component type	JSON_READER
`url`	URL of the JSON document to read. The list of supported protocols can be found in List of Supported Protocols/Wrappers on PHP documentation.
`path`	You can extract just a portion of the data by listing the nested JSON elements, separating each with a dot ("."). If `path` is not given `JSON_READER` will start reading the JSON root tag.
`charset`	Convert character encoding of the data source.	UTF-8
`skiprows`	specifies how many records/rows should be skipped.	0
`maxrows`	specifies how many records/rows should be read.

Example:

<Node id="Read xml - Yahoo! Answers" type="JSON_READER" 
      url  = "http://answers.yahooapis.com/AnswersService/V1/questionSearch?appid=YahooDemo&amp;query=mars" 
      path = "Question" 
/>

FEED Reader

Component type : FEED_READER.
FEED_READER can access and extract data from feeds in RSS, Atom, and RDF formats. This data can then be converted into supported type with the results viewer or merged with other data in your Project.

Attribute	Description	Default
`id`	component node identification
`type`	component type	FEED_READER
`url`	URL of the RSS or ATOM document to read. The list of supported protocols can be found in List of Supported Protocols/Wrappers on PHP documentation.
`path`	You can extract just a portion of the data by listing the nested channel.item (items) elements or only channel elements.	channel.item
`charset`	Convert character encoding of the data source.	UTF-8
`skiprows`	specifies how many records/rows should be skipped.	0
`maxrows`	specifies how many records/rows should be read.

Examples:

<Node id="Read channel infos fo feedburner feed" type="FEED_READER" 
     url="http://feeds.feedburner.com/BurnThisRSS2" 
    path="channel" 
/>

<Node id="Read digg items feed" type="FEED_READER" 
     url="http://www.digg.com/rss/index.xml" 
    path="channel.item" 
/>

HTTP Reader

Component type : HTPP_READER.
HTPP_READER this module fetches the source of a given web page as a string. This data can then be converted into supported type with the results viewer or merged with other data in your Project. To use Fetch Page module, first enter the URL of the site you want. The module will read the page's source as a string. You can choose to only get part of the page by setting the starting point using the 'Cut content from' field and the end point by using the 'to' field. Only the part of the page between these two strings will be returned.
The output metadata of the HTPP_READER is a one string field named 'content' and can be used on Metadata semantics like :

<Metadata>
    <DataRecord name="mdata">
        <DataField name="content" type="string" /> 
    </DataRecord>
</Metadata>

Attribute	Description	Default
`id`	component node identification
`type`	component type	HTPP_READER
`url`	Url of web page to fetch. The list of supported protocols can be found in List of Supported Protocols/Wrappers on PHP documentation.
`from`	Cut content from
`to`	Cut content to
`split`	Split the string from `from` to `to` by string
`reSplit`	Split again each result
`stripTags`	Strip HTML tags from the result	FALSE
`skiprows`	specifies how many records/rows should be skipped.	0
`maxrows`	specifies how many records/rows should be read.

Example:

<Node id="Fetch hostgator" type="HTTP_READER" 
    url   = "http://www.hostgator.com/shared.shtml" 
    from  = "sharedtable" 
    to    = "&lt;/table&gt;" 
    split = "&lt;\/tr&gt;" 
/>

CSV Reader

Component type : CSV_READER.
The comma-separated values (or CSV; also known as a comma-separated list or comma-separated variables) file format is a file type that stores tabular data. CSV is one implementation of a delimited text file, which uses a comma to separate values.

A short CSV example file :

1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""",,4900.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00

Start configuring CSV_READER by entering the URL of a CSV file. Next, choose the character that separates fields in each row ...

Attribute	Description	Default
`id`	component node identification
`type`	component type	CSV_READER
`url`	URL of a CSV file to fetch. The list of supported protocols can be found in List of Supported Protocols/Wrappers on PHP documentation.
`quoted`	field can be quoted by ' or "	FALSE
`delimiter`	comma that separate values	";"
`charset`	Convert character encoding of the data source.	UTF-8
`newline`	`newline` (also known as a line break or end-of-line / EOL character) is a special character or sequence of characters signifying the end of a line of text.	"\n"
`skiprows`	specifies how many records/rows should be skipped from the source file. Good for handling files where first rows is a header not a real data.	0
`maxrows`	specifies how many records/rows should be read from the source.

Example:

<Node id="Read CSV file" type="CSV_READER" 
    url       = "http://www.abc.virginia.gov/Pricelist/text/disjan08.csv" 
    quoted    = "true" 
    delimiter = "," 
    charset   = "ISO-8859-15" 
    newline   = "\n" 
    skiprows  = "5" 
    maxrows   = "15" 
/>

SQL_READER

Component type : SQL_READER.
SQL_READER component reads data from DB. It first executes specified query on DB and then extracts all the rows returned. MixDEM analyze DB structures and create output metadata automatically.

Attribute	Description	Default
`id`	component node identification
`type`	component type	SQL_READER
`dsn`	Database Source Names, more commonly seen as the abbreviation, DSN, are data structures used to describe a connection to a database. This DSN will take the form of `adapter://user:password@host:port/database` so as to completely specify all parameters of the connection. The list below explains common dsn parameters recognized by `SQL_READER` : `adapter`: RDBMS server : ( IBM DB2 'DB2', MySQL 'MYSQL', Microsoft SQL Server 'MSSQL', Oracle 'ORACLE', PostgreSQL 'PGSQL', SQLite 'SQLITE' ) `user`: account identifier for authenticating a connection to the RDBMS server. `password`: account password credential for authenticating a connection to the RDBMS server. `host`: a string containing a hostname or IP address of the database server. If the database is running on the same host as the MixDEM, you may use 'localhost' or '127.0.0.1'. `port`: the port parameter allow you to specify the port to which to connects, to match the port configured on the RDBMS server. `database`: database instance name on the RDBMS server.
`sqlQuery`	query to be sent to database.
`adapter`	if `dsn` is not given, you can define `adapter` here. `adapter` is the RDBMS server : ( IBM DB2 'DB2', MySQL 'MYSQL', Microsoft SQL Server 'MSSQL', Oracle 'ORACLE', PostgreSQL 'PGSQL', SQLite 'SQLITE' )
`hostname`	if `dsn` is not given, you can define `hostname` here. `hostname` is a string containing a hostname or IP address of the database server.
`hostport`	if `dsn` is not given, you can define `hostport` here. `hostport` is the port parameter allow you to specify the port to which to connects, to match the port configured on the RDBMS server.
`username`	if `dsn` is not given, you can define `username` here. `username` is the account identifier for authenticating a connection to the RDBMS server
`password`	if `dsn` is not given, you can define `password` here. `password` is the account password credential for authenticating a connection to the RDBMS server.
`database`	if `dsn` is not given, you can define `database` here. `database` is the database instance name on the RDBMS server.
`fetchSize`	if `dsn` is not given, you can define `hostname` here.number of rows fetched from the DB engine in one step	1000
`skiprows`	specifies how many records/rows should be skipped from the source.	0
`maxrows`	specifies how many records/rows should be read from the source.

SQL_READER can prepares and executes an SQL statement with bound data. The bound data is an list of values to substitute for parameter placeholders in the SQL statement.
Bound data can be stored on the <Attrib> (Node attibute) :

Attribute	Description	Default
`name`	name of the argument
`value`	value of `name` argument
`field`	if no `value` given and if there are another component connected as input node, `SQL_READER` will use values of `field` as value for `name` argument.

Examples:

<Node id="Read from mysql table" type="SQL_READER" 
     dsn     = "mysql://root:root@127.0.0.1/information_schema" 
    sqlQuery = "SELECT * FROM TABLES" 
/>

MixDEM components - Readers

XML Reader

JSON Reader

FEED Reader

HTTP Reader

CSV Reader

SQL_READER