Readers are initial components of graph that reads data from input source. The source can be for example a file placed on local disk, ftp, ldap, http or database tables, etc.
Component type : XML_READER
.
XML_READER
can access and extract data from XML document or URL on the web.
This data can then be converted into supported type with the results viewer
or merged with other data in your Project.
Attribute | Description | Default |
---|---|---|
id |
component node identification | |
type |
component type | XML_READER | url |
URL of the XML document to read. The list of supported protocols can be found in List of Supported Protocols/Wrappers on PHP documentation. |
path |
You can extract just a portion of the data by listing the nested XML
elements, separating each with a dot ("."). If path is not given XML_READER will start reading the XML root tag. |
|
charset |
Convert character encoding of the data source. | UTF-8 |
skiprows |
specifies how many records/rows should be skipped. | 0 |
maxrows |
specifies how many records/rows should be read. |
<Node id="Read xml - Yahoo! Answers" type="XML_READER" url = "http://answers.yahooapis.com/AnswersService/V1/questionSearch?appid=YahooDemo&query=mars" path = "Question" />
Component type : JSON_READER
.
JSON_READER
can access and extract data from JSON (Javascript Object Notation) data sources.
This data can then be converted into supported type with the results viewer
or merged with other data in your Project.
Attribute | Description | Default |
---|---|---|
id |
component node identification | |
type |
component type | JSON_READER | url |
URL of the JSON document to read. The list of supported protocols can be found in List of Supported Protocols/Wrappers on PHP documentation. |
path |
You can extract just a portion of the data by listing the nested JSON
elements, separating each with a dot ("."). If path is not given JSON_READER will start reading the JSON root tag. |
|
charset |
Convert character encoding of the data source. | UTF-8 |
skiprows |
specifies how many records/rows should be skipped. | 0 |
maxrows |
specifies how many records/rows should be read. |
<Node id="Read xml - Yahoo! Answers" type="JSON_READER" url = "http://answers.yahooapis.com/AnswersService/V1/questionSearch?appid=YahooDemo&query=mars" path = "Question" />
Component type : FEED_READER
.
FEED_READER
can access and extract data from feeds in RSS, Atom, and RDF formats.
This data can then be converted into supported type with the results viewer
or merged with other data in your Project.
Attribute | Description | Default |
---|---|---|
id |
component node identification | |
type |
component type | FEED_READER | url |
URL of the RSS or ATOM document to read. The list of supported protocols can be found in List of Supported Protocols/Wrappers on PHP documentation. |
path |
You can extract just a portion of the data by listing the nested channel.item (items) elements or only channel elements. | channel.item |
charset |
Convert character encoding of the data source. | UTF-8 |
skiprows |
specifies how many records/rows should be skipped. | 0 |
maxrows |
specifies how many records/rows should be read. |
<Node id="Read channel infos fo feedburner feed" type="FEED_READER" url="http://feeds.feedburner.com/BurnThisRSS2" path="channel" />
<Node id="Read digg items feed" type="FEED_READER" url="http://www.digg.com/rss/index.xml" path="channel.item" />
Component type : HTPP_READER
.
HTPP_READER
this module fetches the source of a given web page as a string. This data can then be converted into supported type with the results viewer or merged with other data in your Project. To use Fetch Page module, first enter the URL of the site you want. The module will read the page's source as a string. You can choose to only get part of the page by setting the starting point using the 'Cut content from' field and the end point by using the 'to' field. Only the part of the page between these two strings will be returned.
The output metadata of the HTPP_READER
is a one string field named 'content
' and can be used on Metadata
semantics like :
<Metadata> <DataRecord name="mdata"> <DataField name="content" type="string" /> </DataRecord> </Metadata>
Attribute | Description | Default |
---|---|---|
id |
component node identification | |
type |
component type | HTPP_READER |
url |
Url of web page to fetch. The list of supported protocols can be found in List of Supported Protocols/Wrappers on PHP documentation. | |
from |
Cut content from | |
to |
Cut content to | |
split |
Split the string from from to to by string |
|
reSplit |
Split again each result | |
stripTags |
Strip HTML tags from the result | FALSE |
skiprows |
specifies how many records/rows should be skipped. | 0 |
maxrows |
specifies how many records/rows should be read. |
<Node id="Fetch hostgator" type="HTTP_READER" url = "http://www.hostgator.com/shared.shtml" from = "sharedtable" to = "</table>" split = "<\/tr>" />
Component type : CSV_READER
.
The comma-separated values (or CSV; also known as a comma-separated list or comma-separated variables) file format is a file type that stores tabular data. CSV is one implementation of a delimited text file, which uses a comma to separate values.
A short CSV example file :
1997,Ford,E350,"ac, abs, moon",3000.00 1999,Chevy,"Venture ""Extended Edition""",,4900.00 1996,Jeep,Grand Cherokee,"MUST SELL! air, moon roof, loaded",4799.00Start configuring
CSV_READER
by entering the URL of a CSV file. Next, choose the character that separates fields in each row ...
Attribute | Description | Default |
---|---|---|
id |
component node identification | |
type |
component type | CSV_READER |
url |
URL of a CSV file to fetch. The list of supported protocols can be found in List of Supported Protocols/Wrappers on PHP documentation. | |
quoted |
field can be quoted by ' or " | FALSE |
delimiter |
comma that separate values | ";" |
charset |
Convert character encoding of the data source. | UTF-8 |
newline |
newline (also known as a line break or end-of-line / EOL character) is a special character or sequence of characters signifying the end of a line of text. |
"\n" |
skiprows |
specifies how many records/rows should be skipped from the source file. Good for handling files where first rows is a header not a real data. | 0 |
maxrows |
specifies how many records/rows should be read from the source. |
<Node id="Read CSV file" type="CSV_READER" url = "http://www.abc.virginia.gov/Pricelist/text/disjan08.csv" quoted = "true" delimiter = "," charset = "ISO-8859-15" newline = "\n" skiprows = "5" maxrows = "15" />
Component type : SQL_READER
.
SQL_READER
component reads data from DB. It first executes specified query on DB and then extracts all the rows returned. MixDEM analyze DB structures and create output metadata automatically.
Attribute | Description | Default |
---|---|---|
id |
component node identification | |
type |
component type | SQL_READER |
dsn |
Database Source Names, more commonly seen as the abbreviation, DSN, are data structures used to describe a connection to a database. This DSN will take the form of adapter://user:password@host:port/database so as to completely specify all parameters of the connection.
The list below explains common dsn parameters recognized by
|
|
sqlQuery |
query to be sent to database. | |
adapter |
if dsn is not given, you can define adapter here. adapter is the RDBMS server : ( IBM DB2 'DB2', MySQL 'MYSQL', Microsoft SQL Server 'MSSQL', Oracle 'ORACLE', PostgreSQL 'PGSQL', SQLite 'SQLITE' ) |
|
hostname |
if dsn is not given, you can define hostname here. hostname is a string containing a hostname or IP address of the database server. |
|
hostport |
if dsn is not given, you can define hostport here. hostport is the port parameter allow you to specify the port to which to connects, to match the port configured on the RDBMS server. |
|
username |
if dsn is not given, you can define username here. username is the account identifier for authenticating a connection to the RDBMS server |
|
password |
if dsn is not given, you can define password here. password is the account password credential for authenticating a connection to the RDBMS server. |
|
database |
if dsn is not given, you can define database here. database is the database instance name on the RDBMS server.
|
|
fetchSize |
if dsn is not given, you can define hostname here.number of rows fetched from the DB engine in one step |
1000 |
skiprows |
specifies how many records/rows should be skipped from the source. | 0 |
maxrows |
specifies how many records/rows should be read from the source. |
SQL_READER
can prepares and executes an SQL statement with bound data. The bound data is an list of values to substitute for parameter placeholders in the SQL statement.
Attribute | Description | Default |
---|---|---|
name |
name of the argument | |
value |
value of name argument |
|
field |
if no value given and if there are another component connected as input node, SQL_READER will use values of field as value for name argument. |
|
<Node id="Read from mysql table" type="SQL_READER" dsn = "mysql://root:root@127.0.0.1/information_schema" sqlQuery = "SELECT * FROM TABLES" />
<Node id="Read from mysql table" type="SQL_READER"Note:
dsn="mysql://root:root@127.0.0.1/db_client"
sqlQuery="SELECT id, email FROM table_client WHERE id = ? and date_add < ? "
>
<Attrib name="b1" value="2" />
<Attrib name="b2" value="2008-04-01 00:00:00" />
</Node>
SQL_READER
can be used as
a writer to execute insert, replace or delete SQL statement with bound data,
but SQL WRITER is more adapted to deal with thoses operations.