Extractor Plug-in
Introduction
The Extractor plug-in allows data from different sources to be turned into dataviews so that they may be monitored using the Geneos system - with rules and alerts applied, values logged to a database, etc.
The views are entirely configurable: the setup file either defines the names of all rows, columns and headlines, or specifies the location in the source data from which they should be constructed.
Sources
Each instance of an Extractor sampler may have a single source configured. This defines where the data comes from. The sources that may be configured are described below:
Sample Data Transfer
Other samplers may provide the source data. The samplers that can provide data sources are found in the following table:
Plug-in | Description |
---|---|
WEB-MON | Each stage in a WEB-MON scenario relates to a web page on a web server. The WEB-MON sampler verifies that the server is returning the pages correctly. By using the sampler data transfer mechanism the page contents may be monitored by Geneos as well. Multiple sources may be created in the same WEB-MON sampler if required. |
More details about the use of these other samplers may be found in the relevant sampler manuals.
Views
Any number of views may be created for the same source. This is useful if a single source contains several different types of information. Each view is defined using a template that describes how to turn the source data into the dataview - how to break it into rows and columns, and what additional information to provide as headlines. These templates are described below.
In addition, information about the source may be included, such as how long it has been since the source sent a new copy of the data.
Pre-configured Templates
These will eventually be made available for status information generated from common systems but at the moment the custom template is the only option available.
If you find that you are configuring the same system again and again and believe the output should be included as a template then please contact your support representative.
Custom Templates
The custom templates allow complete control over what is displayed. There are several formats available, which let you select the type of source data. There are then several methods available, which let you decide how you wish to configure it.
Some methods work with several formats, although in some cases a conversion is performed internally to allow the format to be used. The table below shows the available formats and methods and gives any relevant information.
Format | Method | Notes |
---|---|---|
XML | XPath | |
HTML | XPath | HTML to XHTML conversion is performed.* |
JSON | XPath | JSON to XML conversion is performed.** |
* See the Menu Options section for information about how to view the results of the conversion.
** The JSON to XML converter can also be invoked via the "-json2xml" command-line switch (that is, netprobe.<platform> -json2xml <filename>).
XPath Method
Introduction
XPath was defined by the World Wide Web Consortium (W3C) and is a query language for selecting information from XML documents. The Extractor plug-in currently supports the main features of XPath 1.0 location paths.
XHTML is an HTML document which is also a valid XML document. Since we perform an HTML to XHTML conversion, the XPath method can be used with HTML as well as XML format sources.
Configuration
The configuration is a hierarchical series of XPaths, each of which is relative to the last, allowing you to home in on the required information.
The 'Names' and 'Values' in the configuration are the lowest level items, and these will result in single 'string-values' - a single piece of text to turn into a row, column or headline name or value.
All other XPaths can be wildcarded, and if they return multiple items then multiple items may end up in the view. For example, if you put a wildcarded XPath into the row configuration then the number of rows in the view may vary, depending on how many items match each time the source is read.
In order to make this happen, if multiple items are returned from an evaluation then XPaths at the next level down will be evaluated relative to each of these 'context' nodes in turn. If no items are returned from the XPath evaluation then processing of this part of the configuration will stop, even if one of the XPaths at the next level down is specified relative to the root.
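The level-by-level evaluation described above can be sketched in Python using the standard xml.etree.ElementTree module. This is an illustration only, not the plug-in's implementation: the function name evaluate_levels and the sample document are hypothetical, and ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

def evaluate_levels(context, paths):
    """Evaluate each path relative to every node matched by the previous path."""
    contexts = [context]
    for path in paths:
        next_contexts = []
        for node in contexts:
            # Each matched node becomes a context node for the next level down.
            next_contexts.extend(node.findall(path))
        if not next_contexts:
            # Nothing matched at this level, so the lower levels are not processed.
            return []
        contexts = next_contexts
    return contexts

# Hypothetical sample document; the root element is <a>.
root = ET.fromstring("<a><b/><b><c/></b></a>")

matched = evaluate_levels(root, ["b", "c"])   # both <b> nodes are tried in turn
stopped = evaluate_levels(root, ["d", "c"])   # no <d> exists, so processing stops

print([node.tag for node in matched])
print(stopped)
```

Here the second level is evaluated once per <b> node, and only the <b> that contains a <c> contributes a result.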
Basic Examples
XPath works in a similar way to directory paths on disk, with a / separating levels. One big difference is that XML allows multiple items with the same name at the same level, whereas a folder on disk can contain only one file with a given name.
Here is an example of a simple XML file:
<a>
<b/>
<b>
<c/>
</b>
</a>
The XPath
/a/b
would select all the <b> elements inside the <a> element. In this case it would select 2 items:
<a>
<b/> <---------- This is the first b node, which doesn't have any children
<b> <----------- This is the start of the second b node
<c/>
</b> <---------- This is the end of the second b node
</a>
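As a rough cross-check outside Geneos, the selection above can be reproduced with Python's standard xml.etree.ElementTree module. This is an illustrative sketch only: ElementTree implements a subset of XPath and does not accept absolute paths on an element, so the path is written as b relative to the root <a> element rather than as /a/b.

```python
import xml.etree.ElementTree as ET

# The simple XML file from the example; fromstring returns the root <a> element.
root = ET.fromstring("<a><b/><b><c/></b></a>")

# "b" relative to <a> selects the same nodes as /a/b in this document.
b_nodes = root.findall("b")
child_counts = [len(list(b)) for b in b_nodes]

print(len(b_nodes))    # 2
print(child_counts)    # [0, 1]
```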
A path of
/a/b[1]
would select only the first of the two <b> nodes from the previous example. Putting a 2 there would select the second, a 3 would select the third (which matches no nodes in this case), and so on. If you don't know how many nodes there will be, you can select the last one with:
/a/b[last()]
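The positional predicates can be tried in the same way with Python's xml.etree.ElementTree, which supports numeric positions and last(). This is a sketch for experimentation, with the paths again written relative to the root <a> element because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b/><b><c/></b></a>")

first = root.findall("b[1]")        # first <b>: no children
last = root.findall("b[last()]")    # last <b>: contains <c/>
third = root.findall("b[3]")        # there is no third <b>

print(len(first), len(last), len(third))
print([child.tag for child in last[0]])
```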
Each XPath is evaluated relative to the last. Continuing with our example, if the top level XPath configured is:
/a/b
then to select the <c> element you can simply set the next XPath to be:
c
and that will select all of the <c> elements that are directly underneath a <b> element. If you want to ignore the current context, start your path with a slash, which returns you to the top level of the document (the root). In this case you could use:
/a/b/c
Sometimes you don't know how many levels deep the item you want is. In this case, a double slash will jump as many levels as needed. The path:
//c
will be able to select the c node, wherever it is. Because it starts with a slash it will always start searching at the root. If you want to start searching relative to the context node, you need to use a period like this:
.//c
as a period (.) means the current node.
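The difference between searching from the root and searching from the context node can be sketched with Python's xml.etree.ElementTree. The document below is a hypothetical extension of the earlier example, with an extra <d> element added so that the two searches give different results. ElementTree has no way to write //c from a context node, so the root search is expressed as .//c evaluated on the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one <c> under the second <b>, another under <d>.
root = ET.fromstring("<a><b/><b><c/></b><d><c/></d></a>")
context = root.findall("b")[1]          # the second <b> is the context node

from_root = root.findall(".//c")        # like //c: every <c> in the document
from_context = context.findall(".//c")  # like .//c: only <c> nodes below the context

print(len(from_root), len(from_context))
```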
Attributes may be read with an @ sign. You can select nodes based on attributes, or extract an attribute from the source. Given the source:
<a>
<b x="x1" y="y1"/>
<b x="x2"/>
<b x="x3" y="y3"/>
</a>
The following XPath will get all of the b nodes with x set to x3 (1 result):
//b[@x="x3"]
This will get all of the b nodes that have a y attribute set, regardless of its value (2 results):
//b[@y]
Given that you have selected a set of nodes, the following will get the value of the x attribute:
@x
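The attribute examples can be checked with Python's xml.etree.ElementTree, assuming the <b> elements in the source are written as empty (self-closed) elements so that the document is well-formed. ElementTree supports the [@x="..."] and [@y] predicates but cannot return an attribute from a path such as @x, so this sketch reads the value with the element's get method instead.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<a><b x="x1" y="y1"/><b x="x2"/><b x="x3" y="y3"/></a>')

x3_nodes = root.findall(".//b[@x='x3']")   # b nodes with x set to x3
y_nodes = root.findall(".//b[@y]")         # b nodes that have a y attribute

# Equivalent of evaluating @x relative to each selected node.
x_values = [b.get("x") for b in root.findall("b")]

print(len(x3_nodes), len(y_nodes), x_values)
```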
Examples for use with HTML
HTML has a specific set of elements. The most common for use with this sampler is the table. There are lots of table options in HTML, but the simplest case is that <tr> defines a table row and <td> defines table data (a cell). A typical table may then look like this:
...
<table>
<tr><td>Food</td><td>Calories</td><td>Protein</td><td>Fat</td></tr>
<tr><td>Apples</td><td>47.0 kcal</td><td>0.4 g</td><td>0.1 g</td></tr>
<tr><td>Oranges</td><td>37.0 kcal</td><td>1.1 g</td><td>0.1 g</td></tr>
<tr><td>Bananas</td><td>95.0 kcal</td><td>1.2 g</td><td>0.3 g</td></tr>
</table>
...
If this is the first table in the page, then the top-level XPath, to select the table, would look like this:
//table[1]
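The other paths can then be written relative to this. Note that in standard XPath 1.0 this expression matches every table that is the first table within its parent element, so on pages with nested tables or several containers it may match more than one table.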
If the table had a particular id, like this:
...
<table id="foodinfo">
...
</table>
...
then the id can be used instead, as these should be unique within a page:
//table[@id="foodinfo"]
The columns could either be hard coded to 'Food', 'Calories', 'Protein' and 'Fat' or could be populated automatically from the first row of the table. In this case, the Column XPath would be:
tr[1]/td
which says that each column should be taken from a cell in the first row. The 'Name' would then be set to an XPath of:
text()
The rows, and cell data, come from everything except the first row in the source, so the row XPath would be:
tr[position()!=1]
The 'Name' of the row would be the first cell in the row:
td[1]
and the 'Cells' would be everything except the first cell in the row:
td[position()!=1]
and the 'Value' would be:
text()
This would simply copy the table directly from the source page into a Geneos table.
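The mapping from table to view described above can be sketched in Python with xml.etree.ElementTree. This is only an approximation of what the configuration does, not the plug-in's code: ElementTree does not support position()!=1 or text(), so list slicing and the .text property stand in for them, and the dictionary named view is a hypothetical stand-in for a dataview.

```python
import xml.etree.ElementTree as ET

PAGE = """
<table>
<tr><td>Food</td><td>Calories</td><td>Protein</td><td>Fat</td></tr>
<tr><td>Apples</td><td>47.0 kcal</td><td>0.4 g</td><td>0.1 g</td></tr>
<tr><td>Oranges</td><td>37.0 kcal</td><td>1.1 g</td><td>0.1 g</td></tr>
<tr><td>Bananas</td><td>95.0 kcal</td><td>1.2 g</td><td>0.3 g</td></tr>
</table>
""".strip()

table = ET.fromstring(PAGE)          # plays the role of the top-level XPath
all_rows = table.findall("tr")

# Columns: tr[1]/td, with a Name of text()
columns = [td.text for td in all_rows[0].findall("td")]

view = {}
for tr in all_rows[1:]:              # Rows: tr[position()!=1]
    cells = tr.findall("td")
    row_name = cells[0].text         # Name: td[1]
    values = [td.text for td in cells[1:]]   # Cells: td[position()!=1], Value: text()
    # The first column holds the row names, so values line up with columns[1:].
    view[row_name] = dict(zip(columns[1:], values))

print(columns)
print(view["Apples"]["Calories"])
```

The first column name (Food) labels the row names, so only the remaining three columns carry cell values.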
Resources
There are lots of resources available online describing XPath, along with tools to try it out either online or with downloadable applications. Here are a few of the available resources. Many more are available by searching for 'XPath' using a search engine such as Google.
About XPath:
Evaluation Tools:
http://www.whitebeam.org/library/guide/TechNotes/xpathtestbed.rhtm
Use smaller-sized files to ensure better results.
Plug-in Configuration
source
The source defines where the information comes from. A single source may be specified per sampler instance.
Mandatory: Yes
source > samplerDataTransfer
This type of source allows data to be read from other samplers, such as WEB-MON.
source > samplerDataTransfer > name
Specifies the name of the data transfer source. Multiple instances of the Extractor sampler may read from the same source, but only one source may use each name per physical Netprobe.
Variables may be used to create unique names if the same sampler instance is used several times (e.g. if several Managed Entities or Types are used on the Netprobe).
Mandatory: No
views > view > name
Specifies the name of a view.
As with system-created views in other samplers, more configuration options for the view are available in the Dataviews section of the Sampler's Advanced tab.
Mandatory: Yes
views > view > template > custom
The custom template allows the user to define all the details of the output.
views > view > template > custom > format
The format should be chosen to match the source data type.
The methods available for selection will be determined by the format chosen, as some methods are not appropriate to all formats.
views > view > template > custom > format > html
The HTML option should be chosen for normal web pages.
views > view > template > custom > format > xml
The XML option should be chosen for sources that are XML documents.
views > view > template > custom > format > any > method
This option allows the user to choose the method by which to configure the output.
views > view > template > custom > format > any > method > xpath
The XPath method allows XPaths to be used to select the information.
See Also: XPath Method
views > view > template > custom > format > any > method > xpath > headlines
Headline variables may be defined in this section of the configuration.
views > view > template > custom > format > any > method > xpath > columns
The names of the columns may be defined in this section of the configuration. These may be pulled from the source data or statically defined, but ideally would not change.
views > view > template > custom > format > any > method > xpath > rows
The row names and table data may be defined in this section of the configuration. Rows may be specified individually, or if wildcarded XPaths are used then one definition may result in several rows being created.
views > view > sourceInfo
Allows additional information about the source to be displayed in the created view.
views > view > sourceInfo > timeSinceUpdate
For Sample Data Transfer sources, this allows the number of seconds since the source sent a new copy of the data to be shown as a headline variable in the view. Every time the source sends data, regardless of the content and whether it has changed, the headline will be returned to 0.
For File sources, this allows the number of seconds since the file was last updated to be shown as a headline variable in the view. Every time the file's timestamp changes (even if the contents are the same), the headline will be returned to 0.
By default this headline will be called timeSinceUpdate but can be changed if required.
views > view > sourceInfo > timeSinceUpdate > show
Whether to show the time since the source sent a new copy of the data in the timeSinceUpdate headline.
views > view > sourceInfo > timeSinceUpdate > name
If the view template configuration results in a headline called timeSinceUpdate then the default name will conflict. This setting allows the source info headline name to be changed. In most cases you will not need to set this.
views > view > sourceInfo > timeSinceModification
For Sample Data Transfer sources, this allows the number of seconds since the source data was different to the current data to be shown as a headline variable in the view. If the source sends a new copy of the data and it is the same as last time then this headline will continue to increase. If the source data changes (even if it changes something that isn't being extracted and displayed in the view) then this will be reset to 0.
For File sources, this allows the number of seconds since the file contents were changed to be shown as a headline variable in the view. If the file's contents are changed, this headline will be reset to 0.
By default this headline will be called timeSinceModification but can be changed if required.
views > view > sourceInfo > timeSinceModification > show
Whether to show the time since the source data changed in the timeSinceModification headline.
views > view > sourceInfo > timeSinceModification > name
If the view template configuration results in a headline called timeSinceModification then the default name will conflict. This setting allows the source info headline name to be changed. In most cases you will not need to set this.
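The difference between the timeSinceUpdate and timeSinceModification headlines for Sample Data Transfer sources can be sketched as follows. This is a hypothetical model written for illustration, not the Netprobe's implementation: the class name SourceInfo, its methods, and the explicit now parameter are all assumptions.

```python
import time

class SourceInfo:
    """Tracks when data last arrived and when its content last changed."""

    def __init__(self, now=None):
        now = time.monotonic() if now is None else now
        self.last_update = now
        self.last_modification = now
        self.last_data = None

    def receive(self, data, now=None):
        now = time.monotonic() if now is None else now
        # Every delivery resets the update timer, even if the content is identical.
        self.last_update = now
        # Only a change in content resets the modification timer.
        if data != self.last_data:
            self.last_modification = now
            self.last_data = data

    def headlines(self, now=None):
        now = time.monotonic() if now is None else now
        return {
            "timeSinceUpdate": int(now - self.last_update),
            "timeSinceModification": int(now - self.last_modification),
        }

info = SourceInfo(now=0)
info.receive("<a/>", now=10)   # new content: both timers reset
info.receive("<a/>", now=20)   # same content again: only the update timer resets
result = info.headlines(now=25)
print(result)
```

In this model the same data arriving twice leaves timeSinceModification counting from the first delivery, while timeSinceUpdate counts from the most recent one.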