Data Schema
Overview
Data schemas give Gateway Hub information about the type of data being published from Gateway.
Each dataview that is published to Gateway Hub requires a data schema definition.
The data schema for a dataview specifies:
- The name, data type, and any units of measure of each headline.
- The name, data type, and any units of measure of each column.
- If the dataview requires pivoting.
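The three items above can be pictured as a small data model. The following is a minimal sketch only, not the Gateway's internal representation: the class names, the `cpu` dataview, and the headline and column names are hypothetical and chosen for illustration. The data type names are the ones listed for the Show Current Schema command.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Data type names as shown in the Type column of Show Current Schema.
DATA_TYPES = {"boolean", "date", "dateTime", "float32",
              "float64", "int32", "int64", "string"}


@dataclass
class SchemaField:
    """One headline or column: a name, a data type, and an optional unit."""
    name: str
    data_type: str
    unit: Optional[str] = None

    def __post_init__(self) -> None:
        if self.data_type not in DATA_TYPES:
            raise ValueError(f"unknown data type: {self.data_type}")


@dataclass
class DataviewSchema:
    """Hypothetical in-memory model of the data schema for one dataview."""
    dataview: str
    headlines: List[SchemaField] = field(default_factory=list)
    columns: List[SchemaField] = field(default_factory=list)
    pivot: bool = False


# Illustrative names only; they are not taken from a real plug-in.
cpu_schema = DataviewSchema(
    dataview="cpu",
    headlines=[SchemaField("numOnlineCpus", "int32")],
    columns=[SchemaField("percentUtilisation", "float64", unit="percent")],
)
```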
Schemas are defined in the Publishing tab of the sampler in the Gateway Setup Editor (GSE).
The Gateway comes packaged with some schemas. See Sampler schema types.
Metrics collected using the client library provide their own schemas automatically, so long as a valid Dynamic Entities mapping is defined.
When a data schema does not exist, you can create a user-defined schema and add it to the sampler. See Create a data schema.
Caution
Data schemas describe dataviews sent from Gateway to Gateway Hub. This should not be confused with the configuration schema that describes the correct XML formatting of Gateway setup files.
Built-in schemas
The Gateway is packaged with built-in schemas for plug-ins with dataviews containing a set of known column names and data types.
Built-in schemas specify pivoting for dataviews containing rows that have only one value column and the data type of this column varies between the rows. For example, see the Hardware Plug-in - Technical Reference. This allows Gateway Hub to treat the rows of these dataviews as columns.
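As a rough illustration of what pivoting means, the sketch below turns the rows of a name/value dataview into a single record whose keys act as columns. It is an assumption-laden example: the function `pivot_rows` and the row names are hypothetical, and Gateway Hub's actual pivoting logic is not described here.

```python
from typing import Dict, List


def pivot_rows(rows: List[Dict[str, str]], name_key: str, value_key: str) -> Dict[str, str]:
    """Use each row's name as a column name and its single value as the cell."""
    return {row[name_key]: row[value_key] for row in rows}


# Hypothetical rows: one value column whose data type differs per row.
hardware_rows = [
    {"name": "cpuCount", "value": "8"},
    {"name": "osVersion", "value": "5.14"},
    {"name": "totalMemory", "value": "16384"},
]

pivoted = pivot_rows(hardware_rows, "name", "value")
```

After pivoting, each former row can be given its own data type in the schema, in the same way as an ordinary column.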
If a built-in schema exists for a sampler, the message This <sampler name> sampler has predefined schema(s) is displayed in the Publishing tab of the sampler in the GSE. For a list of samplers with built-in schemas, see Sampler schema types.
The GSE also has a command that can be used to view the schemas currently defined for dataviews on a sampler.
If you make any changes to the dataviews in these samplers, for example by adding headlines and columns using the Compute Engine, you must add these additions to the existing schema.
User-defined schemas
Some plug-ins do not come with any data schema definitions. These are plug-ins where the columns and data types are unknown. You must define the data schema for:
- All toolkit-like plug-ins. Examples of these include the SQL-TOOLKIT, JMX, and TOOLKIT plug-ins.
- Any pre-existing dataviews with any user-defined headlines or columns.
Consider the following examples where a data schema definition is required for pre-existing dataviews:
- Additional columns added to an FKM sampler (see Change FKM dataview columns). If the columns are not defined in the schema, an error is generated in Gateway Hub.
- Additional columns or headlines added to a CPU sampler via the Compute Engine (see Adding to existing dataviews). If the columns or headlines are not defined in the schema, an error is generated in the GSE.
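Before publishing, it can help to compare the names in a dataview with the names in its schema. The following is a minimal sketch of such a check, assuming you already have both lists to hand. The function name and the column names are hypothetical, and this is not how the Gateway or Gateway Hub performs the check.

```python
from typing import Iterable, List, Set


def undefined_names(dataview_names: Iterable[str], schema_names: Set[str]) -> List[str]:
    """Return the headline or column names that the schema does not define."""
    return [name for name in dataview_names if name not in schema_names]


# Hypothetical example: one user-added column is missing from the schema.
schema_columns = {"status", "triggerDetails"}
dataview_columns = ["status", "triggerDetails", "owner"]

missing = undefined_names(dataview_columns, schema_columns)
```

Any name returned by this check must be added to the schema.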
For how to create a schema, see Create a data schema.
Pivoted dataviews
If you add rows to pre-existing pivoted dataviews, you must define them as if they were additional columns. You do not specify the pivot option if there is a built-in schema for the sampler. If you do, and it conflicts with the built-in schema, it is discarded in favour of the user-defined schema.
View existing schemas
To view existing schemas currently defined for dataviews on a sampler, open Active Console and select Show Current Schema.
In the GSE, the Show Current Schema command is available in the Publishing tab of the sampler.
When the command is run, a window opens showing one table per dataview, describing the schema defined on the sampler. There is a table for every dataview that has a defined schema, irrespective of whether the dataview is in use. Only dataviews that have a schema definition are shown.
Each table combines information from the built-in schema shipped with the Gateway and any additional information added in the GSE.
A description of the columns shown in each table is below:
Column Name | Description |
---|---|
Component | Describes if this component is a headline or a column. |
Name | Name of the headline or column. |
Type | Data type of the headline or column. The data types are: boolean; date; dateTime; float32; float64; int32; int64; string. |
Units | Unit of measure assigned to this headline or column (if present). |
Source | Origin of the headline or column and the schema. Cells in this column display one of the following: |
Create a data schema
When connected to a running Gateway you should use the Propose Schema command to create data schemas.
When creating and editing XML configuration files without being connected to a Gateway, you should create schemas manually.
The Propose Schema command
Before using the Propose Schema command, please note the following:
- The command attempts to deduce the data types of columns and headlines from each dataview. The deduced data types may be incorrect.
- You must always supply any units of measure post-generation.
- Any types or units of measure already defined by you in the Publishing tab of a sampler take precedence over those inferred from dataviews by the command.
- Tables with only two columns are pivoted in the generated XML schema. A comment is included in the XML highlighting this.
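The precedence and pivoting rules in the list above can be sketched as follows. This is an illustration under stated assumptions rather than the command's real implementation: definitions are modelled as plain dictionaries, precedence is applied per field, and the names `merge_definitions`, `proposes_pivot`, `latency`, and `host` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (data type, unit of measure or None) for one headline or column.
FieldDef = Tuple[str, Optional[str]]


def merge_definitions(inferred: Dict[str, FieldDef],
                      user_defined: Dict[str, FieldDef]) -> Dict[str, FieldDef]:
    """Start from the inferred definitions, then let user-defined entries win."""
    merged = dict(inferred)
    merged.update(user_defined)
    return merged


def proposes_pivot(column_count: int) -> bool:
    """Tables with only two columns are proposed as pivoted."""
    return column_count == 2


inferred = {"latency": ("string", None), "host": ("string", None)}
user_defined = {"latency": ("float64", "milliseconds")}
merged = merge_definitions(inferred, user_defined)
```

Because inferred types and pivoting may be wrong, review the result before relying on it.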
You can run the Propose Schema command from Active Console or the GSE. The command behaves differently depending on which component you use.
When using the Gateway Setup Editor:
- The schema is automatically applied to the sampler.
- The command runs against all the samplers using the same plugin on the Gateway. This process can take some time if you have many samplers using the same plugin.
- The command times out after 30 seconds.
- You can cancel waiting for the command result, but this does not stop the command running on the Gateway.
When using the Active Console:
- The schema appears as XML in a new window. You must then copy and paste this XML to your sampler.
- You can also use this to create schema definitions as static variables.
Examples
The following are examples of the output of Propose Schema:
- If you have added two columns to the CPU sampler, this command generates only the schema definition for those additional columns.
- A toolkit-like sampler does not have a built-in schema, and therefore the command generates a complete schema definition for the sampler.
Propose a schema in the GSE
When you use the Gateway Setup Editor, running the Propose Schema command queries the Gateway and all its running instances of the sampler in order to build the schema.
Caution
Proposing a schema is computationally intensive and may impact the performance of a Gateway.
To generate a schema definition using the Propose Schema command in the GSE, follow these steps:
- Select the desired sampler in the GSE.
- Navigate to the Publishing tab of the sampler.
- Select Propose Schema.
Note
If a user-defined schema is already present for the sampler, a dialog opens asking if you wish to overwrite the schema information.
The generated schema is directly added to the Publishing section. Any existing schema definitions are overwritten.
After using the command, perform the following:
- Select Schema > Dataviews > Data.
- Review the schema for errors because the data types and pivoting inferred by the Propose Schema command may be incorrect.
Note
If the value for Pivot is incorrect, you must create the schema manually.
- Add any units of measure to the headlines and/or columns.
For more information about types, units of measure, and pivoting, see Gateway Hub configuration.
Propose a schema in Active Console
In Active Console, you can use the Propose Schema command to build a schema for specific dataviews. This does not affect any connected Gateways.
To generate a schema definition using the Propose Schema command in Active Console, follow these steps:
- Make sure Gateway Hub is enabled in the GSE.
- Right-click a sampler or dataview.
- Navigate to Sampler Schema.
- Select Propose Schema. The generated XML schema definition for the sampler appears in a new window.
- Right-click the window with the generated XML in the Active Console and select Copy All.
- Navigate to the GSE, right-click the correct sampler and select Paste Schema.
Warning
No checks are performed when using Paste Schema on a sampler. Any existing schema is overwritten, and any copied schema can be pasted on to any sampler.
After using Paste Schema, perform the following:
- Navigate to the Publishing tab of the sampler to view the schema definition.
- Select Schema > Dataviews > Data.
- Review the schema for errors because the data types inferred by the Propose Schema command may be incorrect.
- Add any units of measure to the headlines and/or columns.
- (Optional) Specify pivoting.
For more information about types, units of measure, and pivoting, see Gateway Hub configuration.
Note
Paste Schema is only available when valid XML has been copied to the clipboard.
How to use Paste Schema to create static variables
The generated XML output of the Propose Schema command in the Active Console can also be used to create sampler schemas as static variables. Schemas saved as static variables can be selected in the Publishing tab of a sampler.
To use the output of the Propose Schema command to produce schemas as static variables, follow these steps:
- Right-click on the window with the generated XML in the Active Console and select Copy All.
- Navigate to the GSE.
- Right-click Static variables > Sampler-schemas and select Paste Schema.
A static variable containing a schema definition for each dataview in the copied sampler is created. The name used for the static variable is the name of the dataview.
After using Paste Schema, perform the following for each static variable:
- Review the schema for errors because the data types inferred by the Propose Schema command may be incorrect.
- Add any units of measure to the headlines and/or columns.
- (Optional) Specify pivoting.
For more information about types, units of measure, and pivoting, see Gateway Hub configuration.
Note
Paste Schema is only available when valid XML has been copied to the clipboard.
How to define a schema manually
To define a schema manually, perform the following:
- Open your Gateway Setup Editor.
- Navigate to the Publishing tab of the sampler you want to make a schema for.
- In Schema > Dataviews, click Add new. You must provide an entry for each dataview in the sampler that requires a schema definition.
- In the Dataview field, enter the name of the dataview.
- In the Schema field, choose data.
- Click Data.
- Add as many new Headlines and Columns entries as you require.
- In the Name field, add the name of the headline or column in the dataview.
- Under options, choose the type of data represented by the headline or column.
If you choose Int32, Int64, Float32, or Float64, select the appropriate Unit of measure.
- (Optional) If the dataview is pivoted, tick the box by Pivot.
- Close the tab.
- Click Validate current document to review any errors.
- Click Save current document.
For more information about types, units of measure, and pivoting, see Gateway Hub configuration.
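When writing schemas by hand, a quick sanity check of each entry can catch simple mistakes before you validate the setup file. The sketch below assumes that units of measure apply only to the four numeric types, as the steps above suggest, and uses the unit names from the units of measure table on this page. The function `check_field` and its messages are hypothetical and are not part of the GSE.

```python
from typing import List, Optional

NUMERIC_TYPES = {"int32", "int64", "float32", "float64"}

# Unit names as listed in "Units of measure used in schemas".
UNITS = {
    "percent", "seconds", "milliseconds", "microseconds", "nanoseconds",
    "days", "per second", "megahertz", "bytes", "kibibytes", "mebibytes",
    "gibibytes", "bytes per second", "megabits", "megabits per second",
}


def check_field(name: str, data_type: str, unit: Optional[str] = None) -> List[str]:
    """Return a list of problems found in one headline or column entry."""
    problems = []
    if not name:
        problems.append("name is empty")
    if unit is not None:
        if data_type.lower() not in NUMERIC_TYPES:
            # Assumption: only numeric types carry a unit of measure.
            problems.append(f"unit '{unit}' given for non-numeric type '{data_type}'")
        elif unit not in UNITS:
            problems.append(f"unknown unit '{unit}'")
    return problems
```

For example, `check_field("usedSpace", "Int64", "mebibytes")` returns an empty list, while a unit on a string entry is reported as a problem.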
Schema inference
Beginning with version 2.4.x, Gateway Hub can use the schema inference feature to infer data schemas for data published by a Gateway when a built-in or user-defined data schema does not exist. This is useful in cases where you have a large number of toolkits, and creating user-defined schemas for each may take a long time.
Inference produces only a best guess of the appropriate data schema, and the ultimate quality of an inferred data schema is dependent on the quality and consistency of the published data.
Note the following limitations:
- Inference only recognises dates in the yyyy-mm-dd format.
- Inference does not capture units, and interprets values with a unit as a raw number. For example, if a cell has a value of 987 MB, then schema inference interprets this as 987.
- Headline data schemas are inferred separately from table data schemas.
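The first two limitations can be illustrated with a short sketch. It only mimics the behaviour described above and is not Gateway Hub's parser: the helper names are hypothetical, and the exact parsing rules (for example, how signs or non-padded dates are treated) are assumptions.

```python
import re
from datetime import datetime
from typing import Optional

# A number at the start of the value; anything after it is treated as a unit.
_LEADING_NUMBER = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)")


def is_recognised_date(text: str) -> bool:
    """True only for values that parse in the yyyy-mm-dd format."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def numeric_part(text: str) -> Optional[float]:
    """Return the leading number of a value, discarding any trailing unit."""
    match = _LEADING_NUMBER.match(text)
    return float(match.group(1)) if match else None
```

With these helpers, `numeric_part("987 MB")` gives `987.0` and the unit is lost, while a date written as `15/03/2024` is not recognised as a date.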
In most cases, it is highly recommended that you provide your own data schemas since this is the best way to ensure data schema accuracy and prevent loss of data due to misconfiguration.
Inference modes
You can configure schema inference to run in one of three modes: Naive, Basic, or Smart. By default, schema inference runs in Smart mode, and this is the recommended setting.
Caution
In all cases, the data schema inferred is only as good as the data that has been observed. If the data structure changes after inference, then you must update the data schema manually or risk dropping further datapoints.
Mode | Description | Advantages | Disadvantages |
---|---|---|---|
Naive | Naive inference uses a single datapoint to infer a very basic data schema where all fields are type string. | | |
Basic | Basic inference uses a single datapoint to infer a more detailed data schema than Naive inference. Fields whose data parses as numeric are assigned the type float64, making them accessible to all metric query functionality. Non-numeric fields are assigned the type string. | | |
Smart | Smart inference uses a multi-datapoint inference model. This is the default and recommended setting. You must configure the minimum number of datapoints to use over a defined inference period. Once the inference period is over (measured using sample time not clock time), if the inference engine has at least the minimum number of datapoints, it will perform a smart evaluation of property types. All supported types will be inferred, and any numerics will always be When setting the minimum number of datapoints, you should consider that you lose the datapoints used for inference. Additionally, any datapoints currently in use for inference are lost if the normaliser is shut down. This restarts the inference period and requires that datapoints are collected again. The higher the number of datapoints used for inference, the higher the accuracy, but this also increases the number of datapoints that can be lost. Variations in inference duration are small, as no inference will be performed until the full period is complete. | | |
As an example, consider a sequence of four datapoints received over an inference period such as 10 minutes. Each inference mode treats the same data differently.
Datapoint | Naive | Basic | Smart (using 3 samples for inference) |
---|---|---|---|
123.45 days (string) | 123.45 days (string) | 123.45 (float64) | Used as training data, not stored. |
250.56 days (string) | 250.56 days (string) | 250.56 (float64) | Used as training data, not stored. |
unavailable (string) | unavailable (string) | ingestion error | Used as training data, not stored. |
4.36 days (string) | 4.36 days (string) | 4.36 (float64) | 4.36 days (string) |
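The Naive and Basic columns of the table above can be reproduced with a small simulation. This is a sketch of the described behaviour only, not Gateway Hub code: it assumes that both modes infer the schema from the first datapoint, that Basic strips a trailing unit from numeric values, and that a value which cannot be parsed as the inferred type results in an ingestion error.

```python
import re
from typing import List, Optional

_LEADING_NUMBER = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)")


def to_number(text: str) -> Optional[float]:
    """Return the leading number of a value, or None if there is none."""
    match = _LEADING_NUMBER.match(text)
    return float(match.group(1)) if match else None


def run_naive(datapoints: List[str]) -> List[str]:
    """Naive mode: every field is a string, so every value is stored as-is."""
    return [f"{value} (string)" for value in datapoints]


def run_basic(datapoints: List[str]) -> List[str]:
    """Basic mode: the type is fixed by the first datapoint only."""
    inferred = "float64" if to_number(datapoints[0]) is not None else "string"
    results = []
    for value in datapoints:
        if inferred == "string":
            results.append(f"{value} (string)")
            continue
        number = to_number(value)
        results.append("ingestion error" if number is None else f"{number} (float64)")
    return results


datapoints = ["123.45 days", "250.56 days", "unavailable", "4.36 days"]
naive_results = run_naive(datapoints)
basic_results = run_basic(datapoints)
```

The third datapoint shows the trade-off: Basic gains a numeric type but rejects `unavailable`, whereas Naive stores everything at the cost of treating all values as strings.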
In cases with missing data, this can change the inferences made.
Datapoint | Naive | Basic | Smart (using 3 samples for inference) |
---|---|---|---|
no data | omitted | omitted | Used as training data, not stored. |
no data | omitted | omitted | Used as training data, not stored. |
321.54 days (string) | ingestion error | ingestion error | Used as training data, not stored. |
4.36 days (string) | ingestion error | ingestion error | 4.36 (float64) |
Gateway Hub configuration
You can configure data schema inference in Gateway Hub during installation or using hubctl with your installation descriptor.
For the most up-to-date information about configuration options, see Install Gateway Hub and hubctl tool.
The following configuration options are available:
Option | Description |
---|---|
hub_normaliserd_inference_enabled | Enable or disable data schema inference. Choose from true or false. |
hub_normaliserd_inference_mode | Set the inference mode. Choose from Naive, Basic, or Smart. |
hub_normaliserd_inference_smart_min_samples | Minimum number of samples required before Smart inference can be used. This setting only applies if Smart inference occurs after a duration set by |
hub_normaliserd_inference_smart_duration_seconds | Duration in seconds to wait before performing Smart inference. This setting only applies if hub_normaliserd_inference_mode is set to Smart. |
hub_normaliserd_inference_smart_threshold | Percentage of samples received inside the inference duration that must be of a specific type, for a field to be matched to that type. For example, if Gateway Hub has received |
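To show how a minimum sample count and a percentage threshold might interact, the sketch below decides between float64 and string for a single field. It is a simplified illustration, not the Smart inference algorithm: only two types are considered, the comparison is assumed to be "at least the threshold", falling back to string is an assumption, and the function names are hypothetical.

```python
from typing import List, Optional


def looks_numeric(text: str) -> bool:
    """True if the first whitespace-separated token parses as a number."""
    parts = text.split()
    if not parts:
        return False
    try:
        float(parts[0])
        return True
    except ValueError:
        return False


def smart_infer(samples: List[str], min_samples: int,
                threshold_percent: float) -> Optional[str]:
    """Return an inferred type, or None if there are too few samples."""
    if not samples or len(samples) < min_samples:
        return None
    numeric = sum(looks_numeric(sample) for sample in samples)
    if 100.0 * numeric / len(samples) >= threshold_percent:
        return "float64"
    return "string"  # assumed fallback when no type reaches the threshold


samples = ["123.45 days", "250.56 days", "unavailable"]
```

With these three samples, two out of three values are numeric, so a threshold of 100 yields string while a threshold of 60 yields float64.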
Gateway configuration
Gateway version 5.7.x or later is required in order to publish dataviews without a schema. If dataviews without a schema are published to a Gateway Hub version that does not include schema inference, then a large number of ingestion errors will be reported.
Gateway will try to publish with a data schema if possible, and will not publish data if it has a data schema with errors.
As a result, the following scenarios are possible:
- If a dataview has a data schema and Gateway detects no errors, then it will be published with the provided data schema.
- If the publish setting of a dataview is set to false, then it will not be published.
- If a dataview has an empty data schema or a data schema that contains errors, then it will not be published.
- If a dataview has no data schema, then the dataview will be published without a data schema.
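The four scenarios above amount to a small decision procedure, sketched below. The function and enum names are hypothetical, and the representation is an assumption made for illustration: `None` stands for "no data schema" and an empty dictionary stands for "an empty data schema".

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    NOT_PUBLISHED = "not published"
    PUBLISHED_WITH_SCHEMA = "published with the provided data schema"
    PUBLISHED_WITHOUT_SCHEMA = "published without a data schema"


def publish_outcome(publish_enabled: bool, schema: Optional[dict],
                    schema_errors: int = 0) -> Outcome:
    """Map the state of a dataview to one of the publishing scenarios."""
    if not publish_enabled:
        return Outcome.NOT_PUBLISHED
    if schema is None:
        # No schema at all: published without one, leaving inference to Gateway Hub.
        return Outcome.PUBLISHED_WITHOUT_SCHEMA
    if not schema or schema_errors > 0:
        # An empty schema, or one with errors, blocks publishing.
        return Outcome.NOT_PUBLISHED
    return Outcome.PUBLISHED_WITH_SCHEMA
```

Note the difference between a missing schema and an empty one: only the former is published.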
Data schema parameters
Units of measure used in schemas
Name | Symbol |
---|---|
percent | % |
seconds | s |
milliseconds | ms |
microseconds | μs |
nanoseconds | ns |
days | d |
per second | s⁻¹ |
megahertz | MHz |
bytes | B |
kibibytes | KiB |
mebibytes | MiB |
gibibytes | GiB |
bytes per second | B/s |
megabits | Mbit |
megabits per second | Mbit/s |
Sampler schema types
The following table lists samplers and whether each has an entirely built-in schema, a partially built-in schema, or an entirely user-defined schema.
Plugin | Type | Comments |
---|---|---|
Gateway-breachPredictor | Built-in | |
Gateway-clientConnectionData | Built-in | |
Gateway-databaseLogging | Built-in | |
Gateway-exportedData | Built-in | |
Gateway-gatewayHubData | Built-in | |
Gateway-gatewayLoad | Built-in | |
Gateway-importedData | Built-in | |
Gateway-includesData | Built-in | |
Gateway-licenceUsage | Built-in | |
Gateway-managedEntitiesData | Partial | |
Gateway-probeData | Built-in | |
Gateway-scheduledCommandData | Built-in | |
Gateway-scheduledCommandsHistoryData | Built-in | |
Gateway-severityCount | Built-in | |
Gateway-severityData | Built-in | |
Gateway-snoozeData | Built-in | |
Gateway-sql | User-defined | |
Gateway-userAssignmentData | Built-in | |
api | User-defined | |
api-streams | Built-in | |
bloomberg-bpipe | Built-in | |
citrix-apps | Built-in | |
citrix-processes | Built-in | |
citrix-sessions | Built-in | |
citrix-summary | Built-in | |
clearvision-status | Built-in | |
combo | User-defined | |
component-versions | Built-in | |
control-m | Built-in | |
cpu | Built-in | |
desktop-pc-monitoring | Built-in | |
deviceio | Built-in | |
disk | Built-in | |
e4jms-bridges | Built-in | |
e4jms-connections | Built-in | |
e4jms-durables | Built-in | |
e4jms-non-durables | Built-in | |
e4jms-queues | Built-in | |
e4jms-routes | Built-in | |
e4jms-server | Built-in | |
e4jms-topics | Built-in | |
e4jms-usersummary | Built-in | |
euem | Built-in | |
extractor | User-defined | |
fidessa | Built-in | |
fidessa-dq | User-defined | |
fix | Built-in | |
fix-analyser2 | Partial | Admin dataview schema provided; user must define schema for all other dataviews. |
fkm | Partial | |
flm | Partial | User must define schema for additional data displayed based on configuration. |
ftm | Built-in | |
gl-greffon | Built-in | |
gl-lostorders | User-defined | |
gl-orderbook | User-defined | |
gl-permissions | Built-in | |
gl-router | Built-in | |
gl-slc | Partial | User must define schema for additional data displayed based on configuration or SLC log file. |
gl-slc-relay | Built-in | |
gl-sle | Built-in | |
gl-sle-tcp | Built-in | |
hardware | Built-in | |
ibmi-job | Built-in | |
ibmi-message | Built-in | |
ibmi-pool | Built-in | |
ibmi-queue | Built-in | |
ibmi-subsystem | Built-in | |
ibmi-system | Built-in | |
informix | Built-in | |
ipc | Built-in | |
ix-ma | User-defined | |
jmx-server | User-defined | |
jmx-threadinfo | Built-in | |
market-data-monitor | User-defined | |
message-tracker | Built-in | |
mibmon | User-defined | |
miss-x | Built-in | |
mq-channel | Built-in | |
mq-qinfo | Built-in | |
mq-queue | Built-in | |
net-ping | Built-in | |
network | Built-in | |
nyxt-papastats | Built-in | |
oracle | Built-in | |
orc | Built-in | |
pats-status | Built-in | |
pats-trading-breaches | Built-in | |
pats-users | Built-in | |
perfmon | User-defined | |
processes | Built-in | |
rest-extractor | User-defined | |
rmc-interface | User-defined | |
sets-slc | Built-in | |
sql-toolkit | User-defined | |
stateTracker | User-defined | User must define schema for user-defined custom column names. |
su | Built-in | |
sybase | Built-in | |
sybase-server | Built-in | |
tcp-links | Built-in | |
tib-rv | Built-in | |
tib-rvpublisher | Built-in | |
tib-rvstream | Built-in | |
toolkit | User-defined | |
top | Built-in | |
trading-technologies | Built-in | |
trapmon | Partial | User must define schema for user-defined columns in custom view. |
unix-users | Built-in | |
veritas-cluster-server | Built-in | |
web-mon | Built-in | |
win-cluster | Built-in | |
win-services | Built-in | |
winapps | Built-in | |
wmi | User-defined | |
wts-licenses | Built-in | |
wts-processes | Built-in | |
wts-sessions | Built-in | |
wts-summary | Built-in | |
x-broadcast | Built-in | |
x-mcast | Built-in | |
x-multicast | Built-in | |
x-ping | Built-in | |
x-route | Built-in | |
x-services | Built-in | |
x-top | Built-in | |
x-traffic | Built-in |