Data Schema

Overview

Data schemas give Gateway Hub information about the type of data being published from Gateway.

Each dataview that is published to Gateway Hub requires a data schema definition.

The data schema for a dataview specifies:

The name, data type, and any units of measure of each headline.
The name, data type, and any units of measure of each column.
If the dataview requires pivoting.
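
As a rough illustration, the information a data schema carries can be modelled as a small data structure. The Python sketch below is not the Gateway's XML format. The class and field names (`FieldSchema`, `DataviewSchema`, `pivot`) and the example headline and column names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldSchema:
    """One headline or column: name, data type, optional unit of measure."""
    name: str
    data_type: str
    unit: Optional[str] = None


@dataclass
class DataviewSchema:
    """Hypothetical model of what a data schema specifies for one dataview."""
    dataview: str
    headlines: List[FieldSchema] = field(default_factory=list)
    columns: List[FieldSchema] = field(default_factory=list)
    pivot: bool = False


# Example values are illustrative only, not taken from a real sampler.
cpu_schema = DataviewSchema(
    dataview="cpu",
    headlines=[FieldSchema("numOnlineCpus", "int32")],
    columns=[FieldSchema("percentUtilisation", "float64", unit="percent")],
)
```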

Schemas are defined in the Publishing tab of the sampler in the Gateway Setup Editor (GSE).

The Gateway comes packaged with some schemas. See Sampler schema types.

Metrics collected using the client library provide their own schemas automatically, so long as a valid Dynamic Entities mapping is defined.

When a data schema does not exist, you can create a user-defined schema and add it to the sampler. See Create a data schema.

Caution
Data schemas describe dataviews sent from Gateway to Gateway Hub. This should not be confused with the configuration schema that describes the correct XML formatting of Gateway setup files.

Built-in schemas

The Gateway is packaged with built-in schemas for plug-ins with dataviews containing a set of known column names and data types.

Built-in schemas specify pivoting for dataviews containing rows that have only one value column and the data type of this column varies between the rows. For example, see the Hardware Plug-in - Technical Reference. This allows Gateway Hub to treat the rows of these dataviews as columns.
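
To make the idea of pivoting concrete, the following Python sketch turns the rows of a two-column dataview into a single record whose keys play the role of columns. It only illustrates the concept; how Gateway Hub performs the pivot internally is not described here, and the row names and values are invented.

```python
def pivot_rows(rows):
    """Turn (row name, value) pairs into one record keyed by row name."""
    return {name: value for name, value in rows}


# Illustrative rows: one value column whose data type varies between rows.
hardware_rows = [
    ("cpuCount", 8),
    ("memoryTotal", 16384.0),
    ("hostName", "server01"),
]

record = pivot_rows(hardware_rows)
# After pivoting, each former row can carry its own data type.
types = {name: type(value).__name__ for name, value in record.items()}
```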

If a built-in schema exists for a sampler, the message "This <sampler name> sampler has predefined schema(s)" is displayed in the Publishing tab of the sampler in the GSE. For a list of samplers with built-in schemas, see Sampler schema types.

The GSE also has a command that can be used to view the schemas currently defined for dataviews on a sampler.

If you make any changes to the dataviews in these samplers, for example by adding headlines and columns using the Compute Engine, you must add these additions to the existing schema.

User-defined schemas

Some plug-ins do not come with any data schema definitions. These are plug-ins where the columns and data types are unknown. You must define the data schema for:

All toolkit-like plug-ins. Examples of these include the SQL-TOOLKIT, JMX, and TOOLKIT plug-ins.
Any pre-existing dataviews with any user-defined headlines or columns.

Consider the following examples where a data schema definition is required for pre-existing dataviews:

Additional columns added to an FKM sampler (see Change FKM dataview columns). If the columns are not defined in the schema, an error is generated in Gateway Hub.
Additional columns or headlines added to a CPU sampler via the Compute Engine (see Adding to existing dataviews). If the columns or headlines are not defined in the schema, an error is generated in the GSE.

For how to create a schema, see Create a data schema.

Pivoted dataviews

If you add rows to pre-existing pivoted dataviews, you must define them as if they were additional columns. Do not specify the pivot option if there is a built-in schema for the sampler. If you do, and it conflicts with the built-in schema, the built-in schema is discarded in favour of the user-defined schema.

View existing schemas

To view existing schemas currently defined for dataviews on a sampler, open Active Console and select Show Current Schema.

In the GSE, the Show Current Schema command is available in the Publishing tab of the sampler.

When the command is run, a window opens showing one table per dataview, describing the schema defined on the sampler. A table is shown for every dataview that has a schema definition, irrespective of whether the dataview is in use. Dataviews without a schema definition are not shown.

Each table combines information from the built-in schema shipped with the Gateway and any additional information added in the GSE.

A description of the columns shown in each table is below:

Column Name	Description
Component	Describes if this component is a headline or a column.
Name	Name of the headline or column.
Type	Data type of the headline or column. The data types are: `boolean`, `date`, `dateTime`, `float32`, `float64`, `int32`, `int64`, and `string`.
Units	Unit of measure assigned to this headline or column (if present).
Source	Origin of the headline or column and the schema. Cells in this column display one of the following: `Base` — The built-in schema shipped with the Gateway. `Overridden` — A built-in schema exists for this headline or column, but has been superseded by the user-defined schema in the Publishing tab of the sampler. `Enriched by Compute Engine` — Headline or column added to the dataview via the Compute Engine. Schema has been defined in the Publishing tab of the sampler. `Defined by User in plugin configuration` — Headline or column added to the dataview via a method other than the Compute Engine e.g. a Toolkit plug-in. Schema has been defined in the Publishing tab of the sampler.
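
The Source values described in the table can be read as a small decision rule. The Python sketch below is one interpretation of those descriptions, not the GSE's implementation. The function name and its three input sets are hypothetical.

```python
def classify_source(name, builtin, user_defined, compute_engine_added):
    """Return the Source label for a headline or column name.

    builtin, user_defined: sets of names that have a built-in or
    user-defined schema entry.
    compute_engine_added: set of names added via the Compute Engine.
    """
    if name in builtin:
        # A user-defined entry supersedes the built-in one.
        return "Overridden" if name in user_defined else "Base"
    if name in user_defined:
        if name in compute_engine_added:
            return "Enriched by Compute Engine"
        return "Defined by User in plugin configuration"
    return None  # no schema information for this name in this sketch
```

For example, a name present in both `builtin` and `user_defined` is labelled `Overridden`, while a name present in neither has no label.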

Create a data schema

When connected to a running Gateway you should use the Propose Schema command to create data schemas.

When creating and editing XML configuration files without being connected to a Gateway, you should create schemas manually.

The Propose Schema command

Before using the Propose Schema command, please note the following:

The command attempts to deduce the data types of columns and headlines from each dataview. The deduced data types may be incorrect.
You must always supply any units of measure post-generation.
Any types or units of measure already defined by you in the Publishing tab of a sampler take precedence over those inferred from dataviews by the command.
Tables with only two columns are pivoted in the generated XML schema. A comment is included in the XML highlighting this.
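
The notes above can be illustrated with a short Python sketch: types are guessed from sample cell values, and anything already defined by the user wins over the guess. This is not the algorithm the Propose Schema command uses, which is not documented here. The functions `guess_type` and `propose` and the sample data are hypothetical.

```python
from datetime import datetime


def guess_type(values):
    """Guess a data type from sample cell values given as strings."""
    def all_match(parse):
        try:
            for v in values:
                parse(v)
            return True
        except ValueError:
            return False

    if not values:
        return "string"
    if all_match(int):
        return "int64"
    if all_match(float):
        return "float64"
    if all_match(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"


def propose(columns, user_defined):
    """Merge guessed types with user-defined ones; user-defined entries win."""
    proposed = {name: guess_type(vals) for name, vals in columns.items()}
    proposed.update(user_defined)
    return proposed
```

A guess made from a handful of values can easily be wrong (for instance, a column that happens to contain only whole numbers is guessed as `int64`), which is why the generated schema should always be reviewed.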

You can run the Propose Schema command from the Active Console or the GSE. It behaves differently depending on which component you are using.

When using the Gateway Setup Editor:

The schema is automatically applied to the sampler.
The command runs against all the samplers using the same plugin on the Gateway. This process can take some time if you have many samplers using the same plugin.
The command times out after 30 seconds.
You can cancel waiting for the command result, but this does not stop the command running on the Gateway.

When using the Active Console:

The schema appears as XML in a new window. You must then copy and paste this XML to your sampler.
You can also use this XML to create schema definitions as static variables.

Examples

The following are examples of the output of Propose Schema:

If you have added two columns to the CPU sampler, this command generates only the schema definition for those additional columns.
A toolkit-like sampler does not have a built-in schema, and therefore the command generates a complete schema definition for the sampler.
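
Both examples follow the same rule: only the parts of a dataview not covered by a built-in schema need a generated definition. A minimal Python sketch of that rule is shown here. The function name and the column names are hypothetical.

```python
def columns_needing_schema(dataview_columns, builtin_columns):
    """Return dataview columns the built-in schema does not cover, in order."""
    covered = set(builtin_columns)
    return [c for c in dataview_columns if c not in covered]


# A sampler with a built-in schema plus two added columns (names invented).
cpu_extra = columns_needing_schema(
    ["percentUtilisation", "percentIdle", "myRatio", "myFlag"],
    ["percentUtilisation", "percentIdle"],
)

# A toolkit-like sampler has no built-in schema, so every column is returned.
toolkit_all = columns_needing_schema(["host", "latency"], [])
```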

Propose a schema in the GSE

When using the Gateway Setup Editor, running the Propose Schema command queries the Gateway and all its running instances of the sampler in order to build the schema.

Caution
Proposing a schema is computationally intensive and may impact the performance of a Gateway.

To generate a schema definition using the Propose Schema command in the GSE, follow these steps:

Select the desired sampler in the GSE.
Navigate to the Publishing tab of the sampler.
Select Propose Schema.

Note
If a user-defined schema is already present for the sampler, a dialog opens asking whether you wish to overwrite the schema information.

The generated schema is directly added to the Publishing section. Any existing schema definitions are overwritten.

After using the command, perform the following:

Select Schema > Dataviews > Data.
Review the schema for errors because the data types and pivoting inferred by the Propose Schema command may be incorrect.
Add any units of measure to the headlines and/or columns.

Note
If the value for Pivot is incorrect you must create the schema manually.

For more information about types, units of measure, and pivoting, see Gateway Hub configuration.

Propose a schema in Active Console

In Active Console, you can use the Propose Schema command to build a schema for specific dataviews. This does not affect any connected Gateways.

To generate a schema definition using the Propose Schema command in Active Console, follow these steps:

Make sure Gateway Hub is enabled in the GSE.
Right-click a sampler or dataview.
Navigate to Sampler Schema.
Select Propose Schema. The generated XML schema definition for the sampler appears in a new window.
Right-click the window with the generated XML in the Active Console and select Copy All.
Navigate to the GSE, right-click the correct sampler and select Paste Schema.

Warning
No checks are performed when using Paste Schema on a sampler. Any existing schema is overwritten, and any copied schema can be pasted on to any sampler.

After using Paste Schema, perform the following:

Navigate to the Publishing tab of the sampler to view the schema definition.
Select Schema > Dataviews > Data.
Review the schema for errors because the data types inferred by the Propose Schema command may be incorrect.
Add any units of measure to the headlines and/or columns.
(Optional) Specify pivoting.

For more information about types, units of measure, and pivoting, see Gateway Hub configuration.

Note
Paste Schema is only available when valid XML has been copied to the clipboard.

How to use Paste Schema to create static variables

The generated XML output of the Propose Schema command in the Active Console can also be used to create sampler schemas as static variables. Schemas saved as static variables can be selected in the Publishing tab of a sampler.

To use the output of the Propose Schema command to produce schemas as static variables, follow these steps:

Right-click on the window with the generated XML in the Active Console and select Copy All.
Navigate to the GSE.
Right-click Static variables > Sampler-schemas and select Paste Schema.

A static variable containing a schema definition for each dataview in the copied sampler is created. The name used for the static variable is the name of the dataview.

After using Paste Schema, perform the following for each static variable:

Review the schema for errors because the data types inferred by the Propose Schema command may be incorrect.
Add any units of measure to the headlines and/or columns.
(Optional) Specify pivoting.

For more information about types, units of measure, and pivoting, see Gateway Hub configuration.

Note
Paste Schema is only available when valid XML has been copied to the clipboard.

How to define a schema manually

To define a schema manually, perform the following:

Open your Gateway Setup Editor.
Navigate to the Publishing tab of the sampler you want to make a schema for.
In Schema > Dataviews, click Add new. You must provide an entry for each dataview in the sampler that requires a schema definition.
In the Dataview field, enter the name of the dataview.
In the Schema field, choose data.
Click Data.
Add as many new Headlines and Columns entries as you require.
In the Name field, add the name of the headline or column in the dataview.
Under options, choose the type of data represented by the headline or column. If you choose Int32, Int64, Float32 or Float64, select the appropriate Unit of measure.
(Optional) If the dataview is pivoted, tick the box by Pivot.
Close the tab.
Click Validate current document to review any errors.
Click Save current document.

For more information about types, units of measure, and pivoting, see Gateway Hub configuration.

Schema inference

Beginning with version 2.4.x, Gateway Hub can use the schema inference feature to infer data schemas for data published by a Gateway when a built-in or user-defined data schema does not exist. This is useful in cases where you have a large number of toolkits, and creating user-defined schemas for each may take a long time.

Inference produces only a best guess of the appropriate data schema, and the ultimate quality of an inferred data schema is dependent on the quality and consistency of the published data.

Note the following limitations:

Inference only recognises dates in the yyyy-mm-dd format.
Inference does not capture units, and interprets values with a unit as a raw number. For example, if a cell has a value of 987 MB, then schema inference interprets this as 987.
Headline data schemas are inferred separately from table data schemas.
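
The first two limitations can be sketched in Python as follows. This is only an approximation of the behaviour described above, not Gateway Hub's parser. The helper names and the regular expressions are assumptions made for illustration.

```python
import re
from datetime import datetime

_LEADING_NUMBER = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)")
_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def strip_unit(cell):
    """Return the leading number of a cell such as '987 MB', or None."""
    match = _LEADING_NUMBER.match(cell)
    return float(match.group(1)) if match else None


def is_recognised_date(cell):
    """True only for valid calendar dates written as yyyy-mm-dd."""
    if not _ISO_DATE.fullmatch(cell):
        return False
    try:
        datetime.strptime(cell, "%Y-%m-%d")
    except ValueError:
        return False
    return True
```

Here `strip_unit("987 MB")` yields the raw number with the unit discarded, and a date such as `05/01/2024` is not recognised.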

In most cases, it is highly recommended that you provide your own data schemas, since this is the best way to ensure data schema accuracy and prevent loss of data due to misconfiguration.

Inference modes

You can configure schema inference to run in one of three modes: Naive, Basic, or Smart. By default, schema inference runs in Smart mode, and this is the recommended setting.

Caution
In all cases, the data schema inferred is only as good as the data that has been observed. If the data structure changes after inference, then you must update the data schema manually or risk dropping further datapoints.

Mode	Description	Advantages	Disadvantages
Naive	Naive inference uses a single datapoint to infer a very basic data schema where all fields are type `string`.	Simplicity means that the generated data schema ensures that data is always ingested providing that the structure of the data does not change after inference. Using one datapoint ensures that you do not lose data during inference.	Some metric functionality may be unavailable, since all fields are typed as `string`. If the data structure changes after inference then ingestion is impacted.
Basic	Basic inference uses a single datapoint to infer a more detailed data schema than Naive inference. Where field data is parsed as numeric, the field is assigned the type `float64`, making it accessible to all metric query functionality. Non-numeric fields are assigned the type `string`.	Improved numerical data handling compared to Naive inference. Using one datapoint ensures that you do not lose data during inference.	Increased likelihood of errors resulting from using a single datapoint, especially if new fields are added after inference.
Smart	Smart inference uses a multi-datapoint inference model. This is the default and recommended setting. You must configure the minimum number of datapoints to use over a defined inference period. Once the inference period is over (measured using sample time not clock time), if the inference engine has at least the minimum number of datapoints, it will perform a smart evaluation of property types. All supported types will be inferred, and any numerics will always be `float64`. When setting the minimum number of datapoints, you should consider that you lose the datapoints used for inference. Additionally, any datapoints currently in use for inference are lost if the normaliser is shut down. This restarts the inference period and requires that datapoints are collected again. The higher the number of datapoints used for inference the higher the accuracy, but this also increases the number of datapoints that can be lost. Variations in inference duration are small, as no inference will be performed until the full period is complete.	Significantly improved inference compared to Naive and Basic methods. Includes increased user control. Can handle new fields added during the inference period. Where any field cannot be inferred confidently, the engine will revert to a Naive inference and assign the `string` type.	Additional configuration required. Datapoints used to infer the schema are lost.

As an example, consider a sequence of four datapoints received over an inference period of, say, 10 minutes. Each inference mode treats the same data differently.

Datapoint	Naive	Basic	Smart (using 3 samples for inference)
123.45 days (`string`)	123.45 days (`string`)	123.45 (`float64`)	Used as training data, not stored.
250.56 days (`string`)	250.56 days (`string`)	250.56 (`float64`)	Used as training data, not stored.
unavailable (`string`)	unavailable (`string`)	ingestion error	Used as training data, not stored.
4.36 days (`string`)	4.36 days (`string`)	4.36 (`float64`)	4.36 days (`string`)
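
The Naive and Basic columns of the table can be reproduced with a short Python sketch. It is an approximation written for this example, not the normaliser's code, and the function names are hypothetical. Smart mode is left out because its result depends on the configured sample count and threshold.

```python
import re

_LEADING_NUMBER = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)")


def infer_type(first_value, mode):
    """Infer a field type from a single datapoint (Naive or Basic mode)."""
    if mode == "Naive":
        return "string"
    if mode == "Basic":
        return "float64" if _LEADING_NUMBER.match(first_value) else "string"
    raise ValueError("this sketch covers only Naive and Basic")


def ingest(value, field_type):
    """Store a value under the inferred type, or report an ingestion error."""
    if field_type == "string":
        return value
    match = _LEADING_NUMBER.match(value)
    if match is None:
        return "ingestion error"
    return float(match.group(1))


datapoints = ["123.45 days", "250.56 days", "unavailable", "4.36 days"]
for mode in ("Naive", "Basic"):
    field_type = infer_type(datapoints[0], mode)
    print(mode, [ingest(v, field_type) for v in datapoints])
```

In Basic mode the type is fixed as `float64` by the first datapoint, so the later value `unavailable` cannot be stored, which matches the ingestion error in the table.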

Missing data can change the inferences made.

Datapoint	Naive	Basic	Smart (using 3 samples for inference)
no data	omitted	omitted	Used as training data, not stored.
no data	omitted	omitted	Used as training data, not stored.
321.54 days (`string`)	ingestion error	ingestion error	Used as training data, not stored.
4.36 days (`string`)	ingestion error	ingestion error	4.36 (`float64`)

Gateway Hub configuration

You can configure data schema inference in Gateway Hub during installation or using hubctl with your installation descriptor.

For the most up-to-date information about configuration options, see Install Gateway Hub and hubctl tool.

The following configuration options are available:

Option	Description
hub_normaliserd_inference_enabled	Enable or disable data schema inference. Choose from `true` or `false`.
hub_normaliserd_inference_mode	Set the inference mode. Choose from `Naive`, `Basic` or `Smart`.
hub_normaliserd_inference_smart_min_samples	Minimum number of samples required before Smart inference can be used. This setting only applies if `hub_normaliserd_inference_mode` is set to `Smart`. Smart inference occurs after a duration set by `inferenceWaitDurationSeconds`. If at that time Gateway Hub has received at least `minSamplesForInference` samples, then Smart inference is performed. Otherwise, Naive inference is used.
hub_normaliserd_inference_smart_duration_seconds	Duration in seconds to wait before performing Smart inference. This setting only applies if `hub_normaliserd_inference_mode` is set to `Smart`.
hub_normaliserd_inference_smart_threshold	Fraction of samples received inside the inference duration that must be of a specific type for a field to be matched to that type. For example, if Gateway Hub has received `10` samples by the end of the inference duration, and the threshold is `0.5`, then `5` samples must be of type numeric and the remainder null (or effectively null) in order for the associated field to be considered type numeric.
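
The interaction between the minimum sample count and the threshold can be sketched in Python. This is one reading of the option descriptions, not Gateway Hub's implementation. It assumes that values have already had any units removed, that only the numeric-versus-string decision matters, and that "at least the threshold" is the intended comparison. The parameters `min_samples` and `threshold` stand in for `hub_normaliserd_inference_smart_min_samples` and `hub_normaliserd_inference_smart_threshold`.

```python
def smart_decision(samples, min_samples, threshold):
    """Decide a field type at the end of the inference period.

    samples: values seen during the period; None stands for a null cell.
    """
    if len(samples) < min_samples:
        return "string"  # too few samples: fall back to Naive inference

    def is_numeric(value):
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False

    non_null = [s for s in samples if s is not None]
    numeric = [s for s in non_null if is_numeric(s)]
    # Numeric only if enough samples are numeric and every other one is null.
    if len(numeric) == len(non_null) and len(numeric) >= threshold * len(samples):
        return "float64"
    return "string"
```

With 10 samples and a threshold of 0.5, five numeric values and five nulls give `float64`, whereas a single non-null, non-numeric value among them gives `string`.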

Gateway configuration

Gateway version 5.7.x or later is required in order to publish dataviews without a schema. If dataviews without a schema are published to a Gateway Hub version that does not include schema inference, then a large number of ingestion errors will be reported.

Gateway will try to publish with a data schema if possible, and will not publish data if it has a data schema with errors.

As a result, the following scenarios are possible:

If a dataview has a data schema and Gateway detects no errors, then it will be published with the provided data schema.
If the publish setting of a dataview is set to false, then it will not be published.
If a dataview has an empty data schema or a data schema that contains errors, then it will not be published.
If a dataview has no data schema, then the dataview will be published without a data schema.
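
The four scenarios can be summarised as a small decision function. The Python sketch below only restates the list above. The function name, its parameters, and the use of `None` for "no data schema" are hypothetical.

```python
def publish_outcome(publish_enabled, schema, schema_has_errors=False):
    """Return how the Gateway treats a dataview, per the scenarios above.

    schema: None when no data schema exists, otherwise a (possibly empty) dict.
    """
    if not publish_enabled:
        return "not published"
    if schema is None:
        return "published without a data schema"
    if not schema or schema_has_errors:
        return "not published"
    return "published with the data schema"
```

Note that an empty schema and a missing schema lead to different outcomes: only the missing schema results in publication without a data schema.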

Data schema parameters

Units of measure used in schemas

Name	Symbol
percent	%
seconds	s
milliseconds	ms
microseconds	μs
nanoseconds	ns
days	d
per second	s-1
megahertz	MHz
bytes	B
kibibytes	KiB
mebibytes	MiB
gibibytes	GiB
bytes per second	B/s
megabits	Mbit
megabits per second	Mbit/s
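
If you handle schema definitions in scripts, the table above can be kept as a simple lookup so that unit names are checked before use. The Python sketch below copies the names and symbols from the table as they appear there. The function `unit_symbol` is a hypothetical helper, not part of any Geneos tool.

```python
UNIT_SYMBOLS = {
    "percent": "%",
    "seconds": "s",
    "milliseconds": "ms",
    "microseconds": "μs",
    "nanoseconds": "ns",
    "days": "d",
    "per second": "s-1",
    "megahertz": "MHz",
    "bytes": "B",
    "kibibytes": "KiB",
    "mebibytes": "MiB",
    "gibibytes": "GiB",
    "bytes per second": "B/s",
    "megabits": "Mbit",
    "megabits per second": "Mbit/s",
}


def unit_symbol(name):
    """Return the symbol for a unit name, or raise if the name is not listed."""
    try:
        return UNIT_SYMBOLS[name]
    except KeyError:
        raise ValueError(f"unknown unit of measure: {name!r}") from None
```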

Sampler schema types

The following table lists samplers and whether each has an entirely built-in schema, a partially built-in schema, or an entirely user-defined schema.

Plugin	Type	Comments
Gateway-breachPredictor	Built-in
Gateway-clientConnectionData	Built-in
Gateway-databaseLogging	Built-in
Gateway-exportedData	Built-in
Gateway-gatewayHubData	Built-in
Gateway-gatewayLoad	Built-in
Gateway-importedData	Built-in
Gateway-includesData	Built-in
Gateway-licenceUsage	Built-in
Gateway-managedEntitiesData	Partial
Gateway-probeData	Built-in
Gateway-scheduledCommandData	Built-in
Gateway-scheduledCommandsHistoryData	Built-in
Gateway-severityCount	Built-in
Gateway-severityData	Built-in
Gateway-snoozeData	Built-in
Gateway-sql	User-defined
Gateway-userAssignmentData	Built-in
api	User-defined
api-streams	Built-in
bloomberg-bpipe	Built-in
citrix-apps	Built-in
citrix-processes	Built-in
citrix-sessions	Built-in
citrix-summary	Built-in
clearvision-status	Built-in
combo	User-defined
component-versions	Built-in
control-m	Built-in
cpu	Built-in
desktop-pc-monitoring	Built-in
deviceio	Built-in
disk	Built-in
e4jms-bridges	Built-in
e4jms-connections	Built-in
e4jms-durables	Built-in
e4jms-non-durables	Built-in
e4jms-queues	Built-in
e4jms-routes	Built-in
e4jms-server	Built-in
e4jms-topics	Built-in
e4jms-usersummary	Built-in
euem	Built-in
extractor	User-defined
fidessa	Built-in
fidessa-dq	User-defined
fix	Built-in
fix-analyser2	Partial	Admin data view schema provided, user must define schema for all other dataviews.
fkm	Partial
flm	Partial	User must define schema for additional data displayed based on configuration.
ftm	Built-in
gl-greffon	Built-in
gl-lostorders	User-defined
gl-orderbook	User-defined
gl-permissions	Built-in
gl-router	Built-in
gl-slc	Partial	User must define schema for additional data displayed based on configuration or SLC log file.
gl-slc-relay	Built-in
gl-sle	Built-in
gl-sle-tcp	Built-in
hardware	Built-in
ibmi-job	Built-in
ibmi-message	Built-in
ibmi-pool	Built-in
ibmi-queue	Built-in
ibmi-subsystem	Built-in
ibmi-system	Built-in
informix	Built-in
ipc	Built-in
ix-ma	User-defined
jmx-server	User-defined
jmx-threadinfo	Built-in
market-data-monitor	User-defined
message-tracker	Built-in
mibmon	User-defined
miss-x	Built-in
mq-channel	Built-in
mq-qinfo	Built-in
mq-queue	Built-in
net-ping	Built-in
network	Built-in
nyxt-papastats	Built-in
oracle	Built-in
orc	Built-in
pats-status	Built-in
pats-trading-breaches	Built-in
pats-users	Built-in
perfmon	User-defined
processes	Built-in
rest-extractor	User-defined
rmc-interface	User-defined
sets-slc	Built-in
sql-toolkit	User-defined
stateTracker	User-defined	User must define schema for user defined custom column names.
su	Built-in
sybase	Built-in
sybase-server	Built-in
tcp-links	Built-in
tib-rv	Built-in
tib-rvpublisher	Built-in
tib-rvstream	Built-in
toolkit	User-defined
top	Built-in
trading-technologies	Built-in
trapmon	Partial	User must define schema for user-defined columns in custom view.
unix-users	Built-in
veritas-cluster-server	Built-in
web-mon	Built-in
win-cluster	Built-in
win-services	Built-in
winapps	Built-in
wmi	User-defined
wts-licenses	Built-in
wts-processes	Built-in
wts-sessions	Built-in
wts-summary	Built-in
x-broadcast	Built-in
x-mcast	Built-in
x-multicast	Built-in
x-ping	Built-in
x-route	Built-in
x-services	Built-in
x-top	Built-in
x-traffic	Built-in
