Grouping and bucketing data
Overview
The REST API provides two optional syntaxes for grouping data: grouping, which applies to data-related dimensions, and bucketing, which applies to the time dimension.
Both are explained in further detail below.
Grouping Syntax
Grouping is how data is grouped in a data-related dimension. The groupings you specify determine how data is grouped for aggregation.
The Grouping options are:
- rowname - data is grouped using the rowname property. If no rowname exists for a given metric, it is grouped into an empty group.
- entity - data is grouped by the entity ID.
- entity:[attribute] - data is grouped by a specified entity attribute. For example, entity:itrs.os groups by Operating System.
In a query, you can only specify each grouping once or not at all. Specifying no options results in no data groupings being used.
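The effect of each grouping option can be illustrated with a small client-side sketch. It is not the Gateway Hub implementation: the record layout, the field names, and the group_key and group_values helpers are hypothetical, and placing a record with a missing attribute into an empty group is an assumption made for the example.

```python
from collections import defaultdict

# Hypothetical metric records. The field names are illustrative only and
# are not the Gateway Hub response schema.
records = [
    {"entity": "e1", "rowname": "cpu0", "attributes": {"itrs.os": "Linux"}, "value": 10.0},
    {"entity": "e1", "rowname": "cpu1", "attributes": {"itrs.os": "Linux"}, "value": 30.0},
    {"entity": "e2", "rowname": None, "attributes": {"itrs.os": "Windows"}, "value": 5.0},
]


def group_key(record, groupings):
    """Build a group key from options such as 'rowname' or 'entity:itrs.os'."""
    key = []
    for option in groupings:
        if option == "rowname":
            # A metric without a rowname falls into an empty group.
            key.append(record.get("rowname") or "")
        elif option == "entity":
            key.append(record["entity"])
        elif option.startswith("entity:"):
            attribute = option.split(":", 1)[1]
            # Assumption: a missing attribute also maps to an empty group.
            key.append(record["attributes"].get(attribute, ""))
        else:
            raise ValueError(f"unknown grouping option: {option}")
    return tuple(key)


def group_values(records, groupings):
    """Collect the values of all records that share the same group key."""
    groups = defaultdict(list)
    for record in records:
        groups[group_key(record, groupings)].append(record["value"])
    return dict(groups)


by_os = group_values(records, ["entity:itrs.os"])
by_row = group_values(records, ["rowname"])
ungrouped = group_values(records, [])
```

With no grouping options, every record shares the same empty key, which mirrors the case where no data groupings are used.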
Bucketing Syntax
Bucketing is how data is grouped in the time dimension. Buckets are a fixed set of time intervals into which data is placed.
Bucket intervals
Data is bucketed into a fixed set of bucket sizes, each of which is a time interval. The supported intervals are:
1 minute
5 minute
15 minute
1 hour
3 hour
12 hour
1 day
1 week
1 month
Gateway Hub does not support other bucket sizes.
Alignment of buckets
Buckets are aligned to fixed time boundaries that do not vary with the time range of the data. Buckets align as follows:
- Minute buckets align to exact minutes and divisions of the hour:
  - 1 minute buckets align to xx:00, xx:01, xx:02, and so on.
  - 5 minute buckets align to xx:00, xx:05, xx:10, and so on.
  - 15 minute buckets align to xx:00, xx:15, xx:30, xx:45.
- Hour buckets align to exact divisions of a 24 hour day:
  - 1 hour buckets align to 00:00, 01:00, 02:00, and so on.
  - 3 hour buckets align to 00:00, 03:00, 06:00, 09:00, 12:00, 15:00, 18:00, 21:00.
  - 12 hour buckets align to 00:00 and 12:00.
- 1 day buckets align to the start of a day at 00:00.
- 1 week buckets follow the ISO 8601 standard for week date, aligned to Mondays.
- 1 month buckets align to calendar month boundaries. That is, 00:00 on 1 January, 00:00 on 1 February, and so on.
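The alignment rules can be sketched as a function that returns the start of the bucket containing a given timestamp. This is an illustrative sketch only: bucket_start is a hypothetical helper, not part of the REST API, and it works on naive datetime values because the time zone used for alignment is not stated here.

```python
from datetime import datetime, timedelta

SUPPORTED_SIZES = {
    "1 minute", "5 minute", "15 minute",
    "1 hour", "3 hour", "12 hour",
    "1 day", "1 week", "1 month",
}


def bucket_start(ts, size):
    """Return the start of the bucket containing ts, for a size such as '15 minute'."""
    if size not in SUPPORTED_SIZES:
        raise ValueError(f"unsupported bucket size: {size}")
    amount, unit = size.split()
    n = int(amount)
    if unit == "minute":
        # Round the minute down to a multiple of n within the hour.
        return ts.replace(minute=ts.minute - ts.minute % n, second=0, microsecond=0)
    if unit == "hour":
        # Round the hour down to a multiple of n within the 24 hour day.
        return ts.replace(hour=ts.hour - ts.hour % n, minute=0, second=0, microsecond=0)
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "day":
        return midnight
    if unit == "week":
        # weekday() is 0 on Monday, which matches ISO 8601 weeks.
        return midnight - timedelta(days=ts.weekday())
    # Remaining case: calendar month.
    return midnight.replace(day=1)


ts = datetime(2024, 3, 14, 8, 47, 30)  # a Thursday
starts = {size: bucket_start(ts, size) for size in ("15 minute", "3 hour", "1 week", "1 month")}
```

Because the boundaries depend only on the clock and calendar, the same timestamp always lands in the same bucket regardless of the time range queried.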
Specify bucketing in a query
When specifying a REST API query, there are three options you can use for bucketing:
No buckets
If you do not specify bucketing in your query, data is not grouped in the time dimension. No bucket will be specified in the output. Grouping, if present, is used to group the data.
Bucket duration
You bucket by duration in the following way, where <bucket size> is one of the bucket sizes listed above:
bucketing: duration: <bucket size>
For example:
bucketing: duration: 5 minute
Gateway Hub buckets the data by the specified size without any fitting.
Bucket count
When bucketing by count, you specify the number of buckets to sort the data into. You bucket by count in the following way, where <bucket count> is an integer:
bucketing: count: <bucket count>
For example:
bucketing: count: 24
Gateway Hub selects the most appropriate bucket size that returns a bucket count closest to the count you specified. This calculation is performed in the following way:
- Gateway Hub takes the full duration covered in the time range.
- For each bucket size above, Gateway Hub calculates the number of buckets required using the following information:
  - The bucket that the start of the time range falls into.
  - The bucket that the end of the time range falls into.
  - The total number of buckets required for the time range.
- Gateway Hub selects the bucket size that produced the number of buckets closest to the count you requested. If bucket sizes are equally close to the count requested, Gateway Hub selects the smallest bucket size.
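The selection steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Gateway Hub's actual code: choose_bucket_size, buckets_needed, and bucket_index are hypothetical names, timestamps are naive datetime values, and the end of the time range is treated as exclusive (so a range of 08:00-16:00 needs eight 1 hour buckets).

```python
from datetime import datetime, timedelta

# Supported sizes, smallest first. A month has no fixed width, so it is None.
SIZES = [
    ("1 minute", timedelta(minutes=1)),
    ("5 minute", timedelta(minutes=5)),
    ("15 minute", timedelta(minutes=15)),
    ("1 hour", timedelta(hours=1)),
    ("3 hour", timedelta(hours=3)),
    ("12 hour", timedelta(hours=12)),
    ("1 day", timedelta(days=1)),
    ("1 week", timedelta(weeks=1)),
    ("1 month", None),
]

# A Monday at 00:00. Every fixed-width bucket boundary lies a whole number
# of widths away from this reference point.
ANCHOR = datetime(1970, 1, 5)


def bucket_index(ts, width):
    """Return an integer that identifies the aligned bucket containing ts."""
    if width is None:
        return ts.year * 12 + ts.month  # calendar month buckets
    return (ts - ANCHOR) // width


def buckets_needed(start, end, width):
    """Count the aligned buckets touched by the range from start to end."""
    # Assumption: the end of the range is exclusive.
    last = end - timedelta(microseconds=1)
    return bucket_index(last, width) - bucket_index(start, width) + 1


def choose_bucket_size(start, end, count):
    """Return (size name, bucket count) closest to the requested count."""
    if end <= start:
        raise ValueError("end must be later than start")
    best = None
    for name, width in SIZES:
        needed = buckets_needed(start, end, width)
        distance = abs(needed - count)
        # Strict comparison: on a tie, the smaller size found earlier is kept.
        if best is None or distance < best[0]:
            best = (distance, name, needed)
    return best[1], best[2]


result = choose_bucket_size(datetime(2024, 3, 14, 8, 0), datetime(2024, 3, 14, 16, 0), 4)
```

Iterating from the smallest size to the largest and replacing the best candidate only on a strictly smaller distance is one simple way to get the "smallest bucket size wins a tie" behaviour.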
The bucketing process is a best fit and does not throw errors for inexact matches. The result may not contain the number of buckets you requested in the count. This can occur in the following instances:
- If you specify a time range and a count n where the resulting bucket size is larger than 1 month, the result will contain more than n 1 month buckets.
- If you specify a range that is not exactly divisible by any duration, the result may contain more or fewer buckets than requested.
Note: If a bucket contains zero data samples, it is not shown in the result. For example, if you specify count: 10 over a 10 hour period, the result only contains 10 buckets if there is actually data present in each hour of the time range.
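The effect described in the note can be reproduced with a short sketch. The sample timestamps are invented for illustration, and counting samples per bucket stands in for whatever aggregation a real query would perform.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample timestamps over a 10 hour period (00:00-10:00).
# There are no samples in the 03:00 or 07:00 hours.
times = [(0, 5), (1, 10), (1, 50), (2, 0), (4, 30), (5, 15), (6, 45), (8, 20), (9, 59)]
samples = [datetime(2024, 3, 14, hour, minute) for hour, minute in times]

# Place each sample in its 1 hour bucket, keyed by the bucket start.
# Only buckets that receive at least one sample get a key.
buckets = Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in samples)

bucket_count = len(buckets)
```

Here the 10 hour period yields eight buckets rather than ten, because the two hours without samples never produce a bucket.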
Examples
You specify an 8 hour range from 08:00-16:00, and request count: 4:
- Although four 2 hour buckets is the most precise result for this time range and count, Gateway Hub does not support 2 hour buckets.
- Using 1 hour buckets, eight buckets are required.
- Using 3 hour buckets, the start of the time range at 08:00 is in the 06:00-09:00 bucket. The end of the time range at 16:00 is in the 15:00-18:00 bucket. Between the start and end bucket, there are two more buckets. Therefore, four buckets are required.
- Using 12 hour buckets, the start of the time range at 08:00 is in the 00:00-12:00 bucket. The end of the time range at 16:00 is in the 12:00-00:00 bucket. This covers the whole time range, and therefore two buckets are required.
- You requested count: 4, and the closest in absolute terms to this is four 3 hour buckets. Be aware that the first bucket starts at 06:00, not 08:00. This is important to understand how the bucketing output works.
You specify a 9 hour range from 01:00-10:00, and request count: 2:
- Using 1 hour buckets, nine buckets are required.
- Using 3 hour buckets, the start of the time range at 01:00 is in the 00:00-03:00 bucket. The end of the time range at 10:00 is in the 09:00-12:00 bucket. Between the start and end bucket, there are two more buckets. Therefore, four buckets are required.
- Using 12 hour buckets, the start of the time range at 01:00 and the end of the time range at 10:00 are in the 00:00-12:00 bucket. Therefore, only one bucket is required.
- You requested count: 2, and the closest in absolute terms to this is one 12 hour bucket. This is less than the count you requested, but is the closest result with the supported bucket sizes.