# ConsecutiveSequences
# Usage
This function is used to find locally longest consecutive subsequences in strictly equispaced multidimensional data.
Strictly equispaced data is the data whose time intervals are strictly equal. Missing data, including missing rows and missing values, is allowed in it, while data redundancy and timestamp drift is not allowed.
Consecutive subsequence is the subsequence that is strictly equispaced with the standard time interval without any missing data. If a consecutive subsequence is not a proper subsequence of any consecutive subsequence, it is locally longest.
Name: CONSECUTIVESEQUENCES
Input Series: Support multiple input series. The type is arbitrary but the data is strictly equispaced.
Parameters:
gap
: The standard time interval which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it will be estimated by the mode of time intervals.
Output Series: Output a single series. The type is INT32. Each data point in the output series corresponds to a locally longest consecutive subsequence. The output timestamp is the starting timestamp of the subsequence and the output value is the number of data points in the subsequence.
Note: For input series that is not strictly equispaced, there is no guarantee on the output.
# Examples
# Manually Specify the Standard Time Interval
It's able to manually specify the standard time interval by the parameter gap
. It's notable that false parameter leads to false output.
Input series:
+-----------------------------+---------------+---------------+
| Time|root.test.d1.s1|root.test.d1.s2|
+-----------------------------+---------------+---------------+
|2020-01-01T00:00:00.000+08:00| 1.0| 1.0|
|2020-01-01T00:05:00.000+08:00| 1.0| 1.0|
|2020-01-01T00:10:00.000+08:00| 1.0| 1.0|
|2020-01-01T00:20:00.000+08:00| 1.0| 1.0|
|2020-01-01T00:25:00.000+08:00| 1.0| 1.0|
|2020-01-01T00:30:00.000+08:00| 1.0| 1.0|
|2020-01-01T00:35:00.000+08:00| 1.0| 1.0|
|2020-01-01T00:40:00.000+08:00| 1.0| null|
|2020-01-01T00:45:00.000+08:00| 1.0| 1.0|
|2020-01-01T00:50:00.000+08:00| 1.0| 1.0|
+-----------------------------+---------------+---------------+
SQL for query:
select consecutivesequences(s1,s2,'gap'='5m') from root.test.d1
Output series:
+-----------------------------+------------------------------------------------------------------+
| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2, "gap"="5m")|
+-----------------------------+------------------------------------------------------------------+
|2020-01-01T00:00:00.000+08:00| 3|
|2020-01-01T00:20:00.000+08:00| 4|
|2020-01-01T00:45:00.000+08:00| 2|
+-----------------------------+------------------------------------------------------------------+
# Automatically Estimate the Standard Time Interval
When gap
is default, this function estimates the standard time interval by the mode of time intervals and gets the same results. Therefore, this usage is more recommended.
Input series is the same as above, the SQL for query is shown below:
select consecutivesequences(s1,s2) from root.test.d1
Output series:
+-----------------------------+------------------------------------------------------+
| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2)|
+-----------------------------+------------------------------------------------------+
|2020-01-01T00:00:00.000+08:00| 3|
|2020-01-01T00:20:00.000+08:00| 4|
|2020-01-01T00:45:00.000+08:00| 2|
+-----------------------------+------------------------------------------------------+