# Split

# Usage

This function splits text with a given regular expression and returns either a specific element of the split result or the number of elements.

Name: SPLIT

Input Series: Only supports a single input series. The data type is TEXT.

Parameter:

  • regex: The regular expression used to split the text. Any regular expression syntax supported by Java is accepted; for example, the pattern ['"] matches both ' and ".
  • index: The index of the wanted element in the split result. It must be an integer no less than -1 and defaults to -1. When index is -1, the length of the split result array is returned; a non-negative integer returns the element at that position, counting from 0.
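To illustrate the character-class pattern mentioned for regex, the following Java sketch assumes the regex parameter is handed to Java's String.split(String); the class and method names are hypothetical. Note that the single-argument String.split discards trailing empty strings, which would also affect the length returned for index -1 if SPLIT behaves the same way.

```java
import java.util.Arrays;

// Sketch only: assumes SPLIT passes "regex" to String.split(String).
// QuoteSplitDemo and splitOnQuotes are hypothetical names.
public class QuoteSplitDemo {

  // Splits text on either a single quote or a double quote, i.e. the regex ['"].
  static String[] splitOnQuotes(String text) {
    return text.split("['\"]");
  }

  public static void main(String[] args) {
    String text = "say 'hi' and \"bye\"";
    String[] parts = splitOnQuotes(text);
    // The trailing empty string after the final quote is dropped by String.split.
    System.out.println(parts.length);           // 4
    System.out.println(Arrays.toString(parts)); // [say , hi,  and , bye]
  }
}
```

Under the same assumption, setting "index"="1" for such a value would return hi.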

Output Series: Output a single series. The type is INT32 when index is -1, and TEXT when index is a valid non-negative index.

Note: When index is out of the range of the result array, no value is output for that record. For example, if 0,1,2 is split by , and index is set to 3, that record produces no output.
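The index semantics and the out-of-range rule can be summarized in a short Java sketch. It assumes the text is split with String.split(regex); splitRecord is a hypothetical helper, not the UDF's actual code, and an empty Optional stands for "no output for this record".

```java
import java.util.Optional;

// Sketch of the per-record behaviour, assuming String.split(regex) is used.
// SplitSemantics and splitRecord are hypothetical names.
public class SplitSemantics {

  // Returns the element count (Integer, like INT32) when index is -1,
  // the element at index (String, like TEXT) when it exists,
  // and Optional.empty() when index is out of range (no output).
  static Optional<Object> splitRecord(String text, String regex, int index) {
    if (index < -1) {
      // Assumption: values below -1 are rejected as invalid parameters.
      throw new IllegalArgumentException("index must be no less than -1");
    }
    String[] parts = text.split(regex);
    if (index == -1) {
      return Optional.of(parts.length);
    }
    if (index < parts.length) {
      return Optional.of(parts[index]);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(splitRecord("0,1,2", ",", -1)); // Optional[3]
    System.out.println(splitRecord("0,1,2", ",", 1));  // Optional[1]
    System.out.println(splitRecord("0,1,2", ",", 3));  // Optional.empty
  }
}
```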

# Examples

Input series:

+-----------------------------+---------------+
|                         Time|root.test.d1.s1|
+-----------------------------+---------------+
|2021-01-01T00:00:01.000+08:00|      A,B,A+,B-|
|2021-01-01T00:00:02.000+08:00|      A,A+,A,B+|
|2021-01-01T00:00:03.000+08:00|         B+,B,B|
|2021-01-01T00:00:04.000+08:00|      A+,A,A+,A|
|2021-01-01T00:00:05.000+08:00|       A,B-,B,B|
+-----------------------------+---------------+

SQL for query:

select split(s1, "regex"=",", "index"="-1") from root.test.d1

Output series:

+-----------------------------+-------------------------------------------------+
|                         Time|split(root.test.d1.s1, "regex"=",", "index"="-1")|
+-----------------------------+-------------------------------------------------+
|2021-01-01T00:00:01.000+08:00|                                                4|
|2021-01-01T00:00:02.000+08:00|                                                4|
|2021-01-01T00:00:03.000+08:00|                                                3|
|2021-01-01T00:00:04.000+08:00|                                                4|
|2021-01-01T00:00:05.000+08:00|                                                4|
+-----------------------------+-------------------------------------------------+
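As a sanity check, this Java sketch recomputes the counts above from the input values, again assuming SPLIT behaves like String.split(regex) and that index -1 returns the array length. The class name is hypothetical.

```java
import java.util.Arrays;

// Recomputes the "index"="-1" query over the sample values of root.test.d1.s1.
// Assumes SPLIT uses String.split(regex); SplitLengthExample is hypothetical.
public class SplitLengthExample {

  static final String[] S1 = {"A,B,A+,B-", "A,A+,A,B+", "B+,B,B", "A+,A,A+,A", "A,B-,B,B"};

  // Number of elements each value splits into.
  static int[] lengths(String[] values, String regex) {
    return Arrays.stream(values).mapToInt(v -> v.split(regex).length).toArray();
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(lengths(S1, ","))); // [4, 4, 3, 4, 4]
  }
}
```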

Another SQL for query:

select split(s1, "regex"=",", "index"="3") from root.test.d1

Output series:

+-----------------------------+------------------------------------------------+
|                         Time|split(root.test.d1.s1, "regex"=",", "index"="3")|
+-----------------------------+------------------------------------------------+
|2021-01-01T00:00:01.000+08:00|                                              B-|
|2021-01-01T00:00:02.000+08:00|                                              B+|
|2021-01-01T00:00:04.000+08:00|                                               A|
|2021-01-01T00:00:05.000+08:00|                                               B|
+-----------------------------+------------------------------------------------+
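The second query can be reproduced the same way. This sketch, under the same String.split assumption, keeps the records in time order and omits any record whose split result has no element at the requested index; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Recomputes the "index"="3" query over the sample series.
// Assumes SPLIT uses String.split(regex); SplitIndexExample is hypothetical.
public class SplitIndexExample {

  // Emits the element at index for each timestamp, skipping out-of-range records.
  static Map<String, String> elementAt(Map<String, String> series, String regex, int index) {
    Map<String, String> out = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : series.entrySet()) {
      String[] parts = e.getValue().split(regex);
      if (index < parts.length) {
        out.put(e.getKey(), parts[index]);
      }
    }
    return out;
  }

  // The input series from the example, in time order.
  static Map<String, String> sample() {
    Map<String, String> s1 = new LinkedHashMap<>();
    s1.put("2021-01-01T00:00:01.000+08:00", "A,B,A+,B-");
    s1.put("2021-01-01T00:00:02.000+08:00", "A,A+,A,B+");
    s1.put("2021-01-01T00:00:03.000+08:00", "B+,B,B");
    s1.put("2021-01-01T00:00:04.000+08:00", "A+,A,A+,A");
    s1.put("2021-01-01T00:00:05.000+08:00", "A,B-,B,B");
    return s1;
  }

  public static void main(String[] args) {
    elementAt(sample(), ",", 3).forEach((t, v) -> System.out.println(t + " " + v));
  }
}
```

The record at 2021-01-01T00:00:03.000+08:00 is missing from the output because B+,B,B splits into only three elements, so index 3 is out of range.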