# RegexSplit

# Usage

The function is used to split text with given regular expression and return specific element.

Name: REGEXSPLIT

Input Series: Only support a single input series. The data type is TEXT.

Parameter:

  • regex: The regular expression used to split the text. All grammars supported by Java are acceptable, for example, ['"] is expected to match ' and ".
  • index: The wanted index of elements in the split result. It should be an integer no less than -1, default to -1 which means the length of the result array is returned and any non-negative integer is used to fetch the text of the specific index starting from 0.

Output Series: Output a single series. The type is INT32 when index is -1 and TEXT when it's an valid index.

Note: When index is out of the range of the result array, for example 0,1,2 split with , and index is set to 3, no result are returned for that record.

# Examples

Input series:

+-----------------------------+---------------+
|                         Time|root.test.d1.s1|
+-----------------------------+---------------+
|2021-01-01T00:00:01.000+08:00|      A,B,A+,B-|
|2021-01-01T00:00:02.000+08:00|      A,A+,A,B+|
|2021-01-01T00:00:03.000+08:00|         B+,B,B|
|2021-01-01T00:00:04.000+08:00|      A+,A,A+,A|
|2021-01-01T00:00:05.000+08:00|       A,B-,B,B|
+-----------------------------+---------------+

SQL for query:

select regexsplit(s1, "regex"=",", "index"="-1") from root.test.d1

Output series:

+-----------------------------+------------------------------------------------------+
|                         Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="-1")|
+-----------------------------+------------------------------------------------------+
|2021-01-01T00:00:01.000+08:00|                                                     4|
|2021-01-01T00:00:02.000+08:00|                                                     4|
|2021-01-01T00:00:03.000+08:00|                                                     3|
|2021-01-01T00:00:04.000+08:00|                                                     4|
|2021-01-01T00:00:05.000+08:00|                                                     4|
+-----------------------------+------------------------------------------------------+

Another SQL for query:

SQL for query:

select regexsplit(s1, "regex"=",", "index"="3") from root.test.d1

Output series:

+-----------------------------+-----------------------------------------------------+
|                         Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="3")|
+-----------------------------+-----------------------------------------------------+
|2021-01-01T00:00:01.000+08:00|                                                   B-|
|2021-01-01T00:00:02.000+08:00|                                                   B+|
|2021-01-01T00:00:04.000+08:00|                                                    A|
|2021-01-01T00:00:05.000+08:00|                                                    B|
+-----------------------------+-----------------------------------------------------+