General Functions
Transformation functions manipulate and process data values.
ABS
Returns the absolute value of the given input.
SYNTAX
abs(input)
input
- Type: number
- Description: The attribute to return absolute value for
OUTPUT
number
EXAMPLE
SELECT abs(number_value) as absolute_value
number_value | → | absolute_value |
---|---|---|
1.0 | 1.0 | |
-1.0 | 1.0 | |
-123.45 | 123.45 | |
0 | 0 | |
Infinity | Infinity | |
-Infinity | Infinity | |
null | null |
ADD
Returns the sum of two integers.
SYNTAX
add(integer1, integer2)
integer1
- Type: int
- Description: The first integer to add
integer2
- Type: int
- Description: The second integer to add
OUTPUT
int
EXAMPLE
SELECT add(apples, oranges) AS total_fruit
apples | oranges | → | total_fruit |
---|---|---|---|
13 | 3 | 16 | |
8 | 2 | 10 | |
-15 | 5 | -10 | |
2 | 4 | 6 | |
11 | 5 | 16 | |
4 | null | null | |
null | 3 | null | |
null | null | null |
ARRAY
Returns an array with the given elements.Tip: To find the function that best meets your needs, review similar function array.of.
SYNTAX
array(elements)
elements
- Type: any
- Description: The elements to include in the array
OUTPUT
{'type': 'array', 'elementType': 'any'}
EXAMPLES
SELECT array(opening_act, headliner) AS bands
opening_act | headliner | → | bands |
---|---|---|---|
'Pink Floyd' | 'Jimi Hendrix Experience' | ['Pink Floyd', 'Jimi Hendrix Experience'] | |
null | 'Beyoncé' | [null, 'Beyoncé'] |
SELECT array(...some_array) AS identical
some_array | → | identical |
---|---|---|
[1, 2, 3] | [1, 2, 3] | |
[2] | [2] | |
[null, 2, null] | [null, 2, null] |
ARRAY_CONCAT
Converts all values in an array into strings and then concatenates them together into a single string. Tip: To find the function that best meets your needs, review these similar functions: ARRAY.CONCAT, STR_JOIN, ARRAY.APPEND.
SYNTAX
array_concat(array)
array
- Type: {'type': 'array', 'elementType': 'any'}
- Description: The array whose values will be concatenated into a single string
OUTPUT
string
EXAMPLE
SELECT array_concat(example_arrays) AS concatenated_arrays
example_arrays | → | concatenated_arrays |
---|---|---|
['meow'] | 'meow' | |
['This', ' ', 'is', ' ', 'a', ' ', 'sentence.'] | 'This is a sentence.' | |
[''] | '' | |
[null] | '' | |
[null, 'ta', '', null, 'mr', ''] | 'tamr' | |
null | null |
ATOF
Converts a string to a floating point value. (ASCII-to-Float)Note: This function is available to maintain backward compatibility only. Use TO_DOUBLE instead.
SYNTAX
atof(string_to_convert)
string_to_convert
- Type: string
- Description: The string to convert to a floating point value.
OUTPUT
double
EXAMPLE
SELECT atof(example_strings) AS example_strings_as_double;
example_strings | → | example_strings_as_double |
---|---|---|
'42' | 42.0 | |
'-42' | -42.0 | |
'-273.15' | -273.15 | |
'7.' | 7.0 | |
'.7' | 0.7 | |
'-.7' | -0.7 | |
'42E5' | 4200000.0 | |
'42.E+42' | 4.2e+43 | |
'6.022E-23' | 6.022e-23 | |
'abc123' | null | |
'' | null | |
null | null |
ATOI
Converts a string to an integer value. (ASCII-to-Integer)Note: This function is available to maintain backward compatibility only. Use TO_INT instead.
SYNTAX
atoi(string_to_convert)
string_to_convert
- Type: string
- Description: The string to convert to an integer value.
OUTPUT
int
EXAMPLE
SELECT atoi(example_strings) AS example_strings_as_int;
example_strings | → | example_strings_as_int |
---|---|---|
'42' | 42 | |
'42.0' | 42 | |
'3' | 3 | |
'3.14' | 3 | |
'1' | 1 | |
'1.' | 1 | |
'-300' | -300 | |
'-300.00000' | -300 | |
'abc123' | null | |
'' | null | |
null | null |
CEIL
Returns the smallest integer that is greater than or equal to the given number.Tip: To find the function that best meets your needs, review these similar functions: ROUND, FLOOR.
SYNTAX
ceil(input)
input
- Type: number
- Description: The input to compute the ceiling for.
OUTPUT
long
EXAMPLES
SELECT ceil(input_value) AS ceiling
input_value | → | ceiling |
---|---|---|
1.1 | 2 | |
1.9 | 2 | |
-1.1 | -1 | |
-1.9 | -1 | |
0 | 0 | |
NaN | 0 | |
-Infinity | -9223372036854775808 | |
Infinity | 9223372036854775807 | |
5e-324 | 1 | |
1.7976931348623157e+308 | 9223372036854775807 | |
null | null |
SELECT ceil(input_value) AS ceiling
input_value | → | ceiling |
---|---|---|
42 | 42 | |
-42 | -42 | |
null | null |
COALESCE
Returns the first input that is not null, or null if all inputs are null. When using COALESCE on an array, note that arrays that contain only nulls or only empty values are not themselves null. COALESCE will return these as the first non-null value found. Similar functions: NON_NULL and NON_EMPTIES.
SYNTAX
coalesce(a)
a
- Type: any
- Description: the inputs to coalesce
OUTPUT
any
EXAMPLES
SELECT coalesce(Attr_1, Attr_2) AS nonNullNumber
Attr_1 | Attr_2 | → | nonNullNumber |
---|---|---|---|
null | 10 | 10 | |
10 | null | 10 | |
null | null | null | |
0 | 10 | 0 |
SELECT coalesce(attr_1, attr_2) AS nonNullValue
attr_1 | attr_2 | → | nonNullValue |
---|---|---|---|
[null] | ['c', 'd'] | [null] | |
['', ''] | ['c', 'd'] | ['', ''] | |
['a', 'b'] | ['c', 'd'] | ['a', 'b'] | |
[null, 'b'] | ['c', 'd'] | [null, 'b'] | |
null | ['c', 'd'] | ['c', 'd'] | |
[] | ['c', 'd'] | [] |
CONCAT
Concatenates string values. If any of the arguments to concat is null, it returns null.
SYNTAX
concat(a)
a
- Type: string
- Description: String values to concatenate
OUTPUT
string
EXAMPLES
SELECT concat(first_name, last_name) AS combined
first_name | last_name | → | combined |
---|---|---|---|
'david' | 'bowie' | 'davidbowie' | |
'prince' | null | null |
SELECT concat(...string_array) AS combined
string_array | → | combined |
---|---|---|
['a', 'b', 'c'] | 'abc' | |
['a', null] | 'a' | |
[] | '' |
DATEDIFF
Gets the difference in days between the start and end date. The result will be negative if the end date occurs before the start date. More precisely, this function counts the number of 24-hour periods between the start and end dates.Tip: To find the function that best meets your needs, review similar function ISO_DIFFERENCE_DAYS.
SYNTAX
datediff(endDate, startDate)
endDate
- Type: string
- Description: The end date
startDate
- Type: string
- Description: The start date
OUTPUT
long
EXAMPLE
SELECT start, "end", datediff("end", start) AS diff
start | end | → | start | diff | end |
---|---|---|---|---|---|
'1991' | '1992' | '1991' | 365 | '1992' | |
'2004-03-01' | '2004-02-28' | '2004-03-01' | -2 | '2004-02-28' | |
'2016-12-25T06:44+02:00' | '2016-12-24 23:44-05:00' | '2016-12-25T06:44+02:00' | 0 | '2016-12-24 23:44-05:00' | |
'1877-05-03 11:18:34.399+14:00' | '1877-05-03T11:18:34.399-10:00' | '1877-05-03 11:18:34.399+14:00' | 1 | '1877-05-03T11:18:34.399-10:00' | |
'2004-12-12' | null | '2004-12-12' | null | null | |
'1982-02-01:35:55' | '1980-15-21 06:30' | '1982-02-01:35:55' | null | '1980-15-21 06:30' | |
'7 March 1779' | '1779-03-11' | '7 March 1779' | null | '1779-03-11' |
DATETIME_ERROR
Tries to convert the input to a datetime, returning the conversion error, if any.
SYNTAX
datetime_error(input)
input
- Type: string
- Description: The input to try to convert.
OUTPUT
string
EXAMPLE
SELECT datetime_error(date) as error
date | → | error |
---|---|---|
null | null | |
'1990-03-07' | null | |
'1990-03-07T15:44' | null | |
'1990-03-07T15:44:10.392+06:00' | null | |
'' | 'Text '' could not be parsed at index 0' | |
'I am not a date' | 'Text 'I am not a date' could not be parsed at index 0' | |
'2018-03-oops' | 'Text '2018-03-oops' could not be parsed, unparsed text found at index 7' | |
'2018-03-95' | 'Text '2018-03-95' could not be parsed: Invalid value for DayOfMonth (valid values 1 - 28/31): 95' |
DATETIME_TO_ISO
Formats a datetime string to the ISO_LOCAL_DATE_TIME
format using the specified patterns. Patterns are matched in sequence, and the function will return the first matched pattern. Will try out a preset list of patterns in addition to the patterns supplied by the user (see second example below for some details). If no patterns are matched, this function returns null.
SYNTAX
datetime_to_iso(datetime_string, patterns)
datetime_string
- Type: string
- Description: A datetime string
patterns
- Type: {'type': 'array', 'elementType': 'string'}
- Description: An array of patterns for parsing the input
OUTPUT
string
EXAMPLES
SELECT datetime_to_iso(datetime, patterns_array) AS datetimes_as_iso
datetime | patterns_array | → | datetimes_as_iso |
---|---|---|---|
'2013-08-23-08:45:00' | ['yyyy-MM-dd-HH:mm:ss'] | '2013-08-23T08:45:00' | |
'01 Apr 2017' | ['dd MMM yyyy'] | '2017-04-01' | |
'24 05 1997' | ['dd MMM yyyy', 'dd MM yyyy'] | '1997-05-24' | |
'24 May 1997' | ['dd MMM yyyy', 'dd MM yyyy'] | '1997-05-24' |
SELECT datetime_to_iso(datetime, patterns_array) AS datetimes_as_iso
datetime | patterns_array | → | datetimes_as_iso |
---|---|---|---|
'' | ['dd MMM yyyy'] | null | |
null | ['dd MMM yyyy'] | null | |
'01 Apr 2017' | null | '2017-04-01' | |
'01 Apr 2017' | [null] | '2017-04-01' | |
'01-Apr-2017' | [''] | '2017-04-01' | |
'nomatch' | ['dd MMM yyyy'] | null | |
'01-Apr-2017' | ['%@#*$&'] | '2017-04-01' |
DATE_AND_TIME_TO_ISO
Formats a date and time string to ISO datetime format using the specified patterns. Patterns are matched in sequence, and the function will return the first matched pattern. Will try out a preset list of patterns in addition to the patterns supplied by the user (see second example below for some details). If no patterns are matched, this function returns null.
SYNTAX
date_and_time_to_iso(date_string, time_string, patterns)
date_string
- Type: string
- Description: The string representing the date
time_string
- Type: string
- Description: The string representing the time
patterns
- Type: {'type': 'array', 'elementType': 'string'}
- Description: The patterns used for matching the date and time to the ISO format
OUTPUT
string
EXAMPLES
SELECT date_and_time_to_iso(dates, times, patterns_arrays) AS datetimes_as_iso
dates | times | patterns_arrays | → | datetimes_as_iso |
---|---|---|---|---|
'2013-08-23' | '08:45:00' | ['yyyy-MM-dd:HH:mm:ss'] | '2013-08-23T08:45:00' | |
'26-Feb-2009' | '08 27' | ['dd-MMM-yyyy:HH mm'] | '2009-02-26T08:27:00' | |
'2017-08-14' | '11:45:47.999' | ['yyyy-MM-dd:HH:mm:ss.SSS'] | '2017-08-14T11:45:47.999' | |
'Aug-23-2013' | null | ['MMM-dd-yyyy'] | '2013-08-23' | |
null | 'Aug-23-2013' | ['MMM-dd-yyyy'] | '2013-08-23' |
SELECT date_and_time_to_iso(dates, times, patterns_arrays) AS datetimes_as_iso
dates | times | patterns_arrays | → | datetimes_as_iso |
---|---|---|---|---|
'' | '08:45:00' | ['yyyy-MM-dd:HH:mm:ss'] | null | |
'2013-08-23' | '' | ['yyyy-MM-dd:HH:mm:ss'] | '2013-08-23' | |
'2013-08-23' | '08:45:00' | [''] | '2013-08-23T08:45:00' | |
null | '08:45:00' | ['yyyy-MM-dd:HH:mm:ss'] | null | |
'2013-08-23' | null | ['yyyy-MM-dd:HH:mm:ss'] | '2013-08-23' | |
'2013-08-23' | '08:45:00' | [null] | '2013-08-23T08:45:00' | |
'2013-08-23' | '08:45:00' | null | '2013-08-23T08:45:00' |
DAYOFMONTH
Gets the day-of-month field of a date.
SYNTAX
dayofmonth(datetime)
datetime
- Type: string
- Description: The timestamp to get the day of month from
OUTPUT
int
EXAMPLES
SELECT dayOfMonth(date) AS day
date | → | day |
---|---|---|
'1997-10-03' | 3 | |
'2077-06-23+06:45' | 23 | |
'2001-03-14T15:43:19.155' | 14 | |
'1580-01-04 06:30-04:00' | 4 |
SELECT dayOfMonth(date) AS day, datetime_error(date) as error
description | date | → | error | day |
---|---|---|---|---|
'null value' | null | null | null | |
'empty string' | '' | 'Text '' could not be parsed at index 0' | null | |
'not a date' | 'I am not a date' | 'Text 'I am not a date' could not be parsed at index 0' | null | |
'starts date-like' | '2018-03-oops' | 'Text '2018-03-oops' could not be parsed, unparsed text found at index 7' | null | |
'bad date' | '2018-03-95' | 'Text '2018-03-95' could not be parsed: Invalid value for DayOfMonth (valid values 1 - 28/31): 95' | null |
DECIMAL
Converts a numeric value to double type.Tip: For most use cases, Tamr recommends using the more fully-featured TO_DOUBLE function instead of DECIMAL.
SYNTAX
decimal(input)
input
- Type: number
- Description: The value to convert
OUTPUT
double
EXAMPLE
SELECT decimal(integer_value) as double_value
integer_value | → | double_value |
---|---|---|
1 | 1.0 | |
-1 | -1.0 | |
0 | 0.0 | |
null | null |
DISTINCT
Returns an array of all distinct items from the given array converted to strings.Tip: For most use cases, Tamr recommends using the more fully-featured ARRAY.DISTINCT function instead of DISTINCT.
SYNTAX
distinct(array)
array
- Type: {'type': 'array', 'elementType': 'any'}
- Description: The array to remove duplicates from
OUTPUT
{'type': 'array', 'elementType': 'string'}
EXAMPLE
SELECT distinct(array) as distinctified
array | → | distinctified |
---|---|---|
[1, 1, 2, 1, 2, 3, 1, 3] | ['1', '2', '3'] | |
[1, 2, 3] | ['1', '2', '3'] | |
[null, null] | [null] | |
[] | [] | |
null | null |
DURATION_TO_ISO
Converts to a given duration in the specified time unit to the ISO-8601 format. If minimal is true, the format is PnYnMnDTnHnMnS, which stands for 'Period: number of Years, number of Months, number of Days, Time: number of Hours, number of Minutes, number of Seconds.' If minimal is false, only the significant units are shown.
SYNTAX
duration_to_iso(duration, time_unit, minimal)
duration
- Type: double
- Description: The positive duration value to convert to iso
time_unit
- Type: string
- Description: A three letter string representing the unit of the duration. Valid options are: 'SEC', 'MIN', 'HRS', 'DAY', 'WKS'
minimal
- Type: bool
- Description: Determines whether the duration is returned with only the significant values
OUTPUT
string
EXAMPLES
SELECT duration_to_iso(duration, time_unit, minimal) as isofied_duration
duration | time_unit | minimal | → | isofied_duration |
---|---|---|---|---|
10.0 | 'SEC' | False | 'P0Y0M0DT0H0M10.000S' | |
60.0 | 'SEC' | False | 'P0Y0M0DT0H1M0.000S' | |
61.5 | 'MIN' | False | 'P0Y0M0DT1H1M30.000S' | |
32 | 'HRS' | False | 'P0Y0M1DT8H0M0.000S' | |
2.483 | 'DAY' | False | 'P0Y0M2DT11H35M31.200S' | |
3.42 | 'WKS' | False | 'P0Y0M23DT22H33M36.000S' | |
10.5 | 'MIN' | True | 'PT10M30S' | |
2.483 | 'DAY' | True | 'PT59H35M31.2S' | |
-3.45 | 'daY' | False | 'P0Y0M-3DT-10H-48M0.000S' |
SELECT duration_to_iso(duration, time_unit, minimal) as isofied_duration
duration | time_unit | minimal | → | isofied_duration |
---|---|---|---|---|
1.0 | '' | True | null | |
null | 'SEC' | False | null | |
60.0 | null | False | null | |
61.5 | 'MIN' | null | null |
EQUALS_IGNORE_CASE
Case-insensitively checks whether two values are equal.
SYNTAX
equals_ignore_case(str1, str2)
str1
- Type: string
- Description: The first value to compare
str2
- Type: string
- Description: The second value to compare
OUTPUT
bool
EXAMPLE
SELECT equals_ignore_case(family_member, state) as is_equal
family_member | state | → | is_equal |
---|---|---|---|
'ma' | 'MA' | True | |
'PA' | 'pa' | True | |
'Sister' | 'CA' | False | |
'Brother' | null | null | |
null | 'WY' | null | |
null | null | null |
EXTRACT
Matches a string against a Java regular expression, and returns the specified group. The group 0 refers to the entire pattern. Indexes of individual groups start at 1. Returns null if no match is found, the group number or regular expression is invalid, or any of the input, regular expression, or group is null, the result is null. See Working with Regular Expressions.
SYNTAX
extract(input, regex, group_index)
input
- Type: string
- Description: The string to be matched
regex
- Type: string
- Description: The Java regular expression specifying the pattern to extract
group_index
- Type: int
- Description: The index of the group in the regular expression to extract
OUTPUT
string
EXAMPLES
SELECT extract('2001: A Space Odyssey', '(\\d+):\\s?([\\w\\s]+)', group_idx) AS result
group_idx | → | result |
---|---|---|
0 | '2001: A Space Odyssey' | |
1 | '2001' | |
2 | 'A Space Odyssey' | |
3 | null | |
-1 | null | |
null | null |
SELECT extract(input, regex, group_index) as result
input | regex | group_index | → | result |
---|---|---|---|---|
'null values' | 'null' | null | null | |
'null values' | null | 1 | null | |
null | 'null' | 1 | null | |
null | null | null | null |
SELECT extract('invalid regex', regex, 1) as result
regex | → | result |
---|---|---|
'(invalid regex' | null | |
'(wrong regex)' | null |
EXTRACT_ALL
Matches a string against a Java regular expression, and returns results for all groups. Returns null if no match is found, the regular expression is invalid, or either of the input or regular expression is null. See Working with Regular Expressions.
SYNTAX
extract_all(input, regex)
input
- Type: string
- Description: The string to be matched
regex
- Type: string
- Description: The Java regular expression specifying the pattern to extract
OUTPUT
{'type': 'array', 'elementType': 'string'}
EXAMPLE
SELECT extract_all(input, regex) as result
input | regex | → | result |
---|---|---|---|
'2001: A Space Odyssey' | '(\d+):\s?([\w\s]+)' | ['2001', 'A Space Odyssey'] | |
'Gone With The Wind' | '^.The.$' | [] | |
'The Good, The Bad and The Ugly' | '(The)' | ['The', 'The', 'The'] | |
'The given regex is invalid' | '(' | null | |
'barb' | '(b*)' | ['b', '', '', 'b', ''] | |
null | '(The)' | null | |
'The Third Man' | null | null | |
null | null | null |
FLOOR
Returns the greatest integer that is smaller than or equal to the given number.Tip: To find the function that best meets your needs, review these similar functions: ROUND, CEIL.
SYNTAX
floor(input)
input
- Type: number
- Description: The input to compute the floor for.
OUTPUT
long
EXAMPLE
SELECT floor(input_value) AS floored_value
input_value | → | floored_value |
---|---|---|
1.1 | 1 | |
1.9 | 1 | |
-1.1 | -2 | |
-1.9 | -2 | |
0.0 | 0 | |
null | null | |
NaN | 0 | |
-Infinity | -9223372036854775808 | |
Infinity | 9223372036854775807 |
GET
Return the value in the array at the specified index. If the index is invalid, return null. An index is invalid if it is negative, not an integer, or greater than or equal to the size of the array it is applied to.Tip: To find the function that best meets your needs, review these similar functions: ARRAY.SLICE, ARRAY.SLICE2.
SYNTAX
get(data, index)
data
- Type: {'type': 'array', 'elementType': 'any'}
- Description: An array of values that you can index into.
index
- Type: int
- Description: The index to apply to the data argument. Indexes are integers between 0 (inclusive) and the length of the array (exclusive).
OUTPUT
any
EXAMPLE
SELECT get(col, idx) AS new_col
col | idx | → | new_col |
---|---|---|---|
['a', 'b', 'c'] | 0 | 'a' | |
['a', 'b', 'c'] | 1 | 'b' | |
['a', 'b', 'c'] | -1 | null | |
['a', 'b', 'c'] | 4 | null | |
['a', 'b', 'c'] | null | null | |
[] | 0 | null | |
null | 0 | null |
GREATEST
Returns the greatest value of the list of values, skipping null values. It will return null if and only if all parameters are null. String characters are ranked from least to greatest in the order specified under ARRAY.SORT. Tip: To find the function that best meets your needs, review similar function MAX.
SYNTAX
greatest(a)
a
- Type: any
- Description: Values to compare.
OUTPUT
any
EXAMPLES
SELECT branch, greatest(sales_jan, sales_feb, sales_mar) AS q1_highest_month_sales
branch | sales_jan | sales_feb | sales_mar | → | q1_highest_month_sales | branch |
---|---|---|---|---|---|---|
'Cambridge' | 100 | 120 | 80 | 120 | 'Cambridge' | |
'San Francisco' | null | 70 | 90 | 90 | 'San Francisco' | |
'London' | null | null | null | null | 'London' |
SELECT greatest(...string_array) AS greatest_string
string_array | → | greatest_string |
---|---|---|
['a', 'b'] | 'b' | |
['ab', 'aa'] | 'ab' | |
['a', 'aa'] | 'aa' | |
[null, 'a', null] | 'a' | |
['b', 'a', null] | 'b' | |
[null, 'ab', 'az'] | 'az' | |
[null] | null | |
[null, null] | null |
SELECT greatest(...array_array) AS greatest_array
array_array | → | greatest_array |
---|---|---|
[[1], [2]] | [2] | |
[[1, 2], [1, 3]] | [1, 3] | |
[[null, 3], [null, 5]] | [null, 5] | |
[[0, 3], [0, null]] | [0, 3] | |
[[3], [1, 2]] | [3] | |
[[3], [3, -1]] | [3, -1] | |
[null, [3]] | [3] | |
[null, [null]] | [null] | |
[null, [null], [null, null]] | [null, null] |
HASH
Returns a number, called a hash code, based on the input data. The hash code is always the same given the same inputs and is practically guaranteed to be different given different inputs.
SYNTAX
hash(inputs)
inputs
- Type: any
- Description: The input values to hash
OUTPUT
long
EXAMPLES
SELECT hash(A, B) AS result
A | B | → | result |
---|---|---|---|
'twenty' | [3.0, 8.0, 9.0] | 1830378446688894221 | |
'forty' | [18.9, 21.1] | -5987411484308668877 | |
'forty' | [] | 4495871505993623155 | |
'forty' | null | 353237852084435270 | |
null | null | -7218393911220984134 |
SELECT hash(A, B) AS result
A | B | → | result |
---|---|---|---|
[null] | [null] | 8068983417364430449 | |
[null] | [[]] | -4768040760762171177 | |
[[]] | [null] | 865141773612688779 | |
null | [[]] | 5085241997507925413 | |
[[]] | null | -2601053220775996364 |
SELECT hash(A, B) = hash(array(A, B)) AS "hash(elements) = hash(array)", hash(A, B) = hash({A, B}) AS "hash(elements) = hash(struct)"
A | B | → | hash(elements) = hash(array) | hash(elements) = hash(struct) |
---|---|---|---|---|
'one' | 'two' | False | False |
HEX
Returns the hexadecimal value of the argument.
SYNTAX
hex(input)
input
- Type: int
- Description: The input value.
OUTPUT
string
EXAMPLE
SELECT hex(input_value) AS hex_value
input_value | → | hex_value |
---|---|---|
1 | '1' | |
-1 | 'FFFFFFFFFFFFFFFF' | |
0 | '0' | |
12345 | '3039' | |
-54321 | 'FFFFFFFFFFFF2BCF' | |
-2147483648 | 'FFFFFFFF80000000' | |
2147483647 | '7FFFFFFF' | |
null | null |
ENCODE_BASE64
Returns the Base64 encoded value of the argument.
SYNTAX
encode_base64(input)
input
- Type: long
- Description: The input value.
OUTPUT
string
EXAMPLE
SELECT encode_base64(input_value) AS base64_value
input_value | → | base64_value |
---|---|---|
1 | 'AAAAAAAAAAE=' | |
-1 | '//////////8=' | |
0 | 'AAAAAAAAAAA=' | |
0 | 'AAAAAAAAAAA=' | |
8921367987289891273 | 'e88Q6sD0ock=' | |
-3344781287187723481 | '0ZTzJAXNSyc=' | |
-9223372036854775808 | 'gAAAAAAAAAA=' | |
9223372036854775807 | 'f/////////8=' | |
-9223372036854775807 | 'gAAAAAAAAAE=' | |
null | null |
INTEGER
Converts a numeric value to int type.Tip: For most use cases, Tamr recommends using the more fully-featured TO_INT function instead of INTEGER.
SYNTAX
integer(input)
input
- Type: number
- Description: The value to convert
OUTPUT
int
EXAMPLE
SELECT integer(double_value) as integer_value
double_value | → | integer_value |
---|---|---|
1.1 | 1 | |
-1.1 | -1 | |
1.9 | 1 | |
-1.9 | -1 | |
0.0 | 0 | |
null | null | |
NaN | 0 | |
-Infinity | -2147483648 | |
Infinity | 2147483647 |
ISNULL
Returns true if and only if the argument is null.
SYNTAX
isnull(input)
input
- Type: any
- Description: The input value.
OUTPUT
bool
EXAMPLES
SELECT isnull(input_value) AS is_it_null
input_value | → | is_it_null |
---|---|---|
'i am not null' | False | |
'' | False | |
null | True |
SELECT isnull(input_value) AS is_it_null
input_value | → | is_it_null |
---|---|---|
True | False | |
False | False | |
null | True |
SELECT isnull(input_value) AS is_it_null
input_value | → | is_it_null |
---|---|---|
[1] | False | |
[null] | False | |
[] | False | |
null | True |
ISO_DIFFERENCE_DAYS
Returns the number of days between two ISO date-time strings rounded down to the nearest integer. If the first day comes after the second day, the result will be negative.Tip: To find the function that best meets your needs, review similar function DATEDIFF.
SYNTAX
iso_difference_days(date1, date2)
date1
- Type: string
- Description: The first date in the difference
date2
- Type: string
- Description: The second date in the difference
OUTPUT
int
EXAMPLE
SELECT iso_difference_days(date1, date2) AS date_difference
date1 | date2 | → | date_difference |
---|---|---|---|
'1987-10-05' | '1987-10-06' | 1 | |
'1987-10-05' | '1987-10-04' | -1 | |
'1987-10-05' | '1987-10-05' | 0 | |
'1987-10-05T01:26:00' | '1987-10-06' | 1 | |
'1987-10-05' | '1987-10-06T01:26:00' | 1 | |
'1987-10-05T01:26:00' | '1987-10-06T01:25:00' | 1 | |
'1987-10-05-28T01:26:00' | '1987-10-06T01:25:00' | null | |
'1987-10-34T01:26:00' | '1987-10-06T01:25:00' | null | |
null | '1987-10-05' | null | |
'1987-10-05' | null | null | |
null | null | null |
ISO_DATETIME_ADD
Adds an amount of time to a datetime string in ISO format.
SYNTAX
iso_datetime_add(datetime, duration, time_unit)
datetime
- Type: string
- Description: A datetime string formatted to ISO-8601 duration format PnYnMnDTnHnMnS
duration
- Type: double
- Description: Time unit, must be whole number if unit is 'YRS' or 'MTH'
time_unit
- Type: string
- Description: A 3 letter string representing the unit of the duration. Valid options are: 'SEC', 'MIN', 'HRS', 'DAY', 'WKS', 'YRS', 'MTH'
OUTPUT
string
EXAMPLES
SELECT iso_datetime_add(datetime, duration, time_unit) AS new_datetime
datetime | duration | time_unit | → | new_datetime |
---|---|---|---|---|
'2050-02-04T12:30:00' | 3.1 | 'day' | '2050-02-07T14:54:00' | |
'2010-08-04T00:00:00' | 2.0 | 'yRs' | '2012-08-04T00:00:00' | |
'2000-08-04T00:00:00' | -5.0 | 'MtH' | '2000-03-04T00:00:00' |
SELECT iso_datetime_add(datetime, duration, time_unit) AS new_datetime
datetime | duration | time_unit | → | new_datetime |
---|---|---|---|---|
'2000-08-04T00:00:00' | -5.0 | 'mTh' | '2000-03-04T00:00:00' | |
'2000-08-04T00:00:00' | -5.1 | 'mTh' | null | |
'2000-08-04T00:00:00' | -5.0 | 'yrs' | '1995-08-04T00:00:00' | |
'2000-08-04T00:00:00' | -5.1 | 'yrs' | null | |
'2000-08-04T00:00:00' | -5.0 | '' | null | |
null | -5.0 | 'mTh' | null | |
'' | -5.0 | 'mTh' | null | |
'2050-02-04T12:30:00' | null | 'dAy' | null | |
'2050-02-04T12:30:00' | 1.0 | null | null |
IS_NOT_NULL
Returns false if the input is equal to null, and true if it is not equal to null.
SYNTAX
is_not_null(input)
input
- Type: any
- Description: The input to check
OUTPUT
bool
EXAMPLES
SELECT is_not_null(string_col) as string_not_null
string_col | → | string_not_null |
---|---|---|
'' | True | |
'null' | True | |
'content!' | True | |
null | False |
SELECT is_not_null(string_array_col) as array_not_null
string_array_col | → | array_not_null |
---|---|---|
[] | True | |
[null] | True | |
[''] | True | |
['content!'] | True | |
null | False |
IS_NOT_EMPTY
Returns true if the input is not null and not equal to the empty string. Returns false if the input is equal to the empty string. Returns null if the input is null. Tip: Tamr Core does not offer an IS_EMPTY function. For this logical comparator use an expression with IS EMPTY syntax.
SYNTAX
is_not_empty(input)
input
- Type: string
- Description: The input to check
OUTPUT
bool
EXAMPLE
SELECT is_not_empty(string_col) as not_empty
string_col | → | not_empty |
---|---|---|
' ' | True | |
'null' | True | |
'content' | True | |
'' | False | |
null | null |
JACCARD_INDEX
Computes the Jaccard similarity index between two sets. It is calculated as the intersection of the two sets divided by the union of the two sets.
SYNTAX
jaccard_index(a, b)
a
- Type: {'type': 'array', 'elementType': 'any'}
- Description: The first set
b
- Type: {'type': 'array', 'elementType': 'any'}
- Description: The second set
OUTPUT
double
EXAMPLE
SELECT jaccard_index(set_a, set_b) AS similarity
set_a | set_b | → | similarity |
---|---|---|---|
[1, 2, 3] | [1, 2, 3] | 1.0 | |
[1, 2, 3] | [2, 1, 3] | 1.0 | |
[1, 2, 3] | [2, 2, 1, 3] | 1.0 | |
[1, 2, 3, 4, 5] | [1, 3, 5, 7] | 0.5 | |
[9, 8, 7, 6] | [5, 2, 1, 9] | 0.14285714285714285 | |
[5, 2, 1, 9] | [9, 8, 7, 6] | 0.14285714285714285 | |
[] | [1] | 0.0 | |
[1] | [] | 0.0 | |
[] | [] | NaN | |
null | [] | null | |
[] | null | null | |
null | null | null |
LEAST
Returns the smallest value of the list of values, skipping null values. It will return null if and only if all parameters are null. String characters are ranked from least to greatest in the order specified under ARRAY.SORT.Tip: To find the function that best meets your needs, review similar function MIN.
SYNTAX
least(a)
a
- Type: any
- Description: Values to compare.
OUTPUT
any
EXAMPLES
SELECT branch, least(sales_jan, sales_feb, sales_mar) AS q1_lowest_month_sales
branch | sales_jan | sales_feb | sales_mar | → | q1_lowest_month_sales | branch |
---|---|---|---|---|---|---|
'Cambridge' | 100 | 120 | 80 | 80 | 'Cambridge' | |
'San Francisco' | null | 70 | 90 | 70 | 'San Francisco' | |
'London' | null | null | null | null | 'London' |
SELECT least(...string_array) AS least_string
string_array | → | least_string |
---|---|---|
['a', 'b'] | 'a' | |
['ab', 'aa'] | 'aa' | |
['a', 'aa'] | 'a' | |
[null, 'a', null] | 'a' | |
['b', 'a', null] | 'a' | |
[null, 'ab', 'az'] | 'ab' | |
[null] | null | |
[null, null] | null |
SELECT least(...array_array) AS least_array
array_array | → | least_array |
---|---|---|
[[1], [2]] | [1] | |
[[1, 2], [1, 3]] | [1, 2] | |
[[null, 3], [null, 5]] | [null, 3] | |
[[0, 3], [0, null]] | [0, 3] | |
[[3], [1, 2]] | [1, 2] | |
[[3], [3, -1]] | [3] | |
[null, [3]] | [3] | |
[null, [null]] | [null] | |
[null, [null], [null, null]] | [null] |
LEN
Returns the number of non-null, non-empty, non-whitespace items in an array.
SYNTAX
len(array)
array
- Type: {'type': 'array', 'elementType': 'any'}
- Description: The array to count items of
OUTPUT
int
EXAMPLE
SELECT len(array) AS array_length
array | → | array_length |
---|---|---|
['1', '2', '3'] | 3 | |
['whitespace', "doesn't", 'get', ' \n ', 'counted'] | 4 | |
[''] | 0 | |
[null] | 0 | |
[] | 0 | |
[null, 'non_null', '', 'non_empty'] | 2 | |
null | null |
LENGTH
Returns a count of the number of characters in the given string value. Tip: The '···' example represents three empty spaces.
SYNTAX
length(input)
input
- Type: string
- Description: The value to measure
OUTPUT
int
EXAMPLE
SELECT movie, length(movie) as num_characters
movie | → | num_characters | movie |
---|---|---|---|
'Citizen Kane' | 12 | 'Citizen Kane' | |
'Casablanca' | 10 | 'Casablanca' | |
'···' | 3 | '···' | |
'' | 0 | '' | |
null | null | null |
LOWER
Converts a string to lowercase. Applies to all characters.
SYNTAX
lower(input)
input
- Type: string
- Description: The string to convert to lowercase
OUTPUT
string
EXAMPLE
SELECT lower(string) as lowercase_string
string | → | lowercase_string |
---|---|---|
'all_lower' | 'all_lower' | |
'ALL UPPER' | 'all upper' | |
'mIxEd' | 'mixed' | |
'' | '' | |
null | null |
MATCHES
Matches a string against a Java regular expression. Returns true if the regular expression matches the entire input string, false if it does not, and null if the regular expression is invalid or if either of the input or regular expression is null.See Working with Regular Expressions.
SYNTAX
matches(regex, input)
regex
- Type: string
- Description: The Java regular expression specifying the pattern to match
input
- Type: string
- Description: The string to be matched
OUTPUT
bool
EXAMPLE
SELECT matches(regex, input) as result
input | regex | → | result |
---|---|---|---|
'2001: A Space Odyssey' | '(\d+):\s?([\w\s]+)' | True | |
'Gone With The Wind' | '^.The.$' | True | |
'Gone With The Wind' | 'Casablanca' | False | |
'The Good, The Bad and The Ugly' | '(The)' | False | |
'Citizen Kane' | '(invalid' | null | |
null | '(The)' | null | |
'The Third Man' | null | null | |
null | null | null |
MD5
Returns a hash value, according to the MD5 algorithm.
SYNTAX
md5(input)
input
- Type: string
- Description: The input value.
OUTPUT
string
EXAMPLE
SELECT md5(input_value) AS md5_hash
input_value | → | md5_hash |
---|---|---|
'I am a string.' | 'a6be8c1f86bffe9f7a9397e578f7a446' | |
'I will be hashed.' | '6e76226f2116eea6e5cf0f4783762e19' | |
'Unless of course' | 'da6cd9f98615af7b253a4d321e12c6a4' | |
'Your computer has crashed.' | 'd45aee724e0cfbe2a1f00613ce118f81' | |
'' | 'd41d8cd98f00b204e9800998ecf8427e' | |
null | null |
MONTH
Gets the month field of a date.
SYNTAX
month(datetime)
datetime
- Type: string
- Description: The timestamp to get the month of
OUTPUT
int
EXAMPLES
SELECT month(date) AS month
date | → | month |
---|---|---|
'1997-10' | 10 | |
'2077-06-23+06:45' | 6 | |
'2001-03-14T15:43:19.155' | 3 | |
'1580-01-04 06:30-04:00' | 1 |
SELECT month(date) AS month, datetime_error(date) as error
description | date | → | error | month |
---|---|---|---|---|
'null value' | null | null | null | |
'empty string' | '' | 'Text '' could not be parsed at index 0' | null | |
'not a date' | 'I am not a date' | 'Text 'I am not a date' could not be parsed at index 0' | null | |
'starts date-like' | '2018-03-oops' | 'Text '2018-03-oops' could not be parsed, unparsed text found at index 7' | null | |
'bad date' | '2018-03-95' | 'Text '2018-03-95' could not be parsed: Invalid value for DayOfMonth (valid values 1 - 28/31): 95' | null |
MONOTONICALLY_INCREASING_ID
Returns a monotonically increasing identifier. The identifiers are guaranteed to be unique but not sequential. The results of this function are non-deterministic, different runs on the same dataset could produce different results.
SYNTAX
monotonically_increasing_id()
OUTPUT
long
EXAMPLE
SELECT monotonically_increasing_id() AS id, *
name | → | id | name |
---|---|---|---|
'Alice' | 0 | 'Alice' | |
'Bob' | 1 | 'Bob' | |
'Carol' | 2 | 'Carol' | |
'Dave' | 3 | 'Dave' |
NUM
Returns the first numeric element of an array parsed as a double.Tip: For most use cases, Tamr recommends using the more fully-featured TO_DOUBLE function instead of NUM.
SYNTAX
num(array)
array
- Type: {'type': 'array', 'elementType': 'any'}
- Description: The array to find the first numeric element of
OUTPUT
double
EXAMPLE
SELECT num(array) AS first_num_from_array
array | → | first_num_from_array |
---|---|---|
['1.2', null] | 1.2 | |
[null, '23', null] | 23 | |
[null, '3E7', '1.2'] | 30000000.0 | |
['abc', 'true'] | null | |
null | null | |
[] | null | |
[null] | null |
PARSE_JSON_ARRAY
Parses strings as JSON-encoded arrays of strings, returning the decoded values. All results are combined into a single multivalue, with no deduplication. Inputs that cannot be parsed as a JSON array are ignored. Values in arrays that are not strings are ignored.
SYNTAX
parse_json_array(json_array)
json_array
- Type: {'type': 'array', 'elementType': 'any'}
- Description: A multivalue containing JSON-encoded arrays of string
OUTPUT
{'type': 'array', 'elementType': 'string'}
EXAMPLE
SELECT parse_json_array(json_array) AS parsed
json_array | → | parsed |
---|---|---|
['["a"]'] | ['a'] | |
['["a", "b", "c"]', '["d"]', '["e", "f"]'] | ['a', 'b', 'c', 'd', 'e', 'f'] | |
['1', '"x"', '["x"', '["a"]', null, '["b", 1, ["x"], {"x":"x"}, "c"]', '{"x":"x"}'] | ['a', 'b', 'c'] | |
['a', '1', '[', '["unterminated"', '{"un":"terminated"'] | [] | |
[] | [] | |
null | null |
PMOD
Computes the positive remainder after division.
SYNTAX
pmod(dividend, divisor)
dividend
- Type: number
- Description: The dividend number on which to calculate the remainder
divisor
- Type: number
- Description: The divisor number on which to calculate the remainder
OUTPUT
number
EXAMPLE
SELECT pmod(dividend, divisor) as result
dividend | divisor | → | result |
---|---|---|---|
5.0 | 3.0 | 2.0 | |
-19.0 | 5.0 | 1.0 | |
5.5 | 2.5 | 0.5 | |
Infinity | 3.0 | NaN | |
NaN | 3.0 | NaN | |
null | 3.0 | null | |
5.0 | 0.0 | NaN |
POW
Returns the value of the first argument raised to the power of the second argument.
SYNTAX
pow(base, exponent)
base
- Type: number
- Description: The base of the exponential expression
exponent
- Type: number
- Description: The exponent of the exponential expression
OUTPUT
double
EXAMPLE
SELECT pow(base, exponent) as result
base | exponent | → | result |
---|---|---|---|
3 | 2 | 9 | |
3.5 | 2.0 | 12.25 | |
-2 | 3 | -8 | |
2 | -3 | 0.125 | |
2 | 2.5 | 5.656854249492381 | |
2 | -2.5 | 0.1767766952966369 | |
-2 | -2.5 | NaN | |
0 | 0 | 1 | |
null | 2 | null | |
2 | null | null | |
null | null | null |
SQRT
Returns the square root.
SYNTAX
sqrt(input)
input
- Type: number
- Description: The input to get the square root of
OUTPUT
double
EXAMPLE
SELECT sqrt(input) as result
input | → | result |
---|---|---|
9 | 3 | |
5.76 | 2.4 | |
1 | 1 | |
0 | 0 | |
-1 | NaN | |
null | null |
RAND
Returns a random value within the range 0.0 (inclusive) to 1.0 (exclusive).
SYNTAX
rand()
OUTPUT
double
EXAMPLE
SELECT row_number, rand() as random_double
row_number | → | row_number | random_double |
---|---|---|---|
1 | 1 | 0.7056362008458043 | |
2 | 2 | 0.8900315077291642 | |
3 | 3 | 0.48696337565223247 | |
4 | 4 | 0.4192577676064432 |
REPLACE
Replaces all occurrences of the target in the given string with the replacement.
SYNTAX
replace(string, target, replacement)
string
- Type: string
- Description: The string to replace parts of
target
- Type: string
- Description: The substring to replace in the string
replacement
- Type: string
- Description: The string to replace the target with
OUTPUT
string
EXAMPLES
SELECT replace(string, target, replacement) AS replaced
string | target | replacement | → | replaced |
---|---|---|---|---|
'aabacad' | 'a' | 'x' | 'xxbxcxd' | |
'' | 'a' | 'x' | '' | |
'123' | '' | 'x' | 'x1x2x3x' | |
'too much space' | ' ' | '' | 'toomuchspace' | |
null | 'a' | 'x' | null | |
'aabacad' | null | 'x' | null | |
'aabacad' | 'a' | null | null |
SELECT replace('Hi my name is {name}!', '{name}', names) AS introductions
names | → | introductions |
---|---|---|
'Sam' | 'Hi my name is Sam!' | |
'Alex' | 'Hi my name is Alex!' | |
'Sky' | 'Hi my name is Sky!' | |
'Timothy' | 'Hi my name is Timothy!' | |
'Sharon' | 'Hi my name is Sharon!' | |
'Harsh' | 'Hi my name is Harsh!' |
REPLACE_ALL
Replaces all matches of the target regex in the given string with the replacement.See Working with Regular Expressions.
SYNTAX
replace_all(string, target_regex, replacement)
string
- Type: string
- Description: The string to replace parts of
target_regex
- Type: string
- Description: The regex to replace in the string
replacement
- Type: string
- Description: The string to replace the target with
OUTPUT
string
EXAMPLE
SELECT replace_all(string, target_regex, replacement) AS replaced
string | target_regex | replacement | → | replaced |
---|---|---|---|---|
'aaaaabaacad' | '[a]' | 'x' | 'xxxxxbxxcxd' | |
'aaaaabaacad' | '[a]+' | 'x' | 'xbxcxd' | |
'this and that or this' | '(this|that)' | 'UNK' | 'UNK and UNK or UNK' | |
'invalid regex returns null' | 'invalid[regex' | 'x' | null | |
null | '[a]' | 'x' | null | |
'aaaaabaacad' | null | 'x' | null | |
'aaaaabaacad' | '[a]' | null | null | |
'aaaaaBaacad' | '[A-Z]' | ' $0' | 'aaaaa Baacad' | |
'(617)333-3333' | '[0-9()-]+' | '+1$0' | '+1(617)333-3333' |
RINT
Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
SYNTAX
rint(input)
input
- Type: number
- Description: The input value
OUTPUT
double
EXAMPLE
SELECT rint(input_value) as rounded_value
input_value | → | rounded_value |
---|---|---|
1.1 | 1.0 | |
-1.9 | -2.0 | |
3.500001 | 4.0 | |
NaN | NaN | |
Infinity | Infinity | |
-Infinity | -Infinity | |
null | null |
ROUND
Returns the closest Long value to the argument.Tip: To find the function that best meets your needs, review these similar functions: CEIL, FLOOR.
SYNTAX
round(input)
input
- Type: number
- Description: The input value
OUTPUT
long
EXAMPLE
SELECT round(input_value) as rounded_value
input_value | → | rounded_value |
---|---|---|
1.1 | 1 | |
-1.9 | -2 | |
3.500001 | 4 | |
NaN | 0 | |
Infinity | 9223372036854775807 | |
-Infinity | -9223372036854775808 | |
null | null |
RUNTIME_DATE
Returns a string representing the date in ISO Format at the time the function is executed for each value. Each row may have a different runtime date.
SYNTAX
runtime_date()
OUTPUT
string
EXAMPLE
SELECT row_number, runtime_date() as date
row_number | → | row_number | date |
---|---|---|---|
1 | 1 | '2024-03-25Z' | |
2 | 2 | '2024-03-25Z' | |
3 | 3 | '2024-03-25Z' |
RUNTIME_DATETIME
Returns a string representing the date and time in ISO Format at the time the function is executed for each value. Each row may have a different runtime date and time.
SYNTAX
runtime_datetime()
OUTPUT
string
EXAMPLE
SELECT row_number, runtime_datetime() as datetime
row_number | → | row_number | datetime |
---|---|---|---|
1 | 1 | '2024-03-25T21:01:40Z' | |
2 | 2 | '2024-03-25T21:01:40Z' | |
3 | 3 | '2024-03-25T21:01:40Z' |
SHA1
Returns a hash value, according to the SHA-1 algorithm.
SYNTAX
sha1(input)
input
- Type: string
- Description: The input value to be hashed.
OUTPUT
string
EXAMPLE
SELECT sha1(value) AS sha1_hashified
value | → | sha1_hashified |
---|---|---|
'My birthday is October 1' | '97e79c798abe758b227e111c2ddc98b01a4cc607' | |
'My birthday is October 2' | '0294ac31298fd3e129380e770c22db998ac71e58' | |
'My borthday is October 1' | '0e7fa8dce7a3927f28413e0eef65a62e5a6c6dfa' | |
'' | 'da39a3ee5e6b4b0d3255bfef95601890afd80709' | |
null | null |
SIZE
Returns the number of elements in a given array.
SYNTAX
size(input)
input
- Type: {'type': 'array', 'elementType': 'any'}
- Description: The array in which to count elements
OUTPUT
int
EXAMPLE
SELECT collection, size(members) as num_members
collection | members | → | collection | num_members |
---|---|---|---|---|
'Spices' | ['Salt', 'Pepper'] | 'Spices' | 2 | |
'Destiny's Child' | ['Beyoncé Knowles', 'Kelly Rowland', 'Michelle Williams'] | 'Destiny's Child' | 3 | |
'Array containing null' | ['the next value is null', null] | 'Array containing null' | 2 | |
'Empty Array' | [] | 'Empty Array' | 0 | |
'Null Array' | null | 'Null Array' | null |
SORT_ARRAY
Sorts the input array in ascending order, according to the natural ordering of the array elements. This will filter out null elements. String characters are ranked from least to greatest in the following order: !, ", #, $, %, &, ', (, ), *, +, ,, -, ., /, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, :, ;, <, =, >, ?, @, A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, [, \, ], ^, _, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z, {, |, }, ~
Tip: For most use cases, Tamr recommends using the more fully-featured ARRAY.SORT function instead of SORT_ARRAY.
SYNTAX
sort_array(input)
input
- Type: {'type': 'array', 'elementType': 'any'}
- Description: An array of values to be sorted.
OUTPUT
{'type': 'array', 'elementType': 'any'}
EXAMPLES
SELECT sort_array(unsorted_array) AS sorted_array
unsorted_array | → | sorted_array |
---|---|---|
['def', '', 'ghi', null, 'abc'] | ['', 'abc', 'def', 'ghi'] | |
['10', '11', '100', '101', '111'] | ['10', '100', '101', '11', '111'] | |
['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\', ']', '^', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~'] | ['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\', ']', '^', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~'] |
SELECT sort_array(unsorted_array) AS sorted_array
unsorted_array | → | sorted_array |
---|---|---|
[True, False, False, True, null] | [False, False, True, True] |
SELECT sort_array(unsorted_array) AS sorted_array
unsorted_array | → | sorted_array |
---|---|---|
[3.2, -3.2, 0.0, 'NaN', '-Infinity', 'Infinity'] | ['-Infinity', -3.2, 0.0, 3.2, 'Infinity', 'NaN'] |
SELECT sort_array(unsorted_array) AS sorted_array
unsorted_array | → | sorted_array |
---|---|---|
[[1, 2, 4], [1, 2, 3]] | [[1, 2, 3], [1, 2, 4]] |
SPLIT
Splits a string into an array of strings based on where the provided regex matches.See Working with Regular Expressions.
SYNTAX
split(string, regex)
string
- Type: string
- Description: The string to split
regex
- Type: string
- Description: The regex to split by when matched
OUTPUT
{'type': 'array', 'elementType': 'string'}
EXAMPLE
SELECT split(string, regex) as result
string | regex | → | result |
---|---|---|---|
'This is a sentence.' | '\s' | ['This', 'is', 'a', 'sentence.'] | |
'12345' | '2' | ['1', '345'] | |
'There is no match!' | 'pineapple' | ['There is no match!'] | |
'grape' | '' | ['g', 'r', 'a', 'p', 'e'] | |
'' | 'grape' | [''] | |
'' | '' | [''] | |
null | 'grape' | null | |
'grape' | null | null |
STRIP_ACCENTS
Removes diacritics from a string.
SYNTAX
strip_accents(string)
string
- Type: string
- Description: The string to strip the accents from
OUTPUT
string
EXAMPLE
SELECT strip_accents(with_accents) as no_accents
with_accents | → | no_accents |
---|---|---|
'Et ça sera sa moitié.' | 'Et ca sera sa moitie.' | |
'aàáâäãåāæ' | 'aaaaaaaaæ' | |
'ᾳ' | 'α' | |
'cçćč' | 'cccc' | |
'eèéêëēėę' | 'eeeeeeee' | |
'ἦἧᾚ' | 'ηηΗ' | |
'iîïíīįìi' | 'iiiiiiii' | |
'lł' | 'll' | |
'nñń' | 'nnn' | |
'oôöòóōõøœ' | 'oooooooøœ' | |
'sśšß' | 'sssß' | |
'uûüùúū' | 'uuuuuu' | |
'yÿ' | 'yy' | |
'zžźż' | 'zzzz' |
STR
Converts any input into a string.Note: This function is available to maintain backward compatibility only. USE TO_STRING instead.
SYNTAX
str(some_object)
some_object
- Type: any
- Description: The object to convert to a string
OUTPUT
string
EXAMPLES
SELECT str(booleans) AS stringified
booleans | → | stringified |
---|---|---|
True | 'true' | |
False | 'false' |
SELECT str(int) AS stringified
int | → | stringified |
---|---|---|
1 | '1' |
SELECT str(double) AS stringified
double | → | stringified |
---|---|---|
1.0 | '1.0' |
SELECT str(string) AS stringified
string | → | stringified |
---|---|---|
'this is a string' | 'this is a string' | |
null | null |
STR_FORMAT
Returns a string using the specified format template with the inserted string arguments.
SYNTAX
str_format(format_template, objects_to_insert)
format_template
- Type: string
- Description: A string detailing how and where the string arguments should be formatted and inserted. For a syntax reference, see Java Formatter.
objects_to_insert
- Type: {'type': 'array', 'elementType': 'any'}
- Description: An array of objects to format and insert into the template
OUTPUT
string
EXAMPLES
SELECT str_format(templates, integer_values) AS format_result
description | templates | integer_values | → | format_result |
---|---|---|---|---|
'Is it non-null?' | '%b, %b, %b' | [3, null, 7] | 'true, false, true' | |
'Format integer' | '%d' | [1] | '1' | |
'Pad integer with some number of zeros' | '%010d' | [1] | '0000000001' | |
'Use thousands-separator' | '%,d' | [10000000] | '10,000,000' | |
'Print positive integers with a plus' | '%+d' | [1] | '+1' | |
'Add a space before positive numbers' | '% d' | [1] | ' 1' | |
'Enclose negative ints with parenthesis' | '%(d' | [-1] | '(1)' | |
'Some combination of the above' | '%+(010d' | [1] | '+000000001' | |
'Some combination of the above' | '%+(010d' | [-1] | '(00000001)' | |
'Octal string' | '%o' | [142] | '216' | |
'Format number as string' | '%s' | [3] | '3' | |
'Hex string' | '%x' | [142] | '8e' | |
'Uppercase hex string' | '%X' | [142] | '8E' |
SELECT str_format(templates, double_values) as format_result
description | templates | double_values | → | format_result |
---|---|---|---|---|
'Double to hex' | '%a' | [120.382] | '0x1.e1872b020c49cp6' | |
'Double in scientific notation' | '%e' | [15042.0] | '1.504200e+04' | |
'Double' | '%f' | [142.0] | '142.000000' | |
'Double, possibly in scientific notation if long enough' | '%g' | [100000.0] | '100000' | |
'Double, possibly in scientific notation if long enough' | '%g' | [1000000.0] | '1.00000e+06' | |
'Four decimal places' | '%.4f' | [1.234] | '1.2340' | |
'No decimal places' | '%.0f' | [1.234] | '1' | |
'Given null value' | '%d' | [null] | 'null' | |
'Returns null given a null array' | '%d' | null | null |
SELECT str_format(templates, string_values) as format_result
description | templates | string_values | → | format_result |
---|---|---|---|---|
'Hex String of value from hashCode() method' | '%h' | ['example'] | 'b125116a' | |
'Format as string' | '%s' | ['i am string'] | 'i am string' | |
'Maximum 7 characters' | '%.7s' | ['i am string'] | 'i am st' | |
'Specify argument index' | '%2$s, %1$s' | ['one', 'two'] | 'two, one' | |
'Use literal percent sign' | '%%' | [] | '%' |
SELECT str_format(time_format, array(1350933383321L)) AS time_formatted
description | time_format | → | time_formatted |
---|---|---|---|
'Complete date formatting example' | '%tB %<td, %<tY, %<tl:%<tM%<tp' | 'October 22, 2012, 7:16pm' | |
'Hour of the day for the 24-hour clock, with a leading zero if necessary' | '%tH' | '19' | |
'Hour for the 12-hour clock, with a leading zero as necessary' | '%tI' | '07' | |
'Hour of the day for the 24-hour clock' | '%tk' | '19' | |
'Hour for the 12-hour clock' | '%tl' | '7' | |
'Minute within the hour formatted as two digits with a leading zero as necessary' | '%tM' | '16' | |
'Seconds within the minute, formatted as two digits with a leading zero as necessary' | '%tS' | '23' | |
'Millisecond within the second formatted as three digits with leading zeros as necessary' | '%tL' | '321' | |
Nanosecond within the second, as 9 digits with leading zeros as necessary | '%tN' | '321000000' | |
'Locale-specific morning or afternoon marker in lower case' | '%tp' | 'pm' | |
'Locale-specific morning or afternoon marker in UPPER CASE' | '%Tp' | 'PM' | |
'RFC 822 style numeric time zone offset from GMT' | '%tz' | '+0000' | |
'A string representing the abbreviation for the time zone' | '%tZ' | 'UTC' | |
'Seconds since the beginning of the epoch starting at 1 January 1970 00:00:00 UTC' | '%ts' | '1350933383' | |
'Milliseconds since the beginning of the epoch starting at 1 January 1970 00:00:00 UTC' | '%tQ' | '1350933383321' | |
'Locale-specific full month name' | '%tB' | 'October' | |
'Locale-specific abbreviated month name' | '%tb' | 'Oct' | |
'Same as %%tb' | '%th' | 'Oct' | |
'Locale-specific full name of the day of the week' | '%tA' | 'Monday' | |
'Locale-specific short name of the day of the week' | '%ta' | 'Mon' | |
'Four-digit year divided by 100, formatted as two digits with leading zero as necessary' | '%tC' | '20' | |
'Last two digits of the year, formatted with leading zeros as necessary' | '%ty' | '12' | |
'Year, formatted as at least four digits with leading zeros as necessary' | '%tY' | '2012' | |
'Day of year, formatted as three digits with leading zeros as necessary' | '%tj' | '296' | |
'Month, formatted as two digits with leading zeros as necessary' | '%tm' | '10' | |
'Day of month, formatted as two digits with leading zeros as necessary' | '%td' | '22' | |
'Day of month, formatted as two digits with no leading zeros' | '%te' | '22' | |
'Time formatted for the 24-hour clock as "%%tH:%%<tM"' | '%tR' | '19:16' | |
'Time formatted for the 24-hour clock as "%%tH:%%<tM:%%<tS"' | '%tT' | '19:16:23' | |
'Time formatted for the 12-hour clock as "%%tI:%%<tM:%%<tS %%<Tp"' | '%tr' | '07:16:23 PM' | |
'Date formatted as "%%tm/%%<td/%%<ty"' | '%tD' | '10/22/12' | |
'ISO 8601 complete date formatted as "%%tY-%%<tm-%%<td"' | '%tF' | '2012-10-22' | |
'Date and time formatted as "%%ta %%<tb %%<td %%<tT %%<tZ %%<tY"' | '%tc' | 'Mon Oct 22 19:16:23 UTC 2012' |
STR_JOIN
Concatenates all strings in an array into a single string, inserting the delimiter between each item. Tip: To find the function that best meets your needs, review these similar functions: ARRAY.CONCAT, ARRAY_CONCAT, ARRAY.APPEND.
SYNTAX
str_join(delimiter, string_array)
delimiter
- Type: string
- Description: The string to insert between each item in the array
string_array
- Type: {'type': 'array', 'elementType': 'string'}
- Description: The array of string to join into a single string
OUTPUT
string
EXAMPLE
SELECT str_join(delimiter, string_values) AS concatenated_array
delimiter | string_values | → | concatenated_array |
---|---|---|---|
'-' | ['1', '2', '3'] | '1-2-3' | |
', ' | ['1', '2', '3'] | '1, 2, 3' | |
'' | ['1', '2', '3'] | '123' | |
'|' | ['make', 'a', 'regular', 'expression'] | 'make|a|regular|expression' | |
'-' | ['foo', null, 'bar'] | 'foo-bar' | |
'-' | [] | '' | |
null | ['1', '2', '3'] | null | |
'-' | null | null | |
'-' | [null] | '' |
SUBSTR
Removes the portion of the string before the start_index
. If start_index
is positive or zero, it indexes the string starting from the beginning; if negative, it indexes the string starting from the end (so -1 gives you the last character of the string, -2 the last two characters and so on). The function returns the substring starting from the indexed position and continuing to the end of the string, or null if the indexed position is past the end of the string.Tip: To find the function that best meets your needs, review similar function SUBSTR2.
SYNTAX
substr(string, start_index)
string
- Type: string
- Description: The string to get the substring of
start_index
- Type: int
- Description: The index to start the substring at
OUTPUT
string
EXAMPLE
SELECT substr(string, start_index) as substrings
string | start_index | → | substrings |
---|---|---|---|
'pie' | 0 | 'pie' | |
'pie' | 1 | 'ie' | |
'pie' | 3 | '' | |
'pie' | 4 | null | |
'pie' | -1 | 'e' | |
'pie' | -4 | 'pie' | |
null | 0 | null | |
'pie' | null | null |
SUBSTR2
Returns the part of the given string between the start_index
and end_index
. The values start_index
and end_index
may be either zero or positive (indexing from the start of the string) or negative (indexing from the end). Together, start_index
(inclusive) and end_index
(exclusive) determine the range of the string. If the range is empty then null is returned. Tip: To find the function that best meets your needs, review similar function SUBSTR.
SYNTAX
substr2(string, start_index, end_index)
string
- Type: string
- Description: The string to get the substring of
start_index
- Type: int
- Description: The index to start the substring at
end_index
- Type: int
- Description: The index to end the substring at
OUTPUT
string
EXAMPLE
SELECT substr2(string, start_index, end_index) as substrings
string | start_index | end_index | → | substrings |
---|---|---|---|---|
'0123456789' | 3 | 7 | '3456' | |
'"quote"' | 1 | 6 | 'quote' | |
'start_index = end_index' | 5 | 5 | '' | |
'end_index < start_index' | 5 | 4 | null | |
'start_index < 0' | -1 | 0 | null | |
'start index is negative' | 0 | -1 | 'start index is negativ' | |
'both start_index and end_index are negative' | -10 | -1 | 'e negativ' | |
'negative start_index > negative end_index' | -1 | -3 | null | |
'end_index > length' | 0 | 19 | 'end_index > length' | |
null | 0 | 0 | null | |
'start_index is null' | null | 1 | null | |
'end_index is null' | 0 | null | null |
TAMR_ID
Computes a unique key from a string source ID and string entity ID. This key is identical to the one generated for the unified dataset based on origin_source_name
and origin_entity_id
.
SYNTAX
tamr_id(source_id, entity_id)
source_id
- Type: string
- Description: the source identifier
entity_id
- Type: string
- Description: the entity identifier
OUTPUT
string
EXAMPLE
SELECT *, tamr_id(source_id, entity_id) as tamr_id
source_id | entity_id | → | source_id | tamr_id | entity_id |
---|---|---|---|---|---|
null | null | null | null | null | |
'source1.csv' | null | 'source1.csv' | null | null | |
null | 'entity-id' | null | null | 'entity-id' | |
'another_source' | 'entity-id' | 'another_source' | '9055106661574376142' | 'entity-id' | |
'source1.csv' | 'rec-653-org' | 'source1.csv' | '-3639548815871415108' | 'rec-653-org' | |
'source1.csv' | 'rec-9996-org' | 'source1.csv' | '713467184200753482' | 'rec-9996-org' | |
'source1.csv' | 'rec-6137-org' | 'source1.csv' | '-5300772691460392676' | 'rec-6137-org' | |
'source1.csv' | 'rec-7100-org' | 'source1.csv' | '6054007634817115275' | 'rec-7100-org' | |
'source1.csv' | 'rec-9526-org' | 'source1.csv' | '-8395450447775343480' | 'rec-9526-org' |
TAMR_ID_128
Computes a unique key from a string source ID and string entity ID. This key is identical to the one generated for the unified dataset based on origin_source_name
and origin_entity_id
. This key is equal to the RFC-4648 URL-safe unpadded Base64 representation of the 128-bit hash value of the source_id and entity_id.
SYNTAX
tamr_id_128(source_id, entity_id)
source_id
- Type: string
- Description: the source identifier
entity_id
- Type: string
- Description: the entity identifier
OUTPUT
string
EXAMPLE
SELECT *, tamr_id_128(source_id, entity_id) as tamr_id_128
source_id | entity_id | → | source_id | tamr_id_128 | entity_id |
---|---|---|---|---|---|
null | null | null | null | null | |
'source1.csv' | null | 'source1.csv' | null | null | |
null | 'entity-id' | null | null | 'entity-id' | |
'another_source' | 'entity-id' | 'another_source' | 'WU1V6-8PgJ7SAowkhdVBJg' | 'entity-id' | |
'source1.csv' | 'rec-653-org' | 'source1.csv' | '4amv8rljpvli3tpWDnt2SA' | 'rec-653-org' | |
'source1.csv' | 'rec-9996-org' | 'source1.csv' | 'mi0Qn8pZda5EQdUsLDxZRQ' | 'rec-9996-org' | |
'source1.csv' | 'rec-6137-org' | 'source1.csv' | 'TnrB0n4kcXKifOD6IMfpOQ' | 'rec-6137-org' | |
'source1.csv' | 'rec-7100-org' | 'source1.csv' | 'VAd7AXiqeAgjx5VIXRxlRA' | 'rec-7100-org' | |
'source1.csv' | 'rec-9526-org' | 'source1.csv' | 'wnq9hMxC05jdynTfQEAr3Q' | 'rec-9526-org' |
TO_BOOLEAN
Attempts to convert a value of any type to a Boolean. If conversion fails, it returns null. Error messages for conversion failures can be accessed with to_boolean_error()
.
SYNTAX
to_boolean(input)
input
- Type: any
- Description: the input value
OUTPUT
bool
EXAMPLES
SELECT to_boolean(string_col) as converted
string_col | → | converted |
---|---|---|
'true' | True | |
'True' | True | |
' true ' | False | |
'false' | False | |
'NOT TRUE' | False | |
null | null |
SELECT to_boolean(double_col) as converted
double_col | → | converted |
---|---|---|
1.0 | True | |
0.0 | False | |
0.5 | True | |
5e-324 | True | |
1.7976931348623157e+308 | True | |
Infinity | True | |
-Infinity | True | |
NaN | False |
SELECT to_boolean(string_arr_col) as converted
string_arr_col | → | converted |
---|---|---|
[] | null | |
['true'] | True | |
['false'] | False | |
['false', 'true'] | null |
TO_BOOLEAN_ERROR
If converting the given value to a Boolean using to_boolean()
would result in an error, it returns the error; otherwise returns null.
SYNTAX
to_boolean_error(input)
input
- Type: any
- Description: the input value
OUTPUT
string
EXAMPLES
SELECT to_boolean(string_col) as converted, to_boolean_error(string_col) as error
string_col | → | error | converted |
---|---|---|---|
'true' | null | True | |
'false' | null | False | |
null | null | null |
SELECT to_boolean(string_arr_col) as converted, to_boolean_error(string_arr_col) as error
string_arr_col | → | error | converted |
---|---|---|---|
[] | 'ConversionException: can't convert empty array' | null | |
['true'] | null | True | |
['true', 'false'] | 'ConversionException: can't convert array of length: 2' | null | |
null | null | null |
TO_DOUBLE
Attempts to convert a value of any type to a Double. If conversion fails, it returns null. Error messages for conversion failures can be accessed with to_double_error()
.
SYNTAX
to_double(input)
input
- Type: any
- Description: The input value
OUTPUT
double
EXAMPLES
SELECT to_double(long_col) as converted
long_col | → | converted |
---|---|---|
9223372036854775807 | 9.223372036854776e+18 | |
-9223372036854775808 | -9.223372036854776e+18 |
SELECT to_double(float_col) as converted
float_col | → | converted |
---|---|---|
1.0 | 1.0 | |
0.25 | 0.25 | |
1.4e-45 | 1.401298464324817e-45 | |
3.4028235e+38 | 3.4028234663852886e+38 | |
Infinity | Infinity | |
NaN | NaN |
SELECT to_double(string_col) as converted
string_col | → | converted |
---|---|---|
'one' | null | |
'1' | 1.0 | |
'1F' | 1.0 | |
'1D' | 1.0 | |
'1.7976931348623157E308' | 1.7976931348623157e+308 |
SELECT to_double(bool_col) as converted
bool_col | → | converted |
---|---|---|
True | 1.0 | |
False | 0.0 | |
null | null |
SELECT to_double(string_arr_col) as converted
string_arr_col | → | converted |
---|---|---|
[] | null | |
['1'] | 1.0 | |
['0'] | 0.0 | |
['NaN'] | NaN | |
['0', '1'] | null |
TO_DOUBLE_ERROR
If converting the given value to a Double using to_double()
would result in an error, it returns the error; otherwise returns null.
SYNTAX
to_double_error(input)
input
- Type: any
- Description: the input value
OUTPUT
string
EXAMPLES
SELECT to_double(string_col) as converted, to_double_error(string_col) as error
string_col | → | error | converted |
---|---|---|---|
'one' | 'NumberFormatException: For input string: "one"' | null | |
'1' | null | 1.0 |
SELECT to_double(string_arr_col) as converted, to_double_error(string_arr_col) as error
string_arr_col | → | error | converted |
---|---|---|---|
[] | 'ConversionException: can't convert empty array' | null | |
['1'] | null | 1.0 | |
['one'] | 'NumberFormatException: For input string: "one"' | null | |
['1', '2'] | 'ConversionException: can't convert array of length: 2' | null | |
null | null | null |
TO_INT
Attempts to convert a value of any type to an Integer. If conversion fails, it returns null. Error messages for conversion failures can be accessed with to_int_error()
.
SYNTAX
to_int(value)
value
- Type: any
- Description: The input value
OUTPUT
int
EXAMPLES
SELECT to_int(double_col) as converted
double_col | → | converted |
---|---|---|
0.25 | 0 | |
0.5 | 1 | |
100.0 | 100 | |
1.7976931348623157e+308 | null | |
Infinity | null | |
NaN | null |
SELECT to_int(long_col) as converted
long_col | → | converted |
---|---|---|
2147483647 | 2147483647 | |
2147483648 | null |
SELECT to_int(bool_col) as converted
bool_col | → | converted |
---|---|---|
True | 1 | |
False | 0 | |
null | null |
SELECT to_int(string_col) as converted
string_col | → | converted |
---|---|---|
'1' | 1 | |
'1.0' | null | |
'one' | null |
SELECT to_int(string_arr_col) as converted
string_arr_col | → | converted |
---|---|---|
[] | null | |
['1'] | 1 | |
['0'] | 0 | |
['NaN'] | null | |
['0', '1'] | null |
TO_INT_ERROR
If converting the given value to an Integer using to_int()
would result in an error, it returns the error; otherwise returns null.
SYNTAX
to_int_error(input)
input
- Type: any
- Description: the input value
OUTPUT
string
EXAMPLES
SELECT to_int(string_col) as converted, to_int_error(string_col) as error
string_col | → | error | converted |
---|---|---|---|
'one' | 'NumberFormatException: For input string: "one"' | null | |
'1' | null | 1 |
SELECT to_int(string_arr_col) as converted, to_int_error(string_arr_col) as error
string_arr_col | → | error | converted |
---|---|---|---|
[] | 'ConversionException: can't convert empty array' | null | |
['1'] | null | 1 | |
['one'] | 'NumberFormatException: For input string: "one"' | null | |
['1', '2'] | 'ConversionException: can't convert array of length: 2' | null | |
null | null | null |
SELECT to_int(double_col) as converted, to_int_error(double_col) as error
double_col | → | error | converted |
---|---|---|---|
42.0 | null | 42 | |
NaN | ConversionException: Integer cannot represent NaN | null | |
1.7976931348623157e+308 | 'MagnitudeUnsupportedException: Integer cannot represent a number of this magnitude: 1.7976931348623157E308' | null |
TO_LONG
Attempts to convert a value of any type to a Long. If conversion fails, it returns null. Error messages for conversion failures can be accessed with to_long_error()
.
SYNTAX
to_long(value)
value
- Type: any
- Description: the input value
OUTPUT
long
EXAMPLES
SELECT to_long(double_col) as converted
double_col | → | converted |
---|---|---|
0.25 | 0 | |
0.5 | 1 | |
100.0 | 100 | |
1.7976931348623157e+308 | null | |
Infinity | null | |
NaN | null |
SELECT to_long(bool_col) as converted
bool_col | → | converted |
---|---|---|
True | 1 | |
False | 0 | |
null | null |
SELECT to_long(string_col) as converted_str
string_col | → | converted_str |
---|---|---|
'1' | 1 | |
'1L' | null | |
'1.0' | null | |
'one' | null |
SELECT to_long(string_arr_col) as converted
string_arr_col | → | converted |
---|---|---|
[] | null | |
['1'] | 1 | |
['0'] | 0 | |
['NaN'] | null | |
['0', '1'] | null |
TO_LONG_ERROR
If converting the given value to a Long using to_long()
would result in an error, it returns the error; otherwise returns null.
SYNTAX
to_long_error(input)
input
- Type: any
- Description: the input value
OUTPUT
string
EXAMPLES
SELECT to_long(string_col) as converted, to_long_error(string_col) as error
string_col | → | error | converted |
---|---|---|---|
'one' | 'NumberFormatException: For input string: "one"' | null | |
'1' | null | 1 |
SELECT to_long(string_arr_col) as converted, to_long_error(string_arr_col) as error
string_arr_col | → | error | converted |
---|---|---|---|
[] | 'ConversionException: can't convert empty array' | null | |
['1'] | null | 1 | |
['one'] | 'NumberFormatException: For input string: "one"' | null | |
['1', '2'] | 'ConversionException: can't convert array of length: 2' | null | |
null | null | null |
SELECT to_long(double_col) as converted, to_long_error(double_col) as error
double_col | → | error | converted |
---|---|---|---|
42.0 | null | 42 | |
NaN | ConversionException: Long cannot represent NaN | null | |
1.7976931348623157e+308 | 'MagnitudeUnsupportedException: Long cannot represent a number of this magnitude: 1.7976931348623157E308' | null |
TO_STRING
Attempts to convert a value of any type to a String. If conversion fails, it returns null. Error messages for conversion failures can be accessed with to_string_error()
.
SYNTAX
to_string(value)
value
- Type: any
- Description: the input value
OUTPUT
string
EXAMPLES
SELECT to_string(long_col) as converted
long_col | → | converted |
---|---|---|
0 | '0' | |
9223372036854775807 | '9223372036854775807' | |
null | null |
SELECT to_string(double_col) as converted
double_col | → | converted |
---|---|---|
1.0 | '1.0' | |
0.5 | '0.5' | |
5e-324 | '4.9E-324' | |
1.7976931348623157e+308 | '1.7976931348623157E308' | |
Infinity | Infinity | |
NaN | NaN |
SELECT to_string(boolean_col) as converted
boolean_col | → | converted |
---|---|---|
True | 'true' | |
False | 'false' | |
null | null |
SELECT to_string(array_col) as converted
array_col | → | converted |
---|---|---|
[] | null | |
[1] | '1' | |
[1, 2, 3] | null |
TO_STRING_ERROR
If converting the given value to a String using to_string()
would result in an error, it returns the error; otherwise returns null.
SYNTAX
to_string_error(input)
input
- Type: any
- Description: the input value
OUTPUT
string
EXAMPLE
SELECT to_string(int_arr_col) as converted, to_string_error(int_arr_col) as error
int_arr_col | → | error | converted |
---|---|---|---|
[] | 'ConversionException: can't convert empty array' | null | |
[1] | null | '1' | |
[1, 2] | 'ConversionException: can't convert array of length: 2' | null | |
null | null | null |
TRIM
Trim the spaces from both ends for the specified string.
SYNTAX
trim(input)
input
- Type: string
- Description: The value to trim
OUTPUT
string
EXAMPLE
SELECT trimmable, trim(trimmable) as trimmed
trimmable | → | trimmed | trimmable |
---|---|---|---|
'hair ' | 'hair' | 'hair ' | |
' beard' | 'beard' | ' beard' | |
' hedges ' | 'hedges' | ' hedges ' | |
'eyebrows' | 'eyebrows' | 'eyebrows' | |
null | null | null |
TYPEOF
Returns the Spark type of the given object as a string.
SYNTAX
typeof(object)
object
- Type: any
- Description: The object to find the type of
OUTPUT
string
EXAMPLES
SELECT typeof(int) AS type
int | → | type |
---|---|---|
13 | 'Integer' | |
-13 | 'Integer' |
SELECT typeof(double) AS type
double | → | type |
---|---|---|
13.0 | 'Double' | |
-13.0 | 'Double' |
SELECT typeof(boolean) AS type
boolean | → | type |
---|---|---|
True | 'Boolean' | |
False | 'Boolean' |
SELECT typeof(string) AS type
string | → | type |
---|---|---|
'this is a string' | 'String' | |
'' | 'String' |
SELECT typeof(arrays) AS type
arrays | → | type |
---|---|---|
['a', 'b', 'c'] | 'Object[]' | |
[] | 'Object[]' | |
[null] | 'Object[]' | |
null | null |
UPPER
Converts a string to uppercase. Applies to all characters.
SYNTAX
upper(input)
input
- Type: string
- Description: The string to convert to uppercase
OUTPUT
string
EXAMPLE
SELECT upper(string) as uppercase_string
string | → | uppercase_string |
---|---|---|
'all_lower' | 'ALL_LOWER' | |
'ALL UPPER' | 'ALL UPPER' | |
'mIxEd' | 'MIXED' | |
'' | '' | |
null | null |
VALID_EMAIL
Returns whether the supplied string is a valid email or not.
SYNTAX
valid_email(this_might_be_an_email)
this_might_be_an_email
- Type: string
- Description: The string to be checked for email validity
OUTPUT
bool
EXAMPLE
SELECT valid_email(emails) as is_valid_email
emails | → | is_valid_email |
---|---|---|
'[email protected]' | True | |
'nathan' | False | |
null | null |
YEAR
Gets the year field of a date.
SYNTAX
year(datetime)
datetime
- Type: string
- Description: The timestamp to get the year of
OUTPUT
int
EXAMPLES
SELECT year(date) AS year
date | → | year |
---|---|---|
'0020' | 20 | |
'1997-10' | 1997 | |
'2077-06-23+06:45' | 2077 | |
'2001-03-14T15:43:19.155' | 2001 | |
'1580-01-04 06:30-04:00' | 1580 |
SELECT year(date) AS year, datetime_error(date) as error
description | date | → | error | year |
---|---|---|---|---|
'null value' | null | null | null | |
'empty string' | '' | 'Text '' could not be parsed at index 0' | null | |
'not a date' | 'I am not a date' | 'Text 'I am not a date' could not be parsed at index 0' | null | |
'starts date-like' | '2018-03-oops' | 'Text '2018-03-oops' could not be parsed, unparsed text found at index 7' | null | |
'bad date' | '2018-03-95' | 'Text '2018-03-95' could not be parsed: Invalid value for DayOfMonth (valid values 1 - 28/31): 95' | null |
INT.RANGE
Returns an array containing all integers, in order, between start
(inclusive) and stop
(exclusive).
SYNTAX
int.range(start, stop)
start
- Type: int
- Description: The starting integer
stop
- Type: int
- Description: The integer to stop before
OUTPUT
{'type': 'array', 'elementType': 'int'}
EXAMPLE
SELECT int.range(start, stop) as range_of_integers;
start | stop | → | range_of_integers |
---|---|---|---|
0 | 3 | [0, 1, 2] | |
1 | 3 | [1, 2] | |
1 | 4 | [1, 2, 3] | |
5 | 5 | [] | |
-5 | -2 | [-5, -4, -3] | |
-2 | 2 | [-2, -1, 0, 1] | |
-2 | -3 | null | |
6 | 5 | null | |
null | 1 | null | |
1 | null | null |
LEGACY.HASH
NOTE: this function is DEPRECATED because of incorrectness. This incorrectness is the result of nulls and arrays not being considered when calculating the hash value. Instead, we suggest using the correct hash()
function.
Returns a number, called a hash code, based on the input data. The hash code is always the same given the same inputs and is practically guaranteed to be different given different inputs.
SYNTAX
legacy.hash(inputs)
inputs
- Type: any
- Description: The input values to hash
OUTPUT
long
EXAMPLES
SELECT legacy.hash(A, B, C) AS result
A | B | C | → | result |
---|---|---|---|---|
20 | 'twenty' | [3.0, 8.0, 9.0] | -3390176128415294659 | |
40 | 'forty' | [18.9, 21.1] | -6183879932931889889 | |
null | null | [] | 0 |
SELECT legacy.hash(A) AS result
A | → | result |
---|---|---|
null | 0 | |
[] | 0 |
SELECT legacy.hash(array(array.concat(A, B))) == legacy.hash(array(A, B)) AS result
A | B | → | result |
---|---|---|---|
['a', 'b'] | ['10', '20'] | True |
SELECT legacy.hash(A) == legacy.hash(B) AS result
A | B | → | result |
---|---|---|---|
[1, 2] | [2, 1] | False | |
[null] | [null, null] | True |
SELECT legacy.hash(A) == legacy.hash(B) AS result
A | B | → | result |
---|---|---|---|
['true', 'false'] | [True, False] | False |
SELECT legacy.hash(A) == legacy.hash(B) AS result
A | B | → | result |
---|---|---|---|
['1', '2'] | [1, 2] | False |
MATH.SIN
Returns the Sine of the input in radians.
SYNTAX
math.sin(input)
input
- Type: double
- Description: The attribute to return the sine of
OUTPUT
double
EXAMPLE
SELECT math.sin(radians) as sine_output
radians | → | sine_output |
---|---|---|
1.5707963267948966 | 1.0 | |
0.0 | 0.0 | |
NaN | NaN | |
Infinity | NaN | |
-Infinity | NaN | |
null | null |
MATH.COS
Returns the Cosine of the input in radians.
SYNTAX
math.cos(input)
input
- Type: double
- Description: The attribute to return the cosine of
OUTPUT
double
EXAMPLE
SELECT math.cos(radians) as cosine_output
radians | → | cosine_output |
---|---|---|
3.141592653589793 | -1.0 | |
0.0 | 1.0 | |
NaN | NaN | |
Infinity | NaN | |
-Infinity | NaN | |
null | null |
MATH.TAN
Returns the trigonometric Tangent of an angle.
SYNTAX
math.tan(input)
input
- Type: double
- Description: An angle, in radians
OUTPUT
double
EXAMPLE
SELECT math.tan(input) as tan_output
input | → | tan_output |
---|---|---|
0.0 | 0.0 | |
Infinity | NaN | |
-Infinity | NaN | |
NaN | NaN | |
null | null |
MATH.ASIN
Computes the Inverse Sine of the input in radians.
SYNTAX
math.asin(input)
input
- Type: double
- Description: The attribute to return the arcsine of
OUTPUT
double
EXAMPLE
SELECT math.asin(input) as arcsine_output
input | → | arcsine_output |
---|---|---|
1.0 | 1.5707963267948966 | |
20.0 | NaN | |
0.0 | 0.0 | |
Infinity | NaN | |
-Infinity | NaN | |
null | null |
MATH.ACOS
Computes the Inverse Cosine of the input in radians.
SYNTAX
math.acos(input)
input
- Type: double
- Description: The attribute to return the arccosine of
OUTPUT
double
EXAMPLE
SELECT math.acos(input) as arccosine_output
input | → | arccosine_output |
---|---|---|
1.0 | 0.0 | |
0.0 | 1.5707963267948966 | |
Infinity | NaN | |
-Infinity | NaN | |
null | null |
MATH.ATAN
Computes the Inverse Tangent of the input in radians.
SYNTAX
math.atan(input)
input
- Type: double
- Description: The attribute to return the arctangent of
OUTPUT
double
EXAMPLE
SELECT math.atan(input) as arctan_output
input | → | arctan_output |
---|---|---|
0.0 | 0.0 | |
1.0 | 0.7853981633974483 | |
NaN | NaN | |
null | null |
MATH.ATAN2
Returns the angle from the conversion of rectangular coordinates to polar coordinates using the Two Argument Arctangent.
SYNTAX
math.atan2(y, x)
y
- Type: double
- Description: The y coordinate in the cartesian plane
x
- Type: double
- Description: The x coordinate in the cartesian plane
OUTPUT
double
EXAMPLE
SELECT math.atan2(y, x) as arctan2_output
y | x | → | arctan2_output |
---|---|---|---|
0.0 | 0.0 | 0.0 | |
0.0 | Infinity | 0.0 | |
0.0 | -Infinity | 3.141592653589793 | |
NaN | 0.0 | NaN | |
null | 0.0 | null |
MATH.DEGREES
Converts the radian input to degrees.
SYNTAX
math.degrees(input)
input
- Type: double
- Description: An angle, in radians
OUTPUT
double
EXAMPLE
SELECT math.degrees(radians) as degrees
radians | → | degrees |
---|---|---|
0.0 | 0.0 | |
3.141592653589793 | 180.0 | |
Infinity | Infinity | |
-Infinity | -Infinity | |
NaN | NaN | |
null | null |
MATH.RADIANS
Converts the degree input to radians.
SYNTAX
math.radians(input)
input
- Type: double
- Description: An angle, in degrees
OUTPUT
double
EXAMPLE
SELECT math.radians(degrees) as radians
degrees | → | radians |
---|---|---|
0.0 | 0.0 | |
180.0 | 3.141592653589793 | |
Infinity | Infinity | |
-Infinity | -Infinity | |
NaN | NaN | |
null | null |
MATH.LN
Computes the natural logarithm.
SYNTAX
math.ln(input)
input
- Type: double
- Description: The input value
OUTPUT
double
EXAMPLE
SELECT math.ln(input) as naturalLog
input | → | naturalLog |
---|---|---|
0.0 | -Infinity | |
1.0 | 0 | |
2.718281828459045 | 1 | |
Infinity | Infinity | |
-Infinity | NaN | |
-1.0 | NaN | |
NaN | NaN | |
null | null |
MATH.LOG10
Computes the logarithm with base 10.
SYNTAX
math.log10(input)
input
- Type: double
- Description: The input value
OUTPUT
double
EXAMPLE
SELECT math.log10(input) as log
input | → | log |
---|---|---|
0.0 | -Infinity | |
1.0 | 0 | |
10 | 1 | |
1000 | 3 | |
Infinity | Infinity | |
-Infinity | NaN | |
-1.0 | NaN | |
NaN | NaN | |
null | null |
MATH.LOG
Computes the logarithm of a value for a specific base.
SYNTAX
math.log(input, base)
input
- Type: double
- Description: The input value
base
- Type: double
- Description: The base value
OUTPUT
double
EXAMPLE
SELECT math.log(input, base) as log
input | base | → | log |
---|---|---|---|
0.0 | 0 | NaN | |
1.0 | 1 | NaN | |
10 | 10 | 1 | |
10000 | 100 | 2 | |
Infinity | 2 | Infinity | |
-Infinity | 4 | NaN | |
-1.0 | 10 | NaN | |
NaN | NaN | NaN | |
null | 10 | null | |
10 | null | null | |
null | null | null |
MATH.DOT_PRODUCT
Computes the dot product of the input arrays
SYNTAX
math.dot_product(array1, array2)
array1
- Type: {'type': 'array', 'elementType': 'double'}
- Description: First array to compare in dot product calculation
array2
- Type: {'type': 'array', 'elementType': 'double'}
- Description: Second array to compare in dot product calculation
OUTPUT
double
EXAMPLE
SELECT math.dot_product(array1, array2) as dot_product
array1 | array2 | → | dot_product |
---|---|---|---|
[1.0, 2.0, 3.0] | [4.0, -5.0, 6.0] | 12.0 | |
[1.0, 2.0, 3.0] | null | null | |
null | [4.0, -5.0, 6.0] | null | |
[1.0, 'NaN', 3.0] | [4.0, -5.0, 6.0] | NaN | |
[1.0, 2.0, 3.0] | [4.0, -5.0] | null | |
[1.0, 2.0] | [4.0, -5.0, 6.0] | null | |
[1.0, 2.0, 4.0] | [4.0, -5.0, 6.0] | 18.0 | |
[1.0, null, 4.0] | [4.0, -5.0, 6.0] | null | |
[] | [4.0, -5.0, 6.0] | null | |
[] | [] | 0.0 | |
[1.0, 2.0, 3.0] | [2.0, 3.0, -0.0] | 8.0 | |
[1.0, 2.0, 3.0] | [2.0, 'Infinity', -0.0] | Infinity | |
[1.0, 2.0, 3.0] | [2.0, '-Infinity', -0.0] | -Infinity | |
[1.0, -2.0, 3.0] | [2.0, 1.7976931348623157e+308, -0.0] | -Infinity | |
[1.0, -2.0, 3.0] | [2.0, 5e-324, -0.0] | 2.0 |
MATH.NORMALIZED_DOT_PRODUCT
Computes the normalized dot product between two vectors of numbers using the Model Assurance Criterion
SYNTAX
math.normalized_dot_product(array1, array2)
array1
- Type: {'type': 'array', 'elementType': 'double'}
- Description: First array to compare in dot product calculation
array2
- Type: {'type': 'array', 'elementType': 'double'}
- Description: Second array to compare in dot product calculation
OUTPUT
double
EXAMPLE
SELECT math.normalized_dot_product(array1, array2) as dot_product
array1 | array2 | → | dot_product |
---|---|---|---|
[1.0, 2.0, 3.0] | [4.0, -5.0, 6.0] | 0.13358070500927643 | |
[1.0, 2.0, 3.0] | null | null | |
null | [4.0, -5.0, 6.0] | null | |
[1.0, 'NaN', 3.0] | [4.0, -5.0, 6.0] | NaN | |
[1.0, 2.0, 3.0] | [4.0, -5.0] | null | |
[1.0, 2.0] | [4.0, -5.0, 6.0] | null | |
[1.0, 2.0, 4.0] | [4.0, -5.0, 6.0] | 0.20037105751391465 | |
[1.0, null, 3.0] | [4.0, -5.0, 6.0] | null | |
[] | [4.0, -5.0, 6.0] | null | |
[] | [] | 0.0 | |
[1.0, 2.0, 3.0] | [2.0, 3.0, -0.0] | 0.3516483516483517 | |
[1.0, 2.0, 3.0] | [2.0, 'Infinity', -0.0] | NaN | |
[1.0, 2.0, 3.0] | [2.0, '-Infinity', -0.0] | NaN | |
[1.0, -2.0, 3.0] | [2.0, 1.7976931348623157e+308, -0.0] | NaN | |
[1.0, -2.0, 3.0] | [2.0, 5e-324, -0.0] | 0.07142857142857142 | |
[1.0, -2.0, 3.0] | [1.0, -2.0, 3.0] | 1.0 | |
[1.0, -2.0, 3.0] | [0.0, 0.0, 0.0] | 0.0 |
Updated 6 months ago