Functions
Tamr Core provides a number of functions for you to use to manipulate data in a unified dataset or in one or more of the input datasets of a schema mapping, mastering, or categorization project.
Tamr Core provides a number of functions and operators to transform data.
General Functions
Name | Syntax | Output |
---|---|---|
ABS | abs(input) | number |
ADD | add(integer1, integer2) | int |
ARRAY | array(elements) | array |
ARRAY_CONCAT | array_concat(array) | string |
ATOF | atof(string_to_convert) | double |
ATOI | atoi(string_to_convert) | int |
CEIL | ceil(input) | long |
COALESCE | coalesce(a) | any |
CONCAT | concat(a) | string |
DATEDIFF | datediff(endDate, startDate) | long |
DATETIME_ERROR | datetime_error(input) | string |
DATETIME_TO_ISO | datetime_to_iso(datetime_string, patterns) | string |
DATE_AND_TIME_TO_ISO | date_and_time_to_iso(date_string, time_string, patterns) | string |
DAYOFMONTH | dayofmonth(datetime) | int |
DECIMAL | decimal(input) | double |
DISTINCT | distinct(array) | array |
DURATION_TO_ISO | duration_to_iso(duration, time_unit, minimal) | string |
EQUALS_IGNORE_CASE | equals_ignore_case(str1, str2) | bool |
EXTRACT | extract(input, regex, group_index) | string |
EXTRACT_ALL | extract_all(input, regex) | array |
FLOOR | floor(input) | long |
GET | get(data, index) | any |
GREATEST | greatest(a) | any |
HASH | hash(inputs) | long |
HEX | hex(input) | string |
ENCODE_BASE64 | encode_base64(input) | string |
INTEGER | integer(input) | int |
ISNULL | isnull(input) | bool |
ISO_DIFFERENCE_DAYS | iso_difference_days(date1, date2) | int |
ISO_DATETIME_ADD | iso_datetime_add(datetime, duration, time_unit) | string |
IS_NOT_NULL | is_not_null(input) | bool |
IS_NOT_EMPTY | is_not_empty(input) | bool |
JACCARD_INDEX | jaccard_index(a, b) | double |
LEAST | least(a) | any |
LEN | len(array) | int |
LENGTH | length(input) | int |
LOWER | lower(input) | string |
MATCHES | matches(regex, input) | bool |
MD5 | md5(input) | string |
MONTH | month(datetime) | int |
MONOTONICALLY_INCREASING_ID | monotonically_increasing_id() | long |
NUM | num(array) | double |
PARSE_JSON_ARRAY | parse_json_array(json_array) | array |
PMOD | pmod(dividend, divisor) | number |
POW | pow(base, exponent) | double |
SQRT | sqrt(input) | double |
RAND | rand() | double |
REPLACE | replace(string, target, replacement) | string |
REPLACE_ALL | replace_all(string, target_regex, replacement) | string |
RINT | rint(input) | double |
ROUND | round(input) | long |
RUNTIME_DATE | runtime_date() | string |
RUNTIME_DATETIME | runtime_datetime() | string |
SHA1 | sha1(input) | string |
SIZE | size(input) | int |
SORT_ARRAY | sort_array(input) | array |
SPLIT | split(string, regex) | array |
STRIP_ACCENTS | strip_accents(string) | string |
STR | str(some_object) | string |
STR_FORMAT | str_format(format_template, objects_to_insert) | string |
STR_JOIN | str_join(delimiter, string_array) | string |
SUBSTR | substr(string, start_index) | string |
SUBSTR2 | substr2(string, start_index, end_index) | string |
TAMR_ID | tamr_id(source_id, entity_id) | string |
TAMR_ID_128 | tamr_id_128(source_id, entity_id) | string |
TO_BOOLEAN | to_boolean(input) | bool |
TO_BOOLEAN_ERROR | to_boolean_error(input) | string |
TO_DOUBLE | to_double(input) | double |
TO_DOUBLE_ERROR | to_double_error(input) | string |
TO_INT | to_int(value) | int |
TO_INT_ERROR | to_int_error(input) | string |
TO_LONG | to_long(value) | long |
TO_LONG_ERROR | to_long_error(input) | string |
TO_STRING | to_string(value) | string |
TO_STRING_ERROR | to_string_error(input) | string |
TRIM | trim(input) | string |
TYPEOF | typeof(object) | string |
UPPER | upper(input) | string |
VALID_EMAIL | valid_email(this_might_be_an_email) | bool |
YEAR | year(datetime) | int |
INT.RANGE | int.range(start, stop) | array |
LEGACY.HASH | legacy.hash(inputs) | long |
Array Functions
Array functions operate exclusively on arrays, allowing easy creation of and interaction with arrays.
Name | Syntax | Output |
---|---|---|
ARRAY.APPEND | array.append(array, value) | array |
ARRAY.CONCAT | array.concat(arrays) | array |
ARRAY.DIFF | array.diff(array1, array2) | array |
ARRAY.CONTAINS | array.contains(array, item) | bool |
ARRAY.DISTINCT | array.distinct(array) | array |
ARRAY.MOST_FREQUENT | array.most_frequent(N, array) | array |
ARRAY.FIRST | array.first(array) | any |
ARRAY.LAST | array.last(array) | any |
ARRAY.REST | array.rest(array) | array |
ARRAY.REVERSE | array.reverse(array) | array |
ARRAY.SORT | array.sort(array) | array |
ARRAY.NULLS | array.nulls(array) | array |
ARRAY.NON_NULLS | array.non_nulls(array) | array |
ARRAY.NON_EMPTIES | array.non_empties(array) | array |
ARRAY.SLICE | array.slice(array, start_index) | array |
ARRAY.SLICE2 | array.slice2(array, start_index, end_index) | array |
ARRAY.SUM | array.sum(array) | number |
Mathematical Functions
Mathematical trigonometric functions to calculate the sine, cosine and tangent of angles in radians.
Name | Syntax | Output |
---|---|---|
MATH.SIN | math.sin(input) | double |
MATH.COS | math.cos(input) | double |
MATH.TAN | math.tan(input) | double |
MATH.ASIN | math.asin(input) | double |
MATH.ACOS | math.acos(input) | double |
MATH.ATAN | math.atan(input) | double |
MATH.ATAN2 | math.atan2(y, x) | double |
MATH.DEGREES | math.degrees(input) | double |
MATH.RADIANS | math.radians(input) | double |
MATH.LN | math.ln(input) | double |
MATH.LOG10 | math.log10(input) | double |
MATH.LOG | math.log(input, base) | double |
MATH.DOT_PRODUCT | math.dot_product(array1, array2) | double |
MATH.NORMALIZED_DOT_PRODUCT | math.normalized_dot_product(array1, array2) | double |
GIS Functions
Geographic Information System (GIS) functions.
Name | Syntax | Output |
---|---|---|
GIS.POINT | gis.point(lon, lat) | record |
GIS.MULTI_POINT | gis.multi_point(multiPoint) | record |
GIS.LINE_STRING | gis.line_string(lineString) | record |
GIS.MULTI_LINE_STRING | gis.multi_line_string(multiLineString) | record |
GIS.POLYGON | gis.polygon(polygon) | record |
GIS.MULTI_POLYGON | gis.multi_polygon(polygon) | record |
GIS.AREA | gis.area(geometry) | double |
GIS.CENTROID | gis.centroid(geo) | record |
GIS.GEOMETRY_ERROR | gis.geometry_error(geo) | string |
GIS.PERIMETER | gis.perimeter(geo) | double |
GIS.RELOCATE_TO_ORIGIN | gis.relocate_to_origin(geo) | record |
Aggregate Functions
Functions that only work with WINDOW
Functions that are only supported for
Window
statements must be used in theselect
list. When used, theWindow
statement must include anOrder By
clause.
For more information on using aggregate functions, see aggregate expressions.
Name | Output | Kind |
---|---|---|
DENSE_RANK | int | Window |
LAG | any | Window |
LEAD | any | Window |
RANK | int | Window |
ROW_NUMBER | int | Window |
MIN | any | Window, Group By |
MIN_SIZE | any | Window, Group By |
MAX | any | Window, Group By |
MAX_SIZE | any | Window, Group By |
COUNT | long | Window, Group By |
COUNT_DISTINCT | long | Window, Group By |
SUM | number | Window, Group By |
AVG | double | Window, Group By |
MEAN | double | Window, Group By |
MODE | any | Window, Group By |
FIRST | any | Window, Group By |
LAST | any | Window, Group By |
TOP | array | Window, Group By |
HISTOGRAM | array | Window, Group By |
COLLECT_LIST | array | Window, Group By |
COLLECT_SET | array | Window, Group By |
COLLECT_SUBSET | array | Window, Group By |
STDDEV_SAMP | double | Window, Group By |
STDDEV_POP | double | Window, Group By |
VAR_SAMP | double | Window, Group By |
VAR_POP | double | Window, Group By |
Map, Filter, and Reduce
Transformations support the higher order functions map
, filter
, and reduce
. They allow you to apply a Tamr Core-defined function across multi-value fields. Map
applies the function to each element of an array field, filter
filters elements of array fields, and reduce
accumulates values. See here for more details.
Functions supported by Map, Filter, and Reduce
Currently, the only functions that can be used with map and filter are ones that take a single argument (for example, upper(String)).
Updated 10 months ago