Functions
Tamr Core provides a number of functions for you to use to manipulate data in a unified dataset or in one or more of the input datasets of a schema mapping, mastering, or categorization project.
Tamr Core provides a number of functions and operators to transform data.
General Functions
Name | Syntax | Output |
---|---|---|
ABS | abs(input) | number |
ADD | add(integer1, integer2) | int |
ARRAY | array(elements) | array |
ARRAY_CONCAT | array_concat(array) | string |
ATOF | atof(string_to_convert) | double |
ATOI | atoi(string_to_convert) | int |
CEIL | ceil(input) | long |
COALESCE | coalesce(a) | any |
CONCAT | concat(a) | string |
DATEDIFF | datediff(endDate, startDate) | long |
DATETIME_ERROR | datetime_error(input) | string |
DATETIME_TO_ISO | datetime_to_iso(datetime_string, patterns) | string |
DATE_AND_TIME_TO_ISO | date_and_time_to_iso(date_string, time_string, patterns) | string |
DAYOFMONTH | dayofmonth(datetime) | int |
DECIMAL | decimal(input) | double |
DISTINCT | distinct(array) | array |
DURATION_TO_ISO | duration_to_iso(duration, time_unit, minimal) | string |
EQUALS_IGNORE_CASE | equals_ignore_case(str1, str2) | bool |
EXTRACT | extract(input, regex, group_index) | string |
EXTRACT_ALL | extract_all(input, regex) | array |
FLOOR | floor(input) | long |
GET | get(data, index) | any |
GREATEST | greatest(a) | any |
HASH | hash(inputs) | long |
HEX | hex(input) | string |
ENCODE_BASE64 | encode_base64(input) | string |
INTEGER | integer(input) | int |
ISNULL | isnull(input) | bool |
ISO_DIFFERENCE_DAYS | iso_difference_days(date1, date2) | int |
ISO_DATETIME_ADD | iso_datetime_add(datetime, duration, time_unit) | string |
IS_NOT_NULL | is_not_null(input) | bool |
IS_NOT_EMPTY | is_not_empty(input) | bool |
JACCARD_INDEX | jaccard_index(a, b) | double |
LEAST | least(a) | any |
LEN | len(array) | int |
LENGTH | length(input) | int |
LOWER | lower(input) | string |
MATCHES | matches(regex, input) | bool |
MD5 | md5(input) | string |
MONTH | month(datetime) | int |
MONOTONICALLY_INCREASING_ID | monotonically_increasing_id() | long |
NUM | num(array) | double |
PARSE_JSON_ARRAY | parse_json_array(json_array) | array |
PMOD | pmod(dividend, divisor) | number |
POW | pow(base, exponent) | double |
SQRT | sqrt(input) | double |
RAND | rand() | double |
REPLACE | replace(string, target, replacement) | string |
REPLACE_ALL | replace_all(string, target_regex, replacement) | string |
RINT | rint(input) | double |
ROUND | round(input) | long |
RUNTIME_DATE | runtime_date() | string |
RUNTIME_DATETIME | runtime_datetime() | string |
SHA1 | sha1(input) | string |
SIZE | size(input) | int |
SORT_ARRAY | sort_array(input) | array |
SPLIT | split(string, regex) | array |
STRIP_ACCENTS | strip_accents(string) | string |
STR | str(some_object) | string |
STR_FORMAT | str_format(format_template, objects_to_insert) | string |
STR_JOIN | str_join(delimiter, string_array) | string |
SUBSTR | substr(string, start_index) | string |
SUBSTR2 | substr2(string, start_index, end_index) | string |
TAMR_ID | tamr_id(source_id, entity_id) | string |
TAMR_ID_128 | tamr_id_128(source_id, entity_id) | string |
TO_BOOLEAN | to_boolean(input) | bool |
TO_BOOLEAN_ERROR | to_boolean_error(input) | string |
TO_DOUBLE | to_double(input) | double |
TO_DOUBLE_ERROR | to_double_error(input) | string |
TO_INT | to_int(value) | int |
TO_INT_ERROR | to_int_error(input) | string |
TO_LONG | to_long(value) | long |
TO_LONG_ERROR | to_long_error(input) | string |
TO_STRING | to_string(value) | string |
TO_STRING_ERROR | to_string_error(input) | string |
TRIM | trim(input) | string |
TYPEOF | typeof(object) | string |
UPPER | upper(input) | string |
VALID_EMAIL | valid_email(this_might_be_an_email) | bool |
YEAR | year(datetime) | int |
INT.RANGE | int.range(start, stop) | array |
LEGACY.HASH | legacy.hash(inputs) | long |
Array Functions
Array functions operate exclusively on arrays, allowing easy creation of and interaction with arrays.
Name | Syntax | Output |
---|---|---|
ARRAY.APPEND | array.append(array, value) | array |
ARRAY.CONCAT | array.concat(arrays) | array |
ARRAY.DIFF | array.diff(array1, array2) | array |
ARRAY.CONTAINS | array.contains(array, item) | bool |
ARRAY.DISTINCT | array.distinct(array) | array |
ARRAY.MOST_FREQUENT | array.most_frequent(N, array) | array |
ARRAY.FIRST | array.first(array) | any |
ARRAY.LAST | array.last(array) | any |
ARRAY.REST | array.rest(array) | array |
ARRAY.REVERSE | array.reverse(array) | array |
ARRAY.SORT | array.sort(array) | array |
ARRAY.NULLS | array.nulls(array) | array |
ARRAY.NON_NULLS | array.non_nulls(array) | array |
ARRAY.NON_EMPTIES | array.non_empties(array) | array |
ARRAY.SLICE | array.slice(array, start_index) | array |
ARRAY.SLICE2 | array.slice2(array, start_index, end_index) | array |
ARRAY.SUM | array.sum(array) | number |
Mathematical Functions
Mathematical trigonometric functions to calculate the sine, cosine, and tangent of angles in radians.
Name | Syntax | Output |
---|---|---|
MATH.SIN | math.sin(input) | double |
MATH.COS | math.cos(input) | double |
MATH.TAN | math.tan(input) | double |
MATH.ASIN | math.asin(input) | double |
MATH.ACOS | math.acos(input) | double |
MATH.ATAN | math.atan(input) | double |
MATH.ATAN2 | math.atan2(y, x) | double |
MATH.DEGREES | math.degrees(input) | double |
MATH.RADIANS | math.radians(input) | double |
MATH.LN | math.ln(input) | double |
MATH.LOG10 | math.log10(input) | double |
MATH.LOG | math.log(input, base) | double |
GIS Functions
Geographic Information System (GIS) functions.
Name | Syntax | Output |
---|---|---|
GIS.POINT | gis.point(lon, lat) | record |
GIS.MULTI_POINT | gis.multi_point(multiPoint) | record |
GIS.LINE_STRING | gis.line_string(lineString) | record |
GIS.MULTI_LINE_STRING | gis.multi_line_string(multiLineString) | record |
GIS.POLYGON | gis.polygon(polygon) | record |
GIS.MULTI_POLYGON | gis.multi_polygon(polygon) | record |
GIS.AREA | gis.area(geometry) | double |
GIS.CENTROID | gis.centroid(geo) | record |
GIS.GEOMETRY_ERROR | gis.geometry_error(geo) | string |
GIS.PERIMETER | gis.perimeter(geo) | double |
GIS.RELOCATE_TO_ORIGIN | gis.relocate_to_origin(geo) | record |
Aggregate Functions
Note: Functions that are only supported for Window
statements must be used in the select
list. When used, the Window
statement must include an Order By
clause.
For more information on using aggregate functions, see Aggregate Expressions.
Name | Output | Kind |
---|---|---|
DENSE_RANK | int | Window |
LAG | any | Window |
LEAD | any | Window |
RANK | int | Window |
ROW_NUMBER | int | Window |
LEGACY.TOP | array | Window, Group By |
MIN | any | Window, Group By |
MIN_SIZE | any | Window, Group By |
MAX | any | Window, Group By |
MAX_SIZE | any | Window, Group By |
COUNT | long | Window, Group By |
COUNT_DISTINCT | long | Window, Group By |
SUM | number | Window, Group By |
AVG | double | Window, Group By |
MEAN | double | Window, Group By |
MODE | any | Window, Group By |
FIRST | any | Window, Group By |
LAST | any | Window, Group By |
TOP | array | Window, Group By |
HISTOGRAM | array | Window, Group By |
COLLECT_LIST | array | Window, Group By |
COLLECT_SET | array | Window, Group By |
COLLECT_SUBSET | array | Window, Group By |
STDDEV_SAMP | double | Window, Group By |
STDDEV_POP | double | Window, Group By |
VAR_SAMP | double | Window, Group By |
VAR_POP | double | Window, Group By |
Map, Filter, and Reduce
Transformations support the higher order functions map
, filter
, and reduce
. They allow you to apply a Tamr Core-defined function across multi-value fields. Map
applies the function to each element of an array field, filter
filters elements of array fields, and reduce
accumulates values. See Map, Filter, and Reduce.
Note: Currently, the only functions that can be used with map
and filter
are ones that take a single argument (for example, upper(String)
).
Updated over 2 years ago