Tamr Documentation

Formulas can operate on values within one column and access values in other columns within the row. Common use cases are concatenation, conditional logic, arithmetic operations, and replication of part or all of another column. Many scripts that begin with SELECT * can be expressed as a formula, because formulas automatically plug a single expression into a well-formed SELECT.

Currently, the default for all columns is to be arrays of strings, even if they contain only one value. As such, certain additional array functions will be needed to handle transformations correctly. Some useful ones are:

  • array makes a string into an array again, in order to keep columns consistent.
  • len is useful when writing CASE statements if the user is handling different array lengths differently. This is a different function than length, which returns the length of a string.
  • get is used to access a particular index of a multi-value field. The format is get(<attributeName>, <index>). get(<attributeName>, 0) is the preferred way to access strings from a single-value field.
  • array_concat lets the user concatenate arrays. Note that array concatenation adds values to an array in such a way that array_concat(["A", "B"]) and array_concat(["AB"]) both end up as "AB".

The output attribute is which column to write output to. Note that this will overwrite any values already in that column. Attribute names are case sensitive, but function names and keywords like case, when, and empty are not.


What's Next

Unpivot