User GuidesAPI ReferenceRelease Notes
Doc HomeHelp CenterLog In

Using the Formula Option

Use the Formula option to add a single expression in a well-formed SELECT statement.

Formulas are a tool for quickly writing scripts that operate on values within one column and access values in other columns. A formula offers a structured format for adding simple transformations, Formulas and scripts use the same building blocks, expressions(which may include functions), so you can use formulas to express many transformation scripts that begin with SELECT *.

Formula or Script?

The following image illustrates how you can use either a formula or a script to complete the same transformation: converting the values in the "name" column to uppercase and populating the "name_cleaned" column with the resulting values.

798

The options for a formula in the transformation editor.

When you use the Formula option to make this transformation:

  • The transformation editor supplies the SELECT *, statement
  • You supply the expression, which consists of a function upper() and the attribute you want the function to act on, name
  • You select the output attribute, name_cleaned, from a dropdown list of the dataset's attributes

Alternatively, when you use the Script option you apply the required syntax as you type in each of these components. The result of using this formula or this script is identical.

874

Using a formula streamlines the composition of a simple transformation.

Tips for Formulas

Common use cases for Formula are concatenation, conditional logic, arithmetic operations, and replication of part of another column or an entire column.

The following tips can be helpful when creating formulas.

The default data type for all columns in the unified dataset is arrays of strings, even if they contain only one value. Therefore, the available array functions can help you handle transformations correctly. The following array functions are frequently useful:

  • array() converts a string into an array. This allows you to keep columns consistently typed. See array().
  • len() is useful when writing case expressions and allows you to handle different array lengths. See len().
    Note: This function differs from the length() function, which returns the length of a string.
  • get() allows you to access a particular index of an array. For example, get(<attributeName>, 1) returns the second value from an array. See get().
  • to_string() converts the input value to a string, including converting an array containing only one value to a string. For example, to_string(<attributeName>). This is the preferred way to access strings from a single-value field. See to_string().
  • array_concat() concatenates array values into a string. Array concatenation adds values to an array in such a way that array_concat(["A", "B"]) and array_concat(["AB"]) both end up as "AB". See array_concat().
  • array.concat() concatenates array values into a combined array. Array concatenation adds values to an array in such a way that array.concat(['A', 'B'], ['CD']) becomes ['A', 'B', 'CD']. See array.concat().

The output attribute is the one to which output is written.

  • The transformation overwrites any values already in that attribute.
  • Attribute names are case sensitive in formulas and scripts. However, statements, function names and keywords, such as case, when, and empty, are not.

If you need to apply the same formula to more than one of the columns in a dataset, the multiformula option can be more efficient.