Tamr Documentation

Overview

Transformations are tools that affect the final output of a unified dataset, so that you do not have to transform each input dataset individually before loading it into Tamr.

You can apply transformations to all records in the unified dataset, or to records from specific input datasets. As a best practice, Tamr recommends that you apply transformations to the unified dataset whenever possible.

  • Applying transformations to the unified dataset is, in most cases, more efficient than applying them to the input datasets, so processing completes faster.
  • You can use the Tamr-created column origin_source_name to apply transformations to specific input datasets, even when they are run on the unified dataset. For example, you can use a case expression that checks the value of origin_source_name before applying a data cleaning transformation. Alternatively, you can use origin_source_name as part of the condition of a JOIN statement to apply the JOIN to a subset of input datasets. See Managing Primary Keys and Join.
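The following Python sketch illustrates the logic of such a case expression outside Tamr; it is not Tamr transformation syntax. The dataset names crm_contacts and erp_vendors, the phone attribute, and the clean_phone and clean_for_source functions are all hypothetical. Note that the records are copied rather than modified, mirroring the fact that transformations never change input datasets.

```python
from typing import Dict, List

Record = Dict[str, str]


def clean_phone(value: str) -> str:
    """Hypothetical cleaning step: keep only the digits of a phone number."""
    return "".join(ch for ch in value if ch.isdigit())


def clean_for_source(records: List[Record], source: str) -> List[Record]:
    """Mimic a case expression on origin_source_name: clean the 'phone'
    attribute only for records that came from `source`."""
    result = []
    for record in records:
        updated = dict(record)  # copy, so the input records are never changed
        if record.get("origin_source_name") == source:
            updated["phone"] = clean_phone(record["phone"])
        result.append(updated)
    return result


unified = [
    {"origin_source_name": "crm_contacts", "phone": "(555) 010-2000"},
    {"origin_source_name": "erp_vendors", "phone": "(555) 010-3000"},
]
# Only the crm_contacts record has its phone number cleaned.
print(clean_for_source(unified, "crm_contacts"))
```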

Transformations operate at a project level and produce a single output dataset based on one or more input datasets. Transformations never change input datasets.

Accessing Transformations

You can add transformations to schema mapping, categorization, and mastering projects. See Enabling Transformations in Categorization and Mastering Projects.

To access transformations:

  1. On the Unified Dataset page, select Show Transformations. The transformation editor appears with drop-down menus for adding transformations.
  2. Choose Add Transformation.
     If you add a transformation to the Input Datasets section, the note "For records from <name> datasets" appears in the bottom left corner of the transformation panel. Select this note to specify the datasets affected by this transformation. By default, the transformation is applied to all input datasets.
  3. To change the scope of a transformation from the input datasets to the unified dataset or vice versa, open both sections in the transformation editor and drag the transformation between the sections.

Displaying and Previewing Transformations

After you write a transformation, choose Preview all in the transformations panel to preview the results. If a transformation violates any rules, Tamr displays an error message explaining why it failed. Previewing lets you test transformations without changing the dataset, so you can iterate quickly and review the results before saving any changes.

Once you are satisfied with the transformation results, save the changes. The number on the Save button indicates how many changes you have made since you last saved. Unsaved transformations are lost if you navigate away from the page. The Save button is disabled if any transformation has errors or if no transformations have changed.

To discard all unsaved changes and revert transformations to their last saved state, choose Cancel Changes.

You can also preview a subset of transformations by choosing Preview on an individual transformation. The transformations after it are then grayed out to show that they are not included in the preview, but you can still edit and reorder them. Choosing Save changes saves all changes, not only those made to the previewed transformations.

To preview your data before any transformations are applied, choose Preview at the top of the Input Datasets section.
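As a mental model only (this is not how Tamr implements previews), the three preview options behave like applying a prefix of the transformation list to a copy of the data. In this Python sketch, the preview function and the trim_name and upper_name transformations are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]
Transformation = Callable[[Record], Record]


def trim_name(record: Record) -> Record:
    """Hypothetical transformation: strip surrounding whitespace from 'name'."""
    return {**record, "name": record["name"].strip()}


def upper_name(record: Record) -> Record:
    """Hypothetical transformation: upper-case 'name'."""
    return {**record, "name": record["name"].upper()}


def preview(records: List[Record], steps: List[Transformation],
            upto: Optional[int] = None) -> List[Record]:
    """Return the records after applying steps[:upto] in order.

    upto=None -> every transformation (like Preview all)
    upto=k    -> the first k transformations (like Preview on the k-th one;
                 later transformations are excluded, as if grayed out)
    upto=0    -> no transformations (like previewing the input data)
    """
    selected = steps if upto is None else steps[:upto]
    result = list(records)
    for step in selected:
        # Each step returns a new record, so the originals are left unchanged.
        result = [step(r) for r in result]
    return result


data = [{"name": "  acme corp  "}]
steps = [trim_name, upper_name]
print(preview(data, steps, upto=0))  # input data, no transformations
print(preview(data, steps, upto=1))  # trimmed only
print(preview(data, steps))          # trimmed and upper-cased
```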

Reordering Transformations

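Transformations have only local effects, and you can reorder them at any time. Because transformations run in sequence, each one operating on the output of the previous one, reordering them may change the output.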
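A small Python sketch shows why order matters when two steps do not commute. The trim and fill_empty steps are hypothetical stand-ins for transformations, not Tamr functions: trimming first turns a whitespace-only value into an empty string that is then filled, while filling first sees a non-empty value and leaves it alone.

```python
from functools import reduce
from typing import Callable, List

Transformation = Callable[[str], str]


def trim(value: str) -> str:
    """Hypothetical step: remove surrounding whitespace."""
    return value.strip()


def fill_empty(value: str) -> str:
    """Hypothetical step: replace an empty string with a placeholder."""
    return "N/A" if value == "" else value


def run(value: str, steps: List[Transformation]) -> str:
    # Each step receives the output of the previous one.
    return reduce(lambda acc, step: step(acc), steps, value)


print(repr(run("   ", [trim, fill_empty])))  # 'N/A'
print(repr(run("   ", [fill_empty, trim])))  # ''
```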

To reorder transformations, select and hold the icon with two horizontal lines at the top left of the transformations panel and then drag it to the desired location within the transformations script.

Saving and Applying Transformations

Applying Transformations

To apply transformations so that they become part of the data pipeline for your unified dataset, choose Update Unified Dataset on the Unified Dataset or Schema Mapping page. This applies transformations to the unified dataset.

Choosing Save changes in the transformation panel preserves your work so that you can navigate away and return to it later. To apply this saved work to all records in the unified dataset, export the transformed unified dataset. To include the transformed unified dataset in your data pipeline, choose Update Unified Dataset.

Data Types

Tamr transformations support multiple data types. See Data Types and Transformations.

Enabling Transformations in Categorization and Mastering Projects

You can enable transformations in Categorization and Mastering projects during project creation or after a project is already created. See Transformations. Once enabled, transformations cannot be disabled.

If you are writing transformations in a Categorization or Mastering project, or plan to use a unified dataset containing transformations in another project, the Tamr-generated columns origin_source_name and origin_entity_id must meet the following conditions. You can use transformations to maintain them:

  • origin_source_name must be a string type. Each value should be the name of one of the input datasets.
  • origin_entity_id must be a string type.

Additionally, the Tamr-generated column tamr_id must be a string type with unique values, because it is a primary key that Tamr manages for you.
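To make these conditions concrete, here is a Python sketch of a check you might run on records from a unified dataset, for example after exporting it. The check_tamr_columns function, the sample records, and the dataset names are hypothetical; this is not a Tamr API.

```python
from typing import Any, Dict, List, Set


def check_tamr_columns(records: List[Dict[str, Any]],
                       input_datasets: Set[str]) -> List[str]:
    """Report records that break the conditions on Tamr-generated columns."""
    problems: List[str] = []
    seen_ids: Set[str] = set()
    for i, record in enumerate(records):
        # origin_source_name: a string naming one of the input datasets.
        source = record.get("origin_source_name")
        if not isinstance(source, str):
            problems.append(f"record {i}: origin_source_name is not a string")
        elif source not in input_datasets:
            problems.append(f"record {i}: unknown input dataset {source!r}")

        # origin_entity_id: a string.
        if not isinstance(record.get("origin_entity_id"), str):
            problems.append(f"record {i}: origin_entity_id is not a string")

        # tamr_id: a string that is unique across all records.
        tamr_id = record.get("tamr_id")
        if not isinstance(tamr_id, str):
            problems.append(f"record {i}: tamr_id is not a string")
        elif tamr_id in seen_ids:
            problems.append(f"record {i}: duplicate tamr_id {tamr_id!r}")
        else:
            seen_ids.add(tamr_id)
    return problems


records = [
    {"origin_source_name": "crm_contacts", "origin_entity_id": "17", "tamr_id": "a1"},
    {"origin_source_name": "erp_vendors", "origin_entity_id": 42, "tamr_id": "a2"},
    {"origin_source_name": "old_system", "origin_entity_id": "9", "tamr_id": "a1"},
]
for problem in check_tamr_columns(records, {"crm_contacts", "erp_vendors"}):
    print(problem)
```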

Additional Information

Functions List

For a full list of all supported functions, see column-producing functions. You can also get transformation help in-product.

Referencing Attributes

To reference an attribute in a transformation script, you can wrap its name in double quotes, but this is optional: attribute and "attribute" both work. However, an attribute name that contains spaces or escaped characters must be wrapped in double quotes. To reference an attribute name that itself contains double quotes, escape each double quote by doubling it. For example, the attribute this is an "attribute name" is referenced as "this is an ""attribute name""".
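If you generate transformation scripts programmatically, the escaping rule can be captured in a small helper. This Python sketch is hypothetical (quote_identifier is not part of Tamr). It always adds quotes, which is safe because quoted and unquoted references work the same way. Dataset names follow the same quoting rules as attributes, so the same helper can build a USE statement.

```python
def quote_identifier(name: str) -> str:
    """Wrap an attribute or dataset name in double quotes, escaping each
    embedded double quote by doubling it."""
    return '"' + name.replace('"', '""') + '"'


print(quote_identifier('this is an "attribute name"'))
# "this is an ""attribute name"""
print(f"USE {quote_identifier('myData.csv')};")
# USE "myData.csv";
```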

Attribute names in transformations are case-sensitive.

Referencing Datasets

Dataset names follow the same quoting rules as attribute names. Wrap a dataset name in double quotes if it includes spaces or escaped characters, as in USE "myData.csv";. Other names can be left unquoted, as in USE my_data;. See join for an example that references an input dataset.

Using Single Quotes

Text in single quotes is interpreted as a string literal, for example 'string'.

Tab Autocomplete

For transformations such as Script and Formula, pressing the tab key provides a list of suggested inputs, including functions and attributes.
