HomeGuidesAPI ReferenceChangelog
HomeGuidesTamr API ReferenceTutorialsEnrichment API ReferenceSupport Help CenterLog In

Statement Modifiers

You can apply labels and hints to modify the statements in your transformation scripts. Use scope syntax { } to apply labels and hints to multiple statements at a time.

Label

To provide a name and save the state of a dataset output of any transformation statement, you can apply a label. You can later reference this label anywhere a dataset could be referenced, such as in JOIN, UNION, or USE statements.

Label examples

my_label: SELECT *;
A_different_label: FILTER company_name IS NOT NULL;

Here is how you can reference a labeled dataset:
USE my_label;

See USE for an additional example.

Hint

HINT is an advanced feature that allows you to modify how Tamr runs a transformation. Add the HINT statement directly before the transformation that you would like to modify.

You supply a predefined value for each HINT statement to modify transformation execution for a specific purpose. A description of each HINT value follows.

checkpoint.reliable

This value for the HINT statement defines a reliable store for Spark to complete a CHECKPOINT.
See Checkpoint.

checkpoint.local

This value for the HINT statement specifies a local (unreliable) store for Spark to complete a CHECKPOINT. See Checkpoint.

Note: Checkpointing to a local store can result in better performance, depending on the setup of the underlying Spark cluster. Use this HINT value with caution and consult with Tamr Support.

join.broadcast

Note: Use this HINT value with caution and consult with Tamr Support.

This value for the HINT statement changes the way Spark completes a join operation. A broadcast join is more efficient than the default join when the joined dataset is much smaller than the unified dataset.

HINT(join.broadcast) JOIN WITH "my_small_data.csv" AS sm on id == sm.id;

pkmanagement.manual

This value for the HINT statement disables the automatic management (assignment) of the primary key (tamr_id) for records in unified datasets. If you use this HINT value, there is no guarantee that tamr_id is a unique string for every record of the unified dataset. You will need to ensure the uniqueness of your primary key manually. See Primary Key Management.

HINT(pkmanagement.manual) INNER JOIN WITH "example_dataset.csv" AS ex ON my_id == ex.id;

pkmanagement.auto

This value for the HINT statement enables the automatic management of the primary key of unified datasets (tamr_id). See Primary Key Management.

HINT(pkmanagement.auto) MERGE BY city;

pkmanagement.default

This value for the HINT statement sets the automatic management of the primary key to the default value. Since the current default is having primary key management enabled, this is the same as HINT(pkmanagement.auto).

Since this is the default behavior for managing primary keys, this HINT statement can be omitted and primary key management will have the same default behavior.

HINT(pkmanagement.default) MERGE BY city;

Scope

Scoping allows you to apply a HINT or a label to a group of transformations rather than to a single statement. A scope begins with a { and ends with a }.

For example:

// Use scope to label the output of three statements:

my_label: {
  SELECT *, str_join(', ', array(address_line1, city, state, zip)) AS full_address;
  FILTER full_address IS NOT EMPTY;
  MERGE BY full_address;
};

// Use scope to apply a `HINT` to two joins:

HINT(join.broadcast) {
  JOIN WITH "state_lkp.csv" AS st ON state == st.name;
  JOIN WITH "price_lkp.csv" AS pr ON part_name == pr.product;
};

Did this page help you?