HomeTamr Core GuidesTamr Core API Reference
Tamr Core GuidesTamr Core API ReferenceTamr Core TutorialsEnrichment API ReferenceSupport Help CenterLog In

Pivot

PIVOT reorganizes the structure of a table by moving rows of data into different columns. For example, with PIVOT, you can change the shape of a table from skinny to wide.

The syntax of PIVOT is as follows:
PIVOT <pivots> WHEN <pivot-values> SELECT <generated-columns> ... <merge>,
where:

  • <pivots> is the list of columns to pivot on.
  • <pivot-values> are values of the <pivots> columns.
  • <generated-columns> are those columns that will be generated if <pivots> = <pivot-values>.
  • <merge> is a MERGE statement, since PIVOT must happen in the context of an aggregating operation. If you leave out the MERGE statement, the error message "Error loading data" appears.

PIVOT does not carry over the <pivots> or any columns that are mentioned in an expression or used as the output of an expression.

  • The MERGE substatement may access:
    • Any output column of an expression and
    • Any input column that is not aliased by an output column.

The types of the columns generated by a PIVOT are identical to the types generated by its MERGE statement. All columns aggregated by the merge substatement are multi-value columns (arrays). As a result, in Tamr Core PIVOT generates all multi-value columns.

You can make a number of changes to a PIVOT to make it more complex. For example, you can run PIVOT on more than one column, specify multiple feature-value pairs, or gather unexpected data into its own column with an else statement.

Important: Changing a PIVOT statement can result in an unintentional change to tamr_id values in the unified dataset. To avoid data loss and maintain the stability of tamr_id in mastering and categorization projects, as a best practice Tamr recommends that you explicitly set the tamr_id to a value you're sure is stable in the last transformation. See Managing Primary Keys.

Example

In this example, you want to pivot the generic columns Feature and Value into individual columns for height and age, and collapse the records. Before the pivot, the dataset looks like this, with empty attributes Height and Age waiting to be filled:

A dataset with separate records for the same Name value to report each subject's Height and Age. Empty attributes are prepared for Height and Age.A dataset with separate records for the same Name value to report each subject's Height and Age. Empty attributes are prepared for Height and Age.

A dataset with separate records for the same Name value to report each subject's Height and Age. Empty attributes are prepared for Height and Age.

PIVOT matches certain object values from the Feature column. These strings are being cast as arrays, since they are stored in the Feature column as string arrays.

Pivots must end with a MERGE statement, since they must happen in the context of an aggregating operation.

The complete script for pivoting the Feature values into separate rows for Height and Age, based on identical Name values.The complete script for pivoting the Feature values into separate rows for Height and Age, based on identical Name values.

The complete script for pivoting the Feature values into separate rows for Height and Age, based on identical Name values.

This transformation produces the following result:

There is now one record for each Name, the  Feature and Value attributes are now empty, and the Height and Age attributes are populated.There is now one record for each Name, the  Feature and Value attributes are now empty, and the Height and Age attributes are populated.

There is now one record for each Name, the Feature and Value attributes are now empty, and the Height and Age attributes are populated.

The id column, previously unique, is now multi-valued, as the MERGE BY grouped them. This allows data to be traced back to its old record ID, if desired.

Like with unpivots, you can regenerate unique IDs using a hash function. Here, the arrays are concatenated and then hashed.

A formula, md5(array_concat(id)), that concatenates the values in the id field and then outputs the result back to the id attribute.A formula, md5(array_concat(id)), that concatenates the values in the id field and then outputs the result back to the id attribute.

A formula, md5(array_concat(id)), that concatenates the values in the id field and then outputs the result back to the id attribute.

This produces the following final result with one record per person and unique IDs identifying each record:

The id column, previously multi-valued, is now a single hash value.The id column, previously multi-valued, is now a single hash value.

The id column, previously multi-valued, is now a single hash value.


Did this page help you?