Overview

A pivot structure is as follows:
PIVOT <pivots> WHEN <pivot-values> SELECT <generated-columns> ... <merge>,
where:

<pivots> is the list of columns to pivot on.
<pivot-values> are values of the <pivots> columns.
<generated-columns> are those columns that will be generated if <pivots> = <pivot-values>.
<merge> is a MERGE operation, since pivot must happen in the context of an aggregating operation. If you leave out the MERGE operation, you will get an error message: Error loading data.

PIVOT does not carry over the <pivots> or any columns that are mentioned in an expression or used as the output of an expression.

The merge substatement may access:
- Any output column of an expression and
- Any input column that is not aliased by an output column.
The types of the columns generated by a pivot are identical to the types generated by its merge operation. All columns aggregated by the merge substatement are multivalue columns. Thus, a pivot in Tamr will generate all multivalue columns.

You can make a number of changes to a PIVOT to make it more complex. For example, you can run PIVOT on more than one column, specify multiple feature-value pairs, or gather unexpected data into its own column with an else statement.

Example

In this example, the generic columns "Feature" and "Value" will be pivoted into individual columns for height and age, and records will be collapsed. Before the pivot, the dataset looked like this, with empty attributes "Height" and "Age" waiting to be filled:

PIVOT is used to match certain object values from the "Feature" column. These strings are being cast as arrays, since they are stored in the "Feature" column as string multivalues.

Pivots must end with a merge operation, since they must happen in the context of an aggregating operation.

This transformation produces the following result:

Alice, Bob, and Charlie each now only have one record, Feature and Value are empty, and Height and Age are filled. The id column, once unique, is now multi-valued, as the MERGE BY grouped them. This allows data to be traced back to its old record ID, if desired.

Like with unpivots, you can regenerate unique IDs using a hash function. Here, the multi-values are concatenated and then hashed.

This produces the following final result with one record per person and unique IDs identifying each record: