Pivot
PIVOT
reorganizes the structure of a table by moving rows of data into different columns. For example, with PIVOT
, you can change the shape of a table from skinny to wide.
The syntax of PIVOT
is as follows:
PIVOT <pivots> WHEN <pivot-values> SELECT <generated-columns> ... <merge>
,
where:
<pivots>
is the list of columns to pivot on.<pivot-values>
are values of the<pivots>
columns.<generated-columns>
are those columns that will be generated if<pivots> = <pivot-values>
.<merge>
is aMERGE
statement, sincePIVOT
must happen in the context of an aggregating operation. If you leave out theMERGE
statement, the error message "Error loading data" appears.
PIVOT
does not carry over the <pivots>
or any columns that are mentioned in an expression or used as the output of an expression.
- The
MERGE
substatement may access:- Any output column of an expression and
- Any input column that is not aliased by an output column.
The types of the columns generated by a PIVOT
are identical to the types generated by its MERGE
statement. All columns aggregated by the merge
substatement are multi-value columns (arrays). As a result, in Tamr Core PIVOT
generates all multi-value columns.
You can make a number of changes to a PIVOT
to make it more complex. For example, you can run PIVOT
on more than one column, specify multiple feature-value pairs, or gather unexpected data into its own column with an else
statement.
Important: Changing a PIVOT
statement can result in an unintentional change to tamr_id
values in the unified dataset. To avoid data loss and maintain the stability of tamr_id
in mastering and categorization projects, as a best practice Tamr recommends that you explicitly set the tamr_id
to a value you're sure is stable in the last transformation. See Managing Primary Keys.
Example
In this example, you want to pivot the generic columns Feature and Value into individual columns for height and age, and collapse the records. Before the pivot, the dataset looks like this, with empty attributes Height and Age waiting to be filled:
PIVOT
matches certain object values from the Feature column. These strings are being cast as arrays, since they are stored in the Feature column as string arrays.
Pivots must end with a MERGE
statement, since they must happen in the context of an aggregating operation.
This transformation produces the following result:
The id column, previously unique, is now multi-valued, as the MERGE BY
grouped them. This allows data to be traced back to its old record ID, if desired.
Like with unpivots, you can regenerate unique IDs using a hash function. Here, the arrays are concatenated and then hashed.
This produces the following final result with one record per person and unique IDs identifying each record:
Updated almost 3 years ago