Pivot
PIVOT
reorganizes the structure of a table by moving rows of data into different columns. For example, with PIVOT
, you can change the shape of a table from skinny to wide.
The syntax of PIVOT
is as follows:
PIVOT <pivots> WHEN <pivot-values> SELECT <generated-columns> ... <merge>
,
where:
<pivots>
is the list of columns to pivot on.<pivot-values>
are values of the<pivots>
columns.<generated-columns>
are those columns that will be generated if<pivots> = <pivot-values>
.<merge>
is aMERGE
statement, sincePIVOT
must happen in the context of an aggregating operation. If you leave out theMERGE
statement, the error message "Error loading data" appears.
PIVOT
does not carry over the <pivots>
or any columns that are mentioned in an expression or used as the output of an expression.
- The
MERGE
substatement may access:- Any output column of an expression and
- Any input column that is not aliased by an output column.
The types of the columns generated by a PIVOT
are identical to the types generated by its MERGE
statement. All columns aggregated by the merge
substatement are multi-value columns (arrays). As a result, in Tamr Core PIVOT
generates all multi-value columns.
You can make a number of changes to a PIVOT
to make it more complex. For example, you can run PIVOT
on more than one column, specify multiple feature-value pairs, or gather unexpected data into its own column with an else
statement.
Important: Changing a PIVOT
statement can result in an unintentional change to tamr_id
values in the unified dataset. To avoid data loss and maintain the stability of tamr_id
in mastering and categorization projects, as a best practice Tamr recommends that you explicitly set the tamr_id
to a value you're sure is stable in the last transformation. See Managing Primary Keys.
Example
In this example, you want to pivot the generic columns Feature and Value into individual columns for height and age, and collapse the records. Before the pivot, the dataset looks like this, with empty attributes Height and Age waiting to be filled:
data:image/s3,"s3://crabby-images/587a1/587a12c6e1f75e1a397402404e1262b76e3fea7c" alt="data_pre_pivot.png 1486"
A dataset with separate records for the same Name value to report each subject's Height and Age. Empty attributes are prepared for Height and Age.
PIVOT
matches certain object values from the Feature column. These strings are being cast as arrays, since they are stored in the Feature column as string arrays.
Pivots must end with a MERGE
statement, since they must happen in the context of an aggregating operation.
data:image/s3,"s3://crabby-images/463e6/463e626c3519bac36fc84cc0dfde8b017dc70aea" alt="pivot_script.png 792"
The complete script for pivoting the Feature values into separate rows for Height and Age, based on identical Name values.
This transformation produces the following result:
data:image/s3,"s3://crabby-images/dc699/dc699925c92353c82f5e305d7a565adeaa3666ac" alt="data_post_pivot.png 1486"
There is now one record for each Name, the Feature and Value attributes are now empty, and the Height and Age attributes are populated.
The id column, previously unique, is now multi-valued, as the MERGE BY
grouped them. This allows data to be traced back to its old record ID, if desired.
Like with unpivots, you can regenerate unique IDs using a hash function. Here, the arrays are concatenated and then hashed.
data:image/s3,"s3://crabby-images/de339/de339da8b557680ae467e133f485754c7e540b59" alt="generate_unique_id.png 794"
A formula, md5(array_concat(id)), that concatenates the values in the id field and then outputs the result back to the id attribute.
This produces the following final result with one record per person and unique IDs identifying each record:
data:image/s3,"s3://crabby-images/bb136/bb1368ac47a660441f86e6290370c6c2cccb7389" alt="data_with_unique_ids.png 1480"
The id column, previously multi-valued, is now a single hash value.
Updated about 3 years ago