The syntax of
PIVOT is as follows:
PIVOT <pivots> WHEN <pivot-values> SELECT <generated-columns> ... <merge>,
<pivots>is the list of columns to pivot on.
<pivot-values>are values of the
<generated-columns>are those columns that will be generated if
<pivots> = <pivot-values>.
PIVOTmust happen in the context of an aggregating operation. If you leave out the
MERGEstatement, the error message "Error loading data" appears.
PIVOT does not carry over the
<pivots> or any columns that are mentioned in an expression or used as the output of an expression.
MERGEsubstatement may access:
- Any output column of an expression and
- Any input column that is not aliased by an output column.
The types of the columns generated by a
PIVOT are identical to the types generated by its
MERGE statement. All columns aggregated by the
merge substatement are multi-value columns (arrays). As a result, in Tamr Core
PIVOT generates all multi-value columns.
You can make a number of changes to a
PIVOT to make it more complex. For example, you can run
PIVOT on more than one column, specify multiple feature-value pairs, or gather unexpected data into its own column with an
Important: Changing a
PIVOT statement can result in an unintentional change to
tamr_id values in the unified dataset. To avoid data loss and maintain the stability of
tamr_id in mastering and categorization projects, as a best practice Tamr recommends that you explicitly set the
tamr_id to a value you're sure is stable in the last transformation. See Managing Primary Keys.
In this example, you want to pivot the generic columns Feature and Value into individual columns for height and age, and collapse the records. Before the pivot, the dataset looks like this, with empty attributes Height and Age waiting to be filled:
PIVOT matches certain object values from the Feature column. These strings are being cast as arrays, since they are stored in the Feature column as string arrays.
Pivots must end with a
MERGE statement, since they must happen in the context of an aggregating operation.
This transformation produces the following result:
The id column, previously unique, is now multi-valued, as the
MERGE BY grouped them. This allows data to be traced back to its old record ID, if desired.
Like with unpivots, you can regenerate unique IDs using a hash function. Here, the arrays are concatenated and then hashed.
This produces the following final result with one record per person and unique IDs identifying each record:
Updated about 1 year ago