Explode
EXPLODE
creates one row for every element in an array.
You can EXPLODE
an attribute if you want a separate row for each value in an array.
The template for EXPLODE
is
EXPLODE attribute;
or EXPLODE expression AS attribute;
You can only EXPLODE
attributes or expressions which are arrays. If an attribute or expression contains nested arrays, you can EXPLODE
the attribute multiple times. Each statement, however, can only include one EXPLODE
at a time.
Important: Changing an
EXPLODE
statement can result in an unintentional change totamr_id
values in the unified dataset. To avoid data loss and maintain the stability oftamr_id
in mastering and categorization projects, as a best practice Tamr recommends that you explicitly set thetamr_id
to a value you're sure is stable in the last transformation. See Managing Primary Keys.
Basic Example
Your data looks like this, with an array of strings in the Name attribute.
State | Name |
---|---|
MA | John, Smith |
NY | Carol, Baldwin |
When you write the statement EXPLODE Name
, each element of the Name array is separated into its own row:
State | Name |
---|---|
MA | John |
MA | Smith |
NY | Carol |
NY | Baldwin |
INNER and OUTER
There are two forms of EXPLODE
:
EXPLODE INNER
drops the corresponding row when the attribute or expression is an empty array or null.EXPLODE OUTER
preserves empty arrays and nulls, generating a null in the exploded row instead.
If you do not specify INNER
or OUTER
, EXPLODE
is INNER
by default.
Examples of these two forms follow, using the same initial data containing different types of empty and null values. A description is provided for each value so you can see how they are treated differently.
State | Name | Description |
---|---|---|
MA | John, Smith | No nulls or empty values |
NY | Carol | A single value in an array |
CA | Alex, null | The second value in the array is null |
MI | , | Two empty strings in an array |
CO | null | No array, the value in this cell is null |
TX | null , null | An array of nulls |
KY | An empty array |
If you EXPLODE INNER Name
, the result is:
State | Name | Description |
---|---|---|
MA | John | No nulls or empty values |
MA | Smith | No nulls or empty values |
NY | Carol | A single value in an array |
CA | Alex | The second value in the array is null |
CA | null | The second value in the array is null |
MI | Two empty strings in an array | |
MI | Two empty strings in an array | |
TX | null | An array of nulls |
TX | null | An array of nulls |
Notice how the rows for CO and KY disappeared: INNER
drops a row if it is an empty array or null. Any nulls or empty values in an array, however, receive their own row. You can see this behavior for CA, MI, and TX: these values were each given their own row.
Alternatively, if you write EXPLODE OUTER Name
, the result is:
State | Name | Description |
---|---|---|
MA | John | No nulls or empty values |
MA | Smith | No nulls or empty values |
NY | Carol | A single value in an array |
CA | Alex | The second value in the array is null |
CA | null | The second value in the array is null |
MI | Two empty strings in an array | |
MI | Two empty strings in an array | |
CO | null | No array, the value in this cell is null |
TX | null | An array of nulls |
TX | null | An array of nulls |
KY | null | An empty array |
Here you can see the null value for CO is generated by a new null, and a new null is also generated for the empty array in KY.
Updated about 3 years ago