Explode
EXPLODE
creates one row for every element in an array.
You may choose to EXPLODE
an attribute if you want a separate row for each value in an array.
The template for EXPLODE
is
EXPLODE attribute;
or EXPLODE expression AS attribute;
You can only EXPLODE
attributes or expressions which are arrays. If an attribute or expression contains nested arrays, you can EXPLODE
the attribute multiple times. Each statement, however, can only include one EXPLODE
at a time.
Basic Example
Let's say our data looks like this, with an array of strings in the Name attribute.
State | Name |
---|---|
MA | John, Smith |
NY | Carol, Baldwin |
You could write the statement EXPLODE Name
and each element of the Name array will be separated into its own row:
State | Name |
---|---|
MA | John |
MA | Smith |
NY | Carol |
NY | Baldwin |
INNER and OUTER
There are two forms of EXPLODE
:
-
EXPLODE INNER
drops the corresponding row when the attribute or expression is an empty array or null -
EXPLODE OUTER
preserves empty arrays and nulls, generating a null in the exploded row instead.
If you do not specify INNER
or OUTER
, EXPLODE
is INNER
by default.
Let's explore these two forms. We'll start with initial data containing different types of empty and nulls, which we describe in a third column so you can see how they are treated differently.
Our initial data looks like this:
State | Name | Description |
---|---|---|
MA | John, Smith | No nulls or empty values |
NY | Carol | A single value in an array |
CA | Alex, null | The second value in the array is null |
MI | , | Two empty strings in an array |
CO | null | No array, the value in this cell is null |
TX | null , null | An array of nulls |
KY | An empty array |
If we EXPLODE INNER Name
, the result will be:
State | Name | Description |
---|---|---|
MA | John | No nulls or empty values |
MA | Smith | No nulls or empty values |
NY | Carol | A single value in an array |
CA | Alex | The second value in the array is null |
CA | null | The second value in the array is null |
MI | Two empty strings in an array | |
MI | Two empty strings in an array | |
TX | null | An array of nulls |
TX | null | An array of nulls |
Notice how the rows for CO and KY disappeared since INNER
drops a row if it is an empty array or null. Any nulls or empty values in an array, however, received their own row. You can see this behavior for CA, MI, and TX, these values were each given their own row.
Alternatively, if we had written EXPLODE OUTER Name
, the result will be:
State | Name | Description |
---|---|---|
MA | John | No nulls or empty values |
MA | Smith | No nulls or empty values |
NY | Carol | A single value in an array |
CA | Alex | The second value in the array is null |
CA | null | The second value in the array is null |
MI | Two empty strings in an array | |
MI | Two empty strings in an array | |
CO | null | No array, the value in this cell is null |
TX | null | An array of nulls |
TX | null | An array of nulls |
KY | null | An empty array |
Here you can see the null value for CO is generated by a new null, and a new null is also generated for the empty array in KY.
Updated over 4 years ago