Checkpoint
CHECKPOINT is a transformation that does not change the content of your data, but rather changes how your scripts interact with Spark.
If you are experiencing slow performance or problems when migrating datasets between environments that use different versions of Spark, using CHECKPOINT statements can enhance system performance.
Important: This is an advanced feature. Using CHECKPOINT incorrectly can decrease system performance instead of improving it. If you are not experiencing performance delays, there is no need to add CHECKPOINT transformations.
CHECKPOINT breaks a series of transformations into more manageable chunks. Instead of asking the Tamr Core transformation service to remember all of the transformations you are trying to complete at once, CHECKPOINT tells it to work on one set of transformations and cache the results before moving on to the next set.
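The following is a conceptual sketch only, written in plain Python rather than Spark or Tamr code. It models a chain of lazily evaluated transformations and shows how a checkpoint runs the pending steps, keeps the result, and clears the list of steps the service has to remember. The LazyPipeline class and its methods are hypothetical names invented for this illustration.

```python
from typing import Callable, List

Step = Callable[[List[int]], List[int]]


class LazyPipeline:
    """Toy model of a lazily evaluated transformation chain (not Spark or Tamr code)."""

    def __init__(self, rows: List[int]) -> None:
        self._rows = list(rows)        # most recently materialized data
        self._pending: List[Step] = []  # steps recorded but not yet run

    def transform(self, step: Step) -> "LazyPipeline":
        # Record the step; nothing is computed yet.
        self._pending.append(step)
        return self

    def pending_steps(self) -> int:
        return len(self._pending)

    def checkpoint(self) -> "LazyPipeline":
        # Run every pending step, cache the result, and forget the steps.
        for step in self._pending:
            self._rows = step(self._rows)
        self._pending = []
        return self

    def collect(self) -> List[int]:
        # Apply any remaining steps on top of the cached rows.
        rows = self._rows
        for step in self._pending:
            rows = step(rows)
        return rows


pipeline = LazyPipeline([1, 2, 3, 4])
pipeline.transform(lambda rows: [r * 10 for r in rows])
pipeline.transform(lambda rows: [r for r in rows if r > 10])
steps_before = pipeline.pending_steps()  # two steps are waiting to run

pipeline.checkpoint()
steps_after = pipeline.pending_steps()   # the plan is empty again

pipeline.transform(lambda rows: [r + 1 for r in rows])
result = pipeline.collect()              # only the last step runs here
```

In this toy model, the work done before the checkpoint is never repeated, which is the trade the real feature makes: extra cost to cache intermediate results in exchange for a shorter chain of transformations to track.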
You place CHECKPOINT statements between logical chunks of transformation work. Choosing where to add checkpoints takes experience and experimentation. The statement that adds a checkpoint looks like this:
CHECKPOINT;
When you add a CHECKPOINT, you have the option to include a HINT to specify the Spark store behavior as either checkpoint.reliable (the default) or checkpoint.local. See Statement Modifiers.
Note: Depending on the setup of the underlying Spark cluster, checkpointing to a local store can result in better performance. Use the checkpoint.local HINT value with caution and consult with Tamr Support at [email protected].
To include a HINT in a CHECKPOINT statement, use the following syntax:
HINT(checkpoint.local) CHECKPOINT;
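To make the placement concrete, here is a minimal Python sketch that assembles a script from logical chunks and puts a checkpoint statement between them. It is not a Tamr API: the function name join_chunks_with_checkpoints is hypothetical, and the angle-bracket strings are placeholders standing in for real transformation statements. Only the CHECKPOINT and HINT syntax shown above is taken from this page.

```python
from typing import List, Optional


def join_chunks_with_checkpoints(
    chunks: List[List[str]], hint: Optional[str] = None
) -> str:
    """Join chunks of statements, separating each pair of chunks with a checkpoint."""
    # With no hint, emit the plain statement, which uses the default store behavior.
    checkpoint = "CHECKPOINT;" if hint is None else f"HINT({hint}) CHECKPOINT;"
    lines: List[str] = []
    for index, chunk in enumerate(chunks):
        lines.extend(chunk)
        # Checkpoints go between chunks, not after the last one.
        if index < len(chunks) - 1:
            lines.append(checkpoint)
    return "\n".join(lines)


# Placeholder statements; substitute your own transformations.
chunks = [
    ["<cleanup statement 1>;", "<cleanup statement 2>;"],
    ["<enrichment statement 1>;"],
]
script = join_chunks_with_checkpoints(chunks, hint="checkpoint.local")
print(script)
```

Calling the function without a hint produces the plain CHECKPOINT; separator instead.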