Checkpoint

CHECKPOINT is a transformation that does not change the content of your data, but rather changes how your scripts interact with Spark.


If your transformations run slowly, or you run into problems when migrating datasets between environments that use different versions of Spark, adding CHECKPOINT statements can improve performance.

Important: This is an advanced feature. Using CHECKPOINT incorrectly can decrease system performance instead of improving it. If you are not experiencing performance delays, there is no need to add CHECKPOINT transformations.

CHECKPOINT breaks a long series of transformations into more manageable chunks. Instead of asking the Tamr Core transformation service to keep track of every transformation in the script at once, CHECKPOINT tells it to complete one set of transformations and cache the results before moving on to the next set.
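The chunking behavior described above can be pictured with a small toy model. The following Python sketch is not Tamr or Spark code: the LazyPipeline class and the strip_names and upper_names helpers are hypothetical names invented for illustration. It only shows the general idea of a lazily built plan whose pending steps are run and cached when a checkpoint is reached.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Transform = Callable[[List[Row]], List[Row]]


class LazyPipeline:
    """Toy model of a lazily evaluated transformation plan (hypothetical)."""

    def __init__(self, rows: List[Row]) -> None:
        self._cached: List[Row] = list(rows)   # last materialized result
        self._pending: List[Transform] = []    # steps remembered but not yet run

    def transform(self, fn: Transform) -> "LazyPipeline":
        # Only record the step; nothing is computed yet.
        self._pending.append(fn)
        return self

    def checkpoint(self) -> "LazyPipeline":
        # Run every pending step, cache the result, and forget the steps.
        for fn in self._pending:
            self._cached = fn(self._cached)
        self._pending = []
        return self

    @property
    def plan_length(self) -> int:
        # Number of steps the engine still has to keep track of.
        return len(self._pending)

    def collect(self) -> List[Row]:
        # Apply any remaining steps on top of the cached result.
        rows = self._cached
        for fn in self._pending:
            rows = fn(rows)
        return rows


def strip_names(rows: List[Row]) -> List[Row]:
    return [{**r, "name": r["name"].strip()} for r in rows]


def upper_names(rows: List[Row]) -> List[Row]:
    return [{**r, "name": r["name"].upper()} for r in rows]
```

In this model, calling transform twice leaves a plan of length 2, and calling checkpoint reduces it to 0 while keeping the computed rows. Later steps then build on the cached result instead of on the full history.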

You place CHECKPOINT statements between logical chunks of transformation work. Choosing where to add checkpoints takes experience and experimentation. A checkpoint is a single statement:

CHECKPOINT;

When you add a CHECKPOINT, you have the option to include a HINT to specify the Spark store behavior as either checkpoint.reliable (the default) or checkpoint.local. See Statement Modifiers.

Note: Depending on the setup of the underlying Spark cluster, checkpointing to a local store can result in better performance. Use this HINT value with caution and consult with Tamr Support at [email protected].

To include a HINT in a CHECKPOINT statement, use the following syntax:

HINT(checkpoint.local) CHECKPOINT;
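If you generate transformation scripts programmatically, a small helper can keep the two statement forms shown above consistent. This is a minimal sketch, not part of any Tamr API: the checkpoint_statement function and its hint parameter are hypothetical. It emits only the two forms documented on this page, and treats checkpoint.reliable as the default by emitting the plain statement.

```python
from typing import Optional

# HINT values named in this page; checkpoint.reliable is the documented default.
DEFAULT_HINT = "checkpoint.reliable"
ALLOWED_HINTS = {DEFAULT_HINT, "checkpoint.local"}


def checkpoint_statement(hint: Optional[str] = None) -> str:
    """Return a CHECKPOINT statement, with a HINT only when it is not the default.

    Hypothetical helper for illustration; it only formats text.
    """
    if hint is None or hint == DEFAULT_HINT:
        # The default behavior needs no HINT.
        return "CHECKPOINT;"
    if hint not in ALLOWED_HINTS:
        raise ValueError(f"unsupported checkpoint hint: {hint!r}")
    return f"HINT({hint}) CHECKPOINT;"
```

For example, checkpoint_statement("checkpoint.local") returns the HINT form shown above, and an unrecognized value raises ValueError instead of producing a statement that the script would reject.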
