Checkpoint
Overview
This is an advanced feature
Using
checkpoint
incorrectly may actually decrease your performance. If you are not experiencing performance delays, then you should not use it.
Checkpoint
is a transformation that does not change the content of your data, but rather changes how your scripts interact with Spark.
If you are experiencing slow performance, checkpoint
can enhance your performance by breaking down the series of transformations into manageable sized chunks. Instead of asking Tamr's transformation service to remember all of the transformations you are trying to complete at once, checkpoint
tells it to work on a set of transformations at once, and then cache the results before moving on to the next set of transformations.
If you are experiencing performance delays, place a checkpoint between logical chunks of work. Choosing when and where to write checkpoints takes experience and experimentation. To write a checkpoint:
Checkpoint;
Updated over 5 years ago