Checkpoint
CHECKPOINT is a transformation that does not change the content of your data, but rather changes how your scripts interact with Spark.
This is an advanced feature
Using CHECKPOINT incorrectly may actually decrease your performance. If you are not experiencing performance delays, then you should not use it.
If you are experiencing slow performance, CHECKPOINT can improve it by breaking a long series of transformations into manageably sized chunks. Instead of asking Tamr's transformation service to remember all of the transformations you are trying to complete at once, CHECKPOINT tells it to work on one set of transformations and cache the results before moving on to the next set.
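The idea of running one set of transformations and caching the result can be illustrated with a toy model. The Python sketch below is not Tamr's or Spark's implementation; the LazyPipeline class, its methods, and the build helper are hypothetical names invented for illustration. It only shows why materializing an intermediate result shortens the list of steps that later work has to replay.

```python
class LazyPipeline:
    """Toy stand-in for a lazily evaluated chain of transformations (hypothetical)."""

    def __init__(self, rows):
        self._base = list(rows)  # data that has already been materialized
        self._pending = []       # recorded steps that have not been applied yet
        self.steps_run = 0       # number of transformation steps executed so far

    def transform(self, fn):
        # Record the step only; nothing is computed here.
        self._pending.append(fn)
        return self

    def checkpoint(self):
        # Run the pending steps once, keep the result, and forget the steps.
        self._base = self._run()
        self._pending = []
        return self

    def collect(self):
        # Each call replays every pending step from the last materialized data.
        return self._run()

    def _run(self):
        rows = self._base
        for fn in self._pending:
            rows = [fn(row) for row in rows]
            self.steps_run += 1
        return rows


def build(with_checkpoint):
    """Build a three-step pipeline, optionally checkpointing after step two."""
    pipeline = LazyPipeline(["  Alice ", " BOB"])
    pipeline.transform(str.strip).transform(str.lower)
    if with_checkpoint:
        pipeline.checkpoint()
    pipeline.transform(str.title)
    return pipeline
```

In this model, calling collect() twice on build(False) executes six steps, because all three steps are replayed each time. The same two calls on build(True) execute four: two at the checkpoint, then one per collect(). With a single collect() both variants execute three steps, so the checkpoint saves nothing, which loosely mirrors the warning that an unnecessary CHECKPOINT may not help.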
If you are experiencing performance delays, place a CHECKPOINT between logical chunks of work. Choosing when and where to add checkpoints takes experience and experimentation. To add a checkpoint:
CHECKPOINT;