Backup Fails with Error SnapshotDoesNotExistException

Note: Apply this workaround only to certain derived datasets (e.g. a ‘grouped entities’ dataset) and not to any other dataset. Contact [email protected] if you are unsure whether this article applies to you.

Problem: Backup fails with the error SnapshotDoesNotExistException, for example:

{

"errorMessage": "org.apache.hadoop.hbase.snapshot.SnapshotDoesNotExistException: org.apache.hadoop.hbase.snapshot.SnapshotDoesNotExistException: Snapshot '_2021_2D_05_2D_06__19_2D_53_2D_44_2D_869_CUSTOMER_SITE_MASTERING_unified_dataset_dedup_grouped_entities' doesn't exist on the filesystem\n\tat org.apache.hadoop.hbase.master.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:273)\n\tat org.apache.hadoop.hbase.master.MasterRpcServices.deleteSnapshot(MasterRpcServices.java:521)\n\tat org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:58583)\n\tat org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2339)\n\tat org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)\n\tat org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)\n\tat org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)\n",

}

Cause: No definitive cause has been identified yet.

Releases Affected: First observed in v2020.20; may still occur in more recent versions.

Releases Fixed: N/A

Resolution/Workaround:

Phase I:

  1. Stop Tamr.
  2. Drop the table and recreate it:

a. Take the table name from the error message: it is the part of the snapshot name that follows the timestamp prefix. In the example above it is CUSTOMER__SITE__MASTERING__unified__dataset__dedup__grouped__entities

b. Open the HBase shell:

${TAMR_HOME}/hbase-1.3.1/bin/hbase shell

Then run the following commands, replacing the table name with the one from your error message:

drop 'tamr:CUSTOMER__SITE__MASTERING__unified__dataset__dedup__grouped__entities'

create 'tamr:CUSTOMER__SITE__MASTERING__unified__dataset__dedup__grouped__entities', {NAME => 'C', VERSIONS => 2147483647, KEEP_DELETED_CELLS => false, COMPRESSION => 'SNAPPY'}
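To reduce copy-and-paste mistakes with long table names, the two sub-steps above can be sketched as a small helper that pulls the table name out of the error message and prints the HBase shell commands to paste. This is a minimal sketch, not part of Tamr: the function names are hypothetical, and the timestamp pattern is inferred only from the single example in this article. The optional disable line is an assumption based on stock HBase behavior, which rejects a drop on a table that is still enabled; it is off by default so that the output matches the commands above.

```python
import re
from typing import List

# Timestamp prefix as it appears in the example snapshot name
# (_2021_2D_05_2D_06__19_2D_53_2D_44_2D_869_). Inferred from that one
# example only; adjust if your snapshot names look different.
_TIMESTAMP_PREFIX = re.compile(
    r"^_\d{4}_2D_\d{2}_2D_\d{2}__\d{2}_2D_\d{2}_2D_\d{2}_2D_\d{3}_"
)


def table_from_error(error_message: str) -> str:
    """Return the table name embedded in the snapshot name of the error."""
    match = re.search(r"Snapshot '([^']+)' doesn't exist", error_message)
    if match is None:
        raise ValueError("no snapshot name found in error message")
    snapshot = match.group(1)
    table, count = _TIMESTAMP_PREFIX.subn("", snapshot)
    if count == 0:
        raise ValueError(f"unexpected snapshot name format: {snapshot}")
    return table


def hbase_commands(table: str, namespace: str = "tamr",
                   include_disable: bool = False) -> List[str]:
    """Build the drop/create commands from this article for one table."""
    full_name = f"{namespace}:{table}"
    commands = []
    if include_disable:
        # Assumption: only needed if HBase reports the table is enabled.
        commands.append(f"disable '{full_name}'")
    commands.append(f"drop '{full_name}'")
    commands.append(
        f"create '{full_name}', {{NAME => 'C', VERSIONS => 2147483647, "
        "KEEP_DELETED_CELLS => false, COMPRESSION => 'SNAPPY'}"
    )
    return commands


if __name__ == "__main__":
    # Paste the errorMessage value from your own failed backup here.
    error = input("errorMessage: ")
    for line in hbase_commands(table_from_error(error)):
        print(line)
```

The helper only prints text; review the table name it reports before pasting anything into the HBase shell, since dropping the wrong table is not reversible.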

Phase II:

Let Tamr repopulate the data:

  1. Update the unified dataset of that project.
  2. Set the TAMR_DEDUP_DISABLE_INCREMENTAL configuration variable to TRUE. See instructions here.
     "disableIncrementalDedup": true
  3. Generate pairs.
  4. Update results.
  5. Publish clusters.
  6. Set the TAMR_DEDUP_DISABLE_INCREMENTAL configuration variable back to FALSE, following the same instructions.
     "disableIncrementalDedup": false