How to Optimize Large Single-Node Tamr Core with Elasticsearch Enabled

Configuration variable optimization for large single-node Tamr Core with Elasticsearch enabled

To keep adequate resources free for UI interactions and background tasks while Tamr jobs are running, apply the following Tamr configuration optimizations.

  1. Set the following configuration variables. See Setting Configuration Variables:
TAMR_SPARK_CORES: 21
TAMR_JOB_SPARK_EXECUTOR_INSTANCES: 4
TAMR_JOB_SPARK_DRIVER_MEM: 8G
TAMR_OS_HEADROOM: 32G
TAMR_CONGLOMERATE_MEMORY: 16G
ES_HEAP_SIZE: 32G
TAMR_ES_BATCH_ERROR_BUDGET: 0.02
TAMR_HBASE_GC_INTERVAL_BETWEEN_GC_IN_SECONDS: 86400
TAMR_HBASE_GC_MAX_CONCURRENT_TABLES: 1
TAMR_DATASET_NUMBER_OF_VERSIONS_TO_KEEP: 2
  2. Restart Tamr Core and its dependencies. See Restarting Tamr Core.
  3. Trigger a manual HBase garbage collection job for a one-time cleanup and catch-up. Visit <tamr-host>docs?service=dataset and run POST /hbase/submitGarbageCollectionJob.
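
Before applying the values from step 1, it can help to tally the memory they set aside. The Python sketch below is illustrative only: it parses the settings exactly as listed above and sums the four values expressed in gigabytes. Treating those four values as additive reservations is an assumption made for this example, not documented Tamr behavior, and the tally does not account for Spark executor memory, which is not set in the list. The names SETTINGS_TEXT, parse_settings, size_in_gb, and fixed_memory_gb are hypothetical helpers, not part of any Tamr tool.

```python
# Settings from step 1, copied verbatim as "NAME: value" lines.
SETTINGS_TEXT = """\
TAMR_SPARK_CORES: 21
TAMR_JOB_SPARK_EXECUTOR_INSTANCES: 4
TAMR_JOB_SPARK_DRIVER_MEM: 8G
TAMR_OS_HEADROOM: 32G
TAMR_CONGLOMERATE_MEMORY: 16G
ES_HEAP_SIZE: 32G
TAMR_ES_BATCH_ERROR_BUDGET: 0.02
TAMR_HBASE_GC_INTERVAL_BETWEEN_GC_IN_SECONDS: 86400
TAMR_HBASE_GC_MAX_CONCURRENT_TABLES: 1
TAMR_DATASET_NUMBER_OF_VERSIONS_TO_KEEP: 2
"""

# Assumption: these are the settings in the list whose values are memory sizes.
MEMORY_KEYS = (
    "TAMR_JOB_SPARK_DRIVER_MEM",
    "TAMR_OS_HEADROOM",
    "TAMR_CONGLOMERATE_MEMORY",
    "ES_HEAP_SIZE",
)


def parse_settings(text):
    """Parse 'NAME: value' lines into a dict mapping names to string values."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition(":")
        settings[name.strip()] = value.strip()
    return settings


def size_in_gb(value):
    """Convert a size such as '32G' to an integer number of gigabytes."""
    if not value.endswith("G"):
        raise ValueError(f"expected a size ending in 'G', got {value!r}")
    return int(value[:-1])


def fixed_memory_gb(settings):
    """Sum the gigabyte-valued settings named in MEMORY_KEYS."""
    return sum(size_in_gb(settings[key]) for key in MEMORY_KEYS)


if __name__ == "__main__":
    settings = parse_settings(SETTINGS_TEXT)
    print(f"Memory set aside by the listed settings: {fixed_memory_gb(settings)}G")
```

With the values above, the tally is 8G + 32G + 16G + 32G = 88G. Comparing that figure with the physical memory on the host gives a rough idea of how much remains for Spark executors and other processes.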