How To Set Elasticsearch Heap Size?

What is Elasticsearch and how is Tamr using it?

Elasticsearch is an open-source, RESTful search engine based on the Apache Lucene library. First released in 2010 and developed by Elastic, it was designed as a distributed, Java-based engine that provides full-text search over schema-free JSON documents. Elasticsearch is packaged with Tamr, which uses it to search for datasets, records, clusters, and other objects.

Why is the Elasticsearch heap size important?

Tamr datasets can contain a large number of records, so Elasticsearch needs an ample amount of memory. The memory allocated to Elasticsearch is the Java heap of the Elasticsearch process, referred to as the Elasticsearch heap size. The right value depends on the server's total memory and on Tamr's other memory settings. If the heap size is set too low, Elasticsearch can run into "Out of Memory" errors, which disrupt the proper functioning of the Tamr instance.
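Because the right value depends on the server, it can help to start from an upper bound. The sketch below applies Elastic's general JVM heap guidance rather than any Tamr-specific rule: give the heap at most half of physical RAM, and keep it under the compressed-object-pointer threshold, which we approximate with a conservative 26 GB cap (an assumption; the exact threshold varies by JVM and system). The function name suggest_es_heap_gb is hypothetical, and since Tamr's other services share the same server, treat the result as a ceiling to check against Tamr's own sizing guidance, not as a recommended value.

```shell
#!/bin/sh
# Hypothetical helper: an upper bound for the Elasticsearch heap, in whole GB.
# Assumptions (Elastic's general JVM guidance, not Tamr-specific sizing):
#   - the heap gets at most half of physical RAM;
#   - the heap stays at or below a conservative 26 GB cap so the JVM can keep
#     using compressed object pointers.
suggest_es_heap_gb() {
  local total_gb="$1"
  local cap_gb=26
  local heap
  # Reject empty or non-numeric input.
  case "$total_gb" in
    ''|*[!0-9]*)
      echo "usage: suggest_es_heap_gb <total RAM in GB>" >&2
      return 1
      ;;
  esac
  heap=$(( total_gb / 2 ))
  # Apply the cap, and never suggest less than 1 GB.
  if [ "$heap" -gt "$cap_gb" ]; then heap=$cap_gb; fi
  if [ "$heap" -lt 1 ]; then heap=1; fi
  echo "$heap"
}

# Example (Linux only): MemTotal in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  if [ -n "$total_kb" ]; then
    echo "Upper bound for the Elasticsearch heap: $(suggest_es_heap_gb $(( total_kb / 1024 / 1024 )))G"
  fi
fi
```

For example, suggest_es_heap_gb 16 prints 8, while suggest_es_heap_gb 64 prints 26 because the cap applies.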

How do you set the Elasticsearch heap size?

The Elasticsearch heap size already has a default value. You can check the current value with the following steps:

  1. Change to the 'utils' directory:

cd <TAMR_HOME>/tamr/utils

  2. Get the current value of the config variable ES_HEAP_SIZE:

./unify-admin.sh config:get ES_HEAP_SIZE
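The check steps above can be wrapped in a small shell function. This is a sketch under assumptions: the function name get_es_heap_size is hypothetical, and TAMR_HOME is assumed to be a shell variable holding the directory that <TAMR_HOME> stands for. The function only runs unify-admin.sh with config:get ES_HEAP_SIZE from the utils directory; it adds a check that the script exists first.

```shell
#!/bin/sh
# Sketch of the check steps: print the current ES_HEAP_SIZE setting.
# Assumption: TAMR_HOME holds the directory that <TAMR_HOME> stands for.
get_es_heap_size() {
  if [ -z "$TAMR_HOME" ]; then
    echo "TAMR_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$TAMR_HOME/tamr/utils/unify-admin.sh" ]; then
    echo "unify-admin.sh not found in $TAMR_HOME/tamr/utils" >&2
    return 1
  fi
  # Change directory and run config:get in a subshell, so the caller's
  # working directory is left unchanged.
  (cd "$TAMR_HOME/tamr/utils" && ./unify-admin.sh config:get ES_HEAP_SIZE)
}
```

Once TAMR_HOME is set, running get_es_heap_size prints whatever unify-admin.sh reports for ES_HEAP_SIZE.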

To customize the heap size, use the following steps:

  1. Change to the 'utils' directory:

cd <TAMR_HOME>/tamr/utils

  2. Set the config variable ES_HEAP_SIZE:

./unify-admin.sh config:set ES_HEAP_SIZE="<new value in G>"

  3. Change to the 'tamr' directory:

cd <TAMR_HOME>/tamr/

  4. Stop Tamr and its dependencies:

./stop-unify.sh
./stop-dependencies.sh

  5. Start Tamr and its dependencies so that the new value takes effect:

./start-dependencies.sh
./start-unify.sh
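The customize steps above can be sketched as one shell function that runs the same commands in the same order. Assumptions: the function names set_es_heap_size and is_valid_heap_size are hypothetical, TAMR_HOME is a shell variable holding the directory that <TAMR_HOME> stands for, and a valid value is a whole number followed by "G" (for example 8G), which is our reading of "<new value in G>". Check the format your Tamr version expects before relying on the validation.

```shell
#!/bin/sh
# Sketch of the customize steps: set ES_HEAP_SIZE, then restart Tamr and its
# dependencies. Assumptions: TAMR_HOME holds the directory that <TAMR_HOME>
# stands for, and a valid value is a whole number followed by "G" (e.g. 8G).

# Succeed only for values like "8G": digits, then a single trailing G.
is_valid_heap_size() {
  local num="${1%G}"
  [ "$num" != "$1" ] || return 1          # must end in G
  case "$num" in
    ''|*[!0-9]*) return 1 ;;              # the rest must be digits only
  esac
  [ "$num" -gt 0 ]                        # and a positive number
}

set_es_heap_size() {
  if ! is_valid_heap_size "$1"; then
    echo "expected a size such as 8G, got '$1'" >&2
    return 1
  fi
  if [ -z "$TAMR_HOME" ] || [ ! -d "$TAMR_HOME/tamr/utils" ]; then
    echo "TAMR_HOME does not point at a Tamr installation" >&2
    return 1
  fi
  # Run every step in a subshell (the caller's directory is unchanged) and
  # chain them with && so the sequence halts at the first failing command.
  (
    cd "$TAMR_HOME/tamr/utils" &&
      ./unify-admin.sh config:set ES_HEAP_SIZE="$1" &&
      cd "$TAMR_HOME/tamr" &&
      ./stop-unify.sh &&
      ./stop-dependencies.sh &&
      ./start-dependencies.sh &&
      ./start-unify.sh
  )
}
```

Because the commands are chained with &&, a failure after the stop scripts have run leaves Tamr stopped; in that case, rerun the start scripts by hand.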

You should be all set! To confirm the change, check the value again with the config:get command shown earlier.