How Can I Resolve an "Elasticsearch Connection refused" Error?

Problem: A published clusters job runs for several hours and then fails with an "Elasticsearch Connection refused" error.

Cause: This error is most likely the result of a memory constraint: the operating system runs out of memory and kills the Elasticsearch process. To verify that this is the cause, search the kernel log for a diagnostic message that includes the phrase “Out of memory”. For example:

dmesg -T | grep Out

If memory is the cause, the output includes a line like "[datetime] Out of memory: Kill process nnn (elasticsearch[l) score nnn or sacrifice child", where the datetime falls after the job start and before the failure.
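
To narrow the output to kills of the Elasticsearch process only, the check above can be wrapped in a small filter. This is a minimal sketch: the function name es_oom_kills is hypothetical, and the match assumes the kernel message wording shown above, which can differ between kernel versions.

```shell
# Hypothetical helper: reads kernel log lines on stdin and keeps only
# OOM-killer messages that name an Elasticsearch process.
es_oom_kills() {
  grep -i 'out of memory' | grep -i 'elasticsearch'
}

# Usage (dmesg may require root privileges on some systems):
#   dmesg -T | es_oom_kills
```

Compare the timestamps of any lines it prints against the start and failure times of the job.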

Resolution: To avoid this issue, decrease the values of the ES_HEAP_SIZE and TAMR_OS_HEADROOM configuration variables by 1G each. See Setting Configuration Variables in the Tamr Documentation.
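
As a worked example of the adjustment, the sketch below computes the reduced values. The helper name decrease_by_1g and the starting values 8G and 4G are assumptions for illustration only; substitute the values currently set on your system, and apply the results using the procedure in the Tamr Documentation.

```shell
# Hypothetical helper: given a size such as "8G", print the size reduced by 1G.
decrease_by_1g() {
  value="${1%G}"            # strip the trailing "G"
  echo "$((value - 1))G"    # subtract one gigabyte and restore the suffix
}

# Assumed current values, for illustration only:
decrease_by_1g 8G    # new ES_HEAP_SIZE: prints 7G
decrease_by_1g 4G    # new TAMR_OS_HEADROOM: prints 3G
```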