Problem: A published clusters job runs for some time (hours) and then fails with an "Elasticsearch Connection refused" error.
Cause: This error is most likely the result of a memory constraint: the operating system ran out of memory and killed the Elasticsearch process. To verify that this is the cause, search the kernel log for an operating system-level diagnostic message that includes the phrase “Out of memory”. For example:
dmesg -T | grep Out
If memory is the cause, the output includes a line similar to "[datetime] Out of memory: Kill process nnn (elasticsearch[l) score nnn or sacrifice child", where the datetime falls after the job start and before the failure.
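Because a plain search for "Out" can also match unrelated kernel messages, the check can be narrowed to out-of-memory kills that name an Elasticsearch process. The following is a minimal sketch, not a Tamr utility: the function name find_es_oom_kills is hypothetical, and the pattern assumes the message wording shown above (it also accepts "Killed process", which some kernel versions log instead).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Tamr): reads kernel log text on stdin and
# keeps only out-of-memory kill lines that name an Elasticsearch process.
find_es_oom_kills() {
  # Assumption: the line follows the "Out of memory: Kill process nnn (name"
  # shape quoted above; "Kill(ed)?" also accepts the "Killed process" wording.
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -E 'Out of memory: Kill(ed)? process [0-9]+ \(elasticsearch' || true
}

# Usage (reading the kernel log may require elevated privileges):
#   dmesg -T | find_es_oom_kills
```

An empty result means no matching kill was recorded in the kernel log buffer. It does not by itself rule out memory pressure, so compare any timestamps that do appear against the job start and failure times.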
Resolution: To avoid this issue, decrease the settings for the ES_HEAP_SIZE and TAMR_OS_HEADROOM configuration variables by 1G each. See Setting Configuration Variables in the Tamr Documentation.