KeyValue Size Too Large Error
Problem: The error "KeyValue size too large" appears when running either an update pairs job or a publish clusters job.
Log Error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 162 in stage 2.0 failed 4 times, most recent failure: Lost task 162.3 in stage 2.0 (TID 234, tamr-tamr-dev.c.tamr-cus-staples.internal, executor 1): java.lang.IllegalArgumentException: KeyValue size too large
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:685)
Caused by: java.lang.IllegalArgumentException: KeyValue size too large
at org.apache.hadoop.hbase.client.ConnectionUtils.validatePut(ConnectionUtils.java:599)
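This error can occur when a record ends up with a very long list of values, or an array value with too many elements, after a merge transformation or pre-group-by. HBase then rejects the write because a single cell (KeyValue) exceeds the maximum allowed size. This is the check in ConnectionUtils.validatePut shown in the log above.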
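To see why long merged arrays trip this check, it helps to estimate a cell's size. The sketch below approximates the serialized length of a single HBase KeyValue from its row, column family, qualifier, and value, ignoring tags. It then compares that length with the 10485760-byte (10 MiB) limit used in the examples in this article. The function names are hypothetical. How Tamr maps a record's attributes onto HBase cells is an assumption not covered here, so treat the result as a rough estimate.

```python
# Approximate serialized size of one HBase KeyValue (cell), ignoring tags.
# Layout: key length (4 bytes) + value length (4) + key + value, where the key is
# row length (2) + row + family length (1) + family + qualifier + timestamp (8) + type (1).

MAX_KEYVALUE_SIZE = 10 * 1024 * 1024  # 10485760 bytes, the value used in this article


def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Return the approximate serialized size of one cell, in bytes."""
    key_len = 2 + len(row) + 1 + len(family) + len(qualifier) + 8 + 1
    return 4 + 4 + key_len + len(value)


def exceeds_limit(row: bytes, family: bytes, qualifier: bytes, value: bytes,
                  limit: int = MAX_KEYVALUE_SIZE) -> bool:
    # Same idea as the client-side check: a cell larger than the limit is rejected.
    return keyvalue_size(row, family, qualifier, value) > limit


print(keyvalue_size(b"r", b"f", b"q", b"v"))  # 24
```

Because the key overhead is only a few dozen bytes, the value dominates. For example, a hypothetical value built from 100,000 merged strings of 120 bytes each is at least 12,000,000 bytes, which is above the 10,485,760-byte limit.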
Fix:
Option 1: Reduce the number of elements in every array to at most 25 values, using whichever of the following applies:
- If you are using the merge transformation, add a MultiFormula (with all columns selected) such as array.slice2($COL, 0, 25) AS $COL
- If you are using pre-group-by, replace collect_set with collect_subset k=25.
Option 2: Find the record whose oversized value is causing the issue.
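One way to approach Option 2 is to scan the suspect dataset for array values that are longer than the 25-element cap or whose serialized size is large. This assumes the dataset can be exported as JSON Lines, with one JSON object per record. The export format, the id field name, and the function name below are assumptions for illustration. The byte count is the size of the JSON text, not the exact HBase cell size.

```python
import json
import sys
from typing import Iterable, Iterator, Tuple

MAX_ELEMENTS = 25                 # array cap suggested in Option 1
MAX_BYTES = 10 * 1024 * 1024      # 10485760, the limit used in this article


def find_large_records(lines: Iterable[str], id_field: str = "id",
                       max_elements: int = MAX_ELEMENTS,
                       max_bytes: int = MAX_BYTES) -> Iterator[Tuple[object, str, int, int]]:
    """Yield (record id, column, element count, JSON byte size) for suspicious arrays."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for column, value in record.items():
            if not isinstance(value, list):
                continue  # only array values are checked
            size = len(json.dumps(value).encode("utf-8"))
            if len(value) > max_elements or size > max_bytes:
                yield record.get(id_field), column, len(value), size


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        # Largest values first, since they are the likeliest cause of the error.
        hits = sorted(find_large_records(f), key=lambda h: h[3], reverse=True)
    for rec_id, column, count, size in hits[:20]:
        print(f"{rec_id}\t{column}\t{count} elements\t~{size} bytes")
```

Saved as a script, the sketch can be run against an export file, for example python find_large_records.py export.jsonl (both file names are hypothetical). It prints up to 20 of the largest offenders. The reported record IDs are candidates to inspect before falling back to raising the limit.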
Workaround (if the large values cannot be found): increase the size threshold.
The KeyValue size limit is configurable and can be disabled; in HBase, a value of 0 or less disables the check. The limit must be set on both the client and the server.
Client configuration
TAMR_HBASE_EXTRA_CONFIG: {"hbase.client.keyvalue.maxsize": "10485760"}
Server configuration
For a single-node deployment, add hbase.server.keyvalue.maxsize to the hbase-site.xml.j2 configuration file:
<property>
  <name>hbase.server.keyvalue.maxsize</name>
  <value>10485760</value>
</property>
Restart Tamr and its dependencies for the changes to take effect. Note that this configuration file is overwritten on upgrades, so the setting must be re-added after each upgrade.
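The limit must be set on both the client and the server, so it can help to derive both settings from one number so they do not drift apart. The helper below is a hypothetical convenience, not part of Tamr. It only prints the TAMR_HBASE_EXTRA_CONFIG value and the hbase-site property shown above for a given byte limit; you would still apply both by hand.

```python
import json
import xml.etree.ElementTree as ET
from typing import Tuple


def keyvalue_maxsize_config(max_bytes: int) -> Tuple[str, str]:
    """Return (client JSON, server XML property) for one KeyValue size limit."""
    # The value is quoted as a string, matching the client example in this article.
    client = json.dumps({"hbase.client.keyvalue.maxsize": str(max_bytes)})
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "hbase.server.keyvalue.maxsize"
    ET.SubElement(prop, "value").text = str(max_bytes)
    return client, ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    client_json, server_xml = keyvalue_maxsize_config(10 * 1024 * 1024)  # 10 MiB
    print("TAMR_HBASE_EXTRA_CONFIG:", client_json)
    print(server_xml)
```

For 10 * 1024 * 1024 the client string is {"hbase.client.keyvalue.maxsize": "10485760"}, identical to the client configuration above. The server property carries the same 10485760 value.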
Updated almost 2 years ago