How to Update Table in BigQuery if Streaming Does Not Work
This article might be helpful if you’re trying to upsert data in a BigQuery table but the table doesn’t update. Here you will find both the error message and the solution to the problem.
Problem:
You are hitting an error when you are trying to upsert a table in BigQuery using a query string (insert) that looks like this:
{
"createTable": false,
"intermittentCommits": false,
"queryConfig": {
"fetchSize": 20000,
"jdbcUrl": "jdbc:googlebigquery:AuthScheme=GCPInstanceAccount;ProjectId=tamr-proj;DatasetId=dataset-id;"
},
"truncateBeforeLoad": true,
"unifyDatasetName": "dataset_name",
"batchInsertSize": 20000,
"targetTableName": "target_table"
}
Error message:
ERROR<287cbf39-eed6-4c37-bdf6-c0e0a26561ea> [2022-07-15 17:13:37,688] io.dropwizard.jersey.errors.LoggingExceptionMapper: Error handling a request: 9728ffd3fa950894
! java.sql.SQLException: UPDATE or DELETE statement over table test_automation would affect rows in the streaming buffer, which is not supported
! at XcoreXgooglebigqueryX210X8102.ntb.a(Unknown Source)
! at XcoreXgooglebigqueryX210X8102.ntb.a(Unknown Source)
! at XcoreXgooglebigqueryX210X8102.vp.execute(Unknown Source)
! at org.apache.commons.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:94)
! at org.apache.commons.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:94)
! at com.tamr.core.connect.service.streamwrapper.jdbc.DatabaseWrapper.truncateTable(DatabaseWrapper.java:201)
! at com.tamr.core.connect.service.UrlExportService.exportToJdbc(UrlExportService.java:274)
! at com.tamr.core.connect.api.resources.UrlExportResource.exportToJdbc(UrlExportResource.java:204)
! at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
! at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
! at java.lang.reflect.Method.invoke(Method.java:498)
! at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
! at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
Resolution:
The default insert mode is streaming. When a table is updated via streaming in BigQuery it can take a while (even up to ~90min as reported by some users) for the Table to be editable again. If you need to update soon after write, use the GCSStaging mode by adding ;InsertMode=GCSStaging;GCSBucket=<a-bucket-the-instance-has-read-and-write-access-to>;
to the query string. Have a look at the following guide explaining the parameters: https://cdn.cdata.com/help/DBG/jdbc/pg_finetuneaccess.htm . Once a table has been created or modified using Insert Mode, you must wait until the streaming buffer has been cleared to edit the table again. If you anticipate needing to edit the table soon after after writes you can append ';InsertMode=GCSStaging;GCSBucket=<a-bucket-the-instance-has-read-and-write-access-to>;
' to your JdbcUrl. This insert mode allows edits without a waiting period. Note that the GCS bucket and your BQ table must be in the same google location.
Example:
{
"createTable": false,
"intermittentCommits": false,
"queryConfig": {
"fetchSize": 20000,
"jdbcUrl": "jdbc:googlebigquery:AuthScheme=GCPInstanceAccount;ProjectId=tamr-proj;DatasetId=dataset-id;InsertMode=GCSStaging;GCSBucket=<a-bucket-the-instance-has-read-and-write-acess-to>;"
},
"truncateBeforeLoad": false,
"unifyDatasetName": "dataset_name",
"batchInsertSize": 20000,
"targetTableName": "target_table"
Note: InsertMode=Upload
avoids the limitation of BigQuery's streaming buffer. However, this mode should be used with caution as it temporarily uploads the data to public storage. Alternatively, GCSStaging uses your Google Cloud Storage account to store data.
Updated 11 months ago