Error When Uploading UTF-8 Files with a BOM

Tamr does not support uploading UTF-8 files that begin with a byte order mark (BOM), though support may be added in a later release.

The error that occurs when a file with a BOM is uploaded resembles:

    Upload failed. com.tamr.common.except.ServiceException: Error while processing input stream for dataset record updates.
    Caused by:
    {"processedCommands":0,"successful":false,"revisionId":null,"validationErrors":["Command '{\"action\":\"CREATE\",\"record\":{details}' had errors [EMPTY_RECORD_ID]"]}. Please check the logs for more information.

If you encounter this error, it may be because your file was saved with a BOM.
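One quick, operating-system-independent way to confirm this is to look at the first three bytes of the file. The following is a minimal Python sketch, not part of Tamr; the function name `has_utf8_bom` and the filename `data.csv` are placeholders chosen for illustration.

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    # Open in binary mode so the bytes are read exactly as stored on disk.
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


# Hypothetical usage:
#   has_utf8_bom("data.csv")
```

If the function returns False, the BOM is not the cause of the upload error.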

Unfortunately, Excel automatically saves UTF-8 files with a BOM, as do some text editors. To remove the BOM, first check whether the editor you are using can save the file without a BOM. If it cannot, there are a few ways to work around the issue, depending on your operating system.

Once the BOM is removed, the file may not render properly in some applications, such as Excel, but it can be uploaded to Unify.

Linux or Mac

This solution also works in a bash shell on Windows, as long as Vim is installed. Vim is included by default with Git BASH. It is not a default package in Cygwin, so it must be selected during setup (setup can be run again to add packages).

  1. Open a terminal.
  2. Navigate to the directory containing the file using
    cd <path>
    For example, if the file is saved in Documents/Data,
    cd Documents/Data
  3. Open the file with Vim.
    vi <filename>
    Note that if your filename contains spaces, you will need to put the name in quotes. The filename also includes the extension (probably .csv).
  4. Check whether the file has a BOM with
    :setlocal bomb?
    If the file has a BOM, this outputs
    bomb
    If it outputs nobomb instead, the BOM is not the reason for the upload error.
  5. Remove the BOM with
    :setlocal nobomb
  6. Save changes and exit Vim with
    :wq
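Where Vim is not available, the same in-place removal can be scripted. The sketch below is one possible approach in Python, assuming the file is otherwise valid UTF-8; the function name `strip_bom_in_place` is hypothetical. It relies on Python's built-in `utf-8-sig` codec, which drops a single leading BOM when decoding.

```python
def strip_bom_in_place(path):
    """Rewrite the file at `path` as UTF-8 without a leading BOM."""
    # "utf-8-sig" decodes UTF-8 and skips one leading BOM if present.
    # newline="" disables newline translation so line endings are preserved.
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        text = f.read()
    # Plain "utf-8" never writes a BOM.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

Like the Vim steps, this overwrites the original file, so keep a copy if you still need the version with the BOM. A file that has no BOM is rewritten with identical content.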

Windows

On Windows, the BOM can be removed using PowerShell.

  1. Open Windows PowerShell.
  2. Navigate to the directory containing the file using
    cd <path>
    For example, if the file is saved in Documents/Data,
    cd Documents/Data
  3. Check whether the file contains a BOM with
    Get-Content <filename> -Encoding Byte -TotalCount 3
    Note that if your filename contains spaces, you will need to put the name in quotes. The filename also includes the extension (probably .csv).
    This command returns the first three byte values in the file. If there is a BOM, the result will be:
    239 187 191
    If it is something else, there is no BOM, so there is a different reason for the upload error.
    The -Encoding Byte option used here and in step 4 is available in Windows PowerShell 5.1; PowerShell 6 and later replace it with the -AsByteStream parameter.
  4. Remove the BOM with
    Get-Content <filename> -Encoding Byte -Tail ((Get-Item <filename>).length - 3) | Set-Content -Path <new-filename> -Encoding Byte
    This command reads all of the file's content except for the first three bytes and writes that content to the new file.
  5. Repeat step 3 on <new-filename> to confirm that no BOM remains. Some files may contain multiple BOMs; in that case, repeat step 4, using <new-filename> as the input and a different name for the output.

Once this is done, the original file is unchanged, while the file created as <new-filename> contains the same content without a BOM.
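The check, remove, and re-check cycle above can also be done in a single pass. The following Python sketch is an illustration under assumptions rather than a supported tool: the function name `copy_without_bom` is hypothetical, and the whole file is read into memory, which may be unsuitable for very large files. It works on raw bytes and strips every BOM found at the start of the file, which covers the case of multiple BOMs.

```python
import codecs


def copy_without_bom(src, dst):
    """Copy src to dst, dropping every UTF-8 BOM at the start of the file.

    Returns the number of BOMs that were removed.
    """
    with open(src, "rb") as f:
        data = f.read()
    removed = 0
    # Some files carry more than one BOM, so keep stripping until none is left.
    while data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
        removed += 1
    with open(dst, "wb") as f:
        f.write(data)
    return removed
```

As with the PowerShell steps, the source file is left untouched and the cleaned content goes to a separate file. A return value of 0 means the file had no BOM, so the upload error has a different cause.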