
Deploying Tamr Core on Google Cloud Platform

Deploy a production-grade Tamr Core solution to a single instance in Google Cloud Platform (GCP) in a few clicks.

This topic describes deployment steps for launching and accessing Tamr on GCP Marketplace, as well as basic network and security requirements, status checks, costs, backups, and support.

Requirements for Users Deploying on GCP

Users deploying Tamr Core on GCP must have PostgreSQL experience and basic Linux experience. Users also must have permissions to run bash scripts and commands on the GCP Tamr Core instance.

What Is Included in the Tamr Marketplace Image on GCP?

The Tamr Core software offered on Google Cloud Platform (GCP) Marketplace is a pre-configured and fully integrated package. It allows you to run expert-assisted machine learning jobs on vast amounts of data.

The Tamr GCP Marketplace Image has the following characteristics:

  • It is a single virtual machine (VM) image for the Linux Platform (Ubuntu 18.04).
  • The image includes:
      • A version of the Tamr Core software, optimized for use with Google Cloud Platform Compute Engine.
      • Software dependencies that Tamr Core requires, including HBase, Spark on Yarn, Elasticsearch, a PostgreSQL instance, and ZooKeeper. For a diagram that shows these components, see Deployments. The list of licenses for Open Source software used in the Tamr package is included in the home/ubuntu/tamr/licenses/unify-licenses/licenses directory. After you deploy the instance, you can access this directory by connecting to the instance via SSH.
  • The image uses the BYOL (Bring Your Own License) model in Google Marketplace, which requires you to have a valid license to use it. You are responsible for purchasing and managing your own license from Tamr. For information, see Accessing the Tamr Instance in this guide.

Before You Begin

Before you deploy Tamr Core, complete or verify the following:

  • You have a Google Cloud account. To create an account, see Get Started with Google Cloud Platform.
  • You have sufficient Google Compute Engine (GCE) resource quota limits and your quota limits allow you to create instances with characteristics that Tamr Core requires. For more information, see GCP Sizing and Limits in this guide.
  • You have installed the Google Cloud SDK. You can use it to manage your instance after it is deployed. See Google Cloud SDK Quickstarts in the GCP documentation.
  • You have generated SSH keys. You will need the keys to connect to your instance with SSH. See Managing SSH Keys in the GCP documentation.
  • You are familiar with GCP access management and shared projects. Shared projects allow multiple users in your team to access any virtual machine instance created within the project. Project users can then establish an SSH connection to the GCP instance. To keep your VM instance and SSH key private, create a GCP project in a VPC and then create and launch your VM instances in this project. See Granting Access to Projects in the GCP documentation.
  • You have set firewall rules for your instance. (See Setting Up Firewall Rules.) Firewall rules let you specify the type of access (internal access via a VPC network, or secure public access over HTTPS) and the ports that must stay open for each type of connection. You use these ports to access the Tamr Core user interface and to run commands that check the health of the Tamr Core instance.
  • For more information, use the links to the GCP documentation about security included in the Security section in this guide.
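Some of the local prerequisites above can be checked from a terminal before you start. The following is a minimal sketch, not part of the Tamr tooling: the function names are hypothetical, and the default public key path ($HOME/.ssh/id_rsa.pub) is an assumption that may not match the key file you generated.

```shell
# Sketch: check local prerequisites for a GCP deployment.
# Function names and the default key path are illustrative assumptions.

# Return 0 if the named command is on PATH, 1 otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print one status line per prerequisite; return 1 if any is missing.
check_prereqs() {
  local key_file="${1:-$HOME/.ssh/id_rsa.pub}"   # assumed key location
  local missing=0
  if have_cmd gcloud; then
    echo "ok: gcloud found"
  else
    echo "missing: gcloud (Google Cloud SDK)"; missing=1
  fi
  if have_cmd ssh; then
    echo "ok: ssh found"
  else
    echo "missing: ssh client"; missing=1
  fi
  if [ -f "$key_file" ]; then
    echo "ok: public key $key_file"
  else
    echo "missing: public key $key_file"; missing=1
  fi
  return "$missing"
}
```

Call check_prereqs with no arguments to use the assumed key path, or pass the path to your own public key. Quota limits and project access still need to be verified in the GCP console.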

Setting Up Firewall Rules

Configure the firewall in GCP. See VPC Firewall Rules Overview in the Google Virtual Private Cloud documentation for instructions. Firewall configuration requirements:

  • Allow only internal access to Tamr Core default port 9100 (via TCP).
  • Open port 443 for HTTPS, with a restrictive IP range that you specify using IPv4 addresses in CIDR notation, such as 1.2.3.4/32.
    Note: If you plan to forward HTTP traffic to HTTPS, also open port 80.

Accessing Tamr via HTTPS

If you are not using a VPC, secure external access to Tamr Core with HTTPS through an NGINX reverse proxy. For more information, see Installing NGINX and Configuring HTTPS.

For non-production environments, configuring a firewall, NGINX, and HTTPS is strongly recommended but not required.

Important: If you do not configure a firewall, NGINX, and HTTPS in a non-production deployment, all users on the network will have access to the data. Use a unique password for this deployment.

To configure a firewall rule for the Tamr Core instance

  1. Sign in to https://console.cloud.google.com.
  2. Select the correct project from the dropdown. This is the project that will contain the Tamr Core instance. This project’s VPC should allow access to the users in your team who will have access to Tamr Core.
  3. Select Products and Services.
  4. In the Networking section, select VPC Network > Firewall Rules.
  5. Select Create Firewall Rule and enter the following information:
      • Name: Tamr recommends the name “default-allow-9100”.
      • Direction of traffic: Ingress
      • Action on match: Allow
      • Targets: All instances in the network
      • Source filter: IP ranges. Tamr recommends only allowing ingress traffic from a private VPC. If you wish to allow ingress traffic over the public Internet, specify a restrictive CIDR range only for port 443 and configure Tamr Core with an SSL certificate via an NGINX server reverse proxy. Tamr does not recommend allowing access via the public Internet on ports 80 or 9100. For more information, see Installing NGINX and Configuring HTTPS.
      • Source IP ranges: Specify the ranges for your VPC.
      • Protocols and ports: Select Specified protocols and ports, then enter tcp:9100.
  6. Choose Create. Your new firewall rule should appear on the Firewall Rules page. This rule will affect all instances in the project.
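If you prefer the command line, the console settings above map onto the gcloud compute firewall-rules create command. The sketch below is an illustration only: the function name, the DRY_RUN switch, the default network name, and the default source range (10.128.0.0/9, the range used by auto mode VPC networks) are assumptions to replace with your own values.

```shell
# Sketch: build the gcloud command for the port 9100 ingress rule.
# The network name and source range defaults are placeholders.
create_tamr_firewall_rule() {
  local network="${1:-default}"             # assumed VPC network name
  local source_range="${2:-10.128.0.0/9}"   # assumed internal CIDR range
  set -- gcloud compute firewall-rules create default-allow-9100 \
    --network="$network" \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:9100 \
    --source-ranges="$source_range"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"      # dry run (default): print the command, create nothing
  else
    "$@"           # DRY_RUN=0: actually create the rule
  fi
}
```

With no target tags specified, the rule applies to all instances in the network, which matches the Targets setting above. By default the function only prints the command so that you can review it; set DRY_RUN=0 to run it.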

Deploying a Tamr Core Instance from the GCP Console

To deploy a Tamr Core instance

  1. Sign in to GCP at https://console.cloud.google.com/ using your existing Google account.
  2. Select the correct project.
  3. Select Products and Services > Marketplace, or open the console at https://cloud.google.com/marketplace/ and choose Explore Marketplace.
  4. Search for and select the Tamr cloud image.
  5. On the Tamr solution page, choose Launch on Compute Engine.
  6. Configure the Tamr GCP deployment:
      • Select a zone.
      • Select a machine type. Optionally change the number of cores and amount of memory. See GCP Sizing and Limits.
      • Specify the boot disk type and size.
      • Optionally change the network name and subnetwork names. Be sure that the network you specify has port 9100 (TCP) exposed via a firewall rule. See Setting Up Firewall Rules.
  7. Read and accept the GCP Marketplace Terms of Service.
  8. Choose Deploy when you are done. Tamr Core begins deploying, which can take several minutes. When Tamr Core is successfully deployed, a summary page displays. This page includes the instance ID.
  9. Select the Instance link to retrieve the external IP address for accessing Tamr Core. This is the host address of your Tamr Core instance. You use it later as <hostname> in http://<hostname>:9100.
  10. Obtain a license key, username, and password by contacting Tamr Support. You must provide the license key and these credentials when accessing the Tamr Core instance via a browser.
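The external IP address can also be read with the Google Cloud SDK instead of the console. This is a minimal sketch; the function names are hypothetical, and the instance name and zone you pass in are whatever you chose during deployment.

```shell
# Sketch: look up the instance's external IP and build the Tamr Core URL.
# Function names are illustrative; instance name and zone are your own.

# Print the external (NAT) IP of the first network interface.
tamr_external_ip() {
  gcloud compute instances describe "$1" --zone="$2" \
    --format='get(networkInterfaces[0].accessConfigs[0].natIP)'
}

# Print the Tamr Core URL for a given host address.
tamr_url() {
  echo "http://$1:9100"
}
```

For example, tamr_url "$(tamr_external_ip my-tamr-vm us-east1-b)" prints the URL to open in a browser, where my-tamr-vm and us-east1-b are placeholder values.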

Configure the DMS

Use the Data Movement Service (DMS) to move data files from cloud storage into Tamr Core and to move exported datasets from Tamr Core to cloud storage. See Configuring the Data Movement Service.

Accessing the Tamr Core Instance

Note: The following procedure assumes that users in your team are already connected via your own VPC network.

  • To access the Tamr Core instance for the first time after it has been deployed, you need to have a license key and an initial set of credentials (username and password).
  • To access the Tamr Core instance on a regular basis (after you have provided the license key) inside a VPC, use: http://<hostname>:9100.

To access the Tamr instance for the first time and provide the license key

  1. In your browser, sign in to the Google Compute Engine Console at https://console.cloud.google.com/.
  2. Select the correct project and locate your Tamr instance.
  3. On the VM Instances page, SSH to the new VM Instance.
  4. From the SSH drop-down menu, select Open in Browser Window.
  5. Using the command line, provide the license key to Tamr Core:
    ${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh config:set TAMR_LICENSE_KEY="<license-key-value>"
    See Setting the License Key.
    Note: To connect via SSH, you can also use gcloud compute ssh or your own terminal with SSH.
  6. Restart Tamr Core:
    cd ${TAMR_UNIFY_HOME}/tamr
    ./stop-unify.sh
    ./stop-dependencies.sh
    ./start-dependencies.sh
    ./start-unify.sh
    See Restarting Tamr Core.
  7. Go to the following URL to access the Tamr Core instance: http://<hostname>:9100.
  8. Enter the set of credentials you received from Tamr. Change the password immediately.

Now you are able to use Tamr Core. To verify your installation, see Tamr Installation Verification Steps.

Checking Tamr Core Health Status

Use the Tamr Core health check API to check Tamr Core health status. The health API endpoint returns health checks for the service and for ZooKeeper, which Tamr Core uses for configuration management.

To check Tamr Core health status

  1. Open the health API endpoint at: http://<hostname>:9100/docs#!/.
  2. Navigate to service/health.
  3. Select Try it out, or use the curl command:
    curl -X GET --header 'Accept: application/json' 'http://<hostname>:9100/api/service/health'

If a health check failure occurs, restart the Tamr Core instance to recover from the failure. To help troubleshoot the instance, access the Help Center knowledge base.
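After a deployment or restart, the service can take a while to come up, so it can be convenient to poll the health endpoint rather than check it once. The sketch below wraps the curl command from this section in a retry loop. The function name, the retry count, and the delay are hypothetical, and it assumes that an HTTP 200 response means the endpoint is reachable. It does not inspect the individual checks in the response body.

```shell
# Sketch: poll the Tamr Core health endpoint until it answers HTTP 200.
# Assumption: HTTP 200 indicates the service is up; the body is not parsed.
wait_for_tamr() {
  local host="$1"
  local attempts="${2:-10}"   # assumed number of tries
  local delay="${3:-30}"      # assumed seconds between tries
  local url="http://$host:9100/api/service/health"
  local i=1
  local code
  while [ "$i" -le "$attempts" ]; do
    # Capture only the status code; fall back to 000 if curl itself fails.
    code=$(curl -s -o /dev/null -w '%{http_code}' \
      --header 'Accept: application/json' "$url") || code=000
    if [ "$code" = "200" ]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "not healthy after $attempts attempts" >&2
  return 1
}
```

For example, wait_for_tamr "<hostname>" 20 15 tries for up to about five minutes. If it returns a non-zero status, follow the recovery steps described above.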

Checking the Status of the Tamr Core License

To check the status of your Tamr Core license

  1. Open the health API endpoint at http://<hostname>:9020/docs#!/api/service/health.
  2. Select Try it out, or use the curl command:
    curl -X GET --header 'Accept: application/json' 'http://<hostname>:9020/api/service/health'
  3. Check that the license and health values in the response body are both true.

If true is not returned, contact Tamr Support to request a new license.

Starting and Stopping Tamr Core

Start and stop Tamr Core using these scripts located in the ${TAMR_UNIFY_HOME}/tamr directory:

  • ./stop-unify.sh
  • ./stop-dependencies.sh
  • ./start-dependencies.sh
  • ./start-unify.sh

For information, see Restarting Tamr Core.
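A full restart runs the four scripts in the order listed above. As a rough sketch, the sequence can be wrapped in a function that stops at the first script that fails, so the start scripts never run after a failed stop. The function name is hypothetical, and the sketch assumes TAMR_UNIFY_HOME is set in the shell where you run it.

```shell
# Sketch: restart Tamr Core by running the four scripts in order,
# stopping at the first failure. Assumes TAMR_UNIFY_HOME is set.
restart_tamr() {
  local tamr_dir="${TAMR_UNIFY_HOME:?TAMR_UNIFY_HOME is not set}/tamr"
  local script
  for script in stop-unify.sh stop-dependencies.sh \
                start-dependencies.sh start-unify.sh; do
    echo "running $script"
    # Run each script from the tamr directory, in a subshell so the
    # caller's working directory is left unchanged.
    (cd "$tamr_dir" && "./$script") || {
      echo "failed: $script" >&2
      return 1
    }
  done
}
```

Run restart_tamr on the Tamr Core instance as a user with permission to execute the scripts.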

Security

To ensure secure access, users in your team must have access to your team’s VPC that is used for the GCP project containing your instance.

Tamr recommends that you use GCP storage volume encryption to protect your data. See Data Encryption Options in the GCP documentation.

Also see Securely connecting to VM instances and the Google Cloud Security documentation.

Costs

The cost of running the Tamr Core instance is a combination of:

  • Tamr Core cost. Tamr Core cost is per license, with additional cost for optional services and support. To obtain a license key, contact Tamr Support.
  • Google infrastructure costs for the virtual machine on which you are running Tamr Core. See Google VM Instance Pricing.
  • GCP storage costs. Optionally, you can choose to store Tamr Core backups in Google storage. See Disks and images pricing.

Scaling Up

To scale your Tamr Core deployment on GCP, use individual sizing increases for your Google compute instance. If you need additional storage, attach an external storage drive in GCP. For scaling out your deployment, contact your Tamr account representative or Tamr Support.
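Changing the machine type of a Compute Engine instance requires the instance to be stopped first. The sketch below only prints the gcloud commands for such a resize so that you can review them before running anything. The function name is hypothetical, the instance name, zone, and machine type are placeholders, and it assumes you have already stopped Tamr Core on the instance with the stop scripts.

```shell
# Sketch: print the gcloud commands for resizing a stopped instance.
# Nothing is executed; arguments are placeholders for your own values.
# Assumption: Tamr Core has already been stopped on the instance.
resize_plan() {
  local instance="$1"
  local zone="$2"
  local machine_type="$3"
  echo "gcloud compute instances stop $instance --zone=$zone"
  echo "gcloud compute instances set-machine-type $instance --zone=$zone --machine-type=$machine_type"
  echo "gcloud compute instances start $instance --zone=$zone"
}
```

After the instance starts again, start Tamr Core with the start scripts and check its health status. Confirm the target machine type against GCP Sizing and Limits and your quota before resizing.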

Backups and Disaster Recovery

Take regular backups of Tamr Core and keep the backups in Google storage in a different zone than your GCP instance. To create backups, use the Tamr Core backup API. For information, see Backup.

Upgrades

Tamr releases new software versions frequently. While Tamr strives to maintain the most recent version available on Google Cloud Platform Marketplace, your instance version may not be the latest and is not automatically upgraded. To upgrade to the most recent version, or to create a custom deployment in Google Cloud Platform, contact Tamr Support.

Monitoring

Google Cloud's operations suite (formerly Stackdriver) is GCP's built-in cloud monitoring tool. It is designed to monitor, troubleshoot, and improve cloud infrastructure and application performance.

Logging

You can use Tamr Core logs, DMS logs, and GCP's logging services for Spark, HBase, and PostgreSQL. See Logging in Cloud Platform Deployments.

Support

For technical support, contact Tamr Support.

