Deploying Single-Node Tamr Core on Google Cloud Platform
Deploy a production-grade Tamr Core solution to a single instance in Google Cloud Platform (GCP) in a few clicks.
This topic describes deployment steps for launching and accessing Tamr on GCP Marketplace, as well as basic network and security requirements, status checks, costs, backups, and support.
Requirements for Users Deploying on GCP
Users deploying Tamr Core on GCP must have PostgreSQL experience and basic Linux experience. Users also must have permissions to run bash scripts and commands on the GCP Tamr Core instance.
What Is Included in the Tamr Marketplace Image on GCP?
The Tamr Core software offered on Google Cloud Platform (GCP) Marketplace is a pre-configured and fully integrated package. It allows you to run expert-assisted machine learning jobs on vast amounts of data.
The Tamr GCP Marketplace Image has the following characteristics:
- It is a single virtual machine (VM) image for the Linux Platform (Ubuntu 22.04).
- The image includes:
  - A version of the Tamr Core software, optimized for use with Google Cloud Platform Compute Engine.
  - Software dependencies that Tamr Core requires, including HBase, Spark on YARN, Elasticsearch, a PostgreSQL instance, and ZooKeeper. For a diagram that shows these components, see Deployments. The list of licenses for open source software used in the Tamr package is included in the home/ubuntu/tamr/licenses/unify-licenses/licenses directory. After you deploy the instance, you can access this directory by connecting to the instance via SSH.
- The image uses the BYOL (Bring Your Own License) model in Google Marketplace, which requires you to have a valid license to use it. You are responsible for purchasing and managing your own license from Tamr. For information, see Accessing the Tamr Core Instance in this guide.
Before You Begin
Before you deploy Tamr Core, complete or verify the following:
- You have a Google Cloud account. To create an account, see Get Started with Google Cloud Platform.
- You have sufficient Google Compute Engine (GCE) resource quota limits and your quota limits allow you to create instances with characteristics that Tamr Core requires. For more information, see GCP Sizing and Limits in this guide.
- You have installed the Google Cloud SDK. You can use it to manage your instance after it is deployed. See Google Cloud SDK Quickstarts in the GCP documentation.
- You have generated SSH keys. You will need the keys to connect to your instance with SSH. See Managing SSH Keys in the GCP documentation.
- You are familiar with GCP access management and shared projects. Shared projects allow multiple users in your team to access any virtual machine instance created within the project. Project users can then establish an SSH connection to the GCP instance. To keep your VM instance and SSH key private, create a GCP project in a VPC and then create and launch your VM instances in this project. See Granting Access to Projects in the GCP documentation.
- You have set firewall rules for your instance. (See Setting Up Firewall Rules.) Firewall rules allow you to specify the type of access (internal access via a VPC network, or secure public access over HTTPS) and the ports that must be kept open for each type of connection. You will use these ports to access the Tamr Core user interface and run commands to check the health of the Tamr Core instance.
- For more information, use the links to the GCP documentation about security included in the Security section in this guide.
Setting Up Firewall Rules
Configure the firewall in GCP. See VPC Firewall Rules Overview in the Google Virtual Private Cloud documentation for instructions. Firewall configuration requirements:
- Allow only internal access to the Tamr Core default port 9100 (via TCP).
- Open port 443 for HTTPS, with a restrictive IP range that you specify using IPv4 addresses in CIDR notation, such as 1.2.3.4/32.
Note: If you plan to forward HTTP traffic to HTTPS, also open port 80.
Accessing Tamr via HTTPS
If you are not using a VPC, secure external access to Tamr Core over HTTPS by configuring NGINX as a reverse proxy. For more information, see Installing NGINX and Configuring HTTPS.
For non-production environments, configuring a firewall, NGINX, and HTTPS is strongly recommended but not required.
Important: If you do not configure a firewall, NGINX, and HTTPS in a non-production deployment, all users on the network will have access to the data. Use a unique password for this deployment.
To configure a firewall rule for the Tamr Core instance:
- Sign in to https://console.cloud.google.com.
- Select the correct project from the dropdown. This is the project that will contain the Tamr Core instance. This project’s VPC should allow access for the users in your team who will use Tamr Core.
- Select Products and Services.
- In the Networking section, select VPC Network > Firewall Rules.
- Select Create Firewall Rule and enter the following information:
  - Name: Tamr recommends the following naming format: “default-allow-9100”
  - Direction of traffic: Ingress
  - Action on match: Allow
  - Targets: All instances in the network
  - Source filter: IP ranges. Tamr recommends only allowing ingress traffic from a private VPC. If you wish to allow ingress traffic over the public Internet, specify a restrictive CIDR range only for port 443 and configure Tamr Core with an SSL certificate via an NGINX server reverse proxy. Tamr does not recommend allowing access via the public Internet on ports 80 or 9100. For more information, see Installing NGINX and Configuring HTTPS.
  - Source IP ranges: Specify the ranges for your VPC.
  - Protocols and ports: Select Specified protocols and ports, then enter “tcp:9100” for port 9100.
- Choose Create. Your new firewall rule should appear on the Firewall Rules page. This rule will affect all instances in the project.
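The console steps above can also be approximated from the command line. The following is a minimal sketch that uses the gcloud compute firewall-rules create command. The network name (default) and the source range (10.128.0.0/9) are placeholder assumptions, not values from this guide; substitute the network and IP ranges for your own VPC. By default the sketch only prints the command. Set DRY_RUN=0 to run it against your project.

```shell
# Sketch: create the port 9100 ingress rule described above with gcloud.
# NETWORK and VPC_RANGE are placeholders (assumptions); replace them with your own values.
NETWORK="${NETWORK:-default}"
VPC_RANGE="${VPC_RANGE:-10.128.0.0/9}"

# Print the command when DRY_RUN=1 (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Ingress, allow, tcp:9100, limited to the VPC range.
# With no target tags given, the rule applies to all instances in the network.
create_tamr_firewall_rule() {
  run gcloud compute firewall-rules create default-allow-9100 \
    --network="$NETWORK" \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:9100 \
    --source-ranges="$VPC_RANGE"
}

create_tamr_firewall_rule
```

Printing the command first makes it easy to review the source range before the rule is created.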
Deploying a Tamr Core Instance from the GCP Console
To deploy a Tamr Core instance:
- Sign in to GCP at https://console.cloud.google.com using your existing Google account.
- Select the correct project.
- Select Products and Services > Marketplace, or open the console at https://cloud.google.com/marketplace/ and choose Explore Marketplace.
- Search for and select the Tamr BYOL image.
- On the Tamr solution page, choose Launch on Compute Engine.
- Configure the Tamr GCP deployment:
- Select a zone.
- Select a machine type. Optionally change the number of cores and amount of memory. See GCP Sizing and Limits.
- Specify the boot disk type and size.
- Optionally change the network name and subnetwork names. Be sure that whichever network you specify has port 9100 (TCP) exposed via a firewall rule. See Setting Up Firewall Rules.
- Read and accept the GCP Marketplace Terms of Service.
- Choose Deploy when you are done. Tamr Core begins deploying. Note that this can take several minutes. A summary page displays when Tamr Core is successfully deployed. This page includes the instance ID.
- Select the Instance link to retrieve the external IP address for accessing Tamr Core. This is the host address of your Tamr instance. You will later use it in http://<hostname>:9100.
- Obtain a license key, username, and password by contacting Tamr Support at [email protected]. You must provide the license key and these credentials when accessing the Tamr Core instance via a browser.
- Change the default database password. See Configuring PostgreSQL for instructions.
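If you have the Google Cloud SDK installed, you can also look up the external IP address from the command line instead of the summary page. The following is a sketch only: the instance name (tamr-core-1) and zone (us-east1-b) are hypothetical placeholders, and the lookup assumes the instance has an external address on its first network interface.

```shell
# Sketch: look up the instance's external IP and build the Tamr Core URL.
# INSTANCE and ZONE are hypothetical placeholders; replace them with your own values.
INSTANCE="${INSTANCE:-tamr-core-1}"
ZONE="${ZONE:-us-east1-b}"

# Query the external (NAT) IP address of the first network interface.
get_external_ip() {
  gcloud compute instances describe "$INSTANCE" --zone="$ZONE" \
    --format='get(networkInterfaces[0].accessConfigs[0].natIP)'
}

# Build the URL on the default Tamr Core port, 9100.
tamr_url() {
  printf 'http://%s:9100\n' "$1"
}

# Usage (requires an authenticated gcloud session):
#   tamr_url "$(get_external_ip)"
```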
Accessing the Tamr Core Instance
Note: The following procedure assumes that users in your team are already connected via your own VPC network.
- To access the Tamr Core instance for the first time after it has been deployed, you need to have a license key and an initial set of credentials (username and password).
- To access the Tamr Core instance on a regular basis (after you have provided the license key) inside a VPC, use: http://<hostname>:9100
To access the Tamr instance for the first time and provide the license key:
- In your browser, sign in to the Google Compute Engine Console at https://console.cloud.google.com/.
- Select the correct project and locate your Tamr instance.
- On the VM Instances page, SSH to the new VM Instance.
- From the SSH drop-down menu, select Open in Browser Window.
- Using the command line, provide the license key to Tamr:
${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh config:set TAMR_LICENSE_KEY="<license-key-value>"
See Setting the License Key.
Note: To connect via SSH, you can also use the gcloud compute ssh command or your own terminal with SSH.
- Restart Tamr Core:
cd ${TAMR_UNIFY_HOME}/tamr
./stop-unify.sh
./stop-dependencies.sh
./start-dependencies.sh
./start-unify.sh
See Restarting Tamr Core.
- Go to the following URL to access the Tamr Core instance: http://<hostname>:9100
- Enter the set of credentials you received from Tamr. Change the password immediately.
Now you are able to use Tamr Core. To verify your installation, see Tamr Installation Verification Steps.
Checking Tamr Core Health and License Status
You can use the Tamr Core health check API to check Tamr Core health status. The health API endpoint returns health checks for the service and for ZooKeeper, which Tamr Core uses for configuration management.
To check Tamr Core health status:
- Open the health API endpoint at: http://<hostname>:9100/docs#!/
- Navigate to service/health.
- Select Try it out, or use the following cURL command:
curl -X GET --header 'Accept: application/json' 'http://<hostname>:9100/api/service/health'
If a health check failure occurs, restart the Tamr Core instance to recover from the failure. To help troubleshoot the instance, access the Tamr Core Help Center.
If you receive a "Tamr license is not valid" error on login, see the Tamr Core Help Center and the topic "To update a license key set by a configuration file".
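After a restart, it can take some time before the health endpoint responds. The following sketch polls the endpoint shown in the cURL command above until it returns HTTP status 200. Treating a 200 status as "healthy" is an assumption made for illustration; inspect the JSON response body for the individual service and ZooKeeper checks. The function name, attempt count, and delay are hypothetical.

```shell
# Sketch: poll the health endpoint until it answers with HTTP 200.
# Assumption: a 200 status means the service is up; the JSON body holds the details.

# wait_for_health <base-url> [attempts] [delay-seconds]
wait_for_health() {
  base_url="$1"
  attempts="${2:-10}"
  delay="${3:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
    code="$(curl -s -o /dev/null -w '%{http_code}' \
      --header 'Accept: application/json' "${base_url}/api/service/health")" || code="000"
    if [ "$code" = "200" ]; then
      echo "healthy after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no healthy response after ${attempts} attempt(s)" >&2
  return 1
}

# Usage: wait_for_health "http://<hostname>:9100"
```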
Starting and Stopping Tamr Core
Start and stop Tamr Core using these scripts, located in the ${TAMR_UNIFY_HOME}/tamr directory:
./stop-unify.sh
./stop-dependencies.sh
./start-dependencies.sh
./start-unify.sh
For information, see Restarting Tamr.
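For a full restart, the four scripts can be wrapped in a small helper that runs them in the order listed above and stops at the first failure. This is a sketch, not a Tamr-provided utility: the function name restart_tamr is hypothetical, and aborting on the first failed script is a conservative design choice that may not suit every situation.

```shell
# Sketch: run the four scripts in the documented order and stop at the first failure.
# restart_tamr [tamr-dir]; the directory defaults to ${TAMR_UNIFY_HOME}/tamr.
restart_tamr() {
  tamr_dir="${1:-${TAMR_UNIFY_HOME}/tamr}"
  for script in stop-unify.sh stop-dependencies.sh start-dependencies.sh start-unify.sh; do
    echo "running ${script}"
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$tamr_dir" && "./${script}") || {
      echo "${script} failed; aborting restart" >&2
      return 1
    }
  done
}

# Usage: restart_tamr
```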
Security
To ensure secure access, users in your team must have access to your team’s VPC that is used for the GCP project containing your instance.
Tamr recommends that you use GCP storage volume encryption to protect your data. See Data Encryption Options in the GCP documentation.
Note: Encryption at rest: Because a standard Tamr Core deployment on cloud infrastructure uses dedicated service instances, Tamr Core depends on service-level encryption for encryption at rest. This ensures that Tamr Core’s data is encrypted using keys that are distinct from those used by other applications, while also allowing keys to be managed using the cloud provider’s standard key management service.
Also see Securely connecting to VM instances and the Google Cloud Security documentation.
Costs
The cost of running the Tamr Core instance is a combination of:
- Tamr Core cost. Tamr Core cost is per license, with additional cost for optional services and support. To obtain a license key, contact Tamr Support at [email protected].
- Google infrastructure costs for the virtual machine on which you are running Tamr Core. See Google VM Instance Pricing.
- GCP storage costs. Optionally, you can choose to store Tamr Core backups in Google storage. See Disks and images pricing.
Scaling Up
To scale up your Tamr Core deployment on GCP, increase the size of your Google compute instance. If you need additional storage, attach an external storage drive in GCP. To scale out your deployment, contact your Tamr account representative or Tamr Support at [email protected].
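One way to increase the instance size is to change its machine type with the Google Cloud SDK, which requires the VM to be stopped first. The following is a sketch under stated assumptions: the instance name, zone, and machine type are hypothetical placeholders, and it assumes you stop Tamr Core before stopping the VM and start it again afterward. By default the sketch only prints the commands. Set DRY_RUN=0 to run them.

```shell
# Sketch: resize the VM by stopping it, changing the machine type, and starting it again.
# INSTANCE, ZONE, and MACHINE_TYPE are hypothetical placeholders; replace them with your own values.
INSTANCE="${INSTANCE:-tamr-core-1}"
ZONE="${ZONE:-us-east1-b}"
MACHINE_TYPE="${MACHINE_TYPE:-n2-highmem-16}"

# Print the command when DRY_RUN=1 (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Each step runs only if the previous one succeeded.
resize_instance() {
  run gcloud compute instances stop "$INSTANCE" --zone="$ZONE" &&
    run gcloud compute instances set-machine-type "$INSTANCE" --zone="$ZONE" --machine-type="$MACHINE_TYPE" &&
    run gcloud compute instances start "$INSTANCE" --zone="$ZONE"
}

resize_instance
```

If the instance uses an ephemeral external IP address, the address may change after the VM restarts, so confirm the host address before reconnecting.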
Backups and Disaster Recovery
Take regular backups of Tamr Core and keep the backups in Google storage in a different zone than your GCP instance. To create backups, use the Tamr Core backup API. For information, see Backup.
Upgrades
Tamr releases new software versions frequently. While Tamr strives to maintain the most recent version available on Google Cloud Platform Marketplace, your instance version may not be the latest and is not automatically upgraded. To upgrade to the most recent version, or to create a custom deployment in Google Cloud Platform, contact Tamr Support at [email protected].
Configuring Core Connect
To move data files from cloud storage into Tamr Core, and to export datasets from Tamr Core to cloud storage, use the Core Connect service. See Configuring Core Connect.
Monitoring
Google Cloud's operations suite (formerly Stackdriver) is GCP's built-in cloud monitoring tool. It is designed to monitor, troubleshoot, and improve cloud infrastructure and application performance.
Logging
You can use Tamr Core logs and GCP's logging services for Spark, HBase, and PostgreSQL. See Logging in Cloud Platform Deployments.
Support
For technical support, contact Tamr Support at [email protected].