Install for High Availability on AWS

On AWS infrastructure, the Designer Cloud Powered by Trifacta platform can be deployed in a high availability failover mode across multiple nodes. This section describes the process for installing the platform across multiple, highly available nodes.

Note

This section applies to customer-managed deployments of the Designer Cloud Powered by Trifacta platform on AWS.

Limitations

The following limitations apply to this feature:

  • This form of high availability is not supported for Marketplace installations.

  • During installation, the platform is configured to use the same account to access AWS resources. Per-user authentication must be set up afterward.

Prerequisites

Before you begin, please verify that you have met the following requirements.

AWS infrastructure

  • AWS account

  • EKS cluster (see below)

  • S3 bucket:

    • S3 is required for the base storage layer.

    • A set of permissions must be enabled for the accounts or IAM roles used to access the bucket. For more information, see S3 Access in the Configuration Guide.

  • EMR cluster. For more information, see Configure for EMR in the Configuration Guide.

  • Amazon RDS database:

    • The Alteryx databases must be hosted on the same instance and port in Amazon RDS.

    • PostgreSQL 9.6 or 12.3

    • To ensure sufficient database connections, the instance size must be larger than m4.large.

    • The actual databases are installed as part of the installation process.

  • EFS mounts:

EKS cluster

  • Kubernetes version 1.15+

  • Subnets are available across multiple zones

Note

Avoid using the default namespace, which is likely to be shared by other applications using your cluster.

Instance types:

Note

Instance sizes should be larger than m4.2xlarge.

|            | Minimum       | Recommended   |
|------------|---------------|---------------|
| Cores      | 8             | 16            |
| RAM        | 12 GB         | 16 GB         |
| Disk space | 10 GB minimum | 10 GB minimum |

Note

If you are publishing to S3, additional disk space should be reserved for a higher number of concurrent users or larger data volumes. For more information on fast upload to decrease disk requirements, see S3 Access in the Configuration Guide.

For more information on installing and managing an EKS cluster, see https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html.

CLIs

The following command line interfaces are referenced as part of this install process:

  • awscli

  • aws-iam-authenticator

  • kubectl

  • helm (version 3)

  • docker
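
Before you start, it can help to confirm that each of these CLIs is on your PATH. The following is a minimal sketch and not part of the official install process. The function names missing_clis and check_install_clis are hypothetical. Note that the awscli package installs an executable named aws.

```shell
# Hypothetical helper: print the name of every command that is not on PATH.
missing_clis() {
  local cli
  for cli in "$@"; do
    command -v "$cli" >/dev/null 2>&1 || printf '%s\n' "$cli"
  done
}

# Return non-zero if any CLI referenced by this install process is missing.
check_install_clis() {
  local missing
  missing=$(missing_clis aws aws-iam-authenticator kubectl helm docker)
  if [ -n "$missing" ]; then
    # Unquoted on purpose: print one message per missing name.
    printf 'Missing CLI: %s\n' $missing >&2
    return 1
  fi
}
```

Run check_install_clis after sourcing the snippet. Because Helm version 3 is required, you may also want to inspect the output of helm version.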

Alteryx assets

The following assets are available from the Alteryx FTP site:

  1. Alteryx image file. This tar file contains the platform software to install.

  2. Alteryx helm package. This setup bundle tar file includes:

    1. TGZ file

    2. Override template file for configuring initial values

  3. Alteryx license key file. After you install the software, you must upload the license key file through the application. For more information, see License Key.

Install Steps

Configure Docker image

Please complete the following steps to download and configure the Docker image for use.

Steps:

  1. Create an AWS Elastic Container Registry (ECR) repository to store Alteryx images. For more information, see https://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-create.html.

  2. Download the image file from the Alteryx FTP site. Image filename should be in the following format:

    trifacta-docker-image-ha.x.y.z.tar

    where:

    x.y.z maps to the Release number (Release 7.6.0).

  3. Load the image file into your ECR repository. For more information, see https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html.

  4. The image file has been loaded into the repository.
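
The load and push steps above can be sketched as follows. This is an illustration under stated assumptions, not the vendor's procedure: the function names are hypothetical, the registry host name follows the standard <account>.dkr.ecr.<region>.amazonaws.com pattern for commercial AWS regions, and the image reference inside the tar file is not documented here, so you must pass in whatever docker load reports.

```shell
# Build the ECR registry host name for an AWS account ID and region.
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Hypothetical wrapper around the load, tag, and push sequence.
# Arguments:
#   $1 image tar file downloaded from the FTP site
#   $2 image reference that "docker load" reports for that tar file
#   $3 AWS account ID        $4 AWS region
#   $5 ECR repository name   $6 tag to push
push_image_to_ecr() {
  local tar_file="$1" loaded_ref="$2" account="$3" region="$4"
  local repo="$5" tag="$6" registry target
  registry=$(ecr_registry "$account" "$region")
  target="$registry/$repo:$tag"

  docker load -i "$tar_file" || return 1
  # Authenticate Docker to the registry with a short-lived token.
  aws ecr get-login-password --region "$region" |
    docker login --username AWS --password-stdin "$registry" || return 1
  docker tag "$loaded_ref" "$target" || return 1
  docker push "$target"
}
```

The resulting <registry>/<repository> path is the kind of value that image.repository in the values override file expects.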

Configure AWS Kubernetes

Prerequisites:

  • AWS Kubernetes cluster is operational.

  • These steps use the AWS CLI and kubectl to configure your Kubernetes deployment on AWS.

Steps:

  1. Configure the AWS CLI to use the eks-admin user for your Kubernetes cluster. For more information, see https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html.

  2. Update the Kubernetes configuration (update-kubeconfig):

    aws eks update-kubeconfig --name <eks-cluster-name> --region <aws-region>

    where:

    <eks-cluster-name> is the name of the EKS cluster to use for the deployment.

    <aws-region> is the region name where the cluster is located.

    Tip

    Retain the EKS cluster name and region. These values may be used later during configuration.

  3. Switch to the namespace in the above cluster:

    Note

    Avoid using the default namespace, which is likely to be shared by other applications using your cluster.

    kubectl config set-context --current --namespace=<namespace>
  4. Verify that you are ready to use the namespace in the cluster:

    kubectl get pods
  5. The cluster is ready for use.
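
If the dedicated namespace does not exist yet, it has to be created before you can switch to it. The sketch below is one possible way to do that. The function names are hypothetical, and the name check reflects the Kubernetes rule that a namespace name is a DNS label of at most 63 characters.

```shell
# Return 0 if the name can be used as a dedicated namespace: lowercase
# letters, digits, and hyphens only, alphanumeric first and last
# character, 63 characters or fewer, and not "default".
valid_namespace() {
  local ns="$1"
  [ "$ns" != "default" ] || return 1
  [ "${#ns}" -le 63 ] || return 1
  printf '%s\n' "$ns" | LC_ALL=C grep -Eq '^[a-z0-9]([a-z0-9-]*[a-z0-9])?$'
}

# Create the namespace if it does not exist, then make it the current one.
use_namespace() {
  local ns="$1"
  valid_namespace "$ns" || { echo "Invalid namespace: $ns" >&2; return 1; }
  kubectl get namespace "$ns" >/dev/null 2>&1 ||
    kubectl create namespace "$ns" || return 1
  kubectl config set-context --current --namespace="$ns"
}
```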

Configure DB credential secrets

For each of the Alteryx databases that you have installed, you must set up database credential secrets. Please use the following pattern for configuring your database secrets.

Note

Except for db-credentials-admin, each of these secrets maps to a specific Alteryx database. db-credentials-admin is the username/password of the admin user of the RDS instance. The admin credentials are used to create and initialize all Alteryx databases.

kubectl create secret generic db-credentials-webapp --from-literal=username=<db_username> --from-literal=password=<db_password>
kubectl create secret generic db-credentials-scheduling-service --from-literal=username=<db_username> --from-literal=password=<db_password>
kubectl create secret generic db-credentials-time-based-trigger-service --from-literal=username=<db_username> --from-literal=password=<db_password>
kubectl create secret generic db-credentials-artifact-storage-service --from-literal=username=<db_username> --from-literal=password=<db_password>
kubectl create secret generic db-credentials-authorization-service --from-literal=username=<db_username> --from-literal=password=<db_password>
kubectl create secret generic db-credentials-configuration-service --from-literal=username=<db_username> --from-literal=password=<db_password>
kubectl create secret generic db-credentials-job-metadata-service --from-literal=username=<db_username> --from-literal=password=<db_password>
kubectl create secret generic db-credentials-secure-token-service --from-literal=username=<db_username> --from-literal=password=<db_password>
kubectl create secret generic db-credentials-job-service --from-literal=username=<db_username> --from-literal=password=<db_password>
kubectl create secret generic db-credentials-contract-service --from-literal=username=<db_username> --from-literal=password=<db_password>
kubectl create secret generic db-credentials-orchestration-service --from-literal=username=<db_username> --from-literal=password=<db_password>
kubectl create secret generic db-credentials-optimizer-service --from-literal=username=<db_username> --from-literal=password=<db_password>
kubectl create secret generic db-credentials-batch-job-runner --from-literal=username=<db_username> --from-literal=password=<db_password>
kubectl create secret generic db-credentials-admin --from-literal=username=<db_username> --from-literal=password=<db_password>

where:

  • <db_username> = username to access the specified database.

  • <db_password> = password corresponding to the specified database.
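
Because the commands above differ only in the secret name, they can be generated in a loop. The following sketch prompts for each username and password instead of placing them on the command line. The function names and the prompting approach are assumptions for illustration; the secret names are the ones listed above.

```shell
# Suffixes of the secret names listed above.
DB_SECRET_SUFFIXES="webapp scheduling-service time-based-trigger-service
artifact-storage-service authorization-service configuration-service
job-metadata-service secure-token-service job-service contract-service
orchestration-service optimizer-service batch-job-runner admin"

# Print the full name of each secret, one per line.
db_secret_names() {
  local suffix
  for suffix in $DB_SECRET_SUFFIXES; do
    printf 'db-credentials-%s\n' "$suffix"
  done
}

# Prompt for each username/password pair and create the secret.
create_db_secrets() {
  local name user pass
  for name in $(db_secret_names); do
    printf 'Username for %s: ' "$name" >&2
    read -r user
    printf 'Password for %s: ' "$name" >&2
    stty -echo; read -r pass; stty echo   # do not echo the password
    echo >&2
    kubectl create secret generic "$name" \
      --from-literal=username="$user" \
      --from-literal=password="$pass" || return 1
  done
}
```

Afterward, kubectl get secrets lists the secrets that were created in the current namespace.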

Configure the deployment

Steps:

  1. Unpack the tar file obtained from the FTP site:

    tar -xf trifacta-ha-setup-bundle-x.y.z.tar

    where:

    x.y.z maps to the Release number (Release 7.6.0).

  2. The package contains:

    1. A values override template file: values.override.template.yaml

    2. An Alteryx helm package tgz file

  3. Create a copy of the value overrides template file:

    cp values.override.template.yaml values.override.yaml
  4. Edit the values.override.yaml file. Instructions are below.

Edit values overrides

Example file:

# Template for minimal configuration
# to get a High-availability deployment of Trifacta up and running
 
replicaCount: 2
image:
  repository: "<PATH TO IMAGE_REPO>"
loadBalancer:
  ssl:
    # ARN To certificate in ACM
    certificateARN: arn:aws:acm:XXXX:certificate/XXXXXXX
nfs:
  conf:
    server: "<NFS SERVER HOST>"
    path: "/"
  logs:
    server: "<NFS SERVER HOST>"
    path: "/"
database:
  host: "<DATABASE HOST>"
  port: "5432"
  type: postgresql
 
triconfOverrides:
  "aws.accountId" : "<AWS ACCT_ID>"
  "aws.credentialProvider": "<AWS CRED PROVIDER>"
  "aws.systemIAMRole": "arn:aws:iam::XXXX:role/XXXXXX"
  "aws.s3.bucket.name": "<AWS S3 BUCKET NAME>"
  "aws.s3.key": "<AWS S3 KEY>"
  "aws.s3.secret": "<AWS S3 SECRET>"
 
# Enable a fluentd Statefulset to collect application logs.
fluentd:
  enabled: true
  # Specify values overrides for fluentd chart here
 
# Enable a fluent DaemonSet to collect node, K8s dataplane and cluster logs
fluentd-daemonset:
  enabled: false
  # Specify values overrides for the fluentd-daemonset chart here
 
# Cluster details must be specified if fluentd logging is enabled
global:
  cluster:
    name: "<CLUSTER NAME>" # EKS Cluster name
    region: "<CLUSTER REGION>" # EKS Cluster region

Tip

Paths to values are listed below in JSON notation (item.item.item).

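| Value            | Description                                                            |
|------------------|------------------------------------------------------------------------|
| replicaCount     | Number of replica nodes of the Trifacta node to maintain as failovers. |
| image.repository | AWS path to the ECR image repository that you created.                 |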

Configure SSL

By default, SSL is enabled, and a certificate is required.

SSL certificate requirements:

  • SSL security is served through the AWS LoadBalancer that serves the Designer Cloud Powered by Trifacta platform.

    • For more information on the supported SSL configurations, see the values.yaml file provided in the Alteryx helm package.

  • The SSL certificate must be issued for the FQDN of the Designer Cloud Powered by Trifacta platform.

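| Value                           | Description                                                     |
|---------------------------------|-----------------------------------------------------------------|
| loadBalancer.ssl.certificateARN | The ARN for the SSL certificate in the AWS Certificate Manager. |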

The certificate ARN value references the ARN stored in the AWS Certificate Manager, or you can import your own certificate into ACM. For more information, see https://docs.aws.amazon.com/acm/latest/userguide/import-certificate.html.
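
Before you paste the ARN into the values override file, you can check its shape and look up the domain that the certificate was issued for. This is a sketch only: the function names are hypothetical, and the pattern checks only the general ARN layout (partition, region, 12-digit account ID, certificate ID).

```shell
# Return 0 if the value has the general shape of an ACM certificate ARN:
# arn:<partition>:acm:<region>:<12-digit account ID>:certificate/<id>
is_acm_certificate_arn() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq \
    '^arn:aws[a-z-]*:acm:[a-z0-9-]+:[0-9]{12}:certificate/[0-9a-f-]+$'
}

# Print the primary domain name of the certificate so that it can be
# compared with the FQDN of the platform.
certificate_domain() {
  aws acm describe-certificate --certificate-arn "$1" \
    --query 'Certificate.DomainName' --output text
}
```

The placeholder value from the template (arn:aws:acm:XXXX:certificate/XXXXXXX) fails the first check, which makes an unedited file easy to catch.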

To disable:

To disable SSL, please apply the following configuration changes:

loadBalancer:
    ssl:
        enabled: false

EFS Mount points

The following values are used to define the locations of the mount points for storing configuration and log data.

Note

You should have reserved at least 10 GB for each mount point.

| Value           | Description                                                                     |
|-----------------|---------------------------------------------------------------------------------|
| nfs.conf.server | Host of the NFS server for the configuration mount point                        |
| nfs.conf.path   | On the conf server, the path to the storage area. Default is the root location. |
| nfs.logs.server | Host of the NFS server for the logging mount point                              |
| nfs.logs.path   | On the logs server, the path to the storage area. Default is the root location. |

Databases

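| Value         | Description                                                                                                                                              |
|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|
| database.host | Host of the Amazon RDS databases. Note: All Alteryx databases must be hosted on the same RDS instance and available through the same port.               |
| database.port | Port number through which to access the RDS databases. The default value is 5432.                                                                        |
| database.type | The type of database. Please leave this value as postgresql.                                                                                             |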

trifacta-conf.json overrides

Below you can specify values that are applied to trifacta-conf.json, which is the platform configuration file. For more information on these settings, see Configure for AWS in the Configuration Guide.

Value

Description

triconfOverrides.aws.accountId

The AWS account identifier to use when connecting to AWS resources.

triconfOverrides.aws.credentialProvider

The type of credential provider to use for individuals authenticating to AWS resources.

Note

During installation, the platform is configured to use the same account to access AWS resources. Per-user authentication must be set up afterward.

Supported values:

  • default - credentials are submitted as an AWS key/secret combination.

  • temporary - credentials are submitted using the same IAM role for all users.

    Tip

    Using a temporary credential provider is recommended.

Details are below.

triconfOverrides.aws.systemIAMRole

When the credential provider is set to temporary, this value defines the system-wide IAM role to use to access AWS.

triconfOverrides.aws.s3.key

When the credential provider is set to default, this value defines the AWS key to use for authentication.

triconfOverrides.aws.s3.secret

When the credential provider is set to default, this value defines the AWS secret for the AWS key.

triconfOverrides.aws.s3.bucket.name

The default S3 bucket to use.

Note

The AWS account must have read/write access to this bucket.

After the platform is operational, you can apply additional configuration changes to this file through the command line or through the application. For more information, see Platform Configuration Methods in the Configuration Guide.
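
The dependency between the credential provider and the other settings can be expressed as a small pre-check. The sketch below covers only the two provider values listed above. The function name and argument order are hypothetical.

```shell
# Hypothetical pre-check for the credential settings described above.
# Arguments: provider, system IAM role, S3 key, S3 secret ("" if unset).
check_aws_credentials() {
  local provider="$1" role="$2" key="$3" secret="$4"
  case $provider in
    temporary)
      # A system-wide IAM role is needed for the temporary provider.
      [ -n "$role" ] ||
        { echo "aws.systemIAMRole is required" >&2; return 1; } ;;
    default)
      # A key/secret pair is needed for the default provider.
      { [ -n "$key" ] && [ -n "$secret" ]; } ||
        { echo "aws.s3.key and aws.s3.secret are required" >&2; return 1; } ;;
    *)
      echo "Unsupported credential provider: $provider" >&2
      return 1 ;;
  esac
}
```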

Configure fluentd

When enabled, a separate set of fluentd pods is launched to collect and forward Alteryx logs.

Value

Description

fluentd.enabled

When set to true, a fluentd Statefulset is deployed to collect application logs.

You can specify value overrides to the fluentd chart in the following manner:

fluentd:
  image:
    repository: fluent/fluentd-kubernetes-daemonset
    tag: "v1.10.4-debian-cloudwatch-1.0"

See charts/fluentd/values.yaml in the helm package for supported values.

fluentd-daemonset.enabled

When set to true, a fluentd DaemonSet is deployed to collect node, Kubernetes, dataplane, and cluster logs.

If either of the above fluentd logging options is enabled, the following must be specified:

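| Value                 | Description                                                             |
|-----------------------|-------------------------------------------------------------------------|
| global.cluster.name   | This value is the name of the EKS cluster that you created.             |
| global.cluster.region | This value is the name of the region where the EKS cluster was created. |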
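
After editing values.override.yaml, a quick search can confirm that no template placeholders are left. This is a minimal sketch that assumes placeholders keep the forms used in the example file, either an uppercase name in angle brackets or a run of XXXX. The function name is hypothetical.

```shell
# Print each line (with its line number) that still contains a template
# placeholder such as <DATABASE HOST> or XXXX. Returns non-zero if no
# placeholder is found.
find_placeholders() {
  LC_ALL=C grep -nE '<[A-Z][A-Z0-9 _]*>|XXXX' "$1"
}
```

For example, find_placeholders values.override.yaml prints nothing when every placeholder has been replaced.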

Configure fluentd

Optionally, you can enable fluentd to collect application logs.

Log destinations:

The log source for fluentd is the Alteryx log directory.

The log destination must be configured. For more information on the fluentd output plugins, see https://www.fluentd.org/dataoutputs.

  1. Create a logdestination.conf configuration file for your log destination, and then create a ConfigMap from the file:

    kubectl create configmap fluentd-log-destination --from-file logdestination.conf
  2. The logdestination.conf file must use fluentd configuration syntax. Below is an example logdestination.conf file, which pushes Alteryx logs to Amazon CloudWatch:

        <label @NORMAL>
          <match app.*>
            @type cloudwatch_logs
            @id out_cloudwatch_logs_application
            region "#{ENV.fetch('REGION')}"
            log_group_name "/aws/containerinsights/#{ENV.fetch('CLUSTER_NAME')}/application"
            log_stream_name_key stream_name
            auto_create_stream true
            json_handler yajl
            <buffer>
              flush_interval 5
              chunk_limit_size 2m
              queued_chunks_limit_size 32
              retry_forever true
            </buffer>
          </match>
        </label>

    For more information on fluentd configuration file syntax, see https://docs.fluentd.org/configuration/config-file.

  3. When configured, the logdestination.conf file is added as an add-on to the prepackaged fluentd configuration for the Designer Cloud Powered by Trifacta platform.
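
Before creating the ConfigMap, you may want to catch an unclosed section in logdestination.conf and preview the generated object. The following is a rough sketch: the balance check is a heuristic that only counts opening and closing directives and does not parse fluentd syntax, the function names are hypothetical, and the preview assumes a kubectl client that supports --dry-run=client (version 1.18 or later).

```shell
# Heuristic: the number of opening directives (<label ...>, <match ...>,
# <buffer>) should equal the number of closing ones (</label>, ...).
fluentd_sections_balanced() {
  local file="$1" opens closes
  opens=$(grep -cE '^[[:space:]]*<[^/]' "$file")
  closes=$(grep -cE '^[[:space:]]*</' "$file")
  [ "$opens" -eq "$closes" ]
}

# Print the ConfigMap that would be created, without creating it.
preview_log_destination() {
  kubectl create configmap fluentd-log-destination \
    --from-file logdestination.conf --dry-run=client -o yaml
}
```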

Install

Install software

After you have configured the values override file, you can use the following command to install the deployment using helm:

helm install trifacta <trifacta-helm-package-tgz-file> --namespace <namespace> --values <path-to-values-override-file>

where:

  • trifacta-helm-package-tgz-file = the name of the Helm package that you downloaded from the Alteryx FTP site.

  • namespace = the AWS Kubernetes namespace value.

  • path-to-values-override-file = the path in your local environment to the values override file.
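
One way to reduce surprises is to check the inputs and render the chart as a dry run before the real install. The wrapper below is a sketch under those assumptions. The function name is hypothetical, and the release name trifacta matches the command above.

```shell
# Hypothetical wrapper: verify the input files, simulate the install
# once with --dry-run, and then install the release.
install_trifacta() {
  local package="$1" namespace="$2" values="$3"
  [ -f "$package" ] ||
    { echo "Helm package not found: $package" >&2; return 1; }
  [ -f "$values" ] ||
    { echo "Values file not found: $values" >&2; return 1; }

  helm install trifacta "$package" --namespace "$namespace" \
    --values "$values" --dry-run >/dev/null || return 1
  helm install trifacta "$package" --namespace "$namespace" \
    --values "$values"
}
```

After the install, helm status trifacta --namespace <namespace> reports the state of the release.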

Acquire service URL

Use the following command to retrieve the service URL.

Note

The service URL is used to access the Trifacta Application, where you complete the configuration process. Users create Alteryx objects through the Trifacta Application.

kubectl get svc trifacta -o json | jq -r '.status.loadBalancer.ingress[0].hostname'
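
The load balancer host name may not be assigned immediately after installation, in which case the command above returns null. A polling loop such as the following sketch can wait for it. The function names, retry count, and delay are assumptions, and the lookup uses kubectl's jsonpath output instead of jq.

```shell
# Print the load balancer host name of the trifacta service, or nothing
# if it has not been assigned yet.
fetch_hostname() {
  kubectl get svc trifacta \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' 2>/dev/null
}

# Poll until a host name is available.
# Arguments: number of attempts (default 30), delay in seconds (default 10).
wait_for_hostname() {
  local attempts="${1:-30}" delay="${2:-10}" host="" i=0
  while [ "$i" -lt "$attempts" ]; do
    host=$(fetch_hostname)
    if [ -n "$host" ]; then
      printf '%s\n' "$host"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "No host name assigned after $attempts attempts" >&2
  return 1
}
```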

Verify access to the application

Copy and paste the service URL into a supported version of a supported web browser. For more information on supported web browsers, see Browser Requirements in the Planning Guide.

Tip

You can map CNAME/ALIAS against this service URL through Route53 configurations. For more information, see https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/getting-started.html.

The login screen for the Trifacta Application should be displayed. Log in to the application using the admin credentials.

Warning

You should change the administrator password as soon as you log in. For more information, see Change Admin Password in the Admin Guide.

For more information, see Login.

Other Commands

Scale platform

Scale the number of Designer Cloud Powered by Trifacta platform pods through kubectl:

kubectl scale statefulset trifacta --replicas=<Desired number of pods>
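
A scale request can be combined with a check on the input and a wait for the rollout to settle. This is an illustrative sketch. The function name is hypothetical, and kubectl rollout status is assumed to be usable with the trifacta StatefulSet, which requires the RollingUpdate update strategy.

```shell
# Scale the trifacta StatefulSet and wait for the rollout to finish.
scale_platform() {
  local replicas="$1"
  case $replicas in
    ''|*[!0-9]*)
      echo "Replica count must be a non-negative integer" >&2
      return 1 ;;
  esac
  kubectl scale statefulset trifacta --replicas="$replicas" || return 1
  kubectl rollout status statefulset trifacta
}
```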

Restart platform

Restart the Designer Cloud Powered by Trifacta platform through kubectl:

kubectl rollout restart statefulset trifacta

Delete pods

Use the following to delete Alteryx pods:

kubectl delete statefulset trifacta

Backups

Database

By default, Amazon RDS performs periodic backups of your installed databases.

For more information on manual backup of the databases, see Backup and Recovery in the Admin Guide.

EFS mounts

For more information on backing up your EFS mounts through AWS, see https://docs.aws.amazon.com/efs/latest/ug/awsbackup.html.

Configuration

Set S3 as base storage layer

You must configure access to S3.

Note

If you are publishing to S3, 50 GB or more is recommended for storage per node. Additional disk space should be reserved for a higher number of concurrent users or larger data volumes. You can also enable fast upload to decrease disk requirements.

For more information, see S3 Access in the Configuration Guide.

S3 must be set as the base storage layer. For more information, see Set Base Storage Layer in the Configuration Guide.

Upload the license file

When the platform is first installed, a temporary license is provided. This license key must be replaced by the license key that was provided to you. For more information, see License Key in the Admin Guide.

Configure for EMR

Additional configuration is required to enable the Designer Cloud Powered by Trifacta platform to run jobs on the EMR cluster. For more information, see Configure for EMR in the Configuration Guide.