Private Data Handling
Private data handling is a capability in Alteryx Analytics Cloud (AAC) that allows you to store your data and run data processing jobs in your own cloud infrastructure. Private data handling provides more security and control for those with sensitive data. It can also improve performance and reduce egress costs by moving AAC processing next to your data.
Warning
Never delete resources provisioned for private data processing.
Overview
At the highest level, AAC differentiates between customer data and application metadata.
Customer data belongs to you. This is any data from one of your data sources and any data derived from it. This includes records from your databases, spreadsheets, shared drives, and data warehouses that you want to join and merge, prep and blend, analyze, and train models on. It also includes outputs, reports, and datasets created from those records.
Application metadata is everything else. This is the data that AAC needs to do the jobs you ask it to do. This includes workspace layout and configuration, user sign-in, roles and permissions, shared assets, workflow names, and tool configuration. Some specific examples include:
Designer Cloud tool configurations and layouts.
Auto Insights Reports text inputs and analytics parameters (filter values, column names).
App Builder dropdown box contents.
User-generated content such as comments on Magic Reports.
AAC uses a split-plane architecture that divides responsibility for these 2 kinds of data between different planes to give customers more flexibility. These 2 planes are the control plane and the data plane.
Plane | Description |
---|---|
Control Plane | The control plane powers the user's design time experience, acts as the command and control center, and stores application metadata. |
Data Plane | The data plane is responsible for the persistent storage and processing of customer data. Persistent storage: Alteryx Analytics Cloud uses file storage and relational storage for long-term storage of customer data. Processing: Alteryx Analytics Cloud does many jobs that fall under the processing category. |
Private data handling allows you to run some or all of the data plane on your own infrastructure, giving you options for where data is stored and processed. It comprises two capabilities:
Private file storage: Use AAC to replace the Alteryx file store with your own cloud storage bucket. Once configured, all persistent file storage of customer data happens on your own disk. Private file storage supports AWS S3, Azure ADLS, and Google Cloud Storage.
Private data processing: This capability is conceptually similar to private file storage but applies to relational storage and processing. You can first configure your VPC, then tell Analytics Cloud to deploy a full data processing environment there. Once configured, all other data plane activities above are executed within your VPC.
Feature availability:
Feature | Availability |
---|---|
Private Data Storage | |
Private Data Processing | |
Architecture
When you configure private data processing for your workspace, the AAC control plane will initiate interactions with your private data store and the data processing environment inside your VPC. The environment also interacts with your data sources to connect to and retrieve data from them.
In some cases, customer data may be processed in the control plane. However, it is never stored or cached there beyond the duration of a session, with a maximum retention of 1 hour. Examples:
At Designer Cloud design-time, sample data is inspected and formatted in the control plane (for delimiter and header detection, column name and type inference, and the transform by example capability).
Auto Insights dataset ingestion does data inferencing and metadata extraction in the control plane.
Auto Insights does post-processing of query results to classify, transform, render, and present data.
Emails and PDF reports are generated in the control plane.
Gen AI prompts can include customer data and are generated in the control plane.
Gen AI responses can include customer data and are parsed and transformed in the control plane.
If you choose to download generated PDF reports, this also happens in the control plane.
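To make the session-scoped retention rule above concrete, here is a minimal Python sketch of a holder that forgets values when a session ends or after 1 hour, whichever comes first. This is an illustration only and not how AAC is implemented; the class name `SessionScratch`, its methods, and the injectable clock are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# The 1-hour ceiling comes from the text above; everything else is assumed.
MAX_RETENTION_SECONDS = 3600


class SessionScratch:
    """Hypothetical in-memory holder for data tied to one user session."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so the expiry rule can be tested
        self._items: Dict[str, Tuple[float, Any]] = {}

    def put(self, key: str, value: Any) -> None:
        # Record the time of storage next to the value.
        self._items[key] = (self._clock(), value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        # Drop the value once it reaches the maximum retention age.
        if self._clock() - stored_at >= MAX_RETENTION_SECONDS:
            del self._items[key]
            return None
        return value

    def end_session(self) -> None:
        # Nothing outlives the session.
        self._items.clear()
```

Passing a fake clock makes it possible to check the expiry behavior without waiting an hour.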
Notice
LLM Disclaimer
Data sent to an LLM is outside the scope of this document. Alteryx cannot control how data is stored or processed once sent to an LLM. You can refer to the provider’s documentation for specifics on how they handle prompt data. Alteryx uses both Azure OpenAI and Google Gemini for AI-powered features such as Alteryx Copilot and Auto Insights Playbooks.
Data Security
Alteryx offers a downloadable whitepaper that covers private data handling privacy and security in depth. You can find a link to this document at alteryx.com/trust in the Private Data Handling section.
For convenience, these are a few highlights for encryption of data in transit and at rest:
Data in transit between browser <=> control plane and control plane <=> data plane is encrypted with TLS 1.3.
Alteryx utilizes mTLS encryption for intra-cluster communications.
File storage and database credentials are stored in a database in the control plane encrypted with 256-bit AES block ciphers.
Envelope encryption is applied to these credentials before they are passed from the control plane to the data plane and made available to job pods as Kubernetes secrets.
The private key used to decrypt the encrypted credentials is stored in the cloud provider's secret manager in the data plane and mounted into the AYX cluster using the external-secrets operator.
Workloads access secrets in the secret manager through a Kubernetes ServiceAccount.
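As a client-side illustration of the TLS 1.3 highlight above, here is a minimal sketch using Python's standard `ssl` module. It is not Alteryx code, and the function name `make_tls13_client_context` is hypothetical. It only shows how a client could refuse any connection that negotiates a protocol older than TLS 1.3.

```python
import ssl


def make_tls13_client_context() -> ssl.SSLContext:
    """Build a client TLS context that rejects anything older than TLS 1.3."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Raise the protocol floor so TLS 1.2 and earlier are never negotiated.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


context = make_tls13_client_context()
```

A context built this way can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=context)`, in which case the handshake fails if the server cannot offer TLS 1.3.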
Upgrades
One benefit of software-as-a-service is that you don't have to worry about upgrades. AAC manages upgrades for you.
Software upgrades for long-running services and ephemeral jobs are managed for you. When new versions of the software are available, new container images will be pushed out to our image repositories. Alteryx Analytics Cloud will retrieve these new image versions and seamlessly begin using them within the cluster without disrupting any running jobs.
Alteryx also manages infrastructure upgrades on your behalf.
Metrics Collection
AAC uses Datadog to collect application monitoring usage data to monitor and maintain operational stability. The Datadog agent collects these metrics:
Telemetry metrics from the Kubernetes cluster, storage bucket, Spark processor (when enabled), and compute nodes.
Custom logs from the services in the processing cluster.
Cloud provider logs (for example, Amazon CloudWatch and Azure Monitor) for the public cloud-managed services used.
Configure Private Data Handling
There are two steps to configuring private data handling: first, configure private file storage, and then configure private data processing.
Private File Storage
Alteryx Data Store (ADS) is the Alteryx file store. This is the default storage location for all newly created workspaces.
Private file storage allows you to replace ADS with your own file store. Anything saved in ADS will be inaccessible after doing this. End-users will have the smoothest experience if you configure this before they start using the workspace.
Private file storage supports AWS S3, Azure ADLS, and Google Cloud Storage (GCS) as storage providers.
Once you have configured private file storage in the cloud provider of your choice, you may proceed to configure private data processing in that same cloud provider.
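The ordering described above (file storage first, then data processing in the same cloud provider) can be expressed as a small pre-flight check. This is a minimal sketch under stated assumptions: the `WorkspaceSetup` class, the `check_setup` function, and the provider keys `aws`, `azure`, and `gcp` are hypothetical names for illustration and are not part of any Alteryx API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Storage providers named in this section; the short keys are assumed labels.
SUPPORTED_STORAGE = {
    "aws": "AWS S3",
    "azure": "Azure ADLS",
    "gcp": "Google Cloud Storage",
}


@dataclass
class WorkspaceSetup:
    storage_provider: Optional[str] = None     # None: still on the default ADS
    processing_provider: Optional[str] = None  # None: processing not configured


def check_setup(setup: WorkspaceSetup) -> List[str]:
    """Return a list of problems with the planned configuration order."""
    problems: List[str] = []
    if (setup.storage_provider is not None
            and setup.storage_provider not in SUPPORTED_STORAGE):
        problems.append(f"unsupported storage provider: {setup.storage_provider}")
    if setup.processing_provider is not None:
        if setup.storage_provider is None:
            problems.append("configure private file storage first")
        elif setup.processing_provider != setup.storage_provider:
            problems.append("processing and storage must use the same provider")
    return problems
```

An empty list means the plan follows the two-step order. For example, `check_setup(WorkspaceSetup("aws", "aws"))` reports no problems, while a plan with processing but no private file storage reports one.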
For setup instructions, refer to one of the following:
Private Data Processing
Private data processing allows you to run Alteryx data processing within your own VPC. To configure this capability, you must complete the setup steps to prepare your VPC to run Alteryx data processing. Each Alteryx Analytics Cloud product has separate setup instructions. You can run multiple products in the same data plane after completing the setup for each product.
After completing the VPC setup, you’ll log in to Alteryx Analytics Cloud and turn on private data processing for any solutions you want to use in your workspace.
Alteryx recommends using a dedicated account and VPC for the best security and stability, though other configurations are possible.
For more information on private data processing, including the shared responsibility model, required cloud resources for different applications, regional availability, and more, refer to Private Data Processing.
After you enable private data processing, there might be additional setup steps needed depending on your solution. For example, after you’ve turned on private data processing for Designer Cloud, you’ll need to update your private data storage permissions to allow the data processing cluster to access your data store.
In this step, select a region and an account. Then create a VPC, subnets, route tables, and the permissions that allow AAC to create and manage the data processing infrastructure and software. You'll also grant AAC permission to spin up the cluster, kick off the provisioning process, and update a trust relationship between your new processing cluster and your private data storage bucket.
Follow these guides to set up private data processing based on your cloud provider...
Known Limitations
These are some known limitations to private data processing:
Each workspace can be attached to only one data plane.
Some Alteryx Analytics Cloud apps such as App Builder and Location Intelligence are not yet compatible with private data processing and will be disabled in a workspace where private data processing is enabled.
Using SSH Tunneling with connectors is not yet supported in a workspace with private data processing.
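The limitations above lend themselves to a simple planning check before enabling private data processing. This is a minimal sketch, not an Alteryx tool: the `WorkspacePlan` class, its fields, and the `preflight` function are hypothetical, and the app list only reflects the two examples named above.

```python
from dataclasses import dataclass
from typing import List, Set

# Apps named above as not yet compatible; the real list may be longer.
INCOMPATIBLE_APPS = {"App Builder", "Location Intelligence"}


@dataclass
class WorkspacePlan:
    data_planes: List[str]            # data planes you intend to attach
    apps: Set[str]                    # apps the workspace relies on
    uses_ssh_tunneling: bool = False  # whether any connector uses SSH tunneling


def preflight(plan: WorkspacePlan) -> List[str]:
    """List the known limitations that this plan would run into."""
    findings: List[str] = []
    if len(plan.data_planes) > 1:
        findings.append("a workspace can be attached to only one data plane")
    for app in sorted(plan.apps & INCOMPATIBLE_APPS):
        findings.append(f"{app} will be disabled with private data processing")
    if plan.uses_ssh_tunneling:
        findings.append("SSH tunneling with connectors is not yet supported")
    return findings
```

An empty result means none of the listed limitations apply to the plan as described.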