Machine Learning in GCP
Follow this guide to deploy the Machine Learning module for Google Cloud Platform (GCP) private data processing.
Prerequisites
Before you deploy the Machine Learning module, complete these steps on the Set Up GCP Project and VPC for Private Data page:
Configure a VPC dedicated to AAC, as described in the Configure Virtual Private Network section.
Create a service account and attach the base IAM roles to it, as described in the Configure IAM section.
Trigger private data processing provisioning, as described in the Trigger Private Data Handling Provisioning section.
Project Setup
Step 1: Configure IAM
Step 1a: IAM Binding to the Service Account
Assign these additional roles to the aac-automation-sa service account that you created during Set Up GCP Project and VPC for Private Data:
Compute Load Balancer Admin: roles/compute.loadBalancerAdmin
Compute Instance Admin (v1): roles/compute.instanceAdmin.v1
Compute Storage Admin: roles/compute.storageAdmin
Kubernetes Engine Cluster Admin: roles/container.clusterAdmin
Storage Admin: roles/storage.admin
Cloud Memorystore Redis Admin: roles/redis.admin
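One way to apply these bindings is with the gcloud CLI. The sketch below only builds and prints one `gcloud projects add-iam-policy-binding` command per role, so you can review the commands before running them. The project ID `my-gcp-project` is a placeholder, and the service account email assumes the account was created in that same project; adjust both to match your environment.

```python
import shlex

# Assumption: placeholder project ID; replace with your own.
PROJECT_ID = "my-gcp-project"
# Assumption: the service account lives in the same project.
SERVICE_ACCOUNT = f"aac-automation-sa@{PROJECT_ID}.iam.gserviceaccount.com"

# Additional roles listed in Step 1a.
ROLES = [
    "roles/compute.loadBalancerAdmin",
    "roles/compute.instanceAdmin.v1",
    "roles/compute.storageAdmin",
    "roles/container.clusterAdmin",
    "roles/storage.admin",
    "roles/redis.admin",
]


def binding_commands(project_id, service_account, roles):
    """Return one shell-safe gcloud command string per role."""
    commands = []
    for role in roles:
        parts = [
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member=serviceAccount:{service_account}",
            f"--role={role}",
        ]
        commands.append(" ".join(shlex.quote(part) for part in parts))
    return commands


COMMANDS = binding_commands(PROJECT_ID, SERVICE_ACCOUNT, ROLES)

# Print the commands for review; nothing is executed here.
for command in COMMANDS:
    print(command)
```

Printing rather than executing keeps the sketch free of side effects; you can paste the reviewed commands into a shell that is authenticated against your project.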
Step 2: Configure Subnet
Note
If you purchased Designer Cloud and Machine Learning, then configure the subnets as mentioned in the Designer Cloud setup guide. Both Designer Cloud and Machine Learning resources share the same subnets.
Machine Learning in the private data processing environment requires 2 subnets.
aac-gke-node (required): The GKE cluster uses this subnet to execute Alteryx software jobs (connectivity, conversion, processing, publishing).
aac-public (required): This subnet doesn’t run any services, but the gke_node group uses it for egress out of the cluster.
Step 2a: Create Subnets in the VPC
Configure subnets in the aac-vpc VPC.
Follow this example to create subnets with subnet name, subnet size, and other configurations (modify values, as needed, to meet your network architecture).
| Subnet Name | Subnet Size | Secondary Subnet Name | Secondary Subnet Size | Notes |
|---|---|---|---|---|
| aac-gke-node | 10.0.0.0/22 | aac-gke-pod | 10.4.0.0/14 | GKE cluster, GKE pod, and GKE service subnets. |
| | | aac-gke-service | 10.64.0.0/20 | |
| aac-public | 10.10.1.0/25 | N/A | N/A | Public egress. |
Important
The subnet IP addresses and sizes in the table are examples. Modify the values, as needed, to meet your network architecture. The subnet region must be the region where Private Data Handling is to be provisioned.
The subnet names must exactly match the names shown in the table.
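Before creating the subnets, it can help to confirm that the chosen ranges don't overlap and to see how many addresses each one provides. The sketch below does this with Python's standard `ipaddress` module, using the example values from the table, and then prints a `gcloud compute networks subnets create` command for each subnet. The region `us-central1` is a placeholder assumption; use the region where Private Data Handling is to be provisioned.

```python
import ipaddress
from itertools import combinations

# Assumption: placeholder region; replace with your provisioning region.
REGION = "us-central1"
NETWORK = "aac-vpc"

# Example values from the subnet table; adjust to your network architecture.
SUBNETS = {
    "aac-gke-node": {
        "range": "10.0.0.0/22",
        "secondary": {
            "aac-gke-pod": "10.4.0.0/14",
            "aac-gke-service": "10.64.0.0/20",
        },
    },
    "aac-public": {"range": "10.10.1.0/25", "secondary": {}},
}


def all_ranges(subnets):
    """Flatten primary and secondary ranges into {name: IPv4Network}."""
    ranges = {}
    for name, spec in subnets.items():
        ranges[name] = ipaddress.ip_network(spec["range"])
        for secondary_name, cidr in spec["secondary"].items():
            ranges[secondary_name] = ipaddress.ip_network(cidr)
    return ranges


def find_overlaps(ranges):
    """Return every pair of range names whose address blocks overlap."""
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(ranges.items(), 2)
        if net_a.overlaps(net_b)
    ]


def create_command(name, spec, network, region):
    """Build the gcloud command that creates one subnet."""
    parts = [
        "gcloud", "compute", "networks", "subnets", "create", name,
        f"--network={network}", f"--region={region}",
        f"--range={spec['range']}",
    ]
    if spec["secondary"]:
        pairs = ",".join(f"{n}={c}" for n, c in spec["secondary"].items())
        parts.append(f"--secondary-range={pairs}")
    return " ".join(parts)


RANGES = all_ranges(SUBNETS)
OVERLAPS = find_overlaps(RANGES)

for name, network in RANGES.items():
    print(f"{name}: {network} ({network.num_addresses} addresses)")
print("overlapping ranges:", OVERLAPS)
for name, spec in SUBNETS.items():
    print(create_command(name, spec, NETWORK, REGION))
```

If you change the CIDR values, an empty overlap list is a quick sanity check; it does not confirm that the ranges are free in the rest of your network.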
Step 2b: Subnet Route Table
Create the route table for your subnets.
Important
You must configure the VPC with a network connection to the internet in your project.
Note
This route table is an example.
Address Prefix | Next Hop Type |
---|---|
/22 CIDR Block (aac-gke-node) | aac-vpc |
/24 CIDR Block (aac-private) | aac-vpc |
/25 CIDR Block (aac-public) | aac-vpc |
0.0.0.0/0 | <gateway_ID> |
Note
Your <gateway_ID> can be either a NAT gateway or an internet gateway, depending on your network architecture.
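To illustrate how a route table like this is read, the sketch below models route selection as a longest-prefix match: the most specific matching prefix wins, and 0.0.0.0/0 catches everything else. It is a simplified model, not a description of how GCP evaluates routes internally. The CIDRs for aac-gke-node and aac-public come from the example subnet table; the aac-private CIDR (10.10.0.0/24) is a hypothetical value chosen only for illustration.

```python
import ipaddress

# (prefix, next hop) pairs mirroring the example route table.
ROUTES = [
    ("10.0.0.0/22", "aac-vpc"),     # aac-gke-node (example subnet table value)
    ("10.10.0.0/24", "aac-vpc"),    # aac-private (hypothetical CIDR)
    ("10.10.1.0/25", "aac-vpc"),    # aac-public (example subnet table value)
    ("0.0.0.0/0", "<gateway_ID>"),  # default route to the NAT or internet gateway
]


def next_hop(destination, routes=ROUTES):
    """Return the next hop of the most specific route containing destination."""
    address = ipaddress.ip_address(destination)
    matches = []
    for prefix, hop in routes:
        network = ipaddress.ip_network(prefix)
        if address in network:
            matches.append((network.prefixlen, hop))
    # The longest prefix (largest prefixlen) is the most specific match.
    return max(matches)[1]


print(next_hop("10.0.1.15"))  # inside aac-gke-node
print(next_hop("10.10.1.5"))  # inside aac-public
print(next_hop("8.8.8.8"))    # outside all subnets, uses the default route
```

In this model, traffic between the subnets stays inside the VPC, and only destinations outside every listed prefix go to the gateway.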
Private Data Processing
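Caution
If you modify or delete any AAC-provisioned public cloud resources after private data processing is provisioned, the environment is left in an inconsistent state. This inconsistency causes errors during job execution or when you deprovision the private data processing setup.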
Step 1: Trigger Machine Learning Deployment
You trigger data processing provisioning from the Admin Console in AAC. You need Workspace Admin privileges in a workspace to see the Admin Console.
From the AAC landing page, select the Profile menu and then select Workspace Admin.
From the Admin Console, select Private Data Handling and then select Processing.
Select the Machine Learning checkbox and then select Update.
Selecting Update triggers the deployment of the cluster and resources in the GCP project. This runs a set of validation checks to verify the correct configuration of the GCP project.
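The validation checks that AAC runs are not listed in this guide. As a hypothetical local pre-flight check, the sketch below compares the roles and subnet names required by the Project Setup steps on this page against an inventory that you supply. It does not query GCP and does not cover the base IAM roles from the prerequisite page; the inventory values in the example call are made up to show a failing case.

```python
# Additional roles required by Step 1a on this page.
REQUIRED_ROLES = {
    "roles/compute.loadBalancerAdmin",
    "roles/compute.instanceAdmin.v1",
    "roles/compute.storageAdmin",
    "roles/container.clusterAdmin",
    "roles/storage.admin",
    "roles/redis.admin",
}

# Subnet names required by Step 2 on this page.
REQUIRED_SUBNETS = {"aac-gke-node", "aac-public"}


def preflight(granted_roles, subnet_names):
    """Return a list of problems; an empty list means nothing is missing."""
    problems = []
    for role in sorted(REQUIRED_ROLES - set(granted_roles)):
        problems.append(f"missing role: {role}")
    for subnet in sorted(REQUIRED_SUBNETS - set(subnet_names)):
        problems.append(f"missing subnet: {subnet}")
    return problems


# Hypothetical inventory with one role and one subnet missing.
EXAMPLE_PROBLEMS = preflight(
    granted_roles=REQUIRED_ROLES - {"roles/redis.admin"},
    subnet_names=["aac-gke-node"],
)
print(EXAMPLE_PROBLEMS)
```

An empty result only means the names on this page are present in your inventory; AAC's own checks after you select Update remain authoritative.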
Note
The provisioning process takes approximately 35–40 minutes to complete.
After provisioning completes, you can view the created resources (for example, VM instances and node groups) in the GCP console. Do not modify these resources manually. Manual changes might cause the private data processing environment to malfunction.