Control plane on GCP
This guide covers deploying the Union.ai control plane in a GCP environment as part of a self-hosted deployment.
Self-hosted intra-cluster deployment is currently officially supported only on AWS. GCP support is in preview, and additional cloud providers are coming soon. For production deployments, see Control plane on AWS.
Prerequisites
In addition to the general prerequisites, you need:
- Cloud SQL for PostgreSQL instance (version 12 or later)
- GCS buckets for control plane metadata and artifact storage
- GCP service accounts configured with Workload Identity for the control plane services and artifacts (see the sketch below)
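Workload Identity links each Kubernetes service account to its GCP service account. A minimal sketch of that binding, assuming the flyteadmin service account email used in the values example below; the Kubernetes service account name is a placeholder for the one the chart creates:
# Allow the Kubernetes service account to impersonate the GCP service account
gcloud iam service-accounts add-iam-policy-binding \
  flyteadmin@my-project.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:my-gcp-project.svc.id.goog[<controlplane-namespace>/<admin-service>]"
# Annotate the Kubernetes service account so GKE injects the GCP identity
kubectl annotate serviceaccount <admin-service> \
  -n <controlplane-namespace> \
  iam.gke.io/gcp-service-account=flyteadmin@my-project.iam.gserviceaccount.com
Repeat the binding for the artifacts service account.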
Installation
Step 1: Install prerequisites
Install ScyllaDB CRDs (if using embedded ScyllaDB)
cd helm-charts/charts/controlplane
./scripts/install-scylla-crds.sh
Add Helm repositories
helm repo add unionai https://unionai.github.io/helm-charts/
helm repo add flyte https://helm.flyte.org
helm repo update
Step 2: Create registry image pull secret
Create the registry secret in the control plane namespace:
kubectl create namespace <controlplane-namespace>
kubectl create secret docker-registry union-registry-secret \
--docker-server="registry.unionai.cloud" \
--docker-username="<REGISTRY_USERNAME>" \
--docker-password="<REGISTRY_PASSWORD>" \
-n <controlplane-namespace>
The registry username typically follows the format robot$<org-name>.
When running this in a shell, escape the $ character in the username with a backslash (\$) so the shell does not expand it.
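For example, with an illustrative organization name of my-company, the username argument would be written as:
--docker-username="robot\$my-company"
Single quotes (--docker-username='robot$my-company') also prevent shell expansion.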
Contact Union.ai support if you haven’t received your registry credentials.
Step 3: Generate TLS certificates
gRPC traffic runs over HTTP/2, which NGINX only serves with TLS enabled. You can use self-signed certificates for intra-cluster communication.
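A minimal sketch that generates a self-signed certificate for the in-cluster ingress DNS name and stores it in the controlplane-tls-cert secret referenced under Key configuration below (requires OpenSSL 1.1.1+ for -addext; the bracketed names are placeholders):
# Generate a self-signed certificate for the in-cluster ingress DNS name
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout tls.key -out tls.crt \
  -subj "/CN=<controlplane-ingress>.<controlplane-namespace>.svc.cluster.local" \
  -addext "subjectAltName=DNS:<controlplane-ingress>.<controlplane-namespace>.svc.cluster.local"
# Store the certificate in a Kubernetes TLS secret
kubectl create secret tls controlplane-tls-cert \
  --cert=tls.crt --key=tls.key \
  -n <controlplane-namespace>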
Step 4: Create database password secret
kubectl create secret generic <controlplane-secrets> \
--from-literal=pass.txt='<YOUR_DB_PASSWORD>' \
-n <controlplane-namespace>
The secret must contain a key named pass.txt with the database password.
The default secret name is set in your Helm values.
Step 5: Download values files
curl -O https://raw.githubusercontent.com/unionai/helm-charts/main/charts/controlplane/values.gcp.selfhosted-intracluster.yaml
curl -O https://raw.githubusercontent.com/unionai/helm-charts/main/charts/controlplane/values.registry.yaml
Create an overrides file values.gcp.selfhosted-overrides.yaml:
global:
  GCP_REGION: "us-central1"
  DB_HOST: "10.247.0.3"
  DB_NAME: "unionai"
  DB_USER: "unionai"
  BUCKET_NAME: "my-company-cp-flyte"
  ARTIFACTS_BUCKET_NAME: "my-company-cp-artifacts"
  ARTIFACT_IAM_ROLE_ARN: "artifacts@my-project.iam.gserviceaccount.com"
  FLYTEADMIN_IAM_ROLE_ARN: "flyteadmin@my-project.iam.gserviceaccount.com"
  UNION_ORG: "my-company"
  GOOGLE_PROJECT_ID: "my-gcp-project"
To enable authentication, add the OIDC configuration to this file. See the Authentication guide.
Step 6: Install control plane
helm upgrade --install unionai-controlplane unionai/controlplane \
--namespace <controlplane-namespace> \
--create-namespace \
-f values.gcp.selfhosted-intracluster.yaml \
-f values.registry.yaml \
-f values.gcp.selfhosted-overrides.yaml \
--timeout 15m \
--wait
Values file layers (applied in order):
- values.gcp.selfhosted-intracluster.yaml — GCP infrastructure defaults (database, storage, networking)
- values.registry.yaml — Registry configuration and image pull secrets
- values.gcp.selfhosted-overrides.yaml — Your environment-specific overrides
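To preview the merged result of these layers before installing, you can render the chart locally as an optional sanity check:
helm template unionai-controlplane unionai/controlplane \
  --namespace <controlplane-namespace> \
  -f values.gcp.selfhosted-intracluster.yaml \
  -f values.registry.yaml \
  -f values.gcp.selfhosted-overrides.yaml \
  > rendered.yaml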
Step 7: Verify installation
# Check pod status
kubectl get pods -n <controlplane-namespace>
# Verify services are running
kubectl get svc -n <controlplane-namespace>
# Check admin service logs
kubectl logs -n <controlplane-namespace> deploy/<admin-service> --tail=50
# Test internal connectivity
kubectl exec -n <controlplane-namespace> deploy/<admin-service> -- \
curl -k https://<controlplane-ingress>.<controlplane-namespace>.svc.cluster.local
All pods should be in Running state and internal connectivity should succeed.
Replace <controlplane-namespace> with your Helm release namespace (the namespace you used during helm install). Replace <admin-service> and <controlplane-ingress> with the actual deployment names from kubectl get deploy -n <controlplane-namespace>.
Key configuration
Single-tenant mode
Self-hosted deployments use single-tenant mode with an explicit organization:
global:
  UNION_ORG: "my-company"
TLS
Configure the namespace and name of the Kubernetes TLS secret:
global:
  TLS_SECRET_NAMESPACE: "<controlplane-namespace>"
  TLS_SECRET_NAME: "controlplane-tls-cert"
ingress-nginx:
  controller:
    extraArgs:
      default-ssl-certificate: "<controlplane-namespace>/controlplane-tls-cert"
Service discovery
Control plane services discover each other via Kubernetes DNS:
- Admin service: <admin-service>.<controlplane-namespace>.svc.cluster.local:81
- NGINX Ingress: <controlplane-ingress>.<controlplane-namespace>.svc.cluster.local
- Data plane (for dataproxy): <dataplane-ingress>.<dataplane-namespace>.svc.cluster.local
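To confirm these names will resolve to healthy pods, you can check that each Service has endpoints (the service names are placeholders; use the names from kubectl get svc):
# A Service with no endpoints means DNS resolves but nothing answers
kubectl get endpoints -n <controlplane-namespace> <admin-service> <controlplane-ingress>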
Next steps
Troubleshooting
Control plane pods not starting
# Inspect pod events and status
kubectl describe pod -n <controlplane-namespace> <pod-name>
# Check node resource availability
kubectl top nodes
# Confirm the required secrets exist
kubectl get secret -n <controlplane-namespace>
TLS/Certificate errors
# Confirm the TLS secret exists
kubectl get secret controlplane-tls-cert -n <controlplane-namespace>
# Inspect the certificate contents
kubectl get secret controlplane-tls-cert -n <controlplane-namespace> \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -text -noout
# Check the ingress controller logs for TLS errors
kubectl logs -n <controlplane-namespace> deploy/<controlplane-ingress>
Database connection failures
# Verify credentials
kubectl get secret <controlplane-secrets> -n <controlplane-namespace> \
-o jsonpath='{.data.pass\.txt}' | base64 -d
# Test connectivity
kubectl run -n <controlplane-namespace> test-db --image=postgres:14 --rm -it -- \
psql -h <DB_HOST> -U <DB_USER> -d <DB_NAME>
Workload Identity issues
# Verify service account annotations
kubectl get sa -n <controlplane-namespace> -o yaml | grep iam.gke.io/gcp-service-account
# Check IAM bindings
gcloud iam service-accounts get-iam-policy <SERVICE_ACCOUNT_EMAIL>
# Verify pod can authenticate
kubectl exec -n <controlplane-namespace> deploy/<admin-service> -- \
curl -H "Metadata-Flavor: Google" \
http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email
Data plane cannot connect to control plane
# Verify service endpoints
kubectl get svc -n <controlplane-namespace> | grep -E 'admin|nginx-controller'
# Test DNS resolution from data plane namespace
kubectl run -n <dataplane-namespace> test-dns --image=busybox --rm -it -- \
nslookup <controlplane-ingress>.<controlplane-namespace>.svc.cluster.local
# Check network policies
kubectl get networkpolicies -n <controlplane-namespace>
kubectl get networkpolicies -n <dataplane-namespace>