Control plane on GCP

This guide covers deploying the Union.ai control plane in a GCP environment as part of a self-hosted deployment.

Self-hosted intra-cluster deployment is currently officially supported on AWS only. GCP support is in preview and additional cloud providers are coming soon. For production deployments, see Control plane on AWS.

Prerequisites

In addition to the general prerequisites, you need:

  1. Cloud SQL for PostgreSQL instance (version 12 or later)
  2. GCS buckets for control plane metadata and artifacts storage
  3. GCP service accounts configured with Workload Identity for control plane services and artifacts

Installation

Step 1: Install prerequisites

Install ScyllaDB CRDs (if using embedded ScyllaDB)

cd helm-charts/charts/controlplane
./scripts/install-scylla-crds.sh

Add Helm repositories

helm repo add unionai https://unionai.github.io/helm-charts/
helm repo add flyte https://helm.flyte.org
helm repo update

Step 2: Create registry image pull secret

Create the registry secret in the control plane namespace:

kubectl create namespace <controlplane-namespace>

kubectl create secret docker-registry union-registry-secret \
  --docker-server="registry.unionai.cloud" \
  --docker-username="<REGISTRY_USERNAME>" \
  --docker-password="<REGISTRY_PASSWORD>" \
  -n <controlplane-namespace>

The registry username typically follows the format robot$<org-name>. When entering it in a shell, escape the $ as \$ (or wrap the username in single quotes) so the shell does not expand it as a variable. Contact Union.ai support if you haven't received your registry credentials.
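A quick way to check the escaping locally, using a hypothetical org name my-org:

```shell
# Inside double quotes the shell would expand $my-org as a variable,
# so escape the dollar sign; single quotes need no escaping at all.
# "my-org" is a placeholder for your actual org name.
REGISTRY_USERNAME="robot\$my-org"
echo "$REGISTRY_USERNAME"    # prints: robot$my-org
echo 'robot$my-org'          # same result with single quotes
```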

Step 3: Generate TLS certificates

NGINX requires TLS to serve HTTP/2, which gRPC depends on. You can use self-signed certificates for intra-cluster communication.

OpenSSL (self-signed)

openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout controlplane-tls.key \
  -out controlplane-tls.crt \
  -subj "/CN=<controlplane-ingress>.<controlplane-namespace>.svc.cluster.local"

kubectl create secret tls controlplane-tls-cert \
  --key controlplane-tls.key \
  --cert controlplane-tls.crt \
  -n <controlplane-namespace>

cert-manager (recommended)

For production deployments, use cert-manager with a self-signed ClusterIssuer or your organization's CA. See the extraObjects section in values.gcp.selfhosted-intracluster.yaml (https://github.com/unionai/helm-charts/blob/main/charts/controlplane/values.gcp.selfhosted-intracluster.yaml) for an example configuration.
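For the cert-manager route, a minimal sketch of a self-signed issuer and certificate, assuming cert-manager is already installed in the cluster; the issuer name, namespace, and DNS names below are examples, not values from the chart:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer          # example name
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: controlplane-tls
  namespace: union-controlplane    # example control plane namespace
spec:
  secretName: controlplane-tls-cert   # should match TLS_SECRET_NAME in your Helm values
  dnsNames:
    - "*.union-controlplane.svc.cluster.local"
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
```

With this in place, cert-manager creates and renews the TLS secret automatically, so the manual openssl and kubectl create secret tls steps above are not needed.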

Step 4: Create database password secret

kubectl create secret generic <controlplane-secrets> \
  --from-literal=pass.txt='<YOUR_DB_PASSWORD>' \
  -n <controlplane-namespace>

The secret must contain a key named pass.txt with the database password. The default secret name is set in your Helm values.
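Before creating the real secret, you can sanity-check that your quoting preserves any special characters in the password; 'p@ss$123!' below is a placeholder value:

```shell
# Single quotes keep $ and ! literal. Round-trip through base64 (the
# same encoding Kubernetes uses to store secret data) to confirm the
# password is unchanged.
DB_PASSWORD='p@ss$123!'
printf '%s' "$DB_PASSWORD" | base64 | base64 -d
# prints: p@ss$123!
```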

Step 5: Download values files

curl -O https://raw.githubusercontent.com/unionai/helm-charts/main/charts/controlplane/values.gcp.selfhosted-intracluster.yaml

curl -O https://raw.githubusercontent.com/unionai/helm-charts/main/charts/controlplane/values.registry.yaml

Create an overrides file values.gcp.selfhosted-overrides.yaml:

global:
  GCP_REGION: "us-central1"
  DB_HOST: "10.247.0.3"
  DB_NAME: "unionai"
  DB_USER: "unionai"
  BUCKET_NAME: "my-company-cp-flyte"
  ARTIFACTS_BUCKET_NAME: "my-company-cp-artifacts"
  ARTIFACT_IAM_ROLE_ARN: "artifacts@my-project.iam.gserviceaccount.com"     # GCP service account email, despite the AWS-style key name
  FLYTEADMIN_IAM_ROLE_ARN: "flyteadmin@my-project.iam.gserviceaccount.com"  # GCP service account email
  UNION_ORG: "my-company"
  GOOGLE_PROJECT_ID: "my-gcp-project"

To enable authentication, add the OIDC configuration to this file. See the Authentication guide.

Step 6: Install control plane

helm upgrade --install unionai-controlplane unionai/controlplane \
  --namespace <controlplane-namespace> \
  --create-namespace \
  -f values.gcp.selfhosted-intracluster.yaml \
  -f values.registry.yaml \
  -f values.gcp.selfhosted-overrides.yaml \
  --timeout 15m \
  --wait

Values file layers (applied in order):

  1. values.gcp.selfhosted-intracluster.yaml — GCP infrastructure defaults (database, storage, networking)
  2. values.registry.yaml — Registry configuration and image pull secrets
  3. values.gcp.selfhosted-overrides.yaml — Your environment-specific overrides
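Later files override earlier ones key-by-key, so the overrides file only needs to restate the keys you want to change; everything else falls through to the base files. As a hypothetical example, pinning just the region:

```yaml
# values.gcp.selfhosted-overrides.yaml -- listed last, so its keys win
global:
  GCP_REGION: "europe-west1"   # replaces the region set by an earlier values file
```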

Step 7: Verify installation

# Check pod status
kubectl get pods -n <controlplane-namespace>

# Verify services are running
kubectl get svc -n <controlplane-namespace>

# Check admin service logs
kubectl logs -n <controlplane-namespace> deploy/<admin-service> --tail=50

# Test internal connectivity
kubectl exec -n <controlplane-namespace> deploy/<admin-service> -- \
  curl -k https://<controlplane-ingress>.<controlplane-namespace>.svc.cluster.local

All pods should be in the Running state, and the internal connectivity check should succeed.

Replace <controlplane-namespace> with your Helm release namespace (the namespace you used during helm install). Replace <admin-service> and <controlplane-ingress> with the actual deployment names from kubectl get deploy -n <controlplane-namespace>.

Key configuration

Single-tenant mode

Self-hosted deployments use single-tenant mode with an explicit organization:

global:
  UNION_ORG: "my-company"

TLS

Configure the namespace and name of the Kubernetes TLS secret:

global:
  TLS_SECRET_NAMESPACE: "<controlplane-namespace>"
  TLS_SECRET_NAME: "controlplane-tls-cert"

ingress-nginx:
  controller:
    extraArgs:
      default-ssl-certificate: "<controlplane-namespace>/controlplane-tls-cert"

Service discovery

Control plane services discover each other via Kubernetes DNS:

  • Admin service: <admin-service>.<controlplane-namespace>.svc.cluster.local:81
  • NGINX Ingress: <controlplane-ingress>.<controlplane-namespace>.svc.cluster.local
  • Data plane (for dataproxy): <dataplane-ingress>.<dataplane-namespace>.svc.cluster.local
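These FQDNs follow the standard Kubernetes pattern <service>.<namespace>.svc.cluster.local. A sketch composing the admin endpoint from hypothetical names (substitute the actual service and namespace from kubectl get svc -n <controlplane-namespace>):

```shell
# "admin-service" and "union-controlplane" are example names only.
ADMIN_SVC="admin-service"
CP_NS="union-controlplane"
echo "${ADMIN_SVC}.${CP_NS}.svc.cluster.local:81"
# prints: admin-service.union-controlplane.svc.cluster.local:81
```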

Next steps

  1. Deploy the data plane
  2. Configure authentication

Troubleshooting

Control plane pods not starting

kubectl describe pod -n <controlplane-namespace> <pod-name>
kubectl top nodes
kubectl get secret -n <controlplane-namespace>

TLS/Certificate errors

kubectl get secret controlplane-tls-cert -n <controlplane-namespace>
kubectl get secret controlplane-tls-cert -n <controlplane-namespace> \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -text -noout
kubectl logs -n <controlplane-namespace> deploy/<controlplane-ingress>

Database connection failures

# Verify credentials
kubectl get secret <controlplane-secrets> -n <controlplane-namespace> \
  -o jsonpath='{.data.pass\.txt}' | base64 -d

# Test connectivity
kubectl run -n <controlplane-namespace> test-db --image=postgres:14 --rm -it -- \
  psql -h <DB_HOST> -U <DB_USER> -d <DB_NAME>

Workload Identity issues

# Verify service account annotations
kubectl get sa -n <controlplane-namespace> -o yaml | grep iam.gke.io/gcp-service-account

# Check IAM bindings
gcloud iam service-accounts get-iam-policy <SERVICE_ACCOUNT_EMAIL>

# Verify pod can authenticate
kubectl exec -n <controlplane-namespace> deploy/<admin-service> -- \
  curl -H "Metadata-Flavor: Google" \
  http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email

Data plane cannot connect to control plane

# Verify service endpoints
kubectl get svc -n <controlplane-namespace> | grep -E 'admin|nginx-controller'

# Test DNS resolution from data plane namespace
kubectl run -n <dataplane-namespace> test-dns --image=busybox --rm -it -- \
  nslookup <controlplane-ingress>.<controlplane-namespace>.svc.cluster.local

# Check network policies
kubectl get networkpolicies -n <controlplane-namespace>
kubectl get networkpolicies -n <dataplane-namespace>