Landing Zone: GCP Enterprise Foundation

Hierarchy

  • Org → Folders → Projects → Resources
  • Policies inherit downward
  • Use folders per environment (dev/staging/prod)
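
A minimal sketch of bootstrapping this hierarchy with gcloud (ORG_ID, FOLDER_ID, and the project name are placeholders):

# Create an environment folder under the organization
gcloud resource-manager folders create \
  --display-name="prod" --organization=ORG_ID

# Create a project inside that folder
gcloud projects create prod-app-1234 --folder=FOLDER_ID

# Verify which org policies the folder inherits
gcloud org-policies list --folder=FOLDER_ID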

Shared VPC

  • Host project (network) + Service projects (workloads)
  • Centralized Cloud NAT, FW rules, VPN/Interconnect
  • Service projects get subnets via Shared VPC admin
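
A sketch of the host/service wiring (project IDs, subnet, and group are placeholders):

# Designate the host project and attach a service project
gcloud compute shared-vpc enable HOST_PROJECT_ID
gcloud compute shared-vpc associated-projects add SERVICE_PROJECT_ID \
  --host-project=HOST_PROJECT_ID

# Grant a service-project team use of a specific subnet
gcloud compute networks subnets add-iam-policy-binding shared-subnet \
  --region=us-central1 --project=HOST_PROJECT_ID \
  --member="group:app-team@example.com" --role="roles/compute.networkUser"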

Org Policies

  • constraints/compute.requireOsLogin
  • constraints/iam.allowedPolicyMemberDomains
  • constraints/compute.vmExternalIpAccess
  • Domain restricted sharing
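
Enforcing a boolean constraint is a one-liner; a sketch (ORG_ID / FOLDER_ID are placeholders):

# Enforce OS Login across the org
gcloud resource-manager org-policies enable-enforce compute.requireOsLogin \
  --organization=ORG_ID

# Inspect the effective (inherited) policy at a folder
gcloud resource-manager org-policies describe compute.requireOsLogin \
  --folder=FOLDER_ID --effective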

VPC Service Controls

  • Perimeter around managed services (GCS, BQ, etc.)
  • Prevents data exfiltration
  • Dry-run → Enforced
  • Ingress/Egress rules for on-prem access
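
A sketch of creating a perimeter in dry-run first, then enforcing it (policy ID, perimeter name, and project number are placeholders):

# Create the perimeter in dry-run mode to observe would-be denials in logs
gcloud access-context-manager perimeters dry-run create prod-perimeter \
  --perimeter-title="prod-perimeter" \
  --perimeter-resources=projects/123456789 \
  --perimeter-restricted-services=storage.googleapis.com,bigquery.googleapis.com \
  --policy=POLICY_ID

# Promote the dry-run config once the logs look clean
gcloud access-context-manager perimeters dry-run enforce prod-perimeter \
  --policy=POLICY_ID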

Checklist

  • Org + Folders created
  • Shared VPC host/service project setup
  • Cloud Interconnect/VPN to on-prem
  • Cloud DNS forwarding + on-prem resolver
  • IAM custom roles at org/folder level
  • VPC Service Controls perimeter
  • Cloud Armor + FW rules
  • Logging: aggregation sink to BQ + GCS

Hybrid Networking: GCP ↔ On-Prem

Hub-and-Spoke Topology

  • Hub VPC hosts Shared VPC, VPN/Interconnect
  • Spoke VPCs peer with hub (no transitive peering!)
  • Use Network Connectivity Center for large scale
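
Because peering is non-transitive, each spoke must peer with the hub explicitly, from both ends; a sketch (network and project names are placeholders):

# Hub side: export custom routes so the spoke learns on-prem routes via the hub
gcloud compute networks peerings create hub-to-prod \
  --network=hub-vpc --peer-network=prod-spoke-vpc \
  --peer-project=PROD_PROJECT --export-custom-routes

# Spoke side: import those routes
gcloud compute networks peerings create prod-to-hub \
  --network=prod-spoke-vpc --peer-network=hub-vpc \
  --peer-project=HUB_PROJECT --import-custom-routes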

Cloud Interconnect

  • Dedicated: 10/100 Gbps, physical cross-connect
  • Partner: via supported provider (Equinix, Megaport)
  • VLAN attachments → Cloud Router → BGP

Cloud VPN

  • HA VPN: two gateway interfaces/tunnels (99.99% SLA)
  • Classic VPN: single tunnel, 99.9% SLA (deprecated; migrate to HA VPN)
  • BGP dynamic routing recommended
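
A sketch of an HA VPN gateway, the Cloud Router it needs for BGP, and one tunnel (names, ASN, and secret are placeholders):

gcloud compute vpn-gateways create ha-gw \
  --network=hub-vpc --region=us-central1

gcloud compute routers create hub-router \
  --network=hub-vpc --region=us-central1 --asn=65001

# Tunnels reference the gateway interfaces (0 and 1) and the router
gcloud compute vpn-tunnels create tunnel-0 \
  --vpn-gateway=ha-gw --interface=0 \
  --peer-external-gateway=onprem-gw --peer-external-gateway-interface=0 \
  --router=hub-router --shared-secret=SECRET \
  --region=us-central1 --ike-version=2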

Cloud Router & BGP

  • Advertise VPC subnets via BGP
  • Custom route advertisements
  • Global dynamic routing for multi-region
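
Custom advertisements are a router-level setting, while global routing is a VPC-level one; a sketch (names/ranges are placeholders):

# Advertise an extra aggregate range instead of per-subnet routes
gcloud compute routers update hub-router --region=us-central1 \
  --advertisement-mode=CUSTOM \
  --set-advertisement-ranges=10.10.0.0/16

# Global dynamic routing is set on the VPC, not the router
gcloud compute networks update hub-vpc --bgp-routing-mode=GLOBAL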

Key Design Pattern

On-Prem ──[BGP]── Cloud Router ── Shared VPC Hub
                                    ├── Spoke VPC (Prod)
                                    ├── Spoke VPC (Dev)
                                    └── PSC to managed services

Private Service Connect (PSC)

Concept

  • Producer publishes service via Service Attachment
  • Consumer creates PSC endpoint (internal IP) in own VPC
  • Traffic stays on Google's network; it never traverses the public internet

vs VPC Peering

  • PSC: consumer gets IP from its own range
  • Peering: full bi-directional connectivity, no IP overlap allowed
  • PSC: consumer/producer IPs can overlap!

Hands-on Commands

# Producer: create service attachment
gcloud compute service-attachments create SA_NAME \
  --region=us-central1 \
  --producer-forwarding-rule=FWD_RULE \
  --nat-subnets=NAT_SUBNET

# Consumer: create PSC endpoint
gcloud compute addresses create PSC_IP --region=us-central1 \
  --subnet=consumer-subnet
gcloud compute forwarding-rules create FR_NAME \
  --region=us-central1 \
  --target-service-attachment=projects/PROD/regions/.../SA_NAME \
  --address=PSC_IP

Use Cases

  • Access GCP managed services (GCS, Bigtable) privately
  • Expose your internal API privately to consumers
  • Multi-tenant SaaS on GCP

DNS w/ PSC

  • Create private DNS zone with A record pointing to PSC IP
  • Attach to consumer VPC
  • On-prem: DNS forwarding zone → Cloud DNS → resolves PSC
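
A sketch of the private zone plus record (zone name, record, and IP are placeholders):

gcloud dns managed-zones create psc-zone \
  --dns-name=psc.example.com. --visibility=private \
  --networks=consumer-vpc --description="Private zone for PSC endpoints"

gcloud dns record-sets create api.psc.example.com. \
  --zone=psc-zone --type=A --ttl=300 --rrdatas=10.0.0.5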

Hybrid DNS: GCP ↔ On-Prem Resolution

Cloud DNS → On-Prem

  • Create DNS forwarding zone in Cloud DNS
  • Forward onprem.example.com → on-prem DNS server IP
  • Uses VPN/Interconnect path (private zone)
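
A sketch of the forwarding zone (the target IP stands in for the on-prem resolver):

gcloud dns managed-zones create onprem-fwd \
  --dns-name=onprem.example.com. --visibility=private \
  --networks=hub-vpc \
  --forwarding-targets=192.168.1.10 \
  --description="Forward onprem.example.com to the on-prem DNS server"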

On-Prem → Cloud DNS

  • On-prem DNS forwards *.gcp.example.com → Cloud DNS inbound endpoint
  • Cloud DNS inbound server policy → bound to VPC

End-to-End Flow

On-Prem App ──→ On-Prem DNS
  └── *.gcp.example.com? ──→ Cloud DNS Inbound EP
        ├── gcp.internal?    ──→ resolves via Private Zone
        └── psc.example.com? ──→ PSC private zone → PSC IP

Cloud DNS Inbound Setup

gcloud dns policies create POLICY \
  --networks=NETWORK \
  --enable-inbound-forwarding \
  --description="Inbound forwarding for on-prem resolvers"
# Google allocates inbound forwarder IPs per subnet; point on-prem DNS at them

DNS Peering

  • Peer zones between projects without forwarding
  • Use for Shared VPC service project resolution

IAM & Policy: Identity & Access Management

IAM Types

  • Basic (formerly primitive): Owner/Editor/Viewer (avoid; far too broad)
  • Predefined: e.g. roles/storage.admin
  • Custom: least privilege, scoped to permissions

IAM Conditions

  • Attribute-based: resource type, IP range, time
  • resource.matchTag, request.time
  • Evaluated at access time
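
A sketch of a time-bound conditional binding (group and expiry date are placeholders):

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="group:oncall@example.com" \
  --role="roles/storage.objectViewer" \
  --condition='title=temp-access,expression=request.time < timestamp("2026-01-01T00:00:00Z")'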

Deny Policies

  • Explicit deny overrides any allow
  • Org-level deny policies: managed via the Deny Admin role (roles/iam.denyAdmin)
  • Use for break-glass scenarios

Best Practices

  • Use groups, not individual users
  • Custom roles at folder level, not project
  • Audit with Cloud Asset Inventory
  • Policy Analyzer: gcloud asset analyze-iam-policy

Key Terms

  • Principal = who (user, group, SA, workspace domain)
  • Role = collection of permissions
  • Policy = bind principal + role [+ condition] to resource
  • Resource Manager = org/folder/project hierarchy

Network Security

Firewall Rules (VPC FW)

  • Ingress + Egress, stateful
  • Priority: 0-65535 (lower number = higher precedence)
  • Implied rules at lowest precedence: deny-all ingress, allow-all egress
  • Network tags for targeted rules
  • Service accounts as source/dest

Cloud Armor

  • WAF + DDoS protection at LB edge
  • Pre-configured rules: XSS, SQLi, L7 DDoS
  • Custom rules: rate limiting, IP block, geo
  • Can do bot management (reCAPTCHA)

VPC Firewall Rules

gcloud compute firewall-rules create allow-internal \
  --network=my-vpc \
  --priority=1000 \
  --direction=INGRESS \
  --source-ranges=10.0.0.0/8 \
  --allow=tcp:0-65535,udp:0-65535

Private Google Access

  • VMs without external IP access GCP APIs privately
  • Enabled per subnet
  • Routes via default internet gateway (Google-only)
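
Enabling it is a single subnet update; a sketch (subnet name is a placeholder):

gcloud compute networks subnets update my-subnet \
  --region=us-central1 \
  --enable-private-ip-google-access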

Security Layers (Defense in Depth)

  • Edge: Cloud Armor (WAF) + Google Front End
  • VPC: FW rules + Private Google Access + VPC SC
  • Workload: IAM + Workload Identity + gVisor (Sandbox)
  • Data: CMEK, CSEK, DLP

Kubernetes Architecture: GKE & Core Concepts

Control Plane Components

  • API Server: front-end, auth, validation
  • etcd: distributed key-value store
  • Scheduler: assigns pods to nodes
  • Controller Manager: watches state, reconciles

Worker Node Components

  • kubelet: agent, communicates with API server
  • kube-proxy: network rules, iptables/ipvs
  • Container Runtime: containerd

GKE Specific

  • Autopilot vs Standard (node management)
  • Workload Identity: K8s SA → GCP IAM (no keys!)
  • GKE Dataplane V2 (eBPF, Cilium-based)
  • Node auto-upgrade, auto-repair
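
A sketch of wiring Workload Identity end to end (cluster, SA, and namespace names are placeholders):

# Enable the workload pool on the cluster
gcloud container clusters update my-cluster --region=us-central1 \
  --workload-pool=PROJECT_ID.svc.id.goog

# Let the K8s SA impersonate the Google SA
gcloud iam service-accounts add-iam-policy-binding \
  app-gsa@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[default/app-ksa]"

# Annotate the K8s SA with the Google SA it maps to
kubectl annotate serviceaccount app-ksa --namespace default \
  iam.gke.io/gcp-service-account=app-gsa@PROJECT_ID.iam.gserviceaccount.com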

Workload Resources

  • Deployment: stateless apps, rolling update
  • StatefulSet: stable identities, sticky storage
  • DaemonSet: one pod per node (logging, monitoring)
  • Job/CronJob: batch processing

Networking Model

  • Every pod gets a real IP (flat network)
  • No NAT between pods (CNI: e.g. Calico, Cilium)
  • Service types: ClusterIP, NodePort, LoadBalancer, ExternalName
  • Ingress (L7) and Gateway API (L7/L4)
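
A quick sketch of exposing a Deployment via the common Service types (names are placeholders):

# ClusterIP (internal only; the default type)
kubectl expose deployment web --port=80 --target-port=8080

# LoadBalancer (provisions a GCP load balancer on GKE)
kubectl expose deployment web --name=web-lb --type=LoadBalancer \
  --port=80 --target-port=8080

kubectl get svc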

Helm: Kubernetes Package Manager

Core Concepts

  • Chart: package (templates + values + deps)
  • Release: running instance of a chart
  • Repository: chart storage (OCI, HTTP)
  • Values: values.yaml overridable with --set

Helm 3 Changes

  • No Tiller (pure client-side)
  • 3-way strategic merge for upgrades
  • Release info stored as Secrets
  • OCI-based chart registry support

Common Commands

helm create my-chart                         # scaffold
helm lint ./my-chart                         # validate
helm template . --values prod.yaml           # render locally
helm install my-release ./my-chart -f prod.yaml
helm upgrade --install my-release ./my-chart -f prod.yaml
helm rollback my-release 2                   # rollback to revision 2
helm list -A                                 # list all releases

Chart Structure

my-chart/
  Chart.yaml      # metadata
  values.yaml     # default values
  templates/      # Go templates
  charts/         # subcharts (deps)
  crds/           # CRDs

Hooks

  • pre/post-install, upgrade, delete, rollback
  • Useful for: DB migration, config validation
  • Annotation: helm.sh/hook: post-install

Microservices Deployment on K8s

Design Patterns

  • Sidecar: logging, proxy alongside app (e.g. Istio envoy)
  • Ambassador: proxy as sidecar for external access
  • Adapter: normalize monitoring output
  • Circuit Breaker: fail-fast, retry w/ backoff

Service Mesh (Istio)

  • Envoy sidecar proxies intercept traffic
  • mTLS between services (mutual TLS)
  • Traffic splitting (canary, blue-green)
  • Observability: traces, metrics, logs

Deployment Strategies

  • Rolling: gradual replacement (default K8s)
  • Blue-Green: two full environments, switch DNS/LB
  • Canary: small % traffic to new version, monitor, ramp
  • Feature flags: toggle on/off without deploy
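
A sketch of driving the default rolling strategy from the CLI (image path and container name "app" are placeholders):

# Trigger a rolling update, watch it, and roll back if needed
kubectl set image deployment/web app=us-docker.pkg.dev/PROJECT/repo/web:v2
kubectl rollout status deployment/web
kubectl rollout undo deployment/web   # back to the previous ReplicaSet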

CI/CD Pipeline: Azure DevOps & GitHub Actions

Azure DevOps Pipeline

  • azure-pipelines.yml in repo root
  • Stages: Build → Test → Deploy
  • Environments: dev → staging → prod (approval gates)
  • Variables + Variable Groups + Key Vault integration
  • Service Connection → GCP (Workload Identity Federation)

GitHub Actions

  • .github/workflows/*.yml
  • Events: push, PR, schedule, workflow_dispatch
  • OIDC to GCP (no static creds)
  • Matrix builds, caches, artifacts
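
The GCP side of keyless OIDC is a workload identity pool plus provider; a sketch (pool, provider, and repo names are placeholders):

gcloud iam workload-identity-pools create github-pool --location=global

gcloud iam workload-identity-pools providers create-oidc github-provider \
  --location=global --workload-identity-pool=github-pool \
  --issuer-uri="https://token.actions.githubusercontent.com" \
  --attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository" \
  --attribute-condition="assertion.repository=='my-org/my-repo'"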

Pipeline Flow (Full)

[PR]   → Lint + Unit Test + SonarQube + Checkov (IaC)
  ↓ merge to main
[Push] → Build → Tag (semver) → Push to Artifact Registry
  ↓
[Terraform] → terraform plan (PR) → terraform apply (merge)
  ↓
[Deploy] → Helm upgrade --install → Smoke test
  ↓
[Prod] → Approval gate → Canary → 100% rollout

Key Practices

  • Trunk-based: short-lived branches, main is deployable
  • GitFlow: develop → release branches, for larger teams
  • Semantic versioning: vMAJOR.MINOR.PATCH
  • SBOM generation in pipeline (Trivy)

GitHub / Azure DevOps Integration

  • GitHub repos ↔ Azure Boards (cross-link)
  • GitHub Advanced Security (code scanning, secret scanning)
  • Branch protection rules: require PR, status checks

Terraform: IaC for GCP

Core Workflow

  • terraform init (provider, backend, modules)
  • terraform plan (preview, output to file)
  • terraform apply (execute plan)
  • terraform destroy
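
The same workflow as typically run in CI, with the plan saved and applied verbatim (bucket and var-file names are placeholders):

terraform init -backend-config="bucket=my-tfstate-bucket"
terraform plan -var-file=envs/prod.tfvars -out=tfplan
terraform apply tfplan   # applies exactly what was reviewed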

State Management

  • Backend: GCS bucket (terraform state)
  • Locking: the GCS backend provides native state locking
  • terraform state list, terraform state mv
  • Remote state data source: terraform_remote_state

GCP Provider

terraform {
  backend "gcs" {
    bucket = "my-tfstate-bucket"
    prefix = "prod/network"
  }
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

provider "google" {
  project = var.project_id
  region  = var.region
}

Best Practices

  • Modularize: modules/ directory
  • Env segregation: envs/dev/, envs/prod/
  • Use .tfvars files per env
  • Pin provider version
  • terraform plan in CI PR checks

Import & Move

  • terraform import google_container_cluster.my_cluster PROJECT/location/name
  • terraform state rm to remove from state
  • terraform state mv to restructure state

Ansible: Configuration Management

Core Concepts

  • Playbook: YAML with plays (hosts + tasks)
  • Inventory: hosts file or dynamic (gcp_compute)
  • Roles: reusable structured playbooks
  • Modules: idempotent units (e.g. copy, template, package)
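
A sketch of exercising the inventory and a playbook from the CLI (file names are placeholders):

# Ad-hoc: ping every host the dynamic inventory resolves
ansible all -i inventory.gcp.yml -m ping

# Dry-run a playbook and show per-file diffs before a real run
ansible-playbook -i inventory.gcp.yml site.yml --check --diff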

GCP Dynamic Inventory

# requirements.yml
collections:
  - google.cloud

# inventory.gcp.yml
plugin: google.cloud.gcp_compute
projects:
  - my-project
filters:
  - "labels.env=prod"

Common Pattern

---
- name: Configure web servers
  hosts: webservers
  become: yes
  vars:
    app_version: "{{ lookup('env', 'APP_VERSION') }}"
  tasks:
    - name: Install nginx
      apt:
        name: nginx
        state: present
    - name: Deploy app config
      template:
        src: app.conf.j2
        dest: /etc/nginx/sites-enabled/app

SonarQube & Checkov: Code Quality & IaC Security

SonarQube

  • Static code analysis (SAST)
  • Quality Gates: new code coverage, bugs, vulnerabilities
  • Measures: reliability, security, maintainability, coverage, duplication
  • CI integration: sonar-scanner, GitHub Action: SonarSource/sonarcloud-github-action
  • sonar-project.properties in repo root

Checkov (IaC)

  • Scans Terraform, CloudFormation, K8s, Helm, Dockerfile
  • Checks misconfigurations (public buckets, open ports, etc.)
  • CI: checkov --directory . --framework terraform
  • .checkov.yml for skip/severity config
  • Bridgecrew cloud platform for GUI/policies

CI Integration Example

# GitHub Actions steps
- name: IaC Security Scan
  run: |
    checkov --directory terraform/ \
      --framework terraform \
      --soft-fail \
      --output cli --output junitxml > checkov-report.xml

- name: SonarCloud Scan
  uses: SonarSource/sonarcloud-github-action@master
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}

Vertex AI & MLOps

Vertex AI Components

  • Vertex AI Pipelines: Kubeflow-based DAGs
  • Model Registry: versioning, staging/production
  • Feature Store: managed feature serving
  • Endpoint: model serving with autoscaling
  • Model Evaluation: classification, regression metrics

MLOps Pipeline

  • Data validation (TFX) → Feature engineering → Train → Evaluate
  • If metrics pass → Deploy to staging → Shadow traffic → Prod
  • Monitoring: prediction drift, feature drift
  • Retrain trigger: scheduled or drift threshold

MLOps Lifecycle

Raw Data → Data Validation → Feature Store
  ↓
Train (Vertex AI Training) → AutoML / Custom Container
  ↓
Evaluate (thresholds: accuracy, precision, recall)
  ↓
Model Registry (versioned)
  ↓
Deploy to Endpoint (canary → 100%)
  ↓
Monitor (Vertex AI Model Monitoring)

Key Concepts

  • AutoML: train without code (image, tabular, text)
  • Custom Training: bring your own container
  • Hyperparameter Tuning: Vizier-based
  • Explainable AI: feature attributions

CI/CD for ML

  • Pipeline = code (Python SDK)
  • Compile → Upload → Run (trigger on data change)
  • Model validation in CI gate
  • Deploy via CI: gcloud ai endpoints deploy-model
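
A sketch of that deploy step (endpoint and model IDs are placeholders):

gcloud ai endpoints create --region=us-central1 --display-name=my-endpoint

gcloud ai endpoints deploy-model ENDPOINT_ID --region=us-central1 \
  --model=MODEL_ID --display-name=my-deployment \
  --traffic-split=0=100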

Gemini & Google AI Studio

Gemini Models

  • Gemini 1.5 Flash: fast, cost-effective
  • Gemini 1.5 Pro: complex reasoning, long context (1M tokens)
  • Gemini 2.0 Flash: latest (multimodal, tool use)
  • Available via: AI Studio (dev) / Vertex AI (prod)

Google AI Studio

  • Free tier, rapid prototyping
  • System instructions, safety settings
  • Export to Vertex AI for production
  • API key based (no IAM)

Vertex AI Gemini API

  • IAM-based auth (Service Account)
  • Supports grounding (search, data store)
  • Model Garden: curated + open models
  • Safety filters, content moderation

Agents & Extensions

  • Vertex AI Agent Builder: no-code/low-code agents
  • Grounding in Google Search or enterprise data
  • Function calling / tool use
  • Conversational agents (Dialogflow integration)

API Comparison

# AI Studio (dev / prototyping)
curl -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=API_KEY"

# Vertex AI (production)
curl -X POST \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT/locations/us-central1/publishers/google/models/gemini-2.0-flash:generateContent" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)"

Grafana: Observability & Dashboards

Data Sources

  • Prometheus (metrics from K8s/GKE)
  • Cloud Monitoring (GCP metrics)
  • Loki (logs)
  • Tempo (traces)
  • Cloud Logging (via Grafana Cloud or self-managed)

Key Features

  • Dashboard as Code (JSON / Terraform)
  • Alerting: rules, notification channels (PagerDuty, Slack)
  • Annotations: mark deployments on graphs
  • Explore: ad-hoc PromQL, LogQL queries

Common Dashboard Panels

  • K8s Cluster: CPU/Memory, pod status, node health
  • GKE Workload: request rate, latency (p50/p95/p99), error rate
  • Terraform: number of resources, last plan time
  • CI/CD: pipeline success rate, duration trend
  • Vertex AI: prediction latency, error budget, model drift

PromQL Quick Reference

  • rate(http_requests_total[5m]) — request rate
  • histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) — p95 latency
  • sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace) — CPU by namespace
  • avg(go_memstats_alloc_bytes) by (job) — memory by job

API Gateway: GCP API Management

GCP Options

  • Cloud Endpoints: Extensible Service Proxy (ESP), OpenAPI/gRPC
  • Apigee: full lifecycle API management, monetization, analytics
  • Cloud Load Balancer: L7 routing, URL maps, SSL
  • Kong / Ambassador: open-source on K8s

When to Use What

  • Simple REST/gRPC: Cloud Endpoints + Cloud LB
  • Enterprise API management: Apigee
  • K8s-native: Kong Ingress or Ambassador
  • Serverless: API Gateway (Cloud Functions, Cloud Run)

Cloud Endpoints Example

# openapi.yaml
swagger: "2.0"
info:
  title: "My API"
  version: "1.0.0"
x-google-endpoints:
  - name: "my-api.endpoints.PROJECT.cloud.goog"
x-google-backend:
  address: "https://us-central1-PROJECT.cloudfunctions.net/my-function"

# Deploy:
gcloud endpoints services deploy openapi.yaml

Linux Essentials: Troubleshooting & Admin

Process Management

  • ps aux | grep process — find process
  • top/htop — live resource monitor
  • systemctl status/start/stop/enable
  • journalctl -u service-name -f — logs

Disk & Memory

  • df -h — disk usage
  • du -sh * — directory sizes
  • free -h — memory
  • lsblk — block devices

Network Troubleshooting

  • ping / traceroute
  • ss -tulpn — listening ports
  • curl -v http://... — HTTP debug
  • nslookup / dig — DNS
  • tcpdump -i eth0 port 80 — packet capture

File Operations

  • grep -r "pattern" . — recursive search
  • find / -name "file" 2>/dev/null
  • rsync -avz source/ dest/
  • chmod / chown — permissions
  • tar -czf archive.tar.gz dir

Diagnostic Sequence

# When something breaks:
1. Check logs:      journalctl / tail -f /var/log/syslog
2. Check resources: top, df -h, free -h
3. Check network:   ss -tulpn, curl, ping
4. Check config:    diff expected vs actual, syntax check
5. Restart service: systemctl restart NAME && systemctl status NAME