Landing Zone: GCP Enterprise Foundation
Hierarchy
- Org → Folders → Projects → Resources
- Policies inherit downward
- Use folders per environment (dev/staging/prod)
Shared VPC
- Host project (network) + Service projects (workloads)
- Centralized Cloud NAT, FW rules, VPN/Interconnect
- Service projects get subnets via Shared VPC admin
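A minimal setup sketch, assuming placeholder project IDs HOST_PROJECT and SERVICE_PROJECT:
# Enable Shared VPC on the host project, then attach a service project
gcloud compute shared-vpc enable HOST_PROJECT
gcloud compute shared-vpc associated-projects add SERVICE_PROJECT \
  --host-project=HOST_PROJECT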
Org Policies
- constraints/compute.requireOsLogin
- constraints/iam.allowedPolicyMemberDomains (domain restricted sharing)
- constraints/compute.vmExternalIpAccess
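Enforcement sketch via the legacy resource-manager surface (ORG_ID is a placeholder; newer setups use gcloud org-policies set-policy):
# Enforce OS Login org-wide
gcloud resource-manager org-policies enable-enforce \
  constraints/compute.requireOsLogin --organization=ORG_ID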
VPC Service Controls
- Perimeter around managed services (GCS, BQ, etc.)
- Prevents data exfiltration
- Dry-run → Enforced
- Ingress/Egress rules for on-prem access
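A hedged perimeter sketch; POLICY_ID and PROJECT_NUMBER are placeholders, and real rollouts usually start in dry-run mode:
gcloud access-context-manager perimeters create prod_perimeter \
  --policy=POLICY_ID \
  --title="prod-perimeter" \
  --resources=projects/PROJECT_NUMBER \
  --restricted-services=storage.googleapis.com,bigquery.googleapis.com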
Checklist
- Org + Folders created
- Shared VPC host/service project setup
- Cloud Interconnect/VPN to on-prem
- Cloud DNS forwarding + on-prem resolver
- IAM custom roles at org/folder level
- VPC Service Controls perimeter
- Cloud Armor + FW rules
- Logging: aggregation sink to BQ + GCS
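For the logging checklist item, a possible aggregated sink (ORG_ID, PROJECT, and the dataset name are placeholders):
gcloud logging sinks create org-audit-sink \
  bigquery.googleapis.com/projects/PROJECT/datasets/audit_logs \
  --organization=ORG_ID --include-children \
  --log-filter='logName:"cloudaudit.googleapis.com"'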
Hybrid Networking: GCP ↔ On-Prem
Hub-and-Spoke Topology
- Hub VPC hosts Shared VPC, VPN/Interconnect
- Spoke VPCs peer with hub (no transitive peering!)
- Use Network Connectivity Center for large scale
Cloud Interconnect
- Dedicated: 10/100 Gbps, physical cross-connect
- Partner: via supported provider (Equinix, Megaport)
- VLAN attachments → Cloud Router → BGP
Cloud VPN
- HA VPN: two gateway interfaces, 99.99% SLA (two or four tunnels)
- Classic VPN: single interface, 99.9% SLA (deprecated; prefer HA VPN)
- BGP dynamic routing recommended
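An HA VPN sketch with placeholder names; it assumes an external-vpn-gateway resource PEER_GW already exists and shows only the first of the two tunnels:
gcloud compute vpn-gateways create ha-gw --network=hub-vpc --region=us-central1
gcloud compute routers create vpn-router --network=hub-vpc \
  --region=us-central1 --asn=64514
gcloud compute vpn-tunnels create tunnel-0 \
  --region=us-central1 --vpn-gateway=ha-gw --interface=0 \
  --peer-external-gateway=PEER_GW --peer-external-gateway-interface=0 \
  --router=vpn-router --ike-version=2 --shared-secret=SECRET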
Cloud Router & BGP
- Advertise VPC subnets via BGP
- Custom route advertisements
- Global dynamic routing for multi-region
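Custom advertisement sketch (router name and range are placeholders; the groups flag keeps subnet advertisement alongside the custom range):
gcloud compute routers update vpn-router --region=us-central1 \
  --advertisement-mode=CUSTOM \
  --set-advertisement-groups=ALL_SUBNETS \
  --set-advertisement-ranges=10.10.0.0/16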
Key Design Pattern
On-Prem ──[BGP]── Cloud Router ── Shared VPC Hub
                                    ├── Spoke VPC (Prod)
                                    ├── Spoke VPC (Dev)
                                    └── PSC to managed services
Private Service Connect (PSC)
Concept
- Producer publishes service via Service Attachment
- Consumer creates PSC endpoint (internal IP) in own VPC
- Traffic stays on Google network — never leaves it
vs VPC Peering
- PSC: consumer gets IP from its own range
- Peering: full bi-directional connectivity, no IP overlap allowed
- PSC: consumer/producer IPs can overlap!
Hands-on Commands
# Producer: create service attachment
gcloud compute service-attachments create SA_NAME \
  --region=us-central1 \
  --producer-forwarding-rule=FWD_RULE \
  --nat-subnets=NAT_SUBNET

# Consumer: create PSC endpoint
gcloud compute addresses create PSC_IP \
  --region=us-central1 \
  --subnet=consumer-subnet
gcloud compute forwarding-rules create FR_NAME \
  --region=us-central1 \
  --network=CONSUMER_VPC \
  --target-service-attachment=projects/PROD/regions/.../serviceAttachments/SA_NAME \
  --address=PSC_IP
Use Cases
- Access GCP managed services (GCS, Bigtable) privately
- Expose your internal API privately to consumers
- Multi-tenant SaaS on GCP
DNS w/ PSC
- Create private DNS zone with A record pointing to PSC IP
- Attach to consumer VPC
- On-prem: DNS forwarding zone → Cloud DNS → resolves PSC
Hybrid DNS: GCP ↔ On-Prem Resolution
Cloud DNS → On-Prem
- Create DNS forwarding zone in Cloud DNS
- Forward onprem.example.com → on-prem DNS server IP
- Uses VPN/Interconnect path (private forwarding zone)
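A possible forwarding-zone command; the zone name, VPC, and resolver IP are placeholders:
gcloud dns managed-zones create onprem-fwd \
  --dns-name=onprem.example.com. \
  --visibility=private --networks=hub-vpc \
  --forwarding-targets=192.168.1.10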
On-Prem → Cloud DNS
- On-prem DNS forwards *.gcp.example.com → Cloud DNS inbound endpoint
- Cloud DNS inbound server policy → bound to VPC
End-to-End Flow
On-Prem App ──→ On-Prem DNS
  └── .gcp.example.com?   ──→ Cloud DNS Inbound EP
        ├── gcp.internal?      ──→ resolves via Private Zone
        └── psc.example.com?   ──→ PSC private zone → PSC IP
Cloud DNS Inbound Setup
gcloud dns policies create POLICY \
  --networks=NETWORK \
  --enable-inbound-forwarding \
  --description="Inbound forwarding from on-prem"
DNS Peering
- Peer zones between projects without forwarding
- Use for Shared VPC service project resolution
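A possible peering-zone command; the target project/network values are placeholders:
gcloud dns managed-zones create peer-zone \
  --dns-name=gcp.example.com. \
  --visibility=private --networks=service-vpc \
  --target-project=HOST_PROJECT --target-network=hub-vpc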
IAM & Policy: Identity & Access Management
IAM Types
- Basic (formerly primitive): Owner/Editor/Viewer (do not use)
- Predefined: e.g. roles/storage.admin
- Custom: least privilege, scoped to permissions
IAM Conditions
- Attribute-based: resource type, IP range, time
- e.g. resource.matchTag, request.time
- Evaluated at access time
Deny Policies
- Explicit deny overrides any allow
- Org-level deny; managed with roles/iam.denyAdmin
- Use for break-glass scenarios
Best Practices
- Use groups, not individual users
- Custom roles at folder level, not project
- Audit with Cloud Asset Inventory
- Policy Analyzer: gcloud asset analyze-iam-policy
Key Terms
- Principal = who (user, group, SA, workspace domain)
- Role = collection of permissions
- Policy = bind principal + role [+ condition] to resource
- Resource Manager = org/folder/project hierarchy
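A binding sketch tying these terms together; the group and expiry date are made-up examples:
# Principal + role + condition bound to a project
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="group:devs@example.com" \
  --role="roles/storage.objectViewer" \
  --condition='expression=request.time < timestamp("2026-01-01T00:00:00Z"),title=temp-access'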
Network Security
Firewall Rules (VPC FW)
- Ingress + Egress, stateful
- Priority: 0-65535 (lower number = higher priority)
- Implied rules: deny all ingress, allow all egress (lowest priority)
- Network tags for targeted rules
- Service accounts as source/dest
Cloud Armor
- WAF + DDoS protection at LB edge
- Pre-configured rules: XSS, SQLi, L7 DDoS
- Custom rules: rate limiting, IP block, geo
- Can do bot management (reCAPTCHA)
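A sketch of a geo-block rule; the policy name and region code are placeholders:
gcloud compute security-policies create edge-policy
gcloud compute security-policies rules create 1000 \
  --security-policy=edge-policy \
  --expression="origin.region_code == 'CN'" \
  --action=deny-403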
VPC Firewall Rules
gcloud compute firewall-rules create allow-internal \
  --network=my-vpc \
  --priority=1000 \
  --direction=INGRESS \
  --action=ALLOW \
  --source-ranges=10.0.0.0/8 \
  --rules=tcp:0-65535,udp:0-65535
Private Google Access
- VMs without external IP access GCP APIs privately
- Enabled per subnet
- Routes via default internet gateway (Google-only)
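Enable it per subnet, e.g. (subnet name is a placeholder):
gcloud compute networks subnets update my-subnet \
  --region=us-central1 --enable-private-ip-google-access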
Security Layers (Defense in Depth)
- Edge: Cloud Armor (WAF) + Google Front End
- VPC: FW rules + Private Google Access + VPC SC
- Workload: IAM + Workload Identity + gVisor (Sandbox)
- Data: CMEK, CSEK, DLP
Kubernetes Architecture: GKE & Core Concepts
Control Plane Components
- API Server: front-end, auth, validation
- etcd: distributed key-value store
- Scheduler: assigns pods to nodes
- Controller Manager: watches state, reconciles
Worker Node Components
- kubelet: agent, communicates with API server
- kube-proxy: network rules, iptables/ipvs
- Container Runtime: containerd
GKE Specific
- Autopilot vs Standard (node management)
- Workload Identity: K8s SA → GCP IAM (no keys!)
- GKE Dataplane V2 (eBPF, Cilium-based)
- Node auto-upgrade, auto-repair
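For the Workload Identity item above, a binding sketch; GSA_NAME, KSA_NAME, and NAMESPACE are placeholders:
# Allow the K8s SA to impersonate the Google SA
gcloud iam service-accounts add-iam-policy-binding \
  GSA_NAME@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"
kubectl annotate serviceaccount KSA_NAME -n NAMESPACE \
  iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com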
Workload Resources
- Deployment: stateless apps, rolling update
- StatefulSet: stable identities, sticky storage
- DaemonSet: one pod per node (logging, monitoring)
- Job/CronJob: batch processing
Networking Model
- Every pod gets a real IP (flat network)
- No NAT between pods (CNI: e.g. Calico, Cilium)
- Service types: ClusterIP, NodePort, LoadBalancer, ExternalName
- Ingress (L7) and Gateway API (L7/L4)
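A minimal ClusterIP Service sketch; the app name and ports are placeholders:
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: ClusterIP
  selector:
    app: my-app
  ports:
    - port: 80         # service port
      targetPort: 8080 # container port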
Helm: Kubernetes Package Manager
Core Concepts
- Chart: package (templates + values + deps)
- Release: running instance of a chart
- Repository: chart storage (OCI, HTTP)
- Values: values.yaml, overridable with --set or -f
Helm 3 Changes
- No Tiller (pure client-side)
- 3-way strategic merge for upgrades
- Release info stored as Secrets
- OCI-based chart registry support
Common Commands
helm create my-chart # scaffold
helm lint ./my-chart # validate
helm template . --values prod.yaml # render locally
helm install my-release ./my-chart -f prod.yaml
helm upgrade --install my-release ./my-chart -f prod.yaml
helm rollback my-release 2 # rollback to revision 2
helm list -A # list all releases
Chart Structure
my-chart/
  Chart.yaml    # metadata
  values.yaml   # default values
  templates/    # Go templates
  charts/       # subcharts (deps)
  crds/         # CRDs
Hooks
- pre/post-install, upgrade, delete, rollback
- Useful for: DB migration, config validation
- Annotation: helm.sh/hook: post-install
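A hook sketch: a Job annotated so Helm runs it before install (names and image are placeholders):
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: myrepo/migrate:latest
          command: ["./migrate.sh"]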
Microservices Deployment on K8s
Design Patterns
- Sidecar: logging, proxy alongside app (e.g. Istio envoy)
- Ambassador: proxy as sidecar for external access
- Adapter: normalize monitoring output
- Circuit Breaker: fail-fast, retry w/ backoff
Service Mesh (Istio)
- Envoy sidecar proxies intercept traffic
- mTLS between services (mutual TLS)
- Traffic splitting (canary, blue-green)
- Observability: traces, metrics, logs
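A canary traffic-split sketch with an Istio VirtualService; the host and subset names are placeholders and assume a matching DestinationRule:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts:
    - my-app
  http:
    - route:
        - destination:
            host: my-app
            subset: stable
          weight: 90
        - destination:
            host: my-app
            subset: canary
          weight: 10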
Deployment Strategies
- Rolling: gradual replacement (default K8s)
- Blue-Green: two full environments, switch DNS/LB
- Canary: small % traffic to new version, monitor, ramp
- Feature flags: toggle on/off without deploy
CI/CD Pipeline: Azure DevOps & GitHub Actions
Azure DevOps Pipeline
- azure-pipelines.yml in repo root
- Stages: Build → Test → Deploy
- Environments: dev → staging → prod (approval gates)
- Variables + Variable Groups + Key Vault integration
- Service Connection → GCP (Workload Identity Federation)
GitHub Actions
- .github/workflows/*.yml
- Events: push, PR, schedule, workflow_dispatch
- OIDC to GCP (no static creds)
- Matrix builds, caches, artifacts
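An OIDC auth sketch using google-github-actions/auth; the provider path and service account are placeholders:
permissions:
  id-token: write
  contents: read
steps:
  - uses: actions/checkout@v4
  - uses: google-github-actions/auth@v2
    with:
      workload_identity_provider: projects/123/locations/global/workloadIdentityPools/github/providers/my-repo
      service_account: deployer@PROJECT_ID.iam.gserviceaccount.com
  - run: gcloud storage ls   # authenticated, no static keys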
Pipeline Flow (Full)
[PR] → Lint + Unit Test + SonarQube + Checkov (IaC)
↓ merge to main
[Push] → Build → Tag (semver) → Push to Artifact Registry
↓
[Terraform] → terraform plan (PR) → terraform apply (merge)
↓
[Deploy] → Helm upgrade --install → Smoke test
↓
[Prod] → Approval gate → Canary → 100% rollout
Key Practices
- Trunk-based: short-lived branches, main is deployable
- GitFlow: develop → release branches, for larger teams
- Semantic versioning: vMAJOR.MINOR.PATCH
- SBOM generation in pipeline (Trivy)
GitHub / Azure DevOps Integration
- GitHub repos ↔ Azure Boards (cross-link)
- GitHub Advanced Security (code scanning, secret scanning)
- Branch protection rules: require PR, status checks
Terraform: IaC for GCP
Core Workflow
- terraform init (provider, backend, modules)
- terraform plan (preview, output to file)
- terraform apply (execute plan)
- terraform destroy
State Management
- Backend: GCS bucket (remote state)
- Locking: built into the GCS backend
- terraform state list, terraform state mv
- Remote state data source: terraform_remote_state
GCP Provider
terraform {
  backend "gcs" {
    bucket = "my-tfstate-bucket"
    prefix = "prod/network"
  }
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

provider "google" {
  project = var.project_id
  region  = var.region
}
Best Practices
- Modularize: modules/ directory
- Env segregation: envs/dev/, envs/prod/
- Use .tfvars files per env
- Pin provider versions
- terraform plan in CI PR checks
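A module usage sketch; the path and inputs are placeholders:
module "network" {
  source     = "./modules/network"
  project_id = var.project_id
  subnets = {
    "prod-us" = { cidr = "10.10.0.0/20", region = "us-central1" }
  }
}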
Import & Move
- terraform import google_container_cluster.my_cluster PROJECT/LOCATION/NAME
- terraform state rm to remove from state
- terraform state mv to restructure state
Ansible: Configuration Management
Core Concepts
- Playbook: YAML with plays (hosts + tasks)
- Inventory: hosts file or dynamic (gcp_compute)
- Roles: reusable structured playbooks
- Modules: idempotent units (e.g. copy, template, package)
GCP Dynamic Inventory
# requirements.yml
collections:
  - google.cloud

# inventory.gcp.yml
plugin: gcp_compute
projects:
  - my-project
filters:
  - "labels.env=prod"
Common Pattern
---
- name: Configure web servers
  hosts: webservers
  become: yes
  vars:
    app_version: "{{ lookup('env', 'APP_VERSION') }}"
  tasks:
    - name: Install nginx
      apt:
        name: nginx
        state: present
    - name: Deploy app config
      template:
        src: app.conf.j2
        dest: /etc/nginx/sites-enabled/app
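Run it against the dynamic inventory above (site.yml is a placeholder playbook name; --check --diff previews changes):
ansible-playbook -i inventory.gcp.yml site.yml --check --diff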
SonarQube & Checkov: Code Quality & IaC Security
SonarQube
- Static code analysis (SAST)
- Quality Gates: new code coverage, bugs, vulnerabilities
- Measures: reliability, security, maintainability, coverage, duplication
- CI integration: sonar-scanner; GitHub Action: SonarSource/sonarcloud-github-action
- sonar-project.properties in repo root
Checkov (IaC)
- Scans Terraform, CloudFormation, K8s, Helm, Dockerfile
- Checks misconfigurations (public buckets, open ports, etc.)
- CI: checkov --directory . --framework terraform
- .checkov.yml for skip/severity config
- Bridgecrew cloud platform for GUI/policies
CI Integration Example
# GitHub Actions steps
- name: IaC Security Scan
  run: |
    checkov --directory terraform/ \
      --framework terraform \
      --soft-fail \
      --output junitxml > checkov-report.xml
- name: SonarCloud Scan
  uses: SonarSource/sonarcloud-github-action@master
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
Vertex AI & MLOps
Vertex AI Components
- Vertex AI Pipelines: Kubeflow-based DAGs
- Model Registry: versioning, staging/production
- Feature Store: managed feature serving
- Endpoint: model serving with autoscaling
- Model Evaluation: classification, regression metrics
MLOps Pipeline
- Data validation (TFX) → Feature engineering → Train → Evaluate
- If metrics pass → Deploy to staging → Shadow traffic → Prod
- Monitoring: prediction drift, feature drift
- Retrain trigger: scheduled or drift threshold
MLOps Lifecycle
Raw Data → Data Validation → Feature Store
↓
Train (Vertex AI Training) → AutoML / Custom Container
↓
Evaluate (thresholds: accuracy, precision, recall)
↓
Model Registry (versioned)
↓
Deploy to Endpoint (canary → 100%)
↓
Monitor (Vertex AI Model Monitoring)
Key Concepts
- AutoML: train without code (image, tabular, text)
- Custom Training: bring your own container
- Hyperparameter Tuning: Vizier-based
- Explainable AI: feature attributions
CI/CD for ML
- Pipeline = code (Python SDK)
- Compile → Upload → Run (trigger on data change)
- Model validation in CI gate
- Deploy via CI: gcloud ai endpoints deploy-model
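A deploy sketch; ENDPOINT_ID and MODEL_ID are placeholders, and --traffic-split=0=100 sends all traffic to the new deployment:
gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=us-central1 \
  --model=MODEL_ID \
  --display-name=my-model-v2 \
  --traffic-split=0=100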
Gemini & Google AI Studio
Gemini Models
- Gemini 1.5 Flash: fast, cost-effective
- Gemini 1.5 Pro: complex reasoning, long context (1M tokens)
- Gemini 2.0 Flash: latest (multimodal, tool use)
- Available via: AI Studio (dev) / Vertex AI (prod)
Google AI Studio
- Free tier, rapid prototyping
- System instructions, safety settings
- Export to Vertex AI for production
- API key based (no IAM)
Vertex AI Gemini API
- IAM-based auth (Service Account)
- Supports grounding (search, data store)
- Model Garden: curated + open models
- Safety filters, content moderation
Agents & Extensions
- Vertex AI Agent Builder: no-code / low-code agents
- Grounding in Google Search or enterprise data
- Function calling / tool use
- Conversational agents (Dialogflow integration)
API Comparison
# AI Studio (dev / prototyping)
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"contents": [{"parts": [{"text": "Hello"}]}]}'

# Vertex AI (production)
curl -X POST "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT/locations/us-central1/publishers/google/models/gemini-2.0-flash:generateContent" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{"contents": [{"parts": [{"text": "Hello"}]}]}'
Grafana: Observability & Dashboards
Data Sources
- Prometheus (metrics from K8s/GKE)
- Cloud Monitoring (GCP metrics)
- Loki (logs)
- Tempo (traces)
- Cloud Logging (via Grafana Cloud or self-managed)
Key Features
- Dashboard as Code (JSON / Terraform)
- Alerting: rules, notification channels (PagerDuty, Slack)
- Annotations: mark deployments on graphs
- Explore: ad-hoc PromQL, LogQL queries
Common Dashboard Panels
- K8s Cluster: CPU/Memory, pod status, node health
- GKE Workload: request rate, latency (p50/p95/p99), error rate
- Terraform: number of resources, last plan time
- CI/CD: pipeline success rate, duration trend
- Vertex AI: prediction latency, error budget, model drift
PromQL Quick Reference
rate(http_requests_total[5m])                                               # request rate
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))   # p95 latency
sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)            # CPU by namespace
avg(go_memstats_alloc_bytes) by (job)                                      # memory by job
API Gateway: GCP API Management
GCP Options
- Cloud Endpoints: Extensible Service Proxy (ESP), OpenAPI/gRPC
- Apigee: full lifecycle API management, monetization, analytics
- Cloud Load Balancer: L7 routing, URL maps, SSL
- Kong / Ambassador: open-source on K8s
When to Use What
- Simple REST/gRPC: Cloud Endpoints + Cloud LB
- Enterprise API management: Apigee
- K8s-native: Kong Ingress or Ambassador
- Serverless: API Gateway (Cloud Functions, Cloud Run)
Cloud Endpoints Example
# openapi.yaml
swagger: "2.0"
info:
  title: "My API"
  version: "1.0.0"
x-google-endpoints:
  - name: "my-api.endpoints.PROJECT.cloud.goog"
x-google-backend:
  address: "https://us-central1-PROJECT.cloudfunctions.net/my-function"

# Deploy:
gcloud endpoints services deploy openapi.yaml
Linux Essentials: Troubleshooting & Admin
Process Management
ps aux | grep process                # find process
top / htop                           # live resource monitor
systemctl status/start/stop/enable   # manage services
journalctl -u service-name -f        # follow logs
Disk & Memory
df -h      # disk usage
du -sh *   # directory sizes
free -h    # memory
lsblk      # block devices
Network Troubleshooting
ping / traceroute            # reachability / path
ss -tulpn                    # listening ports
curl -v http://...           # HTTP debug
nslookup / dig               # DNS
tcpdump -i eth0 port 80      # packet capture
File Operations
grep -r "pattern" .— recursive searchfind / -name "file" 2>/dev/nullrsync -avz source/ dest/chmod/chown— permissionstar -czf archive.tar.gz dir
Diagnostic Sequence
# When something breaks:
1. Check logs: journalctl / tail -f /var/log/syslog
2. Check resources: top, df -h, free -h
3. Check network: ss -tulpn, curl, ping
4. Check config: diff expected vs actual, syntax check
5. Restart service: systemctl restart && systemctl status