InspireIT Solutions Bangalore

SaaS company migrating to Google Cloud — 22-SOP Playbook
Industry: B2B Analytics Platform
GCP Org: inspireit.co
Teams: Platform Eng (10) | Data Science (6) | Security (3) | Dev (25)
Projects: ~15 across shared-infra, dev, staging, prod
Compliance: SOC 2, HIPAA (target)

Scenario

InspireIT is migrating their analytics platform from on-prem to GCP. You're the Platform Engineer leading the deployment. Each SOP below is a self-contained deployment guide with portal paths, prerequisites, step-by-step instructions, and verification steps.

SOPs 1-8: IAM Foundation

Org hierarchy, custom roles, conditions, service accounts, deny policies, VPC SC, audit logging, project factory.

SOPs 9-11: Networking

Hybrid VPN, Private Service Connect, bi-directional DNS between GCP and on-prem.

SOPs 12-13: Kubernetes

GKE cluster with Workload Identity, Ingress, Helm packaging and deployment.

SOPs 14-17: DevOps

Terraform with remote state, Azure DevOps + GitHub CI/CD, SonarQube + Checkov, Ansible automation.

SOPs 18-20: APIs + AI

Cloud Endpoints, Vertex AI + MLOps pipeline, Gemini + AI Studio integration.

SOPs 21-22: Ops

Grafana dashboards + alerting, Linux diagnostic sequence.

1 Org Hierarchy & Folder Structure IAM
Objective: Create folder hierarchy for complete environment isolation. Duration: 15 min. Why: All IAM policies and org policies inherit downward through this structure.

⚠ Prerequisites (Do Beforehand)

  • GCP Organization node must exist — verify at IAM & Admin / Settings
  • You need roles/resourcemanager.organizationAdmin or roles/resourcemanager.folderAdmin
  • Billing account linked to the org
  • Document naming convention: inspireit-{env}-{purpose}
  • A folder per environment is a GCP best practice for policy isolation

👉 Step-by-Step Portal Guide

Step 1 — Navigate to Resource Manager
Launch Console → IAM & Admin / Manage Resources
You'll see the Organization node at the top with ID inspireit.co. This is the root of your hierarchy. All folders and projects live under this.
Step 2 — Create Top-Level Environment Folders
Click CREATE FOLDER button (top of page) → Fill in:
Folder 1: Name = inspireit-common — Shared infrastructure (networking, CI/CD, security logging)
Folder 2: Name = inspireit-dev — Development workloads
Folder 3: Name = inspireit-staging — Pre-production validation
Folder 4: Name = inspireit-prod — Production workloads
Click CREATE for each. The parent should be your Organization node.
Step 3 — Create Team Sub-Folders
Click into each environment folder → CREATE FOLDER again:
Inside inspireit-dev: platform, data-science, apis
Inside inspireit-prod: platform, data-science, apis
Inside inspireit-common: networking, security, shared-tools
Team sub-folders let you delegate IAM at the folder level rather than per-project.
Step 4 — Create Initial Projects
Click into sub-folder → CREATE PROJECT:
Under inspireit-common/networking: inspireit-shared-networking
Under inspireit-dev/platform: inspireit-dev-platform-gke
Under inspireit-dev/data-science: inspireit-dev-ds-pipelines
Under inspireit-common/security: inspireit-security-logging
When creating, select your billing account and choose the parent folder.
Step 5 — Apply Org Policy (Optional But Recommended)
IAM & Admin / Organization Policies → find Domain restricted sharing
Set to Enforce → allow only principals from the inspireit.co domain (the constraint is evaluated against your Google Workspace customer ID). This prevents external accounts from being granted IAM roles.
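
If you prefer the CLI, here is a minimal gcloud sketch of Steps 2-4 — ORG_ID, FOLDER_ID, SUBFOLDER_ID, and BILLING_ACCOUNT_ID are placeholders you'd look up first:

gcloud organizations list   # numeric org ID for inspireit.co
gcloud resource-manager folders create --display-name=inspireit-dev --organization=ORG_ID
gcloud resource-manager folders create --display-name=platform --folder=FOLDER_ID
gcloud projects create inspireit-dev-platform-gke --folder=SUBFOLDER_ID
gcloud billing projects link inspireit-dev-platform-gke --billing-account=BILLING_ACCOUNT_ID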

✅ Verification

Tree should look like:

inspireit.co (Organization)
+-- inspireit-common (Folder)
|   +-- networking/inspireit-shared-networking (Project)
|   +-- security/inspireit-security-logging (Project)
+-- inspireit-dev (Folder)
|   +-- platform/inspireit-dev-platform-gke (Project)
|   +-- data-science/inspireit-dev-ds-pipelines (Project)
+-- inspireit-staging (Folder)
+-- inspireit-prod (Folder)
    +-- platform (Folder)
    +-- data-science (Folder)
    +-- apis (Folder)

⚠ Interview Tip: Hierarchy drives inheritance. An org policy set at inspireit-prod applies to all projects inside. This is how you ensure one environment cannot affect another.

2 Custom IAM Roles (Least Privilege) IAM
Objective: Create inspireitSecurityViewer (read-only) and inspireitNetworkAdmin (scoped network mgmt). Duration: 20 min. Why: Predefined roles like roles/editor include 3000+ permissions — too broad.

⚠ Prerequisites

  • Need roles/iam.roleAdmin at Organization level
  • List of required permissions prepared in advance
  • Reference: IAM & Admin / Roles to see existing predefined roles

👉 Create inspireitSecurityViewer Role

Step 1 — Navigate to Roles
IAM & Admin / Roles / + CREATE ROLE
Step 2 — Configure
Fill in the form:
Title: InspireIT Security Viewer
ID: inspireitSecurityViewer (defaults from the title; immutable once created)
Description: Read-only access to IAM policies, audit logs, org policies, and Cloud Asset Inventory
Launch Stage: General Availability
Step 3 — Add Permissions
Click + ADD PERMISSIONS and search/add each:
IAM: iam.roles.get, iam.roles.list, iam.serviceAccounts.getIamPolicy
Resource Manager: resourcemanager.projects.getIamPolicy, resourcemanager.folders.getIamPolicy
Logging: logging.logEntries.list, logging.logs.list
Org Policy: orgpolicy.policies.list, orgpolicy.policy.get
Asset: cloudasset.assets.listResource, cloudasset.assets.queryAccessPolicy
Use the filter/search bar — type each permission name.
Step 4 — Create
Click CREATE. Role is now available org-wide in the custom roles list.

👉 Create inspireitNetworkAdmin Role

Step 1
IAM & Admin / Roles / + CREATE ROLE
Step 2
Title: InspireIT Network Admin, ID: inspireitNetworkAdmin
Step 3 — Permissions
compute.networks.create, .update, .delete
compute.subnetworks.* (full CRUD)
compute.firewalls.create, .update, .delete
compute.routes.*
dns.managedZones.*
compute.interconnects.* (if on-prem)
compute.forwardingRules.*
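
Both roles can also be created from the CLI. A sketch (ORG_ID is a placeholder and the permission list is abbreviated):

gcloud iam roles create inspireitSecurityViewer --organization=ORG_ID \
  --title="InspireIT Security Viewer" --stage=GA \
  --permissions=iam.roles.get,iam.roles.list,logging.logEntries.list,resourcemanager.projects.getIamPolicy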

👉 Assign Roles

Assign SecurityViewer at Org
IAM & Admin / IAM → select Organization node → + ADD
New principal: security-team@inspireit.co (Google Group)
Role: Custom → InspireIT Security Viewer
SAVE. This inherits to ALL folders and projects below.
Assign NetworkAdmin at Folder
Navigate to inspireit-common folder → + ADD
Principal: network-team@inspireit.co
Role: InspireIT Network Admin
SAVE. Scoped to shared infrastructure only — dev/prod teams cannot modify networking.
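
The same two bindings from the CLI — a sketch; org-level custom roles are referenced by their full resource name:

gcloud organizations add-iam-policy-binding ORG_ID \
  --member="group:security-team@inspireit.co" \
  --role="organizations/ORG_ID/roles/inspireitSecurityViewer"
gcloud resource-manager folders add-iam-policy-binding FOLDER_ID \
  --member="group:network-team@inspireit.co" \
  --role="organizations/ORG_ID/roles/inspireitNetworkAdmin"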

✅ Verification

IAM & Admin / Roles → filter by "inspireit" → both custom roles visible with permission counts. Use IAM / Policy Analyzer to verify a test user has exactly the intended permissions.

⚠ Interview: Custom roles = least privilege. Say: "We reduced blast radius from 3000+ perms (Editor) to 15-20 perms per custom role." Audit over-permissioned principals regularly with Policy Analyzer.

3 IAM Conditions (Time / IP / Resource) IAM
Objective: Add attribute-based conditions to existing role bindings using CEL (Common Expression Language). Duration: 15 min. Why: Conditions restrict access contextually without creating new roles.

⚠ Prerequisites

  • Existing role binding to modify (e.g., inspireitNetworkAdmin on a group)
  • Understand CEL syntax (shown below)
  • Resource Manager tags created (optional but powerful)

👉 Scenario A: Time-Based (Business Hours Only)

Step 1
IAM & Admin / IAM → select inspireit-staging folder → find the dev group
Step 2
Click the pencil icon next to the role → ADD CONDITION
Step 3 — Configure
Title: Business hours only
Condition type: Time → Temporal
The CEL expression is auto-generated when you use the condition builder:
request.time.getHours("America/Chicago") >= 9
&& request.time.getHours("America/Chicago") < 17
&& request.time.getDayOfWeek("America/Chicago") >= 1
&& request.time.getDayOfWeek("America/Chicago") <= 5
This restricts to: 9 AM to 5 PM, Monday to Friday, Chicago timezone.
Step 4
Click SAVE. The condition appears in the policy JSON under bindings[].condition.
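
The same conditional binding can be attached from the CLI. A hedged sketch (FOLDER_ID and ROLE are placeholders, and the expression is abbreviated to the hours check):

gcloud resource-manager folders add-iam-policy-binding FOLDER_ID \
  --member="group:dev-team@inspireit.co" --role=ROLE \
  --condition='title=Business hours only,expression=request.time.getHours("America/Chicago") >= 9 && request.time.getHours("America/Chicago") < 17'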

👉 Scenario B: Resource Tag + IP Condition

Step 1 — Create Tag
IAM & Admin / Tags / + CREATE TAG
Key: environment. Values: dev, staging, prod, shared.
Scope: Organization (all projects can use). Click CREATE.
Step 2 — Tag a GCS Bucket
Cloud Storage / Bucket / Labels & Tags → attach environment=prod
Step 3 — Add Condition to Role
Edit the ops team's role binding on the prod project. ADD CONDITION:
Title: Prod-tagged only from on-prem
Condition builder: Resource → Tag → environment = prod AND IP → 203.0.113.0/24
Resulting CEL:
resource.matchTag("inspireit.co/environment", "prod")
&& inIpRange(origin.ip, "203.0.113.0/24")
Step 4
Click SAVE. Now ops can only modify prod-tagged resources from the on-prem IP range.

✅ Verification

Use IAM & Admin / Policy Analyzer → query the principal → the effective access shows condition status: granted or not granted based on context.

4 Service Accounts & Workload Identity Federation IAM
Objective: Create a GCP service account for GKE workloads and set up WIF so GitHub Actions deploys without static keys. Duration: 25 min. Why: No service account keys = no secrets to rotate or leak.

⚠ Prerequisites

  • GKE cluster with Workload Identity enabled (--workload-pool=PROJECT.svc.id.goog)
  • GitHub repo with OIDC provider configured
  • Permissions: iam.serviceAccountAdmin, iam.workloadIdentityPoolAdmin

👉 Part A: Create & Bind GCP Service Account

Step 1
IAM & Admin / Service Accounts / + CREATE SERVICE ACCOUNT
Step 2 — Configure
Fill in:
Name: gke-microservice-sa
ID: gke-microservice-sa
Description: For GKE microservices to read GCS and write logs
Step 3 — Grant Roles
Click + ADD ROLE three times:
roles/storage.objectViewer (read GCS buckets)
roles/logging.logWriter (write logs)
roles/monitoring.metricWriter (custom metrics)
Step 4 — Create Key (Fallback Only)
Click KEYS tab → ADD KEY → Create New Key → JSON
⚠ Better: skip keys entirely and use WIF below.

👉 Part B: Workload Identity Federation (GitHub Actions)

Step 1 — Create Workload Identity Pool
IAM & Admin / Workload Identity Federation / + CREATE POOL
Name: github-pool. ID: github-pool. Click CREATE.
Step 2 — Add OIDC Provider
Inside the pool → ADD PROVIDER:
Provider name: github-provider
Issuer URL: https://token.actions.githubusercontent.com
Audience (string): https://github.com/InspireIT
Attribute mapping:
google.subject = assertion.sub
attribute.repository = assertion.repository
Step 3 — Grant Access
In pool → GRANT ACCESS → select gke-microservice-sa
Add condition to limit to a specific repo:
assertion.repository == "InspireIT/backend-api"
This ensures only the InspireIT/backend-api repo can impersonate this SA.
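
CLI sketch of Part B (PROJECT_ID and PROJECT_NUMBER are placeholders; the attribute condition enforces the repo restriction at the provider):

gcloud iam workload-identity-pools create github-pool \
  --project=PROJECT_ID --location=global --display-name="github-pool"

gcloud iam workload-identity-pools providers create-oidc github-provider \
  --project=PROJECT_ID --location=global --workload-identity-pool=github-pool \
  --issuer-uri="https://token.actions.githubusercontent.com" \
  --attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository" \
  --attribute-condition="assertion.repository == 'InspireIT/backend-api'"

# Let that repo's federated identity impersonate the SA
gcloud iam service-accounts add-iam-policy-binding \
  gke-microservice-sa@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/iam.workloadIdentityUser \
  --member="principalSet://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/github-pool/attribute.repository/InspireIT/backend-api"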

👉 Part C: K8s Workload Identity Binding

Step 1 — Create K8s SA
kubectl create sa ksa-backend -n prod
Step 2 — Bind K8s SA to GCP SA
gcloud iam service-accounts add-iam-policy-binding \
  gke-microservice-sa@PROJECT.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT.svc.id.goog[prod/ksa-backend]"
Step 3 — Annotate K8s SA
kubectl annotate sa ksa-backend -n prod \
iam.gke.io/gcp-service-account=gke-microservice-sa@PROJECT.iam.gserviceaccount.com

✅ Verification

Deploy a test pod (kubectl run dropped the --serviceaccount flag in v1.24, so pass it via an override): kubectl run test -n prod --image=google/cloud-sdk:slim -it --rm --overrides='{"apiVersion":"v1","spec":{"serviceAccountName":"ksa-backend"}}' -- gcloud auth list. The bound GCP SA shows as the active account. No keys needed!

5 Deny Policies (Guardrails) IAM
Objective: Set org-level explicit deny policies to prevent public bucket exposure and SA key creation. Duration: 15 min. Why: Deny always overrides allow — these are your safety net.

⚠ Prerequisites

  • Need roles/iam.denyAdmin at Organization (separate from regular IAM admin)
  • Understand: deny policies apply after IAM allow evaluation
  • Plan exceptions: principals or conditions that can bypass deny

👉 Policy 1: Prevent Public GCS Buckets

Step 1
IAM & Admin / Deny Policies / + CREATE DENY POLICY
Step 2 — Scope
Select Organization node (inspireit.co) → this applies to ALL projects.
Step 3 — Configure
Policy ID: deny-public-gcs
Principals: ALL
Permissions — note that deny policies use the v2 format SERVICE_FQDN/RESOURCE.ACTION:
storage.googleapis.com/buckets.setIamPolicy (blocks the IAM edits that grant allUsers/allAuthenticatedUsers)
Pair this with the storage.publicAccessPrevention org policy constraint for defense in depth.
Step 4 — Exceptions
Under Exceptions → + ADD PRINCIPAL:
principalSet://goog/group/security-admins@inspireit.co. This lets security admins make buckets public if absolutely necessary (break-glass).
Step 5
Click CREATE. Now no one except security-admins can make GCS buckets public.

👉 Policy 2: Block SA Key Creation in Prod

Step 1
IAM & Admin / Deny Policies / + CREATE DENY POLICY
Step 2 — Scope
Select inspireit-prod folder (not org-wide — dev/staging can still create keys for testing).
Step 3 — Configure
Policy ID: deny-sa-key-creation-prod
Principals: ALL
Permissions (v2 deny format):
iam.googleapis.com/serviceAccounts.create
iam.googleapis.com/serviceAccountKeys.create
Step 4
Click CREATE. Prod now requires Workload Identity Federation — no static keys allowed.
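
Deny policies can also be managed declaratively from the CLI. A sketch for Policy 2 — FOLDER_ID is a placeholder, and note the URL-encoded attachment point:

cat > policy.json <<'EOF'
{
  "displayName": "Block SA key creation in prod",
  "rules": [{
    "denyRule": {
      "deniedPrincipals": ["principalSet://goog/public:all"],
      "deniedPermissions": ["iam.googleapis.com/serviceAccountKeys.create"]
    }
  }]
}
EOF
gcloud iam policies create deny-sa-key-creation-prod \
  --attachment-point=cloudresourcemanager.googleapis.com%2Ffolders%2FFOLDER_ID \
  --kind=denypolicies --policy-file=policy.json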

✅ Verification

Try making a bucket public: Cloud Storage / Permissions / + allUsers → the grant fails with a deny-policy error. Check deny policy logs: Logging / Logs Explorer → query: protoPayload.metadata.denyPolicyName

6 VPC Service Controls Perimeter IAM
Objective: Create a VPC SC perimeter around production projects to prevent data exfiltration. Duration: 30 min. Why: Even a compromised service account inside the perimeter cannot exfiltrate data to the internet.

⚠ Prerequisites

  • Production projects identified (under inspireit-prod folder)
  • On-prem IP ranges and service accounts that need ingress/egress access
  • Permissions: accesscontextmanager.policyAdmin (separate from Compute/Network admin)
  • Note: Org ID needed (found in IAM & Admin / Settings)

👉 Step-by-Step Portal Guide

Step 1 — Navigate
IAM & Admin / VPC Service Controls / + NEW PERIMETER
Step 2 — Basic Info
Title: inspireit-prod-perimeter
Type: Regular (standard). Use Bridge only when projects must communicate across two perimeters.
Step 3 — Add Projects
Click + ADD PROJECTS → select all prod projects:
inspireit-prod-platform-gke
inspireit-prod-ds-pipelines
inspireit-prod-apis
inspireit-prod-data-lake
Step 4 — Restricted Services
Select services that hold sensitive data:
Cloud Storage
BigQuery
Bigtable
Cloud Spanner
Cloud SQL
Dataflow
Vertex AI
These services cannot exfiltrate data outside the perimeter.
Step 5 — Ingress Rules
+ ADD INGRESS RULE — allow on-prem access:
Source: IP subnet → 203.0.113.0/24 (on-prem CIDR)
Identity: SA: etl-sa@inspireit-prod-data-lake.iam.gserviceaccount.com
Services: Cloud Storage, BigQuery
Step 6 — Egress Rules
+ ADD EGRESS RULE — allow monitoring:
Destination: External → specify inspireit-common project
Identity: SA: monitoring-sa@inspireit-common.iam.gserviceaccount.com
Services: Cloud Monitoring, Cloud Logging
Step 7 — DRY RUN FIRST
Set mode to DRY RUN → SAVE
Monitor violations for 24-48 hours. Check: VPC SC / Perimeter / Activity Logs. Fix any issues, then switch to ENFORCED.
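
CLI sketch of the same dry-run flow (POLICY_ID is your access policy number; project resources are referenced by number, and the project/service lists are abbreviated):

gcloud access-context-manager perimeters dry-run create inspireit-prod-perimeter \
  --policy=POLICY_ID --perimeter-title="inspireit-prod-perimeter" --perimeter-type=regular \
  --perimeter-resources=projects/PROJECT_NUMBER_1,projects/PROJECT_NUMBER_2 \
  --perimeter-restricted-services=storage.googleapis.com,bigquery.googleapis.com

# After a clean 24-48h dry run:
gcloud access-context-manager perimeters dry-run enforce inspireit-prod-perimeter --policy=POLICY_ID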

✅ Verification

From a VM outside the perimeter: gsutil ls gs://inspireit-prod-bucket → fails with a 403 VPC Service Controls error. From inside (prod project VM or on-prem with ingress): succeeds.

⚠ Critical: Always start in DRY RUN. A misconfigured perimeter breaks all prod access. Monitor for 24-48 hours before enforcing.

7 Policy Analyzer & Audit Logging IAM
Objective: Troubleshoot who has access to what, and set up audit logging for IAM changes. Duration: 20 min. Why: You can't secure what you can't see.

⚠ Prerequisites

  • roles/iam.roleViewer + roles/cloudasset.viewer
  • Cloud Asset API enabled in at least one project

👉 Part A: Policy Analyzer Queries

Step 1 — Basic Query
IAM & Admin / Policy Analyzer
Scope: inspireit.co (Organization)
Principal: dev-team@inspireit.co
Click ANALYZE. Result shows: all roles, resources, and conditions affecting this group. Green = granted, Yellow = conditional, Red = denied.
Step 2 — Find Over-Permissioned Users
In the Custom Query tab, use:
SELECT *
FROM cloud_asset_iam_policies
WHERE roles_any("roles/editor")
AND resource LIKE "//cloudresourcemanager.googleapis.com/organizations/%"
This finds all principals with Editor role at the org level (bad practice!). Export results to CSV for review.
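
If the custom query UI isn't available, the same audit works from the CLI via Cloud Asset Inventory search (requires the Cloud Asset API; ORG_ID is the numeric org ID):

gcloud asset search-all-iam-policies \
  --scope=organizations/ORG_ID \
  --query='policy:"roles/editor"'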

👉 Part B: Audit Logging for IAM

Step 1 — Enable Data Access Logs
IAM & Admin / Audit Logs
Admin Activity logs are always on (free, 400-day retention).
Under DATA ACCESS tab → search for IAM → check Admin Read (plus Data Read / Data Write if needed).
⚠ Data Access logs are chargeable. Enable selectively.
Step 2 — Create Log Sink to BigQuery
Logging / Logs Router / + CREATE SINK
Sink name: iam-audit-sink
Inclusion filter:
protoPayload.serviceName="iam.googleapis.com"
Destination: BigQuery dataset inspireit_audit_logs (query IAM changes with SQL).
Click CREATE SINK.
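
CLI equivalent — a sketch; the BigQuery dataset must exist first, and the sink's writer identity (printed by the command) needs BigQuery Data Editor on it:

gcloud logging sinks create iam-audit-sink \
  bigquery.googleapis.com/projects/PROJECT_ID/datasets/inspireit_audit_logs \
  --log-filter='protoPayload.serviceName="iam.googleapis.com"'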

✅ Verification

Make an IAM change, then query in Logs Explorer:

protoPayload.serviceName="iam.googleapis.com"
protoPayload.methodName="SetIamPolicy"

You'll see who changed what, when, and the policy diff.

8 Project Factory Pattern IAM
Objective: Create standardized projects with baseline IAM, APIs, Shared VPC attachment, and VPC SC registration. Duration: 20 min. Why: Every new project should have the same security baseline.

⚠ Prerequisites

  • Shared VPC host project already deployed
  • Terraform service account with roles/resourcemanager.projectCreator + billing permissions
  • Template variables defined (folder ID, billing account, network names)

👉 Manual First-Time (Portal)

Step 1 — Create Project
IAM & Admin / Manage Resources / CREATE PROJECT
Name: inspireit-dev-backend-v2. Parent: inspireit-dev/apis folder. Billing: Link org billing.
Step 2 — Enable Baseline APIs
APIs & Services / + ENABLE APIS
compute.googleapis.com
container.googleapis.com
cloudresourcemanager.googleapis.com
iam.googleapis.com
logging.googleapis.com
monitoring.googleapis.com
Step 3 — Attach Shared VPC
VPC Network / Shared VPC / Attach
Select host project: inspireit-shared-networking. Choose subnets (e.g. dev-backend-subnet). Click SAVE.
Step 4 — Assign Baseline IAM
IAM / + ADD
dev-team@inspireit.co → roles/container.developer
ci-cd-sa@inspireit-common.iam.gserviceaccount.com → roles/container.developer
monitoring-sa@inspireit-common.iam.gserviceaccount.com → roles/monitoring.metricWriter
Step 5 — Add to VPC SC (prod only)
VPC Service Controls / Perimeter / Edit / + Add project
Select the new project. SAVE.
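
The same baseline as a CLI sketch (FOLDER_ID and BILLING_ACCOUNT_ID are placeholders):

gcloud projects create inspireit-dev-backend-v2 --folder=FOLDER_ID
gcloud billing projects link inspireit-dev-backend-v2 --billing-account=BILLING_ACCOUNT_ID
gcloud services enable compute.googleapis.com container.googleapis.com \
  cloudresourcemanager.googleapis.com iam.googleapis.com \
  logging.googleapis.com monitoring.googleapis.com --project=inspireit-dev-backend-v2
gcloud compute shared-vpc associated-projects add inspireit-dev-backend-v2 \
  --host-project=inspireit-shared-networking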

👉 Automate (Terraform)

# modules/project-factory/main.tf
resource "google_project" "project" {
  name            = var.project_name
  project_id      = var.project_id
  folder_id       = var.folder_id
  billing_account = var.billing_account
}

resource "google_project_service" "apis" {
  for_each = toset(var.enabled_apis)
  project  = google_project.project.project_id
  service  = each.key
}

resource "google_compute_shared_vpc_service_project" "attach" {
  count           = var.attach_shared_vpc ? 1 : 0
  host_project    = var.host_project_id
  service_project = google_project.project.project_id
}

✅ Verification

Resource Manager → new project visible in correct folder with APIs enabled. IAM → baseline roles applied. Shared VPC → subnet attached.

9 Hybrid Networking + Cloud VPN Networking
Objective: Connect InspireIT on-prem to GCP via HA VPN + Cloud Router + BGP. Duration: 45 min. Why: HA VPN gives 99.99% SLA with two tunnels for redundancy.

⚠ Prerequisites

  • On-prem VPN gateway with BGP support (ASN: 64512)
  • Non-overlapping CIDRs — e.g., on-prem 10.0.0.0/24 (matches the 10.0.0.x hosts used below), GCP shared VPC 10.0.1.0/24 and 10.0.2.0/24
  • Permissions: compute.networkAdmin

👉 Portal Steps

Step 1 — VPC
VPC Network / VPC Networks / + CREATE VPC
Name: inspireit-shared-vpc. Subnets: 10.0.1.0/24 (us-central1), 10.0.2.0/24 (us-west1). Mode: Custom.
Step 2 — Cloud Router
VPC Network / Cloud Routers / + CREATE ROUTER
Name: inspireit-cr-uscentral1. Network: inspireit-shared-vpc. Region: us-central1. ASN: 64513. Advertised: Custom → VPC subnets.
Step 3 — HA VPN Gateway
VPC Network / VPN / + CREATE VPN
Name: inspireit-ha-vpn. Network: inspireit-shared-vpc. Cloud Router: inspireit-cr-uscentral1. Creates two external IPs (interface 0 and 1).
Step 4 — Tunnels + BGP
In VPN → + ADD TUNNEL (do twice):
Tunnel 0: Peer IP = on-prem GW1, IKE pre-shared key, BGP peer ASN 64512
Tunnel 1: Peer IP = on-prem GW2, different PSK, same BGP ASN
Step 5 — On-Prem Config
On on-prem VPN: point to GCP HA VPN IPs with matching PSKs and BGP config.
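
CLI sketch of Steps 2-4 — peer IPs and PSKs are placeholders, and the Cloud Router BGP setup is abbreviated:

gcloud compute routers create inspireit-cr-uscentral1 \
  --network=inspireit-shared-vpc --region=us-central1 --asn=64513

gcloud compute vpn-gateways create inspireit-ha-vpn \
  --network=inspireit-shared-vpc --region=us-central1

gcloud compute external-vpn-gateways create onprem-gw \
  --interfaces=0=ONPREM_GW1_IP,1=ONPREM_GW2_IP

gcloud compute vpn-tunnels create tunnel-0 --region=us-central1 \
  --vpn-gateway=inspireit-ha-vpn --interface=0 \
  --peer-external-gateway=onprem-gw --peer-external-gateway-interface=0 \
  --router=inspireit-cr-uscentral1 --ike-version=2 --shared-secret=PSK_0
# Repeat for tunnel-1 (interface 1, PSK_1), then add a router interface + BGP peer (peer ASN 64512) per tunnel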

✅ Verification

VPN / Tunnels → both Established. Cloud Routers / BGP Sessions → Established. From GCP VM: ping 10.0.0.1 (on-prem) succeeds.

⚠ Interview: HA VPN = 99.99% SLA. Cloud Router advertises VPC routes dynamically. No static routes needed. Use two gateways in different regions for regional failover.

10 Private Service Connect + DNS Networking
Objective: Expose an internal GCP service privately to consumers using PSC. Duration: 30 min. Why: No VPC peering needed — consumer gets IP from its own range. IPs can overlap!

⚠ Prerequisites

  • Producer: inspireit-prod-apis with Internal TCP LB deployed
  • Consumer: inspireit-dev-platform-gke
  • Permissions: compute.* in both projects

👉 Producer Side

Step 1 — Create Service Attachment
VPC Network / Private Service Connect / + CREATE SERVICE ATTACHMENT
Name: inspireit-api-sa. Region: us-central1. Target: Internal LB frontend. NAT Subnet: psc-nat-subnet (10.99.0.0/28) — consumer traffic lands here.
Step 2 — Grant IAM
In SA → IAM tab → + ADD
Principal: consumer-project-number@gcp-sa-psc.iam.gserviceaccount.com
Role: roles/compute.pscServiceAttachmentUser

👉 Consumer Side

Step 3 — Reserve IP
VPC Network / IP Addresses / + RESERVE
Name: psc-api-ip. Subnet: consumer subnet. IP: 172.16.1.100.
Step 4 — PSC Endpoint
VPC Network / Private Service Connect / + CONNECT TO SERVICE
Target: projects/inspireit-prod-apis/regions/us-central1/serviceAttachments/inspireit-api-sa. IP: 172.16.1.100.
Step 5 — Private DNS Zone
Cloud DNS / Zones / + CREATE ZONE
Type: Private. DNS name: internal-api.inspireit.io. VPC: consumer VPC. A record: api.internal-api.inspireit.io → 172.16.1.100.
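
CLI sketch of both sides — ILB_FWD_RULE, SUBNET, and CONSUMER_VPC are placeholders:

# Producer
gcloud compute service-attachments create inspireit-api-sa --region=us-central1 \
  --producer-forwarding-rule=ILB_FWD_RULE --nat-subnets=psc-nat-subnet \
  --connection-preference=ACCEPT_MANUAL

# Consumer
gcloud compute addresses create psc-api-ip --region=us-central1 \
  --subnet=SUBNET --addresses=172.16.1.100
gcloud compute forwarding-rules create psc-api-endpoint --region=us-central1 \
  --network=CONSUMER_VPC --address=psc-api-ip \
  --target-service-attachment=projects/inspireit-prod-apis/regions/us-central1/serviceAttachments/inspireit-api-sa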

✅ Verification

From consumer VM: curl http://api.internal-api.inspireit.io:8080/health → responds from producer. Traffic stays on Google network.

11 Hybrid DNS End-to-End Networking
Objective: Bi-directional DNS between GCP and on-prem. Duration: 25 min. Why: GCP apps need to resolve on-prem services and vice versa.

⚠ Prerequisites

  • Cloud VPN/Interconnect established (SOP 9)
  • On-prem DNS server IPs known (10.0.0.53, 10.0.0.54)
  • Cloud DNS API enabled

👉 Portal Steps

Step 1 — Forwarding Zone (GCP to On-Prem)
Cloud DNS / Zones / + CREATE ZONE / Forwarding Zone
DNS name: onprem.inspireit.io.. VPC: inspireit-shared-vpc. Forward to: 10.0.0.53 (primary), 10.0.0.54 (backup).
Step 2 — Inbound Policy (On-Prem to GCP)
Cloud DNS / Inbound Server Policies / + CREATE
Name: inspireit-dns-inbound. VPC: inspireit-shared-vpc. GCP allocates inbound forwarding IPs from your subnets (e.g., 10.0.1.100 and 10.0.1.101) — these are the endpoints on-prem forwards to.
Step 3 — Private Zone for GCP Services
Cloud DNS / Zones / + CREATE ZONE / Private
DNS name: gcp.internal.inspireit.io.. VPC: inspireit-shared-vpc. Add A record: api.gcp.internal.inspireit.io172.16.1.100 (PSC endpoint or internal LB IP).
Step 4 — On-Prem DNS Config
On on-prem DNS server, add forwarding rule:
Forward gcp.internal.inspireit.io to 10.0.1.100 and 10.0.1.101 (the inbound endpoints).
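
CLI sketch of the GCP side (Steps 1-2):

gcloud dns managed-zones create onprem-forwarding --dns-name="onprem.inspireit.io." \
  --description="Forward to on-prem DNS" --visibility=private \
  --networks=inspireit-shared-vpc --forwarding-targets=10.0.0.53,10.0.0.54

gcloud dns policies create inspireit-dns-inbound \
  --description="Inbound DNS from on-prem" \
  --enable-inbound-forwarding --networks=inspireit-shared-vpc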

✅ Verification

From on-prem: nslookup api.gcp.internal.inspireit.io → 172.16.1.100. From GCP: nslookup db.onprem.inspireit.io → on-prem IP.

⚠ Interview: Forwarding zone (GCP→onprem) + inbound policy (onprem→GCP) = bidirectional DNS. Use DNS peering for cross-project resolution.

12 GKE Cluster + Microservices K8s
Objective: Create production GKE cluster, deploy a microservice with Ingress and Workload Identity. Duration: 40 min.

⚠ Prerequisites

  • container.admin permissions
  • Compute + Container APIs enabled
  • Shared VPC subnets for node IPs and pod IP ranges

👉 Create GKE Cluster

Step 1
Kubernetes Engine / Clusters / + CREATE / Standard
Name: inspireit-dev-gke. Location: us-central1 (regional). Node pool: e2-standard-4, size 3. Networking: select shared VPC.
Step 2 — Workload Identity
Cluster → Security tab:
Enable Workload Identity. Workload pool: PROJECT.svc.id.goog (auto-filled).
Step 3 — Features
Features tab:
Dataplane V2 (eBPF/Cilium)
Cloud Logging + Cloud Monitoring
Node auto-upgrade + auto-repair
Step 4 — Connect & Deploy
Click CONNECT → copy command → run in Cloud Shell:
gcloud container clusters get-credentials inspireit-dev-gke --region us-central1
kubectl create ns prod
kubectl create deployment payments-api --image=nginx --replicas=3 -n prod
kubectl expose deployment payments-api --port=80 --name=payments-svc -n prod
Step 5 — Ingress
K8s Engine / Services & Ingress → + Create Ingress or CLI:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: payments-ingress
  namespace: prod
spec:
  ingressClassName: gce
  rules:
  - host: api.inspireit.io
    http:
      paths:
      - path: /payments
        pathType: Prefix
        backend:
          service:
            name: payments-svc
            port:
              number: 80
Apply: kubectl apply -f ingress.yaml
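
If you'd rather build the cluster from the CLI than the portal (Steps 1-3), a sketch — note that on regional clusters --num-nodes is per zone, so 1 per zone gives 3 nodes across us-central1:

gcloud container clusters create inspireit-dev-gke --region=us-central1 \
  --machine-type=e2-standard-4 --num-nodes=1 \
  --workload-pool=PROJECT_ID.svc.id.goog --enable-dataplane-v2 \
  --network=projects/HOST_PROJECT/global/networks/inspireit-shared-vpc \
  --subnetwork=projects/HOST_PROJECT/regions/us-central1/subnetworks/SUBNET_NAME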

✅ Verification

kubectl get ingress -n prod → external IP assigned (2-3 min). curl http://IP/payments → 200. kubectl get pods -n prod → 3/3 running.

13 Helm Package Management K8s
Objective: Create, package, and deploy a Helm chart for the payments microservice. Duration: 20 min.

⚠ Prerequisites

  • GKE cluster running (SOP 12)
  • Helm CLI: helm version
  • Container image pushed to Artifact Registry

👉 CLI Steps

Step 1 — Scaffold
helm create inspireit-payments
rm inspireit-payments/templates/*.yaml
Step 2 — Templates
templates/deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Values.appName }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Values.appName }}
  template:
    metadata:
      labels:
        app: {{ .Values.appName }}
    spec:
      containers:
      - name: {{ .Values.appName }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        ports:
        - containerPort: {{ .Values.service.targetPort }}
Step 3 — Values
values.yaml:
appName: payments-api
replicaCount: 3
image:
  repository: us-central1-docker.pkg.dev/inspireit-dev/platform/payments
  tag: v1.0.0
service:
  port: 80
  targetPort: 8080
Step 4 — Install
helm lint ./inspireit-payments
helm template ./inspireit-payments
helm install payments-release ./inspireit-payments -n prod
Step 5 — Upgrade & Rollback
# Edit values.yaml - change tag to v1.1.0
helm upgrade payments-release ./inspireit-payments -n prod
helm history payments-release -n prod
helm rollback payments-release 1 -n prod

✅ Verification

helm list -A → payments-release in prod namespace. kubectl get pods -n prod → running new image.

⚠ Interview: Helm 3 = 3-way strategic merge (live + last release + new spec). No Tiller. Rollbacks restore exact prior manifest.

14 Terraform Pipeline DevOps
Objective: Terraform with remote GCS state + plan/apply pipeline in CI. Duration: 30 min.

⚠ Prerequisites

  • GCS bucket: inspireit-tfstate-prod with Object Versioning enabled
  • SA with: storage.objectAdmin on bucket, compute.*, iam.*

👉 Setup

Step 1 — Create GCS Backend
Cloud Storage / Create Bucket
Name: inspireit-tfstate-prod. Location: us-central1. Enable Object Versioning, plus a lifecycle rule to prune noncurrent versions after 30 days. Avoid a bucket retention policy here — it blocks Terraform from overwriting state.
Step 2 — Backend Config
backend.tf:
terraform {
  backend "gcs" {
    bucket = "inspireit-tfstate-prod"
    prefix = "gke-cluster"
  }
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}
Step 3 — Init
gcloud auth application-default login
terraform init
terraform plan -out=tfplan
terraform apply tfplan

👉 CI Pipeline (GitHub Actions)

Step 4
.github/workflows/tf.yml:
on:
  pull_request:
    paths: ['terraform/**']
  push:
    branches: [main]
    paths: ['terraform/**']
jobs:
  tf:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: 'projects/...'
          service_account: 'tf-sa@inspireit-common.iam...'
      - run: terraform init && terraform plan -out=tfplan
      - run: terraform apply tfplan
        if: github.ref == 'refs/heads/main'

✅ Verification

Cloud Storage / inspireit-tfstate-prod → gke-cluster/default.tfstate exists. Version history tab shows every apply.

15 CI/CD Pipeline (Azure DevOps + GitHub) DevOps
Objective: Full pipeline: build, push, scan, Helm deploy to GKE. Duration: 45 min.

⚠ Prerequisites

  • GitHub repo + Azure DevOps org
  • Artifact Registry repo created
  • GKE cluster (SOP 12) + Helm chart (SOP 13)

👉 Azure DevOps Pipeline

Step 1 — Service Connection
Project Settings / Service Connections / + New / Google Cloud
Select Workload Identity Federation (no service account keys). Copy WIF provider URL.
Step 2 — Pipeline YAML
azure-pipelines.yml:
trigger:
  branches:
    include: [main]
pool:
  vmImage: ubuntu-latest
variables:
  projectId: inspireit-dev-platform-gke
stages:
- stage: Build
  jobs:
  - job: BuildAndPush
    steps:
    - task: Docker@2
      inputs:
        containerRegistry: gcp-wif
        repository: us-central1-docker.pkg.dev/$(projectId)/platform/payments
        tags: $(Build.BuildId)
    - task: HelmDeploy@0
      inputs:
        command: package
        chartPath: helm/inspireit-payments
- stage: Deploy
  jobs:
  - deployment: DeployToDev
    environment: dev
    strategy:
      runOnce:
        deploy:
          steps:
          - task: HelmDeploy@0
            inputs:
              command: upgrade
              chartPath: '*.tgz'
              releaseName: payments-release
              namespace: prod

👉 GitHub Actions Equivalent

Step 3
.github/workflows/deploy.yml:
name: Deploy to GKE
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/...
          service_account: ci-cd-sa@inspireit-common.iam...
      - run: docker build -t us-central1-docker.pkg.dev/.../payments:$GITHUB_SHA .
      - run: docker push us-central1-docker.pkg.dev/.../payments:$GITHUB_SHA
      - run: gcloud container clusters get-credentials $CLUSTER --region $REGION
      - run: helm upgrade payments-release ./helm/inspireit-payments -n prod --set image.tag=$GITHUB_SHA

✅ Verification

Push commit to main → pipeline runs. Each stage turns green. Helm deploys to GKE with new image tag.

⚠ Interview: WIF for all CI tools — no static SA keys. Azure DevOps + GitHub Actions both support OIDC to GCP.

16 SonarQube + Checkov DevOps
Objective: Add code quality (SonarCloud) and IaC security (Checkov) scanning to CI. Duration: 25 min.

⚠ Prerequisites

  • SonarCloud account (sonarcloud.io)
  • SONAR_TOKEN generated in SonarCloud

👉 SonarQube Integration

Step 1 — Config
sonar-project.properties:
sonar.projectKey=InspireIT_payments-api
sonar.organization=inspireit
sonar.sources=src/
sonar.tests=tests/
sonar.coverage.exclusions=**/*.test.js
sonar.qualitygate.wait=true
Step 2 — GitHub Action
Add to deploy workflow:
- name: SonarCloud Scan
  uses: SonarSource/sonarcloud-github-action@master
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}

👉 Checkov Integration

Step 3 — Scan Terraform + K8s
Add after checkout:
- name: Checkov IaC Scan
  uses: bridgecrewio/checkov-action@v12
  with:
    directory: terraform/
    framework: terraform
    soft_fail: false
- name: Checkov K8s Scan
  uses: bridgecrewio/checkov-action@v12
  with:
    directory: helm/inspireit-payments/
    framework: kubernetes
    soft_fail: true
Step 4 — Config (Optional)
.checkov.yml:
compact: true
skip-check:
- CKV_GCP_6
- CKV_GCP_15
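
You can run the same scans locally before pushing — a quick sketch with the pip-installed CLI:

pip install checkov
checkov -d terraform/ --framework terraform               # picks up .checkov.yml skips
checkov -d helm/inspireit-payments/ --framework kubernetes --soft-fail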

✅ Verification

Push a PR with low-coverage code and insecure Terraform. Pipeline fails at SonarQube (quality gate: coverage below 80%) and Checkov (public bucket). Fix both, re-push, pipeline passes.

⚠ Interview: SonarQube gates on new code (not legacy). Checkov scans Terraform, K8s, Helm, Dockerfiles. Use soft_fail: false for critical, true for advisory.

17 Ansible Automation DevOps
Objective: Post-provisioning VM configuration with Ansible dynamic inventory. Duration: 20 min.

⚠ Prerequisites

  • Compute Engine VMs running (Linux)
  • SSH access from Ansible control node
  • pip install ansible + ansible-galaxy collection install google.cloud

👉 Steps

Step 1 — Dynamic Inventory
inventory.gcp.yml:
plugin: gcp_compute
projects:
  - inspireit-dev-platform-gke
filters:
  - "labels.env=dev"
hostnames:
  - name
keyed_groups:
  - key: labels.role
auth_kind: serviceaccount
service_account_file: /path/to/sa-key.json
Step 2 — Playbook
inspireit-common-setup.yml:
---
- name: Common VM setup for InspireIT
  hosts: all
  become: yes
  vars:
    app_user: inspireit
    app_dir: /opt/inspireit
  tasks:
    - name: Install Docker and node exporter
      apt:
        name: [docker.io, prometheus-node-exporter]
        state: present
        update_cache: yes
    - name: Create app user
      user:
        name: "{{ app_user }}"
        state: present
        groups: docker
    - name: Create app directory
      file:
        path: "{{ app_dir }}"
        state: directory
        owner: "{{ app_user }}"
    - name: Start node exporter
      systemd:
        name: prometheus-node-exporter
        state: started
        enabled: yes
Step 3 — Run
ansible-playbook -i inventory.gcp.yml inspireit-common-setup.yml --check
ansible-playbook -i inventory.gcp.yml inspireit-common-setup.yml

✅ Verification

SSH into VM: docker --version installed, systemctl status prometheus-node-exporter running, /opt/inspireit exists. Second run shows ok=5 changed=0 (idempotent).

18 API Gateway (Cloud Endpoints) APIs
Objective: Deploy Cloud Endpoints with OpenAPI spec in front of Cloud Run, with API key auth. Duration: 25 min.

⚠ Prerequisites

  • Cloud Run service deployed: payments-backend
  • OpenAPI spec file ready
  • Permissions: endpoints.*, serviceusage.*

👉 Steps

Step 1 — OpenAPI Spec
openapi.yaml:
swagger: "2.0"
info:
  title: "InspireIT Payments API"
  version: "1.0.0"
host: "payments-api.endpoints.PROJECT.cloud.goog"
x-google-endpoints:
  - name: "payments-api.endpoints.PROJECT.cloud.goog"
x-google-backend:
  address: "https://payments-backend-xyz-uc.a.run.app"
  path_translation: APPEND_PATH_TO_ADDRESS
schemes: ["https"]
paths:
  /payments:
    get:
      summary: List payments
      responses:
        200:
          description: OK
Step 2 — Deploy
Cloud Endpoints / + CREATE SERVICE / OpenAPI or CLI:
gcloud endpoints services deploy openapi.yaml
This creates a managed service with an endpoint URL.
Step 3 — API Key Auth
Cloud Endpoints / Service / API Keys → + CREATE API KEY (keys are managed under APIs & Services / Credentials).
To require a key, declare an apiKey security scheme in the OpenAPI spec and redeploy:
securityDefinitions:
  api_key:
    type: apiKey
    name: key
    in: query
security:
  - api_key: []
Without a valid key, requests are rejected.

✅ Verification

curl -H "X-API-Key: AIza..." https://payments-api.endpoints.PROJECT.cloud.goog/payments → 200. Without key: curl https://payments-api.endpoints.PROJECT.cloud.goog/payments → 403.

19 Vertex AI + MLOps Pipeline AI/ML
Objective: Train an AutoML model, deploy to endpoint, automate retraining pipeline. Duration: 35 min.

⚠ Prerequisites

  • Vertex AI API enabled
  • GCS bucket: inspireit-ml-artifacts
  • Permissions: aiplatform.*

👉 Portal Steps

Step 1 — Upload Dataset
Vertex AI / Datasets / + CREATE
Type: Tabular. Source: CSV from GCS (gs://inspireit-ml-artifacts/dataset.csv). Target column: fraud_flag. Click CREATE.
Step 2 — Train AutoML
Vertex AI / Training / + CREATE / AutoML
Dataset: Select uploaded dataset. Objective: Classification. Budget: 1 node-hour. Training takes 1-3 hours.
Step 3 — Deploy Endpoint
Vertex AI / Models / Select model / DEPLOY TO ENDPOINT
Endpoint name: fraud-detection-endpoint. Traffic: 100% new model. Machine: n1-standard-2. Min replicas: 1, Max: 5.
Step 4 — Test Prediction
Vertex AI / Endpoints / fraud-detection-endpoint / TEST
Input: {"instances": [{"amount": 250.0, "merchant": "online", "hour": 3}]}

👉 MLOps Pipeline (Automated)

Step 5 — Create Pipeline
Vertex AI / Pipelines / + CREATE / From KFP
DAG: Import dataset → Train AutoML → Evaluate (threshold: AUC > 0.85) → Upload to Model Registry → Deploy canary 10% → Roll to 100%.
Step 6 — Schedule
Under Pipeline → Schedule tab → “Weekly (Sunday midnight)” or trigger on new data arriving in GCS.

✅ Verification

Endpoint shows Active. CLI test: gcloud ai endpoints predict ENDPOINT_ID --region=us-central1 --json-request=input.json (use the numeric endpoint ID from Vertex AI / Endpoints, not the display name).

20 Gemini + AI Studio AI/ML
Objective: Build a customer support assistant with Gemini via AI Studio (prototype) then Vertex AI (production). Duration: 25 min.

⚠ Prerequisites

  • Google AI Studio account: aistudio.google.com
  • Vertex AI API enabled for production deployment

👉 AI Studio (Prototype)

Step 1
Go to aistudio.google.com → Create new prompt
Step 2 — Configure
Model: Gemini 2.0 Flash
System instruction: "You are a customer support agent for InspireIT, a B2B analytics platform. Help users with billing, API keys, and account setup."
Safety: Keep defaults
Step 3 — Test & Export
Prompt: "How do I generate an API key for InspireIT?" → Click Get Code → Choose cURL or Python to copy.

👉 Vertex AI (Production)

Step 4
Vertex AI / Generative AI Studio / Language
Step 5 — API Call
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT/locations/us-central1/publishers/google/models/gemini-2.0-flash:generateContent \
-d '{"contents": [{"parts": [{"text": "How do I reset my password?"}]}]}'
Step 6 — Grounding (Enterprise Data)
Vertex AI / Agent Builder / + CREATE APP
App type: Search + Chat. Data source: GCS with PDF documentation. Grounding: Enable enterprise grounding — Gemini answers from your docs only.

✅ Verification

AI Studio: chat works in browser. Vertex: curl returns generated text with citations. Agent: grounded answers from enterprise docs only.

⚠ Interview: AI Studio = API key auth, rapid prototyping. Vertex AI = IAM auth, production. Differences: grounding in enterprise data, safety filters, model garden, VPC SC support.

21 Grafana Dashboards Observability
Objective: Grafana with GCP monitoring data, K8s dashboard import, and alert rules. Duration: 25 min.

⚠ Prerequisites

  • Grafana instance (Grafana Cloud free tier or self-hosted on GKE: helm install grafana grafana/grafana -n monitoring)
  • Data sources: Cloud Monitoring API + Prometheus

👉 Steps

Step 1 — Add GCP Data Source
Grafana / Configuration / Data Sources / + Add / Google Cloud Monitoring
Auth: Service Account key (SA with roles/monitoring.viewer). Project: inspireit-prod-platform-gke. Click Save & Test → green.
Step 2 — Import K8s Dashboard
Grafana / Dashboards / + Import
Dashboard ID: 315 (Kubernetes cluster monitoring). Data source: Prometheus. Click Import. Panels appear: cluster CPU/memory, pod status, node health.
Step 3 — Create Alert Rule
Grafana / Alerting / + New Alert Rule
Name: K8s Pod CrashLoop. Condition: sum(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}) > 0 (CrashLoopBackOff is a container waiting reason, not a pod phase). Contact point: Slack (inspireit-alerts) + PagerDuty. Click SAVE.
Step 4 — Dashboard as Code
Dashboard / Share / Export JSON
Export to dashboards/k8s-cluster.json in your repo. Provision via Terraform or ConfigMap.
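
One way to provision the exported JSON on GKE — this assumes the grafana Helm chart's dashboard sidecar, which watches for ConfigMaps labeled grafana_dashboard:

kubectl create configmap k8s-cluster-dashboard \
  --from-file=dashboards/k8s-cluster.json -n monitoring
kubectl label configmap k8s-cluster-dashboard grafana_dashboard=1 -n monitoring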

✅ Verification

Dashboard panels loading with live data. Alert → Test → notification arrives in Slack channel. JSON export can be version-controlled.

22 Linux Diagnostics Ops
Objective: Standardized troubleshooting sequence for GCP VM or container issues. Duration: 10 min per incident.

⚠ Prerequisites

  • SSH access: Compute Engine / VM / SSH
  • Serial console access (when SSH is broken): VM / Serial console port 1
  • Permissions: compute.instances.getSerialPortOutput

👉 Diagnostic Sequence

Step 1 — Serial Console (SSH fails)
Compute Engine / VM / Serial console port 1
Check for: kernel panic, disk full (No space left on device), startup script failures (cloud-init), SSH daemon not starting.
Step 2 — SSH Health Check
Run these in order:
top -bn1 | head -10 # CPU/memory hogs
df -h # disk space
free -h # RAM usage
systemctl status --failed # failed services
journalctl -u docker -n 50 # Docker logs
ss -tulpn # listening ports
ping -c 2 google.com # external connectivity
nslookup api.inspireit.io # DNS resolution
curl -v http://localhost:8080/health # app health
Step 3 — GCP-Specific Checks
From Cloud Console:
VM Details / Monitoring → CPU, disk IOPS, network graphs (check for throttling)
Logging / Logs Explorer → filter: resource.type="gce_instance" + instance_id="YOUR_ID"
Logging / Logs Explorer → filter: resource.type="k8s_container" for GKE pods
Step 4 — K8s Troubleshooting
kubectl get events -n prod --sort-by='.lastTimestamp'
kubectl describe pod $POD -n prod
kubectl logs $POD -n prod --tail=100 --previous
kubectl exec -it $POD -n prod -- /bin/sh
kubectl top pod -n prod
kubectl get nodes -o wide | grep -v Ready

Quick Reference Card

Symptom → Command / Action
Can't SSH → Serial Console + VPC firewall rule check
Disk full → df -h; du -sh /var/log/*
Pod CrashLoopBackOff → kubectl describe pod + kubectl logs --previous
DNS not resolving → nslookup / dig + Cloud DNS forwarding zone check
High latency → top + ss -tulpn + Grafana dashboard
GKE node NotReady → kubectl describe node + GCE serial console
App returns 503 → Check backend services: kubectl get svc -n prod

✅ Verification

Follow sequence to identify root cause within 5-10 min. Work bottom-up: serial console → OS metrics → container logs → app health endpoint.