InspireIT Solutions Bangalore

SaaS company migrating to Google Cloud — 22-SOP Playbook
Industry: B2B Analytics Platform
GCP Org: inspireit.co
Teams: Platform Eng (10) | Data Science (6) | Security (3) | Dev (25)
Projects: ~15 across shared-infra, dev, staging, prod
Compliance: SOC 2, HIPAA (target)

Scenario

InspireIT is migrating their analytics platform from on-prem to GCP. You're the Platform Engineer leading the deployment. Each SOP below is a self-contained deployment guide with portal paths, prerequisites, step-by-step instructions, and verification steps.

SOPs 1-8: IAM Foundation

Org hierarchy, custom roles, conditions, service accounts, deny policies, VPC SC, audit logging, project factory.

SOPs 9-11: Networking

Hybrid VPN, Private Service Connect, bi-directional DNS between GCP and on-prem.

SOPs 12-13: Kubernetes

GKE cluster with Workload Identity, Ingress, Helm packaging and deployment.

SOPs 14-17: DevOps

Terraform with remote state, Azure DevOps + GitHub CI/CD, SonarQube + Checkov, Ansible automation.

SOPs 18-20: APIs + AI

Cloud Endpoints, Vertex AI + MLOps pipeline, Gemini + AI Studio integration.

SOPs 21-22: Ops

Grafana dashboards + alerting, Linux diagnostic sequence.

1 Org Hierarchy & Folder Structure IAM
Objective: Create folder hierarchy for complete environment isolation. Duration: 15 min. Why: All IAM policies and org policies inherit downward through this structure.

⚠ Prerequisites (Do Beforehand)

  • GCP Organization node must exist — verify at IAM & Admin / Settings
  • You need roles/resourcemanager.organizationAdmin or roles/resourcemanager.folderAdmin
  • Billing account linked to the org
  • Document naming convention: inspireit-{env}-{purpose}
  • A folder per environment is a GCP best practice for policy isolation

👉 Step-by-Step Portal Guide

Step 1 — Navigate to Resource Manager
Launch Console → IAM & Admin / Manage Resources
You'll see the Organization node at the top with ID inspireit.co. This is the root of your hierarchy. All folders and projects live under this.
Step 2 — Create Top-Level Environment Folders
Click CREATE FOLDER button (top of page) → Fill in:
Folder 1: Name = inspireit-common — Shared infrastructure (networking, CI/CD, security logging)
Folder 2: Name = inspireit-dev — Development workloads
Folder 3: Name = inspireit-staging — Pre-production validation
Folder 4: Name = inspireit-prod — Production workloads
Click CREATE for each. The parent should be your Organization node.
Step 3 — Create Team Sub-Folders
Click into each environment folder → CREATE FOLDER again:
Inside inspireit-dev: platform, data-science, apis
Inside inspireit-prod: platform, data-science, apis
Inside inspireit-common: networking, security, shared-tools
Team sub-folders let you delegate IAM at the folder level rather than per-project.
Step 4 — Create Initial Projects
Click into sub-folder → CREATE PROJECT:
Under inspireit-common/networking: inspireit-shared-networking
Under inspireit-dev/platform: inspireit-dev-platform-gke
Under inspireit-dev/data-science: inspireit-dev-ds-pipelines
Under inspireit-common/security: inspireit-security-logging
When creating, select your billing account and choose the parent folder.
Step 5 — Apply Org Policy (Optional But Recommended)
IAM & Admin / Organization Policies → find Domain restricted sharing
Set to Enforce → allow only principals from the inspireit.co domain (the constraint is evaluated against your Google Workspace customer ID). This prevents external accounts from being granted IAM roles.
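
If you prefer the CLI, here is a minimal gcloud sketch of Steps 2-4 — ORG_ID, FOLDER_ID, SUBFOLDER_ID, and BILLING_ACCOUNT_ID are placeholders you'd look up first:

gcloud organizations list   # numeric org ID for inspireit.co
gcloud resource-manager folders create --display-name=inspireit-dev --organization=ORG_ID
gcloud resource-manager folders create --display-name=platform --folder=FOLDER_ID
gcloud projects create inspireit-dev-platform-gke --folder=SUBFOLDER_ID
gcloud billing projects link inspireit-dev-platform-gke --billing-account=BILLING_ACCOUNT_ID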

✅ Verification

Tree should look like:

inspireit.co (Organization)
+-- inspireit-common (Folder)
|   +-- networking/inspireit-shared-networking (Project)
|   +-- security/inspireit-security-logging (Project)
+-- inspireit-dev (Folder)
|   +-- platform/inspireit-dev-platform-gke (Project)
|   +-- data-science/inspireit-dev-ds-pipelines (Project)
+-- inspireit-staging (Folder)
+-- inspireit-prod (Folder)
    +-- platform (Folder)
    +-- data-science (Folder)
    +-- apis (Folder)

⚠ Interview Tip: Hierarchy drives inheritance. An org policy set at inspireit-prod applies to all projects inside. This is how you ensure one environment cannot affect another.

2 Custom IAM Roles (Least Privilege) IAM
Objective: Create inspireitSecurityViewer (read-only) and inspireitNetworkAdmin (scoped network mgmt). Duration: 20 min. Why: Predefined roles like roles/editor include 3000+ permissions — too broad.

⚠ Prerequisites

  • Need roles/iam.roleAdmin at Organization level
  • List of required permissions prepared in advance
  • Reference: IAM & Admin / Roles to see existing predefined roles

👉 Create inspireitSecurityViewer Role

Step 1 — Navigate to Roles
IAM & Admin / Roles / + CREATE ROLE
Step 2 — Configure
Fill in the form:
Title: InspireIT Security Viewer
ID: inspireitSecurityViewer (defaults from the title; immutable once created)
Description: Read-only access to IAM policies, audit logs, org policies, and Cloud Asset Inventory
Launch Stage: General Availability
Step 3 — Add Permissions
Click + ADD PERMISSIONS and search/add each:
IAM: iam.roles.get, iam.roles.list, iam.serviceAccounts.getIamPolicy
Resource Manager: resourcemanager.projects.getIamPolicy, resourcemanager.folders.getIamPolicy
Logging: logging.logEntries.list, logging.logs.list
Org Policy: orgpolicy.policies.list, orgpolicy.policy.get
Asset: cloudasset.assets.listResource, cloudasset.assets.queryAccessPolicy
Use the filter/search bar — type each permission name.
Step 4 — Create
Click CREATE. Role is now available org-wide in the custom roles list.

👉 Create inspireitNetworkAdmin Role

Step 1
IAM & Admin / Roles / + CREATE ROLE
Step 2
Title: InspireIT Network Admin, ID: inspireitNetworkAdmin
Step 3 — Permissions
compute.networks.create, .update, .delete
compute.subnetworks.* (full CRUD)
compute.firewalls.create, .update, .delete
compute.routes.*
dns.managedZones.*
compute.interconnects.* (if on-prem)
compute.forwardingRules.*
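
Both roles can also be created from the CLI. A sketch (ORG_ID is a placeholder and the permission list is abbreviated):

gcloud iam roles create inspireitSecurityViewer --organization=ORG_ID \
  --title="InspireIT Security Viewer" --stage=GA \
  --permissions=iam.roles.get,iam.roles.list,logging.logEntries.list,resourcemanager.projects.getIamPolicy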

👉 Assign Roles

Assign SecurityViewer at Org
IAM & Admin / IAM → select Organization node → + ADD
New principal: security-team@inspireit.co (Google Group)
Role: Custom → InspireIT Security Viewer
SAVE. This inherits to ALL folders and projects below.
Assign NetworkAdmin at Folder
Navigate to inspireit-common folder → + ADD
Principal: network-team@inspireit.co
Role: InspireIT Network Admin
SAVE. Scoped to shared infrastructure only — dev/prod teams cannot modify networking.
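
The same two bindings from the CLI — a sketch; org-level custom roles are referenced by their full resource name:

gcloud organizations add-iam-policy-binding ORG_ID \
  --member="group:security-team@inspireit.co" \
  --role="organizations/ORG_ID/roles/inspireitSecurityViewer"
gcloud resource-manager folders add-iam-policy-binding FOLDER_ID \
  --member="group:network-team@inspireit.co" \
  --role="organizations/ORG_ID/roles/inspireitNetworkAdmin"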

✅ Verification

IAM & Admin / Roles → filter by "inspireit" → both custom roles visible with permission counts. Use IAM / Policy Analyzer to verify a test user has exactly the intended permissions.

⚠ Interview: Custom roles = least privilege. Say: "We reduced blast radius from 3000+ perms (Editor) to 15-20 perms per custom role." Audit over-permissioned principals regularly with Policy Analyzer.

3 IAM Conditions (Time / IP / Resource) IAM
Objective: Add attribute-based conditions to existing role bindings using CEL (Common Expression Language). Duration: 15 min. Why: Conditions restrict access contextually without creating new roles.

⚠ Prerequisites

  • Existing role binding to modify (e.g., inspireitNetworkAdmin on a group)
  • Understand CEL syntax (shown below)
  • Resource Manager tags created (optional but powerful)

👉 Scenario A: Time-Based (Business Hours Only)

Step 1
IAM & Admin / IAM → select inspireit-staging folder → find the dev group
Step 2
Click the pencil icon next to the role → ADD CONDITION
Step 3 — Configure
Title: Business hours only
Condition type: Time → Temporal
The CEL expression is auto-generated when you use the condition builder:
request.time.getHours("America/Chicago") >= 9
&& request.time.getHours("America/Chicago") < 17
&& request.time.getDayOfWeek("America/Chicago") >= 1
&& request.time.getDayOfWeek("America/Chicago") <= 5
This restricts to: 9 AM to 5 PM, Monday to Friday, Chicago timezone.
Step 4
Click SAVE. The condition appears in the policy JSON under bindings[].condition.
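
The same conditional binding can be attached from the CLI. A hedged sketch (FOLDER_ID and ROLE are placeholders, and the expression is abbreviated to the hours check):

gcloud resource-manager folders add-iam-policy-binding FOLDER_ID \
  --member="group:dev-team@inspireit.co" --role=ROLE \
  --condition='title=Business hours only,expression=request.time.getHours("America/Chicago") >= 9 && request.time.getHours("America/Chicago") < 17'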

👉 Scenario B: Resource Tag + IP Condition

Step 1 — Create Tag
IAM & Admin / Tags / + CREATE TAG
Key: environment. Values: dev, staging, prod, shared.
Scope: Organization (all projects can use). Click CREATE.
Step 2 — Tag a GCS Bucket
Cloud Storage / Bucket / Labels & Tags → attach environment=prod
Step 3 — Add Condition to Role
Edit the ops team's role binding on the prod project. ADD CONDITION:
Title: Prod-tagged only from on-prem
Condition builder: Resource → Tag → environment = prod AND IP → 203.0.113.0/24
Resulting CEL:
resource.matchTag("inspireit.co/environment", "prod")
&& inIpRange(origin.ip, "203.0.113.0/24")
Step 4
Click SAVE. Now ops can only modify prod-tagged resources from the on-prem IP range.

✅ Verification

Use IAM & Admin / Policy Analyzer → query the principal → the effective access shows condition status: granted or not granted based on context.

4 Service Accounts & Workload Identity Federation IAM
Objective: Create a GCP service account for GKE workloads and set up WIF so GitHub Actions deploys without static keys. Duration: 25 min. Why: No service account keys = no secrets to rotate or leak.

⚠ Prerequisites

  • GKE cluster with Workload Identity enabled (--workload-pool=PROJECT.svc.id.goog)
  • GitHub repo with OIDC provider configured
  • Permissions: iam.serviceAccountAdmin, iam.workloadIdentityPoolAdmin

👉 Part A: Create & Bind GCP Service Account

Step 1
IAM & Admin / Service Accounts / + CREATE SERVICE ACCOUNT
Step 2 — Configure
Fill in:
Name: gke-microservice-sa
ID: gke-microservice-sa
Description: For GKE microservices to read GCS and write logs
Step 3 — Grant Roles
Click + ADD ROLE three times:
roles/storage.objectViewer (read GCS buckets)
roles/logging.logWriter (write logs)
roles/monitoring.metricWriter (custom metrics)
Step 4 — Create Key (Fallback Only)
Click KEYS tab → ADD KEY → Create New Key → JSON
⚠ Better: skip keys entirely and use WIF below.

👉 Part B: Workload Identity Federation (GitHub Actions)

Step 1 — Create Workload Identity Pool
IAM & Admin / Workload Identity Federation / + CREATE POOL
Name: github-pool. ID: github-pool. Click CREATE.
Step 2 — Add OIDC Provider
Inside the pool → ADD PROVIDER:
Provider name: github-provider
Issuer URL: https://token.actions.githubusercontent.com
Audience (string): https://github.com/InspireIT
Attribute mapping:
google.subject = assertion.sub
attribute.repository = assertion.repository
Step 3 — Grant Access
In pool → GRANT ACCESS → select gke-microservice-sa
Add condition to limit to a specific repo:
assertion.repository == "InspireIT/backend-api"
This ensures only the InspireIT/backend-api repo can impersonate this SA.
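
CLI sketch of Part B (PROJECT_ID and PROJECT_NUMBER are placeholders; the attribute condition enforces the repo restriction at the provider):

gcloud iam workload-identity-pools create github-pool \
  --project=PROJECT_ID --location=global --display-name="github-pool"

gcloud iam workload-identity-pools providers create-oidc github-provider \
  --project=PROJECT_ID --location=global --workload-identity-pool=github-pool \
  --issuer-uri="https://token.actions.githubusercontent.com" \
  --attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository" \
  --attribute-condition="assertion.repository == 'InspireIT/backend-api'"

# Let that repo's federated identity impersonate the SA
gcloud iam service-accounts add-iam-policy-binding \
  gke-microservice-sa@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/iam.workloadIdentityUser \
  --member="principalSet://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/github-pool/attribute.repository/InspireIT/backend-api"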

👉 Part C: K8s Workload Identity Binding

Step 1 — Create K8s SA
kubectl create sa ksa-backend -n prod
Step 2 — Bind K8s SA to GCP SA
gcloud iam service-accounts add-iam-policy-binding \
  gke-microservice-sa@PROJECT.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT.svc.id.goog[prod/ksa-backend]"
Step 3 — Annotate K8s SA
kubectl annotate sa ksa-backend -n prod \
iam.gke.io/gcp-service-account=gke-microservice-sa@PROJECT.iam.gserviceaccount.com

✅ Verification

Deploy a test pod (kubectl run dropped the --serviceaccount flag in v1.24, so pass it via an override): kubectl run test -n prod --image=google/cloud-sdk:slim -it --rm --overrides='{"apiVersion":"v1","spec":{"serviceAccountName":"ksa-backend"}}' -- gcloud auth list. The bound GCP SA shows as the active account. No keys needed!

5 Deny Policies (Guardrails) IAM
Objective: Set org-level explicit deny policies to prevent public bucket exposure and SA key creation. Duration: 15 min. Why: Deny always overrides allow — these are your safety net.

⚠ Prerequisites

  • Need roles/iam.denyAdmin at Organization (separate from regular IAM admin)
  • Understand: deny policies apply after IAM allow evaluation
  • Plan exceptions: principals or conditions that can bypass deny

👉 Policy 1: Prevent Public GCS Buckets

Step 1
IAM & Admin / Deny Policies / + CREATE DENY POLICY
Step 2 — Scope
Select Organization node (inspireit.co) → this applies to ALL projects.
Step 3 — Configure
Policy ID: deny-public-gcs
Principals: ALL
Permissions — note that deny policies use the v2 format SERVICE_FQDN/RESOURCE.ACTION:
storage.googleapis.com/buckets.setIamPolicy (blocks the IAM edits that grant allUsers/allAuthenticatedUsers)
Pair this with the storage.publicAccessPrevention org policy constraint for defense in depth.
Step 4 — Exceptions
Under Exceptions → + ADD PRINCIPAL:
principalSet://goog/group/security-admins@inspireit.co. This lets security admins make buckets public if absolutely necessary (break-glass).
Step 5
Click CREATE. Now no one except security-admins can make GCS buckets public.

👉 Policy 2: Block SA Key Creation in Prod

Step 1
IAM & Admin / Deny Policies / + CREATE DENY POLICY
Step 2 — Scope
Select inspireit-prod folder (not org-wide — dev/staging can still create keys for testing).
Step 3 — Configure
Policy ID: deny-sa-key-creation-prod
Principals: ALL
Permissions (v2 deny format):
iam.googleapis.com/serviceAccounts.create
iam.googleapis.com/serviceAccountKeys.create
Step 4
Click CREATE. Prod now requires Workload Identity Federation — no static keys allowed.
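
Deny policies can also be managed declaratively from the CLI. A sketch for Policy 2 — FOLDER_ID is a placeholder, and note the URL-encoded attachment point:

cat > policy.json <<'EOF'
{
  "displayName": "Block SA key creation in prod",
  "rules": [{
    "denyRule": {
      "deniedPrincipals": ["principalSet://goog/public:all"],
      "deniedPermissions": ["iam.googleapis.com/serviceAccountKeys.create"]
    }
  }]
}
EOF
gcloud iam policies create deny-sa-key-creation-prod \
  --attachment-point=cloudresourcemanager.googleapis.com%2Ffolders%2FFOLDER_ID \
  --kind=denypolicies --policy-file=policy.json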

✅ Verification

Try making a bucket public: Cloud Storage / Permissions / + allUsers → the grant fails with a deny-policy error. Check deny policy logs: Logging / Logs Explorer → query: protoPayload.metadata.denyPolicyName

6 VPC Service Controls Perimeter IAM
Objective: Create a VPC SC perimeter around production projects to prevent data exfiltration. Duration: 30 min. Why: Even a compromised service account inside the perimeter cannot exfiltrate data to the internet.

⚠ Prerequisites

  • Production projects identified (under inspireit-prod folder)
  • On-prem IP ranges and service accounts that need ingress/egress access
  • Permissions: accesscontextmanager.policyAdmin (separate from Compute/Network admin)
  • Note: Org ID needed (found in IAM & Admin / Settings)

👉 Step-by-Step Portal Guide

Step 1 — Navigate
IAM & Admin / VPC Service Controls / + NEW PERIMETER
Step 2 — Basic Info
Title: inspireit-prod-perimeter
Type: Regular (standard). Use Bridge only when projects must communicate across two perimeters.
Step 3 — Add Projects
Click + ADD PROJECTS → select all prod projects:
inspireit-prod-platform-gke
inspireit-prod-ds-pipelines
inspireit-prod-apis
inspireit-prod-data-lake
Step 4 — Restricted Services
Select services that hold sensitive data:
Cloud Storage
BigQuery
Bigtable
Cloud Spanner
Cloud SQL
Dataflow
Vertex AI
These services cannot exfiltrate data outside the perimeter.
Step 5 — Ingress Rules
+ ADD INGRESS RULE — allow on-prem access:
Source: IP subnet → 203.0.113.0/24 (on-prem CIDR)
Identity: SA: etl-sa@inspireit-prod-data-lake.iam.gserviceaccount.com
Services: Cloud Storage, BigQuery
Step 6 — Egress Rules
+ ADD EGRESS RULE — allow monitoring:
Destination: External → specify inspireit-common project
Identity: SA: monitoring-sa@inspireit-common.iam.gserviceaccount.com
Services: Cloud Monitoring, Cloud Logging
Step 7 — DRY RUN FIRST
Set mode to DRY RUN → SAVE
Monitor violations for 24-48 hours. Check: VPC SC / Perimeter / Activity Logs. Fix any issues, then switch to ENFORCED.
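
CLI sketch of the same dry-run flow (POLICY_ID is your access policy number; project resources are referenced by number, and the project/service lists are abbreviated):

gcloud access-context-manager perimeters dry-run create inspireit-prod-perimeter \
  --policy=POLICY_ID --perimeter-title="inspireit-prod-perimeter" --perimeter-type=regular \
  --perimeter-resources=projects/PROJECT_NUMBER_1,projects/PROJECT_NUMBER_2 \
  --perimeter-restricted-services=storage.googleapis.com,bigquery.googleapis.com

# After a clean 24-48h dry run:
gcloud access-context-manager perimeters dry-run enforce inspireit-prod-perimeter --policy=POLICY_ID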

✅ Verification

From a VM outside the perimeter: gsutil ls gs://inspireit-prod-bucket → fails with a 403 VPC Service Controls error. From inside (prod project VM or on-prem with ingress): succeeds.

⚠ Critical: Always start in DRY RUN. A misconfigured perimeter breaks all prod access. Monitor for 24-48 hours before enforcing.

7 Policy Analyzer & Audit Logging IAM
Objective: Troubleshoot who has access to what, and set up audit logging for IAM changes. Duration: 20 min. Why: You can't secure what you can't see.

⚠ Prerequisites

  • roles/iam.roleViewer + roles/cloudasset.viewer
  • Cloud Asset API enabled in at least one project

👉 Part A: Policy Analyzer Queries

Step 1 — Basic Query
IAM & Admin / Policy Analyzer
Scope: inspireit.co (Organization)
Principal: dev-team@inspireit.co
Click ANALYZE. Result shows: all roles, resources, and conditions affecting this group. Green = granted, Yellow = conditional, Red = denied.
Step 2 — Find Over-Permissioned Users
In the Custom Query tab, use:
SELECT *
FROM cloud_asset_iam_policies
WHERE roles_any("roles/editor")
AND resource LIKE "//cloudresourcemanager.googleapis.com/organizations/%"
This finds all principals with Editor role at the org level (bad practice!). Export results to CSV for review.
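
If the custom query UI isn't available, the same audit works from the CLI via Cloud Asset Inventory search (requires the Cloud Asset API; ORG_ID is the numeric org ID):

gcloud asset search-all-iam-policies \
  --scope=organizations/ORG_ID \
  --query='policy:"roles/editor"'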

👉 Part B: Audit Logging for IAM

Step 1 — Enable Data Access Logs
IAM & Admin / Audit Logs
Admin Activity logs are always on (free, 400-day retention).
Under DATA ACCESS tab → search for IAM → check Admin Read (plus Data Read / Data Write if needed).
⚠ Data Access logs are chargeable. Enable selectively.
Step 2 — Create Log Sink to BigQuery
Logging / Logs Router / + CREATE SINK
Sink name: iam-audit-sink
Inclusion filter:
protoPayload.serviceName="iam.googleapis.com"
Destination: BigQuery dataset inspireit_audit_logs (query IAM changes with SQL).
Click CREATE SINK.
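
CLI equivalent — a sketch; the BigQuery dataset must exist first, and the sink's writer identity (printed by the command) needs BigQuery Data Editor on it:

gcloud logging sinks create iam-audit-sink \
  bigquery.googleapis.com/projects/PROJECT_ID/datasets/inspireit_audit_logs \
  --log-filter='protoPayload.serviceName="iam.googleapis.com"'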

✅ Verification

Make an IAM change, then query in Logs Explorer:

protoPayload.serviceName="iam.googleapis.com"
protoPayload.methodName="SetIamPolicy"

You'll see who changed what, when, and the policy diff.

8 Project Factory Pattern IAM
Objective: Create standardized projects with baseline IAM, APIs, Shared VPC attachment, and VPC SC registration. Duration: 20 min. Why: Every new project should have the same security baseline.

⚠ Prerequisites

  • Shared VPC host project already deployed
  • Terraform service account with roles/resourcemanager.projectCreator + billing permissions
  • Template variables defined (folder ID, billing account, network names)

👉 Manual First-Time (Portal)

Step 1 — Create Project
IAM & Admin / Manage Resources / CREATE PROJECT
Name: inspireit-dev-backend-v2. Parent: inspireit-dev/apis folder. Billing: Link org billing.
Step 2 — Enable Baseline APIs
APIs & Services / + ENABLE APIS
compute.googleapis.com
container.googleapis.com
cloudresourcemanager.googleapis.com
iam.googleapis.com
logging.googleapis.com
monitoring.googleapis.com
Step 3 — Attach Shared VPC
VPC Network / Shared VPC / Attach
Select host project: inspireit-shared-networking. Choose subnets (e.g. dev-backend-subnet). Click SAVE.
Step 4 — Assign Baseline IAM
IAM / + ADD
dev-team@inspireit.co → roles/container.developer
ci-cd-sa@inspireit-common.iam.gserviceaccount.com → roles/container.developer
monitoring-sa@inspireit-common.iam.gserviceaccount.com → roles/monitoring.metricWriter
Step 5 — Add to VPC SC (prod only)
VPC Service Controls / Perimeter / Edit / + Add project
Select the new project. SAVE.
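
The same baseline as a CLI sketch (FOLDER_ID and BILLING_ACCOUNT_ID are placeholders):

gcloud projects create inspireit-dev-backend-v2 --folder=FOLDER_ID
gcloud billing projects link inspireit-dev-backend-v2 --billing-account=BILLING_ACCOUNT_ID
gcloud services enable compute.googleapis.com container.googleapis.com \
  cloudresourcemanager.googleapis.com iam.googleapis.com \
  logging.googleapis.com monitoring.googleapis.com --project=inspireit-dev-backend-v2
gcloud compute shared-vpc associated-projects add inspireit-dev-backend-v2 \
  --host-project=inspireit-shared-networking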

👉 Automate (Terraform)

# modules/project-factory/main.tf
resource "google_project" "project" {
  name            = var.project_name
  project_id      = var.project_id
  folder_id       = var.folder_id
  billing_account = var.billing_account
}

resource "google_project_service" "apis" {
  for_each = toset(var.enabled_apis)
  project  = google_project.project.project_id
  service  = each.key
}

resource "google_compute_shared_vpc_service_project" "attach" {
  count           = var.attach_shared_vpc ? 1 : 0
  host_project    = var.host_project_id
  service_project = google_project.project.project_id
}

✅ Verification

Resource Manager → new project visible in correct folder with APIs enabled. IAM → baseline roles applied. Shared VPC → subnet attached.

9 Hybrid Networking + Cloud VPN Networking
Objective: Connect InspireIT on-prem to GCP via HA VPN + Cloud Router + BGP. Duration: 45 min. Why: HA VPN gives 99.99% SLA with two tunnels for redundancy.

⚠ Prerequisites

  • On-prem VPN gateway with BGP support (ASN: 64512)
  • Non-overlapping CIDRs — e.g., on-prem 10.0.0.0/24 (matches the 10.0.0.x hosts used below), GCP shared VPC 10.0.1.0/24 and 10.0.2.0/24
  • Permissions: compute.networkAdmin

👉 Portal Steps

Step 1 — VPC
VPC Network / VPC Networks / + CREATE VPC
Name: inspireit-shared-vpc. Subnets: 10.0.1.0/24 (us-central1), 10.0.2.0/24 (us-west1). Mode: Custom.
Step 2 — Cloud Router
VPC Network / Cloud Routers / + CREATE ROUTER
Name: inspireit-cr-uscentral1. Network: inspireit-shared-vpc. Region: us-central1. ASN: 64513. Advertised: Custom → VPC subnets.
Step 3 — HA VPN Gateway
VPC Network / VPN / + CREATE VPN
Name: inspireit-ha-vpn. Network: inspireit-shared-vpc. Cloud Router: inspireit-cr-uscentral1. Creates two external IPs (interface 0 and 1).
Step 4 — Tunnels + BGP
In VPN → + ADD TUNNEL (do twice):
Tunnel 0: Peer IP = on-prem GW1, IKE pre-shared key, BGP peer ASN 64512
Tunnel 1: Peer IP = on-prem GW2, different PSK, same BGP ASN
Step 5 — On-Prem Config
On on-prem VPN: point to GCP HA VPN IPs with matching PSKs and BGP config.
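
CLI sketch of Steps 2-4 — peer IPs and PSKs are placeholders, and the Cloud Router BGP setup is abbreviated:

gcloud compute routers create inspireit-cr-uscentral1 \
  --network=inspireit-shared-vpc --region=us-central1 --asn=64513

gcloud compute vpn-gateways create inspireit-ha-vpn \
  --network=inspireit-shared-vpc --region=us-central1

gcloud compute external-vpn-gateways create onprem-gw \
  --interfaces=0=ONPREM_GW1_IP,1=ONPREM_GW2_IP

gcloud compute vpn-tunnels create tunnel-0 --region=us-central1 \
  --vpn-gateway=inspireit-ha-vpn --interface=0 \
  --peer-external-gateway=onprem-gw --peer-external-gateway-interface=0 \
  --router=inspireit-cr-uscentral1 --ike-version=2 --shared-secret=PSK_0
# Repeat for tunnel-1 (interface 1, PSK_1), then add a router interface + BGP peer (peer ASN 64512) per tunnel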

✅ Verification

VPN / Tunnels → both Established. Cloud Routers / BGP Sessions → Established. From GCP VM: ping 10.0.0.1 (on-prem) succeeds.

⚠ Interview: HA VPN = 99.99% SLA. Cloud Router advertises VPC routes dynamically. No static routes needed. Use two gateways in different regions for regional failover.

10 Private Service Connect + DNS Networking
Objective: Expose an internal GCP service privately to consumers using PSC. Duration: 30 min. Why: No VPC peering needed — consumer gets IP from its own range. IPs can overlap!

⚠ Prerequisites

  • Producer: inspireit-prod-apis with Internal TCP LB deployed
  • Consumer: inspireit-dev-platform-gke
  • Permissions: compute.* in both projects

👉 Producer Side

Step 1 — Create Service Attachment
VPC Network / Private Service Connect / + CREATE SERVICE ATTACHMENT
Name: inspireit-api-sa. Region: us-central1. Target: Internal LB frontend. NAT Subnet: psc-nat-subnet (10.99.0.0/28) — consumer traffic lands here.
Step 2 — Grant IAM
In SA → IAM tab → + ADD
Principal: consumer-project-number@gcp-sa-psc.iam.gserviceaccount.com
Role: roles/compute.pscServiceAttachmentUser

👉 Consumer Side

Step 3 — Reserve IP
VPC Network / IP Addresses / + RESERVE
Name: psc-api-ip. Subnet: consumer subnet. IP: 172.16.1.100.
Step 4 — PSC Endpoint
VPC Network / Private Service Connect / + CONNECT TO SERVICE
Target: projects/inspireit-prod-apis/regions/us-central1/serviceAttachments/inspireit-api-sa. IP: 172.16.1.100.
Step 5 — Private DNS Zone
Cloud DNS / Zones / + CREATE ZONE
Type: Private. DNS name: internal-api.inspireit.io. VPC: consumer VPC. A record: api.internal-api.inspireit.io → 172.16.1.100.
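
CLI sketch of both sides — ILB_FWD_RULE, SUBNET, and CONSUMER_VPC are placeholders:

# Producer
gcloud compute service-attachments create inspireit-api-sa --region=us-central1 \
  --producer-forwarding-rule=ILB_FWD_RULE --nat-subnets=psc-nat-subnet \
  --connection-preference=ACCEPT_MANUAL

# Consumer
gcloud compute addresses create psc-api-ip --region=us-central1 \
  --subnet=SUBNET --addresses=172.16.1.100
gcloud compute forwarding-rules create psc-api-endpoint --region=us-central1 \
  --network=CONSUMER_VPC --address=psc-api-ip \
  --target-service-attachment=projects/inspireit-prod-apis/regions/us-central1/serviceAttachments/inspireit-api-sa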

✅ Verification

From consumer VM: curl http://api.internal-api.inspireit.io:8080/health → responds from producer. Traffic stays on Google network.

11 Hybrid DNS End-to-End Networking
Objective: Bi-directional DNS between GCP and on-prem. Duration: 25 min. Why: GCP apps need to resolve on-prem services and vice versa.

⚠ Prerequisites

  • Cloud VPN/Interconnect established (SOP 9)
  • On-prem DNS server IPs known (10.0.0.53, 10.0.0.54)
  • Cloud DNS API enabled

👉 Portal Steps

Step 1 — Forwarding Zone (GCP to On-Prem)
Cloud DNS / Zones / + CREATE ZONE / Forwarding Zone
DNS name: onprem.inspireit.io.. VPC: inspireit-shared-vpc. Forward to: 10.0.0.53 (primary), 10.0.0.54 (backup).
Step 2 — Inbound Policy (On-Prem to GCP)
Cloud DNS / Inbound Server Policies / + CREATE
Name: inspireit-dns-inbound. VPC: inspireit-shared-vpc. GCP allocates inbound forwarding IPs from your subnets (e.g., 10.0.1.100 and 10.0.1.101) — these are the endpoints on-prem forwards to.
Step 3 — Private Zone for GCP Services
Cloud DNS / Zones / + CREATE ZONE / Private
DNS name: gcp.internal.inspireit.io.. VPC: inspireit-shared-vpc. Add A record: api.gcp.internal.inspireit.io172.16.1.100 (PSC endpoint or internal LB IP).
Step 4 — On-Prem DNS Config
On on-prem DNS server, add forwarding rule:
Forward gcp.internal.inspireit.io to 10.0.1.100 and 10.0.1.101 (the inbound endpoints).
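
CLI sketch of the GCP side (Steps 1-2):

gcloud dns managed-zones create onprem-forwarding --dns-name="onprem.inspireit.io." \
  --description="Forward to on-prem DNS" --visibility=private \
  --networks=inspireit-shared-vpc --forwarding-targets=10.0.0.53,10.0.0.54

gcloud dns policies create inspireit-dns-inbound \
  --description="Inbound DNS from on-prem" \
  --enable-inbound-forwarding --networks=inspireit-shared-vpc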

✅ Verification

From on-prem: nslookup api.gcp.internal.inspireit.io → 172.16.1.100. From GCP: nslookup db.onprem.inspireit.io → on-prem IP.

⚠ Interview: Forwarding zone (GCP→onprem) + inbound policy (onprem→GCP) = bidirectional DNS. Use DNS peering for cross-project resolution.

12 GKE Cluster + Microservices K8s
Objective: Create production GKE cluster, deploy a microservice with Ingress and Workload Identity. Duration: 40 min.

⚠ Prerequisites

  • container.admin permissions
  • Compute + Container APIs enabled
  • Shared VPC subnets for node IPs and pod IP ranges

👉 Create GKE Cluster

Step 1
Kubernetes Engine / Clusters / + CREATE / Standard
Name: inspireit-dev-gke. Location: us-central1 (regional). Node pool: e2-standard-4, size 3. Networking: select shared VPC.
Step 2 — Workload Identity
Cluster → Security tab:
Enable Workload Identity. Workload pool: PROJECT.svc.id.goog (auto-filled).
Step 3 — Features
Features tab:
Dataplane V2 (eBPF/Cilium)
Cloud Logging + Cloud Monitoring
Node auto-upgrade + auto-repair
Step 4 — Connect & Deploy
Click CONNECT → copy command → run in Cloud Shell:
gcloud container clusters get-credentials inspireit-dev-gke --region us-central1
kubectl create ns prod
kubectl create deployment payments-api --image=nginx --replicas=3 -n prod
kubectl expose deployment payments-api --port=80 --name=payments-svc -n prod
Step 5 — Ingress
K8s Engine / Services & Ingress → + Create Ingress or CLI:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: payments-ingress
  namespace: prod
spec:
  ingressClassName: gce
  rules:
  - host: api.inspireit.io
    http:
      paths:
      - path: /payments
        pathType: Prefix
        backend:
          service:
            name: payments-svc
            port:
              number: 80
Apply: kubectl apply -f ingress.yaml
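
If you'd rather build the cluster from the CLI than the portal (Steps 1-3), a sketch — note that on regional clusters --num-nodes is per zone, so 1 per zone gives 3 nodes across us-central1:

gcloud container clusters create inspireit-dev-gke --region=us-central1 \
  --machine-type=e2-standard-4 --num-nodes=1 \
  --workload-pool=PROJECT_ID.svc.id.goog --enable-dataplane-v2 \
  --network=projects/HOST_PROJECT/global/networks/inspireit-shared-vpc \
  --subnetwork=projects/HOST_PROJECT/regions/us-central1/subnetworks/SUBNET_NAME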

✅ Verification

kubectl get ingress -n prod → external IP assigned (2-3 min). curl http://IP/payments → 200. kubectl get pods -n prod → 3/3 running.

13 Helm Package Management K8s
Objective: Create, package, and deploy a Helm chart for the payments microservice. Duration: 20 min.

⚠ Prerequisites

  • GKE cluster running (SOP 12)
  • Helm CLI: helm version
  • Container image pushed to Artifact Registry

👉 CLI Steps

Step 1 — Scaffold
helm create inspireit-payments
rm inspireit-payments/templates/*.yaml
Step 2 — Templates
templates/deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Values.appName }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Values.appName }}
  template:
    metadata:
      labels:
        app: {{ .Values.appName }}
    spec:
      containers:
      - name: {{ .Values.appName }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        ports:
        - containerPort: {{ .Values.service.targetPort }}
Step 3 — Values
values.yaml:
appName: payments-api
replicaCount: 3
image:
  repository: us-central1-docker.pkg.dev/inspireit-dev/platform/payments
  tag: v1.0.0
service:
  port: 80
  targetPort: 8080
Step 4 — Install
helm lint ./inspireit-payments
helm template ./inspireit-payments
helm install payments-release ./inspireit-payments -n prod
Step 5 — Upgrade & Rollback
# Edit values.yaml - change tag to v1.1.0
helm upgrade payments-release ./inspireit-payments -n prod
helm history payments-release -n prod
helm rollback payments-release 1 -n prod

✅ Verification

helm list -A → payments-release in prod namespace. kubectl get pods -n prod → running new image.

⚠ Interview: Helm 3 = 3-way strategic merge (live + last release + new spec). No Tiller. Rollbacks restore exact prior manifest.

14 Terraform Pipeline DevOps
Objective: Terraform with remote GCS state + plan/apply pipeline in CI. Duration: 30 min.

⚠ Prerequisites

  • GCS bucket: inspireit-tfstate-prod with Object Versioning enabled
  • SA with: storage.objectAdmin on bucket, compute.*, iam.*

👉 Setup

Step 1 — Create GCS Backend
Cloud Storage / Create Bucket
Name: inspireit-tfstate-prod. Location: us-central1. Enable Object Versioning, plus a lifecycle rule to prune noncurrent versions after 30 days. Avoid a bucket retention policy here — it blocks Terraform from overwriting state.
Step 2 — Backend Config
backend.tf:
terraform {
  backend "gcs" {
    bucket = "inspireit-tfstate-prod"
    prefix = "gke-cluster"
  }
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}
Step 3 — Init
gcloud auth application-default login
terraform init
terraform plan -out=tfplan
terraform apply tfplan

👉 CI Pipeline (GitHub Actions)

Step 4
.github/workflows/tf.yml:
on:
  pull_request:
    paths: ['terraform/**']
  push:
    branches: [main]
    paths: ['terraform/**']
jobs:
  tf:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: 'projects/...'
          service_account: 'tf-sa@inspireit-common.iam...'
      - run: terraform init && terraform plan -out=tfplan
      - run: terraform apply tfplan
        if: github.ref == 'refs/heads/main'

✅ Verification

Cloud Storage / inspireit-tfstate-prod → gke-cluster/default.tfstate exists. Version history tab shows every apply.

15 CI/CD Pipeline (Azure DevOps + GitHub) DevOps
Objective: Full pipeline: build, push, scan, Helm deploy to GKE. Duration: 45 min.

⚠ Prerequisites

  • GitHub repo + Azure DevOps org
  • Artifact Registry repo created
  • GKE cluster (SOP 12) + Helm chart (SOP 13)

👉 Azure DevOps Pipeline

Step 1 — Service Connection
Project Settings / Service Connections / + New / Google Cloud
Select Workload Identity Federation (no service account keys). Copy WIF provider URL.
Step 2 — Pipeline YAML
azure-pipelines.yml:
trigger:
  branches:
    include: [main]
pool:
  vmImage: ubuntu-latest
variables:
  projectId: inspireit-dev-platform-gke
stages:
- stage: Build
  jobs:
  - job: BuildAndPush
    steps:
    - task: Docker@2
      inputs:
        containerRegistry: gcp-wif
        repository: us-central1-docker.pkg.dev/$(projectId)/platform/payments
        tags: $(Build.BuildId)
    - task: HelmDeploy@0
      inputs:
        command: package
        chartPath: helm/inspireit-payments
- stage: Deploy
  jobs:
  - deployment: DeployToDev
    environment: dev
    strategy:
      runOnce:
        deploy:
          steps:
          - task: HelmDeploy@0
            inputs:
              command: upgrade
              chartPath: '*.tgz'
              releaseName: payments-release
              namespace: prod

👉 GitHub Actions Equivalent

Step 3
.github/workflows/deploy.yml:
name: Deploy to GKE
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/...
          service_account: ci-cd-sa@inspireit-common.iam...
      - run: docker build -t us-central1-docker.pkg.dev/.../payments:$GITHUB_SHA .
      - run: docker push us-central1-docker.pkg.dev/.../payments:$GITHUB_SHA
      - run: gcloud container clusters get-credentials $CLUSTER --region $REGION
      - run: helm upgrade payments-release ./helm/inspireit-payments -n prod --set image.tag=$GITHUB_SHA

✅ Verification

Push commit to main → pipeline runs. Each stage turns green. Helm deploys to GKE with new image tag.

⚠ Interview: WIF for all CI tools — no static SA keys. Azure DevOps + GitHub Actions both support OIDC to GCP.

16 SonarQube + Checkov DevOps
Objective: Add code quality (SonarCloud) and IaC security (Checkov) scanning to CI. Duration: 25 min.

⚠ Prerequisites

  • SonarCloud account (sonarcloud.io)
  • SONAR_TOKEN generated in SonarCloud

👉 SonarQube Integration

Step 1 — Config
sonar-project.properties:
sonar.projectKey=InspireIT_payments-api
sonar.organization=inspireit
sonar.sources=src/
sonar.tests=tests/
sonar.coverage.exclusions=**/*.test.js
sonar.qualitygate.wait=true
Step 2 — GitHub Action
Add to deploy workflow:
- name: SonarCloud Scan
  uses: SonarSource/sonarcloud-github-action@master
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}

👉 Checkov Integration

Step 3 — Scan Terraform + K8s
Add after checkout:
- name: Checkov IaC Scan
  uses: bridgecrewio/checkov-action@v12
  with:
    directory: terraform/
    framework: terraform
    soft_fail: false
- name: Checkov K8s Scan
  uses: bridgecrewio/checkov-action@v12
  with:
    directory: helm/inspireit-payments/
    framework: kubernetes
    soft_fail: true
Step 4 — Config (Optional)
.checkov.yml:
compact: true
skip-check:
- CKV_GCP_6
- CKV_GCP_15
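
You can run the same scans locally before pushing — a quick sketch with the pip-installed CLI:

pip install checkov
checkov -d terraform/ --framework terraform               # picks up .checkov.yml skips
checkov -d helm/inspireit-payments/ --framework kubernetes --soft-fail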

✅ Verification

Push a PR with low-coverage code and insecure Terraform. Pipeline fails at SonarQube (quality gate: coverage below 80%) and Checkov (public bucket). Fix both, re-push, pipeline passes.

⚠ Interview: SonarQube gates on new code (not legacy). Checkov scans Terraform, K8s, Helm, Dockerfiles. Use soft_fail: false for critical, true for advisory.

17 Ansible Automation DevOps
Objective: Post-provisioning VM configuration with Ansible dynamic inventory. Duration: 20 min.

⚠ Prerequisites

  • Compute Engine VMs running (Linux)
  • SSH access from Ansible control node
  • pip install ansible + ansible-galaxy collection install google.cloud

👉 Steps

Step 1 — Dynamic Inventory
inventory.gcp.yml:
plugin: gcp_compute
projects:
  - inspireit-dev-platform-gke
filters:
  - "labels.env=dev"
hostnames:
  - name
keyed_groups:
  - key: labels.role
auth_kind: serviceaccount
service_account_file: /path/to/sa-key.json
Step 2 — Playbook
inspireit-common-setup.yml:
---
- name: Common VM setup for InspireIT
  hosts: all
  become: yes
  vars:
    app_user: inspireit
    app_dir: /opt/inspireit
  tasks:
    - name: Install Docker and node exporter
      apt:
        name: [docker.io, prometheus-node-exporter]
        state: present
        update_cache: yes
    - name: Create app user
      user:
        name: "{{ app_user }}"
        state: present
        groups: docker
    - name: Create app directory
      file:
        path: "{{ app_dir }}"
        state: directory
        owner: "{{ app_user }}"
    - name: Start node exporter
      systemd:
        name: prometheus-node-exporter
        state: started
        enabled: yes
Step 3 — Run
ansible-playbook -i inventory.gcp.yml inspireit-common-setup.yml --check
ansible-playbook -i inventory.gcp.yml inspireit-common-setup.yml

✅ Verification

SSH into VM: docker --version installed, systemctl status prometheus-node-exporter running, /opt/inspireit exists. Second run shows ok=5 changed=0 (idempotent).

18 API Gateway (Cloud Endpoints) APIs
Objective: Deploy Cloud Endpoints with OpenAPI spec in front of Cloud Run, with API key auth. Duration: 25 min.

⚠ Prerequisites

  • Cloud Run service deployed: payments-backend
  • OpenAPI spec file ready
  • Permissions: endpoints.*, serviceusage.*

👉 Steps

Step 1 — OpenAPI Spec
openapi.yaml:
swagger: "2.0"
info:
  title: "InspireIT Payments API"
  version: "1.0.0"
host: "payments-api.endpoints.PROJECT.cloud.goog"
x-google-endpoints:
  - name: "payments-api.endpoints.PROJECT.cloud.goog"
x-google-backend:
  address: "https://payments-backend-xyz-uc.a.run.app"
  path_translation: APPEND_PATH_TO_ADDRESS
schemes: ["https"]
paths:
  /payments:
    get:
      summary: List payments
      responses:
        200:
          description: OK
Step 2 — Deploy
Cloud Endpoints / + CREATE SERVICE / OpenAPI or CLI:
gcloud endpoints services deploy openapi.yaml
This creates a managed service with an endpoint URL.
Step 3 — API Key Auth
Cloud Endpoints / Service / API Keys → + CREATE API KEY (keys are managed under APIs & Services / Credentials).
To require a key, declare an apiKey security scheme in the OpenAPI spec and redeploy:
securityDefinitions:
  api_key:
    type: apiKey
    name: key
    in: query
security:
  - api_key: []
Without a valid key, requests are rejected.

✅ Verification

curl -H "X-API-Key: AIza..." https://payments-api.endpoints.PROJECT.cloud.goog/payments → 200. Without key: curl https://payments-api.endpoints.PROJECT.cloud.goog/payments → 403.

19 Vertex AI + MLOps Pipeline AI/ML
Objective: Train an AutoML model, deploy to endpoint, automate retraining pipeline. Duration: 35 min.

⚠ Prerequisites

  • Vertex AI API enabled
  • GCS bucket: inspireit-ml-artifacts
  • Permissions: aiplatform.*

👉 Portal Steps

Step 1 — Upload Dataset
Vertex AI / Datasets / + CREATE
Type: Tabular. Source: CSV from GCS (gs://inspireit-ml-artifacts/dataset.csv). Target column: fraud_flag. Click CREATE.
Step 2 — Train AutoML
Vertex AI / Training / + CREATE / AutoML
Dataset: Select uploaded dataset. Objective: Classification. Budget: 1 node-hour. Training takes 1-3 hours.
Step 3 — Deploy Endpoint
Vertex AI / Models / Select model / DEPLOY TO ENDPOINT
Endpoint name: fraud-detection-endpoint. Traffic: 100% new model. Machine: n1-standard-2. Min replicas: 1, Max: 5.
Step 4 — Test Prediction
Vertex AI / Endpoints / fraud-detection-endpoint / TEST
Input: {"instances": [{"amount": 250.0, "merchant": "online", "hour": 3}]}

👉 MLOps Pipeline (Automated)

Step 5 — Create Pipeline
Vertex AI / Pipelines / + CREATE / From KFP
DAG: Import dataset → Train AutoML → Evaluate (threshold: AUC > 0.85) → Upload to Model Registry → Deploy canary 10% → Roll to 100%.
Step 6 — Schedule
Under Pipeline → Schedule tab → “Weekly (Sunday midnight)” or trigger on new data arriving in GCS.

✅ Verification

Endpoint shows Active. CLI test: gcloud ai endpoints predict ENDPOINT_ID --region=us-central1 --json-request=input.json (use the numeric endpoint ID from Vertex AI / Endpoints, not the display name).

20 Gemini + AI Studio AI/ML
Objective: Build a customer support assistant with Gemini via AI Studio (prototype) then Vertex AI (production). Duration: 25 min.

⚠ Prerequisites

  • Google AI Studio account: aistudio.google.com
  • Vertex AI API enabled for production deployment

👉 AI Studio (Prototype)

Step 1
Go to aistudio.google.com → Create new prompt
Step 2 — Configure
Model: Gemini 2.0 Flash
System instruction: "You are a customer support agent for InspireIT, a B2B analytics platform. Help users with billing, API keys, and account setup."
Safety: Keep defaults
Step 3 — Test & Export
Prompt: "How do I generate an API key for InspireIT?" → Click Get Code → Choose cURL or Python to copy.

👉 Vertex AI (Production)

Step 4
Vertex AI / Generative AI Studio / Language
Step 5 — API Call
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT/locations/us-central1/publishers/google/models/gemini-2.0-flash:generateContent \
-d '{"contents": [{"parts": [{"text": "How do I reset my password?"}]}]}'
Step 6 — Grounding (Enterprise Data)
Vertex AI / Agent Builder / + CREATE APP
App type: Search + Chat. Data source: GCS with PDF documentation. Grounding: Enable enterprise grounding — Gemini answers from your docs only.

✅ Verification

AI Studio: chat works in browser. Vertex: curl returns generated text with citations. Agent: grounded answers from enterprise docs only.

⚠ Interview: AI Studio = API key auth, rapid prototyping. Vertex AI = IAM auth, production. Differences: grounding in enterprise data, safety filters, model garden, VPC SC support.

21 Grafana Dashboards Observability
Objective: Grafana with GCP monitoring data, K8s dashboard import, and alert rules. Duration: 25 min.

⚠ Prerequisites

  • Grafana instance (Grafana Cloud free tier or self-hosted on GKE: helm install grafana grafana/grafana -n monitoring)
  • Data sources: Cloud Monitoring API + Prometheus

👉 Steps

Step 1 — Add GCP Data Source
Grafana / Configuration / Data Sources / + Add / Google Cloud Monitoring
Auth: Service Account key (SA with roles/monitoring.viewer). Project: inspireit-prod-platform-gke. Click Save & Test → green.
Step 2 — Import K8s Dashboard
Grafana / Dashboards / + Import
Dashboard ID: 315 (Kubernetes cluster monitoring). Data source: Prometheus. Click Import. Panels appear: cluster CPU/memory, pod status, node health.
Step 3 — Create Alert Rule
Grafana / Alerting / + New Alert Rule
Name: K8s Pod CrashLoop. Condition: sum(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}) > 0 (CrashLoopBackOff is a container waiting reason, not a pod phase). Contact point: Slack (inspireit-alerts) + PagerDuty. Click SAVE.
Step 4 — Dashboard as Code
Dashboard / Share / Export JSON
Export to dashboards/k8s-cluster.json in your repo. Provision via Terraform or ConfigMap.
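
One way to provision the exported JSON on GKE — this assumes the grafana Helm chart's dashboard sidecar, which watches for ConfigMaps labeled grafana_dashboard:

kubectl create configmap k8s-cluster-dashboard \
  --from-file=dashboards/k8s-cluster.json -n monitoring
kubectl label configmap k8s-cluster-dashboard grafana_dashboard=1 -n monitoring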

✅ Verification

Dashboard panels loading with live data. Alert → Test → notification arrives in Slack channel. JSON export can be version-controlled.

22 Linux Diagnostics Ops
Objective: Standardized troubleshooting sequence for GCP VM or container issues. Duration: 10 min per incident.

⚠ Prerequisites

  • SSH access: Compute Engine / VM / SSH
  • Serial console access (when SSH is broken): VM / Serial console port 1
  • Permissions: compute.instances.getSerialPortOutput

👉 Diagnostic Sequence

Step 1 — Serial Console (SSH fails)
Compute Engine / VM / Serial console port 1
Check for: kernel panic, disk full (No space left on device), startup script failures (cloud-init), SSH daemon not starting.
Step 2 — SSH Health Check
Run these in order:
top -bn1 | head -10 # CPU/memory hogs
df -h # disk space
free -h # RAM usage
systemctl status --failed # failed services
journalctl -u docker -n 50 # Docker logs
ss -tulpn # listening ports
ping -c 2 google.com # external connectivity
nslookup api.inspireit.io # DNS resolution
curl -v http://localhost:8080/health # app health
Step 3 — GCP-Specific Checks
From Cloud Console:
VM Details / Monitoring → CPU, disk IOPS, network graphs (check for throttling)
Logging / Logs Explorer → filter: resource.type="gce_instance" + instance_id="YOUR_ID"
Logging / Logs Explorer → filter: resource.type="k8s_container" for GKE pods
Step 4 — K8s Troubleshooting
kubectl get events -n prod --sort-by='.lastTimestamp'
kubectl describe pod $POD -n prod
kubectl logs $POD -n prod --tail=100 --previous
kubectl exec -it $POD -n prod -- /bin/sh
kubectl top pod -n prod
kubectl get nodes -o wide | grep -v Ready

Quick Reference Card

Symptom → Command / Action
Can't SSH → Serial Console + VPC firewall rule check
Disk full → df -h; du -sh /var/log/*
Pod CrashLoopBackOff → kubectl describe pod + kubectl logs --previous
DNS not resolving → nslookup / dig + Cloud DNS forwarding zone check
High latency → top + ss -tulpn + Grafana dashboard
GKE node NotReady → kubectl describe node + GCE serial console
App returns 503 → Check backend services: kubectl get svc -n prod

✅ Verification

Follow sequence to identify root cause within 5-10 min. Work bottom-up: serial console → OS metrics → container logs → app health endpoint.