InspireIT Solutions Bangalore
Scenario
InspireIT is migrating their analytics platform from on-prem to GCP. You're the Platform Engineer leading the deployment. Each SOP below is a self-contained deployment guide with portal paths, prerequisites, step-by-step instructions, and verification steps.
SOPs 1-8: IAM Foundation
Org hierarchy, custom roles, conditions, service accounts, deny policies, VPC SC, audit logging, project factory.
SOPs 9-11: Networking
Hybrid VPN, Private Service Connect, bi-directional DNS between GCP and on-prem.
SOPs 12-13: Kubernetes
GKE cluster with Workload Identity, Ingress, Helm packaging and deployment.
SOPs 14-17: DevOps
Terraform with remote state, Azure DevOps + GitHub CI/CD, SonarQube + Checkov, Ansible automation.
SOPs 18-20: APIs + AI
Cloud Endpoints, Vertex AI + MLOps pipeline, Gemini + AI Studio integration.
SOPs 21-22: Ops
Grafana dashboards + alerting, Linux diagnostic sequence.
⚠ Prerequisites (Do Beforehand)
- GCP Organization node must exist — verify at IAM & Admin / Settings
- You need roles/resourcemanager.organizationAdmin or roles/resourcemanager.folderAdmin
- Billing account linked to the org
- Document naming convention: inspireit-{env}-{purpose}
- A folder per environment is a GCP best practice for policy isolation
👉 Step-by-Step Portal Guide
1. Confirm the Organization node for inspireit.co. This is the root of your hierarchy; all folders and projects live under it.
2. Create four top-level folders:
- Folder 1: inspireit-common — Shared infrastructure (networking, CI/CD, security logging)
- Folder 2: inspireit-dev — Development workloads
- Folder 3: inspireit-staging — Pre-production validation
- Folder 4: inspireit-prod — Production workloads
3. Create sub-folders:
- Inside inspireit-dev: platform, data-science, apis
- Inside inspireit-prod: platform, data-science, apis
- Inside inspireit-common: networking, security, shared-tools
4. Create projects:
- Under inspireit-common/networking: inspireit-shared-networking
- Under inspireit-dev/platform: inspireit-dev-platform-gke
- Under inspireit-dev/data-science: inspireit-dev-ds-pipelines
- Under inspireit-common/security: inspireit-security-logging
5. Enable the Domain restricted sharing org policy, allowing only the inspireit.co domain. This prevents external accounts from being granted IAM roles.
✅ Verification
Tree should look like:
+-- inspireit-common (Folder)
| +-- networking/inspireit-shared-networking (Project)
| +-- security/inspireit-security-logging (Project)
+-- inspireit-dev (Folder)
| +-- platform/inspireit-dev-platform-gke (Project)
| +-- data-science/inspireit-dev-ds-pipelines (Project)
+-- inspireit-staging (Folder)
+-- inspireit-prod (Folder)
+-- platform (Folder)
+-- data-science (Folder)
⚠ Interview Tip: Hierarchy drives inheritance. An org policy set at inspireit-prod applies to all projects inside. This is how you enforce environments cannot affect each other.
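The hierarchy above can also be scripted. A minimal gcloud sketch, assuming placeholder ORG_ID, FOLDER_ID, and BILLING_ACCOUNT_ID values (each folder create call returns the numeric folder ID to use as a parent):

```shell
# Create a top-level folder under the organization.
gcloud resource-manager folders create \
  --display-name="inspireit-common" --organization=ORG_ID

gcloud resource-manager folders create \
  --display-name="inspireit-dev" --organization=ORG_ID

# Projects attach to a folder by its numeric ID.
gcloud projects create inspireit-dev-platform-gke \
  --folder=FOLDER_ID --name="inspireit-dev-platform-gke"

# Link the org billing account to the new project.
gcloud beta billing projects link inspireit-dev-platform-gke \
  --billing-account=BILLING_ACCOUNT_ID
```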
Create two custom roles: inspireitSecurityViewer (read-only) and inspireitNetworkAdmin (scoped network mgmt). Duration: 20 min. Why: predefined roles like roles/editor include 3000+ permissions — too broad.
⚠ Prerequisites
- Need roles/iam.roleAdmin at the Organization level
- List of required permissions prepared in advance
- Reference: IAM & Admin / Roles to see existing predefined roles
👉 Create inspireitSecurityViewer Role
- Title: InspireIT Security Viewer
- ID: inspireitSecurityViewer (auto-generated, immutable)
- Description: Read-only access to IAM policies, audit logs, org policies, and Cloud Asset Inventory
- Launch Stage: General Availability
- Permissions:
- IAM: iam.roles.get, iam.roles.list, iam.serviceAccounts.getIamPolicy
- Resource Manager: resourcemanager.projects.getIamPolicy, resourcemanager.folders.getIamPolicy
- Logging: logging.logEntries.list, logging.logs.list
- Org Policy: orgpolicy.policies.list, orgpolicy.policy.get
- Asset: cloudasset.assets.listResource, cloudasset.assets.queryAccessPolicy
👉 Create inspireitNetworkAdmin Role
- Title: InspireIT Network Admin, ID: inspireitNetworkAdmin
- Permissions: compute.networks.create, .update, .delete; compute.subnetworks.* (full CRUD); compute.firewalls.create, .update, .delete; compute.routes.*; dns.managedZones.*; compute.interconnects.* (if on-prem); compute.forwardingRules.*
👉 Assign Roles
1. At the Organization level → + ADD. Principal: security-team@inspireit.co (Google Group). Role: Custom → InspireIT Security Viewer.
SAVE. This inherits to ALL folders and projects below.
2. At the inspireit-common folder → + ADD. Principal: network-team@inspireit.co. Role: InspireIT Network Admin.
SAVE. Scoped to shared-infra only — dev/prod teams cannot modify networking.
✅ Verification
IAM & Admin / Roles → filter by "inspireit" → both custom roles visible with permission counts. Use IAM / Policy Analyzer to verify a test user has exactly the intended permissions.
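The same roles can be created from the CLI. A sketch, assuming a placeholder ORG_ID and an abbreviated permission list; the YAML keys follow the `gcloud iam roles create --file` format:

```shell
# Define the role in YAML (abbreviated permission list), then create it at the org level.
cat > security-viewer-role.yaml <<'EOF'
title: InspireIT Security Viewer
description: Read-only access to IAM policies, audit logs, and org policies
stage: GA
includedPermissions:
- iam.roles.get
- iam.roles.list
- resourcemanager.projects.getIamPolicy
- logging.logEntries.list
EOF

gcloud iam roles create inspireitSecurityViewer \
  --organization=ORG_ID --file=security-viewer-role.yaml
```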
⚠ Interview: Custom roles = least privilege. Say: "We reduced blast radius from 3000+ perms (Editor) to 15-20 perms per custom role." Audit over-permissioned principals regularly with Policy Analyzer.
⚠ Prerequisites
- Existing role binding to modify (e.g., inspireitNetworkAdmin on a group)
- Understand CEL syntax (shown below)
- Resource Manager tags created (optional but powerful)
👉 Scenario A: Time-Based (Business Hours Only)
1. Edit the binding. Title: Business hours only. Condition type: Time → Temporal. CEL expression:
request.time.getHours("America/Chicago") >= 9
&& request.time.getHours("America/Chicago") < 17
&& request.time.getDayOfWeek("America/Chicago") >= 1
&& request.time.getDayOfWeek("America/Chicago") <= 5
2. The condition is stored on the binding as bindings[].condition.
👉 Scenario B: Resource Tag + IP Condition
1. Create a tag key: environment. Values: dev, staging, prod, shared. Scope: Organization (all projects can use). Click CREATE.
2. Attach the tag environment=prod to production resources.
3. Add a conditional binding. Title: Prod-tagged only from on-prem. Condition builder: Resource → Tag → environment → prod + AND + IP → 203.0.113.0/24. The IP clause in CEL: && origin.ip in ["203.0.113.0/24"]
✅ Verification
Use IAM & Admin / Policy Analyzer → query the principal → the effective access shows condition status: granted or not granted based on context.
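Conditional bindings can also be attached from the CLI. A sketch with placeholder FOLDER_ID and ORG_ID, using a shortened version of the business-hours expression from Scenario A:

```shell
# Add a binding whose role only applies during business hours (condition in CEL).
gcloud resource-manager folders add-iam-policy-binding FOLDER_ID \
  --member="group:network-team@inspireit.co" \
  --role="organizations/ORG_ID/roles/inspireitNetworkAdmin" \
  --condition='title=business-hours,expression=request.time.getHours("America/Chicago") >= 9 && request.time.getHours("America/Chicago") < 17'
```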
⚠ Prerequisites
- GKE cluster with Workload Identity enabled (--workload-pool=PROJECT.svc.id.goog)
- GitHub repo with OIDC provider configured
- Permissions: iam.serviceAccountAdmin, iam.workloadIdentityPoolAdmin
👉 Part A: Create & Bind GCP Service Account
1. Name: gke-microservice-sa. ID: gke-microservice-sa. Description: For GKE microservices to read GCS and write logs.
2. Grant roles:
- roles/storage.objectViewer (read GCS buckets)
- roles/logging.logWriter (write logs)
- roles/monitoring.metricWriter (custom metrics)
👉 Part B: Workload Identity Federation (GitHub Actions)
1. Create a pool: github-pool. ID: github-pool. Click CREATE.
2. Add a provider: github-provider. Issuer URL: https://token.actions.githubusercontent.com. Audience (string): https://github.com/InspireIT
3. Attribute mapping: google.subject = assertion.sub, attribute.repository = assertion.repository
4. Grant access to gke-microservice-sa so that only the InspireIT/backend-api repo can impersonate this SA.
👉 Part C: K8s Workload Identity Binding
gcloud iam service-accounts add-iam-policy-binding \
  gke-microservice-sa@PROJECT.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT.svc.id.goog[prod/ksa-backend]"
Then annotate the Kubernetes service account:
kubectl annotate serviceaccount ksa-backend -n prod \
  iam.gke.io/gcp-service-account=gke-microservice-sa@PROJECT.iam.gserviceaccount.com
✅ Verification
Deploy a test pod: kubectl run test --image=google/cloud-sdk:slim --serviceaccount=ksa-backend -it --rm -- gcloud auth list. The GCP SA token appears automatically. No keys needed!
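Part B can be scripted too. A sketch with a placeholder PROJECT; the attribute condition is what limits impersonation to the single repo:

```shell
# Create the pool and the GitHub OIDC provider, restricted to one repository.
gcloud iam workload-identity-pools create github-pool \
  --project=PROJECT --location=global --display-name="github-pool"

gcloud iam workload-identity-pools providers create-oidc github-provider \
  --project=PROJECT --location=global \
  --workload-identity-pool=github-pool \
  --issuer-uri="https://token.actions.githubusercontent.com" \
  --attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository" \
  --attribute-condition="assertion.repository=='InspireIT/backend-api'"
```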
⚠ Prerequisites
- Need roles/iam.denyAdmin at the Organization level (separate from regular IAM admin)
- Understand: deny policies are evaluated before IAM allow policies — a matching deny always overrides an allow
- Plan exceptions: principals or conditions that can bypass deny
👉 Policy 1: Prevent Public GCS Buckets
1. Attach at the Organization (inspireit.co) → this applies to ALL projects.
2. Name: deny-public-gcs. Principals: ALL.
3. Permissions — add these two:
- storage.buckets.setIamPolicy (prevent IAM changes)
- storage.buckets.setPublicAccess (prevent public access toggle)
4. Exception: principalSet group:security-admins@inspireit.co. This allows security admins to make buckets public if absolutely necessary (break-glass).
👉 Policy 2: Block SA Key Creation in Prod
1. Name: deny-sa-key-creation-prod. Principals: ALL.
2. Permissions:
- iam.serviceAccounts.create
- iam.serviceAccountKeys.create
- iam.serviceAccounts.uploadKey
✅ Verification
Try making a bucket public: Cloud Storage / Permissions / + allUsers → You'll see: "Policy denied by org policy" error. Check deny policy logs: Logging / Logs Explorer → query: protoPayload.metadata.denyPolicyName
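Deny policies are also manageable from the CLI. A sketch, assuming a placeholder ORG_ID and a policy file that follows the v2 deny-policy JSON schema (note the URL-encoded attachment point):

```shell
# Create and list org-level deny policies (kind=denypolicies).
gcloud iam policies create deny-public-gcs \
  --attachment-point="cloudresourcemanager.googleapis.com%2Forganizations%2FORG_ID" \
  --kind=denypolicies \
  --policy-file=deny-public-gcs.json

gcloud iam policies list \
  --attachment-point="cloudresourcemanager.googleapis.com%2Forganizations%2FORG_ID" \
  --kind=denypolicies
```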
⚠ Prerequisites
- Production projects identified (under the inspireit-prod folder)
- On-prem IP ranges and service accounts that need ingress/egress access
- Permissions: accesscontextmanager.policyAdmin (separate from Compute/Network admin)
- Note: Org ID needed (found in IAM & Admin / Settings)
👉 Step-by-Step Portal Guide
1. Create a perimeter. Name: inspireit-prod-perimeter. Type: Regular (standard); use Bridge only if peering across perimeters.
2. Add projects:
- inspireit-prod-platform-gke
- inspireit-prod-ds-pipelines
- inspireit-prod-apis
- inspireit-prod-data-lake
3. Restricted services:
- BigQuery
- Bigtable
- Cloud Spanner
- Cloud SQL
- Dataflow
- Vertex AI
4. Ingress rule: Source: 203.0.113.0/24 (on-prem CIDR). Identity: SA etl-sa@inspireit-prod-data-lake.iam.gserviceaccount.com. Services: Cloud Storage, BigQuery.
5. Egress rule: Destination: the inspireit-common project. Identity: SA monitoring-sa@inspireit-common.iam.gserviceaccount.com. Services: Cloud Monitoring, Cloud Logging.
✅ Verification
From a VM outside the perimeter: gsutil ls gs://inspireit-prod-bucket → 403 VPC Service Controls. From inside (prod project VM or on-prem with ingress): succeeds.
⚠ Critical: Always start in DRY RUN. A misconfigured perimeter breaks all prod access. Monitor for 24-48 hours before enforcing.
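The dry-run recommendation above maps to a dedicated CLI surface. A sketch with placeholder POLICY_ID and project number:

```shell
# Create the perimeter in dry-run mode first; promote to enforced only after monitoring.
gcloud access-context-manager perimeters dry-run create inspireit-prod-perimeter \
  --policy=POLICY_ID \
  --perimeter-title="inspireit-prod-perimeter" \
  --perimeter-type=regular \
  --perimeter-resources=projects/PROD_PROJECT_NUMBER \
  --perimeter-restricted-services=storage.googleapis.com,bigquery.googleapis.com
```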
⚠ Prerequisites
- roles/iam.roleViewer + roles/cloudasset.viewer
- Cloud Asset API enabled in at least one project
👉 Part A: Policy Analyzer Queries
1. Open IAM & Admin / Policy Analyzer. Scope: inspireit.co (Organization). Principal: dev-team@inspireit.co.
2. Click ANALYZE. Result shows: all roles, resources, and conditions affecting this group. Green = granted, Yellow = conditional, Red = denied.
3. Example query for over-broad Editor grants at the org level:
FROM cloud_asset_iam_policies
WHERE roles_any("roles/editor")
AND resource LIKE "//cloudresourcemanager.googleapis.com/organizations/%"
👉 Part B: Audit Logging for IAM
1. Go to IAM & Admin / Audit Logs. Under the DATA ACCESS tab, search for IAM and check Admin Read + Data Access.
⚠ Data Access logs are chargeable. Enable selectively.
2. Create a log sink. Name: iam-audit-sink. Inclusion filter: scoped to IAM audit entries. Destination: BigQuery dataset inspireit_audit_logs (query IAM changes with SQL). Click CREATE SINK.
✅ Verification
Make an IAM change, then query in Logs Explorer:
protoPayload.methodName="SetIamPolicy"
You'll see who changed what, when, and the policy diff.
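The same query runs from the CLI, which is handy in scripts:

```shell
# Read the last day of IAM policy changes as JSON.
gcloud logging read 'protoPayload.methodName="SetIamPolicy"' \
  --limit=10 --freshness=1d --format=json
```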
⚠ Prerequisites
- Shared VPC host project already deployed
- Terraform service account with roles/resourcemanager.projectCreator + billing permissions
- Template variables defined (folder ID, billing account, network names)
👉 Manual First-Time (Portal)
1. Create project inspireit-dev-backend-v2. Parent: the inspireit-dev/apis folder. Billing: link org billing.
2. Enable APIs:
- compute.googleapis.com
- container.googleapis.com
- cloudresourcemanager.googleapis.com
- iam.googleapis.com
- logging.googleapis.com
- monitoring.googleapis.com
3. Attach to Shared VPC host project inspireit-shared-networking. Choose subnets (e.g. dev-backend-subnet). Click SAVE.
4. Baseline IAM:
- dev-team@inspireit.co → roles/container.developer
- ci-cd-sa@inspireit-common.iam.gserviceaccount.com → roles/container.developer
- monitoring-sa@inspireit-common.iam.gserviceaccount.com → roles/monitoring.metricWriter
👉 Automate (Terraform)
resource "google_project" "project" {
  name            = var.project_name
  project_id      = var.project_id
  folder_id       = var.folder_id
  billing_account = var.billing_account
}

resource "google_project_service" "apis" {
  for_each = toset(var.enabled_apis)
  project  = google_project.project.project_id
  service  = each.key
}

resource "google_compute_shared_vpc_service_project" "attach" {
  count           = var.attach_shared_vpc ? 1 : 0
  host_project    = var.host_project_id
  service_project = google_project.project.project_id
}
✅ Verification
Resource Manager → new project visible in correct folder with APIs enabled. IAM → baseline roles applied. Shared VPC → subnet attached.
⚠ Prerequisites
- On-prem VPN gateway with BGP support (ASN: 64512)
- Non-overlapping CIDRs: on-prem 10.0.0.0/8, GCP 172.16.0.0/12
- Permissions: compute.networkAdmin
👉 Portal Steps
1. Create VPC: inspireit-shared-vpc. Subnets: 10.0.1.0/24 (us-central1), 10.0.2.0/24 (us-west1). Mode: Custom.
2. Create Cloud Router: inspireit-cr-uscentral1. Network: inspireit-shared-vpc. Region: us-central1. ASN: 64513. Advertised routes: Custom → VPC subnets.
3. Create HA VPN gateway: inspireit-ha-vpn. Network: inspireit-shared-vpc. Cloud Router: inspireit-cr-uscentral1. Creates two external IPs (interface 0 and 1).
4. Create tunnels. Tunnel 0: Peer IP = on-prem GW1, pre-shared key, peer BGP ASN 64512. Tunnel 1: Peer IP = on-prem GW2, different PSK, same BGP ASN.
✅ Verification
VPN / Tunnels → both Established. Cloud Routers / BGP Sessions → Established. From GCP VM: ping 10.0.0.1 (on-prem) succeeds.
⚠ Interview: HA VPN = 99.99% SLA. Cloud Router advertises VPC routes dynamically. No static routes needed. Use two gateways in different regions for regional failover.
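A CLI sketch of the gateway and router steps (tunnels and BGP peers omitted for brevity):

```shell
# HA VPN gateway + Cloud Router; names and ASN as in the portal steps above.
gcloud compute vpn-gateways create inspireit-ha-vpn \
  --network=inspireit-shared-vpc --region=us-central1

gcloud compute routers create inspireit-cr-uscentral1 \
  --network=inspireit-shared-vpc --region=us-central1 --asn=64513
```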
⚠ Prerequisites
- Producer: inspireit-prod-apis with Internal TCP LB deployed
- Consumer: inspireit-dev-platform-gke
- Permissions: compute.* in both projects
👉 Producer Side
1. Create a service attachment: inspireit-api-sa. Region: us-central1. Target: Internal LB frontend. NAT Subnet: psc-nat-subnet (10.99.0.0/28) — consumer traffic lands here.
2. Accept the consumer: consumer-project-number@gcp-sa-psc.iam.gserviceaccount.com with role roles/compute.pscServiceAttachmentUser.
👉 Consumer Side
1. Reserve a static internal IP: psc-api-ip. Subnet: consumer subnet. IP: 172.16.1.100.
2. Create a PSC endpoint targeting projects/inspireit-prod-apis/regions/us-central1/serviceAttachments/inspireit-api-sa. IP: 172.16.1.100.
3. Create a private DNS zone: internal-api.inspireit.io. VPC: consumer VPC. A record: api.internal-api.inspireit.io → 172.16.1.100.
✅ Verification
From consumer VM: curl http://api.internal-api.inspireit.io:8080/health → responds from producer. Traffic stays on Google network.
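The producer-side attachment can be sketched in one command (the forwarding-rule name is a placeholder):

```shell
# Publish the internal LB through PSC; consumers must be explicitly accepted.
gcloud compute service-attachments create inspireit-api-sa \
  --region=us-central1 \
  --producer-forwarding-rule=INTERNAL_LB_FORWARDING_RULE \
  --nat-subnets=psc-nat-subnet \
  --connection-preference=ACCEPT_MANUAL
```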
⚠ Prerequisites
- Cloud VPN/Interconnect established (SOP 9)
- On-prem DNS server IPs known (10.0.0.53, 10.0.0.54)
- Cloud DNS API enabled
👉 Portal Steps
1. Create a forwarding zone: onprem.inspireit.io.. VPC: inspireit-shared-vpc. Forward to: 10.0.0.53 (primary), 10.0.0.54 (backup).
2. Create an inbound DNS server policy: inspireit-dns-inbound. VPC: inspireit-shared-vpc. Allocate IPs 10.0.1.100 and 10.0.1.101 from a subnet. These are the inbound forwarding endpoints.
3. Create a private zone: gcp.internal.inspireit.io.. VPC: inspireit-shared-vpc. Add A record: api.gcp.internal.inspireit.io → 172.16.1.100 (PSC endpoint or internal LB IP).
4. On the on-prem DNS servers, add a conditional forwarder for gcp.internal.inspireit.io to 10.0.1.100 and 10.0.1.101 (the inbound endpoints).
✅ Verification
From on-prem: nslookup api.gcp.internal.inspireit.io → 172.16.1.100. From GCP: nslookup db.onprem.inspireit.io → on-prem IP.
⚠ Interview: Forwarding zone (GCP→onprem) + inbound policy (onprem→GCP) = bidirectional DNS. Use DNS peering for cross-project resolution.
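The forwarding zone maps to a single CLI call. A sketch (the zone name onprem-zone is an assumption):

```shell
# Private forwarding zone: GCP resolvers forward onprem.inspireit.io to on-prem DNS.
gcloud dns managed-zones create onprem-zone \
  --dns-name="onprem.inspireit.io." \
  --description="Forward on-prem names to on-prem DNS" \
  --visibility=private \
  --networks=inspireit-shared-vpc \
  --forwarding-targets=10.0.0.53,10.0.0.54
```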
⚠ Prerequisites
- container.admin permissions
- Compute + Container APIs enabled
- Shared VPC subnets for node IPs and pod IP ranges
👉 Create GKE Cluster
1. Create cluster: inspireit-dev-gke. Location: us-central1 (zonal, e.g. us-central1-a). Node pool: e2-standard-4, size 3. Networking: select shared VPC.
2. Enable Workload Identity: pool PROJECT.svc.id.goog (auto-filled).
3. Enable Cloud Logging + Cloud Monitoring and node auto-upgrade + auto-repair.
4. Deploy a test workload:
kubectl create ns prod
kubectl create deployment payments-api --image=nginx --replicas=3 -n prod
kubectl expose deployment payments-api --port=80 --name=payments-svc -n prod
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: payments-ingress
namespace: prod
spec:
ingressClassName: gce
rules:
- host: api.inspireit.io
http:
paths:
- path: /payments
pathType: Prefix
backend:
service:
name: payments-svc
port:
number: 80
kubectl apply -f ingress.yaml
✅ Verification
kubectl get ingress -n prod → external IP assigned (2-3 min). curl http://IP/payments → 200. kubectl get pods -n prod → 3/3 running.
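The cluster creation can be scripted as well. A sketch assuming a zonal cluster in us-central1-a and a placeholder PROJECT:

```shell
# Zonal GKE cluster with Workload Identity, auto-upgrade, and auto-repair.
gcloud container clusters create inspireit-dev-gke \
  --zone=us-central1-a \
  --machine-type=e2-standard-4 --num-nodes=3 \
  --workload-pool=PROJECT.svc.id.goog \
  --enable-autoupgrade --enable-autorepair

# Fetch kubeconfig credentials for kubectl.
gcloud container clusters get-credentials inspireit-dev-gke --zone=us-central1-a
```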
⚠ Prerequisites
- GKE cluster running (SOP 12)
- Helm CLI installed (check with helm version)
- Container image pushed to Artifact Registry
👉 CLI Steps
1. Scaffold a chart and clear the boilerplate templates:
helm create inspireit-payments
rm inspireit-payments/templates/*.yaml
2. Write templates/deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ .Values.appName }}
spec:
replicas: {{ .Values.replicaCount }}
selector:
matchLabels:
app: {{ .Values.appName }}
template:
metadata:
labels:
app: {{ .Values.appName }}
spec:
containers:
- name: {{ .Values.appName }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
ports:
- containerPort: {{ .Values.service.targetPort }}
values.yaml:
replicaCount: 3
image:
repository: us-central1-docker.pkg.dev/inspireit-dev/platform/payments
tag: v1.0.0
service:
port: 80
targetPort: 8080
helm template ./inspireit-payments
helm install payments-release ./inspireit-payments -n prod
helm upgrade payments-release ./inspireit-payments -n prod
helm history payments-release -n prod
helm rollback payments-release 1 -n prod
✅ Verification
helm list -A → payments-release in prod namespace. kubectl get pods -n prod → running new image.
⚠ Interview: Helm 3 = 3-way strategic merge (live + last release + new spec). No Tiller. Rollbacks restore exact prior manifest.
⚠ Prerequisites
- GCS bucket: inspireit-tfstate-prod with Object Versioning enabled
- SA with: storage.objectAdmin on the bucket, plus compute.*, iam.*
👉 Setup
1. Create the state bucket: inspireit-tfstate-prod. Location: us-central1. Enable: Object versioning + Retention policy (30 days).
2. Write backend.tf:
terraform {
  backend "gcs" {
    bucket = "inspireit-tfstate-prod"
    prefix = "gke-cluster"
  }
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}
terraform init
terraform plan -out=tfplan
terraform apply tfplan
👉 CI Pipeline (GitHub Actions)
.github/workflows/tf.yml:
on:
  pull_request:
    paths: ['terraform/**']
  push:
    branches: [main]
    paths: ['terraform/**']
jobs:
  tf:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: 'projects/...'
          service_account: 'tf-sa@inspireit-common.iam...'
      - run: terraform init && terraform plan -out=tfplan
      - run: terraform apply tfplan
        if: github.ref == 'refs/heads/main'
✅ Verification
Cloud Storage / inspireit-tfstate-prod → gke-cluster/default.tfstate exists. Version history tab shows every apply.
⚠ Prerequisites
- GitHub repo + Azure DevOps org
- Artifact Registry repo created
- GKE cluster (SOP 12) + Helm chart (SOP 13)
👉 Azure DevOps Pipeline
azure-pipelines.yml:
pool:
  vmImage: ubuntu-latest
variables:
  projectId: inspireit-dev-platform-gke
stages:
- stage: Build
  jobs:
  - job: BuildAndPush
    steps:
    - task: Docker@2
      inputs:
        containerRegistry: gcp-wif
        repository: us-central1-docker.pkg.dev/$(projectId)/platform/payments
        tags: $(Build.BuildId)
    - task: HelmDeploy@0
      inputs:
        command: package
        chartPath: helm/inspireit-payments
- stage: Deploy
  jobs:
  - deployment: DeployToDev
    environment: dev
    strategy:
      runOnce:
        deploy:
          steps:
          - task: HelmDeploy@0
            inputs:
              command: upgrade
              chartPath: '*.tgz'
              releaseName: payments-release
              namespace: prod
👉 GitHub Actions Equivalent
.github/workflows/deploy.yml:
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/...
          service_account: ci-cd-sa@inspireit-common.iam...
      - run: docker build -t us-central1-docker.pkg.dev/.../payments:$GITHUB_SHA .
      - run: docker push us-central1-docker.pkg.dev/.../payments:$GITHUB_SHA
      - run: gcloud container clusters get-credentials $CLUSTER --region $REGION
      - run: helm upgrade payments-release ./helm/inspireit-payments -n prod --set image.tag=$GITHUB_SHA
✅ Verification
Push commit to main → pipeline runs. Each stage turns green. Helm deploys to GKE with new image tag.
⚠ Interview: WIF for all CI tools — no static SA keys. Azure DevOps + GitHub Actions both support OIDC to GCP.
⚠ Prerequisites
- SonarCloud account (sonarcloud.io)
- SONAR_TOKEN generated in SonarCloud
👉 SonarQube Integration
sonar-project.properties:
sonar.organization=inspireit
sonar.sources=src/
sonar.tests=tests/
sonar.coverage.exclusions=**/*.test.js
sonar.qualitygate.wait=true
- uses: SonarSource/sonarcloud-github-action@master
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
👉 Checkov Integration
- name: Checkov Terraform Scan
  uses: bridgecrewio/checkov-action@v12
  with:
    directory: terraform/
    framework: terraform
    soft_fail: false
- name: Checkov K8s Scan
  uses: bridgecrewio/checkov-action@v12
  with:
    directory: helm/inspireit-payments/
    framework: kubernetes
    soft_fail: true
.checkov.yml:
skip-check:
- CKV_GCP_6
- CKV_GCP_15
✅ Verification
Push a PR with low-coverage code and insecure Terraform. Pipeline fails at SonarQube (quality gate: coverage below 80%) and Checkov (public bucket). Fix both, re-push, pipeline passes.
⚠ Interview: SonarQube gates on new code (not legacy). Checkov scans Terraform, K8s, Helm, Dockerfiles. Use soft_fail: false for critical, true for advisory.
⚠ Prerequisites
- Compute Engine VMs running (Linux)
- SSH access from Ansible control node
- pip install ansible and ansible-galaxy collection install google.cloud
👉 Steps
Write inventory.gcp.yml (dynamic GCP inventory; the gcp_compute plugin declaration is required):
plugin: google.cloud.gcp_compute
projects:
  - inspireit-dev-platform-gke
filters:
  - "labels.env=dev"
hostnames:
  - name
keyed_groups:
  - key: labels.role
auth_kind: serviceaccount
service_account_file: /path/to/sa-key.json
Then write the playbook inspireit-common-setup.yml:
- name: Common VM setup for InspireIT
  hosts: all
  become: yes
  vars:
    app_user: inspireit
    app_dir: /opt/inspireit
  tasks:
    - name: Install Docker
      apt:
        name: docker.io
        state: present
        update_cache: yes
    - name: Create app user
      user: name={{ app_user }} state=present groups=docker
    - name: Start node exporter
      systemd:
        name: prometheus-node-exporter
        state: started
        enabled: yes
ansible-playbook -i inventory.gcp.yml inspireit-common-setup.yml
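Before the playbook run, the dynamic inventory can be sanity-checked:

```shell
# Show discovered hosts/groups, then verify SSH + Python on each VM.
ansible-inventory -i inventory.gcp.yml --graph
ansible all -i inventory.gcp.yml -m ping
```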
✅ Verification
SSH into VM: docker --version installed, systemctl status prometheus-node-exporter running, /opt/inspireit exists. Second run shows ok=4 changed=0 (idempotent).
⚠ Prerequisites
- Cloud Run service deployed: payments-backend
- OpenAPI spec file ready
- Permissions: endpoints.*, serviceusage.*
👉 Steps
1. Write openapi.yaml:
swagger: "2.0"
info:
  title: "InspireIT Payments API"
  version: "1.0.0"
host: "payments-api.endpoints.PROJECT.cloud.goog"
x-google-endpoints:
  - name: "payments-api.endpoints.PROJECT.cloud.goog"
x-google-backend:
  address: "https://payments-backend-xyz-uc.a.run.app"
  path_translation: APPEND_PATH_TO_ADDRESS
schemes: [https]
paths:
  /payments:
    get:
      summary: List payments
      responses:
        200:
          description: OK
2. Enable key-based auth by adding an API key security definition in the OpenAPI spec (and do not set x-google-allow: all, which would admit unregistered callers). Without the key, requests return 403.
✅ Verification
curl -H "X-API-Key: AIza..." https://payments-api.endpoints.PROJECT.cloud.goog/payments → 200. Without key: curl https://payments-api.endpoints.PROJECT.cloud.goog/payments → 403.
⚠ Prerequisites
- Vertex AI API enabled
- GCS bucket:
inspireit-ml-artifacts - Permissions:
aiplatform.*
👉 Portal Steps
1. Create a tabular dataset from CSV (gs://inspireit-ml-artifacts/dataset.csv). Target column: fraud_flag. Click CREATE.
2. Train a model, then deploy it to an endpoint: fraud-detection-endpoint. Traffic: 100% new model. Machine: n1-standard-2. Min replicas: 1, Max: 5.
3. Test with a sample request: {"instances": [{"amount": 250.0, "merchant": "online", "hour": 3}]}
👉 MLOps Pipeline (Automated)
✅ Verification
Endpoint shows Active. CLI test: gcloud ai endpoints predict fraud-detection-endpoint --region=us-central1 --json-request=input.json
⚠ Prerequisites
- Google AI Studio account: aistudio.google.com
- Vertex AI API enabled for production deployment
👉 AI Studio (Prototype)
1. System instruction: "You are a customer support agent for InspireIT, a B2B analytics platform. Help users with billing, API keys, and account setup."
2. Safety: keep defaults.
3. Test: "How do I generate an API key for InspireIT?" → Click Get Code → choose cURL or Python to copy.
👉 Vertex AI (Production)
curl \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT/locations/us-central1/publishers/google/models/gemini-2.0-flash:generateContent \
  -d '{"contents": [{"parts": [{"text": "How do I reset my password?"}]}]}'
✅ Verification
AI Studio: chat works in browser. Vertex: curl returns generated text with citations. Agent: grounded answers from enterprise docs only.
⚠ Interview: AI Studio = API key auth, rapid prototyping. Vertex AI = IAM auth, production. Differences: grounding in enterprise data, safety filters, model garden, VPC SC support.
⚠ Prerequisites
- Grafana instance (Grafana Cloud free tier, or self-hosted on GKE: helm install grafana grafana/grafana -n monitoring)
- Data sources: Cloud Monitoring API + Prometheus
👉 Steps
1. Add the Cloud Monitoring data source (auth: service account with roles/monitoring.viewer). Project: inspireit-prod-platform-gke. Click Save & Test → green.
2. Import dashboard 315 (Kubernetes cluster monitoring). Data source: Prometheus. Click Import. Panels appear: cluster CPU/memory, pod status, node health.
3. Create an alert rule. Query: kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} > 0 (CrashLoopBackOff is a container waiting reason, not a pod phase). Contact point: Slack (inspireit-alerts) + PagerDuty. Click SAVE.
4. Export the dashboard JSON and store it as dashboards/k8s-cluster.json in your repo. Provision via Terraform or ConfigMap.
✅ Verification
Dashboard panels loading with live data. Alert → Test → notification arrives in Slack channel. JSON export can be version-controlled.
⚠ Prerequisites
- SSH access: Compute Engine / VM / SSH
- Serial console access (when SSH is broken): VM / Serial console port 1
- Permissions: compute.instances.getSerialPortOutput
👉 Diagnostic Sequence
1. Check the serial console for boot-time failures: full disk (No space left on device), startup script failures (cloud-init), SSH daemon not starting.
2. If SSH works, run OS-level checks:
df -h # disk space
free -h # RAM usage
systemctl status --failed # failed services
journalctl -u docker -n 50 # Docker logs
ss -tulpn # listening ports
ping -c 2 google.com # external connectivity
nslookup api.inspireit.io # DNS resolution
curl -v http://localhost:8080/health # app health
3. Logging / Logs Explorer → filter: resource.type="gce_instance" + instance_id="YOUR_ID"
4. For GKE pods, filter: resource.type="k8s_container"
5. Kubernetes-level checks:
kubectl describe pod $POD -n prod
kubectl logs $POD -n prod --tail=100 --previous
kubectl exec -it $POD -n prod -- /bin/sh
kubectl top pod -n prod
kubectl get nodes -o wide | grep -v Ready
Quick Reference Card
- Disk full: df -h; du -sh /var/log/*
- Pod crash: kubectl describe pod + kubectl logs --previous
- DNS failure: nslookup / dig + Cloud DNS forwarding zone check
- High load: top + ss -tulpn + Grafana dashboard
- Node issues: kubectl describe node + GCE serial console
- Service unreachable: kubectl get svc -n prod
✅ Verification
Follow sequence to identify root cause within 5-10 min. Work bottom-up: serial console → OS metrics → container logs → app health endpoint.