
7 min read #terraform #gcp #infrastructure #devops #iac

Terraform cured me of clicking around the GCP console

The console is great for looking and terrible for remembering. Moving my GCP infrastructure into Terraform turned 'what did I change last Tuesday' into a diff I can read. Notes from doing it for real.

For a long time my relationship with Google Cloud was a series of clicks I couldn't reconstruct. Spin up a Cloud Run service here, grant a service account a role there, tweak a firewall rule because something couldn't reach something else. It all worked. And then three weeks later I'd stare at a permission error and have absolutely no idea what I'd changed, when, or why — because the only record was in my head, and my head had moved on.

Terraform fixed the actual problem, which was never "provisioning." It was memory. Infrastructure as code means the answer to "what does my project look like" is a file I can read, diff, and review, instead of an archaeology session in the console.

State is the whole game, so put it somewhere real

The first thing to understand about Terraform is that it keeps a state file — its record of what it believes exists. Lose it, corrupt it, or let two people edit it at once and you're in for a bad day. The default is a terraform.tfstate file on your laptop, which is exactly where it should never live. On GCP the answer is a GCS bucket with versioning on:

terraform {
  backend "gcs" {
    bucket = "my-project-tf-state"
    prefix = "prod"
  }
}

The GCS backend gives you remote state and locking for free: if a CI job is mid-apply, my local terraform apply fails with a lock error (or waits, if I pass -lock-timeout) instead of racing it. Versioning is a setting on the bucket itself, not something the backend block turns on; once it's enabled, a corrupted state is a restore, not a catastrophe. And the state file never, ever goes in git: it contains resource IDs and sometimes secrets, stored in plain text. This is rule zero.
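The bucket has to exist, with versioning enabled, before the backend can use it. Here is a minimal sketch of that bucket in Terraform, reusing the bucket name from the backend block above. The location, the access settings, and the prevent_destroy guard are my assumptions rather than anything the setup requires. Because the backend can't store state in a bucket that doesn't exist yet, this would typically live in a small separate bootstrap configuration, or the bucket gets created once by hand.

```hcl
# Hypothetical bootstrap config for the state bucket.
resource "google_storage_bucket" "tf_state" {
  name     = "my-project-tf-state" # must match the bucket in the backend "gcs" block
  location = "EU"                  # assumption: use whatever location fits your project

  # Keep old object generations so a bad state write can be rolled back.
  versioning {
    enabled = true
  }

  # State can contain secrets: no per-object ACLs, no public access.
  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"

  # Reject any plan that would delete the bucket holding the state.
  lifecycle {
    prevent_destroy = true
  }
}
```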

plan is the feature, apply is the afterthought

The thing that actually changed how I work isn't apply — it's plan. Before Terraform touches anything it shows you the diff: what it will create, change, and (the line you read three times) destroy.

$ terraform plan

  # google_cloud_run_v2_service.api will be updated in-place
  ~ resource "google_cloud_run_v2_service" "api" {
      ~ template {
          ~ containers {
              ~ image = "gcr.io/.../api:v3" -> "gcr.io/.../api:v4"
            }
        }
    }

Plan: 0 to add, 1 to change, 0 to destroy.

"0 to destroy" is the most reassuring line in cloud computing. Half my Terraform mistakes were caught by reading a plan and going "wait, why is it deleting my database" — a typo in a resource name that would've been an irreversible click in the console became a diff I rejected before it ran. The plan is the code review for your infrastructure.

Codify the boring, dangerous things first

It's tempting to start Terraforming the exciting resources. Don't. The things that benefit most from being in code are the ones that are tedious to set up and catastrophic to get wrong: IAM bindings and service accounts. Click-ops IAM is how projects accumulate a service account with roles/owner that nobody remembers granting.

resource "google_service_account" "api" {
  account_id   = "api-runtime"
  display_name = "API runtime identity"
}

resource "google_project_iam_member" "api_firestore" {
  project = var.project_id
  role    = "roles/datastore.user"
  member  = "serviceAccount:${google_service_account.api.email}"
}

Now least-privilege is reviewable. Anyone can read exactly what the API can touch, and broadening it is a pull request with a diff, not a quiet click that no audit will ever surface.
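The binding only matters once something actually runs as that identity. Here is a sketch of wiring the service account into the Cloud Run service, assuming the service is the google_cloud_run_v2_service.api resource from the plan output earlier. The service name, the region, and the api_image variable are illustrative placeholders, not values from my real config.

```hcl
# Hypothetical variable: the full image reference is supplied from outside.
variable "api_image" {
  type        = string
  description = "Container image reference for the API."
}

resource "google_cloud_run_v2_service" "api" {
  name     = "api"          # assumption
  location = "europe-west1" # assumption

  template {
    # Run as the dedicated identity rather than the default compute
    # service account, so the IAM binding above is the whole story.
    service_account = google_service_account.api.email

    containers {
      image = var.api_image
    }
  }
}
```

With this in place, "what can the API touch" is answered by searching the repo for google_service_account.api.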

You don't have to recreate what already exists

The objection I had for ages: "all my stuff already exists in the console, am I supposed to delete it and rebuild through Terraform?" No. terraform import (and the newer import blocks, available since Terraform 1.5) adopt an existing resource into state without recreating it. You write the resource definition to match what's already there, import it, run a plan, and tune the config until the plan says "no changes." Then it's managed, and you never touched production.

import {
  to = google_cloud_run_v2_service.api
  id = "projects/my-project/locations/europe-west1/services/api"
}

It's fiddly the first few times — you'll get attribute mismatches and have to fill in fields — but it lets you migrate a live project into Terraform incrementally, one resource at a time, instead of as a scary big-bang rewrite.
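The same approach works for the IAM resources from the earlier section. This is a sketch, assuming the project ID is my-project (as in the Cloud Run import ID) and that the service account email follows the usual account_id@project.iam.gserviceaccount.com pattern. The ID formats are the ones I understand the Google provider to accept, so check the provider docs for each resource type before relying on them.

```hcl
# Adopt the existing service account.
import {
  to = google_service_account.api
  id = "projects/my-project/serviceAccounts/api-runtime@my-project.iam.gserviceaccount.com"
}

# Adopt the existing role binding.
import {
  to = google_project_iam_member.api_firestore
  # Space-separated: project, role, member.
  id = "my-project roles/datastore.user serviceAccount:api-runtime@my-project.iam.gserviceaccount.com"
}
```

Once an apply has recorded these resources in state, the import blocks have done their job and can be deleted from the config.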

The one rule that keeps it from rotting

Here's the discipline that makes or breaks Terraform on a real project: once a resource is in Terraform, you change it through Terraform. The moment someone "just quickly" edits a Terraform-managed Cloud Run service in the console, state and reality drift, and the next plan wants to "fix" the manual change by reverting it. Now Terraform is fighting the console, and people learn to distrust the plan — which kills the whole value.

One change path, not two. The console becomes a read-only window: great for looking at metrics, logs, and what's running. Every actual change is a diff someone can review. That's the trade — you give up the quick click, and in return you get an infrastructure you can actually remember, reason about, and hand to someone else without a two-hour "let me show you where everything is" call.