Data Engineering Zoomcamp, Week 2: Terraform for GCP Infra
Project infrastructure modules in GCP:
- Google Cloud Storage (GCS): Data Lake
- BigQuery: Data Warehouse
Initial Setup
For this course, we’ll use the GCP free trial (up to EUR 300 in credits).
- Create an account with your Google email ID
- Set up your first project if you haven’t already
- e.g. “DTC DE Course”; note down the “Project ID” (we’ll use it later when deploying infra with Terraform)
- Set up a service account & authentication for this project
- Grant Viewer role to begin with.
- Download service-account-keys (.json) for auth.
- Download SDK for local setup
- Set environment variable to point to your downloaded GCP keys:
export GOOGLE_APPLICATION_CREDENTIALS="<path/to/your/service-account-authkeys>.json"
# For example:
export GOOGLE_APPLICATION_CREDENTIALS=".../keys/gcp-cred.json"
# Refresh token / session, and verify authentication
gcloud auth application-default login
Setup for Access
- IAM Roles for Service account:
- Go to the IAM section of IAM & Admin https://console.cloud.google.com/iam-admin/iam
- Click the Edit principal icon for your service account.
- Add these roles in addition to Viewer: Storage Admin, Storage Object Admin, and BigQuery Admin
- Enable these APIs for your project:
- Please ensure GOOGLE_APPLICATION_CREDENTIALS env-var is set.
export GOOGLE_APPLICATION_CREDENTIALS="<path/to/your/service-account-authkeys>.json"
Terraform Overview
Terraform is an open-source infrastructure-as-code tool developed by HashiCorp. It allows users to define and provision data center infrastructure using a declarative configuration language called HashiCorp Configuration Language (HCL).
- What is Terraform?
- open-source tool by HashiCorp, used for provisioning infrastructure resources
- supports DevOps best practices for change management
- Managing configuration files in source control to maintain an ideal provisioning state for testing and production environments
- What is IaC?
- Infrastructure-as-Code
- build, change, and manage your infrastructure in a safe, consistent, and repeatable way by defining resource configurations that you can version, reuse, and share.
- Some advantages
- Infrastructure lifecycle management
- Version control commits
- Very useful for stack-based deployments, and with cloud providers such as AWS, GCP, Azure, K8S…
- State-based approach to track resource changes throughout deployments
Declarations
- terraform: configure basic Terraform settings to provision your infrastructure
- required_version: version constraint on the Terraform CLI (e.g. a minimum version) needed to apply your configuration
- backend: stores Terraform’s “state” snapshots, to map real-world resources to your configuration.
- local: stores state file locally as terraform.tfstate
- required_providers: specifies the providers required by the current module
- provider:
- adds a set of resource types and/or data sources that Terraform can manage
- The Terraform Registry is the main directory of publicly available providers from most major infrastructure platforms.
- resource
- blocks to define components of your infrastructure
- Project modules/resources: google_storage_bucket, google_bigquery_dataset, google_bigquery_table
- variable & locals
- runtime arguments and constants
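As a rough sketch of how the declarations above fit together, the snippet below combines a terraform block, a local backend, and a locals constant. The version constraint ">= 1.0" and the local name data_lake_bucket (with its value) are illustrative assumptions, not values taken from this project.

```hcl
# Sketch only: the version constraint and the local value are assumed for illustration.
terraform {
  required_version = ">= 1.0" # assumed minimum Terraform CLI version
  backend "local" {}          # state is stored locally as terraform.tfstate

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "4.51.0"
    }
  }
}

locals {
  # Hypothetical constant; referenced elsewhere as local.data_lake_bucket
  data_lake_bucket = "dtc_data_lake"
}
```

Unlike a variable, a local cannot be overridden at run time, which makes it a reasonable place for fixed naming conventions.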
Terraform sample
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "4.51.0"
    }
  }
}

provider "google" {
  # Credentials only needs to be set if you do not have the GOOGLE_APPLICATION_CREDENTIALS set
  # credentials = "../../keys/gcp-cred.json"
  project = "terrform-demo-412419"
  region  = "us-central1"
  zone    = "us-central1-c"
}

resource "google_storage_bucket" "terra-demo" {
  name          = "terrform-demo-412419-bucket"
  location      = "US"
  force_destroy = true

  lifecycle_rule {
    condition {
      age = 1
    }
    action {
      type = "AbortIncompleteMultipartUpload"
    }
  }
}
Variables in Terraform
resource "google_bigquery_dataset" "demo_dataset" {
  dataset_id = "ny_taxi_dataset"
}

variable "credentials" {
  description = "My Credentials"
  default     = "../../keys/gcp-cred.json"
}

variable "project" {
  description = "Project"
  default     = "terrform-demo-412419"
}

variable "region" {
  description = "Region"
  default     = "us-central1"
}

variable "location" {
  description = "Project Location"
  default     = "US"
}

variable "bq_dataset_name" {
  description = "My BigQuery Dataset Name"
  default     = "ny_taxi_dataset"
}

variable "gcs_bucket_name" {
  description = "My Storage Bucket Name"
  default     = "terraform-ny-taxi-bucket"
}

variable "gcs_storage_class" {
  description = "Bucket Storage Class"
  default     = "STANDARD"
}
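The variables above only pay off once the provider and resources reference them. Here is a minimal sketch of that wiring, replacing the hard-coded values from the earlier blocks. It assumes the variable blocks are declared as shown and that the key file at var.credentials exists. The resource labels are reused from the earlier sample, and setting location and storage_class on the resources is an addition for illustration.

```hcl
# Sketch only: wires the declared variables into the provider and resources.
provider "google" {
  credentials = file(var.credentials) # reads the service-account key file
  project     = var.project
  region      = var.region
}

resource "google_storage_bucket" "terra-demo" {
  name          = var.gcs_bucket_name
  location      = var.location
  storage_class = var.gcs_storage_class
  force_destroy = true
}

resource "google_bigquery_dataset" "demo_dataset" {
  dataset_id = var.bq_dataset_name
  location   = var.location
}
```

Any default can be overridden at run time, for example with terraform apply -var="project=<your-project-id>".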
Execution steps
# Refresh service-account’s auth-token for this session
gcloud auth application-default login
- terraform init:
- Initializes & configures the backend, installs plugins/providers, & checks out an existing configuration from version control
- terraform plan:
- Previews local changes against the current state, and proposes an execution plan.
- terraform apply:
- Asks for approval of the proposed plan, and applies the changes to the cloud
- terraform destroy:
- Removes your stack from the cloud