
Best Practices for State

A summary of best practices for managing Terraform state in a production-grade setup.

Checklist for Production State

🏗️ Setup

  • Use a remote backend (S3/GCS/Azure/TFC), not local state
  • Encryption at rest: enable it with KMS (S3); GCS and Azure encrypt by default
  • State locking enabled: a DynamoDB table for S3
  • Versioning enabled, so a bad state can be restored
  • Lifecycle policy to expire old versions (90 days)

🔒 Security

  • State bucket is private: block all public access
  • IAM policy restricted to the roles that are allowed to apply
  • TLS-only access: the bucket policy enforces HTTPS
  • MFA Delete enabled (extra protection)
  • Audit logging: CloudTrail / Cloud Audit Logs
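A least-privilege policy for a single layer might look like the sketch below. It grants only the S3, DynamoDB, and KMS actions that the S3 backend needs to read, write, and lock one state key. The policy name and the `prod/network` key scope are assumptions for illustration; the bucket, lock table, and KMS alias match the names used elsewhere in this section. You would attach the policy to whichever role runs `terraform apply` for that layer.

```hcl
# Sketch: least-privilege access to ONE state key (prod/network).
# Assumes the bucket, lock table and KMS alias used elsewhere in this section.
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

data "aws_kms_alias" "tfstate" {
  name = "alias/tfstate"
}

resource "aws_iam_policy" "tfstate_prod_network" {
  name = "tfstate-prod-network" # hypothetical policy name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Terraform lists the bucket when it reads state
        Sid      = "ListStateBucket"
        Effect   = "Allow"
        Action   = "s3:ListBucket"
        Resource = "arn:aws:s3:::my-org-tfstate"
      },
      {
        # Read/write only this layer's state object
        Sid      = "ReadWriteOneStateKey"
        Effect   = "Allow"
        Action   = ["s3:GetObject", "s3:PutObject"]
        Resource = "arn:aws:s3:::my-org-tfstate/prod/network/terraform.tfstate"
      },
      {
        # Acquire and release the lock in the DynamoDB table
        Sid    = "StateLocking"
        Effect = "Allow"
        Action = [
          "dynamodb:DescribeTable",
          "dynamodb:GetItem",
          "dynamodb:PutItem",
          "dynamodb:DeleteItem",
        ]
        Resource = "arn:aws:dynamodb:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:table/terraform-locks"
      },
      {
        # Encrypt/decrypt the state object with the state KMS key
        Sid      = "StateEncryption"
        Effect   = "Allow"
        Action   = ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"]
        Resource = data.aws_kms_alias.tfstate.target_key_arn
      },
    ]
  })
}
```

Scoping one policy per layer keeps the blast radius of a leaked CI credential to a single state file.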

📦 Organization

  • Split state by layer (network/data/compute/app)
  • Naming convention for state keys (<env>/<component>/terraform.tfstate)
  • Cross-state references via terraform_remote_state
  • No circular dependencies between states

🔁 Workflow

  • Apply in order: network first, app last
  • Back up before risky operations: terraform state pull > backup.tfstate
  • Document layer dependencies in the README
  • Plan, then apply the saved plan: terraform plan -out=tfplan, then terraform apply tfplan

🔐 Secrets

  • Use an external secret manager (Vault, Secrets Manager); never hard-code secrets
  • Mark variables and outputs that hold secrets as sensitive (this only redacts CLI output; the values are still stored in state)
  • Rotate secrets regularly (30 days)
  • Don't commit state files or .tfvars files that contain secrets
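The sketch below shows both halves of the Secrets checklist: a variable marked `sensitive` and a value read from AWS Secrets Manager at plan time. The variable name `api_token` and the secret name `prod/db/password` are hypothetical. Keep in mind that `sensitive` only hides values from plan/apply output: anything Terraform reads or computes, including the data source's `secret_string`, is still written to state in plain text, which is why the state bucket itself must be encrypted and access-controlled.

```hcl
variable "api_token" {
  type        = string
  description = "Hypothetical token; supply it via TF_VAR_api_token, never in a committed .tfvars"
  sensitive   = true # redacted in plan/apply output, but still stored in state
}

# Read the secret at plan time instead of hard-coding it.
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db/password" # hypothetical secret name
}

locals {
  # secret_string is marked sensitive by the AWS provider, so this local is too
  db_password = data.aws_secretsmanager_secret_version.db.secret_string
}
```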

Anti-Patterns to Avoid

❌ Single State for Everything

all-infra/
└── terraform.tfstate # 1000+ resources

Fix: split by layer and environment.

❌ Local State in Production

terraform {
  # no backend block: state is written to a local terraform.tfstate
}

Fix: use a remote backend from the very first project.

❌ Hard-coded Backend Config

backend "s3" {
  bucket = "company-prod-tfstate" # committed to Git
  key    = "terraform.tfstate"
  region = "ap-southeast-1"
}

Fix: use a partial configuration and pass the settings with terraform init -backend-config=...

❌ Editing the State File Directly

vi terraform.tfstate    # ❌

Fix: use terraform state mv/rm/show instead.
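For renames and refactors, Terraform 1.1 and later also accepts a declarative `moved` block. The move is recorded in code and shows up in the plan for review, instead of being run by hand with `terraform state mv`. The resource names and the `ami_id` variable below are hypothetical.

```hcl
variable "ami_id" {
  type        = string
  description = "Hypothetical AMI ID for the example instance"
}

# Hypothetical rename: this resource used to be aws_instance.web.
resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.micro"
}

# Tells Terraform that the object already in state at the old address now
# belongs to the new one, so the plan shows a move instead of destroy + create.
moved {
  from = aws_instance.web
  to   = aws_instance.app
}
```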

❌ Sharing State via Email/Slack

Fix: use a remote backend that everyone accesses through IAM.

❌ Forgetting the Lock

terraform apply -lock=false   # ❌ in production

Fix: set up a DynamoDB lock table and keep the default (locking is on unless you pass -lock=false).

❌ Hard-coded Password in State

resource "aws_db_instance" "main" {
  password = "supersecret123" # ❌ ends up in Git and in the state file
}

Fix: use Secrets Manager.
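For RDS specifically, one way to apply this fix with AWS provider 5.x (the version pinned in the bootstrap example later in this section) is `manage_master_user_password`. RDS then generates the master password and keeps it in Secrets Manager. The identifier, engine, size, and username below are placeholder values.

```hcl
resource "aws_db_instance" "main" {
  identifier        = "main-db"     # placeholder
  engine            = "postgres"    # placeholder
  instance_class    = "db.t3.micro" # placeholder
  allocated_storage = 20
  username          = "dbadmin"     # placeholder

  # RDS creates and manages the master password in AWS Secrets Manager.
  # No password argument is set, so none appears in code or .tfvars;
  # state records the secret's ARN (master_user_secret) instead.
  manage_master_user_password = true
}
```

Applications then read the credentials from the secret referenced by `master_user_secret` rather than from Terraform outputs.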

Pattern: Mature Project Structure

my-org-infra/
├── README.md
├── _bootstrap/          # State infrastructure
│   ├── main.tf          # S3 + DynamoDB + KMS
│   └── README.md
│
├── _global/             # Account-wide resources
│   ├── iam/
│   ├── route53/
│   └── cloudtrail/
│
├── prod/
│   ├── network/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── backend.tf   # empty backend "s3" {} block
│   │   ├── backend.hcl  # partial backend config
│   │   └── prod.tfvars
│   ├── data/
│   ├── compute/
│   └── apps/
│
├── staging/
│   └── ...
│
├── dev/
│   └── ...
│
└── modules/             # Reusable modules
    ├── vpc/
    ├── eks/
    └── rds/
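Inside an environment layer, `main.tf` would typically call the shared modules by relative path. The sketch below is a possible `prod/network/main.tf`; the module input name `cidr_block` and the variable `vpc_cidr` are hypothetical and depend on how the `vpc` module is written.

```hcl
# prod/network/main.tf (sketch). Input names are hypothetical.
variable "vpc_cidr" {
  type        = string
  description = "VPC CIDR range, set per environment in prod.tfvars"
}

module "vpc" {
  source     = "../../modules/vpc" # resolves to modules/vpc at the repo root
  cidr_block = var.vpc_cidr
}
```

Keeping environment differences in `prod.tfvars` (and its staging/dev equivalents) lets every environment reuse the same module code.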

Pattern: Partial Backend Configuration

prod/network/backend.tf
terraform {
  backend "s3" {} # empty: the settings are passed on the CLI
}

prod/network/backend.hcl
bucket         = "my-org-tfstate"
key            = "prod/network/terraform.tfstate"
region         = "ap-southeast-1"
dynamodb_table = "terraform-locks"
encrypt        = true
kms_key_id     = "alias/tfstate"

Initialize from the layer directory:
cd prod/network
terraform init -backend-config=backend.hcl

Pattern: Cross-State Reference

prod/compute/main.tf
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-org-tfstate"
    key    = "prod/network/terraform.tfstate"
    region = "ap-southeast-1"
  }
}

resource "aws_instance" "web" {
  # ami, instance_type and other required arguments omitted for brevity
  subnet_id              = data.terraform_remote_state.network.outputs.private_subnet_ids[0]
  vpc_security_group_ids = [data.terraform_remote_state.network.outputs.web_sg_id]
}
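`terraform_remote_state` can only read the root-module outputs of the other state, so the network layer has to export every value the compute layer uses. The sketch below is a possible `prod/network/outputs.tf`. It assumes the network layer defines private subnets as `aws_subnet.private` (with `count`) and a security group `aws_security_group.web`; both resource names are hypothetical.

```hcl
# prod/network/outputs.tf (sketch). Resource names are hypothetical.
output "private_subnet_ids" {
  description = "IDs of the private subnets, read by prod/compute"
  value       = aws_subnet.private[*].id
}

output "web_sg_id" {
  description = "Security group for web instances, read by prod/compute"
  value       = aws_security_group.web.id
}
```

These outputs act as the contract between layers: renaming one breaks every consumer, so it is worth listing them in the README alongside the layer dependencies.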

Pattern: State Backup Script

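scripts/backup-state.sh
#!/bin/bash
# Usage: ./scripts/backup-state.sh <env> <layer>   (run from the repo root)
set -euo pipefail

ENV=${1:-prod}
LAYER=${2:-network}
DATE=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR="$(pwd)/backups"
OUT="${BACKUP_DIR}/${ENV}-${LAYER}-${DATE}.tfstate"

mkdir -p "${BACKUP_DIR}"   # make sure the target directory exists
cd "${ENV}/${LAYER}"

# Pull the current remote state (the layer must already be initialized)
terraform state pull > "${OUT}"

echo "Backed up state to: ${OUT}"

Usage (from the repo root):
./scripts/backup-state.sh prod network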

Pattern: Disaster Recovery

State Loss Recovery

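  1. Don't panic: state in S3 is versioned.
  2. List the versions:
    aws s3api list-object-versions --bucket my-tfstate --prefix prod/
  3. Restore the version from before the incident by copying it over the current object:
    aws s3api copy-object \
      --copy-source "my-tfstate/prod/terraform.tfstate?versionId=abc123" \
      --bucket my-tfstate \
      --key prod/terraform.tfstate
  4. Verify with terraform plan (check that there are no surprise changes). If the backend uses a DynamoDB lock table and Terraform reports that the state data in S3 does not have the expected content, update the Digest value stored in the lock table to the value printed in the error.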

Total Backend Loss

If the entire bucket is lost:

  1. If you have a cross-region replica, point the backend at the replica.
  2. If not, import the resources back (see the Import section).
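With Terraform 1.5 or later, the re-import can be written as `import` blocks, so it is reviewed in a plan like any other change. `terraform plan -generate-config-out=generated.tf` can also draft configuration for imported resources that have none. The resource addresses and IDs below are hypothetical placeholders; use the real IDs from the AWS console or CLI.

```hcl
# Rebuild state for the network layer after the state bucket was lost.
# Addresses and IDs are hypothetical placeholders.
import {
  to = aws_vpc.main
  id = "vpc-0123456789abcdef0"
}

import {
  to = aws_subnet.private[0]
  id = "subnet-0123456789abcdef0"
}
```

Run `terraform plan` and confirm that each import shows no unexpected changes before applying.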

Monitoring State Health

CloudWatch Alarms

  • Bucket size growth (resource creep)
  • Bucket access from unusual IPs
  • Lock table errors
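As a sketch of the first alarm: S3 publishes a daily `BucketSizeBytes` storage metric in the `AWS/S3` namespace. The threshold (100 MiB) and the SNS topic name are assumptions; tune them to your own state volume.

```hcl
resource "aws_sns_topic" "tfstate_alerts" {
  name = "tfstate-alerts" # hypothetical topic name
}

resource "aws_cloudwatch_metric_alarm" "tfstate_bucket_size" {
  alarm_name          = "tfstate-bucket-size"
  alarm_description   = "Terraform state bucket is growing unexpectedly"
  namespace           = "AWS/S3"
  metric_name         = "BucketSizeBytes"
  statistic           = "Average"
  period              = 86400 # S3 storage metrics are reported once per day
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = 104857600 # 100 MiB, an assumed threshold

  dimensions = {
    BucketName  = "my-org-tfstate"
    StorageType = "StandardStorage"
  }

  alarm_actions = [aws_sns_topic.tfstate_alerts.arn]
}
```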

Manual Health Check

# Run weekly
terraform plan -detailed-exitcode
# Exit code 0 = no changes, 1 = error, 2 = changes present.
# An unexpected diff means drift!

Best Practices Summary

🥇 Top 10 Rules:

1. Use a remote backend from the very first project
2. Encrypt state (KMS for S3)
3. Always lock (DynamoDB)
4. Versioning + lifecycle policy
5. Least-privilege IAM
6. Split state by blast radius
7. Document layer dependencies
8. Plan-out + apply-from-plan workflow
9. Back up before risky operations
10. Monitor and alert on anomalies

Example: A Complete Production-Ready Setup

_bootstrap/main.tf
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "ap-southeast-1"

  default_tags {
    tags = {
      ManagedBy = "terraform"
      Purpose   = "tfstate-bootstrap"
      Critical  = "true"
    }
  }
}

# KMS key for encrypting state objects
resource "aws_kms_key" "tfstate" {
  description             = "Terraform state encryption"
  enable_key_rotation     = true
  deletion_window_in_days = 30
}

resource "aws_kms_alias" "tfstate" {
  name          = "alias/tfstate"
  target_key_id = aws_kms_key.tfstate.key_id
}

# S3 bucket that holds every state file
resource "aws_s3_bucket" "tfstate" {
  bucket = "my-org-tfstate"

  # Refuse to destroy the bucket that holds all state
  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_public_access_block" "tfstate" {
  bucket                  = aws_s3_bucket.tfstate.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.tfstate.arn
    }
  }
}

# Noncurrent versions: move to STANDARD_IA after 30 days, delete after 90
resource "aws_s3_bucket_lifecycle_configuration" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  # Versioning must be enabled before version-based rules take effect
  depends_on = [aws_s3_bucket_versioning.tfstate]

  rule {
    id     = "manage-old-versions"
    status = "Enabled"
    filter {}
    noncurrent_version_expiration {
      noncurrent_days = 90
    }
    noncurrent_version_transition {
      noncurrent_days = 30
      storage_class   = "STANDARD_IA"
    }
  }
}

# Deny any request that does not use TLS
resource "aws_s3_bucket_policy" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "DenyInsecureTransport"
      Effect    = "Deny"
      Principal = "*"
      Action    = "s3:*"
      Resource = [
        aws_s3_bucket.tfstate.arn,
        "${aws_s3_bucket.tfstate.arn}/*"
      ]
      Condition = {
        Bool = { "aws:SecureTransport" = "false" }
      }
    }]
  })
}

# DynamoDB table for state locking
resource "aws_dynamodb_table" "tfstate_lock" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  point_in_time_recovery {
    enabled = true
  }

  server_side_encryption {
    enabled = true
  }
}
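The bootstrap stack itself starts out with local state, because the bucket it creates does not exist yet. A common follow-up is to add a backend block once the first apply succeeds, then run `terraform init -migrate-state` so the bootstrap state also lives in the bucket. In the sketch below, the `_bootstrap/terraform.tfstate` key is an assumption that follows the key naming convention from the checklist.

```hcl
# _bootstrap/backend.tf: add only AFTER the first `terraform apply` has created
# the bucket, lock table and KMS key. Put the settings in _bootstrap/backend.hcl
# (same layout as prod/network/backend.hcl, with the assumed key
# "_bootstrap/terraform.tfstate"), then move the local state into S3:
#   terraform init -backend-config=backend.hcl -migrate-state
terraform {
  backend "s3" {}
}
```

Keep a copy of the local `terraform.tfstate` until the migrated state has been verified with `terraform plan`.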

สรุป

  • State management is the heart of production Terraform
  • Following the checklist in this section gets you to production-ready
  • Anti-patterns to avoid: local state, a single state for everything, hard-coded credentials
  • Good pattern: split + remote + encrypted + locked + versioned

Next → Section 11: State Commands