Best Practices for State
A summary of best practices for managing Terraform state in a production-grade setup.
Checklist for Production State
🏗️ Setup
- Use a remote backend (S3/GCS/Azure/TFC), not local state
- Enable encryption at rest: KMS for S3, or the default encryption on GCS/Azure
- Enable state locking: DynamoDB for S3
- Enable versioning so a bad state can be restored
- Use a lifecycle policy to expire old versions (90 days)
🔒 Security
- Keep the state bucket private: block public access
- Restrict the IAM policy to the roles that are allowed to apply
- TLS-only access: the bucket policy enforces HTTPS
- Enable MFA Delete (extra protection)
- Audit logging: CloudTrail / Cloud Audit Logs
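The IAM item in the list above could look roughly like the following least-privilege policy. This is a sketch, not the author's policy: it assumes the `my-org-tfstate` bucket and `terraform-locks` table used later in this section, while the policy name `tfstate-prod-network`, the placeholder account ID `123456789012`, and the decision to scope access to the `prod/network` key are illustrative assumptions.

```hcl
# Hypothetical least-privilege policy for a role that applies prod/network.
data "aws_iam_policy_document" "tfstate_access" {
  # The S3 backend lists the bucket to locate the state object.
  statement {
    sid       = "ListStateBucket"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::my-org-tfstate"]
  }

  # Read and write only this layer's state object.
  statement {
    sid       = "ReadWriteState"
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["arn:aws:s3:::my-org-tfstate/prod/network/terraform.tfstate"]
  }

  # Acquire and release the state lock in DynamoDB.
  statement {
    sid = "StateLocking"
    actions = [
      "dynamodb:DescribeTable",
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem",
    ]
    # 123456789012 is a placeholder account ID.
    resources = ["arn:aws:dynamodb:ap-southeast-1:123456789012:table/terraform-locks"]
  }
}

resource "aws_iam_policy" "tfstate_access" {
  name   = "tfstate-prod-network" # assumed name
  policy = data.aws_iam_policy_document.tfstate_access.json
}
```

Because the bucket is encrypted with KMS, the same role would also need permission to use the key (typically `kms:Decrypt` and `kms:GenerateDataKey`).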
📦 Organization
- Split state by layer (network/data/compute/app)
- Use a naming convention for state keys (<env>/<component>/terraform.tfstate)
- Make cross-state references through terraform_remote_state
- No circular dependencies between states
🔁 Workflow
- Apply in order: network first, app last
- Back up before risky operations: terraform state pull > backup.tfstate
- Document layer dependencies in the README
- Use a plan-then-apply workflow: terraform plan -out=tfplan, then terraform apply tfplan
🔐 Secrets
- Use an external secret manager (Vault, Secrets Manager); do not hard-code secrets
- Mark variables and outputs that contain secrets as sensitive
- Rotate secrets regularly (every 30 days)
- Don't commit state files or .tfvars files that contain secrets
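As a minimal sketch of the sensitive marker mentioned in the list above, the variable and output below are both flagged so that Terraform redacts them in plan and apply output. The name `db_password` is a hypothetical example, not something defined elsewhere in this section.

```hcl
# Hypothetical variable and output names, for illustration only.
variable "db_password" {
  description = "Master password, supplied at runtime (for example via TF_VAR_db_password)"
  type        = string
  sensitive   = true # redacted in plan/apply output
}

output "db_password" {
  value     = var.db_password
  sensitive = true # required because the value derives from a sensitive variable
}
```

Note that `sensitive` only hides the value in CLI output. The value is still written to the state file in plain text, which is why the encryption and access rules in this checklist still apply.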
Anti-Patterns to Avoid
❌ Single State for Everything
all-infra/
└── terraform.tfstate # 1000+ resources
→ Fix: split by layer and environment
❌ Local State in Production
terraform {
  # no backend block, so state stays on the local disk
}
→ Fix: use a remote backend from the very first project
❌ Hard-coded Backend Config
backend "s3" {
  bucket = "company-prod-tfstate" # committed to Git
  key    = "terraform.tfstate"
  region = "ap-southeast-1"
}
→ Fix: use partial configuration + terraform init -backend-config=...
❌ Editing the State File Directly
vi terraform.tfstate # ❌
→ Fix: use terraform state mv/rm/show
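For plain renames, a declarative alternative to running `terraform state mv` by hand is a `moved` block (available since Terraform 1.1). The sketch below assumes a hypothetical resource that was renamed from `aws_instance.web` to `aws_instance.frontend`; neither address comes from this section.

```hcl
# Hypothetical rename: aws_instance.web is now called aws_instance.frontend.
# Terraform updates the address in state on the next apply instead of
# planning a destroy and a create.
moved {
  from = aws_instance.web
  to   = aws_instance.frontend
}
```

The block lives in version control next to the renamed resource, so everyone who applies the configuration gets the same state change without touching the state file.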
❌ Sharing State via Email/Slack
→ Fix: use a remote backend that everyone accesses through IAM
❌ Forgetting to Lock
terraform apply -lock=false # ❌ in production
→ Fix: set up a DynamoDB lock table and keep the default (locking enabled)
❌ Hard-coded Password in State
resource "aws_db_instance" "main" {
  password = "supersecret123" # ❌
}
→ Fix: use Secrets Manager
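One way the Secrets Manager fix might look is sketched below. The secret name `prod/db/master-password` is hypothetical and assumed to exist already, and the database arguments (engine, instance class, storage, username) are placeholder values chosen only to make the resource complete.

```hcl
# Hypothetical secret name; the secret is assumed to exist already.
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db/master-password"
}

resource "aws_db_instance" "main" {
  # Placeholder values for illustration.
  identifier        = "main"
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "app_admin"

  # The password no longer appears in the code or in Git,
  # but Terraform still records the resolved value in state.
  password = data.aws_secretsmanager_secret_version.db.secret_string
}
```

This keeps the secret out of the repository, not out of the state file. If the goal is to keep it out of state as well, recent AWS provider versions offer `manage_master_user_password = true` on `aws_db_instance`, which lets RDS create and hold the password in Secrets Manager itself.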
Pattern: Mature Project Structure
my-org-infra/
├── README.md
├── _bootstrap/ # State infrastructure
│ ├── main.tf # S3 + DynamoDB + KMS
│ └── README.md
│
├── _global/ # Account-wide resources
│ ├── iam/
│ ├── route53/
│ └── cloudtrail/
│
├── prod/
│ ├── network/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ ├── backend.hcl # backend partial config
│ │ └── prod.tfvars
│ ├── data/
│ ├── compute/
│ └── apps/
│
├── staging/
│ └── ...
│
├── dev/
│ └── ...
│
└── modules/ # Reusable modules
├── vpc/
├── eks/
└── rds/
Pattern: Partial Backend Config
prod/network/backend.tf
terraform {
  backend "s3" {} # empty: the config is passed via the CLI
}
prod/network/backend.hcl
bucket = "my-org-tfstate"
key = "prod/network/terraform.tfstate"
region = "ap-southeast-1"
dynamodb_table = "terraform-locks"
encrypt = true
kms_key_id = "alias/tfstate"
cd prod/network
terraform init -backend-config=backend.hcl
Pattern: Cross-State Reference
prod/compute/main.tf
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-org-tfstate"
    key    = "prod/network/terraform.tfstate"
    region = "ap-southeast-1"
  }
}

resource "aws_instance" "web" {
  # ami, instance_type, and other arguments omitted for brevity
  subnet_id              = data.terraform_remote_state.network.outputs.private_subnet_ids[0]
  vpc_security_group_ids = [data.terraform_remote_state.network.outputs.web_sg_id]
}
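The reference above only works if the network layer exports those values as root-level outputs, because `terraform_remote_state` exposes nothing else. A sketch of the matching `prod/network/outputs.tf` follows. The output names are taken from the example above, while the resource addresses `aws_subnet.private` and `aws_security_group.web` are assumptions about how the network layer is written.

```hcl
# prod/network/outputs.tf (sketch)
# aws_subnet.private (created with count) and aws_security_group.web are
# hypothetical resources assumed to be defined in this layer's main.tf.
output "private_subnet_ids" {
  description = "IDs of the private subnets, consumed by prod/compute"
  value       = aws_subnet.private[*].id
}

output "web_sg_id" {
  description = "Security group ID for web instances, consumed by prod/compute"
  value       = aws_security_group.web.id
}
```

The network layer has to be applied after adding these outputs before prod/compute can read them.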
Pattern: State Backup Script
scripts/backup-state.sh
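#!/bin/bash
# Pull the current remote state for one layer into a timestamped local backup.
# Run from the repository root. Backups contain secrets in plain text:
# keep the backups/ directory out of Git.
set -euo pipefail
ENV=${1:-prod}
LAYER=${2:-network}
DATE=$(date +%Y%m%d-%H%M%S)
# Create the backup directory if it does not exist yet.
mkdir -p backups
cd "${ENV}/${LAYER}"
terraform state pull > "../../backups/${ENV}-${LAYER}-${DATE}.tfstate"
echo "Backed up state to: backups/${ENV}-${LAYER}-${DATE}.tfstate"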
./scripts/backup-state.sh prod network
Pattern: Disaster Recovery
State Loss Recovery
- Don't panic: the state in S3 is versioned
- List versions:
aws s3api list-object-versions --bucket my-tfstate --prefix prod/
- Restore the version from before the incident:
aws s3api copy-object \
  --copy-source "my-tfstate/prod/terraform.tfstate?versionId=abc123" \
  --bucket my-tfstate \
  --key prod/terraform.tfstate
- Verify with terraform plan (check that there are no surprise changes)
Total Backend Loss
If the entire bucket is lost:
- A cross-region replica exists → use the replica
- No replica → import the resources back (Section: Import)
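For the no-replica case, re-importing can be done declaratively with an `import` block (Terraform 1.5 or later). The sketch below adopts one existing VPC into a fresh state. The resource address `aws_vpc.main`, the VPC ID, and the CIDR are hypothetical values for illustration.

```hcl
# Adopt an existing VPC into a fresh, empty state.
import {
  to = aws_vpc.main
  id = "vpc-0123456789abcdef0" # hypothetical VPC ID
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16" # assumed value; must match the real VPC
}
```

Running `terraform plan` shows the pending import together with any differences between the configuration and the real resource, so each resource can be checked before it is recorded in the new state.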
Monitoring State Health
CloudWatch Alarms
- Bucket size growth (resource creep)
- Bucket access from unusual IPs
- Lock table errors
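As one possible shape for the bucket-size alarm in the list above, the sketch below uses the daily `BucketSizeBytes` storage metric that S3 publishes to CloudWatch. The alarm name and the 1 GiB threshold are assumptions, and no notification target is configured.

```hcl
# Hypothetical name and threshold; BucketSizeBytes is reported once per day.
resource "aws_cloudwatch_metric_alarm" "tfstate_bucket_size" {
  alarm_name          = "tfstate-bucket-size"
  alarm_description   = "State bucket is larger than expected (possible resource creep)"
  namespace           = "AWS/S3"
  metric_name         = "BucketSizeBytes"
  statistic           = "Average"
  period              = 86400 # one day, matching the metric's reporting interval
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = 1073741824 # 1 GiB, an assumed limit

  dimensions = {
    BucketName  = "my-org-tfstate"
    StorageType = "StandardStorage"
  }
}
```

To get notified, an SNS topic ARN would go in `alarm_actions`. The `StandardStorage` dimension counts only objects in the Standard class, so noncurrent versions that have been transitioned to another storage class are not included.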
Manual Health Check
# Run weekly
terraform plan
# If the plan shows an unexpected diff, that is drift!
Best Practices Summary
🥇 Top 10 Rules:
1. Use a remote backend from the very first project
2. Encrypt state (KMS for S3)
3. Always lock (DynamoDB)
4. Versioning + lifecycle policy
5. Least-privilege IAM
6. Split state by blast radius
7. Document layer dependencies
8. Plan-out + apply-from-plan workflow
9. Back up before risky ops
10. Monitor + alert on anomalies
Example: A Complete Production-Ready Setup
_bootstrap/main.tf
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "ap-southeast-1"

  default_tags {
    tags = {
      ManagedBy = "terraform"
      Purpose   = "tfstate-bootstrap"
      Critical  = "true"
    }
  }
}

# KMS Key
resource "aws_kms_key" "tfstate" {
  description             = "Terraform state encryption"
  enable_key_rotation     = true
  deletion_window_in_days = 30
}

resource "aws_kms_alias" "tfstate" {
  name          = "alias/tfstate"
  target_key_id = aws_kms_key.tfstate.key_id
}

# S3 Bucket
resource "aws_s3_bucket" "tfstate" {
  bucket = "my-org-tfstate"
}

resource "aws_s3_bucket_public_access_block" "tfstate" {
  bucket                  = aws_s3_bucket.tfstate.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.tfstate.arn
    }
  }
}

resource "aws_s3_bucket_lifecycle_configuration" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  rule {
    id     = "manage-old-versions"
    status = "Enabled"

    filter {}

    noncurrent_version_expiration {
      noncurrent_days = 90
    }

    noncurrent_version_transition {
      noncurrent_days = 30
      storage_class   = "STANDARD_IA"
    }
  }
}

resource "aws_s3_bucket_policy" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "DenyInsecureTransport"
      Effect    = "Deny"
      Principal = "*"
      Action    = "s3:*"
      Resource = [
        aws_s3_bucket.tfstate.arn,
        "${aws_s3_bucket.tfstate.arn}/*"
      ]
      Condition = {
        Bool = { "aws:SecureTransport" = "false" }
      }
    }]
  })
}

# DynamoDB
resource "aws_dynamodb_table" "tfstate_lock" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  point_in_time_recovery {
    enabled = true
  }

  server_side_encryption {
    enabled = true
  }
}
Summary
- State management is the heart of production Terraform
- Following the checklist in this section gets you to a production-ready setup
- Anti-patterns to avoid: local state, a single state, hard-coded credentials
- Good patterns: split + remote + encrypted + locked + versioned
Next → Section 11: State Commands