Splitting Large State
As infrastructure grows, the state file grows with it: plans get slower and the blast radius widens. Splitting state is how you scale.
Symptoms of an Oversized State
- ⏱️ terraform plan takes longer than 2 minutes
- 💥 One mistake affects the entire infra
- 👥 Multi-person teams hit lock contention often
- 📂 State file > 5 MB
- 🔢 Resources in state > 200
→ Time to split
Strategies (Recap from Section 10)
1. By Layer
infra/
├── network/ terraform.tfstate
├── data/ terraform.tfstate
├── compute/ terraform.tfstate
└── application/ terraform.tfstate
2. By Environment
infra/
├── dev/ terraform.tfstate
├── staging/ terraform.tfstate
└── prod/ terraform.tfstate
3. By Layer × Environment (Most Common)
infra/
├── dev/
│   ├── network/
│   ├── data/
│   └── compute/
├── staging/
│   ├── network/
│   ├── data/
│   └── compute/
└── prod/
    ├── network/
    ├── data/
    └── compute/
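Each leaf folder in this layout is its own root module with its own backend key. A minimal sketch for prod/network, assuming the S3 bucket, region, and lock table names used elsewhere in this section (the backend.tf file name is hypothetical):

```hcl
# prod/network/backend.tf (hypothetical file name)
terraform {
  backend "s3" {
    bucket         = "my-tfstate"
    key            = "prod/network/terraform.tfstate" # mirrors the folder path
    region         = "ap-southeast-1"
    dynamodb_table = "terraform-locks" # state locking
  }
}
```

Keeping the key aligned with the folder path makes it easy to tell which directory owns which state object.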
4. By Service
infra/
├── auth-service/
├── billing-service/
└── notification-service/
Cross-State Dependencies
Use the terraform_remote_state data source:
prod/compute/main.tf
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-tfstate"
    key    = "prod/network/terraform.tfstate"
    region = "ap-southeast-1"
  }
}
resource "aws_instance" "app" {
  # ami, instance_type, and other required arguments omitted for brevity
  subnet_id              = data.terraform_remote_state.network.outputs.private_subnet_ids[0]
  vpc_security_group_ids = [data.terraform_remote_state.network.outputs.app_sg_id]
}
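The outputs read above only exist if the root module of the network state declares them; terraform_remote_state exposes root-level outputs only. A sketch of what that could look like, where the resource names aws_subnet.private and aws_security_group.app are assumptions and not taken from the original configuration:

```hcl
# prod/network/outputs.tf (hypothetical)
output "private_subnet_ids" {
  description = "Private subnet IDs consumed by the compute state"
  value       = aws_subnet.private[*].id # assumes aws_subnet.private uses count
}

output "app_sg_id" {
  description = "Security group ID for application instances"
  value       = aws_security_group.app.id
}
```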
Migrate from Single → Split
Step 1: Plan Architecture
Decide how to split, based on:
- Change frequency (network = rarely, app = frequently)
- Team ownership (platform team vs app team)
- Blast radius (network failure = everyone, app failure = just app)
Step 2: Create New Folders
mkdir -p prod/{network,data,compute,application}
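Step 3: Move Resources
Move resources from the single state into the new states. The -state and -state-out flags work on local state files, so with a remote backend you would pull the states to local files first and push them back afterwards:
# Backup
terraform state pull > backup.tfstate
# Move resources to new state
cd network/
terraform init
# -state is the source file, -state-out is the destination (the new network state)
terraform state mv \
  -state=../single/terraform.tfstate \
  -state-out=terraform.tfstate \
  aws_vpc.main \
  aws_vpc.main
# Repeat for all network resources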
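As an alternative to state mv, newer Terraform versions can express the same move declaratively: a removed block (Terraform 1.7+) in the old configuration drops the resource from state without destroying it, and an import block (Terraform 1.5+) in the new configuration adopts it. A sketch, where the VPC ID is a placeholder and the two blocks live in different root modules:

```hcl
# In the old single-state configuration: forget the VPC without destroying it.
# This replaces the original resource "aws_vpc" "main" block.
removed {
  from = aws_vpc.main

  lifecycle {
    destroy = false
  }
}

# In network/ (the new state): adopt the existing VPC.
# The resource "aws_vpc" "main" block must also be present in this module.
import {
  to = aws_vpc.main
  id = "vpc-0123456789abcdef0" # placeholder, use the real VPC ID
}
```

Both sides go through a normal plan and apply, so the change can be reviewed before any state is touched.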
Step 4: Update References
# Before (single state)
resource "aws_instance" "web" {
  subnet_id = aws_subnet.public[0].id # direct reference
}
# After (split state)
data "terraform_remote_state" "network" { ... }
resource "aws_instance" "web" {
  subnet_id = data.terraform_remote_state.network.outputs.subnet_ids[0]
}
Step 5: Verify
cd network && terraform plan # → 0 changes
cd ../compute && terraform plan # → 0 changes
Apply Order
cd prod/network && terraform apply
cd ../data && terraform apply
cd ../compute && terraform apply
cd ../application && terraform apply
Destroy Order (Reverse!)
cd prod/application && terraform destroy
cd ../compute && terraform destroy
cd ../data && terraform destroy
cd ../network && terraform destroy
Using Terragrunt
Terragrunt is a wrapper that manages multiple states for you:
prod/terragrunt.hcl
remote_state {
  backend = "s3"
  config = {
    bucket         = "my-tfstate"
    # path_relative_to_include() is "network" for prod/network, so the env prefix
    # is added here to get "prod/network/terraform.tfstate"
    key            = "prod/${path_relative_to_include()}/terraform.tfstate"
    region         = "ap-southeast-1"
    dynamodb_table = "terraform-locks"
  }
}
prod/network/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}
terraform {
  source = "../../../modules/network"
}
inputs = {
  vpc_cidr = "10.0.0.0/16"
}
# Apply every stack in prod in dependency order
cd prod && terragrunt run-all apply
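run-all can only order stacks if it knows how they depend on each other. In Terragrunt this is usually declared with a dependency block. A sketch for a hypothetical prod/compute/terragrunt.hcl, assuming a modules/compute module and a private_subnet_ids output on the network stack:

```hcl
# prod/compute/terragrunt.hcl (hypothetical)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules/compute" # assumed module path
}

# Declares that compute depends on network, so run-all applies network first
dependency "network" {
  config_path = "../network"
}

inputs = {
  # Passed into the module as a variable instead of reading remote state
  subnet_id = dependency.network.outputs.private_subnet_ids[0]
}
```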
See the Terragrunt section for more.
Pattern: Per-Account (Multi-Account)
Split state by AWS account:
infra/
├── shared-services/   # Account: 111111111111
│   ├── route53/
│   ├── cloudtrail/
│   └── iam/
├── dev/               # Account: 222222222222
│   ├── network/
│   └── compute/
├── staging/           # Account: 333333333333
│   ├── network/
│   └── compute/
└── prod/              # Account: 444444444444
    ├── network/
    └── compute/
→ Account isolation = ultimate blast radius limit
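With one state per account, each stack's provider has to target the right account. A sketch of one common approach, assuming a cross-account IAM role exists in every account; the role name TerraformDeploy and the account_id variable are hypothetical:

```hcl
variable "account_id" {
  type        = string
  description = "Target AWS account ID for this stack"
}

provider "aws" {
  region = "ap-southeast-1"

  # Refuse to run if the credentials resolve to a different account
  allowed_account_ids = [var.account_id]

  assume_role {
    # Role name is hypothetical
    role_arn = "arn:aws:iam::${var.account_id}:role/TerraformDeploy"
  }
}
```

The allowed_account_ids guard turns an accidental apply against the wrong account into an immediate error.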
Pattern: Per-Region
infra/prod/
├── ap-southeast-1/   # Singapore
│   ├── network/
│   └── compute/
└── us-east-1/        # US (DR)
    ├── network/
    └── compute/
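In a per-region layout the backend region and the provider region are independent settings. A sketch for the us-east-1 network stack, assuming the state bucket from earlier in this section stays in ap-southeast-1:

```hcl
# infra/prod/us-east-1/network (sketch)
terraform {
  backend "s3" {
    bucket = "my-tfstate"
    key    = "prod/us-east-1/network/terraform.tfstate"
    region = "ap-southeast-1" # region of the state bucket, not of the resources
  }
}

provider "aws" {
  region = "us-east-1" # region where this stack's resources are created
}
```

If the bucket's region is unavailable, the DR stack cannot be planned either, so for disaster recovery it may be worth keeping that stack's state in a bucket in the DR region.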
Anti-Patterns
❌ Too Many Splits
infra/
├── prod/
│ ├── vpc/
│ ├── subnet-1/ # ← too granular
│ ├── subnet-2/
│ ├── igw/
│ ├── nat-1/
│ └── ...
→ Maintenance nightmare: split by logical group, not by individual resource
❌ Circular Dependencies
network → reads compute outputs
compute → reads network outputs
→ Neither state can be applied first: refactor so dependencies are unidirectional
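One way to break such a cycle is to let the higher layer own whatever ties the two together, so that reads flow in one direction only. A sketch in which compute creates the rule that references a network-owned security group; the outputs vpc_id and lb_sg_id and port 8080 are assumptions:

```hcl
# prod/compute/main.tf: compute reads network, network never reads compute
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-tfstate"
    key    = "prod/network/terraform.tfstate"
    region = "ap-southeast-1"
  }
}

resource "aws_security_group" "app" {
  name_prefix = "app-"
  vpc_id      = data.terraform_remote_state.network.outputs.vpc_id
}

# The rule lives with compute, so network needs no knowledge of the app SG
resource "aws_vpc_security_group_ingress_rule" "lb_to_app" {
  security_group_id            = aws_security_group.app.id
  referenced_security_group_id = data.terraform_remote_state.network.outputs.lb_sg_id
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
}
```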
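❌ Hardcoded State Keys
# ❌
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-tfstate"
    key    = "prod/network/terraform.tfstate" # hardcoded env
    region = "ap-southeast-1"
  }
}
→ Use a variable instead:
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-tfstate"
    key    = "${var.environment}/network/terraform.tfstate"
    region = "ap-southeast-1"
  }
}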
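The key above depends on var.environment, which the stack has to declare. A sketch with a validation rule so that a typo fails early with a clear message; the allowed values are the environments used in this section:

```hcl
variable "environment" {
  type        = string
  description = "Environment name, used as the prefix of remote state keys"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}
```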
When NOT to Split
- ✅ New project: start small
- ✅ Fewer than 50 resources: a single state is enough
- ✅ Team of 1-2 people: coordination is easy
→ Premature splitting = over-engineering
Example: Real-World Architecture
my-org-infra/
├── _bootstrap/        # State backend setup (run once)
│   └── main.tf
├── _global/           # Account-wide resources
│   └── iam/
├── prod/
│   ├── network/       # VPC + subnets + NAT
│   ├── data/          # RDS + ElastiCache + S3
│   ├── platform/      # EKS cluster + addons
│   ├── apps/
│   │   ├── auth-service/
│   │   ├── billing-service/
│   │   └── notification-service/
│   └── monitoring/    # Datadog + Grafana
├── staging/
│   └── ... (same as prod)
└── dev/
    └── ... (same as prod)
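_bootstrap/ is the one stack that cannot store its state in the backend it creates, so it typically starts with local state. A sketch of what its main.tf might contain, assuming the bucket and lock table names used earlier in this section:

```hcl
# _bootstrap/main.tf (sketch)
resource "aws_s3_bucket" "tfstate" {
  bucket = "my-tfstate"
}

# Versioning allows recovering an earlier state file
resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Lock table; the S3 backend expects a string hash key named LockID
resource "aws_dynamodb_table" "locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```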
Best Practices
✅ DO:
- Split by blast radius + change frequency
- Use Terragrunt for multi-stack management
- Document the apply order in the README
- Apply lower layers first (network → app)
- Share values across states through terraform_remote_state outputs
❌ DON'T:
- Don't split before you know the real scale
- Don't create circular dependencies
- Don't hardcode environment-specific state keys
- Don't separate tightly coupled resources
Summary
- Split state when: plan is slow, blast radius is wide, lock contention is frequent, > 200 resources
- Strategies: by layer, by env, by service, by account, by region
- Use terraform_remote_state to reference values across states
- Apply lower layers first, destroy in reverse
- Terragrunt helps manage multiple states
Next → Parallelism