Splitting Large State

As infrastructure grows, the state file grows with it: plans get slower and the blast radius gets wider. Splitting state is how you scale.

Symptoms of an Oversized State

⏱️ terraform plan takes longer than 2 minutes
💥 A mistake in one place affects the entire infrastructure
👥 Several people on the team frequently hit lock contention
📂 State file > 5 MB
🔢 Resources in state > 200

→ Time to split

Strategies (Recap from Section 10)

1. By Layer

infra/
├── network/        terraform.tfstate
├── data/           terraform.tfstate
├── compute/        terraform.tfstate
└── application/    terraform.tfstate

2. By Environment

infra/
├── dev/        terraform.tfstate
├── staging/    terraform.tfstate
└── prod/       terraform.tfstate

3. By Layer × Environment (Most Common)

infra/
├── dev/
│   ├── network/
│   ├── data/
│   └── compute/
├── staging/
│   ├── network/
│   ├── data/
│   └── compute/
└── prod/
    ├── network/
    ├── data/
    └── compute/

4. By Service

infra/
├── auth-service/
├── billing-service/
└── notification-service/

Cross-State Dependencies

Use the terraform_remote_state data source:

prod/compute/main.tf
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-tfstate"
    key    = "prod/network/terraform.tfstate"
    region = "ap-southeast-1"
  }
}

resource "aws_instance" "app" {
  # Required arguments such as ami and instance_type are omitted for brevity
  subnet_id              = data.terraform_remote_state.network.outputs.private_subnet_ids[0]
  vpc_security_group_ids = [data.terraform_remote_state.network.outputs.app_sg_id]
}
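
The lookup above only works if the network stack exports those values as root-level outputs, because terraform_remote_state exposes nothing else. A minimal sketch of what the network side might declare, assuming it defines resources named aws_subnet.private (created with count) and aws_security_group.app; both resource names and the file name are assumptions, not taken from the original configuration:

```hcl
# prod/network/outputs.tf (hypothetical file; resource names are assumptions)

output "private_subnet_ids" {
  description = "IDs of the private subnets, read by the compute stack"
  value       = aws_subnet.private[*].id
}

output "app_sg_id" {
  description = "ID of the security group attached to application instances"
  value       = aws_security_group.app.id
}
```

The output names must match exactly what the consuming stack reads (private_subnet_ids and app_sg_id here), so treat them as a contract between the two states.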

Migrating from Single → Split

Step 1: Plan Architecture

Decide how to split, based on:

Change frequency (network = rarely, app = frequently)
Team ownership (platform team vs app team)
Blast radius (network failure = everyone, app failure = just app)

Step 2: Create New Folders

mkdir -p prod/{network,data,compute,application}

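Step 3: Move Resources

Move the resource blocks into the new folders' configuration, then move the matching entries from the single state into each new state:

# Back up the original state (run in the original single-state directory)
cd single/
terraform state pull > backup.tfstate

# Move resources into the new state
# (-state/-state-out operate on local state files; with a remote backend,
#  pull each state to a local file first and push it back afterwards)
cd ../network/
terraform init
terraform state mv \
  -state=../single/terraform.tfstate \
  -state-out=terraform.tfstate \
  aws_vpc.main \
  aws_vpc.main

# Repeat for all network resources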
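
As an alternative to terraform state mv, newer Terraform releases can express the same hand-over declaratively: a removed block (Terraform 1.7 or later) drops a resource from the old state without destroying it, and an import block (Terraform 1.5 or later) adopts it into the new state. The following is a sketch under those version assumptions; the VPC ID is a placeholder and the file locations are hypothetical:

```hcl
# single/main.tf: delete the aws_vpc.main resource block and add this instead,
# so Terraform stops managing the VPC here but leaves it running
removed {
  from = aws_vpc.main

  lifecycle {
    destroy = false
  }
}

# network/main.tf: adopt the existing VPC into the new state
import {
  to = aws_vpc.main
  id = "vpc-0123456789abcdef0" # placeholder, replace with the real VPC ID
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16" # must match the real VPC so the plan shows no changes
}
```

Both changes go through the normal plan and apply cycle in each folder, which makes the migration reviewable in a pull request instead of being a sequence of manual CLI commands.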

Step 4: Update References

# Before (single state)
resource "aws_instance" "web" {
  subnet_id = aws_subnet.public[0].id   # direct reference
}

# After (split state)
data "terraform_remote_state" "network" { ... }

resource "aws_instance" "web" {
  subnet_id = data.terraform_remote_state.network.outputs.subnet_ids[0]
}

Step 5: Verify

cd network && terraform plan   # → 0 changes
cd ../compute && terraform plan # → 0 changes

Apply Order

cd prod/network && terraform apply
cd ../data && terraform apply
cd ../compute && terraform apply
cd ../application && terraform apply

Destroy Order (Reverse!)

cd prod/application && terraform destroy
cd ../compute && terraform destroy
cd ../data && terraform destroy
cd ../network && terraform destroy

Using Terragrunt

Terragrunt is a wrapper that manages multi-state setups for you:

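prod/terragrunt.hcl
remote_state {
  backend = "s3"
  config = {
    bucket         = "my-tfstate"
    # path_relative_to_include() is relative to this file (e.g. "network"),
    # so prefix the environment to get prod/network/terraform.tfstate
    key            = "prod/${path_relative_to_include()}/terraform.tfstate"
    region         = "ap-southeast-1"
    dynamodb_table = "terraform-locks"
  }
}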

prod/network/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules/network"
}

inputs = {
  vpc_cidr = "10.0.0.0/16"
}

# Apply every stack in prod in dependency order
# (newer Terragrunt releases spell this: terragrunt run --all apply)
cd prod && terragrunt run-all apply

See the Terragrunt section for more.
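
Terragrunt can only order stacks if each one declares what it depends on. A minimal sketch of a compute stack that depends on the network stack, assuming a modules/compute module and input names (subnet_ids, app_sg_id) that are hypothetical and not part of the original layout:

```hcl
# prod/compute/terragrunt.hcl (hypothetical)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules/compute" # assumed module path
}

# Declares that this stack must be applied after prod/network
dependency "network" {
  config_path = "../network"
}

inputs = {
  subnet_ids = dependency.network.outputs.private_subnet_ids
  app_sg_id  = dependency.network.outputs.app_sg_id
}
```

With dependency blocks in place, a run across all stacks applies network before compute and destroys them in the opposite order, so the apply and destroy sequences no longer need to be typed by hand.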

Pattern: Per-Account (Multi-Account)

Split state by AWS account:

infra/
├── shared-services/    # Account: 111111111111
│   ├── route53/
│   ├── cloudtrail/
│   └── iam/
├── dev/                # Account: 222222222222
│   ├── network/
│   └── compute/
├── staging/            # Account: 333333333333
│   ├── network/
│   └── compute/
└── prod/               # Account: 444444444444
    ├── network/
    └── compute/

→ Account isolation = ultimate blast radius limit
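
In a layout like this, each stack usually points its provider at the target account by assuming a role there. A minimal sketch for the prod network stack, assuming a deployment role exists in account 444444444444; the role name and session name are hypothetical:

```hcl
# prod/network/providers.tf (hypothetical)
provider "aws" {
  region = "ap-southeast-1"

  assume_role {
    # Role name is an assumption; use whatever role your organization provisions
    role_arn     = "arn:aws:iam::444444444444:role/TerraformDeploy"
    session_name = "terraform-prod-network"
  }
}
```

Because the credentials for each stack are scoped to a single account, a mistake in a dev stack cannot reach prod resources even if the code is wrong.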

Pattern: Per-Region

infra/prod/
├── ap-southeast-1/    # Singapore
│   ├── network/
│   └── compute/
└── us-east-1/         # US (DR)
    ├── network/
    └── compute/
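
With a per-region layout, each stack pins its own provider region and reads only the state of its own region. A sketch for the us-east-1 compute stack, assuming the state keys mirror the folder structure (an assumption, the original does not specify the key layout):

```hcl
# prod/us-east-1/compute/main.tf (hypothetical)
provider "aws" {
  region = "us-east-1" # region where this stack's resources live
}

data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-tfstate"
    key    = "prod/us-east-1/network/terraform.tfstate" # assumed key layout
    region = "ap-southeast-1" # region of the state bucket, not of the resources
  }
}
```

Note that the region inside the backend config refers to where the state bucket lives, which can differ from the region the provider deploys into.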

Anti-Patterns

❌ Too Many Splits

infra/
├── prod/
│   ├── vpc/
│   ├── subnet-1/      # ← too granular
│   ├── subnet-2/
│   ├── igw/
│   ├── nat-1/
│   └── ...

→ Maintenance nightmare. Split by logical group, not by individual resource.

❌ Circular Dependencies

network → reads compute outputs
compute → reads network outputs

→ Neither stack can be applied first. Refactor so dependencies flow in one direction.
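
One way to break such a cycle is to let the higher layer own anything that needs values from both sides, so the lower layer never reads upward. A sketch assuming the cycle came from a security group rule that needed IDs from both stacks; the vpc_id and lb_sg_id outputs and the port number are hypothetical:

```hcl
# prod/compute/main.tf: compute reads network, never the reverse
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-tfstate"
    key    = "prod/network/terraform.tfstate"
    region = "ap-southeast-1"
  }
}

resource "aws_security_group" "app" {
  name   = "app"
  vpc_id = data.terraform_remote_state.network.outputs.vpc_id # assumed output
}

# This rule needs IDs from both layers, so it lives in the higher one
resource "aws_vpc_security_group_ingress_rule" "app_from_lb" {
  security_group_id            = aws_security_group.app.id
  referenced_security_group_id = data.terraform_remote_state.network.outputs.lb_sg_id # assumed output
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
}
```

The network stack now has no data source pointing at compute, so it can always be applied first.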

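❌ Hardcoded State Keys

# ❌
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-tfstate"
    key    = "prod/network/terraform.tfstate"   # hardcoded env
    region = "ap-southeast-1"
  }
}

→ Use a variable instead:

data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-tfstate"
    key    = "${var.environment}/network/terraform.tfstate"
    region = "ap-southeast-1"
  }
}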
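
The variable itself has to be declared, and validating it guards against a typo silently pointing the lookup at a state key that does not exist. A minimal sketch, assuming the three environments used elsewhere in this section:

```hcl
variable "environment" {
  type        = string
  description = "Environment whose network state should be read"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}
```

The value can then be supplied per run, for example with terraform plan -var="environment=staging" or through a per-environment tfvars file.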

When NOT to Split

✅ New project: start small
✅ Fewer than 50 resources: a single state is enough
✅ Team of 1-2 people: coordination is easy

→ Premature splitting = over-engineering

Example: Real-World Architecture

my-org-infra/
├── _bootstrap/                  # State backend setup (run once)
│   └── main.tf
├── _global/                     # Account-wide resources
│   └── iam/
├── prod/
│   ├── network/                 # VPC + subnets + NAT
│   ├── data/                    # RDS + ElastiCache + S3
│   ├── platform/                # EKS cluster + addons
│   ├── apps/
│   │   ├── auth-service/
│   │   ├── billing-service/
│   │   └── notification-service/
│   └── monitoring/              # Datadog + Grafana
├── staging/
│   └── ... (same as prod)
└── dev/
    └── ... (same as prod)
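
The _bootstrap folder in the tree above holds the resources that the state backend itself needs. A sketch of what it might contain, reusing the bucket and lock table names that appear in the backend configuration of this section; everything else is an assumption:

```hcl
# _bootstrap/main.tf (hypothetical contents)

# Bucket that stores every stack's state file
resource "aws_s3_bucket" "tfstate" {
  bucket = "my-tfstate"
}

# Versioning makes it possible to recover an earlier state after a bad write
resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Lock table referenced by dynamodb_table in the backend config
resource "aws_dynamodb_table" "locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # attribute name the S3 backend expects

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

This stack is typically applied once with local state, since the remote backend does not exist until it has run.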

Best Practices

✅ DO:
- Split by blast radius + change frequency
- Use Terragrunt for multi-stack management
- Document the apply order in the README
- Apply lower layers first (network → app)
- Share values across states through terraform_remote_state outputs

❌ DON'T:
- Don't split before you know the real scale
- Don't create circular dependencies
- Don't hardcode remote state keys
- Don't separate tightly coupled resources

Summary

Split state when: plans are slow, the blast radius is wide, lock contention is frequent, or there are > 200 resources
Strategies: by layer, by env, by service, by account, by region
Use terraform_remote_state to reference values across states
Apply lower layers first, destroy in reverse
Terragrunt helps manage multi-state setups

Next → Parallelism
