Terraform Interview Questions [Senior level - S2E3]

December 21, 2024

Here are 10 senior-level, scenario-based Terraform interview questions, each with a detailed answer showing how to approach it in a real-world setting:

1. Handling State Locking Issues

Scenario:
You are working in a collaborative environment, and your Terraform apply operation fails due to a state lock. How would you resolve this issue without risking the integrity of the state file?

Answer:

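Root Cause: The backend locks the state while one operation is modifying it, so a second operation that starts before the lock is released fails. This typically happens with a remote backend such as S3 with DynamoDB for locking, either because a teammate or CI job is still running or because an earlier run crashed without releasing the lock.
Steps to Resolve:
1. Identify who holds the lock from the error message, which reports the lock ID, owner, operation, and creation time, or by reviewing the DynamoDB lock table.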
2. If you’re sure no other process is actively modifying the state, you can manually unlock it:
```
terraform force-unlock <LOCK_ID>
```
3. Coordinate with the team to ensure no parallel operations are happening.
Prevention:
- Enable proper state locking using a backend like S3 + DynamoDB or Terraform Cloud.
- Educate team members to avoid running concurrent commands on the same workspace.
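
The backend configuration behind the first prevention point might look like the following sketch. The bucket, key, and table names are hypothetical placeholders, and the DynamoDB table is assumed to already exist with a string partition key named LockID.

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state" # hypothetical bucket name
    key            = "app/terraform.tfstate"   # hypothetical state path
    region         = "us-east-1"
    dynamodb_table = "example-terraform-locks" # hypothetical lock table (partition key: LockID)
    encrypt        = true                      # encrypt the state object at rest
  }
}
```

With a lock table configured, each plan or apply acquires a lock before touching state, which is what produces the lock error in the scenario when two runs overlap.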

2. Terraform Module Design for Reusability

Scenario:
Your organization manages resources in multiple AWS accounts and regions. How would you design a reusable Terraform module to deploy a highly available EC2-based application with scaling, monitoring, and logging enabled?

Answer:

Structure the Module:
- Create a main.tf, variables.tf, and outputs.tf for the module.
- Parameterize variables like instance type, region, account ID, scaling configurations, and logging settings.
- Example input variables:
```
variable "instance_type" {
  default = "t3.medium"
}
variable "region" {}
variable "logging_bucket" {}
```
Enable Features Based on Inputs:
- Use conditional logic for optional features like scaling and logging.
```
resource "aws_autoscaling_group" "asg" {
  count = var.enable_scaling ? 1 : 0
}
```
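
The conditional resource above relies on a flag that the variable list does not show. A minimal sketch of how it could be declared, and how an output can cope with the resource being absent, assuming the flag is named enable_scaling as in the snippet; the output name asg_name is purely illustrative.

```hcl
# Feature flag consumed by the count expression on the Auto Scaling group.
variable "enable_scaling" {
  type        = bool
  default     = true
  description = "Whether to create the Auto Scaling group."
}

# With count, the resource is a list of zero or one elements.
# one() returns the single element, or null when the list is empty.
output "asg_name" {
  description = "Name of the Auto Scaling group, or null when scaling is disabled."
  value       = one(aws_autoscaling_group.asg[*].name)
}
```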

Module Usage Example:

module "ec2_app" {
  source           = "./modules/ec2_app"
  instance_type    = "t3.large"
  region           = "us-east-1"
  logging_bucket   = "my-logs"
}
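
Because the scenario spans several accounts and regions, the same module can be instantiated once per target by passing it an aliased provider. This is a sketch under assumptions: the alias, the role ARN, and the bucket name are hypothetical, and the module is assumed to use only the default aws provider internally.

```hcl
# Second provider configuration for another region and account.
provider "aws" {
  alias  = "us_west_2"
  region = "us-west-2"

  assume_role {
    role_arn = "arn:aws:iam::111111111111:role/terraform" # hypothetical role
  }
}

module "ec2_app_west" {
  source = "./modules/ec2_app"

  # Map the module's default aws provider to the aliased configuration.
  providers = {
    aws = aws.us_west_2
  }

  instance_type  = "t3.large"
  region         = "us-west-2"
  logging_bucket = "my-logs-west" # hypothetical bucket
}
```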

Version Control:
- Use a versioning strategy (e.g., semantic versioning) for publishing modules to a central Git or Terraform registry.
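
As an illustration of consuming a versioned module, the call below pins a Git tag through the ref query parameter. The repository URL and tag are hypothetical; for a module published to a registry, the version argument plays the same role.

```hcl
module "ec2_app" {
  # Hypothetical repository; "//ec2_app" selects a subdirectory and
  # "ref" pins the call to the v1.2.0 tag.
  source = "git::https://example.com/org/terraform-modules.git//ec2_app?ref=v1.2.0"

  instance_type  = "t3.large"
  region         = "us-east-1"
  logging_bucket = "my-logs"
}
```

Pinning a tag means a breaking change in the module only reaches a caller when that caller deliberately bumps the reference.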

3. Dependency Management in Complex Environments

Scenario:
You are provisioning resources where the creation of an RDS instance must wait for a VPC, subnets, and security groups to be fully created.
How do you ensure Terraform respects resource dependencies when there is no direct reference between resources?

Answer:

Use Implicit Dependencies: Terraform infers the ordering automatically whenever one resource's arguments reference another resource's attributes.

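resource "aws_db_instance" "db" {
  # Referencing these attributes makes Terraform create the security group
  # and subnet group before the database.
  vpc_security_group_ids = [aws_security_group.db_sg.id]
  db_subnet_group_name   = aws_db_subnet_group.default.name
}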

Use Explicit Dependencies: If there’s no direct relationship, use the depends_on argument.

resource "aws_db_instance" "db" {
  # No attribute references here, so the ordering is declared explicitly.
  depends_on = [
    aws_security_group.db_sg,
    aws_db_subnet_group.default
  ]
}

Output Dependencies: Use outputs from one module as inputs to another.

module "vpc" {
  source = "./vpc"
}

module "rds" {
  source          = "./rds"
  vpc_id          = module.vpc.vpc_id
  security_groups = module.vpc.security_groups
}
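
For the wiring above to work, the vpc module has to export those values and the rds module has to declare matching inputs. A minimal sketch, assuming hypothetical resource names (aws_vpc.main, aws_security_group.db_sg) inside the vpc module:

```hcl
# ./vpc/outputs.tf (resource names are hypothetical)
output "vpc_id" {
  value = aws_vpc.main.id
}

output "security_groups" {
  value = [aws_security_group.db_sg.id]
}

# ./rds/variables.tf
variable "vpc_id" {
  type = string
}

variable "security_groups" {
  type = list(string)
}
```

Because module.rds reads module.vpc outputs, Terraform creates the VPC resources behind those outputs before anything in the rds module that uses them.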

4. Multi-Environment Management

Scenario:
Your team manages dev, staging, and prod environments, each requiring slightly different configurations.
How would you structure your Terraform codebase to support multiple environments while avoiding duplication?

Answer:

Use Workspaces:
Workspaces allow you to use a single codebase for multiple environments.
```
terraform workspace new dev
terraform workspace new prod
```

Environment-Specific Variables:

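variable "instance_type" {
  default = "t3.micro"
}

locals {
  # A bare expression is not valid at the top level, so the choice lives in a local.
  instance_type = terraform.workspace == "prod" ? "t3.large" : var.instance_type
}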
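
Once there are more than two environments, a map keyed by workspace name tends to read better than nested conditionals. The sketch below assumes workspaces named dev, staging, and prod; the local names are illustrative.

```hcl
locals {
  # Instance size per workspace.
  env_instance_types = {
    dev     = "t3.micro"
    staging = "t3.small"
    prod    = "t3.large"
  }

  # Fall back to the smallest size for any workspace not listed above.
  selected_instance_type = lookup(local.env_instance_types, terraform.workspace, "t3.micro")
}
```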

Directory Structure:
Alternatively, use a folder-based structure:

├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   ├── prod/
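
In the folder-based layout, each environment directory is a small root module that calls the shared code with its own values and its own state. A sketch of what environments/dev/main.tf might contain, assuming a shared module at modules/ec2_app next to the environments folder and hypothetical bucket names:

```hcl
# environments/dev/main.tf
terraform {
  backend "s3" {
    bucket = "example-terraform-state" # hypothetical bucket
    key    = "dev/terraform.tfstate"   # one state file per environment
    region = "us-east-1"
  }
}

module "ec2_app" {
  source         = "../../modules/ec2_app" # assumed location of the shared module
  instance_type  = "t3.micro"
  region         = "us-east-1"
  logging_bucket = "my-logs-dev" # hypothetical bucket
}
```

The prod directory would hold the same call with production values, so the environments differ only in inputs and state location.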

5. Handling Drift in Managed Resources

Scenario:
A team member made manual changes to an AWS resource managed by Terraform. During the next terraform plan, you notice drift.
How would you handle the drift?

Answer:

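Identify the Drift: Run terraform plan to see how the real infrastructure differs from the configuration.
Decide Action:
- Revert the manual change by re-applying the configuration (terraform apply).
- Keep the manual change by updating the Terraform code to match it, then run terraform plan again to confirm no diff remains. If only the state is out of date, terraform apply -refresh-only records the real values without changing infrastructure.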
Long-Term Solution:
- Use tools like AWS Config or driftctl to monitor drift.
- Educate the team on the importance of IaC to avoid manual changes.
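
Some drift is expected, for example an Auto Scaling group whose desired capacity is adjusted by scaling policies. One way to stop Terraform from reporting it on every plan is a lifecycle rule, sketched here with hypothetical variable and resource names:

```hcl
variable "subnet_ids" {
  type = list(string)
}

variable "launch_template_id" {
  type = string
}

resource "aws_autoscaling_group" "app" {
  name                = "app-asg" # hypothetical name
  min_size            = 1
  max_size            = 4
  desired_capacity    = 2
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  lifecycle {
    # Changes made to this attribute outside Terraform are not treated as drift.
    ignore_changes = [desired_capacity]
  }
}
```

This is best kept for attributes that are legitimately managed elsewhere; applied broadly, it hides the manual changes the scenario is trying to catch.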

6. Migrating Remote State Backends

Scenario:
Your company wants to migrate the Terraform remote state backend from an S3 bucket to Terraform Cloud. How would you safely perform the migration?

Answer:

Backup the current state file:

aws s3 cp s3://<bucket>/terraform.tfstate ./backup.tfstate

Update the backend block in Terraform:

terraform {
  backend "remote" {
    organization = "my-org"
    workspaces {
      name = "my-workspace"
    }
  }
}

Migrate the state using terraform init:
```
terraform init -migrate-state
```
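
On Terraform 1.1 and later, the cloud block is an alternative to the remote backend for connecting to Terraform Cloud. A sketch reusing the organization and workspace names from the example above:

```hcl
terraform {
  cloud {
    organization = "my-org"

    workspaces {
      name = "my-workspace"
    }
  }
}
```

With this form, terraform init should detect the change from the S3 backend and offer to copy the existing state. Either way, keep the backup until a terraform plan against the new backend shows no unexpected changes.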

7. Debugging Terraform Apply Failures

Scenario:
Your terraform apply fails due to a resource configuration error. How would you debug and resolve it?

Answer:

Use Terraform Logs:
Enable debug logs:
```
TF_LOG=DEBUG terraform apply
```
Validate Configuration: Run terraform validate to catch configuration errors.
Isolate Issues:
Use terraform plan to identify which resource has issues.
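Test in Isolation:
Apply only the failing resource by passing its address (for example aws_instance.web). Treat -target as a troubleshooting aid rather than a routine workflow:
```
terraform apply -target=<resource_address>
```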
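
Some configuration errors can be surfaced at plan time instead of during apply by validating inputs. A minimal sketch, assuming the team wants to restrict instance sizes to the t3 family; the rule itself is illustrative.

```hcl
variable "instance_type" {
  type = string

  validation {
    # can() turns a failed regex match into false instead of an error.
    condition     = can(regex("^t3\\.", var.instance_type))
    error_message = "The instance_type value must be a t3 family size, for example t3.micro."
  }
}
```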

8. Cost Optimization with Terraform

Scenario:
Your team has been asked to identify cost-saving opportunities in infrastructure managed by Terraform. How would you approach this?

Answer:

Use terraform state list to enumerate managed resources and terraform state show <address> to inspect the size and configuration of each one.

Add cost-aware policies (e.g., use t3.micro for dev):

locals {
  # Variable defaults must be literal values, so the choice is computed in a local.
  instance_type = terraform.workspace == "prod" ? "t3.large" : "t3.micro"
}
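
Cost reviews are easier when every resource carries consistent tags that billing reports can group by. One way to do that is the AWS provider's default_tags block, sketched here with hypothetical tag keys and values:

```hcl
provider "aws" {
  region = "us-east-1"

  # Applied to every taggable resource created through this provider.
  default_tags {
    tags = {
      Environment = terraform.workspace
      CostCenter  = "platform" # hypothetical value
      ManagedBy   = "terraform"
    }
  }
}
```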

9. Scaling Infrastructure for Traffic Spikes

Scenario:
You need to scale your application automatically for traffic spikes. How would you implement this using Terraform?

Answer:

Use aws_autoscaling_group with aws_launch_template (AWS recommends launch templates over the older aws_launch_configuration).
Integrate scaling policies with CloudWatch alarms.
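
A minimal sketch of these pieces together, assuming hypothetical variables for the AMI and subnets and a 60% average CPU target. It uses a target tracking policy, for which AWS creates and manages the CloudWatch alarms itself; step scaling with explicit aws_cloudwatch_metric_alarm resources is the alternative when finer control is needed.

```hcl
variable "ami_id" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

resource "aws_launch_template" "app" {
  name_prefix   = "app-"
  image_id      = var.ami_id
  instance_type = "t3.micro"
}

resource "aws_autoscaling_group" "app" {
  min_size            = 2
  max_size            = 10
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }
}

# Adds or removes instances to keep average CPU near the target value.
resource "aws_autoscaling_policy" "cpu" {
  name                   = "cpu-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 60
  }
}
```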

10. Handling Sensitive Data

Scenario:
How do you ensure secrets (e.g., API keys) used in Terraform are securely managed?

Answer:

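Mark the variable with Terraform's sensitive flag so its value is redacted from plan and apply output. The value is still written to the state file, so the state itself must be protected: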

variable "api_key" {
  sensitive = true
}

Store secrets in external tools like AWS Secrets Manager or HashiCorp Vault. Retrieve them dynamically:
```
data "aws_secretsmanager_secret_version" "secret" {
  secret_id = "api_key"
}
```
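
Reading the retrieved value could look like the sketch below. The data source is repeated so the snippet stands alone, and the local and output names are illustrative. Because secret_string is a sensitive attribute, any output that exposes it must also be marked sensitive.

```hcl
data "aws_secretsmanager_secret_version" "secret" {
  secret_id = "api_key"
}

locals {
  # The raw secret value; Terraform keeps treating it as sensitive.
  api_key = data.aws_secretsmanager_secret_version.secret.secret_string
}

output "api_key" {
  value     = local.api_key
  sensitive = true # required, otherwise Terraform rejects the output
}
```

The secret still ends up in the state file, so an encrypted backend with restricted access remains necessary.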
