Monitoring Setup with AWS CloudWatch & Datadog using Terraform
Objective:
The goal of this project is to set up a comprehensive monitoring solution for an AWS infrastructure using Amazon CloudWatch and
Datadog, while automating the deployment of alarms, dashboards, and monitoring configurations using Terraform.
Technology Stack:
- Cloud Provider: AWS
- Monitoring Tools: AWS CloudWatch, Datadog
- Infrastructure as Code (IaC): Terraform
- Scripting: Shell/Python (for automation)
- CI/CD Tools (Optional): Jenkins/GitHub Actions
- Alerting Tools: SNS (for CloudWatch Alarms), Slack/Webhooks (for Datadog)
Project Implementation Steps
1. Infrastructure Setup
- Ensure that AWS infrastructure (EC2, RDS, Load Balancer, etc.) is already provisioned.
- Enable VPC Flow Logs, S3 logs, and other relevant logs for monitoring.
2. CloudWatch Monitoring Setup using Terraform
- Enable CloudWatch Metrics Collection
- Configure CloudWatch agent on EC2 instances to collect CPU, memory, disk, and network usage.
- Enable AWS service-specific metrics (e.g., RDS, ALB, Lambda).
- Enable AWS X-Ray for tracing requests.
- Create CloudWatch Alarms
- Define Terraform modules for CloudWatch alarms based on CPU, memory, and disk utilization.
- Set up anomaly detection for unusual spikes in traffic.
- Trigger AWS SNS notifications for alerts.
- Create CloudWatch Dashboards
- Use Terraform to define custom dashboards for:
- EC2 instances (CPU, memory, disk)
- RDS (connections, CPU utilization, free storage)
- ALB (request count, latency, 5xx errors)
- Group dashboards per environment (Dev, Staging, Production).
3. Datadog Monitoring Setup using Terraform
- Integrate AWS with Datadog
- Configure AWS IAM permissions for Datadog.
- Set up Terraform modules for Datadog AWS integration.
- Enable Datadog Agent on EC2
- Deploy the Datadog agent via Terraform user-data scripts.
- Configure log forwarding from CloudWatch to Datadog.
- Define Datadog Monitors & Alerts
- Create monitors for:
- EC2 health (CPU, Memory, Disk)
- RDS database health
- API response times
- Configure alert thresholds and notification channels (Slack, PagerDuty, Webhooks).
- Create Datadog Dashboards
- Use Terraform to define JSON-based dashboard templates.
- Set up environment-based dashboards (e.g., Dev, QA, Production).
- Include service health and performance graphs.
4. Automation and CI/CD Integration
- Automate Monitoring Setup with Terraform
- Define reusable Terraform modules for CloudWatch and Datadog.
- Store Terraform state remotely in S3 with DynamoDB locking; enable bucket encryption and versioning, and restrict read access, because the state holds resource IDs and sensitive attributes such as the Datadog integration's external ID.
- Integrate with Jenkins/GitHub Actions
- Automate Terraform deployments via a CI/CD pipeline.
- Implement Terraform plan/apply stages with a manual approval step before any production apply.
- Configure Log Forwarding
- Stream CloudWatch logs to Datadog for unified visibility.
- Set up AWS Lambda for custom log ingestion.
5. Security & Compliance
- Implement IAM Roles & Policies
- Restrict access to monitoring tools with IAM policies.
- Enforce least privilege principle for monitoring agents.
- Use CloudWatch Logs Insights
- Save the queries the on-call team needs (error rates, rejected VPC Flow Log entries, top talkers) so they are not rebuilt during an incident.
- Set a retention period on every log group; the default is never-expire, which grows cost and keeps old data exposed if the account is compromised.
- Set Up Multi-Region Monitoring
- Ensure CloudWatch and Datadog cover all AWS regions in use.
6. Testing & Validation
- Simulate Load and Check Alerts
- Use load-generation tools (Apache JMeter, Locust) to push metrics past the alarm thresholds.
- Verify that notifications arrive on every configured channel (SNS subscriptions, Slack, PagerDuty). CloudWatch's SetAlarmState API flips an alarm into ALARM without generating load, which tests the SNS delivery path on its own; a Datadog monitor can be exercised the same way by temporarily lowering its threshold.
- Monitor Terraform Deployment
- Run terraform plan again after apply and confirm it reports no changes; any diff means drift or a resource that was not applied as expected.
7. Documentation & Handover
- Create README Documentation
- Explain Terraform module usage.
- Provide troubleshooting steps.
- Knowledge Transfer
- Conduct a knowledge-sharing session for the operations team.
Terraform Code for CloudWatch & Datadog Monitoring
I'll break it into multiple modules:
- CloudWatch Agent & Log Group Setup
- CloudWatch Alarms & Dashboards
- Datadog AWS Integration
- Datadog Monitors & Dashboards
- CI/CD Pipeline Integration (Optional)
1️⃣ CloudWatch Agent & Log Group Setup
This Terraform script:
- Publishes the CloudWatch agent configuration to SSM Parameter Store, from which the agent on each EC2 instance loads it at start-up.
- Defines the metrics the agent collects; memory and disk usage are only available through the agent, not from the AWS/EC2 namespace.
Terraform Code: cloudwatch-agent.tf
resource "aws_ssm_parameter" "cloudwatch_agent_config" {
  name  = "/AmazonCloudWatch-agent-config" # the AmazonCloudWatch- prefix is what CloudWatchAgentServerPolicy lets the agent read
  type  = "String"
  # Agent config JSON; the page's heredoc was truncated here, so this minimal metric set is a reconstruction.
  value = jsonencode({ metrics = {
    append_dimensions = { InstanceId = "$${aws:InstanceId}" }, aggregation_dimensions = [["InstanceId"]],
    metrics_collected = { mem = { measurement = ["mem_used_percent"] }, disk = { measurement = ["used_percent"], resources = ["*"] } } } })
}
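The parameter alone does not get any metrics into CloudWatch: the instance needs a role the agent can use, a log group with a retention period, and a bootstrap that installs the agent and points it at the parameter. The following is an added example written for this page, not code from the original listing; it assumes the aws_ssm_parameter above is in the same module, and the AMI (assumed Amazon Linux 2023), subnet, role names and the /app/web log group path are placeholders.

```hcl
variable "ami_id" { type = string }    # assumed: an Amazon Linux 2023 AMI
variable "subnet_id" { type = string } # assumed

# Log group for the agent's log section, with a finite retention
# instead of the never-expire default.
resource "aws_cloudwatch_log_group" "app" {
  name              = "/app/web"
  retention_in_days = 30
}

data "aws_iam_policy_document" "ec2_trust" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

# Least privilege for the agent: the AWS-managed CloudWatchAgentServerPolicy grants
# PutMetricData/PutLogEvents plus ssm:GetParameter only on parameters named
# AmazonCloudWatch-*, which is why the config parameter carries that prefix.
# Nothing else is attached, so a compromised instance cannot read other parameters.
resource "aws_iam_role" "cw_agent" {
  name               = "cw-agent-instance-role"
  assume_role_policy = data.aws_iam_policy_document.ec2_trust.json
}

resource "aws_iam_role_policy_attachment" "cw_agent" {
  role       = aws_iam_role.cw_agent.name
  policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
}

resource "aws_iam_instance_profile" "cw_agent" {
  name = "cw-agent-instance-profile"
  role = aws_iam_role.cw_agent.name
}

# Bootstrap: install the agent from the distro package and load the SSM config.
resource "aws_instance" "web" {
  ami                  = var.ami_id
  instance_type        = "t3.micro"
  subnet_id            = var.subnet_id
  iam_instance_profile = aws_iam_instance_profile.cw_agent.name

  # IMDSv2 only: the agent reads instance identity and role credentials from IMDS,
  # and requiring session tokens blocks SSRF-style credential theft via the metadata service.
  metadata_options {
    http_tokens = "required"
  }

  user_data = <<-EOT
    #!/bin/bash
    set -euo pipefail
    dnf install -y amazon-cloudwatch-agent
    /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
      -a fetch-config -m ec2 -c ssm:${aws_ssm_parameter.cloudwatch_agent_config.name} -s
  EOT

  tags = { Name = "web" }
}
```

The instance profile is the only credential the agent has; keep it that narrow even when the same role is reused across an Auto Scaling group, and add log-group write permissions through the managed policy rather than an inline wildcard.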
2️⃣ CloudWatch Alarms & Dashboards
- Creates a CPU alarm in the AWS/EC2 namespace; memory and disk alarms follow the same pattern but read from the CWAgent namespace (metrics such as mem_used_percent and disk_used_percent), which only exists once the agent is running.
- Sets up an SNS topic for notifications.
resource "aws_sns_topic" "cloudwatch_alerts" {
  name = "cloudwatch-alerts"
}
# A topic with no subscribers delivers nothing: add an aws_sns_topic_subscription
# (email, HTTPS, Lambda) for the on-call channel.
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "HighCPUUsage"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2 # two consecutive 60 s periods over the threshold
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 60
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "This alarm triggers when CPU usage exceeds 80%."
  alarm_actions       = [aws_sns_topic.cloudwatch_alerts.arn]
  dimensions = {
    InstanceId = "i-1234567890abcdef0" # placeholder; a module should take the instance ID as a variable
  }
}
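The single hard-coded alarm above leaves out the parts of the section-2 plan that matter operationally: a subscriber on the topic, one alarm per instance, memory from the agent namespace, a decision about what happens when data stops, and the anomaly-detection alarm for traffic spikes. The following is an added example for this page, not the original listing; the instance ID set, ALB ARN suffix and on-call address are variables the caller supplies, and the InstanceId dimension on mem_used_percent assumes the agent config sets append_dimensions/aggregation_dimensions to InstanceId.

```hcl
variable "instance_ids" { type = set(string) } # EC2 instances to watch (assumed input)
variable "alb_arn_suffix" { type = string }    # e.g. app/my-alb/1234567890abcdef (assumed input)
variable "oncall_email" { type = string }      # assumed input

resource "aws_sns_topic" "alerts" {
  name = "monitoring-alerts"
}

# Without a subscription the topic is a black hole; email is the simplest
# subscriber and must be confirmed by the recipient before delivery starts.
resource "aws_sns_topic_subscription" "oncall" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = var.oncall_email
}

# One memory alarm per instance. Memory comes from the CloudWatch agent
# (namespace CWAgent), not from AWS/EC2.
resource "aws_cloudwatch_metric_alarm" "mem_high" {
  for_each            = var.instance_ids
  alarm_name          = "HighMemory-${each.key}"
  namespace           = "CWAgent"
  metric_name         = "mem_used_percent"
  dimensions          = { InstanceId = each.key }
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 3
  datapoints_to_alarm = 3
  threshold           = 85
  comparison_operator = "GreaterThanThreshold"
  # The default ("missing") keeps the last state when data stops, so a crashed
  # or killed agent goes unnoticed; "breaching" turns silence into an alert.
  treat_missing_data = "breaching"
  alarm_actions      = [aws_sns_topic.alerts.arn]
  ok_actions         = [aws_sns_topic.alerts.arn] # also notify on recovery
}

# Anomaly detection for traffic spikes: the band is two standard deviations
# around the baseline CloudWatch learns for the ALB's request count.
resource "aws_cloudwatch_metric_alarm" "alb_request_anomaly" {
  alarm_name          = "ALB-RequestCount-Anomaly"
  comparison_operator = "GreaterThanUpperThreshold"
  evaluation_periods  = 3
  threshold_metric_id = "band"
  alarm_actions       = [aws_sns_topic.alerts.arn]
  treat_missing_data  = "notBreaching" # no traffic is not an anomaly

  metric_query {
    id          = "band"
    expression  = "ANOMALY_DETECTION_BAND(requests, 2)"
    label       = "RequestCount (expected range)"
    return_data = true
  }

  metric_query {
    id          = "requests"
    return_data = true
    metric {
      namespace   = "AWS/ApplicationELB"
      metric_name = "RequestCount"
      period      = 300
      stat        = "Sum"
      dimensions  = { LoadBalancer = var.alb_arn_suffix }
    }
  }
}
```

The anomaly alarm needs a few hours of history before the band is meaningful; until then it will not fire, so keep a plain threshold alarm on 5xx errors alongside it.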
Terraform Code: cloudwatch-dashboard.tf
resource "aws_cloudwatch_dashboard" "main_dashboard" {
  dashboard_name = "CloudWatch-Monitoring"
  # The page's heredoc was truncated; this single EC2 CPU widget is a minimal reconstruction (jsonencode avoids heredoc quoting errors).
  dashboard_body = jsonencode({ widgets = [{ type = "metric", x = 0, y = 0, width = 12, height = 6, properties = {
    title = "EC2 CPU", region = "us-east-1", stat = "Average", period = 60, # region is a placeholder
    metrics = [["AWS/EC2", "CPUUtilization", "InstanceId", "i-1234567890abcdef0"]] } }] })
}
3️⃣ Datadog AWS Integration
This Terraform script:
- Registers the AWS account with Datadog. Datadog then assumes an IAM role in that account; the role's trust policy must require the external ID this resource exports, otherwise any Datadog customer who guesses the role name could point their integration at it.
Terraform Code: datadog-aws-integration.tf
provider "datadog" {
  # Declare both variables with sensitive = true so they are redacted from plan output.
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key
}
resource "datadog_integration_aws" "aws_integration" {
  account_id  = "123456789012"           # the monitored AWS account (placeholder)
  role_name   = "DatadogIntegrationRole" # must exist in that account with a trust policy for Datadog
  host_tags   = ["env:production", "service:webapp"]
  filter_tags = ["tag-key:tag-value"]    # only EC2 instances carrying this tag are collected
  # metrics_polling is not an attribute of this resource; metric collection is enabled by default.
  # The resource exports external_id, which the IAM role's trust policy must reference.
}
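The "Configure AWS IAM permissions for Datadog" step from section 3 is the security-critical half of this integration, and the page's listing leaves it out. The following is an added example written for this page, not Datadog's or the original author's code; it assumes the datadog_integration_aws.aws_integration resource above is in the same module with role_name "DatadogIntegrationRole", and the permission list is a representative read-only subset (the full per-service list is maintained in Datadog's AWS integration documentation).

```hcl
# Trust policy: only Datadog's AWS account may assume the role, and only when the
# AssumeRole call carries the external ID Datadog generated for this integration.
# Without the condition, anyone who can get Datadog to assume a role they name
# could read this account (the confused-deputy problem).
data "aws_iam_policy_document" "datadog_trust" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::464622532012:root"] # Datadog's integration account
    }
    condition {
      test     = "StringEquals"
      variable = "sts:ExternalId"
      values   = [datadog_integration_aws.aws_integration.external_id]
    }
  }
}

# The role name is a literal that matches role_name in the integration resource;
# referencing the role from the integration and the integration's external_id
# from the role would create a dependency cycle.
resource "aws_iam_role" "datadog" {
  name               = "DatadogIntegrationRole"
  assume_role_policy = data.aws_iam_policy_document.datadog_trust.json
}

# Read-only metadata and metrics. Keep this to Describe/Get/List actions; Datadog
# never needs write access, and granting logs:FilterLogEvents hands it log contents.
data "aws_iam_policy_document" "datadog_permissions" {
  statement {
    sid = "DatadogAWSIntegration"
    actions = [
      "cloudwatch:Get*",
      "cloudwatch:List*",
      "ec2:Describe*",
      "rds:Describe*",
      "elasticloadbalancing:Describe*",
      "tag:GetResources",
      "tag:GetTagKeys",
      "tag:GetTagValues",
    ]
    resources = ["*"]
  }
}

resource "aws_iam_role_policy" "datadog" {
  name   = "DatadogAWSIntegrationPolicy"
  role   = aws_iam_role.datadog.id
  policy = data.aws_iam_policy_document.datadog_permissions.json
}
```

The external ID is written into Terraform state, which is one more reason the remote state bucket from section 4 needs encryption and a tight bucket policy.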
4️⃣ Datadog Monitors & Dashboards
This Terraform script:
- Creates a Datadog monitor on EC2 CPU (RDS monitors follow the same shape with aws.rds.* metrics).
- Routes notifications to Slack through an @slack- handle in the monitor message, which is how Datadog addresses channels.
Terraform Code: datadog-monitors.tf
resource "datadog_monitor" "cpu_monitor" {
  name  = "High CPU Usage"
  type  = "metric alert"
  # Fleet-wide average: one saturated host is hidden by idle ones. Add "by {host}" to the query to alert per instance.
  query = "avg(last_5m):avg:aws.ec2.cpuutilization{*} > 80"
  # There is no notification block on this resource; Datadog routes alerts through @-handles in the message.
  # The handle form is @slack-<integration name>-<channel>; "ops" is a placeholder for the Slack integration name.
  message = "ALERT! CPU usage exceeded 80%. Please investigate. @slack-ops-alerts"
  tags    = ["env:production"]
  monitor_thresholds {
    critical = 80 # must match the threshold in the query
  }
  # CloudWatch publishes EC2 metrics several minutes late; without this delay the monitor evaluates partial windows and flaps.
  evaluation_delay  = 900
  notify_no_data    = true
  no_data_timeframe = 10
}
Terraform Code: datadog-dashboard.tf
resource "datadog_dashboard" "aws_monitoring_dashboard" {
  title       = "AWS Monitoring"
  description = "Monitoring Dashboard for AWS Services"
  layout_type = "ordered" # required: "ordered" (auto-grid) or "free"
  # Widgets carry their definition directly (no definition wrapper) and the title sits inside it.
  widget {
    timeseries_definition {
      title = "EC2 CPU Usage"
      request {
        q            = "avg:aws.ec2.cpuutilization{*} by {host}"
        display_type = "line"
      }
    }
  }
}
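Section 4's "Stream CloudWatch logs to Datadog" step has no listing on the page. Datadog ships its log forwarder as a Lambda packaged in a CloudFormation template, so the Terraform pattern is a CloudFormation stack plus a subscription filter on each log group. The following is an added example for this page, not Datadog's or the original author's code; the log group name, region, Datadog site and secret name are assumptions, and the template URL and the DatadogForwarderArn stack output are taken from Datadog's published forwarder template.

```hcl
variable "log_group_name" { type = string } # e.g. "/app/web" (assumed input)
variable "aws_region" { type = string }     # assumed input
variable "datadog_api_key" {
  type      = string
  sensitive = true
}

data "aws_caller_identity" "current" {}

# Keep the API key out of the Lambda's environment variables: the forwarder
# reads it from Secrets Manager at runtime.
resource "aws_secretsmanager_secret" "dd_api_key" {
  name = "datadog/api-key" # assumed name
}

resource "aws_secretsmanager_secret_version" "dd_api_key" {
  secret_id     = aws_secretsmanager_secret.dd_api_key.id
  secret_string = var.datadog_api_key
}

# Datadog's forwarder, deployed from its CloudFormation template.
resource "aws_cloudformation_stack" "datadog_forwarder" {
  name         = "datadog-forwarder"
  capabilities = ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM", "CAPABILITY_AUTO_EXPAND"]
  parameters = {
    DdApiKeySecretArn = aws_secretsmanager_secret.dd_api_key.arn
    DdSite            = "datadoghq.com" # use your Datadog site (e.g. datadoghq.eu)
    FunctionName      = "datadog-forwarder"
  }
  template_url = "https://datadog-cloudformation-template.s3.amazonaws.com/aws/forwarder/latest.yaml"
}

# Allow CloudWatch Logs to invoke the forwarder, scoped to this one log group
# so an unrelated log group cannot be pointed at the function.
resource "aws_lambda_permission" "logs_invoke_forwarder" {
  statement_id  = "AllowCloudWatchLogs"
  action        = "lambda:InvokeFunction"
  function_name = aws_cloudformation_stack.datadog_forwarder.outputs["DatadogForwarderArn"]
  principal     = "logs.amazonaws.com"
  source_arn    = "arn:aws:logs:${var.aws_region}:${data.aws_caller_identity.current.account_id}:log-group:${var.log_group_name}:*"
}

# The subscription filter is what actually streams events to Datadog.
resource "aws_cloudwatch_log_subscription_filter" "to_datadog" {
  name            = "datadog-forwarder"
  log_group_name  = var.log_group_name
  filter_pattern  = "" # everything; tighten (e.g. "?ERROR ?WARN") to cut volume and cost
  destination_arn = aws_cloudformation_stack.datadog_forwarder.outputs["DatadogForwarderArn"]
  depends_on      = [aws_lambda_permission.logs_invoke_forwarder]
}
```

Forwarded log lines leave the AWS account, so review what the application writes before widening the filter pattern: secrets or personal data in logs become a Datadog exposure as well as a CloudWatch one.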
5️⃣ CI/CD Pipeline Integration (Jenkins)
This Jenkinsfile automates Terraform deployment.
Jenkinsfile
pipeline {
  agent any
  environment {
    AWS_REGION = 'us-east-1'
    // TF_VAR_* names feed var.datadog_api_key / var.datadog_app_key in provider "datadog";
    // the credentials binding masks the values in console output. AWS access should come
    // from an instance profile or role on the agent, not from stored access keys.
    TF_VAR_datadog_api_key = credentials('datadog-api-key')
    TF_VAR_datadog_app_key = credentials('datadog-app-key')
  }
  stages {
    stage('Terraform Init') {
      steps {
        sh 'terraform init -input=false'
      }
    }
    stage('Terraform Plan') {
      steps {
        sh 'terraform plan -input=false -out=tfplan'
      }
    }
    // The manual gate from section 4: production changes are reviewed before apply.
    stage('Approve') {
      when { branch 'main' }
      steps {
        input message: 'Apply this plan to production?'
      }
    }
    stage('Terraform Apply') {
      steps {
        // Applying the saved plan runs exactly what was reviewed; -auto-approve is not needed.
        sh 'terraform apply -input=false tfplan'
      }
    }
  }
}
Deployment Steps
1️⃣ Prerequisites
- Install Terraform and configure AWS credentials (prefer an IAM role or SSO session over long-lived access keys).
- Get Datadog API & APP keys.
- Set up Jenkins with Terraform plugins.
2️⃣ Clone Repository & Initialize Terraform
git clone https://github.com/your-repo/monitoring-terraform.git
cd monitoring-terraform
terraform init
3️⃣ Run Terraform Deployment
terraform plan -out=tfplan
terraform apply tfplan
4️⃣ Validate Monitoring Setup
- CloudWatch Dashboard: AWS Console → CloudWatch → Dashboards
- Datadog Dashboard: Datadog UI → Dashboards
- Alerts: Check SNS and Slack notifications.
Final Outcome
- AWS CloudWatch & Datadog integrated for full-stack monitoring.
- Terraform automated monitoring setup.
- Alerts & dashboards configured for real-time insights.
- CI/CD pipeline for automatic deployment.
Deliverables:
- Terraform Code Repository (CloudWatch + Datadog)
- Automated Dashboards & Alerts
- CI/CD Pipeline for Terraform Deployment
- Documentation & Playbook for Monitoring Setup
- Security & Compliance Best Practices
Expected Outcome
- Fully automated CloudWatch & Datadog monitoring for AWS infrastructure.
- Proactive alerting with SNS, Slack, and PagerDuty.
- Centralized monitoring dashboards for real-time visibility.
- Automated deployment of monitoring configurations with Terraform.
- Scalability for adding more AWS services as needed.