Disaster Recovery Automation Project Using Ansible and AWS

Task Overview

Disaster recovery (DR) is crucial for ensuring business continuity in case of failures, cyberattacks, or natural disasters. This project aims to automate backup and recovery of critical infrastructure and databases using Ansible, AWS, and Terraform. The solution will periodically back up infrastructure and databases and provide an automated mechanism for restoring services in case of failure.

Project Architecture

Key Components

Step-by-Step Implementation

Step 1: Infrastructure Provisioning with Terraform

Provision critical AWS resources:

Step 2: Automating Backups Using Ansible

Create Ansible playbooks to:

Step 3: Disaster Recovery Simulation

Implementation Details

1. Terraform Code to Deploy AWS Infrastructure


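# S3 bucket that stores application backups.
resource "aws_s3_bucket" "backup_bucket" {
  bucket = "my-disaster-recovery-bucket"
}

# AWS provider v4+ manages lifecycle rules in a separate resource
# instead of an inline lifecycle_rule block.
resource "aws_s3_bucket_lifecycle_configuration" "backup_expiry" {
  bucket = aws_s3_bucket.backup_bucket.id
  rule {
    id     = "auto-expire"
    status = "Enabled"
    filter {} # empty filter: the rule applies to every object
    expiration {
      days = 30
    }
  }
}

# Assumed input variables: creating a new DB instance requires credentials.
variable "db_username" { type = string }
variable "db_password" {
  type      = string
  sensitive = true
}

# MySQL instance with 7 days of automated backups.
# The resource type is aws_db_instance; aws_rds_instance does not exist.
resource "aws_db_instance" "database" {
  identifier              = "dr-db-instance"
  engine                  = "mysql"
  instance_class          = "db.t3.micro"
  allocated_storage       = 20
  backup_retention_period = 7
  username                = var.db_username
  password                = var.db_password
}

# Application server. The AMI ID is a placeholder; use a real one for your region.
resource "aws_instance" "app_server" {
  ami           = "ami-12345678"
  instance_type = "t2.micro"
  tags = {
    Name = "App-Server"
  }
}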
    

2. Ansible Playbook for Backup Automation


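---
# Module names follow current collections; older releases shipped some under community.aws.
- name: Backup EC2 and RDS
  hosts: localhost
  connection: local
  gather_facts: true  # ansible_date_time below comes from gathered facts
  tasks:

    - name: Take EC2 snapshot
      amazon.aws.ec2_snapshot:
        instance_id: "{{ ec2_instance_id }}"
        # instance_id must be paired with the device to snapshot, e.g. /dev/xvda.
        # ec2_device_name is an assumed variable, supplied like the others.
        device_name: "{{ ec2_device_name }}"
        region: "{{ aws_region }}"
        wait: true
      register: ec2_snapshot

    - name: Backup RDS database
      amazon.aws.rds_instance_snapshot:
        db_instance_identifier: "{{ rds_instance }}"
        db_snapshot_identifier: "rds-backup-{{ ansible_date_time.epoch }}"
        region: "{{ aws_region }}"
        wait: true
      register: rds_backup

    # s3_object uploads a single file, so the directory is archived first.
    - name: Archive application data
      community.general.archive:
        path: /var/www/html
        dest: "/tmp/app-data-{{ ansible_date_time.epoch }}.tar.gz"
        format: gz

    - name: Copy application data to S3
      amazon.aws.s3_object:
        bucket: "my-disaster-recovery-bucket"
        object: "backups/app-data-{{ ansible_date_time.epoch }}.tar.gz"
        src: "/tmp/app-data-{{ ansible_date_time.epoch }}.tar.gz"
        mode: put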
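
The overview calls for periodic backups, but the playbook above only runs when invoked. A minimal sketch of one way to schedule it is a cron entry on the control node, managed by Ansible itself. The paths /opt/dr/backup.yml, /opt/dr/vars.yml, and /var/log/dr-backup.log are hypothetical, as is the 02:00 schedule; adjust them to your environment.

```yaml
---
# Sketch: run the backup playbook nightly from the control node's crontab.
- name: Schedule periodic DR backups
  hosts: localhost
  connection: local
  tasks:

    - name: Run the backup playbook every night at 02:00
      ansible.builtin.cron:
        name: "dr-backup"  # label that identifies this crontab entry
        minute: "0"
        hour: "2"
        # Hypothetical paths; vars.yml would hold ec2_instance_id, rds_instance, aws_region.
        job: "ansible-playbook /opt/dr/backup.yml -e @/opt/dr/vars.yml >> /var/log/dr-backup.log 2>&1"
```

Running this once installs the entry; re-running it is idempotent because the cron module matches on the entry name.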
    

3. Ansible Playbook for Disaster Recovery


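---
- name: Restore EC2 and RDS
  hosts: localhost
  connection: local
  tasks:

    - name: Look up the EBS snapshot to restore from
      amazon.aws.ec2_snapshot_info:
        snapshot_ids:
          - "{{ latest_snapshot_id }}"
        region: "{{ aws_region }}"
      register: ec2_snapshot_info

    # An EBS snapshot has no image_id and cannot be booted directly,
    # so an AMI backed by the snapshot is registered first.
    # Assumes the snapshot is of the root volume, attached as /dev/xvda.
    - name: Register an AMI from the snapshot
      amazon.aws.ec2_ami:
        name: "recovered-app-server-{{ latest_snapshot_id }}"
        region: "{{ aws_region }}"
        architecture: x86_64
        virtualization_type: hvm
        root_device_name: /dev/xvda
        device_mapping:
          - device_name: /dev/xvda
            snapshot_id: "{{ ec2_snapshot_info.snapshots[0].snapshot_id }}"
            delete_on_termination: true
        wait: true
      register: recovery_ami

    - name: Create new EC2 from the recovery AMI
      amazon.aws.ec2_instance:
        name: "Recovered-App-Server"
        region: "{{ aws_region }}"
        image_id: "{{ recovery_ami.image_id }}"
        instance_type: "t2.micro"
        wait: true

    - name: Restore RDS from latest snapshot
      amazon.aws.rds_instance:
        db_instance_identifier: "recovered-db-instance"
        creation_source: snapshot
        db_snapshot_identifier: "{{ latest_rds_snapshot_id }}"
        engine: mysql
        db_instance_class: "db.t3.micro"
        region: "{{ aws_region }}"
        wait: true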
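
The restore playbook expects latest_snapshot_id and latest_rds_snapshot_id to be supplied, but the text does not say where they come from. One possible approach, sketched below, is to list the existing snapshots and pick the newest of each kind. The variable app_volume_id (the EBS volume that was snapshotted) is an assumption that does not appear in the original; rds_instance and aws_region are reused from the backup playbook.

```yaml
---
# Sketch: resolve the newest EBS and RDS snapshots before running the restore.
- name: Resolve the most recent snapshots
  hosts: localhost
  connection: local
  tasks:

    - name: List completed EBS snapshots of the application volume
      amazon.aws.ec2_snapshot_info:
        region: "{{ aws_region }}"
        filters:
          volume-id: "{{ app_volume_id }}"  # assumed variable
          status: completed
      register: ebs_snaps

    - name: List manual RDS snapshots of the source database
      amazon.aws.rds_snapshot_info:
        region: "{{ aws_region }}"
        db_instance_identifier: "{{ rds_instance }}"
        snapshot_type: manual
      register: rds_snaps

    # Sorting by creation time and taking the last element yields the newest snapshot.
    # Both expressions fail if the corresponding list is empty.
    - name: Pick the newest snapshot of each kind
      ansible.builtin.set_fact:
        latest_snapshot_id: "{{ (ebs_snaps.snapshots | sort(attribute='start_time') | last).snapshot_id }}"
        latest_rds_snapshot_id: "{{ (rds_snaps.snapshots | sort(attribute='snapshot_create_time') | last).db_snapshot_identifier }}"
```

If these tasks are placed at the top of the restore play, the facts are available to the later tasks without passing extra variables on the command line.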

    

Step 4: Monitoring & Alerts

1. Configure CloudWatch Alarms:

2. Set up Ansible handlers:

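---
- name: Monitor EC2 and Trigger Recovery
  hosts: localhost
  connection: local
  tasks:

    # The CLI flag is --instance-ids. --include-all-instances also reports
    # stopped or terminated instances, which are omitted by default.
    - name: Check if EC2 instance is running
      ansible.builtin.command: >-
        aws ec2 describe-instance-status
        --instance-ids {{ ec2_instance_id }}
        --include-all-instances
        --region {{ aws_region }}
        --query "InstanceStatuses[].InstanceState.Name"
        --output text
      register: ec2_status
      changed_when: false  # read-only check

    - name: Trigger recovery if EC2 is down
      ansible.builtin.command: ansible-playbook restore.yml
      when: "'running' not in ec2_status.stdout"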
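
The CloudWatch alarms named in this step can also be created from Ansible. The sketch below assumes the amazon.aws collection (version 5.0 or later, where this module lives) and a pre-existing SNS topic whose ARN is passed in the hypothetical variable alert_sns_topic_arn. The alarm name, period, and threshold are illustrative choices, not values from the original.

```yaml
---
# Sketch: alarm on failed EC2 status checks and notify an SNS topic.
- name: Create a CloudWatch alarm for the application server
  hosts: localhost
  connection: local
  tasks:

    - name: Alarm when the EC2 status check fails
      amazon.aws.cloudwatch_metric_alarm:
        state: present
        region: "{{ aws_region }}"
        name: "app-server-status-check-failed"  # illustrative name
        description: "App server failed its EC2 status check"
        namespace: "AWS/EC2"
        metric_name: "StatusCheckFailed"
        statistic: Maximum
        comparison: GreaterThanOrEqualToThreshold
        threshold: 1.0
        period: 60              # seconds per evaluation window
        evaluation_periods: 2   # two consecutive failing windows trigger the alarm
        dimensions:
          InstanceId: "{{ ec2_instance_id }}"
        alarm_actions:
          - "{{ alert_sns_topic_arn }}"  # assumed variable: ARN of an existing SNS topic
```

StatusCheckFailed is a standard AWS/EC2 metric that reports 1 when either the system or the instance status check fails, so a Maximum of 1 or more over two periods indicates a sustained failure.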

Step 5: Testing the Disaster Recovery Plan

Scenario 1: Application Server Failure

  1. Terminate EC2 manually.
  2. Run Ansible recovery playbook.
  3. Validate application accessibility.
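
Step 3 of this scenario could be automated with a simple HTTP probe. This is a minimal sketch, assuming the application serves HTTP on port 80 at its root path and that the hypothetical variable recovered_app_host holds the new instance's address.

```yaml
---
# Sketch: poll the recovered application until it answers with HTTP 200.
- name: Validate the recovered application
  hosts: localhost
  connection: local
  tasks:

    - name: Wait until the application answers with HTTP 200
      ansible.builtin.uri:
        url: "http://{{ recovered_app_host }}/"  # assumed variable
        status_code: 200
      register: app_check
      until: app_check.status == 200
      retries: 10  # up to 10 attempts
      delay: 15    # seconds between attempts
```

The retry loop gives the instance time to boot; the play fails if the application is still unreachable after the last attempt.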

Scenario 2: Database Failure

  1. Delete RDS instance.
  2. Run Ansible recovery playbook.
  3. Validate database integrity.
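
What counts as "database integrity" depends on the application, so the following is only one possible sanity check: confirm that a table known to hold data is not empty after the restore. It assumes the community.mysql collection and PyMySQL are installed on the control node; recovered_db_endpoint, db_username, db_password, db_name, and critical_table are all hypothetical variables.

```yaml
---
# Sketch: basic post-restore sanity check against the recovered MySQL instance.
- name: Validate the recovered database
  hosts: localhost
  connection: local
  tasks:

    - name: Count rows in a table that must not be empty
      community.mysql.mysql_query:
        login_host: "{{ recovered_db_endpoint }}"
        login_user: "{{ db_username }}"
        login_password: "{{ db_password }}"
        login_db: "{{ db_name }}"
        query: "SELECT COUNT(*) AS row_count FROM {{ critical_table }}"
      register: row_check

    # query_result holds one list of rows per executed query.
    - name: Fail if the restored table is empty
      ansible.builtin.assert:
        that:
          - row_check.query_result[0][0].row_count | int > 0
        fail_msg: "Restored table is empty; the snapshot may be incomplete."
```

A stricter check would compare row counts or checksums against values recorded at backup time.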

Final Outcome

Next Steps