Name and Sector of Client:
The client is one of the largest real-estate agencies in India, with business operations across the country.
Environment Details:
- The existing infrastructure is split into two parts:
- On-Prem Environment:
- Three monolithic, multi-tier applications are hosted on the client's on-premises servers.
- These applications are for internal use; their data is private and critical to the business.
- Each on-premises web application has its own web, application, and database tier. The applications serve the following purposes:
- Vendor Management Portal: Used for financial tracking of vendors supplying construction materials, invoice management, submitting requirements, etc.
- Construction Management application: Used for project scheduling, cost tracking, site and resource management, document management, and tracking contracts assigned to contractors.
- ERP for Employees: Employee attendance, the HR portal, and related functions are all managed from this application.
- Data distribution across the on-premises databases:
- The Vendor Management database, containing vendor and purchase information, was about 2 TB.
- The Construction Management database held about 7 TB of data.
- The ERP database, containing all employee and HR details, held about 6 TB.
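The three on-premises databases add up to about 15 TB, and that full volume has to reach AWS at least once as the initial backup. The sketch below estimates that seed time under stated assumptions: decimal units, a single Site-to-Site VPN tunnel at the commonly documented 1.25 Gbps per-tunnel limit, and a hypothetical 70% efficiency factor. It ignores Veeam compression and deduplication, so real transfers would likely be shorter, and the recurring incremental backups are typically much smaller.

```python
# Sketch: estimate how long the initial full backup of the on-premises
# databases would take to upload to Amazon S3 over the Site-to-Site VPN.
# ASSUMPTIONS (not from the case study): decimal terabytes, one VPN tunnel
# at 1.25 Gbps, and a hypothetical efficiency factor for protocol overhead
# and competing traffic. Compression/deduplication are ignored.

DB_SIZES_TB = {
    "vendor_management": 2,
    "construction_management": 7,
    "erp": 6,
}


def total_tb(sizes: dict[str, float]) -> float:
    """Total data volume in terabytes."""
    return sum(sizes.values())


def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_tb terabytes over a link of link_gbps."""
    bits = size_tb * 1e12 * 8                      # TB -> bytes -> bits
    bits_per_second = link_gbps * 1e9 * efficiency
    return bits / bits_per_second / 3600           # seconds -> hours


if __name__ == "__main__":
    total = total_tb(DB_SIZES_TB)
    print(f"Total on-prem data: {total} TB")
    print(f"Ideal seed time:    {transfer_hours(total, 1.25):.1f} h")
    print(f"At 70% efficiency:  {transfer_hours(total, 1.25, 0.7):.1f} h")
```

Under these assumptions the ideal seed time is about 26.7 hours, rising to about 38.1 hours at 70% efficiency.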
- AWS Infrastructure:
- A separate set of user-facing, multi-tier, microservices-based applications is hosted on an Amazon EKS cluster in AWS. These applications are customer-centric and need to scale on demand, which is why the client chose AWS for them.
- The data in the on-premises infrastructure includes registration information, personal information of buyers and sellers along with their documents, property listings, tender invoices, architectural plans designed by the client's architects, land records, powers of attorney, etc. This data is critical to the business and had to be handled with extreme care.
- Protecting this data was also required for compliance with Indian government regulations such as PDPB, DPDP, and RERA.
- The EKS cluster ran on 20 worker nodes across 4 node groups, hosting more than 50 microservices as pods.
- The applications running on EKS serve the following purposes:
- Property listings and management: This part is controlled by the client and open to customers, who can check available properties, their locations, and current rates for both sale and rental of private and commercial properties.
- Customer feedback portal: Provides customers with a platform to submit feedback, request support, and track issue resolution.
- Interior design and modification: Customers can view designs and submit requests for these services.
- Investment growth analysis: Shows year-over-year (YoY) growth opportunities in real estate for particular locations.
- Sales inquiry: Lets customers chat with the sales team to purchase property or analyse opportunities.
- The data held by these applications was about 5 TB.
Problems Faced by the Client:
- Inadequate disaster recovery configuration:
- The on-premises environment lacked a proper disaster recovery setup; relying only on local recovery was a root cause of frequent failures and data loss.
- The existing EKS-based applications also lacked disaster recovery, which left them exposed to future disruptions.
- Business Continuity Risks:
- The critical data was stored in a single data center, making the entire operation vulnerable to significant risks.
- Any downtime or failure in this centralized system could severely disrupt business operations, potentially leading to financial losses and erosion of customer trust.
- Increased Data Security Risks:
- The centralized data storage in a single location posed heightened security risks, as a breach or failure could compromise all critical business data, leading to severe consequences for both operations and customer relations.
Solution Details:
- Solution Overview:
Teleglobal, leveraging its expertise in cloud infrastructure, designed a robust backup and disaster recovery solution for the client by integrating Amazon S3 storage with Veeam Backup & Replication for DR of the on-premises application servers. In addition to streamlining the disaster recovery process, the solution archives earlier backups to Amazon S3 Glacier for long-term storage of legacy data. Together, these measures provide disaster recovery and safeguard the client's critical business data from potential disruptions.
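The archival step amounts to age-based tiering: recent restore points stay in standard S3 for fast recovery, while older ones move to S3 Glacier. The sketch below models that decision only. The 30-day archive threshold and seven-year retention window are hypothetical values, and in Veeam this would normally be configured through the archive tier of a scale-out backup repository rather than custom code.

```python
# Sketch of age-based tiering for backup restore points.
# ASSUMPTIONS: thresholds are hypothetical; tier names are illustrative
# labels, not Veeam or AWS identifiers.
from dataclasses import dataclass
from datetime import date, timedelta

ARCHIVE_AFTER_DAYS = 30        # hypothetical: move to Glacier after 30 days
DELETE_AFTER_DAYS = 7 * 365    # hypothetical: retention limit for legacy data


@dataclass
class RestorePoint:
    name: str
    created: date


def tier_for(point: RestorePoint, today: date) -> str:
    """Return the storage tier a restore point should live in."""
    age = (today - point.created).days
    if age >= DELETE_AFTER_DAYS:
        return "expired"
    if age >= ARCHIVE_AFTER_DAYS:
        return "s3-glacier"
    return "s3-standard"


def plan(points: list[RestorePoint], today: date) -> dict[str, list[str]]:
    """Group restore point names by target tier."""
    result: dict[str, list[str]] = {"s3-standard": [], "s3-glacier": [], "expired": []}
    for p in points:
        result[tier_for(p, today)].append(p.name)
    return result


if __name__ == "__main__":
    today = date(2024, 6, 1)
    points = [
        RestorePoint("erp-weekly-full", today - timedelta(days=3)),
        RestorePoint("erp-monthly-full", today - timedelta(days=45)),
        RestorePoint("erp-legacy-full", today - timedelta(days=3000)),
    ]
    print(plan(points, today))
```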
- Networking setup in between on-premise and AWS:
- The on-premises network and the corresponding AWS VPC were connected via an AWS Site-to-Site VPN to provide reliable connectivity for backup data and application traffic.
- The existing on-premises DNS records were replaced with Amazon Route 53. During normal operation, Route 53 routes traffic to the primary on-premises infrastructure; when a disaster occurs, it routes traffic to the AWS infrastructure instead.
- Over the Site-to-Site VPN, Veeam Backup & Replication sends backups to the designated repository in Amazon S3, from which they can be restored as AMIs/EC2 instances.
- The private IP of the on-premises load balancer was registered in Route 53 so that traffic is routed to the primary on-premises servers.
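The Route 53 behaviour described above corresponds to failover routing: a PRIMARY record pointing at the on-premises load balancer and a SECONDARY record pointing at the AWS DR endpoint. The sketch below builds such a change batch in the shape accepted by the Route 53 `ChangeResourceRecordSets` API. The record name, IP addresses, TTL, and health check ID are all hypothetical. Because the primary target is a private IP, its health check would typically be based on a CloudWatch alarm, since Route 53 health checkers cannot reach private addresses directly.

```python
# Sketch: build a Route 53 ChangeBatch with a PRIMARY/SECONDARY failover pair.
# ASSUMPTIONS: all names, IPs, and the health check ID are hypothetical.


def failover_record(name, ip, role, set_id, health_check_id=None, ttl=60):
    """One UPSERT change for an A record in a failover routing pair."""
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,
        "Failover": role,                  # "PRIMARY" or "SECONDARY"
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def build_change_batch(name, onprem_lb_ip, dr_ip, health_check_id):
    """On-prem load balancer as primary, AWS DR endpoint as secondary."""
    return {
        "Comment": "On-prem primary with AWS DR secondary",
        "Changes": [
            failover_record(name, onprem_lb_ip, "PRIMARY", "onprem-primary", health_check_id),
            failover_record(name, dr_ip, "SECONDARY", "aws-dr"),
        ],
    }


def apply_change_batch(hosted_zone_id, batch):
    """Push the batch to Route 53 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    client = boto3.client("route53")
    return client.change_resource_record_sets(HostedZoneId=hosted_zone_id, ChangeBatch=batch)


if __name__ == "__main__":
    batch = build_change_batch(
        "vendors.example.internal.",  # hypothetical record name
        "10.10.1.20",                 # hypothetical on-prem LB private IP
        "10.20.1.20",                 # hypothetical DR endpoint IP in the VPC
        "hc-onprem-lb",               # hypothetical health check ID
    )
    print(batch["Changes"][0]["ResourceRecordSet"]["Failover"])
```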
- License procurement for Veeam Backup & Replication:
- To use Veeam Backup & Replication for DR, the client procured licenses from Veeam.
- Nine licenses were procured to cover the different servers running the internal applications, so that each of them can be backed up.
- On-premises setup for Disaster recovery using Veeam:
- Veeam Backup & Replication was installed on one of the machines in the on-premises infrastructure.
- A disaster recovery job was configured in Veeam. The job creates incremental backups and periodic synthetic full backups; a synthetic full is assembled from existing backup data rather than re-read from production, which avoids adding load or lag on the source servers.
- These backups are stored in the designated S3 bucket.
- If a disaster hits the on-premises site, this job is triggered to recreate equivalent infrastructure in AWS from the backups, and the infrastructure is ready within the agreed RTO and RPO of 6 hours each.
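The incremental plus synthetic-full scheme above can be illustrated with a small model: each backup is a map of block IDs to content, an incremental records only changed or deleted blocks, and a synthetic full replays the increments onto the last full without touching the source. The sketch also checks that the gap between restore points never exceeds the RPO. This is a conceptual model under stated assumptions, not Veeam's on-disk format, and the block IDs and timestamps are hypothetical.

```python
# Conceptual model of an incremental chain with synthetic fulls, plus an
# RPO check. ASSUMPTION: backups are modeled as {block_id: content}.
from datetime import datetime, timedelta

DELETED = None  # marker for a block removed since the previous restore point


def incremental(prev: dict, curr: dict) -> dict:
    """Blocks added or changed since prev, plus deletion markers."""
    changes = {k: v for k, v in curr.items() if prev.get(k) != v}
    changes.update({k: DELETED for k in prev if k not in curr})
    return changes


def synthetic_full(full: dict, increments: list[dict]) -> dict:
    """Build a new full from the last full and its increments,
    without reading the production server again."""
    merged = dict(full)
    for inc in increments:
        for k, v in inc.items():
            if v is DELETED:
                merged.pop(k, None)
            else:
                merged[k] = v
    return merged


def meets_rpo(restore_times: list[datetime], rpo: timedelta) -> bool:
    """True if no gap between consecutive restore points exceeds the RPO."""
    times = sorted(restore_times)
    return all(b - a <= rpo for a, b in zip(times, times[1:]))


if __name__ == "__main__":
    day0 = {"b1": "A", "b2": "B"}
    day1 = {"b1": "A", "b2": "B2", "b3": "C"}
    day2 = {"b1": "A1", "b3": "C"}
    chain = [incremental(day0, day1), incremental(day1, day2)]
    print(synthetic_full(day0, chain) == day2)
    t0 = datetime(2024, 1, 1)
    print(meets_rpo([t0, t0 + timedelta(hours=4), t0 + timedelta(hours=8)], timedelta(hours=6)))
```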
- EKS cluster DR setup:
- For Amazon RDS, a cross-region read replica of the primary-region instance was created in the DR region. This replica endpoint is used by the application deployment in the second (DR) region.
- Two branches, main and DR, were created, with parallel build pipelines in both regions so that both run the same application version in an active/active state.
- Route 53 sends traffic to AWS Global Accelerator, which health-checks the ALBs in both regions; when a disaster is detected in the primary region, traffic is routed to the DR region.
- On disaster, the read replica is promoted to a standalone primary so that both read and write operations can be performed.
- The agreed SLA sets an RTO and RPO of 4 hours for the EKS workloads, alongside the on-premises targets; these were met during the DR drills we performed.
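The EKS failover sequence above has two parts: Global Accelerator shifts traffic based on ALB health, and the RDS read replica in the DR region is promoted so it can accept writes. The sketch below models only the decision logic and wraps the boto3 `promote_read_replica` call for a non-Aurora RDS instance. The region names, replica identifier, and the rule of failing over only when the DR ALB is healthy are hypothetical assumptions, not details from the case study.

```python
# Sketch of the EKS DR failover decision and replica promotion.
# ASSUMPTIONS: region names, replica ID, and decision rule are hypothetical;
# a non-Aurora RDS instance is assumed.
from dataclasses import dataclass


@dataclass
class RegionHealth:
    region: str
    alb_healthy: bool


def plan_failover(primary: RegionHealth, dr: RegionHealth) -> list[str]:
    """Return the actions to take for the observed ALB health."""
    if primary.alb_healthy:
        return []  # normal operation: the primary region keeps serving traffic
    if not dr.alb_healthy:
        return ["page on-call: no healthy region"]
    return [
        f"serve traffic from {dr.region}",           # handled by Global Accelerator health checks
        f"promote RDS read replica in {dr.region}",  # makes the DR database writable
    ]


def promote_replica(replica_id: str, region: str) -> dict:
    """Promote an RDS read replica (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    rds = boto3.client("rds", region_name=region)
    return rds.promote_read_replica(DBInstanceIdentifier=replica_id)


if __name__ == "__main__":
    primary = RegionHealth("ap-south-1", alb_healthy=False)  # hypothetical primary region
    dr = RegionHealth("ap-south-2", alb_healthy=True)        # hypothetical DR region
    for action in plan_failover(primary, dr):
        print(action)
```

Keeping the decision separate from the AWS call makes the runbook logic testable during DR drills without touching production resources.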
Architecture:
- On-Premises DR Architecture
- EKS DR Setup