Introduction:
A prominent healthcare company faced the challenge of modernizing its infrastructure to meet the demands of a dynamic market while ensuring cost efficiency. Its infrastructure was a hybrid spanning the AWS cloud and on-premises environments, and the company also needed to integrate AI-driven services into its applications.
The Challenge:
- Achieving a seamless migration to the cloud environment without disrupting critical operations.
- Orchestrating the simultaneous deployment of numerous AWS services across the organization.
- Reducing substantial infrastructure costs of $1.5 million per month without compromising service quality.
- Incorporating AI services into existing applications while ensuring optimal performance and cost-efficiency.
Timelines:
Understanding the existing infrastructure was a major challenge, since the landing zones had to be created on the basis of that understanding.
- Understanding the Hybrid Infrastructure: 10-15 days (about 2 weeks)
- Migration: 4 months
- Cloud Deployment (includes Cloud + AI services): 3 months
- Cost Optimization: 2 months
The Solution:
Our team devised a comprehensive strategy to address the client’s requirements, focusing on efficient migration and the deployment of AWS services, including AI services, all under strict cost-optimization measures.
Before Migration (Hybrid Infrastructure):
On-Premises Infrastructure:
- Physical servers hosting legacy applications and databases.
- Local storage systems for data management.
- Network infrastructure including routers, switches, and firewalls.
- Backup and disaster recovery solutions on-site.
AWS Infrastructure:
- Limited usage, primarily for testing and development environments.
- No production workloads deployed on AWS.
- Some AWS services might have been provisioned for specific use cases, such as S3 for file storage or EC2 instances for testing.
- Amazon Route 53 for DNS.
- AWS load balancers routing incoming traffic to the on-premises physical servers.
After Migration to AWS:
AWS Infrastructure:
- Amazon EC2 (Elastic Compute Cloud)
- Amazon RDS (Relational Database Service)
- Amazon S3 (Simple Storage Service)
- Amazon CloudFront
- AWS Lambda
- Amazon SES (Simple Email Service)
- Amazon CloudWatch
- AWS CloudFormation
- Amazon EKS (Elastic Kubernetes Service)
- Amazon Comprehend Medical
- Amazon Transcribe
- AWS Glue
- Amazon Redshift (data warehouse)
After the migration to AWS, infrastructure costs were around $1 million per month.
Cost-Optimized Migration to the Cloud:
- Planning of the Landing Zone
The landing zone, the environment into which workloads are migrated, was planned carefully to optimize costs; a landing zone that is not configured properly can increase compliance and security expenses. The setup included integration with identity directories, design of the account structure, virtual private cloud (VPC) networking, and infrastructure for security, monitoring, and configuration management.
- Right-sizing Resources
Various instance sizes and types were evaluated.
We used monitoring tools to adjust resource allocation based on actual usage patterns.
We seamlessly migrated the company’s on-premises infrastructure to the cloud, optimizing resource utilization and minimizing downtime. Through rigorous assessment and planning, we achieved cost savings of $40-45K/month during the migration phase alone.
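The landing-zone setup described above included VPC networking. One cost-relevant part of that work is carving the VPC CIDR into per-AZ public and private subnets up front, so that per-AZ resources added later (such as NAT gateways) fit without re-addressing. The sketch below uses Python's standard `ipaddress` module; the `10.0.0.0/16` range, the `/20` subnet size, the AZ names, and the function name `plan_subnets` are hypothetical choices for illustration.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, azs: list[str], new_prefix: int = 20) -> dict[str, dict[str, str]]:
    """Assign one public and one private subnet per AZ from the VPC CIDR (hypothetical layout)."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator of equal-size subnets in address order
    plan = {}
    for az in azs:
        try:
            plan[az] = {"public": str(next(blocks)), "private": str(next(blocks))}
        except StopIteration:
            raise ValueError(f"{vpc_cidr} is too small for {len(azs)} AZs at /{new_prefix}") from None
    return plan


if __name__ == "__main__":
    for az, subnets in plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b", "us-east-1c"]).items():
        print(az, subnets)
```

Allocating subnets in address order keeps the plan deterministic, and leftover address space in the `/16` stays free for later growth.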
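The right-sizing step above adjusted resources to actual usage. A minimal sketch of one such rule, assuming CPU is the binding constraint: take the 95th percentile of observed CPU utilization, convert it to busy vCPUs, add headroom for a target utilization, and pick the smallest size that fits. The size ladder, the 70% target, and the function names are hypothetical, not the sizing rules used in this engagement.

```python
from statistics import quantiles

# Hypothetical size ladder (size name -> vCPUs). Real decisions would also weigh memory,
# network bandwidth, and price per hour.
SIZE_LADDER = [("large", 2), ("xlarge", 4), ("2xlarge", 8), ("4xlarge", 16)]


def p95(samples: list[float]) -> float:
    """95th percentile of CPU utilization samples (percent). Needs at least two samples."""
    return quantiles(samples, n=100, method="inclusive")[94]


def recommend_size(cpu_samples_pct: list[float], current_vcpus: int, target_util: float = 0.70) -> str:
    """Smallest size that keeps p95 CPU demand at or below target_util (assumed 70%)."""
    busy_vcpus = current_vcpus * p95(cpu_samples_pct) / 100.0  # vCPUs actually in use at p95
    needed_vcpus = busy_vcpus / target_util  # leave headroom for spikes
    for name, vcpus in SIZE_LADDER:
        if vcpus >= needed_vcpus:
            return name
    return SIZE_LADDER[-1][0]  # demand exceeds the ladder: cap at the largest size
```

For example, a 16-vCPU server whose CPU sits at 10% most of the time needs about 1.6 busy vCPUs, or about 2.3 with headroom, so the sketch recommends the 4-vCPU size.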
Cost Optimization of the Cloud Deployment:
- Automatically stopped test servers when they were not needed and started them again when required, using Python Lambda scripts; this saved $20-25K/month.
- Deployed the ELK stack, Grafana, and Prometheus instead of costlier monitoring systems.
- Opted for a more cost-effective approach by implementing Jenkins for our CI/CD pipeline instead of using AWS DevOps services.
This eliminated the need for manual deployments, saving an estimated 100-120 development hours per month, which translates directly to $30-40K/month in development costs.
- Used a data-driven approach, employing Prometheus to continuously gather pod-level resource metrics, including CPU, memory, and network I/O.
This granular data was then visualized in Grafana, allowing us to identify pods consistently underutilizing their allocated resources. Leveraging Horizontal Pod Autoscalers (HPAs), we implemented a dynamic scaling strategy.
HPAs automatically adjust the number of pod replicas based on pre-defined CPU or memory utilization thresholds. This ensured the worker nodes were populated with only the necessary number of pods, maximizing resource utilization while maintaining application performance.
By right-sizing worker node capacity based on the optimized pod density, we achieved a 20% reduction in EKS costs (approximately $30,000 per month) across three environments (dev, UAT, prod) comprising 4 EKS clusters, 200+ worker nodes, and 1,000+ pods running 50+ microservices.
- Implemented AWS Cost and Usage Reports (AWS CUR) and Amazon QuickSight.
- Developed a custom Lambda function utilizing the AWS SDK to identify idle Elastic Load Balancers (ELBs). This serverless function employed advanced filtering techniques, including a minimum request threshold sustained over a defined time window, to pinpoint truly underutilized resources.
- Through close collaboration with the application team, we implemented a two-pronged approach: consolidating applications onto a single, scalable ELB and using host- and path-based routing in EKS for dedicated routing needs. This combined effort yielded a 10-15% reduction in EKS costs, underscoring the value of proactive, serverless-powered cost management.
- Reviewed the networking architecture; the findings recommended that the team use a NAT gateway in each Availability Zone (AZ) to reduce inter-AZ data transfer costs.
By understanding the architecture inside and out and tailoring these cost-optimization measures to it, our team saved $90-100K per month.
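The automatic start/stop of test servers described above can be sketched as a Python Lambda function. This is a minimal sketch, not the team's actual script: it assumes test instances carry a hypothetical `Environment=test` tag and that two scheduled rules (for example, Amazon EventBridge schedules) invoke the function with `{"action": "stop"}` in the evening and `{"action": "start"}` in the morning. It uses the real boto3 EC2 calls `describe_instances`, `stop_instances`, and `start_instances`, and the function's role would need the matching IAM permissions.

```python
"""Scheduled start/stop of tagged test instances (sketch)."""

# Hypothetical tag marking instances that may be stopped outside working hours.
TAG_KEY, TAG_VALUE = "Environment", "test"


def _instance_ids(ec2, state: str) -> list[str]:
    """Collect IDs of tagged instances in the given state, following NextToken pages."""
    filters = [
        {"Name": f"tag:{TAG_KEY}", "Values": [TAG_VALUE]},
        {"Name": "instance-state-name", "Values": [state]},
    ]
    ids, token = [], None
    while True:
        kwargs = {"Filters": filters}
        if token:
            kwargs["NextToken"] = token
        resp = ec2.describe_instances(**kwargs)
        for reservation in resp.get("Reservations", []):
            ids.extend(inst["InstanceId"] for inst in reservation.get("Instances", []))
        token = resp.get("NextToken")
        if not token:
            return ids


def apply_schedule(ec2, action: str) -> list[str]:
    """Stop running test instances ('stop') or start stopped ones ('start')."""
    if action == "stop":
        ids = _instance_ids(ec2, "running")
        if ids:  # the API rejects an empty InstanceIds list
            ec2.stop_instances(InstanceIds=ids)
    elif action == "start":
        ids = _instance_ids(ec2, "stopped")
        if ids:
            ec2.start_instances(InstanceIds=ids)
    else:
        raise ValueError(f"unknown action: {action!r}")
    return ids


def lambda_handler(event, context):
    import boto3  # imported lazily so the logic above can be exercised with a stub client

    ec2 = boto3.client("ec2")
    action = event.get("action", "stop")
    return {"action": action, "instance_ids": apply_schedule(ec2, action)}
```

Passing the EC2 client into `apply_schedule` keeps the scheduling logic testable without AWS credentials.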
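The Horizontal Pod Autoscaler behavior described above follows the rule documented for Kubernetes: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default) of 1.0, and the result clamped to the configured minimum and maximum. The sketch below reproduces only that core rule; the real controller also handles missing metrics, pod readiness, and stabilization windows. The function name and default bounds are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_util: float, target_util: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Core HPA rule: ceil(current * current/target), skipped within tolerance, clamped to bounds."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no scaling
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

With a 60% CPU target, four pods at 90% scale out to six, while ten pods at 15% scale in to three; fewer, fuller pods are what allowed the worker nodes to be right-sized.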
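AWS CUR, mentioned above, delivers line-item billing data to S3, and QuickSight visualizes it. As a sketch of the kind of aggregation such a dashboard performs, the function below totals unblended cost per service from a small CSV extract. It assumes the legacy CUR column names `lineItem/ProductCode` and `lineItem/UnblendedCost`; reading real reports would also require fetching the files from S3, which is omitted here.

```python
import csv
import io
from collections import defaultdict


def cost_by_service(cur_csv: str, top_n: int = 5) -> list[tuple[str, float]]:
    """Sum unblended cost per product code from a CUR CSV extract, largest first."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(cur_csv)):
        totals[row["lineItem/ProductCode"]] += float(row["lineItem/UnblendedCost"] or 0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    sample = (
        "lineItem/ProductCode,lineItem/UnblendedCost\n"
        "AmazonEC2,10.5\n"
        "AmazonS3,2.0\n"
        "AmazonEC2,4.5\n"
    )
    print(cost_by_service(sample))
```

Ranking services by spend is usually the first view that points to where right-sizing and scheduling effort pays off.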
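The idle-ELB Lambda described above applied a minimum request threshold sustained over a time window. A minimal sketch of that filter, assuming Application Load Balancers (CloudWatch namespace `AWS/ApplicationELB`, metric `RequestCount`, dimension `LoadBalancer`); Classic Load Balancers would use `AWS/ELB` with the `LoadBalancerName` dimension instead. The 100-requests-per-day threshold, the 14-day window, and the function names are hypothetical; listing the load balancers themselves (for example, with the `elbv2` client) is left out.

```python
from datetime import datetime, timedelta, timezone


def fetch_daily_requests(cloudwatch, lb_dimension: str, days: int) -> list[float]:
    """Daily RequestCount sums for one ALB (dimension value like 'app/name/id'), oldest first."""
    end = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/ApplicationELB",
        MetricName="RequestCount",
        Dimensions=[{"Name": "LoadBalancer", "Value": lb_dimension}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=86400,  # one datapoint per day
        Statistics=["Sum"],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return [p["Sum"] for p in points]


def is_idle(daily_requests: list[float], min_daily_requests: float, window_days: int) -> bool:
    """Idle only if every day in the window stayed below the threshold.

    Days without traffic have no datapoint, so a missing day counts as zero requests.
    """
    window = daily_requests[-window_days:]
    return all(count < min_daily_requests for count in window)


def find_idle_load_balancers(cloudwatch, lb_dimensions: list[str],
                             min_daily_requests: float = 100, window_days: int = 14) -> list[str]:
    """Return the load balancers that stayed under the threshold for the whole window."""
    return [
        lb for lb in lb_dimensions
        if is_idle(fetch_daily_requests(cloudwatch, lb, window_days), min_daily_requests, window_days)
    ]
```

Requiring the threshold to hold on every day, rather than on average, keeps a load balancer with occasional but real bursts of traffic off the list.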
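The NAT-gateway-per-AZ recommendation above is a trade-off: each extra gateway adds an hourly charge, but traffic no longer crosses AZs to reach a gateway in another zone. The arithmetic below sketches the break-even. The rates are assumptions to be checked against current AWS pricing for the region in use: about $0.02/GB for traffic that crosses AZs (roughly $0.01/GB charged on each side) and $0.045 per NAT gateway hour. NAT per-GB processing charges apply in both designs, so they cancel out of the comparison.

```python
def per_az_nat_savings(cross_az_gb_per_month: float, extra_nat_gateways: int,
                       cross_az_rate_per_gb: float = 0.02, nat_hourly_rate: float = 0.045,
                       hours_per_month: float = 730.0) -> float:
    """Monthly net saving (USD) from adding NAT gateways so egress traffic stays in its own AZ.

    Both rates are assumed example prices, not quoted AWS prices.
    """
    avoided_transfer = cross_az_gb_per_month * cross_az_rate_per_gb
    added_gateway_cost = extra_nat_gateways * nat_hourly_rate * hours_per_month
    return avoided_transfer - added_gateway_cost


if __name__ == "__main__":
    # Hypothetical: 100 TB/month currently crossing AZs, two extra gateways needed.
    print(round(per_az_nat_savings(100_000, 2), 2))
```

Under these assumed rates, the change pays off once cross-AZ traffic exceeds a few thousand GB per month; below that, the extra gateways cost more than they save.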
Cost Optimization of AI Services:
- Chose the appropriate instance size and type based on workload requirements.
- Monitored query performance using EXPLAIN or execution plans. Identified bottlenecks like joins or scans, then optimized queries by filtering data upfront, using indexes, or leveraging materialized views.
- Profiled jobs to identify resource bottlenecks, and analyzed data size and processing complexity to determine the optimal DPU allocation. Fine-tuned scripts by using efficient data structures, optimizing filtering and transformations, and strategically employing partitioning, bucketing, and broadcast joins. Continuously monitored job metrics such as CPU, memory, and execution time to refine DPU allocation and script optimizations for sustained performance improvement.
- Batched multiple transcription requests into a single Amazon Transcribe job to reduce API call overhead and optimize costs.
- Implemented lifecycle policies to automatically transition infrequently accessed data to lower-cost storage classes like S3 Glacier or S3 Glacier Deep Archive.
- Utilized S3 Intelligent-Tiering to automatically optimize storage costs by moving objects between frequent access and infrequent access tiers based on access patterns.
- Integrated data compression into the data processing pipeline and stored data in the Parquet format, which significantly reduced the storage footprint and costs.
Through Parquet’s columnar storage design and efficient compression algorithms, we achieved storage reductions averaging around 40-50% compared to uncompressed data storage formats. This substantial decrease in storage footprint directly translated to cost savings.
By implementing these cost-optimization strategies, we maximized the value of the AI services while reducing operational expenses by $20-25K per month.
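The query-plan review described above can be partly automated. Assuming the queries run on Amazon Redshift (part of the post-migration stack), the output of `EXPLAIN <query>` labels each join with a data-redistribution step, and labels such as `DS_BCAST_INNER` or `DS_DIST_BOTH` usually signal costly data movement between nodes. The sketch below scans plan text for those labels; the function name and the short explanations are ours, and obtaining the plan text (by running `EXPLAIN` through a database driver) is left out.

```python
import re

# Redshift join redistribution labels that usually signal costly data movement between nodes.
COSTLY_STEPS = {
    "DS_BCAST_INNER": "inner table broadcast to every node",
    "DS_DIST_ALL_INNER": "inner table redistributed to a single slice",
    "DS_DIST_BOTH": "both tables redistributed",
    "DS_DIST_INNER": "inner table redistributed",
    "DS_DIST_OUTER": "outer table redistributed",
}


def flag_plan(plan_text: str) -> list[tuple[str, str]]:
    """Return (label, explanation) for each costly redistribution step, in plan order."""
    findings = []
    for line in plan_text.splitlines():
        # Whole labels only, so DS_DIST_ALL_NONE is not mistaken for a costly step.
        for label in re.findall(r"\bDS_[A-Z_]+\b", line):
            if label in COSTLY_STEPS:
                findings.append((label, COSTLY_STEPS[label]))
    return findings
```

A flagged broadcast or redistribution step typically points to distribution keys that do not match the join columns, which is where filtering data earlier or adjusting table design helps most.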
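The DPU tuning described above comes down to comparing profiled runs. A minimal sketch, assuming AWS Glue bills DPU-hours per second with a one-minute minimum per run and an example rate of $0.44 per DPU-hour (check current pricing for the region and Glue version in use): compute the cost of each profiled allocation and pick the cheapest one that still meets a runtime limit. The function names and the sample runtimes are hypothetical.

```python
from typing import Optional


def glue_job_cost(dpus: float, runtime_seconds: float, rate_per_dpu_hour: float = 0.44,
                  min_billed_seconds: float = 60.0) -> float:
    """Cost of one run: DPUs x billed hours x rate (per-second billing, assumed 1-minute minimum)."""
    billed_seconds = max(runtime_seconds, min_billed_seconds)
    return dpus * (billed_seconds / 3600.0) * rate_per_dpu_hour


def cheapest_allocation(profiled_runs: dict[float, float],
                        max_runtime_seconds: Optional[float] = None) -> tuple[float, float]:
    """profiled_runs maps DPU count -> measured runtime (s); return (dpus, cost) of the cheapest valid run."""
    candidates = [
        (dpus, glue_job_cost(dpus, runtime))
        for dpus, runtime in profiled_runs.items()
        if max_runtime_seconds is None or runtime <= max_runtime_seconds
    ]
    if not candidates:
        raise ValueError("no allocation meets the runtime limit")
    return min(candidates, key=lambda dc: dc[1])
```

Because runtime rarely shrinks in proportion to added DPUs, the cheapest allocation is often smaller than the fastest one; the runtime limit keeps the choice within the job's schedule.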
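One reason batching short transcription requests can lower cost is a per-request minimum charge. A minimal sketch, assuming Amazon Transcribe bills per second with a 15-second minimum per request (verify against current pricing), and assuming short clips are concatenated into one audio file per job beforehand, since each transcription job takes a single media file. The greedy packing rule, the 60-second batch limit, and the function names are hypothetical.

```python
def billed_seconds(durations: list[float], minimum: float = 15.0) -> float:
    """Total billed audio seconds when each duration is a separate request with a per-request minimum."""
    return sum(max(d, minimum) for d in durations)


def pack_clips(durations: list[float], max_batch_seconds: float) -> list[list[float]]:
    """Greedily group consecutive clips so each combined job stays within max_batch_seconds."""
    batches: list[list[float]] = []
    current: list[float] = []
    total = 0.0
    for d in durations:
        if current and total + d > max_batch_seconds:
            batches.append(current)
            current, total = [], 0.0
        current.append(d)
        total += d
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    clips = [5.0] * 12  # twelve hypothetical 5-second voice notes
    jobs = [sum(batch) for batch in pack_clips(clips, 60.0)]
    print(billed_seconds(clips), "->", billed_seconds(jobs))
```

Keeping clips in their original order makes it straightforward to split the combined transcript back into per-clip results using the recorded durations.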
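The S3 lifecycle policies and Intelligent-Tiering described above are configured as lifecycle rules. The sketch below builds a configuration in the shape expected by boto3's `put_bucket_lifecycle_configuration`, using the real storage-class names `GLACIER`, `DEEP_ARCHIVE`, and `INTELLIGENT_TIERING`. The prefixes, the 90- and 365-day thresholds, the rule IDs, and the bucket name in the usage comment are hypothetical.

```python
from typing import Optional


def lifecycle_configuration(archive_prefix: str, glacier_after_days: int = 90,
                            deep_archive_after_days: int = 365,
                            intelligent_tiering_prefix: Optional[str] = None) -> dict:
    """Build a LifecycleConfiguration dict for put_bucket_lifecycle_configuration (sketch)."""
    if deep_archive_after_days <= glacier_after_days:
        raise ValueError("Deep Archive transition must come after the Glacier transition")
    rules = [
        {
            "ID": "archive-infrequent-data",
            "Filter": {"Prefix": archive_prefix},
            "Status": "Enabled",
            "Transitions": [
                {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                {"Days": deep_archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
    if intelligent_tiering_prefix is not None:
        # Objects with unpredictable access move to Intelligent-Tiering right away.
        rules.append({
            "ID": "intelligent-tiering",
            "Filter": {"Prefix": intelligent_tiering_prefix},
            "Status": "Enabled",
            "Transitions": [{"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}],
        })
    return {"Rules": rules}


# Usage (requires AWS credentials; the bucket name is hypothetical):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket",
#     LifecycleConfiguration=lifecycle_configuration("archive/", intelligent_tiering_prefix="data/"),
# )
```

Splitting data by prefix lets predictable, aging data follow fixed archive transitions while data with unknown access patterns is left to Intelligent-Tiering.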
Outcomes:
Through our expertise, the healthcare company successfully transitioned to a cloud-native infrastructure, realizing substantial cost savings and operational efficiencies.
After all of the above strategies and cost-optimization techniques were applied, the client’s monthly infrastructure cost fell from around $1 million to roughly $800K.
The optimized deployment of AWS services enabled seamless scalability and enhanced performance, while the integration of AI services empowered them to deliver innovative healthcare solutions, positioning them as a leader in the digital healthcare landscape.
Conclusion:
Our expertise in cloud services, coupled with an intense focus on cost optimization and AWS AI service implementation, enabled our healthcare client to navigate the complexities of digital transformation with confidence while saving up to 20% of total costs ($150K-200K per month). By leveraging the power of the cloud and AI, we empowered the client to enhance service delivery, drive innovation, and achieve sustainable growth in an ever-evolving healthcare ecosystem.