Introduction:
This company is the leading electronic payment services provider in Iraq. It issues and acquires payment cards, including international cards, on behalf of multiple affiliated banks, including government banks within the country. Its solution uses biometric identification to verify cardholders. Its widely used system distributes pensions, social welfare aid, and employee salaries to millions of customers through a network of 6,000 service points nationwide.
The Challenge:
- Securely implementing an on-premises production environment that adheres to PCI-DSS security and compliance regulations.
- Orchestrating the simultaneous deployment of numerous AWS services across dev/test environments in the organizational accounts.
- Developing a robust pipeline system of more than 35 AWS CodePipeline pipelines to deliver the required application images.
- Creating landing zones to manage security, monitoring, and related concerns centrally.
- Optimizing infrastructure costs while maintaining service quality and implementing cost-saving measures when feasible.
- Absorbing timeline extensions caused by new application deployments arriving while other deployments were still in progress.
Timelines:
Understanding the infrastructure was a significant challenge because the landing zones had yet to be created.
After comprehensive discussions between Teleglobal and the stakeholders, it was agreed that the dev and test environments would be deployed on AWS, while the pre-production and production environments would be deployed on-premises.
Understanding the infrastructure: 10-15 days (about 2 weeks)
Cloud deployment (dev + test): 4 months
On-premises deployment (pre-prod + prod): 6 months
Cost Optimization: 2 months
The Solution:
Our team devised a comprehensive strategy to address the client’s requirements, focusing on efficient landing zone creation and AWS service deployments, including a large number of pipelines, all under strict cost optimization measures.
The plan was to deploy the dev/test environments on AWS and the pre-production and production workloads on-premises, in order to achieve both cost savings and PCI-DSS compliance.
On-Premises Infrastructure (Pre-prod and Production):
The infrastructure for the on-premises production workload was designed from scratch.
The architecture our team devised includes the following components:
- A highly available Kubernetes setup was implemented to host containerized applications.
- It consists of 3 master (control plane) nodes and 3 worker (data plane) nodes, each sized to handle the load of the applications hosted on Kubernetes.
- Traefik was implemented as the ingress controller to watch for changes to the Ingress resources.
- Flannel was used for pod-to-pod networking, and containerd was used as the container runtime for the on-prem Kubernetes environment.
- NFS (Network File System) was used as persistent volume storage for the applications and Kubernetes components that require data persistence, such as etcd.
- ECR was used as the container registry, connected to Kubernetes and storing the Docker images used to start containers.
- PostgreSQL databases were set up in various configurations according to the needs of the applications.
- A NAT gateway was used to provide secure internet access to resources running inside private networks.
- A VPN was used to access the on-prem private resources securely.
- A DNS server was implemented to provide name resolution.
- A highly available MetalLB load balancer was used to balance traffic between the various applications.
- A highly available and durable 3-node RabbitMQ cluster was deployed to enable communication between applications through message queues.
- Dynatrace was used to monitor the Kubernetes system and the RabbitMQ cluster, along with other infrastructure such as the databases, MetalLB server, and DNS servers.
- Dynatrace was also used as the logging solution, to view and fetch logs from the various infrastructure components for troubleshooting and faster root cause identification.
- Dynatrace was also used for alerting, sending prompt notifications through channels such as email, Slack, and JIRA when events occur.
- These features were enabled by deploying Dynatrace extensions and/or OneAgent to gain full visibility into the infrastructure resources.
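The choice of 3 control plane nodes can be motivated with a little majority arithmetic. The sketch below assumes that etcd runs on the control plane nodes themselves and that every node is a voting member; the case study does not state the etcd topology, so treat this as an illustration rather than a description of the actual setup. The function names are hypothetical.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Number of members that can be lost while a majority remains."""
    return members - quorum_size(members)


if __name__ == "__main__":
    # Compare a few cluster sizes; 3 is the size named in the list above.
    for n in (1, 2, 3, 4, 5):
        print(f"{n} members: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under these assumptions, a 3-member cluster keeps a majority after losing one node, while a 2-member cluster tolerates none, which is why odd sizes of 3 or more are the usual starting point for a highly available control plane.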
Creation of Landing Zones:
- Landing Zones were created using AWS Control Tower.
- The structure was carefully designed by our team in line with industry standards and best practices.
- Various OUs (organizational units) were created, using various AWS services, to manage concerns such as security, logging, and monitoring centrally.
AWS Infrastructure (Dev/Test):
Below are some of the resources our team deployed to run the dev/test workloads:
- Usage is limited; primarily development and testing environments are present here.
- No production workloads are deployed on AWS.
- Some AWS services were provisioned for specific use cases, such as S3 for file storage and for the YAML files that CodeBuild/CodeDeploy need to function.
- EC2 instances for the development and testing of applications in the early phases of development.
- Route 53 for DNS records, plus VPCs and route tables for network routing.
- AWS CodeCommit for SCM (source code management).
- Deployment of applications into EKS clusters through AWS CodePipeline.
- AWS load balancers to bring traffic into the EKS clusters.
- Creation of tasks and services for the EKS clusters.
- AWS Lambda for automation and the implementation of various other processes.
By following the hybrid strategy that Teleglobal devised, this financial giant avoided overspending on infrastructure by more than $2.5-3 million per month.
Cost-Optimized Deployment to the AWS Cloud:
- Planning of the Landing Zone
The landing zones where the dev/test workloads were deployed were planned carefully to optimize costs; a landing zone that is not configured properly can lead to higher compliance and security expenses. The setup included integration with identity directories, formulation of account structures, establishment of virtual private cloud (VPC) networking, and establishment of infrastructure for the security, monitoring, and configuration management OUs.
- Appropriately sized Resources
Various resource sizes and types were considered.
We used monitoring tools to adjust resource allocation based on actual usage patterns.
- Implemented lifecycle policies to automatically transition infrequently accessed data to lower-cost storage classes such as S3 Glacier or S3 Glacier Deep Archive.
- Utilized S3 Intelligent-Tiering to automatically optimize storage costs by moving objects between frequent access and infrequent access tiers based on access patterns.
- Automatically started and stopped test servers so that they did not run when not needed, along with many other tweaks, using Python Lambda scripts; this helped save $150-200K/month.
- Deployed monitoring through Container Insights and similar tools to avoid the expense of costlier monitoring systems.
- Eliminated manual deployments by using AWS CodePipeline, saving an estimated 800-1,000 development hours per month. This translates directly to approximately $100K/month saved in development costs.
- Used a data-driven approach, employing Amazon CloudWatch Container Insights to continuously gather container-level resource metrics, including CPU, memory, and network I/O. This granular data was then visualized through CloudWatch dashboards, allowing us to identify containers that consistently underutilized their allocated resources. By right-sizing EKS worker node capacity based on the optimized container density, we achieved a 15-20% reduction in EKS costs (approximately $30,000 per month) across two environments (dev, test) with 35+ EKS clusters, 200+ EKS worker nodes, and 2000+ containers running 35+ microservices.
- For deeper cost insights, AWS Cost and Usage Reports were implemented, providing granular details on resource usage and costs. This data was then visualized and analyzed in Amazon QuickSight, allowing us to identify cost trends, optimize spending, and make data-driven decisions for a more cost-effective AWS environment.
- Developed a custom Lambda function utilizing the AWS SDK to identify idle Elastic Load Balancers (ELBs). This serverless function employed advanced filtering techniques, including a minimum request threshold sustained over a defined time window, to pinpoint truly underutilized resources.
- Through close collaboration with the application team, we implemented a two-pronged approach: consolidating applications onto a single, scalable ALB and utilizing host/path-based routing for dedicated routing needs. This combined effort yielded a noteworthy 10-15% reduction in EKS costs, underscoring the effectiveness of proactive and serverless-powered cost management within the infrastructure.
- Reviewed the networking architecture; based on the findings, we recommended a NAT gateway in each AZ to reduce inter-AZ data transfer costs.
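The idle load balancer check in the list above can be sketched as follows. This is a minimal illustration and not the team's actual Lambda function: it assumes Application Load Balancers, the CloudWatch `RequestCount` metric, a 14-day window, and a threshold of 100 requests per day, none of which are stated in the case study. The function names and default values are hypothetical. The CloudWatch client is passed in as a parameter so that the filtering logic can be exercised without AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def is_idle(datapoints, min_requests_per_period):
    """True when every period in the window stayed below the threshold.

    CloudWatch omits periods that have no data, so an empty list is
    treated as "no traffic at all" and counts as idle.
    """
    return all(dp["Sum"] < min_requests_per_period for dp in datapoints)


def find_idle_albs(cloudwatch, lb_dimensions, window_days=14,
                   period_seconds=86400, min_requests_per_period=100):
    """Return the load balancers whose traffic stayed below the threshold.

    `lb_dimensions` holds CloudWatch "LoadBalancer" dimension values,
    which look like "app/<name>/<id>".
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=window_days)
    idle = []
    for lb in lb_dimensions:
        resp = cloudwatch.get_metric_statistics(
            Namespace="AWS/ApplicationELB",
            MetricName="RequestCount",
            Dimensions=[{"Name": "LoadBalancer", "Value": lb}],
            StartTime=start,
            EndTime=end,
            Period=period_seconds,
            Statistics=["Sum"],
        )
        if is_idle(resp["Datapoints"], min_requests_per_period):
            idle.append(lb)
    return idle
```

Inside a Lambda function, the client would typically be created with `boto3.client("cloudwatch")` and the result reported to the team for review rather than deleted automatically, since a quiet load balancer may still serve a rarely used but required endpoint.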
We seamlessly deployed the company’s dev/test infrastructure to the cloud, optimizing resource utilization and minimizing downtime.
By understanding the architecture inside and out and devising these tailored cost optimization measures, our team helped save $400-500K/month.
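The automatic stopping of test servers mentioned above can be sketched as a small Python Lambda function. This is an illustration under assumptions, not the scripts the team actually ran: it assumes test instances carry a hypothetical `Environment=test` tag and that the function is triggered on a schedule (for example by an Amazon EventBridge rule) at the end of the working day. The EC2 client is passed into the helper functions so the selection logic can be tried without AWS access.

```python
def running_instance_ids(ec2, tag_key="Environment", tag_value="test"):
    """List running instances that carry the given tag.

    A single describe_instances call is used for brevity; a large fleet
    would need a paginator to read every page of results.
    """
    resp = ec2.describe_instances(
        Filters=[
            {"Name": f"tag:{tag_key}", "Values": [tag_value]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    return [
        instance["InstanceId"]
        for reservation in resp["Reservations"]
        for instance in reservation["Instances"]
    ]


def stop_test_instances(ec2, tag_key="Environment", tag_value="test"):
    """Stop the matching instances and return their IDs."""
    ids = running_instance_ids(ec2, tag_key, tag_value)
    if ids:  # stop_instances rejects an empty ID list
        ec2.stop_instances(InstanceIds=ids)
    return ids


def lambda_handler(event, context):
    """Lambda entry point; assumed to be invoked by a scheduled rule."""
    import boto3  # provided by the AWS Lambda Python runtime

    return {"stopped": stop_test_instances(boto3.client("ec2"))}
```

A matching start function would filter on the `stopped` state and call `start_instances` instead, triggered by a second schedule at the start of the working day.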
Outcomes:
Through our expertise, the leading finance company successfully transitioned to a cloud-native infrastructure, realizing substantial cost savings and operational efficiencies.
The client was able to deploy cloud infrastructure worth $2 million on AWS and infrastructure worth $3 million on-premises.
After applying all of the above strategies and cost optimization techniques, their $2 million AWS infrastructure cost came down to roughly $1.5 million per month.
Conclusion:
Our DevOps expertise, with a focus on cost optimization, empowered our Finance client to confidently navigate their digital transformation. We leveraged AWS cloud services to achieve significant cost savings, ranging from $400,000 to $450,000 per month, representing a 20-22% reduction in their overall cloud spend. This approach, combining DevOps agility with cost control, not only optimized their cloud infrastructure but also provided the foundation for enhanced service delivery and innovation. By embracing the power of the cloud and DevOps, we positioned our client for sustainable growth within the dynamic Finance landscape of Iraq.