Introduction:
This company is the leading electronic payment services provider in Iraq. It issues and acquires payment cards, including international cards, on behalf of multiple affiliated banks, among them government banks within the country. The company’s solution uses biometric identification to verify cardholders. Its widely used system distributes pensions, social welfare aid, and employee salaries to millions of customers through a network of 6,000 service points nationwide.
The Challenge:
- Securely implementing an on-premises production environment that adheres to PCI-DSS security and compliance regulations.
- Orchestrating the simultaneous deployment of numerous AWS services across dev/test environments in the organizational accounts.
- Developing a robust pipeline system of more than 35 AWS CodePipeline pipelines to deliver the required application images.
- Creating landing zones to manage security, monitoring, and related concerns centrally.
- Optimizing infrastructure costs while maintaining service quality and implementing cost-saving measures when feasible.
- Timeline extensions caused by new application deployments arriving while other deployments were still in progress.
Timelines:
Understanding the infrastructure was a major challenge because the landing zones had yet to be created.
After comprehensive discussions between Teleglobal and the stakeholders, it was decided that the dev and test environments would be deployed on AWS, while the pre-production and production environments would be deployed on-premises.
Understanding the Infra: 10-15 days (about 2 weeks)
Cloud Deployment (dev + test): 4 months
On-Premises Deployment (pre-prod + Prod): 6 months
Cost Optimization: 2 months
The Solution:
Our team devised a comprehensive strategy to address the client’s requirements, focusing on efficient landing zone creation and AWS service deployments, including a large number of pipelines, all under strict cost optimization measures.
The plan was to deploy the dev/test environments on the AWS cloud and the pre-production and production workloads on-premises, to achieve both cost savings and adherence to PCI-DSS requirements.
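The environment split described above can be written down as a small lookup. This is only an illustrative sketch: the environment keys (`dev`, `test`, `pre-prod`, `prod`) and the names `Platform`, `PLACEMENT`, and `target_platform` are hypothetical and do not come from the client's tooling.

```python
from enum import Enum


class Platform(Enum):
    AWS = "aws"
    ON_PREM = "on-premises"


# Environment-to-platform split as described in the text.
# The keys are illustrative spellings, not the client's actual names.
PLACEMENT = {
    "dev": Platform.AWS,
    "test": Platform.AWS,
    "pre-prod": Platform.ON_PREM,
    "prod": Platform.ON_PREM,
}


def target_platform(environment: str) -> Platform:
    """Return the platform an environment is deployed to."""
    key = environment.strip().lower()
    if key not in PLACEMENT:
        raise ValueError(f"unknown environment: {environment!r}")
    return PLACEMENT[key]


if __name__ == "__main__":
    for env in PLACEMENT:
        print(f"{env:>8} -> {target_platform(env).value}")
```

Keeping the rule in one table makes it easy to reject an unknown environment name instead of silently deploying it to the wrong side of the PCI-DSS boundary.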
On-Premises Infrastructure (Pre-prod and Production):
The infrastructure was designed from scratch to host the production workload on-premises.
Some of the components of the architecture our team devised:
- A highly available Kubernetes setup was implemented to host containerized applications.
- It consists of 3 control plane (master) nodes and 3 worker (data plane) nodes, each sized for the load of the applications hosted on Kubernetes.
- Traefik was implemented as the ingress controller; it detects changes to the Ingress definitions.
- Flannel was used for pod-to-pod networking, and containerd was used as the container runtime for the on-prem Kubernetes environment.
- NFS (Network File System) was used as persistent volume storage for the applications and Kubernetes components that require data persistence, such as etcd.
- ECR was used as the container registry, connected to Kubernetes to store the Docker images used to start containers.
- PostgreSQL databases were set up in various configurations according to the needs of each application.
- A NAT gateway was used to provide secure internet access to resources running inside private networks.
- A VPN was used to access the on-prem private resources securely.
- A DNS server was implemented to provide name resolution and routing services.
- A highly available MetalLB load balancer was used to balance traffic between the various applications.
- A highly available and durable 3-node RabbitMQ cluster was deployed to enable communication between applications through message queues.
- Dynatrace was used to monitor the Kubernetes system and the RabbitMQ cluster, along with other infrastructure such as the databases, MetalLB, and DNS servers.
- Dynatrace was also used as the logging solution, to view and fetch logs from the various infrastructure components for troubleshooting and faster root cause identification.
- Dynatrace was also used for alerting, sending prompt notifications through channels such as email, Slack, and Jira when events occur.
- These features were enabled by deploying Dynatrace extensions and/or OneAgent to gain full visibility of the infrastructure resources.
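The three-node counts in the list above fit majority-based clustering, where a cluster stays available only while more than half of its voting members are up. The sketch below shows that arithmetic. It assumes etcd runs on the three control plane nodes and that the RabbitMQ cluster uses majority-based (quorum) queues; the source does not state either detail, and the function names are hypothetical.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority remains."""
    return members - quorum(members)


if __name__ == "__main__":
    # Compare a few cluster sizes; 3 is the size used in the setup above.
    for n in (1, 2, 3, 4, 5):
        print(f"members={n} quorum={quorum(n)} tolerated={tolerated_failures(n)}")
```

Under these assumptions a 3-member cluster needs 2 members for a majority and survives the loss of 1, while a 4th member would raise the majority to 3 without tolerating any additional failure, which is why odd sizes are the usual choice.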
Creation of Landing Zones:
- Landing Zones were created using AWS Control Tower.
- The structure was carefully designed by our team in line with industry standards and best practices.
- Various OUs (Organizational Units) were created using various AWS services to manage concerns such as security, logging, and monitoring centrally.
AWS Infrastructure (Dev/Test):
The following describes the AWS footprint and some of the resources our team deployed for dev/test workloads:
- AWS usage is limited, primarily to development and testing environments.
- No production workloads are deployed on AWS.
- Some AWS services were provisioned for specific use cases, such as S3 for file storage, including the YAML files that CodeBuild and CodeDeploy need to function.
- EC2 instances for developing and testing applications that are in their initial phases of development.
- Route 53 for storing DNS records, along with VPCs and route tables for network routing.
- AWS CodeCommit for SCM (source code management).
- Deployment of applications into EKS clusters through code pipelines.
- AWS load balancers to bring traffic into the EKS clusters.
- Creation of the tasks and services the EKS clusters need to function.
- AWS Lambda for automation and for implementing various other processes.