Atabak - Cloud Migration in a simple way

This page outlines the challenges, the architecture, and the proposed target state for a cloud migration, with the aim of building truly customer-centric businesses on the cloud. It covers preparation, focus, transformation and a consulting roadmap, along with the cloud-based services involved. The goal is to support hands-on thinking that solves business problems, and to enable the migration team to rapidly plan, design, architect, develop and scale projects and resources, including security and regulatory compliance, in a way that delivers immediate impact and sustainable growth.

The intended audience is cross-functional executive teams. The architecture work brings together IT operations and leadership, infrastructure architects, information security architects and all business decision makers who face challenges at different points of the cloud transformation journey.

Objectives

Migrate an application to the AWS Cloud
Put RTO and RPO targets in place
Integrate with the legacy SSO
Migrate the database
Reduce cost
Apply security, compliance and regulatory requirements

Cloud Migration

The first phase of cloud migration starts with an investigation of the existing platform across the on-premises environments, and of the cloud services required for the migration. Proper planning and strategy come next on the roadmap, both at a high level and in detail. Workload migration and implementation in an agile way follow, and the final step is to continuously monitor, optimize and operate the infrastructure.

The planning phase of the migration is therefore broken down further into 6 steps, and the timeline is prepared around those steps.

Moreover, the journey of this migration is divided into more detailed stages, as follows:

As stated earlier, there are 4 main migration stages within the approach. Continuous improvement and development stay in place until the target is properly achieved (CI/CD).

The key steps of the cloud migration process are:

System Requirement

For a FinTech application such as a bank, a digital bank or a payment gateway, the following points are critically important:

Security
Performance
SLA
RTO & RPO
Cost Effective
Fault tolerance and HA

Based on experience with FinTech and IoT projects, banking systems and payment gateways are subject to many compliance and regulatory requirements. To make sure all regulations are met and all certifications are satisfied, it is not enough to check the items one by one; a third-party tool that scans the entire system is also required. The cloud migration should therefore focus on cloud security, a high-performance architecture, and RTO/RPO targets, so that the system can quickly handle load, traffic, failover and ad hoc requests.

Cloud Security and Compliance

To improve the security of the system and reduce vulnerabilities, a wide range of items has to be considered in every corner of the system and the cloud:

Reduce attack surface: security groups, Direct Connect, jump host and VPN, WAF, anti-DDoS, etc.
Reduce visibility and tracking: CloudFront, API Gateway mapping, KMS, etc.
DevSecOps and automation: CI/CD, infra as code, change management, code review, etc.
Granular privilege and key management: IAM, KMS, CloudTrail, etc.
Compliance and governance: PCI DSS compliance (PCI DSS 3.2), SSL/TLS handshake, GDPR, etc.

The entire environment should sit inside a VPC, with its security handled by the security groups assigned to it. Connections into that VPC are made through a VPN and a Direct Connect link between the on-premises and cloud environments. DNS and APIs should be fronted by CloudFront and API Gateway. Infra as code should be used widely so that every single change is manageable and reviewed by the compliance team before deployment. SSL/TLS is used across API calls at the application layer. KMS is used to encrypt user and credit card tokens; it allows data to be encrypted within the application using keys that you create and control.
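The "reduce attack surface" item can be checked mechanically. The sketch below is a minimal illustration in plain Python and does not call any AWS API: the `IngressRule` type, the sample rules and the policy (only port 443 may be open to the internet) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List

ANYWHERE = "0.0.0.0/0"
PUBLIC_PORTS = {443}  # assumed policy: only HTTPS may face the internet


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical, simplified view of one security group ingress rule."""
    port: int
    cidr: str  # source range allowed to connect


def find_exposed_rules(rules: List[IngressRule]) -> List[IngressRule]:
    """Return rules that open a non-public port to the whole internet."""
    return [r for r in rules if r.cidr == ANYWHERE and r.port not in PUBLIC_PORTS]


rules = [
    IngressRule(port=443, cidr=ANYWHERE),      # public HTTPS, allowed by policy
    IngressRule(port=22, cidr="10.0.0.0/16"),  # SSH only from inside the VPC/VPN
    IngressRule(port=3389, cidr=ANYWHERE),     # RDP open to the world, flagged
]
exposed = find_exposed_rules(rules)
```

A check like this could run in the CI/CD pipeline against the infra-as-code definitions, so that an exposed port is caught during review rather than after deployment.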

Performance

Having the right vCPU and memory allocation for the application is very important, especially when the system is not containerized yet and runs on a Windows server, for example.

Select appropriate instances: vCPU & memory, EBS, EFS, S3, ...
Autoscaling: Enterprise ad hoc effort and on demand resources, ...
Caching service: ElastiCache, Redis, ...
Adopt a microservice architecture: deploy, operate and scale services individually, ...
Event-driven architecture: serverless computing, Lambda, long-running operations
Monitoring and notification: CloudWatch, APM, SNS, ...
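To make the autoscaling item concrete, here is a small sketch of a proportional scaling rule, the same shape as the rule documented for the Kubernetes Horizontal Pod Autoscaler. The function name, the default limits and the sample CPU figures are assumptions for illustration, not values from this migration.

```python
import math


def desired_instances(current: int, metric: float, target: float,
                      min_size: int = 1, max_size: int = 10) -> int:
    """Scale the instance count in proportion to how far the metric is from target.

    current: number of instances running now
    metric:  observed average value (for example CPU utilization in percent)
    target:  value we want the metric to settle at
    """
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    wanted = math.ceil(current * metric / target)
    # Clamp to the configured group limits.
    return max(min_size, min(max_size, wanted))


scale_out = desired_instances(current=4, metric=90.0, target=60.0)
scale_in = desired_instances(current=4, metric=30.0, target=60.0)
capped = desired_instances(current=8, metric=120.0, target=60.0)
```

With these sample numbers the group grows from 4 to 6 instances under load, shrinks to 2 when idle, and never exceeds the assumed maximum of 10.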

SLA and SLO

The SLO is tighter than the SLA. SLOs are generally used internally only, while SLAs are external. An initial target commonly starts from 99.5% availability.

Service-Level Objective (SLO): the goal that the service provider wants to reach.
- Monitoring, Tracking, CI/CD, Alerting
Service-Level Agreement (SLA): the contract that the service provider promises to customers.
- Alerting, Monitoring

The point is that the operations team must always stay above the SLA level, which is why it is recommended to operate against the SLO on the cloud. Monitoring and alerting systems are then triggered before the SLA threshold is hit. This helps HA and RTO stay within target.
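The relation between SLO and SLA can be expressed as a downtime budget. The sketch below assumes a 30-day period and an internal SLO of 99.9%, which is a hypothetical value chosen only to be tighter than the 99.5% SLA mentioned above.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in the period at the given availability."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_pct) / 100


def budget_status(downtime_min: float, slo_pct: float = 99.9,
                  sla_pct: float = 99.5, period_days: int = 30) -> str:
    """Classify observed downtime against the internal SLO and the external SLA."""
    if downtime_min > allowed_downtime_minutes(sla_pct, period_days):
        return "sla_breached"
    if downtime_min > allowed_downtime_minutes(slo_pct, period_days):
        return "slo_breached"  # alert here, before the SLA is at risk
    return "ok"


sla_budget = allowed_downtime_minutes(99.5)  # about 216 minutes per 30 days
slo_budget = allowed_downtime_minutes(99.9)  # about 43 minutes per 30 days
```

Alerting on the `slo_breached` state gives the operations team the gap between roughly 43 and 216 minutes to react before the customer-facing SLA is violated.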

RTO and RPO

Recovery time and recovery point are 2 key targets, and meeting them is a main advantage of using the cloud when the architecture is designed properly with the services and targets below:

Recovery Time Objective (RTO):
- Backup, DR, Multi-zone application and database.
Recovery Point Objective (RPO):
- DR, Multi-zone application and database, autoscaling.

The key to recovery is redundancy for every critical service within the platform and architecture. For example, to hit an RTO of 1 day and an RPO of 15 minutes, we need to make sure that:

We have a proper architecture across multiple zones
Database backups are taken frequently enough to meet the RPO
We use shared storage and disks that other instances can use
Autoscaling and load balancing are in place
Monitoring is in place wherever possible
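The RTO and RPO targets above can be turned into simple checks. This is a sketch under stated assumptions: the worst-case data loss is taken to be one full backup interval, and recovery time is modeled as detection plus failover plus verification. The durations in the example are hypothetical.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """A failure just before the next backup loses up to one full interval of data."""
    return backup_interval <= rpo


def meets_rto(detect: timedelta, failover: timedelta,
              verify: timedelta, rto: timedelta) -> bool:
    """Recovery time is modeled as the sum of detection, failover and verification."""
    return detect + failover + verify <= rto


RPO = timedelta(minutes=15)
RTO = timedelta(days=1)

hourly_backups_ok = meets_rpo(timedelta(hours=1), RPO)
ten_minute_backups_ok = meets_rpo(timedelta(minutes=10), RPO)
recovery_ok = meets_rto(detect=timedelta(minutes=5),
                        failover=timedelta(minutes=30),
                        verify=timedelta(hours=2),
                        rto=RTO)
```

Hourly backups cannot satisfy a 15-minute RPO, which is why more frequent backups (or multi-zone database replication) appear in the list above.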

Cost Effective

Even though migration is a time-consuming process, the cloud can provide extensive financial benefits such as easier budget estimation and planning, budget savings and increased workplace productivity. In fact, companies can save an average of 15 percent on all IT costs by migrating to the cloud, while gaining benefits like:

On-demand services
Autoscaling
Reduces the necessary amount of hardware (CAPEX)
Less demanding labor and maintenance (OPEX)
Higher productivity
Pay as you go (lower initial capital investment)
etc.

Cloud price calculators help to estimate the yearly or quarterly cost; by adjusting the services and finding the right on-demand usage, the cost becomes easy to manage. It is very important to plan the migration strategy properly first and then manage the cost. Cloud solutions are available in a pay-as-you-go pricing model, which provides savings and flexibility in several ways.

High Availability

Using a clustered architecture within the cloud is one way to achieve High Availability. A high availability cluster is a group of servers that act as a single server to provide continuous uptime. These servers have access to the same shared storage for data, so if a server is unavailable, the other servers pick up the load. A high availability cluster can be anything from two to dozens of servers. As well as providing failover, high availability clusters also allow auto-scaling and load balancing of workloads, so that no server within the cluster gets overloaded and performance is more consistent. The basic elements of high availability are as follows:

Redundancy: ensuring that any elements critical to system operations have an additional, redundant component.
Monitoring: collecting data from a running system and detecting when a component fails or stops responding.
Failover: a mechanism that can switch automatically from the currently active component to a redundant component.
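The three elements above fit together in a few lines. The sketch below is a simplified illustration: the component names and the health lookup are hypothetical, and a real deployment would rely on load balancer or cluster health checks rather than a dictionary.

```python
from typing import Callable, Optional, Sequence


def pick_active(components: Sequence[str],
                is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy component in priority order, or None if all failed."""
    for name in components:
        if is_healthy(name):
            return name
    return None


# Redundancy: a primary and a standby for the same role.
components = ["app-primary", "app-standby"]

# Monitoring: latest health check results (hypothetical values).
health = {"app-primary": False, "app-standby": True}

# Failover: traffic moves to the standby because the primary is unhealthy.
active = pick_active(components, lambda name: health[name])
```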

Moreover, components enabling high availability are as follows:

Data backup and recovery
Load balancing
Clustering
Define availability metrics
- Percentage of uptime
- Mean Time to Recovery (MTTR)
- Mean Time Between Failures (MTBF)
- Recovery Time Objective (RTO)
- Recovery Point Objective (RPO)
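Two of these metrics combine into the percentage of uptime. The sketch below uses the standard steady-state relation, availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures, and shows the effect of redundancy under the assumption that copies fail independently. The sample figures are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and time to recovery."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(single: float, copies: int) -> float:
    """Availability of N redundant copies, assuming failures are independent."""
    return 1 - (1 - single) ** copies


# A server that fails on average every 999 hours and takes 1 hour to recover.
single_server = availability(mtbf_hours=999, mttr_hours=1)

# Two 99%-available servers behind a load balancer.
two_servers = redundant_availability(single=0.99, copies=2)
```

The second call illustrates why clustering appears in the list: two servers at 99% each reach roughly 99.99% together, provided they do not share a single point of failure.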

Most systems need a high-performing architecture that satisfies security, high availability and fast failover. This helps to reduce RTO and RPO, which are the main requirements. For example, the items below are required in the first phase to have high availability from the beginning:

Shared storage
Load-balancing
Autoscaling
Multi-zone database
API Gateway
Security services
Monitoring services

For an IoT workload, for example, it would be better to use a queue and a database like DynamoDB. For FinTech, however, performance and fault tolerance are the main concerns alongside HA, so RDS is a more suitable choice. At the same time, moving to a more reliable and cost-effective architecture, containerizing the services and then using EKS and ECR would help further in reaching the target.

Cloud Operations

The key is to adopt a proper DevOps culture for operating the new architecture in the cloud, in order to achieve real agility. This is possible by integrating Technology, Process, People and Environment.

Adoption of cloud technologies
Break up the application architecture into more service-based architectures
Find an evolving orchestration that works in our organization
Deployment
- Reliable
- Quickly validated
- Reversible
Replace an approval with a notification
Technology vendors
Service providers

With a combination of cloud operation services and some extra tools (open source or subscription based), we can run seamless cloud operations.

CI/CD in place, and easier in the cloud
- CodeDeploy, Jenkins, etc.
Monitoring
- CloudWatch, Grafana, Zabbix, APM, etc.
Infra as code
- Terraform, Ansible, etc.
Change management
- Jira, Word, etc.
Alerting
- SNS, CloudTrail

Cloud Cost

Pricing is calculated based on compute, storage and data transfer, so budget estimation, planning and savings are possible using a cloud cost calculator. For this architecture, focusing on the following is recommended:

When to use reserved instances over on-demand or spot instances.
Redundancy
TPS
Database size
For a pilot project using pay-as-you-go is also recommended

As stated earlier, cost calculation and budget savings are possible by managing service utilization: use exactly what you need, and do not reserve services you will not use.
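The reserved versus on-demand decision comes down to expected utilization. The sketch below is illustrative only: the hourly rates are hypothetical placeholders rather than real AWS prices, and the model assumes a reserved instance is paid for every hour of the month whether it is used or not.

```python
HOURS_PER_MONTH = 730  # common approximation for one month


def monthly_cost_on_demand(hourly_rate: float, utilization: float) -> float:
    """On-demand is billed only for the fraction of hours actually used."""
    return hourly_rate * HOURS_PER_MONTH * utilization


def monthly_cost_reserved(effective_hourly_rate: float) -> float:
    """Reserved capacity is billed for every hour, used or not."""
    return effective_hourly_rate * HOURS_PER_MONTH


def cheaper_option(on_demand_rate: float, reserved_rate: float,
                   utilization: float) -> str:
    """Pick the cheaper pricing model for an expected utilization between 0 and 1."""
    on_demand = monthly_cost_on_demand(on_demand_rate, utilization)
    reserved = monthly_cost_reserved(reserved_rate)
    return "reserved" if reserved < on_demand else "on-demand"


# Hypothetical rates: 0.10 per hour on demand, 0.06 per hour effective when reserved.
always_on_database = cheaper_option(0.10, 0.06, utilization=0.9)
pilot_workload = cheaper_option(0.10, 0.06, utilization=0.3)
```

With these placeholder rates the break-even point is 60% utilization, so a steadily loaded database favors reservation while a pilot project stays on pay-as-you-go, in line with the list above.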

Migration Checklist

After the architecture is confirmed, a kick-off migration checklist is required, which can be extracted from a high-level migration checklist.

Build Cloud Strategy and Benefits

As a very quick overview, we can see the benefits of the cloud across the proposed architecture.

Migration Triggers

Put another way, the triggers that start the migration process for a platform can be summarized as follows:

Cloud Migration Timeline

In general, a sample timeline for such a migration can be itemized as follows:

In conclusion, below is a simple example of a general optimized architecture in the AWS cloud:

Eventually, this migration is broken into a few phases, which helps it conclude with an architecture that better fits its needs. It also helps to cover many gaps in your requirement documents. Building a proper clustered environment using EKS, with container images hosted in ECR, has the following benefits:

It improves HA even further than the previous architecture
Continuous improvement and delivery become easy
Deployment and rollback happen within a couple of minutes when container versions are kept in ECR
Pods and microservices can work standalone and scale per service, which helps scalability and HA and also reduces cost
The same container and package can be tested in a sandbox or staging environment before reaching production
Resources can be managed more easily per service requirement
Security is higher thanks to a reduced attack surface, reduced visibility and increased automation.

ABOUT ATABAK

Atabak is a Software and Data Engineering Consultant
