Exploring the Benefits and Challenges of AWS Adoption

The cloud is easier, they say.

But the cloud can be expensive. Very expensive.

Many years ago, I chose to migrate a traditional on-premises website to AWS. Here are the challenges and benefits I encountered.

With traditional on-premises servers, I pay once for the hardware, and pay for a maintenance service to keep it up and running.

With the cloud (AWS or another provider), I pay a monthly fee for the resources I use. But for equivalent hardware, the cloud is more expensive.

So is it really worth deploying on AWS instead of staying on-premises?

On a traditional on-premises server, I can do whatever I want. I can install whatever solution I want. But I have to configure it and keep it up to date. By myself.

In AWS, I can customize an EC2 instance to install whatever I want too. But I also have access to all the ready-to-use managed services.

Challenges: costs, knowledge, security, downtime/volatility of the servers.

Benefits: high resources customization, managed services, separation of concerns.

Context

We had a stateful application. A Spring Boot monolith application with a ReactJS frontend. And PostgreSQL as the database. Nothing fancy.

But it was very hard to scale the resources. Being on-premises, buying new servers was the only way to get more capacity, and the existing architecture was becoming too small. We had to decide: buy more servers, or migrate to the cloud.

As resources were limited, it was very hard to deploy new tools. We even shared some of the resources between environments, which coupled them tightly: load on the staging environment impacted the production environment. 😱

We needed to isolate the environments. We needed to add new servers and services, such as a MongoDB cluster.

So, we thought about AWS.

Challenges

Replicating the same architecture on AWS makes it more expensive. That is not an argument that wins over stakeholders.

We also had little to no AWS experience.

For simplicity, we decided to follow the same architecture as the old one: many EC2 instances and an RDS database. Nothing more.

Having separate EC2 instances, in separate VPCs, gave us the isolation we were looking for.
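That isolation step can be sketched with the AWS CLI. Everything below is a made-up illustration, not our actual setup: CIDR blocks, tags, AMI ID and subnet ID are placeholders.

```shell
# Hypothetical sketch: one VPC per environment, so staging load
# can never touch production resources.
aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
  --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=staging}]'
aws ec2 create-vpc --cidr-block 10.1.0.0/16 \
  --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=production}]'

# Each application server is then launched into its environment's VPC,
# via a subnet that belongs to that VPC (IDs are placeholders).
aws ec2 run-instances --image-id ami-12345678 --instance-type m5.large \
  --subnet-id subnet-0abc1234
```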

For RDS, just plain PostgreSQL.

And the frontend was deployed on S3, as a static site.
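Deploying a static frontend to S3 boils down to creating a bucket and syncing the build output into it. The bucket name and build directory below are placeholders (S3 bucket names are globally unique).

```shell
# Hypothetical bucket name.
aws s3 mb s3://my-shop-frontend

# Enable static website hosting on the bucket.
aws s3 website s3://my-shop-frontend --index-document index.html

# Upload the ReactJS production build; --delete removes files
# that no longer exist in the build.
aws s3 sync build/ s3://my-shop-frontend --delete
```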

And we configured everything with the same resources (CPU and Memory) as the old architecture.

We made it simple.

How about the costs? With this configuration, we paid more on AWS. A lot more. But we had more flexibility. 😅

The migration

The application takes about 2 minutes to start up. Yes, a lot of legacy code.

We had a big database, some terabytes.

We wanted to do the migration at night to limit the downtime of the application. As it’s an e-commerce website, every minute of downtime is money lost.

What were the steps?

First, we created the platform and migrated the staging environment.

This way we could test the AWS environment, adjust the resources and see how AWS really works.

One week before, we created the resources for the production environment: EC2, RDS and S3. We deployed a production-ready application, but kept it hidden from the Internet. This way we could test the whole infrastructure again.

The day before, we stopped production late at night, to ensure no more users were making changes to the database. Then we started a dump and restore of the database. It took hours.
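The dump and restore step is essentially a pg_dump/pg_restore cycle. The hostnames, database name and user below are placeholders; the `-j` parallelism is one common way to shorten restores of multi-terabyte databases, not necessarily what we ran.

```shell
# Dump the on-premises database in PostgreSQL's compressed custom format.
pg_dump -h old-db.internal -U app -Fc -f appdb.dump appdb

# Restore into RDS; -j runs several restore jobs in parallel,
# which helps a lot on large databases.
pg_restore -h appdb.abc123.eu-west-1.rds.amazonaws.com \
  -U app -d appdb -j 4 appdb.dump
```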

Meanwhile, we deployed the latest web application and frontend files again.

Another step that also takes some time to go live is the DNS records.

We had to update the DNS records so our URLs pointed to the new IPs. This isn’t an immediate change.
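If the DNS zone is hosted in Route 53, the cutover is a single record-set change; the hosted-zone ID, domain and IP below are placeholders. A common trick is to lower the record’s TTL a few days before the migration, so old cached answers expire quickly.

```shell
# Hypothetical values throughout: zone ID, domain name, new IP.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "www.example.com",
        "Type": "A",
        "TTL": 300,
        "ResourceRecords": [{"Value": "203.0.113.10"}]
      }
    }]
  }'
```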

Once the database was restored, we started everything.

Fingers crossed.

Thankfully, no problems were found.

But at the end of the month, we found our first problem. We were prepared for it, but it’s not easy to see it for real.

The bill.

And now?

We started refactoring to decrease the bill. We did everything possible to migrate from a vertical architecture to a horizontal one.

This means we wanted an application that could run on several servers in parallel, instead of on a single big server.

Because until then, we had no resilience at all. If one component crashed, the whole application became unavailable.

Conclusion

We chose the cloud. We chose AWS because it was the trend.

But in fact it solved little of our problems.

In fact, it cost us a lot of time and money.

When it was done, we realised that the problem was not the platform but the application.

But there are also good points.

We learned a lot about the Cloud and AWS.

We cleaned up a lot of old, no-longer-used components.

And finally, we did everything to build a resilient platform and decrease the costs.

This would not have been possible while maintaining the old platform. The pressure of each new bill was so significant that we gave the architecture refactoring the highest priority.

My new ebook, How to Master Git With 20 Commands, is available now.
