Debug Server Performance

I had the same application running on two different servers: one on AWS and the other on Scaleway. You might expect the AWS one to be faster, but in fact it was 4 to 5 times slower.

The most interesting part was that both instances had 2 vCPUs and a similar amount of RAM. So how could there be such a big difference?

In this article, I’ll describe the steps I followed to track down the cause of the performance difference, and how I solved it. Yes, I used ChatGPT to help me.

What does the application look like?

The application was a simple Strapi backend with an SQLite database. For the Scaleway instance, I used a PLAY2-NANO with 2 vCPUs and 4 GB of RAM. For the AWS EC2 instance, I used a t3.large with 2 vCPUs and 8 GB of RAM.

The Scaleway instance was the legacy server, and the objective was to migrate to an AWS infrastructure where I could easily use Terraform, CloudWatch and backups.

When many users complained about the high response times of the AWS instance, I prepared a JMeter plan that requests several endpoints of the Strapi backend. I didn’t use a large load, just 5 threads, to be sure the server wasn’t saturated.

Once the plan had run, I picked the max, min and average response times. There I saw that the Scaleway server was 4 to 5 times faster than the AWS server.
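As a sketch of how those numbers can be extracted: running the plan in non-GUI mode writes a CSV results file whose second column is the elapsed response time, and a short awk pass gives the min, max and average (plan.jmx and results.jtl are example file names, not the ones from the original test):

```shell
# Run the test plan in non-GUI mode (file names are examples)
jmeter -n -t plan.jmx -l results.jtl

# Min, max and average response time (ms) from the 'elapsed' column,
# the 2nd field of JMeter's default CSV output (NR > 1 skips the header)
awk -F, 'NR > 1 {
  if (min == "" || $2 < min) min = $2
  if ($2 > max) max = $2
  sum += $2; n++
} END { printf "min=%s max=%s avg=%.1f ms\n", min, max, sum / n }' results.jtl
```

Comparing the same three numbers on both servers is enough to quantify the 4-5x gap.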

Now, let’s test new things.

Upgrade the AWS Instance

AWS is known for being very flexible when choosing an instance. I can easily pick the number of vCPUs and the amount of RAM. So, let’s double the instance size and test again.

With 4 vCPUs and 16 GB of RAM, I expected better results. But in fact, I got the same response times again. Yes, still 4 to 5 times slower than the Scaleway instance.

t3.xlarge and t3.2xlarge gave the same results. OK, the T series in AWS uses burstable, shared CPUs, so let’s move to another instance type: m5.large, with 2 vCPUs and 8 GB of RAM.

And the results? Well, it didn’t change at all.

What conclusion can I draw at this point? That adding CPUs and RAM doesn’t affect the Strapi backend. Why? Because Strapi with SQLite runs in a single thread. Once started, the application allocates all the RAM it needs and nothing more. Upgrading the resources further and further would just waste them.
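One way to see the single-threaded behavior in practice (a sketch; the "strapi" process-name pattern is an assumption) is to watch per-thread CPU usage while the load test runs. Under load, one thread stays busy while the other vCPUs sit mostly idle:

```shell
# Find the Strapi (Node.js) process; the 'strapi' pattern is an assumption
pid=$(pgrep -f strapi | head -n1)

# -H shows individual threads: under load, a single thread pegs one core
top -b -H -n 1 -p "$pid"
```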

What’s the next step?

Check Server Performance

With the help of ChatGPT, I first tried to figure out the speed of the hard drive. What if my hard drive’s IOPS were limited? And how could I check it?

Let’s first check the write speed of the hard drive with the following command:

$ dd if=/dev/zero of=testfile bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.6218 s, 141 MB/s

This gave me the write speed: an average of 141 MB per second.

Let’s now check the read speed with the following command:

$ dd if=testfile of=/dev/null bs=1G
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.18249 s, 149 MB/s

So, let’s compare the Scaleway instance and the AWS instance.

Here, I found that the AWS instance was actually twice as fast. WHAT?? Even with double the disk throughput, the AWS application was 4 to 5 times slower.
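A side note: dd with a 1 GB block size measures sequential throughput rather than IOPS. To check whether random I/O (closer to what SQLite does) is the bottleneck, a tool like fio can run small random writes; this is a sketch with example sizes, not the command from the original test:

```shell
# Random 4K writes with O_DIRECT: a rough IOPS measurement
# (fio must be installed; name, size and iodepth are example values)
fio --name=randwrite --rw=randwrite --bs=4k --size=256M \
    --ioengine=libaio --iodepth=16 --direct=1 \
    --runtime=30 --time_based
```

The IOPS figure in fio’s summary is the number to compare between the two servers.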

So, the problem isn’t the hard drive. Let’s now check the CPU speed. The following command reports the number of events the CPU can handle per second:

$ sysbench cpu --cpu-max-prime=20000 --threads=1 run
sysbench 1.1.0-3ceba0b (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 20000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   414.31

Throughput:
    events/s (eps):                      414.3078
    time elapsed:                        10.0022s
    total number of events:              4144

Latency (ms):
         min:                                    2.40
         avg:                                    2.41
         max:                                    3.08
         95th percentile:                        2.43
         sum:                                10000.83

Threads fairness:
    events (avg/stddev):           4144.0000/0.00
    execution time (avg/stddev):   10.0008/0.00

On the AWS instance, it was a little harder: sysbench isn’t installed by default, and yum install sysbench didn’t work, so I had to compile it from source.
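For reference, building sysbench from source follows the standard autotools flow described in the project’s README (the build-dependency package names below are assumptions for a yum-based distribution; --without-mysql skips the database drivers, which the CPU test doesn’t need):

```shell
# Build dependencies (package names assumed for Amazon Linux / yum)
sudo yum install -y git gcc make automake libtool pkgconfig

git clone https://github.com/akopytov/sysbench.git
cd sysbench
./autogen.sh
./configure --without-mysql   # CPU test only, no database drivers
make -j"$(nproc)"
sudo make install
```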

But the results were very interesting. Now I could see that the AWS instance was 4 to 5 times slower than the Scaleway instance. This told me a lot, because it explained why the results didn’t change when upgrading the instance from t3.xlarge to t3.2xlarge.

I noticed some changes with the AWS EC2 M3 series, but it remained slower than the Scaleway instance.

Looking at the CPU specifications, it looks like the Scaleway instance runs on much more performant hardware.
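Checking which CPU model an instance actually runs on is quick, and is worth doing on both servers before comparing benchmark numbers:

```shell
# CPU model and clock speed as seen by the guest
lscpu | grep -E 'Model name|MHz'

# Alternative that works on most Linux boxes
grep -m1 'model name' /proc/cpuinfo
```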

So, what’s the solution? The AWS instances I was using were located in a new region in the Middle East, so the available instance types were older and less performant. I had chosen that location to be closer to the end users.

So, let’s test other instances, a little further away but newer. I tried c6g.large and c6in.large, which gave much better results. But still slower than Scaleway. Dammit.

Conclusion

After testing several instance types, I found that c6a.large gave better results, even better than Scaleway. But those instances were only available in Europe or North America, a lot further from my target users.

But let’s face it, the time spent establishing the connection is only about 100 milliseconds.
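To put that number in context, curl can break down where the time goes on a single request (the URL is a placeholder):

```shell
# time_connect = TCP handshake, time_starttransfer = time to first byte,
# time_total = whole request
curl -o /dev/null -s \
    -w 'connect: %{time_connect}s\nttfb: %{time_starttransfer}s\ntotal: %{time_total}s\n' \
    https://example.com/
```

If the extra connection latency from a further region is around 100 ms while the slow CPU adds several times that to every response, the faster, more distant instance still wins.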

