I have a microservices architecture.
Many services running in a cluster of servers.
But how many services do I need?
How many servers do I need in my cluster?
How much memory and CPU should I allocate to each service?

How Many Instances for a Service
If I’m in a microservices architecture, it’s because I need resilience: I need my application to be up and running even if an error occurs.
To obtain resilience, I need to configure my services to run with at least two instances each. Why? If one fails, the other remains active to receive the requests.
Are two instances enough? It depends on the probability that an instance fails.
Take one year of production. Check how many hours a single instance was off due to an error (exclude planned maintenance). Transform it into a percentage.
- 10 hours off per year
- 8760 hours in a year
- 10 / 8760 ≈ 0.1% off
Raise it to the power of the number of instances (that gives the probability that all instances are off at the same time, assuming independent failures) until the percentage is acceptable.
- 1 instance -> 0.1% off
- 2 instances -> 0.1% * 0.1% = 0.0001% off
- 3 instances -> 0.1% * 0.1% * 0.1% = 0.0000001% off
Here you are.
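As a quick sketch, here is the same arithmetic in Python (the 10 hours of downtime and the acceptable target are example figures, not measurements):

```python
def downtime_fraction(hours_off_per_year: float) -> float:
    """Fraction of the year a single instance is off."""
    return hours_off_per_year / 8760  # hours in a year

def instances_needed(single_off: float, acceptable: float) -> int:
    """Smallest number of instances whose probability of all being off
    at the same time (assuming independent failures) is acceptable."""
    instances, all_off = 1, single_off
    while all_off > acceptable:
        instances += 1
        all_off *= single_off
    return instances

p = downtime_fraction(10)  # 10 hours off per year
print(f"Single instance off: {p:.3%}")  # ~0.114%
print(f"Instances for 0.001% off: {instances_needed(p, 0.00001)}")  # 2
```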
Memory and CPU Allocation for an Instance
The best-case scenario is a constant load. Then I could calibrate the CPU and memory to be at 100% all the time. All consumed, nothing wasted.
But constant load doesn’t exist.
So I calibrate the resources so that the usual load sits at around 80%.
What about the peaks? When talking about CPU peaks, the service will slow down if the CPU spends too much time at 100%. So, how long are the peaks? Calibrate the maximum CPU based on that.
When talking about memory, it’s another game. A peak at 100% means an out-of-memory error, with two possible outcomes: the application hangs or gets restarted. So calibrate the memory to never reach 100%, even with the peaks.
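As a minimal sketch, assuming I have already measured the average CPU usage and the peak memory usage of a service, the two rules could look like this (the 80% target and the 20% memory headroom are example figures):

```python
def cpu_allocation(average_cores: float, target_load: float = 0.80) -> float:
    """Allocate CPU so the usual load sits at the target (80% here).
    Short peaks above it only slow the service down."""
    return average_cores / target_load

def memory_allocation(peak_bytes: int, headroom: float = 0.20) -> int:
    """Allocate memory above the observed peak: reaching 100% memory
    means an out-of-memory error, so the peaks must never touch it."""
    return int(peak_bytes * (1 + headroom))

print(cpu_allocation(1.6))                       # 1.6 cores average -> 2.0 cores
print(memory_allocation(3 * 1024**3) // 1024**2) # 3 GiB peak -> 3686 MiB (~3.6 GiB)
```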
How Many Servers for the Cluster
Same as for the number of instances of a single service: at least two, to have resilience. If one goes down, the other is up and running to receive the load.
Whether I need more depends on the stability of the servers. The same rule as for the instances applies here.
Memory and CPU Allocation for a Server
This time, the CPU and memory consumption will be stable: since each service has a configured memory and CPU allocation, it won’t exceed those values.
But what happens if a server of my cluster goes down?
To maintain the stability of the application, the microservices orchestrator will dispatch the displaced services onto the remaining servers.
I must ensure that if a server goes down, the application won’t notice it: the remaining servers must have enough free resources to restart all the services that were running on the lost server.
Let’s see an example.
- I have two servers in my cluster;
- I have 3 services running in each server; 6 services in total;
- If one server goes down, its 3 services are moved to the remaining server, which ends up running all 6.
This means that with two servers in my cluster, I must leave 50% of each server’s capacity free in case the other one goes down.
What if I have three servers in my cluster?
- I have three servers in my cluster;
- I have 3 services running in each server; 9 services in total;
- If one server goes down, I need to move 3 services to the remaining servers.
- One server will have 5 services running, and the other server will have 4 services running.
I need to calibrate my servers in order to have enough free capacity to handle the missing services of a single server.
If the probability of a server going down is higher, I need to calibrate the servers’ capacity to handle the missing services of two servers (or more).
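As a sketch of the general rule: with N servers and F tolerated failures, the survivors must absorb the load of the failed ones, so each server can only be filled up to (N - F) / N of its capacity (the server counts below are example figures):

```python
def usable_fraction(servers: int, tolerated_failures: int = 1) -> float:
    """Fraction of a server's capacity I can use so that the
    surviving servers can absorb the services of the failed ones."""
    survivors = servers - tolerated_failures
    if survivors < 1:
        raise ValueError("not enough surviving servers to absorb the load")
    return survivors / servers

for n in (2, 3, 5):
    print(f"{n} servers: use at most {usable_fraction(n):.0%} of each one")
# 2 servers: use at most 50% of each one
# 3 servers: use at most 67% of each one
# 5 servers: use at most 80% of each one
```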
Conclusion
Until the cluster is up and running with real users, there is no way to really see the CPU and memory consumption.
Until the cluster is in production, the previous calculations are based on assumptions.
But I always follow the previous rules to calibrate any microservices architecture.