How to Calculate Server Max Requests per Second (2024)

Rizal Widyarta Gowandy

Published in

Geek Culture

Dec 31, 2020

A beginner's guide to calculating the maximum requests per second (RPS) that a server can handle.

One important step before creating a new service is capacity planning. In this process, we calculate how many resources the service needs in order to handle the projected traffic.

In layman’s terms, this is like planning a trip for a group of people: knowing how many people will go helps you determine how many cars the trip requires. Say there are 10 people and each car can carry at most 5 people; you then need at least 2 cars for the trip.

What about a server? How do you calculate the server size if you expect a maximum of 10,000 RPS? To do that, you need to know “the car”, that is, the server: how many requests can it handle?

Before doing any actual calculation, we need to know the server's boundaries, or limitations, because a server only has a limited amount of resources, just like a car. What are these boundaries?

A computer can do calculations, and calculations require the CPU to work. Processing an image and converting a string to bytes are examples of tasks that require calculation. The limiting factor here is the CPU power and the number of cores the machine has. Basically, more cores means more workers, more workers means more tasks can be done at the same time, and the higher the RPS.

In a CPU-bound system, we can calculate the number of RPS using this formula:

RPS = number of cores × (1 / task duration in seconds)

For example, a server with 4 cores and a task duration of 10 ms can handle 400 RPS, while the same server with a task duration of 100 ms can only handle 40 RPS.
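The CPU-bound estimate above can be sketched in a few lines of Python. This is a minimal illustration, not a benchmark: the function name `cpu_bound_rps` is made up for this example, and it assumes every task keeps one core fully busy for its whole duration.

```python
def cpu_bound_rps(cores: int, task_duration_s: float) -> float:
    """Estimate the max RPS when the CPU is the only limit (hypothetical helper)."""
    if cores <= 0 or task_duration_s <= 0:
        raise ValueError("cores and task_duration_s must be positive")
    # One core finishes 1 / task_duration_s tasks per second,
    # so the whole machine finishes cores times that many.
    return cores / task_duration_s


# The two cases from the text: 4 cores at 10 ms and at 100 ms per task.
print(cpu_bound_rps(4, 0.010))
print(cpu_bound_rps(4, 0.100))
```

The two calls correspond to the 400 RPS and 40 RPS cases described above.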

A computer requires information or data to work. It fetches data from a database, reads a file, or gets data over the network by calling another computer. The CPU does nothing most of the time while it waits for that data. Since a computer can have multiple workers, other workers can still do tasks while one worker waits for its previous operation to finish.

From the previous part, we know that a computer is limited by the number of workers, but a computer also needs to store data in memory before doing any operation on it. The limit here is the RAM. Basically, more memory means more workers, more workers means more tasks can be done, and the higher the RPS.

In a memory-bound system, we can calculate the number of RPS using this formula:

RPS = (total RAM / task memory usage) × (1 / task duration in seconds)

For example, a server with 16 GB of RAM (taken as 16,000 MB), a task memory usage of 40 MB, and a task duration of 100 ms can handle 4000 RPS, while the same server with a task duration of 50 ms (half the previous one) can handle 8000 RPS.

To get more RPS, a server needs more RAM, lower memory usage per task, and a shorter task duration.
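As a rough sketch, the memory-bound estimate can be written like this. The helper name `memory_bound_rps` is hypothetical, and the example assumes 16 GB means 16,000 MB so that the numbers line up with the figures above.

```python
def memory_bound_rps(total_ram_mb: float, task_memory_mb: float,
                     task_duration_s: float) -> float:
    """Estimate the max RPS when RAM is the only limit (hypothetical helper)."""
    if min(total_ram_mb, task_memory_mb, task_duration_s) <= 0:
        raise ValueError("all arguments must be positive")
    # How many tasks fit in memory at the same time.
    concurrent_tasks = total_ram_mb / task_memory_mb
    # Each memory "slot" is reused 1 / task_duration_s times per second.
    return concurrent_tasks / task_duration_s


# 16 GB of RAM (assumed to be 16,000 MB) and 40 MB per task.
print(memory_bound_rps(16_000, 40, 0.100))  # 100 ms per task
print(memory_bound_rps(16_000, 40, 0.050))  # 50 ms per task
```

Halving the task duration doubles the estimate, which is why a faster task helps as much as adding RAM.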

A computer can talk to another computer. To do so, it needs to create a request, and the receiving computer needs to accept the request. On a UNIX-based computer, both creating and accepting a request create a file descriptor, and a computer has a limited number of file descriptors. The limit here is the open files limit, which has a soft limit and a hard limit.

The soft limits are the ones that actually affect processes; hard limits are the maximum values for soft limits. Any user or process can raise the soft limits up to the value of the hard limits.

These limits depend heavily on the operating system. A common default is 1024 for the soft limit and 4096 for the hard limit. The open files limit is a common reason why the RPS cannot rise even though the server still has CPU and memory to spare. A higher open files limit means more requests can be received and made, resulting in a higher RPS.
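To see the soft and hard limits on your own machine, Python's standard `resource` module (UNIX only) exposes them. The sketch below is only an illustration: `open_files_limits` and `raise_soft_limit` are hypothetical helper names, and the raised limit applies only to the current process and its children.

```python
import resource


def open_files_limits() -> tuple[int, int]:
    """Return the (soft, hard) open files limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target: int) -> int:
    """Raise the soft limit toward target, never above the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # the hard limit is the ceiling
    new_soft = max(soft, target)  # this helper never lowers the limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


soft, hard = open_files_limits()
print(f"soft limit: {soft}, hard limit: {hard}")
```

A call such as `raise_soft_limit(4096)` would then lift the soft limit for the running process, as far as the hard limit allows. Some operating systems apply additional caps, so treat the returned value as the source of truth.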

Another kind of I/O bound is network bandwidth. Remember that when a computer talks to another computer, it basically sends data from one to the other. When the server is data-intensive, receiving or sending gigabyte-sized payloads, but its network bandwidth is only measured in megabits per second, it is only normal for the RPS to be very low.
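The article gives no formula for this case, so the following is only a back-of-the-envelope sketch under stated assumptions: the link is fully available to the server, protocol overhead is ignored, and the helper name `bandwidth_bound_rps` is made up for illustration.

```python
def bandwidth_bound_rps(bandwidth_mbit_per_s: float, payload_mb: float) -> float:
    """Estimate the max RPS when the network link is the only limit."""
    if bandwidth_mbit_per_s <= 0 or payload_mb <= 0:
        raise ValueError("bandwidth and payload must be positive")
    # 1 byte = 8 bits, so N Mbit/s moves N / 8 megabytes per second.
    megabytes_per_second = bandwidth_mbit_per_s / 8
    return megabytes_per_second / payload_mb


# Assumed example: a 100 Mbit/s link and a 1 GB (1000 MB) payload per request.
print(bandwidth_bound_rps(100, 1000))
```

With these assumed numbers the server completes far less than one request per second, no matter how much CPU or memory it has.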

Normally a server's load follows an up-and-down pattern: at night the load decreases, and during the day it increases up to a certain point, stays there for a while, then decreases again. To calculate the system load, we need to know the maximum number of requests that will arrive in any second during a sustained period of time.

Let’s say we expect 3.6 million incoming requests to arrive within 1 hour. That means the server needs to handle 1000 RPS, using the formula:

RPS = total requests / duration in seconds = 3,600,000 / 3,600 = 1000
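The same average-rate calculation as a small Python sketch. The name `required_rps` is hypothetical, and the calculation assumes the requests are spread evenly over the period.

```python
def required_rps(total_requests: int, duration_s: float) -> float:
    """Average requests per second over a sustained period."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return total_requests / duration_s


# 3.6 million requests spread over 1 hour (3600 seconds).
print(required_rps(3_600_000, 3600))
```

If traffic is bursty within the hour, the real peak will be higher than this average, so it is worth measuring over the busiest window you can identify.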

Let’s say it takes 10 ms for a server to finish each request and we have a 4-core server, which means our server can only handle 400 RPS. In this case, we require at least 3 instances, using the formula:

Number of instances = required RPS / RPS per instance, rounded up = 1000 / 400 = 2.5, rounded up to 3
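A minimal sketch of the instance count, with a hypothetical helper name `instances_needed`. The important detail is rounding up, since you cannot run half an instance.

```python
import math


def instances_needed(target_rps: float, rps_per_instance: float) -> int:
    """Smallest number of instances whose combined RPS covers the target."""
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    # Round up: 2.5 instances' worth of traffic needs 3 real instances.
    return math.ceil(target_rps / rps_per_instance)


# 1000 RPS of traffic and 400 RPS per instance.
print(instances_needed(1000, 400))
```

This gives the bare minimum. It leaves no headroom for traffic spikes or for an instance going down.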

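Now we know that we need at least 3 instances and that each instance handles only 400 RPS. If we expect each request to use 3.6 GB of memory, then with a 10 ms task duration each instance holds 400 × 0.01 = 4 requests in memory at a time, so it needs at least 4 × 3.6 GB = 14.4 GB of RAM (in practice, a 16 GB server), using the formula:

Total RAM = RPS per instance × task duration in seconds × task memory usage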
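Rearranging the memory-bound formula from earlier to solve for RAM gives a quick sizing check. This is a sketch under the same assumptions as that formula; the name `ram_needed_gb` is hypothetical, and the inputs are the per-instance RPS, task duration, and per-request memory used in this example.

```python
def ram_needed_gb(rps: float, task_duration_s: float, task_memory_gb: float) -> float:
    """RAM needed to keep all in-flight requests in memory at once."""
    if min(rps, task_duration_s, task_memory_gb) <= 0:
        raise ValueError("all arguments must be positive")
    # Requests in flight at any moment = arrival rate x time spent per request.
    concurrent_requests = rps * task_duration_s
    return concurrent_requests * task_memory_gb


# 400 RPS per instance, 10 ms per request, 3.6 GB per request.
print(ram_needed_gb(400, 0.010, 3.6))
```

The result covers request memory only. The operating system and the service itself need room on top of it, so round up to the next available server size.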

Lastly, as you can see, we need a lot of educated guesses about the total number of requests, how long it takes to handle a request, and how much memory each request requires. Hence, this calculation should only be used for capacity planning. Under real conditions, it is still better to run load tests to find the actual RPS limit of each server. Most of the time a server depends on the performance of another server, so the bottleneck may not be inside your service at all but in an underlying service.

Voilà, we are done. Now you should have a better understanding of server RPS: what the boundaries are and how to calculate the servers required to handle a certain number of requests.