# Improve Performance and Reduce Downtime in Your Node.js API with Clusters

## High-Performance API Challenges
One of the main challenges in high-performance API development is ensuring the server can handle a large number of simultaneous requests without compromising performance. This is especially important in high-demand scenarios, where the number of requests can increase rapidly, impacting response time and user experience.
To solve this problem, it's crucial to adopt scalability and optimization strategies that allow the server to efficiently distribute requests. One such strategy is the use of clusters in Node.js.
## What Are Clusters in Node.js?
In Node.js, the `cluster` module lets you spawn multiple processes that share the same server port, making better use of machine resources, especially CPU cores. A single Node.js process executes JavaScript on one thread, so on a multi-core system it effectively uses only one core, which limits its processing capacity under high load. Using clusters, it's possible to distribute the workload across separate processes, improving performance and scalability.

Each worker process handles its own share of incoming requests, while the primary process coordinates the distribution, balancing the load among workers (round-robin by default on most platforms).
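As a minimal sketch of this primary/worker split (stripped of any HTTP handling; the workers here just report their PID and exit so the example terminates on its own):

```javascript
import cluster from 'node:cluster';
import { cpus } from 'node:os';

if (cluster.isPrimary) {
  // The primary process never serves requests itself; it only forks
  // workers and reacts to their lifecycle events.
  const workerCount = Math.min(2, cpus().length);
  console.log(`Primary ${process.pid} forking ${workerCount} worker(s)`);
  for (let i = 0; i < workerCount; i++) cluster.fork();

  cluster.on('exit', (worker, code) => {
    console.log(`Worker ${worker.process.pid} exited with code ${code}`);
  });
} else {
  // In a real server this branch would call server.listen(); connections
  // to the shared port are then distributed across the workers.
  console.log(`Worker ${process.pid} ready`);
  process.exit(0);
}
```

Each `cluster.fork()` re-executes this same script; `cluster.isPrimary` is what routes each process into the right branch. In Step 3, the worker branch starts the actual HTTP server instead of exiting.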
## Step 1: Creating a Simple API

First, let's create a simple API that performs a heavy computation: finding prime numbers. The naive prime search is computationally expensive, and because it runs on the single JavaScript thread, it blocks the event loop and can easily freeze the API when receiving multiple simultaneous requests.
```javascript
import { createServer } from 'node:http';

function findPrimes(limit) {
  const primes = [];
  for (let i = 2; i <= limit; i++) {
    let isPrime = true;
    for (let j = 2; j < i; j++) {
      if (i % j === 0) {
        isPrime = false;
        break;
      }
    }
    if (isPrime) primes.push(i);
  }
  return primes;
}

function createPrimeServer() {
  return createServer((req, res) => {
    if (req.url.startsWith('/primes')) {
      const urlParams = new URL(req.url, `http://${req.headers.host}`);
      const limit = parseInt(urlParams.searchParams.get('limit'), 10) || 10000;
      const primes = findPrimes(limit);
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ limit, primes }));
    } else {
      // Always answer; otherwise requests to other paths hang until timeout.
      res.writeHead(404, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Not found' }));
    }
  });
}

function startServer() {
  const server = createPrimeServer();
  server.listen(3000, () => console.log('Server running on http://localhost:3000'));
}

startServer();
```
## Step 2: Load Testing the API

We can use wrk, an HTTP benchmarking tool, to simulate multiple simultaneous requests and observe performance.

```shell
wrk -t4 -c500 -d10s "http://localhost:3000/primes?limit=10000"
```
Results without clusters:

```text
4 threads and 500 connections
  Latency     1.00s   577.21ms   1.98s   58.16%
  Req/Sec    25.29     12.11     50.00   70.90%
489 requests in 10.10s, 2.87MB read
Socket errors: connect 0, read 13, write 0, timeout 391
Requests/sec:     48.42
Transfer/sec:    290.82KB
```
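These numbers reflect a fully blocked event loop: while `findPrimes` runs, the process cannot accept or answer any other request. A standalone sketch, reusing `findPrimes` exactly as defined in Step 1, makes the per-request cost visible:

```javascript
// Same naive prime search the server uses.
function findPrimes(limit) {
  const primes = [];
  for (let i = 2; i <= limit; i++) {
    let isPrime = true;
    for (let j = 2; j < i; j++) {
      if (i % j === 0) { isPrime = false; break; }
    }
    if (isPrime) primes.push(i);
  }
  return primes;
}

const start = performance.now();
const primes = findPrimes(10000);
const elapsed = performance.now() - start;

// There are 1229 primes below 10000; the elapsed milliseconds are pure CPU
// time during which the event loop handles nothing else.
console.log(`${primes.length} primes found in ${elapsed.toFixed(1)}ms`);
```

Multiply that elapsed time by 500 concurrent connections and the timeouts above are no surprise: requests queue behind each other on a single thread.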
## Step 3: Improving the API with Clusters
Now let's implement the Node.js cluster module to split the workload across multiple processes:
```javascript
import cluster from 'node:cluster';
import { createServer } from 'node:http';
import { cpus } from 'node:os';

function findPrimes(limit) {
  const primes = [];
  for (let i = 2; i <= limit; i++) {
    let isPrime = true;
    for (let j = 2; j < i; j++) {
      if (i % j === 0) { isPrime = false; break; }
    }
    if (isPrime) primes.push(i);
  }
  return primes;
}

function createPrimeServer() {
  return createServer((req, res) => {
    if (req.url.startsWith('/primes')) {
      const urlParams = new URL(req.url, `http://${req.headers.host}`);
      const limit = parseInt(urlParams.searchParams.get('limit'), 10) || 10000;
      const primes = findPrimes(limit);
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ limit, primes }));
    } else {
      res.writeHead(404, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Not found' }));
    }
  });
}

function startServer() {
  const server = createPrimeServer();
  server.listen(3000, () => console.log(`Worker ${process.pid} running`));
}

function handleWorkerExit() {
  // Replace any crashed worker so the API keeps its full capacity.
  cluster.on('exit', (worker) => {
    console.log(`Worker ${worker.process.pid} died. Spawning a new worker...`);
    cluster.fork();
  });
}

function initializeApp(startServerCallback) {
  if (cluster.isPrimary) {
    const numCPUs = cpus().length;
    console.log(`Primary PID: ${process.pid}. Forking ${numCPUs} workers...`);
    for (let i = 0; i < numCPUs; i++) cluster.fork();
    handleWorkerExit();
  } else {
    startServerCallback();
  }
}

initializeApp(startServer);
## Step 4: Load Testing the Optimized Version

Running the same load test against the clustered version:

```shell
wrk -t4 -c500 -d10s "http://localhost:3000/primes?limit=10000"
```
Results with clusters:

```text
4 threads and 500 connections
  Latency   983.49ms  215.39ms   1.95s   81.92%
  Req/Sec    76.26     47.21    333.00   79.52%
2902 requests in 10.09s, 17.02MB read
Socket errors: connect 0, read 0, write 0, timeout 87
Requests/sec:    288.52
Transfer/sec:      1.69MB
```
## Results Analysis
| Metric | Without Clusters | With Clusters | Improvement |
|---|---|---|---|
| Requests/sec | 48.42 | 288.52 | ~6x |
| Transfer/sec | 290.82 KB | 1.69 MB | ~6x |
| Read errors | 13 | 0 | eliminated |
| Timeouts | 391 | 87 | 78% fewer |
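The improvement factors follow directly from the raw wrk numbers; a quick arithmetic check (only values from the benchmark output above):

```javascript
// Throughput: requests per second, clustered vs. single process.
const reqImprovement = 288.52 / 48.42;              // ≈ 5.96

// Transfer rate: convert 1.69 MB to KB (wrk uses binary units) first.
const transferImprovement = (1.69 * 1024) / 290.82; // ≈ 5.95

// Timeout reduction as a percentage of the original 391 timeouts.
const timeoutReduction = ((391 - 87) / 391) * 100;  // ≈ 77.7, i.e. ~78% fewer

console.log(reqImprovement.toFixed(2), transferImprovement.toFixed(2), timeoutReduction.toFixed(1));
```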
Average latency dropped slightly (1.00s to 983ms) and became far more consistent: the standard deviation fell from 577ms to 215ms, meaning the server now responds predictably under load.
## Conclusion
The results show that introducing clusters brought significant improvements across all tested aspects:
- Performance: Nearly 6x increase in requests processed per second
- Reliability: Dramatic reduction in errors and timeouts
- Efficiency: Transfer rate increased proportionally
Using clusters not only improves performance but also makes the application more robust and scalable. This approach is highly recommended for APIs that process computationally expensive tasks or face high traffic volumes.