Nov 23, 2024 · 7 min read

Improve Performance and Reduce Downtime in Your Node.js API with Clusters

High-Performance API Challenges

One of the main challenges in high-performance API development is ensuring the server can handle a large number of simultaneous requests without compromising performance. This is especially important in high-demand scenarios, where the number of requests can increase rapidly, impacting response time and user experience.

To solve this problem, it's crucial to adopt scalability and optimization strategies that allow the server to efficiently distribute requests. One such strategy is the use of clusters in Node.js.

What Are Clusters in Node.js?

In Node.js, the cluster module allows you to spawn multiple worker processes that share the same server port, making better use of machine resources, especially CPU cores. A single Node.js process runs your JavaScript on one thread, so on a multi-core machine it uses only one core and its processing capacity is limited under high load. Using clusters, it's possible to distribute the workload across different processes, improving performance and scalability.

Clusters allow each worker process to be responsible for a set of requests, while the primary process coordinates distribution, balancing the load among workers.
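A common starting heuristic is one worker per CPU core. You can check how many cores Node.js sees with the os module:

```javascript
// Report how many CPU cores Node.js can see on this machine.
// A common starting point is one cluster worker per core.
import { cpus } from 'node:os';

const coreCount = cpus().length;
console.log(`This machine reports ${coreCount} CPU cores`);
```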

Step 1: Creating a Simple API

First, let's create a simple API that performs a heavy computation: finding prime numbers. Prime calculation is computationally expensive, and because it runs synchronously on a single execution thread, the API can easily freeze when receiving multiple simultaneous requests.

import { createServer } from 'node:http';

// Naive trial division: deliberately CPU-heavy to stress the event loop
function findPrimes(limit) {
  const primes = [];
  for (let i = 2; i <= limit; i++) {
    let isPrime = true;
    for (let j = 2; j < i; j++) {
      if (i % j === 0) {
        isPrime = false;
        break;
      }
    }
    if (isPrime) primes.push(i);
  }
  return primes;
}

function createPrimeServer() {
  return createServer((req, res) => {
    if (req.url.startsWith('/primes')) {
      const urlParams = new URL(req.url, `http://${req.headers.host}`);
      const limit = parseInt(urlParams.searchParams.get('limit'), 10) || 10000;
      const primes = findPrimes(limit);
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ limit, primes }));
    } else {
      // Answer unknown routes so those requests don't hang with no response
      res.writeHead(404, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Not found' }));
    }
  });
}

function startServer() {
  const server = createPrimeServer();
  server.listen(3000, () => console.log('Server running on http://localhost:3000'));
}

startServer();
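To make the problem concrete, you can time a single call to findPrimes outside the server. While the loop runs, the event loop is blocked and this process cannot answer any other request. A standalone sketch (the exact elapsed time will vary by machine):

```javascript
// Same naive prime search the server uses
function findPrimes(limit) {
  const primes = [];
  for (let i = 2; i <= limit; i++) {
    let isPrime = true;
    for (let j = 2; j < i; j++) {
      if (i % j === 0) { isPrime = false; break; }
    }
    if (isPrime) primes.push(i);
  }
  return primes;
}

// Time a single synchronous call: while it runs, the event loop
// cannot accept or answer any other request in this process.
const start = process.hrtime.bigint();
const primes = findPrimes(10000);
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;

console.log(`Found ${primes.length} primes below 10000 in ${elapsedMs.toFixed(1)} ms`);
```

Every concurrent request pays this cost in sequence, which is exactly what the load test below makes visible.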

Step 2: Load Testing the API

We can use wrk to simulate multiple simultaneous requests and observe performance: 4 threads (-t4) holding 500 open connections (-c500) for 10 seconds (-d10s).

wrk -t4 -c500 -d10s "http://localhost:3000/primes?limit=10000"

Results without clusters:

4 threads and 500 connections
  Latency     1.00s   577.21ms   1.98s    58.16%
  Req/Sec    25.29     12.11    50.00     70.90%
489 requests in 10.10s, 2.87MB read
Socket errors: connect 0, read 13, write 0, timeout 391
Requests/sec:  48.42
Transfer/sec:  290.82KB

Step 3: Improving the API with Clusters

Now let's implement the Node.js cluster module to split the workload across multiple processes:

import cluster from 'node:cluster';
import { createServer } from 'node:http';
import { cpus } from 'node:os';

// Naive trial division: deliberately CPU-heavy to stress the event loop
function findPrimes(limit) {
  const primes = [];
  for (let i = 2; i <= limit; i++) {
    let isPrime = true;
    for (let j = 2; j < i; j++) {
      if (i % j === 0) { isPrime = false; break; }
    }
    if (isPrime) primes.push(i);
  }
  return primes;
}

function createPrimeServer() {
  return createServer((req, res) => {
    if (req.url.startsWith('/primes')) {
      const urlParams = new URL(req.url, `http://${req.headers.host}`);
      const limit = parseInt(urlParams.searchParams.get('limit'), 10) || 10000;
      const primes = findPrimes(limit);
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ limit, primes }));
    } else {
      // Answer unknown routes so those requests don't hang with no response
      res.writeHead(404, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Not found' }));
    }
  });
}

function startServer() {
  const server = createPrimeServer();
  server.listen(3000, () => console.log(`Worker ${process.pid} running`));
}

function handleWorkerExit() {
  cluster.on('exit', (worker) => {
    console.log(`Worker ${worker.process.pid} died. Spawning a new worker...`);
    cluster.fork();
  });
}

function initializeApp(startServerCallback) {
  if (cluster.isPrimary) {
    const numCPUs = cpus().length;
    console.log(`Master PID: ${process.pid}. Forking ${numCPUs} workers...`);
    for (let i = 0; i < numCPUs; i++) cluster.fork();
    handleWorkerExit();
  } else {
    startServerCallback();
  }
}

initializeApp(startServer);
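A detail worth knowing about how requests reach the workers: on most platforms the primary process accepts connections itself and hands them to workers in round-robin order, while on Windows the default leaves scheduling to the operating system. The policy can also be set explicitly, as long as it happens before the first fork():

```javascript
import cluster from 'node:cluster';

// SCHED_RR: the primary accepts connections and distributes them to
// workers round-robin (the default on every platform except Windows).
// SCHED_NONE: workers accept connections directly and the OS decides.
// Must be set before the first cluster.fork() call to take effect.
cluster.schedulingPolicy = cluster.SCHED_RR;

console.log(cluster.schedulingPolicy === cluster.SCHED_RR); // true
```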

Step 4: Load Testing the Optimized Version

wrk -t4 -c500 -d10s "http://localhost:3000/primes?limit=10000"

Results with clusters:

4 threads and 500 connections
  Latency   983.49ms  215.39ms   1.95s    81.92%
  Req/Sec    76.26     47.21   333.00     79.52%
2902 requests in 10.09s, 17.02MB read
Socket errors: connect 0, read 0, write 0, timeout 87
Requests/sec:  288.52
Transfer/sec:  1.69MB

Results Analysis

Metric            Without Clusters    With Clusters    Improvement
Requests/sec      48.42               288.52           ~6x
Transfer/sec      290.82 KB           1.69 MB          ~6x
Read errors       13                  0                eliminated
Timeouts          391                 87               78% fewer

Latency dropped slightly and became far more consistent — the standard deviation fell from 577ms to 215ms, meaning the server now responds predictably under load.
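The improvement figures above are easy to recompute from the raw wrk output:

```javascript
// Throughput: 288.52 req/s with clusters vs. 48.42 req/s without
const throughputGain = 288.52 / 48.42;
console.log(`~${throughputGain.toFixed(2)}x more requests per second`);

// Timeouts: 391 without clusters down to 87 with clusters
const timeoutReduction = ((391 - 87) / 391) * 100;
console.log(`${timeoutReduction.toFixed(0)}% fewer timeouts`);
```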

Conclusion

The results show that introducing clusters brought significant improvements across all tested aspects:

  • Performance: Nearly 6x increase in requests processed per second
  • Reliability: Dramatic reduction in errors and timeouts
  • Efficiency: Transfer rate increased proportionally

Using clusters not only improves performance but also makes the application more robust and scalable. This approach is highly recommended for APIs that process computationally expensive tasks or face high traffic volumes.