Monitoring Quiz API Performance with Prometheus and Grafana
Instrument your quiz API with Prometheus metrics, build Grafana dashboards, and set up alerts that catch problems before users notice.
You Cannot Fix What You Cannot See
Your quiz API might be running fine right now, but when 500 students hit it during an exam, will you know about the latency spike before they start complaining? Monitoring with Prometheus and Grafana gives you visibility into request latency, error rates, database performance, and quiz-specific metrics like completion rates and scoring distributions.
This guide walks you through instrumenting a Node.js quiz API, defining custom metrics, building dashboards, and creating alerts.
Prerequisites
- Node.js quiz API (Express or Fastify)
- Docker for running Prometheus and Grafana locally
- Basic understanding of HTTP metrics
Setting Up prom-client
Install the Prometheus client library:
```bash
npm install prom-client
```
Create a metrics module at src/metrics.ts:
```typescript
import {
  Registry,
  Counter,
  Histogram,
  Gauge,
  collectDefaultMetrics,
} from "prom-client";

export const registry = new Registry();

// Collect Node.js runtime metrics (memory, CPU, event loop)
collectDefaultMetrics({ register: registry });

// HTTP request metrics
export const httpRequestDuration = new Histogram({
  name: "http_request_duration_seconds",
  help: "Duration of HTTP requests in seconds",
  labelNames: ["method", "route", "status_code"],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
  registers: [registry],
});

export const httpRequestTotal = new Counter({
  name: "http_requests_total",
  help: "Total number of HTTP requests",
  labelNames: ["method", "route", "status_code"],
  registers: [registry],
});

// Quiz-specific metrics
export const quizCompletionDuration = new Histogram({
  name: "quiz_completion_duration_seconds",
  help: "Time taken to complete a quiz",
  labelNames: ["quiz_id", "difficulty"],
  buckets: [30, 60, 120, 300, 600, 900, 1800],
  registers: [registry],
});

export const quizScore = new Histogram({
  name: "quiz_score_percentage",
  help: "Distribution of quiz scores as percentages",
  labelNames: ["quiz_id", "difficulty"],
  buckets: [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
  registers: [registry],
});

export const quizSubmissions = new Counter({
  name: "quiz_submissions_total",
  help: "Total quiz submissions",
  labelNames: ["quiz_id", "difficulty", "passed"],
  registers: [registry],
});

export const activeQuizSessions = new Gauge({
  name: "active_quiz_sessions",
  help: "Number of currently active quiz sessions",
  registers: [registry],
});

// Database metrics
export const dbQueryDuration = new Histogram({
  name: "db_query_duration_seconds",
  help: "Duration of database queries",
  labelNames: ["operation", "table"],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1],
  registers: [registry],
});

export const dbConnectionPool = new Gauge({
  name: "db_connection_pool_size",
  help: "Current database connection pool size",
  labelNames: ["state"],
  registers: [registry],
});
```
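Bucket boundaries deserve thought up front: Prometheus histograms are cumulative, meaning an observation increments every bucket whose upper bound is at or above the observed value. A minimal sketch of that behavior in plain TypeScript (no prom-client dependency), using the HTTP latency buckets above:

```typescript
// Cumulative histogram bucketing, as Prometheus does it:
// an observation increments every bucket with le >= value.
const buckets = [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5];

function bucketCounts(observations: number[]): Map<number, number> {
  const counts = new Map<number, number>(
    buckets.map((b) => [b, 0] as [number, number])
  );
  for (const v of observations) {
    for (const b of buckets) {
      if (v <= b) counts.set(b, (counts.get(b) ?? 0) + 1);
    }
  }
  return counts;
}

// Three requests: 40ms, 120ms, 3s
const counts = bucketCounts([0.04, 0.12, 3]);
console.log(counts.get(0.05)); // 1 - only the 40ms request
console.log(counts.get(0.25)); // 2 - 40ms and 120ms
console.log(counts.get(5));    // 3 - all three
```

This cumulative structure is what lets PromQL's histogram_quantile estimate percentiles later, so pick boundaries that straddle the latencies you actually care about.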
Instrumenting Express
Add middleware to capture HTTP metrics:
```typescript
import express from "express";
import { registry, httpRequestDuration, httpRequestTotal } from "./metrics";

const app = express();

// Metrics endpoint for Prometheus to scrape. Registered before the timing
// middleware, so scrape requests themselves are not counted in the stats.
app.get("/metrics", async (req, res) => {
  res.setHeader("Content-Type", registry.contentType);
  res.send(await registry.metrics());
});

// Request duration middleware
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();
  res.on("finish", () => {
    const route = req.route?.path || req.path;
    const labels = {
      method: req.method,
      route: normalizeRoute(route),
      status_code: res.statusCode.toString(),
    };
    end(labels);
    httpRequestTotal.inc(labels);
  });
  next();
});

// Normalize route paths to avoid high cardinality
function normalizeRoute(path: string): string {
  return path
    .replace(/\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/g, "/:id")
    .replace(/\/\d+/g, "/:id")
    .replace(/\/cuid_[a-z0-9]+/g, "/:id");
}
```
High cardinality is the most common Prometheus mistake: every unique label value creates a new time series, so using raw paths with embedded IDs as label values means one series per quiz ID. The normalizeRoute function collapses these into generic patterns. The same caution applies to the quiz_id label on the quiz metrics above; it is only safe while your quiz catalog stays small and bounded.
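To sanity-check the normalization, the helper can be exercised on its own (repeated here so the snippet runs standalone):

```typescript
// Same helper as in the middleware, repeated so this snippet is self-contained.
function normalizeRoute(path: string): string {
  return path
    .replace(/\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/g, "/:id")
    .replace(/\/\d+/g, "/:id")
    .replace(/\/cuid_[a-z0-9]+/g, "/:id");
}

console.log(normalizeRoute("/api/v1/quizzes/42/submit"));
// /api/v1/quizzes/:id/submit
console.log(normalizeRoute("/api/v1/quizzes/3f2b8c1a-0d4e-4f6a-9b2c-1a2b3c4d5e6f"));
// /api/v1/quizzes/:id
```

Note that non-numeric segments like /v1 survive untouched, since the digit pattern only matches when a slash is followed immediately by digits.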
Instrumenting Quiz Logic
Add metrics to your quiz submission handler:
```typescript
import {
  quizCompletionDuration,
  quizScore,
  quizSubmissions,
  activeQuizSessions,
} from "./metrics";

app.post("/api/v1/quizzes/:id/submit", async (req, res) => {
  const { id: quizId } = req.params;
  const { answers, startedAt } = req.body;

  try {
    const quiz = await getQuiz(quizId);
    const result = calculateScore(quiz, answers);

    // Record completion time
    if (startedAt) {
      const durationSeconds = (Date.now() - new Date(startedAt).getTime()) / 1000;
      quizCompletionDuration.observe(
        { quiz_id: quizId, difficulty: quiz.difficulty },
        durationSeconds
      );
    }

    // Record score distribution
    const percentage = (result.score / result.total) * 100;
    quizScore.observe(
      { quiz_id: quizId, difficulty: quiz.difficulty },
      percentage
    );

    // Count submissions
    const passed = percentage >= 70;
    quizSubmissions.inc({
      quiz_id: quizId,
      difficulty: quiz.difficulty,
      passed: passed.toString(),
    });

    // Decrement active sessions
    activeQuizSessions.dec();

    res.json(result);
  } catch (err) {
    // The session stays counted as active here, so the client can retry.
    res.status(500).json({ error: "Submission failed" });
  }
});

// Track when quizzes start
app.post("/api/v1/quizzes/:id/start", async (req, res) => {
  activeQuizSessions.inc();
  // ... start logic
});
```
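The handler above leans on a calculateScore helper that the excerpt does not show. A minimal sketch, assuming each question stores a single correctAnswer and answers maps question IDs to the submitted choice (both shapes are hypothetical; your schema may differ):

```typescript
// Hypothetical shapes - the real quiz schema may differ.
interface Question {
  id: string;
  correctAnswer: string;
}

interface Quiz {
  questions: Question[];
}

function calculateScore(
  quiz: Quiz,
  answers: Record<string, string>
): { score: number; total: number } {
  let score = 0;
  for (const q of quiz.questions) {
    // Award a point when the submitted answer matches the stored one.
    if (answers[q.id] === q.correctAnswer) score++;
  }
  return { score, total: quiz.questions.length };
}

const quiz: Quiz = {
  questions: [
    { id: "q1", correctAnswer: "b" },
    { id: "q2", correctAnswer: "d" },
  ],
};
console.log(calculateScore(quiz, { q1: "b", q2: "a" })); // { score: 1, total: 2 }
```

Whatever the real implementation, returning both score and total keeps the percentage calculation in the handler straightforward.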
Database Query Instrumentation
Wrap your database client to capture query metrics:
```typescript
import { Pool, QueryResult } from "pg";
import { dbQueryDuration, dbConnectionPool } from "./metrics";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Monitor connection pool
setInterval(() => {
  dbConnectionPool.set({ state: "total" }, pool.totalCount);
  dbConnectionPool.set({ state: "idle" }, pool.idleCount);
  dbConnectionPool.set({ state: "waiting" }, pool.waitingCount);
}, 5000);

// Instrumented query function
export async function query(
  text: string,
  params?: unknown[]
): Promise<QueryResult> {
  // split(/\s+/) tolerates newlines and tabs before the first keyword
  const operation = text.trim().split(/\s+/)[0].toUpperCase();
  const table = extractTableName(text);
  const end = dbQueryDuration.startTimer({ operation, table });
  try {
    const result = await pool.query(text, params);
    end();
    return result;
  } catch (err) {
    // Stop the timer on failure too, so errors still count toward latency
    end();
    throw err;
  }
}

function extractTableName(sql: string): string {
  const match = sql.match(/(?:FROM|INTO|UPDATE|JOIN)\s+(\w+)/i);
  return match?.[1] ?? "unknown";
}
```
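A quick check of the parsing helpers (repeated here so the snippet runs standalone; extractOperation mirrors the first-keyword extraction inside query):

```typescript
// Same table-name helper as above, repeated so this snippet is self-contained.
function extractTableName(sql: string): string {
  const match = sql.match(/(?:FROM|INTO|UPDATE|JOIN)\s+(\w+)/i);
  return match?.[1] ?? "unknown";
}

// Mirrors the first-keyword extraction used in the instrumented query function.
function extractOperation(sql: string): string {
  return sql.trim().split(/\s+/)[0].toUpperCase();
}

console.log(extractTableName("SELECT * FROM quizzes WHERE id = $1")); // quizzes
console.log(extractTableName("UPDATE quizzes SET title = $1"));       // quizzes
console.log(extractOperation("insert INTO submissions (quiz_id) VALUES ($1)")); // INSERT
```

Both helpers are heuristics: multi-table joins and CTEs will attribute everything to the first table name found, which is usually good enough for dashboard-level grouping.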
Prometheus Configuration
Create prometheus.yml:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: "quiz-api"
    metrics_path: "/metrics"
    static_configs:
      - targets: ["host.docker.internal:3000"]
        labels:
          environment: "production"
```
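Prometheus labels every scraped target with its job name. Once the stack is running (see the Docker Compose setup below), the built-in up series confirms scraping is working; a value of 1 means the last scrape succeeded:

```promql
up{job="quiz-api"}
```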
Alerting Rules
Create alert_rules.yml:
```yaml
groups:
  - name: quiz-api-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status_code=~"5.."}[5m]))
            /
          sum(rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "More than 5% of requests are returning 5xx errors"

      - alert: HighLatency
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High API latency"
          description: "95th percentile latency is above 2 seconds"

      - alert: DatabaseSlowQueries
        expr: |
          histogram_quantile(0.99,
            sum(rate(db_query_duration_seconds_bucket[5m])) by (le)) > 1
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "Slow database queries"
          description: "99th percentile query duration is above 1 second"

      - alert: LowQuizPassRate
        expr: |
          sum(rate(quiz_submissions_total{passed="true"}[1h]))
            /
          sum(rate(quiz_submissions_total[1h])) < 0.2
        for: 30m
        labels:
          severity: info
        annotations:
          summary: "Low quiz pass rate"
          description: "Less than 20% of submissions are passing - questions may be too difficult"
```

Note the `sum(...) by (le)` inside histogram_quantile: aggregating the bucket series before taking the quantile gives one overall percentile, matching what the alert descriptions claim.
Docker Compose Setup
Run Prometheus and Grafana locally:
```yaml
# docker-compose.monitoring.yml
services:
  prometheus:
    image: prom/prometheus:v2.53.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./alert_rules.yml:/etc/prometheus/alert_rules.yml
    extra_hosts:
      - "host.docker.internal:host-gateway"

  grafana:
    image: grafana/grafana:11.1.0
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  grafana-data:
```
Start the stack:
```bash
docker compose -f docker-compose.monitoring.yml up -d
```
Grafana Dashboard
After connecting Prometheus as a data source in Grafana (from inside the compose network, the data source URL is http://prometheus:9090), create panels with these PromQL queries:
Request rate:

```promql
sum(rate(http_requests_total[5m])) by (route)
```

95th percentile latency:

```promql
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, route))
```

Error rate:

```promql
sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
```

Median quiz score per quiz (0.5 quantile of the score histogram):

```promql
histogram_quantile(0.5, sum(rate(quiz_score_percentage_bucket[1h])) by (le, quiz_id))
```

Active sessions:

```promql
active_quiz_sessions
```
Summary
Prometheus and Grafana give you complete visibility into your quiz API. The combination of standard HTTP metrics, database performance tracking, and quiz-specific metrics like score distributions and pass rates lets you understand both technical and product health.
Key points:
- Normalize route paths to avoid high-cardinality label problems
- Instrument both the HTTP layer and the business logic layer
- Set alerts on error rates, latency, and slow database queries
- Track quiz-specific metrics like pass rates to catch content problems
- Use histograms with meaningful bucket boundaries for latency and scores