Engineering

Monitoring Quiz API Performance with Prometheus and Grafana

Instrument your quiz API with Prometheus metrics, build Grafana dashboards, and set up alerts that catch problems before users notice.

Bobby Iliev · 2026-04-08 · 8 min read

You Cannot Fix What You Cannot See

Your quiz API might be running fine right now, but when 500 students hit it during an exam, will you know about the latency spike before they start complaining? Monitoring with Prometheus and Grafana gives you visibility into request latency, error rates, database performance, and quiz-specific metrics like completion rates and scoring distributions.

This guide walks you through instrumenting a Node.js quiz API, defining custom metrics, building dashboards, and creating alerts.

Prerequisites

  • Node.js quiz API (Express or Fastify)
  • Docker for running Prometheus and Grafana locally
  • Basic understanding of HTTP metrics

Setting Up prom-client

Install the Prometheus client library:

npm install prom-client

Create a metrics module at src/metrics.ts:

import {
  Registry,
  Counter,
  Histogram,
  Gauge,
  collectDefaultMetrics,
} from "prom-client";

export const registry = new Registry();

// Collect Node.js runtime metrics (memory, CPU, event loop)
collectDefaultMetrics({ register: registry });

// HTTP request metrics
export const httpRequestDuration = new Histogram({
  name: "http_request_duration_seconds",
  help: "Duration of HTTP requests in seconds",
  labelNames: ["method", "route", "status_code"],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
  registers: [registry],
});

export const httpRequestTotal = new Counter({
  name: "http_requests_total",
  help: "Total number of HTTP requests",
  labelNames: ["method", "route", "status_code"],
  registers: [registry],
});

// Quiz-specific metrics
export const quizCompletionDuration = new Histogram({
  name: "quiz_completion_duration_seconds",
  help: "Time taken to complete a quiz",
  labelNames: ["quiz_id", "difficulty"],
  buckets: [30, 60, 120, 300, 600, 900, 1800],
  registers: [registry],
});

export const quizScore = new Histogram({
  name: "quiz_score_percentage",
  help: "Distribution of quiz scores as percentages",
  labelNames: ["quiz_id", "difficulty"],
  buckets: [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
  registers: [registry],
});

export const quizSubmissions = new Counter({
  name: "quiz_submissions_total",
  help: "Total quiz submissions",
  labelNames: ["quiz_id", "difficulty", "passed"],
  registers: [registry],
});

export const activeQuizSessions = new Gauge({
  name: "active_quiz_sessions",
  help: "Number of currently active quiz sessions",
  registers: [registry],
});

// Database metrics
export const dbQueryDuration = new Histogram({
  name: "db_query_duration_seconds",
  help: "Duration of database queries",
  labelNames: ["operation", "table"],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1],
  registers: [registry],
});

export const dbConnectionPool = new Gauge({
  name: "db_connection_pool_size",
  help: "Current database connection pool size",
  labelNames: ["state"],
  registers: [registry],
});
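A note on those bucket arrays: Prometheus histogram buckets are cumulative, meaning an observation is counted in every bucket whose upper bound is greater than or equal to the value, plus an implicit +Inf bucket. A standalone sketch of that assignment logic (illustrative only, not prom-client internals):

```typescript
// Buckets from the httpRequestDuration histogram above.
const buckets = [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5];

// Return the upper bounds whose counters a single observation increments.
// Prometheus buckets are cumulative, so a 0.3s request lands in every
// bucket with an upper bound >= 0.3, plus the implicit +Inf bucket.
function bucketsHit(value: number): (number | "+Inf")[] {
  const hit: (number | "+Inf")[] = buckets.filter((le) => value <= le);
  hit.push("+Inf");
  return hit;
}

console.log(bucketsHit(0.3)); // upper bounds 0.5, 1, 2.5, 5 and +Inf
```

This is why bucket boundaries should bracket the latencies you actually care about: a quiz submission that takes 12 seconds is indistinguishable from one that takes 2 minutes if both only land in +Inf.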

Instrumenting Express

Add middleware to capture HTTP metrics:

import express from "express";
import { registry, httpRequestDuration, httpRequestTotal } from "./metrics";

const app = express();

// Metrics endpoint for Prometheus to scrape
app.get("/metrics", async (req, res) => {
  res.setHeader("Content-Type", registry.contentType);
  res.send(await registry.metrics());
});

// Request duration middleware
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();

  res.on("finish", () => {
    const route = req.route?.path || req.path;
    const labels = {
      method: req.method,
      route: normalizeRoute(route),
      status_code: res.statusCode.toString(),
    };

    end(labels);
    httpRequestTotal.inc(labels);
  });

  next();
});

// Normalize route paths to avoid high cardinality
function normalizeRoute(path: string): string {
  return path
    .replace(/\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/g, "/:id")
    .replace(/\/\d+/g, "/:id")
    .replace(/\/cuid_[a-z0-9]+/g, "/:id");
}

High cardinality is the most common Prometheus mistake. If you use raw paths with IDs as label values, you create a new time series for every unique quiz ID. The normalizeRoute function collapses these into generic patterns.
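A quick sanity check of the normalization, with the function copied inline so the snippet runs on its own:

```typescript
// Same regexes as normalizeRoute above, inlined so this runs standalone.
function normalizeRoute(path: string): string {
  return path
    .replace(/\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/g, "/:id")
    .replace(/\/\d+/g, "/:id")
    .replace(/\/cuid_[a-z0-9]+/g, "/:id");
}

// Numeric and UUID path segments collapse to a single ":id" placeholder,
// so every quiz shares one time series per route instead of one each.
console.log(normalizeRoute("/api/v1/quizzes/42/submit"));
// -> /api/v1/quizzes/:id/submit
console.log(normalizeRoute("/api/v1/quizzes/0f8fad5b-d9cb-469f-a165-70867728950e"));
// -> /api/v1/quizzes/:id
```

If your IDs use a different format, extend the regex list accordingly; the goal is a bounded set of label values regardless of how many quizzes exist.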

Instrumenting Quiz Logic

Add metrics to your quiz submission handler:

import {
  quizCompletionDuration,
  quizScore,
  quizSubmissions,
  activeQuizSessions,
} from "./metrics";

app.post("/api/v1/quizzes/:id/submit", async (req, res) => {
  const { id: quizId } = req.params;
  const { answers, startedAt } = req.body;

  try {
    const quiz = await getQuiz(quizId);
    const result = calculateScore(quiz, answers);

    // Record completion time
    if (startedAt) {
      const durationSeconds = (Date.now() - new Date(startedAt).getTime()) / 1000;
      quizCompletionDuration.observe(
        { quiz_id: quizId, difficulty: quiz.difficulty },
        durationSeconds
      );
    }

    // Record score distribution
    const percentage = (result.score / result.total) * 100;
    quizScore.observe(
      { quiz_id: quizId, difficulty: quiz.difficulty },
      percentage
    );

    // Count submissions
    const passed = percentage >= 70;
    quizSubmissions.inc({
      quiz_id: quizId,
      difficulty: quiz.difficulty,
      passed: passed.toString(),
    });

    // Decrement active sessions
    activeQuizSessions.dec();

    res.json(result);
  } catch (err) {
    console.error("Quiz submission failed:", err);
    res.status(500).json({ error: "Submission failed" });
  }
});

// Track when quizzes start
app.post("/api/v1/quizzes/:id/start", async (req, res) => {
  activeQuizSessions.inc();
  // ... start logic
});

Database Query Instrumentation

Wrap your database client to capture query metrics:

import { Pool, QueryResult } from "pg";
import { dbQueryDuration, dbConnectionPool } from "./metrics";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Monitor connection pool
setInterval(() => {
  dbConnectionPool.set({ state: "total" }, pool.totalCount);
  dbConnectionPool.set({ state: "idle" }, pool.idleCount);
  dbConnectionPool.set({ state: "waiting" }, pool.waitingCount);
}, 5000);

// Instrumented query function
export async function query(
  text: string,
  params?: unknown[]
): Promise<QueryResult> {
  const operation = text.trim().split(/\s+/)[0].toUpperCase();
  const table = extractTableName(text);

  const end = dbQueryDuration.startTimer({ operation, table });

  try {
    const result = await pool.query(text, params);
    end();
    return result;
  } catch (err) {
    end();
    throw err;
  }
}

function extractTableName(sql: string): string {
  const match = sql.match(/(?:FROM|INTO|UPDATE|JOIN)\s+(\w+)/i);
  return match?.[1] ?? "unknown";
}
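The regex is a heuristic: it picks up the first table name after FROM, INTO, UPDATE, or JOIN, which is enough for single-table queries but only reports the first table of a multi-join. Copied inline for a standalone check:

```typescript
// Same heuristic as extractTableName above, inlined so this runs standalone.
function extractTableName(sql: string): string {
  const match = sql.match(/(?:FROM|INTO|UPDATE|JOIN)\s+(\w+)/i);
  return match?.[1] ?? "unknown";
}

console.log(extractTableName("SELECT * FROM quizzes WHERE id = $1"));
// -> quizzes
console.log(extractTableName("INSERT INTO submissions (quiz_id) VALUES ($1)"));
// -> submissions
console.log(extractTableName("BEGIN"));
// -> unknown (no table keyword, falls back)
```

The fallback to "unknown" keeps label cardinality bounded even for statements like BEGIN or COMMIT that name no table.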

Prometheus Configuration

Create prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: "quiz-api"
    metrics_path: "/metrics"
    static_configs:
      - targets: ["host.docker.internal:3000"]
        labels:
          environment: "production"

Alerting Rules

Create alert_rules.yml:

groups:
  - name: quiz-api-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status_code=~"5.."}[5m]))
          /
          sum(rate(http_requests_total[5m]))
          > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "More than 5% of requests are returning 5xx errors"

      - alert: HighLatency
        expr: |
          histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
          > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High API latency"
          description: "95th percentile latency is above 2 seconds"

      - alert: DatabaseSlowQueries
        expr: |
          histogram_quantile(0.99, rate(db_query_duration_seconds_bucket[5m]))
          > 1
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "Slow database queries"
          description: "99th percentile query duration is above 1 second"

      - alert: LowQuizPassRate
        expr: |
          sum(rate(quiz_submissions_total{passed="true"}[1h]))
          /
          sum(rate(quiz_submissions_total[1h]))
          < 0.2
        for: 30m
        labels:
          severity: info
        annotations:
          summary: "Low quiz pass rate"
          description: "Less than 20% of submissions are passing - questions may be too difficult"

Docker Compose Setup

Run Prometheus and Grafana locally:

# docker-compose.monitoring.yml
services:
  prometheus:
    image: prom/prometheus:v2.53.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./alert_rules.yml:/etc/prometheus/alert_rules.yml
    extra_hosts:
      - "host.docker.internal:host-gateway"

  grafana:
    image: grafana/grafana:11.1.0
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  grafana-data:

Start the stack:

docker compose -f docker-compose.monitoring.yml up -d

Grafana Dashboard

After connecting Prometheus as a data source in Grafana, create panels with these PromQL queries:

Request rate:

sum(rate(http_requests_total[5m])) by (route)

95th percentile latency:

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, route))

Error rate:

sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

Quiz score distribution:

histogram_quantile(0.5, sum(rate(quiz_score_percentage_bucket[1h])) by (le, quiz_id))

Active sessions:

active_quiz_sessions

Summary

Prometheus and Grafana give you continuous visibility into your quiz API. The combination of standard HTTP metrics, database performance tracking, and quiz-specific metrics like score distributions and pass rates lets you monitor both technical health and product health.

Key points:

  • Normalize route paths to avoid high-cardinality label problems
  • Instrument both the HTTP layer and the business logic layer
  • Set alerts on error rates, latency, and slow database queries
  • Track quiz-specific metrics like pass rates to catch content problems
  • Use histograms with meaningful bucket boundaries for latency and scores
