Monitoring Quiz API Performance with Prometheus and Grafana
Instrument your quiz API with Prometheus metrics, build Grafana dashboards, and set up alerts that catch problems before users notice.
You Cannot Fix What You Cannot See
Your quiz API might be running fine right now, but when 500 students hit it during an exam, will you know about the latency spike before they start complaining? Monitoring with Prometheus and Grafana gives you visibility into request latency, error rates, database performance, and quiz-specific metrics like completion rates and scoring distributions.
This guide walks you through instrumenting a Node.js quiz API, defining custom metrics, building dashboards, and creating alerts.
Prerequisites
- Node.js quiz API (Express or Fastify)
- Docker for running Prometheus and Grafana locally
- Basic understanding of HTTP metrics
Setting Up prom-client
Install the Prometheus client library:
```bash
npm install prom-client
```
Create a metrics module at src/metrics.ts:
```typescript
import {
  Registry,
  Counter,
  Histogram,
  Gauge,
  collectDefaultMetrics,
} from "prom-client";

export const registry = new Registry();

// Collect Node.js runtime metrics (memory, CPU, event loop)
collectDefaultMetrics({ register: registry });

// HTTP request metrics
export const httpRequestDuration = new Histogram({
  name: "http_request_duration_seconds",
  help: "Duration of HTTP requests in seconds",
  labelNames: ["method", "route", "status_code"],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
  registers: [registry],
});

export const httpRequestTotal = new Counter({
  name: "http_requests_total",
  help: "Total number of HTTP requests",
  labelNames: ["method", "route", "status_code"],
  registers: [registry],
});

// Quiz-specific metrics
export const quizCompletionDuration = new Histogram({
  name: "quiz_completion_duration_seconds",
  help: "Time taken to complete a quiz",
  labelNames: ["quiz_id", "difficulty"],
  buckets: [30, 60, 120, 300, 600, 900, 1800],
  registers: [registry],
});

export const quizScore = new Histogram({
  name: "quiz_score_percentage",
  help: "Distribution of quiz scores as percentages",
  labelNames: ["quiz_id", "difficulty"],
  buckets: [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
  registers: [registry],
});

export const quizSubmissions = new Counter({
  name: "quiz_submissions_total",
  help: "Total quiz submissions",
  labelNames: ["quiz_id", "difficulty", "passed"],
  registers: [registry],
});

export const activeQuizSessions = new Gauge({
  name: "active_quiz_sessions",
  help: "Number of currently active quiz sessions",
  registers: [registry],
});

// Database metrics
export const dbQueryDuration = new Histogram({
  name: "db_query_duration_seconds",
  help: "Duration of database queries",
  labelNames: ["operation", "table"],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1],
  registers: [registry],
});

export const dbConnectionPool = new Gauge({
  name: "db_connection_pool_size",
  help: "Current database connection pool size",
  labelNames: ["state"],
  registers: [registry],
});
```
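Bucket boundaries deserve thought up front: Prometheus histograms are cumulative, meaning an observation increments every bucket whose upper bound is at or above the observed value. A minimal sketch of that behavior in plain TypeScript (no prom-client dependency), using the HTTP latency buckets above:

```typescript
// Cumulative histogram bucketing, as Prometheus does it:
// an observation increments every bucket with le >= value.
const buckets = [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5];

function bucketCounts(observations: number[]): Map<number, number> {
  const counts = new Map<number, number>(
    buckets.map((b) => [b, 0] as [number, number])
  );
  for (const v of observations) {
    for (const b of buckets) {
      if (v <= b) counts.set(b, (counts.get(b) ?? 0) + 1);
    }
  }
  return counts;
}

// Three requests: 40ms, 120ms, 3s
const counts = bucketCounts([0.04, 0.12, 3]);
console.log(counts.get(0.05)); // 1 - only the 40ms request
console.log(counts.get(0.25)); // 2 - 40ms and 120ms
console.log(counts.get(5));    // 3 - all three
```

This cumulative structure is what lets PromQL's histogram_quantile estimate percentiles later, so pick boundaries that straddle the latencies you actually care about.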
Instrumenting Express
Add middleware to capture HTTP metrics:
```typescript
import express from "express";
import { registry, httpRequestDuration, httpRequestTotal } from "./metrics";

const app = express();

// Metrics endpoint for Prometheus to scrape. Registered before the timing
// middleware, so scrape requests themselves are not counted in the stats.
app.get("/metrics", async (req, res) => {
  res.setHeader("Content-Type", registry.contentType);
  res.send(await registry.metrics());
});

// Request duration middleware
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();
  res.on("finish", () => {
    const route = req.route?.path || req.path;
    const labels = {
      method: req.method,
      route: normalizeRoute(route),
      status_code: res.statusCode.toString(),
    };
    end(labels);
    httpRequestTotal.inc(labels);
  });
  next();
});

// Normalize route paths to avoid high cardinality
function normalizeRoute(path: string): string {
  return path
    .replace(/\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/g, "/:id")
    .replace(/\/\d+/g, "/:id")
    .replace(/\/cuid_[a-z0-9]+/g, "/:id");
}
```
High cardinality is the most common Prometheus mistake: every unique label value creates a new time series, so using raw paths with embedded IDs as label values means one series per quiz ID. The normalizeRoute function collapses these into generic patterns. The same caution applies to the quiz_id label on the quiz metrics above; it is only safe while your quiz catalog stays small and bounded.
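To sanity-check the normalization, the helper can be exercised on its own (repeated here so the snippet runs standalone):

```typescript
// Same helper as in the middleware, repeated so this snippet is self-contained.
function normalizeRoute(path: string): string {
  return path
    .replace(/\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/g, "/:id")
    .replace(/\/\d+/g, "/:id")
    .replace(/\/cuid_[a-z0-9]+/g, "/:id");
}

console.log(normalizeRoute("/api/v1/quizzes/42/submit"));
// /api/v1/quizzes/:id/submit
console.log(normalizeRoute("/api/v1/quizzes/3f2b8c1a-0d4e-4f6a-9b2c-1a2b3c4d5e6f"));
// /api/v1/quizzes/:id
```

Note that non-numeric segments like /v1 survive untouched, since the digit pattern only matches when a slash is followed immediately by digits.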
Instrumenting Quiz Logic
Add metrics to your quiz submission handler:
```typescript
import {
  quizCompletionDuration,
  quizScore,
  quizSubmissions,
  activeQuizSessions,
} from "./metrics";

app.post("/api/v1/quizzes/:id/submit", async (req, res) => {
  const { id: quizId } = req.params;
  const { answers, startedAt } = req.body;

  try {
    const quiz = await getQuiz(quizId);
    const result = calculateScore(quiz, answers);

    // Record completion time
    if (startedAt) {
      const durationSeconds = (Date.now() - new Date(startedAt).getTime()) / 1000;
      quizCompletionDuration.observe(
        { quiz_id: quizId, difficulty: quiz.difficulty },
        durationSeconds
      );
    }

    // Record score distribution
    const percentage = (result.score / result.total) * 100;
    quizScore.observe(
      { quiz_id: quizId, difficulty: quiz.difficulty },
      percentage
    );

    // Count submissions
    const passed = percentage >= 70;
    quizSubmissions.inc({
      quiz_id: quizId,
      difficulty: quiz.difficulty,
      passed: passed.toString(),
    });

    // Decrement active sessions
    activeQuizSessions.dec();

    res.json(result);
  } catch (err) {
    // The session stays counted as active here, so the client can retry.
    res.status(500).json({ error: "Submission failed" });
  }
});

// Track when quizzes start
app.post("/api/v1/quizzes/:id/start", async (req, res) => {
  activeQuizSessions.inc();
  // ... start logic
});
```
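The handler above leans on a calculateScore helper that the excerpt does not show. A minimal sketch, assuming each question stores a single correctAnswer and answers maps question IDs to the submitted choice (both shapes are hypothetical; your schema may differ):

```typescript
// Hypothetical shapes - the real quiz schema may differ.
interface Question {
  id: string;
  correctAnswer: string;
}

interface Quiz {
  questions: Question[];
}

function calculateScore(
  quiz: Quiz,
  answers: Record<string, string>
): { score: number; total: number } {
  let score = 0;
  for (const q of quiz.questions) {
    // Award a point when the submitted answer matches the stored one.
    if (answers[q.id] === q.correctAnswer) score++;
  }
  return { score, total: quiz.questions.length };
}

const quiz: Quiz = {
  questions: [
    { id: "q1", correctAnswer: "b" },
    { id: "q2", correctAnswer: "d" },
  ],
};
console.log(calculateScore(quiz, { q1: "b", q2: "a" })); // { score: 1, total: 2 }
```

Whatever the real implementation, returning both score and total keeps the percentage calculation in the handler straightforward.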
Database Query Instrumentation
Wrap your database client to capture query metrics:
```typescript
import { Pool, QueryResult } from "pg";
import { dbQueryDuration, dbConnectionPool } from "./metrics";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Monitor connection pool
setInterval(() => {
  dbConnectionPool.set({ state: "total" }, pool.totalCount);
  dbConnectionPool.set({ state: "idle" }, pool.idleCount);
  dbConnectionPool.set({ state: "waiting" }, pool.waitingCount);
}, 5000);

// Instrumented query function
export async function query(
  text: string,
  params?: unknown[]
): Promise<QueryResult> {
  // split(/\s+/) tolerates newlines and tabs before the first keyword
  const operation = text.trim().split(/\s+/)[0].toUpperCase();
  const table = extractTableName(text);
  const end = dbQueryDuration.startTimer({ operation, table });
  try {
    const result = await pool.query(text, params);
    end();
    return result;
  } catch (err) {
    // Stop the timer on failure too, so errors still count toward latency
    end();
    throw err;
  }
}

function extractTableName(sql: string): string {
  const match = sql.match(/(?:FROM|INTO|UPDATE|JOIN)\s+(\w+)/i);
  return match?.[1] ?? "unknown";
}
```
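A quick check of the parsing helpers (repeated here so the snippet runs standalone; extractOperation mirrors the first-keyword extraction inside query):

```typescript
// Same table-name helper as above, repeated so this snippet is self-contained.
function extractTableName(sql: string): string {
  const match = sql.match(/(?:FROM|INTO|UPDATE|JOIN)\s+(\w+)/i);
  return match?.[1] ?? "unknown";
}

// Mirrors the first-keyword extraction used in the instrumented query function.
function extractOperation(sql: string): string {
  return sql.trim().split(/\s+/)[0].toUpperCase();
}

console.log(extractTableName("SELECT * FROM quizzes WHERE id = $1")); // quizzes
console.log(extractTableName("UPDATE quizzes SET title = $1"));       // quizzes
console.log(extractOperation("insert INTO submissions (quiz_id) VALUES ($1)")); // INSERT
```

Both helpers are heuristics: multi-table joins and CTEs will attribute everything to the first table name found, which is usually good enough for dashboard-level grouping.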
Prometheus Configuration
Create prometheus.yml:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: "quiz-api"
    metrics_path: "/metrics"
    static_configs:
      - targets: ["host.docker.internal:3000"]
        labels:
          environment: "production"
```
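Prometheus labels every scraped target with its job name. Once the stack is running (see the Docker Compose setup below), the built-in up series confirms scraping is working; a value of 1 means the last scrape succeeded:

```promql
up{job="quiz-api"}
```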
Alerting Rules
Create alert_rules.yml:
```yaml
groups:
  - name: quiz-api-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status_code=~"5.."}[5m]))
            /
          sum(rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "More than 5% of requests are returning 5xx errors"

      - alert: HighLatency
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High API latency"
          description: "95th percentile latency is above 2 seconds"

      - alert: DatabaseSlowQueries
        expr: |
          histogram_quantile(0.99,
            sum(rate(db_query_duration_seconds_bucket[5m])) by (le)) > 1
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "Slow database queries"
          description: "99th percentile query duration is above 1 second"

      - alert: LowQuizPassRate
        expr: |
          sum(rate(quiz_submissions_total{passed="true"}[1h]))
            /
          sum(rate(quiz_submissions_total[1h])) < 0.2
        for: 30m
        labels:
          severity: info
        annotations:
          summary: "Low quiz pass rate"
          description: "Less than 20% of submissions are passing - questions may be too difficult"
```

Note the `sum(...) by (le)` inside histogram_quantile: aggregating the bucket series before taking the quantile gives one overall percentile, matching what the alert descriptions claim.
Docker Compose Setup
Run Prometheus and Grafana locally:
```yaml
# docker-compose.monitoring.yml
services:
  prometheus:
    image: prom/prometheus:v2.53.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./alert_rules.yml:/etc/prometheus/alert_rules.yml
    extra_hosts:
      - "host.docker.internal:host-gateway"

  grafana:
    image: grafana/grafana:11.1.0
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  grafana-data:
```
Start the stack:
```bash
docker compose -f docker-compose.monitoring.yml up -d
```
Grafana Dashboard
After connecting Prometheus as a data source in Grafana (from inside the compose network, the data source URL is http://prometheus:9090), create panels with these PromQL queries:
Request rate:

```promql
sum(rate(http_requests_total[5m])) by (route)
```

95th percentile latency:

```promql
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, route))
```

Error rate:

```promql
sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
```

Median quiz score per quiz (0.5 quantile of the score histogram):

```promql
histogram_quantile(0.5, sum(rate(quiz_score_percentage_bucket[1h])) by (le, quiz_id))
```

Active sessions:

```promql
active_quiz_sessions
```
Summary
Prometheus and Grafana give you complete visibility into your quiz API. The combination of standard HTTP metrics, database performance tracking, and quiz-specific metrics like score distributions and pass rates lets you understand both technical and product health.
Key points:
- Normalize route paths to avoid high-cardinality label problems
- Instrument both the HTTP layer and the business logic layer
- Set alerts on error rates, latency, and slow database queries
- Track quiz-specific metrics like pass rates to catch content problems
- Use histograms with meaningful bucket boundaries for latency and scores