STD-INFRA-067: Load Testing Standards

| Owner | Classification | Review Date | Status |
|---|---|---|---|
| Infrastructure | Internal | April 2027 | Active |

| Field | Value |
|---|---|
| Owner | Platform Engineering |
| Approved By | CDO |
| Date | 2026-04-03 |
| Review Cycle | Quarterly |
| Last Review | |

Purpose

This standard defines mandatory load testing requirements for Simpaisa's payment platform, which processes 270M+ transactions worth over $1B annually across five markets (PK, BD, NP, IQ, EG). Load testing ensures that Go microservices behind KrakenD can handle baseline, peak, and surge traffic without degradation.

Scope

Applies to all services in the payment processing path: KrakenD gateway, Pay-In, Pay-Out, Remittance, and Cards services, plus shared services (merchant-svc, auth-svc). Load testing is recommended, but not mandatory, for non-payment services (analytics, reporting).

Tooling

  • Primary tool: k6 (Go-based, scriptable in JavaScript). Selected for Go ecosystem alignment, CI/CD integration, and support for gRPC and HTTP/2.

  • Test scripts: Stored in infra/loadtest/ in each service repository and committed alongside application code.

  • Results storage: k6 results are exported to InfluxDB via the k6-to-influxdb extension, with Grafana dashboards for trending.

  • Execution environment: Dedicated k6 runners in the test-* Kubernetes namespace. Never run load tests against production.

Mandatory Testing Gates

Pre-Release Load Test

Every service release that touches payment endpoints MUST pass a load test before promotion to production. Failures block the release pipeline.

Test Profiles

| Profile | Description | Duration | Target RPS (per service) |
|---|---|---|---|
| Baseline | Normal weekday traffic | 15 min | Per-service baseline (see below) |
| Peak | Friday salary-day traffic (PK/BD) | 15 min | 3× baseline |
| Surge | Eid/Black Friday spike | 15 min | 5× baseline |
| Soak | Sustained load to detect memory leaks and connection pool exhaustion | 4 hours | 1.5× baseline |
| Stress | Ramp to breaking point to find capacity ceiling | 30 min (ramp) | Ramp from baseline to 10× baseline |
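
The profiles above map naturally onto k6 arrival-rate scenarios. The following is a minimal sketch, not a mandated implementation: the helper names (`buildScenario`, `vuPool`) and the VU pool sizing are hypothetical, while the executor names and fields (`constant-arrival-rate`, `ramping-arrival-rate`, `rate`, `startRate`, `stages`, `preAllocatedVUs`, `maxVUs`) are standard k6 scenario options.

```javascript
// Multipliers and durations taken from the Test Profiles table.
const PROFILES = {
  baseline: { multiplier: 1, duration: '15m' },
  peak: { multiplier: 3, duration: '15m' },
  surge: { multiplier: 5, duration: '15m' },
  soak: { multiplier: 1.5, duration: '4h' },
};

// ASSUMPTION: VU pool sizing (1 VU per 10 RPS, 4x headroom) is illustrative
// only; real values depend on how long each request takes.
function vuPool(rps) {
  const preAllocatedVUs = Math.max(1, Math.ceil(rps / 10));
  return { preAllocatedVUs, maxVUs: preAllocatedVUs * 4 };
}

// Returns a k6 scenario object for the given profile and service baseline.
function buildScenario(profile, baselineRps) {
  if (profile === 'stress') {
    // Stress: ramp from baseline to 10x baseline over 30 minutes.
    const ceiling = baselineRps * 10;
    return {
      executor: 'ramping-arrival-rate',
      startRate: baselineRps,
      timeUnit: '1s',
      stages: [{ target: ceiling, duration: '30m' }],
      ...vuPool(ceiling),
    };
  }
  const p = PROFILES[profile];
  if (!p) throw new Error(`unknown profile: ${profile}`);
  // All other profiles hold a constant request rate for the full duration.
  const rate = Math.round(baselineRps * p.multiplier);
  return {
    executor: 'constant-arrival-rate',
    rate,
    timeUnit: '1s',
    duration: p.duration,
    ...vuPool(rate),
  };
}
```

In a k6 script the result would typically be exposed through the exported options, for example `export const options = { scenarios: { peak: buildScenario('peak', 1500) } };`. Arrival-rate executors are used here because the standard specifies targets in requests per second rather than in concurrent users.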

Service Baselines

| Service | Baseline RPS | Peak RPS | Surge RPS |
|---|---|---|---|
| KrakenD gateway | 3,000 | 9,000 | 15,000 |
| payin-svc | 1,500 | 4,500 | 7,500 |
| payout-svc | 800 | 2,400 | 4,000 |
| remit-svc | 400 | 1,200 | 2,000 |
| cards-svc | 600 | 1,800 | 3,000 |
| auth-svc | 2,000 | 6,000 | 10,000 |

Baselines are recalculated quarterly based on actual production traffic (see STD-INFRA-069).
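
The Peak and Surge columns are the 3× and 5× multiples of each baseline; the Soak (1.5×) and Stress ceiling (10×) targets are not tabulated but follow the same arithmetic. A small sketch of that derivation, with `BASELINE_RPS` and `targetsFor` as hypothetical names:

```javascript
// Baseline RPS per service, copied from the Service Baselines table.
const BASELINE_RPS = {
  'KrakenD gateway': 3000,
  'payin-svc': 1500,
  'payout-svc': 800,
  'remit-svc': 400,
  'cards-svc': 600,
  'auth-svc': 2000,
};

// Derives every profile target for one service from its baseline.
function targetsFor(service) {
  const base = BASELINE_RPS[service];
  if (base === undefined) throw new Error(`no baseline for ${service}`);
  return {
    baseline: base,
    peak: base * 3,
    surge: base * 5,
    soak: base * 1.5,
    stressCeiling: base * 10,
  };
}
```

For example, `targetsFor('payin-svc')` yields a soak target of 2,250 RPS and a stress ceiling of 15,000 RPS. Keeping only the baselines as data means the quarterly recalculation touches a single value per service.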

Pass/Fail Thresholds

| Metric | Threshold | Action on Breach |
|---|---|---|
| P50 latency | ≤100ms (gateway), ≤50ms (service) | Warning |
| P95 latency | ≤300ms (gateway), ≤150ms (service) | Release blocked |
| P99 latency | ≤1,000ms (gateway), ≤500ms (service) | Release blocked |
| Error rate (5xx) | <0.1% | Release blocked |
| Transaction success rate | ≥99.5% | Release blocked |
| CPU utilisation | <80% at peak | Warning |
| Memory utilisation | <85% at peak | Warning |
| Connection pool exhaustion | 0 occurrences | Release blocked |
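
The release-blocking latency and error limits can be expressed as k6 thresholds, since k6 exits with a non-zero status when a threshold fails and a CI stage can block on that. This is a sketch under stated assumptions: `buildThresholds` and the `gateway`/`service` tier names are hypothetical, the `checks` rate is only a stand-in for transaction success rate (it assumes the script calls `check()` once per transaction), and the warning-level rows (P50, CPU, memory) are deliberately left out because a failed threshold would block the release and k6 does not measure server resource usage.

```javascript
// Release-blocking latency limits in milliseconds, from the table above.
const LATENCY_LIMITS_MS = {
  gateway: { p95: 300, p99: 1000 },
  service: { p95: 150, p99: 500 },
};

// Builds a k6 `thresholds` object for a gateway or a backend service.
function buildThresholds(tier) {
  const limits = LATENCY_LIMITS_MS[tier];
  if (!limits) throw new Error(`unknown tier: ${tier}`);
  return {
    http_req_duration: [`p(95)<=${limits.p95}`, `p(99)<=${limits.p99}`],
    // NOTE: by default k6 marks any non-2xx/3xx response as failed, so this
    // is stricter than a 5xx-only error rate; a custom Rate metric would be
    // needed to count 5xx responses alone.
    http_req_failed: ['rate<0.001'],
    // ASSUMPTION: one check() per transaction approximates success rate.
    checks: ['rate>=0.995'],
  };
}
```

A script would typically use it as `export const options = { thresholds: buildThresholds('service') };`.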

Soak Testing Requirements

Soak tests run for 4 hours at 1.5× baseline and verify:

  • Memory stability: RSS does not grow more than 10% over the test duration (detects Go memory leaks, goroutine leaks).

  • Connection pool health: Database and Redis connection counts remain stable (no leak, no exhaustion).

  • Latency stability: P95 latency does not degrade more than 20% between the first and last hour.

  • Error accumulation: No increasing error rate trend over time.
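
Three of the checks above (memory, latency, and error trend) can be evaluated from periodic samples taken during the soak run. The sketch below assumes a hypothetical sample shape `{ tMin, rssMb, p95Ms, errorRate }` collected at regular intervals; it compares first and last RSS samples, approximates hourly P95 by averaging the sampled P95 values, and uses a least-squares slope with a small tolerance to detect a rising error rate. None of these choices is prescribed by the standard, and connection counts would come from database and Redis metrics not modelled here.

```javascript
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Least-squares slope of ys against xs (0 if xs has no spread).
function slope(xs, ys) {
  const mx = mean(xs);
  const my = mean(ys);
  let num = 0;
  let den = 0;
  for (let i = 0; i < xs.length; i++) {
    num += (xs[i] - mx) * (ys[i] - my);
    den += (xs[i] - mx) ** 2;
  }
  return den === 0 ? 0 : num / den;
}

// samples: ordered by tMin (minutes since start of the soak test).
// trendTolerance: ASSUMPTION, allowed error-rate slope per minute.
function evaluateSoak(samples, trendTolerance = 1e-9) {
  const first = samples[0];
  const last = samples[samples.length - 1];
  const firstHour = samples.filter((s) => s.tMin < 60);
  const lastHour = samples.filter((s) => s.tMin > last.tMin - 60);

  const rssGrowth = (last.rssMb - first.rssMb) / first.rssMb;
  const p95Degradation =
    mean(lastHour.map((s) => s.p95Ms)) / mean(firstHour.map((s) => s.p95Ms)) - 1;
  const errorSlope = slope(
    samples.map((s) => s.tMin),
    samples.map((s) => s.errorRate),
  );

  return {
    memoryStable: rssGrowth <= 0.10, // RSS growth of at most 10%
    latencyStable: p95Degradation <= 0.20, // P95 degradation of at most 20%
    noErrorTrend: errorSlope <= trendTolerance, // no rising error rate
  };
}
```

Comparing only the first and last RSS samples is the simplest reading of "over the test duration"; a team might prefer to skip a warm-up period before taking the first sample.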

Stress Testing Requirements

Stress tests ramp traffic from baseline to 10× baseline over 30 minutes to:

  • Identify breaking point: The RPS at which error rate exceeds 1% or P95 latency exceeds 2 seconds.

  • Validate graceful degradation: Services should return 429 (rate limited) or 503 (circuit open), not crash or corrupt data.

  • Document capacity ceiling: Results feed into quarterly capacity planning (STD-INFRA-069).

Reporting

  • Per-test report: k6 summary output stored as a CI/CD artefact. Includes all threshold results.

  • Trending dashboard: Grafana dashboard (Load Test Trends) shows P95 latency, throughput, and error rate over time per service.

  • Regression detection: If P95 latency increases by more than 20% between releases, the release is flagged for performance review.
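
Two of the rules above, the stress-test breaking point and release-to-release regression detection, reduce to simple comparisons over recorded results. A minimal sketch follows; the function names and the per-step sample shape `{ rps, errorRate, p95Ms }` are hypothetical, and the samples are assumed to be ordered by increasing RPS.

```javascript
// Returns the first RPS at which error rate exceeds 1% or P95 latency
// exceeds 2 seconds, or null if the ramp never reached a breaking point.
function findBreakingPoint(samples) {
  const hit = samples.find((s) => s.errorRate > 0.01 || s.p95Ms > 2000);
  return hit ? hit.rps : null;
}

// Flags a release whose P95 latency grew by more than 20% over the
// previous release's P95 for the same profile.
function isP95Regression(previousP95Ms, currentP95Ms) {
  return (currentP95Ms - previousP95Ms) / previousP95Ms > 0.20;
}
```

For instance, a ramp whose error rate first crosses 1% at 9,000 RPS has a breaking point of 9,000 RPS, and a P95 that moves from 100 ms to 125 ms between releases would be flagged for performance review.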

Scheduling

| Test Type | Trigger | Frequency |
|---|---|---|
| Baseline + Peak | Pre-release CI/CD gate | Every release |
| Surge | Manual or scheduled | Monthly |
| Soak | Scheduled | Weekly (Friday night) |
| Stress | Manual | Quarterly (before capacity review) |

Exceptions

Services that do not process payment transactions may request an exemption from mandatory pre-release load testing. Exemptions must be approved by the Platform Engineering lead and documented in the service's README.md.