
Structured Logging Standard

| Owner | Classification | Review Date | Status |
| --- | --- | --- | --- |
| Engineering | Internal | April 2027 | Active |


Version: 1.0
Last Updated: 2026-04-03
Owner: Platform Team
Status: Active

Purpose

Define a consistent, machine-parseable logging format across all Simpaisa services. Every log entry must support distributed tracing, payment debugging, and regulatory audit requirements.

Log Format

All services MUST emit JSON-structured logs. One JSON object per line, no multi-line entries.

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| timestamp | string | ISO 8601 with milliseconds and timezone: 2026-04-03T14:22:01.123Z |
| level | string | One of: TRACE, DEBUG, INFO, WARN, ERROR, FATAL |
| service | string | Service name, e.g. payin-svc, payout-svc, remittance-svc |
| traceId | string | OpenTelemetry trace ID (32-char hex). Propagated from the incoming request |
| spanId | string | OpenTelemetry span ID (16-char hex) |
| message | string | Human-readable description of the event |

Contextual Fields (when applicable)

| Field | Type | Description |
| --- | --- | --- |
| merchantId | string | Merchant identifier |
| transactionId | string | Simpaisa transaction reference |
| channelName | string | Upstream channel, e.g. jazzcash, easypaisa, bkash |
| amount | number | Transaction amount |
| currency | string | ISO 4217 currency code: PKR, BDT, NPR, IQD |
| transactionStatus | string | Current state per TRANSACTION-LIFECYCLE-STANDARD.md |
| error | object | { "code": "string", "message": "string", "upstream": "string" } |
| durationMs | number | Operation duration in milliseconds |
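Putting the two tables together, a minimal sketch of a compliant emitter. This is Python purely for illustration; `make_log_entry` and its signature are hypothetical, not the shared logging library:

```python
import json
from datetime import datetime, timezone

def make_log_entry(level, service, trace_id, span_id, message, **context):
    """Build one log entry per this standard: the six required fields,
    plus whichever contextual fields are actually set."""
    entry = {
        # Required fields
        "timestamp": datetime.now(timezone.utc)
            .isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "level": level,
        "service": service,
        "traceId": trace_id,
        "spanId": span_id,
        "message": message,
    }
    # Contextual fields: omit unset keys rather than logging null
    entry.update({k: v for k, v in context.items() if v is not None})
    return json.dumps(entry, separators=(",", ":"))  # one object, one line

line = make_log_entry("INFO", "payin-svc",
                      "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4", "1a2b3c4d5e6f7a8b",
                      "Payment completed successfully",
                      merchantId="MCH-001", amount=1500.00, currency="PKR")
```

Serializing the whole entry in one `json.dumps` call is what guarantees the "one JSON object per line, no multi-line entries" rule.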

Log Levels

| Level | Usage | Examples |
| --- | --- | --- |
| TRACE | Fine-grained diagnostics. Never in production | Step-through of signature computation |
| DEBUG | Development and sandbox troubleshooting. Disabled in production by default | Full request/response bodies, parsed field values |
| INFO | Normal operational events | Transaction initiated, payment completed, webhook delivered |
| WARN | Recoverable issues that need attention | Channel retry triggered, rate limit approaching 80%, certificate expiry <30 days |
| ERROR | Failures requiring investigation | Channel timeout, signature verification failed, database connection lost |
| FATAL | Service cannot continue | Configuration missing, database unreachable on startup, TLS cert invalid |

PII Masking

All Personally Identifiable Information MUST be masked before logging. See PII-HANDLING-STANDARD.md for the complete policy.

| Data Type | Masking Rule | Example |
| --- | --- | --- |
| MSISDN | Show last 4 digits | ****4567 |
| Account number | Show last 4 digits | ****7890 |
| Card number (PAN) | First 6 + last 4 (BIN preserved) | 424242******4242 |
| CNIC/NID | Show last 4 digits | *****4321 |
| OTP values | NEVER logged under any circumstance | — |
| Email | Mask local part | d***@example.com |

Services MUST apply masking at the logger level, not the caller. Use the shared logmask package.
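The masking rules above can be expressed as small pure functions. The following is an illustrative Python sketch of the rules in the table, not the actual logmask package:

```python
def mask_msisdn(msisdn: str) -> str:
    """Show last 4 digits only, e.g. ****4567."""
    return "****" + msisdn[-4:]

def mask_pan(pan: str) -> str:
    """Preserve the BIN (first 6) and last 4; mask everything in between."""
    return pan[:6] + "*" * (len(pan) - 10) + pan[-4:]

def mask_email(email: str) -> str:
    """Keep the first character of the local part, mask the rest."""
    local, domain = email.split("@", 1)
    return local[0] + "***@" + domain
```

Applying these inside the logger (rather than at each call site) is what makes the rule enforceable: callers cannot forget to mask.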

Correlation & Tracing

  • Every inbound request at KrakenD generates a traceId if none exists (W3C Trace Context traceparent header).

  • All downstream service calls propagate the same traceId.

  • Every log line includes traceId and spanId from the OpenTelemetry context.

  • Async operations (webhooks, queue consumers) MUST carry the originating traceId.
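As a sketch of the async rule, assuming a simple in-memory list standing in for the real broker: the producer stores the originating traceId in the message envelope, and the consumer reuses it while minting a fresh spanId for its own hop. The envelope shape here is an assumption for illustration.

```python
import json
import secrets

def publish(queue, payload, trace_id):
    """Producer side: carry the originating traceId in the message envelope."""
    queue.append(json.dumps({"traceId": trace_id, "payload": payload}))

def consume(queue):
    """Consumer side: reuse the propagated traceId; mint a new span for this hop."""
    envelope = json.loads(queue.pop(0))
    trace_id = envelope["traceId"]      # reuse -- do NOT generate a new trace ID
    span_id = secrets.token_hex(8)      # fresh 16-char hex span ID
    return trace_id, span_id, envelope["payload"]

queue = []
publish(queue, {"event": "webhook.retry"}, "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4")
trace_id, span_id, payload = consume(queue)
```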

Log Pipeline

Services → OpenTelemetry Collector → OpenSearch → Grafana (dashboards/alerts)
  • Services emit logs to stdout (JSON).

  • OpenTelemetry Collector scrapes/receives logs, enriches with resource attributes, and forwards.

  • OpenSearch is the primary log store and query interface.

  • Grafana reads from OpenSearch for dashboards and alerting.
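A hedged sketch of what the Collector stage might look like; the receiver, exporter, paths, and endpoint below are assumptions for illustration, not the team's actual configuration:

```yaml
# Illustrative OpenTelemetry Collector pipeline (all names are assumptions)
receivers:
  filelog:
    include: [ /var/log/pods/*/*/*.log ]   # container stdout (JSON lines)
processors:
  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert                     # enrich with resource attributes
exporters:
  opensearch:
    http:
      endpoint: https://opensearch.internal:9200
service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [resource]
      exporters: [opensearch]
```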

Retention

| Tier | Duration | Storage | Purpose |
| --- | --- | --- | --- |
| Hot | 90 days | OpenSearch | Active investigation, dashboards |
| Warm | 1 year | OpenSearch (warm nodes) | Historical debugging, trend analysis |
| Cold | 7 years | Object storage (S3-compatible) | Regulatory compliance (SBP, Bangladesh Bank) |

Log Volume Management

  • Production: Default level INFO. DEBUG enabled per-service via feature flag for time-limited troubleshooting (auto-revert after 1 hour).

  • Never log full request/response bodies at INFO or above. Use DEBUG level only.

  • Never log credentials, API keys, or signing keys at any level.

  • Batch operations: log summary (count, success, failure) not individual items.
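The batch-summary rule could look like the sketch below; the batchCount/batchSuccess/batchFailure field names are illustrative, not defined by this standard:

```python
import json
from datetime import datetime, timezone

def batch_summary(service, trace_id, span_id, operation, results):
    """One INFO summary line per batch (counts only), never per-item entries."""
    ok = sum(1 for r in results if r)
    return json.dumps({
        "timestamp": datetime.now(timezone.utc)
            .isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "level": "INFO",
        "service": service,
        "traceId": trace_id,
        "spanId": span_id,
        "message": f"Batch {operation} finished",
        "batchCount": len(results),
        "batchSuccess": ok,
        "batchFailure": len(results) - ok,
    })

summary = batch_summary("payout-svc", "c" * 32, "d" * 16,
                        "disbursement", [True, True, False])
```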

Example Log Entries

Successful Payment

{
  "timestamp": "2026-04-03T14:22:01.123Z",
  "level": "INFO",
  "service": "payin-svc",
  "traceId": "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4",
  "spanId": "1a2b3c4d5e6f7a8b",
  "message": "Payment completed successfully",
  "merchantId": "MCH-001",
  "transactionId": "TXN-20260403-00042",
  "channelName": "jazzcash",
  "amount": 1500.00,
  "currency": "PKR",
  "transactionStatus": "COMPLETED",
  "durationMs": 2340
}

Failed Payment

{
  "timestamp": "2026-04-03T14:22:05.456Z",
  "level": "ERROR",
  "service": "payin-svc",
  "traceId": "b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5",
  "spanId": "2b3c4d5e6f7a8b9c",
  "message": "Payment failed: insufficient funds",
  "merchantId": "MCH-002",
  "transactionId": "TXN-20260403-00043",
  "channelName": "easypaisa",
  "amount": 25000.00,
  "currency": "PKR",
  "transactionStatus": "FAILED",
  "error": { "code": "INSUFFICIENT_FUNDS", "message": "Account balance too low", "upstream": "EP-4012" },
  "durationMs": 1820
}

Channel Timeout

{
  "timestamp": "2026-04-03T14:22:30.789Z",
  "level": "ERROR",
  "service": "payin-svc",
  "traceId": "c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6",
  "spanId": "3c4d5e6f7a8b9c0d",
  "message": "Channel request timed out after 30000ms",
  "merchantId": "MCH-001",
  "transactionId": "TXN-20260403-00044",
  "channelName": "bkash",
  "transactionStatus": "PROCESSING",
  "error": { "code": "CHANNEL_TIMEOUT", "message": "No response within 30s", "upstream": null },
  "durationMs": 30000
}

Rate Limit Hit

{
  "timestamp": "2026-04-03T14:23:00.111Z",
  "level": "WARN",
  "service": "krakend-gateway",
  "traceId": "d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1",
  "spanId": "4d5e6f7a8b9c0d1e",
  "message": "Rate limit exceeded for merchant",
  "merchantId": "MCH-003",
  "error": { "code": "RATE_LIMIT_EXCEEDED", "message": "100 req/min limit reached", "upstream": null }
}

Compliance

  • All services MUST pass log format validation in CI (linter checks JSON structure against this schema).

  • Log masking is verified by automated tests — any log containing raw PII fails the build.

  • Quarterly audit: sample 1000 log entries across services and verify PII masking compliance.
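The CI structure check described above could be sketched as a per-line validator against this schema (illustrative only; the real linter may differ):

```python
import json
import re

REQUIRED = {"timestamp": str, "level": str, "service": str,
            "traceId": str, "spanId": str, "message": str}
LEVELS = {"TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"}

def validate_log_line(line: str) -> list[str]:
    """Return a list of violations for one log line (empty list = compliant)."""
    try:
        entry = json.loads(line)
    except ValueError:
        return ["not valid JSON"]
    if not isinstance(entry, dict):
        return ["not a JSON object"]
    errors = []
    for field, ftype in REQUIRED.items():
        if field not in entry:
            errors.append(f"missing required field: {field}")
        elif not isinstance(entry[field], ftype):
            errors.append(f"wrong type for {field}")
    if entry.get("level") not in LEVELS:
        errors.append("invalid level")
    if not re.fullmatch(r"[0-9a-f]{32}", entry.get("traceId", "")):
        errors.append("traceId is not 32-char hex")
    if not re.fullmatch(r"[0-9a-f]{16}", entry.get("spanId", "")):
        errors.append("spanId is not 16-char hex")
    return errors
```

Running this over a sample of emitted lines in CI fails the build on the first structural violation.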