
C4 Platform Services Container Diagram

| Field | Value |
|---|---|
| Status | Draft |
| Owner | Platform Engineering |
| Last Updated | 2026-04-03 |
| Applies To | Shared platform infrastructure |

1. Overview

This document describes the shared platform services that underpin all Simpaisa product lines. These services provide API gateway routing, workflow orchestration, messaging, data persistence, observability, identity management and edge networking. Product services (pay-in-svc, pay-out-svc, remit-svc, cards-svc) depend on these platform components but do not own them.

2. Platform Container Diagram

graph TB
    subgraph "Edge — Cloudflare"
        CF_CDN["Cloudflare CDN"]
        CF_WAF["Cloudflare WAF"]
        CF_Workers["Cloudflare Workers<br/>(Edge logic, rate limiting)"]
        CF_Pages["Cloudflare Pages<br/>(Merchant Portal MFEs)"]
        CF_DNS["Cloudflare DNS"]
    end

    subgraph "API Gateway"
        KrakenD["KrakenD<br/>API Gateway<br/>Auth validation, routing,<br/>response aggregation"]
    end

    subgraph "Identity & Access"
        ControlPlane["ControlPlane.com<br/>OIDC, RBAC, service identity,<br/>certificate authority"]
    end

    subgraph "Workflow Orchestration"
        Temporal["Temporal Server<br/>Durable workflows<br/>(payouts, remittances, recon)"]
    end

    subgraph "Messaging"
        NSQ_D["nsqd (× 3)<br/>Message broker"]
        NSQ_Lookup["nsqlookupd (× 2)<br/>Discovery"]
        NSQ_Admin["nsqadmin<br/>Admin UI"]
    end

    subgraph "Data Persistence"
        SurrealDB["SurrealDB Cluster<br/>(Primary — PK region)<br/>Transactions, merchants, config"]
        SurrealDB_BD["SurrealDB Cluster<br/>(Secondary — BD region)<br/>Bangladesh data residency"]
        Redis["Redis Cluster<br/>Caching, rate limiting,<br/>OTP, idempotency"]
        Meilisearch["Meilisearch<br/>Transaction search,<br/>merchant search"]
    end

    subgraph "Observability"
        OTelCollector["OpenTelemetry Collector<br/>Traces, metrics, logs ingestion"]
        Grafana["Grafana<br/>Dashboards, alerting"]
        Jaeger["Jaeger<br/>Distributed tracing UI"]
    end

    subgraph "Product Analytics"
        PostHog["PostHog<br/>Feature flags, analytics,<br/>session replay"]
    end

    CF_CDN --> CF_WAF
    CF_WAF --> CF_Workers
    CF_Workers --> KrakenD
    CF_DNS --> CF_CDN
    CF_Pages --> CF_CDN

    KrakenD -- "JWT validation" --> ControlPlane
    KrakenD -- "gRPC" --> ProductServices["Product Services<br/>(pay-in, pay-out, remit, cards)"]

    ProductServices --> Temporal
    ProductServices --> NSQ_D
    ProductServices --> SurrealDB
    ProductServices --> Redis
    ProductServices --> Meilisearch
    ProductServices --> OTelCollector
    ProductServices --> PostHog

    NSQ_D --> NSQ_Lookup
    NSQ_Admin --> NSQ_Lookup

    OTelCollector --> Grafana
    OTelCollector --> Jaeger

3. Service Dependency Matrix

Product Service KrakenD Temporal NSQ SurrealDB Redis Meilisearch OTel PostHog ControlPlane
pay-in-svc
pay-out-svc
remit-svc
cards-svc ✓ (CDE)
merchant-svc
notification-svc
fx-svc
fraud-svc
recon-svc
settlement-svc

4. Platform Component Details

4.1 KrakenD — API Gateway

| Capability | Configuration |
|---|---|
| Authentication | JWT validation via ControlPlane.com JWKS |
| Rate limiting | Per-merchant, per-endpoint |
| Request/response | JSON↔gRPC transcoding |
| Aggregation | Merge responses from multiple backends |
| Circuit breaker | Backend failure isolation |
| Telemetry | Export traces to OTel Collector |
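
To illustrate what "backend failure isolation" means in practice, the following is a minimal sketch of a consecutive-error circuit breaker. It is not KrakenD's implementation or configuration format; the class name, the `max_errors` and `open_seconds` parameters and their default values are assumptions chosen for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CircuitBreaker:
    """Illustrative breaker: opens after N consecutive backend errors."""

    max_errors: int = 3            # assumed threshold, not from this document
    open_seconds: float = 10.0     # assumed cool-down, not from this document
    clock: Callable[[], float] = time.monotonic
    _errors: int = 0
    _opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a request may be forwarded to the backend."""
        if self._opened_at is None:
            return True  # closed: traffic flows normally
        # Open: reject until the cool-down has elapsed, then allow trial requests.
        return self.clock() - self._opened_at >= self.open_seconds

    def record_success(self) -> None:
        """A successful call closes the breaker and resets the error count."""
        self._errors = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """A failed call counts towards opening (or re-opening) the breaker."""
        self._errors += 1
        if self._errors >= self.max_errors:
            self._opened_at = self.clock()
```

While the breaker is open, the gateway can fail fast instead of queueing requests against an unhealthy product service, which is what keeps one failing backend from degrading the others.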

4.2 Temporal — Workflow Orchestration

| Capability | Usage |
|---|---|
| Payout workflows | Validate → debit → transfer → settle |
| Remittance workflows | Quote → AML → disburse → settle |
| Reconciliation | Nightly file download → match → report |
| Retry policies | Per-activity, exponential back-off |
| Persistence | PostgreSQL backend for workflow state |
| Namespaces | One per environment (sandbox, prod) |
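
The "exponential back-off" retry row can be made concrete with a small calculation. The sketch below computes the wait before each retry from an initial interval, a back-off coefficient, an interval cap and an attempt limit. It does not call the Temporal SDK, and every numeric default is an illustrative assumption rather than a value taken from the platform's actual retry policies.

```python
from typing import List


def backoff_schedule(
    initial_s: float = 1.0,        # assumed first retry delay
    coefficient: float = 2.0,      # assumed multiplier between retries
    max_interval_s: float = 60.0,  # assumed cap on any single delay
    max_attempts: int = 5,         # assumed limit, counting the first try
) -> List[float]:
    """Return the delay in seconds before each retry of one activity."""
    waits: List[float] = []
    interval = initial_s
    # The first attempt runs immediately, so there are max_attempts - 1 retries.
    for _ in range(max_attempts - 1):
        waits.append(min(interval, max_interval_s))
        interval *= coefficient
    return waits
```

Because policies are set per activity, a slow bank transfer step could use a longer cap than a fast validation step within the same payout workflow.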

4.3 NSQ — Messaging

| Capability | Configuration |
|---|---|
| Topics | Per-domain event (txn.payin.completed, etc.) |
| Channels | Per-consumer service |
| Replication | 3-node nsqd cluster |
| Message TTL | 7 days (configurable per topic) |
| Max in-flight | 200 per channel |
| Dead letter | After 5 failed attempts → DLQ topic |
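
The dead-letter rule can be sketched as a consumer-side decision, assuming the consuming service inspects the message's delivery attempt count and republishes exhausted messages itself. The `.dlq` topic suffix and the function names are hypothetical; only the limit of 5 attempts comes from the table above.

```python
from typing import Tuple

MAX_ATTEMPTS = 5  # from the table: dead-letter after 5 failed attempts


def dlq_topic(topic: str) -> str:
    """Hypothetical naming convention for a topic's dead-letter topic."""
    return f"{topic}.dlq"


def on_handler_failure(topic: str, attempts: int) -> Tuple[str, str]:
    """Decide what to do after a handler fails to process a message.

    `attempts` is the delivery count of the message, starting at 1.
    Returns an (action, topic) pair: requeue on the same topic, or
    publish to the dead-letter topic once attempts are exhausted.
    """
    if attempts >= MAX_ATTEMPTS:
        return ("publish", dlq_topic(topic))
    return ("requeue", topic)
```

For example, a fourth failure of a `txn.payin.completed` message is requeued, while the fifth failure moves it to the dead-letter topic for manual inspection.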

4.4 SurrealDB — Primary Data Store

| Capability | Configuration |
|---|---|
| Deployment | Clustered, multi-node |
| PK region cluster | Primary (all markets except BD) |
| BD region cluster | Bangladesh data residency compliance |
| CDE cluster | Cards (PCI DSS isolated) |
| Replication | Synchronous within cluster |
| Encryption | AES-256 at rest, mTLS in transit |
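
The cluster split above implies a routing decision for every write. A minimal sketch follows, assuming the caller knows the market code and whether the record contains cardholder data. The function name, the cluster labels and the rule that cardholder data takes precedence over market are assumptions, not documented behaviour.

```python
def select_cluster(market: str, is_cardholder_data: bool = False) -> str:
    """Pick the SurrealDB cluster for a record (illustrative labels only)."""
    if is_cardholder_data:
        # Assumption: PCI DSS isolation takes precedence over region.
        return "cde"
    if market.strip().upper() == "BD":
        # Bangladesh data must stay in the BD region cluster.
        return "bd"
    # All other markets use the primary PK region cluster.
    return "pk"
```

Centralising this choice in one function keeps the data residency rule out of individual product services, where it would be easy to apply inconsistently.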

4.5 Redis — Caching

| Use Case | TTL |
|---|---|
| OTP codes | 5 minutes |
| Idempotency keys | 24 hours |
| FX rate cache | 120 seconds |
| Rate limit counters | Sliding window (60 s) |
| Session tokens | 30 minutes |
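
The TTLs above translate directly into seconds, and the 60-second sliding window can be sketched without a Redis client. The example below is an in-memory stand-in for illustration only: the dictionary keys, the class name, the `merchant:endpoint` key format and the request limit are assumptions, and a production version would keep its state in Redis rather than in process memory.

```python
from collections import deque
from typing import Deque, Dict

# TTLs from the table, expressed in seconds (key names are illustrative).
TTL_SECONDS = {
    "otp": 5 * 60,
    "idempotency": 24 * 60 * 60,
    "fx_rate": 120,
    "session": 30 * 60,
}
WINDOW_SECONDS = 60  # sliding window for rate limit counters


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within the trailing window."""

    def __init__(self, limit: int, window_s: float = WINDOW_SECONDS) -> None:
        self.limit = limit
        self.window_s = window_s
        self._hits: Dict[str, Deque[float]] = {}

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits.setdefault(key, deque())
        # Drop timestamps that have fallen out of the trailing window.
        while hits and hits[0] <= now - self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: reject without recording
        hits.append(now)
        return True
```

A key such as `"m_123:/v1/payin"` (hypothetical format) would give each merchant an independent budget per endpoint, matching the per-merchant, per-endpoint rate limiting described for the gateway.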

4.6 Observability Stack

graph LR
    Services["Product Services"] -- "OTLP gRPC" --> Collector["OTel Collector"]
    Collector -- "Traces" --> Jaeger
    Collector -- "Metrics" --> Prometheus["Prometheus"]
    Collector -- "Logs" --> Loki["Loki"]
    Prometheus --> Grafana
    Loki --> Grafana
    Jaeger --> Grafana
    Grafana -- "Alerts" --> PagerDuty["PagerDuty"]
    Grafana -- "Alerts" --> Slack["Slack"]

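All services emit traces, metrics and logs via the OpenTelemetry SDK over OTLP gRPC. The OTel Collector routes traces to Jaeger, metrics to Prometheus and logs to Loki. Grafana reads from all three backends to provide unified dashboards, and sends alerts to PagerDuty and Slack.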

4.7 PostHog — Product Analytics & Feature Flags

| Capability | Usage |
|---|---|
| Feature flags | Progressive rollout of new channels |
| Analytics events | Merchant portal usage tracking |
| Session replay | Debug merchant portal issues |
| Experimentation | A/B test fraud rule thresholds |
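
Progressive rollout relies on assigning each merchant or user to a stable bucket, so that raising the percentage only ever adds participants. The sketch below shows that general idea with a hash-based bucket. It is not PostHog's algorithm or API; the function name and the way the flag key and identifier are combined are assumptions made for illustration.

```python
import hashlib


def in_rollout(flag_key: str, distinct_id: str, percentage: float) -> bool:
    """Return True if this identifier falls inside the rollout percentage.

    The same (flag_key, distinct_id) pair always maps to the same bucket,
    so a merchant enabled at 10% stays enabled at 50%.
    """
    digest = hashlib.sha256(f"{flag_key}.{distinct_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < percentage / 100.0
```

Including the flag key in the hash means a merchant who lands in the first 10% for one new channel is not automatically in the first 10% for every other flag.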

4.8 ControlPlane.com — Identity & Access

| Capability | Usage |
|---|---|
| OIDC provider | Merchant Portal SSO |
| Service identity | mTLS certificate issuance |
| RBAC | Role-based access for portal users |
| K8s integration | Workload identity for pods |

5. Architectural Decision Records

Changes to platform services require an ADR in /Standards/ADR/.