

Phoenix Architecture Document -- Simpaisa API Platform Rewrite

Document Metadata

  • Project: Phoenix
  • Organisation: Simpaisa Holdings
  • Author: CDO (Daniel O'Reilly) with Claude as pair programmer
  • Date: 2026-04-03
  • Status: Architecture Design -- Pre-Implementation
  • Classification: Internal -- Engineering Leadership

1. Service Architecture

1.1 Microservice Inventory

Phoenix decomposes into nine services plus the KrakenD gateway configuration (phoenix-gateway). This is fewer than the legacy platform's 15+ services because the legacy platform grew organically, with code forks (e.g., two disbursement schedulers) and single-purpose services (e.g., separate card-refund-reversal and auto-void-scheduler services). Phoenix consolidates by domain.

| Service | Domain | Repo name | Priority |
|---|---|---|---|
| phoenix-gateway | KrakenD configuration: rate limiting, JWT validation, mTLS termination, CORS, IP whitelisting | phoenix-gateway | Phase 1 |
| phoenix-auth | OAuth 2.0 token issuance, merchant credential management, key rotation | phoenix-auth | Phase 1 |
| phoenix-merchant | Merchant onboarding, configuration, feature flags, product assignment, webhook registration | phoenix-merchant | Phase 1 |
| phoenix-payin | Wallet pay-in (single charge, recurring/tokenisation), OTP flows, inquiry | phoenix-payin | Phase 2 |
| phoenix-payout | Domestic disbursements (1Link IBFT, Easypaisa, JazzCash, HBL), batch processing | phoenix-payout | Phase 3 |
| phoenix-remittance | Cross-border transfers (Bank of Asia, Faysal Bank, Trust Bank, 1Link), FX, AML | phoenix-remittance | Phase 4 |
| phoenix-card | Card payments (Alfalah MasterCard, Safepay), 3DS, capture, void, refund | phoenix-card | Phase 5 |
| phoenix-webhook | Outbound webhook delivery, retry, signing, dead-letter management | phoenix-webhook | Phase 2 |
| phoenix-proxy | Translation proxy for legacy API backward compatibility (maps v2 requests → Phoenix format) | phoenix-proxy | Phase 1 |
| phoenix-reconciliation | Settlement calculation, partner reconciliation, reporting | phoenix-reconciliation | Phase 3 |

1.2 Communication Patterns

Synchronous (merchant-facing): HTTP/JSON via KrakenD gateway. All merchant traffic enters through KrakenD, which handles JWT validation, rate limiting, mTLS termination, and routing. Services expose Echo v4.15 HTTP handlers behind the gateway.

Synchronous (inter-service): gRPC inter-service communication is planned but not yet implemented. Services currently operate independently: each service owns its own domain data and makes no synchronous calls to other services. The only gRPC server that exists today is in phoenix-merchant, exposed for future consumption by payment services. When inter-service calls are introduced, they will use protobuf-defined contracts; go-kratos will provide service discovery via the Kubernetes registry plugin.

Asynchronous (events): NSQ for all event-driven processing. Key topics:

| Topic | Producer(s) | Consumer(s) | Purpose |
|---|---|---|---|
| payment.initiated | payin, payout, remittance, card | webhook, reconciliation | New payment created |
| payment.completed | payin, payout, remittance, card | webhook, reconciliation | Payment succeeded |
| payment.failed | payin, payout, remittance, card | webhook, reconciliation | Payment failed |
| payment.status_changed | All payment services | webhook | Any state transition |
| webhook.deliver | webhook (self) | webhook | Retry delivery queue |
| webhook.dead_letter | webhook | reconciliation | Permanently failed deliveries |
| settlement.calculate | reconciliation (cron) | reconciliation | Daily settlement trigger |
| partner.callback | payin, payout, remittance | Respective service | Async partner responses |

Design rule: No service writes directly to another service's database tables. Cross-service data access is via NSQ event (current) or gRPC call (planned).

1.3 What Lives in KrakenD vs In Services

KrakenD handles ALL inbound cross-cutting concerns (stateless, config-driven):

  • JWT RS256 validation (public key from phoenix-auth JWKS endpoint)
  • Rate limiting (per-merchant, per-endpoint, sliding window) -- NO rate limiting in services
  • Circuit breaking on backend services (inbound: merchant → Phoenix) -- NO inbound circuit breaking in services
  • mTLS termination (client certificate validation)
  • CORS policy enforcement (no wildcard -- explicit merchant origins)
  • IP whitelisting (per-merchant, loaded from config)
  • Request/response transformation (envelope wrapping)
  • Request logging and correlation ID injection (X-Request-Id)
  • OpenAPI spec serving
  • Bot detection, request validation, payload size limits

Services handle ONLY business logic and outbound resilience:

  • Authentication token issuance (phoenix-auth only -- it issues tokens but does not validate them)
  • Business validation (amount limits, operator availability, merchant product access)
  • Partner API orchestration (adapter pattern)
  • Transaction state management
  • Idempotency enforcement (fail-closed, SurrealDB-backed -- KrakenD cannot do this)
  • Webhook payload construction and HMAC signing
  • Encryption/decryption of sensitive data
  • Outbound resilience (Phoenix → partner banks/wallets):
      ◦ Partner-level circuit breaker (sdk/resilience -- per partner, not per merchant)
      ◦ Retry with exponential backoff on partner calls
      ◦ Bulkhead (concurrency limiter per partner)
      ◦ Partner health monitoring and smart routing
      ◦ Client-side deduplication for partners without idempotency
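The partner-level circuit breaker in the list above can be sketched in a few lines of Go. This is a minimal, hypothetical sketch, not the sdk/resilience API: the Breaker and Registry names, the consecutive-failure policy and the 3-failure/30-second thresholds are all assumptions. The property it illustrates is the keying: one breaker per partner, never per merchant.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrCircuitOpen is returned when calls to a partner are short-circuited.
var ErrCircuitOpen = errors.New("partner circuit open")

// errPartnerTimeout stands in for a failed partner API call in the example.
var errPartnerTimeout = errors.New("partner timeout")

// Breaker is a minimal consecutive-failure circuit breaker for ONE partner.
type Breaker struct {
	mu          sync.Mutex
	maxFailures int           // consecutive failures before the circuit opens
	cooldown    time.Duration // how long the circuit stays open
	failures    int
	openedAt    time.Time
}

func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

// Call runs fn unless the circuit is open. Once the cooldown has elapsed,
// calls are let through again (a simplified half-open state that does not
// limit concurrent trial calls); a success resets the failure count.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	open := b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown
	b.mu.Unlock()
	if open {
		return ErrCircuitOpen
	}

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openedAt = time.Now() // (re)open, restarting the cooldown
		}
		return err
	}
	b.failures = 0
	return nil
}

// Registry holds one breaker per partner, so a failing Easypaisa
// integration never blocks JazzCash traffic.
type Registry struct {
	mu       sync.Mutex
	breakers map[string]*Breaker
}

// For returns the partner's breaker, creating it with illustrative
// thresholds (3 failures, 30s cooldown) on first use.
func (r *Registry) For(partner string) *Breaker {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.breakers == nil {
		r.breakers = map[string]*Breaker{}
	}
	b, ok := r.breakers[partner]
	if !ok {
		b = NewBreaker(3, 30*time.Second)
		r.breakers[partner] = b
	}
	return b
}

func main() {
	var reg Registry
	for i := 0; i < 4; i++ {
		fmt.Println(reg.For("easypaisa").Call(func() error { return errPartnerTimeout }))
	}
	fmt.Println(reg.For("jazzcash").Call(func() error { return nil }))
}
```

Retry and bulkhead wrappers would sit inside fn, so that retries against an open circuit are refused without touching the partner.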

Principle: KrakenD owns merchant→Phoenix. Services own Phoenix→partner. Services MUST NOT implement rate limiting, JWT validation, or inbound circuit breaking. Services trust that KrakenD has validated the JWT and extracted X-Merchant-Id.

1.4 Shared Libraries vs Service-Specific Code

Shared Go module: phoenix-sdk (lives in sdk/ within the monorepo, imported via replace directive as github.com/doreilly257/sp-apis/sdk)

| Package | Contents |
|---|---|
| sdk/envelope | Standard request/response envelope, error types, pagination |
| sdk/money | Decimal money type (wraps shopspring/decimal), currency codes, rounding rules |
| sdk/crypto | AES-256-GCM encrypt/decrypt, RSA-OAEP, HMAC-SHA256, key loading from Vault |
| sdk/idempotency | Idempotency middleware (SurrealDB-backed, fail-closed) |
| sdk/partner | Partner adapter interface, circuit breaker wrapper, retry policy |
| sdk/observability | OpenTelemetry setup (tracer, meter, logger), correlation ID propagation |
| sdk/surreal | SurrealDB client wrapper, connection pooling, health checks |
| sdk/nsq | NSQ producer/consumer wrappers, dead-letter handling, message serialisation |
| sdk/config | Configuration loader (env vars, SurrealDB config table, hot-reload) |
| sdk/auth | JWT claims parsing, merchant context extraction middleware |
| sdk/validation | Common field validators (MSISDN format, IBAN, CNIC, amount ranges) |
| sdk/webhook | HMAC signing, payload construction, event type constants |
| sdk/testutil | Test helpers, SurrealDB test container setup, NSQ test helpers |
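As an illustration of the kind of rule sdk/validation holds, the sketch below checks Pakistani MSISDN and CNIC formats. The accepted patterns (local 03XXXXXXXXX, international 923XXXXXXXXX with an optional '+', and a 13-digit CNIC without dashes) are assumptions based on the examples in this document, not the package's actual rules.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumed patterns: local 03XXXXXXXXX (11 digits), international 923XXXXXXXXX
// (12 digits, optional leading '+'), and a 13-digit CNIC written without dashes.
var (
	msisdnPK = regexp.MustCompile(`^(?:03\d{9}|\+?923\d{9})$`)
	cnicPK   = regexp.MustCompile(`^\d{13}$`)
)

// ValidMSISDN reports whether s is a Pakistani mobile number in an accepted form.
func ValidMSISDN(s string) bool { return msisdnPK.MatchString(s) }

// ValidCNIC reports whether s is a 13-digit CNIC.
func ValidCNIC(s string) bool { return cnicPK.MatchString(s) }

// NormaliseMSISDN converts any accepted form to 92XXXXXXXXXX.
func NormaliseMSISDN(s string) (string, bool) {
	if !ValidMSISDN(s) {
		return "", false
	}
	switch s[0] {
	case '+':
		return s[1:], true
	case '0':
		return "92" + s[1:], true
	default:
		return s, true
	}
}

func main() {
	n, ok := NormaliseMSISDN("03001234567")
	fmt.Println(n, ok, ValidCNIC("4210112345678"))
}
```

Normalising to one canonical form before storage keeps lookups such as the operator_token (merchant, operator, msisdn) index consistent.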

Service-specific code lives entirely within each service repo. This includes:

  • Partner adapters (e.g., the Easypaisa adapter lives in phoenix-payin, not in phoenix-sdk)
  • Service-specific SurrealDB queries
  • Business rules unique to a product domain

1.5 Service Mesh / Discovery on controlplane.com

controlplane.com provides a managed Kubernetes environment with Envoy-based service mesh. The topology:

  • Each Phoenix service deploys as a workload on controlplane.com
  • Inter-service communication uses internal DNS (e.g., phoenix-auth.phoenix.svc.cluster.local)
  • Service discovery (when inter-service gRPC is enabled) will use the Kubernetes registry plugin (github.com/go-kratos/kratos/contrib/registry/kubernetes) via phoenix-merchant's go-kratos integration
  • TLS between services is handled by the mesh (mTLS between Envoy sidecars): each service talks plaintext only to its local sidecar, and traffic is encrypted in transit between sidecars
  • Health checks via /healthz (liveness) and /readyz (readiness) endpoints
  • Country isolation: each country's data lives in its own SurrealDB namespace, phoenix_pk (Pakistan), phoenix_bd (Bangladesh), phoenix_np (Nepal) and phoenix_eg (Egypt), isolated per data residency policy

2. Data Architecture

2.1 SurrealDB Schema Design

SurrealDB serves dual roles: persistent store (with RocksDB embedded backend) and in-memory cache (for truly ephemeral data). The same database engine, two deployment modes:

  • Persistent instance: SurrealDB with RocksDB storage backend (file:// connection) for transaction records, merchant data, settlements, and idempotency keys
  • Cache instance: SurrealDB in-memory mode (memory connection) for short-lived, loss-tolerant data only: OTP state (5-minute TTL) and FX rate quotes (corridor-specific TTL)

All tables are SCHEMAFULL to enforce data integrity -- a direct response to the legacy platform's HashMap<String, Object> chaos.

2.2 Table Structure

Core Domain Tables

-- ============================================================
-- NAMESPACE & DATABASE
-- ============================================================
DEFINE NAMESPACE phoenix;
USE NS phoenix;

DEFINE DATABASE payments;
USE DB payments;

-- ============================================================
-- MERCHANTS
-- ============================================================
DEFINE TABLE merchant SCHEMAFULL;

DEFINE FIELD name ON merchant TYPE string;
DEFINE FIELD legal_name ON merchant TYPE string;
DEFINE FIELD status ON merchant TYPE string
    ASSERT $value IN ['active', 'suspended', 'onboarding', 'terminated'];
DEFINE FIELD country ON merchant TYPE string ASSERT string::len($value) = 2;
DEFINE FIELD currency ON merchant TYPE string ASSERT string::len($value) = 3;
DEFINE FIELD tier ON merchant TYPE string
    ASSERT $value IN ['standard', 'premium', 'enterprise'];
DEFINE FIELD products ON merchant TYPE array<string>;
DEFINE FIELD webhook_url ON merchant TYPE option<string>;
DEFINE FIELD webhook_secret ON merchant TYPE option<string>;
DEFINE FIELD ip_whitelist ON merchant TYPE array<string>;
DEFINE FIELD rate_limits ON merchant TYPE object;
DEFINE FIELD metadata ON merchant TYPE option<object>;
DEFINE FIELD created_at ON merchant TYPE datetime DEFAULT time::now();
DEFINE FIELD updated_at ON merchant TYPE datetime VALUE time::now();

DEFINE INDEX merchant_status ON merchant FIELDS status;
DEFINE INDEX merchant_country ON merchant FIELDS country;

-- ============================================================
-- MERCHANT CREDENTIALS (for OAuth 2.0)
-- ============================================================
DEFINE TABLE merchant_credential SCHEMAFULL;

DEFINE FIELD merchant ON merchant_credential TYPE record<merchant>;
DEFINE FIELD client_id ON merchant_credential TYPE string;
DEFINE FIELD client_secret_hash ON merchant_credential TYPE string;
DEFINE FIELD scopes ON merchant_credential TYPE array<string>;
DEFINE FIELD status ON merchant_credential TYPE string
    ASSERT $value IN ['active', 'revoked', 'expired'];
DEFINE FIELD expires_at ON merchant_credential TYPE option<datetime>;
DEFINE FIELD created_at ON merchant_credential TYPE datetime DEFAULT time::now();

DEFINE INDEX cred_client_id ON merchant_credential FIELDS client_id UNIQUE;
DEFINE INDEX cred_merchant ON merchant_credential FIELDS merchant;

-- ============================================================
-- MERCHANT PRODUCT CONFIGURATION
-- ============================================================
DEFINE TABLE product_config SCHEMAFULL;

DEFINE FIELD merchant ON product_config TYPE record<merchant>;
DEFINE FIELD product ON product_config TYPE string
    ASSERT $value IN ['payin', 'payout', 'remittance', 'card'];
DEFINE FIELD operator ON product_config TYPE string;
DEFINE FIELD country ON product_config TYPE string;
DEFINE FIELD enabled ON product_config TYPE bool DEFAULT true;
DEFINE FIELD min_amount ON product_config TYPE decimal;
DEFINE FIELD max_amount ON product_config TYPE decimal;
DEFINE FIELD daily_limit ON product_config TYPE option<decimal>;
DEFINE FIELD monthly_limit ON product_config TYPE option<decimal>;
DEFINE FIELD otp_required ON product_config TYPE bool DEFAULT true;
DEFINE FIELD otp_expiry_seconds ON product_config TYPE int DEFAULT 300;
DEFINE FIELD settlement_schedule ON product_config TYPE string
    ASSERT $value IN ['realtime', 't_plus_1', 't_plus_2', 'weekly'];
DEFINE FIELD partner_credentials_ref ON product_config TYPE string;
DEFINE FIELD feature_flags ON product_config TYPE option<object>;
DEFINE FIELD created_at ON product_config TYPE datetime DEFAULT time::now();
DEFINE FIELD updated_at ON product_config TYPE datetime VALUE time::now();

DEFINE INDEX pc_merchant_product ON product_config FIELDS merchant, product, operator UNIQUE;

-- ============================================================
-- DOMAIN-SPECIFIC PAYMENT TABLES
-- NOTE: Each service uses its own domain-specific table name rather than
-- a shared `transaction` table. The schema structure below is representative
-- of the common shape; actual table names are:
--   phoenix-payin      → `payment`
--   phoenix-payout     → `disbursement`
--   phoenix-remittance → `transfer`
--   phoenix-card       → `payment`
-- Services do NOT share a central transaction ledger at the database layer.
-- Cross-service aggregation happens at the reconciliation layer via NSQ events.
-- ============================================================
DEFINE TABLE payment SCHEMAFULL; -- example: payin + card use `payment`

DEFINE FIELD merchant ON payment TYPE record<merchant>;
DEFINE FIELD product ON payment TYPE string
    ASSERT $value IN ['payin', 'payout', 'remittance', 'card'];
DEFINE FIELD reference ON payment TYPE string;
DEFINE FIELD idempotency_key ON payment TYPE string;
DEFINE FIELD amount ON payment TYPE decimal;
DEFINE FIELD currency ON payment TYPE string ASSERT string::len($value) = 3;
DEFINE FIELD fee ON payment TYPE decimal DEFAULT 0;
DEFINE FIELD net_amount ON payment TYPE decimal;
DEFINE FIELD status ON payment TYPE string
    ASSERT $value IN [
        'initiated', 'awaiting_authorisation', 'processing',
        'pending_partner', 'completed', 'failed', 'cancelled',
        'reversed', 'refunded', 'partially_refunded',
        'expired', 'on_hold', 'aml_review', 'stuck'
    ];
-- NOTE: `awaiting_authorisation` is the code-canonical name for the OTP-pending
-- state. Earlier architecture drafts used `pending_otp` — the code uses
-- `awaiting_authorisation` throughout.
DEFINE FIELD status_reason ON payment TYPE option<string>;
DEFINE FIELD operator ON payment TYPE string;
DEFINE FIELD country ON payment TYPE string;
DEFINE FIELD partner_reference ON payment TYPE option<string>;
DEFINE FIELD partner_status ON payment TYPE option<string>;
DEFINE FIELD payer ON payment TYPE object;
DEFINE FIELD payee ON payment TYPE option<object>;
DEFINE FIELD metadata ON payment TYPE option<object>;
DEFINE FIELD initiated_at ON payment TYPE datetime DEFAULT time::now();
DEFINE FIELD completed_at ON payment TYPE option<datetime>;
DEFINE FIELD updated_at ON payment TYPE datetime VALUE time::now();
DEFINE FIELD expires_at ON payment TYPE option<datetime>;

DEFINE INDEX tx_idempotency ON payment FIELDS merchant, idempotency_key UNIQUE;
DEFINE INDEX tx_reference ON payment FIELDS merchant, reference UNIQUE;
DEFINE INDEX tx_status ON payment FIELDS status;
DEFINE INDEX tx_merchant_product ON payment FIELDS merchant, product, initiated_at;
DEFINE INDEX tx_partner_ref ON payment FIELDS partner_reference;
DEFINE INDEX tx_initiated_at ON payment FIELDS initiated_at;

-- ============================================================
-- STATE LOG (audit trail, append-only) — table name follows the domain table
-- e.g., `payment_state_log`, `transfer_state_log`, `disbursement_state_log`
-- ============================================================
DEFINE TABLE payment_state_log SCHEMAFULL;

DEFINE FIELD payment ON payment_state_log TYPE record<payment>;
DEFINE FIELD from_status ON payment_state_log TYPE option<string>;
DEFINE FIELD to_status ON payment_state_log TYPE string;
DEFINE FIELD reason ON payment_state_log TYPE option<string>;
DEFINE FIELD actor ON payment_state_log TYPE string;
DEFINE FIELD partner_response ON payment_state_log TYPE option<object>;
DEFINE FIELD created_at ON payment_state_log TYPE datetime DEFAULT time::now();

DEFINE INDEX tsl_payment ON payment_state_log FIELDS payment, created_at;

-- ============================================================
-- SETTLEMENTS
-- ============================================================
DEFINE TABLE settlement SCHEMAFULL;

DEFINE FIELD merchant ON settlement TYPE record<merchant>;
DEFINE FIELD product ON settlement TYPE string;
DEFINE FIELD operator ON settlement TYPE string;
DEFINE FIELD country ON settlement TYPE string;
DEFINE FIELD period_start ON settlement TYPE datetime;
DEFINE FIELD period_end ON settlement TYPE datetime;
DEFINE FIELD transaction_count ON settlement TYPE int;
DEFINE FIELD gross_amount ON settlement TYPE decimal;
DEFINE FIELD total_fees ON settlement TYPE decimal;
DEFINE FIELD net_amount ON settlement TYPE decimal;
DEFINE FIELD currency ON settlement TYPE string;
DEFINE FIELD status ON settlement TYPE string
    ASSERT $value IN ['calculating', 'pending', 'approved', 'paid', 'disputed'];
DEFINE FIELD paid_at ON settlement TYPE option<datetime>;
DEFINE FIELD created_at ON settlement TYPE datetime DEFAULT time::now();

DEFINE INDEX sett_merchant_period ON settlement FIELDS merchant, period_start, period_end;

-- ============================================================
-- PARTNER API LOGS (audit trail)
-- ============================================================
DEFINE TABLE partner_api_log SCHEMAFULL;

DEFINE FIELD payment_id ON partner_api_log TYPE option<string>; -- domain-agnostic reference; actual table varies by service
DEFINE FIELD partner ON partner_api_log TYPE string;
DEFINE FIELD direction ON partner_api_log TYPE string
    ASSERT $value IN ['outbound', 'inbound'];
DEFINE FIELD method ON partner_api_log TYPE string;
DEFINE FIELD url ON partner_api_log TYPE string;
DEFINE FIELD request_headers ON partner_api_log TYPE option<object>;
DEFINE FIELD request_body ON partner_api_log TYPE option<string>;
DEFINE FIELD response_status ON partner_api_log TYPE option<int>;
DEFINE FIELD response_body ON partner_api_log TYPE option<string>;
DEFINE FIELD duration_ms ON partner_api_log TYPE int;
DEFINE FIELD error ON partner_api_log TYPE option<string>;
DEFINE FIELD created_at ON partner_api_log TYPE datetime DEFAULT time::now();

DEFINE INDEX pal_payment ON partner_api_log FIELDS payment_id;
DEFINE INDEX pal_created ON partner_api_log FIELDS created_at;

-- ============================================================
-- WEBHOOK DELIVERIES
-- ============================================================
DEFINE TABLE webhook_delivery SCHEMAFULL;

DEFINE FIELD merchant ON webhook_delivery TYPE record<merchant>;
DEFINE FIELD payment_id ON webhook_delivery TYPE option<string>; -- domain-agnostic reference; actual record type varies by service
DEFINE FIELD event_type ON webhook_delivery TYPE string;
DEFINE FIELD payload ON webhook_delivery TYPE object;
DEFINE FIELD url ON webhook_delivery TYPE string;
DEFINE FIELD status ON webhook_delivery TYPE string
    ASSERT $value IN ['pending', 'delivered', 'failed', 'dead_letter'];
DEFINE FIELD attempts ON webhook_delivery TYPE int DEFAULT 0;
DEFINE FIELD max_attempts ON webhook_delivery TYPE int DEFAULT 5;
DEFINE FIELD last_attempt_at ON webhook_delivery TYPE option<datetime>;
DEFINE FIELD next_attempt_at ON webhook_delivery TYPE option<datetime>;
DEFINE FIELD last_response_status ON webhook_delivery TYPE option<int>;
DEFINE FIELD last_error ON webhook_delivery TYPE option<string>;
DEFINE FIELD delivered_at ON webhook_delivery TYPE option<datetime>;
DEFINE FIELD created_at ON webhook_delivery TYPE datetime DEFAULT time::now();

DEFINE INDEX wd_status_next ON webhook_delivery FIELDS status, next_attempt_at;
DEFINE INDEX wd_merchant ON webhook_delivery FIELDS merchant;

-- ============================================================
-- OPERATOR TOKENS (for recurring/tokenised payments)
-- ============================================================
DEFINE TABLE operator_token SCHEMAFULL;

DEFINE FIELD merchant ON operator_token TYPE record<merchant>;
DEFINE FIELD operator ON operator_token TYPE string;
DEFINE FIELD msisdn ON operator_token TYPE string;
DEFINE FIELD token_ref ON operator_token TYPE string;
DEFINE FIELD operator_token_id ON operator_token TYPE string;
DEFINE FIELD status ON operator_token TYPE string
    ASSERT $value IN ['active', 'expired', 'revoked'];
DEFINE FIELD created_at ON operator_token TYPE datetime DEFAULT time::now();
DEFINE FIELD expires_at ON operator_token TYPE option<datetime>;

DEFINE INDEX ot_merchant_msisdn ON operator_token FIELDS merchant, operator, msisdn;

Cache Tables (SurrealDB in-memory instance)

The in-memory SurrealDB instance stores only data that is acceptable to lose on restart: OTP state and FX quotes both have short TTLs and can be regenerated. Idempotency keys are not stored here — see the Persistent Tables section below.

-- ============================================================
-- IN-MEMORY CACHE DATABASE
-- Loss-tolerant, short-TTL data only. Do NOT store idempotency
-- keys here — restart would cause duplicate payment risk.
-- ============================================================
DEFINE NAMESPACE phoenix_cache;
USE NS phoenix_cache;
DEFINE DATABASE cache;
USE DB cache;

-- OTP state (short TTL, 5 minutes)
DEFINE TABLE otp_state SCHEMAFULL;
DEFINE FIELD transaction_id ON otp_state TYPE string;
DEFINE FIELD msisdn ON otp_state TYPE string;
DEFINE FIELD attempts ON otp_state TYPE int DEFAULT 0;
DEFINE FIELD max_attempts ON otp_state TYPE int DEFAULT 3;
DEFINE FIELD created_at ON otp_state TYPE datetime DEFAULT time::now();
DEFINE FIELD expires_at ON otp_state TYPE datetime;
DEFINE INDEX otp_tx ON otp_state FIELDS transaction_id UNIQUE;

-- FX rate quotes (short TTL, configurable per corridor)
DEFINE TABLE fx_quote SCHEMAFULL;
DEFINE FIELD quote_id ON fx_quote TYPE string;
DEFINE FIELD source_currency ON fx_quote TYPE string;
DEFINE FIELD target_currency ON fx_quote TYPE string;
DEFINE FIELD rate ON fx_quote TYPE decimal;
DEFINE FIELD expires_at ON fx_quote TYPE datetime;
DEFINE FIELD created_at ON fx_quote TYPE datetime DEFAULT time::now();
DEFINE INDEX fxq_id ON fx_quote FIELDS quote_id UNIQUE;

Idempotency Keys (SurrealDB persistent instance)

Idempotency keys are stored in the persistent SurrealDB instance (NS simpaisa; DB phoenix). This is critical: if idempotency state is lost on restart, a merchant retry could result in a duplicate payment charge (see legacy finding W-04). The 48-hour TTL is enforced by the application-layer janitor (sdk/cleanup), not by database expiry — SurrealDB v2 does not yet support native record TTL.

The schema lives in migrations/004_idempotency.surql. Services access idempotency records via the same SurrealDB connection used for all other persistent data.

2.3 SurrealQL Patterns for Payment Operations

Idempotent transaction creation (fail-closed):

-- Atomic: check idempotency key, create transaction, log state -- all in one query
BEGIN TRANSACTION;

-- Check idempotency (fail-closed: if the persistent store is unreachable,
-- this query fails and the request is rejected with 503)
LET $existing = (SELECT * FROM idempotency_key
    WHERE merchant_id = $merchant_id AND key = $idempotency_key
    LIMIT 1);

-- If the key exists, abort the transaction; the Go code catches this error
-- and returns the cached response (or 409 if the body differs)
IF array::len($existing) > 0 {
    THROW 'idempotency_key_exists';
};

-- NOTE: table name is domain-specific (e.g., `payment` for payin/card,
-- `disbursement` for payout, `transfer` for remittance).
-- CREATE ONLY returns a single object rather than an array, so $tx.id below
-- is a single record ID.
LET $tx = (CREATE ONLY payment CONTENT {
    merchant: type::thing('merchant', $merchant_id),
    product: $product,
    reference: $reference,
    idempotency_key: $idempotency_key,
    amount: <decimal> $amount,
    currency: $currency,
    fee: <decimal> $fee,
    net_amount: (<decimal> $amount) - (<decimal> $fee),
    status: 'initiated',
    operator: $operator,
    country: $country,
    payer: $payer,
    expires_at: time::now() + 30m
});

-- Log state transition
CREATE payment_state_log CONTENT {
    payment: $tx.id,
    from_status: NONE,
    to_status: 'initiated',
    actor: 'system',
    reason: 'Payment initiated'
};

COMMIT TRANSACTION;

Transaction state transition (atomic with audit):

BEGIN TRANSACTION;

-- NOTE: replace `payment` with the appropriate domain table name per service
-- Target the record directly (no table scan); the WHERE clause is the
-- optimistic lock on the expected current status
LET $tx = (UPDATE type::thing('payment', $tx_id)
    SET status = $new_status,
        status_reason = $reason,
        partner_reference = $partner_ref,
        partner_status = $partner_status,
        completed_at = IF $new_status IN ['completed', 'failed', 'reversed']
            THEN time::now() ELSE completed_at END
    WHERE status = $expected_current_status
    RETURN AFTER);

-- Fail if optimistic lock violated (status changed concurrently)
IF array::len($tx) = 0 {
    THROW 'Optimistic lock failure: payment status has changed';
};

CREATE payment_state_log CONTENT {
    payment: type::thing('payment', $tx_id),
    from_status: $expected_current_status,
    to_status: $new_status,
    reason: $reason,
    actor: $actor,
    partner_response: $partner_response
};

COMMIT TRANSACTION;
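The status guard catches concurrent changes, but the Go layer would typically also reject illegal transitions before issuing the UPDATE. The table below is illustrative only: it uses the status values from the payment schema, but the allowed edges are assumptions, not rules taken from the Phoenix services.

```go
package main

import "fmt"

// allowed is an ILLUSTRATIVE transition table; the real rules belong to each
// service's domain model.
var allowed = map[string][]string{
	"initiated":              {"awaiting_authorisation", "processing", "failed", "cancelled", "expired"},
	"awaiting_authorisation": {"processing", "failed", "cancelled", "expired"},
	"processing":             {"pending_partner", "completed", "failed", "on_hold", "aml_review"},
	"pending_partner":        {"completed", "failed", "stuck"},
	"on_hold":                {"processing", "cancelled"},
	"aml_review":             {"processing", "failed"},
	"stuck":                  {"completed", "failed"},
	"completed":              {"reversed", "refunded", "partially_refunded"},
	"partially_refunded":     {"refunded"},
}

// CanTransition reports whether from -> to appears in the table.
func CanTransition(from, to string) bool {
	for _, s := range allowed[from] {
		if s == to {
			return true
		}
	}
	return false
}

// IsTerminal reports whether a status has no outgoing transitions in the table.
func IsTerminal(status string) bool { return len(allowed[status]) == 0 }

func main() {
	fmt.Println(CanTransition("initiated", "awaiting_authorisation"))
	fmt.Println(CanTransition("completed", "processing"))
}
```

If CanTransition passes but the UPDATE returns no rows, another writer won the race; the caller re-reads the record rather than retrying blindly.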

Cursor-based pagination:

-- NOTE: replace `payment` with the appropriate domain table name per service
SELECT * FROM payment
    WHERE merchant = type::thing('merchant', $merchant_id)
        AND product = $product
        AND initiated_at < type::datetime($cursor)
    ORDER BY initiated_at DESC
    LIMIT $page_size;

2.4 RocksDB Storage Model

SurrealDB uses its embedded RocksDB backend for persistent storage; there is no separate distributed storage cluster. SurrealDB is started with a file:// path, and RocksDB persists data to disk local to the workload.

| Component | Instances | Spec (per node) | Location |
|---|---|---|---|
| SurrealDB (persistent, RocksDB) | 2 | 4 vCPU, 16 GB RAM, 200 GB NVMe SSD | PK region (data residency) |
| SurrealDB (in-memory cache) | 2 | 4 vCPU, 16 GB RAM | PK region |

RocksDB provides durable, high-performance key-value storage embedded within the SurrealDB process. No external storage cluster, no Raft coordination layer, no separate placement driver nodes. The simplicity is intentional for the current scale; a distributed backend (TiKV or FoundationDB) can be adopted when horizontal write scaling is required.

2.5 Data Residency: PK Data Stays in PK

Architecture:

  • All SurrealDB (RocksDB) instances for PK transactions run on controlplane.com workloads pinned to the Pakistan region (or the nearest available region -- likely Mumbai, with a VPN tunnel to PK-based infrastructure if SBP requires strict in-country hosting)
  • KrakenD edge nodes on Cloudflare route PK traffic to the PK backend
  • controlplane.com workloads tagged with region: pk are scheduled only onto PK-designated infrastructure
  • SurrealDB namespace isolation: PK data lives in the phoenix_pk namespace; other countries get their own namespaces
  • Partner API logs (which contain PII) are stored in the same region as the transaction
  • If Simpaisa expands to Bangladesh/Nepal/Egypt, each country gets separate SurrealDB (RocksDB) instances, with SurrealDB multi-tenancy at the namespace level
  • EG transaction data stays in EG infrastructure (phoenix_eg namespace) in compliance with Egyptian data localisation requirements
  • No cross-region replication of transaction data -- each country is its own data island
  • Only aggregated, anonymised analytics data may leave the country of origin
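One way to make the data-island rule hard to violate in code is to resolve the SurrealDB namespace from the transaction's country and refuse any mismatch with the workload's region tag. NamespaceFor is a hypothetical helper; the namespace names are the ones listed in this document.

```go
package main

import (
	"fmt"
	"strings"
)

// countryNamespaces lists the per-country SurrealDB namespaces named above.
var countryNamespaces = map[string]string{
	"PK": "phoenix_pk",
	"BD": "phoenix_bd",
	"NP": "phoenix_np",
	"EG": "phoenix_eg",
}

// NamespaceFor resolves the namespace for a transaction's country and refuses
// to route it from a workload whose region tag (e.g. "pk") does not match.
func NamespaceFor(txCountry, workloadRegion string) (string, error) {
	c := strings.ToUpper(txCountry)
	ns, ok := countryNamespaces[c]
	if !ok {
		return "", fmt.Errorf("no data residency namespace for country %q", txCountry)
	}
	if !strings.EqualFold(c, workloadRegion) {
		return "", fmt.Errorf("%s data cannot be handled by a workload in region %q", c, workloadRegion)
	}
	return ns, nil
}

func main() {
	fmt.Println(NamespaceFor("PK", "pk"))
	fmt.Println(NamespaceFor("EG", "pk"))
}
```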


3. API Design Principles

3.1 Standard Request/Response Envelope

Request envelope (for POST/PUT/PATCH):

{
  "data": {
    "reference": "merchant-ref-001",
    "amount": "1500.00",
    "currency": "PKR",
    "operator": "easypaisa",
    "payer": {
      "msisdn": "03001234567"
    }
  },
  "metadata": {
    "cnic": "4210112345678"
  }
}

Success response:

{
  "data": {
    "id": "tx_01HXYZ...",
    "reference": "merchant-ref-001",
    "status": "initiated",
    "amount": "1500.00",
    "currency": "PKR",
    "operator": "easypaisa",
    "created_at": "2026-04-03T14:30:00+05:00",
    "expires_at": "2026-04-03T15:00:00+05:00"
  },
  "links": {
    "self": "/api/v1/payin/transactions/tx_01HXYZ..."
  },
  "request_id": "req_abc123def456"
}

Paginated list response:

{
  "data": [ ... ],
  "links": {
    "self": "/api/v1/payin/transactions?cursor=2026-04-03T14:30:00Z&limit=25",
    "next": "/api/v1/payin/transactions?cursor=2026-04-03T12:00:00Z&limit=25"
  },
  "meta": {
    "count": 25,
    "has_more": true
  },
  "request_id": "req_abc123def456"
}

3.2 Error Format (Open Banking Aligned)

{
  "errors": [
    {
      "code": "INSUFFICIENT_BALANCE",
      "status": "0042",
      "message": "Merchant balance is insufficient for this disbursement",
      "path": "data.amount",
      "reference": "https://docs.simpaisa.com/errors/INSUFFICIENT_BALANCE"
    }
  ],
  "request_id": "req_abc123def456",
  "timestamp": "2026-04-03T14:30:00+05:00"
}

HTTP status codes follow REST conventions:

| HTTP status | Usage |
|---|---|
| 200 | Successful query |
| 201 | Resource created (payment initiated) |
| 400 | Validation error |
| 401 | Authentication failure |
| 403 | Authorisation failure (valid token, insufficient scope) |
| 404 | Resource not found |
| 409 | Conflict (idempotency key reuse with different body) |
| 422 | Business rule violation (amount exceeds limit) |
| 429 | Rate limited (includes Retry-After header) |
| 500 | Internal server error |
| 503 | Service unavailable (partner down, circuit open) |

The legacy numeric codes (e.g., "0000") are preserved in the status field for backward compatibility during migration, but the primary identifier is the machine-readable code string.

3.3 Idempotency Implementation (Mandatory, Fail-Closed)

Rules:

1. X-Idempotency-Key header is mandatory on all POST endpoints. A missing key returns 400.
2. Key format: UUID v4 (36 characters); keys longer than 40 characters are rejected (Open Banking compatible).
3. Deduplication window: 48 hours.
4. Same key + same body = return the cached response with the original HTTP status.
5. Same key + different body = return 409 Conflict.
6. Fail-closed: if the SurrealDB persistent store is unreachable, the request is rejected with 503 -- never processed without idempotency protection. This directly addresses legacy finding W-04.
7. Idempotency keys are scoped per merchant (merchant A's key "abc" does not conflict with merchant B's key "abc").

Implementation in Go middleware (sdk/idempotency):

Request arrives -> Extract X-Idempotency-Key
  -> If missing: 400 Bad Request
  -> Query SurrealDB (persistent) for (merchant_id, key)
    -> If store unreachable: 503 Service Unavailable (FAIL CLOSED)
    -> If key exists:
      -> Compare request body hash
        -> Match: return cached response
        -> Mismatch: 409 Conflict
    -> If key not found:
      -> Store key with status "processing"
      -> Process request
      -> Store response with 48h TTL (cleaned up by sdk/cleanup janitor)
      -> Return response
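The flow above can be sketched as a pure decision function over a storage interface. Names are hypothetical and memStore stands in for the persistent SurrealDB table. Two behaviours are assumptions not stated in the rules: Claim is atomic (for example, backed by a UNIQUE index on merchant and key), so two racing requests cannot both proceed; and a duplicate that arrives while the original is still processing gets 409.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"net/http"
	"sync"
)

// Record is what the store keeps per (merchant, key).
type Record struct {
	BodyHash string
	Done     bool   // false while the first request is still "processing"
	Status   int    // cached HTTP status, set once Done
	Body     []byte // cached response body, set once Done
}

// Store abstracts the persistent idempotency table. Claim atomically inserts
// a "processing" record and returns nil, or returns a copy of the existing
// record. Any error means the store is unreachable.
type Store interface {
	Claim(merchant, key, bodyHash string) (*Record, error)
	Complete(merchant, key string, status int, body []byte) error
}

// Decide returns handled=true with a response to send immediately, or
// handled=false meaning "process the request, then call Complete".
func Decide(s Store, merchant, key string, body []byte) (status int, resp []byte, handled bool) {
	if key == "" || len(key) > 40 {
		return http.StatusBadRequest, []byte("missing or invalid X-Idempotency-Key"), true
	}
	sum := sha256.Sum256(body)
	hash := hex.EncodeToString(sum[:])
	rec, err := s.Claim(merchant, key, hash)
	switch {
	case err != nil:
		return http.StatusServiceUnavailable, []byte("idempotency store unavailable"), true // fail closed
	case rec == nil:
		return 0, nil, false // key newly claimed
	case rec.BodyHash != hash:
		return http.StatusConflict, []byte("idempotency key reused with a different body"), true
	case !rec.Done:
		return http.StatusConflict, []byte("original request still processing"), true
	default:
		return rec.Status, rec.Body, true // replay cached response
	}
}

// memStore is an in-memory stand-in for the SurrealDB table (example only).
type memStore struct {
	mu   sync.Mutex
	down bool
	recs map[string]*Record
}

func (m *memStore) Claim(merchant, key, hash string) (*Record, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.down {
		return nil, errors.New("connection refused")
	}
	id := merchant + "|" + key // scope keys per merchant
	if r, ok := m.recs[id]; ok {
		cp := *r
		return &cp, nil
	}
	m.recs[id] = &Record{BodyHash: hash}
	return nil, nil
}

func (m *memStore) Complete(merchant, key string, status int, body []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	r, ok := m.recs[merchant+"|"+key]
	if !ok {
		return errors.New("key was never claimed")
	}
	r.Done, r.Status, r.Body = true, status, body
	return nil
}

func main() {
	s := &memStore{recs: map[string]*Record{}}
	body := []byte(`{"data":{"amount":"1500.00"}}`)
	_, _, handled := Decide(s, "m_1", "key-1", body)
	fmt.Println("handled:", handled)
	_ = s.Complete("m_1", "key-1", http.StatusCreated, []byte(`{"data":{"id":"tx_1"}}`))
	status, resp, _ := Decide(s, "m_1", "key-1", body)
	fmt.Println(status, string(resp))
}
```

Hashing the raw body means semantically equal JSON with different key order or whitespace counts as a different body; clients that retry should resend the original bytes.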

3.4 Pagination (Cursor-Based)

All list endpoints use cursor-based pagination, not offset/page-number. This performs better at scale and avoids the "drifting page" problem with concurrent inserts.

  • Cursor is a datetime value (the initiated_at or created_at of the last item)
  • Default page size: 25
  • Maximum page size: 100
  • Response includes links.next with pre-built URL
  • Response includes meta.has_more boolean

3.5 Versioning Strategy

URL path versioning: /api/v1/payin/..., /api/v1/payout/...

  • Major versions only (v1, v2). No minor versions.
  • New version only on breaking changes.
  • Old versions supported for minimum 12 months after deprecation announcement.
  • Deprecation signalled via Sunset header and Deprecation header on responses.
  • KrakenD routes different versions to different service instances (blue/green per version).

3.6 Webhook Design

Event types:

payment.initiated
payment.otp_sent
payment.otp_verified
payment.processing
payment.completed
payment.failed
payment.reversed
payment.expired
payment.on_hold
payment.aml_review
settlement.calculated
settlement.paid

Webhook payload:

{
  "id": "evt_01HXYZ...",
  "event": "payment.completed",
  "timestamp": "2026-04-03T14:35:00+05:00",
  "data": {
    "transaction_id": "tx_01HXYZ...",
    "reference": "merchant-ref-001",
    "status": "completed",
    "amount": "1500.00",
    "currency": "PKR",
    "operator": "easypaisa",
    "partner_reference": "EP-20260403-12345",
    "completed_at": "2026-04-03T14:35:00+05:00"
  }
}

HMAC signing:

  • Algorithm: HMAC-SHA256
  • Key: per-merchant webhook secret (generated on registration, rotatable)
  • Signed content: {timestamp}.{raw_body} (signing the timestamp lets merchants reject replayed deliveries)
  • Header: X-Simpaisa-Signature: t=1712144100,v1=5257a869e7ecebeda32affa62cdca3fa51cad7e77a0e56ff536d0ce8e108d8bd
  • Merchants verify by computing the HMAC of {timestamp}.{raw_body} with their secret, comparing it with v1 in constant time, and rejecting timestamps that are too old
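The signing and verification rules above might look like this in Go with crypto/hmac. The 300-second tolerance is an assumption, because the design does not state a value. Verify takes the current Unix time as a parameter only to keep the sketch testable.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// toleranceSeconds is an assumed replay window.
const toleranceSeconds = 300

// sign computes hex(HMAC-SHA256(secret, "{timestamp}.{raw_body}")).
func sign(secret []byte, ts int64, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	fmt.Fprintf(mac, "%d.", ts)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// SignatureHeader builds the X-Simpaisa-Signature value sent with each delivery.
func SignatureHeader(secret []byte, ts int64, body []byte) string {
	return fmt.Sprintf("t=%d,v1=%s", ts, sign(secret, ts, body))
}

// Verify is the merchant-side check: parse t and v1, reject stale timestamps,
// then compare the signatures in constant time.
func Verify(secret []byte, header string, body []byte, nowUnix int64) error {
	var ts int64
	var sig string
	for _, part := range strings.Split(header, ",") {
		k, v, _ := strings.Cut(part, "=")
		switch k {
		case "t":
			n, err := strconv.ParseInt(v, 10, 64)
			if err != nil {
				return errors.New("invalid timestamp")
			}
			ts = n
		case "v1":
			sig = v
		}
	}
	if ts == 0 || sig == "" {
		return errors.New("malformed signature header")
	}
	if d := nowUnix - ts; d > toleranceSeconds || d < -toleranceSeconds {
		return errors.New("timestamp outside tolerance window")
	}
	if !hmac.Equal([]byte(sig), []byte(sign(secret, ts, body))) {
		return errors.New("signature mismatch")
	}
	return nil
}

func main() {
	secret := []byte("whsec_example")
	body := []byte(`{"event":"payment.completed"}`)
	hdr := SignatureHeader(secret, 1712144100, body)
	fmt.Println(hdr)
	fmt.Println(Verify(secret, hdr, body, 1712144160)) // <nil>
}
```

Merchants must verify against the raw request bytes. Re-serialising parsed JSON can change whitespace or key order and break the signature.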

Retry with exponential backoff:

Attempt Delay Cumulative
1 Immediate 0s
2 1 minute 1m
3 5 minutes 6m
4 30 minutes 36m
5 2 hours 2h 36m

After 5 failed attempts, the delivery moves to dead-letter status. Merchants can query a dead-letter endpoint to retrieve missed webhooks. Dead-letter webhooks are retained for 30 days.
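The schedule in the table can be encoded as a small lookup, as sketched below. NextDelay, Cumulative and DeadLetterExpiry are hypothetical helper names. Each delay is measured from the previous attempt, which is what the Cumulative column implies.

```go
package main

import (
	"fmt"
	"time"
)

// retryDelays[i] is the wait before attempt i+1, measured from the previous attempt.
var retryDelays = []time.Duration{0, time.Minute, 5 * time.Minute, 30 * time.Minute, 2 * time.Hour}

const deadLetterRetention = 30 * 24 * time.Hour

// NextDelay returns the wait before the next attempt after `failed` failed attempts,
// or ok=false once all 5 attempts are used and the delivery moves to dead-letter.
func NextDelay(failed int) (d time.Duration, ok bool) {
	if failed < 0 || failed >= len(retryDelays) {
		return 0, false
	}
	return retryDelays[failed], true
}

// Cumulative is the time from the first attempt to attempt n (1-based).
func Cumulative(n int) time.Duration {
	var total time.Duration
	for i := 0; i < n && i < len(retryDelays); i++ {
		total += retryDelays[i]
	}
	return total
}

// DeadLetterExpiry is when a dead-lettered delivery stops being retrievable.
func DeadLetterExpiry(deadLetteredAt time.Time) time.Time {
	return deadLetteredAt.Add(deadLetterRetention)
}

func main() {
	for n := 1; n <= len(retryDelays); n++ {
		fmt.Printf("attempt %d: delay %v, cumulative %v\n", n, retryDelays[n-1], Cumulative(n))
	}
	_, ok := NextDelay(5)
	fmt.Println("retry after 5 failures:", ok) // false, so the delivery is dead-lettered
}
```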


4. Security Architecture

4.1 OAuth 2.0 Flow for Merchants

Grant type: Client Credentials (RFC 6749 Section 4.4). This is server-to-server, no user interaction.

1. Merchant registers via onboarding (gets client_id + client_secret)
2. Merchant calls POST /api/v1/auth/token
   Authorization: Basic base64(client_id:client_secret)
   Content-Type: application/x-www-form-urlencoded
   Body: grant_type=client_credentials&scope=payin%3Awrite+payin%3Aread   (i.e. scope "payin:write payin:read", form-encoded)

3. phoenix-auth validates credentials against merchant_credential table
4. Issues JWT (RS256) with claims (see 4.2)
5. Token response:
   {
     "access_token": "eyJhbGciOi...",
     "token_type": "Bearer",
     "expires_in": 3600,
     "scope": "payin:write payin:read"
   }

6. Merchant includes token in subsequent requests:
   Authorization: Bearer eyJhbGciOi...

Token lifetime: 1 hour. No refresh tokens (Client Credentials flow -- merchant re-authenticates). Short-lived tokens reduce the blast radius of token compromise.
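The sketch below covers steps 2, 5 and 6 from the merchant side, using only net/http. The 60-second refresh margin is an assumption. The fake token server inside demoFetch exists only so the example runs without phoenix-auth, and its client ID and secret are placeholders.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
	"time"
)

// refreshSkew is an assumed safety margin for renewing before the token expires.
const refreshSkew = 60 * time.Second

// Token mirrors the token response shown above.
type Token struct {
	AccessToken string `json:"access_token"`
	TokenType   string `json:"token_type"`
	ExpiresIn   int    `json:"expires_in"`
	Scope       string `json:"scope"`
}

// RefreshIn is how long a client can reuse the token before fetching a new one.
func (t *Token) RefreshIn() time.Duration {
	return time.Duration(t.ExpiresIn)*time.Second - refreshSkew
}

// FetchToken performs the client credentials grant (RFC 6749 section 4.4).
func FetchToken(ctx context.Context, hc *http.Client, tokenURL, clientID, secret string, scopes []string) (*Token, error) {
	form := url.Values{"grant_type": {"client_credentials"}, "scope": {strings.Join(scopes, " ")}}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, tokenURL, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(clientID, secret)
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	resp, err := hc.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("token endpoint returned %d", resp.StatusCode)
	}
	var tok Token
	if err := json.NewDecoder(resp.Body).Decode(&tok); err != nil {
		return nil, err
	}
	return &tok, nil
}

// demoFetch runs FetchToken against a fake token endpoint that accepts client-123 / s3cret.
func demoFetch(clientID, secret string) (*Token, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id, sec, ok := r.BasicAuth()
		if !ok || id != "client-123" || sec != "s3cret" || r.FormValue("grant_type") != "client_credentials" {
			http.Error(w, `{"error":"invalid_client"}`, http.StatusUnauthorized)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(Token{AccessToken: "eyJhbGciOi...", TokenType: "Bearer", ExpiresIn: 3600, Scope: r.FormValue("scope")})
	}))
	defer srv.Close()
	return FetchToken(context.Background(), srv.Client(), srv.URL+"/api/v1/auth/token", clientID, secret, []string{"payin:write", "payin:read"})
}

func main() {
	tok, err := demoFetch("client-123", "s3cret")
	if err != nil {
		panic(err)
	}
	fmt.Println(tok.TokenType, tok.Scope, tok.RefreshIn()) // Bearer payin:write payin:read 59m0s
}
```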

4.2 JWT Claims Structure

{
  "iss": "https://auth.simpaisa.com",
  "sub": "merchant:2000001",
  "aud": "https://api.simpaisa.com",
  "exp": 1712147700,
  "iat": 1712144100,
  "jti": "jwt_01HXYZ...",
  "scope": "payin:write payin:read payout:read",
  "merchant_id": "2000001",
  "merchant_name": "Acme Corp",
  "country": "PK",
  "tier": "premium",
  "products": ["payin", "payout"]
}

Scopes follow the pattern {product}:{action}:

  • payin:write -- initiate pay-in transactions
  • payin:read -- query pay-in transactions and status
  • payout:write -- initiate disbursements
  • payout:read -- query disbursements
  • remittance:write, remittance:read
  • card:write, card:read
  • webhook:manage -- manage webhook configuration
  • settlement:read -- query settlement reports
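A scope check against the space-delimited scope claim could be as simple as the sketch below. Mapping HTTP methods to actions (GET and HEAD as read, everything else as write) is an assumption of this sketch. Scopes such as webhook:manage and settlement:read would need explicit per-route mapping.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// HasScope reports whether the JWT's space-delimited scope claim contains required exactly.
func HasScope(scopeClaim, required string) bool {
	for _, s := range strings.Fields(scopeClaim) {
		if s == required {
			return true
		}
	}
	return false
}

// RequiredScope derives {product}:{action}; the method-to-action rule is an assumption.
func RequiredScope(method, product string) string {
	if method == http.MethodGet || method == http.MethodHead {
		return product + ":read"
	}
	return product + ":write"
}

func main() {
	claim := "payin:write payin:read payout:read"
	fmt.Println(HasScope(claim, RequiredScope(http.MethodPost, "payin")))  // true
	fmt.Println(HasScope(claim, RequiredScope(http.MethodPost, "payout"))) // false
}
```

Matching whole tokens rather than substrings matters: a claim containing "payin:writer" must not grant "payin:write".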

4.3 mTLS at KrakenD

  • KrakenD terminates mTLS from merchants
  • Merchant client certificates issued by Simpaisa's private CA (managed via Vault PKI secrets engine)
  • Certificate CN must match the client_id in the JWT
  • KrakenD validates: certificate chain, expiry, revocation (CRL/OCSP), CN match
  • Internal services behind KrakenD communicate over the controlplane.com service mesh (Envoy mTLS)
  • mTLS is mandatory for all production merchant connections. Sandbox allows TLS-only for easier testing.

4.4 WAF Rules (Cloudflare)

Cloudflare sits in front of KrakenD. Rules:

Rule Purpose
Rate limiting (L7) First line of defence before KrakenD's application-level rate limits
Bot management Block automated scanning, credential stuffing
Geo-blocking Only allow traffic from countries where Simpaisa operates (PK, BD, NP, IQ, AE, EG) + merchant-registered IPs
Request size limit 256 KB max body (payment requests are small)
SQL injection / XSS Managed ruleset (defence in depth, even though API is JSON-only)
TLS 1.2 minimum Block TLS 1.0/1.1
DDoS protection Cloudflare's standard L3/L4/L7 DDoS mitigation
IP reputation Block known-bad IPs from threat intelligence
Custom rules Block requests missing required headers (Authorization, Content-Type)

4.5 Secrets Management

HashiCorp Vault (already partially adopted in legacy) becomes the single source for all secrets:

Secret Type Vault Path Rotation
Partner API credentials secret/phoenix/{env}/partners/{partner_name} Manual, per partner requirement
Merchant webhook secrets secret/phoenix/{env}/merchants/{id}/webhook On merchant request
JWT signing keys (RS256) transit/phoenix/jwt-signing 90-day automatic rotation
Database credentials database/phoenix/{env} Dynamic, 24-hour lease
TLS certificates pki/phoenix/{env} Automatic, 30-day renewal
Encryption keys (AES-256) transit/phoenix/data-encryption Annual rotation with key versioning

Go services authenticate to Vault via Kubernetes auth method (controlplane.com provides the service account JWT). No credentials in environment variables, no credentials in code, no credentials in Git. Ever.

4.6 Encryption Standards

Purpose Algorithm Notes
Data at rest (field-level) AES-256-GCM Random 12-byte IV per operation. Replaces legacy AES-ECB.
Data in transit TLS 1.2+ Enforced at Cloudflare and KrakenD
Request signing RSA-2048 with SHA-256 For merchants that require request signing
Key wrapping RSA-OAEP with SHA-256 Replaces legacy PKCS1v1.5
Webhook signing HMAC-SHA256 Per-merchant shared secret
Password hashing Argon2id For any stored credentials
Token hashing SHA-256 Client secrets stored as hashes
PAN tokenisation Vault Transform secrets engine (tokenisation; Vault Enterprise) Raw PAN never enters application layer. Direct response to legacy finding C-05.

Critical rule: No raw PAN data passes through Phoenix services. Card payments use Vault's tokenisation or the acquirer's hosted tokenisation. The application only handles token references.
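A minimal sketch of the AES-256-GCM row in the table, using Go's crypto/aes and crypto/cipher, is below. Two choices here are assumptions rather than stated requirements: prefixing the nonce to the ciphertext, and binding a record identifier as additional authenticated data. In Phoenix the key material would come from Vault. Random 96-bit nonces are only safe up to roughly 2^32 encryptions per key (NIST SP 800-38D), so the key-rotation policy should account for this limit.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 needs a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block) // standard 12-byte nonce, 16-byte tag
}

// Seal encrypts a field value and returns nonce || ciphertext || tag.
// aad binds the value to its context (e.g. the record ID) so it cannot be copied elsewhere.
func Seal(key, plaintext, aad []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, aad), nil
}

// Open reverses Seal; it fails if the ciphertext, tag or aad was altered.
func Open(key, sealed, aad []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n+gcm.Overhead() {
		return nil, errors.New("ciphertext too short")
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], aad)
}

func main() {
	key := make([]byte, 32) // illustration only: real key material comes from Vault
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, _ := Seal(key, []byte("923001234567"), []byte("payment:tx_1"))
	plain, err := Open(key, sealed, []byte("payment:tx_1"))
	fmt.Println(string(plain), err, len(sealed)) // 923001234567 <nil> 40
}
```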


5. Payment Flow Design

5.1 Pay-In Flow (Wallet OTP + Direct Charge)

Single Charge (OTP Flow):

Merchant                   KrakenD          phoenix-payin        SurrealDB       Operator (EP/JC)
   |                          |                   |                  |                  |
   |--POST /api/v1/payin/---->|                   |                  |                  |
   |  transactions/initiate   |                   |                  |                  |
   |  X-Idempotency-Key: abc  |--validate JWT---->|                  |                  |
   |                          |--rate limit OK--->|                  |                  |
   |                          |                   |--check idemp.--->|                  |
   |                          |                   |  key "abc"       |                  |
   |                          |                   |<--not found------|                  |
   |                          |                   |--validate merch->|                  |
   |                          |                   |  config, limits  |                  |
   |                          |                   |<--config---------|                  |
   |                          |                   |--CREATE tx------>|                  |
   |                          |                   |  status:initiated|                  |
   |                          |                   |                  |                  |
   |                          |                   |--IF otp_required:                   |
   |                          |                   |  send OTP via operator API---------->|
   |                          |                   |  store OTP state in cache            |
   |                          |                   |  UPDATE tx -> awaiting_authorisation |
   |                          |                   |                  |                  |
   |<---201 {status:awaiting_authorisation, tx_id}|                  |                  |
   |                          |                   |                  |                  |
   |--POST /api/v1/payin/---->|                   |                  |                  |
   |  transactions/{id}/verify|                   |                  |                  |
   |  X-Idempotency-Key: def  |                   |                  |                  |
   |  {otp: "123456"}         |                   |                  |                  |
   |                          |                   |--check OTP state>|                  |
   |                          |                   |  verify attempts |                  |
   |                          |                   |--UPDATE tx ----->|                  |
   |                          |                   |  -> processing   |                  |
   |                          |                   |--call operator charge API----------->|
   |                          |                   |<--operator response-----------------|
   |                          |                   |--UPDATE tx ----->|                  |
   |                          |                   |  -> completed    |                  |
   |                          |                   |--publish NSQ --->| payment.completed|
   |<---200 {status:completed}|                   |                  |                  |
   |                          |                   |                  |                  |
   |<---WEBHOOK POST (async)--|--phoenix-webhook--|

Direct Charge (Tokenised/Recurring):

Same flow, but OTP is skipped: the charge uses the operator_token stored from a previous initial charge. The legacy transactionType field (0=single, 1=initial, 8=recurring, 9=subscription) maps to:

  • type: "single" -- one-time charge with OTP
  • type: "initial" -- first charge that creates a token
  • type: "recurring" -- subsequent charge using the token (no OTP)
  • type: "subscription" -- scheduled recurring charge
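The legacy-to-Phoenix mapping could be kept as a lookup table, as sketched below. Rejecting unknown codes instead of defaulting is a design choice of the sketch. Only the recurring case is stated above as skipping OTP, so treating subscription charges the same way is an assumption.

```go
package main

import "fmt"

// ChargeType is the Phoenix `type` field for pay-in transactions.
type ChargeType string

const (
	Single       ChargeType = "single"
	Initial      ChargeType = "initial"
	Recurring    ChargeType = "recurring"
	Subscription ChargeType = "subscription"
)

// legacyTypes maps the legacy numeric transactionType values listed above.
var legacyTypes = map[string]ChargeType{"0": Single, "1": Initial, "8": Recurring, "9": Subscription}

// FromLegacy converts a legacy transactionType, rejecting unknown codes rather than guessing.
func FromLegacy(code string) (ChargeType, error) {
	t, ok := legacyTypes[code]
	if !ok {
		return "", fmt.Errorf("unsupported legacy transactionType %q", code)
	}
	return t, nil
}

// SkipsOTP reports whether the charge reuses a stored operator_token instead of an OTP.
func (t ChargeType) SkipsOTP() bool {
	return t == Recurring || t == Subscription
}

func main() {
	for _, code := range []string{"0", "1", "8", "9", "2"} {
		t, err := FromLegacy(code)
		fmt.Println(code, t, t.SkipsOTP(), err)
	}
}
```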

5.2 Payment State Machine

Note: Each service uses its own domain-specific table (payment, disbursement, transfer) but shares the same state machine structure. State names match the Go code exactly.

                                         +---> expired
                                         |     (TTL exceeded)
                                         |
initiated ---> awaiting_authorisation ---> processing ---> completed ---> refunded
    |               |                          |               |
    |               |                          |               +---> partially_refunded
    |               v                          |
    |           failed                         +---> failed (terminal)
    |         (max OTP attempts)               |
    |                                          +---> pending_partner ---> completed | failed
    |                                                (ambiguous response; inquiry polling)
    +---> processing (direct charge, no OTP)
    |
    +---> failed
    (validation failure)

    processing ---> on_hold ---> processing (resumed)
                |
                +---> aml_review ---> processing | failed
                |
                +---> stuck ---> failed (after max retries)
                |
                +---> cancelled (merchant-initiated cancellation)

State glossary (code-canonical names):

State Description
initiated Payment record created, validation passed
awaiting_authorisation OTP sent, waiting for customer verification (replaces the earlier pending_otp draft name)
processing OTP verified or direct charge; partner call in flight
pending_partner Ambiguous partner response; inquiry polling active
completed Partner confirmed success
failed Terminal failure
cancelled Cancelled by merchant before completion
reversed Reversal confirmed by partner
refunded Full refund confirmed
partially_refunded Partial refund confirmed
expired TTL exceeded before completion
on_hold Manually held for review
aml_review Flagged for AML checks
stuck Exceeded maximum retries; requires manual intervention

Rules:

  • Every state transition is atomic (SurrealDB transaction + state log insert)
  • Optimistic locking: an UPDATE only succeeds if the current status matches the expected status
  • Only valid transitions are permitted (enforced by a state machine in Go code, not just database constraints)
  • Every transition produces an NSQ event for webhook delivery
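The rules above could be implemented as an explicit transition table plus a compare-and-set update, as sketched below. The table is read off the pay-in sequence in 5.1 and the diagrams in 5.2. It includes initiated -> processing for direct charges and processing -> completed for unambiguous partner responses. reversed is omitted because no diagram shows its source state. The mutex-guarded Repo stands in for the SurrealDB transaction and state log insert, and NSQ publishing is not shown.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type Status string

const (
	Initiated             Status = "initiated"
	AwaitingAuthorisation Status = "awaiting_authorisation"
	Processing            Status = "processing"
	PendingPartner        Status = "pending_partner"
	Completed             Status = "completed"
	Failed                Status = "failed"
	Cancelled             Status = "cancelled"
	Refunded              Status = "refunded"
	PartiallyRefunded     Status = "partially_refunded"
	Expired               Status = "expired"
	OnHold                Status = "on_hold"
	AMLReview             Status = "aml_review"
	Stuck                 Status = "stuck"
)

// transitions lists the allowed moves; anything not listed is rejected.
var transitions = map[Status][]Status{
	Initiated:             {AwaitingAuthorisation, Processing, Failed},
	AwaitingAuthorisation: {Processing, Failed, Expired},
	Processing:            {Completed, Failed, PendingPartner, OnHold, AMLReview, Stuck, Cancelled},
	PendingPartner:        {Completed, Failed},
	OnHold:                {Processing},
	AMLReview:             {Processing, Failed},
	Stuck:                 {Failed},
	Completed:             {Refunded, PartiallyRefunded},
}

var (
	ErrInvalidTransition = errors.New("invalid state transition")
	ErrStaleStatus       = errors.New("status changed concurrently")
)

func CanTransition(from, to Status) bool {
	for _, s := range transitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

// Repo stands in for the payment table plus its state log (hypothetical).
type Repo struct {
	mu     sync.Mutex
	status map[string]Status
	log    []string
}

func NewRepo() *Repo { return &Repo{status: map[string]Status{}} }

// Transition mirrors UPDATE ... WHERE status = $from: the status change and the
// log entry are applied together, or not at all.
func (r *Repo) Transition(id string, from, to Status) error {
	if !CanTransition(from, to) {
		return fmt.Errorf("%w: %s -> %s", ErrInvalidTransition, from, to)
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.status[id] != from {
		return ErrStaleStatus
	}
	r.status[id] = to
	r.log = append(r.log, fmt.Sprintf("%s: %s -> %s", id, from, to))
	return nil
}

func main() {
	r := NewRepo()
	r.status["tx_1"] = Initiated
	fmt.Println(r.Transition("tx_1", Initiated, AwaitingAuthorisation)) // <nil>
	fmt.Println(r.Transition("tx_1", Initiated, Processing))            // status changed concurrently
	fmt.Println(r.Transition("tx_1", AwaitingAuthorisation, Completed)) // invalid state transition: ...
}
```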

5.3 Partner Integration Abstraction (Adapter Pattern)

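// sdk/partner/adapter.go
package partner

import "context"

// The canonical request/response types (InitiateRequest, InitiateResponse, etc.) are not shown here.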

// Adapter is the interface every partner integration must implement.
type Adapter interface {
    // Initiate starts a payment with the partner.
    Initiate(ctx context.Context, req *InitiateRequest) (*InitiateResponse, error)

    // Verify confirms a payment (e.g., OTP verification).
    Verify(ctx context.Context, req *VerifyRequest) (*VerifyResponse, error)

    // Inquiry checks the status of a payment at the partner.
    Inquiry(ctx context.Context, req *InquiryRequest) (*InquiryResponse, error)

    // Reverse requests a reversal/refund.
    Reverse(ctx context.Context, req *ReverseRequest) (*ReverseResponse, error)

    // Name returns the partner identifier (e.g., "easypaisa", "jazzcash").
    Name() string

    // HealthCheck verifies the partner API is reachable.
    HealthCheck(ctx context.Context) error
}

Each partner gets its own implementation file (e.g., phoenix-payin/internal/adapter/easypaisa.go). The adapter:

  • Handles protocol differences (REST vs SOAP -- 1Link uses SOAP)
  • Maps Simpaisa's canonical request/response to the partner's format
  • Manages partner-specific authentication (Easypaisa storeId, JazzCash client credentials, 1Link certificate-based)
  • Logs all partner API calls to the partner_api_log table
  • Wraps calls in a circuit breaker (using sony/gobreaker)

Adapter registry resolves the correct adapter at runtime based on operator code:

// phoenix-payin/internal/adapter/registry.go
func (r *Registry) Get(operator string) (partner.Adapter, error) {
    switch operator {
    case "easypaisa":
        return r.easypaisa, nil
    case "jazzcash":
        return r.jazzcash, nil
    // ...
    default:
        return nil, fmt.Errorf("unsupported operator: %s", operator)
    }
}

5.4 Retry and Reconciliation Strategy

Partner call retries:

  • Circuit breaker per partner: open after 5 consecutive failures, half-open after 30 seconds
  • Retries within a single request: max 2 retries with 1s, 3s backoff (only for network errors, never for business errors)
  • If the partner returns an ambiguous response (timeout, 5xx): transition to pending_partner and trigger an inquiry
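The in-request retry rule could be sketched as below. Two points are assumptions. First, business errors are represented by a hypothetical BusinessError type. Second, network timeouts are not retried, on the reading that a timeout is an ambiguous response (the partner may already have acted), which the last bullet routes to pending_partner. The circuit breaker is not shown.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"time"
)

// backoff holds the waits before retry 1 and retry 2.
var backoff = []time.Duration{1 * time.Second, 3 * time.Second}

// BusinessError is a definitive partner decline (hypothetical type); it is never retried.
type BusinessError struct{ Code string }

func (e *BusinessError) Error() string { return "partner declined: " + e.Code }

// retryable: network errors are retried, but timeouts are not, because the partner may
// already have processed the request.
func retryable(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && !ne.Timeout()
}

// CallWithRetry makes up to 1+len(backoff) calls; sleep is injected so tests need not wait.
func CallWithRetry(ctx context.Context, sleep func(context.Context, time.Duration) error, call func(context.Context) error) error {
	err := call(ctx)
	for _, d := range backoff {
		if err == nil || !retryable(err) {
			return err
		}
		if serr := sleep(ctx, d); serr != nil {
			return serr
		}
		err = call(ctx)
	}
	return err
}

// Sleep waits for d or until ctx is cancelled.
func Sleep(ctx context.Context, d time.Duration) error {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-t.C:
		return nil
	}
}

// fakeNetErr imitates a net.Error such as "connection refused".
type fakeNetErr struct{ timeout bool }

func (e *fakeNetErr) Error() string   { return "network error" }
func (e *fakeNetErr) Timeout() bool   { return e.timeout }
func (e *fakeNetErr) Temporary() bool { return false }

// runDemo replays scripted results (nil = success) and records attempts and waits.
func runDemo(script ...error) (attempts int, waits []time.Duration, err error) {
	sleep := func(_ context.Context, d time.Duration) error {
		waits = append(waits, d)
		return nil
	}
	err = CallWithRetry(context.Background(), sleep, func(context.Context) error {
		attempts++
		if attempts <= len(script) {
			return script[attempts-1]
		}
		return nil
	})
	return attempts, waits, err
}

func main() {
	fmt.Println(runDemo(&fakeNetErr{}, &fakeNetErr{}))                 // 3 [1s 3s] <nil>
	fmt.Println(runDemo(&BusinessError{Code: "INSUFFICIENT_BALANCE"})) // 1 [] partner declined: INSUFFICIENT_BALANCE
}
```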

Reconciliation:

  • phoenix-reconciliation runs a scheduled job (configurable, default hourly)
  • Queries all transactions in pending_partner or processing state older than the expected partner SLA
  • Calls the partner's Inquiry API to get the definitive status
  • Updates transaction state based on the partner response
  • Flags transactions stuck beyond 2x SLA for manual review
  • Daily settlement calculation: aggregates completed transactions per merchant per operator per day
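The selection step of the reconciliation job might be expressed as a pure function, as sketched below. The per-operator SLA values, the operator keys and the one-hour default are hypothetical. The rules above do not say whether a transaction flagged for manual review should also be inquired, so the sketch leaves that open.

```go
package main

import (
	"fmt"
	"time"
)

// Action is what one reconciliation run does with a transaction.
type Action string

const (
	Skip         Action = "skip"
	Inquire      Action = "inquire"       // call the partner's Inquiry API
	ManualReview Action = "manual_review" // beyond 2x SLA
)

// partnerSLA values are hypothetical; real SLAs would come from configuration.
var partnerSLA = map[string]time.Duration{
	"easypaisa": 30 * time.Minute,
	"jazzcash":  30 * time.Minute,
	"1link":     2 * time.Hour,
}

const defaultSLA = time.Hour // assumption for operators without a configured SLA

// Classify applies the reconciliation rules to one transaction of the given age.
func Classify(status, operator string, age time.Duration) Action {
	if status != "processing" && status != "pending_partner" {
		return Skip
	}
	sla, ok := partnerSLA[operator]
	if !ok {
		sla = defaultSLA
	}
	switch {
	case age > 2*sla:
		return ManualReview
	case age > sla:
		return Inquire
	default:
		return Skip
	}
}

// mins keeps the examples readable.
func mins(n int) time.Duration { return time.Duration(n) * time.Minute }

func main() {
	fmt.Println(Classify("pending_partner", "easypaisa", mins(10))) // skip
	fmt.Println(Classify("pending_partner", "easypaisa", mins(45))) // inquire
	fmt.Println(Classify("processing", "easypaisa", mins(61)))      // manual_review
	fmt.Println(Classify("completed", "easypaisa", mins(600)))      // skip
}
```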

5.5 Settlement Flow

Daily cron (02:00 PKT) -> phoenix-reconciliation
    |
    |-- Query: all completed transactions for previous day, grouped by merchant + operator
    |-- Calculate: gross amount, fees (per product_config), net amount
    |-- CREATE settlement record (status: calculating -> pending)
    |-- Generate settlement report (stored in SurrealDB)
    |-- Publish NSQ: settlement.calculated
    |
    |-- Manual approval step (via internal admin API) -> status: approved
    |-- Bank transfer initiated -> status: paid
    |-- Publish NSQ: settlement.paid -> webhook to merchant
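The calculation step could be sketched as follows. It uses integer minor units so that no floating-point rounding enters the totals. Expressing fees as basis points on the daily gross, rounded half-up, is an assumption, because the actual fee model in product_config is not described here. A real implementation would use sdk/money.

```go
package main

import (
	"fmt"
	"sort"
)

// Tx is a completed transaction; amounts are integer minor units (paisa for PKR).
type Tx struct {
	MerchantID  string
	Operator    string
	AmountMinor int64
}

// Key groups a settlement by merchant and operator.
type Key struct{ MerchantID, Operator string }

type Settlement struct {
	Key
	Count           int
	Gross, Fee, Net int64
}

// Settle aggregates one day's completed transactions per merchant + operator.
func Settle(txs []Tx, feeBps map[Key]int64) []Settlement {
	groups := map[Key]*Settlement{}
	for _, tx := range txs {
		k := Key{tx.MerchantID, tx.Operator}
		s, ok := groups[k]
		if !ok {
			s = &Settlement{Key: k}
			groups[k] = s
		}
		s.Count++
		s.Gross += tx.AmountMinor
	}
	out := make([]Settlement, 0, len(groups))
	for k, s := range groups {
		s.Fee = (s.Gross*feeBps[k] + 5000) / 10000 // basis points, rounded half-up
		s.Net = s.Gross - s.Fee
		out = append(out, *s)
	}
	sort.Slice(out, func(i, j int) bool { // deterministic report order
		if out[i].MerchantID != out[j].MerchantID {
			return out[i].MerchantID < out[j].MerchantID
		}
		return out[i].Operator < out[j].Operator
	})
	return out
}

func main() {
	out := Settle([]Tx{
		{"2000001", "easypaisa", 150000},
		{"2000001", "easypaisa", 50000},
		{"2000001", "jazzcash", 99999},
	}, map[Key]int64{{"2000001", "easypaisa"}: 150, {"2000001", "jazzcash"}: 200})
	for _, s := range out {
		fmt.Printf("%s/%s count=%d gross=%d fee=%d net=%d\n", s.MerchantID, s.Operator, s.Count, s.Gross, s.Fee, s.Net)
	}
}
```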

6. Configuration and Feature Flags

6.1 Configuration Hierarchy

Configuration follows a four-tier hierarchy with cascading overrides:

Tier 1: Defaults (compiled into sdk/config)
    ↓ overridden by
Tier 2: Environment (env vars, per controlplane.com workload)
    ↓ overridden by
Tier 3: Country (SurrealDB country_config table)
    ↓ overridden by
Tier 4: Merchant (SurrealDB product_config table)

Example: OTP expiry

  • Default: 300 seconds
  • Pakistan override: 180 seconds (SBP recommendation)
  • Merchant "Acme Corp" override: 120 seconds (merchant's preference)
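The four-tier cascade could be represented with one pointer per override tier, as in the sketch below. Layered and ptr are illustrative names, and the generic type needs Go 1.18 or later. The sketch reproduces the OTP expiry example.

```go
package main

import "fmt"

// Layered holds one setting across the four tiers; nil means "not set at this tier".
type Layered[T any] struct {
	Default  T  // tier 1: compiled into sdk/config
	Env      *T // tier 2: environment variables
	Country  *T // tier 3: country_config
	Merchant *T // tier 4: product_config
}

// Resolve returns the value from the most specific tier that is set.
func (l Layered[T]) Resolve() T {
	for _, v := range []*T{l.Merchant, l.Country, l.Env} {
		if v != nil {
			return *v
		}
	}
	return l.Default
}

func ptr[T any](v T) *T { return &v }

func main() {
	otpExpirySeconds := Layered[int]{Default: 300, Country: ptr(180), Merchant: ptr(120)}
	fmt.Println(otpExpirySeconds.Resolve()) // 120 (Acme Corp's override)
	otpExpirySeconds.Merchant = nil
	fmt.Println(otpExpirySeconds.Resolve()) // 180 (Pakistan override)
}
```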

6.2 Feature Flag System Design

DEFINE TABLE feature_flag SCHEMAFULL;

DEFINE FIELD key ON feature_flag TYPE string;
DEFINE FIELD description ON feature_flag TYPE string;
DEFINE FIELD default_value ON feature_flag TYPE bool DEFAULT false;
DEFINE FIELD overrides ON feature_flag TYPE array<object>;
-- overrides: [{scope: "country", value: "PK", enabled: true},
--             {scope: "merchant", value: "2000001", enabled: false}]
DEFINE FIELD created_at ON feature_flag TYPE datetime DEFAULT time::now();
DEFINE FIELD updated_at ON feature_flag TYPE datetime VALUE time::now();

DEFINE INDEX ff_key ON feature_flag FIELDS key UNIQUE;

Resolution order:

  1. Check merchant-specific override
  2. Check country-specific override
  3. Check operator-specific override
  4. Fall back to default value
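Resolving a flag against the feature_flag schema might look like the sketch below. The struct tags mirror the table fields. The sketch covers boolean flags only, matching the bool type of default_value. It also ignores empty subject fields, so an override with an empty value never matches by accident.

```go
package main

import "fmt"

// Override mirrors one element of the feature_flag.overrides array.
type Override struct {
	Scope   string `json:"scope"` // "merchant", "country" or "operator"
	Value   string `json:"value"`
	Enabled bool   `json:"enabled"`
}

// Flag mirrors the feature_flag table.
type Flag struct {
	Key          string     `json:"key"`
	DefaultValue bool       `json:"default_value"`
	Overrides    []Override `json:"overrides"`
}

// Subject identifies the caller a flag is evaluated for.
type Subject struct{ MerchantID, Country, Operator string }

// Enabled follows the resolution order: merchant, country, operator, then default.
func (f Flag) Enabled(s Subject) bool {
	for _, want := range [][2]string{{"merchant", s.MerchantID}, {"country", s.Country}, {"operator", s.Operator}} {
		if want[1] == "" {
			continue
		}
		for _, o := range f.Overrides {
			if o.Scope == want[0] && o.Value == want[1] {
				return o.Enabled
			}
		}
	}
	return f.DefaultValue
}

func main() {
	// The example overrides from the schema comment above.
	f := Flag{Key: "payout.batch.enabled", Overrides: []Override{
		{Scope: "country", Value: "PK", Enabled: true},
		{Scope: "merchant", Value: "2000001", Enabled: false},
	}}
	fmt.Println(f.Enabled(Subject{MerchantID: "2000001", Country: "PK"})) // false
	fmt.Println(f.Enabled(Subject{MerchantID: "2000002", Country: "PK"})) // true
	fmt.Println(f.Enabled(Subject{MerchantID: "2000002", Country: "BD"})) // false
}
```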

Example flags:

Flag Key Purpose Default
payin.recurring.enabled Enable recurring/tokenised payments true
payout.batch.enabled Enable batch disbursement API false
remittance.aml_review.auto_approve Auto-approve AML review below threshold false
webhook.retry.max_attempts Override max webhook retry attempts 5
partner.easypaisa.direct_charge Enable Easypaisa direct charge (non-OTP) true

6.3 Environment-Based Config

Variable Dev Staging Production
SURREAL_URL ws://localhost:8000 ws://surreal-staging:8000 ws://surreal-prod:8000
SURREAL_NS phoenix_dev phoenix_staging phoenix
NSQ_LOOKUPD localhost:4161 nsqlookupd-staging:4161 nsqlookupd-prod:4161
VAULT_ADDR http://localhost:8200 https://vault-staging https://vault-prod
LOG_LEVEL debug info warn
OTEL_EXPORTER stdout jaeger-staging:4317 jaeger-prod:4317
ENV dev staging production

6.4 Hot-Reload Without Restart

Configuration stored in SurrealDB is designed to support hot-reload:

  1. phoenix-merchant exposes an internal gRPC endpoint ConfigService.Reload() (planned; currently services read config directly from SurrealDB at startup)
  2. Each service subscribes to a SurrealDB LIVE SELECT on the product_config and feature_flag tables
  3. On change, the in-memory config cache is invalidated and reloaded
  4. Config changes take effect within 5 seconds (SurrealDB LIVE query push + local cache refresh)
  5. No service restart, no redeployment
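Steps 2 to 4 could be sketched as a snapshot cache that atomically swaps in a new immutable map on each change notification. In this sketch the notifications arrive on a plain Go channel standing in for the LIVE SELECT subscription, and the loader is hypothetical. atomic.Pointer needs Go 1.19 or later.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Config is a flattened snapshot of product_config and feature_flag values (simplified).
type Config map[string]string

var errUnavailable = errors.New("config store unavailable")

// Cache serves reads from an immutable snapshot and swaps in a new one on each reload.
type Cache struct {
	current atomic.Pointer[Config]
	load    func() (Config, error) // e.g. a SELECT against SurrealDB (hypothetical)
}

func NewCache(load func() (Config, error)) (*Cache, error) {
	c := &Cache{load: load}
	if err := c.Reload(); err != nil {
		return nil, err
	}
	return c, nil
}

// Reload fetches a fresh snapshot; on error the last good snapshot keeps serving.
func (c *Cache) Reload() error {
	cfg, err := c.load()
	if err != nil {
		return err
	}
	c.current.Store(&cfg)
	return nil
}

// Get reads from the current snapshot without locking.
func (c *Cache) Get(key string) string { return (*c.current.Load())[key] }

// Watch reloads once per change notification until changes is closed.
func (c *Cache) Watch(changes <-chan struct{}) {
	for range changes {
		_ = c.Reload()
	}
}

func main() {
	value := "180"
	c, err := NewCache(func() (Config, error) { return Config{"otp_expiry_seconds": value}, nil })
	if err != nil {
		panic(err)
	}
	changes, done := make(chan struct{}), make(chan struct{})
	go func() { c.Watch(changes); close(done) }()
	value = "120" // simulate an update to product_config
	changes <- struct{}{}
	close(changes)
	<-done
	fmt.Println(c.Get("otp_expiry_seconds")) // 120
}
```

Swapping whole snapshots keeps readers lock-free. It also guarantees that a request never sees a half-applied configuration change.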

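For KrakenD configuration (rate limits, routing), changes are not hot-reloaded by the running gateway in production. They are applied by restarting the gateway replicas with the updated configuration. KrakenD starts quickly and runs as 2 replicas, so a rolling update applies the change without downtime. In controlplane.com, this is triggered by updating the KrakenD configuration and rolling the workload.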


7. Deployment Architecture

7.1 controlplane.com Service Topology

                              Internet
                                 |
                         [Cloudflare Edge]
                         WAF, DDoS, Geo-block
                                 |
                         [KrakenD Gateway]
                         JWT, mTLS, Rate Limit
                          (2 replicas, HA)
                                 |
                    +------------+------------+
                    |            |            |
              [phoenix-auth] [phoenix-merchant] [phoenix-webhook]
               (2 replicas)   (2 replicas)      (2 replicas)
                    |            |            |
              +-----+-----+-----+-----+------+
              |           |           |
        [phoenix-payin] [phoenix-payout] [phoenix-remittance]
         (3 replicas)    (2 replicas)     (2 replicas)
              |           |           |
              +-----+-----+-----+-----+
                    |           |
              [SurrealDB     [SurrealDB
               Persistent     In-Memory Cache]
               RocksDB]       (2 replicas, OTP/FX only)
              (2 replicas)
              [local NVMe SSD per replica]

              [NSQ Cluster]
              nsqd (3) + nsqlookupd (3)

              [Vault]
              (HA, 3 nodes)

              [Jaeger + Prometheus]
              Observability stack

Each workload on controlplane.com:

  • Auto-scaling based on CPU/memory (configurable min/max replicas)
  • Health checks via /healthz (liveness) and /readyz (readiness)
  • Resource limits enforced (prevents noisy neighbours)
  • Envoy sidecar for inter-service mTLS
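The two health endpoints could be served as shown below. NewHealthMux and the check map are hypothetical. Keeping dependency checks out of /healthz is a deliberate choice of the sketch: a SurrealDB outage then marks replicas unready, rather than making the platform restart them in a loop.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

var errUnreachable = errors.New("unreachable")

// NewHealthMux serves /healthz (the process is alive) and /readyz (dependencies reachable).
func NewHealthMux(checks map[string]func() error) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK) // liveness: no dependency checks
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		for name, check := range checks {
			if err := check(); err != nil {
				http.Error(w, name+": "+err.Error(), http.StatusServiceUnavailable)
				return
			}
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// probe issues a GET the way a health checker would and returns the status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	var dbErr error
	mux := NewHealthMux(map[string]func() error{"surrealdb": func() error { return dbErr }})
	fmt.Println(probe(mux, "/healthz"), probe(mux, "/readyz")) // 200 200
	dbErr = errUnreachable
	fmt.Println(probe(mux, "/healthz"), probe(mux, "/readyz")) // 200 503
}
```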

7.2 Cloudflare Edge Config

  • DNS: api.simpaisa.com (single domain, replacing 7 legacy domains)
  • SSL/TLS: Full (strict) mode, minimum TLS 1.2
  • Caching: No caching (all requests are dynamic payment operations)
  • Page rules: Force HTTPS, security headers (HSTS, X-Content-Type-Options, X-Frame-Options)
  • Workers: Optional -- could add request transformation at edge if needed
  • Load balancing: Cloudflare LB with health checks to KrakenD backend
  • Argo Smart Routing: Enabled for optimal path to origin (reduces latency for PK/BD/NP/EG traffic)

7.3 NSQ Cluster

Component Instances Purpose
nsqd 3 Message storage and delivery
nsqlookupd 3 Service discovery for consumers
nsqadmin 1 Monitoring UI

NSQ configuration:

  • --mem-queue-size=10000 -- messages held in memory before spilling to disk
  • --max-msg-size=1048576 -- 1 MB max message (payment events are small)
  • --msg-timeout=300s -- 5-minute processing timeout
  • --max-req-timeout=3600s -- maximum requeue delay for retries

7.4 Multi-Region Considerations

Phase 1 (Pakistan only):

  • All infrastructure in a single region (PK or nearest -- likely via VPN tunnel to PK-based hosting or Karachi-based cloud)
  • controlplane.com workloads pinned to the PK region
  • SBP data residency: all PK transaction data stays in PK infrastructure

Phase 2 (Bangladesh/Nepal/Egypt):

  • Separate SurrealDB (RocksDB) instances per country
  • Shared KrakenD gateway with geo-routing
  • Shared phoenix-auth and phoenix-merchant (these don't hold transaction data)
  • Country-specific phoenix-payin deployments with country-specific partner adapters
  • SurrealDB namespace isolation: phoenix_pk, phoenix_bd, phoenix_np, phoenix_eg
  • EG infrastructure provisioned in an Egyptian or nearest-compliant region; EGP currency support via sdk/money

Phase 3 (Multi-region HA):

  • Active-passive per country (failover to a secondary region within the same country)
  • No cross-country data replication (regulatory requirement)
  • Global control plane for configuration and monitoring


8. Migration Strategy

8.1 How Merchants Migrate from Legacy to Phoenix per Product

Principle: Gradual, per-merchant, per-product migration. No big bang.

Phase A: Shadow Mode
- Phoenix runs in parallel, receiving copies of production traffic
- Responses are logged but NOT returned to merchants
- Compare Phoenix responses with legacy responses
- Duration: 2-4 weeks per product

Phase B: Canary Migration
- Select 2-3 low-volume merchants per product
- Route their traffic to Phoenix
- Legacy remains available as fallback
- Duration: 2-4 weeks

Phase C: Progressive Rollout
- Migrate merchants in batches of 5-10
- Monitor error rates, latency, settlement accuracy
- Any merchant can be rolled back to legacy within minutes

Phase D: Legacy Sunset
- Once all merchants are on Phoenix for a product
- Legacy service enters read-only mode (queries only)
- After 90 days, legacy service is decommissioned

8.2 API Compatibility Layer / Translation Proxy

A translation proxy sits in front of Phoenix and translates legacy API requests to Phoenix format:

Legacy merchant request               Translation proxy              Phoenix
POST /v2/wallets/transaction/initiate  -> maps to ->  POST /api/v1/payin/transactions/initiate
{                                                     {
  "merchantId": "1000001",                              "data": {
  "amount": "1",                                          "reference": "<generated>",
  "msisdn": "34XXXXXXX",                                 "amount": "1.00",
  "operatorId": "100007",                                 "currency": "PKR",
  "transactionType": "0"                                  "operator": "easypaisa",
}                                                         "payer": { "msisdn": "34XXXXXXX" },
                                                          "type": "single"
                                                        }
                                                      }

The translation proxy:

  • Lives as a KrakenD plugin or lightweight Go service
  • Maps legacy URLs to Phoenix URLs
  • Maps legacy field names to Phoenix field names
  • Maps Phoenix error responses back to legacy format (numeric status codes, flat JSON)
  • Generates X-Idempotency-Key from legacy Request-Id or creates one if missing
  • Authenticates legacy merchants using their existing credentials (mapped to OAuth tokens internally)
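Three of the proxy's mapping duties are sketched below: operatorId mapping, amount normalisation and idempotency-key derivation. Only the 100007 -> easypaisa mapping appears in this document, so the table holds nothing else. Passing Request-Id through unchanged (when it fits in 40 characters) is an assumption that keeps legacy retries deduplicated. Longer values fall back to a random key, which loses deduplication. Whether pass-through keys would satisfy Phoenix's UUID v4 format rule is an open question.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"strings"
)

// legacyOperators holds the only operatorId mapping shown in this document.
var legacyOperators = map[string]string{"100007": "easypaisa"}

func MapOperator(id string) (string, error) {
	op, ok := legacyOperators[id]
	if !ok {
		return "", fmt.Errorf("unknown legacy operatorId %q", id)
	}
	return op, nil
}

// allDigits reports whether s is a non-empty run of ASCII digits.
func allDigits(s string) bool {
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return s != ""
}

// NormaliseAmount turns legacy amounts such as "1" or "1.5" into "1.00" or "1.50".
// It works on the string, so no float rounding is involved.
func NormaliseAmount(s string) (string, error) {
	whole, frac, hasFrac := strings.Cut(s, ".")
	if !allDigits(whole) || (hasFrac && (len(frac) > 2 || !allDigits(frac))) {
		return "", fmt.Errorf("invalid legacy amount %q", s)
	}
	return whole + "." + frac + strings.Repeat("0", 2-len(frac)), nil
}

// IdempotencyKey passes a short legacy Request-Id through and otherwise generates a UUID v4.
func IdempotencyKey(requestID string) (string, error) {
	if requestID != "" && len(requestID) <= 40 {
		return requestID, nil
	}
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = b[6]&0x0f | 0x40 // version 4
	b[8] = b[8]&0x3f | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:]), nil
}

func main() {
	amount, _ := NormaliseAmount("1")
	op, _ := MapOperator("100007")
	key, _ := IdempotencyKey("")
	fmt.Println(amount, op, len(key)) // 1.00 easypaisa 36
}
```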

This allows merchants to migrate at their own pace. Eager merchants adopt the new API directly; laggards use the compatibility layer indefinitely.

8.3 Rollback Strategy

  • KrakenD routing rules control which backend (legacy vs Phoenix) receives traffic per merchant
  • Rollback is a configuration change in KrakenD, not a code deployment
  • Takes effect in seconds
  • Transaction data created in Phoenix during the rollout period is retained (not lost on rollback)
  • If rollback occurs, the translation proxy routes back to legacy for that merchant

8.4 Data Migration Approach

Principle: Phoenix starts with a clean database. Historical data is not migrated.

  • Active transactions (in-flight at migration time) complete on the legacy system
  • Historical transaction data remains queryable in legacy MySQL (read-only)
  • A read-only legacy query API is maintained for 12 months post-migration for merchants that need historical data
  • Merchant configuration is migrated proactively: phoenix-merchant is seeded with all 40 merchant configs before any traffic migration
  • Operator tokens (for recurring payments) are migrated per merchant when they move to Phoenix

9. Development Workflow

9.1 Repo Structure (Monorepo)

All services live in a single GitHub monorepo (github.com/doreilly257/sp-apis). Each service module references the shared SDK from the local checkout through a replace directive in its go.mod, so no published module registry is needed.

github.com/doreilly257/sp-apis   (monorepo root)
  ├── sdk/                       -- Shared Go module (github.com/doreilly257/sp-apis/sdk)
  ├── services/
  │   ├── auth/                  -- OAuth 2.0 service
  │   ├── merchant/              -- Merchant management service (+ gRPC)
  │   ├── payin/                 -- Pay-In service
  │   ├── payout/                -- Pay-Out service
  │   ├── remittance/            -- Remittance service
  │   ├── card/                  -- Card payment service
  │   └── webhook/               -- Webhook delivery service
  ├── infra/                     -- docker-compose, KrakenD config, Prometheus config
  ├── migrations/                -- SurrealDB schema scripts
  ├── specs/                     -- OpenAPI / protobuf definitions
  ├── build/                     -- Build tooling
  └── docs/                      -- Architecture and technical documentation

Each service has its own go.mod with a replace github.com/doreilly257/sp-apis/sdk => ../../sdk directive. This provides module isolation (each service can be built, tested, and containerised independently) while keeping all code in one place for ease of cross-service refactoring and a single CI/CD pipeline.

Note: The original architecture envisioned separate Bitbucket repos per service. The actual implementation uses a GitHub monorepo. The Bitbucket Pipelines CI/CD section below describes the intended pipeline shape; the actual CI configuration will be GitHub Actions or equivalent.

9.2 Individual Service Layout

Each service uses a flat, idiomatic Go layout; the go-kratos conventions (biz/, data/, server/, conf/) are not used. All services use Echo v4.15 as the HTTP framework, with dependencies wired manually in main.go (there is no Wire code generation). Only phoenix-merchant additionally uses go-kratos, for its gRPC server.

services/remittance/          -- representative example
  cmd/
    main.go                   -- Entry point; manual dependency injection (no Wire)
  internal/
    adapter/                  -- Partner adapter implementations
      bankofasia.go
      faysalbank.go
      registry.go             -- Adapter registry
    config/                   -- Configuration loading (env vars)
      config.go
    event/                    -- NSQ producer/consumer wiring
      publisher.go
    handler/                  -- Echo HTTP handler layer (request parsing, response marshalling)
      transfer.go
    middleware/               -- Echo middleware (auth, correlation ID, logging)
      auth.go
    model/                    -- Domain model structs, status constants, state machine
      models.go
    repository/               -- SurrealDB data access layer
      transfer.go
    service/                  -- Business logic layer
      transfer.go
  go.mod
  go.sum

phoenix-merchant differs: alongside its Echo HTTP handlers, it runs a go-kratos gRPC server:

services/merchant/
  cmd/
    main.go
  internal/
    config/
    grpc/                     -- go-kratos gRPC server + protobuf handlers
    handler/                  -- Echo HTTP handlers
    model/
    repository/
    service/
  go.mod
  go.sum

9.3 Shared Go Module for Common Code

phoenix-sdk lives at sdk/ in the monorepo. Each service references it via a replace directive in its go.mod:

replace github.com/doreilly257/sp-apis/sdk => ../../sdk

This means no external module registry is required during development. When the platform matures and services need to be built independently, the SDK can be published to a module proxy (e.g., go.simpaisa.com/phoenix-sdk) and the replace directives removed.

Versioning (current): All services and the SDK are developed in lockstep within the monorepo. SDK changes are immediately visible to all services without a go get update cycle.

Versioning (future, if split): Semantic versioning, with services pinning to explicit tags and updating via go get -u go.simpaisa.com/phoenix-sdk@vX.Y.Z.

9.4 CI/CD Pipeline Design (Outline)

Pipeline (deferred implementation; the monorepo structure means a single pipeline with per-service jobs):

# Per-service pipeline (e.g., phoenix-payin)
pipelines:
  default:
    - step:
        name: Lint & Test
        services:
          - docker  # testcontainers starts SurrealDB for integration tests
        script:
          - export TESTCONTAINERS_RYUK_DISABLED=true  # Ryuk needs privileged containers, which Pipelines does not allow
          - go vet ./...
          - golangci-lint run
          - go test -race -coverprofile=coverage.out ./...
          - go tool cover -func=coverage.out
    - step:
        name: Build & Push
        services:
          - docker
        script:
          # Build and push in one step: images built in one step are not available to later steps
          - docker build -t registry.simpaisa.com/phoenix-payin:${BITBUCKET_COMMIT} .
          - docker push registry.simpaisa.com/phoenix-payin:${BITBUCKET_COMMIT}
    - step:
        name: Deploy to Staging
        deployment: staging
        trigger: manual
        script:
          - cpln workload update phoenix-payin --image registry.simpaisa.com/phoenix-payin:${BITBUCKET_COMMIT}

Quality gates (enforced in CI):

  • go vet and golangci-lint pass
  • Test coverage >= 80%
  • No #nosec annotations without a justifying comment
  • Docker image scan (Trivy)
  • Protobuf backward-compatibility check (buf breaking)

9.5 Developer Local Setup

infra/docker-compose.yml provides the full local stack:

services:
  surrealdb:
    image: surrealdb/surrealdb:latest
    command: start --user root --pass root memory
    ports: ["8000:8000"]

  surrealdb-persistent:
    image: surrealdb/surrealdb:latest
    command: start --user root --pass root rocksdb:/data/surreal.db
    ports: ["8001:8000"]
    volumes: ["surreal-data:/data"]

  nsqd:
    image: nsqio/nsq
    command: /nsqd --lookupd-tcp-address=nsqlookupd:4160
    ports: ["4150:4150", "4151:4151"]

  nsqlookupd:
    image: nsqio/nsq
    command: /nsqlookupd
    ports: ["4160:4160", "4161:4161"]

  nsqadmin:
    image: nsqio/nsq
    command: /nsqadmin --lookupd-http-address=nsqlookupd:4161
    ports: ["4171:4171"]

  vault:
    image: hashicorp/vault:latest
    environment:
      VAULT_DEV_ROOT_TOKEN_ID: "dev-root-token"
    ports: ["8200:8200"]
    cap_add: [IPC_LOCK]

  jaeger:
    image: jaegertracing/all-in-one:latest
    ports: ["16686:16686", "4317:4317"]

  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes: ["./prometheus.yml:/etc/prometheus/prometheus.yml"]

  krakend:
    image: devopsfaith/krakend:latest
    ports: ["8080:8080"]
    volumes: ["./krakend.json:/etc/krakend/krakend.json"]

volumes:
  surreal-data:

Developer workflow:

  1. docker compose up -d -- starts all infrastructure
  2. make seed -- applies the SurrealDB schema and loads test data
  3. make run -- starts the service with hot-reload (using air)
  4. The service is reachable on localhost:8080 via KrakenD, or directly on its own port for debugging


10. Phased Delivery Plan

Phase 1: Foundation (Weeks 1-8)

Goal: Shared libraries, gateway, auth, SurrealDB schema, and CI/CD scaffolding. No payment processing yet.

| Week | Deliverable | Owner |
|---|---|---|
| 1-2 | phoenix-sdk v0.1: envelope, money, crypto, validation, observability packages | CDO + Claude |
| 1-2 | Protobuf definitions for merchant gRPC service (in specs/) | CDO + Claude |
| 2-3 | infra/: docker-compose, SurrealDB schema scripts, Vault dev setup | CDO |
| 3-4 | phoenix-auth: OAuth 2.0 token issuance, JWT RS256, merchant credential CRUD | Junior Go dev 1 + CDO |
| 3-4 | phoenix-merchant: merchant CRUD, product config, feature flags, gRPC service | Junior Go dev 2 + CDO |
| 5-6 | phoenix-gateway: KrakenD configuration -- JWT validation, rate limiting, mTLS, routing | CDO |
| 5-6 | phoenix-sdk v0.2: idempotency middleware, SurrealDB client, NSQ wrappers | CDO + Claude |
| 7-8 | phoenix-webhook: webhook delivery engine, HMAC signing, retry, dead-letter | Mid Java dev (learning Go) + CDO |
| 7-8 | Integration testing: auth -> gateway -> merchant -> webhook end-to-end | All |
| 8 | controlplane.com staging deployment, Cloudflare DNS setup | CDO |

Milestone: A merchant can authenticate via OAuth 2.0, hit the gateway, and receive a properly formatted error response ("no such endpoint -- payin not yet deployed"). Webhook infrastructure is ready.

Phase 2: Pay-In API (Weeks 9-18)

Goal: Full Pay-In service replacing legacy wallet. First live merchant traffic.

| Week | Deliverable | Owner |
|---|---|---|
| 9-10 | phoenix-payin scaffolding: Echo handlers, SurrealDB repo, NSQ integration | Junior Go dev 1 + CDO |
| 10-12 | Easypaisa adapter: initiate, verify (OTP), inquiry, direct charge | Junior Go dev 1 + Mid Java dev |
| 10-12 | JazzCash adapter | Junior Go dev 2 + CDO |
| 12-13 | HBL Konnect, Alfa, JSBL Zindagi adapters | Mid Java dev + Junior Go dev 2 |
| 13-14 | Transaction state machine, OTP flow, recurring/tokenisation | CDO + Claude |
| 14-15 | Translation proxy for legacy Pay-In API compatibility | CDO |
| 15-16 | Shadow mode: mirror production traffic to Phoenix, compare responses | All |
| 16-17 | Canary: 2-3 test merchants on Phoenix Pay-In | CDO |
| 17-18 | Progressive rollout to remaining Pay-In merchants | CDO |

Milestone: All 40 Pay-In merchants on Phoenix. Legacy wallet service in read-only mode.
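The transaction state machine planned for weeks 13-14 can be driven by a single table of allowed transitions, so that, for example, a late partner callback cannot move a failed transaction to success. The states below are illustrative assumptions; this plan does not enumerate them.

```go
package main

import "fmt"

// Status is a hypothetical Pay-In transaction status.
type Status string

const (
	Initiated  Status = "INITIATED"
	OTPPending Status = "OTP_PENDING"
	Processing Status = "PROCESSING"
	Succeeded  Status = "SUCCEEDED"
	Failed     Status = "FAILED"
	Expired    Status = "EXPIRED"
)

// allowed lists legal next states. Terminal states (Succeeded, Failed,
// Expired) have no entry, so nothing can leave them.
var allowed = map[Status][]Status{
	Initiated:  {OTPPending, Processing, Failed}, // Processing = direct charge, no OTP
	OTPPending: {Processing, Failed, Expired},
	Processing: {Succeeded, Failed},
}

// Transition returns the new status, or the unchanged status and an error
// if the move is not permitted.
func Transition(from, to Status) (Status, error) {
	for _, next := range allowed[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("illegal transition %s -> %s", from, to)
}

func main() {
	s := Initiated
	for _, next := range []Status{OTPPending, Processing, Succeeded, Failed} {
		var err error
		if s, err = Transition(s, next); err != nil {
			fmt.Println(err) // the final SUCCEEDED -> FAILED attempt is rejected
			continue
		}
		fmt.Println("now", s)
	}
}
```

Persisting the status change and the transition check in the same SurrealDB transaction would keep concurrent callbacks from racing past the table.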

Phase 3: Pay-Out API (Weeks 19-26)

Goal: Full Pay-Out service replacing legacy disbursement stack.

| Week | Deliverable | Owner |
|---|---|---|
| 19-20 | phoenix-payout scaffolding, phoenix-reconciliation scaffolding | Junior devs + CDO |
| 20-22 | 1Link IBFT adapter, Easypaisa disbursement adapter, JazzCash disbursement adapter | 2 devs |
| 22-23 | HBL adapter (replacing the legacy pass-through proxy) | 1 dev + CDO |
| 23-24 | Batch disbursement processing (replacing legacy scheduler) | CDO |
| 24-25 | Settlement calculation and reporting | Mid Java dev |
| 25-26 | Shadow mode, canary, progressive rollout | All |

Milestone: All Pay-Out merchants on Phoenix. Legacy disbursement-gateway, disbursement-scheduler, and disbursement-scheduler-sunshine decommissioned.
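Each phase closes with shadow mode, a canary, and a progressive rollout. One common technique for a deterministic rollout is to hash the merchant ID into a bucket from 0 to 99 and route to Phoenix when the bucket is below the rollout percentage, with an allow-list for canary merchants. This is a sketch of that technique, not a recorded decision; placing it in phoenix-proxy and the names routeToPhoenix, bucket and the merchant IDs are assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bucket maps a merchant ID to a stable value in [0, 100).
func bucket(merchantID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(merchantID)) // hash.Hash writes never return an error
	return h.Sum32() % 100
}

// routeToPhoenix reports whether a merchant's traffic goes to Phoenix.
// Canary merchants always do; others are admitted once rolloutPct exceeds
// their bucket, so raising the percentage never moves a merchant back.
func routeToPhoenix(merchantID string, rolloutPct uint32, canary map[string]bool) bool {
	if canary[merchantID] {
		return true
	}
	return bucket(merchantID) < rolloutPct
}

func main() {
	canary := map[string]bool{"M-TEST-1": true} // hypothetical canary merchant
	for _, pct := range []uint32{0, 25, 50, 100} {
		fmt.Printf("%3d%%: M-TEST-1=%v M-1001=%v\n", pct,
			routeToPhoenix("M-TEST-1", pct, canary),
			routeToPhoenix("M-1001", pct, canary))
	}
}
```

Because the bucket depends only on the merchant ID, a given merchant sees one platform consistently, which keeps its transaction history on one side during a rollout step.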

Phase 4: Remittance API (Weeks 27-34)

Goal: Full Remittance service replacing legacy sp-remittance-consumer and retry scheduler.

| Week | Deliverable | Owner |
|---|---|---|
| 27-28 | phoenix-remittance scaffolding, FX quote management | Junior devs |
| 28-30 | Bank of Asia adapter, Faysal Bank adapter | 2 devs |
| 30-31 | Trust Bank adapter, 1Link adapter | 2 devs |
| 31-32 | AML review workflow, multi-corridor routing | CDO |
| 32-33 | Bangladesh/Nepal corridor configuration | CDO + dev |
| 33-34 | Shadow, canary, rollout | All |

Milestone: All remittance corridors on Phoenix. Legacy remittance services decommissioned.

Phase 5: Card API (Weeks 35-42)

Goal: Full Card service replacing legacy cardbackend.

| Week | Deliverable | Owner |
|---|---|---|
| 35-36 | phoenix-card scaffolding, Vault transit tokenisation setup | CDO + dev |
| 36-38 | Alfalah MasterCard 3DS adapter (using Vault for PAN tokenisation -- no raw PAN stored by the application) | 2 devs |
| 38-39 | Safepay adapter | 1 dev |
| 39-40 | Capture, void, refund flows | 1 dev |
| 40-41 | PCI-DSS scope review (expected to shrink with Vault tokenisation; any service that receives the PAN before tokenising it stays in scope) | CDO |
| 41-42 | Shadow, canary, rollout | All |

Milestone: All card merchants on Phoenix. Legacy cardbackend decommissioned. PCI-DSS scope significantly reduced.

Timeline Summary

| Phase | Duration | Calendar (from start) | Key Risk |
|---|---|---|---|
| Phase 1: Foundation | 8 weeks | Weeks 1-8 | Team learning Go; SDK design decisions have cascading impact |
| Phase 2: Pay-In | 10 weeks | Weeks 9-18 | First live traffic; partner adapter complexity (5 operators) |
| Phase 3: Pay-Out | 8 weeks | Weeks 19-26 | Batch processing reliability; settlement accuracy |
| Phase 4: Remittance | 8 weeks | Weeks 27-34 | Cross-border complexity; AML; FX; multiple bank APIs (incl. SOAP) |
| Phase 5: Card | 8 weeks | Weeks 35-42 | PCI-DSS compliance; 3DS flow complexity |

Total: approximately 42 weeks (about 9.7 months) from first commit to full migration.
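The timeline can be cross-checked mechanically: each phase's week range should match its stated duration, phases should be contiguous, and the total converts to months at an average of 365.25 / 12 days per month. A small sketch of that check, using the figures from the timeline table:

```go
package main

import "fmt"

type phase struct {
	name       string
	start, end int // inclusive week numbers from the timeline table
	weeks      int // stated duration
}

var plan = []phase{
	{"Foundation", 1, 8, 8},
	{"Pay-In", 9, 18, 10},
	{"Pay-Out", 19, 26, 8},
	{"Remittance", 27, 34, 8},
	{"Card", 35, 42, 8},
}

// checkPlan verifies durations and contiguity and returns the total weeks.
func checkPlan(ps []phase) (int, error) {
	total := 0
	for i, p := range ps {
		if got := p.end - p.start + 1; got != p.weeks {
			return 0, fmt.Errorf("%s: range gives %d weeks, table says %d", p.name, got, p.weeks)
		}
		if i > 0 && p.start != ps[i-1].end+1 {
			return 0, fmt.Errorf("%s does not start the week after %s ends", p.name, ps[i-1].name)
		}
		total += p.weeks
	}
	return total, nil
}

// weeksToMonths converts weeks to average calendar months.
func weeksToMonths(weeks int) float64 {
	const daysPerMonth = 365.25 / 12
	return float64(weeks) * 7 / daysPerMonth
}

func main() {
	total, err := checkPlan(plan)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%d weeks ≈ %.1f months\n", total, weeksToMonths(total)) // prints "42 weeks ≈ 9.7 months"
}
```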

Risk Mitigation

| Risk | Mitigation |
|---|---|
| Junior devs slow to learn Go | Pair programming with Claude. Phase 1 is intentionally foundational -- learning Go while building non-critical shared libraries. |
| SurrealDB immaturity (newer database) | Extensive integration testing via testcontainers. RocksDB (the embedded backend) is battle-tested and widely used. Maintain a fallback plan to PostgreSQL if SurrealDB proves unreliable. |
| Partner API changes during migration | Adapter pattern isolates partner changes. Legacy and Phoenix can coexist indefinitely. |
| No dedicated DevOps | controlplane.com is managed Kubernetes -- reduces ops burden. Cloudflare is managed CDN/WAF. Infrastructure-as-code in phoenix-infra repo. CDO handles infra directly. |
| 2000 TPS target | Go's concurrency model (goroutines) handles this comfortably. SurrealDB with RocksDB handles >10K reads/sec on a single node, and KrakenD's published figure is 70K+ req/sec; both are indicative rather than Phoenix measurements, and payment traffic is write-heavy, so load testing should confirm headroom before Phase 2 go-live. The expected bottleneck is partner APIs, not Phoenix. |

Appendix A: Legacy to Phoenix Mapping

| Legacy Service | Legacy Repo | Phoenix Service | Notes |
|---|---|---|---|
| wallet | simpaisa1/wallet | phoenix-payin | Consolidates 5 operator integrations |
| cardbackend | simpaisa1/cardbackend | phoenix-card | PAN tokenised via Vault, not in-app |
| card-redirection-app | simpaisa1/card-redirection-app | phoenix-card | 3DS redirect handled in-service |
| auto-void-scheduler | simpaisa1/auto-void-scheduler | phoenix-card | Scheduled job within service |
| sp-card-refund-reversal | simpaisa1/sp-card-refund-reversal | phoenix-card | Refund/reversal within service |
| sp-remittance-consumer | simpaisa1/sp-remittance-consumer | phoenix-remittance | NSQ replaces Kafka |
| sp-remittance-scheduler-retry | simpaisa1/sp-remittance-scheduler-retry | phoenix-remittance | Built-in retry, no separate scheduler |
| disbursement-gateway | simpaisa1/disbursement-gateway | phoenix-gateway (KrakenD) | Zuul replaced by KrakenD |
| disbusrment-scheduler | simpaisa1/disbusrment-scheduler | phoenix-payout | Integrated batch processing |
| disbursement-scheduler-sunshine | simpaisa1/disbursement-scheduler-sunshine | phoenix-payout | Code fork eliminated |
| 1bill | simpaisa1/1bill | phoenix-payin | Bill payment as Pay-In variant |

Appendix B: Audit Findings Addressed by Phoenix

Every P0 and P1 finding from the Codebase Audit is structurally eliminated by Phoenix's architecture:

| Finding | Legacy Issue | Phoenix Resolution |
|---|---|---|
| CX-01 | No Spring Security | OAuth 2.0 + JWT + mTLS at gateway |
| CX-02 | CORS wildcard | KrakenD CORS policy, no wildcard |
| CX-04 | HashMap request bodies | Typed Go structs with validation tags |
| CX-05 | No versioning | /api/v1/ URL path versioning |
| CX-06 | No rate limiting | KrakenD rate limiting + Cloudflare L7 |
| CX-10 | No tracing | OpenTelemetry (Jaeger + Prometheus) |
| CX-12 | No tests | testify + testcontainers, 80% coverage gate |
| W-01, R-02, D-01 | Hardcoded credentials | Vault for all secrets, dynamic credentials |
| W-03, C-01, D-07 | AES-ECB | AES-256-GCM via sdk/crypto |
| W-04 | Idempotency fails open | Fail-closed idempotency middleware |
| R-01 | SSL disabled | TLS 1.2+ enforced everywhere |
| R-05 | No @Transactional | SurrealDB transactions (BEGIN/COMMIT) |
| R-06 | double for money | shopspring/decimal via sdk/money |
| C-05 | Raw PAN in app | Vault Transit tokenisation |
| D-09 | No row locking | SurrealDB atomic transactions + optimistic locking |
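Findings W-03, C-01 and D-07 replace AES-ECB with AES-256-GCM. Unlike ECB, GCM uses a fresh nonce per encryption, so identical plaintexts no longer produce identical ciphertexts, and it authenticates the data, so tampering is detected on decrypt. A minimal standard-library sketch of what sdk/crypto might expose; the Seal/Open names and the nonce-prefix layout are assumptions, and in Phoenix the 32-byte key would come from Vault rather than being generated in-process.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD; a 32-byte key selects AES-256.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("key must be 32 bytes for AES-256")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Seal encrypts plaintext and returns nonce || ciphertext || tag.
// aad (e.g. a merchant ID) is authenticated but not encrypted.
func Seal(key, plaintext, aad []byte) ([]byte, error) {
	aead, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return aead.Seal(nonce, nonce, plaintext, aad), nil
}

// Open reverses Seal and fails if the ciphertext or aad was modified.
func Open(key, sealed, aad []byte) ([]byte, error) {
	aead, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < aead.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:aead.NonceSize()], sealed[aead.NonceSize():]
	return aead.Open(nil, nonce, ct, aad)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil { // demo only; real keys come from Vault
		panic(err)
	}
	sealed, err := Seal(key, []byte("03001234567"), []byte("merchant-42"))
	if err != nil {
		panic(err)
	}
	plain, err := Open(key, sealed, []byte("merchant-42"))
	fmt.Println(string(plain), err)
}
```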

Critical Files for Implementation

  • /Users/daniel/Library/CloudStorage/OneDrive-SIMPAISA/Work/API/Codebase-Audit.md - Source of all 60 legacy findings that Phoenix must structurally eliminate; the definitive reference for what went wrong
  • /Users/daniel/Library/CloudStorage/OneDrive-SIMPAISA/Work/API/OpenBanking-Comparison-Audit.md - Open Banking UK gap analysis that defines the API design standards Phoenix adopts (error format, idempotency, pagination, signing)
  • /Users/daniel/Library/CloudStorage/OneDrive-SIMPAISA/Work/API/PayIn PK Technical Specs.pdf - Legacy Pay-In flow documentation (single charge, recurring/tokenisation, operator codes, OTP handling) that Phoenix must replicate functionally
  • /Users/daniel/Library/CloudStorage/OneDrive-SIMPAISA/Work/API/GitBook-API-Audit.md - Current merchant-facing API documentation audit; defines what merchants expect today and where the translation proxy must maintain compatibility
  • /Users/daniel/Library/CloudStorage/OneDrive-SIMPAISA/Work/API/API-Best-Practices-Audit.md - 37 findings across all products with severity ratings; provides the prioritised checklist for Phoenix's security and reliability requirements