Phoenix Architecture Document -- Simpaisa API Platform Rewrite
Document Metadata
- Project: Phoenix
- Organisation: Simpaisa Holdings
- Author: CDO (Daniel O'Reilly) with Claude as pair programmer
- Date: 2026-04-03
- Status: Architecture Design -- Pre-Implementation
- Classification: Internal -- Engineering Leadership
1. Service Architecture
1.1 Microservice Inventory
Phoenix decomposes into 7 services plus shared infrastructure. This is fewer than the legacy 15+ services because the legacy platform grew organically with code forks (e.g., two disbursement schedulers) and single-purpose services (e.g., separate card-refund-reversal, auto-void-scheduler). Phoenix consolidates by domain.
| Service | Domain | Repo Name | Priority |
|---|---|---|---|
| phoenix-gateway | KrakenD configuration, rate limiting, JWT validation, mTLS termination, CORS, IP whitelisting | phoenix-gateway | Phase 1 |
| phoenix-auth | OAuth 2.0 token issuance, merchant credential management, key rotation | phoenix-auth | Phase 1 |
| phoenix-merchant | Merchant onboarding, configuration, feature flags, product assignment, webhook registration | phoenix-merchant | Phase 1 |
| phoenix-payin | Wallet pay-in (single charge, recurring/tokenisation), OTP flows, inquiry | phoenix-payin | Phase 2 |
| phoenix-payout | Domestic disbursements (1Link IBFT, Easypaisa, JazzCash, HBL), batch processing | phoenix-payout | Phase 3 |
| phoenix-remittance | Cross-border transfers (Bank of Asia, Faysal Bank, Trust Bank, 1Link), FX, AML | phoenix-remittance | Phase 4 |
| phoenix-card | Card payments (Alfalah MasterCard, Safepay), 3DS, capture, void, refund | phoenix-card | Phase 5 |
| phoenix-webhook | Outbound webhook delivery, retry, signing, dead-letter management | phoenix-webhook | Phase 2 |
| phoenix-proxy | Translation proxy for legacy API backward compatibility (maps v2 requests → Phoenix format) | phoenix-proxy | Phase 1 |
| phoenix-reconciliation | Settlement calculation, partner reconciliation, reporting | phoenix-reconciliation | Phase 3 |
1.2 Communication Patterns
Synchronous (merchant-facing): HTTP/JSON via KrakenD gateway. All merchant traffic enters through KrakenD, which handles JWT validation, rate limiting, mTLS termination, and routing. Services expose Echo v4.15 HTTP handlers behind the gateway.
Synchronous (inter-service): gRPC inter-service communication is planned but not yet implemented. Services currently operate independently — each service owns its own domain data and does not make synchronous calls to other services. The sole exception is phoenix-merchant, which exposes a gRPC server for future consumption by payment services. When inter-service calls are introduced, they will use protobuf-defined contracts; go-kratos will provide service discovery integration via the Kubernetes registry plugin.
Asynchronous (events): NSQ for all event-driven processing. Key topics:
| Topic | Producer | Consumer(s) | Purpose |
|---|---|---|---|
| payment.initiated | payin, payout, remittance, card | webhook, reconciliation | New payment created |
| payment.completed | payin, payout, remittance, card | webhook, reconciliation | Payment succeeded |
| payment.failed | payin, payout, remittance, card | webhook, reconciliation | Payment failed |
| payment.status_changed | All payment services | webhook | Any state transition |
| webhook.deliver | webhook (self) | webhook | Retry delivery queue |
| webhook.dead_letter | webhook | reconciliation | Permanently failed deliveries |
| settlement.calculate | reconciliation (cron) | reconciliation | Daily settlement trigger |
| partner.callback | payin, payout, remittance | Respective service | Async partner responses |
Design rule: No service writes directly to another service's database tables. Cross-service data access is via NSQ event (current) or gRPC call (planned).
1.3 What Lives in KrakenD vs In Services
KrakenD handles ALL inbound cross-cutting concerns (stateless, config-driven):
- JWT RS256 validation (public key from phoenix-auth JWKS endpoint)
- Rate limiting (per-merchant, per-endpoint, sliding window) -- NO rate limiting in services
- Circuit breaking on backend services (inbound: merchant → Phoenix) -- NO inbound CB in services
- mTLS termination (client certificate validation)
- CORS policy enforcement (no wildcard -- explicit merchant origins)
- IP whitelisting (per-merchant, loaded from config)
- Request/response transformation (envelope wrapping)
- Request logging and correlation ID injection (X-Request-Id)
- OpenAPI spec serving
- Bot detection, request validation, payload size limits
Services handle ONLY business logic and outbound resilience:
- Authentication token issuance (phoenix-auth only -- issues tokens, not validates them)
- Business validation (amount limits, operator availability, merchant product access)
- Partner API orchestration (adapter pattern)
- Transaction state management
- Idempotency enforcement (fail-closed, SurrealDB-backed -- KrakenD cannot do this)
- Webhook payload construction and HMAC signing
- Encryption/decryption of sensitive data
- Outbound resilience (Phoenix → partner banks/wallets):
  - Partner-level circuit breaker (sdk/resilience -- per partner, not per merchant)
  - Retry with exponential backoff on partner calls
  - Bulkhead (concurrency limiter per partner)
  - Partner health monitoring and smart routing
  - Client-side deduplication for partners without idempotency
Principle: KrakenD owns merchant→Phoenix. Services own Phoenix→partner. Services MUST NOT implement rate limiting, JWT validation, or inbound circuit breaking. Services trust that KrakenD has validated the JWT and extracted X-Merchant-Id.
1.4 Shared Libraries vs Service-Specific Code
Shared Go module: phoenix-sdk (lives in sdk/ within the monorepo, imported via replace directive as github.com/doreilly257/sp-apis/sdk)
| Package | Contents |
|---|---|
| sdk/envelope | Standard request/response envelope, error types, pagination |
| sdk/money | Decimal money type (wraps shopspring/decimal), currency codes, rounding rules |
| sdk/crypto | AES-256-GCM encrypt/decrypt, RSA-OAEP, HMAC-SHA256, key loading from Vault |
| sdk/idempotency | Idempotency middleware (SurrealDB-backed, fail-closed) |
| sdk/partner | Partner adapter interface, circuit breaker wrapper, retry policy |
| sdk/observability | OpenTelemetry setup (tracer, meter, logger), correlation ID propagation |
| sdk/surreal | SurrealDB client wrapper, connection pooling, health checks |
| sdk/nsq | NSQ producer/consumer wrappers, dead-letter handling, message serialisation |
| sdk/config | Configuration loader (env vars, SurrealDB config table, hot-reload) |
| sdk/auth | JWT claims parsing, merchant context extraction middleware |
| sdk/validation | Common field validators (MSISDN format, IBAN, CNIC, amount ranges) |
| sdk/webhook | HMAC signing, payload construction, event type constants |
| sdk/testutil | Test helpers, SurrealDB test container setup, NSQ test helpers |
Service-specific code lives entirely within each service repo. This includes:
- Partner adapters (e.g., Easypaisa adapter lives in phoenix-payin, not in phoenix-sdk)
- Service-specific SurrealDB queries
- Business rules unique to a product domain
1.5 Service Mesh / Discovery on controlplane.com
controlplane.com provides a managed Kubernetes environment with Envoy-based service mesh. The topology:
- Each Phoenix service deploys as a workload on controlplane.com
- Inter-service communication uses internal DNS (e.g., phoenix-auth.phoenix.svc.cluster.local)
- Service discovery (when inter-service gRPC is enabled) will use the Kubernetes registry plugin (github.com/go-kratos/kratos/contrib/registry/kubernetes) via phoenix-merchant's go-kratos integration
- TLS between services is handled by the mesh (mTLS between Envoy sidecars) -- services communicate in plaintext locally, encrypted in transit
- Health checks via /healthz (liveness) and /readyz (readiness) endpoints
- Country namespaces: phoenix_pk (Pakistan), phoenix_bd (Bangladesh), phoenix_np (Nepal), phoenix_eg (Egypt) -- each isolated per data residency policy
2. Data Architecture
2.1 SurrealDB Schema Design
SurrealDB serves dual roles: persistent store (with RocksDB embedded backend) and in-memory cache (for truly ephemeral data). The same database engine, two deployment modes:
- Persistent instance: SurrealDB with RocksDB storage backend (file:// connection) for transaction records, merchant data, settlements, and idempotency keys
- Cache instance: SurrealDB in-memory mode (memory connection) for short-lived, loss-tolerant data only: OTP state (5-minute TTL) and FX rate quotes (corridor-specific TTL)
All tables are SCHEMAFULL to enforce data integrity -- a direct response to the legacy platform's HashMap<String, Object> chaos.
2.2 Table Structure
Core Domain Tables
-- ============================================================
-- NAMESPACE & DATABASE
-- ============================================================
DEFINE NAMESPACE phoenix;
USE NS phoenix;
DEFINE DATABASE payments;
USE DB payments;
-- ============================================================
-- MERCHANTS
-- ============================================================
DEFINE TABLE merchant SCHEMAFULL;
DEFINE FIELD name ON merchant TYPE string;
DEFINE FIELD legal_name ON merchant TYPE string;
DEFINE FIELD status ON merchant TYPE string
ASSERT $value IN ['active', 'suspended', 'onboarding', 'terminated'];
DEFINE FIELD country ON merchant TYPE string ASSERT string::len($value) = 2;
DEFINE FIELD currency ON merchant TYPE string ASSERT string::len($value) = 3;
DEFINE FIELD tier ON merchant TYPE string
ASSERT $value IN ['standard', 'premium', 'enterprise'];
DEFINE FIELD products ON merchant TYPE array<string>;
DEFINE FIELD webhook_url ON merchant TYPE option<string>;
DEFINE FIELD webhook_secret ON merchant TYPE option<string>;
DEFINE FIELD ip_whitelist ON merchant TYPE array<string>;
DEFINE FIELD rate_limits ON merchant TYPE object;
DEFINE FIELD metadata ON merchant TYPE option<object>;
DEFINE FIELD created_at ON merchant TYPE datetime DEFAULT time::now();
DEFINE FIELD updated_at ON merchant TYPE datetime VALUE time::now();
DEFINE INDEX merchant_status ON merchant FIELDS status;
DEFINE INDEX merchant_country ON merchant FIELDS country;
-- ============================================================
-- MERCHANT CREDENTIALS (for OAuth 2.0)
-- ============================================================
DEFINE TABLE merchant_credential SCHEMAFULL;
DEFINE FIELD merchant ON merchant_credential TYPE record<merchant>;
DEFINE FIELD client_id ON merchant_credential TYPE string;
DEFINE FIELD client_secret_hash ON merchant_credential TYPE string;
DEFINE FIELD scopes ON merchant_credential TYPE array<string>;
DEFINE FIELD status ON merchant_credential TYPE string
ASSERT $value IN ['active', 'revoked', 'expired'];
DEFINE FIELD expires_at ON merchant_credential TYPE option<datetime>;
DEFINE FIELD created_at ON merchant_credential TYPE datetime DEFAULT time::now();
DEFINE INDEX cred_client_id ON merchant_credential FIELDS client_id UNIQUE;
DEFINE INDEX cred_merchant ON merchant_credential FIELDS merchant;
-- ============================================================
-- MERCHANT PRODUCT CONFIGURATION
-- ============================================================
DEFINE TABLE product_config SCHEMAFULL;
DEFINE FIELD merchant ON product_config TYPE record<merchant>;
DEFINE FIELD product ON product_config TYPE string
ASSERT $value IN ['payin', 'payout', 'remittance', 'card'];
DEFINE FIELD operator ON product_config TYPE string;
DEFINE FIELD country ON product_config TYPE string;
DEFINE FIELD enabled ON product_config TYPE bool DEFAULT true;
DEFINE FIELD min_amount ON product_config TYPE decimal;
DEFINE FIELD max_amount ON product_config TYPE decimal;
DEFINE FIELD daily_limit ON product_config TYPE option<decimal>;
DEFINE FIELD monthly_limit ON product_config TYPE option<decimal>;
DEFINE FIELD otp_required ON product_config TYPE bool DEFAULT true;
DEFINE FIELD otp_expiry_seconds ON product_config TYPE int DEFAULT 300;
DEFINE FIELD settlement_schedule ON product_config TYPE string
ASSERT $value IN ['realtime', 't_plus_1', 't_plus_2', 'weekly'];
DEFINE FIELD partner_credentials_ref ON product_config TYPE string;
DEFINE FIELD feature_flags ON product_config TYPE option<object>;
DEFINE FIELD created_at ON product_config TYPE datetime DEFAULT time::now();
DEFINE FIELD updated_at ON product_config TYPE datetime VALUE time::now();
DEFINE INDEX pc_merchant_product ON product_config FIELDS merchant, product, operator UNIQUE;
-- ============================================================
-- DOMAIN-SPECIFIC PAYMENT TABLES
-- NOTE: Each service uses its own domain-specific table name rather than
-- a shared `transaction` table. The schema structure below is representative
-- of the common shape; actual table names are:
-- phoenix-payin → `payment`
-- phoenix-payout → `disbursement`
-- phoenix-remittance → `transfer`
-- phoenix-card → `payment`
-- Services do NOT share a central transaction ledger at the database layer.
-- Cross-service aggregation happens at the reconciliation layer via NSQ events.
-- ============================================================
DEFINE TABLE payment SCHEMAFULL; -- example: payin + card use `payment`
DEFINE FIELD merchant ON payment TYPE record<merchant>;
DEFINE FIELD product ON payment TYPE string
ASSERT $value IN ['payin', 'payout', 'remittance', 'card'];
DEFINE FIELD reference ON payment TYPE string;
DEFINE FIELD idempotency_key ON payment TYPE string;
DEFINE FIELD amount ON payment TYPE decimal;
DEFINE FIELD currency ON payment TYPE string ASSERT string::len($value) = 3;
DEFINE FIELD fee ON payment TYPE decimal DEFAULT 0;
DEFINE FIELD net_amount ON payment TYPE decimal;
DEFINE FIELD status ON payment TYPE string
ASSERT $value IN [
'initiated', 'awaiting_authorisation', 'processing',
'pending_partner', 'completed', 'failed', 'cancelled',
'reversed', 'refunded', 'partially_refunded',
'expired', 'on_hold', 'aml_review', 'stuck'
];
-- NOTE: `awaiting_authorisation` is the code-canonical name for the OTP-pending
-- state. Earlier architecture drafts used `pending_otp` — the code uses
-- `awaiting_authorisation` throughout.
DEFINE FIELD status_reason ON payment TYPE option<string>;
DEFINE FIELD operator ON payment TYPE string;
DEFINE FIELD country ON payment TYPE string;
DEFINE FIELD partner_reference ON payment TYPE option<string>;
DEFINE FIELD partner_status ON payment TYPE option<string>;
DEFINE FIELD payer ON payment TYPE object;
DEFINE FIELD payee ON payment TYPE option<object>;
DEFINE FIELD metadata ON payment TYPE option<object>;
DEFINE FIELD initiated_at ON payment TYPE datetime DEFAULT time::now();
DEFINE FIELD completed_at ON payment TYPE option<datetime>;
DEFINE FIELD updated_at ON payment TYPE datetime VALUE time::now();
DEFINE FIELD expires_at ON payment TYPE option<datetime>;
DEFINE INDEX tx_idempotency ON payment FIELDS merchant, idempotency_key UNIQUE;
DEFINE INDEX tx_reference ON payment FIELDS merchant, reference UNIQUE;
DEFINE INDEX tx_status ON payment FIELDS status;
DEFINE INDEX tx_merchant_product ON payment FIELDS merchant, product, initiated_at;
DEFINE INDEX tx_partner_ref ON payment FIELDS partner_reference;
DEFINE INDEX tx_initiated_at ON payment FIELDS initiated_at;
-- ============================================================
-- STATE LOG (audit trail, append-only) — table name follows the domain table
-- e.g., `payment_state_log`, `transfer_state_log`, `disbursement_state_log`
-- ============================================================
DEFINE TABLE payment_state_log SCHEMAFULL;
DEFINE FIELD payment ON payment_state_log TYPE record<payment>;
DEFINE FIELD from_status ON payment_state_log TYPE option<string>;
DEFINE FIELD to_status ON payment_state_log TYPE string;
DEFINE FIELD reason ON payment_state_log TYPE option<string>;
DEFINE FIELD actor ON payment_state_log TYPE string;
DEFINE FIELD partner_response ON payment_state_log TYPE option<object>;
DEFINE FIELD created_at ON payment_state_log TYPE datetime DEFAULT time::now();
DEFINE INDEX tsl_payment ON payment_state_log FIELDS payment, created_at;
-- ============================================================
-- SETTLEMENTS
-- ============================================================
DEFINE TABLE settlement SCHEMAFULL;
DEFINE FIELD merchant ON settlement TYPE record<merchant>;
DEFINE FIELD product ON settlement TYPE string;
DEFINE FIELD operator ON settlement TYPE string;
DEFINE FIELD country ON settlement TYPE string;
DEFINE FIELD period_start ON settlement TYPE datetime;
DEFINE FIELD period_end ON settlement TYPE datetime;
DEFINE FIELD transaction_count ON settlement TYPE int;
DEFINE FIELD gross_amount ON settlement TYPE decimal;
DEFINE FIELD total_fees ON settlement TYPE decimal;
DEFINE FIELD net_amount ON settlement TYPE decimal;
DEFINE FIELD currency ON settlement TYPE string;
DEFINE FIELD status ON settlement TYPE string
ASSERT $value IN ['calculating', 'pending', 'approved', 'paid', 'disputed'];
DEFINE FIELD paid_at ON settlement TYPE option<datetime>;
DEFINE FIELD created_at ON settlement TYPE datetime DEFAULT time::now();
DEFINE INDEX sett_merchant_period ON settlement FIELDS merchant, period_start, period_end;
-- ============================================================
-- PARTNER API LOGS (audit trail)
-- ============================================================
DEFINE TABLE partner_api_log SCHEMAFULL;
DEFINE FIELD payment_id ON partner_api_log TYPE option<string>; -- domain-agnostic reference; actual table varies by service
DEFINE FIELD partner ON partner_api_log TYPE string;
DEFINE FIELD direction ON partner_api_log TYPE string
ASSERT $value IN ['outbound', 'inbound'];
DEFINE FIELD method ON partner_api_log TYPE string;
DEFINE FIELD url ON partner_api_log TYPE string;
DEFINE FIELD request_headers ON partner_api_log TYPE option<object>;
DEFINE FIELD request_body ON partner_api_log TYPE option<string>;
DEFINE FIELD response_status ON partner_api_log TYPE option<int>;
DEFINE FIELD response_body ON partner_api_log TYPE option<string>;
DEFINE FIELD duration_ms ON partner_api_log TYPE int;
DEFINE FIELD error ON partner_api_log TYPE option<string>;
DEFINE FIELD created_at ON partner_api_log TYPE datetime DEFAULT time::now();
DEFINE INDEX pal_transaction ON partner_api_log FIELDS payment_id;
DEFINE INDEX pal_created ON partner_api_log FIELDS created_at;
-- ============================================================
-- WEBHOOK DELIVERIES
-- ============================================================
DEFINE TABLE webhook_delivery SCHEMAFULL;
DEFINE FIELD merchant ON webhook_delivery TYPE record<merchant>;
DEFINE FIELD payment_id ON webhook_delivery TYPE option<string>; -- domain-agnostic reference; actual record type varies by service
DEFINE FIELD event_type ON webhook_delivery TYPE string;
DEFINE FIELD payload ON webhook_delivery TYPE object;
DEFINE FIELD url ON webhook_delivery TYPE string;
DEFINE FIELD status ON webhook_delivery TYPE string
ASSERT $value IN ['pending', 'delivered', 'failed', 'dead_letter'];
DEFINE FIELD attempts ON webhook_delivery TYPE int DEFAULT 0;
DEFINE FIELD max_attempts ON webhook_delivery TYPE int DEFAULT 5;
DEFINE FIELD last_attempt_at ON webhook_delivery TYPE option<datetime>;
DEFINE FIELD next_attempt_at ON webhook_delivery TYPE option<datetime>;
DEFINE FIELD last_response_status ON webhook_delivery TYPE option<int>;
DEFINE FIELD last_error ON webhook_delivery TYPE option<string>;
DEFINE FIELD delivered_at ON webhook_delivery TYPE option<datetime>;
DEFINE FIELD created_at ON webhook_delivery TYPE datetime DEFAULT time::now();
DEFINE INDEX wd_status_next ON webhook_delivery FIELDS status, next_attempt_at;
DEFINE INDEX wd_merchant ON webhook_delivery FIELDS merchant;
-- ============================================================
-- OPERATOR TOKENS (for recurring/tokenised payments)
-- ============================================================
DEFINE TABLE operator_token SCHEMAFULL;
DEFINE FIELD merchant ON operator_token TYPE record<merchant>;
DEFINE FIELD operator ON operator_token TYPE string;
DEFINE FIELD msisdn ON operator_token TYPE string;
DEFINE FIELD token_ref ON operator_token TYPE string;
DEFINE FIELD operator_token_id ON operator_token TYPE string;
DEFINE FIELD status ON operator_token TYPE string
ASSERT $value IN ['active', 'expired', 'revoked'];
DEFINE FIELD created_at ON operator_token TYPE datetime DEFAULT time::now();
DEFINE FIELD expires_at ON operator_token TYPE option<datetime>;
DEFINE INDEX ot_merchant_msisdn ON operator_token FIELDS merchant, operator, msisdn;
Cache Tables (SurrealDB in-memory instance)
The in-memory SurrealDB instance stores only data that is acceptable to lose on restart: OTP state and FX quotes both have short TTLs and can be regenerated. Idempotency keys are not stored here; see "Idempotency Keys (SurrealDB persistent instance)" below.
-- ============================================================
-- IN-MEMORY CACHE DATABASE
-- Loss-tolerant, short-TTL data only. Do NOT store idempotency
-- keys here — restart would cause duplicate payment risk.
-- ============================================================
DEFINE NAMESPACE phoenix_cache;
USE NS phoenix_cache;
DEFINE DATABASE cache;
USE DB cache;
-- OTP state (short TTL, 5 minutes)
DEFINE TABLE otp_state SCHEMAFULL;
DEFINE FIELD transaction_id ON otp_state TYPE string;
DEFINE FIELD msisdn ON otp_state TYPE string;
DEFINE FIELD attempts ON otp_state TYPE int DEFAULT 0;
DEFINE FIELD max_attempts ON otp_state TYPE int DEFAULT 3;
DEFINE FIELD created_at ON otp_state TYPE datetime DEFAULT time::now();
DEFINE FIELD expires_at ON otp_state TYPE datetime;
DEFINE INDEX otp_tx ON otp_state FIELDS transaction_id UNIQUE;
-- FX rate quotes (short TTL, configurable per corridor)
DEFINE TABLE fx_quote SCHEMAFULL;
DEFINE FIELD quote_id ON fx_quote TYPE string;
DEFINE FIELD source_currency ON fx_quote TYPE string;
DEFINE FIELD target_currency ON fx_quote TYPE string;
DEFINE FIELD rate ON fx_quote TYPE decimal;
DEFINE FIELD expires_at ON fx_quote TYPE datetime;
DEFINE FIELD created_at ON fx_quote TYPE datetime DEFAULT time::now();
DEFINE INDEX fxq_id ON fx_quote FIELDS quote_id UNIQUE;
Idempotency Keys (SurrealDB persistent instance)
Idempotency keys are stored in the persistent SurrealDB instance (NS simpaisa; DB phoenix). This is critical: if idempotency state is lost on restart, a merchant retry could result in a duplicate payment charge (see legacy finding W-04). The 48-hour TTL is enforced by the application-layer janitor (sdk/cleanup), not by database expiry — SurrealDB v2 does not yet support native record TTL.
The schema lives in migrations/004_idempotency.surql. Services access idempotency records via the same SurrealDB connection used for all other persistent data.
2.3 SurrealQL Patterns for Payment Operations
Idempotent transaction creation (fail-closed):
-- Atomic: check idempotency key, create payment, log state -- all in one transaction
BEGIN TRANSACTION;
-- Check idempotency (fail-closed: if the persistent store is down, this query fails and the request is rejected)
LET $existing = (SELECT * FROM idempotency_key
WHERE merchant_id = $merchant_id AND key = $idempotency_key
LIMIT 1);
-- If key exists, return cached response (handled in Go code)
-- If key does not exist, proceed:
-- NOTE: table name is domain-specific (e.g., `payment` for payin/card,
-- `disbursement` for payout, `transfer` for remittance)
LET $tx = (CREATE payment CONTENT {
merchant: type::thing('merchant', $merchant_id),
product: $product,
reference: $reference,
idempotency_key: $idempotency_key,
amount: <decimal> $amount,
currency: $currency,
fee: <decimal> $fee,
net_amount: <decimal> ($amount - $fee),
status: 'initiated',
operator: $operator,
country: $country,
payer: $payer,
expires_at: time::now() + 30m
});
-- Log state transition
CREATE payment_state_log CONTENT {
payment: $tx.id,
from_status: NONE,
to_status: 'initiated',
actor: 'system',
reason: 'Payment initiated'
};
COMMIT TRANSACTION;
Transaction state transition (atomic with audit):
BEGIN TRANSACTION;
-- NOTE: replace `payment` with the appropriate domain table name per service
LET $tx = (UPDATE payment
SET status = $new_status,
status_reason = $reason,
partner_reference = $partner_ref,
partner_status = $partner_status,
completed_at = IF $new_status IN ['completed', 'failed', 'reversed']
THEN time::now() ELSE completed_at END
WHERE id = type::thing('payment', $tx_id)
AND status = $expected_current_status
RETURN AFTER);
-- Fail if optimistic lock violated (status changed concurrently)
IF array::len($tx) = 0 {
THROW 'Optimistic lock failure: payment status has changed';
};
CREATE payment_state_log CONTENT {
payment: type::thing('payment', $tx_id),
from_status: $expected_current_status,
to_status: $new_status,
reason: $reason,
actor: $actor,
partner_response: $partner_response
};
COMMIT TRANSACTION;
Cursor-based pagination:
-- NOTE: replace `payment` with the appropriate domain table name per service
SELECT * FROM payment
WHERE merchant = type::thing('merchant', $merchant_id)
AND product = $product
AND initiated_at < type::datetime($cursor)
ORDER BY initiated_at DESC
LIMIT $page_size;
2.4 RocksDB Storage Model
SurrealDB uses its embedded RocksDB backend for persistent storage. There is no separate distributed storage cluster. SurrealDB is started with a file:// path, and RocksDB handles on-disk persistence locally to the workload.
| Component | Instances | Spec (per node) | Location |
|---|---|---|---|
| SurrealDB (persistent, RocksDB) | 2 | 4 vCPU, 16 GB RAM, 200 GB NVMe SSD | PK region (data residency) |
| SurrealDB (in-memory cache) | 2 | 4 vCPU, 16 GB RAM | PK region |
RocksDB provides durable, high-performance key-value storage embedded within the SurrealDB process. No external storage cluster, no Raft coordination layer, no separate placement driver nodes. The simplicity is intentional for the current scale; a distributed backend (TiKV or FoundationDB) can be adopted when horizontal write scaling is required.
2.5 Data Residency: PK Data Stays in PK
Architecture:
- All SurrealDB (RocksDB) instances for PK transactions run on controlplane.com workloads pinned to the Pakistan region (or nearest available -- likely Mumbai, with VPN tunnel to PK-based infrastructure if SBP requires strict in-country)
- KrakenD edge nodes on Cloudflare route PK traffic to PK backend
- controlplane.com workloads tagged with region: pk get scheduled only to PK-designated infrastructure
- SurrealDB namespace isolation: PK data in phoenix_pk namespace, other countries get their own namespace
- Partner API logs (which contain PII) are stored in the same region as the transaction
- If Simpaisa expands to Bangladesh/Nepal/Egypt, separate SurrealDB (RocksDB) instances per country, with SurrealDB multi-tenancy at the namespace level
- EG transaction data stays in EG infrastructure (phoenix_eg namespace) in compliance with Egyptian data localisation requirements
- No cross-region replication of transaction data -- each country is its own data island
- Only aggregated, anonymised analytics data may leave the country of origin
3. API Design Principles
3.1 Standard Request/Response Envelope
Request envelope (for POST/PUT/PATCH):
{
"data": {
"reference": "merchant-ref-001",
"amount": "1500.00",
"currency": "PKR",
"operator": "easypaisa",
"payer": {
"msisdn": "03001234567"
}
},
"metadata": {
"cnic": "4210112345678"
}
}
Success response:
{
"data": {
"id": "tx_01HXYZ...",
"reference": "merchant-ref-001",
"status": "initiated",
"amount": "1500.00",
"currency": "PKR",
"operator": "easypaisa",
"created_at": "2026-04-03T14:30:00+05:00",
"expires_at": "2026-04-03T15:00:00+05:00"
},
"links": {
"self": "/api/v1/payin/transactions/tx_01HXYZ..."
},
"request_id": "req_abc123def456"
}
Paginated list response:
{
"data": [ ... ],
"links": {
"self": "/api/v1/payin/transactions?cursor=2026-04-03T14:30:00Z&limit=25",
"next": "/api/v1/payin/transactions?cursor=2026-04-03T12:00:00Z&limit=25"
},
"meta": {
"count": 25,
"has_more": true
},
"request_id": "req_abc123def456"
}
3.2 Error Format (Open Banking Aligned)
{
"errors": [
{
"code": "INSUFFICIENT_BALANCE",
"status": "0042",
"message": "Merchant balance is insufficient for this disbursement",
"path": "data.amount",
"reference": "https://docs.simpaisa.com/errors/INSUFFICIENT_BALANCE"
}
],
"request_id": "req_abc123def456",
"timestamp": "2026-04-03T14:30:00+05:00"
}
HTTP status codes follow REST conventions:
| HTTP Status | Usage |
|---|---|
| 200 | Successful query |
| 201 | Resource created (payment initiated) |
| 400 | Validation error |
| 401 | Authentication failure |
| 403 | Authorisation failure (valid token, insufficient scope) |
| 404 | Resource not found |
| 409 | Conflict (idempotency key reuse with different body) |
| 422 | Business rule violation (amount exceeds limit) |
| 429 | Rate limited (includes Retry-After header) |
| 500 | Internal server error |
| 503 | Service unavailable (partner down, circuit open) |
The legacy status: "0000" numeric codes are preserved in the status field for backward compatibility during migration, but the primary identifier is the machine-readable code string.
3.3 Idempotency Implementation (Mandatory, Fail-Closed)
Rules:
1. X-Idempotency-Key header is mandatory on all POST endpoints. Missing key returns 400.
2. Key format: UUID v4, max 40 characters (Open Banking compatible).
3. Deduplication window: 48 hours.
4. Same key + same body = return cached response with original HTTP status.
5. Same key + different body = return 409 Conflict.
6. Fail-closed: If the SurrealDB persistent store is unreachable, the request is rejected with 503 -- never processed without idempotency protection. This directly addresses legacy finding W-04.
7. Idempotency keys are scoped per merchant (merchant A's key "abc" does not conflict with merchant B's key "abc").
Implementation in Go middleware (sdk/idempotency):
Request arrives -> Extract X-Idempotency-Key
-> If missing: 400 Bad Request
-> Query SurrealDB (persistent) for (merchant_id, key)
-> If store unreachable: 503 Service Unavailable (FAIL CLOSED)
-> If key exists:
-> Compare request body hash
-> Match: return cached response
-> Mismatch: 409 Conflict
-> If key not found:
-> Store key with status "processing"
-> Process request
-> Store response with 48h TTL (cleaned up by sdk/cleanup janitor)
-> Return response
3.4 Pagination (Cursor-Based)
All list endpoints use cursor-based pagination, not offset/page-number. This performs better at scale and avoids the "drifting page" problem with concurrent inserts.
- Cursor is a datetime value (the initiated_at or created_at of the last item)
- Default page size: 25
- Maximum page size: 100
- Response includes links.next with pre-built URL
- Response includes meta.has_more boolean
3.5 Versioning Strategy
URL path versioning: /api/v1/payin/..., /api/v1/payout/...
- Major versions only (v1, v2). No minor versions.
- New version only on breaking changes.
- Old versions supported for minimum 12 months after deprecation announcement.
- Deprecation signalled via Sunset header and Deprecation header on responses.
- KrakenD routes different versions to different service instances (blue/green per version).
3.6 Webhook Design
Event types:
payment.initiated
payment.otp_sent
payment.otp_verified
payment.processing
payment.completed
payment.failed
payment.reversed
payment.expired
payment.on_hold
payment.aml_review
settlement.calculated
settlement.paid
Webhook payload:
{
"id": "evt_01HXYZ...",
"event": "payment.completed",
"timestamp": "2026-04-03T14:35:00+05:00",
"data": {
"transaction_id": "tx_01HXYZ...",
"reference": "merchant-ref-001",
"status": "completed",
"amount": "1500.00",
"currency": "PKR",
"operator": "easypaisa",
"partner_reference": "EP-20260403-12345",
"completed_at": "2026-04-03T14:35:00+05:00"
}
}
HMAC signing:
- Algorithm: HMAC-SHA256
- Key: per-merchant webhook secret (generated on registration, rotatable)
- Signed content: timestamp.raw_body (timestamp prevents replay attacks)
- Header: X-Simpaisa-Signature: t=1712144100,v1=5257a869e7ecebeda32affa62cdca3fa51cad7e77a0e56ff536d0ce8e108d8bd
- Merchants verify by computing HMAC of {timestamp}.{raw_body} with their secret
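On the merchant side, verification could look like the sketch below. `Sign` and `Verify` are hypothetical helper names; the header format and the `{timestamp}.{raw_body}` signed content follow the description above. Checking that the timestamp is recent (the replay protection) is left to the caller, since no tolerance window is specified here.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Sign builds an X-Simpaisa-Signature header value for a timestamp and raw body.
func Sign(secret []byte, timestamp string, rawBody []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(timestamp + "."))
	mac.Write(rawBody)
	return "t=" + timestamp + ",v1=" + hex.EncodeToString(mac.Sum(nil))
}

// Verify parses "t=<unix>,v1=<hex>" and checks the HMAC in constant time.
func Verify(secret []byte, header string, rawBody []byte) bool {
	var ts, sig string
	for _, part := range strings.Split(header, ",") {
		k, v, ok := strings.Cut(strings.TrimSpace(part), "=")
		if !ok {
			continue
		}
		switch k {
		case "t":
			ts = v
		case "v1":
			sig = v
		}
	}
	if ts == "" || sig == "" {
		return false
	}
	got, err := hex.DecodeString(sig)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(ts + "."))
	mac.Write(rawBody)
	return hmac.Equal(mac.Sum(nil), got)
}

func main() {
	secret := []byte("whsec_demo") // placeholder secret for the example
	body := []byte(`{"event":"payment.completed"}`)
	header := Sign(secret, "1712144100", body)
	fmt.Println(Verify(secret, header, body))
}
```

The raw request bytes must be used for verification; re-serialising parsed JSON can change whitespace or key order and break the signature.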
Retry with exponential backoff:
| Attempt | Delay | Cumulative |
|---|---|---|
| 1 | Immediate | 0s |
| 2 | 1 minute | 1m |
| 3 | 5 minutes | 6m |
| 4 | 30 minutes | 36m |
| 5 | 2 hours | 2h 36m |
After 5 failed attempts, the delivery moves to dead-letter status. Merchants can query a dead-letter endpoint to retrieve missed webhooks. Dead-letter webhooks are retained for 30 days.
4. Security Architecture¶
4.1 OAuth 2.0 Flow for Merchants¶
Grant type: Client Credentials (RFC 6749 Section 4.4). This is server-to-server, no user interaction.
1. Merchant registers via onboarding (gets client_id + client_secret)
2. Merchant calls POST /api/v1/auth/token
Authorization: Basic base64(client_id:client_secret)
Content-Type: application/x-www-form-urlencoded
Body: grant_type=client_credentials&scope=payin:write payin:read
3. phoenix-auth validates credentials against merchant_credential table
4. Issues JWT (RS256) with claims (see 4.2)
5. Token response:
{
"access_token": "eyJhbGciOi...",
"token_type": "Bearer",
"expires_in": 3600,
"scope": "payin:write payin:read"
}
6. Merchant includes token in subsequent requests:
Authorization: Bearer eyJhbGciOi...
Token lifetime: 1 hour. No refresh tokens (Client Credentials flow -- merchant re-authenticates). Short-lived tokens reduce the blast radius of token compromise.
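As an illustration of step 2, a merchant-side client could build the token request as sketched below. `NewTokenRequest` is a hypothetical helper and the base URL is a placeholder; the path, grant type, and headers are those listed in the flow. The request is only constructed here, not sent.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// NewTokenRequest builds a Client Credentials token request.
// baseURL is whatever host fronts phoenix-auth (an assumption in this sketch).
func NewTokenRequest(baseURL, clientID, clientSecret string, scopes ...string) (*http.Request, error) {
	form := url.Values{}
	form.Set("grant_type", "client_credentials")
	form.Set("scope", strings.Join(scopes, " ")) // scopes are space-delimited

	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/auth/token",
		strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(clientID, clientSecret) // Authorization: Basic base64(id:secret)
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	req, err := NewTokenRequest("https://api.example.test", "client-id", "client-secret",
		"payin:write", "payin:read")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.Path, req.Header.Get("Content-Type"))
}
```

Because there are no refresh tokens, a client would typically cache the token and repeat this request shortly before `expires_in` elapses.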
4.2 JWT Claims Structure¶
{
"iss": "https://auth.simpaisa.com",
"sub": "merchant:2000001",
"aud": "https://api.simpaisa.com",
"exp": 1712147700,
"iat": 1712144100,
"jti": "jwt_01HXYZ...",
"scope": "payin:write payin:read payout:read",
"merchant_id": "2000001",
"merchant_name": "Acme Corp",
"country": "PK",
"tier": "premium",
"products": ["payin", "payout"]
}
Scopes follow the pattern {product}:{action}:
- payin:write -- initiate pay-in transactions
- payin:read -- query pay-in transactions and status
- payout:write -- initiate disbursements
- payout:read -- query disbursements
- remittance:write, remittance:read
- card:write, card:read
- webhook:manage -- manage webhook configuration
- settlement:read -- query settlement reports
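A service-side scope check against the space-delimited scope claim might look like this sketch. `HasScope` and `RequiredScope` are hypothetical helpers, and mapping GET to "read" and every other method to "write" is an assumption made only for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// HasScope reports whether the space-delimited scope claim grants required.
func HasScope(claim, required string) bool {
	for _, s := range strings.Fields(claim) {
		if s == required {
			return true
		}
	}
	return false
}

// RequiredScope builds a {product}:{action} scope from an HTTP method.
// Assumption: GET maps to "read", everything else to "write".
func RequiredScope(product, method string) string {
	if method == "GET" {
		return product + ":read"
	}
	return product + ":write"
}

func main() {
	claim := "payin:write payin:read payout:read"
	fmt.Println(HasScope(claim, RequiredScope("payout", "POST")))
	fmt.Println(HasScope(claim, RequiredScope("payout", "GET")))
}
```

Matching whole tokens rather than substrings avoids accidentally accepting a scope such as payin:readonly when payin:read is required.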
4.3 mTLS at KrakenD¶
- KrakenD terminates mTLS from merchants
- Merchant client certificates issued by Simpaisa's private CA (managed via Vault PKI secrets engine)
- Certificate CN must match the client_id in the JWT
- KrakenD validates: certificate chain, expiry, revocation (CRL/OCSP), CN match
- Internal services behind KrakenD communicate over the controlplane.com service mesh (Envoy mTLS)
- mTLS is mandatory for all production merchant connections. Sandbox allows TLS-only for easier testing.
4.4 WAF Rules (Cloudflare)¶
Cloudflare sits in front of KrakenD. Rules:
| Rule | Purpose |
|---|---|
| Rate limiting (L7) | First line of defence before KrakenD's application-level rate limits |
| Bot management | Block automated scanning, credential stuffing |
| Geo-blocking | Only allow traffic from countries where Simpaisa operates (PK, BD, NP, IQ, AE, EG) + merchant-registered IPs |
| Request size limit | 256 KB max body (payment requests are small) |
| SQL injection / XSS | Managed ruleset (defence in depth, even though API is JSON-only) |
| TLS 1.2 minimum | Block TLS 1.0/1.1 |
| DDoS protection | Cloudflare's standard L3/L4/L7 DDoS mitigation |
| IP reputation | Block known-bad IPs from threat intelligence |
| Custom rules | Block requests missing required headers (Authorization, Content-Type) |
4.5 Secrets Management¶
HashiCorp Vault (already partially adopted in legacy) becomes the single source for all secrets:
| Secret Type | Vault Path | Rotation |
|---|---|---|
| Partner API credentials | secret/phoenix/{env}/partners/{partner_name} | Manual, per partner requirement |
| Merchant webhook secrets | secret/phoenix/{env}/merchants/{id}/webhook | On merchant request |
| JWT signing keys (RS256) | transit/phoenix/jwt-signing | 90-day automatic rotation |
| Database credentials | database/phoenix/{env} | Dynamic, 24-hour lease |
| TLS certificates | pki/phoenix/{env} | Automatic, 30-day renewal |
| Encryption keys (AES-256) | transit/phoenix/data-encryption | Annual rotation with key versioning |
Go services authenticate to Vault via Kubernetes auth method (controlplane.com provides the service account JWT). No credentials in environment variables, no credentials in code, no credentials in Git. Ever.
4.6 Encryption Standards¶
| Purpose | Algorithm | Notes |
|---|---|---|
| Data at rest (field-level) | AES-256-GCM | Random 12-byte IV per operation. Replaces legacy AES-ECB. |
| Data in transit | TLS 1.2+ | Enforced at Cloudflare and KrakenD |
| Request signing | RSA-2048 with SHA-256 | For merchants that require request signing |
| Key wrapping | RSA-OAEP with SHA-256 | Replaces legacy PKCS1v1.5 |
| Webhook signing | HMAC-SHA256 | Per-merchant shared secret |
| Password hashing | Argon2id | For any stored credentials |
| Token hashing | SHA-256 | Client secrets stored as hashes |
| PAN tokenisation | Vault Transit engine | Raw PAN never enters application layer. Direct response to legacy finding C-05. |
Critical rule: No raw PAN data passes through Phoenix services. Card payments use Vault's tokenisation or the acquirer's hosted tokenisation. The application only handles token references.
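To illustrate the field-level row of the table (AES-256-GCM with a random 12-byte IV per operation), here is a minimal local sketch using Go's standard library. It assumes the key bytes are available in process, which is a simplification: the table above places encryption keys in Vault's transit engine, where the service would not hold the key at all. `Encrypt` and `Decrypt` are hypothetical helper names.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-256-GCM AEAD; the key must be exactly 32 bytes.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 requires a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encrypt returns nonce || ciphertext || tag, with a fresh random nonce per call.
func Encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize()) // 12 bytes for standard GCM
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt splits off the nonce and authenticates before returning plaintext.
func Decrypt(key, data []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(data) < n {
		return nil, errors.New("ciphertext too short")
	}
	return gcm.Open(nil, data[:n], data[n:], nil)
}

func main() {
	key := make([]byte, 32) // all-zero demo key; never use a fixed key in practice
	ct, _ := Encrypt(key, []byte("sensitive-field"))
	pt, err := Decrypt(key, ct)
	fmt.Println(string(pt), err)
}
```

Unlike the legacy AES-ECB mode, GCM produces a different ciphertext on every call for the same plaintext and rejects any modified ciphertext at decryption time.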
5. Payment Flow Design¶
5.1 Pay-In Flow (Wallet OTP + Direct Charge)¶
Single Charge (OTP Flow):
Merchant KrakenD phoenix-payin SurrealDB Operator (EP/JC)
| | | | |
|--POST /api/v1/payin/---->| | | |
| transactions/initiate | | | |
| X-Idempotency-Key: abc |--validate JWT---->| | |
| |--rate limit OK---->| | |
| | |--check idemp.--->| |
| | | key "abc" | |
| | |<--not found------| |
| | |--validate merch->| |
| | | config, limits | |
| | |<--config---------| |
| | |--CREATE tx------>| |
| | | status:initiated| |
| | | | |
| | |--IF otp_required: |
| | | send OTP via operator API---------->|
| | | store OTP state in cache |
| | | UPDATE tx -> awaiting_authorisation |
| | | | |
|<---201 {status:awaiting_authorisation, tx_id}| | |
| | | | |
|--POST /api/v1/payin/---->| | | |
| transactions/{id}/verify| | | |
| {otp: "123456"} | | | |
| | |--check OTP state>| |
| | | verify attempts | |
| | |--UPDATE tx ----->| |
| | | -> processing | |
| | |--call operator charge API----------->|
| | |<--operator response-----------------|
| | |--UPDATE tx ----->| |
| | | -> completed | |
| | |--publish NSQ --->| payment.completed|
|<---200 {status:completed}| | | |
| | | | |
|<---WEBHOOK POST (async)--|---phoenix-webhook--| | |
Direct Charge (Tokenised/Recurring):
Same flow but skips OTP. Uses stored operator_token from a previous initial charge. The transactionType field (legacy values: 0=single, 1=initial, 8=recurring, 9=subscription) maps to:
- type: "single" -- one-time charge with OTP
- type: "initial" -- first charge that creates a token
- type: "recurring" -- subsequent charge using token (no OTP)
- type: "subscription" -- scheduled recurring charge
5.2 Payment State Machine¶
Note: Each service uses its own domain-specific table (payment, disbursement, transfer) but shares the same state machine structure. State names match the Go code exactly.
+---> expired
| (TTL exceeded)
|
initiated ---> awaiting_authorisation ---> processing ---> pending_partner
| | |
| | +-----+-----+
| v | |
| failed completed failed
| (max OTP attempts) | |
| v v
+---> failed refunded (terminal)
(validation failure) partially_refunded
processing ---> on_hold ---> processing (resumed)
|
+---> aml_review ---> processing | failed
|
+---> stuck ---> failed (after max retries)
|
+---> cancelled (merchant-initiated cancellation)
State glossary (code-canonical names):
| State | Description |
|---|---|
| initiated | Payment record created, validation passed |
| awaiting_authorisation | OTP sent, waiting for customer verification (replaces the earlier pending_otp draft name) |
| processing | OTP verified or direct charge; partner call in flight |
| pending_partner | Ambiguous partner response; inquiry polling active |
| completed | Partner confirmed success |
| failed | Terminal failure |
| cancelled | Cancelled by merchant before completion |
| reversed | Reversal confirmed by partner |
| refunded | Full refund confirmed |
| partially_refunded | Partial refund confirmed |
| expired | TTL exceeded before completion |
| on_hold | Manually held for review |
| aml_review | Flagged for AML checks |
| stuck | Exceeded maximum retries; requires manual intervention |
Rules:
- Every state transition is atomic (SurrealDB transaction + state log insert)
- Optimistic locking: UPDATE only succeeds if current status matches expected status
- Only valid transitions are permitted (enforced by a state machine in Go code, not just database constraints)
- Every transition produces an NSQ event for webhook delivery
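The transition and optimistic-locking rules can be sketched in Go as below. The transition table is one reading of the diagram in this section, not the canonical table from the codebase: it adds a direct initiated to processing edge for direct charges, and it omits the reversed state because the diagram does not show where it is entered from. `Payment`, `CanTransition`, and `Transition` are hypothetical names, and an in-memory struct stands in for the SurrealDB record and state log.

```go
package main

import (
	"fmt"
	"sync"
)

type State string

const (
	Initiated             State = "initiated"
	AwaitingAuthorisation State = "awaiting_authorisation"
	Processing            State = "processing"
	PendingPartner        State = "pending_partner"
	Completed             State = "completed"
	Failed                State = "failed"
	Cancelled             State = "cancelled"
	Refunded              State = "refunded"
	PartiallyRefunded     State = "partially_refunded"
	Expired               State = "expired"
	OnHold                State = "on_hold"
	AMLReview             State = "aml_review"
	Stuck                 State = "stuck"
)

// transitions is an assumed reading of the state diagram.
var transitions = map[State][]State{
	Initiated:             {AwaitingAuthorisation, Processing, Failed},
	AwaitingAuthorisation: {Processing, Failed, Expired},
	Processing:            {PendingPartner, Completed, Failed, OnHold, AMLReview, Stuck, Cancelled},
	PendingPartner:        {Completed, Failed},
	OnHold:                {Processing},
	AMLReview:             {Processing, Failed},
	Stuck:                 {Failed},
	Completed:             {Refunded, PartiallyRefunded},
}

// CanTransition reports whether from -> to is a permitted edge.
func CanTransition(from, to State) bool {
	for _, s := range transitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

// Payment is an in-memory stand-in for a payment record plus its state log.
type Payment struct {
	mu     sync.Mutex
	Status State
	Log    []string
}

// Transition succeeds only if the current status equals expected (optimistic
// locking) and the edge is valid; status change and log append happen together.
func (p *Payment) Transition(expected, next State) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.Status != expected {
		return fmt.Errorf("stale state: have %s, expected %s", p.Status, expected)
	}
	if !CanTransition(expected, next) {
		return fmt.Errorf("invalid transition %s -> %s", expected, next)
	}
	p.Status = next
	p.Log = append(p.Log, string(expected)+"->"+string(next))
	return nil
}

func main() {
	p := &Payment{Status: Initiated}
	fmt.Println(p.Transition(Initiated, AwaitingAuthorisation))
	fmt.Println(p.Transition(Initiated, Failed)) // stale: status already moved on
}
```

Passing the expected state explicitly mirrors the conditional UPDATE: a caller working from an outdated view of the payment fails instead of overwriting a newer state.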
5.3 Partner Integration Abstraction (Adapter Pattern)¶
// sdk/partner/adapter.go
// Adapter is the interface every partner integration must implement.
type Adapter interface {
// Initiate starts a payment with the partner.
Initiate(ctx context.Context, req *InitiateRequest) (*InitiateResponse, error)
// Verify confirms a payment (e.g., OTP verification).
Verify(ctx context.Context, req *VerifyRequest) (*VerifyResponse, error)
// Inquiry checks the status of a payment at the partner.
Inquiry(ctx context.Context, req *InquiryRequest) (*InquiryResponse, error)
// Reverse requests a reversal/refund.
Reverse(ctx context.Context, req *ReverseRequest) (*ReverseResponse, error)
// Name returns the partner identifier (e.g., "easypaisa", "jazzcash").
Name() string
// HealthCheck verifies the partner API is reachable.
HealthCheck(ctx context.Context) error
}
Each partner gets its own implementation file (e.g., phoenix-payin/internal/adapter/easypaisa.go). The adapter:
- Handles protocol differences (REST vs SOAP -- 1Link uses SOAP)
- Maps Simpaisa's canonical request/response to the partner's format
- Manages partner-specific authentication (Easypaisa storeId, JazzCash client credentials, 1Link certificate-based)
- Logs all partner API calls to partner_api_log table
- Wraps calls in a circuit breaker (using sony/gobreaker)
Adapter registry resolves the correct adapter at runtime based on operator code:
// phoenix-payin/internal/adapter/registry.go
func (r *Registry) Get(operator string) (partner.Adapter, error) {
switch operator {
case "easypaisa":
return r.easypaisa, nil
case "jazzcash":
return r.jazzcash, nil
// ...
default:
return nil, fmt.Errorf("unsupported operator: %s", operator)
}
}
5.4 Retry and Reconciliation Strategy¶
Partner call retries:
- Circuit breaker per partner: open after 5 consecutive failures, half-open after 30 seconds
- Retries within a single request: max 2 retries with 1s, 3s backoff (only for network errors, never for business errors)
- If partner returns ambiguous response (timeout, 5xx): transition to pending_partner, trigger inquiry
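The in-request retry rule (at most 2 retries, 1s then 3s, network errors only) could be sketched as follows. `CallWithRetry`, `ErrNetwork`, `ErrBusiness`, and the injected `sleep` function are hypothetical; how an adapter classifies an error as a network error is an assumption of this sketch. The circuit breaker is omitted here.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Sentinel errors used to classify failures in this sketch.
var (
	ErrNetwork  = errors.New("network error")
	ErrBusiness = errors.New("business error")
)

// backoff holds the delay before each retry: 1s, then 3s.
var backoff = []time.Duration{1 * time.Second, 3 * time.Second}

// CallWithRetry makes one attempt plus up to len(backoff) retries.
// Only errors wrapping ErrNetwork are retried; anything else returns at once.
func CallWithRetry(call func() error, sleep func(time.Duration)) error {
	err := call()
	for _, d := range backoff {
		if err == nil || !errors.Is(err, ErrNetwork) {
			return err
		}
		sleep(d)
		err = call()
	}
	return err
}

// noSleep skips the delay so the example finishes instantly.
func noSleep(time.Duration) {}

func main() {
	attempts := 0
	err := CallWithRetry(func() error {
		attempts++
		if attempts < 3 {
			return fmt.Errorf("dial partner: %w", ErrNetwork)
		}
		return nil
	}, noSleep)
	fmt.Println(attempts, err)
}
```

In production the sleep argument would be `time.Sleep` (or a context-aware wait); injecting it keeps the retry policy testable without real delays.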
Reconciliation:
- phoenix-reconciliation runs a scheduled job (configurable, default hourly)
- Queries all transactions in pending_partner or processing state older than the expected partner SLA
- Calls the partner's Inquiry API to get definitive status
- Updates transaction state based on partner response
- Flags transactions stuck beyond 2x SLA for manual review
- Daily settlement calculation: aggregates completed transactions per merchant per operator per day
5.5 Settlement Flow¶
Daily cron (02:00 PKT) -> phoenix-reconciliation
|
|-- Query: all completed transactions for previous day, grouped by merchant + operator
|-- Calculate: gross amount, fees (per product_config), net amount
|-- CREATE settlement record (status: calculating -> pending)
|-- Generate settlement report (stored in SurrealDB)
|-- Publish NSQ: settlement.calculated
|
|-- Manual approval step (via internal admin API) -> status: approved
|-- Bank transfer initiated -> status: paid
|-- Publish NSQ: settlement.paid -> webhook to merchant
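The query-and-calculate steps can be illustrated with the sketch below. The fee model is an assumption: a single rate in basis points, applied to the gross and truncated to whole minor units, standing in for whatever product_config actually defines. `Tx`, `GroupKey`, `Settlement`, and `Settle` are hypothetical names, and amounts are held as integer minor units (paisa).

```go
package main

import "fmt"

// Tx is a simplified transaction row; AmountMinor is in minor units.
type Tx struct {
	MerchantID  string
	Operator    string
	Status      string
	AmountMinor int64
}

// GroupKey groups settlements by merchant and operator.
type GroupKey struct{ MerchantID, Operator string }

// Settlement holds the aggregated figures for one group.
type Settlement struct {
	Count int
	Gross int64
	Fees  int64
	Net   int64
}

// Settle aggregates completed transactions and applies a fee in basis points.
// The flat basis-point fee and truncating division are assumptions of this sketch.
func Settle(txs []Tx, feeBps int64) map[GroupKey]Settlement {
	out := map[GroupKey]Settlement{}
	for _, tx := range txs {
		if tx.Status != "completed" {
			continue // only completed transactions settle
		}
		k := GroupKey{tx.MerchantID, tx.Operator}
		s := out[k]
		s.Count++
		s.Gross += tx.AmountMinor
		out[k] = s
	}
	for k, s := range out {
		s.Fees = s.Gross * feeBps / 10000
		s.Net = s.Gross - s.Fees
		out[k] = s
	}
	return out
}

func main() {
	txs := []Tx{
		{"2000001", "easypaisa", "completed", 150000},
		{"2000001", "easypaisa", "completed", 50000},
		{"2000001", "easypaisa", "failed", 99999},
		{"2000001", "jazzcash", "completed", 10000},
	}
	for k, s := range Settle(txs, 150) {
		fmt.Println(k.MerchantID, k.Operator, s.Gross, s.Fees, s.Net)
	}
}
```

Keeping amounts as integers avoids floating-point drift when many transactions are summed.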
6. Configuration and Feature Flags¶
6.1 Configuration Hierarchy¶
Configuration follows a four-tier hierarchy with cascading overrides:
Tier 1: Defaults (compiled into sdk/config)
↓ overridden by
Tier 2: Environment (env vars, per controlplane.com workload)
↓ overridden by
Tier 3: Country (SurrealDB country_config table)
↓ overridden by
Tier 4: Merchant (SurrealDB product_config table)
Example: OTP expiry
- Default: 300 seconds
- Pakistan override: 180 seconds (SBP recommendation)
- Merchant "Acme Corp" override: 120 seconds (merchant's preference)
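The cascading override can be sketched as a walk from the lowest to the highest tier, where the last tier that sets a value wins. `Tier`, `ResolveOTPExpiry`, and `seconds` are hypothetical names for illustration; the real sdk/config may be shaped differently.

```go
package main

import "fmt"

// Tier is one level of the hierarchy; a nil field means "not set at this tier".
type Tier struct {
	Name             string
	OTPExpirySeconds *int
}

// seconds is a small helper to take the address of a literal.
func seconds(n int) *int { return &n }

// ResolveOTPExpiry starts from the compiled default and applies tiers in
// order of increasing precedence; it also reports which tier supplied the value.
func ResolveOTPExpiry(def int, tiers ...Tier) (value int, source string) {
	value, source = def, "default"
	for _, t := range tiers {
		if t.OTPExpirySeconds != nil {
			value, source = *t.OTPExpirySeconds, t.Name
		}
	}
	return value, source
}

func main() {
	v, src := ResolveOTPExpiry(300,
		Tier{Name: "environment"}, // no override at this tier
		Tier{Name: "country:PK", OTPExpirySeconds: seconds(180)},
		Tier{Name: "merchant:Acme Corp", OTPExpirySeconds: seconds(120)},
	)
	fmt.Println(v, src)
}
```

Using pointers distinguishes "not set" from an explicit zero, which matters for settings where 0 is a legitimate value.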
6.2 Feature Flag System Design¶
DEFINE TABLE feature_flag SCHEMAFULL;
DEFINE FIELD key ON feature_flag TYPE string;
DEFINE FIELD description ON feature_flag TYPE string;
DEFINE FIELD default_value ON feature_flag TYPE bool DEFAULT false;
DEFINE FIELD overrides ON feature_flag TYPE array<object>;
-- overrides: [{scope: "country", value: "PK", enabled: true},
-- {scope: "merchant", value: "2000001", enabled: false}]
DEFINE FIELD created_at ON feature_flag TYPE datetime DEFAULT time::now();
DEFINE FIELD updated_at ON feature_flag TYPE datetime VALUE time::now();
DEFINE INDEX ff_key ON feature_flag FIELDS key UNIQUE;
Resolution order:
1. Check merchant-specific override
2. Check country-specific override
3. Check operator-specific override
4. Fall back to default value
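A minimal Go sketch of this resolution order, assuming the overrides array is loaded into a slice of structs. `Flag`, `Override`, `Context`, and `EnabledFor` are hypothetical names; the scope strings follow the schema comment above, with "operator" added per the resolution order.

```go
package main

import "fmt"

// Override mirrors one entry of the overrides array.
type Override struct {
	Scope   string // "merchant", "country", or "operator"
	Value   string
	Enabled bool
}

// Flag is a boolean feature flag with scoped overrides.
type Flag struct {
	Key       string
	Default   bool
	Overrides []Override
}

// Context carries the request attributes that overrides are matched against.
type Context struct {
	MerchantID string
	Country    string
	Operator   string
}

// EnabledFor checks merchant, then country, then operator overrides,
// and falls back to the flag's default value.
func (f Flag) EnabledFor(c Context) bool {
	lookups := []struct{ scope, value string }{
		{"merchant", c.MerchantID},
		{"country", c.Country},
		{"operator", c.Operator},
	}
	for _, l := range lookups {
		if l.value == "" {
			continue
		}
		for _, o := range f.Overrides {
			if o.Scope == l.scope && o.Value == l.value {
				return o.Enabled
			}
		}
	}
	return f.Default
}

func main() {
	flag := Flag{
		Key:     "payout.batch.enabled",
		Default: false,
		Overrides: []Override{
			{Scope: "country", Value: "PK", Enabled: true},
			{Scope: "merchant", Value: "2000001", Enabled: false},
		},
	}
	fmt.Println(flag.EnabledFor(Context{MerchantID: "2000001", Country: "PK"}))
	fmt.Println(flag.EnabledFor(Context{MerchantID: "2000002", Country: "PK"}))
}
```

The order of entries inside the overrides array does not matter here; precedence comes from the fixed scope order in `lookups`.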
Example flags:
| Flag Key | Purpose | Default |
|---|---|---|
| payin.recurring.enabled | Enable recurring/tokenised payments | true |
| payout.batch.enabled | Enable batch disbursement API | false |
| remittance.aml_review.auto_approve | Auto-approve AML review below threshold | false |
| webhook.retry.max_attempts | Override max webhook retry attempts | 5 |
| partner.easypaisa.direct_charge | Enable Easypaisa direct charge (non-OTP) | true |
6.3 Environment-Based Config¶
| Variable | Dev | Staging | Production |
|---|---|---|---|
| SURREAL_URL | ws://localhost:8000 | ws://surreal-staging:8000 | ws://surreal-prod:8000 |
| SURREAL_NS | phoenix_dev | phoenix_staging | phoenix |
| NSQ_LOOKUPD | localhost:4161 | nsqlookupd-staging:4161 | nsqlookupd-prod:4161 |
| VAULT_ADDR | http://localhost:8200 | https://vault-staging | https://vault-prod |
| LOG_LEVEL | debug | info | warn |
| OTEL_EXPORTER | stdout | jaeger-staging:4317 | jaeger-prod:4317 |
| ENV | dev | staging | production |
6.4 Hot-Reload Without Restart¶
Configuration stored in SurrealDB supports hot-reload:
- phoenix-merchant exposes an internal gRPC endpoint ConfigService.Reload() (planned; currently services read config directly from SurrealDB at startup)
- Each service subscribes to a SurrealDB LIVE SELECT on the product_config and feature_flag tables
- On change, the in-memory config cache is invalidated and reloaded
- Config changes take effect within 5 seconds (SurrealDB LIVE query push + local cache refresh)
- No service restart, no redeployment
For KrakenD configuration (rate limits, routing), changes require a KrakenD config reload. KrakenD supports SIGUSR1 for config reload without downtime. In controlplane.com, this is triggered by updating the KrakenD configmap and sending the signal.
7. Deployment Architecture¶
7.1 controlplane.com Service Topology¶
Internet
|
[Cloudflare Edge]
WAF, DDoS, Geo-block
|
[KrakenD Gateway]
JWT, mTLS, Rate Limit
(2 replicas, HA)
|
+------------+------------+
| | |
[phoenix-auth] [phoenix-merchant] [phoenix-webhook]
(2 replicas) (2 replicas) (2 replicas)
| | |
+-----+-----+-----+-----+------+
| | |
[phoenix-payin] [phoenix-payout] [phoenix-remittance]
(3 replicas) (2 replicas) (2 replicas)
| | |
+-----+-----+-----+-----+
| |
[SurrealDB [SurrealDB
Persistent In-Memory Cache]
RocksDB] (2 replicas, OTP/FX only)
(2 replicas)
[local NVMe SSD per replica]
[NSQ Cluster]
nsqd (3) + nsqlookupd (3)
[Vault]
(HA, 3 nodes)
[Jaeger + Prometheus]
Observability stack
Each workload on controlplane.com:
- Auto-scaling based on CPU/memory (configurable min/max replicas)
- Health checks via /healthz (liveness) and /readyz (readiness)
- Resource limits enforced (prevents noisy neighbour)
- Envoy sidecar for inter-service mTLS
7.2 Cloudflare Edge Config¶
- DNS: api.simpaisa.com (single domain, replacing 7 legacy domains)
- SSL/TLS: Full (strict) mode, minimum TLS 1.2
- Caching: No caching (all requests are dynamic payment operations)
- Page rules: Force HTTPS, security headers (HSTS, X-Content-Type-Options, X-Frame-Options)
- Workers: Optional -- could add request transformation at edge if needed
- Load balancing: Cloudflare LB with health checks to KrakenD backend
- Argo Smart Routing: Enabled for optimal path to origin (reduces latency for PK/BD/NP/EG traffic)
7.3 NSQ Cluster¶
| Component | Instances | Purpose |
|---|---|---|
| nsqd | 3 | Message storage and delivery |
| nsqlookupd | 3 | Service discovery for consumers |
| nsqadmin | 1 | Monitoring UI |
NSQ configuration:
- --mem-queue-size=10000 -- messages in memory before spilling to disk
- --max-msg-size=1048576 -- 1 MB max message (payment events are small)
- --msg-timeout=300s -- 5 minute processing timeout
- --max-req-timeout=3600s -- max requeue delay for retries
7.4 Multi-Region Considerations¶
Phase 1 (Pakistan only):
- All infrastructure in a single region (PK or nearest -- likely via VPN tunnel to PK-based hosting or Karachi-based cloud)
- controlplane.com workloads pinned to PK region
- SBP data residency: all PK transaction data stays in PK infrastructure
Phase 2 (Bangladesh/Nepal/Egypt):
- Separate SurrealDB (RocksDB) instances per country
- Shared KrakenD gateway with geo-routing
- Shared phoenix-auth and phoenix-merchant (these don't hold transaction data)
- Country-specific phoenix-payin deployments with country-specific partner adapters
- SurrealDB namespace isolation: phoenix_pk, phoenix_bd, phoenix_np, phoenix_eg
- EG infrastructure provisioned in Egyptian or nearest-compliant region; EGP currency support via sdk/money
Phase 3 (Multi-region HA):
- Active-passive per country (failover to secondary region within same country)
- No cross-country data replication (regulatory requirement)
- Global control plane for configuration and monitoring
8. Migration Strategy¶
8.1 How Merchants Migrate from Legacy to Phoenix per Product¶
Principle: Gradual, per-merchant, per-product migration. No big bang.
Phase A: Shadow Mode
- Phoenix runs in parallel, receiving copies of production traffic
- Responses are logged but NOT returned to merchants
- Compare Phoenix responses with legacy responses
- Duration: 2-4 weeks per product
Phase B: Canary Migration
- Select 2-3 low-volume merchants per product
- Route their traffic to Phoenix
- Legacy remains available as fallback
- Duration: 2-4 weeks
Phase C: Progressive Rollout
- Migrate merchants in batches of 5-10
- Monitor error rates, latency, settlement accuracy
- Any merchant can be rolled back to legacy within minutes
Phase D: Legacy Sunset
- Once all merchants are on Phoenix for a product
- Legacy service enters read-only mode (queries only)
- After 90 days, legacy service is decommissioned
8.2 API Compatibility Layer / Translation Proxy¶
A translation proxy sits in front of Phoenix and translates legacy API requests to Phoenix format:
Legacy merchant request Translation proxy Phoenix
POST /v2/wallets/transaction/initiate -> maps to -> POST /api/v1/payin/transactions/initiate
{ {
"merchantId": "1000001", "data": {
"amount": "1", "reference": "<generated>",
"msisdn": "34XXXXXXX", "amount": "1.00",
"operatorId": "100007", "currency": "PKR",
"transactionType": "0" "operator": "easypaisa",
} "payer": { "msisdn": "34XXXXXXX" },
"type": "single"
}
}
The translation proxy:
- Lives as a KrakenD plugin or lightweight Go service
- Maps legacy URLs to Phoenix URLs
- Maps legacy field names to Phoenix field names
- Maps Phoenix error responses back to legacy format (numeric status codes, flat JSON)
- Generates X-Idempotency-Key from legacy Request-Id or creates one if missing
- Authenticates legacy merchants using their existing credentials (mapped to OAuth tokens internally)
This allows merchants to migrate at their own pace. Eager merchants adopt the new API directly; laggards use the compatibility layer indefinitely.
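The field mapping shown above could be implemented along the lines of this sketch. Only what the document states is encoded: operator 100007 maps to easypaisa, and the transactionType values 0, 1, 8, 9 map to single, initial, recurring, subscription. The struct names, the `Translate` function, the fixed PKR currency, and the caller-supplied reference are assumptions; other operator IDs are deliberately left unmapped.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/big"
)

// LegacyRequest is the legacy v2 wallet initiate payload.
type LegacyRequest struct {
	MerchantID      string `json:"merchantId"`
	Amount          string `json:"amount"`
	MSISDN          string `json:"msisdn"`
	OperatorID      string `json:"operatorId"`
	TransactionType string `json:"transactionType"`
}

type Payer struct {
	MSISDN string `json:"msisdn"`
}

type PhoenixData struct {
	Reference string `json:"reference"`
	Amount    string `json:"amount"`
	Currency  string `json:"currency"`
	Operator  string `json:"operator"`
	Payer     Payer  `json:"payer"`
	Type      string `json:"type"`
}

// PhoenixRequest is the Phoenix pay-in initiate payload.
type PhoenixRequest struct {
	Data PhoenixData `json:"data"`
}

// Only the mapping shown in the example is known; others would be added here.
var operatorCodes = map[string]string{"100007": "easypaisa"}

var txTypes = map[string]string{"0": "single", "1": "initial", "8": "recurring", "9": "subscription"}

// Translate maps a legacy request to Phoenix format. reference is generated by the caller.
func Translate(in LegacyRequest, reference string) (PhoenixRequest, error) {
	op, ok := operatorCodes[in.OperatorID]
	if !ok {
		return PhoenixRequest{}, fmt.Errorf("unknown operatorId %q", in.OperatorID)
	}
	typ, ok := txTypes[in.TransactionType]
	if !ok {
		return PhoenixRequest{}, fmt.Errorf("unknown transactionType %q", in.TransactionType)
	}
	amt, ok := new(big.Rat).SetString(in.Amount)
	if !ok || amt.Sign() <= 0 {
		return PhoenixRequest{}, fmt.Errorf("invalid amount %q", in.Amount)
	}
	return PhoenixRequest{Data: PhoenixData{
		Reference: reference,
		Amount:    amt.FloatString(2), // "1" becomes "1.00"
		Currency:  "PKR",              // assumed default, as in the example
		Operator:  op,
		Payer:     Payer{MSISDN: in.MSISDN},
		Type:      typ,
	}}, nil
}

func main() {
	var in LegacyRequest
	raw := `{"merchantId":"1000001","amount":"1","msisdn":"34XXXXXXX","operatorId":"100007","transactionType":"0"}`
	if err := json.Unmarshal([]byte(raw), &in); err != nil {
		fmt.Println("error:", err)
		return
	}
	out, err := Translate(in, "generated-ref-001")
	b, _ := json.Marshal(out)
	fmt.Println(string(b), err)
}
```

Parsing the amount as an exact rational rather than a float keeps the normalised string free of binary rounding artefacts.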
8.3 Rollback Strategy¶
- KrakenD routing rules control which backend (legacy vs Phoenix) receives traffic per merchant
- Rollback is a configuration change in KrakenD, not a code deployment
- Takes effect in seconds
- Transaction data created in Phoenix during the rollout period is retained (not lost on rollback)
- If rollback occurs, the translation proxy routes back to legacy for that merchant
8.4 Data Migration Approach¶
Principle: Phoenix starts with a clean database. Historical data is not migrated.
- Active transactions (in-flight at migration time) complete on the legacy system
- Historical transaction data remains queryable in legacy MySQL (read-only)
- A read-only legacy query API is maintained for 12 months post-migration for merchants that need historical data
- Merchant configuration is migrated proactively: phoenix-merchant is seeded with all 40 merchant configs before any traffic migration
- Operator tokens (for recurring payments) are migrated per merchant when they move to Phoenix
9. Development Workflow¶
9.1 Repo Structure (Monorepo)¶
All services live in a single GitHub monorepo (github.com/doreilly257/sp-apis) using Go workspace-style replace directives so each service module can reference the shared SDK locally without a published module registry.
github.com/doreilly257/sp-apis (monorepo root)
├── sdk/ -- Shared Go module (go.simpaisa.com/phoenix-sdk)
├── services/
│ ├── auth/ -- OAuth 2.0 service
│ ├── merchant/ -- Merchant management service (+ gRPC)
│ ├── payin/ -- Pay-In service
│ ├── payout/ -- Pay-Out service
│ ├── remittance/ -- Remittance service
│ ├── card/ -- Card payment service
│ └── webhook/ -- Webhook delivery service
├── infra/ -- docker-compose, KrakenD config, Prometheus config
├── migrations/ -- SurrealDB schema scripts
├── specs/ -- OpenAPI / protobuf definitions
├── build/ -- Build tooling
└── docs/ -- Architecture and technical documentation
Each service has its own go.mod with a replace github.com/doreilly257/sp-apis/sdk => ../../sdk directive. This provides module isolation (each service can be built, tested, and containerised independently) while keeping all code in one place for ease of cross-service refactoring and a single CI/CD pipeline.
Note: The original architecture envisioned separate Bitbucket repos per service. The actual implementation uses a GitHub monorepo. The Bitbucket Pipelines CI/CD section below describes the intended pipeline shape; the actual CI configuration will be GitHub Actions or equivalent.
9.2 Individual Service Layout¶
Each service uses a flat, idiomatic Go layout. go-kratos conventions (biz/, data/, server/, conf/) are not used — only phoenix-merchant uses go-kratos (for its gRPC server). All other services use Echo v4.15 as the HTTP framework with manual dependency wiring in main.go. There is no Wire code generation.
services/remittance/ -- representative example
cmd/
main.go -- Entry point; manual dependency injection (no Wire)
internal/
adapter/ -- Partner adapter implementations
bankofasia.go
faysalbank.go
registry.go -- Adapter registry
config/ -- Configuration loading (env vars)
config.go
event/ -- NSQ producer/consumer wiring
publisher.go
handler/ -- Echo HTTP handler layer (request parsing, response marshalling)
transfer.go
middleware/ -- Echo middleware (auth, correlation ID, logging)
auth.go
model/ -- Domain model structs, status constants, state machine
models.go
repository/ -- SurrealDB data access layer
transfer.go
service/ -- Business logic layer
transfer.go
go.mod
go.sum
phoenix-merchant differs — it uses go-kratos for its gRPC server:
services/merchant/
cmd/
main.go
internal/
config/
grpc/ -- go-kratos gRPC server + protobuf handlers
handler/ -- Echo HTTP handlers
model/
repository/
service/
go.mod
go.sum
9.3 Shared Go Module for Common Code¶
phoenix-sdk lives at sdk/ in the monorepo. Each service references it via a replace directive in its go.mod:
replace github.com/doreilly257/sp-apis/sdk => ../../sdk
This means no external module registry is required during development. When the platform matures and services need to be built independently, the SDK can be published to a module proxy (e.g., go.simpaisa.com/phoenix-sdk) and the replace directives removed.
Versioning (current): All services and the SDK are developed in lockstep within the monorepo. SDK changes are immediately visible to all services without a go get update cycle.
Versioning (future, if split): Semantic versioning with services pinning to explicit tags, updated via go get -u go.simpaisa.com/phoenix-sdk@<tag>.
9.4 CI/CD Pipeline Design (Outline)¶
Pipeline (deferred implementation; the monorepo structure means a single pipeline with per-service jobs):
# Per-service pipeline (e.g., phoenix-payin)
pipelines:
default:
- step:
name: Lint & Test
script:
- go vet ./...
- golangci-lint run
- go test -race -coverprofile=coverage.out ./...
- go tool cover -func=coverage.out
services:
- surrealdb # testcontainers for integration tests
- step:
name: Build
script:
- docker build -t phoenix-payin:${BITBUCKET_COMMIT} .
- step:
name: Push to Registry
deployment: staging
script:
- docker push registry.simpaisa.com/phoenix-payin:${BITBUCKET_COMMIT}
- step:
name: Deploy to Staging
deployment: staging
trigger: manual
script:
- cpln workload update phoenix-payin --image registry.simpaisa.com/phoenix-payin:${BITBUCKET_COMMIT}
Quality gates (enforced in CI):
- go vet and golangci-lint pass
- Test coverage >= 80%
- No nosec annotations without comment
- Docker image scan (Trivy)
- Protobuf backward compatibility check (buf breaking)
9.5 Developer Local Setup¶
infra/docker-compose.yml provides the full local stack:
services:
surrealdb:
image: surrealdb/surrealdb:latest
command: start --user root --pass root memory
ports: ["8000:8000"]
surrealdb-persistent:
image: surrealdb/surrealdb:latest
command: start --user root --pass root file:/data/surreal.db
ports: ["8001:8000"]
volumes: ["surreal-data:/data"]
nsqd:
image: nsqio/nsq
command: /nsqd --lookupd-tcp-address=nsqlookupd:4160
ports: ["4150:4150", "4151:4151"]
nsqlookupd:
image: nsqio/nsq
command: /nsqlookupd
ports: ["4160:4160", "4161:4161"]
nsqadmin:
image: nsqio/nsq
command: /nsqadmin --lookupd-http-address=nsqlookupd:4161
ports: ["4171:4171"]
vault:
image: hashicorp/vault:latest
environment:
VAULT_DEV_ROOT_TOKEN_ID: "dev-root-token"
ports: ["8200:8200"]
cap_add: [IPC_LOCK]
jaeger:
image: jaegertracing/all-in-one:latest
ports: ["16686:16686", "4317:4317"]
prometheus:
image: prom/prometheus:latest
ports: ["9090:9090"]
volumes: ["./prometheus.yml:/etc/prometheus/prometheus.yml"]
krakend:
image: devopsfaith/krakend:latest
ports: ["8080:8080"]
volumes: ["./krakend.json:/etc/krakend/krakend.json"]
Developer workflow:
1. docker compose up -d -- starts all infrastructure
2. make seed -- runs SurrealDB schema + test data
3. make run -- starts the service with hot-reload (using air)
4. Service runs on localhost:8080 via KrakenD, or directly on its own port for debugging
10. Phased Delivery Plan¶
Phase 1: Foundation (Weeks 1-8)¶
Goal: Shared libraries, gateway, auth, SurrealDB schema, and CI/CD scaffolding. No payment processing yet.
| Week | Deliverable | Owner |
|---|---|---|
| 1-2 | phoenix-sdk v0.1: envelope, money, crypto, validation, observability packages | CDO + Claude |
| 1-2 | Protobuf definitions for merchant gRPC service (in specs/) | CDO + Claude |
| 2-3 | infra/: docker-compose, SurrealDB schema scripts, Vault dev setup | CDO |
| 3-4 | phoenix-auth: OAuth 2.0 token issuance, JWT RS256, merchant credential CRUD | Junior Go dev 1 + CDO |
| 3-4 | phoenix-merchant: merchant CRUD, product config, feature flags, gRPC service | Junior Go dev 2 + CDO |
| 5-6 | phoenix-gateway: KrakenD configuration -- JWT validation, rate limiting, mTLS, routing | CDO |
| 5-6 | phoenix-sdk v0.2: idempotency middleware, SurrealDB client, NSQ wrappers | CDO + Claude |
| 7-8 | phoenix-webhook: webhook delivery engine, HMAC signing, retry, dead-letter | Mid Java dev (learning Go) + CDO |
| 7-8 | Integration testing: auth -> gateway -> merchant -> webhook end-to-end | All |
| 8 | controlplane.com staging deployment, Cloudflare DNS setup | CDO |
Milestone: A merchant can authenticate via OAuth 2.0, hit the gateway, and receive a properly formatted error response ("no such endpoint -- payin not yet deployed"). Webhook infrastructure is ready.
Phase 2: Pay-In API (Weeks 9-18)¶
Goal: Full Pay-In service replacing legacy wallet. First live merchant traffic.
| Week | Deliverable | Owner |
|---|---|---|
| 9-10 | phoenix-payin scaffolding: Echo handlers, SurrealDB repo, NSQ integration | Junior Go dev 1 + CDO |
| 10-12 | Easypaisa adapter: initiate, verify (OTP), inquiry, direct charge | Junior Go dev 1 + Mid Java dev |
| 10-12 | JazzCash adapter | Junior Go dev 2 + CDO |
| 12-13 | HBL Konnect, Alfa, JSBL Zindagi adapters | Mid Java dev + Junior Go dev 2 |
| 13-14 | Transaction state machine, OTP flow, recurring/tokenisation | CDO + Claude |
| 14-15 | Translation proxy for legacy Pay-In API compatibility | CDO |
| 15-16 | Shadow mode: mirror production traffic to Phoenix, compare responses | All |
| 16-17 | Canary: 2-3 test merchants on Phoenix Pay-In | CDO |
| 17-18 | Progressive rollout to remaining Pay-In merchants | CDO |
Milestone: All 40 Pay-In merchants on Phoenix. Legacy wallet service in read-only mode.
Phase 3: Pay-Out API (Weeks 19-26)¶
Goal: Full Pay-Out service replacing legacy disbursement stack.
| Week | Deliverable | Owner |
|---|---|---|
| 19-20 | phoenix-payout scaffolding, phoenix-reconciliation scaffolding | Junior devs + CDO |
| 20-22 | 1Link IBFT adapter, Easypaisa disbursement adapter, JazzCash disbursement adapter | 2 devs |
| 22-23 | HBL adapter (replacing the legacy pass-through proxy) | 1 dev + CDO |
| 23-24 | Batch disbursement processing (replacing legacy scheduler) | CDO |
| 24-25 | Settlement calculation and reporting | Mid Java dev |
| 25-26 | Shadow mode, canary, progressive rollout | All |
Milestone: All Pay-Out merchants on Phoenix. Legacy disbursement-gateway, disbursement-scheduler, and disbursement-scheduler-sunshine decommissioned.
Phase 4: Remittance API (Weeks 27-34)¶
Goal: Full Remittance service replacing legacy sp-remittance-consumer and retry scheduler.
| Week | Deliverable | Owner |
|---|---|---|
| 27-28 | phoenix-remittance scaffolding, FX quote management | Junior devs |
| 28-30 | Bank of Asia adapter, Faysal Bank adapter | 2 devs |
| 30-31 | Trust Bank adapter, 1Link adapter | 2 devs |
| 31-32 | AML review workflow, multi-corridor routing | CDO |
| 32-33 | Bangladesh/Nepal corridor configuration | CDO + dev |
| 33-34 | Shadow, canary, rollout | All |
Milestone: All remittance corridors on Phoenix. Legacy remittance services decommissioned.
Phase 5: Card API (Weeks 35-42)¶
Goal: Full Card service replacing legacy cardbackend.
| Week | Deliverable | Owner |
|---|---|---|
| 35-36 | phoenix-card scaffolding, Vault transit tokenisation setup | CDO + dev |
| 36-38 | Alfalah MasterCard 3DS adapter (using Vault for PAN tokenisation -- no raw PAN in application) | 2 devs |
| 38-39 | Safepay adapter | 1 dev |
| 39-40 | Capture, void, refund flows | 1 dev |
| 40-41 | PCI-DSS scope review (should be minimal with Vault tokenisation) | CDO |
| 41-42 | Shadow, canary, rollout | All |
Milestone: All card merchants on Phoenix. Legacy cardbackend decommissioned. PCI-DSS scope significantly reduced.
Timeline Summary¶
| Phase | Duration | Calendar (from start) | Key Risk |
|---|---|---|---|
| Phase 1: Foundation | 8 weeks | Weeks 1-8 | Team learning Go; SDK design decisions have cascading impact |
| Phase 2: Pay-In | 10 weeks | Weeks 9-18 | First live traffic; partner adapter complexity (5 operators) |
| Phase 3: Pay-Out | 8 weeks | Weeks 19-26 | Batch processing reliability; settlement accuracy |
| Phase 4: Remittance | 8 weeks | Weeks 27-34 | Cross-border complexity; AML; FX; multiple bank APIs (incl. SOAP) |
| Phase 5: Card | 8 weeks | Weeks 35-42 | PCI-DSS compliance; 3DS flow complexity |
Total: approximately 42 weeks (just under 10 calendar months) from first commit to full migration.
Risk Mitigation
| Risk | Mitigation |
|---|---|
| Junior devs slow to learn Go | Pair programming with Claude. Phase 1 is intentionally foundational -- learning Go while building non-critical shared libraries. |
| SurrealDB immaturity (newer database) | Extensive integration testing via testcontainers. RocksDB (the embedded backend) is battle-tested and widely used. Maintain a fallback plan to PostgreSQL if SurrealDB proves unreliable. |
| Partner API changes during migration | Adapter pattern isolates partner changes. Legacy and Phoenix can coexist indefinitely. |
| No dedicated DevOps | controlplane.com is managed Kubernetes -- reduces ops burden. Cloudflare is managed CDN/WAF. Infrastructure-as-code in phoenix-infra repo. CDO handles infra directly. |
| 2000 TPS target | Go's concurrency model (goroutines) handles this comfortably. SurrealDB with RocksDB handles >10K reads/sec on a single node. KrakenD handles 70K+ req/sec. Bottleneck is partner APIs, not Phoenix. |
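The "Partner API changes during migration" row relies on the adapter pattern. A minimal sketch of that seam follows, assuming a hypothetical `PartnerAdapter` interface and `Registry` type; neither name comes from the Phoenix SDK, and `stubAdapter` stands in for real partner clients. `AmountMinor` uses integer minor units only to keep the sketch standard-library-only; it is not a statement about how `sdk/money` represents amounts.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Transfer is a partner-neutral request (hypothetical shape).
type Transfer struct {
	Reference   string
	AmountMinor int64 // minor units, used here only to avoid third-party imports
	Currency    string
}

// PartnerAdapter is the only surface domain code sees; each partner
// integration implements it, so a partner API change touches one adapter.
type PartnerAdapter interface {
	Name() string
	Send(ctx context.Context, t Transfer) (partnerRef string, err error)
}

// ErrUnknownPartner is returned when no adapter is registered under a name.
var ErrUnknownPartner = errors.New("unknown partner")

// Registry maps partner names to their adapters.
type Registry struct{ adapters map[string]PartnerAdapter }

// NewRegistry indexes the given adapters by their Name().
func NewRegistry(adapters ...PartnerAdapter) *Registry {
	r := &Registry{adapters: make(map[string]PartnerAdapter)}
	for _, a := range adapters {
		r.adapters[a.Name()] = a
	}
	return r
}

// Send routes the transfer to the named partner's adapter.
func (r *Registry) Send(ctx context.Context, partner string, t Transfer) (string, error) {
	a, ok := r.adapters[partner]
	if !ok {
		return "", fmt.Errorf("%w: %s", ErrUnknownPartner, partner)
	}
	return a.Send(ctx, t)
}

// stubAdapter is a placeholder for a real partner client.
type stubAdapter struct{ name string }

func (s stubAdapter) Name() string { return s.name }

func (s stubAdapter) Send(ctx context.Context, t Transfer) (string, error) {
	return s.name + ":" + t.Reference, nil
}

// demo routes one fixed transfer through a registry holding two stub adapters.
func demo(partner string) (string, error) {
	reg := NewRegistry(stubAdapter{name: "easypaisa"}, stubAdapter{name: "jazzcash"})
	t := Transfer{Reference: "TX-1", AmountMinor: 150000, Currency: "PKR"}
	return reg.Send(context.Background(), partner, t)
}

func main() {
	ref, err := demo("jazzcash")
	fmt.Println(ref, err) // jazzcash:TX-1 <nil>
	_, err = demo("hbl")
	fmt.Println(errors.Is(err, ErrUnknownPartner)) // true
}
```

Under this shape, a partner changing its wire format would mean editing one `Send` implementation, while callers of `Registry.Send` stay untouched.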
Appendix A: Legacy to Phoenix Mapping
| Legacy Service | Legacy Repo | Phoenix Service | Notes |
|---|---|---|---|
| wallet | simpaisa1/wallet | phoenix-payin | Consolidates 5 operator integrations |
| cardbackend | simpaisa1/cardbackend | phoenix-card | PAN tokenised via Vault, not in-app |
| card-redirection-app | simpaisa1/card-redirection-app | phoenix-card | 3DS redirect handled in-service |
| auto-void-scheduler | simpaisa1/auto-void-scheduler | phoenix-card | Scheduled job within service |
| sp-card-refund-reversal | simpaisa1/sp-card-refund-reversal | phoenix-card | Refund/reversal within service |
| sp-remittance-consumer | simpaisa1/sp-remittance-consumer | phoenix-remittance | NSQ replaces Kafka |
| sp-remittance-scheduler-retry | simpaisa1/sp-remittance-scheduler-retry | phoenix-remittance | Built-in retry, no separate scheduler |
| disbursement-gateway | simpaisa1/disbursement-gateway | phoenix-gateway (KrakenD) | Zuul replaced by KrakenD |
| disbusrment-scheduler | simpaisa1/disbusrment-scheduler | phoenix-payout | Integrated batch processing |
| disbursement-scheduler-sunshine | simpaisa1/disbursement-scheduler-sunshine | phoenix-payout | Code fork eliminated |
| 1bill | simpaisa1/1bill | phoenix-payin | Bill payment as Pay-In variant |
Appendix B: Audit Findings Addressed by Phoenix
Every P0 and P1 finding from the Codebase Audit is structurally eliminated by Phoenix's architecture:
| Finding | Legacy Issue | Phoenix Resolution |
|---|---|---|
| CX-01 | No Spring Security | OAuth 2.0 + JWT + mTLS at gateway |
| CX-02 | CORS wildcard | KrakenD CORS policy, no wildcard |
| CX-04 | HashMap request bodies | Typed Go structs with validation tags |
| CX-05 | No versioning | /api/v1/ URL path versioning |
| CX-06 | No rate limiting | KrakenD rate limiting + Cloudflare L7 |
| CX-10 | No tracing | OpenTelemetry (Jaeger + Prometheus) |
| CX-12 | No tests | testify + testcontainers, 80% coverage gate |
| W-01, R-02, D-01 | Hardcoded credentials | Vault for all secrets, dynamic credentials |
| W-03, C-01, D-07 | AES-ECB | AES-256-GCM via sdk/crypto |
| W-04 | Idempotency fails open | Fail-closed idempotency middleware |
| R-01 | SSL disabled | TLS 1.2+ enforced everywhere |
| R-05 | No @Transactional | SurrealDB transactions (BEGIN/COMMIT) |
| R-06 | double for money | shopspring/decimal via sdk/money |
| C-05 | Raw PAN in app | Vault Transit tokenisation |
| D-09 | No row locking | SurrealDB atomic transactions + optimistic locking |
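The W-04 row turns on the difference between failing open and failing closed. A minimal sketch of a fail-closed middleware follows, using only the Go standard library. The `IdempotencyStore` interface, the in-memory `memStore`, the `Idempotency-Key` header name, the `/api/v1/payments` path, and the 409 response for duplicates are all assumptions made for illustration; the table does not specify them, and a production middleware might replay the stored response instead of rejecting the duplicate.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// IdempotencyStore is a hypothetical interface over the backing store.
type IdempotencyStore interface {
	// Reserve records key and reports whether it had not been seen before.
	Reserve(key string) (fresh bool, err error)
}

// FailClosed blocks any request whose idempotency status cannot be confirmed.
func FailClosed(store IdempotencyStore, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key := r.Header.Get("Idempotency-Key")
		if key == "" {
			http.Error(w, "missing Idempotency-Key", http.StatusBadRequest)
			return
		}
		fresh, err := store.Reserve(key)
		if err != nil {
			// Fail closed: a store outage must not let the request through.
			http.Error(w, "idempotency check unavailable", http.StatusServiceUnavailable)
			return
		}
		if !fresh {
			http.Error(w, "duplicate request", http.StatusConflict)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// memStore is an in-memory stand-in with a flag that simulates an outage.
type memStore struct {
	mu   sync.Mutex
	seen map[string]bool
	down bool
}

func newMemStore() *memStore { return &memStore{seen: make(map[string]bool)} }

func (s *memStore) Reserve(key string) (bool, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.down {
		return false, errors.New("store unavailable")
	}
	if s.seen[key] {
		return false, nil
	}
	s.seen[key] = true
	return true, nil
}

// demoHandler wraps a trivial 200 OK handler with the middleware.
func demoHandler(store IdempotencyStore) http.Handler {
	return FailClosed(store, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
}

// statusFor sends one POST carrying the given key and returns the status code.
func statusFor(h http.Handler, key string) int {
	req := httptest.NewRequest(http.MethodPost, "/api/v1/payments", nil)
	if key != "" {
		req.Header.Set("Idempotency-Key", key)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	store := newMemStore()
	h := demoHandler(store)
	fmt.Println(statusFor(h, "k1")) // 200: first use of the key
	fmt.Println(statusFor(h, "k1")) // 409: duplicate key
	store.down = true
	fmt.Println(statusFor(h, "k2")) // 503: store outage blocks the request
}
```

The important branch is the error path: a fail-open implementation would call `next.ServeHTTP` when `Reserve` errors, which is what allows a double charge during a store outage.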
Critical Files for Implementation
- /Users/daniel/Library/CloudStorage/OneDrive-SIMPAISA/Work/API/Codebase-Audit.md - Source of all 60 legacy findings that Phoenix must structurally eliminate; the definitive reference for what went wrong
- /Users/daniel/Library/CloudStorage/OneDrive-SIMPAISA/Work/API/OpenBanking-Comparison-Audit.md - Open Banking UK gap analysis that defines the API design standards Phoenix adopts (error format, idempotency, pagination, signing)
- /Users/daniel/Library/CloudStorage/OneDrive-SIMPAISA/Work/API/PayIn PK Technical Specs.pdf - Legacy Pay-In flow documentation (single charge, recurring/tokenisation, operator codes, OTP handling) that Phoenix must replicate functionally
- /Users/daniel/Library/CloudStorage/OneDrive-SIMPAISA/Work/API/GitBook-API-Audit.md - Current merchant-facing API documentation audit; defines what merchants expect today and where the translation proxy must maintain compatibility
- /Users/daniel/Library/CloudStorage/OneDrive-SIMPAISA/Work/API/API-Best-Practices-Audit.md - 37 findings across all products with severity ratings; provides the prioritised checklist for Phoenix's security and reliability requirements