Simpaisa Infrastructure Standards
Version: 1.0.0
Date: 2026-04-03
Owner: CDO (Daniel O'Reilly)
Classification: Internal — Architecture & Engineering Leadership
Status: Living Document — Prototype / AI SDLC Showcase
Table of Contents
- Executive Summary
- Infrastructure Principles
- Environment Strategy
- Compute Standards
- Networking Standards
- Edge & CDN (Cloudflare)
- API Gateway (KrakenD)
- Observability Stack
- Identity & Access (ControlPlane.com)
- Data Infrastructure
- Secret Management
- Disaster Recovery & Business Continuity
- Compliance Infrastructure Requirements
- Infrastructure as Code
- CI/CD Pipeline Standards
- Cost Management
- Migration Roadmap
- Appendix: Infrastructure Controls Checklist
1. Executive Summary
This document defines the infrastructure standards for Simpaisa's payment gateway platform, which processes 270M+ transactions worth $1B+ across Pakistan, Bangladesh, Nepal, Iraq, and Egypt. It covers four product lines: Pay-Ins, Pay-Outs, Remittances, and Cards.
Context
This is a prototype and showcase of AI SDLC capabilities. The organisation is adopting an agentic AI SDLC-first approach — the team structure will be reorganised as required to support this model.
Current State Summary
Simpaisa runs on AWS with sound foundational infrastructure (Multi-AZ, WAF, ALB, ASG, RDS, ElastiCache) but has significant gaps in observability, API gateway, disaster recovery documentation, and distributed tracing. The platform uses Spring Boot / Java services on EC2.
Target State Summary
The target architecture moves towards a cloud-native, multi-provider model:
| Layer | Current | Target |
| --- | --- | --- |
| Edge/CDN | AWS WAF only | Cloudflare (CDN, WAF, DDoS, Workers, Pages, R2, DNS) |
| API Gateway | None | KrakenD |
| Compute | EC2 + ASG (Spring Boot/Java) | Containers (Go services) + Unikraft unikernels (assess) |
| Reverse Proxy | ALB direct | Caddy (per-service, mTLS) behind ALB |
| Identity | Custom auth | ControlPlane.com |
| Observability | CloudWatch | OpenTelemetry → Grafana / Jaeger / OpenSearch |
| Analytics | None | PostHog |
| Database | RDS MySQL (shared) | SurrealDB (new services) + MySQL (existing) |
| Messaging | Kafka | NSQ |
| Search | None | Meilisearch (merchant-facing) + OpenSearch (logs) |
| Workflow | None | Temporal |
| Hosting | AWS only | Cloudflare preferred + AWS for existing |
Critical Gaps
| Gap | Priority | Impact |
| --- | --- | --- |
| No API Gateway | CRITICAL | No centralised rate limiting, auth verification, or request validation |
| Single shared RDS | CRITICAL | Single point of failure, no service isolation |
| No DR documentation | CRITICAL | Unknown recovery posture |
| No distributed tracing | HIGH | Cannot trace transactions end-to-end across services |
| No CDN | HIGH | Latency for merchant-facing assets, no edge caching |
| Single ElastiCache cluster | HIGH | Cache failure impacts all services |
| No blue/green or canary | MEDIUM | Risky deployments with potential downtime |
| No IaC documented | MEDIUM | Infrastructure drift, no reproducibility |
2. Infrastructure Principles
2.1 Cloud-Native
All new services MUST be designed as cloud-native, containerised workloads. Infrastructure MUST be provisioned through APIs, not manual console operations.
2.2 Infrastructure as Code
All infrastructure MUST be defined in version-controlled code. No manual provisioning or configuration changes in any environment. Drift detection MUST run on every deployment.
2.3 Immutable Deployments
Infrastructure and application artefacts MUST be immutable. No in-place updates to running instances. Every deployment creates new artefacts; rollback means deploying the previous artefact.
2.4 Observability-First
Every service MUST emit structured logs, metrics, and traces from day one. Observability is not optional — it is a deployment prerequisite. OpenTelemetry is the mandatory instrumentation standard.
2.5 Security by Default
All network traffic MUST be encrypted in transit (TLS 1.2 minimum, TLS 1.3 preferred). All data at rest MUST be encrypted. Zero-trust networking: no implicit trust between services. mTLS for all service-to-service communication.
2.6 Multi-Jurisdiction Compliance
Infrastructure MUST satisfy regulatory requirements across all operating jurisdictions (Pakistan, Bangladesh, Nepal, Iraq, Egypt). Data residency requirements MUST be met per jurisdiction. Compliance controls MUST be auditable and evidenced.
2.7 Least Privilege
All access — human and machine — MUST follow the principle of least privilege. Service accounts MUST have only the permissions required for their function. Permissions MUST be reviewed quarterly.
2.8 Automation Over Process
Automate everything that can be automated. Manual processes are a source of error and a barrier to scale. If a runbook step is repeated more than twice, it MUST be automated.
3. Environment Strategy
3.1 Environment Definitions
| Environment | Purpose | URL Pattern | Access | Data |
| --- | --- | --- | --- | --- |
| Sandbox | Merchant-facing testing and integration | sandbox.simpaisa.com | Merchants + internal | Synthetic test data only |
| Dev | Internal development and experimentation | dev.internal.simpaisa.com | Engineering only | Synthetic / anonymised |
| Test | Automated testing, QA, UAT | test.internal.simpaisa.com | Engineering + QA | Synthetic / anonymised |
| Prod | Live production traffic | api.simpaisa.com | Controlled access | Real customer/merchant data |
3.2 Environment Parity Requirements
| Aspect | Requirement |
| --- | --- |
| Architecture | All environments MUST use the same architectural patterns (ALB, ASG, VPC layout) |
| Configuration | Same configuration structure, different values per environment |
| Infrastructure | Dev/Test may use smaller instance sizes; architecture MUST match Prod |
| Networking | Same VPC/subnet design; security groups MUST be equivalent |
| Secrets | Each environment has its own secrets; NEVER share across environments |
| Databases | Same engine and version across all environments |
| Monitoring | All environments MUST have observability; alerting thresholds differ |
3.3 Data Segregation
- Production data MUST NEVER be copied to lower environments without anonymisation
- Each environment MUST have its own database instances, cache clusters, and message queues
- Payment channel credentials MUST be environment-specific (sandbox credentials for Sandbox, live for Prod)
- PII MUST NOT exist in Dev or Test environments
- Sandbox MUST simulate realistic payment channel responses (success, failure, timeout, partial)
3.4 Environment Promotion

```
Dev → Test → Prod
 │      │      │
 │      │      └── Requires: all quality gates passed, change approval, deployment window
 │      └───────── Requires: all automated tests pass, security scan clean
 └──────────────── Requires: code review, unit tests pass, lint clean
```
| Gate | Dev → Test | Test → Prod |
| --- | --- | --- |
| Code review | Required | N/A (already done) |
| Unit tests | Pass | Pass |
| Integration tests | Run | Pass (mandatory) |
| Security scan | Run | Pass (mandatory, zero critical/high) |
| Performance test | Optional | Required for payment-path changes |
| Change approval | Not required | Required (CDO or delegate) |
| Deployment window | Any time | Scheduled (avoid peak transaction hours) |
| Rollback plan | Documented | Documented and tested |
3.5 Sandbox-Specific Requirements
The Sandbox environment is merchant-facing and MUST:
- Be available 99.5% of the time (separate SLA from Prod)
- Provide realistic response times (within 2x of Prod P95)
- Support all payment channels with simulated responses
- Provide test credentials and documentation
- Allow merchants to trigger specific scenarios (success, decline, timeout, insufficient funds)
- Have its own KrakenD instance with the same rate limiting configuration as Prod
- Log all requests for merchant support and debugging
4. Compute Standards
4.1 Current State
| Aspect | Detail |
| --- | --- |
| Platform | AWS EC2 instances |
| Scaling | Auto Scaling Groups (ASG) |
| Runtime | Spring Boot / Java |
| Load Balancing | Application Load Balancer (ALB) in public subnets |
| Availability | Multi-AZ deployment |
| Deployment | Rolling updates via ASG |
4.2 Target State
| Aspect | Detail |
| --- | --- |
| New services | Go services in containers |
| Security-critical | Unikraft unikernels (assess phase — evaluate for payment processing core) |
| Reverse proxy | Caddy per-service (behind ALB, providing mTLS termination) |
| Orchestration | TBC — evaluate ECS Fargate, Kubernetes (EKS), or ControlPlane.com |
| Existing services | Spring Boot / Java on EC2 (maintained until rewritten) |
4.3 Sizing Guidelines
| Service Tier | Description | Min Instances | Instance Type (Current) | Auto-Scale Trigger |
| --- | --- | --- | --- | --- |
| Tier 1 — Payment Critical | Pay-In initiation, Pay-Out execution, Remittance processing, Card auth | 3 (Multi-AZ) | m5.xlarge or equivalent | CPU > 60%, request latency P99 > 500ms |
| Tier 2 — Merchant Facing | API Gateway, Sandbox, Developer Portal, Merchant Dashboard | 2 (Multi-AZ) | m5.large or equivalent | CPU > 70%, request latency P99 > 1s |
| Tier 3 — Internal | Reporting, reconciliation, back-office | 2 | m5.large or equivalent | CPU > 75% |
| Tier 4 — Infrastructure | Observability, logging, search indexing | 2 | r5.large or equivalent | Disk > 80%, memory > 85% |
4.4 Auto-Scaling Policies for Payment Workloads
Payment services MUST scale based on:
- Request rate (transactions per second)
- Response latency (P95 and P99)
- CPU utilisation
- Queue depth (for async processing)
Scale-out: aggressive (1-minute evaluation, 2-minute cooldown)
Scale-in: conservative (5-minute evaluation, 10-minute cooldown)
Payment services MUST NOT scale to zero.
Minimum capacity MUST handle 2x average traffic without scaling.
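As a sketch, the Tier 1 triggers and the capacity-floor rule above can be expressed directly. The type and function names, and the per-instance TPS figure, are illustrative assumptions, not part of the standard:

```go
package main

import (
	"fmt"
	"math"
)

// ScaleSignal carries the inputs payment services scale on (§4.4).
type ScaleSignal struct {
	CPUPercent   float64
	P99LatencyMs float64
	TPS          float64
	QueueDepth   int
}

// shouldScaleOut applies the Tier 1 triggers from §4.3:
// CPU > 60% or request latency P99 > 500ms.
// Scale-in (not shown) would use the conservative 5-minute window.
func shouldScaleOut(s ScaleSignal) bool {
	return s.CPUPercent > 60 || s.P99LatencyMs > 500
}

// minCapacity sizes the fleet floor so 2x average traffic fits without
// scaling, and never drops below the Tier 1 minimum of 3 instances.
func minCapacity(avgTPS, perInstanceTPS float64) int {
	n := int(math.Ceil(2 * avgTPS / perInstanceTPS))
	if n < 3 {
		n = 3
	}
	return n
}

func main() {
	fmt.Println(shouldScaleOut(ScaleSignal{CPUPercent: 72, P99LatencyMs: 300})) // true
	fmt.Println(minCapacity(400, 150))                                          // 6
}
```

In practice these thresholds live in the scaling policy configuration, not in service code; the sketch only makes the arithmetic behind the floor rule explicit.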
4.5 Deployment Strategies
| Strategy | Current State | Target State |
| --- | --- | --- |
| Rolling update | Yes (ASG) | Maintained for non-critical services |
| Blue/green | No | Required for Tier 1 (payment-critical) services |
| Canary | No | Required for API Gateway and payment initiation |
Blue/Green Requirements:
- Two identical environments (blue and green)
- Traffic switch at ALB level (weighted target groups)
- Automated health checks before full cutover
- Instant rollback capability (switch back to previous colour)
- Both environments kept warm for minimum 30 minutes post-deployment
Canary Requirements:
- Initial canary: 5% of traffic
- Automated metric comparison (error rate, latency, success rate)
- Automatic rollback if error rate increases by > 0.1%
- Progressive rollout: 5% → 25% → 50% → 100%
- Minimum 10 minutes at each stage
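A minimal Go sketch of the canary decision rules above, assuming error rates are expressed as fractions (0.001 = 0.1%); all names are hypothetical:

```go
package main

import "fmt"

// canaryStages mirrors the progressive rollout: 5% → 25% → 50% → 100%.
var canaryStages = []int{5, 25, 50, 100}

// shouldRollback implements the automated comparison rule: abort the canary
// if its error rate exceeds the baseline's by more than 0.1 percentage points.
func shouldRollback(baselineErrRate, canaryErrRate float64) bool {
	return canaryErrRate-baselineErrRate > 0.001 // 0.1%
}

// nextStage returns the next traffic percentage after a healthy
// 10-minute stage, or -1 once rollout is complete.
func nextStage(current int) int {
	for i, s := range canaryStages {
		if s == current && i+1 < len(canaryStages) {
			return canaryStages[i+1]
		}
	}
	return -1
}

func main() {
	fmt.Println(shouldRollback(0.004, 0.006)) // +0.2% over baseline → true
	fmt.Println(nextStage(25))                // 50
}
```

A real controller would also compare latency and success rate, per the metric-comparison requirement, before promoting a stage.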
5. Networking Standards
5.1 VPC Design
Each environment MUST have its own VPC. VPCs MUST NOT be shared across environments.
| Environment | VPC CIDR | Region |
| --- | --- | --- |
| Prod | 10.0.0.0/16 | TBC (primary) |
| Test | 10.1.0.0/16 | TBC (same region as Prod) |
| Dev | 10.2.0.0/16 | TBC (same region as Prod) |
| Sandbox | 10.3.0.0/16 | TBC (same region as Prod) |
Note: CIDR ranges are illustrative. Final allocation requires network planning exercise including payment channel VPN requirements.
5.2 Subnet Strategy
Each VPC MUST have three subnet tiers across a minimum of two Availability Zones:
| Subnet Tier | Purpose | Internet Access | Examples |
| --- | --- | --- | --- |
| Public | Edge / ingress | Direct (IGW) | ALB, NAT Gateway, Bastion (if required) |
| Private | Application workloads | Outbound only (NAT) | EC2 instances, containers, KrakenD, Caddy |
| Isolated | Data stores | None | RDS, ElastiCache, SurrealDB, OpenSearch |
5.3 Security Groups and NACLs
Security Group Rules:
| Component | Inbound | Outbound |
| --- | --- | --- |
| ALB | 443 (HTTPS) from Cloudflare IPs only | Application ports to private subnets |
| Application instances | Application port from ALB SG only | 443 to NAT GW (external APIs), DB ports to isolated subnet |
| KrakenD | 8080 from ALB SG | Application ports to private subnets |
| RDS MySQL | 3306 from application SG only | None (stateful return traffic) |
| ElastiCache Redis | 6379 from application SG only | None |
| OpenSearch | 9200 from observability SG only | None |
NACL Rules:
- NACLs provide defence in depth at the subnet level
- Deny all by default, explicitly allow required traffic
- NACLs MUST mirror security group intent but at the subnet level
5.4 NAT Gateway Configuration
- One NAT Gateway per Availability Zone for high availability
- All private subnet outbound traffic routes through NAT Gateway
- NAT Gateway MUST be in the public subnet
- Elastic IP allocated per NAT Gateway
5.5 DNS: Cloudflare DNS
| Aspect | Standard |
| --- | --- |
| Primary DNS | Cloudflare (authoritative) |
| Internal DNS | Route 53 Private Hosted Zones (for VPC-internal resolution) |
| TTL | 300s for API endpoints, 3600s for static assets |
| DNSSEC | Enabled on all public zones |
| Records | A/AAAA records proxied through Cloudflare (orange cloud) |
5.6 DDoS Protection
| Layer | Current | Target |
| --- | --- | --- |
| Layer 7 | AWS WAF | Cloudflare WAF (primary) + AWS WAF (transitional) |
| Layer 3/4 | AWS Shield Standard | Cloudflare DDoS protection |
| Rate limiting | None centralised | Cloudflare rate limiting + KrakenD per-merchant limits |
| Bot management | None | Cloudflare Bot Management |
6. Edge & CDN (Cloudflare)
6.1 Cloudflare as Primary Edge
Cloudflare MUST be the primary edge for all Simpaisa public-facing services. All traffic MUST pass through Cloudflare before reaching AWS infrastructure.
| Service | Cloudflare Product | Purpose |
| --- | --- | --- |
| CDN | Cloudflare CDN | Cache static assets, reduce origin load |
| WAF | Cloudflare WAF | Application-layer attack protection |
| DDoS | Cloudflare DDoS Protection | Volumetric and protocol attack mitigation |
| DNS | Cloudflare DNS | Authoritative DNS with global anycast |
| Workers | Cloudflare Workers | Edge logic (rate limiting, validation, geo-routing) |
| Pages | Cloudflare Pages | Static site hosting (corporate site, developer portal) |
| R2 | Cloudflare R2 | Object storage (reports, receipts, merchant documents) |
| Bot Management | Cloudflare Bot Management | Distinguish legitimate traffic from bots |
6.2 Cloudflare Workers Use Cases
| Use Case | Description | Priority |
| --- | --- | --- |
| Geo-routing | Route requests to appropriate regional backend based on merchant jurisdiction | HIGH |
| Request validation | Validate request structure before forwarding to origin | HIGH |
| Rate limiting | First-pass rate limiting at the edge (before KrakenD) | HIGH |
| A/B testing | Route percentage of traffic to canary deployments | MEDIUM |
| IP allowlisting | Enforce merchant IP allowlists at the edge | MEDIUM |
| Response caching | Cache merchant configuration, channel status responses | MEDIUM |
| Header injection | Add tracing headers (X-Request-ID, X-Trace-ID) at the edge | HIGH |
6.3 Cloudflare Pages
| Site | Repository | Domain |
| --- | --- | --- |
| Corporate website | simpaisa.com repo | www.simpaisa.com |
| Developer portal | developer-portal repo | developer.simpaisa.com |
| Status page | status repo | status.simpaisa.com |
6.4 Cloudflare R2
| Bucket | Purpose | Retention | Access |
| --- | --- | --- | --- |
| `merchant-reports` | Generated merchant reports (CSV, PDF) | 90 days | Merchant portal (signed URLs) |
| `transaction-receipts` | Payment receipts | 7 years (compliance) | Internal + merchant API |
| `merchant-documents` | KYC/KYB documents | 10 years (compliance) | Internal only |
| `static-assets` | Images, fonts, scripts | Indefinite | Public (CDN) |
6.5 WAF Rules for Payment API Protection
| Rule | Action | Description |
| --- | --- | --- |
| Block non-HTTPS | Block | All payment API traffic MUST be HTTPS |
| Block non-JSON | Block | Payment APIs accept JSON only; block other content types |
| Block oversized requests | Block | Maximum 1MB request body for payment APIs |
| Rate limit by merchant | Challenge/Block | Per-merchant TPS limits enforced at edge |
| Block known bad IPs | Block | Threat intelligence feed integration |
| SQL injection detection | Block | OWASP CRS rules for SQLi patterns |
| Geographic restrictions | Block | Block traffic from sanctioned jurisdictions |
| Bot score filtering | Challenge | Challenge requests with bot score < 30 |
6.6 Cloudflare-to-Origin Security
- Authenticated Origin Pulls: Cloudflare presents a client certificate to the ALB; the ALB validates it
- Origin CA: Use Cloudflare Origin CA certificates on ALB
- Strict SSL mode: Full (Strict) — Cloudflare validates origin certificate
- IP allowlisting: ALB security group MUST only allow Cloudflare IP ranges (published at cloudflare.com/ips)
7. API Gateway (KrakenD)
7.1 Deployment Architecture
Cloudflare Edge → ALB → KrakenD Cluster → Caddy (mTLS) → Backend Services
| Aspect | Standard |
| --- | --- |
| Deployment | Containerised, minimum 3 instances across AZs |
| Configuration | Declarative JSON, version-controlled |
| Health check | /health endpoint, 10-second interval |
| Scaling | Horizontal, based on request rate and latency |
| State | Stateless — no persistent storage required |
7.2 Configuration Management
- KrakenD configuration MUST be declarative JSON stored in Git
- Configuration changes MUST go through the standard promotion workflow (Dev → Test → Prod)
- Configuration MUST be validated (`krakend check`) before deployment
- Flexible Configuration (FC) MUST be used to template environment-specific values
- Configuration MUST be generated from OpenAPI specifications where possible
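For illustration, a declarative KrakenD endpoint definition under this standard might look like the fragment below. Paths, hosts, and limit values are placeholders, and the exact extension options should be validated with `krakend check` against the deployed version; `qos/ratelimit/router` is KrakenD's router-level rate-limit namespace:

```json
{
  "$schema": "https://www.krakend.io/schema/v3.json",
  "version": 3,
  "endpoints": [
    {
      "endpoint": "/v1/pay-ins/transactions",
      "method": "POST",
      "extra_config": {
        "qos/ratelimit/router": {
          "max_rate": 50,
          "client_max_rate": 50,
          "strategy": "header",
          "key": "X-Merchant-Id"
        }
      },
      "backend": [
        {
          "host": ["http://pay-in-service:8080"],
          "url_pattern": "/transactions"
        }
      ]
    }
  ]
}
```

Environment-specific values (hosts, limits) would be templated via Flexible Configuration rather than hard-coded as above.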
7.3 Auth Verification at Gateway
| Auth Method | Product | KrakenD Handling |
| --- | --- | --- |
| JWT validation | All (target) | Validate JWT signature, expiry, issuer, audience |
| API key | Sandbox | Validate against key store, inject merchant context |
| RSA signature | Pay-Outs, Remittances (current) | Pass through to backend (gateway validates timestamp freshness) |
| mTLS | Cards | Terminate at Caddy, KrakenD receives forwarded client cert info |
7.4 Rate Limiting Tiers
| Tier | Scope | Default Limit | Burst | Notes |
| --- | --- | --- | --- | --- |
| Global | All merchants | 10,000 req/s | 15,000 | Platform-wide safety limit |
| Per-merchant | Individual merchant | 100 req/s | 200 | Configurable per merchant agreement |
| Per-product | Product line | 5,000 req/s | 7,500 | Pay-Ins, Pay-Outs, Remittances, Cards |
| Per-endpoint | Specific endpoint | Varies | Varies | e.g., payment initiation: 50 req/s per merchant |
| Sandbox | Sandbox environment | 20 req/s per merchant | 30 | Lower limits for testing |
Rate limit responses MUST include:
- X-RateLimit-Limit — maximum requests allowed
- X-RateLimit-Remaining — requests remaining in window
- X-RateLimit-Reset — seconds until window resets
- HTTP 429 status code with a JSON error body
7.5 OpenAPI Validation
- All API endpoints MUST have an OpenAPI 3.1 specification
- KrakenD MUST validate incoming requests against the OpenAPI schema
- Invalid requests MUST be rejected at the gateway (400 Bad Request)
- Request body validation: required fields, types, format constraints
- Query parameter validation: allowed values, types
7.6 Error Response Standardisation
All error responses from KrakenD MUST follow RFC 9457 (Problem Details for HTTP APIs):
```json
{
  "type": "https://api.simpaisa.com/errors/rate-limited",
  "title": "Rate limit exceeded",
  "status": 429,
  "detail": "Merchant has exceeded 100 requests per second",
  "instance": "/v1/pay-ins/transactions",
  "traceId": "abc123-def456-ghi789"
}
```
7.7 High Availability
| Requirement | Standard |
| --- | --- |
| Minimum instances | 3 (one per AZ) |
| Health check | HTTP 200 on /health within 5 seconds |
| Graceful shutdown | Drain connections for 30 seconds before termination |
| Configuration reload | Zero-downtime reload on configuration change |
| Failover | ALB removes unhealthy instances within 30 seconds |
| Availability target | 99.99% (gateway MUST NOT be the bottleneck) |
8. Observability Stack
CloudWatch will NOT be used. The observability stack is built on open standards (OpenTelemetry) with open-source tooling.
8.1 Architecture Overview
```
Services (OTel SDK) → OTel Collector → ┬→ Prometheus (metrics) → Grafana
                                       ├→ Jaeger / Tempo (traces) → Grafana
                                       └→ OpenSearch (logs) → Grafana / OpenSearch Dashboards

PostHog ← (product events from frontend + backend)
```
8.2 OpenTelemetry Collector
The OpenTelemetry Collector is the unified telemetry pipeline. All services MUST send telemetry to the OTel Collector — never directly to backends.
| Aspect | Standard |
| --- | --- |
| Deployment | Agent mode (sidecar or daemonset) + Gateway mode (central) |
| Receivers | OTLP (gRPC and HTTP), Prometheus scrape, Fluent Forward |
| Processors | Batch, memory limiter, attribute enrichment, tail sampling |
| Exporters | Prometheus Remote Write, Jaeger/Tempo OTLP, OpenSearch |
| Configuration | Version-controlled YAML, per-environment |
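A minimal, illustrative Collector configuration matching the table above. Endpoint addresses and limits are assumptions, and exporter names should be checked against the deployed Collector (the OpenSearch exporter ships in the contrib distribution):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024
  batch:

exporters:
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write
  otlp/jaeger:
    endpoint: jaeger-collector:4317
  opensearch:
    http:
      endpoint: https://opensearch:9200

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [opensearch]
```

Attribute enrichment and tail sampling would be added as further processors in the gateway-mode Collector; they are omitted here for brevity.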
8.3 Traces: Jaeger (or Grafana Tempo)
| Aspect | Standard |
| --- | --- |
| Tool | Jaeger (evaluate Grafana Tempo as alternative) |
| Storage | OpenSearch (Jaeger backend) or S3 (Tempo) |
| Retention | 30 days hot, 90 days cold |
| Sampling | Head-based: 100% for errors, 10% for success (adjust per traffic) |
| Context propagation | W3C Trace Context (mandatory), B3 (for legacy compatibility) |
Mandatory Trace Spans:
Every payment transaction MUST include the following spans:
| Span | Service | Description |
| --- | --- | --- |
| `gateway.receive` | KrakenD | Request received at gateway |
| `auth.verify` | KrakenD / Auth service | Authentication/authorisation check |
| `payment.initiate` | Payment service | Payment initiation logic |
| `channel.request` | Channel adapter | Request sent to payment channel (Easypaisa, JazzCash, etc.) |
| `channel.response` | Channel adapter | Response received from channel |
| `payment.complete` | Payment service | Transaction finalisation |
| `callback.dispatch` | Callback service | Webhook sent to merchant |
8.4 Metrics: Prometheus + Grafana
| Aspect | Standard |
| --- | --- |
| Collection | Prometheus (via OTel Collector remote write) |
| Visualisation | Grafana |
| Retention | 15 days high-resolution, 1 year downsampled |
| Naming convention | `simpaisa_<product>_<metric>_<unit>` |
Mandatory Metrics:
| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `simpaisa_transaction_total` | Counter | product, channel, status, merchant | Total transactions |
| `simpaisa_transaction_duration_seconds` | Histogram | product, channel, merchant | Transaction processing time |
| `simpaisa_transaction_amount_total` | Counter | product, channel, currency | Total transaction value |
| `simpaisa_channel_request_duration_seconds` | Histogram | channel, operation | Time to get response from payment channel |
| `simpaisa_channel_availability` | Gauge | channel | Channel health (1 = up, 0 = down) |
| `simpaisa_gateway_request_total` | Counter | method, path, status | API gateway requests |
| `simpaisa_gateway_latency_seconds` | Histogram | method, path | API gateway response time |
| `simpaisa_error_total` | Counter | product, error_type, severity | Errors by type |
8.5 Logs: OpenSearch with Structured Logging
| Aspect | Standard |
| --- | --- |
| Format | JSON structured logging (mandatory) |
| Transport | OTel Collector → OpenSearch |
| Retention | 90 days hot, 1 year warm, 7 years cold (compliance) |
| Index pattern | `simpaisa-<service>-<environment>-YYYY.MM.DD` |
| ISM Policy | Hot → Warm at 7 days, Warm → Cold at 90 days, Delete at 7 years |
Mandatory Log Fields:
```json
{
  "timestamp": "2026-04-03T10:30:00.000Z",
  "level": "INFO",
  "service": "pay-in-service",
  "traceId": "abc123",
  "spanId": "def456",
  "merchantId": "M12345",
  "transactionId": "TXN-789",
  "channel": "easypaisa",
  "message": "Transaction initiated",
  "environment": "prod"
}
```
Sensitive Data Rules:
- NEVER log card numbers, CVV, PINs, or full account numbers
- Mask mobile numbers: 03XX-XXXX-1234 (show last 4 digits only)
- Mask CNICs: XXXXX-XXXXXXX-3 (show last digit only)
- Log transaction IDs, merchant IDs, channel references — these are required for tracing
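The masking rules can be captured in small helpers so they are applied uniformly at the logging boundary. A Go sketch; the function names are illustrative, and production code should validate input formats before masking:

```go
package main

import "fmt"

// MaskMobile keeps only the last 4 digits of a mobile number,
// producing the 03XX-XXXX-1234 format required above.
func MaskMobile(msisdn string) string {
	if len(msisdn) < 4 {
		return "03XX-XXXX-XXXX"
	}
	return "03XX-XXXX-" + msisdn[len(msisdn)-4:]
}

// MaskCNIC keeps only the final check digit of a CNIC,
// producing the XXXXX-XXXXXXX-3 format required above.
func MaskCNIC(cnic string) string {
	if cnic == "" {
		return "XXXXX-XXXXXXX-X"
	}
	return "XXXXX-XXXXXXX-" + cnic[len(cnic)-1:]
}

func main() {
	fmt.Println(MaskMobile("03001234567"))   // 03XX-XXXX-4567
	fmt.Println(MaskCNIC("12345-1234567-3")) // XXXXX-XXXXXXX-3
}
```

Card numbers, CVVs, and PINs get no masking helper by design: they must never reach the logging layer at all.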
8.6 Alerting
| Aspect | Standard |
| --- | --- |
| Tool | Grafana Alerting (evaluate PagerDuty/OpsGenie for escalation) |
| Channels | Slack (info/warning), SMS/call (critical), email (summary) |
| Escalation | P1: immediate call → CDO + on-call engineer; P2: Slack + 15min response; P3: next business day |
Alert Definitions:
| Alert | Severity | Condition | Action |
| --- | --- | --- | --- |
| Transaction success rate drop | P1 | Success rate < 95% for any channel over 5 minutes | Immediate investigation |
| Payment channel down | P1 | Channel health check fails for 3 consecutive checks | Failover / merchant notification |
| API latency spike | P2 | P99 latency > 2s for 5 minutes | Scale out / investigate |
| Error rate increase | P2 | Error rate > 5% over 5 minutes | Investigate |
| Disk space critical | P2 | Any data store > 85% disk usage | Expand / clean up |
| Certificate expiry | P3 | Any certificate expiring within 14 days | Renew |
| Deployment failed | P2 | Deployment health check fails | Automatic rollback |
8.7 Dashboards
| Dashboard | Audience | Key Metrics |
| --- | --- | --- |
| Executive Overview | CDO, leadership | Total transactions, value, success rate, revenue by product |
| Per-Product | Product owners | Transaction volume, success/failure rates, channel mix, latency |
| Per-Channel | Operations | Channel availability, response times, error rates, queue depth |
| Per-Merchant | Support, account managers | Merchant transaction volume, errors, rate limit hits |
| Infrastructure | Engineering | CPU, memory, disk, network, scaling events |
| Security | Security team | WAF blocks, auth failures, suspicious patterns, rate limit events |
| SLA Monitoring | Operations, leadership | P95/P99 latency per endpoint, uptime percentages |
8.8 Transaction Tracing
End-to-end transaction tracing is the highest priority observability feature. Every merchant request MUST be traceable from Cloudflare edge → KrakenD → service → payment channel → callback.
| Requirement | Standard |
| --- | --- |
| Trace ID | Generated at Cloudflare edge (Worker), propagated through all services |
| Correlation | Trace ID MUST appear in logs, metrics labels, and traces |
| Merchant visibility | Trace ID returned in API response headers (X-Trace-Id) |
| Support lookup | Support team can search by trace ID, transaction ID, or merchant reference |
| Channel correlation | Map Simpaisa trace ID to channel reference number |
8.9 PostHog for Product Analytics
| Aspect | Standard |
| --- | --- |
| Deployment | Self-hosted (data residency compliance) or cloud (evaluate) |
| Events | Merchant portal interactions, developer portal usage, API adoption |
| Feature flags | PostHog feature flags for gradual rollout |
| Session replay | Enabled for merchant portal (with PII redaction) |
| Funnels | Merchant onboarding, first transaction, product adoption |
9. Identity & Access (ControlPlane.com)
9.1 Overview
ControlPlane.com provides Universal Cloud Identity, enabling workloads to consume cloud resources from multiple providers without storing credentials. It employs a zero-trust architecture where every access request is fully authenticated and authorised.
9.2 Centralised Identity Management
| Aspect | Current State | Target State |
| --- | --- | --- |
| Human access | AWS IAM users + console | ControlPlane.com SSO → cloud provider roles |
| Service identity | AWS IAM roles (per-service) | ControlPlane.com workload identity |
| Merchant identity | Custom auth (JSESSIONID / RSA) | ControlPlane.com + KrakenD JWT validation |
| Audit trail | CloudTrail (AWS only) | ControlPlane.com tamper-proof audit trail + CloudTrail |
9.3 Service-to-Service Authentication
| Requirement | Standard |
| --- | --- |
| Protocol | mTLS (mutual TLS) via Caddy |
| Certificate management | ControlPlane.com or automated CA (evaluate) |
| Rotation | Automatic, maximum 24-hour certificate lifetime |
| Verification | Both client and server certificates validated |
| No shared secrets | Services MUST NOT use shared API keys for inter-service communication |
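A hedged per-service Caddyfile sketch for the mTLS requirement above. Hostnames, ports, and certificate paths are assumptions, and the exact `client_auth` options should be confirmed against the deployed Caddy version:

```
# Per-service Caddy instance: terminates mTLS, proxies to the local service.
payin.internal.simpaisa.com {
	tls /etc/caddy/certs/server.pem /etc/caddy/certs/server.key {
		client_auth {
			mode require_and_verify
			trusted_ca_cert_file /etc/caddy/certs/internal-ca.pem
		}
	}
	reverse_proxy 127.0.0.1:8080
}
```

With 24-hour certificate lifetimes, the certificate and key files referenced here must be rotated by automation and Caddy reloaded (or configured via its admin API) without downtime.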
9.4 Merchant Identity and RBAC
| Role | Permissions | Description |
| --- | --- | --- |
| Merchant Admin | Full access to merchant's resources | Account owner, manages users and settings |
| Merchant Operator | Initiate transactions, view reports | Day-to-day operational access |
| Merchant Viewer | Read-only access | Reporting and audit access |
| Merchant Developer | Sandbox access, API key management | Integration and testing |
9.5 Integration with KrakenD
```
Merchant Request → Cloudflare → KrakenD → ControlPlane.com (token validation)
                                  ↓
                     Valid JWT with claims:
                       - merchant_id
                       - roles[]
                       - products[]
                       - rate_limit_tier
                                  ↓
                  Backend Service (receives validated claims as headers)
```
9.6 Policy-as-Code
- Access policies MUST be defined as code and version-controlled
- Policy changes MUST go through the same review process as code changes
- Policies MUST be testable (unit tests for policy logic)
- ControlPlane.com policies define: who can access what resources, from which networks, at which times
10. Data Infrastructure
10.1 Overview
| Technology | Role | Current State | Target State |
| --- | --- | --- | --- |
| RDS MySQL | Primary transactional database | Shared single instance, Multi-AZ | Per-service instances, read replicas, automated backups |
| SurrealDB | New service database | Not deployed | Clustered deployment for new Go services |
| ElastiCache Redis | Caching and session store | Single shared cluster | Cluster mode enabled, per-service namespacing |
| NSQ | Message queue | Not deployed (Kafka currently) | Replace Kafka for inter-service messaging |
| Meilisearch | Merchant-facing search | Not deployed | Merchant/transaction search in portal |
| OpenSearch | Log storage and search | Not deployed | Log aggregation, Jaeger trace storage |
10.2 RDS MySQL (Existing)
| Aspect | Current | Target | Priority |
| --- | --- | --- | --- |
| Instances | 1 shared instance | Per-service instances (minimum: separate Pay-Ins, Pay-Outs, Remittances, Cards) | CRITICAL |
| Multi-AZ | Yes | Yes (maintained) | — |
| Read replicas | None | 1 per service instance (reporting queries) | HIGH |
| Backups | TBC | Automated daily, 35-day retention, point-in-time recovery | CRITICAL |
| Encryption at rest | TBC | AES-256 (AWS KMS managed key) | CRITICAL |
| Encryption in transit | TBC | TLS mandatory for all connections | CRITICAL |
| Version | TBC | MySQL 8.0+ (latest stable) | MEDIUM |
| Monitoring | CloudWatch | Prometheus exporter → Grafana | HIGH |
| Slow query log | TBC | Enabled, threshold 1s, exported to OpenSearch | HIGH |
10.3 SurrealDB (New Services)
| Aspect | Standard |
| --- | --- |
| Deployment | Clustered (minimum 3 nodes for Prod) |
| Storage backend | TiKV (distributed) or RocksDB (single-node for Dev/Test) |
| Backup | Automated daily export, stored in R2 |
| Access | Namespace and database per service, scoped authentication |
| Schema | Schemaful tables for payment data, schemafree for flexible data |
| Monitoring | Prometheus metrics endpoint → Grafana |
10.4 Redis (ElastiCache)
| Aspect | Current | Target | Priority |
| --- | --- | --- | --- |
| Mode | Single cluster, no cluster mode | Cluster mode enabled | HIGH |
| Failover | Multi-AZ with automatic failover | Maintained | — |
| Namespacing | None (shared keyspace) | Prefix per service: `payin:`, `payout:`, `remit:`, `cards:` | HIGH |
| Encryption | TBC | In-transit (TLS) and at-rest encryption | HIGH |
| Eviction | TBC | `allkeys-lru` for caches, `noeviction` for session stores | MEDIUM |
| Monitoring | CloudWatch | Prometheus exporter → Grafana | HIGH |
| Backup | TBC | Daily snapshots, 7-day retention | MEDIUM |
10.5 NSQ (Messaging)
| Aspect | Standard |
| --- | --- |
| Deployment | nsqlookupd (3 instances) + nsqd (per application host) |
| Topics | One topic per event type: payment.initiated, payment.completed, payment.failed, callback.pending, etc. |
| Channels | One channel per consumer group (e.g., payment.completed#notification, payment.completed#reconciliation) |
| Message retention | In-memory with disk overflow; messages purged after successful consumption |
| Dead letter | Failed messages after 5 retries → dead letter topic for manual investigation |
| Monitoring | nsqadmin + Prometheus exporter → Grafana |
| Ordering | Ordering not guaranteed (NSQ has no Kafka-style partitions); use idempotency keys for exactly-once processing semantics |
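Because NSQ delivers at-least-once with no ordering, consumers dedupe on an idempotency key carried in each message. A minimal in-memory Go sketch of that handler pattern; a real deployment would back the seen-set with Redis and a TTL rather than a map:

```go
package main

import (
	"fmt"
	"sync"
)

// IdempotentHandler skips messages whose idempotency key has already been
// processed, making redeliveries harmless.
type IdempotentHandler struct {
	mu   sync.Mutex
	seen map[string]bool
}

func NewIdempotentHandler() *IdempotentHandler {
	return &IdempotentHandler{seen: make(map[string]bool)}
}

// Handle runs process once per key. It returns true if the message was
// processed and false if it was a duplicate redelivery.
func (h *IdempotentHandler) Handle(idempotencyKey string, process func()) bool {
	h.mu.Lock()
	if h.seen[idempotencyKey] {
		h.mu.Unlock()
		return false
	}
	h.seen[idempotencyKey] = true
	h.mu.Unlock()
	process()
	return true
}

func main() {
	h := NewIdempotentHandler()
	count := 0
	h.Handle("payment.completed:TXN-789", func() { count++ })
	h.Handle("payment.completed:TXN-789", func() { count++ }) // redelivery: skipped
	fmt.Println(count) // 1
}
```

The event name and key format above are illustrative; the important property is that the key is stable across redeliveries of the same logical event.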
10.6 Meilisearch (Merchant-Facing Search)
| Aspect |
Standard |
| Purpose |
Fast search in merchant portal (transactions, customers, reports) |
| Deployment |
Single instance per environment (evaluate clustering for Prod) |
| Indices |
transactions, merchants, customers, reports |
| Refresh strategy |
Near-real-time: primary write to MySQL/SurrealDB, async index update via NSQ |
| Security |
API key per merchant, tenant isolation via filterable attributes |
| Monitoring |
Health check endpoint + Prometheus metrics |
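Tenant isolation via filterable attributes could look like the following; the attribute names are assumptions, and the endpoints follow the Meilisearch REST API but should be verified against the current docs.

```
# Mark the tenant attribute as filterable (illustrative attribute names):
PATCH /indexes/transactions/settings
{ "filterableAttributes": ["merchant_id", "status", "created_at"] }

# Every merchant-portal query then carries a tenant filter, enforced
# server-side via a scoped API key or tenant token:
POST /indexes/transactions/search
{ "q": "refund", "filter": "merchant_id = m_123" }
```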
10.7 OpenSearch (Logs and Traces)
| Aspect |
Standard |
| Deployment |
3 master nodes + 3 data nodes (Prod minimum) |
| Indices |
simpaisa-logs-*, simpaisa-jaeger-*, simpaisa-audit-* |
| ISM Policies |
Hot (7 days, SSD) → Warm (90 days, HDD) → Cold (7 years, S3/R2) → Delete |
| Retention |
Logs: 7 years (compliance), Traces: 90 days, Audit: 10 years |
| Security |
OpenSearch Security plugin, RBAC per index, TLS |
| Backup |
Snapshot to S3/R2, daily |
| Monitoring |
Built-in performance analyser + Prometheus exporter |
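The ISM lifecycle above might be expressed roughly as follows (a sketch only; the action names and the snapshot repository name `archive-repo` are assumptions to verify against the OpenSearch ISM documentation):

```json
{
  "policy": {
    "description": "Lifecycle sketch for simpaisa-logs-* mirroring the table above",
    "default_state": "hot",
    "states": [
      { "name": "hot",
        "actions": [],
        "transitions": [{ "state_name": "warm", "conditions": { "min_index_age": "7d" } }] },
      { "name": "warm",
        "actions": [{ "read_only": {} }, { "replica_count": { "number_of_replicas": 1 } }],
        "transitions": [{ "state_name": "cold", "conditions": { "min_index_age": "90d" } }] },
      { "name": "cold",
        "actions": [{ "snapshot": { "repository": "archive-repo", "snapshot": "logs" } }],
        "transitions": [{ "state_name": "delete", "conditions": { "min_index_age": "2555d" } }] },
      { "name": "delete",
        "actions": [{ "delete": {} }] }
    ],
    "ism_template": [{ "index_patterns": ["simpaisa-logs-*"], "priority": 100 }]
  }
}
```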
11. Secret Management
11.1 Current State
| Aspect |
Detail |
| Tool |
AWS Systems Manager Parameter Store (SecureString) |
| Encryption |
AWS KMS managed keys |
| Access |
IAM role-based |
| Rotation |
Manual |
| Audit |
CloudTrail |
11.2 Target State
| Aspect |
Detail |
Priority |
| Tool |
Evaluate: ControlPlane.com secrets, HashiCorp Vault, AWS Secrets Manager |
HIGH |
| Rotation |
Automated rotation for all secrets; maximum 90-day lifetime |
HIGH |
| Access |
Workload identity (no static credentials); secrets injected at runtime |
HIGH |
| Audit |
All secret access logged and alerted on anomalous patterns |
HIGH |
11.3 Secret Policies
| Policy |
Requirement |
| No secrets in code |
NEVER commit secrets, tokens, keys, or passwords to source control |
| No secrets in config files |
Configuration files MUST reference secret paths, not values |
| No secrets in environment variables |
Prefer mounted secrets or secret injection; env vars are visible in process listings |
| No secrets in container images |
Build-time secrets MUST use multi-stage builds with secret mounts |
| Secret scanning in CI |
Every commit MUST be scanned for secret patterns (pre-commit hook + CI step) |
| Rotation on compromise |
If a secret is suspected compromised, rotate immediately (< 1 hour) |
| Shared secrets |
NEVER share secrets between environments; each environment has its own |
11.4 Secret Categories and Rotation
| Category |
Examples |
Max Lifetime |
Rotation Method |
| Database credentials |
MySQL, SurrealDB, Redis passwords |
90 days |
Automated (dual-user pattern) |
| API keys |
Payment channel API keys, merchant API keys |
365 days |
Merchant-initiated or scheduled |
| TLS certificates |
Service certificates, mTLS certs |
90 days (target: 24 hours via ControlPlane) |
Automated |
| Signing keys |
RSA keys for Pay-Outs/Remittances |
365 days |
Coordinated rotation with merchants |
| OAuth tokens |
Service-to-service tokens |
1 hour |
Automatic refresh |
| Encryption keys |
KMS keys, data encryption keys |
Annual rotation |
AWS KMS automatic rotation |
12. Disaster Recovery & Business Continuity
12.1 Service Tier Classification
| Tier |
Services |
RPO |
RTO |
Description |
| Tier 1 — Payment Critical |
Pay-In processing, Pay-Out execution, Card auth, Remittance processing |
0 (zero data loss) |
< 5 minutes |
Direct revenue impact; customer-facing payment flows |
| Tier 2 — Merchant Facing |
API Gateway, Merchant Portal, Sandbox |
< 5 minutes |
< 15 minutes |
Merchant experience; no direct payment loss |
| Tier 3 — Operational |
Reporting, reconciliation, settlement, back-office |
< 1 hour |
< 4 hours |
Internal operations; deferred processing acceptable |
| Tier 4 — Supporting |
Developer portal, corporate website, analytics |
< 24 hours |
< 24 hours |
No operational impact |
12.2 Backup Strategy
| Resource |
Backup Method |
Frequency |
Retention |
Testing |
| RDS MySQL |
Automated snapshots + binlog replication |
Continuous (point-in-time) |
35 days |
Monthly restore test |
| SurrealDB |
Export + snapshot |
Daily |
35 days |
Monthly restore test |
| Redis |
AOF + RDB snapshots |
Hourly (RDB), continuous (AOF) |
7 days |
Weekly restore test |
| OpenSearch |
Snapshot to S3/R2 |
Daily |
90 days (snapshots) |
Quarterly restore test |
| KrakenD config |
Git repository |
Every change |
Indefinite (Git history) |
On every deployment |
| IaC state |
Remote state backend + versioning |
Every change |
Indefinite |
On every deployment |
| Secrets |
AWS backup + encrypted export |
Daily |
35 days |
Quarterly |
| R2/S3 objects |
Cross-region replication |
Continuous |
Per retention policy |
Quarterly |
12.3 Current: Multi-AZ
| Component |
Multi-AZ Status |
Failover |
| EC2/ASG |
Yes (instances spread across AZs) |
Automatic (ASG replaces failed instances) |
| ALB |
Yes (cross-AZ load balancing) |
Automatic |
| RDS |
Yes (standby in different AZ) |
Automatic failover (< 2 minutes) |
| ElastiCache |
Yes (replica in different AZ) |
Automatic failover |
| NAT Gateway |
One per AZ |
Route table failover needed |
12.4 Target: Multi-Region
| Phase |
Scope |
Timeline |
| Phase 1 |
Document current DR posture, define RPO/RTO, create runbooks |
Q2 2026 |
| Phase 2 |
Cross-region backup replication (S3/R2), read replicas in secondary region |
Q3 2026 |
| Phase 3 |
Active-passive multi-region for Tier 1 services |
Q4 2026 |
| Phase 4 |
Active-active multi-region (evaluate need based on jurisdiction requirements) |
2027 |
12.5 Failover Procedures
| Scenario |
Detection |
Response |
Recovery |
| Single instance failure |
ASG health check (30s) |
ASG launches replacement |
Automatic (< 5 min) |
| AZ failure |
ALB health checks + CloudWatch |
Traffic shifts to healthy AZs |
Automatic (< 5 min) |
| RDS primary failure |
RDS event + monitoring alert |
Automatic failover to standby |
Automatic (< 2 min) |
| Redis primary failure |
ElastiCache failover |
Automatic promotion of replica |
Automatic (< 1 min) |
| Payment channel outage |
Health check failure (3 consecutive) |
Disable channel, notify merchants |
Manual channel re-enable after verification |
| Region failure |
Multi-region health check |
DNS failover to secondary region |
Manual (Phase 1) → Automatic (Phase 3) |
| Cloudflare incident |
External monitoring |
Evaluate: bypass to ALB direct (emergency only) |
Manual |
12.6 DR Testing Cadence
| Test Type |
Frequency |
Scope |
Owner |
| Backup restore |
Monthly |
Restore latest backup to Test environment |
Engineering |
| AZ failover |
Quarterly |
Simulate AZ failure, verify continued operation |
Engineering + Operations |
| Full DR exercise |
Twice per year |
Full failover simulation, measure actual RTO/RPO |
CDO + Engineering |
| Tabletop exercise |
Quarterly |
Walk through failure scenarios with all stakeholders |
CDO |
| Chaos engineering |
Monthly (target) |
Controlled failure injection in Test/Prod |
Engineering |
12.7 Runbooks
The following runbooks MUST be created, tested, and maintained:
| Runbook |
Status |
| RDS failover procedure |
TO CREATE |
| Redis cluster failover |
TO CREATE |
| Payment channel outage response |
TO CREATE |
| Full region failover |
TO CREATE |
| KrakenD configuration rollback |
TO CREATE |
| Cloudflare bypass (emergency) |
TO CREATE |
| Data corruption recovery |
TO CREATE |
| DDoS attack response |
TO CREATE |
| Certificate emergency rotation |
TO CREATE |
| Merchant communication during outage |
TO CREATE |
13. Compliance Infrastructure Requirements
This section documents the infrastructure controls required by regulators in each jurisdiction where Simpaisa operates. Compliance is not optional — failure to meet these requirements risks licence revocation.
Note: Regulatory requirements are subject to change. This section MUST be reviewed quarterly and updated when new circulars or regulations are issued.
13.1 Pakistan — State Bank of Pakistan (SBP)
Governing Legislation:
- Payment Systems and Electronic Fund Transfers Act, 2007 (PSEFT Act)
- Rules for Payment System Operators and Payment Service Providers, 2014 (PSO/PSP Rules)
- Electronic Fund Transfer Regulations
- Personal Data Protection Bill, 2023 (pending enactment — draft approved by Federal Cabinet)
Infrastructure Requirements:
| Requirement |
Regulation Source |
Infrastructure Control |
Current Status |
Gap |
Priority |
| Data localisation |
PSO/PSP Rules 2014, PDPB 2023 (draft) |
Processing systems MUST be located within Pakistan; critical personal data stored on servers in Pakistan |
ASSESS — Verify all processing on Pakistan-based AWS region or local DC |
TBC |
CRITICAL |
| Technology platform approval |
PSO/PSP Rules 2014 |
Prior SBP approval required for changes to technology platforms |
ASSESS — Determine if current changes require approval |
TBC |
CRITICAL |
| Transaction record retention |
PSO/PSP Rules 2014 |
All transaction records retained for minimum 5 years (10 years recommended) |
ASSESS |
Log retention policy needed |
HIGH |
| Information security |
PSO/PSP Rules 2014 |
Appropriate measures for security, integrity, and confidentiality of financial transactions |
PARTIAL — AWS infrastructure sound, but gaps in observability and access control |
Strengthen controls |
HIGH |
| Risk management |
PSO/PSP Rules 2014 |
Documented risk management framework for payment operations |
ASSESS |
Documentation needed |
HIGH |
| Audit trail |
PSO/PSP Rules 2014 |
Complete audit trail of all transactions and system changes |
PARTIAL — Transaction logs exist but no centralised audit system |
Implement centralised audit logging |
HIGH |
| Business continuity |
PSO/PSP Rules 2014 |
Documented BCP/DR plan, tested regularly |
GAP — No DR documentation |
Create and test DR plan |
CRITICAL |
| Incident reporting |
SBP circulars |
Timely reporting of security incidents and system outages to SBP |
ASSESS |
Formalise incident reporting procedure |
HIGH |
| AML/CFT systems |
PSEFT Act, FATF requirements |
Transaction monitoring, sanctions screening, STR filing |
ASSESS |
Verify integration with FMU reporting |
HIGH |
Pakistan-Specific Notes:
- AWS does not have a region in Pakistan. Simpaisa MUST verify with SBP whether AWS ap-south-1 (Mumbai) is acceptable, or whether co-location in a Pakistan-based data centre is required for certain data categories
- The Personal Data Protection Bill 2023 introduces strict data localisation once enacted — "critical personal data shall only be processed in servers within Pakistan"
- SBP requires prior approval for changes to technology platforms — the migration to ControlPlane.com, KrakenD, and other new technologies may require SBP notification/approval
13.2 Bangladesh — Bangladesh Bank
Governing Legislation:
- Payment and Settlement Systems Act, 2024
- Mobile Financial Services Regulations, 2022
- Bangladesh Bank Payment Systems Department circulars
- Bangladesh Financial Intelligence Unit (BFIU) guidelines
Infrastructure Requirements:
| Requirement |
Regulation Source |
Infrastructure Control |
Current Status |
Gap |
Priority |
| Data localisation (mandatory) |
MFS Regulations 2022, PSS Act 2024 |
IT infrastructure and data centres MUST be located within Bangladesh; data localisation is mandatory |
ASSESS — Verify hosting for Bangladesh operations |
If not locally hosted, establish local DC or partner |
CRITICAL |
| On-site inspection readiness |
MFS Regulations 2022 |
Bangladesh Bank conducts on-site inspections of IT infrastructure after setup |
ASSESS |
Ensure infrastructure meets inspection standards |
CRITICAL |
| Biometric e-KYC |
BFIU guidelines |
Electronic KYC with biometric verification required |
ASSESS |
Integration with national ID system needed |
HIGH |
| AML/CFT compliance |
BFIU guidelines |
Suspicious Transaction Report filing, transaction monitoring |
ASSESS |
Verify STR filing integration |
HIGH |
| Two-phase licensing |
MFS Regulations 2022 |
Phase 1: NOC to set up infrastructure; Phase 2: licence to operate |
ASSESS — Verify current licence status |
Follow licensing process |
HIGH |
| Transaction reporting |
Bangladesh Bank circulars |
Regular transaction reports to Bangladesh Bank PSD |
ASSESS |
Automated reporting needed |
HIGH |
| Capital adequacy |
MFS Regulations 2022 |
Minimum paid-up capital BDT 450 million for MFS (bank-led model) |
ASSESS |
Verify capital structure |
MEDIUM |
Bangladesh-Specific Notes:
- Data localisation is non-negotiable in Bangladesh — on-site infrastructure inspection is conducted by Bangladesh Bank
- Two-phase licensing means infrastructure MUST be built before operational licence is granted
- BFIU compliance is separate from Bangladesh Bank payment licensing and adds additional infrastructure requirements for transaction monitoring
13.3 Nepal — Nepal Rastra Bank (NRB)
Governing Legislation:
- Payment and Settlement Act, 2019 (2075 BS)
- NRB PSO/PSP licensing directives
- Data Center and Cloud Services (Operation and Management) Directive, 2081 (2024)
- NRB Cyber Resilience Guidelines
- NRB IT Guidelines
Infrastructure Requirements:
| Requirement |
Regulation Source |
Infrastructure Control |
Current Status |
Gap |
Priority |
| Data centre approval |
Data Center Directive 2081 |
Data MUST be stored in centres approved by Nepal's IT Department; centres MUST comply with the Directive |
ASSESS |
Identify approved data centres in Nepal |
CRITICAL |
| PCI DSS compliance |
NRB IT Guidelines |
Licensed institutions MUST adhere to PCI DSS standards |
ASSESS |
PCI DSS certification required |
CRITICAL |
| ISO 27000 certification |
NRB IT Guidelines |
Financial institutions involved in payment processing require ISO 27001 certification |
ASSESS |
ISO 27001 audit and certification needed |
HIGH |
| Cyber resilience |
NRB Cyber Resilience Guidelines |
Governance, cyber risk culture, training, resilience testing, recovery planning |
ASSESS |
Formalise cyber resilience programme |
HIGH |
| EMV compliance |
NRB IT Guidelines |
EMV and EMV Contactless standards compliance for card processing |
ASSESS |
Verify EMV compliance for Cards product |
HIGH |
| Licensing requirements |
Payment and Settlement Act 2019 |
Prior NRB approval/licence for PSO/PSP operations; 12-18 month process |
ASSESS |
Verify licence status |
CRITICAL |
| Capital requirements |
NRB directives |
NPR 150M (domestic PSP) / NPR 250M (foreign investment PSP) |
ASSESS |
Verify capital compliance |
MEDIUM |
| Technical assessment |
NRB licensing |
NRB assesses system security, reliability, and technical standards compliance |
ASSESS |
Prepare for technical assessment |
HIGH |
Nepal-Specific Notes:
- Nepal has explicit data centre approval requirements — data MUST reside in government-approved centres within Nepal
- PCI DSS and ISO 27001 are explicitly mandated (not merely recommended) for payment processors
- The 12-18 month licensing timeline means infrastructure investment precedes revenue
13.4 Iraq — Central Bank of Iraq (CBI)
Governing Legislation:
- Electronic Payment Services Regulation, 2024 (replaced 2014 framework)
- Central Bank of Iraq circulars on digital banking and payment systems
- AML/CFT regulations (aligned with FATF recommendations)
Infrastructure Requirements:
| Requirement |
Regulation Source |
Infrastructure Control |
Current Status |
Gap |
Priority |
| CBI licensing |
Electronic Payment Services Regulation 2024 |
Licence required from CBI for electronic payment services; 10-year licence validity |
ASSESS |
Verify licence status |
CRITICAL |
| Minimum capital |
Electronic Payment Services Regulation 2024 |
Minimum IQD 10 billion company capital |
ASSESS |
Verify capital compliance |
HIGH |
| Feasibility study |
Electronic Payment Services Regulation 2024 |
3-year feasibility study required covering: economic projections, technical infrastructure, information security, AML systems, dispute resolution |
ASSESS |
Prepare or update feasibility study |
HIGH |
| 5-year record retention |
Electronic Payment Services Regulation 2024 |
All electronic payment transactions and related data retained for minimum 5 years |
ASSESS |
Implement 5-year retention policy |
HIGH |
| Cybersecurity infrastructure |
Electronic Payment Services Regulation 2024, CBI circulars |
Advanced cybersecurity measures to safeguard banking systems; compliance with international standards |
ASSESS |
Cybersecurity posture assessment needed |
HIGH |
| AML/CFT systems |
Electronic Payment Services Regulation 2024 |
Sanctions list screening, transaction monitoring, daily transaction reporting |
ASSESS |
Verify AML system integration |
CRITICAL |
| Business continuity |
CBI circulars |
Business continuity during crises; DR planning |
ASSESS |
DR plan required |
HIGH |
| ISO 20022 alignment |
CBI modernisation programme |
Payment messaging aligned with ISO 20022 standard |
ASSESS |
Evaluate ISO 20022 readiness |
MEDIUM |
Iraq-Specific Notes:
- The 2024 regulation is a significant upgrade from the 2014 framework — verify full compliance with the new requirements
- IQD 10 billion minimum capital (~USD 7.6M) is a substantial requirement
- The 3-year feasibility study requirement includes detailed technical infrastructure and security documentation
- Iraq's financial system is heavily influenced by US sanctions compliance (OFAC) — additional sanctions screening infrastructure may be required
13.5 PCI DSS v4.0.1 (Cards Product)
Standard: PCI DSS v4.0.1 (mandatory as of 31 March 2025)
PCI DSS applies specifically to the Cards product (Visa/Mastercard acquiring). All systems that store, process, or transmit cardholder data are in scope.
| Requirement Area |
PCI DSS Requirement |
Infrastructure Control |
Current Status |
Gap |
Priority |
| Network segmentation |
Req 1: Install and maintain network security controls |
CDE (Cardholder Data Environment) MUST be isolated in a dedicated subnet with strict firewall rules; micro-segmentation recommended |
ASSESS |
Verify CDE isolation |
CRITICAL |
| Secure configuration |
Req 2: Apply secure configurations to all system components |
Hardened OS images, no default credentials, unnecessary services disabled |
ASSESS |
Configuration baseline needed |
HIGH |
| Data protection (stored) |
Req 3: Protect stored account data |
PAN encrypted with AES-256; hash or truncate where possible; encryption keys managed separately from data |
ASSESS |
Verify encryption implementation |
CRITICAL |
| Data protection (transit) |
Req 4: Protect cardholder data with strong cryptography during transmission |
TLS 1.2+ for all cardholder data transmission; no SSL or early TLS |
PARTIAL — mTLS for Cards product |
Verify all transmission paths |
CRITICAL |
| Malware protection |
Req 5: Protect all systems and networks from malicious software |
Anti-malware on all CDE systems; regular scanning |
ASSESS |
Deploy and monitor |
HIGH |
| Secure development |
Req 6: Develop and maintain secure systems and software |
Secure coding practices, vulnerability patching within 30 days (critical) |
ASSESS |
SDLC security review needed |
HIGH |
| Access control |
Req 7 & 8: Restrict access; identify users and authenticate |
MFA mandatory for ALL CDE access (PCI DSS 4.0 requirement); role-based access; unique IDs |
ASSESS |
Implement MFA for all CDE access |
CRITICAL |
| Physical security |
Req 9: Restrict physical access to cardholder data |
Physical access controls for CDE infrastructure (if on-premise) |
N/A (cloud) |
Cloud provider responsibility; verify AWS compliance |
MEDIUM |
| Logging and monitoring |
Req 10: Log and monitor all access to system components and cardholder data |
All CDE access logged; logs tamper-evident; reviewed daily; retained 12 months (3 months immediately accessible) |
ASSESS |
Implement comprehensive CDE logging |
CRITICAL |
| Vulnerability management |
Req 11: Test security of systems and networks regularly |
Internal vulnerability scan quarterly; external ASV scan quarterly; penetration test annually; segmentation test every six months |
ASSESS |
Establish scanning programme |
CRITICAL |
| Organisational policies |
Req 12: Support information security with organisational policies and programmes |
Security policy, risk assessment, incident response plan, security awareness training |
ASSESS |
Formalise security programme |
HIGH |
PCI DSS 4.0 New Requirements (Mandatory from March 2025):
| New Requirement |
Description |
Infrastructure Impact |
| Targeted risk analysis |
Customised approach for each requirement based on risk |
Risk analysis documentation for each CDE control |
| MFA everywhere |
MFA for ALL access to CDE (not just remote) |
Deploy MFA for console, SSH, application access to CDE |
| Authenticated vulnerability scanning |
Internal scans must use authenticated scanning |
Scanning tools need credentials for CDE systems |
| Automated log review |
Automated mechanisms to detect security events |
SIEM/OpenSearch with automated alerting rules for CDE |
| Web application firewall |
WAF or equivalent for public-facing web applications |
Cloudflare WAF / KrakenD for card payment endpoints |
| Script management |
Inventory and integrity of payment page scripts |
CSP headers, SRI, script inventory for card entry pages |
| Enhanced encryption |
Disk-level encryption alone is insufficient |
Application-level encryption for stored PAN |
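For the script management requirement, Subresource Integrity plus a CSP header might look like this sketch for card entry pages; the CDN host and digest are placeholders, not real values.

```html
<!-- SRI: the browser rejects the script if its content no longer matches
     the declared hash. The digest below is a placeholder. -->
<script
  src="https://cdn.example.com/card-form.js"
  integrity="sha384-REPLACE_WITH_REAL_DIGEST"
  crossorigin="anonymous"></script>

<!-- Paired with a CSP response header restricting script sources:
     Content-Security-Policy: script-src 'self' https://cdn.example.com -->
```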
PCI DSS Scoping Notes:
- CDE MUST be clearly defined and documented
- All systems connected to or that could impact the CDE are in scope
- Network segmentation reduces scope — strongly recommended
- If Cloudflare or KrakenD processes card data, it is in PCI DSS scope
- Annual PCI DSS assessment (SAQ or ROC depending on transaction volume)
13.6 Compliance Summary Matrix
| Jurisdiction |
Data Localisation |
Incident Reporting SLA |
Record Retention |
Licensing Status |
PCI DSS Required |
| Pakistan |
Required (processing in-country; PDPB 2023 pending) |
TBC (SBP circulars) |
5+ years |
VERIFY |
Yes (Cards) |
| Bangladesh |
Mandatory (DC inspection by Bangladesh Bank) |
TBC |
TBC |
VERIFY |
TBC |
| Nepal |
Mandatory (govt-approved DC only) |
TBC |
TBC |
VERIFY |
Mandatory (NRB directive) |
| Iraq |
TBC (new 2024 regulation) |
TBC |
5 years (minimum) |
VERIFY |
TBC |
| PCI DSS |
N/A |
72 hours (breach notification) |
12 months (3 months immediately accessible) |
N/A |
Yes (Cards) |
13.7 Prioritised Compliance Actions
| Priority |
Action |
Jurisdictions |
Timeline |
| 1 |
Verify all current licence and authorisation statuses |
All |
Immediate |
| 2 |
Data localisation assessment — where is data stored/processed for each jurisdiction? |
PK, BD, NP |
Q2 2026 |
| 3 |
PCI DSS v4.0.1 gap assessment for Cards product |
Global |
Q2 2026 |
| 4 |
Implement 2-hour incident reporting capability (best practice across all markets) |
All |
Q2 2026 |
| 5 |
Formalise record retention policies meeting all jurisdictional minimums |
All |
Q2 2026 |
| 6 |
DR/BCP documentation and testing |
All (regulatory requirement in most jurisdictions) |
Q2-Q3 2026 |
| 7 |
AML/CFT system verification across all jurisdictions |
All |
Q3 2026 |
| 8 |
ISO 27001 certification (required for Nepal, beneficial for all) |
NP (mandatory), all |
Q3-Q4 2026 |
| 9 |
Prepare for Pakistan PDPB enactment |
PK |
Q3 2026 |
| 10 |
Iraq 2024 regulation full compliance assessment |
IQ |
Q3 2026 |
14. Infrastructure as Code
14.1 Tool Selection
| Tool |
Pros |
Cons |
Recommendation |
| Terraform |
Industry standard, large ecosystem, HCL is declarative, multi-cloud |
State management complexity, HCL learning curve, BSL licence (OpenTofu as alternative) |
Evaluate |
| Pulumi |
Real programming languages (Go, TypeScript), strong typing, testing |
Smaller ecosystem, less community content, state management similar to Terraform |
Evaluate (strong fit with Go stack) |
| AWS CDK |
Native AWS integration, TypeScript/Go support |
AWS-only (not multi-cloud), CloudFormation under the hood |
Lower priority (multi-cloud needed for Cloudflare) |
| OpenTofu |
Terraform-compatible, open source (MPL 2.0) |
Younger project, smaller team |
Evaluate (if Terraform BSL is a concern) |
Decision required: IaC tool selection is TBC. Recommendation: evaluate Pulumi (Go alignment) and Terraform/OpenTofu (ecosystem breadth) in a spike. Whichever tool is chosen, the standards below apply.
14.2 Repository Structure
infrastructure/
├── modules/ # Reusable modules
│ ├── vpc/ # VPC, subnets, NAT, security groups
│ ├── compute/ # EC2/containers, ASG, ALB
│ ├── database/ # RDS, SurrealDB, ElastiCache
│ ├── observability/ # OpenSearch, Grafana, Jaeger, OTel Collector
│ ├── gateway/ # KrakenD deployment
│ ├── cloudflare/ # DNS, WAF, Workers, Pages, R2
│ └── security/ # WAF rules, security groups, KMS
├── environments/
│ ├── sandbox/ # Sandbox environment configuration
│ ├── dev/ # Dev environment configuration
│ ├── test/ # Test environment configuration
│ └── prod/ # Prod environment configuration
├── policies/ # OPA/Sentinel policies for compliance
└── README.md
14.3 Module Design Principles
- One module per concern: VPC, compute, database, observability are separate modules
- Inputs validated: All module inputs MUST have type constraints and validation rules
- Outputs explicit: Modules MUST export IDs, ARNs, endpoints needed by dependent modules
- No hardcoded values: All environment-specific values passed as variables
- Tagging enforced: Every resource MUST be tagged (see Cost Management section)
- Documentation: Every module MUST have a README with inputs, outputs, and examples
14.4 State Management
| Requirement |
Standard |
| Remote state |
S3 bucket (encrypted, versioned) + DynamoDB table (locking) |
| State per environment |
Separate state file per environment (never shared) |
| State locking |
Mandatory — prevent concurrent modifications |
| State encryption |
AES-256 encryption at rest |
| State access |
Restricted to CI/CD pipeline service account and designated operators |
| State backup |
S3 versioning provides history; cross-region replication for DR |
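Assuming Terraform or OpenTofu is selected (the decision is TBC), the requirements above map onto a backend block like this; the bucket, key, and table names are illustrative.

```hcl
# One backend configuration per environment; state is never shared.
terraform {
  backend "s3" {
    bucket         = "simpaisa-iac-state"      # encrypted, versioned bucket
    key            = "prod/terraform.tfstate"  # separate key per environment
    region         = "ap-south-1"
    encrypt        = true                      # AES-256 at rest
    dynamodb_table = "simpaisa-iac-locks"      # mandatory state locking
  }
}
```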
14.5 Drift Detection
- Drift detection MUST run daily on all environments
- Drift detection MUST run before every deployment
- Any detected drift MUST be reported as a P2 alert
- Drift MUST be resolved before the next planned deployment
- Unplanned manual changes to infrastructure are prohibited
15. CI/CD Pipeline Standards
Jenkins will NOT be used. CI/CD tool is TBC. The standards below are tool-agnostic.
15.1 Tool Selection
| Tool |
Pros |
Cons |
Status |
| Bitbucket Pipelines |
Native Bitbucket integration, simple YAML config |
Limited compute, caching limitations |
Evaluate (Simpaisa uses Bitbucket) |
| Dagger |
Containerised pipelines, language-native (Go SDK), portable |
Newer, smaller community |
Evaluate (strong fit with Go + AI SDLC) |
| Buildkite |
Fast, self-hosted agents, YAML config, scalable |
Requires agent infrastructure |
Evaluate |
| Woodpecker CI |
Open source, Drone-compatible, container-native |
Smaller community |
Evaluate |
15.2 Pipeline Stages
┌──────┐   ┌──────┐   ┌───────┐   ┌───────────────┐   ┌────────┐   ┌────────┐
│ Lint │ → │ Test │ → │ Build │ → │ Security Scan │ → │ Deploy │ → │ Verify │
└──────┘   └──────┘   └───────┘   └───────────────┘   └────────┘   └────────┘
| Stage |
Activities |
Failure Action |
| Lint |
Code formatting, linting, static analysis |
Block — fix before proceeding |
| Test |
Unit tests, integration tests (with coverage) |
Block — tests must pass |
| Build |
Compile, build container image, generate artefacts |
Block — build must succeed |
| Security Scan |
Dependency vulnerability scan, SAST, secret scanning, container scan |
Block if critical/high findings |
| Deploy |
Deploy to target environment (blue/green or canary) |
Automatic rollback on failure |
| Verify |
Smoke tests, health checks, synthetic transactions |
Automatic rollback if verification fails |
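Assuming Bitbucket Pipelines were selected (the tool is TBC), the stages might be sketched as follows; the commands and script names such as deploy.sh and smoke-tests.sh are illustrative, not existing repository files.

```yaml
# Sketch only: step names mirror the stages above.
pipelines:
  branches:
    main:
      - step:
          name: Lint
          script:
            - golangci-lint run ./...
      - step:
          name: Test
          script:
            - go test -race -coverprofile=coverage.out ./...
      - step:
          name: Build
          script:
            - docker build -t "$SERVICE:$BITBUCKET_COMMIT" .
      - step:
          name: Security Scan
          script:
            - trivy image --exit-code 1 --severity CRITICAL,HIGH "$SERVICE:$BITBUCKET_COMMIT"
      - step:
          name: Deploy
          deployment: production
          script:
            - ./deploy.sh canary "$SERVICE:$BITBUCKET_COMMIT"
      - step:
          name: Verify
          script:
            - ./smoke-tests.sh && ./health-check.sh
```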
15.3 Quality Gates
| Gate |
Requirement |
Blocks Deployment? |
| Code coverage |
Minimum 80% for new code, 60% overall |
Yes |
| Security vulnerabilities |
Zero critical, zero high (for Prod) |
Yes (Prod), Warning (Dev/Test) |
| Secret scanning |
No secrets detected in code or config |
Yes (all environments) |
| Dependency vulnerabilities |
No known critical CVEs in dependencies |
Yes (Prod) |
| Container scan |
No critical vulnerabilities in container image |
Yes (Prod) |
| Performance regression |
No P95 latency regression > 10% (payment paths) |
Yes (Prod) |
| API contract |
OpenAPI spec validation passes |
Yes (all environments) |
15.4 Artefact Management
| Artefact |
Storage |
Retention |
Naming |
| Container images |
Container registry (evaluate: ECR, Cloudflare Container Registry, or self-hosted) |
90 days for non-production tags, indefinite for production tags |
<service>:<git-sha>-<build-number> |
| Go binaries |
R2/S3 artefact bucket |
90 days |
<service>-<version>-<os>-<arch> |
| IaC plans |
R2/S3 artefact bucket |
365 days |
<environment>-<timestamp>-<git-sha>.plan |
| Test reports |
R2/S3 artefact bucket |
365 days |
<service>-<timestamp>-test-report.xml |
15.5 Deployment Automation
| Requirement |
Standard |
| No manual deployments |
All deployments MUST go through the CI/CD pipeline |
| Reproducible |
Same artefact deployed to all environments (configuration differs, not code) |
| Auditable |
Every deployment logged: who triggered, what version, when, which environment |
| Rollback |
One-click rollback to previous version (< 5 minutes) |
| Deployment windows |
Prod deployments during business hours (UTC+5) unless emergency |
| Feature flags |
Use PostHog feature flags for gradual rollout, not deployment gating |
16. Cost Management
16.1 Tagging Strategy
All AWS and Cloudflare resources MUST have the following tags:
| Tag Key |
Description |
Example Values |
Required |
| Environment |
Deployment environment |
sandbox, dev, test, prod |
Yes |
| Service |
Service name |
pay-in-service, krakend, grafana |
Yes |
| Product |
Product line |
pay-ins, pay-outs, remittances, cards, platform |
Yes |
| Owner |
Team or individual responsible |
engineering, platform, security |
Yes |
| CostCentre |
Financial cost centre |
TECH-001, SEC-001 |
Yes |
| ManagedBy |
IaC tool or manual |
terraform, pulumi, manual |
Yes |
| Criticality |
Service tier |
tier-1, tier-2, tier-3, tier-4 |
Yes |
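If Terraform is chosen, the mandatory tags can be enforced centrally via the AWS provider's default_tags block rather than repeated on every resource; the values shown are examples.

```hcl
# Every resource created by this provider inherits these tags.
provider "aws" {
  default_tags {
    tags = {
      Environment = "prod"
      Service     = "pay-in-service"
      Product     = "pay-ins"
      Owner       = "engineering"
      CostCentre  = "TECH-001"
      ManagedBy   = "terraform"
      Criticality = "tier-1"
    }
  }
}
```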
16.2 Budget Alerts
| Alert Level |
Threshold |
Notification |
Action |
| Info |
50% of monthly budget |
Email to engineering lead |
Review spending trend |
| Warning |
75% of monthly budget |
Slack notification to engineering |
Investigate and optimise |
| Critical |
90% of monthly budget |
SMS to CDO + engineering lead |
Immediate cost review |
| Breach |
100% of monthly budget |
Call to CDO |
Emergency cost reduction |
16.3 Reserved Capacity Planning
| Resource |
Strategy |
Review Cadence |
| EC2 instances |
Reserved Instances (1-year) for baseline, On-Demand for burst |
Quarterly |
| RDS |
Reserved Instances (1-year) for all Prod databases |
Annually |
| ElastiCache |
Reserved Nodes for Prod |
Annually |
| Cloudflare |
Enterprise plan (annual commitment) |
Annually |
| Data transfer |
Cloudflare reduces AWS egress; monitor and optimise |
Monthly |
16.4 Cost Optimisation Reviews
| Review |
Frequency |
Owner |
Focus |
| Resource utilisation |
Monthly |
Engineering |
Right-sizing instances, identifying idle resources |
| Data transfer costs |
Monthly |
Engineering |
Optimise cross-AZ and internet egress traffic |
| Reserved Instance coverage |
Quarterly |
CDO + Engineering |
Ensure RI coverage matches usage |
| Architecture cost review |
Quarterly |
CDO |
Evaluate architectural changes for cost impact |
| Vendor negotiation |
Annually |
CDO |
AWS, Cloudflare, ControlPlane.com contract review |
17. Migration Roadmap
Phase Overview
- Phase 1: Observability (Q2 2026)
- Phase 2: Edge (Q2-Q3 2026)
- Phase 3: API Gateway (Q3 2026)
- Phase 4: Identity (Q3-Q4 2026)
- Phase 5: Compute (Q4 2026-Q1 2027)
- Phase 6: Data (2027)
Phase 1: Observability (Replace CloudWatch)
Timeline: Q2 2026
Priority: CRITICAL — prerequisite for all other phases
| Task | Description | Dependencies | Effort |
| --- | --- | --- | --- |
| Deploy OpenTelemetry Collector | Central telemetry pipeline (gateway mode) | None | 1 week |
| Deploy Grafana | Dashboards and alerting | None | 1 week |
| Deploy Prometheus | Metrics storage | Grafana | 1 week |
| Deploy Jaeger (or Tempo) | Distributed tracing | OpenSearch (for storage) | 1 week |
| Deploy OpenSearch | Log aggregation and trace storage | None | 2 weeks |
| Instrument existing services | Add OTel SDK to Spring Boot services | OTel Collector | 2-3 weeks |
| Build dashboards | Per-product, per-channel, infrastructure, SLA | Grafana + data flowing | 2 weeks |
| Configure alerting | Alert rules for all P1/P2 scenarios | Grafana | 1 week |
| Decommission CloudWatch dependency | Remove CloudWatch alarms, switch to Grafana | All above complete | 1 week |
| Deploy PostHog | Product analytics | None | 1 week |
Success Criteria:
- All services emit traces, metrics, and structured logs via OTel
- End-to-end transaction tracing works for all products
- Grafana dashboards operational for all products
- Alerting functional with correct escalation paths
- CloudWatch no longer primary monitoring tool
Phase 2: Edge (Cloudflare)
Timeline: Q2-Q3 2026
Priority: HIGH
| Task | Description | Dependencies | Effort |
| --- | --- | --- | --- |
| Migrate DNS to Cloudflare | Authoritative DNS for all domains | None | 1 week |
| Enable Cloudflare CDN | Cache static assets, configure cache rules | DNS migration | 1 week |
| Configure Cloudflare WAF | Payment API protection rules | DNS migration | 1 week |
| Deploy Cloudflare Workers | Geo-routing, rate limiting, header injection | DNS migration | 2 weeks |
| Migrate static sites to Pages | Corporate site, developer portal | DNS migration | 2 weeks |
| Configure R2 buckets | Merchant reports, transaction receipts | None | 1 week |
| Implement Authenticated Origin Pulls | Secure Cloudflare-to-ALB connection | CDN enabled | 1 week |
| Configure bot management | Bot detection and challenge rules | WAF configured | 1 week |
Success Criteria:
- All traffic routes through Cloudflare
- WAF blocking malicious traffic
- Static sites served from Cloudflare Pages
- Origin servers only accessible from Cloudflare IPs
- DDoS protection active
Phase 3: API Gateway (KrakenD)
Timeline: Q3 2026
Priority: CRITICAL
| Task | Description | Dependencies | Effort |
| --- | --- | --- | --- |
| Deploy KrakenD to Test | Initial deployment with basic configuration | Phase 1 (observability) | 1 week |
| Define API specifications | OpenAPI 3.1 specs for all endpoints | None | 2 weeks |
| Configure auth verification | JWT validation, API key verification | None | 1 week |
| Configure rate limiting | Per-merchant, per-product, per-endpoint limits | None | 1 week |
| Configure error standardisation | RFC 9457 error responses | None | 1 week |
| Deploy to Sandbox | Merchant-facing test environment | Test deployment stable | 1 week |
| Merchant migration (phased) | Migrate merchants to gateway-fronted endpoints | Sandbox proven | 4-6 weeks |
| Deploy to Prod | Production deployment with blue/green | Merchant migration tested | 1 week |
Success Criteria:
- All API traffic routes through KrakenD
- Rate limiting enforced per merchant
- Auth verification at gateway level
- Standardised error responses
- OpenAPI validation rejecting malformed requests
Phase 4: Identity (ControlPlane.com)
Timeline: Q3-Q4 2026
Priority: HIGH
| Task |
Description |
Dependencies |
Effort |
| ControlPlane.com setup |
Account, organisation, initial configuration |
None |
1 week |
| Workload identity |
Migrate service-to-service auth from IAM to ControlPlane |
Phase 3 (KrakenD) |
2-3 weeks |
| Merchant identity |
Design merchant RBAC model |
None |
1 week |
| KrakenD integration |
JWT issuance and validation via ControlPlane |
Phase 3 + workload identity |
2 weeks |
| SSO for internal tools |
Grafana, OpenSearch, merchant portal via SSO |
ControlPlane setup |
2 weeks |
| Policy-as-code |
Define and test access policies |
All above |
2 weeks |
Phase 5: Compute Modernisation
Timeline: Q4 2026 - Q1 2027
Priority: MEDIUM
| Task | Description | Dependencies | Effort |
| --- | --- | --- | --- |
| Container platform selection | Evaluate ECS Fargate vs EKS vs ControlPlane.com | Phase 4 | 1-2 weeks |
| Deploy Caddy | Per-service reverse proxy with mTLS | Container platform | 2 weeks |
| First Go service | New service built in Go, deployed as container | Container platform | 4-6 weeks |
| Blue/green deployment | Implement for Tier 1 services | Container platform | 2 weeks |
| Canary deployment | Implement for API Gateway and payment initiation | Blue/green working | 2 weeks |
| Unikraft evaluation | Assess Unikraft for security-critical payment processing | Go service proven | 4 weeks |
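The canary task implies a stable traffic split: the same caller should consistently land on the same side, so a bad canary affects a bounded, identifiable slice of merchants. A minimal Go sketch of deterministic cohort assignment — the FNV hash and the per-merchant keying are illustrative assumptions, not the chosen mechanism:

```go
package main

import "hash/fnv"

// InCanary deterministically assigns a merchant to the canary cohort.
// percent is 0-100; the same merchantID always yields the same answer,
// so rollback and blast-radius analysis stay tractable.
func InCanary(merchantID string, percent uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(merchantID))
	return h.Sum32()%100 < percent
}
```

In practice the split would live in the load balancer or gateway config rather than service code; the property worth preserving is the determinism, not the implementation.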
Phase 6: Data Infrastructure
Timeline: 2027
Priority: MEDIUM
| Task |
Description |
Dependencies |
Effort |
| RDS split |
Separate shared RDS into per-service instances |
None (can start earlier) |
4-6 weeks |
| SurrealDB pilot |
Deploy SurrealDB for first new Go service |
Phase 5 (Go service) |
2-3 weeks |
| NSQ deployment |
Replace Kafka with NSQ for inter-service messaging |
None |
3-4 weeks |
| Meilisearch deployment |
Merchant-facing search in portal |
None |
2 weeks |
| Redis cluster mode |
Enable cluster mode, per-service namespacing |
None (can start earlier) |
1-2 weeks |
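Per-service Redis namespacing is mostly a key-naming discipline enforced in code. A small Go helper, assuming a hypothetical `service:v1:entity:id` convention (the layout is an illustration, not an existing Simpaisa standard):

```go
package main

import (
	"fmt"
	"strings"
)

// NSKey builds a namespaced Redis key such as "payins:v1:session:abc".
// Rejecting embedded colons keeps the segments unambiguous when keys
// are parsed back or scanned per service.
func NSKey(service, entity, id string) (string, error) {
	for _, part := range []string{service, entity, id} {
		if part == "" || strings.Contains(part, ":") {
			return "", fmt.Errorf("invalid key part %q", part)
		}
	}
	return fmt.Sprintf("%s:v1:%s:%s", service, entity, id), nil
}
```

Under cluster mode, related keys that must share a hash slot would additionally use Redis hash tags (braces around the common segment).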
Migration Risk Register
| Risk | Impact | Likelihood | Mitigation |
| --- | --- | --- | --- |
| Service disruption during KrakenD rollout | HIGH | Medium | Blue/green deployment, gradual merchant migration, instant rollback |
| Cloudflare outage impacts all services | HIGH | Low | Document emergency bypass procedure; monitor Cloudflare status |
| Data loss during RDS split | CRITICAL | Low | Extensive testing in Test environment; point-in-time recovery enabled; rollback plan |
| ControlPlane.com integration delays | MEDIUM | Medium | Keep existing auth as fallback; phased migration |
| Compliance issues with new infrastructure | HIGH | Medium | Engage regulators early; legal review of each technology change |
| Team skill gap (Go, new tooling) | MEDIUM | High | Training programme; gradual adoption; AI SDLC augmentation |
18. Appendix: Infrastructure Controls Checklist
Use this checklist for infrastructure reviews and compliance audits.
A. Network Security
| # | Control | Required By | Status |
| --- | --- | --- | --- |
| N-01 | All public endpoints behind Cloudflare (no direct origin access) | Security standard | ☐ |
| N-02 | ALB accepts traffic only from Cloudflare IP ranges | Security standard | ☐ |
| N-03 | Security groups follow least-privilege (no 0.0.0.0/0 inbound) | PCI DSS, all regulators | ☐ |
| N-04 | NACLs configured as defence in depth | Security standard | ☐ |
| N-05 | VPC flow logs enabled and exported to OpenSearch | PCI DSS, audit requirement | ☐ |
| N-06 | No public IP addresses on application or database instances | Security standard | ☐ |
| N-07 | CDE network segment isolated (Cards product) | PCI DSS 4.0 | ☐ |
| N-08 | DDoS protection active (Cloudflare) | All regulators | ☐ |
| N-09 | WAF rules configured for payment API protection | PCI DSS, security standard | ☐ |
| N-10 | DNSSEC enabled on all zones | Security standard | ☐ |
B. Encryption
| # | Control | Required By | Status |
| --- | --- | --- | --- |
| E-01 | TLS 1.2+ on all external connections | PCI DSS 4.0, all regulators | ☐ |
| E-02 | TLS 1.3 preferred where supported | Security standard | ☐ |
| E-03 | mTLS for all service-to-service communication | Security standard | ☐ |
| E-04 | Database encryption at rest (AES-256) | PCI DSS, all regulators | ☐ |
| E-05 | S3/R2 bucket encryption enabled | Security standard | ☐ |
| E-06 | PAN encrypted at application level (not just disk-level encryption) | PCI DSS 4.0 | ☐ |
| E-07 | Encryption keys managed in KMS (separate from data) | PCI DSS 4.0 | ☐ |
| E-08 | Certificate auto-renewal configured | Operational | ☐ |
| E-09 | No SSL or early TLS anywhere | PCI DSS 4.0 | ☐ |
C. Access Control
| # | Control | Required By | Status |
| --- | --- | --- | --- |
| A-01 | MFA enabled for all CDE access | PCI DSS 4.0 | ☐ |
| A-02 | MFA enabled for all infrastructure access | Security standard, all regulators | ☐ |
| A-03 | No shared accounts or credentials | PCI DSS, security standard | ☐ |
| A-04 | Service accounts use workload identity (no static credentials) | Security standard | ☐ |
| A-05 | Quarterly access review completed | PCI DSS, all regulators | ☐ |
| A-06 | Privileged access logged and alerted | PCI DSS, all regulators | ☐ |
| A-07 | Break-glass procedure documented and tested | Operational | ☐ |
| A-08 | Terminated employee access revoked within 24 hours | PCI DSS, all regulators | ☐ |
D. Logging and Monitoring
| # | Control | Required By | Status |
| --- | --- | --- | --- |
| L-01 | Structured JSON logging on all services | Observability standard | ☐ |
| L-02 | Trace ID propagated end-to-end | Observability standard | ☐ |
| L-03 | CDE access logs tamper-evident | PCI DSS 4.0 | ☐ |
| L-04 | Log retention meets jurisdictional requirements (up to 7 years) | PK, BD, NP, IQ, EG regulators | ☐ |
| L-05 | Automated log review for security events | PCI DSS 4.0 | ☐ |
| L-06 | Alerting configured for all P1/P2 scenarios | Operational | ☐ |
| L-07 | Dashboards operational for all products | Operational | ☐ |
| L-08 | No sensitive data in logs (PAN, CVV, PIN, full CNIC) | PCI DSS, PDPA | ☐ |
| L-09 | Audit trail for all infrastructure changes | All regulators | ☐ |
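Control L-08 implies redaction before a line ever reaches the log pipeline. A hedged Go sketch of PAN masking — the regex is deliberately broad (any 13-19 digit run); a production redactor would also handle separators and Luhn-check candidates to cut false positives:

```go
package main

import "regexp"

// panPattern matches bare 13-19 digit runs, the typical PAN length range.
var panPattern = regexp.MustCompile(`\b\d{13,19}\b`)

// MaskPAN redacts probable card numbers before log emission,
// keeping only the last four digits for support correlation.
func MaskPAN(s string) string {
	return panPattern.ReplaceAllStringFunc(s, func(m string) string {
		masked := make([]byte, len(m))
		for i := range masked {
			masked[i] = '*'
		}
		copy(masked[len(m)-4:], m[len(m)-4:])
		return string(masked)
	})
}
```

Redaction belongs in the logging layer itself (e.g. a slog handler wrapper), not in each call site, so a missed call cannot leak a PAN.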
E. Backup and Recovery
| # | Control | Required By | Status |
| --- | --- | --- | --- |
| R-01 | Automated daily backups for all databases | All regulators | ☐ |
| R-02 | Backup restore tested monthly | DR standard | ☐ |
| R-03 | RPO/RTO defined per service tier | DR standard | ☐ |
| R-04 | DR runbooks documented | All regulators, DR standard | ☐ |
| R-05 | DR exercise conducted twice per year | DR standard | ☐ |
| R-06 | Backups encrypted | PCI DSS, security standard | ☐ |
| R-07 | Backups stored in a different location from primary | DR standard | ☐ |
F. Compliance
| # | Control | Required By | Status |
| --- | --- | --- | --- |
| C-01 | Data localisation requirements met per jurisdiction | PK, BD, NP | ☐ |
| C-02 | Incident reporting capability (2-hour internal SLA) | All | ☐ |
| C-03 | Transaction record retention (minimum 5 years) | PK, IQ, PCI DSS | ☐ |
| C-04 | PCI DSS v4.0.1 assessment current | PCI DSS | ☐ |
| C-05 | AML/CFT transaction monitoring operational | All jurisdictions | ☐ |
| C-06 | Sanctions screening integrated | All jurisdictions (especially IQ) | ☐ |
| C-07 | Regulatory technology change approvals obtained | PK (SBP), BD, NP | ☐ |
| C-08 | ISO 27001 certification (required for Nepal) | NP (NRB) | ☐ |
| C-09 | Annual PCI DSS assessment scheduled | PCI DSS | ☐ |
| C-10 | Quarterly vulnerability scanning programme | PCI DSS 4.0 | ☐ |
Document Control
| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 1.0.0 | 2026-04-03 | CDO (AI SDLC) | Initial version — AI SDLC prototype and showcase |
Review Schedule: Quarterly (next review: Q3 2026)
Distribution: Architecture & Engineering Leadership
This document was generated as part of the Simpaisa AI SDLC prototype. All compliance information should be verified with legal counsel and regulatory advisors in each jurisdiction.