SDLC Governance Framework

| Field | Value |
| --- | --- |
| Document Number | SP-SOP-SDLC-0704 |
| Version | 2.0 |
| Date | 7 April 2026 |
| Owner | Daniel O'Reilly, Chief Digital Officer |
| Reviewed By | CDO |
| Classification | Class 2 (Confidential) |
| Status | Approved |

Table of Contents

  1. Executive Summary
  2. Objectives
  3. Scope
  4. Governance Principles
  5. Roles and Responsibilities
  6. Delivery Model
  7. Work Hierarchy
  8. SDLC Lifecycle Phases
  9. Hotfix Process
  10. Release Governance
  11. Jira Workflow
  12. Definition of Done
  13. Testing Strategy
  14. Compliance and Audit
  15. Architecture Decision Records
  16. Exception Handling
  17. Metrics
  18. Continuous Improvement
  19. Supersedes
  20. Approval and Adoption

1. Executive Summary

This document defines the Software Development Lifecycle (SDLC) governance framework for Simpaisa Holdings. It replaces the previous SDLC SOP (SP-SOP-SDLC-2603 v1.0) to align with the CDO Division's new organisational model - seven tribes operating under a Kanban-based continuous delivery model with automated quality gates, engineer-owned testing, and daily deployment capability.

The previous SOP was built around two-week sprints, a separate QA team, manual release approvals through a Change Advisory Board (CAB), and waterfall-style Jira gates. None of these reflect how a modern, AI-native payments engineering organisation should operate. This framework codifies the target state: continuous flow, shift-left security, automated pipelines, and measurable engineering performance through DORA metrics.

This framework applies to all software development, integration, and deployment activities across all Simpaisa products and all seven CDO Division tribes.


2. Objectives

  1. Ship daily. Enable every tribe to deploy to production at least once per business day through automated pipelines, removing manual approval bottlenecks.

  2. Engineers own quality. Eliminate the handoff to a separate QA function. Solution Engineers write code, write tests, and are accountable for the quality of what they ship.

  3. Shift security left. Threat modelling and security review happen before development begins, not as a gate after code is written.

  4. Measure what matters. Track DORA metrics (deployment frequency, lead time for changes, mean time to recovery, change failure rate) as first-class engineering KPIs, reported to the CDO and the board.

  5. No projects. Work is organised as a continuous flow of Initiatives, Epics, and Stories through a Kanban system. There are no projects, no project managers in the delivery path, and no sprint commitments.

  6. Regulatory traceability. Maintain a complete, automated audit trail from Jira ticket to pull request to pipeline execution to production deployment, satisfying PCI-DSS and DFSA operational resilience requirements.

  7. AI-native by default. AI tools (code generation, code review assistance, automated test generation, documentation) are standard engineering tools, not exceptions requiring approval.


3. Scope

3.1 Products

This framework governs the SDLC for all Simpaisa software products:

| Product | Description |
| --- | --- |
| Pay-Ins (Collections) | Inbound payment collection flows and partner integrations |
| Pay-Outs (Disbursements) | Outbound payment processing across all corridors |
| Remittances | Cross-border remittance corridor services |
| Portals | Customer-facing and partner-facing portal applications |
| Cards | Card issuance and payment card programme services |

3.2 Tribes

All seven CDO Division tribes operate under this framework:

| Tribe | Head | SDLC Role |
| --- | --- | --- |
| Solution Engineering | CTO | Builds and ships product features. Owns code, tests, and delivery. |
| Platform Engineering | Muhammad Mohsin | Builds and maintains infrastructure, CI/CD pipelines, databases, and developer tooling. |
| Data Engineering | Head of Data (hire) | Builds data pipelines, analytics infrastructure, and reporting systems. |
| Production Engineering | Muhammad Owais Khalid | Owns production reliability, monitoring, incident response, and on-call. |
| Information Security | Danish Hamid (CISO) | Owns threat modelling, security review, penetration testing, and compliance evidence. |
| Product Management | Rizwan Zafar (CPO) | Owns discovery, requirements, acceptance criteria, and prioritisation. |
| Digital Office | CDO | Strategic execution, architecture governance, process improvement, and AI adoption. |

3.3 Markets

This framework applies uniformly across all Simpaisa markets: Pakistan, Bangladesh, Nepal, Iraq, UAE, and planned expansions (Saudi Arabia, broader MENA, Central Asia). Market-specific regulatory requirements are handled through configuration, feature flags, and corridor-specific acceptance criteria - not through separate SDLCs.

3.4 Tooling

| Tool | Purpose |
| --- | --- |
| Jira | Work tracking, Kanban boards, reporting |
| Bitbucket | Source control, pull requests, code review |
| Bitbucket Pipelines / Jenkins | CI/CD pipeline execution |
| Snyk | Dependency vulnerability scanning, container scanning |
| Confluence | Architecture decision records, runbooks, documentation |

4. Governance Principles

4.1 No-Project Model

Simpaisa does not run projects. There are no project charters, no project managers in the delivery path, no project timelines with Gantt charts. Work flows continuously through a Kanban system. The unit of delivery is the Story. The unit of strategic alignment is the Initiative.

The Programme Management Office (PMO) function transitions to portfolio-level coordination, dependency management across tribes, and reporting - not day-to-day delivery management.

4.2 AI-Native Engineering

AI tools are standard engineering equipment, not exceptions. Engineers are expected to use AI-assisted code generation, AI-assisted code review, AI-assisted test writing, and AI-assisted documentation as part of their normal workflow. The Digital Office tracks AI adoption metrics and identifies opportunities to embed AI tooling deeper into the SDLC.

AI-generated code is subject to the same quality gates as human-written code: code review, automated testing, security scanning, and PR approval.

4.3 Automation Over Approval

Manual approval gates are the enemy of deployment frequency. This framework replaces manual approvals with automated quality gates wherever possible. The CI/CD pipeline is the authority on whether code is fit to ship. If the pipeline passes, the code can deploy. Human judgement is reserved for architecture decisions, security threat models, and exception handling - not for routine releases.

4.4 Ownership Over Handoff

Each tribe owns its output end-to-end. Solution Engineers own code quality, test coverage, and deployment. Production Engineering owns production reliability. Information Security owns threat models and security assurance. There is no "throw it over the wall" handoff between development and QA, between QA and operations, or between operations and security.

4.5 Transparency by Default

All work is visible in Jira. All code is visible in Bitbucket. All pipeline results are visible in the CI/CD system. All architecture decisions are recorded in Confluence. There are no shadow boards, no offline tracking spreadsheets, and no verbal-only decisions about technical direction.


5. Roles and Responsibilities

5.1 Solution Engineering

Solution Engineering is the primary delivery tribe. Solution Engineers (the engineers formerly organised as Portal, Pay-Ins, and Pay-Outs development teams) are responsible for:

  • Writing production code for all Simpaisa products
  • Writing unit tests, integration tests, and E2E tests for their own code
  • Creating and maintaining CI/CD pipeline configurations for their services
  • Conducting code reviews (every PR requires two approvals from Solution Engineers)
  • Owning the quality of their output - there is no separate QA sign-off
  • Participating in architecture reviews for significant changes
  • Writing and maintaining technical documentation for their services
  • Using AI tools to accelerate development and improve code quality

Tribe Lead accountability: The CTO is accountable for Solution Engineering throughput, code quality, and deployment frequency.

5.2 Platform Engineering

Platform Engineering (formerly DevOps and Infrastructure, plus Database Administration) is responsible for:

  • Building and maintaining CI/CD pipelines (Jenkins, Bitbucket Pipelines)
  • Managing cloud infrastructure (AWS, Cloudflare)
  • Database administration, performance tuning, and backup/recovery
  • Developer tooling and local development environments
  • Container orchestration and deployment automation
  • Infrastructure-as-code management
  • Automated rollback mechanisms
  • Feature flag platform administration

Tribe Lead accountability: Muhammad Mohsin is accountable for pipeline reliability, infrastructure availability, and deployment tooling.

5.3 Data Engineering

Data Engineering is responsible for:

  • Building and maintaining data pipelines
  • Analytics infrastructure and reporting systems
  • Data quality monitoring and governance
  • Business intelligence data models
  • Regulatory reporting data (transaction reporting, AML data feeds)
  • Data residency compliance per market

Tribe Lead accountability: Head of Data Engineering (hire in progress) is accountable for data pipeline reliability and data quality.

5.4 Production Engineering

Production Engineering (formerly SQA and Service Delivery, reoriented) is responsible for:

  • Production monitoring, alerting, and observability
  • Incident response and on-call rotation
  • SLO definition, measurement, and error budget management
  • Post-incident reviews
  • Service delivery and production handover
  • Canary deployment verification
  • Production environment management
  • Runbook creation and maintenance

Note on transition: The former SQA team members transition into Production Engineering with a focus on production reliability, automated test infrastructure, and monitoring - not manual QA. Engineers who previously performed manual testing will be retrained for production operations, automated test framework development, or redeployed based on skills assessment.

Tribe Lead accountability: Muhammad Owais Khalid is accountable for production uptime, MTTR, and incident response effectiveness.

5.5 Information Security

Information Security is responsible for:

  • Threat modelling for new features and services (before development begins)
  • Security architecture review for significant changes
  • Penetration testing (scheduled and ad hoc)
  • Snyk vulnerability management and triage
  • PCI-DSS compliance evidence and audit support
  • DFSA operational resilience evidence
  • SOC operations and security monitoring
  • Cloud security posture management
  • Security incident response

Tribe Lead accountability: Danish Hamid (CISO) is accountable for security posture, vulnerability remediation SLAs, and compliance evidence.

5.6 Product Management

Product Management is responsible for:

  • Discovery and validation (working with the Digital Office on spikes)
  • Writing Stories with clear acceptance criteria
  • Prioritising the backlog for each product area
  • Defining acceptance criteria that are testable and unambiguous
  • Stakeholder communication and expectation management
  • Corridor-specific requirements (regulatory, market, partner)
  • Product metrics and outcome measurement

Tribe Lead accountability: Rizwan Zafar (CPO) is accountable for product-market fit, roadmap alignment, and backlog quality.

5.7 Digital Office

The Digital Office is the CDO's strategic execution arm, responsible for:

  • Architecture governance (Architecture Review Board secretariat)
  • SDLC process ownership and continuous improvement
  • AI adoption strategy and tooling evaluation
  • Engineering metrics collection, analysis, and reporting
  • Cross-tribe coordination and dependency management
  • Spike facilitation (architecture blueprints for new initiatives)
  • Ways of Work documentation and training
  • Agile coaching and delivery model maturity

Team Lead accountability: The CDO directly oversees the Digital Office. The Agile Coach (Wajih Aslam) leads day-to-day delivery model support.

5.8 Architecture (Cross-Cutting)

The Principal Architect (Maqsood Ali) and Application Architect (Laique Ali) operate as a cross-cutting function within the Digital Office's governance remit:

  • Architecture Review Board (ARB) membership and facilitation
  • Architecture Decision Record (ADR) review and approval
  • System design review for significant changes
  • Technology standards and patterns governance
  • Integration architecture across products and corridors

6. Delivery Model

6.1 Kanban - Not Sprints

Simpaisa operates a Kanban continuous flow delivery model. There are no sprints. There are no sprint commitments. There is no sprint planning ceremony.

Work flows continuously from Backlog through to Done. The system is governed by WIP (Work in Progress) limits, not by time-boxed iterations.

6.2 Ten-Day Cycles

While delivery is continuous, Simpaisa uses a 10-business-day cadence for planning, reflection, and alignment:

| Days | Activity |
| --- | --- |
| Days 1–9 | Continuous delivery. Engineers pull work from the Ready column, develop, review, and deploy. |
| Day 10 | Cycle ceremonies: Retrospective, Demo, and Backlog Grooming. |

Day 10 ceremonies:

  • Retrospective (45 minutes per tribe): What went well, what didn't, what to change. Each tribe runs its own retro. Actions are captured as Stories and enter the backlog.
  • Demo (30 minutes, cross-tribe): Each tribe demonstrates what shipped in the cycle. Stakeholders attend. This is the primary visibility mechanism for Product Management and leadership.
  • Backlog Grooming (60 minutes per tribe): Product Manager and tribe leads review upcoming work, refine Stories, confirm acceptance criteria, and ensure the Ready column is stocked for the next cycle.

6.3 No Daily Standups

There are no daily standup meetings. Status updates are asynchronous:

  • Engineers update their Jira tickets daily (move cards, add comments on blockers)
  • Tribe leads review their Kanban board daily and address blockers
  • The Digital Office publishes a weekly cycle report (automated from Jira data) showing throughput, WIP age, and blockers

If a tribe identifies a need for synchronous coordination, it may hold ad hoc huddles - but these are not mandated, not daily, and not a standing ceremony.

6.4 WIP Limits

Each Kanban column has a WIP limit. These are calibrated per tribe based on team size:

| Column | WIP Limit (per tribe) |
| --- | --- |
| Ready | 2× team size |
| In Progress | 1× team size |
| In Review | 0.5× team size (rounded up) |

When a WIP limit is breached, the tribe must resolve the bottleneck before pulling new work. WIP limit violations are tracked as a metric and reviewed in retrospectives.
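
The calibration table above reduces to a small helper. The following is an illustrative sketch only (function names are ours, not part of any Simpaisa tooling):

```python
import math

def wip_limits(team_size: int) -> dict:
    """Per-column WIP limits from the calibration table:
    Ready = 2x team size, In Progress = 1x team size,
    In Review = 0.5x team size rounded up."""
    return {
        "Ready": 2 * team_size,
        "In Progress": team_size,
        "In Review": math.ceil(team_size / 2),
    }

def wip_breaches(column_counts: dict, team_size: int) -> list:
    """Return the columns whose current card count exceeds the limit."""
    limits = wip_limits(team_size)
    return [col for col, count in column_counts.items() if count > limits[col]]
```

For a tribe of five engineers this gives limits of 10 / 5 / 3, so a board with 11 cards in Ready would flag that column as the bottleneck to resolve before pulling new work.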

6.5 Continuous Flow Principles

  • Pull, don't push. Engineers pull work when they have capacity. Work is not assigned by managers.
  • Finish before starting. Completing in-progress work takes priority over starting new work.
  • Small batches. Stories must be completable within a single 9-day delivery window. If a Story cannot be completed in 9 days, it must be decomposed.
  • Flow efficiency. The ratio of active work time to total lead time is tracked. Target: >40% flow efficiency.
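
The flow-efficiency target above is a simple ratio check. A minimal sketch (names illustrative):

```python
def flow_efficiency(active_hours: float, total_lead_hours: float) -> float:
    """Ratio of active work time to total lead time for a Story."""
    if total_lead_hours <= 0:
        raise ValueError("lead time must be positive")
    return active_hours / total_lead_hours

def meets_flow_target(active_hours: float, total_lead_hours: float,
                      target: float = 0.40) -> bool:
    """True if the Story exceeded the >40% flow-efficiency target."""
    return flow_efficiency(active_hours, total_lead_hours) > target
```

A Story actively worked for 36 hours out of a 72-hour lead time has 50% flow efficiency and meets the target; one active for 8 of 80 hours (10%) does not.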

7. Work Hierarchy

7.1 Initiative

An Initiative is a strategic goal that spans multiple Epics and may take one or more quarters to complete. Initiatives are owned by the CDO or CPO and align to company-level OKRs.

Examples:

  • "Launch Saudi Arabia Pay-Out corridor"
  • "Achieve PCI-DSS Level 1 certification"
  • "Migrate primary infrastructure to Cloudflare"

Jira type: Initiative
Owner: CDO or CPO
Lifespan: Typically 1–3 quarters

7.2 Epic

An Epic is a significant body of work that delivers a measurable outcome. Epics replace the concept of "projects." An Epic contains multiple Stories and is typically completed within 2–6 cycles (20–60 business days).

Examples:

  • "Integrate Saudi Arabia disbursement partner API"
  • "Implement feature flag infrastructure"
  • "Build transaction monitoring dashboard"

Jira type: Epic
Owner: Product Manager (with a Solution Engineering lead assigned)
Lifespan: Typically 2–6 cycles
Required fields: Business justification, acceptance criteria, target completion cycle, assigned tribe

7.3 Story

A Story is the fundamental unit of delivery. A Story describes a single, testable increment of value. Stories must be small enough to complete within one 9-day delivery window, including code, tests, review, and deployment.

Examples:

  • "As a partner, I can view my settlement report filtered by date range"
  • "Implement Snyk scanning in the Pay-Outs pipeline"
  • "Add canary deployment step to portal release pipeline"

Jira type: Story
Owner: Product Manager (definition) / Solution Engineer (delivery)
Lifespan: Must complete within one 9-day cycle
Required fields: Acceptance criteria, tribe, Story points (optional - used for forecasting, not commitment)

7.4 Task

A Task is a technical sub-unit of a Story. Tasks are used by engineers to decompose a Story into implementation steps. Tasks are optional - not every Story needs Tasks. Tasks do not appear on the Kanban board; only Stories flow through the board.

Examples:

  • "Write database migration for new settlement_reports table"
  • "Add unit tests for date range filter logic"
  • "Update API documentation for new endpoint"

Jira type: Sub-task
Owner: Solution Engineer
Lifespan: Hours to days (within the parent Story's cycle)


8. SDLC Lifecycle Phases

Lifecycle Flow

```mermaid
flowchart TD
    classDef planning fill:#4A90D9,stroke:#2C5F8A,color:#fff
    classDef technical fill:#7B68EE,stroke:#4B3DB5,color:#fff
    classDef delivery fill:#27AE60,stroke:#1A7A42,color:#fff

    START([New Initiative / Epic]) --> P1

    P1["Phase 1\nDiscovery and Spike"] --> P2
    P2["Phase 2\nRequirements and\nAcceptance Criteria"] --> ARCH_Q

    ARCH_Q{Architecture\nchange?}
    ARCH_Q -- Yes --> P3
    ARCH_Q -- No --> P4

    P3["Phase 3\nArchitecture Review"] --> P4
    P4["Phase 4\nSecurity Review\n(shift-left)"] --> P5

    HOTFIX([Hotfix Trigger]) -.->|Bypass Phases 1-4| P5

    P5["Phase 5\nDevelopment"] --> P6
    P6["Phase 6\nQuality Gates"] --> P7
    P7["Phase 7\nDeployment"] --> P8
    P8["Phase 8\nPost-Deploy\nVerification"] --> P9
    P9["Phase 9\nProduction\nOperations"] --> DONE([Live in Production])

    class P1,P2 planning
    class P3,P4,P5,P6 technical
    class P7,P8,P9 delivery
```

Jira Workflow

```mermaid
flowchart LR
    classDef active fill:#7B68EE,stroke:#4B3DB5,color:#fff
    classDef blocked fill:#E74C3C,stroke:#A93226,color:#fff
    classDef done fill:#27AE60,stroke:#1A7A42,color:#fff

    BL[Backlog] --> RD[Ready]
    RD --> IP[In Progress]
    IP --> IR[In Review]
    IR --> DN[Done]

    IP -.->|Impediment raised| BK[Blocked]
    BK -.->|Impediment cleared| IP

    class IP,IR active
    class BK blocked
    class DN done
```

Phase 1: Discovery and Spike

Owner: Digital Office + Product Management

When a new Initiative or significant Epic is proposed, the Digital Office conducts a discovery spike to validate feasibility and produce an architecture blueprint.

Activities:

  • Product Manager defines the business problem and desired outcome
  • Digital Office assesses technical feasibility, integration complexity, and regulatory implications
  • Principal Architect or Application Architect produces an architecture blueprint covering: system context, data flows, integration points, security considerations, and infrastructure requirements
  • AI tools are used to accelerate research and prototype validation
  • Spike output is a written document (Confluence) with a go/no-go recommendation

Output: Architecture blueprint (Confluence page)
Duration: 1–5 business days depending on complexity
Gate: CDO or CTO approves the blueprint before work enters the backlog

Phase 2: Requirements and Acceptance Criteria

Owner: Product Management

Once the architecture blueprint is approved, the Product Manager decomposes the Epic into Stories with clear, testable acceptance criteria.

Activities:

  • Product Manager writes Stories in Jira following the standard template: "As a [user], I [action], so that [outcome]"
  • Each Story includes explicit acceptance criteria written as Given/When/Then statements
  • Corridor-specific requirements (e.g., Bangladesh regulatory reporting, UAE DFSA evidence) are captured as acceptance criteria, not as separate documents
  • Product Manager confirms each Story is achievable within a single 9-day cycle
  • Stories are reviewed with the assigned Solution Engineering lead for feasibility

Output: Jira Stories with acceptance criteria in the Backlog column
Quality bar: No Story enters Ready without acceptance criteria approved by both Product Manager and Solution Engineering lead

Phase 3: Architecture Review

Owner: Architecture (Principal Architect) + Digital Office

Significant changes require Architecture Review Board (ARB) review before development begins. "Significant" is defined as:

  • New service or microservice introduction
  • Changes to data models that affect more than one service
  • New third-party integration or corridor partner
  • Infrastructure architecture changes (e.g., new AWS service, Cloudflare migration)
  • Changes affecting PCI-DSS Cardholder Data Environment (CDE) boundaries

Activities:

  • Solution Engineering lead presents the proposed approach to the ARB
  • ARB reviews against architecture principles, security standards, and scalability requirements
  • If approved, an Architecture Decision Record (ADR) is created documenting the decision
  • If rejected, the ARB provides specific feedback and the team revises

Output: ADR (Confluence page) and ARB approval recorded in Jira Epic
Attendees: Principal Architect, Application Architect, CISO (or delegate), relevant tribe leads
Cadence: ARB meets weekly (Wednesdays). Urgent reviews can be conducted asynchronously via Confluence with a 48-hour review period.

Phase 4: Security Review

Owner: Information Security

Security review is shift-left: it happens before development begins, not after code is written.

Activities:

  • Information Security conducts threat modelling for the proposed change using STRIDE methodology
  • Threat model covers: data flows, trust boundaries, authentication/authorisation, encryption at rest and in transit, PCI-DSS implications, and corridor-specific regulatory requirements
  • Snyk policy is confirmed for any new dependencies
  • If the change touches the CDE, a PCI-DSS impact assessment is produced
  • Security requirements are added as acceptance criteria to the relevant Stories

Output: Threat model document (Confluence), security acceptance criteria added to Stories
Duration: 1–3 business days depending on complexity
Gate: CISO (or delegate) signs off the threat model before Stories move to Ready

Exemptions: Changes that do not introduce new data flows, new integrations, or new infrastructure components may be exempted from formal threat modelling. The CISO maintains a standing exemption list (e.g., UI-only changes with no new API endpoints).
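
The exemption rule can be read as a simple decision function. A hedged sketch: the three trigger conditions come from the paragraph above, and the standing exemption list is modelled as a boolean input rather than the real CISO-maintained register:

```python
def threat_model_required(new_data_flows: bool,
                          new_integrations: bool,
                          new_infrastructure: bool,
                          on_exemption_list: bool = False) -> bool:
    """Formal threat modelling is required unless the change introduces
    no new data flows, integrations, or infrastructure components, or
    appears on the CISO's standing exemption list (e.g. UI-only changes
    with no new API endpoints)."""
    if on_exemption_list:
        return False
    return new_data_flows or new_integrations or new_infrastructure
```

In practice the Jira Ready-column entry criteria would call a check like this before a Story with security-relevant flags can leave Backlog.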

Phase 5: Development

Owner: Solution Engineering

Development is where code is written, tested, and reviewed.

Activities:

  • Solution Engineer pulls a Story from the Ready column and moves it to In Progress
  • Engineer writes production code, unit tests, and integration tests
  • Engineer creates a pull request (PR) in Bitbucket
  • PR description references the Jira Story key (e.g., PAY-1234)
  • PR includes: code changes, unit tests, integration tests (where applicable), and updated documentation
  • Two Solution Engineers (not including the author) review and approve the PR
  • Reviewers verify: code quality, test coverage, adherence to architecture patterns, security considerations, and acceptance criteria coverage
  • AI-assisted code review tools may be used to augment (not replace) human review

Constraints:

  • Every PR requires exactly two approvals before merge
  • PR author cannot approve their own PR
  • PRs should be small and focused - one Story per PR where possible
  • PRs must pass all automated pipeline checks before human review begins (see Phase 6)

Output: Approved, merged PR in Bitbucket
Jira transition: In Progress → In Review (when PR is created) → Done (when PR is merged and pipeline passes)

Phase 6: Automated Quality Gates

Owner: Platform Engineering (pipeline) + Solution Engineering (test content)

Every PR triggers an automated CI pipeline. There is no manual QA sign-off for standard releases. The pipeline is the quality authority.

Pipeline stages:

| Stage | Tool | Criteria |
| --- | --- | --- |
| Unit tests | JUnit / Jest / pytest (per service) | All tests pass. Coverage meets threshold (see Section 13). |
| Integration tests | Service-specific | All API contract tests pass. |
| Static analysis | SonarQube or equivalent | No new Critical or High issues. |
| Security scan | Snyk | No new Critical or High vulnerabilities in dependencies or container images. |
| Build | Maven / npm / Docker | Clean build with no errors. |
| Artifact publish | Container registry | Immutable, tagged artifact published. |

Failure handling:

  • If any stage fails, the pipeline stops and the PR is blocked from merge
  • The Solution Engineer who raised the PR is responsible for fixing failures
  • Pipeline failures are not escalated to management - they are normal engineering workflow
  • Persistent pipeline failures (>24 hours unresolved) are flagged in the tribe's daily board review

Output: Green pipeline status on the PR in Bitbucket

Phase 7: Deployment

Owner: Platform Engineering (mechanism) + Solution Engineering (decision to deploy)

Simpaisa targets daily deployment capability for all services. Deployment is automated and does not require manual approval, a CAB, or a release manager.

Deployment process:

  1. PR is merged to the main branch
  2. Merge triggers the deployment pipeline automatically
  3. Deployment pipeline executes:
     • Artifact promotion from CI registry to deployment registry
     • Database migration execution (if applicable)
     • Rolling deployment to staging environment
     • Automated smoke tests against staging
     • Progressive rollout to production (canary → 10% → 50% → 100%)
  4. Feature flags control visibility of new functionality to end users
  5. Automated rollback triggers if error rate exceeds threshold during canary phase

Rollback:

  • Automated rollback completes within 5 minutes of trigger
  • Rollback is triggered automatically if: error rate increases >2% above baseline, latency P99 exceeds 2× baseline, or health check failures exceed threshold
  • Manual rollback can be initiated by any Solution Engineer or Platform Engineer via a single command
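
The automatic rollback triggers can be sketched as a decision function. Two assumptions to flag: ">2% above baseline" is read here as percentage points, and the health-check failure threshold (which this SOP leaves unspecified) is an illustrative parameter:

```python
def should_rollback(error_rate: float, baseline_error_rate: float,
                    p99_latency_ms: float, baseline_p99_ms: float,
                    health_check_failures: int,
                    failure_threshold: int = 3) -> bool:
    """Evaluate the canary against the automated rollback triggers:
    error rate >2 percentage points above baseline, P99 latency above
    2x baseline, or health-check failures over the (illustrative)
    threshold."""
    if error_rate - baseline_error_rate > 0.02:
        return True
    if p99_latency_ms > 2 * baseline_p99_ms:
        return True
    if health_check_failures > failure_threshold:
        return True
    return False
```

A canary at 5% errors against a 1% baseline, or at 250 ms P99 against a 100 ms baseline, would roll back; a 1.2% error rate and 150 ms P99 would not.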

No CAB: There is no Change Advisory Board. The pipeline is the change authority. If the pipeline passes, the change is approved. This is a deliberate decision to remove the bottleneck that prevented daily deployment under the previous SOP.

Deployment windows: Standard deployments occur during business hours (09:00–17:00 PKT, Monday–Thursday). Friday deployments are permitted but discouraged for non-urgent changes. Weekend and out-of-hours deployments require Production Engineering on-call acknowledgement.

Output: Running code in production behind appropriate feature flags

Phase 8: Post-Deployment Verification

Owner: Production Engineering

After deployment, Production Engineering verifies the release in production.

Activities:

  • Canary metrics are monitored for 30 minutes post-deployment (error rate, latency, throughput)
  • SLO compliance is verified against the service's defined SLOs
  • Automated alerting confirms no new alerts have fired
  • Key business metrics are spot-checked (transaction success rate, settlement completion)
  • If verification passes, the canary is promoted to full traffic
  • If verification fails, automated rollback is triggered and the Solution Engineering team is notified

Monitoring stack:

  • Application performance monitoring (APM) for latency and error tracking
  • Infrastructure monitoring for resource utilisation and availability
  • Business metrics dashboards for transaction volumes and success rates
  • Log aggregation for error investigation

Output: Verified production deployment or rollback with incident ticket

Phase 9: Production Operations

Owner: Production Engineering

Once code is in production, Production Engineering owns its operational health.

Activities:

  • 24/7 monitoring via NOC/SOC
  • On-call rotation for P1/P2 incident response (PagerDuty or equivalent)
  • Incident response per the Incident Management playbook (separate document)
  • Post-incident reviews for all P1 and P2 incidents (blameless, focused on systemic improvements)
  • SLO tracking and error budget management
  • Capacity planning and scaling recommendations
  • Runbook maintenance and operational documentation

On-call rotation:

  • Each tribe with production services maintains an on-call rotation
  • On-call engineer has authority to execute rollbacks, scale infrastructure, and page additional engineers
  • On-call handover occurs weekly with a written summary of active issues

Escalation path:

  1. On-call engineer (Production Engineering)
  2. Tribe lead
  3. CTO (for P1 incidents)
  4. CDO (for P1 incidents with business impact)


9. Hotfix Process

A hotfix is an emergency code change that bypasses the standard cycle workflow due to urgency. Hotfixes are reserved for:

  • P1 production incidents (service down or severely degraded)
  • Critical security vulnerabilities with active exploitation risk
  • Regulatory compliance breaches requiring immediate remediation

Hotfix workflow:

  1. Trigger: P1 incident declared or critical security vulnerability identified
  2. Branch: Engineer creates a hotfix branch from main (not from a feature branch)
  3. Fix: Engineer develops the minimal fix. Scope is strictly limited to the immediate issue.
  4. Review: One PR approval (reduced from standard two) from a senior Solution Engineer or tribe lead
  5. Pipeline: Automated quality gates run (unit tests, security scan). Integration tests may be skipped with CDO or CTO written approval.
  6. Deploy: Immediate deployment to production, bypassing canary progressive rollout if urgency requires it
  7. Verify: Production Engineering confirms the fix resolves the incident
  8. Post-incident review: Mandatory within 48 hours. Review covers root cause, timeline, fix effectiveness, and systemic improvements.
  9. Backfill: Any skipped tests or documentation are completed within the current cycle as a follow-up Story

Hotfix approvers: CTO, CDO, or CISO (for security hotfixes)
Audit trail: All hotfixes are tagged in Jira with the hotfix label and linked to the incident ticket


10. Release Governance

10.1 What Is a Release?

A release is a merge to the main branch that passes all automated pipeline stages. There is no separate "release" ceremony, no release manager, and no release approval board.

10.2 Automated Gates Replace Manual Approvals

The following automated gates constitute release approval:

| Gate | Authority |
| --- | --- |
| Unit tests pass | Pipeline |
| Integration tests pass | Pipeline |
| Static analysis clean | Pipeline |
| Snyk scan clean (no new Critical/High) | Pipeline |
| Two PR approvals | Bitbucket |
| Build succeeds | Pipeline |
| Staging smoke tests pass | Pipeline |

If all gates pass, the release is approved. No human signs a release form.
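
The gate table reduces to an all-pass check: a release is approved if and only if every gate reports green. A sketch, with gate identifiers chosen for illustration:

```python
# Gate names mirror the table above; the identifiers themselves are
# illustrative, not taken from any real pipeline configuration.
REQUIRED_GATES = (
    "unit_tests", "integration_tests", "static_analysis",
    "snyk_scan", "pr_approvals", "build", "staging_smoke_tests",
)

def release_approved(gate_results: dict) -> bool:
    """True only if every required gate passed. A gate with no
    recorded result counts as a failure, never as a pass."""
    return all(gate_results.get(gate, False) for gate in REQUIRED_GATES)
```

Treating a missing result as a failure is the conservative default: a pipeline stage that never ran must not be mistaken for one that passed.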

10.3 Feature Flags for Risk Management

Feature flags are the primary mechanism for managing release risk:

  • New features deploy behind flags, disabled by default
  • Product Management controls flag activation (who sees the feature and when)
  • Flags enable instant rollback of individual features without code deployment
  • Flags enable progressive rollout (internal → beta partners → 10% → 50% → 100%)
  • Stale flags (>30 days post-full-rollout) are cleaned up as technical debt Stories
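
One common way to implement the percentage stages above is deterministic user bucketing, so a given user stays in or out of the rollout consistently across requests. The SOP does not prescribe a mechanism, so this is purely an illustrative sketch:

```python
import hashlib

def flag_enabled(user_id: str, rollout_percent: int) -> bool:
    """Deterministically bucket a user into 0-99 by hashing their ID;
    the flag is on for users whose bucket falls below the rollout
    percentage. Same user, same answer, every request."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_percent
```

At 0% nobody sees the feature, at 100% everybody does, and intermediate stages (10%, 50%) expose a stable subset rather than a random sample per request.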

10.4 Release Cadence

There is no fixed release cadence. Releases happen when code is ready. The target is at least one production deployment per tribe per business day. Actual deployment frequency is tracked as a DORA metric.
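
The DORA metrics this cadence feeds reduce to simple arithmetic over pipeline records. An illustrative sketch of two of the four (function names are ours):

```python
def deployment_frequency(deploy_dates: list, business_days: int) -> float:
    """Mean production deployments per business day over the reporting
    window. Target per this section: at least 1.0 per tribe."""
    return len(deploy_dates) / business_days

def change_failure_rate(failed_deployments: int,
                        total_deployments: int) -> float:
    """Share of deployments that caused a failure in production."""
    return failed_deployments / total_deployments
```

Three deployments over two business days gives a frequency of 1.5, which meets the daily target; 1 failed deployment in 20 is a 5% change failure rate.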


11. Jira Workflow

11.1 Board Columns

Every tribe operates a single Kanban board with the following columns:

| Column | Description | Entry Criteria |
|--------|-------------|----------------|
| Backlog | All Stories not yet ready for development | Story created with title and description |
| Ready | Stories refined and ready to be pulled by an engineer | Acceptance criteria approved, security review complete (if required), architecture review complete (if required), sized to fit one cycle |
| In Progress | Story actively being developed | Engineer has pulled the Story and begun work |
| In Review | PR raised, awaiting code review and pipeline completion | PR created in Bitbucket, linked to Jira Story |
| Done | Story complete - code merged, pipeline green, deployed to production | All Definition of Done criteria met (see Section 12) |

11.2 WIP Limits

WIP limits are enforced per column per tribe. Jira is configured to visually flag WIP breaches. WIP limits are set by the tribe lead in consultation with the Agile Coach and reviewed quarterly.

11.3 No Waterfall Gates

There are no approval-gate columns (e.g., "Awaiting QA Sign-Off," "Awaiting CAB Approval," "Awaiting Release"). Stories flow from Backlog to Done through continuous work and automated checks.

11.4 Jira Hygiene

  • Every Story must have a Jira key
  • Every PR must reference its Jira key in the PR title or description
  • Engineers update Story status daily
  • Stories that have been In Progress for >5 business days without movement are flagged by the Agile Coach
  • Stale Stories (Backlog items untouched for >60 days) are archived quarterly
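
The "every PR must reference its Jira key" rule lends itself to an automated check. The sketch below uses Jira's default key format (an uppercase project prefix, a hyphen, and a number); the project prefix shown is invented for the example, and a real check would run as a pipeline or webhook step.

```python
# Sketch: validate that a PR title or description references a Jira key.
# The pattern approximates Jira's default PROJECT-123 key format.
import re

JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

def references_jira_key(pr_text: str) -> bool:
    """True if the text contains something shaped like a Jira Story key."""
    return bool(JIRA_KEY.search(pr_text))

print(references_jira_key("PAY-1042: add RAAST settlement retry"))  # True
print(references_jira_key("fix typo in readme"))                    # False
```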

12. Definition of Done

A Story is Done when all of the following are true:

| Criterion | Verification |
|-----------|--------------|
| Code complete | All acceptance criteria implemented |
| Unit tests pass | Pipeline confirms all unit tests green |
| Integration tests pass | Pipeline confirms API contract tests green (where applicable) |
| PR approved ×2 | Two Solution Engineer approvals recorded in Bitbucket |
| Pipeline green | All automated quality gates pass (see Section 8, Phase 6) |
| Feature flag configured | New user-facing functionality is behind a feature flag |
| Documentation updated | API documentation, runbooks, or user-facing docs updated as needed |
| No new Critical/High vulnerabilities | Snyk scan confirms no new Critical or High findings |
| Deployed to production | Code is running in production (behind feature flag if applicable) |
| Jira updated | Story moved to Done, all fields current |

If any criterion is not met, the Story is not Done and remains in its current column.


13. Testing Strategy

13.1 Test Pyramid

Simpaisa follows the test pyramid model. The base of the pyramid (unit tests) is large and fast. The top of the pyramid (E2E tests) is small and targeted.

        /  E2E  \           ← Critical user journeys only
       /----------\
      / Integration \       ← API contracts between services
     /----------------\
    /    Unit Tests     \   ← Every PR, every function
   /____________________\

13.2 Unit Tests

  • Mandatory: Every PR must include unit tests for new or changed logic
  • Coverage target: 80% line coverage per service (enforced in pipeline)
  • Owner: Solution Engineer who writes the code writes the tests
  • Framework: JUnit (Java services), Jest (Node.js/frontend), pytest (Python services)
  • Execution: Every PR, every pipeline run
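
A minimal illustration of the engineer-owned unit-test expectation, using pytest (the framework named above for Python services). The function under test is invented for this example; real tests live beside the service code and run on every PR.

```python
# Sketch: a function and its unit tests, written by the same engineer.
# apply_fee and its semantics are illustrative, not a Simpaisa API.

def apply_fee(amount: int, fee_bps: int) -> int:
    """Deduct a fee quoted in basis points from an amount in minor units."""
    if amount < 0 or fee_bps < 0:
        raise ValueError("amount and fee_bps must be non-negative")
    return amount - (amount * fee_bps) // 10_000

def test_apply_fee_deducts_basis_points():
    # 1.5% (150 bps) fee on 10,000 minor units leaves 9,850
    assert apply_fee(10_000, 150) == 9_850

def test_apply_fee_rejects_negative_amount():
    import pytest
    with pytest.raises(ValueError):
        apply_fee(-1, 150)
```

Tests like these are what the pipeline's "unit tests pass" gate executes; the 80% coverage target is enforced over the aggregate of such tests per service.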

13.3 Integration Tests

  • Scope: API contract tests verifying inter-service communication
  • Coverage target: All public API endpoints covered
  • Owner: Solution Engineering tribe, maintained per service
  • Execution: Every PR pipeline run (using service stubs/mocks for external dependencies)
  • Corridor-specific: Integration tests include corridor-specific scenarios (e.g., Pakistan RAAST, Bangladesh bKash, UAE partner APIs)

13.4 End-to-End Tests

  • Scope: Critical user journeys only - not comprehensive UI testing
  • Examples: Partner onboarding flow, Pay-Out initiation to settlement, Pay-In collection to reconciliation
  • Owner: Production Engineering (test infrastructure) + Solution Engineering (test content)
  • Execution: Nightly against staging environment; not on every PR
  • Principle: E2E tests verify the system works. Unit and integration tests verify the code works. Do not duplicate coverage.

13.5 No Manual QA Sign-Off

There is no manual QA sign-off step for standard releases. The automated pipeline (unit tests + integration tests + static analysis + security scan) is the quality gate.

Manual exploratory testing may be conducted by Product Management during UAT for high-risk features (new corridors, new product launches) - but this is a risk-management activity, not a release gate. Exploratory testing findings are logged as new Stories, not as PR blockers.

13.6 Performance Testing

  • Load testing is conducted before major corridor launches or infrastructure changes
  • Performance baselines are maintained per service
  • Performance regression is detected via APM monitoring in production, not via pre-release performance gates
  • Load test scripts are maintained in the service repository alongside application code

14. Compliance and Audit

14.1 Traceability Chain

Every change in production has a complete, automated audit trail:

Jira Story → Bitbucket PR → Pipeline Execution → Deployment Record → Production Metrics

This chain is maintained automatically through Jira-Bitbucket integration (PR links to Story key) and pipeline logging. No manual audit trail maintenance is required.

14.2 PCI-DSS

For services within the Cardholder Data Environment (CDE):

  • All code changes to CDE services require security review (Phase 4)
  • Snyk scanning is mandatory with zero tolerance for Critical vulnerabilities
  • Access to production CDE environments is restricted to named Production Engineering personnel
  • All CDE deployments are logged with timestamp, deployer identity, and change reference
  • Quarterly access reviews are conducted by Information Security

14.3 DFSA Operational Resilience

For UAE-regulated services, the SDLC provides evidence of:

  • Change management controls (automated pipeline with documented gates)
  • Incident management (Production Engineering on-call, post-incident reviews)
  • Business continuity (automated rollback, feature flags, multi-region capability)
  • Testing adequacy (test pyramid, coverage metrics, deployment verification)
  • Third-party risk management (Snyk dependency scanning, vendor assessment for new integrations)

14.4 Audit Access

Internal and external auditors are granted read-only access to:

  • Jira boards and Story history
  • Bitbucket repositories and PR history
  • Pipeline execution logs
  • Deployment records
  • Snyk vulnerability reports

Audit access is provisioned by Information Security upon request and reviewed quarterly.


15. Architecture Decision Records

15.1 What Is an ADR?

An Architecture Decision Record (ADR) documents a significant technical decision. ADRs create an institutional memory of why decisions were made, preventing future teams from relitigating settled questions or unknowingly reversing intentional choices.

15.2 When to Write an ADR

An ADR is required when:

  • A new service or microservice is introduced
  • A new technology, framework, or language is adopted
  • A significant data model change affects multiple services
  • An infrastructure architecture change is made (new cloud service, region, provider)
  • A security architecture decision is made (new authentication mechanism, encryption approach)
  • The ARB reviews and approves a significant change

An ADR is not required for:

  • Bug fixes
  • Routine feature development within existing patterns
  • Configuration changes
  • Dependency version updates (unless changing major versions with breaking changes)

15.3 ADR Template

All ADRs are stored in Confluence under the Architecture Decision Records space and follow this template:

# ADR-[number]: [Title]

**Date:** [date]  
**Status:** [Proposed | Accepted | Deprecated | Superseded by ADR-XXX]  
**Deciders:** [names]

## Context
What is the issue that we are seeing that motivates this decision?

## Decision
What is the change that we are proposing and/or doing?

## Consequences
What becomes easier or harder as a result of this decision?

## Alternatives Considered
What other options were evaluated and why were they rejected?

15.4 ARB Review Process

  1. Author creates a draft ADR in Confluence
  2. ADR is submitted to the ARB agenda (weekly Wednesday meeting or async review)
  3. ARB members review and provide comments within 48 hours (async) or discuss in meeting
  4. Principal Architect records the decision (Accepted / Rejected / Deferred)
  5. Accepted ADRs are linked to the relevant Jira Epic

16. Exception Handling

16.1 Regulatory Emergencies

When a regulatory authority (DFSA, SBP, Bangladesh Bank, NRB, CBI) issues a directive requiring immediate system changes:

  • CDO or CPO raises an emergency Initiative in Jira with the regulatory-emergency label
  • Normal cycle workflow is suspended for the affected tribe(s)
  • Architecture review and security review are conducted in parallel (not sequentially)
  • Pipeline gates remain enforced - regulatory emergencies do not bypass automated quality checks
  • CDO briefs the CEO within 24 hours on timeline and impact

16.2 Critical Security Vulnerabilities

When a Critical-severity vulnerability is identified (Snyk alert, penetration test finding, CERT advisory):

  • CISO declares a security incident and assigns an owner
  • Hotfix process (Section 9) is invoked
  • Vulnerability is patched within SLA: Critical = 24 hours, High = 72 hours
  • CISO confirms remediation and updates the vulnerability register
  • Post-incident review is conducted if the vulnerability was exploited or had production impact
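
The remediation SLAs above (Critical = 24 hours, High = 72 hours) translate directly into a patch deadline from the moment the incident is declared. A minimal sketch, with the severity-to-hours mapping taken from this section:

```python
# Sketch: compute the patch deadline from the vulnerability SLAs above.
from datetime import datetime, timedelta

SLA_HOURS = {"Critical": 24, "High": 72}

def patch_deadline(severity: str, declared_at: datetime) -> datetime:
    """Latest acceptable remediation time for a declared vulnerability."""
    return declared_at + timedelta(hours=SLA_HOURS[severity])

print(patch_deadline("Critical", datetime(2026, 4, 7, 9, 0)))  # 2026-04-08 09:00:00
print(patch_deadline("High", datetime(2026, 4, 7, 9, 0)))      # 2026-04-10 09:00:00
```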

16.3 P1 Incidents

P1 incidents (complete service outage or severe degradation affecting customers) follow the Incident Management playbook:

  • Production Engineering on-call engineer is paged immediately
  • Incident commander is assigned (CTO for engineering incidents, CISO for security incidents)
  • All-hands war room (virtual) is convened within 15 minutes
  • Resolution actions may invoke the hotfix process
  • Post-incident review within 48 hours is mandatory
  • Post-incident review actions are tracked as Stories in the relevant tribe's backlog

16.4 Requesting an Exception

Any deviation from this SDLC framework (e.g., deploying without two PR approvals, skipping security review, bypassing pipeline gates) requires written approval from the CDO. Exception requests are logged in Jira with the sdlc-exception label and include:

  • What is being bypassed
  • Why it is necessary
  • What risk it introduces
  • What compensating controls are in place
  • When the exception expires

There are no permanent exceptions. All exceptions have an expiry date and are reviewed.


17. Metrics

17.1 DORA Metrics

The following DORA metrics are tracked per tribe and reported fortnightly to the CDO:

| Metric | Definition | Target |
|--------|------------|--------|
| Deployment Frequency | Number of production deployments per business day per tribe | ≥1 per day |
| Lead Time for Changes | Time from first commit to production deployment | <24 hours |
| Mean Time to Recovery (MTTR) | Time from P1 incident declaration to resolution | <1 hour |
| Change Failure Rate | Percentage of deployments causing a production incident | <5% |
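
As a sketch of how two of these metrics might be derived from deployment records, assuming each record carries a first-commit time, a deployment time, and an incident flag (the record shape is an assumption for this example; real figures come from Jira and pipeline data):

```python
# Sketch: Lead Time for Changes and Change Failure Rate from records.
from datetime import datetime

deployments = [
    {"first_commit": datetime(2026, 4, 7, 9, 0),
     "deployed": datetime(2026, 4, 7, 16, 0), "caused_incident": False},
    {"first_commit": datetime(2026, 4, 7, 15, 0),
     "deployed": datetime(2026, 4, 8, 11, 0), "caused_incident": True},
]

def lead_time_hours(deploys) -> float:
    """Mean time from first commit to production deployment, in hours."""
    deltas = [(d["deployed"] - d["first_commit"]).total_seconds() / 3600
              for d in deploys]
    return sum(deltas) / len(deltas)

def change_failure_rate(deploys) -> float:
    """Share of deployments that caused a production incident."""
    return sum(d["caused_incident"] for d in deploys) / len(deploys)

print(lead_time_hours(deployments))      # (7 + 20) / 2 = 13.5 hours
print(change_failure_rate(deployments))  # 1 of 2 deployments = 0.5
```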

17.2 Flow Metrics

| Metric | Definition | Target |
|--------|------------|--------|
| Cycle Time | Time from In Progress to Done per Story | <5 business days |
| WIP Age | Age of the oldest item in In Progress per tribe | <7 business days |
| Throughput | Number of Stories completed per cycle per tribe | Tracked, no fixed target (tribe-specific) |
| Flow Efficiency | Active work time ÷ total lead time | >40% |
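
A worked example of the Flow Efficiency formula above (active work time divided by total lead time); the Story timings are invented for illustration:

```python
# Sketch: Flow Efficiency = active work time / total lead time.
def flow_efficiency(active_hours: float, lead_hours: float) -> float:
    """Fraction of lead time spent actively working (not waiting)."""
    return active_hours / lead_hours

# A Story with 16 active hours inside a 40-hour lead time sits exactly
# at the 40% target boundary:
print(flow_efficiency(16, 40))  # 0.4
```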

17.3 Quality Metrics

| Metric | Definition | Target |
|--------|------------|--------|
| Unit Test Coverage | Line coverage percentage per service | ≥80% |
| Pipeline Pass Rate | Percentage of pipeline runs that pass on first attempt | >85% |
| Snyk Critical/High | Number of open Critical or High vulnerabilities | 0 Critical, <5 High |
| Escaped Defects | Production incidents caused by code changes (per cycle) | <2 per tribe per cycle |

17.4 Reporting

  • The Digital Office produces a fortnightly engineering metrics report from Jira and pipeline data
  • Metrics are reviewed in the CDO's fortnightly leadership meeting
  • Quarterly metrics are included in the CDO's board report
  • Tribes that consistently miss targets receive focused support from the Digital Office and Agile Coach

18. Continuous Improvement

18.1 Quarterly SDLC Retrospective

Every quarter, the Digital Office facilitates a cross-tribe SDLC retrospective:

  • All tribe leads attend
  • Review: DORA metrics trends, flow metrics trends, quality metrics trends
  • Identify: systemic bottlenecks, process friction, tooling gaps
  • Decide: process changes for the next quarter (captured as updates to this document)
  • Owner: CDO

18.2 Process Change Management

Changes to this SDLC framework follow this process:

  1. Anyone can propose a change by raising a Story with the sdlc-improvement label
  2. The Digital Office reviews proposed changes fortnightly
  3. Significant changes are discussed at the quarterly SDLC retrospective
  4. The CDO approves all changes to this document
  5. Updated versions are published to Confluence and communicated to all tribes

18.3 Maturity Model

The Digital Office maintains an SDLC maturity assessment per tribe, rated across:

  • Deployment automation maturity
  • Test coverage and test pyramid adherence
  • Security integration maturity
  • Monitoring and observability maturity
  • Documentation and ADR discipline

Maturity assessments are conducted quarterly and inform investment priorities and coaching focus.


19. Supersedes

This document supersedes and replaces the following:

| Document | Version | Date | Author |
|----------|---------|------|--------|
| SP-SOP-SDLC-2603 | v1.0 | 18 March 2026 | Saqlain Raza (CTO) |

The superseded document is archived in Confluence under "Superseded Documents" for audit trail purposes. It is no longer operative and must not be followed.

Key reasons for replacement:

  1. Organisational misalignment: v1.0 referenced the old team structure (Portal, Pay-Ins, Pay-Outs, SQA, DevOps). The organisation now operates as seven tribes under the CDO Division.
  2. Delivery model change: v1.0 mandated two-week sprints. The organisation has moved to Kanban with 10-day cycles.
  3. Quality model change: v1.0 relied on a separate SQA team for manual QA sign-off. Engineers now own testing.
  4. Release model change: v1.0 required CAB approval for releases. Automated pipeline gates now replace manual approval.
  5. Jira workflow change: v1.0 used waterfall-style gates in Jira. The Kanban board uses five flow columns.
  6. Language and quality: v1.0 contained US English spellings and multiple typographical errors inconsistent with Simpaisa's documentation standards.

20. Approval and Adoption

20.1 Approval

| Role | Name | Date | Signature |
|------|------|------|-----------|
| Owner / Approver | Daniel O'Reilly, CDO | 7 April 2026 | ____ |
| Reviewed | Daniel O'Reilly, CDO | 7 April 2026 | ____ |

20.2 Adoption Timeline

| Milestone | Target Date | Owner |
|-----------|-------------|-------|
| Document published to Confluence | 7 April 2026 | Digital Office |
| All-hands communication to CDO Division | 9 April 2026 | CDO |
| Jira boards reconfigured (Kanban, WIP limits) | 14 April 2026 | Agile Coach |
| Pipeline automated gates operational | 27 May 2026 | Platform Engineering |
| Daily deploy capability for one tribe | 27 May 2026 | CTO + Platform Engineering |
| Feature flag infrastructure operational | 27 May 2026 | Platform Engineering |
| DORA metrics reporting live | 24 June 2026 | Digital Office |
| All tribes operating under v2.0 | 27 June 2026 | CDO |

20.3 Training

The Digital Office and Agile Coach deliver training to all CDO Division staff covering:

  • Kanban principles and WIP limits
  • PR workflow and code review expectations
  • Pipeline gates and how to respond to failures
  • Feature flag usage
  • Incident response and escalation
  • Jira workflow and hygiene standards

Training is completed within 30 days of document publication.

20.4 Version History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 18 March 2026 | Saqlain Raza | Initial SDLC SOP (superseded) |
| 2.0 | 7 April 2026 | Daniel O'Reilly | Complete replacement. New delivery model, org structure, quality model, release model. See Section 19. |

End of document.