
W-10: Engineering Ways of Work

| Field | Value |
|---|---|
| Document | W-10 |
| Title | Engineering Ways of Work |
| Status | Draft |
| Owner | CTO (Acting) |
| Created | 2026-04-05 |
| Review | Quarterly |
| Depends On | W-01 (Company Operating Rhythm), STD-GOV-124 (ARB Charter), STD-GOV-125 (Technical Debt Management), GIT-WORKFLOW (Git Workflow & Branch Strategy), Incident Response Playbook |

Purpose

Define how Simpaisa's engineering organisation builds, tests, ships, and operates software. This is the single source of truth for engineering process. If it is not in this document, it is not how we work. Where current practice diverges from the target, both states are documented explicitly.

This document applies to all 38 engineers across all five Kanban teams.

Team Structure

CTO (Acting) — Saqlain Raza
├── Pay-In Team (6 engineers)
│   └── Team Lead
│       ├── 2 × Senior Engineers (Go / Java)
│       ├── 2 × Engineers
│       └── 1 × Junior Engineer
│
├── Pay-Out Team (6 engineers)
│   └── Team Lead
│       ├── 2 × Senior Engineers (Go / Java)
│       ├── 2 × Engineers
│       └── 1 × Junior Engineer
│
├── Portal Team (5 engineers)
│   └── Team Lead
│       ├── 1 × Senior Engineer (React / Go)
│       ├── 2 × Engineers
│       └── 1 × Junior Engineer
│
├── DevOps / Infra Team (5 engineers)
│   └── Team Lead
│       ├── 2 × Senior Engineers (Terraform / K8s)
│       ├── 1 × Engineer
│       └── 1 × Junior Engineer
│
└── SQA Team (4 engineers, shared across all teams)
    └── SQA Lead
        ├── 2 × QA Engineers
        └── 1 × QA Automation Engineer

Headcount: ~32 in delivery teams + CTO + team leads + SQA lead ≈ 38

Reporting: All team leads report to the CTO (Acting). The CTO reports to the CDO. Product direction comes from the CPO via Product Managers embedded with each delivery team.

1. Kanban Cadence and Ceremonies

Flow model: Continuous flow with WIP limits. No fixed sprints. Work is pulled from the backlog as capacity becomes available.

WIP limits: Maximum 2 items per engineer in progress at any time. If at limit, finish something before starting something new. WIP limits are enforced on the Jira board.
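The WIP rule above reduces to a simple pull check. The following is a minimal sketch, assuming a hypothetical export of the board's in-progress column as a list of assignee names. In practice the limit is enforced on the Jira board, and `can_pull` is an illustrative name, not a Jira API.

```python
from collections import Counter

WIP_LIMIT_PER_ENGINEER = 2  # from the WIP rule above


def can_pull(engineer: str, in_progress_assignees: list[str]) -> bool:
    """Return True if the engineer is below the WIP limit.

    in_progress_assignees holds one entry per in-progress item, naming its
    assignee (a hypothetical export of the board's in-progress column).
    """
    return Counter(in_progress_assignees)[engineer] < WIP_LIMIT_PER_ENGINEER
```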

| Ceremony | When | Duration | Attendees | Purpose |
|---|---|---|---|---|
| Weekly Planning | Monday AM | 1 hour | Kanban team + Product Manager | Replenish the board. Pull highest-priority items. Review and adjust WIP limits. |
| Daily Stand-up | Every day, 10:00 local | 15 min (hard stop) | Kanban team | Focus on blocked items and WIP. Not a status report: unblock, then move on. |
| Weekly Demo | Friday AM | 30 min | Kanban team + stakeholders + Product Manager | Show what shipped this week. Gather feedback. No slides. |
| Fortnightly Retrospective | Every other Friday PM | 45 min | Kanban team only (no managers unless invited) | What went well, what to improve, agree max 3 action items. Track action completion. |
| Backlog Refinement | Wednesday PM | 1 hour | Kanban team + Product Manager | Refine upcoming stories. Break epics. Clarify acceptance criteria. |

Time allocation per week (5 working days):

| Activity | Hours/week |
|---|---|
| Ceremonies (1 hr planning + 1.25 hr stand-ups + 30 min demo + ~22 min retro amortised) | ~3 |
| Feature development | ~28 |
| Technical debt (per STD-GOV-125) | ~7 (1 day/week) |
| On-call / incident handling / production support | ~2 |
| Learning / documentation | ~2 |

Ceremony overhead is approximately 3 hours per week — down from ~12 hours per two-week cycle under the previous Scrum cadence. The 20% technical debt allocation from STD-GOV-125 translates to 1 day per week per engineer. The CTO ensures this capacity is protected during weekly planning. If debt work is consistently deferred for feature work, escalate to the CDO.
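The ceremony figure can be reproduced from the durations in the ceremonies table. The sketch below sums the four ceremonies listed in the time-allocation row (about 3.1 hours). The Wednesday backlog refinement session is not part of that sum, and counting it as ceremony time gives about 4.1 hours. Whether refinement should count is an assumption left as a parameter here.

```python
# Weekly ceremony durations in minutes, taken from the ceremonies table.
CEREMONY_MINUTES_PER_WEEK = {
    "weekly_planning": 60,
    "daily_standup": 15 * 5,   # 15 minutes on each of 5 working days
    "weekly_demo": 30,
    "retrospective": 45 / 2,   # 45 minutes every other week, amortised
}
BACKLOG_REFINEMENT_MINUTES = 60  # Wednesday session, 1 hour


def ceremony_hours(include_refinement: bool = False) -> float:
    """Return weekly ceremony time in hours."""
    total = sum(CEREMONY_MINUTES_PER_WEEK.values())
    if include_refinement:
        total += BACKLOG_REFINEMENT_MINUTES
    return total / 60
```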

2. Backlog Management and Story Writing Standards

Backlog Tool

Jira is the single backlog tool. Every piece of engineering work has a Jira ticket. No work happens without a ticket.

Story Template

Every user story or task in Jira must include:

Title: [Clear, concise description]

As a [persona],
I want [capability],
So that [business outcome].

Acceptance Criteria:
- [ ] [Specific, testable criterion]
- [ ] [Specific, testable criterion]
- [ ] ...

API Specification:
- Link to OpenAPI spec or API design doc (if applicable)

Compliance Requirements:
- [ ] PCI-DSS impact: [Yes/No — detail if yes]
- [ ] PII handling: [Yes/No — fields affected]
- [ ] Regulatory: [Market-specific requirements, e.g. SBP directive]

Technical Notes:
- Dependencies, migration steps, feature flag requirements

Estimation: [Story points — Fibonacci: 1, 2, 3, 5, 8, 13]

Rules:

  • Stories estimated at 13 points or above must be broken down before being pulled into work.
  • Stories without acceptance criteria are not pulled into work.
  • Stories involving new API endpoints require an OpenAPI spec link before being pulled.
  • Stories touching PII or payment data must have the compliance section completed.
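The pull rules above can be read as a readiness check. This is a minimal sketch, assuming a hypothetical in-memory `Story` record. The field names are illustrative and do not correspond to Jira field identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    """Hypothetical view of a Jira story; field names are illustrative."""
    title: str
    points: int
    acceptance_criteria: list[str] = field(default_factory=list)
    adds_api_endpoint: bool = False
    openapi_link: str = ""
    touches_pii_or_payment_data: bool = False
    compliance_section_complete: bool = False


def blockers(story: Story) -> list[str]:
    """Return the reasons a story cannot be pulled; an empty list means ready."""
    reasons = []
    if story.points >= 13:
        reasons.append("estimate is 13 points or above: break the story down")
    if not story.acceptance_criteria:
        reasons.append("no acceptance criteria")
    if story.adds_api_endpoint and not story.openapi_link:
        reasons.append("new API endpoint without an OpenAPI spec link")
    if story.touches_pii_or_payment_data and not story.compliance_section_complete:
        reasons.append("compliance section incomplete")
    return reasons
```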

Backlog Hygiene

  • Product Manager owns prioritisation. Team Lead owns technical feasibility.
  • Backlog refinement happens weekly (Wednesday session).
  • Stories in the "Ready" column have been refined, estimated, and have clear acceptance criteria.
  • Stale tickets (untouched for 6 weeks) are reviewed and either reprioritised or closed.

3. Code Review Process

Platform

All code reviews happen in Bitbucket pull requests. No code reaches main without a pull request.

Approval Requirements

| Approval | Who | Required |
|---|---|---|
| Peer review | Any team member at same or higher level | Yes (minimum 1) |
| Lead / Architect review | Team Lead, Platform Lead, or CDO | Yes (minimum 1) |
| Total minimum approvals | | 2 |

Both approvals must be from different people. Self-approval is not permitted.
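The approval rule can be sketched as a merge check. This is an illustration only, assuming a hypothetical `Approval` record. It does not model the "same or higher level" condition for peer reviewers, and it is not Bitbucket configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approval:
    reviewer: str
    is_lead_or_architect: bool  # Team Lead, Platform Lead, or CDO


def can_merge(author: str, approvals: list[Approval]) -> bool:
    """Check the two-approval rule: at least one lead/architect approval and
    at least two distinct approvers, none of whom is the author."""
    # Self-approvals never count; keep one approval per reviewer.
    by_reviewer = {a.reviewer: a for a in approvals if a.reviewer != author}
    has_lead = any(a.is_lead_or_architect for a in by_reviewer.values())
    return has_lead and len(by_reviewer) >= 2
```

A second lead approval is accepted as the peer review here, since a lead is at a higher level than the author.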

Review SLAs

| SLA | Timeframe | Action |
|---|---|---|
| First review | Within 1 business day of PR creation | Reviewer picks up the PR |
| Escalation | At 2 business days without review | Author escalates to Team Lead |
| Hard escalation | At 3 business days without review | Team Lead escalates to CTO |
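The escalation thresholds depend on counting business days. The sketch below assumes a Monday-to-Friday working week and ignores public holidays. Both are assumptions, as are the function names.

```python
from datetime import date, timedelta


def business_days_between(start: date, end: date) -> int:
    """Count Monday-to-Friday days after `start`, up to and including `end`."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days += 1
    return days


def review_escalation(pr_created: date, today: date, reviewed: bool) -> str:
    """Return the escalation step that applies to a PR today."""
    if reviewed:
        return "none"
    waiting = business_days_between(pr_created, today)
    if waiting >= 3:
        return "team lead escalates to CTO"
    if waiting >= 2:
        return "author escalates to team lead"
    return "none"
```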

Architecture Review Triggers

A PR requires an Architecture Review Board (ARB) review (per STD-GOV-124) if it involves any of the following:

  • New service or microservice creation
  • New external dependency or third-party integration
  • Database schema changes affecting more than one service
  • Changes to the API gateway or routing layer
  • New infrastructure components (not just scaling existing ones)
  • Changes to authentication or authorisation flows
  • Cross-service data flow changes
  • Any change touching settlement or reconciliation logic

For ARB-triggerable changes, add the Jira label arch-review-required and notify the CDO. The PR does not merge until ARB approval is recorded.

Review Checklist

Reviewers assess against:

  • Code compiles and tests pass
  • Follows Go or Java coding standards (as applicable)
  • No secrets, credentials, or PII in code or comments
  • Error handling is explicit (no swallowed errors)
  • Logging follows structured logging standard (JSON, correlation IDs)
  • API changes are backward-compatible or versioned
  • Database migrations are reversible
  • Test coverage has not decreased
  • Snyk scan is clean (no new critical/high vulnerabilities)
  • Feature flag wraps new behaviour (where applicable)

4. Definition of Done

A story is Done when all of the following are true:

| # | Criterion | Verified By |
|---|---|---|
| 1 | Code merged to main via approved PR (2 approvals) | Bitbucket |
| 2 | All unit and integration tests pass | Jenkins CI |
| 3 | Test coverage has not decreased from baseline | SonarQube |
| 4 | Snyk security scan clean (no new critical/high) | Snyk |
| 5 | SonarQube quality gate passed | SonarQube |
| 6 | Deployed to staging environment | Jenkins CD |
| 7 | SQA verification passed on staging | SQA team sign-off in Jira |
| 8 | API documentation updated (if API changed) | OpenAPI spec |
| 9 | Runbook updated (if operational behaviour changed) | Confluence |
| 10 | Product Manager accepts the story | Jira status transition |

A story that is merged but not verified by SQA on staging is not Done. It remains "In Review" in Jira.

5. CI/CD Pipeline Stages

Tooling

  • Source control: Bitbucket
  • CI/CD: Jenkins
  • Security scanning: Snyk (dependencies), SonarQube (static analysis)
  • Artefact registry: Docker images in private registry
  • Infrastructure as Code: Terraform
  • Configuration management: Ansible

Pipeline Stages

┌─────────────┐    ┌──────────────┐    ┌─────────────┐    ┌──────────────┐
│  PR Created  │───▶│  Build + Unit│───▶│  SAST +     │───▶│  Integration │
│  (Bitbucket) │    │  Tests       │    │  Snyk Scan  │    │  Tests       │
└─────────────┘    └──────────────┘    └─────────────┘    └──────────────┘
                                                                  │
                                                                  ▼
┌─────────────┐    ┌──────────────┐    ┌─────────────┐    ┌──────────────┐
│  Production  │◀──│  Staging     │◀──│  Docker      │◀──│  Quality     │
│  Deploy      │    │  Deploy      │    │  Image Build │    │  Gate Check  │
└─────────────┘    └──────────────┘    └─────────────┘    └──────────────┘

Stage details:

| Stage | Tool | Gate | Failure Action |
|---|---|---|---|
| Build + Unit Tests | Jenkins + Go test / Maven | All tests pass | PR blocked |
| SAST + Snyk Scan | SonarQube + Snyk | No new critical/high findings | PR blocked |
| Integration Tests | Jenkins | All integration tests pass | PR blocked |
| Quality Gate Check | SonarQube | Coverage thresholds met, no new bugs/vulnerabilities | PR blocked |
| Docker Image Build | Jenkins + Docker | Image builds successfully | Pipeline fails |
| Staging Deploy | Jenkins + Terraform/Ansible | Health check passes | Rollback, alert team |
| SQA Verification | Manual + Automated | SQA sign-off | Story stays in review |
| Production Deploy | Jenkins + Terraform/Ansible | Health check + smoke tests pass | Automatic rollback |

Pipeline duration targets:

  • PR pipeline (build through quality gate): < 15 minutes
  • Full pipeline through staging: < 30 minutes
  • Production deploy (including smoke tests): < 15 minutes

Current state: PR pipeline runs approximately 20-25 minutes. Optimisation is a tracked technical debt item.
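The stage gates behave as a sequence in which the first failed gate determines the outcome. A minimal sketch of that logic follows, using the stage names and failure actions from the stage details. It is an illustration of the gating order, not Jenkins pipeline code, and `first_failure` is a hypothetical helper.

```python
from typing import Optional

# Stage order and failure actions, copied from the stage details.
STAGES = [
    ("Build + Unit Tests", "PR blocked"),
    ("SAST + Snyk Scan", "PR blocked"),
    ("Integration Tests", "PR blocked"),
    ("Quality Gate Check", "PR blocked"),
    ("Docker Image Build", "Pipeline fails"),
    ("Staging Deploy", "Rollback, alert team"),
    ("SQA Verification", "Story stays in review"),
    ("Production Deploy", "Automatic rollback"),
]


def first_failure(results: dict[str, bool]) -> Optional[tuple[str, str]]:
    """Return (stage, failure action) for the first failed gate, else None.

    `results` maps stage name to gate outcome for the stages that have run.
    """
    for stage, action in STAGES:
        if results.get(stage) is False:
            return stage, action
    return None
```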

6. Testing Requirements and Coverage Targets

Coverage Targets

| Metric | Existing Services (Java) | New Services (Go / Phoenix) |
|---|---|---|
| Unit test coverage | 70% minimum | 80% minimum |
| Integration test coverage | 60% minimum | 70% minimum |
| End-to-end test coverage | Critical paths only | Critical paths + happy paths |

Coverage is enforced by SonarQube quality gates. A PR that decreases coverage below the threshold is blocked.
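The thresholds can be expressed as a lookup plus a comparison. The sketch below only illustrates the rule. The actual enforcement is the SonarQube quality gate, and the `existing` / `new` keys and the function name are assumptions.

```python
# Minimum coverage percentages from the coverage targets.
THRESHOLDS = {
    "existing": {"unit": 70.0, "integration": 60.0},  # Java services
    "new": {"unit": 80.0, "integration": 70.0},       # Go / Phoenix services
}


def coverage_violations(service_kind: str, coverage: dict[str, float]) -> list[str]:
    """Return one message per metric that is below its minimum.

    A metric missing from `coverage` is treated as 0%.
    """
    violations = []
    for metric, minimum in THRESHOLDS[service_kind].items():
        actual = coverage.get(metric, 0.0)
        if actual < minimum:
            violations.append(f"{metric} coverage {actual:.1f}% is below {minimum:.0f}%")
    return violations
```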

Testing Pyramid

        /‾‾‾‾‾‾‾‾‾‾‾‾‾\
       /   E2E Tests     \        ← Few, slow, high-confidence
      /   (SQA-owned)     \
     /─────────────────────\
    /   Integration Tests    \     ← Moderate, test service boundaries
   /   (Developer-owned)      \
  /────────────────────────────\
 /       Unit Tests              \  ← Many, fast, developer-owned
/─────────────────────────────────\

Testing Standards

  • Unit tests: Written by the developer as part of the story. Committed in the same PR as the feature code. No PR without tests.
  • Integration tests: Test service boundaries, database interactions, and external API contracts. Run in CI against ephemeral test infrastructure.
  • End-to-end tests: Owned by SQA. Run against staging after deployment. Cover critical payment flows (pay-in, pay-out, settlement, reconciliation).
  • Contract tests: Required for all inter-service APIs. Consumer-driven contracts where practical.
  • Performance tests: Required for any change to transaction processing paths. Must not degrade P99 latency by more than 10%.
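The P99 rule for performance tests can be checked by comparing a baseline run with a candidate run. This is a minimal sketch, assuming latency samples in milliseconds and the nearest-rank percentile method. The document does not specify how P99 is computed, so the method is an assumption.

```python
MAX_P99_DEGRADATION = 0.10  # 10% limit from the performance test rule


def p99(samples_ms: list[float]) -> float:
    """Return the 99th percentile using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n), 1-based
    return ordered[rank - 1]


def p99_regressed(baseline_ms: list[float], candidate_ms: list[float]) -> bool:
    """True if the candidate P99 exceeds the baseline P99 by more than 10%."""
    return p99(candidate_ms) > p99(baseline_ms) * (1 + MAX_P99_DEGRADATION)
```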

Defect SLAs

| Severity | Definition | SLA | Action |
|---|---|---|---|
| Critical | Production payment processing blocked, data loss, security breach | Blocks deployment. Fix immediately. | All hands. War room. See Incident Response Playbook. |
| High | Degraded service, incorrect calculations, compliance gap | 3 business days | Pull into current work immediately. |
| Medium | User-facing bug, non-critical functionality broken | Next weekly planning | Prioritise in next refinement. |
| Low | Cosmetic, minor UX, non-blocking | Backlog | Fix when capacity allows. |

Critical defects halt all deployments until resolved. No exceptions.

7. Deployment Process and Cadence

Target Cadence

| Team | Target Frequency | Current State |
|---|---|---|
| Portal | Daily | 2-3 times per week |
| Pay-In | Daily | 2-3 times per week |
| Pay-Out | Daily | Weekly |
| DevOps/Infra | As needed | As needed |

Gap: Target is daily deployments for all delivery teams. Portal and Pay-In are close. Pay-Out deploys weekly due to settlement window constraints and additional verification requirements. The path to daily Pay-Out deployments requires decoupling settlement batch processing from deployment — this is a Phoenix programme workstream.

Deployment Process

  1. Merge to main — PR approved and merged.
  2. CI pipeline runs — Build, test, scan, quality gate.
  3. Staging deploy — Automatic on successful pipeline.
  4. SQA verification — Manual and automated tests on staging.
  5. Production deploy — Triggered manually by Team Lead or Senior Engineer.
  6. Smoke tests — Automated post-deploy verification.
  7. Monitor — 30-minute watch window after deploy. Deployer monitors dashboards.

Deployment Rules

  • Deploy window: 09:00-16:00 local time (Karachi), Monday to Thursday. No Friday deploys unless critical. No weekend deploys unless P1 incident.
  • Who can deploy: Team Leads and Senior Engineers. Junior engineers may deploy with a Senior Engineer observing.
  • Rollback: If smoke tests fail or error rates spike above baseline + 5% within 30 minutes, roll back immediately. Do not debug in production.
  • Feature flags: New user-facing features must be wrapped in feature flags. Deploy dark, then enable via flag. This allows instant rollback without redeployment.
  • Database migrations: Run before application deployment. Must be backward-compatible (the old code must work with the new schema). Irreversible migrations require ARB approval.
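The deploy window and rollback rules above can be sketched as two checks. Several points are assumptions: Karachi is modelled as a fixed UTC+5 offset, the window end (16:00) is treated as exclusive, and "baseline + 5%" is read as 5 percentage points with rates expressed as fractions. The critical-deploy and P1 exceptions are not modelled.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: Karachi modelled as a fixed UTC+5 offset.
KARACHI = timezone(timedelta(hours=5))
WINDOW_START, WINDOW_END = time(9, 0), time(16, 0)
WATCH_WINDOW_MINUTES = 30


def in_deploy_window(moment: datetime) -> bool:
    """True for Monday to Thursday, 09:00-16:00 Karachi time.

    `moment` must be timezone-aware.
    """
    local = moment.astimezone(KARACHI)
    return local.weekday() <= 3 and WINDOW_START <= local.time() < WINDOW_END


def should_roll_back(smoke_tests_passed: bool, error_rate: float,
                     baseline_error_rate: float, minutes_since_deploy: float) -> bool:
    """Apply the rollback rule; rates are fractions (0.02 means 2%)."""
    if not smoke_tests_passed:
        return True
    in_watch_window = minutes_since_deploy <= WATCH_WINDOW_MINUTES
    return in_watch_window and error_rate > baseline_error_rate + 0.05
```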

Current state: Feature flags are being rolled out. Not yet consistently used across all teams. Target: all user-facing features behind flags by end of Q3 2026.
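The "deploy dark, then enable via flag" pattern can be illustrated with a small in-memory flag store. The `FeatureFlags` class, the `new-checkout` flag name, and `checkout_message` are all hypothetical. A real rollout would read flags from whichever flag service the teams adopt.

```python
class FeatureFlags:
    """Hypothetical in-memory flag store, for illustration only."""

    def __init__(self) -> None:
        self._enabled: set[str] = set()

    def enable(self, name: str) -> None:
        self._enabled.add(name)

    def disable(self, name: str) -> None:
        self._enabled.discard(name)

    def is_enabled(self, name: str) -> bool:
        return name in self._enabled


def checkout_message(flags: FeatureFlags) -> str:
    """New behaviour ships in the same build but stays dark until enabled."""
    if flags.is_enabled("new-checkout"):  # hypothetical flag name
        return "new checkout"
    return "legacy checkout"
```

Disabling the flag restores the legacy path without a redeployment, which is the instant rollback the deployment rules describe.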

8. Incident Response for Engineering

Engineering incident response follows the Incident Response Playbook (see Standards/INCIDENT-RESPONSE-PLAYBOOK.md).

Engineering-Specific Responsibilities

| Role | During Incident | After Incident |
|---|---|---|
| On-call engineer | First responder. Triage, assess severity, begin mitigation. | Contribute to Post-Incident Review. |
| Team Lead | Escalation point. Coordinate team response. Decide on rollback. | Own remediation actions. |
| CTO | Escalation for P1/P2. Cross-team coordination. | Review PIR. Approve systemic changes. |
| CDO | Notified for P1. Stakeholder communication. Regulatory impact assessment. | Sign off PIR. Track systemic improvements. |

Severity to Engineering Action

| Severity | Engineering Action |
|---|---|
| P1 (Critical) | All deployments halted. War room. All-hands until resolved. |
| P2 (High) | Affected team stops feature work. Fix immediately. |
| P3 (Medium) | Fix prioritised in next weekly planning. |
| P4 (Low) | Added to backlog. |

Post-Incident Review

Every P1 and P2 incident requires a blameless Post-Incident Review within 3 business days. The PIR follows the standard in Standards/STD-DEVEX-093-POST-INCIDENT-REVIEW-STANDARDS.md.

PIR outputs: timeline, root cause, contributing factors, remediation actions with owners and deadlines. Remediation actions are tracked in Jira with the label pir-action.

9. On-Call Rotation

Structure

Each delivery team (Pay-In, Pay-Out, Portal) maintains its own on-call rotation. DevOps/Infra provides a separate infrastructure on-call.

| Rotation | Coverage | Escalation |
|---|---|---|
| Pay-In on-call | 1 engineer, weekly rotation | Team Lead → CTO → CDO |
| Pay-Out on-call | 1 engineer, weekly rotation | Team Lead → CTO → CDO |
| Portal on-call | 1 engineer, weekly rotation | Team Lead → CTO → CDO |
| Infra on-call | 1 engineer, weekly rotation | Team Lead → CTO → CDO |

On-Call Rules

  • Hours: 24/7 for production systems. Response time: 15 minutes for P1, 30 minutes for P2, next business day for P3/P4.
  • Rotation length: 1 week, rotating through all eligible engineers (Senior Engineer and above).
  • Handover: Friday end of day. Outgoing on-call briefs incoming on-call on any open issues.
  • Compensation: On-call allowance per company policy. Incident response outside business hours compensated as overtime or time-in-lieu.
  • No single points of failure: Each rotation must have a minimum of 3 engineers to prevent burnout. If a team cannot staff 3 engineers, the CTO escalates to the CDO.
  • On-call engineer is not assigned work at full capacity. Reserve time for on-call duties (reflected in weekly planning by reducing WIP allocation).
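The rotation rules above (one-week shifts, Friday handover, at least three engineers) can be sketched as a round-robin schedule generator. This is an illustration under those rules only. It is not a PagerDuty integration, and the function name is hypothetical.

```python
from datetime import date, timedelta
from itertools import cycle, islice

MIN_ROTATION_SIZE = 3  # from the on-call rules above


def weekly_rotation(engineers: list[str], first_handover: date,
                    weeks: int) -> list[tuple[date, str]]:
    """Return (handover date, engineer) pairs, one per week, round-robin."""
    if len(engineers) < MIN_ROTATION_SIZE:
        raise ValueError("rotation needs at least 3 eligible engineers")
    if first_handover.weekday() != 4:  # 4 = Friday
        raise ValueError("handover happens on a Friday")
    names = islice(cycle(engineers), weeks)
    return [(first_handover + timedelta(weeks=i), name)
            for i, name in enumerate(names)]
```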

Current state: On-call rotations are informal. Pay-In and Pay-Out have de facto on-call engineers but no formal schedule. Target: formalised rotation with PagerDuty (or equivalent) by end of Q2 2026.

10. Technical Debt Management

Technical debt management follows STD-GOV-125 (Technical Debt Management). Key points for engineering:

Capacity Allocation

20% of engineering capacity is reserved for technical debt reduction. This is not optional. It is not a stretch goal. It is committed capacity.

In practical terms: 1 day per engineer per week is spent on debt reduction items.

Debt Tracking

  • All technical debt items are logged in Jira with the label tech-debt.
  • Each item is categorised: Code, Architecture, Dependency, Testing, Documentation, Infrastructure.
  • Each item is scored for impact (1–5) and effort (1–5). High-impact, low-effort items are prioritised.
  • The CTO reviews the debt register quarterly with the CDO (per W-01 Operating Rhythm).
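The impact and effort scores above suggest a simple ordering. The sketch below ranks items by impact divided by effort, so high-impact, low-effort items come first. The ratio is an assumption: the exact prioritisation formula, if any, is defined in STD-GOV-125 and is not reproduced here. `DebtItem` and the keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtItem:
    key: str       # hypothetical Jira key
    category: str  # Code, Architecture, Dependency, Testing, Documentation, Infrastructure
    impact: int    # 1-5
    effort: int    # 1-5


def prioritise(items: list[DebtItem]) -> list[DebtItem]:
    """Sort by impact/effort ratio, highest first; ties go to lower effort."""
    for item in items:
        if not (1 <= item.impact <= 5 and 1 <= item.effort <= 5):
            raise ValueError(f"{item.key}: impact and effort must be 1-5")
    return sorted(items, key=lambda i: (-i.impact / i.effort, i.effort, i.key))
```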

Phoenix Programme

The largest single debt reduction effort is the Phoenix programme — rewriting legacy Spring Boot Java services in Go. This is tracked as a separate programme with its own milestones, not as ad-hoc tech debt.

  • Legacy Java services remain in maintenance mode (critical bug fixes only).
  • New features are built in Go.
  • Migration follows a strangler fig pattern: new Go services sit behind the API gateway alongside legacy services, taking over routes incrementally.
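The strangler fig approach comes down to a routing decision at the gateway: routes that have been migrated go to the Go service, everything else stays on the legacy service. A minimal sketch follows. The route prefixes and backend names are hypothetical, and the real routing lives in the API gateway configuration.

```python
# Hypothetical route prefixes already taken over by Go services.
MIGRATED_PREFIXES = ("/v2/payins", "/v2/payouts/status")


def backend_for(path: str, migrated_prefixes: tuple[str, ...] = MIGRATED_PREFIXES) -> str:
    """Pick the backend for a request path.

    A prefix matches the exact path or any sub-path, so "/v2/payins" covers
    "/v2/payins/123" but not "/v2/payinsights".
    """
    for prefix in migrated_prefixes:
        if path == prefix or path.startswith(prefix + "/"):
            return "go"
    return "legacy-java"
```

Migrating another route then means adding its prefix to the list, which keeps each takeover small and reversible.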

11. Branch Strategy

Branch strategy follows the Git Workflow & Branch Strategy standard (see Standards/GIT-WORKFLOW-STANDARD.md).

Summary

  • Strategy: Trunk-based development with short-lived feature branches.
  • Platform: Bitbucket.
  • Main branch: main is always deployable. Protected: no direct pushes.
  • Feature branches: feature/{ticket-id}-brief-description. Maximum lifetime: 3 days. If a branch lives longer than 3 days, it is too large — break it down.
  • Bug fix branches: fix/{ticket-id}-brief-description.
  • Release branches: release/v{major}.{minor}.{patch} — created only when a release needs stabilisation.
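The naming and lifetime rules above can be checked mechanically. The sketch below assumes Jira-style ticket IDs (for example `PAY-123`) and lowercase, hyphen-separated descriptions. Neither format is specified in this document, and the helper name is hypothetical.

```python
import re
from datetime import date

# Assumption: ticket IDs look like PAY-123; descriptions are lowercase kebab-case.
BRANCH_PATTERN = re.compile(
    r"^(?:"
    r"(?:feature|fix)/[A-Z][A-Z0-9]*-\d+-[a-z0-9]+(?:-[a-z0-9]+)*"
    r"|release/v\d+\.\d+\.\d+"
    r")$"
)
MAX_FEATURE_BRANCH_AGE_DAYS = 3


def branch_problems(name: str, created: date, today: date) -> list[str]:
    """Return rule violations for a branch; an empty list means compliant."""
    problems = []
    if name != "main" and not BRANCH_PATTERN.match(name):
        problems.append("name does not follow the branch naming convention")
    age_days = (today - created).days
    if name.startswith("feature/") and age_days > MAX_FEATURE_BRANCH_AGE_DAYS:
        problems.append("feature branch is older than 3 days: break the work down")
    return problems
```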

Branch Rules

  • Rebase onto main before merging (no merge commits in feature branches).
  • Squash commits on merge to main (one commit per story).
  • Delete feature branches after merge.
  • No long-lived branches apart from main and active release/* branches.

12. Engineering Metrics

We track the four DORA metrics plus Simpaisa-specific operational metrics.

DORA Metrics

| Metric | Definition | Current State | Target |
|---|---|---|---|
| Deployment Frequency | How often code is deployed to production | 2-3x/week (Portal, Pay-In), 1x/week (Pay-Out) | Daily (all teams) |
| Lead Time for Changes | Time from commit to production | ~3-5 days | < 1 day |
| Mean Time to Recovery (MTTR) | Time from incident detection to resolution | ~2-4 hours (estimated) | < 1 hour for P1 |
| Change Failure Rate | Percentage of deployments causing incidents | Not currently tracked | < 5% |

Gap: DORA metrics are not systematically measured today. Target: automated DORA metric collection via Jenkins + Jira integration by end of Q3 2026.
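Once deployment and incident records are available, the four metrics follow directly from their definitions. This is a minimal sketch, assuming hypothetical `Deployment` and `Incident` records. It is not the planned Jenkins + Jira integration, and using the median for lead time is a choice made here, not one stated in this document.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, median


@dataclass(frozen=True)
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    caused_incident: bool = False


@dataclass(frozen=True)
class Incident:
    detected_at: datetime
    resolved_at: datetime


def dora_summary(deployments: list[Deployment], incidents: list[Incident],
                 period_days: int) -> dict:
    """Compute the four DORA metrics over a reporting period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_hours = [(d.deployed_at - d.committed_at).total_seconds() / 3600
                  for d in deployments]
    recovery_hours = [(i.resolved_at - i.detected_at).total_seconds() / 3600
                      for i in incidents]
    failed = sum(1 for d in deployments if d.caused_incident)
    return {
        "deployments_per_week": len(deployments) * 7 / period_days,
        "median_lead_time_hours": median(lead_hours),
        "change_failure_rate": failed / len(deployments),
        # None when there were no incidents in the period.
        "mttr_hours": mean(recovery_hours) if recovery_hours else None,
    }
```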

Operational Metrics

| Metric | Definition | Tracked In | Reviewed |
|---|---|---|---|
| Throughput | Items completed per week per team | Jira | Fortnightly retrospective |
| Cycle time | Time from work started to work done | Jira | Fortnightly retrospective |
| PR review turnaround | Time from PR creation to first review | Bitbucket | Monthly by CTO |
| Build success rate | Percentage of CI builds that pass | Jenkins | Weekly by DevOps |
| Test coverage trend | Unit and integration coverage over time | SonarQube | Monthly by CTO |
| Tech debt ratio | Debt items created vs resolved per week | Jira | Quarterly (CTO + CDO) |
| Incident count by severity | Number of P1-P4 incidents per month | Incident tracker | Monthly (CTO + CDO) |

Metric Review Cadence

  • Team level: Throughput and cycle time reviewed in fortnightly retrospectives.
  • Monthly: CTO reviews PR turnaround, build success rate, coverage trends.
  • Quarterly: CTO + CDO review DORA metrics, tech debt ratio, incident trends. Feed into quarterly technical debt review (per W-01).

13. Appendix: Quick Reference

What Needs What

| I want to... | I need... |
|---|---|
| Merge a PR | 2 approvals (1 peer + 1 lead/architect), all CI gates green |
| Deploy to production | Merged to main, staging verified by SQA, deploy window, Team Lead or Senior Engineer |
| Create a new service | ARB approval (STD-GOV-124) |
| Add a new dependency | Snyk scan clean, ARB approval if external service |
| Change a database schema | Backward-compatible migration, ARB approval if multi-service |
| Ship a user-facing feature | Feature flag, SQA sign-off, Product Manager acceptance |
| Skip tech debt allocation | You cannot. Escalate to CDO if pressured. |
| Deploy on Friday | You need CTO approval and a very good reason |

Key Documents

| Document | Location |
|---|---|
| Git Workflow & Branch Strategy | Standards/GIT-WORKFLOW-STANDARD.md |
| Incident Response Playbook | Standards/INCIDENT-RESPONSE-PLAYBOOK.md |
| Technical Debt Management | Standards/STD-GOV-125-TECHNICAL-DEBT-MANAGEMENT.md |
| ARB Charter | Standards/STD-GOV-124-ARCHITECTURE-REVIEW-BOARD-CHARTER.md |
| Post-Incident Review Standards | Standards/STD-DEVEX-093-POST-INCIDENT-REVIEW-STANDARDS.md |
| Service Level Objectives | Standards/STD-DEVEX-090-SERVICE-LEVEL-OBJECTIVES.md |
| Runbook Standards | Standards/STD-DEVEX-092-RUNBOOK-STANDARDS.md |
| Company Operating Rhythm | Standards/WaysOfWork/W-01-COMPANY-OPERATING-RHYTHM.md |

Compliance with This Document

This document describes how engineering at Simpaisa works today, with clearly marked targets where practice has not yet reached the standard. The CTO is accountable for adherence. The CDO reviews compliance quarterly.

Deviations from this document are permitted only with CTO approval (for tactical exceptions) or CDO approval (for structural changes). All exceptions are time-boxed and tracked.