W-10: Engineering Ways of Work
| Field | Value |
|---|---|
| Document | W-10 |
| Title | Engineering Ways of Work |
| Status | Draft |
| Owner | CTO (Acting) |
| Created | 2026-04-05 |
| Review | Quarterly |
| Depends On | W-01 (Company Operating Rhythm), STD-GOV-124 (ARB Charter), STD-GOV-125 (Technical Debt Management), GIT-WORKFLOW (Git Workflow & Branch Strategy), Incident Response Playbook |
Purpose
Define how Simpaisa's engineering organisation builds, tests, ships, and operates software. This is the single source of truth for engineering process. If it is not in this document, it is not how we work. Where current practice diverges from the target, both states are documented explicitly.
This document applies to all 38 engineers across all five Kanban teams.
Team Structure
CTO (Acting) — Saqlain Raza
├── Pay-In Team (6 engineers)
│   └── Team Lead
│       ├── 2 × Senior Engineers (Go / Java)
│       ├── 2 × Engineers
│       └── 1 × Junior Engineer
│
├── Pay-Out Team (6 engineers)
│   └── Team Lead
│       ├── 2 × Senior Engineers (Go / Java)
│       ├── 2 × Engineers
│       └── 1 × Junior Engineer
│
├── Portal Team (5 engineers)
│   └── Team Lead
│       ├── 1 × Senior Engineer (React / Go)
│       ├── 2 × Engineers
│       └── 1 × Junior Engineer
│
├── DevOps / Infra Team (5 engineers)
│   └── Team Lead
│       ├── 2 × Senior Engineers (Terraform / K8s)
│       ├── 1 × Engineer
│       └── 1 × Junior Engineer
│
└── SQA Team (4 engineers, shared across all teams)
    └── SQA Lead
        ├── 2 × QA Engineers
        └── 1 × QA Automation Engineer
Headcount: ~32 in delivery teams + CTO + team leads + SQA lead ≈ 38
Reporting: All team leads report to the CTO (Acting). The CTO reports to the CDO. Product direction comes from the CPO via Product Managers embedded with each delivery team.
1. Kanban Cadence and Ceremonies
Flow model: Continuous flow with WIP limits. No fixed sprints. Work is pulled from the backlog as capacity becomes available.
WIP limits: Maximum 2 items per engineer in progress at any time. If at limit, finish something before starting something new. WIP limits are enforced on the Jira board.
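The Jira board is what enforces the limit in practice. Purely as an illustration of the rule, the check can be sketched as below; the `(ticket_id, assignee)` pair shape, the function names, and the sample ticket keys are assumptions made for this example, not a Jira export format.

```python
from collections import Counter

WIP_LIMIT_PER_ENGINEER = 2  # from the WIP rule above


def engineers_at_limit(in_progress, limit=WIP_LIMIT_PER_ENGINEER):
    """Return engineers who already hold `limit` or more in-progress items.

    `in_progress` is a list of (ticket_id, assignee) pairs (hypothetical shape).
    """
    counts = Counter(assignee for _, assignee in in_progress)
    return sorted(name for name, n in counts.items() if n >= limit)


def can_pull(assignee, in_progress, limit=WIP_LIMIT_PER_ENGINEER):
    """True if `assignee` may start a new item without exceeding the limit."""
    return assignee not in engineers_at_limit(in_progress, limit)
```

An engineer listed by `engineers_at_limit` finishes something before pulling the next item.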
| Ceremony | When | Duration | Attendees | Purpose |
|---|---|---|---|---|
| Weekly Planning | Monday AM | 1 hour | Kanban team + Product Manager | Replenish the board. Pull highest-priority items. Review and adjust WIP limits. |
| Daily Stand-up | Every day, 10:00 local | 15 min (hard stop) | Kanban team | Focus on blocked items and WIP. Not a status report — unblock, then move on. |
| Weekly Demo | Friday AM | 30 min | Kanban team + stakeholders + Product Manager | Show what shipped this week. Gather feedback. No slides. |
| Fortnightly Retrospective | Every other Friday PM | 45 min | Kanban team only (no managers unless invited) | What went well, what to improve, agree max 3 action items. Track action completion. |
| Backlog Refinement | Wednesday PM | 1 hour | Kanban team + Product Manager | Refine upcoming stories. Break epics. Clarify acceptance criteria. |
Time allocation per week (5 working days):
| Activity | Hours/week |
|---|---|
| Ceremonies (1 hr planning + 1.25 hr stand-ups + 30 min demo + ~22 min retro amortised) | ~3 |
| Feature development | ~28 |
| Technical debt (per STD-GOV-125) | ~7 (1 day/week) |
| On-call / incident handling / production support | ~2 |
| Learning / documentation | ~2 |
Ceremony overhead is approximately 3 hours per week — down from ~12 hours per two-week cycle under the previous Scrum cadence. The 20% technical debt allocation from STD-GOV-125 translates to 1 day per week per engineer. The CTO ensures this capacity is protected during weekly planning. If debt work is consistently deferred for feature work, escalate to the CDO.
2. Backlog Management and Story Writing Standards
Backlog Tool
Jira is the single backlog tool. Every piece of engineering work has a Jira ticket. No work happens without a ticket.
Story Template
Every user story or task in Jira must include:
Title: [Clear, concise description]
As a [persona],
I want [capability],
So that [business outcome].
Acceptance Criteria:
- [ ] [Specific, testable criterion]
- [ ] [Specific, testable criterion]
- [ ] ...
API Specification:
- Link to OpenAPI spec or API design doc (if applicable)
Compliance Requirements:
- [ ] PCI-DSS impact: [Yes/No — detail if yes]
- [ ] PII handling: [Yes/No — fields affected]
- [ ] Regulatory: [Market-specific requirements, e.g. SBP directive]
Technical Notes:
- Dependencies, migration steps, feature flag requirements
Estimation: [Story points — Fibonacci: 1, 2, 3, 5, 8, 13]
Rules:
- Stories estimated at 13 points or above must be broken down before being pulled into work.
- Stories without acceptance criteria are not pulled into work.
- Stories involving new API endpoints require an OpenAPI spec link before being pulled.
- Stories touching PII or payment data must have the compliance section completed.
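The four pull rules above can be read as a simple readiness check. The sketch below is illustrative only: the `Story` fields are hypothetical names chosen for this example and do not correspond to actual Jira field identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    # All field names are assumptions for illustration.
    points: int
    acceptance_criteria: list = field(default_factory=list)
    adds_api_endpoint: bool = False
    openapi_link: str = ""
    touches_pii_or_payment_data: bool = False
    compliance_completed: bool = False


def blocking_reasons(story):
    """Return the reasons a story may not be pulled; empty list means ready."""
    reasons = []
    if story.points >= 13:
        reasons.append("break down: 13 points or above")
    if not story.acceptance_criteria:
        reasons.append("missing acceptance criteria")
    if story.adds_api_endpoint and not story.openapi_link:
        reasons.append("missing OpenAPI spec link")
    if story.touches_pii_or_payment_data and not story.compliance_completed:
        reasons.append("compliance section incomplete")
    return reasons
```

Returning every reason, rather than stopping at the first, lets a refinement session fix a story in one pass.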
Backlog Hygiene
- Product Manager owns prioritisation. Team Lead owns technical feasibility.
- Backlog refinement happens weekly (Wednesday session).
- Stories in the "Ready" column have been refined, estimated, and have clear acceptance criteria.
- Stale tickets (untouched for 6 weeks) are reviewed and either reprioritised or closed.
3. Code Review Process
Platform
All code reviews happen in Bitbucket pull requests. No code reaches main without a pull request.
Approval Requirements
| Approval | Who | Required |
|---|---|---|
| Peer review | Any team member at same or higher level | Yes (minimum 1) |
| Lead / Architect review | Team Lead, Platform Lead, or CDO | Yes (minimum 1) |
| Total minimum approvals | Two different people (peer + lead / architect) | Yes (minimum 2) |
Both approvals must be from different people. Self-approval is not permitted.
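As a rough sketch of how these approval rules combine, the function below checks for two distinct non-author approvers, at least one of whom holds a lead or architect role. The role strings and the `(reviewer, role)` pair shape are assumptions for this example; it does not call the Bitbucket API, and it does not check the "same or higher level" condition for the peer reviewer.

```python
# Hypothetical role labels standing in for Team Lead, Platform Lead, and CDO.
LEAD_ROLES = {"team_lead", "platform_lead", "cdo"}


def approvals_satisfied(author, approvals):
    """Check the two-approval rule.

    `approvals` is a list of (reviewer, role) pairs. Self-approvals are
    ignored and repeat approvals from one person count once.
    """
    reviewers = {name: role for name, role in approvals if name != author}
    has_lead = any(role in LEAD_ROLES for role in reviewers.values())
    return len(reviewers) >= 2 and has_lead
```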
Review SLAs
| SLA | Timeframe | Action |
|---|---|---|
| First review | Within 1 business day of PR creation | Reviewer picks up the PR |
| Escalation | At 2 business days without review | Author escalates to Team Lead |
| Hard escalation | At 3 business days without review | Team Lead escalates to CTO |
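The escalation ladder above depends on counting business days. A minimal sketch follows, assuming a Monday to Friday working week and ignoring public holidays; both function names are hypothetical.

```python
from datetime import date, timedelta


def business_days_between(start, end):
    """Count weekdays (Mon-Fri) after `start`, up to and including `end`."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days += 1
    return days


def review_escalation(pr_created, today):
    """Return the escalation step for a PR that still has no review."""
    waited = business_days_between(pr_created, today)
    if waited >= 3:
        return "Team Lead escalates to CTO"
    if waited >= 2:
        return "Author escalates to Team Lead"
    return "No escalation yet"
```

A PR opened on a Friday therefore reaches the first escalation on the following Tuesday, not on Sunday.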
Architecture Review Triggers
A PR requires an Architecture Review Board (ARB) review (per STD-GOV-124) if it involves any of the following:
- New service or microservice creation
- New external dependency or third-party integration
- Database schema changes affecting more than one service
- Changes to the API gateway or routing layer
- New infrastructure components (not just scaling existing ones)
- Changes to authentication or authorisation flows
- Cross-service data flow changes
- Any change touching settlement or reconciliation logic
For ARB-triggerable changes, add the Jira label arch-review-required and notify the CDO. The PR does not merge until ARB approval is recorded.
Review Checklist
Reviewers assess against:
- Code compiles and tests pass
- Follows Go or Java coding standards (as applicable)
- No secrets, credentials, or PII in code or comments
- Error handling is explicit (no swallowed errors)
- Logging follows structured logging standard (JSON, correlation IDs)
- API changes are backward-compatible or versioned
- Database migrations are reversible
- Test coverage has not decreased
- Snyk scan is clean (no new critical/high vulnerabilities)
- Feature flag wraps new behaviour (where applicable)
4. Definition of Done
A story is Done when all of the following are true:
| # | Criterion | Verified By |
|---|---|---|
| 1 | Code merged to main via approved PR (2 approvals) | Bitbucket |
| 2 | All unit and integration tests pass | Jenkins CI |
| 3 | Test coverage has not decreased from baseline | SonarQube |
| 4 | Snyk security scan clean (no new critical/high) | Snyk |
| 5 | SonarQube quality gate passed | SonarQube |
| 6 | Deployed to staging environment | Jenkins CD |
| 7 | SQA verification passed on staging | SQA team sign-off in Jira |
| 8 | API documentation updated (if API changed) | OpenAPI spec |
| 9 | Runbook updated (if operational behaviour changed) | Confluence |
| 10 | Product Manager accepts the story | Jira status transition |
A story that is merged but not verified by SQA on staging is not Done. It remains "In Review" in Jira.
5. CI/CD Pipeline Stages
Tooling
- Source control: Bitbucket
- CI/CD: Jenkins
- Security scanning: Snyk (dependencies), SonarQube (static analysis)
- Artefact registry: Docker images in private registry
- Infrastructure as Code: Terraform
- Configuration management: Ansible
Pipeline Stages
┌─────────────┐    ┌──────────────┐    ┌─────────────┐    ┌──────────────┐
│ PR Created  │───▶│ Build + Unit │───▶│ SAST +      │───▶│ Integration  │
│ (Bitbucket) │    │ Tests        │    │ Snyk Scan   │    │ Tests        │
└─────────────┘    └──────────────┘    └─────────────┘    └──────────────┘
                                                                 │
                                                                 ▼
┌─────────────┐    ┌──────────────┐    ┌─────────────┐    ┌──────────────┐
│ Production  │◀───│ Staging      │◀───│ Docker      │◀───│ Quality      │
│ Deploy      │    │ Deploy       │    │ Image Build │    │ Gate Check   │
└─────────────┘    └──────────────┘    └─────────────┘    └──────────────┘
Stage details:
| Stage | Tool | Gate | Failure Action |
|---|---|---|---|
| Build + Unit Tests | Jenkins + Go test / Maven | All tests pass | PR blocked |
| SAST + Snyk Scan | SonarQube + Snyk | No new critical/high findings | PR blocked |
| Integration Tests | Jenkins | All integration tests pass | PR blocked |
| Quality Gate Check | SonarQube | Coverage thresholds met, no new bugs/vulnerabilities | PR blocked |
| Docker Image Build | Jenkins + Docker | Image builds successfully | Pipeline fails |
| Staging Deploy | Jenkins + Terraform/Ansible | Health check passes | Rollback, alert team |
| SQA Verification | Manual + Automated | SQA sign-off | Story stays in review |
| Production Deploy | Jenkins + Terraform/Ansible | Health check + smoke tests pass | Automatic rollback |
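The stage table reads as an ordered list of gates, each with its own failure action. The sketch below only models that ordering; it is not a Jenkins pipeline definition, and the `gate_results` dictionary is a hypothetical input shape.

```python
# Stage names and failure actions copied from the stage table above.
PIPELINE = [
    ("Build + Unit Tests", "PR blocked"),
    ("SAST + Snyk Scan", "PR blocked"),
    ("Integration Tests", "PR blocked"),
    ("Quality Gate Check", "PR blocked"),
    ("Docker Image Build", "Pipeline fails"),
    ("Staging Deploy", "Rollback, alert team"),
    ("SQA Verification", "Story stays in review"),
    ("Production Deploy", "Automatic rollback"),
]


def first_failure(gate_results):
    """Walk the stages in order and return (stage, failure_action) for the
    first gate that has not passed, or None when every gate passed.

    A stage missing from `gate_results` counts as not passed.
    """
    for stage, action in PIPELINE:
        if not gate_results.get(stage, False):
            return stage, action
    return None
```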
Pipeline duration targets:
- PR pipeline (build through quality gate): < 15 minutes
- Full pipeline through staging: < 30 minutes
- Production deploy (including smoke tests): < 15 minutes
Current state: PR pipeline runs approximately 20-25 minutes. Optimisation is a tracked technical debt item.
6. Testing Requirements and Coverage Targets
Coverage Targets
| Metric | Existing Services (Java) | New Services (Go / Phoenix) |
|---|---|---|
| Unit test coverage | 70% minimum | 80% minimum |
| Integration test coverage | 60% minimum | 70% minimum |
| End-to-end test coverage | Critical paths only | Critical paths + happy paths |
Coverage is enforced by SonarQube quality gates. A PR that decreases coverage below the threshold is blocked.
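SonarQube applies these thresholds through its quality gate configuration. To make the numbers concrete, the sketch below encodes the unit and integration minimums from the table; the `"existing"` and `"new"` keys and the `measured` dictionary are assumptions for this example and are unrelated to SonarQube's own settings.

```python
# Minimum coverage percentages from the Coverage Targets table.
THRESHOLDS = {
    "existing": {"unit": 70.0, "integration": 60.0},  # existing Java services
    "new": {"unit": 80.0, "integration": 70.0},       # new Go / Phoenix services
}


def coverage_failures(service_kind, measured):
    """Return a message for each metric below its minimum.

    `measured` maps metric name to percentage; a missing metric counts as 0.
    """
    failures = []
    for metric, minimum in THRESHOLDS[service_kind].items():
        value = measured.get(metric, 0.0)
        if value < minimum:
            failures.append(f"{metric}: {value:.1f}% < {minimum:.0f}%")
    return failures
```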
Testing Pyramid
        /‾‾‾‾‾‾‾‾‾‾‾‾‾\
       /   E2E Tests   \  ← Few, slow, high-confidence
      /   (SQA-owned)   \
     /───────────────────\
    /  Integration Tests  \  ← Moderate, test service boundaries
   /   (Developer-owned)   \
  /─────────────────────────\
 /        Unit Tests         \  ← Many, fast, developer-owned
/─────────────────────────────\
Testing Standards
- Unit tests: Written by the developer as part of the story. Committed in the same PR as the feature code. No PR without tests.
- Integration tests: Test service boundaries, database interactions, and external API contracts. Run in CI against ephemeral test infrastructure.
- End-to-end tests: Owned by SQA. Run against staging after deployment. Cover critical payment flows (pay-in, pay-out, settlement, reconciliation).
- Contract tests: Required for all inter-service APIs. Consumer-driven contracts where practical.
- Performance tests: Required for any change to transaction processing paths. Must not degrade P99 latency by more than 10%.
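The P99 rule in the performance-test item can be illustrated with a small comparison of two latency samples. This is a sketch under stated assumptions: it uses the nearest-rank percentile definition, takes latencies in milliseconds, and the function names are hypothetical. Real load-test tooling may compute percentiles differently.

```python
def p99(samples_ms):
    """Nearest-rank 99th percentile of a list of latencies."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


def degrades_p99(baseline_ms, candidate_ms, max_increase=0.10):
    """True if the candidate's P99 exceeds the baseline's by more than 10%."""
    return p99(candidate_ms) > p99(baseline_ms) * (1 + max_increase)
```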
Defect SLAs
| Severity | Definition | SLA | Action |
|---|---|---|---|
| Critical | Production payment processing blocked, data loss, security breach | Blocks deployment. Fix immediately. | All hands. War room. See Incident Response Playbook. |
| High | Degraded service, incorrect calculations, compliance gap | 3 business days | Pull into current work immediately. |
| Medium | User-facing bug, non-critical functionality broken | Next weekly planning | Prioritise in next refinement. |
| Low | Cosmetic, minor UX, non-blocking | Backlog | Fix when capacity allows. |
Critical defects halt all deployments until resolved. No exceptions.
7. Deployment Process and Cadence
Target Cadence
| Team | Target Frequency | Current State |
|---|---|---|
| Portal | Daily | 2-3 times per week |
| Pay-In | Daily | 2-3 times per week |
| Pay-Out | Daily | Weekly |
| DevOps/Infra | As needed | As needed |
Gap: Target is daily deployments for all delivery teams. Portal and Pay-In are close. Pay-Out deploys weekly due to settlement window constraints and additional verification requirements. The path to daily Pay-Out deployments requires decoupling settlement batch processing from deployment — this is a Phoenix programme workstream.
Deployment Process
- Merge to main — PR approved and merged.
- CI pipeline runs — Build, test, scan, quality gate.
- Staging deploy — Automatic on successful pipeline.
- SQA verification — Manual and automated tests on staging.
- Production deploy — Triggered manually by Team Lead or Senior Engineer.
- Smoke tests — Automated post-deploy verification.
- Monitor — 30-minute watch window after deploy. Deployer monitors dashboards.
Deployment Rules
- Deploy window: 09:00-16:00 local time (Karachi), Monday to Thursday. No Friday deploys unless critical. No weekend deploys unless P1 incident.
- Who can deploy: Team Leads and Senior Engineers. Junior engineers may deploy with a Senior Engineer observing.
- Rollback: If smoke tests fail or error rates spike above baseline + 5% within 30 minutes, roll back immediately. Do not debug in production.
- Feature flags: New user-facing features must be wrapped in feature flags. Deploy dark, then enable via flag. This allows instant rollback without redeployment.
- Database migrations: Run before application deployment. Must be backward-compatible (the old code must work with the new schema). Irreversible migrations require ARB approval.
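The deploy window and rollback rules above lend themselves to a small check. The sketch below makes three assumptions that the rules do not spell out: Karachi time is modelled as a fixed UTC+5 offset, the 16:00 boundary is treated as inclusive, and "baseline + 5%" is read as five percentage points above the baseline error rate. Function names are hypothetical, and the "critical" and P1 exceptions are not modelled.

```python
from datetime import datetime, time, timedelta, timezone

KARACHI = timezone(timedelta(hours=5))  # assumption: fixed UTC+5 offset
WATCH_WINDOW_MINUTES = 30
ERROR_MARGIN_PCT_POINTS = 5.0           # assumption: percentage points


def in_deploy_window(moment):
    """True for Monday to Thursday, 09:00-16:00 Karachi time.

    `moment` must be a timezone-aware datetime.
    """
    local = moment.astimezone(KARACHI)
    is_mon_to_thu = local.weekday() <= 3
    return is_mon_to_thu and time(9, 0) <= local.time() <= time(16, 0)


def should_roll_back(smoke_tests_passed, error_rate_pct, baseline_pct,
                     minutes_since_deploy):
    """Apply the rollback rule during the post-deploy watch window."""
    if not smoke_tests_passed:
        return True
    within_watch = minutes_since_deploy <= WATCH_WINDOW_MINUTES
    return within_watch and error_rate_pct > baseline_pct + ERROR_MARGIN_PCT_POINTS
```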
Current state: Feature flags are being rolled out. Not yet consistently used across all teams. Target: all user-facing features behind flags by end of Q3 2026.
8. Incident Response for Engineering
Engineering incident response follows the Incident Response Playbook (see Standards/INCIDENT-RESPONSE-PLAYBOOK.md).
Engineering-Specific Responsibilities
| Role | During Incident | After Incident |
|---|---|---|
| On-call engineer | First responder. Triage, assess severity, begin mitigation. | Contribute to Post-Incident Review. |
| Team Lead | Escalation point. Coordinate team response. Decide on rollback. | Own remediation actions. |
| CTO | Escalation for P1/P2. Cross-team coordination. | Review PIR. Approve systemic changes. |
| CDO | Notified for P1. Stakeholder communication. Regulatory impact assessment. | Sign off PIR. Track systemic improvements. |
Severity to Engineering Action
| Severity | Engineering Action |
|---|---|
| P1 (Critical) | All deployments halted. War room. All-hands until resolved. |
| P2 (High) | Affected team stops feature work. Fix immediately. |
| P3 (Medium) | Fix prioritised in next weekly planning. |
| P4 (Low) | Added to backlog. |
Post-Incident Review
Every P1 and P2 incident requires a blameless Post-Incident Review within 3 business days. The PIR follows the standard in Standards/STD-DEVEX-093-POST-INCIDENT-REVIEW-STANDARDS.md.
PIR outputs: timeline, root cause, contributing factors, remediation actions with owners and deadlines. Remediation actions are tracked in Jira with the label pir-action.
9. On-Call Rotation
Structure
Each delivery team (Pay-In, Pay-Out, Portal) maintains its own on-call rotation. DevOps/Infra provides a separate infrastructure on-call.
| Rotation | Coverage | Escalation |
|---|---|---|
| Pay-In on-call | 1 engineer, weekly rotation | Team Lead → CTO → CDO |
| Pay-Out on-call | 1 engineer, weekly rotation | Team Lead → CTO → CDO |
| Portal on-call | 1 engineer, weekly rotation | Team Lead → CTO → CDO |
| Infra on-call | 1 engineer, weekly rotation | Team Lead → CTO → CDO |
On-Call Rules
- Hours: 24/7 for production systems. Response time: 15 minutes for P1, 30 minutes for P2, next business day for P3/P4.
- Rotation length: 1 week, rotating through all eligible engineers (Senior Engineer and above).
- Handover: Friday end of day. Outgoing on-call briefs incoming on-call on any open issues.
- Compensation: On-call allowance per company policy. Incident response outside business hours compensated as overtime or time-in-lieu.
- No single points of failure: Each rotation must have a minimum of 3 engineers to prevent burnout. If a team cannot staff 3 engineers, the CTO escalates to the CDO.
- On-call engineer is not assigned work at full capacity. Reserve time for on-call duties (reflected in weekly planning by reducing WIP allocation).
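Until a scheduling tool is in place, the weekly rotation and the three-engineer minimum from the rules above can be sketched as follows. The function name and the sample names are hypothetical; this is not a PagerDuty integration.

```python
from itertools import cycle, islice

MIN_ROTATION_SIZE = 3                    # minimum engineers per rotation
RESPONSE_MINUTES = {"P1": 15, "P2": 30}  # P3/P4: next business day


def weekly_rotation(engineers, weeks):
    """Return the on-call engineer for each of the next `weeks` weeks.

    Raises ValueError when the rotation is understaffed, which under the
    rules above is the point at which the CTO escalates to the CDO.
    """
    if len(engineers) < MIN_ROTATION_SIZE:
        raise ValueError("rotation needs at least 3 eligible engineers")
    return list(islice(cycle(engineers), weeks))
```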
Current state: On-call rotations are informal. Pay-In and Pay-Out have de facto on-call engineers but no formal schedule. Target: formalised rotation with PagerDuty (or equivalent) by end of Q2 2026.
10. Technical Debt Management
Technical debt management follows STD-GOV-125 (Technical Debt Management). Key points for engineering:
Capacity Allocation
20% of engineering capacity is reserved for technical debt reduction. This is not optional. It is not a stretch goal. It is committed capacity.
In practical terms: 1 day per engineer per week is spent on debt reduction items.
Debt Tracking
- All technical debt items are logged in Jira with the label tech-debt.
- Each item is categorised: Code, Architecture, Dependency, Testing, Documentation, Infrastructure.
- Each item is scored for impact (1–5) and effort (1–5). High-impact, low-effort items are prioritised.
- The CTO reviews the debt register quarterly with the CDO (per W-01 Operating Rhythm).
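The scoring rule above says high-impact, low-effort items come first but does not fix a formula. One possible reading, shown purely as an assumption, ranks items by the impact-to-effort ratio and breaks ties in favour of higher impact. The tuple shape and ticket keys are hypothetical.

```python
def prioritise_debt(items):
    """Order debt items so high-impact, low-effort work comes first.

    `items` is a list of (ticket_id, impact, effort) tuples with both scores
    in the range 1-5. Ranking by impact / effort is an assumption; the
    standard may define a different formula.
    """
    for ticket_id, impact, effort in items:
        if not (1 <= impact <= 5 and 1 <= effort <= 5):
            raise ValueError(f"{ticket_id}: scores must be between 1 and 5")
    return sorted(items, key=lambda item: (-item[1] / item[2], -item[1]))
```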
Phoenix Programme
The largest single debt reduction effort is the Phoenix programme — rewriting legacy Spring Boot Java services in Go. This is tracked as a separate programme with its own milestones, not as ad-hoc tech debt.
- Legacy Java services remain in maintenance mode (critical bug fixes only).
- New features are built in Go.
- Migration follows a strangler fig pattern: new Go services sit behind the API gateway alongside legacy services, taking over routes incrementally.
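The strangler fig pattern described above comes down to a routing decision at the gateway: migrated route prefixes go to the new service, everything else stays on the legacy one. The sketch below illustrates that decision only. The route prefixes and backend labels are hypothetical, and the real gateway configuration will look different.

```python
def resolve_backend(path, migrated_prefixes):
    """Pick a backend for a request path.

    A path matches a migrated prefix when it equals the prefix or sits
    beneath it as a sub-path; all other paths stay on the legacy service.
    """
    for prefix in migrated_prefixes:
        base = prefix.rstrip("/")
        if path == base or path.startswith(base + "/"):
            return "go-service"
    return "legacy-java-service"
```

Taking over another route is then a matter of adding its prefix to `migrated_prefixes`, which keeps each migration step small and reversible.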
11. Branch Strategy
Branch strategy follows the Git Workflow & Branch Strategy standard (see Standards/GIT-WORKFLOW-STANDARD.md).
Summary
- Strategy: Trunk-based development with short-lived feature branches.
- Platform: Bitbucket.
- Main branch: main is always deployable. Protected: no direct pushes.
- Feature branches: feature/{ticket-id}-brief-description. Maximum lifetime: 3 days. A branch that lives longer than 3 days is too large and should be broken down.
- Bug fix branches: fix/{ticket-id}-brief-description.
- Release branches: release/v{major}.{minor}.{patch}, created only when a release needs stabilisation.
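The naming conventions above can be checked mechanically. The sketch below assumes a Jira-style ticket key (uppercase project key, hyphen, number) and a lowercase hyphenated description; both are assumptions, since the conventions do not define the exact ticket-id or description format.

```python
import re

# Assumption: ticket ids look like Jira keys, e.g. PAY-123.
TICKET = r"[A-Z][A-Z0-9]+-\d+"

BRANCH_PATTERNS = [
    # feature/{ticket-id}-brief-description and fix/{ticket-id}-brief-description
    re.compile(rf"^(feature|fix)/{TICKET}-[a-z0-9]+(-[a-z0-9]+)*$"),
    # release/v{major}.{minor}.{patch}
    re.compile(r"^release/v\d+\.\d+\.\d+$"),
]


def is_valid_branch(name):
    """True if `name` is main or follows one of the branch conventions."""
    return name == "main" or any(p.match(name) for p in BRANCH_PATTERNS)
```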
Branch Rules
- Rebase onto main before merging (no merge commits in feature branches).
- Squash commits on merge to main (one commit per story).
- Delete feature branches after merge.
- No long-lived branches apart from main and active release/* branches.
12. Engineering Metrics
We track the four DORA metrics plus Simpaisa-specific operational metrics.
DORA Metrics
| Metric | Definition | Current State | Target |
|---|---|---|---|
| Deployment Frequency | How often code is deployed to production | 2-3x/week (Portal, Pay-In), 1x/week (Pay-Out) | Daily (all teams) |
| Lead Time for Changes | Time from commit to production | ~3-5 days | < 1 day |
| Mean Time to Recovery (MTTR) | Time from incident detection to resolution | ~2-4 hours (estimated) | < 1 hour for P1 |
| Change Failure Rate | Percentage of deployments causing incidents | Not currently tracked | < 5% |
Gap: DORA metrics are not systematically measured today. Target: automated DORA metric collection via Jenkins + Jira integration by end of Q3 2026.
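To make the four definitions concrete ahead of any automated collection, the sketch below computes them from plain records. The dictionary keys (`committed_at`, `deployed_at`, `caused_incident`, `detected_at`, `resolved_at`) are hypothetical; how these records would be exported from Jenkins or Jira is not covered here.

```python
from datetime import datetime, timedelta
from statistics import mean


def _hours(delta):
    """Convert a timedelta to hours."""
    return delta.total_seconds() / 3600


def dora_summary(deployments, incidents, period_days):
    """Compute the four DORA metrics over a reporting period.

    Metrics with no underlying data are returned as None.
    """
    lead_times = [_hours(d["deployed_at"] - d["committed_at"]) for d in deployments]
    recoveries = [_hours(i["resolved_at"] - i["detected_at"]) for i in incidents]
    failures = sum(1 for d in deployments if d["caused_incident"])
    return {
        "deployments_per_day": len(deployments) / period_days,
        "lead_time_hours": mean(lead_times) if lead_times else None,
        "change_failure_rate": failures / len(deployments) if deployments else None,
        "mttr_hours": mean(recoveries) if recoveries else None,
    }
```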
Operational Metrics
| Metric | Definition | Tracked In | Reviewed |
|---|---|---|---|
| Throughput | Items completed per week per team | Jira | Fortnightly retrospective |
| Cycle time | Time from work started to work done | Jira | Fortnightly retrospective |
| PR review turnaround | Time from PR creation to first review | Bitbucket | Monthly by CTO |
| Build success rate | Percentage of CI builds that pass | Jenkins | Weekly by DevOps |
| Test coverage trend | Unit and integration coverage over time | SonarQube | Monthly by CTO |
| Tech debt ratio | Debt items created vs resolved per week | Jira | Quarterly (CTO + CDO) |
| Incident count by severity | Number of P1-P4 incidents per month | Incident tracker | Monthly (CTO + CDO) |
Metric Review Cadence
- Team level: Throughput and cycle time reviewed in fortnightly retrospectives.
- Monthly: CTO reviews PR turnaround, build success rate, coverage trends.
- Quarterly: CTO + CDO review DORA metrics, tech debt ratio, incident trends. Feed into quarterly technical debt review (per W-01).
13. Appendix: Quick Reference
What Needs What
| I want to... | I need... |
|---|---|
| Merge a PR | 2 approvals (1 peer + 1 lead/architect), all CI gates green |
| Deploy to production | Merged to main, staging verified by SQA, deploy window, Team Lead or Senior Engineer |
| Create a new service | ARB approval (STD-GOV-124) |
| Add a new dependency | Snyk scan clean, ARB approval if external service |
| Change a database schema | Backward-compatible migration, ARB approval if multi-service |
| Ship a user-facing feature | Feature flag, SQA sign-off, Product Manager acceptance |
| Skip tech debt allocation | You cannot. Escalate to CDO if pressured. |
| Deploy on Friday | You need CTO approval and a very good reason |
Key Documents
| Document | Location |
|---|---|
| Git Workflow & Branch Strategy | Standards/GIT-WORKFLOW-STANDARD.md |
| Incident Response Playbook | Standards/INCIDENT-RESPONSE-PLAYBOOK.md |
| Technical Debt Management | Standards/STD-GOV-125-TECHNICAL-DEBT-MANAGEMENT.md |
| ARB Charter | Standards/STD-GOV-124-ARCHITECTURE-REVIEW-BOARD-CHARTER.md |
| Post-Incident Review Standards | Standards/STD-DEVEX-093-POST-INCIDENT-REVIEW-STANDARDS.md |
| Service Level Objectives | Standards/STD-DEVEX-090-SERVICE-LEVEL-OBJECTIVES.md |
| Runbook Standards | Standards/STD-DEVEX-092-RUNBOOK-STANDARDS.md |
| Company Operating Rhythm | Standards/WaysOfWork/W-01-COMPANY-OPERATING-RHYTHM.md |
Compliance with This Document
This document describes how engineering at Simpaisa works today, with clearly marked targets where practice has not yet reached the standard. The CTO is accountable for adherence. The CDO reviews compliance quarterly.
Deviations from this document are permitted only with CTO approval (for tactical exceptions) or CDO approval (for structural changes). All exceptions are time-boxed and tracked.