Deployment & Release Management
Standard ID: DEPLOYMENT | Version: 1.0 | Effective: 2026-04-03 | Owner: CDO
1. Environments
| Environment | Purpose | Deployment | Access |
|---|---|---|---|
| Sandbox | Merchant integration testing | On push to feature branch | External (merchants) |
| Dev | Internal development and smoke testing | Auto on merge to main | Internal only |
| Test | QA, regression, performance testing | On release candidate tag | Internal only |
| Prod | Live traffic — PK, BD, NP, IQ, EG | Promoted from Test after approval | External (merchants + consumers) |
2. Deployment Strategies
| Strategy | When to Use |
|---|---|
| Blue/green | Default for stateless Go services behind KrakenD |
| Canary | High-risk changes, new payment flows, gateway integrations |
| Rolling | Infrastructure components, configuration changes |
Blue/Green Process
- Deploy new version to inactive (green) environment.
- Run health checks and synthetic transactions against green.
- Switch KrakenD routing from blue to green.
- Monitor error rates for 5 minutes.
- If healthy, decommission blue. If not, switch back immediately.
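
The blue/green steps above can be sketched as a small orchestration routine. This is only an illustration under stated assumptions: the `BlueGreen` struct and its four hooks are hypothetical names, and how each hook talks to KrakenD or the monitoring stack is left to the pipeline.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnhealthy is a sentinel a hook may return when a probe fails.
var ErrUnhealthy = errors.New("unhealthy")

// BlueGreen holds hypothetical hooks for each step of the process.
// None of these names come from the standard itself.
type BlueGreen struct {
	Check    func(color string) error // health checks and synthetic transactions
	SwitchTo func(color string) error // repoint KrakenD routing to the given colour
	Monitor  func(color string) error // watch error rates for the 5-minute window
	Retire   func(color string) error // decommission the given colour
}

// Cutover assumes the new version is already deployed to green.
func (bg BlueGreen) Cutover() error {
	// Verify green before it receives any live traffic.
	if err := bg.Check("green"); err != nil {
		return fmt.Errorf("green failed pre-switch checks, traffic stays on blue: %w", err)
	}
	if err := bg.SwitchTo("green"); err != nil {
		return fmt.Errorf("switching to green: %w", err)
	}
	// If monitoring reports a problem, switch back immediately and keep blue.
	if err := bg.Monitor("green"); err != nil {
		if rbErr := bg.SwitchTo("blue"); rbErr != nil {
			return fmt.Errorf("monitoring failed (%v) and switch back failed: %w", err, rbErr)
		}
		return fmt.Errorf("switched back to blue: %w", err)
	}
	// Healthy: blue is no longer needed.
	return bg.Retire("blue")
}
```

Blue is retired only on the success path, so a failed monitoring window always leaves a known-good environment to return to.
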
Canary Process
- Route 5% of traffic to canary.
- Monitor error rate, latency, and success rate for 15 minutes.
- Promote to 25%, then 50%, then 100% at 15-minute intervals.
- Roll back automatically if the error rate exceeds baseline by more than 1%.
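
As a rough sketch of the canary progression, the loop below walks the traffic stages and stops at the first unhealthy one. It rests on two assumptions that are not in the standard: the 1% threshold is read as one percentage point above baseline (it could also mean a relative increase), and `observe` is a hypothetical callback that routes the given share of traffic to the canary, waits out the 15-minute window, and returns the observed error rate as a fraction.

```go
package main

// CanaryStages lists the traffic percentages from the process above.
var CanaryStages = []int{5, 25, 50, 100}

// MaxErrorRateIncrease is the rollback threshold, interpreted here
// (as an assumption) as one percentage point above baseline.
const MaxErrorRateIncrease = 0.01

// RunCanary steps through the stages. It returns the last stage that was
// healthy and whether a rollback was triggered. Error rates are fractions
// between 0 and 1.
func RunCanary(baseline float64, observe func(pct int) float64) (lastHealthy int, rolledBack bool) {
	for _, pct := range CanaryStages {
		if observe(pct) > baseline+MaxErrorRateIncrease {
			// Threshold breached: stop promoting and roll back.
			return lastHealthy, true
		}
		lastHealthy = pct
	}
	return lastHealthy, false
}
```

A healthy run ends with `lastHealthy` at 100 and `rolledBack` false; a breach at any stage reports the previous stage, or 0 if the first stage fails.
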
3. Zero-Downtime Requirement
Zero-downtime deployment is mandatory for all payment services. There are no maintenance windows.
- Application code must handle graceful shutdown (drain in-flight requests).
- Database migrations must be backward-compatible (see DATABASE-SCHEMA-CHANGE-STANDARD).
- KrakenD configuration changes applied via hot-reload, not restart.
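
For the graceful-shutdown requirement, a minimal sketch using Go's standard `net/http` server is shown below. The function name `ServeUntilStopped`, its parameters, and the use of a `stop` channel are illustrative assumptions; the standard does not prescribe a drain timeout.

```go
package main

import (
	"context"
	"errors"
	"net/http"
	"time"
)

// ServeUntilStopped runs an HTTP server on addr until stop is closed, then
// gives in-flight requests up to drain to finish before returning.
func ServeUntilStopped(addr string, handler http.Handler, stop <-chan struct{}, drain time.Duration) error {
	srv := &http.Server{Addr: addr, Handler: handler}

	serveErr := make(chan error, 1)
	go func() { serveErr <- srv.ListenAndServe() }()

	select {
	case err := <-serveErr:
		// The server failed to start or stopped unexpectedly.
		return err
	case <-stop:
	}

	ctx, cancel := context.WithTimeout(context.Background(), drain)
	defer cancel()

	// Shutdown stops accepting new connections and waits for active
	// requests to complete, or for ctx to expire.
	if err := srv.Shutdown(ctx); err != nil {
		return err
	}
	// After Shutdown, ListenAndServe returns http.ErrServerClosed.
	if err := <-serveErr; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}
```

In a real service the `stop` channel would typically be closed when the process receives SIGTERM, for example via `signal.NotifyContext` from `os/signal`, and the drain timeout would be set below the orchestrator's termination grace period.
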
4. Rollback
- Automated rollback: triggered if health checks fail within 5 minutes post-deployment.
- Manual rollback: available via pipeline for up to 1 hour post-deployment.
- Rollback plan: documented in every PR for production deployments.
- Database rollback: see DATABASE-SCHEMA-CHANGE-STANDARD for migration rollback procedures.
5. Feature Flags
Use PostHog for feature flag management.
| Flag Type | Use Case |
|---|---|
| Percentage rollout | Gradual feature release (5% → 25% → 100%) |
| Per-merchant targeting | Enable features for specific merchants |
| Kill switch | Instantly disable a feature in production |
Rules:
- All new merchant-facing features launch behind a feature flag.
- Flags must be removed within 30 days of full rollout.
- Flag naming: {product}-{feature} (e.g., payin-webhook-v2).
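
The naming and clean-up rules lend themselves to a small lint check, sketched below. The regular expression is an assumption inferred from the single example `payin-webhook-v2` (lowercase alphanumeric segments joined by hyphens, with at least a product and a feature segment); the standard does not define the allowed character set, and the function names are hypothetical.

```go
package main

import "regexp"

// flagNamePattern is an assumed reading of {product}-{feature}:
// a lowercase product segment followed by one or more hyphenated segments.
var flagNamePattern = regexp.MustCompile(`^[a-z][a-z0-9]*(-[a-z0-9]+)+$`)

// MaxFlagAgeDays is the removal deadline after full rollout.
const MaxFlagAgeDays = 30

// ValidFlagName reports whether name matches the assumed naming pattern.
func ValidFlagName(name string) bool {
	return flagNamePattern.MatchString(name)
}

// FlagOverdue reports whether a flag has outlived the 30-day removal window.
func FlagOverdue(daysSinceFullRollout int) bool {
	return daysSinceFullRollout > MaxFlagAgeDays
}
```

A check like this could run in CI against the list of flag keys, failing the build on names that do not match or on flags that are overdue for removal.
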
6. Release Cadence
| Environment | Cadence |
|---|---|
| Dev | Continuous (every merge to main) |
| Test | Continuous (every release candidate) |
| Prod | Weekly release train (Tuesday 10:00 UTC) or on-demand for critical fixes |
Emergency releases may bypass the weekly train with CDO approval.
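
To make the weekly train concrete, the sketch below computes the next Tuesday 10:00 UTC slot at or after a given time. The function names are hypothetical, and treating a timestamp of exactly 10:00 on a Tuesday as belonging to that day's train is an assumption.

```go
package main

import "time"

// utc is a small convenience for building UTC timestamps in examples.
func utc(year int, month time.Month, day, hour, minute int) time.Time {
	return time.Date(year, month, day, hour, minute, 0, 0, time.UTC)
}

// NextReleaseTrain returns the first Tuesday 10:00 UTC at or after now.
func NextReleaseTrain(now time.Time) time.Time {
	now = now.UTC()
	// Days until the coming Tuesday; 0 when today is Tuesday.
	daysAhead := (int(time.Tuesday) - int(now.Weekday()) + 7) % 7
	// time.Date normalises day overflow past the end of the month.
	train := time.Date(now.Year(), now.Month(), now.Day()+daysAhead, 10, 0, 0, 0, time.UTC)
	if train.Before(now) {
		// Today's train has already left; take next week's.
		train = train.AddDate(0, 0, 7)
	}
	return train
}
```

For example, `NextReleaseTrain(utc(2026, 4, 3, 12, 0))`, a Friday, yields 2026-04-07 10:00 UTC.
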
7. Change Management
| Category | Approval | Example |
|---|---|---|
| Standard | Auto-approved by CI | Dependency updates, lint fixes, documentation |
| Significant | CDO review required | New features, API changes, config changes |
| Emergency | Deploy immediately, post-hoc review within 24h | P1 incident fix, security patch |
8. Pre-Deployment Checklist
Before promoting to Prod:
- All tests pass (unit, integration, contract).
- Security scan — no high/critical findings.
- OpenAPI spec linted and published.
- No open P1 bugs against this release.
- Rollback plan documented in PR.
- Database migrations tested on anonymised Prod clone.
- Feature flags configured for gradual rollout.
- Monitoring dashboards and alerts verified.
9. Post-Deployment Verification
Within 5 minutes of production deployment:
- Health checks — all service endpoints return 200.
- Synthetic transactions — automated test payments through each product (Pay-In, Pay-Out).
- Error rate — must not exceed pre-deployment baseline by more than 0.5%.
- Latency — P95 latency must not increase by more than 10%.
- OpenTelemetry traces — verify traces flow end-to-end through KrakenD to services.
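
The two numeric gates above can be expressed as a simple comparison against the pre-deployment baseline. This is a sketch under assumptions: the error-rate limit is read as 0.5 percentage points (an absolute increase), the latency limit as a 10% relative increase, and the `Metrics` type and `VerifyDeployment` function are hypothetical names.

```go
package main

import "fmt"

// Metrics is a hypothetical snapshot of service health.
type Metrics struct {
	ErrorRate float64 // fraction of failed requests, 0 to 1
	P95Millis float64 // P95 latency in milliseconds
}

const (
	maxErrorRateIncrease = 0.005 // assumed: 0.5 percentage points
	maxP95Increase       = 0.10  // 10% relative increase
)

// VerifyDeployment returns one message per breached threshold.
// An empty result means both gates passed.
func VerifyDeployment(baseline, current Metrics) []string {
	var violations []string
	if current.ErrorRate > baseline.ErrorRate+maxErrorRateIncrease {
		violations = append(violations, fmt.Sprintf(
			"error rate %.4f exceeds baseline %.4f by more than %.3f",
			current.ErrorRate, baseline.ErrorRate, maxErrorRateIncrease))
	}
	if current.P95Millis > baseline.P95Millis*(1+maxP95Increase) {
		violations = append(violations, fmt.Sprintf(
			"P95 latency %.0fms exceeds baseline %.0fms by more than 10%%",
			current.P95Millis, baseline.P95Millis))
	}
	return violations
}
```

Health checks, synthetic transactions, and trace verification are not covered here; they would need their own probes.
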
10. Database Migrations
- Applied before application deployment.
- Must be forward-only and backward-compatible.
- See DATABASE-SCHEMA-CHANGE-STANDARD for full process.
11. Artefact Management
- Container images tagged with: {service}:{git-sha}-v{semver} (e.g., payin-api:a1b2c3d-v2.3.0).
- Images stored in private container registry.
- Images are immutable — never overwrite a tag.
- Retention: keep last 20 images per service, plus all release-tagged images.
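
The tag format and retention rule might be sketched as follows. Several details are assumptions: the `Image` type and its `Release` field are hypothetical (the standard does not say how a release-tagged image is identified), the input is assumed to be one service's images ordered newest first, and release-tagged images inside the most recent 20 are assumed to count toward that 20.

```go
package main

import "fmt"

// KeepRecent is the number of most recent images retained per service.
const KeepRecent = 20

// Image is a hypothetical registry entry for one service.
type Image struct {
	Tag     string
	Release bool // assumed marker for release-tagged images
}

// ImageTag builds a tag in the form {service}:{git-sha}-v{semver}.
func ImageTag(service, gitSHA, semver string) string {
	return fmt.Sprintf("%s:%s-v%s", service, gitSHA, semver)
}

// Prunable returns the images that fall outside the retention rule.
// newestFirst must hold a single service's images, newest first.
func Prunable(newestFirst []Image) []Image {
	var out []Image
	for i, img := range newestFirst {
		// Older than the most recent 20 and not release-tagged.
		if i >= KeepRecent && !img.Release {
			out = append(out, img)
		}
	}
	return out
}
```

`ImageTag("payin-api", "a1b2c3d", "2.3.0")` reproduces the example tag from the list above. Because tags are immutable, pruning deletes images and never rewrites a tag.
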
12. Notifications
| Event | Channel |
|---|---|
| Deployment started | Slack #deployments |
| Deployment succeeded | Slack #deployments |
| Deployment failed | Slack #deployments + #incidents |
| Rollback triggered | Slack #incidents + page on-call |