Production Engineer
Job Title: Production Engineer
Department: Production Engineering
Reports to: Senior Production Engineer
Role Overview:
We are seeking a Production Engineer to join Simpaisa Holdings, a cross-border payments and remittances company operating across the Middle East and South Asia. The successful candidate will be responsible for deploying, maintaining, monitoring, and supporting the production payment processing systems and infrastructure, contributing to the organisation's 99.99% uptime targets. The role covers observability, incident response, infrastructure automation, and on-call support for mission-critical payment systems. Strong expertise in cloud infrastructure (AWS), CI/CD practices, and production operations is essential. Experience with agile methodologies and with collaborating alongside development and security teams is also desirable.
Key Responsibilities:
- Deploy, configure, and maintain infrastructure components of payment processing services, including servers, containers, databases, message queues, and cloud services (AWS).
- Monitor system performance using observability tools (metrics, logs, traces), identify potential issues, and proactively take steps to prevent service disruptions to payment processing.
- Troubleshoot and resolve production incidents related to payment processing systems in a timely and effective manner, adhering to Service Level Objectives (SLOs) and escalation procedures.
- Perform system upgrades, patching, and other maintenance tasks according to established schedules and change management procedures.
- Implement and maintain CI/CD pipelines (Bitbucket Pipelines) and infrastructure-as-code (Terraform) for automated, repeatable deployments.
- Implement and maintain security and compliance controls in production environments, aligned with PCI-DSS and ISO 27001 requirements.
- Document system configurations, runbooks, and troubleshooting procedures in a clear and comprehensive manner.
- Participate in on-call rotations to provide after-hours support for critical payment processing incidents.
- Collaborate with development, security, and data teams to resolve complex technical challenges and improve service delivery.
- Contribute to post-incident reviews and drive follow-up actions to prevent recurrence.
- Follow established service management processes and procedures, including incident management, problem management, and change management.
- Continuously learn and stay up to date with cloud technologies and site reliability engineering (SRE) best practices.
Required Skills and Experience:
- Agile: Awareness of agile principles and how production engineering supports agile development teams.
- Communication: Good written and verbal communication skills with the ability to articulate technical issues and solutions clearly to both technical and non-technical audiences.
- Strategy and Planning: Ability to understand and follow deployment plans, runbooks, and maintenance schedules. Strong organisational skills for managing tasks and priorities.
- Leadership and Influence: Ability to take ownership of production tasks and contribute to the overall stability and availability of payment processing systems.
- Problem-Solving and Analytical Skills: Strong problem-solving and troubleshooting skills to diagnose and resolve production issues effectively, particularly under incident pressure.
- Production Engineering Expertise: Solid understanding of cloud infrastructure (AWS), container orchestration (Kubernetes, ECS), CI/CD practices, and monitoring/observability tools (Datadog, Prometheus, Grafana, ELK). Familiarity with infrastructure-as-code (Terraform) and scripting languages (Python, Bash). Awareness of ITIL or SRE frameworks.
- Teamwork and Collaboration: Ability to work effectively in a collaborative team environment across geographically distributed teams.
General Requirements for the Role:
- Bachelor's degree in a related field: A bachelor's degree in Information Systems, Computer Science, Engineering, or a closely related STEM field is required.
- 3+ years of experience in SRE, DevOps, or production operations: Minimum of 3 years of progressive experience in deploying, maintaining, and supporting production systems and infrastructure.
- Experience with cloud platforms and CI/CD tools: Demonstrated experience in using cloud infrastructure (AWS), CI/CD pipelines, and monitoring tools.
- Proven track record of maintaining stable and reliable services: A verifiable history of contributing to the stable and reliable operation of production systems.
Benefits and Perks:
- Competitive salary and comprehensive benefits package.
- Opportunity to work with cutting-edge payments and fintech infrastructure and collaborate with skilled professionals across multiple markets.
- Professional development and training opportunities, including cloud certification sponsorship.
- Inclusive company culture that values diversity and innovation.