Successful candidates will join an on-site team dedicated to maintaining and supporting a managed cross-domain service utilising a broad spectrum of technologies, platforms, and tools. The team leverages site reliability engineering (SRE) tools and practices to continuously verify and enhance the service. Additionally, some code development is required, providing opportunities to learn and support Rust.

Responsibilities

Problem Diagnosis and Debugging
- Interpret dashboards and system logs to diagnose complex issues, especially when integrated with external data feeds.
- Use tools like Wireshark to analyze incoming data and debug system operations.
- Create representative XML data to replicate error scenarios.
- Develop and run Java end-to-end (E2E) and performance tests to ensure optimal system performance.

Support and Troubleshooting
- Collaborate with development teams to ensure smooth integration of new releases into production environments.
- Provide detailed failure scenario descriptions to enable the remote development team to reproduce and fix issues.
- Conduct root cause analysis and proactive problem-solving, with the authority to deploy changes as needed.

Verification and Monitoring
- Work with the platform team to manage OpenShift system resources and networking, identifying and resolving bottlenecks or network issues.
- Track performance and availability metrics of deployed services using Influx and Grafana.
- Configure automated alerts to detect problems before they escalate into incidents.
- Review application logs and respond to changes in system behavior as they occur.

Build and Deploy Code from Multiple Project Teams
- Maintain and administer a CI pipeline that builds artifacts using Java and Maven.
- Configure and execute component and service acceptance test suites using Maven.
- Deploy and configure tested services using Terraform and Ansible, targeting platforms like OpenShift, RHEL/CentOS, and Docker.
- Configure and deploy third-party appliances and software services.

Business-as-Usual Maintenance
- Utilize automation tools and techniques to minimize manual work.
- Perform PostgreSQL database housekeeping.
- Conduct OS-level health checks and patching.
- Generate and manage system SSL certificates.

Key Skills
- Proficiency in Java Spring Boot microservice development.
- Experience with OpenShift or Kubernetes.
- Familiarity with asynchronous messaging platforms like AMQP.
- Knowledge of infrastructure-as-code tools such as Terraform and Ansible.
- Experience with S3 object storage tools and techniques.
- Familiarity with RDBMS platforms such as Oracle.
- Understanding of XML/XSD.
- Strong critical thinking and problem-solving abilities.
- Effective communication and interpersonal skills.
- Ability to quickly prioritize tasks and adapt to changing priorities during incident response.
- Experience with Git version control.

Desirable Skills
- Experience with Atlassian tools, including Bamboo.
- Additional expertise with infrastructure-as-code tools: Terraform, Ansible, and Ansible Vault.
- Understanding of Docker and containerization.