Site Reliability Engineer

Eindhoven - On-site

Keywords

Reliability Engineering, Automation, Continuous Integration, Distributed Computing, Amazon Web Services, Teaching, UNIX, Clojure, Cloud Computing, Computer Programming, Databases, Data Centers, Debugging, Linux, Elasticsearch, Incident Response, Engineering, Entrepreneur, Scalability, Python, Health Monitoring, Software Architecture, Redis, Process Automation, Prometheus, Software Development, Systems Programming, Grafana, Kubernetes, Low-latency, Apache Kafka, Kibana, Terraform

Description

About this role:

The site reliability engineer has many responsibilities: helping the team design a platform that works reliably and with low latency across multiple data centers; helping the team design and implement software that covers most of the capabilities of the software architecture; designing and running tests to verify these capabilities; supporting the team in recovering quickly from outages; implementing automation tools for CI/CD pipelines; and helping the team develop good practices around monitoring and incident response.

Who we are looking for:

We are looking for a site reliability engineer who is excited by the opportunity to contribute to the growth of the Choreograph Create platform. The site reliability engineer must have hands-on experience debugging both automated and human processes, and experience working in both software engineering and automation. The engineer enjoys teaching and practicing site reliability concepts with team members, can find a balance in all things, and has experience managing stateful distributed systems.

Role requirements:
  • Degree in Computer Science (or equivalent);
  • 5+ years of experience (or equivalent) in the field of site reliability and programming;
  • Designing and implementing software that improves stability, scalability, availability, and latency; and designing, building, and running tests to verify this;
  • Setting up system health monitoring and automated processes to prevent outages;
  • Defining corrective actions and supporting quick recovery from actual outages;
  • Implementing automation tools for continuous integration/delivery/deployment;
  • Helping the team to develop good practices around monitoring and response;
  • Extra points for:
    • Ability to program in one or more high-level languages (such as Python, Go, or Clojure), with a proven track record in automation and an algorithmic approach to solving problems.
    • In-depth knowledge and experience in at least one of: troubleshooting, host-based networking, Linux or UNIX engineering, systems programming, distributed systems, databases, cloud computing, and a desire to learn more.
    • Experience with one or more of the following: Terraform, AWS, Kubernetes, Helm, Prometheus, Grafana, Elasticsearch, Kibana, Redis, Kafka.


Success attributes:
  • High energy and passion for the job;
  • Motivated, self-starter, self-reliant, resilient, and ambitious;
  • Comfortable and thriving in a fast-paced, entrepreneurial, start-up environment.


Darwin Recruitment is acting as an Employment Agency in relation to this vacancy.
Start
04/2024
From
Darwin Recruitment
Posted
12.03.2024
Project ID:
2727491
Contract type
Permanent position