Objectives of this role / About the job:
This role is all about collaborating across disciplines to test hypotheses and increase product stability and robustness for customers. As a site reliability engineer, you’ll work in a squad alongside software engineers, QA engineers, product managers, and more. Together you’ll build and support a particular part of FINNOMENA.
We have built a great service for digital financial products. Now we need to scale it and make sure it is rock solid. You will help us build out scalable infrastructure, maintain control over the cloud platform, and support the services that allow us to handle millions of events without any loss in the quality of the customer experience, even during crazy peaks. We are the ideal match for applicants who seek autonomy, mastery, and purpose in their professional environment and who are excited to create a massive impact for millions of Thai people with us by unlocking their investment potential through cutting-edge innovations.
Responsibilities:
- As a Site Reliability Engineer, you will combine software and infrastructure engineering to build and run large-scale, distributed, fault-tolerant systems
- Participate as a stakeholder in product roadmap planning, sprint planning, and standups
- You will be responsible for the implementation of scalable and resilient infrastructure
- Scale systems sustainably and increase development velocity through automation
- Build and improve the performance of the software integration and deployment automation that delivers products to our cloud platform
- Optimize logging, monitoring and alerting systems to ensure the highest availability and uptime
- Investigate and solve ambiguous underlying technical problems across different programming languages
- Troubleshoot networking, compute, and Kubernetes failures
- Practice sustainable incident response: perform root cause analysis, resolve incidents, and write precise, blameless postmortems
- Harden security all around
- Continuously improve the way the team operates, and ensure everyone is happy and efficient
- Maintain existing services and tools, augmenting and replacing as required
- Research, design, develop, and adopt tools that improve infrastructure reliability
- Document systems, particularly tribal knowledge
- Share your knowledge and experience with others in the squad
- Respond to emergencies outside working hours
Preferred qualifications:
- 1 year of demonstrated software infrastructure experience as a site reliability engineer / DevOps engineer / DevSecOps engineer / software engineer.
- Infrastructure and application security engineering experience or understanding of information security is a plus.
- You have an understanding of basic networking.
- You have basic knowledge in Linux monitoring, troubleshooting, and administration.
- Competence in at least one programming language (ideally VueJS, Golang, or NodeJS). Must be able to write code and evaluate it for scalability and runtime.
- Experience with container orchestration platforms such as Kubernetes
- Experience working with at least one DBMS (e.g. MSSQL, MySQL)
- Experience with monitoring, APM, and logging tooling (e.g. EFK, Grafana) is a plus
- Experience with configuration management tools (e.g. Helm) is a plus
- Experience with Infrastructure-as-Code tools such as Terraform is a plus
- Experience with multi-cluster Kubernetes is a plus