Operations Lead Engineer (#537)
Tokyo
Full-time, Permanent
Insurance
Job description
The Operations Lead Engineer is a professional engineer responsible for leading the operation, release management, and automation of business applications.
This role ensures stable daily operations under defined standards while proactively driving improvements in CI/CD processes, automation, monitoring, and overall system reliability.
Key Responsibilities
1. Business Application Operations & Maintenance
- Manage and support day-to-day operations and maintenance of business applications.
- Handle incident response (L1/L2), troubleshooting, and root cause analysis, and propose permanent corrective actions.
- Plan and execute change management and release activities using tools such as:
  - Jenkins
  - Control-M
  - GitHub
2. CI/CD & Automation Implementation
- Design, implement, and improve CI/CD pipelines using:
  - Jenkins
  - Jira
  - GitHub Actions
- Automate manual operational tasks, including:
  - Deployment activities
  - Data extraction
  - Batch job execution
  - Operational workflows
3. Monitoring & Observability Improvement
- Design and maintain monitoring, logging, and alerting solutions using tools such as:
  - Prometheus
  - Grafana
  - Dynatrace
  - CloudWatch
  - Splunk
- Optimize alerting and drive improvements that reduce MTTR (Mean Time To Recovery).
- Analyze system performance and operational trends.
4. Infrastructure & Platform Operations
- Support application operations and configuration changes on:
  - AWS
  - OpenShift
  - Managed public IaaS platforms
- Collaborate with infrastructure teams on:
  - Network configuration (DNS, load balancers, firewalls)
  - Certificate management and renewal
  - Platform-related activities
5. Knowledge Management & Team Contribution
- Create and maintain operational documentation:
  - Runbooks
  - KEDB (Known Error Database)
  - Wiki documentation
- Participate in Agile practices, including:
  - Scrum
  - Kanban
  - Sprint planning
  - Task management using Jira
- Provide technical support, mentoring, and knowledge sharing within the team.
6. Stakeholder Collaboration
- Collaborate with product and development teams to improve:
  - Release processes
  - Application quality
  - System performance
- Communicate effectively with global teams in Japanese and English through:
  - Meetings
  - Email
  - Chat
Position Highlights
Business Impact Through Operations
This position provides an opportunity to directly contribute to business outcomes through production operations.
Rather than only maintaining systems, the engineer will improve release processes, automation, monitoring, and reliability to deliver business functions faster and more securely.
Exposure to Modern Cloud & DevOps Technologies
The role provides hands-on experience with modern technology stacks including:
- AWS
- OpenShift
- CI/CD tools:
  - Jenkins
  - GitHub
  - SonarQube
  - Artifactory
- Monitoring tools:
  - Prometheus
  - Grafana
  - Dynatrace
  - CloudWatch
  - Splunk
Global Collaboration Environment
You will work with:
- International engineers
- External partners
- Global AXA teams
The role involves regular communication in both Japanese and English.
Continuous Improvement Culture
The team encourages engineers to identify problems, suggest improvements, and automate processes instead of only following existing procedures.
Career Growth Opportunity
An opportunity to grow into future Expert or Manager roles by:
- Leading technical tasks
- Supporting team members
- Driving operational improvements
Required Skills & Experience
Must Have
- 3+ years of experience in one or more of the following areas:
  - IT Operations
  - SRE
  - DevOps
Technical Skills
- Linux (RHEL-based) or Windows Server administration experience
- Experience with CI/CD tools such as Jenkins
- Practical experience with Git:
  - Branch management
  - Pull requests
  - Source control practices
- Automation scripting experience in any of:
  - Shell
  - Python
  - Groovy
  - PowerShell
  - Node.js
- Basic knowledge of or experience with:
  - AWS
  - Docker
  - Kubernetes
  - OpenShift
- Experience with monitoring/log analysis tools:
  - Prometheus
  - Grafana
  - CloudWatch
  - Splunk
  - Dynatrace
Process & Development Practices
- Understanding of:
  - CI/CD concepts
  - Git workflows
  - Software engineering best practices
- Experience working in Agile environments:
  - Scrum
  - Kanban
- Experience managing tasks using Jira or similar tools.
- Ability to create and maintain:
  - Technical documents
  - Runbooks
  - Operational procedures
Nice to Have
- Experience in financial or insurance industry system operations.
- Experience with job schedulers such as Control-M.
- Experience with Infrastructure as Code tools such as Terraform.
- Knowledge of SRE practices:
  - SLOs
  - Error budgets
  - Post-mortems
- Knowledge of additional cloud platforms:
  - Azure
  - GCP
- Experience working with global teams or vendors.
Language requirement
Working hours