Support the reliability and security of Tencent's systems in a cloud environment, manage incidents, develop automation scripts, and collaborate on infrastructure optimization and documentation.
What the Role Entails
You will support the reliability, scalability, and security of Tencent’s business-critical systems in a cloud-native environment:
System Monitoring & Incident Response
a. Monitor production systems using tools like Prometheus/Grafana; identify and troubleshoot outages.
b. Participate in on-call rotations to resolve real-time incidents (with mentor guidance).
Automation & DevOps Practices
a. Develop scripts (Python/Shell) to automate deployment, scaling, and recovery tasks.
b. Assist in CI/CD pipeline optimization using GitLab, Docker, and Kubernetes.
Infrastructure Optimization
a. Analyze system performance metrics; propose solutions to enhance reliability and cost efficiency.
b. Support cloud infrastructure management (Tencent Cloud/AWS/Azure).
Collaboration & Documentation
a. Work with cross-functional teams (Dev, Data, Security) to design SLOs/SLIs for critical services.
b. Document system configurations, runbooks, and post-incident reports.
Who We Look For
Currently pursuing a PhD or Master’s in Computer Science, AI, Machine Learning, or related fields
Strong background in deep learning and machine learning fundamentals
Solid programming skills in Python and PyTorch/JAX
- Bachelor’s/Master’s in Computer Science, IT, or related fields (2026 graduation).
OS: Linux/Unix system administration.
Scripting: Python, Shell, or Go.
Networking: TCP/IP, DNS, HTTP basics.
Core Competencies:
- Analytical problem-solving and passion for infrastructure technologies.
- Ability to learn quickly in a fast-paced environment.
- Bilingual fluency in English and Chinese (written and verbal) to work with both HQ and international stakeholders.
- Basic Mandarin communication skills are required to collaborate with China-based teams and access internal resources.
- Experience with cloud platforms (Tencent Cloud, AWS, or Azure).
- Familiarity with IaC tools (Terraform, Ansible) or observability stacks (ELK, Prometheus).
- Knowledge of containerization (Docker/Kubernetes).
Experience with at least one of:
- Vision–language models
- Large language models
- Video understanding/generation
- Reinforcement learning or imitation learning
- Strong problem-solving and research skills
Publications at top conferences (CVPR, ICCV, NeurIPS, ICLR, ACL, etc.)
Experience training large models or working with distributed systems
Experience with multimodal datasets and evaluation benchmarks
Familiarity with:
- Transformer architectures and scaling laws
- Multimodal alignment (contrastive learning, instruction tuning)
- Agent training (RLHF/RLAIF, planning, tool use)
- Synthetic data generation or simulation environments
- Experience with long-context training or memory mechanisms
Equal Employment Opportunity at Tencent
As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.


