Research Intern — Coding LLMs

Tencent

Reposted 7 Hours Ago

In-Office

Singapore, SGP

Internship

Perform research on data-centric methods to improve coding LLMs: data filtering, quality assessment, deduplication, synthetic data generation, and evaluation. Build pipelines for code generation, editing, repo-level reasoning, tool use, and multi-step coding tasks. Run experiments on large-scale code corpora and agentic coding trajectories to analyze how data, models, and training strategies affect coding capabilities.

The summary above was generated by AI

Business Unit

Technology Engineering Group (TEG) supports the company and its business groups with technology and operational platforms, and builds and operates R&D management systems and data centers. TEG provides users with a full range of customer services. As the operator of the largest network, device, and data center infrastructure in Asia, TEG also leads the Tencent Technology Committee in strengthening infrastructure R&D through internal and distributed open-source collaboration, building new platforms, and supporting business innovation.

What the Role Entails

We are looking for research interns to work on foundational areas for coding language models, including pre-training data, mid-training data, synthetic data generation, evaluation, and agentic coding.

Responsibilities

* Explore data-centric methods for improving coding LLMs, including data filtering, quality assessment, deduplication, data mixture, and diversity analysis.
* Build synthetic data and evaluation pipelines for code generation, code editing, repo-level reasoning, tool use, and multi-step coding tasks.
* Run experiments to analyze how data, model, and training strategies affect coding capabilities.
* Work with large-scale code corpora, developer activity data, and agentic coding trajectories.

Who We Look For

* Strong programming skills in Python.
* Solid understanding of machine learning and large language models.
* Familiarity with LLM pre-training, mid-training, code models, data curation, evaluation, agents, or tool use.
* Strong experiment design, data analysis, and problem-solving skills.
* Interest in code intelligence, software engineering automation, and agentic coding.

Preferred Qualifications

* Experience with code data processing, GitHub-scale data, synthetic data, LLM evaluation, semantic deduplication, or agentic coding.
* Research experience, publications, or open-source projects in related areas are a plus.

What We Offer

* Access to large-scale real-world coding data and agentic trajectories.
* Rich compute resources and model APIs for fast research iteration.
* Opportunities to work on real-world coding model applications and the full model development loop.

Equal Employment Opportunity at Tencent

As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.
