Tencent

Research Intern — Coding LLMs

Posted 11 Hours Ago
In-Office
Singapore, SGP
Internship
Perform research on data-centric methods to improve coding LLMs: data filtering, quality assessment, deduplication, synthetic data generation, and evaluation. Build pipelines for code generation, editing, repo-level reasoning, tool use, and multi-step coding tasks. Run experiments on large-scale code corpora and agentic coding trajectories to analyze how data, models, and training strategies affect coding capabilities.
The summary above was generated by AI
Business Unit

Technology Engineering Group (TEG) is responsible for supporting the company and its business groups on technology and operational platforms, as well as for the construction and operation of R&D management and data centers. TEG provides users with a full range of customer services. As the operator of the largest network, devices, and data centers in Asia, TEG also leads the Tencent Technology Committee in strengthening infrastructure R&D through internal and distributed open source collaboration, constructing new platforms, and supporting business innovation.

What the Role Entails

We are looking for research interns to work on foundational areas for coding language models, including pre-training data, mid-training data, synthetic data generation, evaluation, and agentic coding.

Responsibilities

* Explore data-centric methods for improving coding LLMs, including data filtering, quality assessment, deduplication, data mixture, and diversity analysis.
* Build synthetic data and evaluation pipelines for code generation, code editing, repo-level reasoning, tool use, and multi-step coding tasks.
* Run experiments to analyze how data, model, and training strategies affect coding capabilities.
* Work with large-scale code corpora, developer activity data, and agentic coding trajectories.

Who We Look For

* Strong programming skills in Python.
* Solid understanding of machine learning and large language models.
* Familiarity with LLM pre-training, mid-training, code models, data curation, evaluation, agents, or tool use.
* Strong experiment design, data analysis, and problem-solving skills.
* Interest in code intelligence, software engineering automation, and agentic coding.

Preferred Qualifications

* Experience with code data processing, GitHub-scale data, synthetic data, LLM evaluation, semantic deduplication, or agentic coding.
* Research experience, publications, or open-source projects in related areas are a plus.

What We Offer

* Access to large-scale real-world coding data and agentic trajectories.
* Rich compute resources and model APIs for fast research iteration.
* Opportunities to work on real-world coding model applications and the full model development loop.

Equal Employment Opportunity at Tencent

As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.
