The Research Intern will conduct research on multimodal foundation models and agents, implement models, design architectures, and collaborate on projects and publications.
What the Role Entails
We are seeking highly motivated Research Interns to work on cutting-edge problems in multimodal foundation models and multimodal agents.
The intern will contribute to advancing models that understand, generate, and act across multiple modalities (e.g., vision, language, video, audio, and GUI environments), with applications in embodied intelligence, computer-use agents, and world models.
You will collaborate closely with a team of researchers and engineers to design novel algorithms, build large-scale training pipelines, and publish at top venues (e.g., CVPR, ICCV, NeurIPS, ICLR, ACL).
Responsibilities
- Conduct original research on multimodal foundation models or multimodal agents
- Implement and experiment with large-scale models (training, fine-tuning, evaluation)
- Design new model architectures, objectives, or data pipelines
- Work with large multimodal datasets (image, video, text, UI trajectories, etc.)
- Contribute to papers, technical reports, and open-source projects
- Collaborate with cross-functional teams on research prototypes
Who We Look For
Currently pursuing a PhD or Master’s in Computer Science, AI, Machine Learning, or related fields
Strong background in deep learning and machine learning fundamentals
Solid programming skills in Python and PyTorch/JAX
Experience with at least one of:
- Vision–language models
- Large language models
- Video understanding/generation
- Reinforcement learning or imitation learning
- Strong problem-solving and research skills
Publications at top conferences (CVPR, ICCV, NeurIPS, ICLR, ACL, etc.)
Experience training large models or working with distributed systems
Experience with multimodal datasets and evaluation benchmarks
Familiarity with:
- Transformer architectures and scaling laws
- Multimodal alignment (contrastive learning, instruction tuning)
- Agent training (RLHF/RLAIF, planning, tool use)
- Synthetic data generation or simulation environments
- Experience with long-context training or memory mechanisms
Equal Employment Opportunity at Tencent
As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.

