2025 Summer Data Engineering Intern - Prescient Design (Roche)
Job Description
Department Summary:
The Large Language Model (LLM) team within Prescient Design is seeking undergraduate interns with experience in software and data engineering, as well as a keen interest in LLMs and biomedical data.
One of Prescient’s new, innovative efforts is developing state-of-the-art LLMs for scientific discovery and biomedical applications. We envisage LLMs for use across the drug discovery and development pipeline, with applications ranging from scientific document classification to conversational models and multimodal learning of complex data types, including biological sequences and high-resolution microscopy images.
As part of your internship, you'll work on projects that build out LLM data infrastructure, improve data accessibility by contributing to our information retrieval platform, and process data at large scale.
This internship position is located in New York City, on-site.
Key Responsibilities:
Develop systems to improve data infrastructure, with a focus on data quality and reliability.
Design and implement data pipelines that efficiently process, transform, and consolidate large-scale data from multiple sources.
Create user-friendly interfaces and tools that enable researchers to easily query, discover, and interact with complex data repositories.
Program Highlights
Intensive 12-week, full-time (40 hours per week) paid internship.
Program start dates are in May/June (Summer).
A stipend, based on location, will be provided to help alleviate costs associated with the internship.
Ownership of challenging and impactful business-critical projects.
Work with some of the most talented people in the biotechnology industry.
Who You Are
Required Education: Must be pursuing a Bachelor's Degree (enrolled student).
Required Majors: Computer Science, Engineering, or related fields
Skills and qualifications:
Strong programming skills: proficiency in Python; ability to write, optimize, debug, and test production-ready code; and previous software engineering work experience.
Familiarity with data integration, ETL processes, and large-scale data processing techniques.
Experience with or strong interest in LLMs, information retrieval, and data engineering is a huge plus!
Relocation benefits are not available for this job posting.
The expected salary range for this position based on the primary location of New York City is $45 hourly. Actual pay will be determined based on experience, qualifications, geographic location, and other job-related factors permitted by law. This position also qualifies for paid holiday time off benefits.
#GNE-R&D-Interns-2025
Genentech is an equal opportunity employer, and we embrace the increasingly diverse world around us. Genentech prohibits unlawful discrimination based on race, color, religion, gender, sexual orientation, gender identity or expression, national origin or ancestry, age, disability, marital status and veteran status.