Postdoc in Clinical Data Science, NLP

United States
Job Posted Date: July 5, 2021

A postdoctoral position is available in the Rudrapatna Laboratory within the Bakar Computational Health Sciences Institute at the University of California, San Francisco (UCSF). The research goal of the lab is to develop new methods for using electronic health records (EHR) and other large datasets to answer questions in the areas of real-world treatment effectiveness/safety, healthcare equity, and precision medicine. 

Although the group's primary clinical interest is Inflammatory Bowel Disease, the approaches being developed are quite general, and future projects will extend across the spectrum of diseases seen and treated at UCSF. Our methods span the subfields of clinical data science and include structured data informatics, machine learning, epidemiology, biostatistics, natural language processing, knowledge representation and reasoning, and deep learning. Although much of our effort to date has focused on the re-use of EHR data, we are increasingly interested in combining these data with other large clinical datasets such as administrative claims, census-level surveys, and other emerging, prospectively collected data (mHealth data, wearables).

This position is initially available for one year with a possible extension for up to two years based on performance evaluation.

The Rudrapatna Laboratory is embedded in the Bakar Institute at UCSF, a world-class health system and biomedical research university, and the top public recipient of NIH funding for the past 13 years straight. Postdoctoral fellows will work in a richly stimulating environment with ample access to expertise across domains, including epidemiology, biostatistics, data science, clinical informatics, and clinical areas across the full spectrum of medical and surgical specialties. Postdoctoral fellows will divide their time between a brand new, state-of-the-art building in the Mission Bay neighborhood of San Francisco and working remotely. The Bakar Institute features access to unique clinical data assets, including a de-identified extract of the complete EHR at UCSF, a machine-redacted extract of the complete corpus of clinical notes authored at UCSF (100M+), and a cross-campus database covering over 7 million patients to enable multi-center studies. Other computational resources include access to a high-performance computing cluster and GPUs for deep learning on clinical data.

Job Requirements: 

The successful candidate will work closely with the principal investigator and other members of a growing, multidisciplinary research team at the Bakar Institute as the primary driver of several funded research grants. The candidate will help assemble and consolidate data assets, collaborate closely with individuals developing gold-standard annotations for model training, perform modeling, and disseminate the results in the form of conference presentations and first-authored manuscripts.

The successful candidate is also expected to spend time developing an independent research program. Grant submission is an important part of the training, and the candidate will be expected to support PI submissions as well as initiate their own applications starting in the second year.


Essential Qualifications:

  • A PhD (or equivalent) in one of the following fields: epidemiology, computational biology, biostatistics, bioinformatics, data science, computational linguistics, computer science, or machine learning. MDs are welcome to apply if they have a strong background in one of the above fields.
  • A strong interest or background in clinical research and epidemiology
  • Experience in Python (preferred), R, or Julia. SQL is strongly recommended.
  • Excellent communication skills and a track record of peer-reviewed first-authored publications
  • A high degree of motivation and ability to operate independently

Desired Qualifications:

  • A background in natural language processing (NLP) and associated tasks, including text classification, information and relation extraction, and knowledge representation. A specific background in clinical NLP would be extremely valuable.
  • A background in clinical informatics, including knowledge of the OMOP common data model. Experience using clinical databases, especially the Epic EHR database backends (Clarity, Caboodle), would be highly valuable.
  • A background in causal inference
  • A background in deep learning

How to Apply:

Qualified candidates should email a statement of research interests, a curriculum vitae, and a list of three references as a single PDF to Dr. Vivek Rudrapatna at [email protected] with “Postdoc application” in the subject line.

San Francisco
Greater Bay Area