Postdoc in Clinical Data Science, NLP

United States
Job Posted Date: August 6, 2020

A postdoctoral position is available in the Rudrapatna Laboratory within the Bakar Computational Health Sciences Institute at the University of California, San Francisco (UCSF). The research goal of the lab is to develop methods for the optimal repurposing of electronic health record (EHR) data to support multiple use cases (epidemiology, clinical decision making, healthcare reimbursement, and drug and device regulation), all under the umbrella of a field now known as real-world evidence. Although the primary clinical interest of the group is inflammatory bowel disease, the approaches being developed are quite general, and future projects will extend across the spectrum of diseases seen and treated at UCSF. Two methodological themes underlie the group’s aspirations: 1) the use of text mining techniques such as information extraction/retrieval and knowledge base generation to augment existing structured EHR data assets, and 2) the integration of retrospective and prospective study designs. This position is initially available for one year, with a possible extension for up to two years based on performance evaluation.

The Rudrapatna Laboratory is embedded in the Bakar Institute at UCSF, a world-class health system and biomedical research university, and the top public recipient of NIH funding for the past 13 years straight. Postdoctoral fellows will work in a richly stimulating environment with ample access to expertise across domains, including epidemiology, biostatistics, data science, clinical informatics, and clinical areas across the full spectrum of medical and surgical specialties. Fellows will divide their time between a brand-new, state-of-the-art building in the Mission Bay neighborhood of San Francisco and working remotely. The Bakar Institute features access to unique clinical data assets, including a de-identified extract of the complete EHR at UCSF, a machine-redacted extract of the complete corpus of clinical notes authored at UCSF (80M+), and a cross-campus database covering over 5 million patients to enable multi-center studies. Other computational resources include access to a high-performance computing cluster and GPUs for deep learning on clinical data.

Job Requirements: 

The successful candidate will work closely with the principal investigator and other members of a growing, multidisciplinary research team at the Bakar Institute as the primary driver of two research grants. The first aims to understand the real-world effectiveness of the drug ustekinumab as used to treat Crohn’s disease, and to directly compare the treatment effects seen in the UCSF population with those studied in prior and ongoing randomized controlled trials. The second aims to use both structured and free-text data from the EHR to identify undiagnosed patients with acute hepatic porphyria, a rare genetic cause of recurrent abdominal pain for which new treatments are now available. The candidate will help assemble and consolidate data assets, collaborate closely with individuals developing gold-standard annotations for model training, perform modeling, and disseminate the results in the form of conference presentations and first-authored manuscripts.

The successful candidate is also expected to spend time developing an independent research program. Grant submission is an important part of the training, and the candidate will be expected to support PI submissions as well as initiate his or her own applications starting in the second year.


Essential Qualifications:

  • A PhD (or equivalent) in one of the following fields: computational biology, biostatistics, epidemiology, bioinformatics, data science, computational linguistics, computer science, or machine learning. MDs are welcome to apply if they have a strong background in one of the above fields.
  • A strong interest or background in clinical research and epidemiology
  • Experience in Python (preferred), R, or Julia. SQL is strongly recommended.
  • Excellent communication skills and a track record of peer-reviewed first-authored publications
  • A high degree of motivation and ability to operate independently

Desired Qualifications:

  • A background in natural language processing (NLP) and associated tasks, including text classification, information and relation extraction, and knowledge representation. A specific background in clinical NLP would be extremely valuable.
  • A background in clinical informatics, including knowledge of the OMOP common data model. Experience using clinical databases, especially the Epic EHR database backends (Clarity, Caboodle), would be highly valuable.
  • A background in causal inference
  • A background in deep learning

How to Apply:

Qualified candidates should email a statement of research interests, a curriculum vitae, and a list of three references as a single PDF to Dr. Vivek Rudrapatna at [email protected] with “Postdoc application” in the subject line.

San Francisco
Greater Bay Area