Research — Building fair, trustworthy AI for healthcare

Patient-Facing Chatbots

Large language models (LLMs) are increasingly being explored as a way to improve patient communication by simplifying and explaining complex medical reports, such as X-ray findings and hospital discharge summaries. We focus on designing and evaluating AI systems that generate accurate, understandable, and clinically safe explanations for patients.

Care-LLM

Care-LLM: Prompting and Evaluation of a Patient-Facing Chatbot

We develop a patient-facing chatbot powered by large language models (LLMs) to assist with understanding radiology reports and discharge summaries. Our system is designed to translate complex medical information into accessible, patient-friendly language, while maintaining accuracy and clinical relevance. We focus on both the prompting strategies needed to generate clear, supportive responses, and a rigorous evaluation framework to ensure quality, safety, and trustworthiness in patient communications.

Annotation Paper | Dataset Resource

SAFRAN

SAFRAN: Evaluation Framework for Factual Accuracy, Clinical Relevance, and Safety of Discharge Summaries Generated by LLMs

Hospital discharge is a vulnerable time for patients, and confusion around instructions can lead to poor outcomes and costly readmissions. Large language models (LLMs) could help answer patient questions at scale, but their outputs often contain factual errors, omissions, or misrepresented certainty, all of which pose safety risks. Current evaluation methods miss these issues, and expert review is not scalable. No public dataset exists for benchmarking factual consistency in LLM responses to discharge questions. We introduce SAFRAN, a framework for evaluating and improving LLM reliability. We curate a dataset of real discharge summaries and synthetic variants that simulate common LLM errors. Using GPT-4 and a validated error taxonomy, we generate structured comparative evaluations to train a compact model that performs expert-level assessments at scale. We release the SAFRAN dataset to enable benchmarking, reproducibility, and the development of safer patient-facing tools such as discharge chatbots.
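As a rough illustration of what "synthetic variants simulating common LLM errors" could look like, the sketch below perturbs a toy discharge instruction in three ways that mirror the error types named above (omission, factual error, misrepresented certainty). It is not the SAFRAN pipeline: the `Variant` class, the three helper functions, the label strings, and the example text are all hypothetical and invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Variant:
    error_type: str   # label from an assumed three-way taxonomy (hypothetical)
    sentences: list   # the perturbed summary, one sentence per item


def drop_sentence(sentences, index):
    """Simulate an omission by removing one sentence."""
    return [s for i, s in enumerate(sentences) if i != index]


def scale_first_number(sentences, index, factor=2):
    """Simulate a factual error by multiplying the first number in a sentence."""
    changed = re.sub(
        r"\d+",
        lambda m: str(int(m.group()) * factor),
        sentences[index],
        count=1,
    )
    return sentences[:index] + [changed] + sentences[index + 1:]


def remove_hedge(sentences, index):
    """Simulate misrepresented certainty by turning 'may' into 'will'."""
    changed = re.sub(r"\bmay\b", "will", sentences[index])
    return sentences[:index] + [changed] + sentences[index + 1:]


# Toy text written for this example; it is not taken from any real record.
summary = [
    "Take metoprolol 25 mg twice daily.",
    "You may feel dizzy when standing up.",
    "Call the clinic if you have chest pain.",
]

variants = [
    Variant("omission", drop_sentence(summary, 2)),
    Variant("factual_error", scale_first_number(summary, 0)),
    Variant("misrepresented_certainty", remove_hedge(summary, 1)),
]

for v in variants:
    print(v.error_type, "->", " ".join(v.sentences))
```

Pairing each labeled variant with the untouched original gives the kind of comparative example an evaluator model could be trained on, assuming the labels are then checked against a validated taxonomy.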

Training Approach

Multi-Modal Modeling Across Healthcare

Clinical care generates diverse forms of data, including pathology slides, imaging, laboratory values, and clinical documentation. Each modality provides unique and complementary information about a patient's condition. Our work focuses on developing multi-modal approaches that leverage the full spectrum of medical data to deliver precise, robust, and clinically actionable insights across diagnosis, prognosis, and treatment decision-making.

VISTA: Pathology Foundational Models

VISTA: Foundational Pathology Models to Support Tumor Board Decision-Making

Pathology slides contain extremely detailed and diagnostically rich information, but this detail is embedded within vast, high-resolution images where critical features are often sparse and subtle. In VISTA, we develop foundational models trained on pathology slides to address this challenge, learning to identify, contextualize, and summarize fine-grained pathological findings. Our models aim to enhance tumor board decision-making by delivering robust, interpretable insights that support diagnosis, treatment planning, and disease monitoring.

Annotation Paper | Dataset Resource

AutoGKB: Automated Pharmacogenomic Annotation

AutoGKB: Automated Pharmacogenomic Annotation through Agentic Systems

Pharmacogenomic knowledge is critical for advancing precision medicine, linking genetic variation to drug response to guide safer and more effective treatments. However, building high-quality pharmacogenomic datasets is challenging due to the complexity of clinical language, the variability in how genetic and drug information are reported, and the difficulty of extracting structured associations from free-text scientific articles. In this project, we develop curated annotations from PharmGKB-relevant literature, focusing on extracting gene-drug-phenotype relationships with precision and consistency to support downstream modeling and discovery.
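To make "structured associations" concrete, here is a minimal sketch of how a gene-drug-phenotype annotation might be represented and normalized so that differently worded reports of the same association collapse into one record. The `Annotation` fields and the `normalize` rules are assumptions made for illustration; they are not the AutoGKB schema or the PharmGKB data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One extracted association (hypothetical three-field schema)."""
    gene: str
    drug: str
    phenotype: str


def normalize(gene, drug, phenotype):
    """Apply simple, assumed normalization rules before comparison.

    Uppercasing gene symbols is a simplification; a real system would
    map mentions to a controlled vocabulary instead.
    """
    return Annotation(
        gene=gene.strip().upper(),
        drug=drug.strip().lower(),
        phenotype=" ".join(phenotype.lower().split()),
    )


# Two illustrative mentions of the same association, written differently.
raw = [
    ("CYP2C19", "Clopidogrel", "reduced  response"),
    (" cyp2c19 ", "clopidogrel", "Reduced response"),
]

# Frozen dataclasses are hashable, so a set removes exact duplicates.
unique = sorted(
    {normalize(*r) for r in raw},
    key=lambda a: (a.gene, a.drug, a.phenotype),
)

for a in unique:
    print(a.gene, "|", a.drug, "|", a.phenotype)
```

Keeping the record immutable and hashable makes deduplication and agreement checks between annotators a plain set operation.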

Training Approach

Advancing Dermatological Research

We work to advance dermatological research by building new datasets, developing predictive models, and designing clinical tools that improve patient care. Our efforts span outcomes prediction, biomarker discovery, imaging analysis, and patient-centered studies. Recognizing the challenges of limited diversity in existing dermatology resources, we prioritize inclusive approaches that capture the full range of skin types and clinical presentations. Our goal is to advance research and patient care across the spectrum of dermatologic diseases.

DeepDerm: Public Pathology

DeepDerm: A Public Pathology Dataset for Melanoma Outcomes and Biomarker Prediction

Accurate prediction of melanoma outcomes and biomarker status from pathology slides is critical for advancing precision oncology. DeepDerm provides a publicly available, curated dataset of melanoma histopathology images linked to clinical outcomes and biomarker information. By making this resource available, we aim to accelerate research on prognostic modeling, biomarker discovery, and the development of AI systems that can improve melanoma diagnosis and treatment planning.

Annotation Paper | Dataset Resource

HS COSMOS

HS Cosmos: Investigating Healthcare Disparities and Patient Perspectives in Hidradenitis Suppurativa Leveraging EPIC COSMOS

Hidradenitis suppurativa (HS) is a chronic, debilitating skin disease often associated with significant healthcare disparities and unmet patient needs. Through HS Cosmos, we leverage the large-scale EPIC COSMOS electronic health record network to investigate patterns of care, disparities in diagnosis and treatment, and patient-reported outcomes. Our goal is to generate a comprehensive, patient-centered understanding of HS to inform more equitable and effective care strategies.

Training Approach