Our Approach to Machine Learning: Art + Science

We founded Abridge with the belief that conversations between patients and doctors provide some of the most important insights into how people can live healthier lives. Extracting insights from those conversations has historically been challenging, but recent advances in machine learning give us an opportunity to bring more focus and understanding to people’s health.

To begin with, our technology identifies key points in a conversation so that people can quickly review the most important parts of their care. Over time, our goal is to keep researching and building new ways to help people understand their health and next steps.

Human conversation is our ground truth. And we have a lot of it — a one-of-a-kind, de-identified dataset containing over 10,000 hours of transcribed conversations from fully informed and consenting patients.

In collaboration with computer science professors at Carnegie Mellon University and clinical experts at the University of Pittsburgh Medical Center, we devised a system for data annotations based on the archetypal SOAP format for clinical documentation (Subjective, Objective, Assessment, and Plan). This annotation system covers the majority of information that both patients and clinicians are interested in understanding from their conversations. It also allows us to optimize data annotations for immediate information extraction and classification, as well as future challenges around dialogue understanding and visit summarization.
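To make the idea of SOAP-based annotation concrete, here is a minimal sketch in Python of what a labeled utterance could look like. It is an illustration only, not our production schema: the class names, the `speaker` field, and the sample sentences are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class SoapSection(Enum):
    """The four sections of the SOAP format for clinical documentation."""
    SUBJECTIVE = "Subjective"
    OBJECTIVE = "Objective"
    ASSESSMENT = "Assessment"
    PLAN = "Plan"


@dataclass(frozen=True)
class AnnotatedUtterance:
    """One utterance plus its SOAP label (hypothetical structure)."""
    speaker: str          # assumed field, e.g. "doctor" or "patient"
    text: str
    section: SoapSection


def count_by_section(utterances):
    """Count how many utterances carry each SOAP label."""
    return Counter(u.section for u in utterances)


# Made-up sentences, written for this example only.
example = [
    AnnotatedUtterance("patient", "My knee has been hurting for two weeks.", SoapSection.SUBJECTIVE),
    AnnotatedUtterance("doctor", "There is some swelling around the joint.", SoapSection.OBJECTIVE),
    AnnotatedUtterance("doctor", "This looks like a mild sprain.", SoapSection.ASSESSMENT),
    AnnotatedUtterance("doctor", "Rest it and come back in two weeks.", SoapSection.PLAN),
]

section_counts = count_by_section(example)
```

Labels of this kind can serve both as classification targets for individual utterances and as building blocks for a visit summary grouped by section.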

We use these proprietary datasets to help our users get the most from their conversations.

Prior research has found that reviewing simple audio recordings of health conversations can improve recall for patients and may lead to better health outcomes. However, it’s hard to play an entire audio file and find specific segments that are helpful to review. To solve this issue, our machine learning algorithms automatically create audio bookmarks that help people identify new medical terms and review the next steps in their care.
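As a rough illustration of what an audio bookmark is, the sketch below scans a timestamped transcript and records the moment each term of interest is mentioned. A plain keyword lookup stands in for the machine learning models here, and the `Segment` and `Bookmark` types, the term list, and the transcript are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A transcribed stretch of audio and the time at which it starts."""
    start_seconds: float
    text: str


@dataclass(frozen=True)
class Bookmark:
    """A pointer into the recording where a term of interest was spoken."""
    start_seconds: float
    term: str


# Assumed term list for the example; a real system would not rely on a fixed list.
MEDICAL_TERMS = ("hypertension", "lisinopril", "follow-up")


def find_bookmarks(segments, terms=MEDICAL_TERMS):
    """Return one bookmark per (segment, term) match, in transcript order."""
    bookmarks = []
    for segment in segments:
        lowered = segment.text.lower()
        for term in terms:
            if term in lowered:
                bookmarks.append(Bookmark(segment.start_seconds, term))
    return bookmarks


# Made-up transcript, written for this example only.
transcript = [
    Segment(0.0, "Good morning, how have you been feeling?"),
    Segment(42.5, "Your readings suggest hypertension."),
    Segment(61.0, "I'd like to start you on lisinopril and schedule a follow-up."),
]

bookmarks = find_bookmarks(transcript)
```

With bookmarks like these, a listener can jump straight to the 42.5 second mark instead of scrubbing through the whole recording.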

Some current areas of research:

Natural language processing

Can a machine process and extract medical information from a free-form conversation to form an understanding of intent and the related concepts in a doctor-patient dialogue? Research in dialogue modeling, information extraction, named-entity recognition, and summarization enables us to identify medical terms in a conversation and abridge long conversations into summaries for our users. (Interested in learning more? Read our latest publication in this space: Towards an Automated SOAP Note: Classifying Utterances from Medical Conversations.)

Human-computer interaction

Building an interactive machine learning system helps us get health care information right while acknowledging the imperfections inherent to the technology. Our users validate and augment our insights, and we use their input to continuously improve the system.

Cutting-edge privacy research

We started the company with a privacy-first mindset, which is why we used de-identified data from fully informed and consenting patients to train our systems. We are committed to embracing the latest technologies to protect people’s privacy and are investing deeply in federated learning and differential privacy research. We value the trust users place in us when we store their health conversations, and these systems will allow us to deliver differentiated, data-driven, and privacy-preserving user experiences into the future.

Above all, our research is in service of helping people achieve healthier outcomes. Learn more about our research and team here. We are just getting started and can’t wait to share more of our research and product updates soon!