When mathematicians Leland McInnes and John Healy arrived at their employer’s annual “Big Dig” in 2017 (a sort of classified hackathon at Canada’s counterpart to the National Security Agency), they were not thinking about biology at all. They wanted a way to quickly spot the differences between computer viruses.
They ended up creating a tool to simplify datasets and visualize the data points in them: an algorithm they named Uniform Manifold Approximation and Projection, or UMAP. They published a paper on it in 2018. To their great surprise, in fewer than five years it has become one of the most ubiquitous tools in modern biology research. UMAP has since been used in work ranging from forecasting rain in the Alps to identifying the many-hued pigments in a Gauguin painting to modeling how Covid-19 tweets spread. And, of course, scientists have applied UMAP to studying the virus itself. The technique is now the method of choice for most computational biologists who want to see what, exactly, is going on in a dataset.
“Almost every paper is going to have a UMAP in figure one,” said John Marioni, a group leader at the European Bioinformatics Institute and a faculty member at the Wellcome Sanger Institute. “I would say it’s almost become standard in the analysis. There are a few alternative visualizations, but, in general, the first figure in most papers, it’s going to be generated using the UMAP algorithm.”