Anyone collecting data needs a place to put it. Harvard geneticist George Church felt that need acutely in the early days of his Personal Genome Project: It was the early 2000s, and he had the audacious goal of sequencing some 100,000 human genomes — each 25,000 times the size of a traditional electronic record. But though his vision was ripe, the infrastructure to store and manipulate these titanic data sets wasn’t.
Church commissioned Alexander Wait Zaranek, a computer science researcher in his lab, to scope out the tools available to work through such large data sets. When none were available, Zaranek and his Church lab colleagues Ward Vandewege and Tom Clegg began building their own. And so, Arvados was born.
Arvados is a content management system for large genomic data sets. Just as blogging platforms like WordPress let journalists and writers upload their content (text, videos, images) and work with it, Arvados lets researchers and clinicians import genetic data files. Within the system, they can run a variety of analyses or share the data.
The first generation of Arvados was activated in 2007 to service the Personal Genome Project. By 2013, its founders had spun off a company, Curoverse, and in December 2013 it announced $1.7 million in seed funding to develop the software.
In the 10 years since the Personal Genome Project was conceived, the effort to use genetic data to inform medicine has exploded internationally. In the next year, researchers are expected to generate 85 petabytes of sequencing data from research subjects and patients. “That translates to about 21 million HD movies,” Curoverse chief executive Adam Berrey said.
With its Arvados software, Curoverse hopes to be the invisible infrastructure powering such analyses in research labs and clinics over the next decade.
So far, the system has been accessible by invitation only; Johns Hopkins University, Harvard Medical School, and the Wellcome Trust Sanger Institute (which is storing 20 petabytes of data) are among the early adopters. Starting Tuesday, any group can sign up to use the system, which can be accessed through a website. Curoverse also sells the system on hardware that can be installed on-site for a fee. The company is preparing for a commercial release this summer.
Image via Flickr user Dave Fayram