We are generating data at unprecedented speeds and volumes. The Cancer Genome Atlas (TCGA), for example, which started out life as a three-year pilot project almost 10 years ago, has grown annually to include sequences from thousands of tumors. It is just one of the many genomics data repositories that have been established, each growing almost exponentially. Factor in clinical trials databases and electronic health records, to name just a few, and it is not surprising that we are almost drowning in a deluge of data.