It’s typical in an election year to see an administration spend money on new initiatives. A new cost cutting initiative unveiled back in March has generally gone un-noticed by the main stream technology media. Called the “Big Data Research and Development Initiative” the program is focused on improving the U.S. Federal governments ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help solve some the Nation’s most pressing challenges.
The program includes several federal agencies including NSF, HHS/NIH, DOE, DOD, DARPA and USGS who pledge more than $200 million in new commitments that they promise will greatly improve the tools and techniques needed to access, organize, and glean discoveries from huge volumes of digital data.
In a statement Dr. John P. Holdren, Assistant to the President and Director of the White House Office of Science and Technology Policy said “In the same way that past Federal investments in information-technology R&D led to dramatic advances in supercomputing and the creation of the Internet, the initiative we are launching today promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security.”
One of the more interesting aspects of this project is the use of public cloud infrastructure, as in cloud computing services provided by the private industry. Confusing I know. A great example of this plan in action is The National Institutes of Health who announced that the world’s largest set of data on human genetic variation – produced by the international 1000 Genomes Project – is now freely available on the Amazon Web Services (AWS) cloud. At 200 terabytes – the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs – the current 1000 Genomes Project data set is a prime example of big data, where data sets become so massive that few researchers have the computing power to make best use of them. AWS is storing the 1000 Genomes Project as a publicly available data set for free and researchers only will pay for the computing services that they use.
According to a recent article on genengnews.com this is part of a larger strategy to reduce the number of federal data centers from the current 3,133 data centers sliced by “at least 1,200” by 2015, representing a roughly 40% cutback at a $5 billion savings. This also extends the work started with the administration’s Cloud First Policy outlined last year as part of The White House’s Federal Cloud Computing Strategy.
In a world that is more dependant on data then ever before the stakes are high and so is the money. It will be interesting to follow this initiative over the coming months.