The National Institutes of Health recently deployed a connection to the Internet2 network. It’s part of how some NIH agencies are being forced to think differently to cope with the explosion of data over the last decade.
Alastair Thomson, chief information officer of the National Heart, Lung, and Blood Institute (NHLBI) at NIH, said his agency has both high-performance computing needs and more typical requirements.
But the amount of data that his agency is using and sharing is putting a strain on the network.
“We are not just dealing with the normal performance needs an organization has. It ranges from ‘I just need to store two or three petabytes of data’ to ‘I need to compute on hundreds of terabytes on a computer that has 20,000 CPUs in it,’” Thomson said. “It makes for some interesting challenges, and we are still working out how to do some of that.”
Thomson said his agency can’t afford to store, manage or even process that amount of data without outside help.
“We have been working with both Amazon and Microsoft on models for doing that,” he said. “At the same time, we are increasingly seeing cloud as being just a good place to host applications. We moved our public website to Amazon about two years ago, and we are just about to move it into Microsoft Azure because their environment is more compelling for us at the moment, which presents some interesting challenges in terms of portability. Because we developed the site using good open source technologies like Drupal, moving it is actually easy. You have to think about those things before you move to the cloud, otherwise you will end up locked into particular vendors.”
Thomson said part of the discussion also focuses on disaster recovery, taking advantage of Azure’s government cloud and integrating applications across both the Microsoft cloud and NIH’s internal data centers.
“That’s a longer-term project; we hope over the next six months to start running some smaller applications in the Microsoft cloud,” he said.
Thomson said the cloud offers one of the best solutions to the growing data problem, because in a traditional storage and computing setup, the more data pushed through the network, the more computing power is needed. High-performance systems talk to many CPUs at once, he said, and because the work is done in parallel, NHLBI needs storage fast enough to keep up with the processing.
“We’ve been doing experimentation in both Azure and Amazon with spinning up clusters of many, many compute nodes and seeing how we can move data in and out of there in an effective manner,” he said. “It could be challenging: if you have a constant workload that is consistently there, then the cloud model is more expensive than building something locally. But a lot of what we do is bursting, where you need to do an analysis on a massive amount of data, so you spin up 40,000 nodes for a couple of days and then shut them down again. Cloud looks like a very compelling environment for that.”
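The economics Thomson describes can be sketched with back-of-the-envelope arithmetic. The sketch below is purely illustrative: the per-node-hour price, hardware cost, amortization period and burst schedule are all assumptions, not NIH or vendor figures.

```python
# Hypothetical cost comparison between owning a steady on-premises cluster
# and "bursting" on-demand in the cloud. All prices are illustrative
# assumptions, not NIH or vendor figures.

def cloud_burst_cost(nodes, hours_per_burst, bursts_per_year, price_per_node_hour):
    """Annual cost of spinning nodes up only for short analysis bursts."""
    return nodes * hours_per_burst * bursts_per_year * price_per_node_hour

def on_prem_cost(nodes, capex_per_node, amortization_years, opex_per_node_year):
    """Annualized cost of owning the same capacity year-round."""
    return nodes * (capex_per_node / amortization_years + opex_per_node_year)

# Bursty workload: 40,000 nodes for ~2 days (48 hours), four times a year,
# at an assumed $0.50 per node-hour.
burst = cloud_burst_cost(nodes=40_000, hours_per_burst=48,
                         bursts_per_year=4, price_per_node_hour=0.50)

# Owning 40,000 nodes at an assumed $5,000 each, amortized over 4 years,
# plus $500 per node per year to power, cool and staff them.
owned = on_prem_cost(nodes=40_000, capex_per_node=5_000,
                     amortization_years=4, opex_per_node_year=500)

print(f"cloud bursting: ${burst:,.0f}/year")   # $3,840,000/year
print(f"on-premises:    ${owned:,.0f}/year")   # $70,000,000/year
```

Under these assumed numbers the bursty cloud model is more than an order of magnitude cheaper; run the same nodes in the cloud around the clock, though, and the per-hour pricing flips the comparison, which is the tradeoff Thomson describes.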
NIH recently deployed the Internet2 infrastructure, opening the door to more speed and computing power.
“It’s a critical resource for us. We are following the Department of Energy’s model of a science DMZ [demilitarized zone], which allows for the free flow of data using different compensating controls besides firewalls so things can move at high speed,” Thomson said. “There are interesting challenges in securing it. I have been working with the NIH CIO to make that work effectively for us. We have instruments and computer resources in the science DMZ now so we can collaborate with scientists across the country.”
Thomson said using Internet2 means moving large amounts of data across the network in a few hours instead of a few days.
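The hours-versus-days claim follows from simple bandwidth arithmetic. The sketch below uses an assumed 100 TB dataset and an assumed 80% link efficiency; these are illustrative figures, not measured NIH numbers.

```python
# Rough transfer-time arithmetic. Dataset size, link speeds and efficiency
# are illustrative assumptions, not measured NIH figures.

def transfer_hours(terabytes, gigabits_per_second, efficiency=0.8):
    """Hours to move a dataset at a given link speed.

    efficiency accounts for protocol overhead and shared links.
    """
    bits = terabytes * 8e12                        # 1 TB = 8e12 bits
    usable = gigabits_per_second * 1e9 * efficiency
    return bits / usable / 3600

dataset_tb = 100  # hypothetical 100 TB dataset

# A commodity 1 Gbps link vs. Internet2-class 10 and 100 Gbps links.
for gbps in (1, 10, 100):
    hours = transfer_hours(dataset_tb, gbps)
    print(f"{dataset_tb} TB at {gbps:>3} Gbps: {hours:6.1f} hours")
# 1 Gbps  -> ~278 hours (about 11.5 days)
# 10 Gbps -> ~28 hours
# 100 Gbps -> ~2.8 hours
```

At the assumed sizes, a transfer that takes the better part of two weeks on a 1 Gbps link shrinks to an afternoon at 100 Gbps, which is the difference Thomson is pointing at.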
“It enables science that we’ve never been able to do before,” he said. “That’s really why the investment was made, to enable the science. We don’t do this stuff for fun. It’s really about giving the researchers the tools to innovate.”
And only through that kind of innovation will NIH tackle some of the deadliest diseases facing humankind today.