Technology helps NIH bureau solve foodborne mysteries

For the National Institutes of Health’s National Library of Medicine, the move to cloud services offers a huge payoff from both an information sharing and a mission perspective.

A recent interagency collaboration showed the potential power of the cloud in addressing foodborne illnesses.

Ivor D’Souza, the chief information officer of the National Institutes of Health’s National Library of Medicine, said his office worked with several agencies to help solve a food safety problem in an innovative way.


“The NLM is in partnership with FDA, CDC, USDA and a few others in the health safety arena and we are collaborating on a project that uses whole genome sequencing for the surveillance of foodborne disease,” D’Souza said. “This partnership takes advantage of the NLM’s genome database of pathogens, which are microorganisms such as bacteria or viruses that cause disease. The FDA uploads pathogen genomes that they encounter during their routine inspection of food plants to our genome database. Then CDC uploads pathogen genomes from patients that have the disease. So at NLM now we have pathogen genomes from patients as well as from food plants and we are able to look for matches in the database to see if we can find the same strain that links patients with the food products and their manufacturers. Using this method, we’ve now been able to detect an offending source within two to three days, whereas the traditional 30-year-old approach that has been in place takes around two weeks to detect such relationships.”
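The matching step D’Souza describes can be illustrated with a toy sketch. It assumes each genome has already been aligned to the same length and that two isolates count as the same strain when they differ at no more than a small number of positions. The `Isolate` class, the `find_links` function, the threshold and the ten-letter sequences are all hypothetical stand-ins; the article does not describe the actual NLM pipeline, and production analysis of whole genomes is far more involved.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Isolate:
    isolate_id: str  # hypothetical identifier, not a real accession
    source: str      # "clinical" (patient sample) or "food" (plant inspection)
    sequence: str    # toy stand-in for an aligned genome


def snp_distance(a: str, b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def find_links(isolates, max_snps=1):
    """Pair every clinical isolate with every food isolate and keep close matches."""
    clinical = [i for i in isolates if i.source == "clinical"]
    food = [i for i in isolates if i.source == "food"]
    links = []
    for c, f in product(clinical, food):
        distance = snp_distance(c.sequence, f.sequence)
        if distance <= max_snps:  # assumed threshold, chosen for illustration only
            links.append((c.isolate_id, f.isolate_id, distance))
    return sorted(links, key=lambda link: link[2])


# Made-up data: one patient isolate is a near match for one food-plant isolate.
ISOLATES = [
    Isolate("patient-A", "clinical", "ACGTACGTAC"),
    Isolate("patient-B", "clinical", "TTGTACGGAC"),
    Isolate("plant-1", "food", "ACGTACGTAA"),
    Isolate("plant-2", "food", "GGGTTCGTAC"),
]
LINKS = find_links(ISOLATES)
print(LINKS)
```

In this made-up data set, only patient-A and plant-1 fall within the threshold, which is the kind of patient-to-plant link the partnership is looking for.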

While this partnership is still in its pilot stage, D’Souza said the interagency group has already tracked a multistate outbreak of listeria in certain cheese products.

“We have great expectations that when this initiative moves into full production, it will deliver the goods as we expect it to,” he said. “To understand why this initiative is so important, the CDC estimates that each year in the United States approximately 48 million people get some form of foodborne illness, 128,000 people are hospitalized and 3,000 die from foodborne illnesses. And with the continued globalization of food supply and distribution system, it’s more challenging than ever to ensure food safety.”

The potential use of the cloud for this project is just one of several ways the National Library of Medicine is looking to change its infrastructure.

NLM, like many big data agencies, faces a series of IT needs the cloud could help with, including the need for high volume storage, availability, scalability and speed.


“Our infrastructure, the way it’s built internally, is very much in the vein of a cloud, virtualized technology. We have many of the components internally that mimic what you would see in a traditional private sector cloud offering,” D’Souza said. “The things we are looking for are really opportunities to leverage the strength and diversity of private sector cloud offerings. That is what we are looking at. The areas that require thorough evaluation are two-fold: one is where we have a lot of interlinking between our various applications and information resources just because of the diversity of our applications, we want to find out what is the best fit to first move to the cloud, and then what can shortly follow thereafter. The second piece is to figure out whether there is enough of a marketplace to support something as extensive as ours in the National Library of Medicine.”

D’Souza said one of his biggest concerns about how NLM uses the cloud is vendor lock-in. He said because of the size of the library’s data holdings, he wants to make sure there is more than one cloud provider that can meet NLM’s needs.

“We do feel pretty confident that there are cloud solutions out there that can fit at least some of our needs,” he said. “We are actively looking at those things as we speak for various things, including library specific functions. I would imagine by the end of the calendar year, we will make at least a brief foray into the cloud space.”

Along with cloud, D’Souza said NLM is a big supporter of open data. The library has developed more than 290 application programming interfaces (APIs) to help make data access and sharing easier.

“All the API-fronted data sets are structured data sets. Even the data sets we receive from other parties, a large part of our time is spent curating data, annotating data, structuring data to make it easier for people to consume,” he said. “Then we take it to the next step and open it up through APIs to make it easier for computers on the far end to consume as well.”
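As an illustration of what consuming such an API looks like from “the far end,” here is a minimal sketch that queries the Entrez E-utilities search service, one publicly documented API hosted by NLM’s National Center for Biotechnology Information. The article does not say which of the library’s APIs D’Souza had in mind, so the choice of service, the search term and the helper names `build_search_url` and `search_ids` are assumptions made for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term, db="pubmed", max_results=5):
    """Build an ESearch URL that asks for a JSON response."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": max_results}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def search_ids(term, db="pubmed", max_results=5):
    """Run the search over the network and return the list of matching record IDs."""
    with urlopen(build_search_url(term, db, max_results), timeout=30) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


if __name__ == "__main__":
    # Requires network access; the IDs returned will change as the database grows.
    print(search_ids("listeria monocytogenes"))
```

Because the response is structured JSON, a program can pick out the ID list directly instead of scraping a web page, which is the point D’Souza makes about structuring data before exposing it.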

At the same time, D’Souza said the NLM is focusing heavily on securing data at the application layer. He said this effort takes into consideration three areas: secure tooling, best practice processes and the development of people.

NLM has invested in source code scanning tools used by software developers to identify security vulnerabilities early in the development stage.

“These source code scanning tools not only help us uncover security vulnerabilities early, but they seem to be a good platform for teaching software developers good coding practices in general. It not only shows them how to detect vulnerabilities but also take advantage of the libraries that are built into the software frameworks that our developers use to build applications,” he said. “We realized that tooling and security technology was not the only thing that will help us advance to the next stage of good application security, we realized early that we also needed to bring in some processes to make sure that technology is complemented equally with processes that will help us do things that technology couldn’t muster, such as business rules that existed in applications were things that humans knew but tools could not figure out.”
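A common example of what source code scanners flag, and of the built-in library feature that fixes it, is SQL injection. The sketch below uses Python’s standard `sqlite3` module purely for illustration; the article does not name the tools, languages or frameworks NLM uses, and the table, function names and input string are hypothetical.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Pattern a scanner would typically flag: user input concatenated into SQL.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Fix using the library's own feature: a parameterized query, where the
    # driver treats the input strictly as a value, never as SQL text.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


# Throwaway in-memory database with two made-up users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# Malicious input that rewrites the WHERE clause when concatenated.
malicious_input = "x' OR '1'='1"
UNSAFE_ROWS = find_user_unsafe(conn, malicious_input)
SAFE_ROWS = find_user_safe(conn, malicious_input)

print(len(UNSAFE_ROWS), "rows leaked by the concatenated query")
print(len(SAFE_ROWS), "rows returned by the parameterized query")
```

The concatenated query returns every user because the input changes the query’s logic, while the parameterized version finds no user by that literal name. A tool can spot the first pattern mechanically; whether a given user should be allowed to see a given record is the kind of business rule D’Souza says still needs human review.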

D’Souza said NLM implemented the secure software development lifecycle and the application security verification standard. He said the library decided on 46 security controls, which are given extra focus during the planning and early design phases.