EPA embracing data analytics from daily operations to chemical investigations

The Environmental Protection Agency has become more sophisticated in gathering and analyzing data. Its chief data scientist, Robin Thottungal, said that although the agency’s day-to-day operations are largely regulatory or tied to emergency response, decisions are often based on a substantial amount of data.

“So what I have been doing as a chief data scientist for the agency is trying to understand how I can create a much more, better climate for the data scientists or the folks who are using data to drive decisions to get access to the tools and technologies to make their analysis much more easier to perform,” Thottungal said.

Right now, that means building a data science platform in the cloud. With that, Thottungal said different program offices can share data subsets for analysis more easily.

Luckily for Thottungal, he spends the bulk of his time within the office of the chief information officer, where he is surrounded by IT staff and those who run the EPA’s data center. He described this group as “folks who are running the data center, folks who are thinking about data standards, folks who are thinking about how to deploy the next regulatory system and try to understand how we can make sure that all these things are designed or planned in a fashion where we can use data science or analytics at the end of the day to do more insight or to gain more insight.”

But for the rest of the agency, he said he is trying to create a decentralized data science team. He hopes this will let different program offices know what data science can do for them.

“What that translates to is I do not have a 20- or 30-data scientist team,” he said. “What we have is I have a couple of key members who are really good at data science. And then we go and talk to the different program offices and identify who are not necessarily called a data scientist but have the ability and the skills to become a data scientist.”

From there, Thottungal and his team train those employees on new data science tools and techniques to solve problems. He said that leadership is coming to understand the possibilities of data science when it comes to solving mission-related problems.

Data science plays role in assessing toxic chemicals

EPA is also using data analytics to share information about toxins and chemicals. The agency’s Integrated Risk Information System (IRIS), a database of known toxins, and its CompTOX Chemistry Dashboard, which contains granular data on more than 700,000 substances, work together toward that goal.

“The CompTOX dashboard is an application we’ve been bringing up over the past 2 ½ years and it is an integration hub for data of different sources that serves our effort to develop support for computational toxicology,” said product owner Antony Williams. “So if you’re looking for a fast answer to get as much data as we have available on certain chemicals, you would come to the dashboard and do a search on a chemical and we will surface as much data as we have available for that.”

IRIS is one of the dashboard’s sources, and it assesses different chemicals’ effects on human health.

“These assessments, the result of them is typically a toxicity value for cancer, non-cancer outcomes,” IRIS program manager Kris Thayer said. “It’s not a regulatory decision, it’s a technical assessment that gets used in the decision making process by others. So on its own it’s not really a regulatory document. It needs to be combined with information on exposure from other offices.”

Because the chemicals assessed by IRIS are fairly controversial, Thayer said her team will survey literature on evidence in animals and humans, and integrate it to reach conclusions and toxicity values. The process is rigorous and time- and resource-intensive, and Thayer said assessments also undergo a long peer review process.

So now, Williams said they are trying to develop fast predictive algorithms for prioritizing which chemicals deserve further investigation.

“In order to do that we’re pulling together data of all different types in order to analyze and develop these algorithms,” he said.