The Department of Homeland Security thinks it may have found a promising way forward to tie together information from its various components while protecting privacy and civil liberties.
The department is constructing several demonstration systems under the auspices of the broader DHS Data Framework.
Its overall goal is to gather huge stores of data from more than a dozen IT silos scattered across the DHS components, translate their content into a common, modern data format, and make them centrally searchable only by DHS staff who have a legitimate need to access a given piece of information.
“We collect a lot of data under a lot of different authorities for a lot of different purposes, and so there are a lot of different legal and privacy issues that come with that,” said Rebecca Richards, DHS’ senior director for privacy compliance. “But there are also a lot of good uses for that data, and right now, when we’re looking at an individual in front of us who might be on a terrorist watch list, we have to go to 17 different systems with 17 different logins. That’s just physically difficult. There’s nothing else to say about it.”
Richards explained details of the program last week to the Data Privacy and Integrity Advisory Committee (DPIAC), a panel of outside experts, a day after announcing she would depart DHS. She’ll soon assume duties as the National Security Agency’s first-ever civil liberties and privacy officer.
Looking for a solution
DHS has been testing out the program, which is actually made up of three separate new IT systems, for the last several months.
While it’s a long way from being ready for day-to-day mission use, officials say the pilots proved that the basic concept works not only technologically, which is a coup in and of itself, but also from the perspective of maintaining basic privacy safeguards.
The department has been searching for several years for a reasonable and cost-effective way to tie together all this information. Many of these various IT systems were designed and built decades before anyone even conceived of consolidating the nation’s internal security functions into one cabinet-level department.
The department says vastly widening the datasets that DHS analysts and enforcement officers have easy access to is great from a mission perspective; but absent new safeguards, it also presents a huge potential for abuse, either making privacy violations much easier or enabling the next would-be Edward Snowden to walk off with a vast trove of data.
“What we wanted to do was build on this concept of making sure that we can give access to data to the people who should have it, but control it in a way that you’re not creating a honey pot for a whole bunch of people to come and look at,” Richards said. “And we need to make sure people only have access to it for the purposes they’re allowed to.”
To make that possible, DHS is first building a “user attribute hub” that will serve as the foundation for both user-based and role-based access control. It will manage various characteristics of every user in the department, such as their security clearance level, their job function and training level.
In the new aggregated data environment, each piece of data will be tagged with identifiers that control which users can access it, and for what purpose. And when users do get access to the data for any reason, that activity will be stored in audit logs that DHS says cannot be changed or destroyed, even by system administrators.
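A rough illustration of how tag-based access checks and tamper-evident logging can fit together is sketched below. It is not DHS' implementation: every class, field and value name here is hypothetical, and the hash-chained log is one common technique chosen for the example. A hash chain makes changes detectable rather than impossible, so it only approximates the property DHS describes.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    # Hypothetical attributes of the kind a user attribute hub might hold.
    name: str
    clearance: int           # assumed scale: higher number = higher clearance
    job_function: str
    training: frozenset


@dataclass(frozen=True)
class Record:
    # Hypothetical tags attached to one piece of data.
    record_id: str
    min_clearance: int
    allowed_functions: frozenset
    allowed_purposes: frozenset
    required_training: frozenset = frozenset()


@dataclass
class AuditLog:
    # Append-only list; each entry stores the hash of the previous entry.
    entries: list = field(default_factory=list)

    def append(self, event: dict) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev_hash + body).encode()).hexdigest()
        self.entries.append({"event": event, "prev": prev_hash, "hash": digest})

    def verify(self) -> bool:
        # Recompute the chain; any edited or removed entry breaks it.
        prev_hash = "0" * 64
        for entry in self.entries:
            body = json.dumps(entry["event"], sort_keys=True)
            expected = hashlib.sha256((prev_hash + body).encode()).hexdigest()
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


def can_access(user: User, record: Record, purpose: str) -> bool:
    # Access requires every tag on the record to be satisfied.
    return (
        user.clearance >= record.min_clearance
        and user.job_function in record.allowed_functions
        and purpose in record.allowed_purposes
        and record.required_training <= user.training
    )


def request_access(user: User, record: Record, purpose: str, log: AuditLog) -> bool:
    # Every request is logged, whether it is granted or denied.
    granted = can_access(user, record, purpose)
    log.append({"user": user.name, "record": record.record_id,
                "purpose": purpose, "granted": granted})
    return granted
```

In this sketch the stated purpose is checked alongside the user's attributes, mirroring Richards' point that people should have access only "for the purposes they’re allowed to."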
One of the new systems, called Neptune, has a single job: to scoop information from those 17 IT systems across DHS on a regular basis and translate it into a common format, adding data tags along the way to dictate who can and can’t access it.
For the pilot, DHS officials pulled data from three separate systems in its components that weren’t originally designed to interoperate with one another: the Transportation Security Administration’s Alien Flight Student Program, Immigration and Customs Enforcement’s Student and Exchange Visitor Information System, and Customs and Border Protection’s Electronic System for Travel Authorization.
Neptune managed to turn data from those three systems into something that’s useful across organizational boundaries. But besides providing a proof-of-concept for the information sharing project, Richards said the pilot produced some other knock-on benefits, like pinpointing serious data quality issues in the DHS components’ legacy systems.
“It turns out there were four or five ways to identify that someone was from Germany,” she said. “Sometimes that was within the same data set and sometimes it was across several of them. This is something we can now bring down to these source systems to help improve their data quality, which has always been a concern.”
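The kind of inconsistency Richards describes can be illustrated with a small normalization sketch. The variant spellings below are invented for the example and are not taken from any DHS dataset; the target code, DEU, is the ISO 3166-1 alpha-3 code for Germany. The function and table names are likewise hypothetical.

```python
from typing import Iterable, List, Optional

# Hypothetical variants that different source systems might use for one country.
COUNTRY_ALIASES = {
    "germany": "DEU",
    "federal republic of germany": "DEU",
    "de": "DEU",
    "deu": "DEU",
    "ger": "DEU",
}


def normalize_country(raw: str) -> Optional[str]:
    # Trim, lowercase and collapse internal whitespace before the lookup.
    key = " ".join(raw.strip().lower().split())
    return COUNTRY_ALIASES.get(key)


def find_unmapped(values: Iterable[str]) -> List[str]:
    # Values with no known mapping are candidates for a data quality report
    # back to the source system.
    return sorted({v for v in values if normalize_country(v) is None})
```

Reporting unmapped values, rather than guessing at them, is what would let a translation step feed corrections back to the legacy systems.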
From there, the data makes its way to two other systems. One, called Cerberus, will operate in the classified domain, but will contain a mix of classified and unclassified data so that authorized users, operating for a specific purpose, can search everything from intelligence data to TSA passenger lists, visa applications and border patrol encounters all at once.
“Just because it’s in the classified environment doesn’t automatically make the data classified, but now you’re able to do a classified search and compare classified information that you have against unclassified data,” Richards said.
Avoided typical concerns
A third system, a prototype called the Common Entity Index, is designed to serve as a double-check on the controls DHS has designed.
CEI’s job is to aggregate data from DHS’ legacy systems and test whether the fundamental principles behind the access control scheme are working. If a user doesn’t have the legal authority to see data about a person in today’s legacy systems, he or she shouldn’t be able to access it in the new framework either.
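The principle in the preceding paragraph can be expressed as a simple parity check. The sketch below is only an illustration of that principle, not a description of how CEI works; the function names and the two permission callbacks are assumptions made for the example.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple


def find_overexposed(
    users: Iterable[str],
    records: Iterable[str],
    legacy_allows: Callable[[str, str], bool],
    framework_allows: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    # Return every (user, record) pair the new framework would expose
    # even though the legacy system would have denied it.
    violations = []
    for user, record in product(users, records):
        if framework_allows(user, record) and not legacy_allows(user, record):
            violations.append((user, record))
    return violations


# Usage with hypothetical permission tables.
legacy = {("alice", "visa-1"), ("bob", "visa-1")}
framework = {("alice", "visa-1"), ("alice", "flight-9")}

violations = find_overexposed(
    ["alice", "bob"],
    ["visa-1", "flight-9"],
    lambda u, r: (u, r) in legacy,
    lambda u, r: (u, r) in framework,
)
```

An empty result means the new access rules are no more permissive than the old ones for the users and records tested.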
Richards said a lot of privacy and technical questions still need to be answered before the system goes into use, but fundamentally, she thinks DHS has avoided many of the core concerns that tend to accompany data mining projects by involving its privacy and civil liberties officials early in the process, as well as at major steps in the framework’s design.
“It’s difficult to overstate how significant it is in a department this large to have these successful pilots with data from three different components,” she said. “It’s fantastic, and it’s the first time it’s happened in a way that’s included the oversight organizations and built us all the way into the process. They’ve shown they can do access control. That was very boring for our mission operators, and it was exciting for oversight. But we’re working together.”
DHS’ privacy office asked DPIAC to provide questions and input on potential concerns going forward.
Department privacy officials have already publicly released several privacy impact assessments (PIAs) and one system of records notice (SORN) on the program, the bare-bones requirements under federal law when the government changes the way it handles personally identifiable information.
Richards said DHS is actively exploring other ways of notifying members of the public about the project, including possible engagements with industry groups whose members are required to register their information in DHS databases, such as transportation workers in the TWIC program.