The Defense Advanced Research Projects Agency (DARPA) is using a challenge program to find out whether it’s possible to put shredded documents back together again.
It turns out the answer is yes, but with one important caveat: Participants in the challenge are dealing with pieces of one document at a time, not the mishmash of paper shards that would likely be found in a typical organization’s recycling bin.
Nonetheless, a little less than two weeks after the competition opened on Oct. 27, participants have already solved four of its five puzzles, each harder than the last. Thousands of individuals or teams have signed up to take a crack, and the winner will take home $50,000.
As in some of its previous public challenges, DARPA wanted to see how people go about solving complex problems. And in this case, the answers could have some important military applications, said Dan Kaufman, director of DARPA’s Information Innovation Office.
“We shredded the documents in different ways, and we wanted to know if people could put them back together in a way that’s relevant to the military,” he said. “That means, can you do it fast enough and cheap enough to fit within a mission timeframe?”
It amounts to what may be one of the most difficult sets of jigsaw puzzles in history. There are tens of millions of possible combinations, and unlike in an ordinary jigsaw puzzle, the people putting the pieces together don’t know what the finished product is supposed to look like. The fifth and most difficult document contains 6,000 separate pieces of shredded paper.
But on Monday, DARPA was pleasantly surprised to find that the first four had already been solved. Dr. Norm Whittaker, deputy director of DARPA’s Information Innovation Office, said the teams took a variety of approaches.
“We saw almost immediate returns on the first puzzle within the first few days,” he said. “Some real clever jigsaw puzzle fans, I think, were the ones who stepped up. But as we get into the most difficult puzzles, the jigsaw puzzle folks are having a lot more difficulty with the thousands and thousands of pieces.”
Teams lining up to play
About 8,200 teams or individuals signed up to compete, and the puzzles received more than 70,000 downloads from the DARPA website. Whittaker said the teams moving up the leaderboard seem to be the ones that have figured out ways to go beyond the traditional, manual jigsaw-puzzle approach.
“We’re seeing automated approaches, and we’re seeing at least two crowdsourced efforts,” he said. “That looks like an approach that has a real potential to move fast.”
One of the teams taking the crowdsourcing approach is led by Dr. Manuel Cebrian, a research scientist at the University of California, San Diego. He was on the Massachusetts Institute of Technology team that won a 2009 DARPA challenge in which participants had to track down 10 red balloons hidden all over the country.
For the shredder challenge, his team managed to recruit 3,500 participants from across the globe in a single week, through a recursive scheme of monetary incentives.
If the UCSD team wins, the $50,000 will be divvied up among the people who helped solve the puzzles and their recruiters. Participants get a dollar every time they find a paper shard’s proper place, the person who recruited them gets 50 cents, and that recruiter’s recruiter gets a quarter. The idea, Cebrian said in an interview with Federal News Radio, is to extend the recruiting reach to people who don’t necessarily have puzzle-solving skills. If they know someone who does, they can get a piece of the action.
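The recursive incentive scheme is easy to misread, so here is a minimal Python sketch of how the per-shard rewards would add up. It assumes only the three tiers described above ($1 to the solver, 50 cents to the recruiter, 25 cents to the recruiter’s recruiter) and stops there, since the article doesn’t say whether anyone further up the chain is paid. The function name `payouts` and the participant names are hypothetical.

```python
from collections import defaultdict

# Assumed reward tiers, in cents to avoid floating-point rounding:
# solver, solver's recruiter, recruiter's recruiter. Nothing beyond that.
REWARDS_CENTS = [100, 50, 25]


def payouts(placements, recruited_by):
    """Tally rewards (in cents) for correctly placed shards.

    placements: one participant name per correctly placed shard.
    recruited_by: maps a participant to whoever recruited them; people
                  who joined on their own simply have no entry.
    """
    totals = defaultdict(int)
    for solver in placements:
        person = solver
        for reward in REWARDS_CENTS:
            if person is None:  # reached the top of the recruitment chain
                break
            totals[person] += reward
            person = recruited_by.get(person)
    return dict(totals)


recruited_by = {"carol": "bob", "bob": "alice"}  # hypothetical chain
print(payouts(["carol", "carol", "bob"], recruited_by))
# → {'carol': 200, 'bob': 200, 'alice': 100}
```

Under these assumptions no single placed shard can cost the team more than $1.75, which is what lets a fixed prize be promised to an open-ended crowd.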
More help is needed
Hundreds of members of the UCSD team are working on the puzzle at any given time. To do it, they arrange the pieces collaboratively on a virtual tabletop and exchange ideas in an online chat environment. But even that approach isn’t enough to solve the more difficult puzzles, numbers four and five. So they’re getting some puzzle-solving help from computers.
“We’re working on computer vision techniques that can actually separate the pieces by similarity,” he said. “If there are some pieces that are blank, we’ll put them in one pile on the virtual board. If there are other pieces that are, for instance, stained from coffee, we’ll put them together. Even if we don’t know their exact location, we know they’re very likely to go together. So with these clustering techniques, we’re helping the actual participants assemble the puzzle.”
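Cebrian doesn’t say which features or algorithm the team uses, so the following is only a toy Python illustration of the general idea of clustering shards by similarity. It assumes each shard arrives as a flat list of RGB pixels, reduces it to two hand-picked features (the fraction of dark “ink” pixels, and a “warmth” score standing in for brownish coffee stains), and groups the shards with plain k-means. The function names, the threshold and the feature choices are all hypothetical; a real system would rely on much richer image features.

```python
from collections import defaultdict

# Hypothetical threshold: a pixel darker than this (0-255 average) counts as ink.
INK_THRESHOLD = 80


def features(pixels):
    """Reduce a shard (list of (r, g, b) tuples) to a 2-D feature vector."""
    n = len(pixels)
    # Fraction of pixels dark enough to be printed or handwritten ink.
    ink = sum(1 for r, g, b in pixels if (r + g + b) / 3 < INK_THRESHOLD) / n
    # Average red-minus-blue tint scaled by 255; brownish stains score high.
    warmth = sum(r - b for r, g, b in pixels) / n / 255
    return (ink, warmth)


def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # Move each center to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def group_shards(shards, k=3):
    """Return piles of shard indices that look alike."""
    labels = kmeans([features(s) for s in shards], k)
    piles = defaultdict(list)
    for index, label in enumerate(labels):
        piles[label].append(index)
    return sorted(piles.values())


WHITE, INK, COFFEE = (255, 255, 255), (0, 0, 0), (150, 100, 50)
shards = [
    [WHITE] * 100,                 # blank
    [INK] * 30 + [WHITE] * 70,     # text-heavy
    [COFFEE] * 60 + [WHITE] * 40,  # coffee-stained
    [WHITE] * 100,
    [INK] * 25 + [WHITE] * 75,
    [COFFEE] * 50 + [WHITE] * 50,
]
print(group_shards(shards))  # → [[0, 3], [1, 4], [2, 5]]
```

The farthest-point initialization keeps the output deterministic. As Cebrian describes, piles like these only narrow down which shards probably belong together; finding each shard’s exact position is still left to the human participants.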
But while the combination of technology and crowdsourcing is powerful, it also has some inherent vulnerabilities. Since the UCSD team lets anyone participate, competitors can log in, copy the team’s work, and claim it as their own.
Plus, as Cebrian’s team found out, its open platform tends to invite sabotage.
“When we made it into the top five, we ran into coordinated overnight attacks from at least three or four people. They came in at the same time and pretty much destroyed all the progress we had made,” he said. “We had another attack last night on a larger scale, and they did some really smart things we weren’t prepared for. So we’re improving our security, and we’re also implementing a new board that’s going to be by invitation only for just our top performers on the previous puzzles.”
Cebrian said the shredder challenge has proved the same point the red balloon challenge did: Combining the efforts of many people can solve seemingly intractable problems.
Techniques have value
That said, he doesn’t think there’s any reason for information security professionals to wring their hands about their organization’s shreds being pieced back together. After all, his team and its competitors are dealing with one document at a time, not the confetti of hundreds of shredded papers that a potential adversary would find in real-world trash.
But that doesn’t mean the techniques the teams are developing don’t have value.
“It could have a lot of applications in the field of biology,” Cebrian said. “It could come into play where we know that all of these biomolecules have a very specific structure that has to be determined. We can talk about proteins, we can talk about genome assembly. For that, this is a very good way to handle that problem. For those problems where you want to use crowdsourcing, you need to keep people engaged and you also need to filter everything that people are producing. As for other applications, I think we need to wait a little bit and let our imaginations fly.”