
The Panama Papers may have been the biggest cross-border investigative journalism project in history, but it’s only the beginning.

Investigations like this are made possible by intelligence analysis and document discovery software that until recent years was only available to intelligence and law enforcement agencies. But as computer power has increased, so has the size of the potential market. These tools are now priced such that even journalists can afford them, and they’re so well-designed that even journalists can use them.

For the Panama Papers, the International Consortium of Investigative Journalists used software from the Australian-born company Nuix. While the company certainly has some strong competitors — including Palantir, i2, and New Zealand’s Wynyard Group — Nuix’s tools are used by the US Secret Service and Department of Homeland Security, INTERPOL and, here in Australia, the Department of Defence, ICAC, and various other agencies.

Such tools ingest all of the documents associated with an investigation — whether they’re emails, reports, spreadsheets, faxes, or phone and data logs such as those kept under mandatory data retention laws — allowing investigators to search them in pretty much any way they like. The results are then linked to documents the system has decided are related.

“We [highlight] people’s names, countries, telephone numbers, email addresses, company names, credit card details, lots of very high-value pieces of information, depending on what you’re looking for,” said Nuix chief executive officer Ed Sheehy.

“[For example with] credit card numbers of interest, we will show you automatically that it was in an email, and there was four documents attached to it, and the same sender has sent 15 emails in the past, these [potentially different] credit card numbers are inside those 15 emails, and these were the images that were inside there.”
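The kind of cross-referencing Sheehy describes can be illustrated with a toy example. What follows is only a rough sketch, not Nuix's method: the patterns, the function names (`extract_entities`, `build_index`, `related_documents`) and the document IDs are all hypothetical, and a real product would use far more robust recognisers. It pulls email addresses and plausible card numbers out of each document, then reports which other documents share them.

```python
import re
from collections import defaultdict

# Illustrative patterns only (assumptions, not any vendor's actual rules).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 16 digits, optionally separated by spaces or hyphens.
    "card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}


def luhn_ok(number):
    """Return True if the digit string passes the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def extract_entities(text):
    """Return a set of (kind, value) pairs found in one document."""
    found = set()
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group()
            if kind == "card":
                value = re.sub(r"\D", "", value)  # normalise to digits only
                if not luhn_ok(value):
                    continue  # discard digit runs that cannot be card numbers
            found.add((kind, value))
    return found


def build_index(documents):
    """Map each entity to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for entity in extract_entities(text):
            index[entity].add(doc_id)
    return index


def related_documents(index, doc_id):
    """Map every other document to the entities it shares with doc_id."""
    related = defaultdict(set)
    for entity, doc_ids in index.items():
        if doc_id in doc_ids:
            for other in doc_ids - {doc_id}:
                related[other].add(entity)
    return dict(related)
```

Given a dictionary of document IDs and their text, `related_documents(build_index(documents), "email-1")` would list every other document that mentions the same sender address or the same card number as `email-1`, along with the shared values. The checksum step is one small way to trim false positives, since most random 16-digit strings fail it.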

Sheehy was speaking at a gathering of some of the nation’s leading investigative journalists, plus me, at Nuix’s Sydney headquarters on Monday. Those who’d used the software on the Panama Papers sang its praises.

The AFR’s Neil Chenoweth described the process as “absolutely exciting”.

Four Corners journalist Marian Wilkinson was “incredibly impressed”, and told Crikey it was a “fantastic system, no doubt about that”.

“You could try and plough away in the server … to try and do the research on the documents, which was bloody hard I have to say, but also you could share information leads, and searching tips with the journalists from around the world working on the project, and that was immensely helpful,” Wilkinson said.

Problems included “lots and lots and lots” of false positives when the system returned every document referring to a particular name, not just the person of interest. That was inevitable, though, and it didn’t blunt Wilkinson’s opinion.

“For me, personally, again not coming from a background of big-data journalism, I was so immensely impressed … In my journalistic life, it was a life-changing experience. Because it showed me the potential of big-data searches, and it showed me something I’ve always believed in in journalism: the human element is critical to match with the data searching, [such as] the work of your colleagues who could share information.”

The 11.5 million documents of the Panama Papers are actually a tiny dataset compared with those in legal discovery cases. Nuix’s biggest effort was “just shy of four billion documents”, said Sheehy, including 3.1 billion emails, 440 million Word documents, 330 million Excel spreadsheets, and “a raft of other stuff” in the archives of a Wall Street bank that was hit with 15 different court cases at once.

Intelligence analysis software can already deal with huge datasets. The next breakthrough will be the wider availability and greater sophistication of existing techniques: latent semantic indexing, which allows the system to figure out whether a mention of “football” is about sport or is a codeword for a bomb or a drug delivery; sentiment analysis; and voice and video searching and transcription.

There’s a flipside, though. Yes, investigative tools are getting more powerful, but organisations will tend to keep less historical data, making things harder for investigators, according to Chris Pogue, a member of the US Secret Service Electronic Crimes Task Force, and Nuix’s senior vice president for cyber threat analysis.

“A big part of security is not keeping things you don’t need any more, apart from compliance regulations and things like that, in almost every case it is in your best interests to only keep data that is relevant to your current operating business,” he said.