The Upside of NSA's PRISM and GCHQ's Tempora

Thanks to Edward Snowden, we've all read about PRISM and Tempora: how the NSA and GCHQ are spying on everyone in the connected world. The US intelligence agencies want to achieve total information awareness, and they needed a partner in crime: Britain's GCHQ.

The real question remains, however: How useful can these systems really be?

With several billion internet users, finding terrorists among them is like finding needles in a haystack. In order to be successful at this soul-crushing task, there's only one approach: maximize the number of needles per unit of hay. Hence, you can either make sure that there are lots of needles (not good; lots of terrorists), or that you have as little hay as possible (good; violates the privacy of fewer people). The NSA and GCHQ are doing neither.

Since we don't know exactly how these spy systems work technically, we have to model them as a black box. The input to the box is communication data from a list of people, and the output is a list of potential terrorists. Essentially, the box filters X people down to Y suspects.

Now here's the catch: you want to avoid false positives. Otherwise, you might end up dragging lots of innocent people off to secret locations to have them tortured... er. Anyway, avoiding false positives means conducting a real-world investigation of everyone your system flags as a terrorist. Since manpower is limited, so is the number of suspects you can investigate. In other words, Y cannot exceed, say, 10,000. Regardless of X!
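To make the capacity argument concrete, here is a minimal Python sketch. It assumes the black box discards a fixed fraction of whoever goes in (its "filtration"); the function name `max_input` and the 10,000-suspect capacity are illustrative assumptions, not anything known about the real systems.

```python
def max_input(capacity: int, filtration: float) -> int:
    """Largest number of people you can feed the (hypothetical) black box
    before its output exceeds what investigators can handle.

    filtration: fraction of the input the box discards (0.99 means 99%).
    """
    pass_rate = 1 - filtration  # fraction that comes out flagged as suspects
    return round(capacity / pass_rate)


# Assumed investigative capacity of 10,000 suspects
for f in (0.99, 0.999):
    print(f"filtration {f:.1%}: at most {max_input(10_000, f):,} people in")
```

Whatever the filtration, the output stays within the investigators' capacity only if the input is bounded too.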

PRISM and Tempora are useless because the input X is unbounded, which only increases the filtration required further down the computational chain. For example, if you put a million people into your system and have a near-magical filtration of 99% (it discards 99 out of every 100 people), you get 10,000 suspects out. If you're the NSA or GCHQ, feeling greedy and powerful, you put three billion people into your system. With the same unrealistic 99% filtration, you now get a list of 30 million suspects. Good luck getting the FBI or Scotland Yard to investigate them all.
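The arithmetic above fits in a few lines of Python. This is a toy model that treats filtration as a single fixed fraction discarded from the input; `suspects` is a hypothetical helper for illustration only.

```python
def suspects(population: int, filtration: float) -> int:
    """Number of people the black box flags, given the fraction it discards."""
    # filtration = fraction of the input that is filtered out (0.99 means 99%)
    return round(population * (1 - filtration))


for population in (1_000_000, 3_000_000_000):
    print(f"{population:,} people in -> {suspects(population, 0.99):,} suspects out")
```

Running it prints 10,000 suspects for a million people and 30,000,000 for three billion: the same filter on 3,000 times the hay yields 3,000 times the suspects.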

The very last thing you want to do when looking for needles in a haystack is to make the haystack bigger. To winnow a list of three billion people down to 10,000 suspects, you need a filtration of 99.9997%. Given how few terrorists there are to train your algorithms on, this is far beyond even the most sophisticated Bayesian model. To top it off, you have to investigate every suspect even while your algorithms are still being trained, when their filtration is much worse.
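The required filtration follows from the same toy model: to keep at most Y suspects out of X people, the box must discard a fraction 1 − Y/X of its input. The sketch below checks the 99.9997% figure and then, purely as an illustration, assumes a still-training model that discards "only" 99.9%; that training figure is hypothetical, not a known property of any real system.

```python
def required_filtration(population: int, capacity: int) -> float:
    """Fraction of the input that must be discarded so that at most
    `capacity` suspects are left for human investigation."""
    return 1 - capacity / population


def suspects(population: int, filtration: float) -> int:
    """Number of people flagged when the box discards `filtration` of its input."""
    return round(population * (1 - filtration))


needed = required_filtration(3_000_000_000, 10_000)
print(f"required filtration: {needed:.4%}")

# Hypothetical filtration of a still-training model, chosen for illustration only
training = 0.999
print(f"suspects during training: {suspects(3_000_000_000, training):,}")
```

Even a model that discards 99.9% of everyone would leave 3,000,000 people to investigate, 300 times the assumed capacity.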

The bigger worry with PRISM and Tempora is the automatic handling of the output. Will all suspects be put on the no-fly list? Will they all have their bank accounts blocked? One thing you can be sure of: their privacy will be violated for a very long time, to the furthest extent that the internet allows, regardless of national borders or legality. There are no good ends to any of these paths, and as The Economist points out, the real problem is bathtubs.