Police departments were in a difficult situation. Forced to monitor the streets using History, they suddenly witnessed every single time someone broke the law in public. They could not ignore what they saw because everyone else had the same access to History and could see it too. But the bureaucracy that came with dealing with every single petty theft or minor contravention was just too much.
It took no time for the tech sector to step up and offer a solution. Using Artificial Intelligence (AI), they promised to detect those “minor inconveniences” that drove the police crazy, identify the violators, and start legal processes automatically. Although automating law enforcement was enough to scare even the most law-abiding citizens, the Faustian bargain did not stop there: the companies also promised to predict when crimes would happen.
Their solution was to use Machine Learning (ML), a branch of AI based on learning patterns from massive piles of data and using those patterns to recognize objects, words, or people. Some of these systems are taught to identify specific things. For example, a system trained with millions of pictures of dogs can tell if another picture, which it has never seen, contains a dog. Others are generative: when asked for a picture of a dog, they create a new image with one (or at least something resembling a dog: it might have five legs or two tails, but it will still have a certain “dogness” to it).
Those use cases (image recognition and synthesis) were, together with text composition and speech recognition, some of the most popular and advanced forms of ML until then. How can such a system detect crime?
Those systems work by predicting the correct answer to a question. They compose text by predicting one word at a time. They produce photos of puppies by predicting the value of each pixel, one at a time.
As an example: which word should follow in the sentence “The fisher caught a”? Most people will guess “fish”, or the name of a fish species (some cynics might suggest a plastic bag or a shoe). We can say that because we know something about fishing and what inhabits bodies of water. But a Machine Learning system knows nothing about the world. Instead, after reading millions of sentences, it has learned which words are most likely to appear after others. The word “fish” means nothing to the ML system but, statistically speaking, it is the most likely one to follow “the fisher caught a.” Using this knowledge, a system can write a sentence starting with “The fisher” by predicting that the next words are probably “caught”, “a”, and finally, “fish”. It just creates a sequence of words, one by one.
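The counting idea in the fisher example can be sketched in a few lines of Python. This is only a toy illustration, not how any real system is built: the four-sentence corpus, the function names `train` and `complete`, and the choice of looking at the two previous words are all assumptions made for the example. Real systems learn from millions of sentences and use far more elaborate statistics.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real system reads millions of sentences.
corpus = [
    "the fisher caught a fish",
    "the fisher caught a fish",
    "the fisher caught a shoe",
    "the cat caught a mouse",
]

def train(sentences, context_size=2):
    """Count which word follows each run of `context_size` words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - context_size):
            context = tuple(words[i:i + context_size])
            counts[context][words[i + context_size]] += 1
    return counts

def complete(counts, start, context_size=2, max_words=10):
    """Extend `start` one word at a time with the most frequent follower."""
    words = start.split()
    for _ in range(max_words):
        followers = counts.get(tuple(words[-context_size:]))
        if not followers:
            break  # this context never appeared in the corpus
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

model = train(corpus)
print(complete(model, "the fisher"))  # the fisher caught a fish
```

The program never learns what a fish is. It only finds that, in this corpus, “fish” follows “caught a” more often than “shoe” or “mouse” does.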
The same technique can predict crime. Instead of training with words, the system trains with actions. Imagine the following: a person parks a car in front of a bank, grabs a gun, gets out of the car, and enters the bank. What is the following action in that sequence? Anyone who has watched enough Hollywood movies can assume they will rob the bank, enter the car, and escape.
A system for detecting crime would not train by watching films but with replays from History. And like the system that could guess the word “fish”, after watching enough replays, this one could predict that the person with the gun will rob the bank. Of course, this required a lot of work. The algorithms needed to be modified to train from History. The training data had to be collected and annotated with additional information (like the word “dog” in the pictures of dogs).
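The same counting can be applied to annotated actions instead of words. The sketch below is purely illustrative: the action labels, the four replays, and the function `next_action_probabilities` are invented for this example, and nothing here reflects how the companies' systems actually processed History. It shows that the output of such a system is a likelihood, not a certainty.

```python
from collections import Counter

# Hypothetical annotated replays: each one is a sequence of action labels.
replays = [
    ["park_car", "grab_gun", "exit_car", "enter_bank", "rob_bank"],
    ["park_car", "grab_gun", "exit_car", "enter_bank", "rob_bank"],
    ["park_car", "grab_gun", "exit_car", "enter_bank", "guard_shift"],
    ["park_car", "exit_car", "enter_bank", "deposit_cash"],
]

def next_action_probabilities(replays, observed):
    """Estimate how likely each next action is, given the actions seen so far."""
    n = len(observed)
    followers = Counter(
        replay[n]
        for replay in replays
        if len(replay) > n and replay[:n] == observed
    )
    total = sum(followers.values())
    return {action: count / total for action, count in followers.items()}

observed = ["park_car", "grab_gun", "exit_car", "enter_bank"]
print(next_action_probabilities(replays, observed))
```

With these made-up replays, two of the three matching sequences end in a robbery, so the estimate for `rob_bank` is about 0.67. The third belongs to an armed guard starting a shift, a reminder that the most likely continuation is not the only one.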
Until then, crime prediction algorithms had found two main uses. First, to predict which parts of a city had a higher likelihood of crime, allowing police departments to use their resources efficiently, patrolling some areas more than others. Second, to score people to determine whether they were likely to commit crimes in the future. Judges used those scores to rule on fines and bail. Both cases suffered from bias: among other issues, police would patrol predominantly poor neighborhoods, and judges would underestimate the risk of white people and overestimate that of people of color.
Tech companies claimed their systems would be free of bias: since History contains “all the information on everything happening all the time,” everyone would be treated equally. Before there was time to prove them wrong, something else happened: someone stole and leaked the trained models.
The models are where the Machine Learning algorithms store what they learned. They result from thousands of hours of training using powerful computers and millions of examples. It is usual for companies to discuss and show their algorithms publicly. Models are usually not released, as the cost of creating them is high.
While other companies could not legally use the stolen models, a community of researchers and enthusiasts soon created others that mimicked their capabilities. These models also allowed fine-tuning. For example, a model that recognizes people’s behaviors and actions can be refined for sports, dance, or other human activities. These new, modified models were legal but still needed access to History to make predictions. The History Company saw the business potential and opened up metered access, letting anyone create predictive systems as long as they paid for the data. Every company, from insurance brokers to fashion brands, scrambled to find ways to add predictions to their services. Nobody wanted to be left out.
A problem with these models was that they used substantial amounts of computing power and energy. Big companies could afford to run them, but smaller ones and non-profit organizations could not. A second wave of models soon emerged that traded some precision for lower computing requirements.