Imagine a world where data is working hard to put resources in the right places and get ahead of problems: getting humanitarian aid to where it’s needed when it’s needed, identifying and prioritising individuals at risk so that they can be helped, or spotting where things are going wrong before they cause any harm. It’s exciting, and it’s obvious that this could make the world a much better place. In some cases, it’s already doing so.
Now imagine a world where data is creating a panopticon, where a picture of each individual or household – their propensities, faults, and characteristics – is known and owned by the state or the corporations that manage large datasets. It’s a panopticon that might be benign for many, but not for all.
Both of these worlds could be the same place, and that’s both the exciting promise and the risk of working with data. We can use collections of data to do very useful things that genuinely benefit the individuals concerned.
If we’re in hospital, we want doctors to collect lots of data in the form of medical observations, to help them make good decisions and make us well again. We’re also usually happy for them to use the same data to understand patterns and create predictions that might help future patients too.
We know that we need to be careful not to expose that personal data through, for example, poor information security practices. But we can also use data to infer all kinds of things that may be true but private, or even altogether false. In the ‘true but private’ group, there’s social media metadata that tries to work out who is having a romantic relationship with whom. In the ‘entirely false’ group, there are some predictive policing systems that are much better at identifying “people who police tend to arrest” than at identifying “people who have committed crimes”: a subtle but important difference.
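The arrest-data problem is easy to demonstrate with a toy simulation. Everything below is invented for illustration, including the assumption that arrests depend on patrol intensity as well as on underlying crime; the point is that two districts with identical crime rates produce very different arrest counts, so a model trained on arrests learns the patrol pattern, not the crime pattern.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two districts with the SAME underlying crime rate,
# but district A is patrolled three times as heavily as district B.
true_crime_rate = 0.05
patrol_intensity = {"A": 3.0, "B": 1.0}
population = 10_000

for district, patrols in patrol_intensity.items():
    crimes = rng.binomial(population, true_crime_rate)
    # Arrests depend on crime AND on how often police are present.
    detection_prob = min(0.2 * patrols, 1.0)
    arrests = rng.binomial(crimes, detection_prob)
    print(f"District {district}: crimes={crimes}, arrests={arrests}")

# A model trained on these arrest counts would 'learn' that district A
# is far more criminal than district B, and send yet more patrols there:
# a feedback loop that measures police activity, not crime.
```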
Using apparently ‘anonymous’ data doesn’t necessarily make these problems go away, either; we are increasingly realising that most anonymous data detailed enough to be interesting is surprisingly easy to re-attach to a person.
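To make that concrete, here is a minimal sketch of a ‘linkage attack’, the standard way supposedly anonymised records are re-attached to people. All of the data and column names below are hypothetical; the point is that a handful of quasi-identifiers (postcode, birth date, sex) shared with some public dataset is often all it takes.

```python
import pandas as pd

# Hypothetical 'anonymised' dataset: names removed, quasi-identifiers kept.
health_records = pd.DataFrame({
    "postcode":   ["SW1A 1AA", "M1 1AE", "SW1A 1AA"],
    "birth_date": ["1980-03-14", "1975-07-02", "1991-11-30"],
    "sex":        ["F", "M", "F"],
    "diagnosis":  ["diabetes", "asthma", "depression"],
})

# Hypothetical public dataset (e.g. an electoral register) with names.
public_register = pd.DataFrame({
    "name":       ["Alice Smith", "Bob Jones", "Carol White"],
    "postcode":   ["SW1A 1AA", "M1 1AE", "SW1A 1AA"],
    "birth_date": ["1980-03-14", "1975-07-02", "1991-11-30"],
    "sex":        ["F", "M", "F"],
})

# Joining on the shared quasi-identifiers re-attaches names to diagnoses.
reidentified = health_records.merge(
    public_register, on=["postcode", "birth_date", "sex"]
)
print(reidentified[["name", "diagnosis"]])
```

This is essentially how Latanya Sweeney famously re-identified ‘anonymous’ US hospital records in the 1990s, by joining them to public voter registration rolls on exactly these three fields.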
Major risks, major opportunities
The challenge in all of this is that, by gathering together and making use of data, we both realise the potential for enormous good and create the potential for enormous harm.
Harm that persists as long as the data does. That data might pass through various hands, each with different motivations, over time. The “move fast and break things” philosophy of the last decade’s tech culture becomes dangerous when applied to getting and using data about humans. Once it’s out in the world, and connected to other datasets, data is very hard to make private again. This means that, whenever data is involved, we have to be much more circumspect about the what and the how than we are when we build other kinds of tech that can simply be shut down if necessary.
While the potential to use data to make things better is enormous, there are also a number of non-intuitive pitfalls, both in collecting data and in building and using AI and machine-learning systems on top of it, that we have to be aware of if we’re going to move forward safely.
There is of course a very positive side to using data for better public services. The big shift of the last decade is that we now have all sorts of interesting and powerful new technologies that can manage, shape, and make effective predictions from data: anything from the mundane (grocery shopping recommendations) to the super-complex (self-driving cars).
Public servants now have a real opportunity and responsibility to make the most of these technologies to make lives better. We can often do this using the data that we already have.
For example: data science charity DataKind UK (of which I am a trustee) recently helped a food bank use data to proactively identify clients who were at risk of becoming dependent on food parcels. As a result, those clients could be given early referral to support workers who could help resolve wider issues in their lives. The data already existed; it was used sensitively to create better outcomes for clients, without creating significant new risks to them.
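As a rough illustration of what this kind of proactive flagging can look like (a hypothetical sketch with invented features and labels, not DataKind UK’s actual method), a simple model trained on existing visit records can score clients for early follow-up:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-client features drawn from existing visit records:
# [visits in last 90 days, days since first visit, referrals already made]
X = np.array([
    [1,  30, 0],
    [2,  60, 0],
    [6, 120, 0],
    [9, 200, 1],
    [1,  45, 0],
    [8, 180, 0],
])
# Hypothetical labels: did the client become a long-term repeat user?
y = np.array([0, 0, 1, 1, 0, 1])

model = LogisticRegression().fit(X, y)

# Score a new client; a high probability prompts an early referral to a
# support worker rather than triggering any automated decision.
new_client = np.array([[5, 90, 0]])
risk = model.predict_proba(new_client)[0, 1]
print(f"Estimated dependency risk: {risk:.0%}")
```

The design choice that matters here is that the model’s output is a prompt for human help, not an automated gate; that is what keeps new risks to clients low.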
Making sure data works for you
So for those who are keen to make better use of data in their organisation, there are some very clear lessons for 2020 and onwards:
- Think about what you want data to do for you. Forget about technology for a moment: consider what you would love to be able to predict, allocate, or understand better, and what benefits that would create for your clients or users.
- Think about whether this means that you really need to collect or pool new data, or whether you could do more with data that is already there.
- Make sure that you fully understand, and are comfortable with, the future risks any new data collection and collation might create for the people in that dataset. Don’t settle for vague assurances about what will happen to the data later; make sure you understand how it will be kept safe, and how that will be guaranteed.
- When you get technologists and ‘data people’ involved in your work, be a challenging and questioning customer. Good technologists should be happy to talk in non-technical terms about what the point of data use is; what risks it might or might not create in future, and how those will be managed; and what the benefits will be.
Overall, there is enormous and mostly untapped potential to make citizens’ lives better by using data well. Using it thoughtfully and effectively is harder than using it lazily, or not using it at all, but it stands to matter to all of us in the decade ahead.