The important question of data ownership

Over the past months I’ve read four books that are largely about the same theme: where do we (as humans) come from, what is our current situation, and what could our future be? The books are Yuval Noah Harari’s ‘Sapiens’, ‘Homo Deus’ and ‘21 Lessons for the 21st Century’, and Steven Pinker’s ‘Enlightenment Now: The Case for Reason, Science, Humanism, and Progress’. These books currently lead my list of the best books I’ve ever read (with probably a slight win for Steven Pinker) and I encourage everybody to read them. These writers do an amazing job of sketching out the big picture for us humans, clarifying where we’re coming from, where we are and where we might be heading, while at the same time asking the right questions, the ones all of us should be asking. I’ll come back to those books in later posts, since there are so many ideas in there that relate to my mission and the questions I have, but I want to pick out one insight that is particularly relevant for this blog. Both authors mention it, but Harari says it most clearly in ‘21 Lessons for the 21st Century’: the most important question we have to answer is ‘who is going to own the data?’. More details on this later, but it is hard to overestimate the negative and irreversible consequences of answers like ‘I don’t know’ or ‘the big tech company CEOs’. Harari ventures to say that this is the single most important question to be answered in the history of humankind. And it is a question whose time has come and which WE have to answer. Let that sink in for a moment. As Harari points out, data is the resource that will set the global balance of power and over which wars will be fought. He warns us not to make the same mistake that Native Americans, for instance, made when they were deceived by imperialists with beads and gold. In other words, realise what you’re giving away and understand the consequences that might have.
One of the biggest problems I see is that most humans have no idea what data is theirs, why this is important and what they should do about it. It is a human tragedy that we tend to value short-term positive consequences more highly than long-term negative ones. So I’m not convinced we should depend on humans to make the right decisions. The only option we have is to design media that respect data ownership requirements and to make sure those media are available and easy to use.
In one of my previous companies (Contracts11) we worked on a solution that I think deserves more attention. Our idea was a new way to build information systems that respected the following requirements:
1. There is only one source of every piece of data
2. Data ownership is clear for everybody
3. Data exchange is always governed by a contract
Ad 1. This idea is one of the deep insights from George Gilder’s ‘Telecosm: The World After Bandwidth Abundance’ (2002): copies are no longer needed given enough bandwidth. A single source suffices. The practical consequences of this insight are quite amazing if you think about it, but one of the most profound is that it becomes easier to assign ownership and stay in control. This might have seemed like a pipe dream for the last few decades, but we are slowly but surely moving in this direction. The slow migration of almost every conceivable software service to the cloud is unstoppable: music (Spotify), games (Steam), films (Netflix), productivity applications (Office 365, Google Suite, Photoshop), etc. It already doesn’t make much sense to buy a multi-terabyte laptop (although they are available on the market), since you won’t store any films, photos, applications, etcetera on it any more. This is a clear example that we are heading towards the world George Gilder described. You could say that files are replicated in ‘the cloud’, but conceptually you’re dealing with one piece of data (there is one access point to it, and one owner).
Ad 2. Possibly one of the biggest tragedies of the last decade is that it has been unclear who owns which data, so the big tech companies (Google, Amazon, Alibaba, Facebook, etc.) stepped forward and claimed their turf. It is a bit like how the British colonised many parts of the globe by ‘the cunning use of flags’ (as hilariously pointed out by comedian Eddie Izzard). The natives were so impressed by the flag, the free beads, gold, email/chat/doc services that they gave away something whose worth they only realised later. By then it was simply too late to turn back the clock and set the record straight.
Ad 3. To prevent misuse of data, rightful owners should be able to force everybody to play by the rules. Fortunately there is an institution that was designed just for that: the nation state. With its trias politica, the nation state has a mechanism to create, set and enforce the rules through politics, the military and the legal system. The only thing owners of data have to do is set the conditions under which their data can be used and what will happen if others don’t abide by those rules. This can, obviously, be set in a contract.
So what we built at Contracts11 were contract-based information systems. They consisted of data sources whose ownership was clear, which contained original data (instead of copies), and where every data exchange was governed by a contract. For every use of a certain piece of data, consumers had to sign the contract, and if they misused the data they could expect legal consequences. What the contract enforced was of course up to the owner of the data, but generally it would state things such as:
– The owner of the data
– Purposes (processes) for which the data might be used
– Who could use the data
– Whether the data could be temporarily stored by the consumer
– Which court of law would be used in case of a dispute
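To make the list above a bit more concrete, here is a minimal sketch in Python of how a single data source could check every read against its contract. This is not the actual Contracts11 implementation; the class names, field names and example values are all hypothetical and only mirror the contract terms listed above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Contract:
    """Terms set by the data owner (hypothetical field names)."""
    owner: str
    allowed_purposes: frozenset
    allowed_consumers: frozenset
    may_store_temporarily: bool
    court: str


@dataclass
class DataSource:
    """The single source of one piece of data, guarded by a contract."""
    value: str
    contract: Contract
    signatures: set = field(default_factory=set)
    access_log: list = field(default_factory=list)

    def sign(self, consumer: str) -> None:
        # Only consumers named in the contract are allowed to sign it.
        if consumer not in self.contract.allowed_consumers:
            raise PermissionError(f"{consumer} is not a party to this contract")
        self.signatures.add(consumer)

    def read(self, consumer: str, purpose: str) -> str:
        if consumer not in self.signatures:
            raise PermissionError(f"{consumer} has not signed the contract")
        if purpose not in self.contract.allowed_purposes:
            raise PermissionError(f"purpose '{purpose}' is not covered")
        # Keep a record of who read the data and why, so misuse can be traced.
        self.access_log.append((datetime.now(timezone.utc), consumer, purpose))
        return self.value


# Illustrative values only.
contract = Contract(
    owner="alice",
    allowed_purposes=frozenset({"parcel delivery"}),
    allowed_consumers=frozenset({"webshop"}),
    may_store_temporarily=False,
    court="Example District Court",
)
address = DataSource(value="Example Street 1, Amsterdam", contract=contract)
address.sign("webshop")
print(address.read("webshop", "parcel delivery"))
```

Note that the sketch only refuses reads that fall outside the contract and logs the ones it allows. Terms such as the temporary-storage flag and the court of law cannot be enforced by code; they are there for the legal system to act on.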
We took away a number of insights from the pilot projects we did. First of all, it turned out that such a contract-based system was no less user-friendly than regular applications. On the contrary: instead of having to fill out forms for e.g. an address (with the chance of making spelling errors in copies of that data), users could simply read the contract and check a box.
Another insight was that having no copies had a large number of unforeseen positive consequences. It became increasingly clear that ‘big data’ was more often a burden than a blessing. You had to store, maintain, protect, clean, copy and back it up, while at the same time it was unclear why you needed all that data in the first place.
This resulted in another insight: such contract-based information systems force data consumers to request only the data they really need. In other words, they would rethink their processes and model them in such a way that they would lead to the desired state with as little data as possible. We called this ‘data minimalism’ and often explained it with the Albert Heijn sperziebonen (green beans) example (sorry if you’ve never been to The Netherlands, where Albert Heijn, or AH, is a supermarket chain). AH has been tracking most of its customers for many years via its bonus card. This card is an excellent example of consumers letting short-term gains prevail over long-term losses because they don’t fully understand what they are giving away. The strange thing is that in the name of ‘big data all the things’ AH has been harvesting an incredible amount of information, while the only answers it needed were often pretty simple and could be asked for directly, without the need for this whole ‘big data’ circus. AH might for instance want to know whether you (as a customer walking into the store on a Wednesday) would be interested in sperziebonen. Your answer would be a simple yes, no or maybe, and that would be enough for AH to fire off some process of, for instance, actually offering you sperziebonen for a special price. No need to collect all kinds of relevant data and then process, maintain and protect it.
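The sperziebonen example can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name and the offer texts are made up, and what happens on a ‘maybe’ is a guess. The point is that the process needs nothing but the direct answer.

```python
from enum import Enum
from typing import Optional


class Answer(Enum):
    YES = "yes"
    NO = "no"
    MAYBE = "maybe"


def offer_for(answer: Answer, product: str = "sperziebonen") -> Optional[str]:
    """Decide on an offer using only the customer's direct answer."""
    if answer is Answer.YES:
        return f"special price on {product} today"
    if answer is Answer.MAYBE:
        # Hypothetical follow-up; the post does not say what 'maybe' triggers.
        return f"ask again about {product} at checkout"
    return None  # 'no': nothing is offered and nothing is stored


print(offer_for(Answer.YES))
```

No purchase history, profile or bonus card is involved: the only input is the answer itself, and nothing has to be kept once the offer has been made.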
This approach is a practical solution that could help answer the all-important question of ‘who owns the data’, as stated at the beginning of this post. It shows that we have to think about, and work on, the medium through which the information flows. It should enforce proper behaviour, adapt to its users, be transparent for any kind of message, not alter the message, etc. This is a big undertaking that requires the combination of many disciplines, from deeply technical to highly philosophical, but it can be done. And it should be done. That is one of my interests and one of the topics of this blog.