A People-Centered Data Privacy AI World

Nov 30 2017

Dr. Pablo Rodriguez

By the middle of the 1980s there was a new idea in the world. An idea that a new system, an invisible system, could run society through a computer world with giant networks of information. This was the beginning of cyberspace. A space where a group of visionaries thought there could be a dream of a new utopia, of a place where the old world could be reinvented. The technological utopians of America imagined cyberspace could become a safe new world where radical dreams could come true.

Despite the great success of the Internet, such a dream is proving more and more elusive. On a day-to-day basis, individuals are leaving an exponentially growing data footprint across all sorts of channels and media. Traces of our Data Souls linger. Increasingly sophisticated AI (Artificial Intelligence) algorithms are beginning to exploit these data traces to create potential information biases and steer people’s attention in ways the public doesn’t realize or want and in ways the cyberspace utopians never imagined.

The benefits of these new technologies are legion. Data and AI algorithms are fast becoming our new guides, offering new intuition, changing us on the inside, making us healthier and more cognitively and physically capable. Fuelled by Big Data, these systems filter, sort, score, recommend, personalize, and otherwise shape human experiences of socio-technical systems. In fact, every waking moment from sunrise to sunset is already suffused with AI. At this very moment, AI algorithms are hard at work predicting our next moves, our likes and wants and needs, connecting us with people we should meet, attacking cancer, and even learning to drive.

Despite the promise of groundbreaking opportunities for social good, improved public health, and easier daily lives, access to the Big Data that is the lifeblood of AI systems often comes at the expense of privacy. This is by no means a trivial matter. Concerns over data security grow with every breach of formerly private databases. Much of the data being collected is not fully secure. Until recently, the combined volume, complexity, and lack of structure of leaked data sets made it difficult for criminals to meaningfully exploit the larger databases. But machine learning has changed all that, with increasingly powerful tools to extract and de-anonymize private information from petabyte-scale data stores.

Although AI systems bring myriad benefits, they also contain inherent risks, such as codifying and entrenching biases; reducing accountability; hindering due process; and increasing the information asymmetry between the data-producing consumers and the data holders operating Big Data and AI systems. Without a more people-centered approach, Big Data and AI systems could exploit what people perceive as benefits (recommendation engines, suggested news, personalized advertising, etc.) to drive purposes that conflict with consumer intentions, needs, or goals. Using AI tools to maximize profits regardless of social consequences, for example, makes experts rightfully concerned.

Consequently, individuals increasingly want to know when their data is being collected, what is being stored, by whom, how it is being protected, and how AI algorithms work and make decisions based on it. In fact, a recent survey of 2,500 people by the Chartered Institute of Marketing (CIM) found that 57% did not trust companies to handle their data, or actions related to that data, responsibly. Tim Berners-Lee, the inventor of the Web, recently named three things that need to change to save the Web. The first is that people must regain control of their personal data, which he believes we have lost.

New anonymized data-sharing protocols, informed by lessons from techniques long used in the medical community, must be developed. The idea is to develop systems that discover data activity patterns without exposing the data itself. For instance, privacy-by-design data science describes the maximum amount of spatial and temporal information an individual can expose that would enable accurate modelling and inference while maximizing privacy preservation. However, we are still a few years from designing AI algorithms that can work with encrypted data, though this is certainly the future we should aspire to. In the meantime, from the very outset of architectural design, we must issue a call to arms for everyone to embed new levels of security, authentication, and access control that better limit data distribution and, at the very least, let data creators know when their data becomes public.

The rapid commercial advance in AI capabilities raises a number of important issues, challenging society to ensure non-discrimination and understand decision making in automated systems. Policy makers, regulators, and advocates are all rightfully concerned about the prospect of AI inadvertently encoding bias into automated decisions. In fact, we are only starting to talk about AI algorithmic fairness, accountability, and transparency (FAT).

Current AI tools based on machine learning embody the biases of their creators and programmers, and of the people who select and configure their training and test data sets. Thus, it is important to be aware of the intrinsic predispositions of the programmers, and to develop systems and approaches to control systemic biases in software systems in general, including these machine learning tools.

Once we develop that next stage of AI with inferential and generalization capabilities, and endow it with the ability to make important, even life-and-death, decisions, it becomes even more critical to be able to transparently audit and tune the ethical and programmatic aspects of such AI decision systems. We can then focus our efforts on transformational opportunities: digital ethics and digital empathy.

Similarly, there is increasing alarm that the complexity of machine learning may reduce the justification for consequential decisions to “the algorithm made me do it”. Many AI algorithms employ such complexity that not even their designers can explain or interpret how they arrive at answers. The decisions are delivered from a “black box” and must be taken on faith. That may not matter if AI is recommending a movie, but the stakes are higher if AI is driving a car, offering a health diagnosis, or recommending a next job.

It will be critical over the coming years to ensure more control over, and understanding of, how personal data is being used by AI algorithms to make decisions. There are already various ongoing efforts to make this happen. For instance, MIT, Telefonica, Mozilla, and others have created the Data Transparency Lab, an NGO where scientists, open Internet advocates, and public institutions around the world develop tools, services, and personal data banks to help consumers regain control of their data and better understand how it is being used by AI algorithms. This is in line with Telefonica’s approach of handing data control back to users through Aura, a personal data space that would hold all the interactions a customer has had with the company and make cognitive sense of this data flow.

A storm may be coming. But that does not mean that our boat will sink. We need to prepare ourselves now by having an open debate about what kind of AI and data world we want. Over the coming years, we should treat our data and AI algorithms as carefully as we treat our souls. We must protect our “Data Souls” with new paradigms that ensure more control and understanding because they will be an integral part of our lives as we evolve as a species.

Dr. Pablo Rodriguez (@pabloryr) is the CEO of Alpha, an innovation facility established by Telefonica to create Moonshots: multi-year development projects that address big societal problems. Prior to Alpha, Pablo led Telefonica's corporate research lab and incubator. He has worked in several Silicon Valley start-ups and corporations including Inktomi, Microsoft Research, and Bell Labs. He is best known for his work on peer-to-peer systems in the mid-2000s, for which he was named a Fellow of the Association for Computing Machinery (ACM). He co-founded the Data Transparency Lab, an NGO to drive data privacy and transparency. His current interests and upcoming book delve into AI, privacy, Big Data stories, and how to rethink the Internet ecosystem to do Good and be more People-Centered.
