What is AI’s black-box and why should you care?

Interest in technologies like artificial intelligence and machine learning has never been greater, with applications benefiting both consumers and enterprises. But such technology must often be fed large datasets to be effective, creating a continuous hunger for vast and wide-spanning data.

As a result, public debate around privacy has also skyrocketed.

The AI and privacy balance

Last year’s General Data Protection Regulation (GDPR) was Europe’s first large-scale effort to protect consumers, though one year on little has changed. Today, it is still unclear how society, industry and governments should balance our reliance on technology, that technology’s reliance on data, and our right to privacy.

In business, AI is streamlining processes, improving productivity and enhancing customer service. So much so that by 2020, 87% of enterprise software is expected to have integrated AI in some form. While consumers are heavily affected by business decisions in this wider context, the day-to-day impact of AI will most likely come through “smart tech”. In the UK, 22.4% of internet users also use a smart speaker, the largest proportion in Europe.

As a result of rising use of internet-enabled devices that consistently collect and analyse data, almost everything you do is trackable. And even what you don’t do.

Didn’t open that email? Noted. Didn’t click that link, or that ad? Noted. All of this information is fed back, accumulated and used across different projects and industries.

There is, therefore, an inherent conflict between AI and privacy. This is made worse by AI’s lack of transparency around decision making. More often than not, your data is fed into a system and influences the final outcome in some way, but exactly how is not known.

Black-box versus white-box models

Providing explanations has been identified as one of the biggest challenges for AI, ranked third behind the importance of context and the need for education and awareness.

Some AI decision making can be, and already is, explainable, but this is largely restricted to white-box models, whose behaviours, predictions and variables can be clearly explained.

This is often the case with linear models and decision trees. Imagine your family tree: a mum is linked directly to her son, and that link is clearly identifiable to anyone looking at it.
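
To make the contrast concrete, here is a minimal sketch of a white-box model, assuming Python with scikit-learn (neither of which the article specifies): a shallow decision tree trained on a toy dataset, whose learned rules can be printed and read directly.

```python
# A minimal white-box example: a shallow decision tree whose decision rules
# are fully visible. Assumes Python with scikit-learn installed.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(data.data, data.target)

# Every split the model uses can be printed and audited by a human.
print(export_text(model, feature_names=list(data.feature_names)))
```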

However, black-box methods — used primarily for deep learning and seen in autonomous cars, voice and facial recognition — are vastly different. Due to the complexity of neural networks, black-box models are harder to explain, as only input and output data is visible, meaning a decision is made but there is little transparency as to how or why.

It seems like a no-brainer to opt for white-box models. However, some of the most powerful models are black-box, and so programmers face a dilemma: should accountability or accuracy be compromised?

Ideally, neither. That’s why, as more and more organisations implement AI in automated decision making, the problem has been thrust into the limelight. As technology streamlines business processes, those affected by its decisions should understand why that decision was made.

Neural networks mostly use backpropagation to learn patterns in data. When the network comes to the wrong conclusion, the error between its answer and the correct one is fed backwards through the network so it can correct itself slightly. The system is then run again, correcting itself a little more each time and gradually finding the best way to structure itself. However, garbage in equals garbage out: if the data is bad, the learned model will be bad, and any biases in the data will be passed on to the model.
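
As a rough illustration of that loop, here is a minimal training sketch in Python using PyTorch (an assumption on our part; the article doesn’t name a framework), with random toy data standing in for a real dataset.

```python
# Minimal sketch of training by backpropagation: make a guess, measure the
# error against the correct answer, push that error backwards through the
# network, and nudge the weights. Illustrative only.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

X = torch.randn(100, 4)   # toy inputs
y = torch.randn(100, 1)   # the "correct answers" the network learns from

for epoch in range(200):
    optimizer.zero_grad()
    prediction = model(X)            # forward pass: the network's guess
    loss = loss_fn(prediction, y)    # how wrong was the guess?
    loss.backward()                  # backpropagation: feed the error backwards
    optimizer.step()                 # correct the weights slightly
```

If the data fed into a loop like this is biased, the weights will faithfully encode that bias; nothing in the loop itself can tell good data from bad.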

For example, in the case of job applications, race and gender should not influence which candidates are identified. But in the case of Amazon, its AI was found to be biased against women. Amazon’s algorithm was trained to filter applications using those submitted for similar tech roles over a 10-year period. However, tech being a male-dominated industry, that data contained many examples of men being treated as ideal candidates, and so the AI taught itself exactly that. Not because there was any pattern to suggest men were better, but simply because the data was already biased towards them. As a result, tech’s gender problem is exacerbated. Amazon’s algorithm was later found to show racial bias too.

There are also more sinister examples. A “predictive policing” system deployed in New Orleans was trained on historic data to “forecast crime and help shape public safety strategies”. This came a year after a report showed that the police force had “used excessive force, and disproportionately against black residents; targeted racial minorities, non-native English speakers, and LGBTQ individuals; and failed to address violence against women”. Any algorithm trained on this data would simply learn the same biases of the police force that produced it. This, again, further exacerbates the existing problem.

The AI ran for a year before issues were identified, demonstrating further how problematic black-box models can be. White-box models, in contrast, can aid accountability, regulation and auditing. Given how many organisations are becoming reliant on this technology, including governmental bodies using it to optimise law enforcement, it’s critical that the black box is cracked open.

Tackling the unknown

AI models are becoming more complex, with greater data requirements, as the technology and its capabilities advance. When people’s lives, medical diagnoses and scientific breakthroughs are at stake, you really need to know how an AI reaches its decisions, not just the end result. As a developer, you have to know there isn’t some minute detail throwing the entire model off.

While GDPR does not directly address AI, it does give guidance on automated decision making. The first data protection principle, set out in Article 5(1)(a), requires personal data to be processed lawfully, fairly and transparently. In theory, this puts black-box models in direct conflict with the regulation.

Explainability may be helped by developers documenting an AI system’s programming at a granular level, demonstrating it is privacy-by-design and paying particular attention to fairness and non-discriminatory decision making. New and emerging ways of training and using AI, such as federated learning and learning on encrypted data, are excellent ways of protecting a user’s privacy, but these are still open research topics: they are not quite ready for commercial use and come with the same accountability challenges. At Fair Custodian, we are exploring how these methods can help us provide the best experience for users without ever seeing their data!
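
For the technically curious, here is a toy sketch of the idea behind federated learning, specifically federated averaging: each client trains on its own private data, and only model weights, never the raw data, are shared and averaged. This is illustrative Python/NumPy under our own simplifying assumptions, not a description of any production system.

```python
# Toy federated averaging: three clients each hold private data, train a tiny
# linear model locally, and only the resulting weights are averaged centrally.
import numpy as np

def local_update(weights, X, y, lr=0.1, steps=50):
    """A few gradient steps on one client's private data (plain least squares)."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):                        # each client's data never leaves it
    X = rng.normal(size=(40, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=40)
    clients.append((X, y))

global_w = np.zeros(2)
for _round in range(10):                  # each round: local training, then averaging
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(local_ws, axis=0)  # the server only ever sees weights

print(global_w)                           # approaches true_w without pooling raw data
```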

Recent research found that people care about explainable decision making more in some contexts than in others, raising questions: will regulation cater appropriately to different situations, and how would a one-size-fits-all approach affect research and AI applications?

And what level of detail must the technology reveal? While many believe AI explanations should largely mirror the way human decision-makers explain themselves, others believe a higher standard should be met.

Justifiably, there are attitudes of exceptionalism around AI: the technology is expected to make greater, more accurate decisions, exploring wider contexts and analysing possible outcomes before choosing the best course of action.

Explainability may be one of the largest hurdles to overcome before AI decision making is accepted in the mainstream, but it is only one of many problems the industry will face. After all, people are scared of the unknown; once they understand why AI thinks the way it does, that can only be a step in the right direction. If you want to know more about explainability in AI, we recommend you check out this video.

Fair Custodian is building a platform to create a new type of relationship between consumers and businesses. A relationship based on trust, transparency, and personal empowerment. To find out more about us, check us out at www.faircustodian.com.