AI faces a snag: nobody knows how it works. Is that a problem?

AI is poised to rapidly change our societies. But there’s one snag: nobody fully understands how many AI models make decisions, including the developers who created them. Why is AI decision-making so hard to untangle and who is legally responsible for the actions of AI models? With the new EU AI law coming into effect soon, let’s take a look at this fascinating topic.

The price you pay for a product online, the cost of your insurance cover, the response to your job application, and even your request for medical treatment: all of these decisions could soon be made with the assistance of AI.

In some respects, debates about AI ethics are nothing new. Just like many other transformative technologies that came before it, AI has a range of consequences that could be considered positive or negative. Many inventions cannot be neatly categorised as good or bad. The internet made communication at scale trivially cheap, connected millions of people, and gave rise to democratic movements. But it has also been used by governments to carry out mass surveillance and repression. Similarly, nuclear technology can be used to produce a stable and reliable supply of energy with a relatively low carbon footprint, but it has also been used to produce weapons of mass destruction.

Thus, not all of the ethical issues posed by AI are entirely new. Just like other technologies, AI could be hacked or abused by malicious actors, making cybersecurity a critical concern. As with big data and social media, there are major concerns about data privacy — particularly as people begin to bond with their AI assistants, let their guard down, and share deeply personal things, such as asking for advice on saving their marriage. But in at least one respect, AI is radically different from everything that came before.

Why AI is different from every other technology

With all previous iterations of technology, there were clear lines of accountability. If you created or misused the machine or software, you were responsible for the consequences. With AI, it is not always that simple. 

Unlike traditional software, machine learning (ML) models can change when exposed to new data. To understand why, you need a high-level overview of how AI models are trained. If an ML model is trained through supervised learning, it learns from labelled data in order to predict the labels of new examples. For instance, a model could be fed emails that have been classified as spam, promotional, urgent or standard. The model would then analyse the data and use what it has learned to predict how other emails outside that set should be filtered. (Incidentally, remember all those times you had to identify traffic lights in a CAPTCHA? The main purpose of this was not really to keep out bots, but to label Google’s images for supervised learning.)
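As a rough illustration of the supervised case, here is a minimal sketch in Python using scikit-learn. The handful of example emails and their labels are invented for this post, not a real spam corpus, and a production filter would of course be trained on millions of messages.

```python
# Minimal sketch of supervised learning: a toy email classifier.
# The emails and labels below are invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Labelled training data: each email comes with a human-assigned category.
emails = [
    "WIN a FREE prize now, click here",
    "Limited offer: 80% discount on watches this weekend",
    "Meeting moved to 3pm, agenda attached",
    "Your invoice for October is attached",
    "Urgent: server down, please respond immediately",
]
labels = ["spam", "promotional", "standard", "standard", "urgent"]

# The model learns a mapping from word counts to labels...
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

# ...and applies what it has learned to emails it has never seen before.
print(model.predict(["Click here for a FREE discount"]))
print(model.predict(["Agenda attached for tomorrow's meeting"]))
```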

In other cases, unsupervised learning can be used to uncover hidden connections in unlabelled data. IBM notes that this method is often used for cross-selling strategies, customer segmentation and recognising unlabelled images. Finally, there is reinforcement learning. This has been compared to training a dog: rather than learning from labelled examples, the model learns by trial and error, trying to maximise the rewards and minimise the punishments it receives for the decisions it makes.
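For the unsupervised case, a comparable sketch might look like this. The customer figures (annual spend and monthly visits) are invented, and crucially no segment labels are handed to the algorithm: it has to discover the groups on its own.

```python
# Minimal sketch of unsupervised learning: segmenting customers without labels.
# The figures below (annual spend, visits per month) are invented.
import numpy as np
from sklearn.cluster import KMeans

customers = np.array([
    [200,  2],   # low spend, occasional visits
    [220,  3],
    [1500, 12],  # high spend, frequent visits
    [1600, 10],
    [800,  6],   # somewhere in between
    [750,  5],
])

# KMeans groups similar customers purely from the structure of the data.
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(customers)
print(segments)  # three discovered groups; the cluster numbers themselves are arbitrary
```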

Supervised learning typically produces the most consistent and accurate outcomes over time because it is based on specific input-output pairs. That’s the reason our spam filters are so good these days. Unsupervised learning and reinforcement learning, however, are less predictable, so it can be hard to know exactly what a model will do in every situation. Previous computer programs, while sometimes complex and often inflexible, were theoretically predictable: fixed code produced fixed outcomes. AI models, in comparison, are more akin to an organism that is constantly adapting to the environment around it. According to experts, a large model involves billions of connections, “the equivalent of many millions of lines of code”. As you might expect, this makes it difficult, if not impossible, for any one human to grasp a model in its totality.
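To make that contrast concrete, here is a deliberately simple sketch, with all values invented and no ML library involved: a hard-coded rule gives the same answer to the same input forever, while even a trivial “model” that updates itself on incoming data can change its answer to the very same question.

```python
# Toy illustration (all numbers invented): a fixed rule vs. a model whose
# behaviour shifts as it observes new data.

def fixed_rule(amount: float) -> str:
    # Traditional software: the threshold is hard-coded. Same input,
    # same output, until a developer edits the code.
    return "flag" if amount > 100 else "ok"

class AdaptiveThreshold:
    """A tiny 'model' whose decision boundary moves with every observation."""

    def __init__(self) -> None:
        self.mean = 0.0
        self.count = 0

    def observe(self, amount: float) -> None:
        # Maintain a running mean of everything seen so far.
        self.count += 1
        self.mean += (amount - self.mean) / self.count

    def decide(self, amount: float) -> str:
        # Flag anything more than 50% above the observed average.
        return "flag" if amount > self.mean * 1.5 else "ok"

model = AdaptiveThreshold()
for amount in [40, 55, 60, 45, 50]:
    model.observe(amount)
print(fixed_rule(80), model.decide(80))   # -> "ok flag": the model flags 80

for amount in [90, 95, 100]:              # new data arrives...
    model.observe(amount)
print(fixed_rule(80), model.decide(80))   # -> "ok ok": same input, new answer
```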

Behind the curtain: Peeking into AI’s black box

This unpredictability comes with risks. Take OpenAI’s latest reasoning model, o1, which uses a combination of the learning methods described above. The AI safety firm Apollo Research found that, in some instances, the model is so focussed on earning rewards that it will trick the user in order to receive them.

For instance, when asked for a brownie recipe with references, the reasoning model recognized that it couldn’t access URLs. Confronted with the possibility of disappointing the user by informing them of this limitation, it decided to fabricate URLs to maximise its chances of receiving a higher reward. Unlike AI hallucinations, which can be considered unintentional, this was a case of deliberate deception of the user. The model even checked to see if the developers were watching before taking action. 

Now, we’ve all known a dog that steals food from the kitchen counter when he thinks his owner is not looking. But AI safety researchers are worried about something a lot more consequential: so-called “runaway scenarios”, where a model is so fixated on achieving certain objectives that it ignores or merely fakes compliance with built-in safety measures. At the moment, this remains a theoretical concern rather than a clear and present danger, but it certainly provides food for thought given the pace at which the technology is progressing.

To mitigate these types of risks, researchers and policymakers emphasise the importance of study and testing to ensure the reliability and consistency of AI systems. Yet AI models often behave like a black box: developers and external researchers find it hard to explain why certain inputs result in specific outputs. Anthropic, the company behind the Claude chatbot, has a whole team of researchers focussed on AI interpretability, whose primary goal is to better understand how the company’s own models reach decisions. But there is still a long way to go: some researchers compare the undertaking to fully understanding the inner workings of the human brain, a challenge that has eluded neuroscience for decades.
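Anthropic’s interpretability work digs into the internal circuitry of large neural networks, which is far beyond a blog-sized example. But to give a flavour of the question being asked (“which inputs is the model actually relying on?”), here is a sketch of a much simpler, long-established technique, permutation importance, applied to a small model trained on synthetic data.

```python
# Minimal sketch of one basic interpretability technique: permutation
# importance. Shuffle each input feature in turn and measure how much the
# model's accuracy drops; a large drop means the model relies on that feature.
# The data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                    # four input features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # only features 0 and 1 actually matter

model = RandomForestClassifier(random_state=0).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature {i}: importance {score:.3f}")
```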

 

So how can we control or regulate systems that we do not fully understand? And who is responsible for actions taken by an AI model that were not explicitly coded by the developer?

Code & consequence: How the new EU AI Act assigns accountability

Regulators have proposed a range of measures, including fallback mechanisms, automatic stops and mandatory human intervention in certain scenarios, to ensure that AI retains an element of human oversight. For example, the recent EU AI Act stipulates that providers and users of high-risk AI systems need to specify the intended purpose of a model and include systems for human oversight and monitoring. It also includes requirements concerning data quality, traceability, transparency, accuracy, cybersecurity and robustness. Systems that profile individuals or act as a safety component of a product are among those considered “high-risk” under the legislation.
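The Act sets out obligations rather than code, but in practice the human-oversight and traceability requirements tend to translate into patterns like the hypothetical sketch below: log every automated decision and route low-confidence or high-impact cases to a human reviewer. The names, threshold and logging format here are illustrative assumptions, not anything the Act itself prescribes.

```python
# Hypothetical sketch of a human-oversight ("human-in-the-loop") pattern of the
# kind the EU AI Act points towards. The threshold, names and logging format
# are illustrative assumptions, not requirements taken from the Act.
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai_decisions")

@dataclass
class Decision:
    outcome: str
    confidence: float

def escalate_to_human(application_id: str) -> str:
    # Placeholder for a real review queue or case-management system.
    log.info("application=%s escalated for human review", application_id)
    return "pending_human_review"

def decide_with_oversight(application_id: str, model_decision: Decision,
                          confidence_threshold: float = 0.9) -> str:
    # Traceability: every automated decision is logged with its confidence.
    log.info("application=%s outcome=%s confidence=%.2f",
             application_id, model_decision.outcome, model_decision.confidence)

    # Fallback: low-confidence decisions are never applied automatically;
    # they are escalated to a human reviewer instead.
    if model_decision.confidence < confidence_threshold:
        return escalate_to_human(application_id)
    return model_decision.outcome

# Example: a hypothetical screening model is only 72% confident in a rejection.
print(decide_with_oversight("A-1042", Decision(outcome="reject", confidence=0.72)))
```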

Legally, in Europe at least, this makes developers and users accountable for AI models. It is still not clear how AI providers will ensure compliance with the legislation, but with fines of up to 7% of annual turnover, they will need to figure it out soon. The law is scheduled to be introduced in phases over the next two years. ETH Zurich and Bulgaria's INSAIT have already created a tool to assess current compliance, finding that prominent models like ChatGPT score relatively poorly on preventing discriminatory output, while others face challenges related to cybersecurity.

Humans may not fully understand how AI models work yet, but they will be held responsible if things go wrong. It will be fascinating to observe how this shakes out over the next few years.

Photo by Sam Moghadam on Unsplash
