Can We Trust AI Emotion Recognition Technology?

about 2 weeks ago by Lianne Frith

With AI (artificial intelligence) playing a bigger role in decision making than ever, engineers are now focusing on emotion recognition technology (emotion AI), which is designed to infer people’s feelings from facial analysis.

But what our faces show and our actual emotions are very different things. Does emotion AI need further development before it can be trusted?


How Emotional AI Works

Emotion recognition (ER) technology, part of the field known as affective computing, works by coding facial expressions. ER systems and products are designed to recognise, interpret, process and simulate human feelings and emotions. This is done by tracking smirks, smiles, frowns and furrows and translating them into the six basic emotions of anger, disgust, fear, happiness, sadness, and surprise.

Emotion AI can use any optical sensor, such as a webcam, to measure unfiltered and unbiased facial expressions, identifying faces either in real-time or using an image or video.

Computer vision algorithms pinpoint key landmarks on the face, such as the corners of the mouth, the tip of the nose, or tops of the eyebrows. Then, deep learning algorithms analyse pixels in the landmark areas, classify facial expressions, and map them to emotions. 
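To make the landmark-to-emotion pipeline concrete, here is a minimal sketch in Python. It is a toy, not how any commercial product works: real systems use deep learning over pixels, whereas this example applies a hand-written rule to three landmark coordinates. The landmark names, the threshold, and the expression-to-emotion table are all assumptions made for illustration.

```python
# Toy pipeline: landmarks -> expression -> emotion label.
# Landmark names and the 0.1 threshold are hypothetical, chosen for illustration.
# Coordinates are pixels, with y increasing downwards as in most image formats.

EXPRESSION_TO_EMOTION = {"smile": "happiness", "frown": "sadness"}


def mouth_curvature(landmarks):
    """Positive when the mouth corners sit above the mouth centre (smile-like)."""
    left = landmarks["mouth_left"]
    right = landmarks["mouth_right"]
    centre = landmarks["mouth_centre"]
    mouth_width = right[0] - left[0]
    corner_y = (left[1] + right[1]) / 2
    # Dividing by the mouth width makes the value independent of face size.
    return (centre[1] - corner_y) / mouth_width


def classify_expression(landmarks, threshold=0.1):
    """Label the mouth shape as 'smile', 'frown' or 'neutral'."""
    curvature = mouth_curvature(landmarks)
    if curvature > threshold:
        return "smile"
    if curvature < -threshold:
        return "frown"
    return "neutral"


def map_to_emotion(expression):
    """One-to-one lookup from expression to emotion; None if no entry exists."""
    return EXPRESSION_TO_EMOTION.get(expression)


smiling_face = {"mouth_left": (40, 95), "mouth_right": (80, 95), "mouth_centre": (60, 105)}
print(map_to_emotion(classify_expression(smiling_face)))  # happiness
```

The final lookup is the step worth noticing: the code measures a mouth shape, and the emotion is simply read off a fixed table.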

However, humans express their feelings through many non-verbal cues besides facial expressions, such as gestures, body language, and tone of voice. To try to account for this, some systems go beyond interpreting facial cues from a webcam and also track colour changes in the face, which pulse each time the heart beats. Ultimately, the aim is to detect emotions the way humans do, from several channels at once. On the surface, the technology seems to do what it promises.
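The idea of reading a pulse from facial colour can be sketched in a few lines of Python. The example assumes that the average green-channel value of the face region has already been extracted for every video frame (the `green_means` list here is synthetic), and it simply searches for the heart rate whose frequency is strongest in that signal. Real systems also need face tracking, filtering, and compensation for movement and lighting, none of which is shown.

```python
import math


def estimate_bpm(green_means, fps, min_bpm=40, max_bpm=180):
    """Return the whole-number heart rate (beats per minute) that dominates the signal."""
    mean = sum(green_means) / len(green_means)
    centred = [value - mean for value in green_means]  # remove the constant skin tone

    best_bpm, best_power = None, -1.0
    for bpm in range(min_bpm, max_bpm + 1):
        freq = bpm / 60.0  # beats per second
        # Correlate the signal with a cosine and a sine at this frequency.
        re = sum(v * math.cos(2 * math.pi * freq * i / fps) for i, v in enumerate(centred))
        im = sum(v * math.sin(2 * math.pi * freq * i / fps) for i, v in enumerate(centred))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


# Synthetic input: 10 seconds of 30 fps video with a faint 1.2 Hz (72 bpm) pulse.
fps = 30
green_means = [100 + 0.5 * math.sin(2 * math.pi * 1.2 * i / fps) for i in range(300)]
print(estimate_bpm(green_means, fps))  # 72
```

Note that the output is a heart rate, not an emotion; any link between a raised pulse and a particular feeling is a further assumption layered on top.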



A webcam that can be used as an optical sensor to measure facial expressions. 


Potential Applications

There are a considerable number of companies using emotion algorithms, including big players such as Microsoft, IBM, and Amazon. Affectiva is amongst the leaders, focusing its efforts on using emotion AI within vehicles. To run through some of the potential applications:



Automotive

Responding to driver emotions to adjust music and temperature or to pull over and offer emergency roadside assistance.


Personalised Viewing

Creating films in which the story changes depending on how the viewer is feeling—the next level in immersive entertainment.


Border Security

Adding a further layer of detection to help determine whether someone is lying at immigration.


Automated Surveillance

Looking for signs of anger that may indicate a threat and then adjusting surveillance systems accordingly.


Job Interview Software

Using facial cues to detect bored or uninterested candidates.


And that is just the beginning. In theory, the technology could be used to influence legal decisions, educational practices, and national security protocols, as well as to guide the treatment of psychiatric illnesses.

With all of these applications, there needs to be a huge amount of certainty in the results. However, at the time of writing, the technology still has faults: it misreads emotions and misleads the people depending on its accuracy. 


Artificial intelligence concept graphic.

The core concern related to the use of emotion recognition AI is that the readings are still considered unreliable. 


Faults in Emotion Recognition Technology

The problem at the core of emotion recognition technology is that it assumes that the facial expressions we make reliably reflect our emotions. However, a recent study has shown that facial configurations are not reliable diagnostic displays in the way that fingerprints are. It is not possible to reliably predict happiness from a smile or sadness from a frown, yet much existing technology treats this assumption as scientific fact.


An Inaccurate Emotion Labelling System

Existing ER technology combines computer vision, which identifies facial expressions, with machine learning algorithms that analyse and interpret emotions from those expressions. It is the latter part of the puzzle that is lacking. The technique uses supervised learning, which trains algorithms on labelled examples so that they can recognise expressions similar to those they have seen before.
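A small Python sketch shows what supervised learning means in practice, and why it can only ever answer with labels it was given. It uses a nearest-centroid classifier, chosen here for brevity rather than because any ER vendor is known to use it. The two-number feature vectors (mouth curvature, brow height) and the training examples are invented for illustration.

```python
from collections import defaultdict
from math import dist


def fit_centroids(samples):
    """Average the feature vectors of each label. samples: list of (features, label)."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Invented training data: (mouth curvature, brow height) with a human-assigned label.
training = [
    ((0.25, 0.00), "happiness"), ((0.30, 0.05), "happiness"),
    ((-0.20, -0.10), "sadness"), ((-0.15, -0.05), "sadness"),
    ((0.00, 0.40), "surprise"), ((0.05, 0.35), "surprise"),
]
centroids = fit_centroids(training)

print(predict(centroids, (0.28, 0.02)))  # happiness
# An ambiguous face (slight smile, raised brows) still gets one of the known labels.
print(predict(centroids, (0.20, 0.20)))  # happiness
```

The second prediction illustrates the limitation: the classifier has no way to say "polite smile" or "not sure", because those labels were never in its training data.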

The labelling method is derived from the Facial Action Coding System, which Paul Ekman and Wallace Friesen published in 1978. However, while Ekman’s theory states that universal emotions show on faces even when a person tries to hide them, many scientists dispute it. The critique is that the labelling method offers only preselected emotion labels, whereas emotions are actually more dynamic. Current techniques don’t cover the full cognitive processes, person-to-person interactions, or cultural competencies.


Addressing Context and Bias in Emotion Recognition

Of course, companies at the forefront of emotion AI are trying to overcome the hurdles. Affectiva constantly improves the richness and complexity of its data by using video alongside still images, including contextual data (such as voice, gait, and minute changes in the face that are beyond human perception).

Cultural nuances, moreover, are accounted for by adding another layer of analysis to the system to create ethnically-based benchmarks. However, while all of this should produce more accurate results, there is a chance that it could cause more harm than good: some studies show that facial technologies can create racial biases.


A concept image of a man undergoing a digital facial scan. 


Plans for Future Development 

Emotional AI is certainly a market that is set to grow. Gartner has predicted that by 2022, ten per cent of our mobile devices will have emotion AI capabilities. There is a clear driver in improving customer experience and providing entertainment, but also a much more serious one in healthcare and automotive applications, which require extreme accuracy.

As with many machine learning applications, progress in emotion recognition relies on the quality and breadth of the data. There is a focus on multi-modal methods for recognising emotions, enhancing accuracy with data from additional signals.
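One common way to combine signals is late fusion, where each channel produces its own estimate and the estimates are then averaged. The Python sketch below assumes two hypothetical channels, face and voice, each returning probabilities over the six basic emotions. The numbers and the weights are invented; in a real system they would come from trained models and validation data.

```python
EMOTIONS = ("anger", "disgust", "fear", "happiness", "sadness", "surprise")


def fuse(predictions, weights):
    """Weighted average of per-channel probability distributions."""
    total_weight = sum(weights[channel] for channel in predictions)
    return {
        emotion: sum(weights[ch] * predictions[ch][emotion] for ch in predictions) / total_weight
        for emotion in EMOTIONS
    }


# Invented outputs: the face looks happy, but the voice sounds sad.
predictions = {
    "face": {"anger": 0.05, "disgust": 0.05, "fear": 0.05,
             "happiness": 0.70, "sadness": 0.10, "surprise": 0.05},
    "voice": {"anger": 0.10, "disgust": 0.05, "fear": 0.10,
              "happiness": 0.15, "sadness": 0.55, "surprise": 0.05},
}
weights = {"face": 1.0, "voice": 2.0}  # hypothetical: trust the voice twice as much

fused = fuse(predictions, weights)
print(max(fused, key=fused.get))  # sadness
```

With the face alone the answer would be happiness; adding the second channel changes the result, which is the argument for using more than one signal.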

The other area of growth is in relating emotions to context. A hybrid, multi-model approach would improve results, with machine learning detecting how people move their facial muscles and another layer accounting for the context of that expression.
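As a rough illustration of a context layer, the Python sketch below multiplies the scores from a facial model by a prior that says how likely each emotion is in a given setting, then rescales the result so it sums to one. This is one simple, Bayesian-style way to do it and is an assumption of this example, not a description of any particular product. The setting (a dentist's waiting room) and all the numbers are invented.

```python
def apply_context(face_scores, context_prior):
    """Reweight facial scores by a context prior and renormalise to sum to 1."""
    unnormalised = {e: face_scores[e] * context_prior[e] for e in face_scores}
    total = sum(unnormalised.values())
    return {e: value / total for e, value in unnormalised.items()}


# Invented facial-model output: the face looks mostly like a smile.
face_scores = {"anger": 0.05, "disgust": 0.05, "fear": 0.20,
               "happiness": 0.60, "sadness": 0.05, "surprise": 0.05}

# Hypothetical prior for a dentist's waiting room, where nervous smiles are common.
waiting_room_prior = {"anger": 0.10, "disgust": 0.10, "fear": 0.50,
                      "happiness": 0.10, "sadness": 0.10, "surprise": 0.10}

adjusted = apply_context(face_scores, waiting_room_prior)
print(max(face_scores, key=face_scores.get))  # happiness
print(max(adjusted, key=adjusted.get))        # fear
```

The same facial reading leads to a different conclusion once the setting is taken into account, though the result is only as good as the prior that someone has to choose.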

Ultimately, existing models are just too simple. It’s not enough to focus on six basic emotions and an archetypal one-to-one mapping. Metrics shouldn’t be used to make decisions just because they can be measured and connections can be made.

The relationship between facial expressions and emotions is a lot more complex, and the algorithms being used need to account for that. The future needs to see the development of more sophisticated metrics and a review of the labelling system to ensure accuracy. Only then can we actually put faith in the technology to influence our lives.