No, machine learning cannot predict trustworthiness from faces

Although facial recognition technology is used by police departments, governments, and even online surveillance services, it has little scientific basis and often leads to discriminatory and harmful outcomes. In 2020, a study in Nature Communications claimed that historical changes in trustworthiness could be tracked with a machine learning algorithm that evaluated facial cues in portrait paintings.

Just eight days after the study was published, the journal added an editor’s note saying that criticisms of the article were under further scrutiny. Nearly two years later, the editors of Nature Communications are still reviewing and reconsidering the study.

In response to an inquiry from The Debrief about the status of this review process, a representative for the publication wrote that it is still ongoing.

“Whenever concerns are raised about an article we have published, we consider them carefully, following an established process, consulting with authors and, where appropriate, seeking input from peer reviewers and other outside experts,” said a spokesperson for Nature Communications in an email.

“Once these processes are complete and we have the information necessary to make an informed decision, we will provide the most appropriate response that clarifies the outcome for our readers,” the spokesperson added.

“In this case, the process is ongoing,” the spokesperson said, noting that they would be willing to provide additional details at a later date “when more information becomes available.”

Rory Spanton, a Ph.D. researcher at the University of Plymouth, and Olivia Guest, Ph.D., an assistant professor at Radboud University, recently wrote a critique of the study, which is currently under review by the journal. They argue that the premises and methods used to develop the trustworthiness algorithm are flawed and racist.

“Trustworthiness is not a very stable construct between individuals or across societies,” Spanton explained. “What one person sees as a mark of trustworthiness may well be interpreted in a completely different way by another person.”

Even though the algorithm was trained to reproduce how participants rated the trustworthiness of faces in paintings, that doesn’t mean “the algorithm is scientific,” Spanton explained. Spanton and Guest pointed out that using facial features to judge trustworthiness is rooted in a racist and pseudoscientific practice called physiognomy.

Physiognomy gained popularity in the 18th and 19th centuries and was weaponized by the Nazis during World War II. Research does not support the idea that people can make accurate character judgments from facial features.

“There have always been these kinds of fringe scientists doing this kind of pseudo-scientific work,” Guest said. “But now it’s meeting the hype of machine learning and artificial intelligence.” In recent years, researchers have made similarly erroneous claims about facial features, publishing studies that automate physiognomy to “predict” attractiveness, sexual orientation, and even criminality.

In addition to resting on a pseudoscientific premise, the article suffers from statistical issues and selection bias, Spanton and Guest found. “That’s the added level of surrealism in this paper,” Guest said, explaining that the study examined facial features in portraits of predominantly white individuals, and the portraits themselves are romanticized representations. Even the participants’ ratings were only weakly correlated with the algorithm’s output, meaning it wasn’t even effective at automating human biases.
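To illustrate what “weakly correlated” means here, the sketch below computes a Pearson correlation coefficient between human ratings and algorithm scores. The data are entirely hypothetical and are not from the study; values of r near zero indicate little agreement between the two sets of scores.

```python
# Illustrative sketch with made-up data: measuring agreement between
# human trustworthiness ratings and an algorithm's output scores.
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical ratings for six portraits (1-7 scale) and the
# hypothetical algorithm's scores (0-1 scale) for the same portraits.
human_ratings = [4.2, 5.1, 3.8, 6.0, 2.9, 4.7]
algorithm_scores = [0.55, 0.40, 0.62, 0.58, 0.47, 0.51]

r = pearson_r(human_ratings, algorithm_scores)
print(f"r = {r:.2f}")  # values near 0 indicate weak agreement
```

A correlation this weak would mean the algorithm does not even track the human biases it was trained on, which is the criticism Spanton and Guest raise.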

The enthusiasm around these technologies allows such studies to proliferate and legitimizes their deployment. “While the algorithm we specifically criticize has not been used or sold in a commercial sense,” Spanton added, “I think there is real precedent and history of similar packaged algorithms.” Faulty facial recognition has real consequences: it has led to wrongful incarceration and has been used by repressive regimes to hunt down ethnic minorities.

Technology based on the racist pseudoscience of physiognomy cannot be objective, yet the ethics and history of psychology are rarely taught. “Greater engagement with the ethics and history of these more problematic pseudoscientific practices would be a very good start to empowering researchers and funders to critique this type of work,” Spanton said.

Even though pseudoscience is the basis of many facial recognition applications, it continues to be developed and adopted. Buzzwords like machine learning and artificial intelligence continue to garner million-dollar investments and grab headlines.

Simon Spichak is a science communicator and journalist with a master’s degree in neuroscience. His work has been featured in outlets including Massive Science, Being Patient, Futurism, and Time magazine. You can follow his work on his website, or on Twitter @SpichakSimon.
