Demystifying machine learning systems using natural language

MIT researchers have created a technique that can automatically describe, in natural language, the roles of individual neurons in a neural network. In this figure, the technique identified "the upper limit of horizontal objects" in the photographs, which are highlighted in white. Credit: Jose-Luis Olivares, MIT

Neural networks are sometimes called black boxes because, despite the fact that they can outperform humans in some tasks, even the researchers who design them often don’t understand how or why they work so well. But if a neural network is used outside the lab, perhaps to classify medical images that could help diagnose heart disease, knowing how the model works helps researchers predict how it will behave in practice.

MIT researchers have now developed a method that sheds light on the inner workings of black-box neural networks. Inspired by the human brain, neural networks are organized into layers of interconnected nodes, or “neurons,” that process data. The new system can automatically produce descriptions of these individual neurons, generated in English or another natural language.

For example, in a neural network trained to recognize animals in images, their method might describe a certain neuron as detecting fox ears. Their technique can generate more precise and specific descriptions for individual neurons than other methods.

In a new paper, the team shows that this method can be used to audit a neural network to determine what it has learned, or even modify a network by identifying and then turning off unnecessary or incorrect neurons.

"We wanted to create a method where a machine learning practitioner can give this system their model and it will tell them everything it knows about that model, from the perspective of the model's neurons, in language. This helps answer the basic question, 'Is there anything my model knows that I did not expect it to know?'" says Evan Hernandez, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and lead author of the paper.

Automatically generated descriptions

Most of the existing techniques that help machine learning practitioners understand how a model works either describe the whole neural network or require researchers to identify the concepts they think individual neurons might focus on.

The system developed by Hernandez and his collaborators, dubbed MILAN (mutual information-guided linguistic annotation of neurons), improves on these methods because it does not require a list of concepts in advance and can automatically generate natural language descriptions of all the neurons in a network. This is especially important because a neural network can contain hundreds of thousands of individual neurons.

MILAN produces descriptions of neurons in neural networks trained for computer vision tasks such as object recognition and image synthesis. To describe a given neuron, the system first inspects that neuron's behavior over thousands of images to find the set of image regions in which the neuron is most active. Then, it selects a natural language description for each neuron that maximizes a quantity called pointwise mutual information between the image regions and the descriptions. This encourages descriptions that capture each neuron's distinctive role within the larger network.
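The selection step can be sketched as follows. Everything here is illustrative, not the authors' implementation: the candidate descriptions and the probabilities are made up, standing in for MILAN's learned captioning model p(d | regions) and language prior p(d).

```python
import math

def pmi(p_desc_given_regions: float, p_desc: float) -> float:
    """Pointwise mutual information between a description d and the set of
    image regions E where the neuron fires: log p(d | E) - log p(d)."""
    return math.log(p_desc_given_regions) - math.log(p_desc)

# Hypothetical candidates for one neuron, with invented probabilities.
candidates = {
    "dog":                   {"p_given_regions": 0.30, "p_prior": 0.20},
    "dog ears":              {"p_given_regions": 0.25, "p_prior": 0.05},
    "left side of dog ears": {"p_given_regions": 0.20, "p_prior": 0.01},
}

best = max(candidates, key=lambda d: pmi(candidates[d]["p_given_regions"],
                                         candidates[d]["p_prior"]))
for d, p in candidates.items():
    print(f"{d!r}: PMI = {pmi(p['p_given_regions'], p['p_prior']):.2f}")
print("selected:", best)
```

Note how the generic description "dog" has a high prior probability, which cancels out its high conditional probability; the PMI objective therefore favors the most specific description that still matches the regions well.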

"In a neural network trained to classify images, there will be tons of different neurons that detect dogs. But there are many different types of dogs and many different parts of dogs. So even though 'dog' can be an accurate description of a lot of these neurons, it is not very informative. We want very specific descriptions of what each neuron does. It is not just dogs; it is the left side of the ears of German Shepherds," says Hernandez.

The team compared MILAN to other models and found that it generated richer and more accurate descriptions, but the researchers were more interested in seeing how it could help answer specific questions about computer vision models.

Analyze, audit and edit neural networks

First, they used MILAN to analyze which neurons are most important in a neural network. They generated descriptions for every neuron and grouped them according to the words in those descriptions. They gradually removed neurons from the network to see how its accuracy changed, and found that neurons whose descriptions contained two very different words (vases and fossils, for example) were less important to the network.
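A toy sketch of this kind of ablation test, using a tiny hand-rolled network rather than the models from the paper (all sizes and weights here are illustrative): zeroing one hidden neuron at a time and re-measuring accuracy shows how much the network depends on it.

```python
import random

random.seed(0)

def relu(v):
    return [max(0.0, x) for x in v]

def dot(row, col):
    return sum(a * b for a, b in zip(row, col))

N_IN, N_HID, N_OUT = 6, 4, 2
W1 = [[random.gauss(0, 1) for _ in range(N_HID)] for _ in range(N_IN)]
W2 = [[random.gauss(0, 1) for _ in range(N_OUT)] for _ in range(N_HID)]
data = [[random.gauss(0, 1) for _ in range(N_IN)] for _ in range(100)]

def forward(x, mask):
    """Predicted class; hidden neurons where mask is 0 are ablated (zeroed)."""
    h = relu([dot(x, [W1[i][j] for i in range(N_IN)]) for j in range(N_HID)])
    h = [a * m for a, m in zip(h, mask)]
    out = [dot(h, [W2[j][k] for j in range(N_HID)]) for k in range(N_OUT)]
    return out.index(max(out))

# Define "true" labels as the full network's own outputs, so the intact
# network scores 100% and any accuracy drop is due to the ablation alone.
labels = [forward(x, [1.0] * N_HID) for x in data]

def accuracy(mask):
    return sum(forward(x, mask) == l for x, l in zip(data, labels)) / len(data)

for j in range(N_HID):
    mask = [1.0] * N_HID
    mask[j] = 0.0
    print(f"ablate hidden neuron {j}: accuracy {accuracy(mask):.2f}")
```

In the experiment described above, the researchers additionally used MILAN's descriptions to decide which neurons to ablate, rather than ablating them one by one at random.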

They also used MILAN to audit models to see if they had learned anything unexpected. The researchers took image classification models trained on datasets in which human faces were blurred, ran MILAN, and counted the number of neurons nonetheless sensitive to human faces.
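Once every neuron has a natural-language description, this audit reduces to a text search. A hypothetical sketch (the neuron IDs and descriptions are invented):

```python
# Made-up mapping from neuron index to its MILAN-style description.
descriptions = {
    412: "human faces in profile",
    87:  "dog ears",
    951: "eyes and noses on human faces",
    233: "vertical stripes",
}

# Count the neurons whose descriptions mention faces.
face_neurons = [n for n, d in descriptions.items() if "face" in d]
print(len(face_neurons), "face-sensitive neurons:", sorted(face_neurons))
```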

"Blurring faces in this way reduces the number of face-sensitive neurons, but far from eliminates them. In fact, we hypothesize that some of these face neurons are very sensitive to specific demographic groups, which is quite surprising. These models have never seen a human face before, and yet all kinds of facial processing happens inside them," Hernandez says.

In a third experiment, the team used MILAN to modify a neural network by finding and removing neurons that detected bad correlations in the data, resulting in a 5 percent increase in network accuracy on inputs exhibiting the problematic correlation.

While the researchers were impressed with MILAN's performance in these three applications, the model sometimes gives descriptions that are still too vague, or makes an erroneous guess when it does not know the concept it is supposed to identify.

They plan to address these limitations in future work. They also want to continue improving the richness of the descriptions MILAN can generate. They hope to apply MILAN to other types of neural networks and to use it to describe what groups of neurons do, since neurons work together to produce an output.

"This is an approach to interpretability that starts from the bottom up. The goal is to generate open-ended, compositional descriptions of function in natural language. We want to tap into the expressive power of human language to generate descriptions that are much more natural and rich for what neurons are doing. Being able to generalize this approach to different types of models is what excites me the most," says Schwettmann.

“The ultimate test of any explainable AI technique is whether it can help researchers and users make better decisions about when and how to deploy AI systems,” says Andreas. “We are still a long way from being able to do this in general. But I am optimistic that MILAN – and the use of language as an explanatory tool more broadly – will be a useful part of the toolkit.”

The research was published on arXiv.


More information: Evan Hernandez et al, Natural Language Descriptions of Deep Visual Features, arXiv:2201.11114 [cs.CV]

Provided by Massachusetts Institute of Technology

This story is republished courtesy of MIT News, a popular site that covers news about MIT research, innovation, and education.

Citation: Demystifying machine learning systems using natural language (January 27, 2022). Retrieved January 28, 2022.

This document is subject to copyright. Except for fair use for purposes of private study or research, no part may be reproduced without written permission. The content is provided for information only.

Sherry J. Basler