How This Machine Learning Researcher Combines Art and Data

Artist and researcher Caroline Sinders uses research-based art projects to examine the impact of data and technology on society.

Although data analytics has become one of the most valuable assets in a company’s arsenal, it is not without its flaws.

A major issue is how societal biases such as sexism can show up in datasets and AI algorithms because of the data that has been entered. One woman trying to tackle gender-biased data is Caroline Sinders, a researcher and machine learning design artist.

Sinders said she enjoys using art as a mechanism for criticism.

“Art allows me to visualize current emergencies, or current imaginaries or potential speculative solutions. It also allows me to really play.”

Sinders has worked on a number of projects using data science, machine learning, and art. One of them is Feminist Data Set, a multi-year research-based art project that interrogates every step of the AI process, including data collection, data labeling, data training, selecting an algorithm to use, and checking the algorithmic model for bias.

She said one of the reasons she wanted to incorporate art into the project was to involve community members in the process and let people ask questions about how to generate an algorithmic model in a feminist way.

“If I was doing this in a much more controlled environment, like in a lab for example, it would have been a much shorter project and we would probably have a lot fewer participants,” she said.

“What I love is that by making it an art project, by stretching it out, it allows me to do things over and over and over again, and I can change things, I can adjust things, I can move things. But also, it allows me to follow the provocations made by the participants. Instead of saying, ‘Oh, that’s a great idea, but that’s irrelevant,’ it gets me to say, ‘Oh, actually, let’s follow this thread for a second.’”

“Datasets should be viewed as organic entities that will one day expire”

She added that running it as a strictly formal research project would also limit the types of text participants could submit. Currently, participants can submit any type of text, including poetry, blog posts, and song lyrics, to form a text model.

She said that this text model was going to be “misshapen” due to the different types of text used. She also said that, unlike in typical natural language processing, she is interested in manually annotating the data “to then try to imbue the kind of additional narrative within it.”

“It’s also an artistic choice. It becomes like a form of poetry, which also becomes a form of text itself that can fold into text, but that’s not how you would actually generate an NLP model. And I think that’s okay, because it’s still an illustrative step because it’s like a kind of analysis maybe.”

As part of the Feminist Data Set project, Sinders also created Technically Responsible Knowledge (TRK), which is a tool and advocacy initiative highlighting unfair labor in the machine learning pipeline.

It includes an open-source data tagging and training tool and a salary calculator, and was designed for use by non-coders.

“I wanted to include that aspect of the datasheets of, well, what’s the summary someone would add to that? Who did it and what is it and where did it come from? Why does it exist? And it becomes the way to sign a data set,” she explained.

“One of the things that really interests me is this idea that datasets should maybe be considered organic entities that will one day expire. So what is the lifecycle or lifespan of a dataset? And then a dataset needs a label. It should have the day it was made or the day it was finished, and who worked on it and where did they come from. So those are other things that I was including in that as well.”
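To make the idea of a dataset label concrete, here is a minimal Python sketch of what such a record might look like. It is not taken from TRK or Feminist Data Set. Every name in it (`DatasetLabel`, `Contributor`, `lifespan_days`, and the example values) is a hypothetical chosen for illustration, and the fixed lifespan in days is an assumed way of modelling a dataset that "will one day expire".

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List


@dataclass
class Contributor:
    name: str
    origin: str  # where the contributor came from (hypothetical field)


@dataclass
class DatasetLabel:
    """Hypothetical label recording who made a dataset, when, and why."""
    name: str
    summary: str        # what the dataset is and why it exists
    source: str         # where the data came from
    finished_on: date   # the day the dataset was finished
    lifespan_days: int  # assumed shelf life before the dataset expires
    contributors: List[Contributor] = field(default_factory=list)

    def expires_on(self) -> date:
        # Expiry is modelled as a fixed number of days after completion.
        return self.finished_on + timedelta(days=self.lifespan_days)

    def is_expired(self, today: date) -> bool:
        return today > self.expires_on()


# Illustrative values only; none of these come from a real dataset.
label = DatasetLabel(
    name="example-text-collection",
    summary="Community-submitted texts gathered in workshops.",
    source="workshop submissions",
    finished_on=date(2020, 1, 1),
    lifespan_days=365,
    contributors=[Contributor(name="Participant A", origin="Berlin workshop")],
)

print(label.expires_on())
print(label.is_expired(date(2021, 1, 1)))
```

A fixed lifespan is only one possible policy. A label could equally carry a review date that prompts people to decide whether the dataset is still fit for use.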

AI, machine learning and public good

Outside of her Feminist Data Set project, Sinders is also extremely passionate about designing for the public good and has noticed many examples of how machine learning can benefit society.

She saw AI used for the good of society while working as a writer with Google’s People and AI Research (PAIR) team, where she examined how different cities used artificial intelligence.

One example was Amsterdam, which used AI alongside human staff to sort non-emergency phone calls from residents, such as reports of fallen trees or illegal parking.

“They apparently had great success using this. It helped them create different buckets and then it helps humans sort faster for the most part.

“One of the reasons they wanted to do this is that they recognized that a telephone tree they designed is probably very confusing to an ordinary citizen or consumer. They know which department has to deal with fallen trees, but a consumer may not know that.”
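The bucketing idea described above can be illustrated with a small Python sketch. This is not Amsterdam’s system, and it uses plain keyword matching as a stand-in for a trained machine learning model. The bucket names, keywords, and the `assign_bucket` function are all hypothetical. The point it shows is the division of labour: software proposes a department, and anything it cannot place goes to a person.

```python
import re

# Hypothetical buckets and keywords, chosen only for illustration.
BUCKETS = {
    "parks_and_trees": {"tree", "trees", "branch", "branches"},
    "parking_enforcement": {"parking", "parked"},
}


def assign_bucket(report: str) -> str:
    """Suggest a department for a report, or defer to a human."""
    # Split into whole words so that "street" does not match "tree".
    words = re.findall(r"[a-z]+", report.lower())
    scores = {
        bucket: sum(1 for word in words if word in keywords)
        for bucket, keywords in BUCKETS.items()
    }
    best = max(scores, key=scores.get)
    # With no keyword hits, leave the decision to a human reviewer.
    return best if scores[best] > 0 else "needs_human_review"


print(assign_bucket("A tree has fallen across the road"))
print(assign_bucket("A car is parked illegally on my street"))
print(assign_bucket("Loud music next door"))
```

In a real deployment the keyword lookup would be replaced by a classifier trained on past reports, but the fallback to human review would likely remain.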

Sinders also said machine learning has a huge role to play when it comes to the climate crisis. When she was an artist embedded with the European Commission, researchers explained how they used machine learning to analyze changes in thousands of images of coastlines to monitor erosion, along with other tools like thermal mapping.

“Machine learning is just able to sort through these images much faster than a person could. And it also provided these different levels of analysis on how things have changed. So machine learning kind of becomes this extra extension of the researcher and is able to provide this really useful analysis,” she said.

“I think there are a lot of interesting moves in the climate change space from companies that are already using machine learning to help analyze aspects of climate change, but also project and create simulations of what is a future if we change different parts of our present,” she added. “I think that’s a really good use of machine learning.”


Sherry J. Basler