Open Source Language AI Challenges Big Tech Models - Machine Learning Times

Originally published in Nature.com, June 22, 2022.

An international team of around 1,000 volunteers, many of them academics, has tried to break big tech’s hold on natural language processing and reduce its harms. Trained with $7 million worth of publicly funded computing time, the BLOOM language model will rival those of Google and OpenAI in size, but will be open source. BLOOM will also be the first model of its scale to be multilingual.

The collaboration, called BigScience, launched an early version of the model on June 17 and hopes it will eventually help reduce harmful outputs from artificial intelligence (AI) language systems. Models that recognize and generate language are increasingly being used by big tech companies in applications ranging from chatbots to translators, and can seem so eerily human that a Google engineer claimed this month that the company’s AI model was sentient (Google strongly denies that the AI is sentient). But these models also suffer from serious practical and ethical flaws, such as repeating human biases. These flaws are difficult to address because the inner workings of most of these models are closed to researchers.

As well as being a tool for exploring AI, BLOOM will be open to a range of research uses, such as extracting information from historical texts and making classifications in biology. “We believe that getting access to the model is an essential step in doing responsible machine learning,” says Thomas Wolf, co-founder of Hugging Face, a company that hosts an open-source platform for AI models and datasets, and helped spearhead the initiative.

Sherry J. Basler