Latest machine learning research at Apple offers ‘DD-GloVe’, a train-time debiasing algorithm to learn word embeddings by leveraging dictionary definitions

Word embeddings can meaningfully capture semantic and syntactic similarities between words. Word2Vec, GloVe, and FastText are popular examples. Despite the growing popularity of contextual word embeddings such as BERT and ELMo, current research continues to use static word embeddings as input to state-of-the-art algorithms in downstream natural language processing and computer vision applications. Despite their effectiveness, word embeddings carry biases that create negative associations between certain concepts. Earlier research first showed that the difference between "man" and "woman" in embedding space is comparable to that between "programmer" and "homemaker". Similar patterns lead to biased answers in the word analogy task, where specific terms become associated with gender, racial, and religious stereotypes. If used in downstream tasks, biased word embeddings can cause allocational and representational harms.

Learning unbiased word embeddings is therefore essential. The researchers argue that dictionary definitions are a relatively neutral source for reducing bias: the objective, concise definitions of terms in a dictionary can serve as unbiased reference points. They propose encouraging word embeddings to be similar to their relatively neutral dictionary-definition representations. Their method trains and debiases word embeddings simultaneously from scratch, learning distributional patterns while mitigating bias with dictionary definitions. In addition, some existing gender-debiasing algorithms rely on a pre-compiled list of seed words to approximate the gender direction, along which the vector component is removed to mitigate bias.
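To make the idea of removing a vector component along the gender direction concrete, here is a minimal sketch in plain Python. It is not the DD-GloVe training procedure; it only illustrates the projection-based step that seed-word methods rely on. The function names and the 3-D toy vectors are invented for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def gender_direction(pairs, emb):
    """Sum the difference vectors of the seed pairs, then scale to unit length."""
    dim = len(next(iter(emb.values())))
    total = [0.0] * dim
    for a, b in pairs:
        for i in range(dim):
            total[i] += emb[a][i] - emb[b][i]
    return normalize(total)

def remove_component(vec, direction):
    """Subtract the projection of vec onto a unit-length direction."""
    scale = dot(vec, direction)
    return [a - scale * d for a, d in zip(vec, direction)]

# Toy 3-D embeddings, made up for this example (not real GloVe vectors).
toy_emb = {
    "he": [1.0, 0.2, 0.0],
    "she": [-1.0, 0.2, 0.0],
    "man": [0.8, 0.0, 0.3],
    "woman": [-0.8, 0.0, 0.3],
    "programmer": [0.4, 0.9, 0.1],
}

direction = gender_direction([("he", "she"), ("man", "woman")], toy_emb)
debiased = remove_component(toy_emb["programmer"], direction)
print("direction:", direction)
print("debiased programmer:", debiased)
```

After the projection is removed, the "programmer" vector has no component left along the estimated direction, while its other coordinates are unchanged. The quality of the result depends entirely on how well the seed pairs capture the direction, which is the weakness the hand-compiled seed lists introduce.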

They present DD-GloVe, a train-time debiasing approach for learning bias-reduced GloVe word embeddings that takes advantage of dictionary definitions. They found that, given a pair of seed words, dictionary definitions can aid in the automated search for suitable additional seed words, so compiling the seed word list becomes automatic. They also found that, despite requiring less human effort, the automatically generated seed words better capture the concept of gender in the embedding space.
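A rough sketch of how such an automated seed-word search might look: project every word's definition embedding onto the difference between the definition embeddings of the two initial seeds, keep the words with the most extreme projections on each side, and average their word embeddings to approximate the bias direction. Everything here is an assumption made for illustration, including the function names, the choice of `k`, the 2-D toy vectors, and the decision to take the difference of the two side averages. The released DD-GloVe code may differ in all of these details.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def find_seed_words(def_emb, seed_a, seed_b, k):
    """Rank words by the projection of their definition embedding onto the seed axis."""
    axis = sub(def_emb[seed_a], def_emb[seed_b])
    scores = {word: dot(vec, axis) for word, vec in def_emb.items()}
    ranked = sorted(scores, key=scores.get)  # ascending projection
    side_a = ranked[-k:][::-1]  # most seed_a-like first
    side_b = ranked[:k]         # most seed_b-like first
    return side_a, side_b

def bias_direction(word_emb, side_a, side_b):
    """Unit vector from the averaged side_b embeddings to the averaged side_a embeddings."""
    diff = sub(mean_vector([word_emb[w] for w in side_a]),
               mean_vector([word_emb[w] for w in side_b]))
    norm = math.sqrt(dot(diff, diff))
    return [x / norm for x in diff]

# Hypothetical definition embeddings (e.g. averaged vectors of definition words).
toy_def_emb = {
    "man": [1.0, 0.0], "woman": [-1.0, 0.0],
    "king": [0.9, 0.5], "queen": [-0.9, 0.5],
    "uncle": [0.7, 0.2], "aunt": [-0.7, 0.2],
    "table": [0.0, 1.0],
}
# Hypothetical word embeddings being trained.
toy_word_emb = {
    "man": [0.9, 0.1], "woman": [-0.9, 0.1],
    "king": [0.7, 0.6], "queen": [-0.7, 0.6],
    "uncle": [0.5, 0.3], "aunt": [-0.5, 0.3],
    "table": [0.0, 0.8],
}

side_a, side_b = find_seed_words(toy_def_emb, "man", "woman", k=2)
seed_direction = bias_direction(toy_word_emb, side_a, side_b)
print(side_a, side_b, seed_direction)
```

With these toy vectors the search starts from only "man" and "woman" and picks up "king" and "queen" as additional seeds, while a neutral word such as "table" projects to zero and is never selected.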

In summary, they provide the following:

  1. They propose four dictionary-guided loss functions that encourage word embeddings to contain less biased information and richer semantic meaning by referring to their relatively neutral dictionary-definition representations.
  2. Given a single pair of initial seed words, DD-GloVe automatically approximates the bias direction. The approach identifies the most attribute-specific words by projecting their embedded definitions onto the difference between the embedded definitions of the initial seed words. It then averages the embeddings of the most attribute-specific words to approximate the bias direction.
  3. They show empirically that DD-GloVe learns bias-reduced word embeddings that achieve top results on WEAT (the Word Embedding Association Test). Moreover, their tests indicate that debiasing can be accomplished without sacrificing semantic meaning.
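For readers unfamiliar with WEAT, the sketch below computes its effect size: how much more strongly one set of target words associates with attribute set A than with attribute set B, compared with a second set of target words. The toy 2-D vectors are invented, and the use of the sample standard deviation is an assumption (implementations differ on this point). This is not the paper's evaluation code.

```python
import math
import statistics

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(w, attrs_a, attrs_b):
    """Mean cosine similarity of w to attribute set A minus that to attribute set B."""
    return (statistics.mean([cosine(w, a) for a in attrs_a])
            - statistics.mean([cosine(w, b) for b in attrs_b]))

def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Difference of mean associations, divided by the std over all target words."""
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    return (statistics.mean(s_x) - statistics.mean(s_y)) / statistics.stdev(s_x + s_y)

# Toy vectors, invented for illustration.
male_attrs = [[1.0, 0.0]]
female_attrs = [[-1.0, 0.0]]
career_targets = [[0.8, 0.6], [0.6, 0.8]]    # lean toward the male attribute
family_targets = [[-0.8, 0.6], [-0.6, 0.8]]  # lean toward the female attribute

effect = weat_effect_size(career_targets, family_targets, male_attrs, female_attrs)
print("WEAT effect size:", effect)
```

An effect size near zero means the two target sets associate about equally with both attribute sets, so a debiasing method aims to push this number toward zero while keeping the embeddings useful.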

The code for DD-GloVe, a train-time debiasing approach to learning GloVe word embeddings using dictionary definitions, is publicly available on GitHub.

This article is a research summary written by Marktechpost Staff based on the research paper 'Learning Bias-reduced Word Embeddings Using Dictionary Definitions'. All credit for this research goes to the researchers on this project. Check out the paper, GitHub link, and reference article.

Consultant intern in content writing at Marktechpost.

Sherry J. Basler