Automatic identification of bias in Large Language Models

Defense
Author

Fernanda Malheiros Assi

Published

May 21, 2026

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, from legal reasoning to clinical decision support. As these models become increasingly integrated into real-world applications, concerns about their reliability, fairness, and ethical implications have emerged. Studies have shown that LLMs can produce biased outputs, reinforcing harmful stereotypes and discriminating against marginalized groups. This work proposes a systematic and scalable framework for evaluating and ranking LLMs based on stereotype generation in Brazilian Portuguese. The framework combines template-based sentence generation, human annotation, and supervised classification into a unified pipeline. A set of 164 sentence templates, covering gender, race, and their intersections, was used to elicit completions from 37 LLMs from multiple providers. The resulting sentences were labeled by human annotators along two dimensions: alignment with social stereotypes and potential harm. The stereotype alignment labels were used to train a BERTimbau-based classifier, selected via nested cross-validation, which achieved a macro-averaged F1 of 0.665. Classifier predictions were then used to construct pairwise match tables, which fed an Elo rating system that produced two complementary rankings: a model ranking and a social marker ranking. The results reveal that smaller open-source models tend to generate less stereotyped content than larger commercial ones, and that social markers combining race and gender consistently elicit the most stereotyped outputs across all models. The framework is made available as an interactive interface that supports the incremental addition of new models.
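
As an illustration of the ranking step, the sketch below shows how pairwise matches derived from classifier predictions could feed a standard Elo update. It is not the dissertation's implementation: the match rule (a model wins when its completion is classified as non-stereotyped and its opponent's is not), the K-factor of 32, the initial rating of 1000, and all function and model names are assumptions made for this example.

```python
from typing import Dict, Iterable, Tuple

# One match: (player_a, player_b, score_a), where score_a is
# 1.0 if A wins, 0.0 if B wins, and 0.5 for a draw.
Match = Tuple[str, str, float]


def match_score(stereotyped_a: bool, stereotyped_b: bool) -> float:
    """Hypothetical match rule: the non-stereotyped completion wins."""
    if stereotyped_a == stereotyped_b:
        return 0.5  # same predicted label: draw
    return 1.0 if not stereotyped_a else 0.0


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    ratings: Dict[str, float],
    matches: Iterable[Match],
    k: float = 32.0,         # assumed K-factor
    initial: float = 1000.0  # assumed starting rating
) -> Dict[str, float]:
    """Apply Elo updates in order; unseen players start at `initial`."""
    for a, b, score_a in matches:
        r_a = ratings.setdefault(a, initial)
        r_b = ratings.setdefault(b, initial)
        e_a = expected_score(r_a, r_b)
        # B's actual and expected scores are the complements of A's.
        ratings[a] = r_a + k * (score_a - e_a)
        ratings[b] = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return ratings


# Usage: two hypothetical models compared on the same template.
ratings = update_elo(
    {}, [("model_x", "model_y", match_score(False, True))]
)
```

Under this rule a higher rating means fewer completions predicted as stereotyped. The same update could rank social markers instead of models by treating each marker as a player, and because unseen players simply enter at the initial rating, new models can be added incrementally.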

Further information

The Master’s Dissertation Defense by Fernanda Malheiros Assi took place on May 21, 2026. The committee was composed of Professors Marlo Souza (UFBA) and Renato Silva (ICMC/USP), and the advisor, Helena Caseli (UFSCar).