AI models exhibit bias in résumé evaluations: A new study reveals that Massive Text Embedding (MTE) models, which are built on large language models, display racial and gender biases when evaluating résumés, mirroring longstanding human biases in hiring practices.
Study methodology and key findings: Researchers from the University of Washington conducted a comprehensive analysis using three MTE models to evaluate hundreds of résumés against job descriptions.
- The study utilized MTE models based on the Mistral-7B LLM, fine-tuned for tasks like document retrieval, classification, and clustering.
- Résumés were first evaluated without names to check for reliability, then run again with names chosen for high racial and gender distinctiveness.
- Over three million résumé and job description comparisons were analyzed, focusing on the top 10% of résumés judged most similar to each job description.
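To make the screening step concrete, here is a minimal sketch of how an embedding-based résumé screen works, assuming a sentence-transformers style API. The model name is a small stand-in rather than one of the Mistral-7B-based MTEs the researchers tested, and the documents are invented.

```python
# Minimal sketch of embedding-based résumé screening.
# Assumptions: sentence-transformers is installed, and "all-MiniLM-L6-v2"
# is a small stand-in model, NOT one of the Mistral-7B MTEs from the study.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

job_description = "Seeking a software engineer with Python and cloud experience."
resumes = {
    "resume_001": "Software engineer, five years of Python and AWS.",
    "resume_002": "Data analyst skilled in SQL, Excel, and reporting.",
    "resume_003": "Backend developer: Python, Docker, Kubernetes.",
}

# Embed the job description once, then score each résumé by cosine similarity.
job_vec = model.encode(job_description, convert_to_tensor=True)
scores = {
    rid: util.cos_sim(job_vec, model.encode(text, convert_to_tensor=True)).item()
    for rid, text in resumes.items()
}

# Keep the top 10% most similar résumés, mirroring the study's cutoff
# (with only three résumés here, that rounds up to one).
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
shortlist = ranked[: max(1, len(ranked) // 10)]
print(shortlist)
```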
Racial bias in AI evaluations: The study revealed a strong preference for white names across all three MTE models tested.
- White names were preferred in 85.1% of the tests conducted.
- Black names were preferred in only 8.6% of the tests.
- The remaining 6.3% of tests showed no statistically significant difference in scores.
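The preference figures above come from name-swap comparisons: the résumé text is held fixed and only the attached name changes. A hedged sketch of that test, reusing the pipeline above, follows; the names are illustrative examples of racially distinctive names, not the study's actual materials.

```python
# Sketch of the name-swap bias test: identical résumé text, only the
# attached name changes. Names here are illustrative, not the study's.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in MTE

job_description = "Seeking a software engineer with Python and cloud experience."
base_resume = "Software engineer, five years of Python and AWS."

names = {
    "white_male": "Hunter Becker",
    "black_male": "DaQuan Washington",
}

job_vec = model.encode(job_description, convert_to_tensor=True)
for group, name in names.items():
    variant = f"{name}\n{base_resume}"  # only the name differs
    score = util.cos_sim(
        job_vec, model.encode(variant, convert_to_tensor=True)
    ).item()
    print(group, round(score, 4))

# Repeating this over millions of résumé/job pairs and counting which
# group's variant scores higher is what yields figures like 85.1% vs 8.6%.
```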
Gender bias in AI evaluations: The models also demonstrated a significant bias towards male names in résumé evaluations.
- Male names were preferred in 51.9% of the tests.
- Female names were preferred in just 11.1% of the tests, with the remainder showing no statistically significant difference.
Intersectional bias: The study found that the bias was even more pronounced when considering both race and gender simultaneously.
- In comparisons between Black male names and white male names, Black male names were never preferred (0% of bias tests).
Consistency across job types: The observed biases were consistent across various job descriptions, regardless of real-world gender or racial distributions in those occupations.
- This suggests that the bias is inherent to the model’s preferences rather than learned from occupational patterns during training.
- The researchers concluded that the models treat “masculine and White concepts” as the default, with other identities viewed as divergent alternatives.
Magnitude of bias: While the preference for any one group in each test was often small, the consistency of the bias across numerous tests is significant.
- The “percentage difference in screening advantage” was 5% or lower in most comparisons.
- This bias, while smaller than that often observed in human recruiters, could have substantial cumulative effects across many job applications and roles.
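As a purely illustrative calculation of why small per-screen gaps matter, consider the compounding below. The exact formula behind the study's “percentage difference in screening advantage” metric is an assumption here, modeled as a relative gap in shortlist rates.

```python
# Purely illustrative arithmetic; the study's exact "percentage difference
# in screening advantage" formula is an assumption here, modeled as the
# relative gap in per-screen shortlist rates between name groups.
p_white = 0.1025  # assumed shortlist rate with a white-coded name
p_black = 0.0975  # assumed shortlist rate with a Black-coded name

gap = (p_white - p_black) / p_black
print(f"per-screen advantage gap: {gap:.1%}")  # ~5.1%

# A small per-screen gap compounds: probability of at least one
# shortlist across 20 independent applications.
def at_least_one(p, n=20):
    return 1 - (1 - p) ** n

print(f"white-coded: {at_least_one(p_white):.1%}")  # ~88.5%
print(f"Black-coded: {at_least_one(p_black):.1%}")  # ~87.1%
```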
Real-world implications: Because the tested MTE models are general-purpose embedding models rather than dedicated hiring products, the study’s findings may not directly reflect how AI tools behave in actual recruitment processes.
- Salesforce, for example, stated that its production models undergo rigorous testing for toxicity and bias before release.
- The company also emphasized the implementation of guardrails and controls to protect customer data and prevent harmful outputs.
Historical context: The study’s results echo previous instances of AI bias in recruitment, such as Amazon’s internal AI recruiting tool, scrapped in 2018 after it showed bias against women.
Broader implications: This research challenges the notion that AI systems are inherently objective or free from human biases.
- It highlights that AI models can reflect and potentially amplify biases present in their training data.
- The study underscores the importance of ongoing research and vigilance in developing and deploying AI systems, especially in sensitive areas like hiring and recruitment.
Looking ahead: The persistence of bias in AI models raises important questions for the future of recruitment and AI ethics.
- How can developers and companies better mitigate these biases in AI systems?
- What role should regulation play in ensuring fairness in AI-assisted hiring practices?
- How can we balance the potential efficiency gains of AI in recruitment with the need for fairness and equal opportunity?
As AI continues to play an increasingly significant role in various aspects of our lives, including employment decisions, addressing these biases becomes crucial for ensuring equitable outcomes and maintaining public trust in these technologies.
Study: AIs prefer white, male names on resumes, just like humans