
Researchers are exploring the use of synthetic faces—computer-generated images that don’t belong to real people—to train facial recognition AI systems, potentially solving major privacy concerns while maintaining fairness across demographic groups. This approach could eliminate the need for scraping millions of real photos from the internet without consent, addressing both ethical data collection issues and the risk of identity theft or surveillance overreach.

The big picture: Facial recognition technology has achieved near-perfect accuracy rates of 99.9 percent across different skin tones, ages, and genders, but this success came at the cost of individual privacy through massive data collection from real faces.

Why this matters: Current training methods involve collecting millions of real photos without consent, creating significant privacy risks and potential for identity theft, while synthetic data could provide the same training benefits without compromising personal information.

How synthetic face training works: The process requires two main steps to create effective training datasets.
• First, researchers generate unique fake faces using the same technology behind deepfakes.
• Then, they create variations of each synthetic face under different lighting conditions, angles, and with various accessories.
• The generators still need training on thousands of real images initially, but far fewer than the millions required for direct recognition model training.
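The two-step process above can be sketched in code. This is a minimal illustration, not any research group's actual pipeline: the latent vectors stand in for the output of a face generator, and the function and parameter names (`build_synthetic_dataset`, the lighting/angle/accessory lists) are hypothetical.

```python
import itertools
import random

def build_synthetic_dataset(num_identities, lightings, angles, accessories, seed=0):
    """Sketch of the two-step pipeline: (1) generate unique synthetic
    identities, (2) render each identity under varied conditions.

    Each identity is represented here by a random latent vector standing
    in for a face generator's output; a real pipeline would emit images.
    """
    rng = random.Random(seed)
    dataset = []
    for identity in range(num_identities):
        # Step 1: one latent code per unique synthetic identity.
        latent = [rng.gauss(0, 1) for _ in range(8)]
        # Step 2: variations of that identity under different
        # lighting conditions, angles, and accessories.
        for lighting, angle, accessory in itertools.product(lightings, angles, accessories):
            dataset.append({
                "identity": identity,
                "latent": latent,
                "lighting": lighting,
                "angle": angle,
                "accessory": accessory,
            })
    return dataset

samples = build_synthetic_dataset(
    num_identities=3,
    lightings=["studio", "outdoor"],
    angles=["frontal", "profile"],
    accessories=["none", "glasses"],
)
# 3 identities x 2 lightings x 2 angles x 2 accessories = 24 samples
print(len(samples))
```

The key property is that every sample of a given identity shares the same latent code, so a recognition model can learn identity-invariant features from the condition variations.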

Current performance challenges: Models trained on synthetic data still lag behind those using real-world faces in terms of accuracy.
• A recent study found that while real-face-trained models achieved 85 percent average accuracy, synthetic-face-trained models reached only 75 percent.
• However, the synthetic models showed significantly less bias, with only one-third the variability between demographic groups compared to real-data models.
• The accuracy gap stems from generators’ limited ability to create unique identities and their tendency to produce “pretty, studio-like pictures” that don’t reflect real-world image variety.

What the research shows: A July 2024 study demonstrated that demographically balanced synthetic datasets can reduce racial bias more effectively than real datasets of similar size.
• When tested on African, Asian, Caucasian, and Indian faces, the real-data model showed 90 percent accuracy for Caucasian faces but only 81 percent for African faces.
• The synthetic-data model, despite lower overall accuracy, performed far more consistently across all racial groups.
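The consistency claim above can be made concrete with a simple spread metric over per-group accuracies. In this sketch, only the 0.90 (Caucasian) and 0.81 (African) figures come from the study's real-data model as reported; the Asian and Indian values, and the entire synthetic-model row, are hypothetical fill-ins chosen to illustrate the pattern.

```python
from statistics import mean, pstdev

def demographic_fairness(acc_by_group):
    """Summarize mean accuracy and its spread across demographic groups;
    a smaller spread and gap mean more consistent performance."""
    values = list(acc_by_group.values())
    return {
        "mean_accuracy": round(mean(values), 4),
        "group_spread": round(pstdev(values), 4),   # between-group std. dev.
        "worst_gap": round(max(values) - min(values), 4),
    }

# Reported: real-data model hit 0.90 on Caucasian and 0.81 on African faces.
# Hypothetical: the Asian/Indian entries and all synthetic-model numbers.
real_model = {"African": 0.81, "Asian": 0.85, "Caucasian": 0.90, "Indian": 0.86}
synthetic_model = {"African": 0.74, "Asian": 0.75, "Caucasian": 0.77, "Indian": 0.76}

print(demographic_fairness(real_model)["worst_gap"])       # 0.09
print(demographic_fairness(synthetic_model)["worst_gap"])  # 0.03
```

Metrics like the worst-case gap or between-group standard deviation are how "less bias" is typically quantified: a model can have a lower mean accuracy yet a much smaller gap between its best- and worst-served groups.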

The privacy problem: Major tech companies have scraped billions of images without permission, creating massive ethical and legal concerns.
• IBM’s “Diversity in Faces” dataset contained over 1 million images taken from Flickr without owner consent.
• Clearview AI, a vendor used by law enforcement, has gathered an estimated 60 billion images from Instagram and Facebook without permission.
• These practices have triggered significant backlash over privacy violations and potential misuse.

What experts are saying: Researchers emphasize the balance between accuracy and ethical considerations in facial recognition development.
• “Every person, irrespective of their skin color or their gender or their age, should have an equal chance of being correctly recognized,” says Ketan Kotwal of the Idiap Research Institute in Switzerland.
• “If you use a less accurate system, you are likely to track the wrong people,” Kotwal adds, arguing for highly accurate systems if they’re to be used at all.

Next steps: Researchers plan to explore hybrid approaches that combine synthetic and real data to improve accuracy while maintaining ethical standards.
• The hybrid method would use synthetic data to teach facial features and demographic variations, then fine-tune with consensually obtained real-world data.
• The field is advancing rapidly, with the first proposals for synthetic data in facial recognition emerging only in 2023.
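The hybrid approach described above follows a standard pretrain-then-fine-tune pattern. The sketch below is purely illustrative: `ToyRecognizer`, `train_hybrid`, and the learning-rate values are hypothetical stand-ins, not the researchers' actual method.

```python
class ToyRecognizer:
    """Stand-in for a face-recognition model; records each update step."""
    def __init__(self):
        self.updates = []

    def step(self, batch, lr):
        self.updates.append((batch["source"], lr))

def train_hybrid(model, synthetic_batches, real_batches,
                 pretrain_lr=1e-3, finetune_lr=1e-4):
    # Stage 1: pretrain on synthetic faces to learn facial features and
    # demographic variation without exposing any real identity.
    for batch in synthetic_batches:
        model.step(batch, pretrain_lr)
    # Stage 2: fine-tune on a smaller, consensually collected real-world
    # set, typically at a lower learning rate to preserve what was learned.
    for batch in real_batches:
        model.step(batch, finetune_lr)
    return model

model = train_hybrid(
    ToyRecognizer(),
    synthetic_batches=[{"source": "synthetic"}] * 3,
    real_batches=[{"source": "real"}] * 1,
)
print(len(model.updates))  # 4
```

The design intuition is that the large synthetic set does the privacy-sensitive heavy lifting, while a small consented real set closes the accuracy gap caused by the generators' "studio-like" output.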
