News/Research
AI Agent Benchmarking Flaws Could Hinder Real-World Applications, Princeton Study Finds
The rapid development of AI agents has the potential to revolutionize real-world applications, but a recent study from Princeton University researchers highlights several shortcomings in current benchmarking practices that could hinder their practical usefulness. Cost vs. accuracy trade-off: Current agent evaluations often fail to control for the computational costs associated with improving accuracy, potentially leading to the development of extremely expensive agents: Some agentic systems generate hundreds or thousands of responses to increase accuracy, significantly increasing inference costs, which may not be feasible in practical applications with limited budgets per query. The researchers propose visualizing evaluation results as a Pareto...
Jul 5, 2024
AI Lie Detectors Promise Accuracy, but Trust Issues Loom
AI lie detectors show promise in detecting deception more accurately than humans, but their widespread use could erode trust and social bonds. Key study findings: A recent study by researchers at the University of Würzburg in Germany developed an AI-based lie detection tool that outperformed humans in spotting lies: The AI tool, trained on a dataset of true and false statements, correctly identified lies 67% of the time, significantly better than the typical human accuracy of around 50%. When given the option to use the AI tool for a small fee, only one-third of study participants chose to do so,...
Jul 4, 2024
AI Manager Matches Human Performance, But Partnership Delivers Best Results, Study Finds
A study comparing an AI manager with human managers found that the AI achieved similar results in keeping employees on task, but the best outcomes came when AI and human managers worked together. Key Takeaways: The study, while small in scale, points to interesting implications for companies introducing AI management tools: The AI manager achieved comparable results to human managers in getting employees to pre-plan workdays (44% vs 45%) and motivating timely log-ins (42% vs 44%). The highest success rates came when the AI manager worked in partnership...
Jul 3, 2024
AI Uncovers EV Adoption Barriers, Sparking New Climate Research Opportunities
Electric vehicle transition facing challenges: The growth in electric vehicle sales has slowed this year, particularly for market leader Tesla, raising concerns about achieving emissions reduction goals in Massachusetts where transportation accounts for the largest share of carbon emissions. Potential factors contributing to the slowdown include high vehicle prices, concerns about limited driving range, insufficient charging infrastructure, and waning consumer confidence in Tesla CEO Elon Musk. A recent survey found that many EV drivers are considering switching back to gas-powered vehicles, further highlighting the challenges in the transition to electric mobility. Leveraging AI to understand consumer sentiments: Researcher Omar Asensio...
Jul 3, 2024
AI Models Used to Analyze Medical Images May Be Biased, Study Finds
A new study has revealed that AI models used to analyze medical images can exhibit biases in their diagnostic performance across different demographic groups, with the most accurate models showing the largest fairness gaps. The key findings and implications are: AI models' surprising capabilities linked to fairness gaps: The AI models that were most accurate at predicting patient demographics like race and gender from chest X-rays also had the biggest discrepancies in diagnostic accuracy between groups, suggesting the models may be relying on "demographic shortcuts" that lead to incorrect results for women, Black patients, and other groups. MIT researchers previously...
Jul 3, 2024
Meta’s AI Breakthrough: Text-to-3D in Under a Minute
Meta introduces groundbreaking AI system for rapid 3D asset creation from text prompts, hinting at the transformation to come in industries from gaming to architecture. Key Takeaways: Meta's new 3D Gen AI system can generate high-quality 3D assets with detailed textures and material maps in under a minute, marking a significant advance in generative AI for 3D graphics: The system combines Meta 3D AssetGen for creating 3D meshes and Meta 3D TextureGen for generating textures, producing assets with high-resolution textures and physically based rendering (PBR) materials 3-10 times faster than existing solutions. Support for PBR materials allows for realistic relighting...
Jul 2, 2024
AI Breakthrough: One-Size-Fits-All Exoskeleton Improves Walking and Running
Researchers at North Carolina State University's Lab of Biomechatronics and Intelligent Robotics have developed an exoskeleton that adapts to new users without the need for individual adjustments, thanks to AI-generated control policies. The AI was trained using digital models of the human musculoskeletal system and an exoskeleton robot, eliminating the need for lengthy human trials for each user. This breakthrough could significantly reduce the cost of exoskeletons from around $200,000 to between $2,000 and $5,000 within the next few years. Record-breaking performance: Exoskeleton outperforms competitors in walking, running, and stair climbing: The AI-powered exoskeleton achieved record-breaking metabolic rate reductions in various...
Jul 1, 2024
Telltale Signs: AI’s Growing Influence on Scientific Writing Revealed by Word Frequency Analysis
A surge in certain words and phrases in scientific papers suggests the growing use of large language models (LLMs) in academic writing since their widespread introduction in late 2022: Detecting LLM-generated text through "excess words": Researchers analyzed millions of scientific abstracts, comparing the frequency of words before and after the introduction of LLMs, and found telltale signs of AI-generated content. Words like "delves," "showcasing," and "underscores" appeared up to 25 times more frequently in 2024 abstracts compared to pre-LLM trends. The post-LLM era saw a significant increase in the use of "style words" such as verbs, adjectives, and adverbs (e.g.,...
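The before-and-after frequency comparison described in this summary can be illustrated with a minimal sketch. It is not the researchers' method: the study compares 2024 usage against pre-LLM trends, while this toy version simply divides the share of abstracts containing a word after a cutoff by the share before it. The function names and the sample abstracts are made up for illustration.

```python
from collections import Counter


def word_rates(abstracts):
    """Return, for each word, the fraction of abstracts that contain it."""
    counts = Counter()
    for text in abstracts:
        # Lowercase, strip surrounding punctuation, count each word once per abstract.
        words = {w.strip(".,;:()\"'").lower() for w in text.split()}
        counts.update(words - {""})
    n = len(abstracts)
    return {w: c / n for w, c in counts.items()}


def excess_ratio(before, after, word):
    """Ratio of the post-cutoff rate to the pre-cutoff rate (None if never seen before)."""
    pre = word_rates(before).get(word, 0.0)
    post = word_rates(after).get(word, 0.0)
    return post / pre if pre > 0 else None


# Hypothetical toy corpora, not real abstracts.
before = [
    "This paper delves into protein folding.",
    "We study protein folding.",
    "We measure enzyme kinetics.",
    "A survey of graph algorithms.",
]
after = [
    "This work delves into protein folding.",
    "Our study delves into enzyme kinetics.",
    "This survey delves into graph algorithms.",
    "We measure reaction rates.",
]

ratio = excess_ratio(before, after, "delves")
print(ratio)
```

A ratio well above 1 marks a candidate "excess word"; a real analysis would also need to account for vocabulary drift that has nothing to do with LLMs.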
Jul 1, 2024
AI Prizes Spur Innovation in Animal Communication and Niche Areas, Offering Millions in Rewards
A new wave of AI prizes is spurring innovation in communication with animals and other niche areas, offering millions in rewards to anyone who can push the technology forward in unconventional ways. The Coller Dolittle Challenge: A $10 million prize for interspecies communication; British financier Jeremy Coller has launched a competition to incentivize the development of AI systems that can successfully communicate with animals using their own communication systems: The goal is to decipher animal communication and engage with them using their own "language," rather than training them to respond to human commands. Yossi Yovel, a professor at Tel Aviv...
Jun 28, 2024
ChatGPT Aces Intro Psych Exams, Stumbles in Advanced Courses, Challenging Academic Integrity
ChatGPT outperforms undergrads in introductory psychology courses but struggles in higher-level classes, raising questions about the impact of AI on academic integrity and the effectiveness of AI detection tools. Study tests ChatGPT's performance on university psychology exams: Researchers at the University of Reading conducted an experiment where they submitted ChatGPT-generated answers to exam questions in five undergraduate psychology modules, spanning all three years of study: The AI-generated submissions included both short 200-word answers and longer 1,500-word essays, with minimal editing or formatting applied to the AI-produced content. Out of the 63 AI-generated submissions, 94% went undetected by markers, and nearly...
Jun 28, 2024
AI Game Characters Offer New Window into the Human Mind and Behavior
The way we interact with AI video game characters can shed light on how the human mind works and reveal insights into human behavior, cooperation, and relationships. AI characters open up new avenues for studying the mind: AI-driven non-player characters (NPCs) in video games could enable neuroscientists and psychologists to probe more deeply into the mysteries of the human brain and behavior: Scientists have long used video games as research tools to study learning, navigation, decision-making and cooperation. AI characters could greatly expand the possibilities. Rich, immersive game worlds filled with realistic...
Jun 27, 2024
AI Chatbots Are Now Outperforming University Students on Exams, Study Shows
A recent study suggests AI-powered chatbots can outperform university students on exams, often going undetected by markers. This finding raises important questions about the integrity of educational assessments in an era of rapidly advancing AI technology. Key findings from the University of Reading study: Researchers created 33 fake students and used ChatGPT to generate answers for undergraduate psychology exams, with surprising results: The AI-generated exam answers scored half a grade boundary higher on average compared to real students' submissions. 94% of the AI-generated answers did not raise concerns with markers, suggesting they were largely undetectable as being written by AI....
Jun 26, 2024
AI Breakthrough: MatMul-Free Language Modeling Slashes Energy Use, Boosts Accessibility
Researchers claim a breakthrough in AI efficiency by eliminating matrix multiplication, a fundamental operation in neural networks that is accelerated by GPUs, which could significantly reduce the energy consumption and costs of running large language models. Key innovation: MatMul-free language modeling; The researchers developed a custom 2.7 billion parameter model that performs similarly to conventional large language models (LLMs) without using matrix multiplication (MatMul): They demonstrated a 1.3 billion parameter model running at 23.8 tokens per second on a GPU accelerated by a custom FPGA chip, using only about 13 watts of power. This approach challenges the prevailing paradigm that...
Jun 25, 2024
Groundbreaking AI Model Slashes Energy Use, Matches Top Performance
UC Santa Cruz researchers have developed a highly energy-efficient large language model that maintains state-of-the-art performance while drastically reducing computational costs. Key innovation: Eliminating matrix multiplication in neural networks; The researchers eliminated the most computationally expensive element of large language models, matrix multiplication, by using ternary numbers and a new communication strategy between matrices: Instead of real numbers, the matrices use ternary numbers (-1, 0, 1), reducing computation to summing rather than multiplying. The matrices are overlaid and only the most important operations are performed, further reducing computational overhead. Time-based computation is introduced during training to maintain performance despite the...
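To see why ternary weights reduce computation "to summing rather than multiplying," consider a minimal sketch of a matrix-vector product in which every weight is -1, 0, or 1. This only illustrates the arithmetic idea. It is not the researchers' implementation, and the function name `ternary_matvec` and the sample values are assumptions made for this example.

```python
def ternary_matvec(weights, x):
    """Multiply a ternary matrix (entries in {-1, 0, 1}) by a vector using only add/subtract."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # weight +1: add the input
            elif w == -1:
                acc -= xi      # weight -1: subtract the input
            # weight 0: the input is skipped entirely
        out.append(acc)
    return out


# Hypothetical 2x3 ternary weight matrix and input vector.
W = [[1, 0, -1],
     [-1, 1, 1]]
x = [2.0, 3.0, 5.0]

result = ternary_matvec(W, x)
print(result)  # [-3.0, 6.0]
```

No multiplication appears in the inner loop, which is the property that makes such models attractive for low-power hardware.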
Jun 25, 2024
Deepfakes Threaten Democracy: Google Study Reveals AI’s Role in Swaying Public Opinion
A new study from Google's DeepMind reveals that the most common misuse of AI is creating political deepfakes to sway public opinion, raising concerns about the impact on elections and the spread of misinformation. Key findings: The research, conducted in collaboration with Google's Jigsaw unit, analyzed around 200 incidents of AI misuse and found that: Creating realistic fake images, videos, and audio of politicians and celebrities was the most prevalent misuse, nearly twice as common as the next highest category. Shaping public opinion was the primary goal, accounting for 27% of misuse cases, followed by financial gain through services like...
Jun 21, 2024
Oxford Researchers Develop Method to Detect AI Confabulation, Preventing Misinformation Spread
Oxford researchers have developed a method to identify when large language models (LLMs) are confabulating, or making up false information, which could help prevent the spread of misinformation as these AI systems become more widely used. Key Takeaways: The researchers' approach focuses on evaluating the semantic entropy of an LLM's potential answers to determine if it is uncertain about the correct response: If many of the statistically likely answers are semantically equivalent, the LLM is probably just uncertain about phrasing and has the correct information. If the potential answers are semantically diverse, the LLM is likely confabulating due to a...
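The idea of measuring entropy over meanings rather than over exact wordings can be sketched as follows. This is a simplified illustration, not the Oxford team's code: it assumes sampled answers are grouped by a caller-supplied equivalence check (here a hypothetical lookup table, `MEANING`, standing in for a real semantic-equivalence model) and that each sample counts equally.

```python
import math


def semantic_entropy(answers, equivalent):
    """Cluster answers by meaning, then compute Shannon entropy over cluster sizes."""
    clusters = []
    for a in answers:
        for c in clusters:
            if equivalent(a, c[0]):   # compare against the cluster's first member
                c.append(a)
                break
        else:
            clusters.append([a])      # no match: start a new meaning cluster
    n = len(answers)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)


# Hypothetical stand-in for a semantic-equivalence model: a hand-written label table.
MEANING = {
    "Paris": "paris", "It's Paris": "paris", "The capital is Paris": "paris",
    "Lyon": "lyon", "Rome": "rome", "Berlin": "berlin",
}


def same_meaning(a, b):
    return MEANING[a] == MEANING[b]


consistent = ["Paris", "It's Paris", "The capital is Paris", "Paris"]
scattered = ["Paris", "Lyon", "Rome", "Berlin"]

low = semantic_entropy(consistent, same_meaning)   # one cluster: entropy is zero
high = semantic_entropy(scattered, same_meaning)   # four clusters: entropy is log(4)
print(low, high)
```

Answers that differ only in phrasing collapse into one cluster and yield low entropy, while answers that disagree in meaning spread across clusters and yield high entropy, the signal the summary associates with confabulation.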
Jun 21, 2024
Breakthrough AI “Hallucination” Detection Method Unveiled, Boosting Reliability
New research reveals a breakthrough method for detecting AI "hallucinations," paving the way for more reliable artificial intelligence systems in the near future, although challenges remain in integrating this research into real-world applications. Key Takeaways: The study, published in the peer-reviewed scientific journal Nature, describes a new algorithm that can detect AI confabulations, a specific type of hallucination, with approximately 79% accuracy: Confabulations occur when an AI model generates inconsistent wrong answers to a factual question, as opposed to providing the same consistent wrong answer due to issues like problematic training data or structural failures in the model's logic. The...