News/AI Models
OpenAI Reportedly Working on New Advanced Model Codenamed “Strawberry”
OpenAI is reportedly developing an advanced AI model codenamed Strawberry that can autonomously navigate the internet to conduct in-depth research and reasoning. Key details about Strawberry: The specifics of the model are being kept tightly under wraps, but sources suggest it will bring advanced reasoning capabilities to OpenAI's systems: The model is thought to be a renamed version of OpenAI's previously reported project Q*. It aims to enable AI to "see and understand the world more like we do," according to an OpenAI spokesperson. Strawberry will reportedly be able to perform research independently by navigating the internet. Reactions and competitive...
Jul 13, 2024
Meta AI’s “System 2 Distillation” Represents an Efficiency Breakthrough for LLMs
Meta AI researchers are advancing a new technique for large language models (LLMs) called "System 2 distillation," which improves the reasoning capabilities of these models without requiring them to generate intermediate reasoning steps at inference time. The technique has implications for making models faster and more computationally efficient. System 1 and System 2 thinking in cognitive science and LLMs: The article draws a parallel between the two modes of thinking in humans - fast and intuitive System 1, and slow and analytical System 2 - and how they relate to LLMs: LLMs are usually considered analogous to System 1 thinking, as they can generate text quickly but...
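In spirit, the recipe can be sketched in a few lines of Python: sample several chain-of-thought answers (System 2), keep only the prompts where the model is self-consistent, and fine-tune on prompt-to-answer pairs with the reasoning stripped out. The self-consistency filter and helper names below are illustrative assumptions, not Meta's actual code.

```python
# Minimal sketch of a System 2 distillation pipeline, assuming a
# `sample_cot_answer` helper that runs one chain-of-thought pass and
# returns only the final answer. Thresholds are illustrative.
from collections import Counter
from typing import Callable

def build_distillation_set(
    prompts: list[str],
    sample_cot_answer: Callable[[str], str],  # one System 2 (CoT) sample
    n_samples: int = 8,
    min_agreement: float = 0.75,
) -> list[tuple[str, str]]:
    """Collect (prompt, final_answer) pairs, discarding the reasoning."""
    dataset = []
    for prompt in prompts:
        answers = [sample_cot_answer(prompt) for _ in range(n_samples)]
        answer, votes = Counter(answers).most_common(1)[0]
        if votes / n_samples >= min_agreement:  # keep only consistent cases
            dataset.append((prompt, answer))
    return dataset

# Fine-tuning the base model on this dataset teaches it to emit the final
# answer directly, with no intermediate reasoning tokens at inference time.
```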
Jul 13, 2024
DeepMind’s “PEER” Architecture Scales Language Models While Keeping Costs Low
DeepMind's new Mixture-of-Experts (MoE) architecture, PEER, scales language models to millions of tiny experts, improving performance while keeping computational costs down. Key innovation: Parameter Efficient Expert Retrieval (PEER): DeepMind's novel MoE architecture introduces a learned index to efficiently route input data to a vast pool of millions of tiny experts, enabling significant scaling without slowing down inference: PEER replaces the fixed router in traditional MoE with a fast initial computation to create a shortlist of potential expert candidates before activating the top experts. Unlike previous MoE architectures with large experts, PEER uses tiny single-neuron experts in the hidden layer, allowing...
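A simplified PyTorch sketch of the idea: route each token to a few single-neuron experts drawn from a large pool via a learned key lookup. For brevity this scores every key with a plain top-k; the paper uses product-key retrieval so the shortlist stays cheap even with millions of experts. All sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyExpertLayer(nn.Module):
    def __init__(self, d_model=256, n_experts=65_536, top_k=16):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_experts, d_model) * 0.02)
        self.w_in = nn.Parameter(torch.randn(n_experts, d_model) * 0.02)
        self.w_out = nn.Parameter(torch.randn(n_experts, d_model) * 0.02)
        self.top_k = top_k

    def forward(self, x):                            # x: (batch, d_model)
        scores = x @ self.keys.T                     # (batch, n_experts)
        top, idx = scores.topk(self.top_k, dim=-1)   # expert shortlist
        gate = F.softmax(top, dim=-1)
        # Each expert is a single neuron: scalar activation in, vector out.
        h = torch.einsum("bd,bkd->bk", x, self.w_in[idx])
        return torch.einsum("bk,bkd->bd", F.gelu(h) * gate, self.w_out[idx])
```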
Jul 13, 2024
AI’s $100B Ceiling: Scale is Driving the AI Industry, But How Long Can It Last?
Azeem Azhar discusses the scaling laws driving progress in AI: AI's rapid progress is being propelled by exponential increases in model size and training costs, but questions remain about the long-term sustainability and limits of this scaling approach. Key takeaways from AI scaling laws: Research has consistently shown that larger AI models, trained on more data, tend to outperform smaller models: Simple, general learning approaches leveraging massive datasets have proven more effective than attempts to build in human knowledge and intuition. OpenAI's study on "Scaling Laws for Neural Language Models" highlighted that performance depends strongly on model size, dataset size,...
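For reference, the parameter-count power law from that OpenAI paper can be written down and evaluated directly; the constants below are the paper's approximate fits and should be treated as illustrative rather than exact.

```python
# Loss vs. model size from "Scaling Laws for Neural Language Models"
# (Kaplan et al., 2020): L(N) = (N_c / N) ** alpha_N, with rough fits
# alpha_N ~ 0.076 and N_c ~ 8.8e13 non-embedding parameters.
def loss_from_params(n_params: float,
                     n_c: float = 8.8e13,
                     alpha_n: float = 0.076) -> float:
    """Predicted test loss as a function of non-embedding parameter count."""
    return (n_c / n_params) ** alpha_n

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {loss_from_params(n):.2f}")
```

The power-law form is why each constant improvement in loss requires a multiplicative, not additive, increase in model size, which is exactly the sustainability question the article raises.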
Jul 13, 2024
SambaNova Wins “Coolest Technology” Award at VentureBeat 2024
SambaNova Systems, a Palo Alto-based AI chip startup, has won the "Coolest Technology" award at VentureBeat Transform 2024, highlighting its innovative approach to AI computation and its potential to reshape the enterprise AI landscape. SambaNova's unique architecture: The company's latest chip, the SN40L, is built from the ground up for AI computation, using a "reconfigurable dataflow" architecture that optimizes data movement and, the company says, provides the lowest-latency inference, the highest number of concurrent LLMs, and the fastest switching between different LLMs. SambaNova's approach focuses on streamlining data movement, which it identifies as the critical bottleneck in high-performance...
Jul 13, 2024
Pinterest Launches Canvas AI: Transforms Images, Enhancing Visual Discovery and UX
Pinterest unveils Canvas AI, a powerful new foundation model that enhances visual discovery by transforming existing images based on user preferences and textual descriptions. Key Takeaways: Canvas represents a significant advancement in Pinterest's AI strategy, focusing on enhancing existing images rather than generating new ones from scratch: The model aligns with Pinterest's mission of inspiring users through visual discovery, allowing for personalized image manipulation and enhancement. Canvas opens up new possibilities for users and brands, such as showing products in various settings or exploring different styling options. Pinterest emphasizes responsible AI development, prioritizing safety and trust through their STEAM framework....
Jul 12, 2024
Researcher Reproduces GPT-2 Using C/CUDA, Making LLM Training More Accessible
In a GitHub post, Andrej Karpathy explains how he and a team were able to successfully reproduce the full 1558M parameter version of GPT-2 using llm.c, training it on a single 8XH100 node for 24 hours at a cost of $672. This demonstrates the dramatic improvements in compute, software, and data that have made reproducing large language models much more feasible in the 5 years since GPT-2 was originally introduced. Key Takeaways: The trained model performs qualitatively similarly to the original GPT-2 on prompts, generating coherent and relevant continuations. On the HellaSwag eval, it matches GPT-2 performance around 25K steps...
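A quick back-of-envelope check of the quoted cost, using only the numbers in the post:

```python
# One 8xH100 node for 24 hours at a total cost of $672.
total_cost_usd = 672.0
hours = 24
gpus_per_node = 8

print(f"${total_cost_usd / hours:.2f} per node-hour")                    # $28.00
print(f"${total_cost_usd / (hours * gpus_per_node):.2f} per H100-hour")  # $3.50
```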
Jul 12, 2024
AI “Coach” Detects Hallucinations, Boosts Accuracy and Safety
A new AI model called Lynx, developed by Patronus AI, aims to detect and explain hallucinations produced by large language models (LLMs), offering a faster, cheaper, and more reliable way to catch AI mistakes without human intervention. Addressing the challenge of AI hallucinations: Patronus AI's founders, ex-Meta AI researchers Anand Kannappan and Rebecca Qian, recognized the need for a solution to the problem of AI models confidently making factual errors: Kannappan and Qian spoke with numerous company executives who expressed concerns about launching AI products that could make headlines for the wrong reasons due to AI hallucinations. Lynx is designed...
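The general LLM-as-judge pattern Lynx belongs to can be sketched simply. The prompt format and PASS/FAIL protocol below are assumptions for illustration, not Patronus AI's actual interface.

```python
# Faithfulness check in the style of an LLM-as-judge hallucination detector:
# given a question, reference context, and candidate answer, ask a judge
# model whether the answer is supported by the context.
from typing import Callable

JUDGE_PROMPT = """Given the QUESTION, CONTEXT and ANSWER, reply PASS if the
ANSWER is fully supported by the CONTEXT; otherwise reply FAIL and name the
unsupported claim.

QUESTION: {question}
CONTEXT: {context}
ANSWER: {answer}"""

def detect_hallucination(judge: Callable[[str], str],
                         question: str, context: str, answer: str) -> bool:
    """Returns True when the judge flags the answer as unsupported."""
    verdict = judge(JUDGE_PROMPT.format(
        question=question, context=context, answer=answer))
    return verdict.strip().upper().startswith("FAIL")
```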
Jul 12, 2024
There’s a New Emotionally Intelligent AI Model Offering Empathetic Support and Companionship
A groundbreaking conversational AI model, HelpingAI-15B, has been introduced, setting a new standard for emotionally intelligent and empathetic human-machine interaction. Key Objectives: HelpingAI-15B is designed to engage users in open-ended dialogue while providing emotionally attuned and supportive responses: The model can recognize and validate user emotions and contexts, ensuring sensitive and ethical interactions. It aims to provide psychologically-grounded responses that cater to a wide range of emotional states and communicative needs. Continuous improvement is a priority, with the model constantly enhancing its emotional awareness and dialogue skills. Innovative Training Methodology: The development of HelpingAI-15B combines cutting-edge techniques and insights to...
Jul 12, 2024
The Dark Secret Powering AI: Exploited Human Labor Fuels Tech Giants
AI's rise powered by exploited human labor: When tech companies present their AI products as sleek, autonomous machines, they often ignore the reality of the low-paid, menial labor that trains these systems and is managed by them. Illusion of autonomous AI has historical roots: The current perception of AI as fully automated has parallels to the 18th-century "Mechanical Turk" chess-playing machine, which secretly relied on a human chess master to operate it. Similarly, today's sophisticated AI software functions only through thousands of hours of low-paid human labor. Amazon coined the term "artificial artificial intelligence" to describe the process of keeping...
Jul 12, 2024
AI Vision Models Fail Basic Tests, Highlighting Significant Capability Gaps
State-of-the-art AI models struggle with basic visual reasoning tasks that are trivial for humans, highlighting significant gaps in their capabilities: Key findings: Researchers tested four top-level AI vision models on simple visual analysis tasks and found that they often fall well short of human-level performance: The models struggled with tasks such as counting rows and columns in a blank grid, identifying circled letters in a word, and counting nested shapes. Small changes to the tasks, like increasing the number of overlapping circles, led to significant drops in accuracy, suggesting the models are biased towards familiar patterns they were trained on....
Jul 12, 2024
Microsoft’s VALL-E 2 Achieves Human-Level Speech Synthesis, Sparking Ethical Debate
Microsoft's VALL-E 2 reaches human parity in text-to-speech synthesis, raising ethical concerns about potential misuse. Key breakthrough: VALL-E 2, Microsoft's latest text-to-speech (TTS) generator, has achieved "human parity" for the first time, producing speech indistinguishable from a human voice: The model only needs a few seconds of audio to reproduce a voice that matches or exceeds the quality of human speech when compared to standard speech libraries. VALL-E 2 consistently generates high-quality, natural-sounding speech even for traditionally challenging phrases due to its "Repetition Aware Sampling" and "Grouped Code Modeling" features. Potential applications and risks: While Microsoft sees beneficial uses for...
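A toy version of the repetition-aware idea might look like the following: draw from nucleus sampling by default, but if the drawn token has been repeating in the recent window, fall back to sampling from the full distribution to break the loop. The window size and repeat threshold are illustrative assumptions, not Microsoft's published settings.

```python
import torch

def repetition_aware_sample(probs, history, window=10, max_repeats=3, top_p=0.9):
    """probs: 1-D tensor over the codebook; history: list of recent token ids."""
    # Nucleus (top-p) sampling: keep the smallest prefix covering top_p mass.
    sorted_p, sorted_idx = probs.sort(descending=True)
    keep = sorted_p.cumsum(0) - sorted_p < top_p   # always keeps the top token
    nucleus = sorted_p * keep
    token = sorted_idx[torch.multinomial(nucleus / nucleus.sum(), 1)].item()
    # If this token dominates the recent history, resample from the full
    # distribution instead, which tends to break degenerate repetition loops.
    if history[-window:].count(token) >= max_repeats:
        token = torch.multinomial(probs, 1).item()
    return token
```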
Jul 12, 2024
A Startup Spun out of Meta Has a Massive AI Model that Speaks the Language of Proteins
A startup spun out of Meta has unveiled a massive AI model that speaks the language of proteins, creating new fluorescent molecules in an impressive proof-of-principle demonstration. EvolutionaryScale debuts protein language model ESM3: EvolutionaryScale, launched by former Meta scientists, announced its new protein language model ESM3 this month alongside $142 million in new funding to apply the model to drug development, sustainability, and other areas: ESM3 was trained on over 2.7 billion protein sequences and structures, as well as data on protein functions, allowing it to design proteins to user specifications. The model is seen as a frontier in the...
Jul 12, 2024
IBM Exec: Integrating Enterprise Data into AI Models is Key to Success
IBM's David Cox champions open innovation in enterprise generative AI, emphasizing the importance of transparency, collaboration, and the integration of proprietary business data into AI models. Nuanced view of openness in AI: Cox challenges the notion that openness in AI is a simple binary concept, highlighting the growing ecosystem of open models from various sources, including tech giants, universities, and nation-states: He raises concerns about the quality of openness in many large language models (LLMs), noting that some provide only a "bag of numbers" without clear information on how they were produced, making reproducibility difficult or impossible. Cox outlines key...
Jul 11, 2024
Amazon’s Alexa Team Shifts Focus to Develop Rival to GPT-4, PaLM
Amazon has efforts underway to develop a competitive large language model and reinvigorate its Alexa voice assistant through its artificial general intelligence (AGI) division, led by ex-Alexa chief scientist Rohit Prasad. Key details about Amazon's AGI division: The majority of the division's staff comes from the former Alexa team, with several dozen recent hires from the startup Adept: Approximately 8,000 out of the 10,000 people who were under Prasad at Alexa were transferred to the new AGI division when it was formed last summer. Prasad, who previously served as the chief scientist and senior vice president for Alexa, now reports directly...
Jul 9, 2024
Aitomatic Wants To Use AI to Revolutionize The $500B Semiconductor Industry
Aitomatic's SemiKong AI model is set to revolutionize the semiconductor industry by bringing domain-specific AI capabilities to chipmaking processes, potentially reshaping the $500 billion industry in the coming years. Key Takeaways: SemiKong is the first open-source AI Large Language Model (LLM) designed specifically for the semiconductor industry, aiming to improve accuracy, relevance, and understanding of semiconductor processes: Developed by Aitomatic in collaboration with FPT Software and semiconductor industry experts from the AI Alliance, SemiKong outperforms generic LLMs on industry-specific tasks. The model's smaller version often surpasses larger general-purpose models in domain-specific applications, offering potential for accelerated innovation and reduced costs...
Jul 9, 2024
Groq’s Lightning-Fast LLM Engine Attracts Developers, Hints at AI’s Efficient Future
Groq unveils a lightning-fast large language model (LLM) engine, attracting over 280,000 developers in just 4 months, demonstrating the growing interest in efficient and powerful AI tools. Key Takeaways: Groq's new web-based LLM engine showcases impressive speed and flexibility, hinting at the potential of AI applications when powered by efficient processing: The engine achieves a blistering 1256.54 tokens per second, outpacing GPU-based solutions from competitors like Nvidia, and improving upon Groq's previous demo of 800 tokens per second in April. Users can interact with the LLM through typed queries or voice commands, with the engine supporting various models such as...
Jul 9, 2024
Meta AI Just Unveiled a Mobile LLM Showing Impressive Performance Gains
Meta AI has unveiled MobileLLM, a new approach to creating compact and efficient language models designed for smartphones and other resource-constrained devices, challenging assumptions about the necessary size of effective AI models. Key innovations in MobileLLM: The research team focused on optimizing models with fewer than 1 billion parameters, implementing several design choices to improve efficiency: prioritizing model depth over width, implementing embedding sharing and grouped-query attention, and utilizing a novel immediate block-wise weight-sharing technique. Impressive performance gains: MobileLLM outperformed previous models of similar size by 2.7% to 4.3% on common benchmark tasks, representing meaningful progress in the competitive field of...
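Two of those choices, embedding sharing and immediate block-wise weight sharing, are easy to sketch in PyTorch. All sizes below are illustrative, and grouped-query attention and causal masking are omitted for brevity; this is a sketch of the design ideas, not Meta's implementation.

```python
import torch.nn as nn

class SharedWeightLM(nn.Module):
    def __init__(self, vocab=32_000, d_model=576, n_blocks=15, repeats=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        # One parameter set per block, each applied `repeats` times in a row,
        # so the model is deep at inference time but small on disk/in memory.
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=9,
                                       dim_feedforward=1536, batch_first=True)
            for _ in range(n_blocks))
        self.repeats = repeats

    def forward(self, ids):                 # ids: (batch, seq) token ids
        h = self.embed(ids)
        for block in self.blocks:
            for _ in range(self.repeats):   # immediate block-wise sharing
                h = block(h)
        # Embedding sharing: reuse the input embedding as the output head.
        return h @ self.embed.weight.T      # (batch, seq, vocab) logits
```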
Jul 8, 2024
DeepMind Just Made a Breakthrough in AI Training
Google DeepMind's JEST AI training method promises significant speed and efficiency gains over traditional techniques, potentially addressing concerns about AI's growing power demands. Key Takeaways: DeepMind's JEST (joint example selection) training method breaks from traditional AI training by focusing on entire batches of data instead of individual data points: A smaller AI model first grades data quality from high-quality sources and ranks batches by quality. The small model then determines the batches most fit for training a larger model, resulting in up to 13 times faster training with 10 times less computation. Addressing AI's Power Demands: The JEST research comes...
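Stripped to its essentials, the batch-level selection loop might look like this. Scoring whole batches with a small reference model is the key idea; the concrete scoring rule is left as a callable here and is a simplification of DeepMind's learnability criterion.

```python
from typing import Callable, Sequence

def select_batches(
    batches: Sequence[list],            # candidate training batches
    score: Callable[[list], float],     # small model rates a whole batch
    keep_fraction: float = 0.1,
) -> list:
    """Rank entire batches (not individual examples) and keep the best few."""
    ranked = sorted(batches, key=score, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]              # train the large model on these only
```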
Jul 7, 2024
AI Agent Benchmarking Flaws Could Hinder Real-World Applications, Princeton Study Finds
The rapid development of AI agents has the potential to revolutionize real-world applications, but a recent study from Princeton University researchers highlights several shortcomings in current benchmarking practices that could hinder their practical usefulness. Cost vs. accuracy trade-off: Current agent evaluations often fail to control for the computational costs associated with improving accuracy, potentially leading to the development of extremely expensive agents: Some agentic systems generate hundreds or thousands of responses to increase accuracy, significantly increasing inference costs, which may not be feasible in practical applications with limited budgets per query. The researchers propose visualizing evaluation results as a Pareto...
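Computing which agents sit on that cost-accuracy Pareto frontier is straightforward; the agent names and numbers below are invented purely for illustration.

```python
agents = [  # (name, cost_per_query_usd, accuracy) - invented example data
    ("single-call", 0.01, 0.62),
    ("react", 0.05, 0.66),
    ("debate", 0.25, 0.70),
    ("self-consistency", 0.30, 0.68),  # dominated by "debate"
    ("best-of-100", 0.90, 0.71),
]

def pareto_frontier(points):
    """Keep agents not dominated by one that is no costlier and no less accurate."""
    frontier = []
    for name, cost, acc in points:
        dominated = any(
            c <= cost and a >= acc and (c, a) != (cost, acc)
            for _, c, a in points
        )
        if not dominated:
            frontier.append((name, cost, acc))
    return sorted(frontier, key=lambda p: p[1])

print(pareto_frontier(agents))  # "self-consistency" drops out
```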
Jul 7, 2024
“Tokenization” Fuels Breakthroughs but Limits Potential
The rise of tokenization in AI models is limiting their potential, according to a recent article exploring how this text processing method creates biases and odd behaviors in today's generative AI systems. Key takeaways: Tokenization, the process of breaking down text into smaller pieces called tokens, enables transformer-based AI models to take in more information but also introduces problems: Tokenizers can treat spacing, case, and individual characters differently, leading to strange model outputs that fail to capture the intended meaning. Many tokenizers were designed with English in mind and struggle with languages that don't use spaces between words or have...
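The spacing and case sensitivity described here is easy to observe with a real tokenizer, for example OpenAI's open-source tiktoken library (pip install tiktoken):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for text in ("Hello", " Hello", "HELLO", "hello"):
    print(repr(text), "->", enc.encode(text))
# The same word maps to different token ids depending on a leading space or
# letter case; languages written without spaces tend to fragment into far
# more tokens per character than English does.
```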
Jul 5, 2024
Meta’s Latest AI Breakthrough: Multi-Token Prediction Models
Meta's multi-token prediction models revolutionize AI efficiency and accessibility, setting the stage for a new era of innovation and collaboration in the field of artificial intelligence. A breakthrough in AI efficiency: Meta's novel approach to training large language models (LLMs) promises significant improvements in performance and training times: By predicting multiple future words simultaneously, instead of just the next word in a sequence, these models can develop a more nuanced understanding of language structure and context. This technique has the potential to curb the trend of AI models ballooning in size and complexity, making advanced AI more accessible and sustainable....
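A minimal PyTorch sketch of the idea: one shared trunk feeds several independent output heads, where head i is trained to predict the token i+1 positions ahead. The sizes and head design are illustrative assumptions, not Meta's exact architecture.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHeads(nn.Module):
    """n_future linear heads on a shared trunk; head i predicts i+1 steps ahead."""
    def __init__(self, d_model=512, vocab=32_000, n_future=4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab) for _ in range(n_future))

    def loss(self, hidden, targets):
        # hidden: (batch, seq, d_model) from the trunk; targets: (batch, seq)
        total = 0.0
        for i, head in enumerate(self.heads):
            # Align positions so head i is scored against the token i+1 ahead.
            logits = head(hidden[:, : targets.size(1) - (i + 1)])
            total = total + F.cross_entropy(
                logits.flatten(0, 1), targets[:, i + 1 :].flatten())
        return total / len(self.heads)
```

At inference time only the next-token head is needed, so the extra heads add training signal without slowing generation.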
Jul 4, 2024
GPT4All 3.0: Run AI Models Offline, Ensuring Privacy and Local Data Control
GPT4All introduces major update enabling local AI model access on personal computers: The open-source AI platform GPT4All has released version 3.0, allowing users to chat with thousands of large language models offline on their Mac, Linux, or Windows laptops, ensuring data privacy and security. Key improvements in GPT4All 3.0:
Expanded Model Support: Users can now interact with a wide variety of AI models like LLaMa, Mistral, and Nous-Hermes locally on their devices.
Enhanced Compatibility: The update fully supports Mac M Series chips and AMD/NVIDIA GPUs for smooth performance across different hardware configurations.
LocalDocs Integration: Users can grant their local AI...
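For context, running a model fully offline with GPT4All's Python bindings (pip install gpt4all) takes only a few lines; the model filename below is one example from the GPT4All catalog and is downloaded on first use, after which everything runs locally.

```python
from gpt4all import GPT4All

# Example model file; any GGUF model from the GPT4All catalog works.
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
with model.chat_session():
    print(model.generate("Summarize the benefits of local inference.",
                         max_tokens=200))
```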
Jul 3, 2024
Salesforce’s “Tiny Giant” AI Challenges Industry Giants, Heralds New Era of Efficient AI
Salesforce's "Tiny Giant" AI model, xLAM-1B, is challenging the notion that bigger is always better in the world of artificial intelligence, potentially paving the way for a new era of efficient, on-device AI applications. Small but mighty: The power of efficient AI; Salesforce's xLAM-1B model, with just 1 billion parameters, outperforms much larger models from industry leaders like OpenAI and Anthropic in function-calling tasks, thanks to the company's innovative approach to data curation: The key to xLAM-1B's performance lies in the quality and diversity of its training data, generated by the APIGen pipeline, which leverages 3,673 executable APIs across 21...