News/Data
AI adoption is surging while data quality is plummeting, new report finds
Generative AI adoption surges amid data challenges: The rapid growth of generative AI in enterprise settings is accompanied by significant hurdles in data management and quality assurance, according to Appen's 2024 State of AI Report. Generative AI adoption increased by 17% in 2024, with expanded use in IT operations, manufacturing, and R&D sectors. Companies are facing a 10% year-over-year increase in bottlenecks related to sourcing, cleaning, and labeling data for AI systems. The demand for high-quality, accurate, diverse, and properly labeled data tailored to specific AI use cases is growing as AI models tackle more complex problems. Enterprise AI deployments...
Oct 22, 2024

X users revolt over new terms that allow AI training on posts
X's new terms of service spark controversy: The social media platform X, formerly known as Twitter, has updated its terms of service, granting itself broad rights to use user-generated content for AI training, raising concerns among users and privacy advocates. Key changes to data usage: X's updated terms, effective November 15, 2024, allow the platform to use all user-submitted content for various purposes, including training machine learning and AI models. Users automatically grant X a "worldwide, non-exclusive, royalty-free license" to make their content available globally by continuing to use the platform. The new terms explicitly mention the use of user...
Oct 18, 2024

How Walmart is evolving its data analytics platform around AI
Walmart's data analytics evolution: Walmart is transforming its data analytics platform to leverage AI and provide actionable insights for suppliers and merchants. Walmart Data Ventures, established nearly four years ago, created the Walmart Luminate platform to help both the retailer and its suppliers understand shopping patterns and product movement. The platform is now being rebranded as Scintilla, reflecting its expansion beyond the U.S. market and its focus on AI-driven recommendations. Scintilla aims to provide not just data points but also actionable recommendations to help drive business decisions. Key features and developments: The platform's evolution includes new tools and features designed...
Oct 18, 2024

New scraping technique lets you extract data from any screen recording
AI-powered video scraping emerges as a game-changing data extraction technique: Simon Willison, an AI researcher, has developed a novel method called "video scraping" that uses AI to extract structured data from screen recordings at an incredibly low cost. Willison demonstrated the technique by recording his screen while viewing payment data in emails, then feeding the video into Google's Gemini AI model. The AI successfully extracted and structured the payment information from the video with high accuracy, at a cost of less than one-tenth of a cent. This breakthrough showcases the ability of modern AI models to process video inputs and...
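
The snippet stops short of the mechanics, but the workflow it describes — record the screen, hand the video to a multimodal model, ask for structured output — can be sketched in a few lines. The following is a minimal illustration, not Willison's actual code: it assumes the google-generativeai Python SDK, a GOOGLE_API_KEY environment variable, and placeholder file, field, and model names.

```python
# Sketch of "video scraping": upload a screen recording and ask Gemini
# to return the data visible on screen as structured JSON. File name,
# model, and output fields are illustrative placeholders.
import os
import time

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Upload the screen recording; video files are processed asynchronously,
# so poll until the upload has finished processing.
video = genai.upload_file("screen_recording.mp4")  # placeholder file name
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-flash")
prompt = (
    "Extract every payment visible in this screen recording as a JSON array "
    "of objects with 'date', 'payer', and 'amount' fields. Return only JSON."
)
response = model.generate_content([video, prompt])
print(response.text)
```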
Oct 16, 2024

Tech giants back AI search visualizer Tako with major investment
AI-powered data visualization startup secures significant funding: Tako, a startup leveraging generative AI to create interactive data visualizations from text queries, has raised $5.75 million in seed funding from prominent tech industry figures. The seed round was led by Eventbrite co-founder Kevin Hartz and included investors such as Stanley Druckenmiller, Naval Ravikant, Guillermo Rauch, and Joe Montana. Tako's technology transforms text queries into visually appealing, interactive charts and graphs called "knowledge cards" using public and licensed data. The startup has partnered with AI search engine Perplexity to integrate its knowledge cards into certain data-centric queries. Tako's vision and potential applications:...
Oct 16, 2024

How to navigate data drift and bias in enterprise AI adoption
The importance of data quality in AI adoption: As organizations increasingly turn to AI technologies for innovation and competitiveness, the quality of data used to train AI models becomes a critical factor in determining their effectiveness and accuracy. AI technologies rely heavily on data to learn and make predictions, making high-quality data essential for obtaining accurate results and realizing the full benefits of these systems. Two significant challenges that can impact data quality and AI model performance are data drift and data bias, both of which require careful consideration and management. Understanding data drift: Data drift occurs when the statistical...
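
As a concrete illustration of the drift half of the problem, the generic sketch below (not tied to any particular vendor's tooling) compares a feature's training distribution against recent production data with a two-sample Kolmogorov–Smirnov test from scipy; the sample values, sizes, and 0.05 significance threshold are illustrative assumptions.

```python
# Generic drift check: compare a feature's training distribution with
# recent production data using a two-sample KS test (scipy).
# The synthetic data and 0.05 significance level are illustrative choices.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training-time data
prod_feature = rng.normal(loc=0.4, scale=1.2, size=1_000)   # shifted production data

stat, p_value = ks_2samp(train_feature, prod_feature)
if p_value < 0.05:
    print(f"Drift suspected (KS={stat:.3f}, p={p_value:.4f}) - consider retraining")
else:
    print(f"No significant drift detected (KS={stat:.3f}, p={p_value:.4f})")
```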
Oct 15, 2024

Lack of quality AI training data to hinder science progress, Nobel laureate warns
AI's impact on scientific discovery: Recent Nobel Prize awards in Chemistry and Physics highlight the transformative role of artificial intelligence in advancing scientific research, particularly in biochemistry and protein structure prediction. David Baker, a biochemist at the University of Washington, received the Nobel Prize in Chemistry for his pioneering work using AI to design new proteins. The Chemistry prize was also awarded to Demis Hassabis and John M. Jumper of Google DeepMind for their development of AlphaFold, an AI system capable of accurately predicting protein structures. In the field of Physics, Geoffrey Hinton and John Hopfield were recognized for their...
Oct 14, 2024

Adobe’s new AI video model emphasizes responsibly trained AI
Adobe Unveils Firefly Video Model with Responsible AI Training Approach: Adobe has introduced its Firefly Video Model at the annual Max conference, emphasizing a commitment to ethical AI training practices and creator-friendly policies. Key features of Adobe Firefly:
• Generative Extend: allows users to extend video clips and audio by small increments
• Text-to-video: generates videos based on detailed text descriptions
• Image-to-video: creates short clips using reference images and accompanying text
• Firefly Image 3 model: claimed to be four times faster than previous versions
Responsible AI training practices:
• Adobe compensates creators for training data
• Does not train on customer content or scrape...
Oct 14, 2024

Flex unveils new liquid-cooled data center solutions at OCP
AI data center evolution: Flex, a major player in the IT solutions industry, is expanding its role in the AI data center market with innovative power, cooling, and infrastructure solutions designed to meet the demands of cutting-edge AI servers. Flex has positioned itself as a full-stack provider of power, cooling, and IT infrastructure for AI-driven data centers, addressing the unique challenges posed by the increasing computational requirements of artificial intelligence. The company is launching liquid cooling-ready servers featuring direct-to-chip cooling technology, housed in racks that comply with Open Compute Project (OCP) specifications. Company background: Flex, formerly known as Flextronics, has...
Oct 13, 2024

LLMs don’t outperform a 1970s technique, but they’re still worth using
LLMs show promise in anomaly detection despite performance gaps: A recent study by MIT's Data to AI Lab explored the use of large language models (LLMs) for anomaly detection in time series data, revealing both limitations and unexpected advantages compared to traditional methods. Key findings and implications: The study compared LLMs to 10 other anomaly detection methods, including state-of-the-art deep learning tools and the decades-old ARIMA model. LLMs were outperformed by most other models, including ARIMA, which surpassed LLMs on 7 out of 11 datasets. Surprisingly, LLMs managed to outperform some models, including certain transformer-based deep learning methods. LLMs achieved...
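
For readers unfamiliar with the "decades-old" baseline, the sketch below shows what ARIMA-style anomaly detection looks like in practice with statsmodels: fit the model, then flag points whose residuals fall well outside the typical range. The synthetic series, the (1, 1, 1) order, and the 3-sigma threshold are illustrative defaults, not the configuration used in the MIT study.

```python
# ARIMA-based anomaly detection: fit a model to the series, then flag
# observations whose residuals exceed 3 standard deviations.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(42)
series = np.cumsum(rng.normal(0, 1, 300))  # synthetic random-walk-like series
series[150] += 12                          # inject an obvious anomaly

model = ARIMA(series, order=(1, 1, 1)).fit()

resid = model.resid[5:]                    # skip initialization burn-in
threshold = 3 * resid.std()                # flag residuals beyond 3 standard deviations
anomalies = np.where(np.abs(resid) > threshold)[0] + 5
print("Anomalous indices:", anomalies)
```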
Oct 12, 2024

How to protect your data from AI training
AI data training opt-outs: A growing trend: As concerns about data privacy and AI ethics intensify, tech companies are increasingly offering users the ability to opt out of having their content used for AI model training. Major players like Google, OpenAI, and Amazon Web Services have implemented opt-out mechanisms, responding to public pressure and potential legal challenges. The effectiveness of these opt-outs may be limited, as many AI models have already been trained on vast amounts of scraped web data. Companies often lack transparency about the specific data sources used in their AI training processes, making it difficult for users...
Oct 11, 2024

OpenAI’s new benchmark tests AI’s ability to handle data science problems
OpenAI's MLE-bench: A new frontier in AI evaluation: OpenAI has introduced MLE-bench, a groundbreaking tool designed to assess artificial intelligence capabilities in machine learning engineering, challenging AI systems with real-world data science competitions from Kaggle. The benchmark includes 75 Kaggle competitions, testing AI's ability to plan, troubleshoot, and innovate in complex machine learning scenarios. MLE-bench goes beyond traditional AI evaluations, focusing on practical applications in data science and machine learning engineering. This development comes as tech companies intensify efforts to create more capable AI systems, potentially reshaping the landscape of data science and AI research. AI performance: Impressive strides and...
Oct 9, 2024

How Domino Data Lab plans to eliminate AI governance concerns
AI governance emerges as critical focus: As artificial intelligence continues to rapidly advance and proliferate, the tech industry is shifting attention towards establishing robust governance frameworks to ensure responsible AI development and deployment. The growing emphasis on AI governance parallels early concerns around cloud computing security, with stakeholders now recognizing the need for guardrails and ethical guidelines. There is an increasing push to address issues like bias in AI models, compliance with regulations, and appropriate use cases across different industries and regions. Domino Data Lab introduces AI governance solution: The company has launched Domino Governance, a software platform designed to...
Oct 9, 2024

The genAI opportunity: From ‘data to insight’ to ‘context to action’
The rise of generative AI in data science: Generative AI (genAI) is presenting a unique opportunity to bridge the gap between traditional data science methods and real-time business decision-making, potentially revolutionizing how organizations turn context into actionable insights. The data science dilemma: Data scientists have long faced challenges with data silos and lengthy processing times, creating a disconnect between their mission of transforming data into insights and the immediate needs of business teams. Traditional data science approaches often involve collecting, cleaning, and rigorously analyzing large datasets, which can be time-consuming and may not align with urgent business priorities. In contrast,...
Oct 8, 2024

Vectorize launches AI-powered enterprise data platform
Pioneering Agentic RAG: Vectorize's Enterprise Data Solution: Vectorize, a startup founded by former DataStax executive Chris Latimer, has unveiled its innovative platform designed to streamline enterprise Retrieval Augmented Generation (RAG) implementations. The company has secured $3.6 million in seed funding led by True Ventures, marking a significant milestone in its mission to revolutionize enterprise AI deployments. Vectorize's platform focuses on the critical data engineering aspects of AI, addressing the challenges of preparing and maintaining data for vector databases and large language models. The solution enables near real-time data capabilities through an agentic RAG approach, offering a production-ready data pipeline for...
Oct 8, 2024

Anthropic launches ‘Message Batches API’ to streamline large-scale data tasks
New Message Batches API revolutionizes large-scale data processing: Anthropic has introduced a powerful and cost-effective solution for processing high volumes of queries asynchronously, offering significant benefits for developers and businesses. Key features and advantages: The Message Batches API allows developers to send up to 10,000 queries per batch, with processing completed within 24 hours at half the cost of standard API calls. The API is currently available in public beta, supporting Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku on the Anthropic API. Amazon Bedrock customers can utilize batch inference with Claude, while support for Google Cloud's Vertex...
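
The article describes the API's shape — up to 10,000 requests per batch, asynchronous processing within 24 hours at half the per-token cost — which can be sketched roughly as follows. This assumes the anthropic Python SDK's beta batches interface as exposed around launch (the exact namespace may differ by SDK version), an ANTHROPIC_API_KEY in the environment, and example prompts and model name.

```python
# Sketch of submitting a batch of prompts to Anthropic's Message Batches API.
# Assumes the beta batches interface available in the SDK at launch;
# prompts, model name, and custom IDs are illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

questions = [
    "Summarize the 2024 trends in enterprise data quality.",
    "List three common causes of data drift.",
]

batch = client.beta.messages.batches.create(
    requests=[
        {
            "custom_id": f"question-{i}",
            "params": {
                "model": "claude-3-5-sonnet-20240620",
                "max_tokens": 512,
                "messages": [{"role": "user", "content": q}],
            },
        }
        for i, q in enumerate(questions)
    ]
)

# Batches are processed asynchronously; poll the batch status and fetch
# the results once processing has ended.
print(batch.id, batch.processing_status)
```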
Oct 7, 2024

ByteDance’s new web scraper is hoovering up data at an unprecedented rate
ByteDance's aggressive web scraping: ByteDance, the parent company of TikTok, has launched a new web crawler called Bytespider that is collecting online data at an unprecedented rate. Bytespider, introduced in April, has quickly become one of the most aggressive web scrapers on the internet, surpassing the data collection efforts of major players such as Google, Meta, Amazon, OpenAI, and Anthropic. According to research by Kasada, a bot management company, Bytespider is scraping data at approximately 25 times the rate of GPTBot, which collects data for OpenAI's ChatGPT platform. The bot's scraping activity has shown significant spikes over the past...
Oct 7, 2024

The race to block OpenAI’s web crawlers is slowing
AI Data Scraping Landscape Shifts: OpenAI's recent licensing agreements with publishers have led to a significant change in how news outlets approach web crawler access, particularly for AI training data collection. The initial surge in blocking OpenAI's GPTBot through robots.txt files has reversed, with the number of high-ranking media websites disallowing access dropping from a peak of over one-third to around a quarter. Among the most prominent news outlets, the block rate remains above 50%, but this is down from nearly 90% earlier in the year. The trend towards increased blocking appears to have ended, at least temporarily, as more...
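
The blocking being measured here happens in publishers' robots.txt files, which name OpenAI's crawler by its GPTBot user agent. As a rough illustration of how such a survey could be automated, the sketch below uses Python's standard-library robotparser to test whether a given domain currently disallows GPTBot; the domain is a placeholder, and a real survey would loop over many sites and handle fetch failures.

```python
# Check whether a site's robots.txt disallows OpenAI's GPTBot crawler,
# using only the Python standard library. The domain is a placeholder.
from urllib import robotparser

def blocks_gptbot(domain: str) -> bool:
    parser = robotparser.RobotFileParser()
    parser.set_url(f"https://{domain}/robots.txt")
    parser.read()
    # If GPTBot may not fetch the homepage, treat the site as blocking it.
    return not parser.can_fetch("GPTBot", f"https://{domain}/")

for domain in ["example.com"]:
    print(domain, "blocks GPTBot:", blocks_gptbot(domain))
```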
Oct 7, 2024

AI data centers guzzle water at alarming rates
Water consumption surge in data centers: The rapid growth of data center water usage, particularly in Virginia's 'data center alley', has raised significant concerns about the sustainability of IT infrastructure in the AI era. Data centers in northern Virginia consumed over seven billion liters of water in 2023, marking a substantial 64% increase from 2019 levels. The region surrounding Ashburn, VA, is estimated to handle 70% of the world's daily internet traffic, making it a critical hub for global data processing. This dramatic rise in water consumption threatens to undermine the sustainability goals of major tech companies. AI's role in...
Oct 4, 2024

Microsoft just released Drasi, and it could change how we handle big data
Microsoft unveils Drasi: A game-changer in data processing: Microsoft has launched Drasi, an open-source data processing system aimed at simplifying the detection and response to critical events in complex infrastructures, marking a significant advancement in cloud computing and event-driven architectures. The big picture: Drasi represents a new category of data processing systems, designed to address the growing complexity in event-driven architectures, particularly in scenarios like IoT edge deployments and smart building management. Mark Russinovich, CTO and Technical Fellow at Microsoft Azure, described Drasi as "the birth of a new category of data processing system" in an interview with VentureBeat. The...
Oct 3, 2024

Why AI analytics software is crucial to unlocking value for SMBs
A game-changer for small businesses: Modern analytics software has become more accessible and affordable, offering powerful insights and growth potential for small businesses of all sizes.
• Software vendors have improved their offerings, incorporating features like generative AI and low-code automation into standard packages.
• Many vendors are reducing prices and offering scalable pricing models, making analytics software more attainable for small businesses.
• Even for solopreneurs and nascent businesses, investing in analytics early can provide a strong foundation for growth and competitiveness.
Democratizing data analysis: The evolution of analytics software has eliminated the need for specialized data scientists in...
Oct 3, 2024

3 key data management strategies for successful gen AI projects
The dawn of generative AI in enterprises: As companies increasingly adopt generative AI technologies, IT leaders must navigate complex data management challenges to ensure successful implementation and scaling of these projects. Generative AI's potential to transform business operations has led to widespread adoption across various industries. The success of these AI initiatives heavily depends on the quality and management of data used to train and operate the models. IT leaders are faced with the task of adapting existing data management practices to meet the unique demands of generative AI technologies. Data collection, filtering, and categorization: The foundation of AI success:...
Oct 3, 2024

Data goldmine: People are sharing their most intimate secrets with AI chatbots
AI chatbots: The new confessional of the digital age: As AI chatbots like ChatGPT gain popularity, users are increasingly sharing intimate personal details, raising concerns about privacy and data exploitation. The allure of AI confidants: Users are drawn to AI chatbots for their non-judgmental nature and perceived confidentiality. People feel more comfortable sharing personal information with AI than with human friends, as noted by OpenAI CEO Sam Altman. The absence of social evaluation by machines encourages users to be more open and vulnerable. Chatbots' ability to provide tailored responses incentivizes users to share more specific details about their lives. Privacy...
Oct 3, 2024

Neurelo’s new tech will generate realistic mock data for database testing
Neurelo's innovative approach to mock data generation: Neurelo has developed a cutting-edge technology for generating realistic mock data based on database schemas, addressing key challenges in database testing and development. The company's solution works with popular databases including MongoDB, MySQL, and Postgres, generating realistic data automatically without requiring user input. Neurelo prioritized low cost and fast response time in their development process, utilizing native Rust for optimal performance. Initial challenges and pivots: The path to developing this technology was not without obstacles, prompting Neurelo to adapt their approach. An initial attempt using Large Language Models (LLMs) to generate code failed...
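
Neurelo's generator is proprietary and written in Rust, so the sketch below is only a conceptual stand-in, not the company's implementation: it maps column types from a toy schema description to providers from Python's Faker library to emit realistic-looking rows. The schema, type names, and row count are all illustrative assumptions.

```python
# Conceptual sketch of schema-driven mock data generation (not Neurelo's
# Rust implementation): map column types from a toy schema description to
# Faker providers and emit realistic-looking rows for testing.
from faker import Faker

fake = Faker()

# Toy schema, as might be derived from a table definition; illustrative only.
schema = {
    "users": {
        "id": "int",
        "full_name": "name",
        "email": "email",
        "signed_up": "date",
    }
}

GENERATORS = {
    "int": lambda: fake.pyint(min_value=1, max_value=100_000),
    "name": fake.name,
    "email": fake.email,
    "date": lambda: fake.date_between(start_date="-2y", end_date="today").isoformat(),
}

def mock_rows(table: str, count: int = 5) -> list[dict]:
    columns = schema[table]
    return [{col: GENERATORS[kind]() for col, kind in columns.items()}
            for _ in range(count)]

for row in mock_rows("users"):
    print(row)
```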