Sep 2, 2024

AI Training Data Shortage Looms as Websites Block Crawlers

Web crawling restrictions reshape AI training landscape: The increasing use of robots.txt files to limit web crawler access is significantly impacting the availability of high-quality training data for generative AI models, potentially altering the future development of artificial intelligence. Generative AI models, which power popular tools like ChatGPT, rely heavily on vast datasets compiled from publicly available web data. A growing number of websites, particularly news outlets and artists' pages, are implementing restrictions on web crawlers to protect their content and livelihoods from AI exploitation. The Data Provenance Initiative's recent report highlights this trend, revealing a marked increase in crawled...
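As a rough illustration of the robots.txt mechanism described above, the sketch below uses Python's standard `urllib.robotparser` to check whether a given crawler may fetch a URL. The rules, the `example.com` URL, and the helper name `crawler_allowed` are assumptions made for this example; `GPTBot` stands in for any AI crawler user agent a site might block.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "SomeOtherBot"):
        allowed = crawler_allowed(ROBOTS_TXT, agent, "https://example.com/article")
        print(agent, "allowed" if allowed else "blocked")
```

Note that robots.txt is advisory: it only restricts crawlers that choose to check it, which is why a well-behaved crawler would run a check like this before each request.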

Aug 27, 2024

RAND Report Shows 80% of AI Projects Fail Due to Leadership and Data Issues

AI project failure rates soar: A new RAND Corporation report reveals that over 80% of artificial intelligence projects fail, double the rate of non-AI IT projects, despite skyrocketing private-sector investment in the technology. Leadership shortcomings at the root: Business leaders' misunderstanding of AI capabilities and poor communication of project goals are primary reasons for AI project failures. Executives often have inflated expectations of AI's potential, fueled by impressive demonstrations and sales pitches. Many underestimate the time and resources required for successful AI implementation. A critical disconnect exists between business leaders and technical teams, leading to misaligned project goals. Organizations frequently...

Aug 27, 2024

AI Medical Devices Lack Crucial Patient Data, Study Finds

AI medical devices face scrutiny: A comprehensive study reveals that nearly half of FDA-approved AI medical devices lack reported clinical validation data using real patient information, raising concerns about their effectiveness and safety in healthcare settings. Researchers from UNC School of Medicine, Duke University, and other institutions analyzed over 500 AI medical devices approved by the FDA since 2016. The study, published in Nature Medicine, found that approximately 43% of these devices lacked published clinical validation data. Some devices were validated using computer-generated "phantom images" rather than real patient data, failing to meet proper clinical validation requirements. Rapid growth in...

Aug 27, 2024

‘Model Collapse’ Has Experts Questioning Inevitability of AI Model Performance

AI performance decline raises concerns: Recent observations suggest that popular AI models like ChatGPT and Claude are experiencing a noticeable decrease in performance and accuracy, challenging the expectation of continuous improvement in AI technology. Steven Vaughan-Nichols, in a Computerworld opinion piece, highlights the erratic and often inaccurate responses from major AI platforms. Users on the OpenAI developer forum have reported a significant decline in accuracy following the release of the latest GPT version. One user expressed disappointment, stating that the AI's performance fell short of the surrounding hype. Potential causes of AI degradation: Several factors may contribute to the perceived...

Aug 26, 2024

Incestual AI: How AI Models Training on Their Own Outputs May Lead to Disaster

AI-generated content poses unprecedented challenge: The proliferation of AI-generated content is creating a significant hurdle for AI companies, as they risk training new models on their own output, potentially leading to a deterioration in quality and diversity. OpenAI alone is estimated to produce about 100 billion words per day, contributing to the growing pool of AI-generated content on the internet. This surge in AI-created material raises concerns about the unintentional feedback loop that could occur when AI systems inadvertently ingest their own output during training. Researchers have identified a phenomenon called "model collapse," where the quality and diversity of AI-generated...

Aug 25, 2024

Frontier AI Models Could Cost $250B by 2027, Experts Predict

Scaling AI: The path to colossal models: Recent research and analysis suggest that by 2027, we could see the emergence of a $100 billion AI model, with further scaling beyond this point becoming less certain. Epoch AI's research forecasts that AI training runs could reach 2e29 floating-point operations (FLOP) by 2030, requiring hardware investments of approximately $250 billion. This projected scale dwarfs current investments, being over five times Microsoft's annual capital expenditure. The study indicates no insurmountable technical barriers to this level of scaling, although there is high uncertainty surrounding various factors. Infrastructure challenges: Power availability and chip production present...

Aug 25, 2024

How ‘Intelligent Document Processing’ Boosts Efficiency for Enterprises

Intelligent document processing (IDP) is revolutionizing how businesses handle unstructured content, using artificial intelligence to automate traditionally manual and time-consuming tasks. The big picture: IDP technology leverages AI and machine learning to process unstructured data, which constitutes 80-90% of new enterprise information, enabling organizations to streamline operations and improve efficiency. IDP systems employ advanced technologies like large language models and natural language processing to interpret various document types, including paper documents, emails, PDFs, forms, and images. By extracting relevant data and inputting it into other systems, IDP significantly reduces the need for manual data entry, leading to faster and more...

Aug 23, 2024

AI and the Evolving State of Enterprise Content Management

The AI revolution in enterprise content management: Artificial intelligence is transforming enterprise content management (ECM) systems, addressing the growing challenge of information overload and inefficient content retrieval in modern workplaces. A pressing business challenge: Recent data reveals a significant increase in workplace productivity issues related to content management and information retrieval. A Forrester study found that 40% of companies believe their employees spend too much time searching for information, a sharp rise from 19% just five years ago. This trend underscores the growing need for more efficient content management solutions in the enterprise environment. The role of AI in modernizing...

Aug 23, 2024

Generative AI Ranks Second in Marketers’ 2024 Budget Forecasts

Digital advertising landscape poised for growth: Marketers are set to maintain or increase ad spending across all channels in the latter half of 2024, with social media and digital display leading the charge, according to Mediaocean's 2024 H2 market report. Top consumer technology trends: Connected TV (CTV) and streaming emerge as the most significant trends, followed closely by generative AI and social video platforms like TikTok. 56% of marketers identified CTV and streaming as a top trend, highlighting the growing importance of these platforms in reaching audiences. Generative AI ranked second at 55%, indicating its rapid adoption and potential impact...

Aug 21, 2024

AI Healthcare Firm’s Disposed Device Exposes Massive Data Breach

Major data breach discovered through discarded device: A significant security lapse has been uncovered involving an AI healthcare company's failure to properly erase sensitive data from disposed equipment. The discovery: An individual obtained a small computer (NUC) from electronic waste that was previously used by an AI healthcare company, revealing a trove of unwiped sensitive information. The hard drive contained approximately 11,000 WAV audio files of customer voice commands, potentially exposing private health-related conversations. Videos from cameras installed in customers' homes were also found, raising serious privacy concerns. Log files detailing information about sensors placed in bathrooms and bedrooms were...

Aug 19, 2024

Why AI Models Are Collapsing and How to Fix the Problem

AI model collapse: A looming challenge for the tech industry: The phenomenon of "model collapse" is emerging as a significant threat to the progress and reliability of artificial intelligence systems, potentially undermining recent achievements in the field. AI models are experiencing degradation over time when trained on data that includes content generated by earlier versions of themselves, leading to a drift away from accurate representation of reality. This recursive learning process, akin to making copies of copies, results in compounding mistakes and less diverse, creative, and useful AI-generated content. The implications of model collapse extend beyond technical concerns, posing substantial...

Aug 17, 2024

AI Enhances LLMs With RAG and Fine-Tuning Techniques

Enhancing LLMs: RAG vs. Fine-Tuning: Retrieval-Augmented Generation (RAG) and Fine-Tuning are two powerful techniques used to improve the performance of Large Language Models (LLMs) for specific tasks or domains. The big picture: As LLMs continue to advance, data scientists and AI practitioners are exploring methods to tailor these models to particular use cases, with RAG and Fine-Tuning emerging as prominent approaches. RAG, introduced by Meta in 2020, connects an LLM to a curated, dynamic database, allowing the model to access up-to-date information and incorporate it into responses. Fine-Tuning involves training an LLM on a smaller, specialized dataset to adjust its...
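The retrieval half of RAG can be sketched in a few lines. This is a toy illustration, not how any production system works: it assumes a bag-of-words cosine similarity in place of learned embeddings, an in-memory list in place of a database, and it stops at building the prompt rather than calling a model. All function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Prepend the retrieved context to the question for the LLM to answer."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "The support desk is open from 9am to 5pm on weekdays.",
        "Invoices are issued on the first day of each month.",
    ]
    print(build_prompt("When is the support desk open?", docs))
```

Fine-tuning, by contrast, changes the model's weights, so there is no equivalent lookup step at query time.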

Aug 15, 2024

How LangChain Empowers Developers to Build Advanced AI Apps

LangChain is an innovative framework that empowers developers to create sophisticated applications powered by language models. By providing a structured approach to working with LLMs, LangChain simplifies the process of building intelligent, language-based applications. Key features and components: LangChain offers a modular architecture, seamless integration with popular language models, and robust tools for data handling and evaluation. The framework's core components include chains, agents, tools, memory, and callbacks, each serving a specific purpose in the application development process. Chains act as the fundamental building blocks, allowing developers to create sequences of operations for processing input and generating output. Agents provide...

Aug 15, 2024

SiFive Launches New Processor for AI-Driven Datacenters

SiFive's new RISC-V datacenter processor, the Performance P870-D, aims to meet the growing demand for high-performance, energy-efficient computing solutions in datacenters, vehicles, and embedded systems, with a particular focus on AI workloads. Key features and improvements: The SiFive Performance P870-D builds upon its predecessor, offering enhanced scalability and compatibility with industry-standard protocols. The processor supports the open AMBA CHI protocol, allowing customers to scale up to 256 cores for improved performance and power efficiency. It enables coherent high core count heterogeneous SoCs and chiplet configurations through compatibility with Compute Express Link (CXL) and CHI chip to chip (C2C) protocols. The...

Aug 14, 2024

AI’s Energy Demands Spark Waste-to-Power Innovation

Artificial Intelligence's rapid growth is creating unprecedented energy demands, prompting innovative solutions to meet these needs while addressing environmental concerns. The AI energy challenge: The computational power required for sustaining AI's growth rate is doubling every 100 days, creating a massive energy demand that threatens to outpace current supply capabilities. By 2030, the additional power demand generated by U.S. data centers is estimated to be nearly seven times that of New York City's current annual electricity consumption, according to Wells Fargo. Goldman Sachs projects that nearly half of the supply of new energy must come from renewables, but traditional sources...
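To put the 100-day doubling figure in annual terms, the short calculation below assumes steady exponential growth over the whole year, which is a simplification the summary itself does not state. The function name is made up for this example.

```python
DOUBLING_PERIOD_DAYS = 100  # doubling time quoted in the summary above


def growth_factor(days: float, doubling_period: float = DOUBLING_PERIOD_DAYS) -> float:
    """Multiplicative growth after `days`, assuming a constant doubling period."""
    return 2 ** (days / doubling_period)


if __name__ == "__main__":
    annual = growth_factor(365)
    print(f"Growth over one year: about {annual:.1f}x")
```

Under that assumption, a 100-day doubling time works out to roughly a 12.5-fold increase per year.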

Aug 14, 2024

Y Combinator Company Launches AI-Powered ETL Solution

Trellis, a new Y Combinator-backed startup, is introducing an innovative AI-powered ETL solution designed to transform unstructured data into structured SQL format, potentially revolutionizing how businesses handle complex data processing tasks. The big picture: Trellis aims to bridge the gap between messy, unstructured data sources and the structured data formats required for efficient analysis and operations. The startup's technology can convert various unstructured data types, including phone calls, PDFs, and chat logs, into SQL-compatible formats based on user-defined schemas. This capability addresses a significant pain point for data and operations teams who often struggle with manual data entry and the...

Aug 12, 2024

Apache Airflow Integrates Google’s Generative AI for Enhanced Data Pipelines

Apache Airflow introduces new operators for Google's generative AI, enabling seamless integration of Vertex AI's powerful models into data pipelines orchestrated by Airflow and Cloud Composer. Key developments: The latest release of the apache-airflow-providers-google package (version 10.21.0) includes three new Airflow operators designed to interact with Vertex AI's generative models. The new operators are TextGenerationModelPredictOperator, TextEmbeddingModelGetEmbeddingsOperator, and GenerativeModelGenerateContentOperator. These operators allow data analysts to leverage Google Cloud's Vertex AI platform, including models like Gemini, within their Airflow-managed workflows. The integration aims to streamline the incorporation of generative AI capabilities into data analytics pipelines, enhancing their functionality and efficiency. Potential applications:...

Aug 12, 2024

New Research Yields Framework to Improve Ethical and Legal Shortcomings of AI Datasets

The growing importance of responsible AI has prompted researchers to examine machine learning datasets through the lenses of fairness, privacy, and regulatory compliance, particularly in sensitive domains like biometrics and healthcare. A novel framework for dataset responsibility: Researchers have developed a quantitative approach to assess machine learning datasets on fairness, privacy, and regulatory compliance dimensions, focusing on biometric and healthcare applications. The study, conducted by a team of researchers including Surbhi Mittal, Kartik Thakral, and others, audited over 60 computer vision datasets using their proposed framework. This innovative assessment method aims to provide a standardized way to evaluate and compare...

Aug 12, 2024

AI Query Engine Transforms Unstructured Data Analysis

Revolutionizing unstructured data analysis: Roe AI has introduced an innovative query engine that leverages artificial intelligence to enable data analysts to perform SQL queries on unstructured data, including videos, images, webpages, and documents. The core technology: Roe AI's platform employs large language models (LLMs) as data processors to extract meaningful information from diverse unstructured data sources. The system aims to simplify the traditionally complex process of analyzing unstructured data by reducing it to a few lines of SQL queries. This approach bridges the gap between structured and unstructured data analysis, potentially opening up new possibilities for data-driven insights. Key features...

Aug 11, 2024

New Game Theory Research Suggests How Humans May Bias AI Model Training

The discovery that people alter their behavior when knowingly training AI systems raises important questions about the potential introduction of biases and the effectiveness of human-in-the-loop AI training methods. Study methodology and key findings: Researchers at Washington University in St. Louis conducted a game theory experiment to examine how people's decision-making changes when they believe they are training an AI system. The study utilized a classic game theory setup where participants could accept or reject monetary offers from a partner. Some participants were informed that their partner was an AI being trained through their interactions. Results showed that people were...

Aug 11, 2024

How to Navigate the Complex Landscape of Data Privacy and Compliance

Generative AI is rapidly becoming a dominant force in enterprise technology, with nearly a third of executives already leveraging its capabilities. However, the path to successful implementation is fraught with challenges, particularly concerning data privacy, compliance, and quality. The rise of generative AI in business: Generative AI is quickly gaining traction among companies, with 29% of surveyed executives already utilizing this technology in their operations. The adoption of generative AI is outpacing other AI solutions, signaling a significant shift in how businesses approach artificial intelligence. This rapid uptake indicates that companies are recognizing the potential of generative AI to transform...

Aug 10, 2024

AI and Blockchain Convergence May Unlock Trillion-Dollar Market

The convergence of artificial intelligence (AI) and blockchain technologies is creating new opportunities and challenges, with innovative projects emerging to harness the strengths of both fields while addressing their limitations. The big picture: AI and blockchain, despite their apparent differences, are increasingly intersecting in ways that could revolutionize data management, privacy, and technological innovation across various industries. AI relies on massive datasets and high-performance computing, while blockchain emphasizes decentralization but faces constraints in memory and throughput. The global electricity demand for AI is projected to rise significantly, with estimates suggesting it could account for 16% of the USA's current electricity...

Aug 10, 2024

How the World’s Busiest Airport Aims to Transform Operations with AI

Atlanta's Hartsfield-Jackson International Airport, the world's busiest airport, is embarking on a comprehensive data transformation journey to enhance operations, boost revenue, and improve the traveler experience through the use of machine learning and generative AI. Pilot project success: A visual business intelligence dashboard developed by the airport's IT team has demonstrated significant improvements in operational efficiency and decision-making speed. The dashboard, built using Databricks' Azure data lake and Microsoft's Power BI, provides critical operational data in a single view, including flight information, security wait times, and other key metrics. Implementation of the dashboard has resulted in an 80% improvement in...

Aug 10, 2024

How Snowflake Approaches AI Investment Strategy

Snowflake, a leading data company, is making significant strides in the artificial intelligence space, focusing on four key areas of investment to enhance its capabilities and offerings. Cortex: Bringing AI to data: Snowflake is developing Cortex, a comprehensive suite of AI building blocks designed to empower customers with large language model (LLM) capabilities and application development tools. Cortex supports both structured and unstructured data, allowing users to extract information from various sources, including presentations and PDFs. This initiative aims to make AI more accessible and actionable for Snowflake's customers, enabling them to leverage advanced AI technologies within their existing data...
