Data - CO/AI

Sep 17, 2024

Hugging Face Launches SQL Console for Advanced Dataset Exploration

Revolutionizing dataset exploration on Hugging Face: Hugging Face has introduced a powerful new SQL Console feature for datasets, enabling users to directly query and analyze data within their web browser. The SQL Console is now available for all public datasets on the Hugging Face Hub, accessible via a dedicated badge on each dataset page. This tool leverages DuckDB WASM technology, allowing users to perform complex queries without any backend dependencies or setup requirements. The console supports full DuckDB syntax, which is similar to PostgreSQL, providing a wide range of capabilities for data manipulation and analysis. Key features and functionality: The...

Sep 17, 2024

Salesforce Boosts Data Cloud with AI-Powered Data Tools

Data Cloud expansion: Salesforce unveils significant enhancements to its Data Cloud platform, introducing new features aimed at improving data analysis, integration, and AI capabilities for enterprises. The announcement, made at the annual Dreamforce conference, showcases Salesforce's commitment to providing comprehensive data solutions for businesses. These updates are designed to enhance the functionality of Salesforce's AI agents, known as Agentforce, by providing more robust and diverse data inputs. Unstructured data analysis: Salesforce introduces support for unstructured audio and video data, enabling businesses to extract valuable insights from a wider range of customer interactions. This feature allows companies to analyze data from...

Sep 13, 2024

Meta Announces Plans to Train AI on UK User Data

Meta expands AI training data to include UK user content: The tech giant announces plans to incorporate public posts, comments, and photos from British Facebook and Instagram users into its AI training datasets. Meta aims to leverage this data to accelerate the development and deployment of its generative AI products in the UK market. The company states that this initiative will help its AI systems better reflect "British culture, history, and idiom," though the specifics of this goal remain unclear. The data collection process will commence in the coming months, affecting adult accounts on both Facebook and Instagram platforms. Data...

Sep 13, 2024

Google’s DataGemma AI Models Target Statistical Inaccuracies

Breakthrough in AI accuracy: Google has introduced DataGemma, a pair of open-source AI models designed to reduce hallucinations in large language models (LLMs) when answering queries about statistical data. DataGemma builds upon Google's existing Gemma family of open models and leverages the extensive Data Commons platform, which contains over 240 billion data points from trusted organizations. The models are available on Hugging Face for academic and research purposes, signaling Google's commitment to advancing AI research in the public domain. Two distinct approaches, Retrieval Interleaved Generation (RIG) and Retrieval Augmented Generation (RAG), are employed to enhance factual accuracy in the models'...

Sep 12, 2024

If You Didn’t Set Your Photos to Private, Meta Probably Trained Its AI on Them

Meta's extensive data usage for AI training: Meta has confirmed that it has been using public posts and photos from adult Facebook and Instagram users since 2007 to train its artificial intelligence models. The revelation came during an Australian government inquiry into AI adoption, where Meta's global privacy director, Melinda Claybaugh, initially denied but later confirmed the extent of data usage. This practice includes all public text and photo content posted by adult users on Facebook and Instagram over the past 17 years. Users who have not explicitly set their posts to private have had their data included in Meta's...

Sep 12, 2024

Google’s AI Fact-Checker Aims to Curb Hallucinations

Google's new AI fact-checking tool: Google has unveiled DataGemma, a tool designed to enhance the accuracy and reliability of large language models by grounding their responses in verifiable data. DataGemma employs two primary methods: Retrieval-Interleaved Generation (RIG) and Retrieval-Augmented Generation (RAG), both of which leverage Google's Data Commons to verify and augment AI-generated responses. The tool aims to address the persistent issue of AI hallucinations by providing a mechanism for language models to cross-reference their outputs against real-world statistical data. Currently, DataGemma is exclusively available to researchers, with potential plans for broader access pending further testing and refinement. How DataGemma...

Sep 12, 2024

Zoho Releases AI-Powered Analytics to Boost Business Intelligence

AI-powered upgrade revolutionizes Zoho Analytics: Zoho Corporation has released an enhanced version of its self-service business intelligence and analytics platform, Zoho Analytics, featuring over 100 improvements including advanced AI and machine learning capabilities. Key enhancements and features: The new Zoho Analytics introduces significant upgrades across four primary areas: Data Management, Artificial Intelligence, Data Science & Machine Learning (DSML), and Extensibility. The Data Management Hub now includes Stream Analytics and 25 additional data connectors, expanding its connector portfolio to over 500. Users can create and manage complex ETL data pipelines using a visual builder or the new Python Code Studio. The...

Sep 12, 2024

Why Good Data is the Foundation for Success in the AI Era

Generative AI's success hinges on high-quality data: Organizations face challenges in preparing and processing data effectively for AI initiatives. The data dilemma: Many businesses struggle with data preparation for AI projects, leading to potential setbacks in their artificial intelligence initiatives. Gartner analysts predict that at least 30% of generative AI projects will be abandoned after proof of concept through 2025, with poor data quality cited as a primary reason. Having data ready for AI can drive greater business outcomes by 20%, according to Gartner Senior Director Analyst Roxane Edjlali. Organizations often lack clarity on how to prepare their data, especially...

Sep 11, 2024

Meta Mines 16 Years of Social Data for AI Training

Meta's extensive data mining for AI training revealed: Meta has admitted to scraping all public posts from Facebook and Instagram since 2007 to train its generative AI model, raising significant privacy concerns and contrasting sharply with Apple's approach to AI development. Scale of data collection: Meta's admission covers a vast trove of user-generated content spanning nearly two decades, encompassing billions of posts, photos, and comments from its social media platforms. The data collection includes all public posts made on Facebook and Instagram since 2007, excluding only content from users under 18 and private accounts. Photos of children posted by parents...

Sep 11, 2024

Facebook Admits to Scraping Australians’ Photos, Posts to Train AI

Facebook's massive data collection for AI training: The social media giant has confirmed it is scraping public data from all Australian adult users on its platform to train AI models, without offering an opt-out option. Facebook is collecting public photos, posts, and other data from Australian adult users' accounts dating back to 2007 for AI training purposes. The company initially denied this practice but later confirmed it when pressed during an inquiry. Data from users under 18 is not scraped, but public photos of children posted on adult accounts are included in the collection. Discrepancy in user privacy options: Facebook...

Sep 10, 2024

New AI Framework ChartEye Will Extract Info From Any Chart

Innovative framework for automated chart analysis: ChartEye, a new deep learning framework, offers a comprehensive solution for extracting information from charts and infographics, addressing the complex challenges in automated chart understanding. Developed by researchers Osama Mustafa, Muhammad Khizer Ali, Momina Moetesum, and Imran Siddiqi, ChartEye tackles multiple tasks in the chart information extraction process. The framework utilizes advanced machine learning techniques, including hierarchical vision transformers and YOLOv7, to perform chart-type classification, text-role classification, and text detection. To improve optical character recognition (OCR) accuracy, ChartEye employs Super Resolution Generative Adversarial Networks (SR-GANs) to enhance detected text. Key performance metrics: Experimental results...

Sep 10, 2024

Oracle Launches New AI Developer Assistant for Fusion Data Intelligence Platform

Oracle enhances Fusion Data Intelligence with AI: Oracle is integrating a new generative AI-powered developer assistant into its Fusion Data Intelligence service, part of the Fusion Cloud Applications Suite, as announced at CloudWorld 2024. Key features of the AI-powered developer assistant: Streamlines the configuration of Oracle Fusion Data Intelligence service. Accelerates the addition of third-party data sources through a guided, step-by-step process. Aims to simplify and expedite development tasks for users of the platform. Fusion Data Intelligence overview: Updated version of Fusion Analytics Warehouse. Combines enterprise data, ready-to-use analytics, and prebuilt AI and machine learning models. Delivers comprehensive business intelligence...

Sep 7, 2024

How Power BI and Tableau Empower Marketers in the AI-Driven Economy

The evolution of marketing data visualization: Marketing has undergone significant changes in recent years, with data visualization tools like Power BI and Tableau emerging as powerful allies for marketers navigating an increasingly complex digital landscape. According to a HubSpot survey, over 75% of marketers believe marketing has changed more in the last three years than in the previous fifty. Factors contributing to this shift include advancements in AI, the rise of influencers, growth of new channels like podcasts and short-form video, and users' deeper digital footprints. Companies now use between 3 and 12 different channels for marketing outreach, leading to...

Sep 7, 2024

Major Platforms Block Apple’s AI Crawler Amid Data Concerns

AI data collection faces pushback: Apple's introduction of Applebot-Extended, a tool allowing websites to opt out of AI training data collection, has sparked a significant response from major online platforms and publishers. Prominent sites including Facebook, Instagram, Craigslist, Tumblr, and several leading news outlets have chosen to block Applebot-Extended, signaling a shift in attitudes towards web crawlers and their role in AI development. Website owners can prevent Applebot-Extended from accessing their content by updating their robots.txt file, a standard method for controlling web crawler access. Recent analyses indicate that 6-7% of high-traffic websites are blocking Applebot-Extended, with news and media...
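The robots.txt opt-out described above can be sketched with Python's standard-library robots.txt parser. This is a minimal illustration, not any publisher's actual configuration: the robots.txt content, the `can_crawl` helper, and the example URL are all hypothetical, and only the `Applebot-Extended` user-agent name comes from the article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks Applebot-Extended site-wide
# while leaving every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"  # placeholder URL
    print("Applebot-Extended allowed:", can_crawl("Applebot-Extended", url))
    print("SomeOtherBot allowed:", can_crawl("SomeOtherBot", url))
```

A robots.txt rule is advisory: it only has an effect when the crawler chooses to honor it.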

Sep 6, 2024

How Effective Metadata Management Unlocks AI Potential for Enterprises

Metadata management is becoming increasingly crucial for enterprises as they navigate the complexities of AI and ML implementation, offering a pathway to streamlined data operations and enhanced security. The big picture: As AI and ML reshape industries, effective data management has become essential for organizations, with metadata management emerging as a critical component for driving success in these technologies. AI and ML require large amounts of accurate data, necessitating comprehensive data management strategies that address security, regulations, efficiency, and architecture. A Cloudera study reveals that 73% of enterprise IT leaders report their company's data exists in silos and is disconnected,...

Sep 6, 2024

How AI Assistants Will Revolutionize IT Operations with Smart Data Tools

The rise of AI assistants in IT: Artificial Intelligence (AI) assistants are revolutionizing the IT industry by simplifying complex data tasks, automating processes, and enhancing decision-making capabilities. AI-driven solutions are becoming essential tools for IT professionals, addressing the challenges of managing and interpreting vast and complex datasets. A recent Cloudera survey revealed that 55% of IT leaders would prefer a root canal over navigating their data access challenges, highlighting the critical need for more efficient data management solutions. SQL AI Assistant: Transforming data queries: AI-powered SQL assistants are streamlining the process of crafting and optimizing database queries, making data access...

Sep 5, 2024

This YC Startup Is Bringing the Full Power of AI to Spreadsheets

Revolutionizing spreadsheets with AI: Paradigm, a new startup backed by Y Combinator, is reimagining spreadsheets for the modern era by integrating generative AI into every cell. Founded by 22-year-old Anna Monaco, a recent University of Pennsylvania graduate, Paradigm has emerged from stealth with $2 million in seed funding. The company's software uses AI agents built on proprietary and open-source generative AI models, including OpenAI's GPT-4 and Meta's Llama family. Paradigm claims to be 1000 times faster than manual data collection, completing an average of 500 cells per minute. Key features and capabilities: Paradigm's AI-powered spreadsheet software offers a range of...

Sep 4, 2024

How to Use GPT-4o for Web Scraping

AI-assisted web scraping with GPT-4o: OpenAI's new structured outputs feature in their API has opened up exciting possibilities for AI-assisted web scraping, as demonstrated by a recent experiment using GPT-4o. Initial approach and model selection: The experiment utilized Pydantic models to define the structure for parsed columns and tables. A system prompt was crafted to instruct GPT-4o on its role as an expert web scraper. GPT-4o outperformed GPT-4o mini in parsing accuracy, leading to its selection for further experimentation. Performance on complex tables: GPT-4o successfully parsed a 10-day weather forecast table from Weather.com, correctly handling varying row sizes and hidden...
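As a rough illustration of what Pydantic models for parsed columns and tables might look like, here is a minimal sketch. The class names, field names, and sample data are assumptions made for this example and are not taken from the experiment; the call to the OpenAI API is deliberately left out, so the sketch only shows how model output shaped like `raw` would be validated against the schema.

```python
from typing import List

from pydantic import BaseModel


class ParsedColumn(BaseModel):
    """One column of a scraped table (hypothetical schema)."""
    name: str
    values: List[str]


class ParsedTable(BaseModel):
    """A scraped table made of named columns (hypothetical schema)."""
    name: str
    columns: List[ParsedColumn]


# Made-up data standing in for a structured model response.
raw = {
    "name": "forecast",
    "columns": [
        {"name": "Day", "values": ["Mon", "Tue"]},
        {"name": "High", "values": ["71", "68"]},
    ],
}

# Validation raises pydantic.ValidationError if the shape does not match.
table = ParsedTable(**raw)

if __name__ == "__main__":
    for column in table.columns:
        print(column.name, column.values)
```

Defining the schema up front means a malformed response fails loudly at validation time instead of surfacing later as a missing key.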

Sep 4, 2024

How AI is Transforming Emissions Data into Business Opportunities

Applying artificial intelligence (AI) to emissions data can reveal valuable insights for businesses, helping them reduce their carbon footprint and identify investment opportunities. However, the effectiveness of AI in analyzing emissions data hinges on data consistency and organization across complex enterprises and their supply chains. The dual challenge of emissions data: Companies face regulatory pressures to report and reduce emissions while also seeking to capitalize on business opportunities related to emissions management. Regulations are driving the need for accurate emissions reporting and reduction over time. The U.S. Inflation Reduction Act offers investment credits for carbon sequestration and storage, creating potential...

Sep 4, 2024

Digital Experience Assurance: How to Ensure Data Fidelity in the Age of AI

The rise of AI-driven Digital Experience Assurance (DXA) is transforming how IT teams manage and optimize digital experiences, particularly in the context of unowned cloud and Internet infrastructure. This emerging discipline offers solutions to the challenges of data overload and the need for rapid, intelligent action in IT operations. The digital experience challenge: IT teams are tasked with ensuring optimal performance of digital experiences, even when they don't control the underlying cloud and Internet networks. The Internet has become the new corporate backbone, creating a vast, unowned environment critical for business operations. The sheer volume of data from various sources...

Sep 4, 2024

Cybersecurity Experts Share Key Strategies for Managing AI-Related Threats

The evolving threat landscape: As artificial intelligence becomes more prevalent, cybersecurity professionals are adapting their approaches to protect against a diverse array of actors and intentions across the global internet. The threat landscape can be likened to a "Game of Thrones" style battle, involving various regional actors and players with different motivations, rather than a simple confrontation between "white hats" and "black hats." Cybersecurity efforts now focus on protecting AI systems and creating robust AI policies to address the unique challenges posed by this technology. Key cybersecurity strategies: Experts recommend several approaches to enhance security in AI-related systems and manage...

Sep 4, 2024

This New Organization Is Trying to Make AI Data Licensing Ethical

The formation of the Dataset Providers Alliance (DPA) marks a significant step towards ethical data licensing in the rapidly evolving AI industry, advocating for creator consent and standardized practices. A new player in AI ethics: The Dataset Providers Alliance, a trade group formed in the summer of 2024, aims to establish ethical standards and practices for data licensing in the artificial intelligence sector. Composed of seven AI licensing companies, the DPA represents a collective effort to address the ethical concerns surrounding data usage in AI development. The alliance's primary focus is on promoting an opt-in system for data usage, ensuring...

Sep 3, 2024

Apple’s AI Web Crawler Blocked by Major Websites

AI training data controversy: Apple's web crawler for AI training, Applebot-Extended, is facing widespread blocking from major websites, highlighting growing tensions in the AI industry over data access and usage. Major news publishers including The New York Times, The Atlantic, The Financial Times, Gannett, Vox Media, and Condé Nast have altered their robots.txt files to prevent Applebot-Extended from scraping their content. Social media platforms Facebook, Instagram, and Tumblr, as well as Craigslist, have also confirmed blocking Apple's AI-focused web crawler. These actions reflect the increasing value and sensitivity surrounding high-quality, human-generated content for AI training purposes. Industry dynamics and partnerships:...

Sep 2, 2024

New Survey Reveals AI and Privacy Attitudes Among U.S. Workers

Generative AI and data privacy attitudes in focus: A new survey conducted by Zoho Corporation and CRM Essentials reveals contrasting viewpoints on the use of generative AI and data privacy among U.S. employees. The study surveyed 1,000 employees across various industries, company sizes, and disciplines to understand their interactions with generative AI at work, attitudes toward the technology, and concerns regarding data privacy. This survey comes amid growing interest in AI technologies and their implications for businesses and individuals. Adobe enhances marketing campaign efficiency: Adobe has launched Workfront Planning, a new offering within its enterprise work management application, Adobe Workfront....
