News/Data

Feb 6, 2025

Sanctum’s local AI app may be just what you need to keep your data private

AI privacy and data protection take center stage with Sanctum, a new locally installed artificial intelligence application that processes queries entirely on users' devices without sharing data with third parties. What you need to know: Sanctum operates as a desktop application available for macOS and Windows platforms, with Linux support planned for a future release. The application focuses on data privacy through local processing and encryption. Users can choose from thousands of GGUF models including Gemma, Llama, and Mistral. The software provides capabilities like PDF summarization and offline functionality. Being open-source, Sanctum offers transparency in its operations and data handling. Core features...

Feb 5, 2025

How a small Philadelphia consulting firm went from open-source data tool to unicorn

The startup story: dbt Labs transformed from a small Philadelphia consulting firm into a billion-dollar enterprise software company by developing an innovative data analytics tool. Founded by Tristan Handy nearly a decade ago, the company originated as Fishtown Analytics, a consultancy helping businesses maximize cloud-based data tools. The company's core product, dbt Core (data build tool), began as an internal tool to streamline data cleaning and conversion processes. After open-sourcing dbt Core in 2020, the company rebranded as dbt Labs and pivoted to a software-focused business model. Current business performance: dbt Labs has achieved significant growth milestones, demonstrating strong market...

Feb 5, 2025

App Orchid, Google Cloud partnership to bring data analytics into the natural language era

Google Cloud and App Orchid have announced an expanded partnership to integrate natural language AI querying capabilities for enterprise data analytics, powered by Google's Gemini AI models. The partnership details: App Orchid's technology will be integrated with Google Cloud's Cortex Framework to help businesses analyze data from enterprise systems like SAP and Salesforce using conversational queries. The collaboration enables business users to ask questions about their data in plain English and receive AI-generated insights. App Orchid claims their solution can reduce data preparation time by up to 85% compared to traditional methods. The Easy Answers application is now available on...

Feb 2, 2025

Generative adversarial networks explained

Generative Adversarial Networks (GANs) are machine learning models that create synthetic data by pitting two neural networks against each other in a competitive process. Core concept and evolution: GANs, introduced in 2014 by Ian Goodfellow, have transformed the landscape of artificial content generation through their ability to create increasingly realistic synthetic data. These models can generate various types of content including images, text, audio, and video. Applications range from creating artificial faces to colorizing black-and-white images. GANs play a crucial role in creating synthetic training data for AI models when real data is scarce. Technical architecture: The GAN framework consists...

Feb 2, 2025

Shutterstock-Lightricks partnership offers example of how to ethically source training data for AI video

Shutterstock's innovative "research license" model with Lightricks marks a significant shift in how AI companies can legally and ethically access training data, potentially making high-quality datasets more accessible to startups and smaller developers. The groundbreaking partnership: Shutterstock and AI creative technology company Lightricks have established a new licensing framework that allows AI companies to access training data through a graduated approach. Lightricks will train its open-source video generation model LTXV using Shutterstock's HD and 4K video library. The model enables companies to begin with a smaller research license for testing before upgrading to commercial licenses. This approach directly addresses the...

Jan 30, 2025

Forrester Wave Report: AI-driven innovation reshapes enterprise content platforms

A new Forrester Wave report evaluates 12 leading content platform vendors, identifying four Leaders, five Strong Performers, and three Contenders in the rapidly evolving enterprise content management market. Market evolution and current landscape: The enterprise content management industry has transformed significantly, now centered on AI-enabled cloud content platforms that go beyond basic document management. Content platforms have become fundamental components of digital workplaces, integrating with productivity suites and enterprise applications. Technical leaders seek flexible, extensible platforms for developing content-rich applications. The evaluation covered 12 key vendors across 24 criteria through detailed questionnaires, demonstrations, and customer feedback. AI-driven innovation highlights: Artificial...

Jan 30, 2025

Why knowledge graphs are the missing link in enterprise AI

The convergence of technologies: Knowledge graphs are emerging as a critical bridge between traditional enterprise data structures and modern AI systems, particularly in conjunction with retrieval augmented generation (RAG). Major tech companies including Microsoft, Google, Amazon, and specialized vendors like NebulaGraph and Neo4j have launched GraphRAG solutions to integrate knowledge graphs with LLMs. Knowledge graphs provide a structured way to represent relationships between data points, making it easier for AI systems to understand and utilize enterprise information. The combination of knowledge graphs with RAG systems helps AI better comprehend complex business contexts and relationships. Technical implementation and benefits: GraphRAG integration...

Jan 29, 2025

Apple faces allegations of outsourcing unethical data sourcing

Apple's AI practices face scrutiny from shareholders ahead of its February 25 Annual Shareholder Meeting, with specific concerns about data privacy and partnerships with AI companies. Key allegations: The National Legal and Policy Center (NLPC) has filed a proposal with the SEC questioning Apple's approach to AI development and data collection practices. The proposal, listed as No. 4 in Apple's 2025 proxy materials, calls for detailed reporting on AI data acquisition and ethics. NLPC criticizes Apple for allegedly outsourcing "unethical practices" to partners while maintaining a privacy-friendly public image. A particular focus is placed on Apple's $25 billion partnership with...

Jan 29, 2025

Observo’s new AI-native data pipelines reduce noisy telemetry by 70%

A new AI-powered platform from Observo AI significantly reduces enterprise telemetry data noise while improving security incident response times. The core innovation: Observo AI has developed an AI-native data pipeline platform that automatically filters and routes telemetry data (logs, metrics, and traces) to optimize enterprise security operations. The platform uses machine learning to analyze incoming data streams and identify critical signals for incident detection. Early customers report a 70% reduction in noisy telemetry data and 40% faster incident response times. The solution adapts automatically to new threats without requiring manual rule updates. Market context: Enterprise systems are generating unprecedented volumes of...

Jan 29, 2025

Essential good practices to consume and produce data for AI implementation

AI data management requires robust ecosystems that balance accessibility with governance, enabling organizations to effectively produce and consume data at scale. Current data landscape: Organizations face unprecedented challenges in data management, with global data volume doubling in five years and 68% of enterprise data remaining unused. Approximately 80-90% of data is unstructured, according to MIT research, creating significant complexity in data utilization. Modern use cases demand extremely fast data availability, with some requiring sub-10 millisecond access times. The rise of AI has intensified the need for sophisticated data management strategies. Core principles for effective data management: Three fundamental elements form...

Jan 29, 2025

Web developers deploy digital quicksand to fight back against AI crawlers

A new battlefront has emerged in the struggle over AI training data, as tech developers deploy sophisticated "tarpit" software designed to entangle and frustrate AI web crawlers that ignore traditional access controls. These digital traps, including tools like Nepenthes and Iocaine, create endless mazes of meaningless data specifically engineered to ensnare AI companies' web crawlers while wasting their computational resources. The development of these defensive measures marks an escalation in the ongoing tension between AI companies' aggressive data collection practices and website owners' attempts to maintain control over their content, though their long-term effectiveness remains to be seen. The core...

Jan 28, 2025

Is your data safe if you use DeepSeek?

Core security concerns: DeepSeek AI, a new artificial intelligence platform competing with ChatGPT, is raising significant privacy and data security concerns, particularly regarding its data collection practices and server locations in China. Privacy policy red flags: The platform's privacy policy reveals extensive data collection and storage practices that may put user information at risk. DeepSeek collects various forms of personal data, including names, birth dates, email addresses, and all user interactions with the platform. User content, including text inputs, audio, uploaded files, and chat histories, is stored on servers located in China. The company retains user data for an unspecified...

Jan 28, 2025

AI data startup Turing triples revenue to $300M

A San Francisco-based AI data company, Turing, has announced a revenue surge to $300 million in 2024, marking a significant milestone in the AI data labeling industry. Company Performance and Growth: Turing has achieved profitability while tripling its revenue, demonstrating the increasing demand for specialized AI training services. The company's valuation stood at $1.1 billion as of 2021. Major clients include industry leaders OpenAI, Google, Anthropic, and Meta. Turing maintains a network of over 4 million human experts available for AI training projects. Business Model and Services: Turing connects AI companies with specialized human trainers who help improve AI model...

Jan 27, 2025

HBR: The biggest ways AI is changing how companies work

The artificial intelligence revolution is creating fundamental shifts in how businesses operate, with new capabilities enabling continuous evolution, real-time intelligence, and multi-modal data processing. Key developments: AI is transcending previous technological limitations in three major areas that are reshaping business operations and strategy. Continuous enterprise reinvention is replacing periodic transformation, allowing companies to constantly adapt and evolve. Real-time intelligence is superseding traditional episodic software updates, enabling more dynamic decision-making. Multi-modal data synthesis capabilities are breaking free from single-input restrictions, combining text, visual, and other data types. Historical context: The impact of AI represents a technological leap comparable to the invention...

Jan 24, 2025

The top data trends shaping business strategy in 2025

New developments in data technology for 2025 show a tension between consolidation of existing tools and expansion driven by artificial intelligence capabilities. Key industry shifts: The data technology landscape is experiencing dual forces of consolidation in traditional infrastructure while AI drives unprecedented expansion of capabilities. Companies are actively simplifying their data architectures, with many enterprise customers explicitly requesting fewer, not more, tools. Major platforms like Snowflake and Databricks are becoming dominant as enterprises select their primary architecture. Business Intelligence (BI) tools are consolidating around solutions that balance central and distributed control, such as Omni. Financial pressures and efficiency: Cost considerations...

Jan 23, 2025

How to enhance data backup and recovery with AI

The integration of AI and Machine Learning technologies is transforming data backup and recovery solutions, enabling more robust protection against cyberthreats, hardware failures, and human errors. The evolution of data protection: AI and ML technologies are fundamentally changing how organizations approach data backup and recovery by enabling advanced threat detection and automated response capabilities. Real-time monitoring systems can now detect unusual activities and potential cyberthreats, including unauthorized access attempts and abnormal data transfers. Machine learning algorithms optimize backup processes by learning from historical data patterns. The Veeam 2024 Data Protection Trends Report emphasizes the crucial role of AI/ML integration in...

Jan 21, 2025

Experts explain why AI models struggle to accurately diagnose cancer

AI's latest attempts to diagnose cancer through pathology and imaging analysis demonstrate both promise and significant challenges in achieving clinical-grade accuracy. Current landscape: The Mayo Clinic and Aignostics have developed Atlas, a new AI model trained on 1.2 million tissue samples, marking one of several recent efforts to apply artificial intelligence to cancer diagnosis. Atlas achieved 97.1% accuracy in identifying cancerous colorectal tissue, matching human pathologist diagnoses. The model's performance varied significantly across different cancer types, with only 70.5% accuracy for prostate cancer biopsies. Overall, Atlas matched human expert diagnoses 84.6% of the time across nine benchmarks. Technical challenges: Processing...

Jan 15, 2025

Not every creator doesn’t want their content scraped by AI — here’s why

A growing trend of deliberate content creation aimed at influencing AI training data has sparked discussion about the most effective platforms and methods for ensuring content inclusion in future AI models. Current landscape: The practice of "writing for AI" represents a strategic effort by content creators to have their thoughts and beliefs incorporated into AI training datasets. LessWrong is widely recognized as a platform likely to be included in AI training data scraping efforts. Twitter/X's content may primarily benefit specific AI models like Grok, limiting broader influence. Questions remain about the effectiveness of personal blogs and technical configurations for ensuring...

Jan 13, 2025

Real-world video data provides virtually unlimited training material for AI models

Embodied AI's ability to collect real-world data through cameras and sensors represents a fundamental shift away from reliance on internet-sourced training data. Key metrics and scale: The volume of data collected through real-world capture far exceeds traditional internet-based sources. A single camera running continuously can generate the equivalent of FineWeb's entire 15T-token dataset (the largest open-source English training dataset) in just 15.6 years. A network of one million cameras could generate one trillion training tokens in the time it takes to read a short article. The data collection equation is straightforward: Data Scale = Number of Sensors × Time...
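
As a rough sanity check, the two figures quoted in this summary can be related with simple arithmetic. The sketch below assumes a constant per-camera token rate and a 365.25-day year. Neither assumption comes from the article, and the function names are illustrative only.

```python
# Assumption: a year of continuous capture is 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def implied_tokens_per_second(total_tokens: float, years: float) -> float:
    """Per-camera token rate implied by producing total_tokens over the given years."""
    return total_tokens / (years * SECONDS_PER_YEAR)


def seconds_to_collect(target_tokens: float, cameras: int, rate_per_camera: float) -> float:
    """Time for a fleet of identical cameras to produce target_tokens."""
    return target_tokens / (cameras * rate_per_camera)


# Figures quoted in the summary: 15T tokens from one camera in 15.6 years.
rate = implied_tokens_per_second(15e12, 15.6)

# One million such cameras producing one trillion tokens.
fleet_seconds = seconds_to_collect(1e12, 1_000_000, rate)

print(f"Implied rate per camera: {rate:,.0f} tokens/second")
print(f"Fleet time for 1T tokens: {fleet_seconds:.1f} seconds")
```

Under these assumptions, the quoted numbers imply roughly 30,000 tokens per second per camera and about half a minute for the million-camera network to reach one trillion tokens.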

Jan 9, 2025

AI revolutionizes university archives by uncovering historical insights and preserving the past

A rise in artificial intelligence adoption is changing how universities handle archival practices, with new tools enabling faster document processing while raising concerns about data accuracy and permissions. Current landscape: Universities serve as crucial repositories for historical documents, research papers, and increasingly, digital content like websites and social media posts. Traditional archival work remains largely manual, including document scanning and metadata entry. Most new university materials are created in digital formats like PDFs. Many institutions face backlogs in converting analog materials to digital formats. AI's role in paper archives: Artificial intelligence technologies are demonstrating significant capabilities in processing historical documents...

Jan 9, 2025

Meta secretly trained its AI models on a Russian ‘shadow library,’ court docs show

Meta's use of pirated books database LibGen to train its AI language models has been revealed through court-ordered document unredaction, marking a significant development in an ongoing copyright lawsuit filed by authors. The core revelation: Meta accessed and utilized Library Genesis (LibGen), a controversial pirated content database, for AI model training, despite internal concerns about the legality and optics of this approach. Internal company discussions about using LibGen data were escalated to CEO Mark Zuckerberg. Meta employees expressed hesitation about accessing LibGen data from corporate laptops. The company's AI team ultimately received approval to use the pirated materials. Legal context...

Jan 9, 2025

AI models easily absorb medical misinformation, study finds

Large language models (LLMs) can be easily compromised with medical misinformation by altering just 0.001% of their training data, according to new research from New York University. Key findings: Researchers discovered that injecting a tiny fraction of false medical information into LLM training data can significantly impact the accuracy of AI responses. Even when misinformation made up just 0.001% of training data, over 7% of the LLM's answers contained incorrect medical information. The compromised models passed standard medical performance tests, making the poisoning difficult to detect. For a large model like LLaMA 2, researchers estimated it would cost under $100...

Jan 8, 2025

Easily embed RAG into your product with Ragie’s new ‘Connect’ platform

Ragie Connect is a platform that helps developers integrate Retrieval-Augmented Generation (RAG) capabilities into their applications using customers' existing data sources. Core functionality: Ragie Connect simplifies the implementation of RAG systems by providing seamless integration with popular data sources like Google Drive, Salesforce, and Notion. The platform automates user authentication and data synchronization processes, reducing development complexity and time-to-market. RAG technology, which combines AI language models with specific data retrieval, allows applications to generate responses based on users' own information. Developers can implement AI features into their products rapidly without building complex data integration pipelines. Technical details: The platform offers...

Dec 29, 2024

How ‘federated learning’ in AI enhances privacy without sacrificing innovation

Federated learning represents a significant advancement in AI technology that enables machine learning models to learn from distributed data sources while maintaining data privacy and security. Core concept and innovation: Federated learning fundamentally changes how AI systems learn by bringing the model to the data rather than centralizing data in one location, enabling privacy-preserving machine learning at scale. Instead of collecting data in a central repository, the AI model travels to where data resides, whether on smartphones, hospital servers, or smart devices. The approach allows AI systems to learn from millions of data points while keeping sensitive information secure at...
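
The "bring the model to the data" idea can be illustrated with a toy version of federated averaging, one common federated learning scheme that this summary does not name specifically. The sketch below is a minimal illustration, not any particular system's implementation: the linear model, the two example clients, the learning rate, and all function names are assumptions made for the example. Only model weights are passed between the clients and the coordinating loop. The raw data points stay in each client's list.

```python
from typing import List, Tuple

Weights = Tuple[float, float]      # (slope, intercept) of y = w * x + b
Dataset = List[Tuple[float, float]]  # local (x, y) pairs held by one client


def local_update(weights: Weights, data: Dataset, lr: float = 0.1) -> Weights:
    """One gradient-descent step on mean squared error, using only local data."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return (w - lr * grad_w, b - lr * grad_b)


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Combine client weights, weighting each client by its number of examples."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def train(clients: List[Dataset], rounds: int = 300) -> Weights:
    """Run several rounds: send weights out, train locally, average the results."""
    global_weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Two hypothetical clients whose private data both follow y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
w, b = train(clients)
print(f"learned slope={w:.3f}, intercept={b:.3f}")
```

Weighting each client by its dataset size keeps the averaged update equivalent to a step on the pooled data in this simple one-step-per-round setting, even though the pooled data is never assembled in one place.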
