Multimodal LLMs appear to leverage conversation memory in ways that affect their performance and reliability, particularly when interpreting ambiguous visual inputs. An informal experiment reveals notable differences in how models like GPT-4o and Claude 3.7 handle contextual information across conversation threads, raising questions about model controllability and the nature of instruction following in advanced AI systems.

The experiment setup: A researcher tested GPT-4o’s and Claude 3.7’s visual recognition capabilities using foveated blur on CAPTCHA images of cars.

  • The test used 30 images with cars positioned in different regions, applying varying levels of blur that mimicked human peripheral vision (a rough sketch of this kind of blur appears after this list).
  • The initial prompt, “Do you see a car in this?”, seemed too leading, so the researcher switched to the more neutral “What do you see in this image?”
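
The writeup doesn’t include the blur code itself. Below is a minimal sketch of one way to produce a foveated blur in Python, assuming Pillow and NumPy, a single fixation point, and a Gaussian blur that grows per eccentricity band; every implementation detail here is an assumption, not the researcher’s method:

```python
# Hypothetical reconstruction of a foveated blur: the original code isn't
# published, so the fixation point, band count, and blur radii are assumptions.
from PIL import Image, ImageFilter
import numpy as np

def foveated_blur(img, fx, fy, max_radius=8.0, bands=6):
    """Blend progressively blurrier copies of img, selected by distance
    from the fixation point (fx, fy), approximating peripheral vision."""
    w, h = img.size
    ys, xs = np.mgrid[0:h, 0:w]
    # Eccentricity: 0 at the fixation point, ~1 at the farthest corner.
    dist = np.hypot(xs - fx, ys - fy) / np.hypot(max(fx, w - fx), max(fy, h - fy))
    result = np.asarray(img, dtype=np.float32).copy()
    for i in range(bands):
        lo, hi = i / bands, (i + 1) / bands
        radius = max_radius * lo  # blur radius grows with eccentricity
        if radius == 0:
            continue              # keep the innermost (foveal) band sharp
        blurred = np.asarray(img.filter(ImageFilter.GaussianBlur(radius)),
                             dtype=np.float32)
        hi_edge = 1.0 + 1e-6 if i == bands - 1 else hi  # include the corners
        mask = ((dist >= lo) & (dist < hi_edge))[..., None]
        result = np.where(mask, blurred, result)
    return Image.fromarray(result.astype(np.uint8))

# Hypothetical usage: keep the tile center sharp and blur the periphery.
tile = Image.open("captcha_tile.png").convert("RGB")
foveated_blur(tile, fx=tile.width / 2, fy=tile.height / 2).save("blurred_tile.png")
```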

Unexpected findings: GPT-4o consistently identified cars in heavily blurred images within established conversation threads, but struggled with the same images in fresh threads.

  • The model maintained high accuracy in identifying cars in ongoing conversations, even when images were blurred beyond human recognition.
  • When the same images were presented in new conversation threads, GPT-4o’s performance dropped significantly, once misidentifying a staircase as a dessert (a minimal way to reproduce this thread comparison is sketched after this list).
  • When questioned, GPT-4o initially denied using prior context as assistance, but later acknowledged that earlier conversation history had influenced its responses.
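
The exact test harness isn’t shared, but the fresh-thread vs. ongoing-thread comparison can be approximated against the API. A minimal sketch, assuming the OpenAI Python SDK; the neutral prompt comes from the writeup, while the filenames and helper names are hypothetical:

```python
# Minimal sketch of the thread comparison, assuming the OpenAI Python SDK.
import base64
from openai import OpenAI

client = OpenAI()
PROMPT = "What do you see in this image?"  # neutral prompt from the writeup

def image_part(path):
    """Package a local PNG as a base64 data-URL content part."""
    b64 = base64.b64encode(open(path, "rb").read()).decode()
    return {"type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"}}

def ask(messages):
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content

def compare(blurred_paths):
    history = []  # the accumulating "established" thread
    for path in blurred_paths:
        turn = {"role": "user",
                "content": [{"type": "text", "text": PROMPT}, image_part(path)]}
        fresh = ask([turn])        # fresh thread: image with no prior context
        history.append(turn)
        in_thread = ask(history)   # same image inside the ongoing thread
        history.append({"role": "assistant", "content": in_thread})
        print(f"{path}\n  fresh:     {fresh[:70]}\n  in-thread: {in_thread[:70]}")

compare(["blur_car_01.png", "blur_car_02.png", "blur_car_03.png"])  # hypothetical files
```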

Model differences: Claude 3.7 demonstrated more consistent behavior across different conversation threads.

  • Claude provided more cautious responses regardless of conversation history.
  • Even when primed with the word “car,” Claude’s answers showed less influence from prior context than GPT-4o’s did (a priming probe along these lines is sketched below).
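
The writeup doesn’t describe the priming setup in detail. One hedged way such a probe could look for Claude, assuming the Anthropic Python SDK; the model ID, priming wording, and filenames are all assumptions:

```python
# Sketch of the priming probe against Claude, assuming the Anthropic Python SDK.
import base64
from anthropic import Anthropic

claude = Anthropic()

def claude_image_part(path):
    data = base64.b64encode(open(path, "rb").read()).decode()
    return {"type": "image",
            "source": {"type": "base64", "media_type": "image/png", "data": data}}

def primed_ask(path, prime=None):
    """Optionally seed the thread with a turn mentioning 'car',
    then ask the neutral question about a blurred image."""
    messages = []
    if prime:
        messages += [{"role": "user", "content": prime},
                     {"role": "assistant", "content": "Understood."}]
    messages.append({"role": "user",
                     "content": [claude_image_part(path),
                                 {"type": "text",
                                  "text": "What do you see in this image?"}]})
    resp = claude.messages.create(model="claude-3-7-sonnet-20250219",
                                  max_tokens=300, messages=messages)
    return resp.content[0].text

unprimed = primed_ask("blur_car_01.png")
primed = primed_ask("blur_car_01.png",
                    prime="We're going to look at some pictures of a car.")
```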

Broader implications: The research suggests multimodal LLMs possess a form of implicit memory beyond their explicit context windows.

  • This aligns with concerns raised in a LessWrong post from two years earlier about LLMs lacking access to long-term memory beyond immediate contexts.
  • The models’ heavy reliance on this persistent memory, even when instructed otherwise, complicates controllability.
  • Instructions to “ignore previous context” appear to function as probability influencers rather than hard rules that override prior activations (a minimal probe of this is sketched below).
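
One way to probe this, extending the hypothetical GPT-4o harness sketched earlier: ask about an image inside the established thread, once normally and once with an explicit instruction to discard the history. If the instruction acted as a hard rule, the second answer should track fresh-thread behavior:

```python
# Controllability probe, reusing the hypothetical ask()/image_part() helpers
# and PROMPT from the earlier sketch. The instruction wording is an assumption.
IGNORE = ("Ignore all previous messages and images in this conversation. "
          "Treat this as a brand-new request: what do you see in this image?")

def probe_isolation(history, path):
    """Ask about one image within an established thread, with and without
    an explicit instruction to discard the conversation history."""
    plain = ask(history + [{"role": "user", "content": [
        {"type": "text", "text": PROMPT}, image_part(path)]}])
    isolated = ask(history + [{"role": "user", "content": [
        {"type": "text", "text": IGNORE}, image_part(path)]}])
    return plain, isolated
```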

Why this matters: This phenomenon raises important questions about how reliably we can control and direct multimodal AI systems in real-world applications.

  • The implicit memory effect could lead to inconsistent performance in safety-critical applications where context isolation is important.
  • Understanding these memory dynamics is crucial for developing more reliable and controllable AI systems.
