Telltale Signs: AI’s Growing Influence on Scientific Writing Revealed by Word Frequency Analysis

A surge in certain words and phrases in scientific papers suggests that large language models (LLMs) have been used increasingly in academic writing since they became widely available in late 2022.

Detecting LLM-generated text through “excess words”: Researchers analyzed millions of scientific abstracts, comparing the frequency of words before and after the introduction of LLMs, and found telltale signs of AI-generated content.

  • Words like “delves,” “showcasing,” and “underscores” appeared up to 25 times more frequently in 2024 abstracts compared to pre-LLM trends.
  • In the post-LLM era, the words that rose sharply in frequency were mostly “style words”: verbs, adjectives, and adverbs (e.g., “across,” “additionally,” “comprehensive,” “crucial”). In the pre-LLM period, by contrast, sudden changes in word frequency mostly involved nouns and were linked to major world events.
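
The before-and-after comparison can be illustrated with a minimal Python sketch. It is not the researchers' code: the function names `doc_frequency` and `excess_ratio` and the toy abstracts are hypothetical, and it uses a flat pre-LLM baseline, whereas the study compared 2024 frequencies against pre-LLM trends.

```python
import re

def doc_frequency(abstracts, word):
    """Fraction of abstracts containing `word` at least once (case-insensitive)."""
    if not abstracts:
        return 0.0
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = sum(1 for text in abstracts if pattern.search(text))
    return hits / len(abstracts)

def excess_ratio(before, after, word):
    """Post-LLM frequency divided by pre-LLM frequency.

    Returns None when the word never appears in the baseline,
    because the ratio is undefined in that case.
    """
    baseline = doc_frequency(before, word)
    if baseline == 0:
        return None
    return doc_frequency(after, word) / baseline

# Toy data, invented for illustration only.
pre_llm = [
    "We study protein folding in yeast.",
    "This paper delves into soil chemistry.",
    "We measure heat transfer in alloys.",
    "A survey of graph algorithms.",
]
post_llm = [
    "This study delves into protein folding.",
    "Our work delves into soil chemistry.",
    "We measure heat transfer in alloys.",
    "This review delves into graph algorithms.",
]

print(excess_ratio(pre_llm, post_llm, "delves"))  # prints 3.0
```

A ratio well above 1 for a word with no obvious real-world cause is the kind of signal the article calls an “excess word.”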

Estimating the prevalence of LLM-assisted writing: By identifying hundreds of “marker words” that became more common post-LLM, researchers estimate that at least 10% of scientific abstracts in 2024 were written with some level of LLM assistance.

  • The actual percentage could be higher, as the analysis may have missed LLM-assisted abstracts that don’t contain the identified marker words.
  • LLM usage varies by country: papers from China, South Korea, and Taiwan showed marker words 15% of the time, possibly because non-native English speakers use LLMs for editing assistance.
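
To see why a marker-word count gives a floor rather than an exact figure, here is a rough Python sketch under simplifying assumptions. The helper names `share_with_any` and `lower_bound_llm_share`, the three-word marker list, and the toy abstracts are all hypothetical; the study itself used hundreds of marker words and a trend-based baseline.

```python
import re

def share_with_any(abstracts, markers):
    """Fraction of abstracts containing at least one marker word."""
    if not abstracts or not markers:
        return 0.0
    alternatives = "|".join(re.escape(m) for m in markers)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    hits = sum(1 for text in abstracts if pattern.search(text))
    return hits / len(abstracts)

def lower_bound_llm_share(before, after, markers):
    """Excess share of abstracts with marker words, floored at zero.

    Abstracts written with LLM help but without any marker word are
    not counted, so the true share can only be equal or higher.
    """
    gap = share_with_any(after, markers) - share_with_any(before, markers)
    return max(0.0, gap)

# Toy data, invented for illustration only.
markers = ["delves", "showcasing", "underscores"]
pre_llm = [
    "We study protein folding in yeast.",
    "This result underscores the role of pH.",
    "We measure heat transfer in alloys.",
    "A survey of graph algorithms.",
]
post_llm = [
    "This study delves into protein folding.",
    "Our results, showcasing new data, are robust.",
    "We measure heat transfer in alloys.",
    "This underscores the need for more trials.",
]

print(lower_bound_llm_share(pre_llm, post_llm, markers))
```

Subtracting the pre-LLM share accounts for authors who used these words before LLMs existed; what remains is the excess attributable to the change in writing habits.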

Implications and future challenges: Detecting LLM-generated text is crucial because these models can make false claims or provide inaccurate information while sounding authoritative. However, as awareness of telltale signs grows, human editors may become better at removing marker words, making detection more difficult.

Looking ahead: The study highlights the need for ongoing research into detecting AI-generated content, as LLMs may evolve to mask their outputs better. The increasing use of LLMs in scientific writing also raises questions about the potential impact on the integrity and credibility of academic publications, emphasizing the importance of developing robust detection methods and guidelines for responsible use of AI in research.

The telltale words that could identify generative AI text
