A surge in certain words and phrases in scientific papers suggests that the use of large language models (LLMs) in academic writing has grown since their widespread introduction in late 2022:
Detecting LLM-generated text through “excess words”: Researchers analyzed millions of scientific abstracts, comparing how often individual words appeared after the introduction of LLMs against the frequency expected from pre-LLM trends, and found telltale signs of AI-generated content in words whose usage spiked far beyond expectation.
Estimating the prevalence of LLM-assisted writing: By identifying hundreds of “marker words” that became disproportionately more common post-LLM, researchers estimate that at least 10% of scientific abstracts in 2024 were written with some level of LLM assistance (a minimal sketch of this kind of estimate appears after this summary).
Implications and future challenges: Detecting LLM-generated text is crucial because these models can fabricate claims or present inaccurate information while sounding authoritative. However, as awareness of the telltale signs grows, authors and human editors may get better at removing marker words, making this kind of detection more difficult.
Looking ahead: The study highlights the need for ongoing research into detecting AI-generated content, since future models may produce text that is harder to distinguish from human writing. The growing use of LLMs in scientific writing also raises questions about the integrity and credibility of academic publications, underscoring the importance of robust detection methods and clear guidelines for the responsible use of AI in research.
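To make the excess-word idea concrete, here is a minimal Python sketch of the kind of comparison described above. The yearly counts, the simple pre-LLM average used as a baseline, and the cutoff are invented for illustration; the actual study derives its expected frequencies and marker-word criteria from millions of real abstracts.

```python
from collections import Counter

# Hypothetical per-year data: how many abstracts were published each year, and
# in how many of those abstracts each word appeared at least once.
# A real analysis would build these counts from millions of abstracts.
abstracts_per_year = {2021: 1000, 2022: 1000, 2023: 1000, 2024: 1000}
word_doc_counts = {
    2021: Counter({"delves": 2, "patients": 400}),
    2022: Counter({"delves": 3, "patients": 410}),
    2023: Counter({"delves": 25, "patients": 405}),
    2024: Counter({"delves": 60, "patients": 402}),
}

PRE_LLM_YEARS = [2021, 2022]  # baseline period, before LLMs were widespread
TARGET_YEAR = 2024

def frequency(word, year):
    """Fraction of that year's abstracts containing the word."""
    return word_doc_counts[year][word] / abstracts_per_year[year]

def excess_frequency(word):
    """Observed frequency in the target year minus the frequency expected
    from the pre-LLM baseline (here, simply the baseline average)."""
    expected = sum(frequency(word, y) for y in PRE_LLM_YEARS) / len(PRE_LLM_YEARS)
    return frequency(word, TARGET_YEAR) - expected

# Words whose usage rose far beyond the baseline become candidate "marker words".
EXCESS_THRESHOLD = 0.01  # illustrative cutoff: +1 percentage point of abstracts
marker_words = [w for w in word_doc_counts[TARGET_YEAR]
                if excess_frequency(w) > EXCESS_THRESHOLD]
print("candidate marker words:", marker_words)

# A crude lower bound on the share of LLM-assisted abstracts: if a marker word
# appears in X percentage points more abstracts than the baseline predicts,
# at least that share of abstracts plausibly changed. Combining many marker
# words can tighten this bound.
lower_bound = max(excess_frequency(w) for w in marker_words) if marker_words else 0.0
print(f"estimated lower bound on LLM-assisted abstracts: {lower_bound:.1%}")
```

Run on the toy counts above, the sketch flags "delves" as a marker word and reports a lower bound of roughly 6%; the published estimate of at least 10% rests on the same logic applied to a far larger vocabulary and corpus.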