Thomas Lee & Jesse Egbert on AI and Corpus Linguistics [Updated]
Michael Ramsey
At Volokh Conspiracy, Justice Thomas Lee (Utah Supreme Court, ret.) & Jesse Egbert (Northern Arizona University - Applied Linguistics) are guest-blogging about their article Artificial Meaning? Here is the article abstract:
The textualist turn is increasingly an empirical one—an inquiry into ordinary meaning in the sense of what is commonly or typically ascribed to a given word or phrase. Such an inquiry is inherently empirical. And empirical questions call for replicable evidence produced by transparent methods—not bare human intuition or arbitrary preference for one dictionary definition over another.
Both scholars and judges have begun to make this turn. They have started to adopt the tools used in the field of corpus linguistics—a field that studies language usage by examining large databases (corpora) of naturally occurring language.
This turn is now being challenged by a proposal to use a simpler, now-familiar tool—AI-driven large language models (LLMs) like ChatGPT. The proposal began with two recent law review articles. And it caught fire—and a load of media attention—with a concurring opinion by Eleventh Circuit Judge Kevin Newsom in a case called Snell v. United Specialty Insurance Co. The Snell concurrence proposed to use ChatGPT and other LLM AIs to generate empirical evidence of relevance to the question whether the installation of in-ground trampolines falls under the ordinary meaning of "landscaping" as used in an insurance policy. It developed a case for relying on such evidence—and for rejecting the methodology of corpus linguistics—based in part on recent legal scholarship. And it offered a series of AI queries and responses as "datapoints" to be considered "alongside" dictionaries and other evidence of ordinary meaning.
The proposal is alluring. And in some ways it seems inevitable that AI tools will be part of the future of empirical analysis of ordinary meaning. But existing AI tools are not up to the task. They are engaged in a form of artificial rationalism—not empiricism. And they are in no position to produce reliable datapoints on questions like the one in Snell.
We respond to the counter-position developed in Snell and the articles it relies on. We show how AIs fall short and corpus tools deliver on core components of the empirical inquiry. We present a transparent, replicable means of developing data of relevance to the Snell issue. And we explore the elements of a future in which the strengths of AI-driven LLMs could be deployed in a corpus analysis, and the strengths of the corpus inquiry could be implemented in an inquiry involving AI tools.
And here are the initial posts:
Corpus Linguistics, LLM AIs, and the Assessment of Ordinary Meaning
Corpus Linguistics, LLM AIs, and the Future of Ordinary Meaning
LLM AIs as Tools for Empirical Textualism?: Manipulation, Inconsistency, and Related Problems
UPDATE: A further post, which is a core part of the argument: