

Donald Drakeman: Is Corpus Linguistics Better than Flipping a Coin?
Michael Ramsey

Recently published, in the Georgetown Law Journal Online, Donald L. Drakeman (Notre Dame): Is Corpus Linguistics Better than Flipping a Coin? (109 Geo. L.J. Online 81 (2020)). Here is the abstract:

Corpus linguistics offers the promise of “Big Data” solutions to difficult issues of constitutional interpretation. By searching the millions of words in COFEA, the Corpus of Founding-Era American English, scholars have reached what they have described as rigorous, reliable, and reproducible conclusions about the original meaning of the Constitution. These conclusions rely on unexamined assumptions about the nature of the database and the reliability of the methods employed for interpreting the data. This Article is the first to analyze those assumptions, and it shows why digital searches in COFEA are unlikely to be more accurate than flipping a coin. An understanding of these methodological assumptions will enable researchers to make the necessary adjustments to increase the odds of success in the future.

And from the introduction (footnotes omitted):

“Originalism is on the cusp of its own Big Data revolution,” declares Lee Strang, noting that “[f]or the first time, both a body of data of the Constitution’s original meaning and the technology to utilize that data are becoming available.” Legal scholars started this revolution by borrowing a fascinating tool from their colleagues in language, literature, and history—large digital compendia of written texts associated with the field of corpus linguistics—with the aim of using targeted digital searches to discover the meaning of constitutional terms in the Founding era. Rather than relying on the limited information available in the few relevant dictionaries, or going through the painstaking process of finding and reading the statutes, legislative debates, newspapers, legal cases, novels, almanacs, and other materials making up the documentary record of the latter part of eighteenth-century America, scholars can perform computer searches in databases consisting of thousands of texts and millions of words. Originalism can now be “data-driven,” “scientific,” and “rigorously empirical.”


Strang is certainly right about two things: We have digitized collections of texts representing language use in the constitutional era and the technology to access them on a word-by-word basis. The remaining essential questions are whether those collections are genuinely representative and whether we have the necessary data-analysis tools to make sense of all of the resulting information in a way that clearly points towards an accurate understanding of the objective meaning of the text. As Strang observes, there are some cases where the technological approach may not eliminate the possibility of inaccuracy, and whether tools of corpus linguistics can deliver a single clear original public meaning will need to be evaluated on a clause-by-clause basis.

In practice, corpus linguistics searches for the Constitution’s original meaning have often sought to select one of two possible meanings. For example, is “religion” in the First Amendment limited to theism? Did the terms “commerce” and “emoluments” carry a broad or narrow definition? The goal has been to determine the answer objectively and empirically through a Big Data analysis of language use in the Founding era. For the sake of argument, and to highlight the key role of assumptions in applying this methodology to constitutional interpretation, I will propose an alternate approach to resolving lawsuits that has the advantage of being equally or more objective, while also being faster, cheaper, and a great deal less complicated: flipping a coin, for which the odds of an accurate answer to these kinds of binary questions is 50%. Moreover, as with other approaches to the search for original meaning, coin flipping would go a long way towards addressing one of the jurisprudential issues frequently cited by advocates of originalism—that is, the need to restrain judges from making decisions based on their own preferences. Despite its numerous advantages, coin flipping in cases of constitutional interpretation is normatively weak compared to the promise of scientifically based results. It is hard to imagine that an interpretive theory would be adopted by the Supreme Court if cases involving the interpretation of texts with contested original meanings would be decided by a coin toss or by any other method that could not make a better claim of accuracy than randomly being right half of the time.

Is corpus linguistics likely to be accurate more than half of the time? This Article will show that, in a number of important ways, corpus linguistics may not be up to the assigned task (at least yet), despite the sophisticated constitutional analyses that have appeared so far. The problems are not rooted in the impressive research done by scholars to date but in the historical and methodological assumptions they are making when they set out to use corpus linguistics databases for the purpose of constitutional interpretation.

Professor Strang's article is How Big Data Can Increase Originalism’s Methodological Rigor: Using Corpus Linguistics to Reveal Original Language Conventions, 50 U.C. Davis L. Rev. 1181 (2017), available here.