Ethan Herenstein: Difficulties in Operationalizing Ordinary Meaning Through Corpus Linguistics
Michael Ramsey

Ethan Herenstein (Stanford Law School, student) has posted The Faulty Frequency Hypothesis: Difficulties in Operationalizing Ordinary Meaning Through Corpus Linguistics (70 Stanford Law Review Online 112 (Dec. 2017)) on SSRN.  Here is the abstract:

Promising to inject empirical rigor into the enterprise of statutory interpretation, corpus linguistics has, over the past couple of years, taken the legal academy by storm. However, little attention has been paid to an important premise implicit in the operationalization of ordinary meaning through corpus linguistics: Where an ambiguous term retains two plausible meanings, the ordinary meaning of the term is the more frequently used meaning of the term. Call this the Frequency Hypothesis.

This Essay identifies and explores an important reason to doubt the Frequency Hypothesis: A word might be used more frequently in one sense than another for reasons that have little to do with the ordinary meaning of that word. Specifically, a word’s frequency will not necessarily reflect the sense of a word or phrase that is most likely implicated in a given linguistic context, but will instead, at least partly, reflect the prevalence or newsworthiness of the underlying phenomenon that the term denotes. Accordingly, the Essay advocates for a more thorough reckoning of the Frequency Hypothesis before judges and scholars embrace corpus linguistics as a tool for statutory interpretation.
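The worry the Essay raises can be made concrete with a toy computation. The sketch below is purely illustrative: the word, the sense labels, and the counts are invented, not drawn from any real corpus. It shows the mechanical step the Frequency Hypothesis licenses — picking whichever sense occurs most often — and why that step can be skewed when one sense's underlying phenomenon is simply more written-about.

```python
from collections import Counter

# Hypothetical hand-labeled uses of an ambiguous word (say, "vehicle").
# The counts are invented for illustration: the "automobile" sense may
# dominate a corpus because cars are newsworthy, not because it is the
# ordinary meaning in every statutory context.
labeled_uses = (
    ["automobile"] * 70
    + ["bicycle"] * 20
    + ["figurative"] * 10   # e.g., "a vehicle for reform"
)

sense_counts = Counter(labeled_uses)
total = sum(sense_counts.values())

# The Frequency Hypothesis selects the top-ranked sense as "ordinary meaning".
top_sense, count = sense_counts.most_common(1)[0]
print(top_sense, count / total)  # the majority sense and its share
```

On this invented data the hypothesis would declare "automobile" the ordinary meaning with a 70% share — a conclusion that reflects the corpus's subject matter as much as the word's meaning, which is precisely Herenstein's point.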

The essay is also available at the Stanford Law Review website, here.

RELATED:  First posted last year, but recently revised, Justice Thomas R. Lee (Utah Supreme Court) & James Cleith Phillips (The Becket Fund for Religious Liberty; University of California, Berkeley - Department of Jurisprudence & Social Policy), Data-Driven Originalism.  This paper will be presented at the upcoming originalism works-in-progress conference in San Diego.  Here is the abstract:

The threshold question for all originalist methodologies concerns the original communicative content of the words of the Constitution. For too long this inquiry has been pursued through tools that are ill-suited to the task. Dictionaries generally just define individual words; they don’t typically define phrases or allow for the consideration of broader linguistic context. And while dictionaries can provide a list of possible senses, they can’t tell us which sense is the most ordinary (or common). Founding-era dictionaries, moreover, were generally the work of one individual, tended to plagiarize each other, and relied on famous, often dated examples of English usage (from Shakespeare or the King James Bible). 

Originalists have also turned to examples of usage in founding-era documents. This approach can address some of the shortcomings of dictionaries; a careful inquiry into sample sentences from founding-era literature can consider a wide range of semantic context. Yet even here the standard inquiry falls short. First, originalists tend to turn only to certain sources, such as the Federalist Papers or the records of the state constitutional conventions, and those sources may not fully reflect how ordinary users of English of the day would have understood the Constitution (or at least have used language). Second, the number of founding-era documents relied on is often rather small, especially for generalizing about an entire country (or profession, in the case of lawyers). This opens originalists up to criticisms of cherry-picking, and even if that is not the case, sample sizes are just too small to confidently answer originalist questions.

But all is not lost. Big data, and the tools of linguists, have the potential to bring greater rigor and transparency to the practice of originalism. This article will explore the application of corpus linguistic methodology to aid originalism’s inquiry into the original communicative content of the Constitution. We propose to improve this inquiry by use of a newly released corpus (or database) of founding-era texts: the beta version of the Corpus of Founding-Era American English. The initial beta version will contain approximately 150 million words, derived from the Evans Early American Imprint Series (books, pamphlets and broadsides by all types of Americans on all types of subjects), the National Archives Founders Online Project (the papers of Washington, Franklin, Adams, Jefferson, Madison, and Hamilton, including correspondence to them), and Hein Online’s Legal Database (cases, statutes, legislative debates, etc.).

The paper will showcase how typical tools of a corpus — concordance lines, collocation, clusters (or n-grams), and frequency data — can aid in the search for original communicative content. We will also show how corpus data can help determine whether a word or phrase in question is best thought of as an ordinary one or a legal term of art. To showcase corpus linguistic methodology, the paper will analyze important clauses in the Constitution that have generated litigation and controversy over the years — commerce, public use, and natural born. We propose best practices, and also discuss the limitations of corpus linguistic methodology for originalism.
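For readers unfamiliar with the corpus tools the abstract lists, here is a minimal sketch of two of them — KWIC-style concordance lines and n-gram/frequency counts — over a toy token sequence. The "corpus" is a few invented phrases echoing constitutional language, not actual COFEA data, and the implementation is a bare-bones stand-in for what real corpus software does at scale.

```python
from collections import Counter

# Invented toy corpus (NOT founding-era data) tokenized into words.
corpus = (
    "congress shall regulate commerce with foreign nations "
    "and commerce among the several states "
    "no person except a natural born citizen shall be eligible"
).split()

def concordance(tokens, keyword, window=3):
    """Return KWIC (key word in context) lines: each occurrence of
    `keyword` with up to `window` words of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{keyword}] {right}")
    return lines

# Bigrams (2-grams) and raw frequency data over the same tokens.
bigrams = Counter(zip(corpus, corpus[1:]))
freq = Counter(corpus)

for line in concordance(corpus, "commerce"):
    print(line)
print("'natural born' bigram count:", bigrams[("natural", "born")])
print("'commerce' frequency:", freq["commerce"])
```

Real corpus platforms add lemmatization, part-of-speech tagging, and collocation statistics on top of these primitives, but the basic outputs — context lines, recurring word clusters, and counts — are the ones the paper proposes to bring to bear on clauses like "commerce" and "natural born".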

Larry Solum has predicted that “corpus linguistics will revolutionize statutory and constitutional interpretation.” Our paper seeks to chart out the first steps of that revolution so that others may follow.