

Corpus Linguistics Arrives At the Supreme Court
James A. Heilpern

[Editor's note:  For this guest post we welcome James Heilpern, Law & Corpus Linguistics Fellow at the J. Reuben Clark Law School, Brigham Young University.]

As Mike Rappaport explained in a previous post, “corpus linguistics is a part of linguistics which uses databases and sophisticated software to study the ‘real life’ use of language.” In 2011, Justice Thomas Lee of the Utah Supreme Court became the first judge in the country to employ this methodology in a judicial opinion. Although a number of judges around the country have followed suit, thus far all of the opinions citing corpus data have focused on statutory or trademark disputes. That changed last week when Justice Thomas issued not one, but two opinions influenced by corpus linguistics, marking a historic step forward for proponents of data-driven originalism.

First, on Thursday the Court released its opinion in Lucia v. SEC, holding that administrative law judges are “officers” subject to the Appointments Clause of the U.S. Constitution. The 7-2 majority, authored by Justice Kagan, reiterated the two-prong test articulated in past case law for determining whether a federal official is an officer or a mere employee: (1) is the position “continuing and permanent,” and (2) does the official “exercise significant authority pursuant to the laws of the United States”? If the answer to both questions is yes, then the official is an “Officer of the United States” and must be appointed by the President, the Courts, or a Department Head. The majority recognized that the second prong, in particular, was rather vague and subject to various “glosses” but declined to “refine or enhance the test.” Instead, it found Lucia virtually indistinguishable from Freytag v. Commissioner, 501 U.S. 868 (1991), where the Court held that Special Tax Judges were subject to the Appointments Clause.

In his concurring opinion, Justice Thomas (joined by Justice Gorsuch) explained that he felt the Court should have provided more guidance for future cases: “While precedents like Freytag discuss what is sufficient to make someone an officer of the United States, our precedents have never clearly defined what is necessary.” He then dove into an analysis of “the original public meaning of ‘Officers of the United States.’” Although he did not explicitly cite corpus data, he cited four times Jennifer Mascott’s recent Stanford Law Review article, which did, and endorsed its conclusion that “[t]o the Founders, this term encompassed all federal civil officials with responsibility for an ongoing statutory duty.”

This indirect reliance on corpus data in a Supreme Court opinion was considered a major victory in its own right for proponents of data-driven originalism. But the following day, Justice Thomas took it a step further and cited corpus linguistics directly in his dissent in Carpenter v. United States. Carpenter asked whether the Fourth Amendment prevented law enforcement officials from obtaining, without a warrant, historical cell-site location information (CSLI) stored by third-party wireless carriers. A 5-4 majority, authored by the Chief Justice and joined by Justices Ginsburg, Breyer, Sotomayor, and Kagan, concluded that it did because Carpenter had a “reasonable expectation of privacy” in the record of his physical movements.

Justice Thomas dissented on originalist grounds, taking issue with the Court’s line of cases protecting a person’s “reasonable expectation of privacy,” as first articulated in Justice Harlan’s concurrence in Katz. As Justice Thomas explained, “[t]he most glaring problem with this test is that it has ‘no plausible foundation in the text of the Fourth Amendment’” and thus “distorts the original meaning of ‘search,’ the word in the Fourth Amendment that it purports to define.” To prove this point, he cites definitions of the word “search” from a number of historical dictionaries and notes that “[t]he phrase ‘expectation(s) of privacy’ does not appear in the pre-Katz federal or state case reporters, the papers of prominent Founders, early congressional documents and debates, collections of early American English texts, or early American newspapers.” As evidence, he cites six different electronic databases containing documents contemporaneous (or nearly contemporaneous) with the Constitution: Founders Online; the Library of Congress’s Century of Lawmaking for a New Nation; BYU’s Corpus of Historical American English (COHA) and American Google Books corpus; BYU Law’s new Corpus of Founding Era American English (COFEA); and Readex’s database of Early American Newspapers. It is worth noting that this evidence was not cited by any of the parties or amici, meaning that Justice Thomas, like his former clerk Justice Lee of Utah, engaged in corpus analysis sua sponte![1] That said, it is hard to miss that three of the databases (Founders Online, COFEA, and Readex) were cited in Mascott’s Stanford article, which formed the basis of his Lucia concurrence.

* * *

Although last week marked the first time a Justice was willing to consider corpus data to answer a constitutional question, at least two other Justices have shown a willingness to consider (and perhaps seek out on their own) data drawn from electronic databases in the past. Clear back in 1998, Justice Breyer performed a corpus-like search of “computerized newspaper databases” in Muscarello v. United States to help identify the ordinary meaning of the phrase “carry a firearm” as used in 18 U.S.C. § 924(c)(1). Then, during the oral argument of FCC v. AT&T in 2011, Justice Ginsburg favorably referenced[2] corpus data provided to the Court in an amicus brief submitted by the Project on Government Oversight. Although Chief Justice Roberts’s eventual opinion did not explicitly cite corpus linguistics, its reasoning tracked that of the brief Justice Ginsburg cited. By my count, that makes at least five Justices[3] (Thomas, Gorsuch, Roberts, Ginsburg, and Breyer) showing some openness to such linguistic data, and at least two willing to engage in such research sua sponte. Litigators should take note and hasten to oblige.


[1] Michael Varco did cite Founders Online in his amicus brief, but did not engage in anything like corpus linguistics.

[2] See Transcript of Oral Argument in No. 09-1279 at 37.

[3] Maybe six―Justice Kennedy joined Justice Breyer’s majority opinion in Muscarello.