The Limits of Corpus Linguistics for Legal Purposes: A Response to David Weisberg
Karen Sullivan

[Editor's Note: For this guest post, we welcome Karen Sullivan, Senior Lecturer in Linguistics, University of Queensland, Australia.]

I appreciate David Weisberg’s post “Corpus Linguistics and Heller” on The Originalism Blog, which responds to my Duke Law blog post on the Second Amendment. Weisberg’s post opens up an opportunity to clarify what corpus linguistics can and cannot do for legal interpretation. Weisberg seems misguided on a few minor points that may be relevant to the Second Amendment, and I note these below. More generally, however, I agree with Weisberg that even rare senses of a word or construction cannot at present be ruled out in the legal interpretation of a given instance.

First, I would like to observe that Weisberg’s understanding of external causals apparently relies on the lone instance that I provide in my blog post. This seems to have led to the mistaken impression that external causals must involve “the physical world”. Present-Day English examples of external causation include “Stay away from the cliff because it’s dangerous” and “The virus is contagious so hygiene is important”. External causation is entirely compatible with “abstract concepts”. As a starting point, I would recommend Jean-Christophe Verstraete’s 1998 paper “A semiotic model for the description of levels in conjunction: external, internal-modal and internal-speech functional” (Functions of Language 5(2): 179-211), which provides tests for identifying different types of causation.

Second, I’m not surprised that my modern translations of the Second Amendment’s syntax sound unnatural to “a native speaker of American English” such as Weisberg. The premise of my post is that the Second Amendment cannot be read as if it were Present-Day English, so a literal translation of the Amendment is unlikely to roll off the tongue. This is particularly true because I left unchanged vocabulary such as “militia” and “bear arms”, which was not the topic of the study. Regardless, I suspect that Weisberg was in fact able to process my translations. His criticism of the external reading noted above suggests that he understood that on an external reading, the right guaranteed by the Amendment would apply only when arms were kept and borne for the purpose of a well-organized militia, and not when arms were kept and borne for other purposes.

Returning to Weisberg’s main point, however, I agree that there is no guarantee that an external causal meaning of the Amendment was intended. Linguistics offers many clues as to a speaker’s potential meaning, but these may not be sufficient to narrow the interpretations down to one. However, linguistic information can (1) establish which interpretations are more likely, and (2) determine which interpretations are possible. I argue that the first of these functions will be helpful to legal interpretation only once legal scholars have agreed on a standard approach to probability. The second function, I argue, should be of immediate value.

Whereas the role of probability is well established in linguistics, my impression is that the legal profession lacks a shared understanding of probability. In linguistics and other sciences, for example, measures of statistical significance determine which hypotheses are considered probable enough to be interesting or publishable. In legal studies, there seems to be no predetermined level of probability that will cause an interpretation to be either accepted or rejected. Perhaps it is unrealistic to imagine that the legal community could reach a binding agreement that, say, any interpretation with less than 1% probability would be discarded. It would of course also be necessary to agree on what kind of corpus would be used and how probability would be determined. However, an agreement of this type would make corpus data phenomenally useful. Corpus results would settle debates rather than inflaming them.

Even without a general agreement of this kind, linguistics still offers the second function noted above, in that it can determine which interpretations are possible. If a particular sense of a word or construction has 0% probability based on the largest available corpus – the entire record of English – then I suggest it should be rejected as an interpretation. If legal scholars can accept this modest proposal, then corpus linguistics can immediately serve to eliminate many readings of words and phrases. For example, the Court’s opinion in Heller describes the Second Amendment clause headed by being as having a “clarifying function” for the main clause, a description that best fits an “addition/accompanying circumstance” clause as defined in Bernd Kortmann, Free adjuncts and absolutes in English: Problems of control and interpretation. (London: Routledge, 1991). However, being-clauses that precede a main clause, in all documented forms of English, ceased to permit this function around 1600. My 2018 search of three major English corpora, including over 3,000 instances of being, revealed not a single example of an initial being-clause with this function for almost two hundred years before the Second Amendment was written. (Karen Sullivan, “Being-clauses in historical corpora and the U.S. Second Amendment,” English Studies 99(3): 1-19 (2018)). Of course, there are English documents not covered in these corpora. If an instance of the function were found, then the probability would no longer be 0% and the interpretation would again be on the table. I invite supporters of the “addition/accompanying” interpretation to look through records from the appropriate time period. But if none of us can find even one example of this type, can the Heller interpretation be defended? 

If legal scholars cannot accept a complete lack of attestation in recorded English as evidence of non-occurrence, then there is no level of probability that they can agree on as convincing, and corpus results will not resolve any questions of interpretation.

Personally, I fail to understand how any claims to originalism can exist without historical corpus linguistics. You have to study a language variety before you can understand it. Nonetheless, a haphazard application of corpus linguistics is surely worse than none at all. I hope that legal scholars are prepared to do the hard yards of establishing a shared framework for the construal of probabilities. If not, then I agree with Weisberg that interpretations cannot be rejected on the basis of rarity.