Machine Learning, Causal Inference, and Normative Judgment

Scholars and judges have long made arguments about laws and regulations and justified those arguments with theories about the effects of these legal rules. Empirical law and economics scholars try to evaluate these claims with data. A particularly challenging dimension of studying the effects of rules and regulations is that many other aspects of society change at the same time, so what we need are natural experiments.

Why is this even an important endeavor? There are jurists, such as Judge Richard Posner, who argue that understanding the empirical consequences of judicial decisions is important so that judges can make better cost-benefit utilitarian analyses. There are also jurists, such as Justice Stephen Breyer, who argue that understanding the consequences of their decisions is important so judges can make decisions that accord with the democratic will of the people.

Take, for example, a famous Supreme Court case where the judges were debating whether to allow government expropriation of private land. There, the judges debated whether eminent domain would spur economic growth or increase inequality. Judges constantly face policy questions like this, but to date, they have speculated on the potential effects of their decisions rather than relying on hard data.

The dominant legal theory in US courts, at least, is law and economics, which articulates deterrence as the primary explanation for societal response to law. But a large body of work in psychology and sociology suggests that laws can shape behaviors simply by telling individuals what the right thing to do is. From the abolition of slavery, to women’s liberation, to environmentalism, the courts are thought to play a key role in shaping values, yet little causal evidence exists to date. Whether courts shape values, and in which direction, matters both for arbitrating between competing theories about the effects of laws and as an input into better judicial decisions.

There are three empirical challenges to identifying causal effects. First, legal decisions are not random. They’re endogenous to the societal trends that they potentially affect. So how do you distinguish cause from effect? Second, there’s substantial cross-fertilization between different legal areas. Law students are sometimes taught to analogize from another legal area. Roe v. Wade, for example, was based on a part of the Constitution that used to govern government regulation of contracts. If many legal areas are changing at the same time, how do we know the causal effect of one legal area as opposed to another that may be changing alongside it? And third, there’s selection of cases into the courts. If the precedent is very strong and in favor of the plaintiff, then cases that are weaker on the merits may enter the courts. According to some selection models, plaintiff win rates would then tend toward 50/50 whatever the underlying legal standard, which challenges any inference about the effects of court decisions drawn from case outcomes.
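The third challenge can be made concrete with a small simulation. What follows is only a stylized sketch of a divergent-expectations selection model, not a model taken from this post: the merit distribution, the noise level `noise_sd`, the settlement threshold `gap`, and the function names are all assumptions chosen for illustration. Each side observes the true merit of a dispute with its own error, and the dispute goes to trial only when the plaintiff is sufficiently more optimistic than the defendant.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_win_rates(standard, noise_sd=0.1, gap=0.2, n_disputes=100_000, seed=0):
    """Return (win rate over all disputes, win rate over tried cases, number tried).

    Every parameter here is a hypothetical choice made for illustration.
    """
    rng = random.Random(seed)
    all_wins = 0
    tried = 0
    tried_wins = 0
    for _ in range(n_disputes):
        merit = rng.gauss(0.0, 1.0)        # true strength of the plaintiff's case
        plaintiff_wins = merit > standard  # outcome if the dispute were tried
        all_wins += plaintiff_wins
        # Each side sees the merit with its own error and forms a win probability.
        p_belief = normal_cdf((merit + rng.gauss(0.0, noise_sd) - standard) / noise_sd)
        d_belief = normal_cdf((merit + rng.gauss(0.0, noise_sd) - standard) / noise_sd)
        # The parties settle unless the plaintiff is sufficiently more optimistic.
        if p_belief - d_belief > gap:
            tried += 1
            tried_wins += plaintiff_wins
    tried_rate = tried_wins / tried if tried else float("nan")
    return all_wins / n_disputes, tried_rate, tried


# Compare a plaintiff-friendly standard (-1.0) with a defendant-friendly one (+1.0).
for standard in (-1.0, 1.0):
    overall, litigated, n_tried = simulate_win_rates(standard)
    print(f"standard={standard:+.1f}  all disputes={overall:.2f}  "
          f"tried cases={litigated:.2f}  (n={n_tried})")
```

Under these assumptions, the win rate across all disputes moves a great deal when the standard moves, while the win rate among tried cases should stay much closer to one half, because only disputes near the standard are uncertain enough to reach court. That is why observed win rates alone say little about what the law is doing.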

Medicine, too, once relied on theorizing about the effects of interventions, until clinical trials made it possible to randomize treatment and placebo. We can’t randomize judicial decisions, since that would undermine the notion of justice and equal treatment before the law, but judges are randomly assigned, and there’s substantial variation in how they decide: systematic habits or legal philosophies drive their decisions.

We can use the U.S. federal courts as a natural laboratory: the Supreme Court, the 12 regional circuit courts beneath it, the 94 district courts, and the judges making incremental common law decisions that are binding precedents within the region of the court. At the circuit level at least, 98% of decisions are final, and we have random assignment of judges to panels of three drawn from a pool of 8 to 40 lifetime-appointed judges. What’s important for this empirical strategy is that the judges’ characteristics predict their decisions. These can be their biographical characteristics, but also how they write or how they cite. This is where machine learning comes in, because there is a high-dimensional characterization of the judge’s thinking.
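The logic of using random assignment as a natural experiment can be sketched in a few lines. This is a deliberately simplified illustration, not the estimation used in the research described here: it has one judge per case rather than panels of three, it treats a hypothetical scalar "leaning" per judge as observed (standing in for whatever is predicted from biography, writing, or citations), and all the numbers are invented. An unobserved societal trend drives both the decision and the outcome, so the naive comparison is biased, while the randomly assigned judge leaning serves as an instrument.

```python
import random


def cov(a, b):
    """Covariance of two equal-length lists (dividing by n)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def simulate_estimates(true_effect=1.0, n_cases=50_000, n_judges=20, seed=0):
    """Return (naive estimate, instrumental-variable estimate) of true_effect.

    All names and parameter values are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    # Each judge has a fixed tendency to rule one way (assumed observable here).
    judge_leaning = [rng.uniform(-1.0, 1.0) for _ in range(n_judges)]

    leanings, decisions, outcomes = [], [], []
    for _ in range(n_cases):
        leaning = rng.choice(judge_leaning)  # random assignment of the judge
        trend = rng.gauss(0.0, 1.0)          # unobserved societal trend
        # The decision depends on the judge AND on the trend (endogeneity).
        decision = 1.0 if leaning + trend + rng.gauss(0.0, 1.0) > 0 else 0.0
        # The outcome depends on the decision and, separately, on the same trend.
        outcome = true_effect * decision + trend + rng.gauss(0.0, 1.0)
        leanings.append(leaning)
        decisions.append(decision)
        outcomes.append(outcome)

    # Naive regression slope of outcome on decision: contaminated by the trend.
    naive = cov(decisions, outcomes) / cov(decisions, decisions)
    # Wald / IV estimate: uses only the variation in decisions driven by the judge.
    iv = cov(leanings, outcomes) / cov(leanings, decisions)
    return naive, iv


naive, iv = simulate_estimates()
print(f"true effect = 1.00, naive = {naive:.2f}, IV = {iv:.2f}")
```

Because the judge is assigned at random, the leaning is unrelated to the societal trend, so the ratio of covariances isolates the effect of the decision itself. The stronger the link between judge characteristics and decisions, the more precise this estimate becomes, which is the motivation for predicting decisions from high-dimensional features.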

This is the core of a broader pipeline that starts with the district cases and district judges (also randomly assigned) and follows them on appeal into the circuit courts. Practically speaking, how can we use machine learning to help automate this pipeline, for instance to identify the nearest cases when a circuit judge is facing a difficult challenge? How can we automate the judge’s vote classification rather than relying on manually labelled cases? What if it’s not just the decision, for or against, but something about the reasoning or what the cases are citing that can influence the outcomes? For this and other projects, I’ve been collecting all 380,000 cases, a million judge votes, and 94 topics from 1891 in US circuit courts; engineered two billion N-grams of length 8 and 5 million citation edges across cases; and linked these to the 268 life-tenured judges, 250 biographical features, a 5% random sample that was hand-coded for 400 features, and 6000 cases hand-coded for meaning in 25 polarized legal areas.
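To give a feel for what N-gram features and nearest-case lookup involve, here is a minimal sketch using only the Python standard library. It is an assumption-laden toy, not the pipeline behind the corpus described above: the tokenizer, the choice of N-grams up to (rather than exactly) length 8, the cosine similarity measure, the function names, and the three one-sentence "cases" are all invented for illustration.

```python
import math
import re
from collections import Counter


def ngram_counts(text, max_n=8):
    """Count word N-grams of length 1..max_n in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two N-gram count vectors."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_case(query, corpus, max_n=8):
    """Return the id of the corpus case most similar to the query text."""
    query_counts = ngram_counts(query, max_n)
    return max(
        corpus,
        key=lambda case_id: cosine(query_counts, ngram_counts(corpus[case_id], max_n)),
    )


# Invented one-sentence "opinions", purely for illustration.
toy_corpus = {
    "takings": "The city may take private land for public use if it pays "
               "just compensation to the owner.",
    "sentencing": "The district court imposed a sentence above the guideline "
                  "range for the drug offense.",
    "contract": "The parties formed a binding contract when the offer was "
                "accepted in writing.",
}
query = ("Whether the government may take private land for economic "
         "development and pay just compensation.")
print(nearest_case(query, toy_corpus))
```

Longer N-grams reward shared phrases rather than shared vocabulary, so two opinions that both say "may take private land for" score as closer than two that merely share common words. At the scale of hundreds of thousands of opinions one would need sparse matrices and an approximate nearest-neighbor index rather than this brute-force loop.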

I also have over a million criminal sentencing decisions that, via FOIA request, were linked to the judge identities, 1300 judge biographies, and the digital corpus of their opinions since 1923. Through these kinds of data, which in aggregate comprise 4-5 terabytes, we can ask questions about normative judgments and social preferences that previously were restricted to the lab, and estimate economic models in a highly non-parametric manner.

By Daniel Chen

