Creating reliable econometric models of the CJEU case law: a response to criticisms (by Arrebola, Mauricio & Jimenez)

One of the most satisfying activities in academia is to engage in debate and discussion. Only by subjecting ideas to tough scrutiny can we advance our knowledge. Thus, I am extremely pleased that Carlos Arrebola, Julia Mauricio and Hector Jimenez have reacted so quickly to my criticism of their recent paper (here) and come back with a thoughtful and forceful rebuttal. I am posting it below. You will see that there are important points of disagreement that will probably require two (or more) follow-up studies in the future. It seems I need to brush up on my econometrics...

Creating reliable econometric models of the CJEU case law:
a response to Sanchez-Graells’ criticisms

by Carlos Arrebola, Julia Mauricio and Hector Jimenez

In a recent study, we used econometric methodology to quantify the degree of influence of the Advocate General on the Court of Justice. Based on data collected from 20 years of actions for annulment, we concluded that the Court is 67% more likely to annul an act if the Advocate General suggests so in her opinion. In a post last Tuesday, Sanchez-Graells examined our paper. As he said, our conclusion is ‘bold [...] and controversial [for its] implications’, and as such it should be subject to ‘tough scrutiny’. We most definitely agree on both the importance of our claim and the need to test it rigorously. As we stated in our paper, if the conclusions are true, the role of the Advocate General within the Court might need to be reconsidered in order to secure judicial independence.

However, Sanchez-Graells voiced several criticisms regarding our econometric model that prevent him from accepting the validity of our results. We greatly welcome the debate, and appreciate the comments in his post, although we ultimately disagree. While we acknowledge that quantitative methodology is not perfect, we argue that our results are a reliable estimate of the influence of the Advocate General (hereinafter, “AG”) on the Court. Even if the specific figure of a 67% increase in the probability of a judicial outcome is open to question, our results at least indicate that the relationship of influence is positive, as shown by the six different econometric models estimated in our study. In the spirit of discussion and debate of this blog, we address Sanchez-Graells’ criticisms along with several other factors that, in our opinion, should have been taken into account when assessing our paper’s reliability.

1. The impossibility of using Randomised Controlled Trials

In his post, Sanchez-Graells suggests that we were too quick to discard the possibility of testing the hypothesis of the influence of the AG on the Court using Randomised Controlled Trials (“RCTs”). For the layperson, RCTs are the type of scientific methodology used in many areas of science to study causality. One of the main fields in which RCTs are used is medicine. In order to test the efficacy of a new drug, patients with similar features are randomly assigned to several groups. Normally, one of those groups is the control group, which receives a placebo instead of the actual drug. In this way, the researchers can infer whether the health outcome is caused by the drug. If both the group taking the placebo and the group taking the drug had the same reaction, it would be clear that some external factor other than the drug had caused it. If, on the other hand, the group taking the drug and the placebo group reacted differently (for example, in the case of an illness, if the group taking the drug was the only one to recover), it could be said with some confidence that the drug caused the recovery.

In our paper, we suggested that RCTs are not a possibility because they would require using the Court of Justice as a laboratory, experimenting with cases, judges and AGs. Nevertheless, Sanchez-Graells argued that we should have considered those cases in which the AG does not participate as our “control group”. This is a misconception about how RCTs are designed. A vital feature in the design of RCTs is making sure that the observations are randomly assigned to the treatment and control groups. This is because, ideally, you would like the groups to be identical in every other respect, so that the only factor that differs between them is the treatment that you are examining in the experiment. In the case of medicine-related RCTs, you want patients with the same characteristics, symptoms, etc., so that whatever happens after taking the drug can only be traced back to the drug. In our study, we would need the same case to be repeated several times, with the same legal problem to be solved by the same judges, having access to the same body of precedent, lawyers with the same ability to plead cases, etc. Only then could we observe what would happen if we took the Advocate General out of the equation. However, cases are never the same. Unlike illnesses, where patients tend to have the same symptoms, cases are much more complex. Legal problems rarely have the same surrounding circumstances.

So, if we followed Sanchez-Graells’ suggestion, we would be ignoring a set of external factors that actually affect the outcome of a case. We would be wrongly attributing that outcome to the Advocate General’s intervention, when it could actually be due to something else. That is, if we had two cases, one with an AG’s opinion and one without, in which the Court reached different results, we could not say that the Advocate General caused that difference. It could be that the cases had different facts, and that is why the Court decided differently. Or it could well be that the judges were presented with different arguments by the parties, and it was the lawyers, and not the AGs, who persuaded the Court. Furthermore, Sanchez-Graells’ suggestion is unfeasible because there is a clear selection bias. As he explained, when the CJEU considers that a case is not going to raise problematic legal issues, it decides not to have an AG opinion. This means that, from the very beginning of the case, the Court senses that it might have an easy or clear legal solution. In other words, Sanchez-Graells is suggesting that we compare a simple cold with a more complicated condition, such as cancer, and that we can thus establish whether radiotherapy has any impact on health. Such a comparison would yield a misleading result, because the colds would have a rate of recovery close to 100%, whereas that of the cancer cases would be lower. However, that would not tell us anything about the effectiveness of radiotherapy. In the same way, if a case deals with unproblematic legal issues, the opinion of the AG will probably not do much to affect the Court, because the Court would have come to that conclusion by itself without any external influence. We cannot simply compare those two scenarios without losing information. After all, there would not be any “random” selection of groups, so the requirements for conducting an RCT would clearly not be fulfilled.
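The selection problem described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the latent "difficulty" score, the rule by which easy cases go without an opinion, and the outcome probabilities are all hypothetical and are not taken from the paper's data. In this simulated world the AG has no causal effect at all, yet a naive comparison of the two groups still shows a gap.

```python
import random


def share(flags):
    """Proportion of True values in a list of booleans."""
    return sum(flags) / len(flags)


def simulate(n_cases=20000, seed=0):
    """Compare annulment rates in cases with and without an AG opinion.

    All rules below are hypothetical assumptions made for illustration.
    """
    rng = random.Random(seed)
    with_ag, without_ag = [], []
    for _ in range(n_cases):
        # Assumed latent difficulty of the case, uniform on [0, 1).
        difficulty = rng.random()
        # Assumed selection rule: easy cases are decided without an opinion.
        has_ag = difficulty > 0.3
        # Assumed outcome rule: the chance of annulment depends only on
        # difficulty, so the AG has NO causal effect in this simulation.
        annulled = rng.random() < 0.1 + 0.4 * difficulty
        (with_ag if has_ag else without_ag).append(annulled)
    return share(with_ag), share(without_ag)


rate_with, rate_without = simulate()
print(f"annulment rate with an AG opinion:    {rate_with:.2f}")
print(f"annulment rate without an AG opinion: {rate_without:.2f}")
```

The gap between the two rates arises purely from how cases are sorted into the two groups, which is why the cases without an opinion cannot serve as a control group.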

For that reason, the only way to approximately estimate causality is to use regressions, in which you can account for as many variables as possible that may influence the Court, including the Advocate General, and including variables that account for how easy or clear a case is to solve. That way we can estimate the magnitude of the effect of the AG variable on the Court’s decision.
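A regression of the kind described above can be written in generic form as follows. This is an illustrative sketch only: the symbols, the link function F and the contents of the control vector X_i are assumptions made for exposition, not a restatement of the exact specification estimated in the paper.

```latex
\[
  \Pr(\mathit{Annul}_i = 1 \mid AG_i, X_i)
    = F\!\left(\beta_0 + \beta_1\, AG_i + X_i^{\top}\gamma\right)
\]
\[
  \Delta_i
    = F\!\left(\beta_0 + \beta_1 + X_i^{\top}\gamma\right)
    - F\!\left(\beta_0 + X_i^{\top}\gamma\right)
\]
```

Here Annul_i equals 1 if the Court annuls the act in case i, AG_i equals 1 if the AG recommends annulment, X_i collects the control variables (for instance, proxies for how clear the case is), and F is a cumulative distribution function such as the standard normal (probit) or the logistic (logit). The quantity Δ_i is the change in the predicted probability of annulment associated with the AG's recommendation, holding the controls fixed.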

2. Designing a reliable regression

Once we establish that the most accurate measure is a regression model accounting for variables that affect the outcome of the Court, the difficulty arises in deciding which variables to include and how to code them. It is in this respect that we think Sanchez-Graells raises his most valid criticism of our study. We acknowledge that our variables are not perfect. We will never be able to establish causality beyond a shadow of a doubt. This is simply because, as we said, there will always be variables affecting a case that we are not able to track, code and insert in our database. To take an extreme and absurd example, we will never be able to verify whether a judge in the deliberation room had a headache and wanted to go home soon, rushing her decision. However, the fact that we will always miss variables does not mean that our model cannot be reliable. We still include a number of important variables that can explain a substantial amount of what goes on in the courtroom. There are different ways in econometrics to determine the extent to which a model, albeit missing variables, is an accurate depiction of reality. For our study, these measures suggest that the model is indeed reliable. We will come back to this in a moment.

Another aspect of coding variables is, as Sanchez-Graells comments, oversimplification. In our study, we used actions for annulment, where the outcome of a case can be (i) annulment, (ii) partial annulment, (iii) dismissal of the case, or (iv) inadmissibility of the case. We decided to simplify this variable by looking only at whether the Court decided to annul (in any of its forms) or not. However, this simplification is necessary to make the model more reliable, because in order to have a dataset capable of yielding significant results, we need to have a representative sample. In our case, we only had data for a very small number of partial annulments. Including them as a separate category from total annulment would only have created “noise” in our model, making the results less significant, statistically speaking.
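The coding decision described above amounts to collapsing four outcomes into one binary variable. A minimal sketch follows; the function name and the outcome labels are hypothetical and do not reproduce the actual coding scheme of the dataset.

```python
# Hypothetical labels for the four possible outcomes of an action for annulment.
OUTCOMES = ("annulment", "partial annulment", "dismissal", "inadmissibility")


def annulled(outcome: str) -> int:
    """Return 1 for any form of annulment, 0 for dismissal or inadmissibility."""
    key = outcome.strip().lower()
    if key not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    return 1 if key in ("annulment", "partial annulment") else 0


sample = ["Annulment", "Dismissal", "Partial annulment", "Inadmissibility"]
coded = [annulled(o) for o in sample]
print(coded)  # [1, 0, 1, 0]
```

Grouping the two forms of annulment means the rare partial annulments add to the count of positive outcomes instead of forming a category with very few observations.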

Sanchez-Graells especially criticises our grouping of dismissal and inadmissibility cases together, because he says that dismissing a case and declaring it inadmissible are very different things. However, that discussion in his post is unnecessary, because as he himself notes later on, our results ‘cannot be interpreted regarding inverse AG recommendations (ie recommendations to inadmit/dismiss)’. Our results are only relevant for decisions to annul or partially annul; we do not make any claim about other types of cases, a point that Sanchez-Graells also criticises.

However, the fact that we decided to look at the question in terms of what happens if the AG suggests annulling the act, rather than dismissing it or declaring it inadmissible, does not affect the reliability of our results. In fact, the only thing that Sanchez-Graells is postulating is a new hypothesis. He is saying that, in his opinion, we would have obtained other results if we had constructed the model differently. That is a point that we cannot confirm or falsify without working for a few more weeks with our data in the econometrics software. But we invite others to carry out further studies, as we ourselves may do in the future, with the same or different data, to check whether the results change if we look at things in a different way; for example, by looking at what happens if the AG suggests dismissal, or at what happens if we gather data from other periods of time. Nonetheless, the reliability of the results that we presented is a separate issue.

So, if we have acknowledged that we are not going to be able to include every variable, and that our data is only a sample, why are we confident in our results? In the paper we explain it more technically, but, basically, there are econometric measures that indicate that the model we have created is accurate when the estimates obtained from the model are compared with the actual data. That is why we know it is a fairly reliable model.
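One common measure of this kind for binary-outcome models is the share of cases whose outcome the model predicts correctly, compared with a naive baseline. The sketch below is an assumption about what such a check could look like: it is not necessarily the measure reported in the paper, and the fitted probabilities and outcomes are made-up illustrative numbers, not results.

```python
def percent_correctly_predicted(probs, outcomes, threshold=0.5):
    """Share of cases where the thresholded prediction matches the outcome."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(probs)


def naive_baseline(outcomes):
    """Accuracy obtained by always predicting the most frequent outcome."""
    share = sum(outcomes) / len(outcomes)
    return max(share, 1 - share)


# Made-up fitted probabilities and observed outcomes, for illustration only.
fitted = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6]
actual = [1, 0, 1, 1, 0, 0]

print(f"correctly predicted: {percent_correctly_predicted(fitted, actual):.2f}")
print(f"naive baseline:      {naive_baseline(actual):.2f}")
```

A model is informative on this measure only to the extent that it beats the baseline of always predicting the most frequent outcome.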

3. Final caveat

Whilst reading Sanchez-Graells’ words, we could not help feeling something we have felt many times before. Lawyers are more comfortable sticking to arguing with words. We feel somehow threatened by this terra incognita called econometrics. There seems to be a certain reluctance to use mathematics to help us in our enquiries. To be clear, we are not accusing Sanchez-Graells of not wanting to engage with quantitative methodology. In fact, we know that he has used some statistics previously, and we would not expect a “more economic approach” type of person to disregard this evidence-based methodology.

We want to end this post with a final note about quantitative methodology. Although judicial proceedings and legal arguments cannot always be equated to numbers, and other methodologies are extremely valuable for legal research questions, quantitative analysis can help elucidate complex legal questions. As has happened in many other social sciences before ours, statistics can become a tool at the service of legal researchers. In this sense, it is worth reminding readers that, a few centuries ago, economics was equally a merely discursive subject, as anyone who has read the Wealth of Nations can attest. But now economics and mathematics cannot be separated. Therefore, we would encourage researchers to embrace statistics and econometrics, and see how they can help with their enquiries. Quantitative analysis tries to be evidence-based and objective. Therefore, anyone who believes in the benefits of science will prefer a claim based on quantitative methodology to a hypothesis made, to follow the words of Sanchez-Graells, on the basis of ‘anecdotal impression’.