Legal text analytics: some thoughts on where (I think) things stand

Researching the area of artificial intelligence and the law (AI & Law) has recently taken me into the complexities of natural language processing (NLP) applied to legal texts (aka legal text analytics). Trying to understand the extent to which AI can be used to perform automated legal analysis (or, more modestly, to support humans in performing legal analysis) requires (at least) a view of the current possibilities for AI tools to (i) extract information from legal sources (or ‘understand’ them and their relationships), (ii) assess their relevance to a given legal problem, and (iii) apply the legal source to provide a legal solution to the problem (or to suggest one for human validation).

Of course, this leaves aside other issues, such as the need for AI to be able to understand the factual situation in order to formulate the relevant legal problem, to assess or rank different legal solutions where several are available, or to take into account additional aspects such as the likelihood of obtaining a remedy, etc. All of these could be tackled by fields of AI & Law other than legal text analytics. The above also ignores other aspects of ‘understanding’ documents, such as the ability of an algorithm to distinguish factual and legal issues within a legal document (eg a judgment) or to extract basic descriptive information (eg being able to create a citation based on the information in the judgment, or to cluster different types of provisions within a contract or across contracts). Some of this seems to be at hand, or soon to be developed, on the basis of the recently released Google ‘Document Understanding AI’ tool.

The latest issue of Artificial Intelligence and Law luckily concentrates on ‘Natural Language Processing for Legal Texts’ and offers some help in trying to understand where things currently stand regarding issues (i) and (ii) above. In this post, I offer some reflections based on my understanding of two of the papers included in the special issue: Nanda et al (2019) and Chalkidis & Kampas (2019). I may have gotten some of the specific technical details wrong (although I hope not), but I think I have captured the functional insights.

Establishing relationships between legal sources

One of the problems that legal text analytics is trying to solve concerns establishing relationships between different legal sources, which can be seen as a partial aspect of the need to ‘understand’ them (issue (i) above). This is the main problem discussed in Nanda et al, ‘Unsupervised and supervised text similarity systems for automated identification of national implementing measures of European directives’ (2019) 27(2) Artificial Intelligence and Law 199-225. In this piece of research, AI is used to establish whether or not a provision of a national implementing measure (NIM) transposes a specific article of an EU Directive. In extremely simplified terms, the researchers train different algorithms to perform text comparison. They work on a closed list of 43 EU Directives and the corresponding Luxembourgian, Irish and Italian NIMs. The following figure plots their results.

[Figure: Nanda et al (2019: 208, Figure 6)]

The figure shows that the best AI solution developed by the researchers (the TF-IDF cosine) achieves levels of precision of around 83% for Luxembourg, 77% for Italy and 68% for Ireland. These seem like rather impressive results, but a qualitative analysis of their experiment indicates that the significantly better performance for Luxembourgian transposition over Italian or Irish transposition likely results from the fact that Luxembourg tends largely to ‘copy & paste’ EU Directives into national law, whereas the Italian and Irish legislators take a more complex approach to integrating EU rules into their existing legal instruments.
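For readers unfamiliar with the technique, TF-IDF cosine similarity simply represents each text as a weighted word-frequency vector and measures how closely the vectors point in the same direction. A minimal sketch of that general technique in Python (using scikit-learn, with invented toy texts and an illustrative threshold; this is not the authors' actual pipeline or data) could look like this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented toy texts standing in for a Directive article and candidate NIM provisions
directive_article = "Member States shall ensure that consumers receive clear information."
nim_provisions = [
    "Consumers shall receive clear information from traders.",  # near copy & paste
    "The Minister may make regulations concerning product safety.",  # unrelated
]

# Build a shared TF-IDF vocabulary over all texts and vectorise them
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform([directive_article] + nim_provisions)

# Cosine similarity between the Directive article and each NIM provision
scores = cosine_similarity(vectors[0:1], vectors[1:]).flatten()

THRESHOLD = 0.3  # illustrative cut-off only, not a value taken from the paper
for provision, score in zip(nim_provisions, scores):
    verdict = "possible transposition" if score >= THRESHOLD else "no match"
    print(f"{score:.2f}  {verdict}  {provision}")
```

Seen this way, the better results for Luxembourg are unsurprising: near-verbatim ‘copy & paste’ provisions share most of their weighted vocabulary with the Directive, so their cosine score approaches 1, whereas re-drafted Italian or Irish provisions do not.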

Moreover, it should be noted that the algorithms are working on a very specific issue: they are only assessing the correspondence between provisions of EU and NIM instruments that were already known to be related. In other words, they operate in a closed or walled dataset that does not include NIMs that do not transpose any of the 43 chosen Directives. Once these aspects of the research design are taken into account, a number of questions remain unanswered, such as the precision the algorithms would achieve if they had to compare entire NIMs against an open-ended list of EU Directives, or if they were used to screen for transposition rules. While the first issue could probably be answered simply by extending the experiment, the second would probably require a different type of AI design.

On the whole, my impression after reading this interesting piece of research is that AI is still relatively far from being able to provide reliable answers to the issue of establishing relationships across legal sources, particularly if one thinks of relationships more complex than transposition within the EU context, such as the development, modification or repeal of a given set of rules by other (potentially dispersed) rules.

Establishing relationships between legal problems and legal sources

A separate but related issue requires AI to identify legal sources that could be relevant to solving a specific legal problem (issue (ii) above). That is, the relevant relationship is not one across legal sources (as above), but one between a legal problem or question and the relevant legal sources.

This is covered in part of the literature review included in Chalkidis & Kampas, ‘Deep learning in law: early adaptation and legal word embeddings trained on large corpora’ (2019) 27(2) Artificial Intelligence and Law 171-198 (see esp 188-194), where they discuss some of the solutions developed for the Competition on Legal Information Extraction/Entailment (COLIEE) from 2014 to 2017, which focused ‘on two aspects related to a binary (yes/no) question answering as follows: Phase one of the legal question answering task involves reading a question Q and extract[ing] the legal articles of the Civil Code that are relevant to the question. In phase two the systems should return a yes or no answer if the retrieved articles from phase one entail or not the question Q’.
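To make that two-phase structure concrete, the following Python sketch wires a naive TF-IDF retriever (phase one) to a simple token-overlap placeholder standing in for entailment (phase two). None of this reflects the actual COLIEE systems discussed in the paper, which relied on much more sophisticated (including deep learning) models; the code, with its invented toy ‘Civil Code’, is only meant to show the shape of the task:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def retrieve_articles(question: str, civil_code: list[str], top_k: int = 3) -> list[str]:
    """Phase one: rank Civil Code articles by TF-IDF cosine similarity to the question."""
    vectorizer = TfidfVectorizer()
    vectors = vectorizer.fit_transform([question] + civil_code)
    scores = cosine_similarity(vectors[0:1], vectors[1:]).flatten()
    top_indices = scores.argsort()[::-1][:top_k]
    return [civil_code[i] for i in top_indices]


def entails(articles: list[str], question: str) -> bool:
    """Phase two: answer yes/no. A naive token-overlap placeholder; the COLIEE
    systems used dedicated textual-entailment models for this step."""
    question_tokens = set(question.lower().split())
    article_tokens = set(" ".join(articles).lower().split())
    return len(question_tokens & article_tokens) / max(len(question_tokens), 1) > 0.5


# Invented toy texts, purely for illustration
civil_code = [
    "A contract is formed when an offer is accepted by the other party.",
    "A minor may rescind a contract concluded without a guardian's consent.",
]
question = "Is a contract formed when an offer is accepted?"
relevant = retrieve_articles(question, civil_code, top_k=1)
print("yes" if entails(relevant, question) else "no")
```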

The paper covers four different attempts at solving the task. It reports that the AI solutions developed to address this binary question answering achieved the following levels of precision: 66.67% (Morimoto et al (2017)); 63.87% (Kim et al (2015)); 57.6% (Do et al (2017)); 53.8% (Nanda et al (2017)). Once again, these results are rather impressive, but some contextualisation may help to assess the extent to which this can be useful in legal practice.

The best AI solution was able to identify relevant provisions that entailed the relevant question 2 out of 3 times. However, the algorithms were once again working in a closed or walled field, because they only had to search for relevant provisions within the Civil Code. One can thus wonder whether algorithms confronted with the entirety of a legal order would be able to reach anywhere near the same degree of accuracy.

Some thoughts

Based on the current state of legal text analytics (as far as I can see it), it seems clear that AI is far from being able to perform independent/unsupervised legal analysis and provide automated solutions to legal problems (issue (iii) above), because there are still very significant shortcomings in ‘understanding’ natural language legal texts (issue (i)) and in adequately relating them to specific legal problems (issue (ii)). That should not be surprising.

However, what also seems clear is that AI is very far from being able to confront the vastness of a legal order and that, much like lawyers themselves, AI tools need to specialise and operate within the narrower boundaries of sub-domains or quite contained legal fields. When that is the case, AI can achieve much higher degrees of precision; see the examples of information extraction precision above 90% reported by Chalkidis & Kampas (2019: 194-196) in projects concerning Chinese credit fraud judgments and Canadian immigration rules.

Therefore, the current state of legal text analytics seems to indicate that AI is (quickly?) reaching a point where algorithms can be used to extract legal information from natural language text sources within a specified legal field (which needs to be established through adequate supervision), in a way that allows them to provide fallible or incomplete lists of potentially relevant rules or materials for a given legal issue. However, this still requires legal experts to complement the relevant searches (to bridge any gaps) and to screen the proposed materials for actual relevance. In that regard, AI does hold the promise of much better results than previous expert systems and information retrieval systems and, where adequately trained, it can support and potentially improve legal research (ie cognitive computing, along the lines developed by Ashley (2017)). However, in my view, there are extremely limited prospects for the ‘independent functionality’ of legaltech solutions. I would happily hear arguments to the contrary, though!