Thesis Title: Discourse Segmentation of Judicial Decisions for Fact Extraction
Date & Time: May 12th, 2021 @ 1:00 pm
Dr. Peter Rigby - Chair
Dr. Leila Kosseim - Supervisor
Dr. Peter Rigby - Examiner
Dr. Olga Ormandjieva - Examiner
In order to justify rulings, legal documents must present facts as well as an analysis built upon them. However, identifying what constitutes a Fact and what does not, and how the two relate to form a coherent legal document, is often not an easy task. For several reasons, such as the domain-specific definition of the term Fact and the technical vocabulary that permeates the documents in which these facts are found, extracting the case-related facts from a legal document is usually time-consuming and expensive.
In this thesis, we present two methods to automatically extract case-relevant facts from French-language legal documents pertaining to tenant-landlord disputes. Both approaches rest on the assumption that the text of a decision follows the structural convention of first stating the facts and then presenting an analysis based on them. This assumption is grounded in the widespread application of the IRAC legal-writing model and its many variants (Beazley 2018). Given a legal document, we perform text segmentation to extract the parts containing the case-relevant Facts, using two approaches that combine neural methods commonly used in Natural Language Processing with a novel heuristic based on the density of Facts in a segment of the text.
Our two approaches are based on the representation of legal texts as binary strings, where contiguous subsequences of 1's represent sentences containing Facts and, conversely, contiguous subsequences of 0's correspond to sentences containing non-Facts. The first approach classifies each sentence in the document as either Fact or non-Fact using an ensemble model of independent word embeddings (Mikolov 2013), GRU networks (Cho 2014), and Convolutional Neural Networks (Kim 2014). The second approach classifies sentences in context, using recurrent architectures to create the binary string representation of the document; we experiment with LSTM networks (Hochreiter 1997), GRU networks, and Attention Encoder-Decoder models (Bahdanau 2014). The segmentation itself is carried out by introducing the concept of purity, a measure of the density of facts in a given subsequence of the binary string; the facts are extracted by maximising both the length and the purity of the substring containing them.
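To make the purity heuristic concrete, the following is a minimal sketch, not the thesis's actual implementation: it assumes the fact segment is a prefix of the document (per the facts-then-analysis convention) and balances purity against length by taking the harmonic mean of purity (the fraction of 1's in the prefix) and coverage (the fraction of all 1's the prefix captures). The exact objective combining length and purity in the thesis may differ; all names here are illustrative.

```python
def purity(bits: str) -> float:
    """Fraction of Fact sentences ('1') in a segment of the binary string."""
    return bits.count("1") / len(bits) if bits else 0.0


def best_split(bits: str) -> int:
    """Return the index at which the fact-containing prefix ends.

    Scores each candidate prefix by the harmonic mean of its purity and
    its coverage of all Fact sentences (an illustrative way to maximise
    both length and purity at once), and returns the best split point.
    """
    total_facts = bits.count("1")
    best_i, best_score = 0, float("-inf")
    for i in range(1, len(bits) + 1):
        prefix = bits[:i]
        p = purity(prefix)                    # density of facts in prefix
        r = prefix.count("1") / total_facts   # share of all facts captured
        score = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
        if score > best_score:
            best_i, best_score = i, score
    return best_i


# Example: sentences 1-3 and 5 are Facts, the rest are non-Facts.
print(best_split("1110100000"))  # → 5: the prefix "11101" is kept as Facts
```

Including the lone 0 at position 4 is worth it here because it recovers the Fact at position 5; trailing 0's only dilute purity and are excluded.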
Extrinsic evaluations of both approaches show that the second outperforms the first, producing a greater number of documents whose predicted segmentation point lies within one sentence of the one indicated by the gold standard. Nevertheless, a significant percentage of segmentation points are underestimated, being predicted more than four sentences away from the point determined by the gold standard.
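The evaluation criterion above can be sketched as follows; this is an illustrative reading of the thresholds mentioned (within one sentence of the gold split, and underestimates of more than four sentences), not the thesis's evaluation code, and the function and variable names are assumptions.

```python
def evaluate_splits(predicted: list[int], gold: list[int]) -> tuple[float, float]:
    """Compare predicted segmentation points against gold-standard ones.

    Returns the fraction of documents whose prediction falls within one
    sentence of the gold split, and the fraction underestimated by more
    than four sentences (predicted well before the gold split).
    """
    n = len(gold)
    within_one = sum(abs(p - g) <= 1 for p, g in zip(predicted, gold))
    under_four = sum(g - p > 4 for p, g in zip(predicted, gold))
    return within_one / n, under_four / n


# Example: three documents with predicted vs. gold split indices.
print(evaluate_splits([5, 3, 10], [5, 8, 11]))  # → (0.666..., 0.333...)
```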