A Discourse Analysis Based Approach to Automatic Information Tagging in Chinese Judicial Discourses
Bo SUN, Hefei Normal University
Contents
1. Introduction
2. Information Features of Judicial Discourses
3. Information Identification Based on Information Rules
4. Evaluation of Automatic Information Identification
5. Limitations and Suggestions for Further Research
1-1 Research Background
Why information identification? It has become more and more important: case guidance has been introduced into the Chinese legal system; it provides reference for parties in a lawsuit (litigants, attorneys, judges); and it facilitates legal writing education (law school students and teachers).
Why automatic information identification? Manual information identification is a necessary but burdensome part of reading judicial discourse: not all the information is needed, and the process is time-consuming and labour-intensive.
Search engines (百度, Bing, Google, …) usually present the whole discourse rather than the specific information needed. The same deficiency can be found in some well-known databases, such as "北大法宝" (PKULaw).
Some legal databases, such as Westlaw, are so complicated that users need to be trained (Dai, 2013).
Discourse Information Theory
Research on mass communication generalizes information with five interrogative expressions: "who", "says what", "to whom", "in what channel", and "with what effect" (Lasswell, 1948). This division was later extended by adding "under what circumstances" and "for what purpose" (Braddock, 1958). Such a treatment accords with our intuitive understanding of information, but it outlines only the linear structure of language information.
Discourse Information Theory pays more attention to the hierarchical structure of discourse and takes the clause as the basic unit of information, called the information unit (IU). IUs can be represented by 15 interrogative expressions: What Attitude (WA), What Basis (WB), What Condition (WC), What Effect (WE), What Fact (WF), What Change (WG), How (HW), What Inference (WI), What Judgement (WJ), When (WN), Who (WO), What Disposal (WP), Where (WR), What Thing (WT), and Why (WY) [7].
Example:
1 WB  |   In accordance with Articles 348 and 357 of the Criminal Law of the People's Republic of China, the decision is made as follows.
2 WJ  ||  The defendant Liu X is guilty of the crime of illegally holding drugs,
3 WP  ||| so he is sentenced to nine months' imprisonment and
4 WP  ||| fined 3,000 yuan.
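As a minimal sketch only, the tagged example above could be turned into a simple data structure, assuming the compact notation shown (sequence number, two-letter category code, and hierarchy depth encoded by the number of "|" marks); the class and field names are illustrative, not part of the original annotation scheme:

```python
import re
from dataclasses import dataclass

@dataclass
class InformationUnit:
    index: int       # sequence number of the clause in the discourse
    category: str    # one of the 15 interrogative codes, e.g. "WB", "WJ", "WP"
    level: int       # hierarchy depth, encoded by the number of "|" marks
    text: str        # the clause itself

# Assumed tag pattern: "<index> <CODE> <pipes> <clause text>"
IU_PATTERN = re.compile(r"(\d+)\s*([A-Z]{2})\s*(\|+)\s*")

def parse_tagged_discourse(tagged: str) -> list:
    """Split a tagged discourse string into structured information units."""
    units = []
    matches = list(IU_PATTERN.finditer(tagged))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(tagged)
        units.append(InformationUnit(
            index=int(m.group(1)),
            category=m.group(2),
            level=len(m.group(3)),
            text=tagged[m.end():end].strip(),
        ))
    return units
```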
1-2 Research Objective and Research Questions
Objective: to develop an automatic approach to identifying discourse information in Chinese judicial texts.
(Figure: the automatic method mediates between the legal discourse and the user.)
Research questions:
1) What are the information features of Chinese judicial discourse?
2) What information processing rules can be constructed according to these features?
3) How does the computer perform in information identification when equipped with the rules?
2. Information Features of Judicial Discourses
Inter-type constancy
Macro features: initiating with kernel information; ending with the formulation date; setting the formulator prior to the formulation date.
Micro features: words that occur in more than 80% of the collected texts provide a clue for finding constant elements.
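The 80% micro-feature is straightforward to compute; a minimal sketch, assuming the judgments have already been word-segmented into token lists (segmentation itself is not shown):

```python
from collections import Counter

def constant_words(tokenized_texts, threshold=0.8):
    """Return words that appear in more than `threshold` of the collected texts."""
    doc_freq = Counter()
    for tokens in tokenized_texts:
        doc_freq.update(set(tokens))          # count each word once per text
    n_docs = len(tokenized_texts)
    return {w for w, df in doc_freq.items() if df / n_docs > threshold}
```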
Intra-type constancy
Macro features: the same keyword in kernel information; obligatory upper-level information units; obligatory lower-level information units.
Micro features: similar processes, participants and circumstances are likely to be shared by obligatory upper-level units.
Intra-type variation
Macro features: optional upper-level information units; optional lower-level information units.
Micro features (shared by optional units): process elements in optional upper-level units tend to be implicit, while those in optional lower-level units may overlap; participant elements can be shared by both upper- and lower-level units; few common circumstance elements can be found.
3. Information Processing Rules
Automatic processing consists of: automatic text classification; automatic identification of upper-level units; automatic identification of lower-level units.
Automatic text classification (cf. 《刑事判决书的写作要领》 [Essentials of Writing Criminal Judgments]; Halliday & Matthiessen, 2004)
Automatic text classification draws on: the keyword in the kernel information; process elements in obligatory upper-level units as the primary variable; participant as a substitute for process; and the order of these elements.
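As a hedged illustration of how these cues might be combined (the keyword, process verbs, and participant nouns below are assumed placeholders, not the actual rule set):

```python
# Hypothetical rule table: each text type is keyed by the keyword expected in
# its kernel information and by the ordered process verbs of its obligatory
# upper-level units; participant nouns serve as substitutes when a process
# verb is implicit.
RULES = {
    "criminal_judgment": {
        "kernel_keyword": "刑事判决书",
        "process_order": ["指控", "查明", "认为", "判决"],
        "participant_substitute": ["被告人", "公诉机关"],
    },
    # ...rules for other text types would follow the same shape
}

def classify(text):
    """Assign a text type by matching the kernel keyword first, then checking
    that the obligatory process verbs (or participant substitutes) occur in order."""
    for text_type, rule in RULES.items():
        if rule["kernel_keyword"] not in text:
            continue
        pos, ordered = -1, True
        for verb in rule["process_order"]:
            hit = text.find(verb, pos + 1)
            if hit == -1:                      # verb implicit: try a participant substitute
                hits = [text.find(p, pos + 1) for p in rule["participant_substitute"]]
                hits = [h for h in hits if h != -1]
                hit = min(hits) if hits else -1
            if hit == -1:
                ordered = False
                break
            pos = hit
        if ordered:
            return text_type
    return None
```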
Pretreatment of discourse information: each information unit is decomposed into its elements, namely participant, process, and circumstance.
Identification Rules of Information Units
Three dimensions of the identification rules: Which elements should be selected? What is the logical relationship among the selected elements? How are they positioned relative to each other?
Identification rules (upper-level units)
1) Combination of process and participant as the main indicator:
   a) Synonyms of process verbs or participant nouns should be included.
   b) Where one IU contains more than one common process verb, all of these verbs probably need to be considered.
2) Absolute position as the complementary indicator.
Identification rules (lower-level units)
1) Obligatory lower-level information items as the default value.
2) Process plus participant as the main indicator.
3) Relative position as the alternative indicator.
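A minimal sketch of how such rules could be encoded; the rule entries (categories, synonym sets, positions, and the default "WF") are illustrative assumptions, not the actual rule base derived from the discourse analysis:

```python
# Illustrative rule entries only.
UPPER_RULES = [
    {
        "category": "WB",                      # What Basis
        "process": {"依照", "根据", "依据"},    # a process verb and its synonyms
        "participant": {"刑法", "刑事诉讼法"},  # participant nouns and synonyms
        "absolute_position": 0,                # complementary indicator: clause index
    },
]

def identify_upper(clauses, upper_rules):
    """Tag upper-level units: process + participant is the main indicator;
    absolute position is consulted as the complementary indicator."""
    tags = {}
    for i, clause in enumerate(clauses):
        for rule in upper_rules:
            combo = (any(v in clause for v in rule["process"])
                     and any(p in clause for p in rule["participant"]))
            if combo or i == rule["absolute_position"]:
                tags[i] = rule["category"]
                break
    return tags

def identify_lower(clauses, upper_tags, lower_rules, default="WF"):
    """Tag the remaining clauses as lower-level units: process + participant is
    the main indicator; the obligatory item (hypothetically "WF" here) acts as
    the default value; position relative to the governing upper-level unit
    could be added as the alternative indicator."""
    tags = {}
    for i, clause in enumerate(clauses):
        if i in upper_tags:
            continue
        matched = next((r["category"] for r in lower_rules
                        if any(v in clause for v in r["process"])
                        and any(p in clause for p in r["participant"])), None)
        tags[i] = matched or default
    return tags
```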
Realization of automatic information identification
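Putting the three steps together, the realization could look like the following end-to-end sketch; split_clauses, RULES_FOR_TYPE, and the classify/identify_* helpers are the hypothetical pieces sketched above, not the author's actual implementation:

```python
def tag_discourse(text):
    """End-to-end sketch: classify the text type, then identify upper- and
    lower-level information units with the rules selected for that type."""
    text_type = classify(text)                       # rule-based classification (above)
    clauses = split_clauses(text)                    # hypothetical clause splitter, e.g.
                                                     # splitting on Chinese punctuation
    rules = RULES_FOR_TYPE[text_type]                # hypothetical per-type rule base
    upper = identify_upper(clauses, rules["upper"])
    lower = identify_lower(clauses, upper, rules["lower"])
    return {**upper, **lower}                        # clause index -> information category
```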
4. Evaluation of Automatic Information Processing
The information-rule-based text classification vs. an SVM classifier:
Preprocessing includes word segmentation and stop-word removal, with only nouns and verbs kept.
The words are represented with the Vector Space Model and weighted by their tf-idf values.
The 1,000 features with the highest χ² values are selected.
Directed Acyclic Graph SVM (DAGSVM) is adopted for this multiclass classification.
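A minimal sketch of such an SVM baseline, assuming jieba for word segmentation and scikit-learn for the rest; note that scikit-learn's SVC uses one-vs-one voting rather than a true DAGSVM, so this is only an approximation of the setup described above:

```python
import jieba.posseg as pseg
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def nouns_and_verbs(text, stopwords=frozenset()):
    """Segment a judgment and keep only nouns and verbs, minus stop words."""
    return " ".join(w for w, flag in pseg.cut(text)
                    if flag[0] in ("n", "v") and w not in stopwords)

def build_classifier():
    return make_pipeline(
        TfidfVectorizer(),                    # Vector Space Model with tf-idf weights
        SelectKBest(chi2, k=1000),            # keep the 1,000 highest-scoring features
        SVC(kernel="linear"),                 # multiclass SVM (one-vs-one, not DAGSVM)
    )

# Usage sketch (train_texts/train_labels are the collected judgments and their types):
# clf = build_classifier()
# clf.fit([nouns_and_verbs(t) for t in train_texts], train_labels)
# pred = clf.predict([nouns_and_verbs(t) for t in test_texts])
```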
The information-rule-based information identification vs. a Viterbi identifier, defined by λ = (Q, O, A, B, π):
Q denotes the information categories; O is the set of process-verb categories (the observations); A is the probability of information-category transition; B is the likelihood that a category of process verb occurs given an information category; and π is the initial probability of the information categories.
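A minimal sketch of the Viterbi decoder under this formulation; A, B, and π would be estimated from the tagged corpus, which is not shown here:

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Most likely information-category sequence for a sequence of observed
    process-verb categories, given lambda = (A, B, pi).

    obs : list of observation indices (process-verb categories), length T
    A   : (N, N) transition probabilities between information categories
    B   : (N, M) emission probabilities of process-verb categories
    pi  : (N,)   initial information-category probabilities
    """
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))                  # best path probability so far
    psi = np.zeros((T, N), dtype=int)         # back-pointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A    # (N, N): previous category x next category
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]                         # information-category indices, in order
```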
The SVM and Viterbi algorithms are generally very successful in NLP. Their weak performance here might be attributed to feature (observation) selection; moreover, both of them ignore the hierarchical structure of discourse.
5. Limitations and Suggestions for Further Research
Limitations: more types of legal texts should be explored; the rules require a very detailed discourse analysis.
Suggestions for further research: the rule-based approach can be combined with statistically based approaches.
Thank You !