Never-Ending Language Learning for Vietnamese: Coupled SEAL
Student: Phạm Xuân Khoái
Instructor: PhD Lê Hồng Phương
Main content: 1. Introduction 2. Concepts 3. How it works
1. Introduction SEAL (Set Expander for Any Language) is a set expansion system that accepts input elements (seeds) of some target set S and automatically finds other probable elements of S in semi-structured documents such as web pages. CSEAL (Coupled SEAL) is a SEAL system with two added constraints: mutual-exclusion constraints and type-checking constraints.
1. Introduction Coupled SEAL: a semi-structured extractor. SEAL uses a wrapper induction algorithm. CSEAL queries the Internet with sets of beliefs from each category or relation, and mines lists and tables for instances. It uses mutual-exclusion relationships to provide negative examples for filtering out overly general lists and tables. It issues 5 queries per category and 10 queries per relation, and fetches 50 web pages per query. Candidates are ranked by probabilities assigned as in CPL.
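The mutual-exclusion filtering described above can be sketched as follows. This is a minimal illustration, not CSEAL's actual implementation: the function name `filter_overly_general`, the `max_hits` threshold, and the sample data are all assumptions made for the example. The idea is that beliefs from categories declared mutually exclusive with the target category act as negative examples, and a list or table that extracts any of them is treated as too general.

```python
def filter_overly_general(extractions, negatives, max_hits=0):
    """Keep only sources whose extracted items contain at most
    `max_hits` negative examples (hypothetical threshold)."""
    kept = {}
    for source, items in extractions.items():
        hits = items & negatives  # negative examples found in this source
        if len(hits) <= max_hits:
            kept[source] = items
    return kept


# Hypothetical beliefs: "city" and "country" are mutually exclusive.
beliefs = {
    "city": {"Hà Nội", "Huế"},
    "country": {"Việt Nam", "Lào"},
}
mutually_exclusive = {"city": ["country"]}

# Negative examples for "city" are the beliefs of its exclusive categories.
negatives = set().union(*(beliefs[c] for c in mutually_exclusive["city"]))

# Hypothetical items mined from two lists on the web.
extractions = {
    "list_of_cities": {"Hà Nội", "Huế", "Đà Nẵng"},
    "list_of_places": {"Hà Nội", "Việt Nam", "Lào", "Huế"},
}

kept = filter_overly_general(extractions, negatives)
print(sorted(kept))
```

In this sketch the second list is discarded because it mixes cities with countries, so only the first list contributes a new candidate for the "city" category.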
1. Introduction [Figure: diagram relating Beliefs, the Internet, CSEAL, and new candidate facts]
1. Introduction [Figure: architecture diagram with the labels Knowledge Base (Beliefs, Candidate facts), Knowledge Integrator, Subsystem Components (CPL, RL, CMC, CSEAL), and Data Resources]
Example
2. Concepts Seed: an input element. Wrapper: a pair of character strings that specify the left context and right context an entity must have to be extracted from a page. These strings are chosen under two conditions: the contexts are maximally long, and they bracket at least one occurrence of every seed string on the page.
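The wrapper definition above can be illustrated with a small brute-force sketch. It is a simplification written for this example, not the algorithm from the SEAL papers: the helper names (`induce_wrappers`, `extract`) and the sample page are assumptions. For every way of picking one occurrence per seed, it takes the longest string shared on the left and the longest string shared on the right, then uses that pair to pull other entities from the page.

```python
import re
from itertools import product


def common_prefix(strings):
    """Longest prefix shared by all strings."""
    prefix = strings[0]
    for s in strings[1:]:
        n = 0
        while n < min(len(prefix), len(s)) and prefix[n] == s[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def common_suffix(strings):
    """Longest suffix shared by all strings."""
    return common_prefix([s[::-1] for s in strings])[::-1]


def induce_wrappers(page, seeds):
    """Return (left, right) context pairs that bracket at least one
    occurrence of every seed, each context being maximally long."""
    positions = [[m.start() for m in re.finditer(re.escape(s), page)]
                 for s in seeds]
    if any(not p for p in positions):
        return set()  # some seed does not occur on this page
    wrappers = set()
    for combo in product(*positions):  # one occurrence per seed
        lefts = [page[:pos] for pos in combo]
        rights = [page[pos + len(s):] for pos, s in zip(combo, seeds)]
        left, right = common_suffix(lefts), common_prefix(rights)
        if left and right:
            wrappers.add((left, right))
    return wrappers


def extract(page, wrapper):
    """Extract every string found between the wrapper's two contexts."""
    left, right = wrapper
    # The lookahead leaves the right context unconsumed, so it can also
    # serve as part of the next item's left context.
    pattern = re.escape(left) + r"(.+?)(?=" + re.escape(right) + r")"
    return set(re.findall(pattern, page))


page = "<ul>\n<li>Hà Nội</li>\n<li>Huế</li>\n<li>Đà Nẵng</li>\n</ul>"
seeds = ["Hà Nội", "Đà Nẵng"]
wrappers = induce_wrappers(page, seeds)
found = set().union(*(extract(page, w) for w in wrappers))
print(found - set(seeds))
```

Given the two seeds, the sketch finds a wrapper around list items and recovers the unseen item "Huế". Enumerating every combination of occurrences is exponential in the number of seeds, so this form is only suitable for small illustrations.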
Example
3. How it works
References
Toward an Architecture for Never-Ending Language Learning
Language-Independent Set Expansion of Named Entities using the Web
Coupled Semi-Supervised Learning for Information Extraction
Character-level Analysis of Semi-Structured Documents for Set Expansion