Extracting Academic Affiliations
Alicia Tribble, Einat Minkov, Andy Schlaikjer, Laura Kieras

The Problem
Determine academic institutions with which a professor is or has been affiliated
–Where degrees earned
–Previous affiliations, including post-doc
–Current affiliation
Why would this be useful?
–Studying social networks in academia
–Person entity disambiguation

Knowledge We Will Learn
Example text rules to be learned:
–If string=“<person> received his <degree> in <field> from <institution>”, Then: 'Affiliated(<person>, <institution>)'
–If string=“,, ” on <person>’s home page, Then: 'Affiliated(<person>, <institution>)'
Class of beliefs to be learned:
–Affiliated(<person>, <institution>)
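One way to picture the first example rule is as a regular expression with named slots. This is only a sketch: the slot names (person, degree, field, institution) are assumptions inferred from the walk-through sentence later in the deck, and the capitalized-word heuristic for names is a simplification, not the authors' extractor.

```python
import re

# Hypothetical regex for a rule of the form
# "<person> received his <degree> in <field> from <institution>".
# Names are approximated as runs of capitalized words (an assumption).
RECEIVED_RULE = re.compile(
    r"(?P<person>[A-Z][\w.]*(?: [A-Z][\w.]*)+) received (?:his|her) "
    r"(?P<degree>.+?) in (?P<field>.+?) from "
    r"(?P<institution>[A-Z][\w.]*(?: [A-Z][\w.]*)*)"
)


def apply_rule(sentence):
    """Return an Affiliated belief as a tuple if the rule fires, else None."""
    match = RECEIVED_RULE.search(sentence)
    if match is None:
        return None
    return ("Affiliated", match.group("person"), match.group("institution"))


belief = apply_rule(
    "William Cohen received his bachelor's degree in Computer Science "
    "from Duke University in 1984"
)
print(belief)
```

The degree and field slots are matched but discarded here, since the belief class only relates a person to an institution.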

Sources of redundant information
URL of professor’s personal home page (e.g.,
Text found on multiple web pages, especially in resume, CV, or biography section of personal home pages
Links incoming and outgoing from personal home pages

Additional information
Dictionary of institution names
Dictionary of degrees
–E.g. Ph.D., B.S., B. Tech., etc.
Map of domain names to institution names
–E.g. cmu.edu -> Carnegie Mellon University
–This could be learned but we will leave that for another group!
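The domain-name map can be sketched as a plain dictionary lookup on the host part of a home page URL. The function name `institution_for_url`, the suffix-matching strategy, and the example URL are illustrative assumptions; only the cmu.edu entry comes from the slide.

```python
from urllib.parse import urlparse

# Map of domain names to institution names.
# Only this entry appears on the slide; a real map would hold many more.
DOMAIN_TO_INSTITUTION = {
    "cmu.edu": "Carnegie Mellon University",
}


def institution_for_url(url):
    """Look up the institution for a URL by matching host-name suffixes."""
    host = urlparse(url).hostname or ""
    parts = host.lower().split(".")
    # Try the longest suffix first: www.cs.cmu.edu, cs.cmu.edu, cmu.edu, edu.
    for i in range(len(parts)):
        suffix = ".".join(parts[i:])
        if suffix in DOMAIN_TO_INSTITUTION:
            return DOMAIN_TO_INSTITUTION[suffix]
    return None


# Hypothetical home page URL, used only for illustration.
print(institution_for_url("http://www.cs.cmu.edu/~example/"))
```

Matching on suffixes lets department subdomains resolve to the parent institution without listing each one.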

Bootstrapping Logistics
Start with a few seed rules and seed facts
Use these rules to learn more facts, then use those facts to learn more rules, and so on
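The alternation between rules and facts can be sketched as a fixed-point loop. Everything below is a toy under stated assumptions: the corpus is pre-segmented into hypothetical (person, context, institution) triples, a "rule" is just a context string, and there is no confidence scoring, which a real system would likely need to keep bad rules from spreading.

```python
# Toy corpus of (person, context, institution) triples.
# The triples are illustrative; "Jane Doe" and "Example University" are made up.
CORPUS = [
    ("William Cohen", "received his degree from", "Duke University"),
    ("Adnan Darwiche", "received his degree from", "Stanford University"),
    ("Adnan Darwiche", "studied at", "Stanford University"),
    ("Jane Doe", "studied at", "Example University"),
]


def learn_facts(rules):
    """Every (person, institution) pair whose context is a known rule."""
    return {(p, i) for p, c, i in CORPUS if c in rules}


def learn_rules(facts):
    """Every context that connects a known (person, institution) pair."""
    return {c for p, c, i in CORPUS if (p, i) in facts}


def bootstrap(seed_facts, seed_rules, max_rounds=10):
    facts, rules = set(seed_facts), set(seed_rules)
    for _ in range(max_rounds):
        new_facts = learn_facts(rules) - facts
        new_rules = learn_rules(facts | new_facts) - rules
        if not new_facts and not new_rules:
            break  # nothing new was learned, so stop
        facts |= new_facts
        rules |= new_rules
    return facts, rules


facts, rules = bootstrap({("William Cohen", "Duke University")}, set())
print(sorted(facts))
print(sorted(rules))
```

Starting from a single seed fact and no rules, the loop first learns a context from the seed, then uses that context to find a new fact, and continues until a round adds nothing.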

Our seed facts
Affiliated(<person>, <institution>)

Our seed rules
If URL of personal web page is in the academic URL dictionary, then believe Affiliated(<person>, <institution>)
If looking at a resume or personal web page and any of the patterns below are found, then believe Affiliated(<person>, <institution>):
–"..
–". ”
–", ”
–" received from "

Algorithm walk-through
1) Start with known belief Affiliated(William Cohen, Duke University)
2) Extract sentences from William Cohen's web page that contain "William Cohen" and "Duke"
a. Found pattern "William Cohen received his bachelor's degree in Computer Science from Duke University in 1984"
b. Learned new pattern "received from "

Walk-through continued
3) Search for new web pages matching our pattern "received his degree from"
a. Found example: "Adnan Darwiche is an Associate Professor of Computer Science at UCLA, having received his PhD and MS degrees in Computer Science from Stanford University"
b. Extracted belief Affiliated(Adnan Darwiche, Stanford University)
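The three walk-through steps can be traced in a few lines of code. This is a sketch, not the project's method: the generalization heuristic (keep the first and last word between the person and the institution, wildcard the middle) is an assumption, as is taking the person from the owner of the page where the pattern matches.

```python
import re

# Assumption: institution names are runs of capitalized words.
NAME = r"[A-Z][\w.]*(?: [A-Z][\w.]*)*"


def learn_pattern(sentence, person, institution):
    """Generalize a sentence that states a known fact into a regex.

    Assumes at least one word separates the person from the institution.
    """
    start = sentence.index(person) + len(person)
    end = sentence.index(institution, start)
    words = sentence[start:end].split()
    first, last = words[0], words[-1]  # e.g. "received" and "from"
    return re.compile(
        rf"\b{re.escape(first)}\b.*?\b{re.escape(last)} (?P<institution>{NAME})"
    )


def extract(pattern, page_owner, text):
    """Apply a learned pattern to text from page_owner's home page."""
    match = pattern.search(text)
    if match is None:
        return None
    return ("Affiliated", page_owner, match.group("institution"))


# Steps 1 and 2: known belief plus a sentence from the seed home page.
seed_sentence = (
    "William Cohen received his bachelor's degree in Computer Science "
    "from Duke University in 1984"
)
pattern = learn_pattern(seed_sentence, "William Cohen", "Duke University")

# Step 3: apply the learned pattern to text from another home page.
new_text = (
    "Adnan Darwiche is an Associate Professor of Computer Science at UCLA, "
    "having received his PhD and MS degrees in Computer Science "
    "from Stanford University"
)
print(extract(pattern, "Adnan Darwiche", new_text))
```

Wildcarding the middle of the context is what lets a pattern learned from "bachelor's degree" also match "PhD and MS degrees".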