Recent Work at ISI Jose Luis Ambite Yigal Arens Eduard Hovy Andrew Philpot USC/ISI.



Overview
1. EDC system
  – NHANES health questionnaire data
  – (Semi-)automatic domain model construction
  – NL-based question understanding
2. Proposals
  – Urban Transportation SGER awarded
  – Submitted proposal to ITR
3. Outreach
  – Connections to USC campus
  – Conference planning: dg.o 2002

NHANES Data Collection
We acquired and wrapped the NHANES database
  – From National Center for Health Statistics
  – Survey of thousands of records (people); each record contains max. 12,000 questions about health, family, medical history, etc.
  – Database wrapped and accessible via EDC system
Challenge: can we learn the domain model automatically?
  – Try to extract terms from DB, cluster them, and then link them into Ontology
  – Then test Domain Model using SIMS

Automated Domain Modeling Research
Step 1: performed manual pre-test
  – extracted approx. 60 column headings (database questions)
  – clustered them manually
  – compared accuracy: about 50% overlap only
Step 2: developed clustering toolkit
  – assembled CLINK, SLINK, Median, k-Means, etc. into toolkit
  – developed speedup techniques
Step 3: ran series of 10 experiments
  – various word manipulations (word weighting by inverse frequency, etc.; word stemming; longer passage extracts; etc.)
  – mapped out extensive parameter space; did pinpointed sweep
Results still not great
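The clustering step can be illustrated with a minimal sketch. It is not the toolkit described on the slide: it assumes a plain single-link or complete-link agglomerative procedure over inverse-frequency-weighted word vectors with cosine similarity, and the function names, the threshold value, and the sample column headings are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a heading and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def idf_vectors(headings):
    """Build one sparse word vector per heading, weighted by inverse frequency."""
    docs = [Counter(tokenize(h)) for h in headings]
    n = len(docs)
    df = Counter(w for d in docs for w in d)  # number of headings containing each word
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    return [{w: c * idf[w] for w, c in d.items()} for d in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerate(headings, threshold=0.3, linkage="single"):
    """Merge the most similar pair of clusters until no pair reaches the threshold.

    Single link scores a pair of clusters by their most similar members,
    complete link by their least similar members.
    """
    vecs = idf_vectors(headings)
    clusters = [[i] for i in range(len(headings))]
    pick = max if linkage == "single" else min
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = pick(cosine(vecs[a], vecs[b])
                           for a in clusters[i] for b in clusters[j])
                if sim > best:
                    best, pair = sim, (i, j)
        if best < threshold:
            break
        i, j = pair
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return [[headings[k] for k in c] for c in clusters]


# Hypothetical column headings, not taken from the NHANES database.
SAMPLE_HEADINGS = [
    "Age at interview",
    "Age at first pregnancy",
    "Blood pressure systolic",
    "Blood pressure diastolic",
    "Family history of diabetes",
]

if __name__ == "__main__":
    for cluster in agglomerate(SAMPLE_HEADINGS):
        print(cluster)
```

The threshold and the linkage rule are two of the parameters that a sweep like the one in Step 3 would vary; word stemming could be added inside `tokenize`.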

NL Question Understanding
Challenge: can we interpret user’s question when posed in English, not using menus or ontology?
Approach:
1. create new Finite State Machine
2. create question grammar and lexicon (linked to Ontology)
3. create conversion routines that assemble SQL queries out of user input
4. test and evaluate using EDC system and SIMS
Current status:
  – new FSM completed
  – grammar and conversion routines under construction
  – will demo English (+ other?) query input at conference
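The approach above (finite state machine, lexicon, SQL assembly) can be sketched in a few lines. This is an illustrative toy, not the EDC implementation: the states, the lexicon entries, the table name `nhanes_records`, and the column names `has_diabetes` and `has_asthma` are all assumptions made for the example.

```python
import re

# Hypothetical state transitions: (current state, word or word category) -> next state.
TRANSITIONS = {
    ("START", "how"): "HOW",
    ("HOW", "many"): "COUNT",
    ("COUNT", "ENTITY"): "ENTITY",
    ("ENTITY", "LINK"): "COND",
    ("COND", "CONDITION"): "DONE",
}

# Hypothetical lexicon: word -> (category, database column or None).
LEXICON = {
    "people": ("ENTITY", None),
    "respondents": ("ENTITY", None),
    "have": ("LINK", None),
    "with": ("LINK", None),
    "diabetes": ("CONDITION", "has_diabetes"),
    "asthma": ("CONDITION", "has_asthma"),
}


def question_to_sql(question, table="nhanes_records"):
    """Run the question through the FSM; return a SQL string, or None if rejected."""
    state, column = "START", None
    for word in re.findall(r"[a-z]+", question.lower()):
        # Words outside the lexicon are matched literally (e.g. "how", "many").
        category, value = LEXICON.get(word, (word, None))
        next_state = TRANSITIONS.get((state, category))
        if next_state is None:
            return None  # no transition: the question is outside the grammar
        state = next_state
        if category == "CONDITION":
            column = value
    if state != "DONE":
        return None  # the question ended before reaching the accepting state
    return f"SELECT COUNT(*) FROM {table} WHERE {column} = 1"


if __name__ == "__main__":
    print(question_to_sql("How many people have diabetes?"))
```

Because column names come only from the fixed lexicon, user input never reaches the SQL string directly; a fuller grammar would need more states and a lexicon derived from the ontology.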

Proposals
SGER proposal funded
  – Topic: Urban transportation study: new methods for freight tracking in LA by comparing across databases
  – Grant awarded to USC, shared by ISI and USC’s Dept of Policy and Planning
  – Jose Luis Ambite will spend approx. 25% time on this study
White paper to DoT
  – Topic: Searching for patterns in freight traffic
  – Submitted by USC campus people and Jose Luis Ambite
ITR proposal submitted
  – Topic: Semi-automated topic hierarchy creation
  – Partners: Eduard Hovy communicated with EPA group
  – If funded will use EPA’s CARAT ontology as starting point and evaluation standard

Outreach
USC Campus Group
  – Urban policy planners, digital democracy sociologists, industrial and systems engineers, etc.
  – Held several meetings, chaired by Yigal Arens and Genevieve Giuliano, to explore collaborations and to see if we can extend DGRC to start a separate organization
  – Drafted a statement of goals to hand to Provost and USC-based small funding offices
New issue of DG Online!
Conference: dg.o 2002
  – Hotel arranged
  – Website up (but still need fancy graphics)
  – Call for presentations disseminated
  – Some portions of program and invitees determined