One Tool, Many Industries Text Mining with Oracle Omar Alonso Chuck Adams Oracle Corp. Text Mining Summit, Boston, 2005.

Presentation transcript:

One Tool, Many Industries Text Mining with Oracle Omar Alonso Chuck Adams Oracle Corp. Text Mining Summit, Boston, 2005

Agenda
Introduction
Text mining
Define problems
Present solutions
A look at Oracle's technology stack
Oracle's roadmap
A case study
Conclusions

Data mining and text mining
[Diagram labels: OLTP, OLAP, DM, Keyword search, BK, TM, Classification, Clustering, Ontologies, NLP, Inexact match; axis: Structured Data, Unstructured Data]

An analogy
RFID and robot vision
  – Put tags on everything instead of having the robot do the vision
Similar approach for text mining
  – Language is very social, not technical
  – Instead, start with a unified storage model
  – Then do mining

What about text mining?
Text mining is one of many features in text technology
The real future of text technology is business intelligence (BI)
What is BI?
  – Ability to make better decisions
What are the obstacles today?
  – Structured data is well understood
  – Unstructured data is different

Text and XML
Increased exploitation of structure
[Diagram labels: Plain Old File System; File System on Steroids (WinFS); Records Mgmt, ECM; Dynamic Doc Generation; Traditional Content Mgmt; XML Content Mgmt]

First problem: access
No uniform access over all sources
Each source has separate storage and algebra
Examples
  – Databases
  – Applications
  – Web

Second problem: management
Management of unstructured data is very poor compared with structured data
Cleaning
Noise is larger than in structured data
Security
Multilingual

Third problem: user needs
Perception with current search engines
Large data -> 80/20 rule
Doesn't provide uniform information
Two users type the same query and get the same results
  – Cricket the game or cricket the bug?

Foundations
XML as the common model
XML allows:
  – Manipulating data with standards
  – Mining becomes more like data mining
  – RDF emerging as a complementary model
The more structure you can explore, the better you can do mining
Integration use cases
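The point that more structure allows better mining can be illustrated with a short Python sketch. This is an illustration only, not Oracle code: the sample document, its element names, and the two helper functions are assumptions made for the example. It contrasts a flat keyword match with a match restricted to one XML element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are illustrative assumptions.
DOC = """
<report>
  <title>Quarterly cricket equipment sales</title>
  <author>J. Smith</author>
  <body>Sales of bats rose. An unrelated note mentions Oracle.</body>
</report>
"""


def flat_match(xml_text, term):
    """Keyword match over all text in the document, ignoring structure."""
    root = ET.fromstring(xml_text)
    text = " ".join(root.itertext())
    return term.lower() in text.lower()


def field_match(xml_text, tag, term):
    """Keyword match restricted to the text of one element type."""
    root = ET.fromstring(xml_text)
    return any(term.lower() in (el.text or "").lower() for el in root.iter(tag))


if __name__ == "__main__":
    print(flat_match(DOC, "oracle"))            # matches a passing mention in the body
    print(field_match(DOC, "title", "oracle"))  # the title is not about Oracle
    print(field_match(DOC, "title", "cricket"))
```

The flat match treats a passing mention the same as a title hit; the field-restricted match lets the application decide which parts of the document count.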

Foundations - II
Unstructured data is too AI
Too easy to get fooled by the complexity
Hybrid solution
Domain knowledge
  – You know your domain
  – You own the content
  – You can do better

Remember?

Personalization problem
Lack of personalization
You own the content, you own the user
Two users type the same query: financials
  – Sales rep looks for customers and other deals
  – Tech guy looks for bugs, architecture, etc.
LDAP shows who they are
Combination with query logs shows patterns in the same peer group
Recommendation systems
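A minimal Python sketch of the peer-group idea, under stated assumptions: the user's group is already known (for example from a directory lookup), and the query log records which document each group clicked for each query. The log contents, document names, and the personalize function are hypothetical and exist only to show the re-ranking step.

```python
from collections import Counter

# Hypothetical query log entries: (user group, query, clicked document).
QUERY_LOG = [
    ("sales", "financials", "customer-deals"),
    ("sales", "financials", "customer-deals"),
    ("sales", "financials", "pricing-sheet"),
    ("tech", "financials", "bug-list"),
    ("tech", "financials", "architecture-notes"),
    ("tech", "financials", "bug-list"),
]


def personalize(hits, group, query, log):
    """Reorder search hits by how often the user's peer group clicked them.

    Hits never clicked by the group keep their original relative order,
    because sorted() is stable.
    """
    clicks = Counter(doc for g, q, doc in log if g == group and q == query)
    return sorted(hits, key=lambda doc: -clicks[doc])


if __name__ == "__main__":
    # Same query, same engine results, different ordering per peer group.
    engine_hits = ["architecture-notes", "bug-list", "customer-deals", "pricing-sheet"]
    print(personalize(engine_hits, "sales", "financials", QUERY_LOG))
    print(personalize(engine_hits, "tech", "financials", QUERY_LOG))
```

Both users type the same query and receive the same hits, but the order reflects what their peers found useful.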

Better Answers: Beyond Keywords
Noise theory
  – As you cast your nets ever wider, you catch disproportionately more junk
Must develop new models of quality in the face of comprehensiveness
  – Combine link analysis with context-sensitive relevance
  – Personalization
Must summarize information
  – Theme Maps, Gists
Show patterns in information vs. many pages of hit-lists
  – Tree Maps, Stretch Viewer
Ability to post-process and refine search hit lists
  – Dynamic categories for navigation
  – Reorder by date
Progressive query relaxation
  – Nearest inexact match
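Progressive query relaxation can be sketched as trying the strictest form of a query first and loosening it only while too few hits have been found. The Python below is a minimal sketch of that idea and not the Oracle implementation: the in-memory corpus, the three matchers (exact phrase, all terms, any term), and the wanted threshold are assumptions chosen for the example.

```python
# Hypothetical in-memory corpus: document id -> lowercased text.
DOCS = {
    1: "oracle text mining summit",
    2: "text mining with oracle database",
    3: "data mining overview",
    4: "oracle database backup",
}


def phrase_match(text, terms):
    """Strictest step: the terms appear as consecutive words."""
    words = text.split()
    n = len(terms)
    return any(words[i:i + n] == terms for i in range(len(words) - n + 1))


def all_terms_match(text, terms):
    """Relaxed step: every term appears somewhere in the document."""
    words = text.split()
    return all(t in words for t in terms)


def any_term_match(text, terms):
    """Most relaxed step: at least one term appears."""
    words = text.split()
    return any(t in words for t in terms)


def relaxed_search(query, docs, wanted=3):
    """Run matchers from strict to loose, stopping once enough hits exist.

    Hits from stricter steps come first, so the result order reflects
    how exactly each document matched.
    """
    terms = query.lower().split()
    results = []
    for matcher in (phrase_match, all_terms_match, any_term_match):
        for doc_id, text in docs.items():
            if doc_id not in results and matcher(text, terms):
                results.append(doc_id)
        if len(results) >= wanted:
            break
    return results


if __name__ == "__main__":
    print(relaxed_search("text mining", DOCS, wanted=3))
    print(relaxed_search("oracle database", DOCS, wanted=2))
```

The caller always gets the nearest matches first and only sees inexact matches when the exact ones run out.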

Technology Stack
[Diagram labels: Better Answers; Relevance; Toward BI; Progressive Relaxation; Multi-Criterion Support; Visualization; Classification; Personalization; Direct Answers; Link Analysis; Query Log Analysis; Metadata Extraction; Keyword Ranking; Intelligent Match; Duplicate Elimination]

Oracle's position
Text mining is one of many tools for information retrieval and discovery in many assets
Text mining is best used in the context of other techniques
  – Personalization
  – Search query logs
  – Visualization
Product: one integrated platform

Oracle platform
Integrated platform vs. niche technology
Capability and niche products:
  – Full-text searching: Google, FAST
  – XML: Tamino
  – Classification: Autonomy
  – Clustering: Vivisimo
  – Visualization: Inxight
  – Application search: SAP/TREX
Integrated platform: one platform, low cost, low complexity
Niche technology: several products, different APIs, performance, maintenance cost, etc.

Oracle platform
"If I can see further than anyone else, it is only because I am standing on the shoulders of giants" – Isaac Newton
Oracle provides you all the functionality
  – Plus you get backup, recovery, scalability, and other benefits
You build the mining application

Case study Federal customer High Performance Text Information Mining and Entity Extraction

Business Need
Enterprise search capability
Information fusion
Profiles and alerting
Security: user need-to-know
Entity identification and extraction
High-performance ingestion, search, and indexing
Scalability

Challenges Search quality Performance Scalability Document formats Integration Operations and maintenance

Solutions Architecture
Oracle 10g Integrated Framework
10g release 2
  – Oracle Real Application Clusters
  – Oracle Text
      Full-text and rule-based indexing
      Extensible thesauri
      Document classification
      Document filters
  – Oracle Partitioning
  – Oracle Virtual Private Database
  – Oracle Advanced Security
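Rule-based indexing, document classification, and the profiles-and-alerting need fit together in one common pattern: store each profile as a rule, then test every incoming document against the stored rules. The Python below is a hedged sketch of that pattern only. It does not use or reproduce any Oracle Text API, and the profile names, the all-terms-must-appear rule form, and the function names are assumptions for illustration.

```python
import re

# Hypothetical stored profiles: name -> set of terms that must all appear.
PROFILES = {
    "fraud-watch": {"wire", "transfer"},
    "outage-watch": {"outage"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matching_profiles(document, profiles):
    """Return the names of all profiles whose required terms the document contains."""
    words = tokenize(document)
    return sorted(name for name, required in profiles.items() if required <= words)


if __name__ == "__main__":
    incoming = "Suspicious wire transfer flagged during outage."
    for name in matching_profiles(incoming, PROFILES):
        print(f"alert: document matches profile {name}")
```

The direction is the reverse of ordinary search: the queries are stored and the document arrives, which suits alerting on a continuous ingest stream.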

Technical Architecture

Scalable load and indexing

Real-world results
Single search for user
Profiles and alerts
Query response in a couple of seconds
80,000,000+ documents indexed
1.2 TB raw text and growing
700 GB index size
Incremental index 1-2 GB / day
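A quick back-of-the-envelope reading of these figures, as a Python sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and assumes the 1-2 GB per day figure refers to index growth; the slide states neither, so the derived numbers are rough.

```python
# Figures from the slide; decimal units are an assumption.
DOCS_INDEXED = 80_000_000
RAW_BYTES = 1.2e12      # 1.2 TB of raw text
INDEX_BYTES = 700e9     # 700 GB index

# Average raw document size in kilobytes.
avg_doc_kb = RAW_BYTES / DOCS_INDEXED / 1e3

# Index size as a fraction of the raw text.
index_ratio = INDEX_BYTES / RAW_BYTES

# Daily index growth as a percentage of the current index,
# assuming the 1-2 GB/day figure refers to the index.
growth_low_pct = 1e9 / INDEX_BYTES * 100
growth_high_pct = 2e9 / INDEX_BYTES * 100

if __name__ == "__main__":
    print(f"average document size: {avg_doc_kb:.1f} KB")
    print(f"index / raw text: {index_ratio:.2f}")
    print(f"daily growth: {growth_low_pct:.2f}% to {growth_high_pct:.2f}% of the index")
```

Under these assumptions the average document is about 15 KB, the index is a little under 60% of the raw text, and daily growth is well under 1% of the index.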

Next Steps Entity Extraction and Relationship Awareness

Oracle Database 10g release 2
Enterprise search capability
Information fusion
Profiles and alerting
Security: user need-to-know
Entity identification and extraction
High-performance ingestion, search, and indexing
Scalability

Conclusions
Text mining is one of many features needed for BI on unstructured data
  – Not a silver bullet in itself
Must exploit other approaches
  – Metadata (XML, RDF), personalization, classification, entity extraction, full-text search, …
  – Hybrid solution
Focus on an integrated platform that gives you all the functionality
Drive the platform for your information need