TEXT MINING, TERM MINING, AND VISUALIZATION IMPROVING THE IMPACT OF SCHOLARLY PUBLISHING MONDAY 16 APRIL 2012 NICE, FRANCE Marjorie M.K. Hlava, President.

Slides:



Advertisements
Similar presentations
Technology Roadmap Project Harold Flescher VP-Elect, Technical Activities August 2008, Region 1 Meeting.
Advertisements

SCOPUS Searching for Scientific Articles By Mohamed Atani UNEP.
Taxonomy as Content Outline, Site Map and Search Aid SLA NWR Vancouver October 6, 2006 Marjorie M.K. Hlava President
PATENTSCOPE What’s new?
Bringing It All Together: An Academic Viewpoint (What is needed and what is likely to come next?) Association of Information and Dissemination Centers.
Classification & Your Intranet: From Chaos to Control Susan Stearns Inmagic, Inc. E-Libraries E204 May, 2003.
Translation tools Cyberworld June 2014 Sandrine Ammann Marketing & Communications Officer.
Chapter 5: Introduction to Information Retrieval
ISI Web of Knowledge – Innovative Solutions ISI Web of Knowledge / Web of Science – coming developments BIOSIS Archive Web Citation Index – New product.
New Technologies Supporting Technical Intelligence Anthony Trippe, 221 st ACS National Meeting.
Engineering Disciplines © 2012 Project Lead The Way, Inc.Introduction to Engineering Design.
SciTech Strategies, Inc. William Pickering Dick Klavans Marjorie M.K. Hlava IEEE SciTech Strategies Access Innovations / Data Harmony March 23, 2010 Found.
Access Innovations, Inc. Marjorie M.K. Hlava Jay Ven Eman.
The Thomson Reuters CITATION CONNECTION Digital Library st March – 3 rd April 2014, Jasná David Horký Country Manager – Central and Eastern Europe.
Bringing IEEE Xplore ® to your Alumni Alumni Library Forum, Manchester Business School May 2014 Presented by: Steven Tweedie.
Shou Ray Information Service Co., Ltd.
Taxonomies in Electronic Records Management Systems May 21, 2002.
1 CS 430 / INFO 430 Information Retrieval Lecture 15 Usability 3.
Tony Wilson Academic Liaison Librarian for Computer Science May 2011.
Bringing ASME & IEEE content to your Alumni Alumni Library Forum, Cranfield University Julia Stockdale May 2015.
Sunday May 4 – 5 PM Bradford, Hlava, McNaughton
Implementing Metadata Marjorie M K Hlava, President Access Innovations, Inc. Albuquerque, NM
IEEE Membership - Benefits & Join IEEE. The Value of IEEE Membership Knowledge... staying current with the fast changing world of technology Community.
PubMed/How to Search, Display, Download & (module 4.1)
Engineering or Mechanical Engineering?
ROI & Impact: Quantitative & Qualitative Measures for Taxonomies Wednesday, 11 February :00 – 12:30 PM MST Presented by Jay Ven Eman, Ph.D., CEO.
OARE Module 5B: Searching for Scientific Research Using Environmental Issues and Policy Index (EBSCO)
NPSS AdCom Meeting Jozef Modelski Division IV Director 02 November 2013.
Taxonomies: Hidden but Critical Tools Marjorie M.K. Hlava President Access Innovations, Inc.
1 Chapter Two Electrical & Computer Engineering Specialization.
Agenda Item: GSC 2GSC The IEEE Standards Association Serving Global Standardization Through Openness and Collaboration GSC April - 1 May 2003.
Indexing Knowledge Daniel Vasicek 2014 March 27 Introduction Basic topic is : All Human Knowledge Who Cares? Simple Examples.
©The University of Alabama in Huntsville The North Alabama Region A Globally Competitive Community Industry Clusters The.
Engineering Disciplines
Global Shape Memory Alloy Market Future Market Insights
Elsevier Engineering Information Inc. Engineering Village 2 : Compendex & INSPEC ChemVillage CRC Press Handbooks NTIS Jack O’Toole EI Vice President of.
Fast-track to Innovation Inspec on EBSCO HOST January and February 2006 Amy Barnes Inspec Customer Relationship Manager, Europe, Africa.
Thomson Scientific October 2006 ISI Web of Knowledge Autumn updates.
Microsoft Academic Search Search | Explore | Discover Alex D. Wade Director - Scholarly Communication.
JOIN IEEE COMMUNICATIONS SOCIETY. IEEE Society membership enhances the benefits of IEEE membership 39 Societies representing a full spectrum of technical.
What is the IEEE? Lewis Terman IEEE 2009 Past President IEEE-Industry Day 5 February, 2009.
Making Your Conferences A Part Of The Future: Why Proceedings Matter Mark A. Vasquez Strategic Program Development Sr. Manager, IEEE Meetings & Conferences.
TOPIC: Transportation Research Thesaurus: Taxonomy Development and Use Cases 14 February :00 PM EST Presented by Jay Ven Eman, Ph.D., CEO Access.
Evolution of a production pipeline Marjorie M.K. Hlava President Access Innovations.
Optical Transreciever Market Volume Analysis, size, share and Key Trends by Future Market Insights
RESEARCH – DOING AND ANALYSING Gavin Coney Thomson Reuters May 2009.
Text Analytics in Action: Using Text Analytics as a Toolset TBC 4:15 p.m. - 5:00 p.m. Marjorie Hlava Semantic enrichment / Semantic Fingerprinting.
Digital Library The networked collections of digital text, documents, images, sounds, scientific data, and software that are the core of today’s Internet.
What is Technology? The process by which humans modify nature to meet their needs and wants.
Electrical Engineering Six Primary Specialties 1. Computers / Computer Engineering 2. Electronics 3. Communications 2 nd Largest of all Engineering disciplines.
Engineering Disciplines
To find journals by language of publication, click on the Languages bar in the horizontal frame. The Languages drop down menu appear and we will choose.
TENCON 2009 Singapore The conference for IEEE Region 10 members By Region 10, For Region 10 Organizer: “We are proud to be the organizer, and we look forward.
Engineering Disciplines Get out your notes:. Learning Objectives Be able to describe several different engineering fields –General Description –Specific.
IEEE – Advancing Technology for the Benefit of Humanity Howard E. Michel, Ph.D. IEEE 2011 Vice-President Member and Geographic Activities.
The Thomson Reuters Journal Selection Policy – Building Great Journals - Adding Value to Web of Science Maintaining and Growing Web of Science Regional.
Innovative Novartis Knowledge Center
The Sci-tech Information Industry: Changes, Challenges and the CAS Response Presented by Michael Walsh Manager of International Marketing Operations Chemical.
IEEE Membership Benefits
PATENTSCOPE Translation tools
PRIMARY DATA vs SECONDARY DATA RESEARCH Lesson 23 June 2016
Bibliometrics toolkit: Thomson Reuters products
Chapters Coordinator Responsibilities Key items
MECHANICAL ENGINEERING The “Gears” of FUTURE. A simple machine is a device that simply transforms the direction or magnitude of a force A machine is a.
John B. Anderson and Ken Moore
Tips & Tricks for Searching IEEE Xplore
Tips & Tricks for Searching IEEE Xplore
Introduction to Information Retrieval
How to Search in PubMed and ESGO Journal
PubMed.
Presentation transcript:

TEXT MINING, TERM MINING, AND VISUALIZATION IMPROVING THE IMPACT OF SCHOLARLY PUBLISHING MONDAY 16 APRIL 2012 NICE, FRANCE Marjorie M.K. Hlava, President Jay Ven Eman, CEO Access Innovations, Inc. 1

What we will cover today Term and Text Mining The basics of visualization Case studies Using subject terms as metrics Applications Visualizing the results

Definitions Term Mining - a systematic comparison processing algorithmic method to find patterns in text Text Mining – using controlled vocabulary tags in text to find patterns and directions Term & text mining  Many similarities  Can be complimentary; not mutually exclusive

Term mining Precise  Meaningful semantic relationships; contextual  Replicable; repeatable; consistent  Vetted; controlled  Based on a controlled vocabulary  Trends; gaps; relationship analysis; visualizations  Less data processing load

Text mining  Algorithmic; formulaic  Neural nets, statistical, latent semantic, co - occurrence  Serendipitous relationships  Sentiment; hot topics; trends  False drops; noise;  Misleading semantic relationships  Heavy processing load

Why take a visual look? Humans can process information 17 times faster in visual presentations Now data can be analyzed, manipulated and presented as visual displays. To see the trends effectively we need to make the data into rich graph-able formats 6

Visualization of data Needs −Measurement −Metrics −Numbers Shows −Adjacency −Relationships −Trends −Co – occurrence −Conceptual distance Is richer with −Linking −Semantic enrichment −Classification Supports −Forecasting −Trend analysis −Segmentation −Distribution 7

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony Man’s attention to visual display to convey knowledge is ancient 8

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony The art in maps is a longstanding tradition 9

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony Super imposing data is now common A mash up example 10 Traffic Injury Map UK Data Archive US National Highway Safety Administration Google Maps Base Accident categories include children automobile bicycle etc. Data time place type Source: JISC TechWatch: Data Mash-ups September 2010

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony Mash up of bird flight migrations and weather patterns

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony 12

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony How does it work?  Develop controlled vocabulary » Prefer one with hierarchy  Apply to full text » Or to the “heads”  Decide on data points to convey information  Divide the XML into graphable sections

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony Start with data – like this XML file 14

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony Index or tag using subject terms from thesaurus or taxonomy  date, category, taxonomy term, frequency 15

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony Many views of one set of data 16

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony Load to a visualization program Like Prefuse 17

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony Or Pajek 18

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony 19

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony National Information Center for Educational Media Albuquerque’s own » Sandia developed VxInsight » Access Innovations = NICEM Same data – several views Primary and Secondary Education in US Shows the US Valley of Science Little Science taught in elementary years 20

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony

Using visualization to show  From a society / publisher perspective » Identify Core, Boundary and Cross Border » Provides Indicators  Activity  Growth  Relatedness  Centrality » Locates Journal domains  From a thesaurus perspective » Identifies terms that are too broadly defined » Potential Improvements in thesaurus structure using topic structures 23

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony Case Study: Mapping IEEE thesaurus space  We are interested in an expanded map that includes adjacencies to the IEEE data » Expanded term set shows adjacent white space; opportunities for expansion  Overlaps and edges of the science » We need comparison data  Learn the directions in the field » Low occurrence rate in IEEE documents? » Linkage to terms in IEEE documents?  Where do we find these terms? How can we add them? 24

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony The process  Built a rule base to auto index IEEE content » “90 % accuracy out of the box on journal data”* » “80% out of the box on proceedings data”*  The overlapping data sets » Auto indexed 1.2 million Xplore records » Auto indexed 10 years of US Patent data » Auto indexed 10 years of Medline  Term sets used » IEEE thesaurus terms rule base » Medical Subject Headings (MeSH) (and simple rule base) » Defense Technical Information Center (DTIC) Thesaurus ( and simple rule base) » Similar level of detail to current IEEE thesaurus terms 25

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony Defining expanded term space 26 IEEE 2k terms 1.2M documents 1. The data - Select related corpus 14k DTIC 475k patents 24k MeSH PubMed 525k docs

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony Defining expanded term space 27 IEEE 2k terms 1.2M documents 2. Identify related terms Use the IEEE Thesaurus to index the three collections

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony Defining expanded term space 28 IEEE 2k terms 1.2M documents 2. Identify related terms Use MESH and DTIC to also index the three collections

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony 29 IEEE 2k terms 1.2M documents 3. Resulting term set The co-indexed items from the three collections Defining expanded term space

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony Term:Term Matrix Where do the articles and their indexing intersect? Defining expanded term space

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony 31 Visualization Strategies Matrix Visualization Software

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony 32 All data up-posted to the top level

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony 33 Many map options IEEE ExperiencePrevious Experience

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony 34 Sensors Council Nucl Plasma Sci Soc Nanotech Council Ultrason, Ferro … Prod Saf Engng Soc Oceanic Engng Soc Geosci Rem Sens Soc Council Supercond Compon, Packag … Instr Measur Soc Magnetics Soc Dielectr El Insul Soc Electromag Compat Soc Antennas Propag Soc Power Electron Soc Electron Dev Soc Circuits & Systems Power & Energy Soc Industry Appl Soc Solid St Circuits Soc Industr Electr Soc Microwave Theory Soc Aerosp Electr Sys Soc Sys Man Cyber Society Computer Intelligence Society Systems Council Reliability Society Education Society Prof Commun Society Computer Society Robot Autom Soc Social Impl Techn Council Electr Design Auto Signal Proc Soc Intell Transp Sys Soc Commun Soc Info Theory Soc Vehicular Techn Soc Consumer Electr Soc Broadcast Techn Soc Photonics Soc Eng Med Biol Sci IEEE Portfolio

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony 35 Radial Visualization

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony 36 Publication Strategy JASIST reference

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony 37 Conference Strategy

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony 38 Turbines Measurement Circuits Amplifiers Displays Games Toys Flow Cooling Heating Components Gearing Brakes Dynamics Vehicles, Parts Disk Optics Photochem Molding Conductors Coatings Lasers Lamps Motors Plants, Micro-orgs Control Boats Oilfield Services Med Instruments Welding Conveyers Rubber Acyclic Comp Footwear Lubricants Radiology Catalysis Macromolecules Sprayers Electrochem Fitness Hygiene Cleaning Printing Paper IC Engines Magn/Elect Magnets Textiles Layers Medical Devices Clocks Pipes Valves Blasting Cables Appliances Outerwear Exhaust Pumps Tools Furniture Packaging Aircraft Semiconductors Use a Thesaurus to Label Maps Agriculture Food Consumer Products Construction Automotive + Defense Industrial Products Leisure Energy Telecom Computer HW/SW Electronics Chemicals Pharma Metals Health Care

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony Questions Answered  Is there a way, using our own information, to forecast our direction?  Where is the industry headed? What about by technology sector?  Does our coverage match our mission and vision?  Can we become smarter about our data and potential markets using our collection in new ways? Are the societies publishing and talking about what their charter indicates they cover?  What are the trends – are topics emerging/cooling?  Can we use technology and our own data to explore these questions while enhancing our data? 39

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony The research team  Access Innovations / Data Harmony » Founded in 1978 » Data enrichment and normalization » Suite of Semantic Enrichment tools  SciTechStrategies » Understanding data through visualization  IEEE Indexing & Abstracting Group 40

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony We looked at visualization of data  Finding the Metrics » Measurement » Numbers » Terms as indicators  Ways to show » Adjacency » Relationships » Trends » Co – occurrence » Conceptual distance  How to enrich with » Linking » Semantic enrichment » Classification  Maps supporting » Forecasting » Trend analysis » Segmentation » Distribution 41

Well Formed Data Semantic Enrichment Taxonomies Access Innovations Data Harmony Effective maps require  Contextual data  Detailed data  Classification methods  At least two directions in the matrix  A little art for fun 42

43 It just takes a little imagination Thank you Marjorie M.K. Hlava President Jay Ven Eman, CEO Access Innovations