Ricardo Eito Brun, Strasbourg, 5 Nov 2015


Terminology extraction and the identification of research areas: an essay for Space Engineering

The value of Terminology as a research evaluation tool
From a pragmatic perspective, terminology work and terminography need to analyze the “language and vocabulary” used in the documents:
New terms are created to refer to new concepts, tools or techniques.
Terms are created by “combination” of existing terms.
The “frequency of use” of a term may be a good indicator of its “popularity”, or of the extent to which it has been adopted by a particular community.
A “decreasing frequency” of use may indicate a “lack of interest” in, or the “commoditization” of, a concept, tool or technique.

The value of Terminology as a research evaluation tool
In the context of scientific and technical knowledge:
“Recombination of existing knowledge” is a means of creating new knowledge. For example, applying a new technique or tool to a specific process may significantly improve its performance or capability.
The use of an existing method in a different context may help solve a known problem.
There are “research trends”, followed by researchers, funding agencies, etc., that guide research investment policies.

The value of Terminology as a research evaluation tool
How can terminology be useful in this context?
Is it possible to apply terminology analysis techniques to obtain a profile of research trends?
Can these terminological analyses be used with a retrospective purpose (history of techniques)?
Can these terminological analyses be used with a prospective purpose (identification of future research trends)?

Research Presentation
The purpose of this research is to use proven, widely available term extraction techniques, coupled with “bibliometric analysis” techniques, to characterize research trends (retrospective approach).
The focus of the research is the scientific and technical production of the European Space Agency (ESA).
A preliminary analysis has been run on a small set of patents (334 documents) to assess the feasibility of the approach.
Terminology extraction is done with AlchemyAPI. The TermMine tool is another candidate.

Context of Research
This activity is part of a bibliometric analysis of the ESA scientific and technical production. In the last 50 years, ESA staff have produced more than 12,000 scientific and technical articles and proceedings.
The bibliometric study aims to analyze:
Productivity: who the most productive authors are, and productivity by period.
Impact: number of citations received by the different researchers, and its evolution in time.
Collaboration patterns: to what extent ESA collaborated with other entities, and its evolution in time.
Areas of research, and their evolution in time.

Research objectives (current)
Identify the “subject areas” and topics in the research conducted by ESA in different time periods.
Characterize research topics by using techniques like term frequency and “word co-occurrence”.
Analyze the “lifecycle” (evolution) of different research topics.

Research objectives (future stage)
Compare the “terminological profile” of the patents released in a particular period with the “terminological profile” of the “basic research” conducted before. Is there any relationship between the basic research and its translation into “working innovations” (products, methods or services)?
Characterize research fronts by sets of well-defined terms.
Analyse the relationships between citing and cited documents from a terminological perspective: is the research described in a specific document the result of recombining the terms used in previously conducted (cited) research?
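
The slides do not say how two “terminological profiles” would be compared. As one possible illustration only, the sketch below treats a profile as a mapping from term to weight and scores the overlap with cosine similarity; the measure, the function name and the sample values are assumptions, not part of the presented work.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two {term: weight} profiles.

    Returns a value between 0.0 (no shared terms) and 1.0 (same
    direction) when all weights are non-negative.
    """
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[t] * profile_b[t] for t in shared)
    norm_a = math.sqrt(sum(w * w for w in profile_a.values()))
    norm_b = math.sqrt(sum(w * w for w in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical profiles: basic research in one period, patents in a later one.
research_profile = {"ion propulsion": 0.9, "solar array": 0.4}
patent_profile = {"ion propulsion": 0.7, "thermal control": 0.5}

similarity = cosine_similarity(research_profile, patent_profile)
```

A higher score would only say that the two sets of documents use similar vocabulary; whether that reflects a real transfer from basic research to innovation still needs expert judgement.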

Research steps 1. Identify source documents
WoS Derwent database.
Data set: 334 patents.
Export the result set to “tagged format”.

Research steps 2. Convert results to XML and split records
Custom XSLT style sheet with Altova® MapForce®.
Convert the “tagged” data set into XML.
Split the record set into individual records.
Keep only the “relevant fields”: title, abstract, keywords.
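
The conversion in the slides is done with an XSLT style sheet, which is not shown. The Python sketch below is a stand-in that illustrates the same idea: split a tagged export into records and keep only title, abstract and keywords. It assumes a layout with a two-letter tag at the start of each field, indented continuation lines, and an `ER` line closing each record; the tags `TI`, `AB` and `DE` and the sample data are assumptions and may differ in a real Derwent export.

```python
import xml.etree.ElementTree as ET

# Assumed field tags; adjust to the tags used in the actual export.
KEEP = {"TI": "title", "AB": "abstract", "DE": "keywords"}


def parse_tagged(text):
    """Split a tagged export into a list of {tag: value} records."""
    records, current, tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        head = line[:2]
        if head == "ER":                 # end of record
            if current:
                records.append(current)
            current, tag = {}, None
        elif head.strip() == "":         # indented continuation line
            if tag is not None:
                current[tag] += " " + line.strip()
        else:                            # new field
            tag = head
            current[tag] = line[3:].strip()
    return records


def record_to_xml(doc_id, record):
    """Serialize one record, keeping only the fields listed in KEEP."""
    root = ET.Element("record", id=str(doc_id))
    for tag, name in KEEP.items():
        if tag in record:
            ET.SubElement(root, name).text = record[tag]
    return ET.tostring(root, encoding="unicode")


# Made-up sample in the assumed layout.
sample = """PT P
TI Ion thruster with
   improved grid
AB A thruster is described.
DE propulsion; ion engine
ER

PT P
TI Deployable solar array
ER
"""

records = parse_tagged(sample)
xml_docs = [record_to_xml(i, r) for i, r in enumerate(records, start=1)]
```

Each string in `xml_docs` can be written to its own file, which matches the “one record per file” input expected by the batch extraction step.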

Research steps 2. Terminology extraction
Running the AlchemyAPI tool in batch mode.
It is a command-line tool that generates as output the set of “terms” and their “relevance”.
A built-in PHP script processes the set of files.
The result is a file with docId, term and weight in the document.
AlchemyAPI can be called from different programming languages.
It does not extract only “words”, but also “terms made up of two or more words”.

Research steps 3. Terminology clean-up
The output generated by the tool was visually inspected to identify “extracted terms” that should be removed and not processed further.
This happened mostly with words appearing in section titles or with generic terms (e.g. Advantage).
Defining “stop word lists” to guide term extraction is being considered as a way to get more accurate results.
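
A stop-word list of this kind can be applied as a simple filter over the (docId, term, weight) rows. The sketch below is only an illustration: apart from “advantage”, which the slide mentions, the stop terms, document identifiers and weights are invented.

```python
# Hypothetical stop list; only "advantage" comes from the slide.
STOP_TERMS = {"advantage", "description", "claim"}


def clean_terms(rows, stop_terms=STOP_TERMS):
    """Drop rows whose term is in the stop list (case-insensitive)."""
    return [
        (doc_id, term, weight)
        for doc_id, term, weight in rows
        if term.strip().lower() not in stop_terms
    ]


# Made-up extraction output: (docId, term, weight).
rows = [
    ("P001", "ion thruster", 0.91),
    ("P001", "Advantage", 0.40),
    ("P002", "solar array", 0.77),
]

cleaned = clean_terms(rows)
```

Keeping the list in one place makes it easy to refine after each visual inspection and to re-run the clean-up on the same extraction output.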

Research steps 4. Analysis of terms used in a specific period
Extracted terms are “tagged” with a specific time period.
The extraction process was run with a “five-year period”. This can be changed to a one-year period.
Tagging extracted terms with dates makes it possible to follow the use of terms over time.
Word clouds can be generated with the most relevant terms per period.
Word clouds give a quick overview of the “main concepts” involved in research for that period.
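
The period tagging can be sketched as follows, assuming each document has a known year and that term weights within a period are simply summed. The anchor year, the summing rule and the sample data are assumptions; the slides do not state how weights are aggregated.

```python
from collections import defaultdict


def period_label(year, start=1965, width=5):
    """Label of the period containing `year`.

    `start` is a hypothetical anchor year for the first period.
    """
    first = start + ((year - start) // width) * width
    return f"{first}-{first + width - 1}"


def weights_by_period(rows, doc_years, width=5):
    """Sum term weights per period: {period: {term: total weight}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for doc_id, term, weight in rows:
        period = period_label(doc_years[doc_id], width=width)
        totals[period][term] += weight
    return totals


def top_terms(totals, period, n=3):
    """The n heaviest terms of a period, e.g. as input for a word cloud."""
    ranked = sorted(totals[period].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


# Made-up rows and publication years.
rows = [
    ("P1", "ion thruster", 0.9),
    ("P2", "ion thruster", 0.6),
    ("P2", "solar array", 0.8),
    ("P3", "solar array", 0.7),
]
doc_years = {"P1": 2001, "P2": 2003, "P3": 2007}

totals = weights_by_period(rows, doc_years)
```

Passing `width=1` gives the one-year variant mentioned in the slide without changing the rest of the code.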

Research steps 5. Evolution of the “Use of terms”
The evolution of the “weight” of individual terms may be relevant to identify “research trends”. These values show how important the term was in the different periods.
At this stage, a second conceptual analysis is needed to group terms that refer to more generic or more specific concepts.
Setting up these hierarchies allows an analysis at different levels, e.g. “research on propulsion” or “research on combined ion-electric propulsion”.
Setting up these hierarchies is manual work done with the support of subject experts.
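
Once the experts have defined the hierarchy, rolling specific terms up to a broader concept is mechanical. The sketch below assumes a one-level mapping from narrower term to broader concept and period weights stored as {period: {term: weight}}; the mapping entries and the weights are illustrative, not taken from the study.

```python
from collections import defaultdict

# Hypothetical narrower-term -> broader-concept mapping (built manually in the study).
BROADER = {
    "combined ion-electric propulsion": "propulsion",
    "ion thruster": "propulsion",
}


def roll_up(period_weights, broader=BROADER):
    """Aggregate term weights to their broader concept; unmapped terms are kept."""
    result = {}
    for period, terms in period_weights.items():
        aggregated = defaultdict(float)
        for term, weight in terms.items():
            aggregated[broader.get(term, term)] += weight
        result[period] = dict(aggregated)
    return result


def trend(period_weights, term):
    """Weight of one term or concept in each period, in chronological order."""
    return [(p, period_weights[p].get(term, 0.0)) for p in sorted(period_weights)]


# Made-up weights per period.
period_weights = {
    "2000-2004": {"ion thruster": 0.5, "solar array": 0.25},
    "2005-2009": {"ion thruster": 0.75, "combined ion-electric propulsion": 0.5},
}

rolled = roll_up(period_weights)
propulsion_trend = trend(rolled, "propulsion")
```

The same `trend` function works on the raw weights and on the rolled-up weights, so the analysis can be run at either level of the hierarchy.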

Research steps 6. “Concept identification”
Co-occurrence of terms is considered a good indicator for identifying “relationships between concepts”.
This idea has been widely used in bibliometric analysis and Information Retrieval to analyse “areas of knowledge”.
In our case, term co-occurrence may be considered an indicator of “patterns of knowledge re-combination”.
Co-occurrence is calculated with the BibExcel tool, using the output generated with AlchemyAPI.
Note: some bibliometric tools perform co-occurrence analysis, but they work on “single words”, not on “compound terms”.
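
In the study the counting is done by BibExcel. The sketch below only illustrates what a document-level co-occurrence count over compound terms looks like: two terms co-occur once for every document in which both appear. The rows are made up.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence(rows):
    """Count, for each pair of terms, the documents in which both appear."""
    terms_by_doc = defaultdict(set)
    for doc_id, term, _weight in rows:
        terms_by_doc[doc_id].add(term)

    pairs = Counter()
    for terms in terms_by_doc.values():
        # Sorting makes (a, b) and (b, a) the same key.
        for a, b in combinations(sorted(terms), 2):
            pairs[(a, b)] += 1
    return pairs


# Made-up (docId, term, weight) rows.
rows = [
    ("P1", "ion thruster", 0.9),
    ("P1", "xenon feed system", 0.6),
    ("P2", "ion thruster", 0.8),
    ("P2", "xenon feed system", 0.5),
    ("P2", "solar array", 0.4),
    ("P3", "solar array", 0.7),
]

pair_counts = cooccurrence(rows)
```

Because each term is kept as a whole string, “xenon feed system” is counted as one unit rather than as three separate words.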

Research steps 7. “Drawing conclusions”
Graphical representations can be generated with BibExcel for the Pajek and VOSviewer tools.
These are dynamic tools that may be used to “explore” the network of related terms.
This analysis can be restricted to specific time periods.
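
BibExcel produces the network files in the study. As an illustration of what such a file contains, the sketch below writes co-occurrence counts in Pajek's plain-text network layout (a `*Vertices` section followed by an `*Edges` section with weights). The input dictionary is made up, and labels are assumed not to contain double quotes.

```python
def to_pajek(pair_counts):
    """Render {(term_a, term_b): count} as text in Pajek network layout."""
    terms = sorted({term for pair in pair_counts for term in pair})
    index = {term: i for i, term in enumerate(terms, start=1)}

    lines = [f"*Vertices {len(terms)}"]
    lines += [f'{index[term]} "{term}"' for term in terms]
    lines.append("*Edges")
    for (a, b), count in sorted(pair_counts.items()):
        lines.append(f"{index[a]} {index[b]} {count}")
    return "\n".join(lines) + "\n"


# Made-up co-occurrence counts.
pair_counts = {
    ("ion thruster", "xenon feed system"): 2,
    ("ion thruster", "solar array"): 1,
}

network_text = to_pajek(pair_counts)
```

Restricting the analysis to a time period amounts to computing `pair_counts` from the documents of that period only and exporting one network per period.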

Way Forward
Preliminary results are satisfactory:
Terminology extraction tools perform well, although some pre- and post-processing is still needed.
Visual displays are a useful way to present terms and show their relevance and relationships (based on co-occurrence).
Running the analysis on a bigger set of records is expected to increase the quality of the results (as well as the complexity of “data cleaning”).
Differences in the “term profile” of sets of documents may be considered evidence of the “knowledge recombination” that leads to innovation.

Way Forward
But:
The analysis is restricted to title, abstract and keywords because full-text search is not available in the chosen database.
Stop-word lists need to be refined for a better data “clean-up”.
To get a detailed characterization of research fields, it is still necessary to identify relationships between concepts (mainly IS_A / BT/NT and RT) to support an analysis at different levels of aggregation.
A way to show and compare the evolution of the “terms” on user demand needs to be automated.
Data is currently kept in files (XML, Excel and text). A database is needed for further analysis.