Publication Output on the Topical Area of "Energy" and Real Estate (Education) Bob Martens.

Slides:



Advertisements
Similar presentations
Chapter 5: Introduction to Information Retrieval
Advertisements

Improved TF-IDF Ranker
A review on “Answering Relationship Queries on the Web” Bhushan Pendharkar ASU ID
A New Suffix Tree Similarity Measure for Document Clustering Hung Chim, Xiaotie Deng City University of Hong Kong WWW 2007 Session: Similarity Search April.
Information Retrieval in Practice
Disasters and Human Factors Literature Nestor L Osorio Northern Illinois University.
Database Management Systems, R. Ramakrishnan1 Computing Relevance, Similarity: The Vector Space Model Chapter 27, Part B Based on Larson and Hearst’s slides.
Re-ranking Documents Segments To Improve Access To Relevant Content in Information Retrieval Gary Madden Applied Computational Linguistics Dublin City.
Reference Collections: Task Characteristics. TREC Collection Text REtrieval Conference (TREC) –sponsored by NIST and DARPA (1992-?) Comparing approaches.
Approaches to automatic summarization Lecture 5. Types of summaries Extracts – Sentences from the original document are displayed together to form a summary.
Recommender systems Ram Akella February 23, 2011 Lecture 6b, i290 & 280I University of California at Berkeley Silicon Valley Center/SC.
ITCS 6010 Natural Language Understanding. Natural Language Processing What is it? Studies the problems inherent in the processing and manipulation of.
Chapter 5: Information Retrieval and Web Search
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 An Efficient Concept-Based Mining Model for Enhancing.
Systematic Approaches to Literature Reviewing. The Literature Review ? “Literature reviews …… introduce a topic, summarise the main issues and provide.
Using Text Mining and Natural Language Processing for Health Care Claims Processing Cihan ÜNAL
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
Mining and Analysis of Control Structure Variant Clones Guo Qiao.
1 Literature review. 2 When you may write a literature review As an assignment For a report or thesis (e.g. for senior project) As a graduate student.
Text Mining In InQuery Vasant Kumar, Peter Richards August 25th, 1999.
RCDL Conference, Petrozavodsk, Russia Context-Based Retrieval in Digital Libraries: Approach and Technological Framework Kurt Sandkuhl, Alexander Smirnov,
Sharing lessons through effective modelling Hilary Dexter University of Manchester Tom Franklin Franklin Consulting.
Efficiently Computed Lexical Chains As an Intermediate Representation for Automatic Text Summarization H.G. Silber and K.F. McCoy University of Delaware.
Controlling Overlap in Content-Oriented XML Retrieval Charles L. A. Clarke School of Computer Science University of Waterloo Waterloo, Canada.
1 CS 430: Information Discovery Lecture 3 Inverted Files.
Chapter 6: Information Retrieval and Web Search
Using a Named Entity Tagger to Generalise Surface Matching Text Patterns for Question Answering Mark A. Greenwood and Robert Gaizauskas Natural Language.
Comparing and Ranking Documents Once our search engine has retrieved a set of documents, we may want to Rank them by relevance –Which are the best fit.
Collocations and Information Management Applications Gregor Erbach Saarland University Saarbrücken.
1 FollowMyLink Individual APT Presentation Third Talk February 2006.
Basic Implementation and Evaluations Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.
Vector Space Models.
Using a Named Entity Tagger to Generalise Surface Matching Text Patterns for Question Answering Mark A. Greenwood and Robert Gaizauskas Natural Language.
Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.
Domain Model A representation of real-world conceptual classes in a problem domain. The core of object-oriented analysis They are NOT software objects.
DATA ANALYSIS Indawan Syahri.
1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:
1 Which came first – terminology or models? Contributed Paper No. 28 Miroslava Brchanova.
Compact Query Term Selection Using Topically Related Text Date : 2013/10/09 Source : SIGIR’13 Authors : K. Tamsin Maxwell, W. Bruce Croft Advisor : Dr.Jia-ling,
Event-Based Extractive Summarization E. Filatova and V. Hatzivassiloglou Department of Computer Science Columbia University (ACL 2004)
Text INTERNAL February 11, 2011 Problem Solving. INTERNAL Tech Republic’s railway department wants a solution Tech Republic’s railway department.
Toward Entity Retrieval over Structured and Text Data Mayssam Sayyadian, Azadeh Shakery, AnHai Doan, ChengXiang Zhai Department of Computer Science University.
Selecting Relevant Documents Assume: –we already have a corpus of documents defined. –goal is to return a subset of those documents. –Individual documents.
Text Similarity: an Alternative Way to Search MEDLINE James Lewis, Stephan Ossowski, Justin Hicks, Mounir Errami and Harold R. Garner Translational Research.
Data Science Dimensionality Reduction WFH: Section 7.3 Rodney Nielsen Many of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall.
An Effective Statistical Approach to Blog Post Opinion Retrieval Ben He, Craig Macdonald, Jiyin He, Iadh Ounis (CIKM 2008)
Information Retrieval in Practice
Ricardo EIto Brun Strasbourg, 5 Nov 2015
Queensland University of Technology
Linguistic Graph Similarity for News Sentence Searching
Sharing lessons through effective modelling
Computing and Data Analysis
Text Based Information Retrieval
Web News Sentence Searching Using Linguistic Graph Similarity
Statistics in Science.
Information Retrieval and Web Search
Search Engine Architecture
The Research Paper: An Overview of the Process
Improving a Pipeline Architecture for Shallow Discourse Parsing
Information Retrieval and Web Search
Search Techniques and Advanced tools for Researchers
Action Research Reviewing literature on the Research Problem
Text Mining with JMP Pro 13: A Case Study
HCC class lecture 13 comments
Bursty and Hierarchical Structure in Streams
Citation-based Extraction of Core Contents from Biomedical Articles
Search Engine Architecture
Building Valid, Credible, and Appropriately Detailed Simulation Models
Paper Reading and Writing
Information Retrieval and Web Design
Presentation transcript:

Publication Output on the Topical Area of "Energy" and Real Estate (Education) Bob Martens

Outline 1. Introduction 2. Data sources 3. Text mining 4. ERES Education Seminar 2017 case

1. Introduction Disciplines evolve over time Associations like ERES need to track this evolution support and adjust to emerging themes How can we identify those changes? How substantial are they? Which topics are rising? Which topics are declining? [See Maier/Martens ERES2017-Delft]

2. Data sources Records of the ERES conferences 2012-2017 as stored in the "conf-vienna" database Submission environment for the annual conference Representation of the scholarly discussion Statistics: Titles and abstracts of all accepted papers to these annual conferences 1,780 papers - 452,074 words - 15,399 different words Every word is repeated 29.47 times in average Most frequent word “the” is found 32,000 times

MySQL-view

3. Text mining An analytical method based on text rather than numbers The method is dependent on language It needs to take into account – to some extent – the complexity of languages: Text structure (sections, paragraphs), sentence structures, grammar Synonyms (two words mean the same), polysemes (one word with different meanings) Importance of context, irony, satire

Text mining – basic concepts Analysis of text in a (large) number of documents Text mining requires simplifying assumptions and cannot extract meaning perfectly from text Text mining incorporates linguistics, statistics, mathematics & computer science Bag of words model: Within each document the text is split into words. Every word is treated the same irrespective of its position in the document. Relationships between words are ignored (at this stage).

Text mining – basic concepts Not every word is really relevant: Some words occur very often, but do not carry much information. E.g., “the”, “a”, “of”, “and”, “in”. They are stopwords and excluded from the analysis. Some words occur in one document but most likely not in any other. Often these are numbers. E.g., “0.3993” will most likely not occur in another document. Such words are also excluded. Some words are different as strings, but mean the same. Differences due to grammar. E.g., “originate”, “originates”, “originating”. Such terms are changed to their common word stem  stemming

Data sources: „cleaning“ 15 most frequently used words in the corpus:  after stopword removal  after stemming

Data sources: „ranking“ The rankings of the terms are surprisingly stable over the years

Text mining – word pairs Shortcoming of the "bag of words" approach is that it does not store the relation between terms From the fact that “real”, “market”, and “estate” are the most frequent terms (text corpus) we cannot conclude that their combination – “real estate market” – is a key topic Deviation from the bag of words model and identify word pairs  dyads – for example “energy efficiency” Calculating the frequencies of dyads  Leads to the representation of ERES core areas

Overall frequency of dyads

Dyads: Volatility of ranking The secondary aspects “energy efficient” and “green building” are much more volatile in their position over the annual conferences (2012-2017) “Energy efficient” ranked 20th in 2012, 4th in 2016 and 30th in 2017 The dyad “green building” ranked 5th in 2012, 38th in 2015 and dropped out of the top 40 in 2017

4. ERES-EDU Case Data mining towards "Energy" AND "Real Estate“ Extraction of data with all the records (2012-2017), in which "energy", "real" and "estate“ appear at the same time Sorted by a score, which is the product of the respective number of three terms 68 papers – “energy” appears 272 times

Raw data

Top 10

Word cloud

Acknowledgements I would like to express my sincere gratitude to Gunther Maier, for maintaining "confvienna" as well as the data mining for this contribution!