Building Topic/Trend Detection System based on Slow Intelligence


Chia-Chun Shih & Ting-Chun Peng
Institute for Information Industry, Taipei, Taiwan
Presented at DMS’10 special session on Slow Intelligence Systems

Agenda
- Introduction
- Topic/Trend Detection System
- Topic/Trend Detection System with Slow Intelligence
- Conclusion

Introduction

Introduction
- Social media is prevailing (figure labels: Blog Posts, Twitter Posts, Facebook Users)
- Social media is a reflection of the real world
  - An experiment from HP Social Computing Lab shows that Twitter-rate time series can accurately predict box-office movie sales, with adjusted R² = 0.973 (amazing!!)
- There is an emerging market for social media monitoring services, e.g., Nielsen Buzzmetrics, Radian6

Introduction (cont’d)
Topic Detection and Tracking (TDT)
- Initiated by DARPA in 1996
- Goal: discover the topical structure in unsegmented streams of news reporting as it appears across multiple media
- Tasks: Topic Detection, Topic Tracking, First Story Detection, Story Segmentation, Link Detection

Introduction (cont’d)
- Slow Intelligence provides a software development framework that lets systems with insufficient computing resources gradually adapt to their environments in order to handle complexity
[Figure: Slow Intelligence System, with labels Environment, Knowledge-based Controller, Problem, Solution, steps 1 to 4, Enumerator, Adaptor, Eliminator, Concentrator]

Introduction (cont’d)
- In this paper, we propose a design for an online topic/trend detection system for social media that takes advantage of Slow Intelligence.
- We identify four complexities in designing online topic/trend detection systems, along with the corresponding Slow Intelligence solutions.

Topic/Trend Detection System

Topic/Trend Detection System
Objective
- Detect current hot topics and predict future hot topics based on data collected from social media
Three components
- Crawler & Extractor: collects data and extracts information from social media
- Topic Extractor: detects hot topics from a set of text documents
- Trend Detector: detects trends (future hot topics) based on currently available data
[Figure: pipeline from Social Media through Crawler & Extractor, Topic Extractor, and Trend Detector, producing Current Hot Topics and Future Hot Topics]

Topic/Trend Detection System (cont’d)
Crawler & Extractor
- Note: extracts articles and metadata (title, author, content, etc.) from semi-structured web content
[Figure: Crawler & Extractor, with labels Social Media, User’s Keywords of Interest, Web Crawler, HTML documents, Information Extractor, Text documents, Web data DB, Topic Extractor]

Topic/Trend Detection System (cont’d)
Topic Extractor
- Topic Word Extraction: apply the TF-IDF scheme to generate the top-N topic words for each document
- Topic Word Clustering: apply a clustering algorithm to cluster topic words into topic groups; the topic groups are treated as “topics”
- Hot Topic Extraction: apply aging theory to find hot topics
[Figure: Topic Extractor, with labels Web data DB, Topic Word Extraction, Topic Word Clustering, Current topics, Hot topic extraction, Current Hot topics]
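
The Topic Word Extraction step can be illustrated with a minimal sketch. It assumes documents are already tokenized and uses the common tf × log(N/df) weighting; the slides do not say which TF-IDF variant the authors use, and the function name `top_n_topic_words` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def top_n_topic_words(documents, n=3):
    """Return the n highest-scoring TF-IDF words for each tokenized document."""
    doc_count = len(documents)
    # Document frequency: number of documents that contain each word.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    results = []
    for doc in documents:
        term_counts = Counter(doc)
        # Assumed weighting: relative term frequency times log inverse document frequency.
        scores = {
            word: (count / len(doc)) * math.log(doc_count / doc_freq[word])
            for word, count in term_counts.items()
        }
        # Sort by descending score, then alphabetically so ties are stable.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:n])
    return results

# Hypothetical sample data, not taken from the slides.
docs = [
    "new phone launch new phone review".split(),
    "movie review box office movie".split(),
    "phone battery review".split(),
]
topic_words = top_n_topic_words(docs, n=2)
print(topic_words)
```

A word such as "review" that appears in every document scores zero and drops out, which is the behavior the clustering step relies on to receive distinctive words only.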

Topic/Trend Detection System (cont’d)
Trend Detector
- The Trend Estimation Algorithm is a black box for now; however, it will “find its way” once Slow Intelligence is involved in the system
[Figure: Trend Detector, with labels Current topics, Trend Estimation Algorithms, Topic Trend (Future Hot Topics)]

Topic/Trend Detection System with Slow Intelligence

T/TD System with Slow Intelligence
Four complexities of designing online topic/trend detection systems
1. (Crawler & Extractor) It is infeasible to collect all web data with a limited amount of computing resources. The system needs data collection strategies that concentrate limited resources on collecting important web data.

T/TD System with Slow Intelligence (cont’d)
2. (Trend Detector) Many computation methods are available for estimating trends. Once parameter settings are also taken into account, there are too many combinations to choose from. Furthermore, the Internet is a changing environment, so the current best solution may not perform well in the future. The system needs to automatically (or at least quasi-automatically) find the best solution among many alternatives in a changing environment.
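
One way to picture this selection problem is the sketch below, which enumerates (algorithm, parameter) candidates, scores each on recent data, eliminates the weak ones, and concentrates on the best. The two forecasters, the backtest scoring, and all names are illustrative assumptions; the slides do not specify which trend estimation algorithms or evaluation measure the authors use.

```python
from itertools import product

def moving_average(history, window):
    """Forecast the next value as the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def linear_extrapolation(history, window):
    """Forecast the next value by extending the average step over the last `window` values."""
    recent = history[-window:]
    if len(recent) < 2:
        return recent[-1]
    step = (recent[-1] - recent[0]) / (len(recent) - 1)
    return recent[-1] + step

def backtest_error(forecast, window, series, holdout):
    """Mean absolute error of one-step-ahead forecasts over the last `holdout` points."""
    errors = [
        abs(forecast(series[:t], window) - series[t])
        for t in range(len(series) - holdout, len(series))
    ]
    return sum(errors) / len(errors)

def select_trend_estimator(series, holdout=3, keep=2):
    # Enumerate: every (algorithm, window) combination is a candidate.
    candidates = product([moving_average, linear_extrapolation], [2, 3, 4])
    # Score each candidate on the most recent observations.
    scored = sorted(
        (backtest_error(f, w, series, holdout), f.__name__, w) for f, w in candidates
    )
    # Eliminate: keep only the `keep` best candidates.
    survivors = scored[:keep]
    # Concentrate: commit to the best survivor until the next re-evaluation.
    return survivors[0], survivors

# Hypothetical hourly mention counts for one topic.
mentions_per_hour = [10, 12, 14, 16, 18, 20, 22, 24]
best, shortlist = select_trend_estimator(mentions_per_hour)
print(best)
```

Because the environment changes, a system along these lines would rerun `select_trend_estimator` periodically on fresh data rather than fix its choice once.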

T/TD System with Slow Intelligence (cont’d)
3. (Crawler & Extractor) The crawler needs to revisit websites at hourly or daily intervals to collect up-to-date data. Each site has a different amount of data to be updated and a different policy for restricting frequent access, neither of which is known beforehand. The system needs to find a feasible data collection schedule based on past experience.
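
A schedule learned from past experience could look roughly like the following sketch, which adjusts one site's revisit interval after every visit. The multipliers, the 1 to 24 hour bounds, and the function name `next_revisit_interval` are assumptions made for illustration, not values from the slides.

```python
def next_revisit_interval(current_hours, new_items, blocked,
                          min_hours=1.0, max_hours=24.0):
    """Adjust a site's revisit interval from the outcome of the last visit."""
    if blocked:
        # The site refused the request: back off sharply.
        proposed = current_hours * 2
    elif new_items == 0:
        # Nothing new was found: visit a little less often.
        proposed = current_hours * 1.5
    else:
        # New content was found: visit more often.
        proposed = current_hours / 2
    # Stay within the hourly-to-daily range.
    return max(min_hours, min(max_hours, proposed))

# Hypothetical visit outcomes: (number of new items, whether access was blocked).
interval = 6.0
for new_items, blocked in [(12, False), (8, False), (0, False), (0, True)]:
    interval = next_revisit_interval(interval, new_items, blocked)
    print(interval)
```

Each site keeps its own interval, so sites that update often and tolerate frequent access drift toward hourly visits while quiet or restrictive sites drift toward daily ones.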

T/TD System with Slow Intelligence (cont’d)
4. (Crawler & Extractor) Any change in a web page may disrupt the Extractors. When many websites are being monitored, an automatic repair mechanism for Extractors is needed. The repair mechanism needs to detect Extractor errors, find alternatives, and choose the best alternative to fix the disrupted Extractors.
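
The detect, find alternatives, choose best sequence can be sketched as follows. The sketch assumes an Extractor is a set of per-field regular expressions and that "error" means required fields go missing; the field list, threshold, patterns, and sample page are all hypothetical, and the slides do not describe the authors' actual repair mechanism.

```python
import re

REQUIRED_FIELDS = ("title", "author", "content")

def run_extractor(patterns, html):
    """Apply one regex per field; fields that do not match are left out."""
    record = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, html, re.DOTALL)
        if match:
            record[field] = match.group(1).strip()
    return record

def completeness(patterns, pages):
    """Fraction of required fields recovered across the sample pages."""
    found = 0
    for page in pages:
        record = run_extractor(patterns, page)
        found += sum(1 for field in REQUIRED_FIELDS if record.get(field))
    return found / (len(REQUIRED_FIELDS) * len(pages))

def repair_extractor(current, alternatives, pages, threshold=0.9):
    """Keep the current extractor if it still works; otherwise pick the best alternative."""
    if completeness(current, pages) >= threshold:
        return current
    return max(alternatives, key=lambda patterns: completeness(patterns, pages))

# Hypothetical extractor written for a site's old layout.
old = {
    "title": r"<h1>(.*?)</h1>",
    "author": r'<span class="author">(.*?)</span>',
    "content": r'<div class="body">(.*?)</div>',
}
# Hypothetical candidate replacements.
alt_a = {
    "title": r'<h2 class="headline">(.*?)</h2>',
    "author": r'<a rel="author">(.*?)</a>',
    "content": r"<article>(.*?)</article>",
}
alt_b = {
    "title": r'<h2 class="headline">(.*?)</h2>',
    "author": r'<a rel="author">(.*?)</a>',
    "content": r"<p>(.*?)</p>",
}
# Hypothetical page fetched after the site changed its layout.
pages = ['<h2 class="headline">Hot topic</h2><a rel="author">Ann</a><article>Text here</article>']
repaired = repair_extractor(old, [alt_b, alt_a], pages)
print(run_extractor(repaired, pages[0]))
```

Measuring completeness on a sample of recent pages doubles as the error detector and as the criterion for ranking alternatives, which keeps the repair loop to a single score.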

T/TD System with Slow Intelligence (cont’d)
1. SIS to help restrict the range of data collection
[Figure labels: Knowledge of data, Knowledge of algorithm]

T/TD System with Slow Intelligence (cont’d) 2. SIS to help select and adapt trend detection algorithms

T/TD System with Slow Intelligence (cont’d)
3. SIS to help schedule the Crawler

T/TD System with Slow Intelligence (cont’d) 4. SIS to help adapt Extractors

Conclusion

Conclusion
- An online trend detection system requires careful resource allocation and automatic algorithm adaptation to process large volumes of heterogeneous data.
- This research adopts Slow Intelligence, which provides a framework for systems with insufficient computing resources to gradually adapt to their environments, to respond to these challenges.
- Four Slow Intelligence subsystems are proposed, each targeting one challenge in designing online topic/trend detection systems.

If you have any questions, please e-mail us:
chiachun@iii.org.tw (Chia-Chun Shih)
markpeng@iii.org.tw (Ting-Chun Peng)