Background “Dynamic” web –Blogs The most looked-up word on Merriam-Webster's internet site this year –RSS Feeds Mass Media.

Presentation transcript:

Background
“Dynamic” web
  – Blogs: the most looked-up word on Merriam-Webster's internet site this year
  – RSS Feeds
Mass Media

RSS Feed
Content-syndication technology
  – Provides a site's content for use by other services
Massively popular
The content (feed) consists of:
  – The directed content itself
  – Metadata: information about the content

RSS Feeds (cont.)
Headlines
Links to other stories
Stripped of layout
Mainly to notify users of site updates
XML
For example:
  – Blogs
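As a rough illustration of the feed structure described above (headlines, links, and metadata carried as XML with no layout), the following sketch parses a small RSS 2.0 document with Python's standard library. The sample feed, its URLs, and the function name `parse_feed` are made up for this example and do not come from the presentation.

```python
import xml.etree.ElementTree as ET

# A hypothetical RSS 2.0 feed: each <item> carries a headline, a link,
# and metadata (publication date, short description), with no page layout.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>A hypothetical feed</description>
    <item>
      <title>First headline</title>
      <link>http://example.com/posts/1</link>
      <pubDate>Mon, 06 Dec 2004 10:00:00 GMT</pubDate>
      <description>Short summary of the first post.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/posts/2</link>
      <pubDate>Tue, 07 Dec 2004 09:30:00 GMT</pubDate>
      <description>Short summary of the second post.</description>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return the feed title and a list of item dictionaries."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    items = []
    for item in channel.findall("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "summary": item.findtext("description", default=""),
        })
    return {"feed_title": channel.findtext("title", default=""), "items": items}


feed = parse_feed(SAMPLE_FEED)
for entry in feed["items"]:
    print(entry["published"], "|", entry["title"], "|", entry["link"])
```

Real feeds vary (RSS 0.9x, 1.0/RDF, Atom), so a production reader would need to handle more than this one layout.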

How to Get Feeds
Feed Aggregators
  – Web based: BlogLines
  – Desktop: SharpReader, Straw
Search Engines
  – Snewp

How to Get Feeds (cont.)
Registries
  – Sites that list the details of thousands of feeds
  – Tested and categorized for ease of use
  – Offer tools and web services (XML-RPC)
  – Syndic8.com (170,000 RSS feeds)
  – Moreover.com

My Project
Lots of information on the Dynamic Web
Difficult to track
Aggregators and search engines don’t help very much because they:
  – Are not automated
  – Don’t consider recency

My Project (cont.)
Weight the word frequencies
Create a pattern
Understand what the world is paying attention to
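The slides do not say how the word frequencies are weighted. One possible reading, given the earlier point that existing tools ignore recency, is to let each occurrence count less as its article ages. The sketch below assumes an exponential decay with a configurable half-life; the function name, the half-life value, and the tokenisation rule are all assumptions made for illustration.

```python
import re
from collections import Counter


def weighted_word_frequencies(documents, half_life_hours=24.0):
    """Sum word counts, discounting older documents.

    documents: iterable of (text, age_hours) pairs.
    A document that is one half-life old contributes half as much
    per word as a brand-new one (assumed weighting scheme).
    """
    scores = Counter()
    for text, age_hours in documents:
        weight = 0.5 ** (age_hours / half_life_hours)
        for word in re.findall(r"[a-z']+", text.lower()):
            scores[word] += weight
    return scores


docs = [
    ("election results today", 0.0),   # just published
    ("election coverage", 24.0),       # one day old
]
scores = weighted_word_frequencies(docs)
for word, score in scores.most_common(3):
    print(f"{word}: {score:.2f}")
```

Tracking how these scores change between runs is one way the "pattern" on the slide might be built, though the presentation does not spell that out.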

System overview (diagram labels): Gathering RSS Feeds; Internet; Information Retrieval; User Interface; Database Backend

Gathering RSS Feeds
1. Download updated RSS feed from a registry

Gathering RSS Feeds (cont.)
Process the RSS file to get the RSS feed for each source
Download the data from the links provided in each RSS feed
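The gathering steps can be sketched as a small pipeline: fetch each feed, pull out the item links, then fetch the linked pages. The code below is only an outline under assumptions: it takes a list of feed URLs as given (how that list is obtained from a registry is not shown in the slides and is not modelled here), and the names `fetch_url`, `item_links`, and `gather` are hypothetical. The `fetch` parameter is injectable so the logic can be tried without network access.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def fetch_url(url, timeout=10):
    """Download a URL and return the raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def item_links(feed_xml):
    """Return the <link> of every <item> in an RSS document."""
    root = ET.fromstring(feed_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


def gather(feed_urls, fetch=fetch_url):
    """Download each feed, then each page the feed links to."""
    pages = {}
    for feed_url in feed_urls:
        for link in item_links(fetch(feed_url)):
            pages[link] = fetch(link)
    return pages


# Offline usage with a stand-in for the network:
fake_web = {
    "feed": b"<rss><channel><item><link>page1</link></item></channel></rss>",
    "page1": b"<html>hi</html>",
}
print(gather(["feed"], fetch=fake_web.__getitem__))
```

A real crawler would also need error handling, politeness delays, and a way to skip items it has already downloaded.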

HTML XHTML Information Retrieval Information/Data Retrieval

Definition is somewhat loose
  – DR (data retrieval): exact matching
  – IR (information retrieval): partial matching, best match
Separating content from metadata
Retrieving data from content
  – Specifically names
  – Words
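To make the exact-match versus best-match distinction concrete, here is a minimal sketch. Data retrieval is modelled as strict equality, and information retrieval as ranking by word overlap (Jaccard similarity). The similarity measure and the function names are assumptions chosen for brevity; the slides do not name a particular ranking method.

```python
def tokens(text):
    """Lower-case a string and split it into a set of words."""
    return set(text.lower().split())


def data_retrieval(documents, query):
    """Exact matching: a document is returned only if it equals the query."""
    return [doc for doc in documents if doc == query]


def information_retrieval(documents, query, top_k=3):
    """Best match: rank documents by word overlap with the query."""
    query_words = tokens(query)
    scored = []
    for doc in documents:
        doc_words = tokens(doc)
        union = query_words | doc_words
        score = len(query_words & doc_words) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


docs = ["rss feed registry", "blog search engine", "rss feed aggregator for blogs"]
print(data_retrieval(docs, "rss feed"))         # no document equals the query
print(information_retrieval(docs, "rss feed"))  # partial matches, best first
```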

Separating Content from Metadata
How to differentiate between the main content and the rest of the information
  – HTML/XHTML tags
  – Irrelevant information such as advertisements
  – Main content of the page
Irregularity of content
  – Some sites use comments to indicate the beginning and end of content
  – Needs experimentation with different sites to find a pattern
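For sites that do mark their content with comments, the separation might look like the sketch below, which keeps only the text between a start and an end comment and drops tags and script/style bodies. The marker strings `begin content` and `end content` are invented for this example; as the slide notes, each site would need its own pattern.

```python
from html.parser import HTMLParser


class MainContentExtractor(HTMLParser):
    """Collect text that appears between two marker comments."""

    def __init__(self, start_marker="begin content", end_marker="end content"):
        super().__init__()
        self.start_marker = start_marker  # hypothetical marker text
        self.end_marker = end_marker      # hypothetical marker text
        self.inside = False               # True while between the markers
        self.skip_depth = 0               # > 0 while inside <script>/<style>
        self.chunks = []

    def handle_comment(self, data):
        marker = data.strip().lower()
        if marker == self.start_marker:
            self.inside = True
        elif marker == self.end_marker:
            self.inside = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.inside and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_main_text(html):
    parser = MainContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><body><div>Ad: buy now</div>"
    "<!-- begin content --><p>Feeds are <b>popular</b> today</p>"
    "<script>var x = 1;</script><!-- end content -->"
    "<div>Footer</div></body></html>"
)
print(extract_main_text(page))
```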

Retrieving Data from Content
What to look for?
Words don’t tell much
  – Ambiguity
Nouns are difficult to find
  – Syntactic and semantic patterns
Names (people, places)
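One very simple syntactic pattern for spotting names is a run of two or more capitalised words that does not start a sentence. The sketch below implements only that heuristic, as an assumption about what such a pattern could look like; it misses single-word names and names at the start of a sentence, and it cannot tell people from places.

```python
import re

PUNCTUATION = ".,;:!?\"'()"


def candidate_names(text):
    """Return runs of two or more capitalised words, skipping sentence-initial words."""
    names = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        run = []
        for index, word in enumerate(sentence.split()):
            cleaned = word.strip(PUNCTUATION)
            capitalised = (
                len(cleaned) > 1 and cleaned[0].isupper() and cleaned[1:].islower()
            )
            in_name = capitalised and index > 0
            if in_name:
                run.append(cleaned)
            # A lower-case word or trailing punctuation ends the current run.
            if not in_name or word[-1] in PUNCTUATION:
                if len(run) >= 2:
                    names.append(" ".join(run))
                run = []
        if len(run) >= 2:
            names.append(" ".join(run))
    return names


sample = "Yesterday Tony Blair met George Bush in New York. The talks were short."
print(candidate_names(sample))
```

Semantic cues (for example, a gazetteer of known place names) would be needed to go beyond this, which is in line with the difficulties the slide lists.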

Future Work
Refining the RSS gathering
Doing research to improve the IR/DR processing
  – Semantics
  – Syntax
User interface