LIS618 lecture 10. Thomas Krichel, 2003-04-23.


LIS618 lecture 10 Thomas Krichel

Structure
some repeats from last week
other special syntaxes
usenet news in google
open directory project in google

query language II
The * character is a wildcard for any word.
+stopword requires the presence of the stop word stopword. The list of stop words has not been published; in fact it varies from query to query.
There is a limit of 10 words per query, but a * does not count towards the limit.
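The word limit can be illustrated with a minimal sketch. It assumes that words are separated by whitespace and that only a bare * is exempt from the count; the function names are made up for this example and how Google actually tokenizes a query is not documented here.

```python
def counted_words(query: str) -> int:
    """Count whitespace-separated words, ignoring bare * wildcards."""
    return sum(1 for word in query.split() if word != "*")


def within_word_limit(query: str, limit: int = 10) -> bool:
    """Return True if the query stays within the word limit."""
    return counted_words(query) <= limit


query = '"Copyright * The New York Times" "George Bush"'
print(counted_words(query), within_word_limit(query))
```

Replacing a word that does not matter by * is therefore a way to fit a longer phrase under the limit.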

special syntax I
intitle: finds the term in the title only. example: "intitle: google"
intext: finds the term in the text only. This will exclude occurrences of the search term in anchor or title data. example: "intext: html"
inanchor: requests pages for which there is another page that links to them with the anchor text in the query. example: inanchor:"a list of my courses" finds my courses page because a page links to it with that text.

special syntax cache:
cache: shows pages that are in the google cache. This is useful if a query result has nothing to do with the query terms.
cache:openlib.org/home/krichel will show the cached version of the page. If you add further terms, they will be highlighted.

daterange: special syntax
limits the search to pages indexed within a range of dates.
When the crawler visits a page, changed pages are reindexed and unchanged pages are not.
Dates are expressed as Julian day numbers, i.e. the number of days since noon (12:00 UTC) on January 1, 4713 BC in the Julian calendar.
Today is
example: daterange:
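Julian day numbers are awkward to work out by hand, so here is a minimal sketch of the conversion. It relies on the fact that Python's date.toordinal() counts days from 0001-01-01 in the proleptic Gregorian calendar, which differs from the Julian day number by a constant. The function names are made up for this example, and the start-end form of the daterange: term is an assumption, not something stated on the slide.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal (0001-01-01 is day 1)
# and the Julian day number of the same calendar date.
ORDINAL_TO_JDN = 1721425


def julian_day_number(day: date) -> int:
    """Return the Julian day number of a calendar date."""
    return day.toordinal() + ORDINAL_TO_JDN


def daterange_term(start: date, end: date) -> str:
    """Build a daterange: term (assumed form: daterange:<start>-<end>)."""
    return f"daterange:{julian_day_number(start)}-{julian_day_number(end)}"


print(julian_day_number(date(2003, 4, 23)))
print(daterange_term(date(2003, 4, 1), date(2003, 4, 23)))
```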

mixing special syntax expressions
The link: syntax does not mix with others.
Other bad ideas:
– "site:openlib.org -inurl:openlib"
– "site:edu site:com"
Things that work well:
– intitle:search
– intitle:biology inurl:help
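Two of these rules are mechanical enough to check before sending a query. The following is a rough sketch only: the operator list and function names are made up for this example, it flags link: combined with any other special syntax and repeated site: restrictions, and it does not try to detect contradictory pairs such as site:openlib.org -inurl:openlib.

```python
# Operator names taken from the slides; the set itself is an assumption.
KNOWN_OPERATORS = {"intitle", "intext", "inanchor", "inurl",
                   "site", "link", "cache", "daterange"}


def operators_in(query):
    """List the special-syntax operators used in a query, in order."""
    found = []
    for token in query.split():
        # Drop a leading exclusion sign, plus sign, bracket or quote.
        token = token.lstrip('-+("')
        name, sep, _ = token.partition(":")
        if sep and name.lower() in KNOWN_OPERATORS:
            found.append(name.lower())
    return found


def mixing_warnings(query):
    """Return warnings for combinations the slide advises against."""
    ops = operators_in(query)
    warnings = []
    if "link" in ops and len(ops) > 1:
        warnings.append("link: does not mix with other special syntaxes")
    if ops.count("site") > 1:
        warnings.append("more than one site: restriction")
    return warnings


print(mixing_warnings("site:edu site:com"))
print(mixing_warnings("intitle:biology inurl:help"))
```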

Examples
George Bush site:nytimes.com
"Copyright * The New York Times" "George Bush"
intitle:"directory * * trees"
Botany intitle:"directory of" site:edu
"powered by blogger" or site:blogspot.com
"classical music" (inurl:mailman | inurl:listserv)

phonebook: special syntax
also rphonebook: for residential and bphonebook: for businesses
A location seems to be required, i.e.
– phone: long island university
– phone: long island university ny
no
– wildcards
– exclusions
– or

stocks on google stocks: ticker will look up a ticker symbol ticker at you can find ticker symbols there ticker symbols are useful to find financial information about publicly traded companies.

google images
it has the following special syntaxes
– intitle: searches for images on a page with a given title, "intitle: long island university"
– inurl: searches for images in pages that have a certain url, inurl:liu.edu
– site: restricts the search to a certain site, should be combined with a search term like "site:liu.edu koenig"

Google interfaces to 3rd party data
Google groups are an interface to usenet news.
Google directory is an interface to the Open Directory Project.
In both cases Google is dependent on the quality of these underlying data sources.

usenet news
Usenet is a collection of user-submitted notes on various subjects that are posted to servers on a worldwide network.
Each subject collection of posted notes is known as a newsgroup. A newsgroup is a discussion about a particular subject consisting of notes written to a networked site and distributed through Usenet.
Newsgroups are hierarchical. Hierarchical levels are separated by dots. example: comp.text.tex
alt is jokingly said to stand for "anarchists, lunatics and terrorists".
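The dotted naming scheme can be unpacked in a few lines. This is a minimal sketch with a made-up function name; it only splits the name on dots and says nothing about which hierarchies actually exist.

```python
def hierarchy_levels(newsgroup: str) -> list[str]:
    """Return every level of the hierarchy, from the top down."""
    parts = newsgroup.split(".")
    return [".".join(parts[:depth]) for depth in range(1, len(parts) + 1)]


print(hierarchy_levels("comp.text.tex"))
print(hierarchy_levels("rec.music.classical.recordings"))
```

Read from left to right, each level narrows the subject: comp covers computing, comp.text covers text processing, and comp.text.tex covers TeX.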

usenet history
The idea of network news was born in 1979 when two graduate students, Tom Truscott and Jim Ellis, thought of using UUCP to connect machines for the purpose of information exchange among users. They set up a small network of three machines in North Carolina.
UUCP is "UNIX to UNIX copy", a protocol that is used to copy files between machines running some flavor of UNIX, without the need for the IP protocol.
Usenet is older than the Internet.

decline of usenet
essentially open to all (peer-to-peer system)
used by spammers for
– posting
– gathering addresses
steady decline of quality of contributions
steady decline of quantity of contributions

usenet worth checking out
independent reviews of products, often written by experts.
Example: interpretation of Beethoven sonatas by Wilhelm Kempff. Sorting by date reveals that the newsgroup rec.music.classical.recordings is still active.
On a good day, you will find no finer guide to records.

special syntax for usenet
group: limits the search to postings in a certain group
title: limits the search to titles of postings
author: searches for author name or address
Mixing syntaxes works well

the open directory project
"The Open Directory Project is the largest, most comprehensive human-edited directory of the Web. It is constructed and maintained by a vast, global community of volunteer editors."
They claim that there is a historical precedent in the Oxford English Dictionary.
Formerly known as "GnuHoo", then "NewHoo", then acquired by Netscape, and called "dmoz".

dmoz.org
dmoz is maintained by volunteer "net-citizens". No special qualifications are required, but the editors are claimed to be experts.
There are about 30,000 volunteers (they claim).
Powers the core directory services for the Web's largest and most popular search engines and portals
– Netscape Search
– AOL Search
– Google
– Lycos
– HotBot
– DirectHit
Headquarters run by Netscape

Thank you for your attention!