LIS618 lecture 10, Thomas Krichel, 2003-04-23. Structure: some repeats from last week, other special syntaxes, Usenet news in Google, Open Directory Project.

1 LIS618 lecture 10 Thomas Krichel 2003-04-23

2 Structure: some repeats from last week; other special syntaxes; Usenet news in Google; the Open Directory Project in Google.

3 query language II: * is a wildcard for any word. +stopword requires the presence of the stop word stopword. But the list of stop words has not been published; in fact, it varies from query to query. There is a limit of 10 words per query, but a * does not count towards the limit.
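To make the word limit concrete, here is a minimal Python sketch that counts the words of a query toward the 10-word limit, skipping each *. It assumes that words are separated by whitespace and that quotation marks do not change the count; Google did not publish its exact counting rules, and the names `counted_words` and `within_limit` are hypothetical.

```python
from __future__ import annotations

MAX_WORDS = 10  # the limit stated on the slide


def counted_words(query: str) -> list[str]:
    """Return the words that count toward the limit (assumed rules)."""
    # Treat quotation marks as separators, then split on whitespace.
    words = query.replace('"', " ").split()
    # A bare * does not count towards the limit.
    return [w for w in words if w != "*"]


def within_limit(query: str) -> bool:
    """True if the query stays within the assumed 10-word limit."""
    return len(counted_words(query)) <= MAX_WORDS
```

For example, `"Copyright * The New York Times" "George Bush"` counts as seven words under these assumptions.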

4 special syntax I: intitle: finds the term in the title only, e.g. "intitle:google". intext: finds the term in the text only; this excludes occurrences of the search term in anchor or title data, e.g. "intext:html". inanchor: requests pages that another page links to using the query words as anchor text. Example: inanchor:"a list of my courses" finds my courses page because another page links to it with that text.
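A small helper can assemble such expressions. This Python sketch (the function name `special` is hypothetical) writes the operator directly before its value, with no space after the colon, and wraps a multi-word value in quotation marks so that the operator covers the whole phrase, as in the inanchor: example.

```python
def special(operator: str, value: str) -> str:
    """Return operator:value, e.g. special("intitle", "google") -> intitle:google."""
    already_quoted = value.startswith('"') and value.endswith('"')
    if " " in value and not already_quoted:
        # Quote a multi-word value so the operator applies to the phrase.
        value = f'"{value}"'
    return f"{operator}:{value}"
```

With this sketch, `special("inanchor", "a list of my courses")` gives `inanchor:"a list of my courses"`.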

5 special syntax: cache: shows pages as they are stored in the Google cache. This is useful if a page in the query result no longer has anything to do with the query terms, e.g. because it has changed since Google indexed it. cache:openlib.org/home/krichel will show the cached version of the page. If you add further terms, they will be highlighted.

6 daterange: special syntax: limits the search to pages indexed within a range of dates. When the crawler visits a page, a changed page is reindexed, but an unchanged page is not, so it keeps its old index date. Dates are expressed as Julian day numbers, i.e. the number of days since noon UTC on 1 January 4713 BC of the Julian calendar. Today is 2452739. Example: daterange:2452640-2452739
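Converting a calendar date to a Julian day number is a fixed offset from Python's proleptic Gregorian day ordinal: `date(1, 1, 1).toordinal()` is 1, and that day has Julian day number 1721426. The sketch below (function names hypothetical) uses this to build a daterange: expression; it assumes Gregorian calendar dates and ignores time zones and the noon start of the astronomical Julian day.

```python
from datetime import date

# Offset between Python's day ordinal (date(1, 1, 1) == 1) and the
# Julian day number of the same calendar day.
JDN_OFFSET = 1721425


def julian_day(d: date) -> int:
    """Julian day number of a (Gregorian) calendar date."""
    return d.toordinal() + JDN_OFFSET


def daterange(start: date, end: date) -> str:
    """Format a daterange: expression for the range start..end."""
    return f"daterange:{julian_day(start)}-{julian_day(end)}"
```

Under this conversion, `daterange(date(2002, 12, 31), date(2003, 4, 9))` yields `daterange:2452640-2452739`, the range used in the example above.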

7 mixing special syntax expressions: The link: syntax does not mix with others. Other bad ideas:
–"site:openlib.org -inurl:openlib" (every page on openlib.org has openlib in its URL)
–"site:edu site:com" (a page cannot be on both)
Things that work well:
–intitle:search
–intitle:biology inurl:help
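These rules of thumb can be encoded as a rough check. The following Python sketch is hypothetical and not a Google feature: it splits a query on whitespace and flags the three combinations warned about here, namely link: with other terms, more than one site:, and an -inurl: exclusion that matches the site: domain itself.

```python
from __future__ import annotations


def mixing_warnings(query: str) -> list[str]:
    """Flag operator combinations described as bad ideas (rough heuristic).

    Terms are split on whitespace only; quoted phrases are not parsed.
    """
    terms = query.split()
    warnings: list[str] = []

    # link: does not mix with any other term.
    if any(t.startswith("link:") for t in terms) and len(terms) > 1:
        warnings.append("link: does not mix with other terms")

    # Several site: restrictions, e.g. site:edu site:com, usually cannot all hold.
    sites = [t[len("site:"):] for t in terms if t.startswith("site:")]
    if len(sites) > 1:
        warnings.append("more than one site: restriction")

    # Excluding a URL part that every page of the restricted site contains.
    for t in terms:
        if t.startswith("-inurl:"):
            excluded = t[len("-inurl:"):]
            if excluded and any(excluded in s for s in sites):
                warnings.append(f"-inurl:{excluded} excludes the whole site")
    return warnings
```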

8 Examples
–George Bush site:nytimes.com
–"Copyright * The New York Times" "George Bush"
–intitle:"directory * * trees"
–botany intitle:"directory of" site:edu
–"powered by blogger" or site:blogspot.com
–"classical music" (inurl:mailman | inurl:listserv)
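Each of these queries can be run by passing it as the q parameter of Google's search URL. The Python sketch below assumes the endpoint https://www.google.com/search (the helper name `search_url` is hypothetical) and uses `urllib.parse.urlencode`, which percent-encodes colons, quotation marks, asterisks, parentheses and the | operator, and turns spaces into +.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.google.com/search"  # assumed web search endpoint


def search_url(query: str) -> str:
    """Build a search URL with the query in the q parameter."""
    return f"{SEARCH_URL}?{urlencode({'q': query})}"
```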

9 phonebook: special syntax, also rphonebook: for residential and bphonebook: for business listings. A location seems to be required; compare phone: long island university with phone: long island university ny. Not supported:
–wildcards
–exclusions
–OR
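A phonebook: query can be checked for these unsupported features before it is sent. This Python sketch (the function name is hypothetical) looks for *, a leading - on any word, and OR or |. It does not check whether a location is present, since that would need a list of place names.

```python
from __future__ import annotations


def phonebook_problems(query: str) -> list[str]:
    """List features of the query that phonebook: does not support."""
    terms = query.split()
    problems: list[str] = []
    if any("*" in t for t in terms):
        problems.append("wildcards are not supported")
    if any(t.startswith("-") for t in terms):
        problems.append("exclusions are not supported")
    if any(t in ("OR", "|") for t in terms):
        problems.append("OR is not supported")
    return problems
```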

10 stocks on google: stocks:ticker will look up the ticker symbol ticker at http://finance.yahoo.com, and you can find ticker symbols there. Ticker symbols are useful for finding financial information about publicly traded companies.

11 google images: Google Images has the following special syntaxes:
–intitle: searches for images on a page with a given title, e.g. "intitle:long island university"
–inurl: searches for images in pages that have a certain URL, e.g. inurl:liu.edu
–site: restricts the search to a certain site and should be combined with a search term, e.g. "site:liu.edu koenig"

12 Google interfaces to third-party data: Google Groups is an interface to Usenet news. Google Directory is an interface to the Open Directory Project. In both cases, Google depends on the quality of the underlying data source.

13 usenet news: Usenet is a collection of user-submitted notes on various subjects that are posted to servers on a worldwide network. Each subject collection of posted notes is known as a newsgroup. A newsgroup is a discussion about a particular subject, consisting of notes written to a networked site and distributed through Usenet. Newsgroups are hierarchical; hierarchical levels are separated by dots, for example comp.text.tex. The alt ("alternative") hierarchy is jokingly said to stand for anarchists, lunatics and terrorists.
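Because levels are separated by dots, a newsgroup name can be treated like a path. The Python sketch below (function names hypothetical) lists the hierarchy levels of a group and tests whether a group lies within a given hierarchy, comparing whole levels rather than raw string prefixes, so comp.textbooks is not mistaken for part of comp.text.

```python
from __future__ import annotations


def hierarchy_levels(group: str) -> list[str]:
    """comp.text.tex -> ['comp', 'comp.text', 'comp.text.tex']"""
    parts = group.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def in_hierarchy(group: str, hierarchy: str) -> bool:
    """True if group is the hierarchy itself or lies below it."""
    return group == hierarchy or group.startswith(hierarchy + ".")
```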

14 usenet history: The idea of network news was born in 1979, when two graduate students, Tom Truscott and Jim Ellis, thought of using UUCP to connect machines for the purpose of information exchange among users. They set up a small network of three machines in North Carolina. UUCP ("UNIX to UNIX copy") is a protocol used to copy files between machines running some flavor of UNIX, without the need for IP. Usenet is older than the Internet.

15 decline of usenet: Usenet is essentially open to all (a peer-to-peer system). It is used by spammers for posting and for gathering addresses. There is a steady decline in the quality of contributions and a steady decline in their quantity.

16 usenet: worth checking out for independent reviews of products, often written by experts. Example: Wilhelm Kempff's interpretation of the Beethoven sonatas. Sorting by date reveals that the newsgroup rec.music.classical.recordings is still active. On a good day, you will find no finer guide to records.

17 special syntax for usenet: group: limits postings to a certain group; title: limits the search to the titles of postings; author: searches for an author name or email address. Mixing these syntaxes works well.

18 the open directory project: "The Open Directory Project is the largest, most comprehensive human-edited directory of the Web. It is constructed and maintained by a vast, global community of volunteer editors." The project claims a historical precedent in the Oxford English Dictionary. It was formerly known as "GnuHoo", then "NewHoo"; it was then acquired by Netscape and called "dmoz".

19 dmoz.org: dmoz is maintained by volunteer "net-citizens". No special qualifications are required, but the editors are claimed to be experts. There are about 30,000 volunteers (they claim). dmoz powers the core directory services for the Web's largest and most popular search engines and portals:
–Netscape Search, AOL Search
–Google, Lycos
–HotBot, DirectHit
Headquarters are run by Netscape.

20 http://openlib.org/home/krichel Thank you for your attention!
