Web-based Information Architectures MSEC 20-760 Mini II Location: GSIA Simon Auditorium Time: 1:30-3:20pm, Tues. & Thurs. Instructor: Prof. Jaime Carbonell.


Web-based Information Architectures MSEC 20-760 Mini II Location: GSIA Simon Auditorium Time: 1:30-3:20pm, Tues. & Thurs. Instructor: Prof. Jaime Carbonell Office: NSH Tel: [Augmented with expert guest lectures] Teaching assistant: Jian Zhang Office: NSH Tel: Office Hours: TBD Administrative assistant: TBD Office: NSH Tel:

Administrative Issues Prerequisites Basic programming skills (preferably Java) Familiarity with the web (HTML, browsing, etc.) Fundamentals of Web Programming (20-753). Grading 30% homework (2 programming assignments) 30% miniproject (student teams will propose) 15% midterm (5 pages of notes, calculator OK, no laptops) 25% final (10 pages of notes, calculator OK, no laptops) Bulletin Board Schedule/syllabus Lecture notes (in PowerPoint) Homework Announcements & discussions

Textbook and Reference Materials (1) Required: Class notes (slides on web site) and handouts (to be provided) Required: "Understanding Search Engines: Mathematical Modeling and Text Retrieval" by Michael W. Berry, Murray Browne Available at (tel: ) Optional: Background reading material provided

Textbook and Reference Materials (2) Optional: "Advances in Information Retrieval" Edited by Croft, Kluwer Academic Pub., 2000 [more detailed state-of-the-art IR book] Optional: "Machine Learning" by Tom M. Mitchell, WCB McGraw-Hill [Tools for text categorization and data mining.]

Information Retrieval: The Challenge (1) Text DB includes: (1) Rainfall measurements in the Sahara continue to show a steady decline starting from the first measurements in In 1996 only 12mm of rain were recorded in upper Sudan, and 1mm in Southern Algiers... (2) Dan Marino states that professional football risks losing the number one position in the hearts of fans across this land. Declines in TV audience ratings are cited... (3) Alarming reductions in precipitation in desert regions are blamed for desert encroachment of previously fertile farmland in Northern Africa. Scientists measured both yearly precipitation and groundwater levels...

Information Retrieval: The Challenge (2) User query states: "Decline in rainfall and impact on farms near Sahara" Challenges How to retrieve (1) and (3) and not (2)? How to rank (3) as best? How to cope with no shared words?

Information Retrieval in eCommerce (1) Bringing in Customers How do Web-search engines work? How to maximize hits on my eCommerce pages? How to maximize preselection of customers who will transact?

Information Retrieval in eCommerce (2) Analyzing the Competition How do we find the competition? How will customers find the competition? Can we do preemptive information strikes? Text Mining How to learn what customers want most? How to find out what they missed, but wanted? How to discover customer search/browsing patterns?

Information Retrieval Assumption (1) Basic IR task There exists a document collection {D_j} User enters ad hoc query Q Q correctly states user’s interest User wants {D_i} ⊂ {D_j} most relevant to Q

Information Retrieval Assumption (2) "Shared Bag of Words" assumption Every query = {w_i} Every document = {w_k} ...where w_i & w_k are in the same Σ All syntax is irrelevant (e.g. word order) All document structure is irrelevant All meta-information is irrelevant (e.g. author, source, genre) => Words suffice for relevance assessment

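A minimal sketch of the "shared bag of words" assumption, in Python. The function name bag_of_words and the tokenization rule (lowercase runs of letters and digits) are assumptions made for illustration; the slides do not specify how words are extracted. The point it shows is that two texts with the same words in a different order, with different punctuation, become the same bag.

```python
import re

def bag_of_words(text):
    """Return the set of lowercase word tokens in text.

    Word order, punctuation and document structure are all discarded,
    which is exactly what the bag-of-words assumption permits.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))

# Two texts that differ in order and punctuation but not in vocabulary.
a = bag_of_words("Decline in rainfall near Sahara")
b = bag_of_words("Sahara rainfall: in decline, near!")
same_bag = (a == b)
print(same_bag)
```

Under this representation, author, source and genre never enter the comparison; only the words do.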
Information Retrieval Assumption (3) Retrieval by shared words If Q and D_j share some w_i, then Relevant(Q, D_j) If Q and D_j share all w_i, then Relevant(Q, D_j) If Q and D_j share over K% of w_i, then Relevant(Q, D_j)
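The three shared-word criteria can be sketched as follows. This is an illustration under assumptions, not the course's implementation: the helper names are hypothetical, "share" is read as "the fraction of distinct query words that also occur in the document", and tokenization is a simple lowercase split on non-alphanumerics.

```python
import re

def words(text):
    """Lowercase word set of a text (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def shared_fraction(query, document):
    """Fraction of distinct query words that also occur in the document."""
    q = words(query)
    if not q:
        return 0.0
    return len(q & words(document)) / len(q)

def relevant_some(query, document):
    """Criterion 1: Q and D share at least one word."""
    return shared_fraction(query, document) > 0.0

def relevant_all(query, document):
    """Criterion 2: D contains every word of Q."""
    return shared_fraction(query, document) == 1.0

def relevant_over_k(query, document, k_percent):
    """Criterion 3: D contains more than K% of the words of Q."""
    return shared_fraction(query, document) * 100 > k_percent

query = "decline in rainfall"
doc_rain = "Rainfall measurements in the Sahara continue to show a steady decline"
doc_tv = "Declines in TV audience ratings are cited"

print(relevant_all(query, doc_rain))         # every query word appears
print(relevant_some(query, doc_tv))          # only the word "in" is shared
print(relevant_over_k(query, doc_tv, 50))    # one of three words is not over 50%
```

The football text passes the weakest criterion only because of the function word "in", and "Declines" does not match "decline" at all, which hints at why exact shared words alone are a fragile relevance test.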

Boolean Queries (1) Industrial use of Silver Q: silver R: "The Count’s silver anniversary..." "Even the crash of ’87 had a silver lining..." "The Lone Ranger lived on in syndication..." "Silver dropped to a new low in London..."... Q: silver AND photography R: "Posters of Tonto and the Lone Ranger..." "The Queen’s Silver Anniversary photos..."...

Boolean Queries (2) Q: ((silver AND (NOT anniversary) AND (NOT lining) AND emulsion) OR (AgI AND crystal AND photography)) R: "Silver Iodide Crystals in Photography..." "The emulsion was worth its weight in silver..."...
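To make the Boolean query above concrete, here is a small Python sketch that evaluates a nested AND/OR/NOT expression against the word set of each document. The tuple encoding of the query and the function names are assumptions chosen for illustration, and matching is exact on lowercase words (no stemming or synonyms).

```python
import re

def words(text):
    """Lowercase word set of a text (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def evaluate(query, doc_words):
    """Evaluate a Boolean query against a set of document words.

    A query is either a term (str) or a tuple whose first element is
    "AND", "OR" or "NOT" followed by sub-queries.
    """
    if isinstance(query, str):
        return query.lower() in doc_words
    op, *args = query
    if op == "AND":
        return all(evaluate(a, doc_words) for a in args)
    if op == "OR":
        return any(evaluate(a, doc_words) for a in args)
    if op == "NOT":
        return not evaluate(args[0], doc_words)
    raise ValueError(f"unknown operator: {op}")

# The query from the slide, written in the assumed tuple encoding.
query = ("OR",
         ("AND", "silver", ("NOT", "anniversary"), ("NOT", "lining"), "emulsion"),
         ("AND", "AgI", "crystal", "photography"))

docs = [
    "The emulsion was worth its weight in silver",
    "The Queen's Silver Anniversary photos",
    "Even the crash of '87 had a silver lining",
]
hits = [d for d in docs if evaluate(query, words(d))]
print(hits)
```

Only the emulsion sentence survives; the anniversary and lining sentences are excluded by the NOT clauses. Writing a query this precise is the burden the following slides complain about.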

Boolean Queries (3) Boolean queries are: a) easy to implement b) confusing to compose c) seldom used (except by librarians) d) prone to low recall e) all of the above

Beyond the Boolean Boondoggle (1) Desiderata (1) Query must be natural for all users Sentence, phrase, or word(s) No AND’s, OR’s, NOT’s,... No parentheses (no structure) System focus on important words Q: I want laser printers now

Beyond the Boolean Boondoggle (2) Desiderata (2) Find what I mean, not just what I say Q: cheap car insurance (pAND (pOR "cheap" [1.0] "inexpensive" [0.9] "discount" [0.5]) (pOR "car" [1.0] "auto" [0.8] "automobile" [0.9] "vehicle" [0.5]) (pOR "insurance" [1.0] "policy" [0.3]))
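The slide does not define how pAND and pOR combine the bracketed weights, so the following Python sketch picks one plausible reading and labels it as an assumption: pOR takes the highest weight among the alternatives found in the document, and pAND multiplies the group scores, so a document missing any one concept scores 0. The names QUERY, p_or and p_and are hypothetical; the terms and weights are copied from the slide.

```python
import re

def words(text):
    """Lowercase word set of a text (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

# Weighted synonym groups copied from the expanded query on the slide.
QUERY = [
    {"cheap": 1.0, "inexpensive": 0.9, "discount": 0.5},
    {"car": 1.0, "auto": 0.8, "automobile": 0.9, "vehicle": 0.5},
    {"insurance": 1.0, "policy": 0.3},
]

def p_or(group, doc_words):
    """ASSUMED semantics: weight of the best alternative present, else 0."""
    return max((w for term, w in group.items() if term in doc_words), default=0.0)

def p_and(groups, doc_words):
    """ASSUMED semantics: product of group scores; a missing concept gives 0."""
    score = 1.0
    for group in groups:
        score *= p_or(group, doc_words)
    return score

exact = p_and(QUERY, words("Cheap car insurance"))
paraphrase = p_and(QUERY, words("Discount auto insurance quotes"))
partial = p_and(QUERY, words("Cheap car rental"))
print(exact, paraphrase, partial)
```

Under these assumed semantics, a paraphrase with no literal query word except "insurance" still receives a nonzero score, ranked below the exact match, which is the "find what I mean" behavior the slide asks for.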

Beyond the Boolean Boondoggle (3) Desiderata (3) Speech-recognized queries Coming soon, to a system near you longer queries more fluff words to filter acoustic recognition errors

INFORMATION RETRIEVAL [Diagram with components: The Web, Library, etc.; Spider; Inverted Index; Search Engine; User]
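As a rough illustration of the inverted index component named on this slide, the Python sketch below maps each word to the documents that contain it and answers a query by intersecting those lists. The function names, the document ids, and the choice to require every query word are assumptions for illustration; a real engine would add ranking, which later lectures cover.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase word tokens of a text (assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(documents):
    """Map each word to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def search(index, query):
    """Return ids of documents containing every query word (assumed AND semantics)."""
    postings = [set(index.get(word, ())) for word in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Hypothetical pages, standing in for what a spider would fetch.
pages = {
    1: "Silver iodide crystals in photography",
    2: "The emulsion was worth its weight in silver",
    3: "Laser printers on sale now",
}
index = build_inverted_index(pages)
print(index["silver"])
print(search(index, "silver photography"))
```

The index is built once from the collection, so answering a query only touches the posting lists of the query words rather than scanning every document.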

INFORMATION RETRIEVAL: APPLICATIONS Searching Document Archives –Libraries (title, subject, full-text) –Databases of patents and applications –DBs of legal cases (e.g. Lexis, Westlaw) Searching the Web –Pure search engines (Google, Inktomi, …) –Browsing + Search (Yahoo, Terra-Lycos, …) –Meta-search (Metacrawler, Vivisimo, …) Corporate or Government Intranets Non-traditional (e.g. Software DBs, News)

INFORMATION RETRIEVAL (IR) EVOLUTION IR in the 1980s: –Single collection with < 10^6 documents (archive) –Boolean queries with unordered-set answer IR circa 2000: –Single collection with > 10^9 documents (web) –Free-form queries with ranked-list answer IR circa 2010: –Multiple collections > docs (invisible web) –“Find what I mean” queries with clustering, summarization and customization.

Content for Rest of the Course (1) [See the course BB for the latest updates to the course schedule.] Under the Hood The vector space model for retrieval Building an inverted index Term weighting and selection Web spidering Automated text categorization

Content for Rest of the Course (2) IR Uses in eCommerce How to make search engines work for you How to build optimal search-attractive web sites The business(es) of web-based information Beyond Web Search Engines Speech processing primer Information extraction from web pages Data mining primer Multi-media applications Business models

Optional Quick Review of Linear Algebra If you know n-dimensional vectors, matrices, computing inner products, etc., then you do not need this review. You may take a break. If you learned this material, but do not remember it, please stay and listen to refresh your knowledge. If you never learned linear algebra, stay, listen and (optionally) read either: G. Hadley. Linear Algebra. Addison-Wesley, Ch. 3. Or, Stephen W. Goode. An Introduction to Differential Equations and Linear Algebra. Prentice Hall, Ch. 3.
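For readers deciding whether they need the review, here is a short Python sketch of the operations it mentions: inner products of n-dimensional vectors, vector length, and a matrix-vector product. The function names are chosen for illustration, and vectors are plain Python lists rather than any particular library type.

```python
import math

def inner_product(u, v):
    """Sum of pairwise products of two vectors of the same dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean length of a vector: the square root of its inner product with itself."""
    return math.sqrt(inner_product(u, u))

def mat_vec(matrix, v):
    """Matrix-vector product: one inner product per matrix row."""
    return [inner_product(row, v) for row in matrix]

print(inner_product([1, 2, 3], [4, 5, 6]))    # 1*4 + 2*5 + 3*6
print(norm([3, 4]))
print(mat_vec([[1, 0], [0, 2]], [3, 4]))
```

If each of these lines reads as routine, the review can be skipped.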