CSCI 572: Information Retrieval and Search Engines: Summer 2011 Prof. Chris A. Mattmann.

Slides:



Advertisements
Similar presentations
CS533 Concepts of Operating Systems Class 1 Course Overview.
Advertisements

CS510 Concurrent Systems Course Overview. CS510 - Concurrent Systems 2 About the Instructor  Instructor – Jonathan Walpole o Professor at PSU o Research.
CSCI 578 Software Architectures Dr. Chris Mattmann Tuesday, January 13, 2009.
CS/CMPE 535 – Machine Learning Outline. CS Machine Learning (Wi ) - Asim LUMS2 Description A course on the fundamentals of machine.
Logistics: –My office hours: T, Th 4-5pm or by appointment –Class Web page:
IS112 - Computer Organization1 IS112 Computer Organization and Programming Professor Catherine Dwyer Fall 2006.
Chapter 0 Introductory Comments. Overview Syllabus Detailed power point slides My Web Page –Homework on web page –Readings –Other.
Fall 2004 WWW IS112 Prof. Dwyer Intro1: Overview and Syllabus Professor Catherine Dwyer.
Advanced Compilers CSE 231 Instructor: Sorin Lerner.
CS533 Concepts of Operating Systems Class 1 Course Overview and Entrance Exam.
UMass Lowell Computer Science Advanced Algorithms Computational Geometry Prof. Karen Daniels Spring, 2004 Project.
BA 378: Accounting Information Systems Instructor: Dr. James R. Coakley.
CSCE 211: Digital Logic Design
Object-Oriented Enterprise Application Development Course Introduction.
Project Management Take a Tour of the Online Course.
1 EEL 6935: Embedded Systems Seminar. 2 General Information Instructor: Ann Gordon-Ross Office: Benton Office Hours – By appointment.
Introduction to Apache Tika CSCI 572: Information Retrieval and Search Engines Summer 2010.
Introduction to Apache Hadoop CSCI 572: Information Retrieval and Search Engines Summer 2010.
Introduction to Apache Lucene/Solr CSCI 572: Information Retrieval and Search Engines Summer 2010.
CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
CSCI 578 Software Architectures Dr. Chris Mattmann Tuesday, August 27, 2013.
Cpt S 471/571: Computational Genomics Spring 2015, 3 cr. Where: Sloan 9 When: M WF 11:10-12:00 Instructor weekly office hour for Spring 2015: Tuesdays.
ISE420 Algorithmic Operations Research Asst.Prof.Dr. Arslan M. Örnek Industrial Systems Engineering.
1 EEL 6935: Embedded Systems Seminar. 2 General Information Instructor: Ann Gordon-Ross Office: Benton Office Hours – By appointment.
Course Introduction Software Engineering
CS461: Principles and Internals of Database Systems Instructor: Ying Cai Department of Computer Science Iowa State University Office:
Characterizing the Web CSCI 572: Information Retrieval and Search Engines Summer 2011.
Introduction to Nutch CSCI 572: Information Retrieval and Search Engines Summer 2010.
CM240: UNIT 1 SEMINAR. Tonight’s Agenda Class Overview Class Overview Technical Communication Technical Communication Final Project Information Final.
1 Advanced Semantic Technologies Prof. Deborah McGuinness and Dr. Patrice Seyed CSCI CSCI ITWS ITWS TA: Justin.
CSSE 513 – COURSE INTRO With homework and project details Wk 1 – Part 2.
Xiaoying Sharon Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Overviews of ITCS 6161/8161: Advanced Topics on Database Systems Dr. Jianping Fan Department of Computer Science UNC-Charlotte
B. Prabhakaran1 Multimedia Systems Textbook Any/Most Multimedia Related Books Reference Papers: Appropriate reference papers discussed in class from time.
1intro1 CIS740 - Software Engineering Dr David A. Gustafson
CEN 4010 First Lecture January 9, 2006 CEN 4010 Introduction to Software Engineering Spring 2006 Instructor: Masoud Sadjadi
How to Read Research Papers? Xiao Qin Department of Computer Science and Software Engineering Auburn University
E81 CSE 532S: Advanced Multi-Paradigm Software Development Chris Gill Department of Computer Science and Engineering Washington University in St. Louis.
Course grading Project: 75% Broken into several incremental deliverables Paper appraisal/evaluation/project tool evaluation in earlier May: 25%
Jongwook Woo CIS 528 Introduction to Big Data Science (Syllabus) Jongwook Woo, PhD California State University, LA Computer and Information.
Most of contents are provided by the website Introduction TJTSD66: Advanced Topics in Social Media Dr.
CEN First Lecture CEN 4010 Introduction to Software Engineering Instructor: Masoud Sadjadi
1 Advanced Semantic Technologies Deborah McGuinness CSCI , 97543, CSCI , 97014, ITWS , 98113, ITWS , TA: Abigail.
ITCS 6265 Details on Project & Paper Presentation.
CSci6702 Parallel Computing Andrew Rau-Chaplin
Data Structures and Algorithms in Java AlaaEddin 2012.
B. Prabhakaran1 Multimedia Systems Reference Text “Multimedia Database Management Systems” by B. Prabhakaran, Kluwer Academic Publishers. – Kluwer bought.
CSCE 990 Advanced Distributed Systems Seminar Ying Lu 104 Schorr Center
CSCI 572’s Class Project Measuring the performance of parallel crawlers in different modes Huy Pham PhD – Computer Science Spring 2011.
PROBLEM SOLVING AND PROGRAMMING ISMAIL ABUMUHFOUZ | CS 170.
1 BIT 5495 Introduction Syllabus Instructor:Dr. Lance A. Matheson Office:Pamplin 1017 Office Hours:By appointment is best way to contact me Phone:
FNA/Spring CENG 562 – Machine Learning. FNA/Spring Contact information Instructor: Dr. Ferda N. Alpaslan
Course Overview Stephen M. Thebaut, Ph.D. University of Florida Software Engineering.
1 SBM411 資料探勘 陳春賢. 2 Lecture I Class Introduction.
Xiaoying Sharon Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
CS & CS ST: Probabilistic Data Management Fall 2016 Xiang Lian Kent State University Kent, OH
Specialties Description
Information Storage and Retrieval Fall Lecture 1: Introduction and History.
Welcome to ….. File Organization.
EEL 6686: Embedded Systems Seminar
It’s called “wifi”! Source: Somewhere on the Internet!
CSCE 211: Digital Logic Design
CSCI 578 Software Architectures
Cpt S 471/571: Computational Genomics
CS510 Concurrent Systems Jonathan Walpole.
Multimedia Systems Reference Text
Technologies of Google Seminar Week 1
CS533 Concepts of Operating Systems Class 1
CSCI 572: Information Retrieval and Search Engines: Summer 2010
CSE 444 Database Management Systems Autumn 1997 University of Washington Introduction and Welcome © 1997 UW CSE 12/12/2019.
Presentation transcript:

CSCI 572: Information Retrieval and Search Engines: Summer 2011 Prof. Chris A. Mattmann

May-18-11CS572-Summer2011CAM-2 The Class Will give you a complete treatment of the area of search engines and information retrieval –The fundamental building blocks of the web and search engines The Search Engine Architecture proposed by Brin/Page Understanding algorithms for ranking pages Understanding technologies for characterizing, downloading, parsing, indexing, searching and disseminating web content –Advanced topics in search engines such as BigData and distributed computation Will equip you with the necessary skills to design complex, real- world search engines

May-18-11CS572-Summer2011CAM-3 General class information Lecture, but… –You can participate –You should participate –You will participate, that is, if you want to do well :) Breakdown of points –20% participation –40% research paper presentation –40% course project

May-18-11CS572-Summer2011CAM-4 General class information Syllabus/website: –Visit it often, as the schedule may change! –This is where all of your course project info and presentation info will be posted –This site will point you to required reading (research papers), and to lectures that you can download before class

May-18-11CS572-Summer2011CAM-5 What we’ll cover Theory –Understanding of basic information retrieval –Search engine querying –Search engine ranking –Architecture of search engines and technologies –Design Patterns Practice –Modern search engine technologies from Apache

May-18-11CS572-Summer2011CAM-6 Course Presentation Each week, we’ll read a few research papers on search engines For the first part of the course (5 weeks), I’ll lecture on the general topics that the research papers cover –The search engine architecture: fetching, parsing, indexing, querying, distributed computation, etc. For the last part of the course (~5 weeks), each one of you will present on one of the research papers we covered in the first 5 weeks

May-18-11CS572-Summer2011CAM-7 Course Presentation What I’m looking for (~20 minutes of presentation, with ~5 mins questions at the end) –You understood the paper –Discussion of related work and background –Discussion of why should I care about the topic And more importantly why your fellow classmates should care –Relation of your paper to the lecture slides I gave on the topic –Simple summarization and description of the algorithm and/or technology introduced in the paper –What were the results/contributions/conclusions of the paper –Your evaluation of Pros of the paper –Your evaluation of Cons of the paper

May-18-11CS572-Summer2011CAM-8 Course Presentation What I’m NOT looking for –Plagiarism –Repetition –Cutting/Pasting out of the paper –Regurgitation –You to follow the EXACT set of bullets that I gave on the prior slide You should be looking to be innovative – show the class and me that you really understood what was in the paper –Treat it like a conference presentation

May-18-11CS572-Summer2011CAM-9 Course Project You will get to leverage one or a combination of several Apache software technologies –OODT, Nutch, Tika, Lucene, Solr, Hadoop, HBase, Hive, Cassandra, etc. You will make a significant contribution to one or more of the above communities Deliverables –A 2 page project proposal –A 2 page mid-term project report –Source code and final demonstration to me at end of class

May-18-11CS572-Summer2011CAM-10 Course Project Deliverables –Your project proposal should include: Demonstration that you’ve researched your particular idea with pointers to issue trackers and mailing lists Objectives section Approach section Identification of deliverables section Timeline/Schedule –Your mid term report should include: Current status Blockers to completion Planned mitigation to blockers

May-18-11CS572-Summer2011CAM-11 Me Graduated with my Ph.D. in Computer Science from USC in 2007 –Advisor: Dr. Nenad Medvidovic Was a student at USC from –B.S., Computer Science 2001 –M.S., Computer Science 2003 My research interests –The intersection of software architectures, and large-scale data dissemination –Software connector selection –Bayesian decision theory –Reinforcement learning –Search Engines

May-18-11CS572-Summer2011CAM-12 So…today Quick lecture on characterizing the web Read the papers linked from the syllabus Be ready for Tuesday +1 week (not next Tuesday) as this is a 10-week course and we are going to dive in