Mini-Project on Web Data Analysis. Daniel Deutch

Data Management
“Data management is the development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets” (DAMA Data Management Body of Knowledge)
A major success: the relational model of databases

Relational Databases
Developed by Codd (1970), who won the Turing Award for the model
Huge success and impact:
‒ The vast majority of organizational data today is stored in relational databases
‒ Implementations include MS SQL Server, MS Excel, Oracle DB, MySQL, ...
‒ 2 Turing Award winners (Edgar F. Codd and Jim Gray)
Basic idea: data is organized in tables (= relations)
Relations can be derived from other relations using a set of operations called the relational algebra
‒ On which SQL is largely based
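The idea that relations are derived from other relations can be illustrated with a minimal Python sketch of three relational algebra operators (selection, projection, natural join). This is only an illustration: relations are modeled as lists of dicts, and the table names, column names, and rows are hypothetical data, not taken from the slides.

```python
# Relations are modeled as lists of dicts (one dict per tuple).
# All table names, column names and rows are hypothetical illustration data.

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, columns):
    """Projection: keep only the named columns, dropping duplicate tuples."""
    result = []
    for row in relation:
        reduced = {c: row[c] for c in columns}
        if reduced not in result:
            result.append(reduced)
    return result

def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared columns."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[c] == r_row[c] for c in shared):
                result.append({**l_row, **r_row})
    return result

students = [{"sid": 1, "name": "Mary"}, {"sid": 2, "name": "John"}]
takes = [{"sid": 1, "cid": 10}, {"sid": 1, "cid": 20}, {"sid": 2, "cid": 10}]
courses = [{"cid": 10, "title": "Databases"}, {"cid": 20, "title": "Web Data"}]

# Each step derives a new relation from existing ones.
mary = select(students, lambda row: row["name"] == "Mary")
marys_courses = project(
    natural_join(natural_join(mary, takes), courses), ["title"]
)
print(marys_courses)
```

A real database engine evaluates the same operators with indexes and query optimization rather than nested loops; the sketch only shows how the operators compose.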

Research in Data(base) Management
Relational databases (tables)
‒ Indexing, Tuning, Query Languages, Optimizations, Expressive Power, ...
~20 years ago: emergence of the Web and research on Web data
‒ XML, text databases, web graph, ...
‒ Google is a product of this research (by Stanford PhD students Brin and Page)
Recent years: hot topics include distributed databases, data privacy, data integration, social networks, web applications, crowdsourcing, trust, ...
‒ Foundations taken from “classical” database research
Theoretical foundations with practical impact

Web 2.0
“Old” web (“Web 1.0”): static pages
– News, encyclopedic knowledge, ...
– No, or very little, interaction between the web page and the user
Web 2.0: a term used very broadly for websites that use new technologies (Ajax, JS, ...) to allow interaction with the user
– “Network as platform” computing
– The “participatory Web”

Online shopping

Advertisements

Social Networks

Crowd Sourcing

Data is all around
Web graph
“Social graph”
Pictures, videos, notifications, messages, ...
Data that the application processes
Advertisements
Even the application structure itself

(A small portion of) the web graph

Need to Analyze
Huge amount of data out there
– Est billion web-pages and counting
– Half a billion tweets per day and counting
An average user “sees” about 600 tweets per day
Most of it is irrelevant for you, some is incorrect

Filter, Rank, Explain
Filter
– Select the portion of the data that is relevant
– Group similar results
Rank
– Rank data by trustworthiness, relevance, recency, ...
– Present the highest-ranked items first
Explain
– An explanation of why the data is considered relevant or highly ranked
– An explanation of how the data has propagated
“Why do I see this?”
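The three steps above can be sketched as a small Python pipeline. This is a minimal sketch under stated assumptions, not a method from the slides: the `Item` fields, the weights, the relevance threshold, and the sample items are all hypothetical, the explanation is simply the largest weighted score component, and the "group similar results" step is left out.

```python
from dataclasses import dataclass

@dataclass
class Item:
    text: str
    trust: float       # hypothetical trust score of the source, in [0, 1]
    relevance: float   # hypothetical relevance to the user, in [0, 1]
    age_hours: float   # time since publication

# Hypothetical weights; a real system would tune or learn these.
WEIGHTS = {"trust": 0.5, "relevance": 0.4, "recency": 0.1}

def components(item):
    """Raw ranking signals for one item, each in [0, 1]."""
    return {
        "trust": item.trust,
        "relevance": item.relevance,
        "recency": 1.0 / (1.0 + item.age_hours),
    }

def filter_rank_explain(items, min_relevance=0.3):
    # Filter: keep only the portion of the data that is relevant.
    kept = [it for it in items if it.relevance >= min_relevance]
    results = []
    for it in kept:
        parts = {k: WEIGHTS[k] * v for k, v in components(it).items()}
        total = sum(parts.values())
        # Explain: report the signal that contributed most to the score.
        top = max(parts, key=parts.get)
        results.append((total, it.text, f"ranked mainly by {top}"))
    # Rank: highest score first.
    results.sort(key=lambda r: r[0], reverse=True)
    return results

items = [
    Item("rumor", trust=0.2, relevance=0.9, age_hours=0.0),
    Item("verified news", trust=0.9, relevance=0.8, age_hours=1.0),
    Item("off-topic", trust=0.9, relevance=0.1, age_hours=0.0),
]
results = filter_rank_explain(items)
for total, text, why in results:
    print(f"{total:.2f}  {text}  ({why})")
```

Explaining how an item propagated (its provenance) would need extra bookkeeping about where each item came from, which this sketch does not attempt.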

Main topics
Analysis of Tables and Links on the Web
Trust Management
Explanation (Provenance)
Information Extraction
Social Networks
Crowd-sourcing
Distributed Query Evaluation

Approach
Leverage knowledge from “classic” database research
Account for the new challenges
Do so in a generic manner
Leverage unique features such as collaborative contribution, distribution, etc.

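[Figure: layers of a database system. Data model: Students. Query language: Select… From… Where… Physical storage: indexing, distribution, ... Query plan over Students, Takes and Courses with join conditions sid=sid and cid=cid, selection name=“Mary” and projection on sname.]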
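The layering in the figure above (data model, query language, physical storage) can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the figure only names the tables Students, Takes and Courses and the conditions sid=sid, cid=cid and name=“Mary”, so the exact columns (including `cname`), the index name, and the rows below are hypothetical.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Data model: three relations. Column names beyond sid, cid and name
# are assumptions made for this sketch.
conn.executescript("""
CREATE TABLE Students (sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Courses  (cid INTEGER PRIMARY KEY, cname TEXT);
CREATE TABLE Takes    (sid INTEGER REFERENCES Students(sid),
                       cid INTEGER REFERENCES Courses(cid));
-- Physical layer: an index the engine may use for the name lookup.
CREATE INDEX idx_students_name ON Students(name);
""")

conn.executemany("INSERT INTO Students VALUES (?, ?)",
                 [(1, "Mary"), (2, "John")])
conn.executemany("INSERT INTO Courses VALUES (?, ?)",
                 [(10, "Databases"), (20, "Web Data")])
conn.executemany("INSERT INTO Takes VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 10)])

# Query language: a declarative Select ... From ... Where query.
# The engine, not the user, decides how to evaluate the joins.
rows = conn.execute("""
    SELECT Courses.cname
    FROM Students
    JOIN Takes   ON Students.sid = Takes.sid
    JOIN Courses ON Takes.cid = Courses.cid
    WHERE Students.name = ?
    ORDER BY Courses.cname
""", ("Mary",)).fetchall()

course_names = [row[0] for row in rows]
print(course_names)
conn.close()
```

The query states what is wanted, while storage and indexing decisions stay below the query language, which is the separation the figure illustrates.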

Foundations
Model
Query Language
Query evaluation algorithms
Prototype implementation and optimizations
Getting Data and Testing

Project Requirements
Read a paper (or a bunch of papers) in the area
– Likely to require that you follow citations and read earlier papers!
Think of an application based on the ideas in the paper
– It does not have to be exactly the application described in the paper!
– E.g. you do not have to use relational databases
Think about how you would get or generate data
Implement, test
Submit an application + report

Report
An integral part of the project submission
Should include:
– A detailed description of the model and algorithms that you have implemented
– A detailed description of the application
– Code design
– Use cases
– Difficulties that you have encountered and how you addressed them

Timeline
By 20/3 (1 week from now): send me an ordered list of 3 preferred papers
– The title includes the words “mini-project”
– The body includes the names and IDs of the pair
A bit after Passover (date TBA): each pair gives a 7-10 minute presentation on the expected project
– A slide on each of the issues mentioned in the requirements slide
1 week before the last week of the semester: short project presentations (including screenshots or a live demo)