CCT396, Fall 2011 Database Design and Implementation Yuri Takhteyev University of Toronto This presentation is licensed under Creative Commons Attribution.

Slides:



Advertisements
Similar presentations
Tim Brody University of Southampton CiteBase Services 13/07/2001.
Advertisements

Inner Architecture of a Social Networking System Petr Kunc, Jaroslav Škrabálek, Tomáš Pitner.
Based on the text by Jimmy Lin and Chris Dryer; and on the yahoo tutorial on mapreduce at index.html
Jennifer Widom NoSQL Systems Overview (as of November 2011 )
Relational Database Alternatives NoSQL. Choosing A Data Model Relational database underpin legacy applications and meet business needs However, companies.
Introduction to Backend James Kahng. Install Node.js.
Regions of Interest.  What’s in a ROI?  Use cases  Requirements  Current Storage System  Problems  Alternative Storage.
CS 405G: Introduction to Database Systems 24 NoSQL Reuse some slides of Jennifer Widom Chen Qian University of Kentucky.
1 Yasin N. Silva Arizona State University This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
AN INTRODUCTION TO NOSQL DATABASES Karol Rástočný, Eduard Kuric.
What makes Facebook do what it does? By Gavin Mais.
Is Apache CouchDB for you?
WTT Workshop de Tendências Tecnológicas 2014
Distributed Indexing of Web Scale Datasets for the Cloud {ikons, eangelou, Computing Systems Laboratory School of Electrical.
Experimenting Lucene Index on HBase in an HPC Environment Xiaoming Gao Vaibhav Nachankar Judy Qiu.
University of North Texas Libraries Building Search Systems for Digital Library Collections Mark E. Phillips Texas Conference on Digital Libraries May.
Changwon Nati Univ. ISIE 2001 CSCI5708 NoSQL looks to become the database of the Internet By Lawrence Latif Wed Dec Nhu Nguyen and Phai Hoang CSCI.
© Copyright 2013 STI INNSBRUCK
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
NoSQL Or Peles. What is NoSQL A collection of various technologies meant to work around RDBMS limitations (mostly performance) Not much of a definition...
NoSQL Systems Motivation. NoSQL: The Name  “SQL” = Traditional relational DBMS  Recognition over past decade or so: Not every data management/analysis.
Grid Technology CERN IT Department CH-1211 Geneva 23 Switzerland t DBCF GT IT Monitoring WG Technology for Storage/Analysis 28 November 2011.
NoSQL: Graph Databases. Databases Why NoSQL Databases?
Data and Information Systems Laboratory University of Illinois Urbana-Champaign Data Mining Meeting Mar, From SQL to NoSQL Xiao Yu Mar 2012.
NoSQL databases A brief introduction NoSQL databases1.
CS422 Principles of Database Systems Introduction to NoSQL Chengyu Sun California State University, Los Angeles.
Group members: Phạm Hoàng Long Nguyễn Huy Hùng Lê Minh Hiếu Phan Thị Thanh Thảo Nguyễn Đức Trí 1 BIG DATA & NoSQL Topic 1:
1 Analysis on the performance of graph query languages: Comparative study of Cypher, Gremlin and native access in Neo4j Athiq Ahamed, ITIS, TU-Braunschweig.
CCT396, Fall 2011 Database Design and Implementation Yuri Takhteyev University of Toronto This presentation is licensed under Creative Commons Attribution.
CCT396, Fall 2011 Database Design and Implementation Yuri Takhteyev University of Toronto This presentation is licensed under Creative Commons Attribution.
CCT395, Week 8 Databases and Documents This presentation is licensed under Creative Commons Attribution License, v To view a copy of this license,
INF1343, Winter 2012 Data Modeling and Database Design Yuri Takhteyev Faculty of Information University of Toronto This presentation is licensed under.
INF1343, Winter 2012 Data Modeling and Database Design Yuri Takhteyev Faculty of Information University of Toronto This presentation is licensed under.
CCT396, Fall 2011 Database Design and Implementation Yuri Takhteyev University of Toronto This presentation is licensed under Creative Commons Attribution.
CCT395, Week 4 Database Design and ER Modeling This presentation is licensed under Creative Commons Attribution License, v To view a copy of this.
CCT395, Week 12 Storage, Grid Computing, Review This presentation is licensed under Creative Commons Attribution License, v To view a copy of this.
INF1343, Winter 2011 Data Modeling and Database Design Yuri Takhteyev University of Toronto This presentation is licensed under Creative Commons Attribution.
CCT395, Week 11 Review, Permissions Physical Storage and Performance This presentation is licensed under Creative Commons Attribution License, v. 3.0.
CCT395, Week 6 Implementing a Database with SQL and a Case Study Exercise This presentation is licensed under Creative Commons Attribution License, v.
CCT395, Week 2 Introduction to SQL Yuri Takhteyev September 15, 2010
INF1343, Winter 2012 Data Modeling and Database Design Yuri Takhteyev Faculty of Information University of Toronto This presentation is licensed under.
1 Gaurav Kohli Xebia Breaking with DBMS and Dating with Relational Hbase.
CCT396, Fall 2011 Database Design and Implementation Yuri Takhteyev University of Toronto This presentation is licensed under Creative Commons Attribution.
CCT396, Fall 2011 Database Design and Implementation Yuri Takhteyev University of Toronto This presentation is licensed under Creative Commons Attribution.
Google App Engine. Contents Overview Getting Started Databases Inter-app Communications Modes.
CCT395, Week 7 SQL with PHP This presentation is licensed under Creative Commons Attribution License, v To view a copy of this license, visit
INF1343, Winter 2011 Data Modeling and Database Design Yuri Takhteyev University of Toronto This presentation is licensed under Creative Commons Attribution.
Web Development. Agenda Web History Network Architecture Types of Server The languages of the web Protocols API 2.
NoSQL: Graph Databases
Data Modeling and Database Design
CS 405G: Introduction to Database Systems
NoSQL: Graph Databases
DBSI Teaser Presentation
Review Databases and Objects
API (Application Program Interface)
Software Systems Development
Special Topics in CCIT: Databases
CS122B: Projects in Databases and Web Applications Winter 2017
MongoDB Er. Shiva K. Shrestha ME Computer, NCIT
NOSQL.
Database Design and Implementation
Data Modeling and Database Design
Building Search Systems for Digital Library Collections
Data Modeling and Database Design INF1343, Winter 2012 Yuri Takhteyev
Database Design and Implementation
NOSQL databases and Big Data Storage Systems
Data Modeling and Database Design INF1343, Winter 2012 Yuri Takhteyev
NoSQL Systems Overview (as of November 2011).
NOSQL and CAP Theorem.
Database Systems Summary and Overview
Presentation transcript:

CCT396, Fall 2011 Database Design and Implementation Yuri Takhteyev University of Toronto This presentation is licensed under Creative Commons Attribution License, v To view a copy of this license, visit This presentation incorporates images from the Crystal Clear icon collection by Everaldo Coelho, available under LGPL from

Week 10 Relational Databases and Documents

pg

name: Barack Hussein Obama II date_of_birth: August 4, 1961 sex: male location: Honolulu mother: Stanley Ann Dunham father: Barack Hussein Obama hour_of_birth: 7:24 pm... Digital can be digitally signed

Full Text Search Matching sub-strings Can be done with LIKE Bag-of-words search Like doing LIKE for every term – very slow without a proper index! Natural language search Stop words (“the”, “of”, “to”, “end”) Stemming (“monkey” vs “monkeys”) Synonyms (“monkey” vs “simian” vs “Affe”) Ranking “tf-idf”, “Okapi BM25”, probabilistic models Can be useful. SQL support is coming.

Full Text in MySQL use common; select num from analects where match(contents) against("chariots business"); select num from analects where match(contents) against("+chariots -business" in boolean mode); This requires setting up a “fulltext” index first and use “MyISAM” engine – more on those in two weeks.

Esso.jpg

AliceBob invoice payment order

AliceBob payment invoice (HTML) order

AliceBob

AliceBob invoice payment order machine-readable documents

Bills Microdevices Spring St. 413 Elgin IL Joes Office Supply Lakeshore Dr 32 W. Chicago IL XML Example.xml

Pencils, box #2 red XML

Pencils, box #2 red XML Anatomy

Pencils, box #2 red XML Anatomy

Validation Syntactic: form 12.50</Amount Semantic: meaning Male

Bills Microdevices Spring St. 413 Elgin IL Namespaces

XML and RDBMS Externalization: export, import, messaging Internal use: XML storage and retrieval (less common for now)

AliceBob application software (e.g., using PHP) database HTML

AliceBob application software (e.g., using PHP) database HTTP request HTTP request HTML SQL

AliceBob application software (e.g., using PHP) database HTTP request HTTP request HTML SQL

AliceBob application software (e.g., using PHP) database HTTP request HTTP request HTML XML SQL

Bob application software (e.g., using PHP) database HTTP request HTTP request HTML XML SQL Alice Inc. “Web services”, “APIs”, “REST”, “Semantic web,” “Service-oriented architecture”

Examples Twitter: vs.

Examples Yahoo! Geocoding API: %2C+Mississauga%2C+ON vs. +N,+Mississauga,+ON

0 No error us_US Mississauga Rd Mississauga, ON L5L Canada Mississauga Rd L5L Mississauga Peel Ontari o Canada CA ON L5L 16F6288D6874C

0 No error us_US Mississauga Rd Mississauga, ON L5L Canada 3359 Mississauga Rd L5L

Examples Geonames Web Service north=45&south=42&east=-78&west=-82

en Toronto Toronto (, local pronunciation ) is the largest city in Canada[ Canada], Greenwich Mean Time. Retrieved on and is the provincial capital of Ontario. It is located on the northwestern shore of Lake Ontario.[ cities/Toronto.html Toronto, Ontario, Canada, North America] Retrieved on With over 2 (...) /thumb jpg

Alternatives JSON formatted=true&north=45&south=42 &east=-78&west=-82

JSON [{"place":null,"geo":null,"truncated":false,"in_reply _to_status_id_str":null,"in_reply_to_status_id":null, "favorited":false,"in_reply_to_user_id_str":null,"sou rce":"web","contributors":null,"in_reply_to_screen_na me":null,"created_at":"Tue Oct 26 16:01: ","retweet_count":null,"in_reply_to_user_id":null,"user":{"statuses_count":1491,"description":"grafia","show_all_inline_media":false,"profile_use_backgroun d_image":true,"favourites_count":57,"followers_count" :124,"contributors_enabled":false,"notifications":nul l,"profile_background_color":"0c3bf5","geo_enabled":f alse,"profile_background_image_url":" g.com\/profile_background_images\/ \/tumblr_l 8hz3wJ5lK1qb7u0po1_400.gif","profile_text_color":"eb1 725","url":" eras","verified":false,"profile_background_tile":true,"follow_request_sent":null,"lang":"en","created_at": "Fri Jun 26 03:54: ","profile_link_color":"0fc3ff","location":"Brasi l","protected":false,"profile_image_url":" _C_pia_normal.JPG",

JSON [ { "place":null, "geo":null, "truncated":false, "in_reply_to_status_id_str":null, "in_reply_to_status_id":null, "favorited":false, "in_reply_to_user_id_str":null, "source":"web", "contributors":null, "in_reply_to_screen_name":null, "created_at":"Tue Oct 26 16:01: ", "retweet_count":null, "in_reply_to_user_id":null, "user": { "statuses_count":1491, "description":"grafia", "show_all_inline_media":false,...

SQL Support in RDBMS Direct XML export Direct XML import Storing XML as a data

--xml mysql --xml < humans.sql mysqldump --xml -p starwars

LOAD XML mysqldump --xml -p starwars persona > personas.xml load xml local infile "personas.xml" into table personas2;

ExtractValue use common; select ExtractValue(xml, "/Locations/Location[4]/LocationName[1]") as name, ExtractValue(xml, "/Locations/Location[4]/Address[1]") as address from xmltest;

Scaling Facebook.5 billion users 30 billion pieces of content per month 20 billion events per day = 200,000 events per second select url, count(*) from status join liking on status.id = liking.status_id group by status.url; Instead: v= &oid= &comments

“NoSQL” Storage Basic ideas: Key-Value Storage & Retrieval Documents or sparse relations Schema-free Distributed storage “Wide Column Store”: BigTable, HBase, Cassandra “Document Store”: CouchDB, MongoDB

MapReduce Map split the task into many parts all parts should be equivalent e.g.: count “likes” by URL for batches of 1 mln hits Reduce put together the results e.g: merge the batch counts into a total count

Cf. with Rel. Model Pros: horizontal scaling Cons: more work “you are not Google”

Questions?