CC Procesamiento Masivo de Datos Otoño Lecture 12: Conclusion

Slides:



Advertisements
Similar presentations
TI: An Efficient Indexing Mechanism for Real-Time Search on Tweets Chun Chen 1, Feng Li 2, Beng Chin Ooi 2, and Sai Wu 2 1 Zhejiang University, 2 National.
Advertisements

Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Overview on ZHT 1.  General terms  Overview to NoSQL dabases and key-value stores  Introduction to ZHT  CS554 projects 2.
Jennifer Widom NoSQL Systems Overview (as of November 2011 )
NoSQL Databases: MongoDB vs Cassandra
1 CS1001 Lecture Overview Java Programming Java Programming Midterm Review Midterm Review.
Google Bigtable A Distributed Storage System for Structured Data Hadi Salimi, Distributed Systems Laboratory, School of Computer Engineering, Iran University.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Introduction Steve Ko Computer Sciences and Engineering University at Buffalo.
Web Data Management Dr. Daniel Deutch. Web Data The web has revolutionized our world Data is everywhere Constitutes a great potential But also a lot of.
NoSQL Database.
CS 405G: Introduction to Database Systems 24 NoSQL Reuse some slides of Jennifer Widom Chen Qian University of Kentucky.
1 Yasin N. Silva Arizona State University This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
CS492: Special Topics on Distributed Algorithms and Systems Fall 2008 Lab 3: Final Term Project.
CC P ROCESAMIENTO M ASIVO DE D ATOS O TOÑO 2014 Aidan Hogan Wrap-Up.
Advanced Topics in Distributed Systems Fall 2011 Instructor: Costin Raiciu.
CC P ROCESAMIENTO M ASIVO DE D ATOS O TOÑO 2015 Lecture 8: Information Retrieval II Aidan Hogan
Getting Biologists off ACID Ryan Verdon 3/13/12. Outline Thesis Idea Specific database Effects of losing ACID What is a NoSQL database Types of NoSQL.
Web Data Management Dr. Daniel Deutch. Web Data The web has revolutionized our world Data is everywhere Constitutes a great potential But also a lot of.
INF 141 COURSE SUMMARY Crista Lopes. Lecture Objective Know what you know.
1 CS 430: Information Discovery Lecture 9 Term Weighting and Ranking.
NoSQL Databases Oracle - Berkeley DB. Content A brief intro to NoSQL About Berkeley Db About our application.
CS346: Advanced Databases Alexandra I. Cristea Term 1.
Autumn Web Information retrieval (Web IR) Handout #0: Introduction Ali Mohammad Zareh Bidoki ECE Department, Yazd University
Big Table - Slides by Jatin. Goals wide applicability Scalability high performance and high availability.
Large-scale Incremental Processing Using Distributed Transactions and Notifications Daniel Peng and Frank Dabek Google, Inc. OSDI Feb 2012 Presentation.
CC P ROCESAMIENTO M ASIVO DE D ATOS O TOÑO 2015 Lecture 11: Conclusion Aidan Hogan
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Exam and Lecture Overview.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
CS 347Notes101 CS 347 Parallel and Distributed Data Processing Distributed Information Retrieval Hector Garcia-Molina Zoltan Gyongyi.
CC P ROCESAMIENTO M ASIVO DE D ATOS O TOÑO 2014 Aidan Hogan Lecture IX: 2014/05/05.
CC P ROCESAMIENTO M ASIVO DE D ATOS O TOÑO 2014 Aidan Hogan Lecture V: 2014/04/07.
NoSQL Or Peles. What is NoSQL A collection of various technologies meant to work around RDBMS limitations (mostly performance) Not much of a definition...
Nov 2006 Google released the paper on BigTable.
CPS 216: Advanced Database Systems Shivnath Babu.
NoSQL Systems Motivation. NoSQL: The Name  “SQL” = Traditional relational DBMS  Recognition over past decade or so: Not every data management/analysis.
Grid Technology CERN IT Department CH-1211 Geneva 23 Switzerland t DBCF GT IT Monitoring WG Technology for Storage/Analysis 28 November 2011.
NoSQL: Graph Databases. Databases Why NoSQL Databases?
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Data and Information Systems Laboratory University of Illinois Urbana-Champaign Data Mining Meeting Mar, From SQL to NoSQL Xiao Yu Mar 2012.
Big Data Yuan Xue CS 292 Special topics on.
Department of Computer Science, Johns Hopkins University EN Instructor: Randal Burns 24 September 2013 NoSQL Data Models and Systems.
B ig D ata Analysis for Page Ranking using Map/Reduce R.Renuka, R.Vidhya Priya, III B.Sc., IT, The S.F.R.College for Women, Sivakasi.
Big Data & Test Automation
Sample Test Questions These are designed to help answer the questions on Exam 2 PLEASE do not ask me to answer any of the questions.
Plan for Final Lecture What you may expect to be asked in the Exam?
CS 405G: Introduction to Database Systems
CC Procesamiento Masivo de Datos Otoño 2017 Lecture 9: NoSQL
Text Based Information Retrieval
NOSQL.
Aidan Hogan CC Procesamiento Masivo de Datos Otoño 2017 Lecture 7: Information Retrieval II Aidan Hogan
Information Retrieval and Web Search
CC Procesamiento Masivo de Datos Otoño Lecture 4: MapReduce/Hadoop II
IST 516 Fall 2011 Dongwon Lee, Ph.D.
NOSQL databases and Big Data Storage Systems
Extraction, aggregation and classification at Web Scale
Aidan Hogan CC Procesamiento Masivo de Datos Otoño 2018 Lecture 7 Information Retrieval: Ranking Aidan Hogan
Ministry of Higher Education
NoSQL Systems Overview (as of November 2011).
The Anatomy of a Large-Scale Hypertextual Web Search Engine
Storage Systems for Managing Voluminous Data
Aidan Hogan CC Procesamiento Masivo de Datos Otoño 2018 Lecture 3: DFS/HDFS + MapReduce/Hadoop Aidan Hogan
1 Demand of your DB is changing Presented By: Ashwani Kumar
آزمايشگاه سيستمهای هوشمند علی کمالی زمستان 95
CS110: Discussion about Spark
Aidan Hogan CC Procesamiento Masivo de Datos Otoño 2018 Lecture 6 Information Retrieval: Crawling & Indexing Aidan Hogan
Cloud Computing for Data Analysis Pig|Hive|Hbase|Zookeeper
CC Procesamiento Masivo de Datos Otoño Lecture 8 NoSQL: Overview
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval"
Information Retrieval and Web Design
Distributed Systems (15-440)
Presentation transcript:

Aidan Hogan aidhog@gmail.com CC5212-1 Procesamiento Masivo de Datos Otoño 2016 Lecture 12: Conclusion Aidan Hogan aidhog@gmail.com

FULL-CIRCLE

The value of data …

Twitter architecture

Google architecture

Generalise concepts to …

Working with large datasets

Value/danger of distribution

Frameworks For Distrib. Processing For Distrib. Storage

The Big Data Buzz-word

Distributed Systems

GFS (HDFS) / MapReduce (Hadoop)

Information Retrieval

NoSQL and Querying

EXAM …

Course Marking 45% for Weekly Labs (~3% a lab!) [Easy] 20% for Small Class Project [Medium] 35% for Final Exam [Hard]

Final Exam (35%) Goal: test your understanding of concepts Coding covered by labs/project No syntax writing questions! but there will be design and syntax reading questions Max. three hours Not marking you on English You can write in Spanish Four questions (marked on best three) … Old exams available on Material Docente (u-cursos) There will be overlap, but not 100% overlap

The following is not a legally abiding agreement The following is not a legally abiding agreement. It is just a helpful guide for what’s important. -- My laywer, 2016

Q1: Distributed Systems

Q1: Distributed Systems (Slides) Lecture 2: Distributed Systems I Lecture 3: Distributed Systems II Names per the homepage: http://aidanhogan.com/teaching/cc5212-1-2016/

Q1: Distributed Systems (Topics) Possible Topics: Advantages/disadvantages of a distributed system Five distributed system design goals Distributed architectures (P2P vs. C–S, Fat/Thin, n-Tier, etc.) Java RMI (high-level) Eight fallacies of distributed computing Consensus basics (fail-stop vs. Byzantine, synchronous vs. asynchronous, goals) Consensus protocols (2PC, 3PC, Paxos) CAP theorem may appear in Q4, but will not appear in Q1

GFS (HDFS) / MapReduce (Hadoop)

Q2: GFS (HDFS) / MapReduce (Hadoop) Slides: Lecture 4: DFS & MapReduce I Lecture 6: MapReduce III: Pig (Lecture 5 was whiteboard only, practicing MapReduce design. This may also be useful when preparing!) Names per the homepage: http://aidanhogan.com/teaching/cc5212-1-2016/

Q2: GFS (HDFS) / MapReduce (Hadoop) Possible Topics: Google File System (reads, writes, fault-tolerance) MapReduce (incl. design question) HDFS/Hadoop (architecture) Pig (high-level, e.g., explain what a given script does)

Information Retrieval

Q3: Information Retrieval (Slides) Lecture 7: Information Retrieval I Lecture 8: Information Retrieval II (Lecture 9 was whiteboard only, practicing PageRank. This may also be useful when preparing!) Names per the homepage: http://aidanhogan.com/teaching/cc5212-1-2016/

Q3: Information Retrieval (Topics) Possible Topics: Crawling (high-level multi-threading, (D)DoS, robots.txt, sitemap, distribution, bow-tie) Inverted indexes (data structure, normalisation, Heap’s law, Zipf’s law, Elias encoding, etc.) Ranking (relevance vs. importance, TF-IDF, Vector Space Model, etc.) PageRank (concept, random surfer, calculation)

Bring a Calculator! (that’s not a phone!)

NoSQL and Querying

Q4: NoSQL and Querying (Slides) Lecture 10: NoSQL I Lecture 11: NoSQL II Lecture 3: Distributed Systems II (slides on CAP) Names per the homepage: http://aidanhogan.com/teaching/cc5212-1-2016/

Q4: NoSQL and Querying (Topics) Possible Topics: CAP theorem (<- note out of order) The Database Landscape Key–Value stores (data model, operations, distribution, consistent hashing, replication, Dynamo, Merkle trees) Document stores (high-level) Tabular/column-families (data model, Bigtable, sorting, tablets, column families, SSTables, writes, reads, compactions, hierarchy, bloom filters) Graph databases (high-level) Cassandra (high-level)

Final Exam (35%) Goal: test your understanding of concepts Coding covered by labs/project No syntax writing questions! but there will be design and syntax reading questions Max. three hours Not marking you on English You can write in Spanish Four questions (marked on best three) … Old exams available on Material Docente (u-cursos) There will be overlap, but not 100% overlap

SOME ADVERTISING

Ayudantes wanted This course next year (I take ayudantes in this course once) “Diplomado” version of this course End of November / Start of December Will prioritise based on course mark

Next semester … La Web de Datos Web of Data / Semantic Web / Linked Data