CPS 216: Advanced Database Systems Shivnath Babu.

Minor Change to Course Logistics
Grading:
–Project 40% → 35%
–Homework Assignments 15%
–Midterm 20% → 25%
–Final 25%

Presentation & Report on “Big Data”
6 topics, 2 students per topic.
–Let us try to form groups in class. Otherwise, submit your ranked preferences and Shivnath will form the groups.
Shivnath will give some initial pointers.
Get more information (use the Web, books, library, etc.).
Do a 10-minute in-class presentation on Thu 9/24.
Submit a detailed report that will be read by all students.
Presentation and report will be graded as part of the project.

“Big Data” Topics
1. MapReduce vs. Databases, Hive, Hybrid approaches
2. Parallel Databases: Old (Gamma) and New (Greenplum, Aster Data, HadoopDB)
3. HBase and databases over HDFS, Google File System, Google BigTable
4. Pig and other higher-level languages (Scope, Dryad)
5. Optimization of MapReduce programs: Hadoop Scheduling, Resource allocation
6. Key-Value stores (Amazon Dynamo, Cassandra)

The Duke CS Hadoop Cluster
See the project web page for access instructions. I will try to give an introduction in class.
The programming component of Homework 1 will be done on the Hadoop cluster:
–Implement a MapReduce program to compute the average temperature per year over the NCDC data
–Submit sources (Java files and Jar file)
–Due on Tuesday 9/22
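
The average-temperature-per-year computation can be sketched as a map step, a grouping (shuffle) step, and a reduce step. The sketch below is plain Java with no Hadoop dependency, so it only illustrates the data flow and is not the assignment's required structure. It assumes a simplified, hypothetical record format of "year,temperature" per line; the real NCDC records use a different layout, and the class and method names (AvgTempSketch, map, reduce, run) are made up for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AvgTempSketch {

    // Map step: turn one "year,temperature" record (assumed format) into a
    // (year, temperature) pair. Returns null for malformed records.
    static Map.Entry<String, Double> map(String record) {
        String[] fields = record.split(",");
        if (fields.length != 2) {
            return null;
        }
        try {
            double temp = Double.parseDouble(fields[1].trim());
            return new AbstractMap.SimpleEntry<>(fields[0].trim(), temp);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Reduce step: average all temperatures collected for one year.
    static double reduce(List<Double> temps) {
        double sum = 0;
        for (double t : temps) {
            sum += t;
        }
        return sum / temps.size();
    }

    // Driver: applies map to every record, groups values by year (the role
    // the shuffle plays in a MapReduce framework), then reduces each group.
    static Map<String, Double> run(List<String> records) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (String record : records) {
            Map.Entry<String, Double> kv = map(record);
            if (kv == null) {
                continue; // skip malformed input
            }
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                   .add(kv.getValue());
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : grouped.entrySet()) {
            averages.put(e.getKey(), reduce(e.getValue()));
        }
        return averages;
    }

    public static void main(String[] args) {
        // Made-up sample records, not real NCDC data.
        List<String> sample = Arrays.asList(
            "1949,10.0", "1949,20.0", "1950,-4.0", "not a record");
        run(sample).forEach((year, avg) ->
            System.out.println(year + "\t" + avg));
    }
}
```

In an actual Hadoop job the grouping is done by the framework between the map and reduce phases. Note that an average cannot be combined partially by averaging averages; a job that pre-aggregates on the map side would need to carry a sum and a count per year instead.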