Learning MongoDB ZhangGang 2013.05.02.

Data size: Type_data on a single node with no index. Data size is about 14 GB, compared with 5.6 GB in MySQL (about 2.5 times as large).

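Index: Indexes provide high-performance read operations for frequently used queries. The _id index is a unique index created by default for every collection. In a sharded collection the shard key must be indexed; sharding an empty collection creates that index by default. Command: db.collection.ensureIndex({field: 1}). A compound index: db.collection.ensureIndex({f1: 1, f2: 1, …}). (From MongoDB 3.0, ensureIndex() is deprecated in favour of createIndex().)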
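
The Python sketch below illustrates why an index speeds up reads, assuming an index can be modelled as a sorted list of keys (MongoDB actually uses B-tree indexes; the sorted list is a simplification). The docs collection and its user/exectime fields are hypothetical sample data. A query on a prefix of a compound index such as {user: 1, exectime: 1} becomes a binary search plus a short forward scan, and the matches come back already ordered by exectime, which is also why an index can serve a sort.

```python
from bisect import bisect_left

# Hypothetical sample collection (list positions stand in for record locations).
docs = [
    {"_id": 1, "user": 3, "exectime": 40},
    {"_id": 2, "user": 1, "exectime": 0},
    {"_id": 3, "user": 2, "exectime": 15},
    {"_id": 4, "user": 1, "exectime": 25},
]

def build_index(collection, fields):
    """Sorted compound keys plus the position of the document each key points to."""
    entries = sorted((tuple(doc[f] for f in fields), pos) for pos, doc in enumerate(collection))
    keys = [key for key, _ in entries]
    positions = [pos for _, pos in entries]
    return keys, positions

def prefix_lookup(index, prefix):
    """Binary-search to the first key starting with `prefix`, then scan forward."""
    keys, positions = index
    i = bisect_left(keys, prefix)
    hits = []
    while i < len(keys) and keys[i][:len(prefix)] == prefix:
        hits.append(positions[i])
        i += 1
    return hits

idx = build_index(docs, ["user", "exectime"])  # like ensureIndex({user: 1, exectime: 1})
print([docs[p]["_id"] for p in prefix_lookup(idx, (1,))])  # [2, 4], ordered by exectime
```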

Index: Indexing strategies. Create indexes to support your frequent queries. Use indexes to sort query results. Write queries that ensure selectivity. Ensure indexes fit in RAM.
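
As a rough illustration of selectivity, the Python sketch below measures the share of documents an equality query matches: the lower the share, the more an index on that field narrows the search. The jobs collection and its status field are hypothetical, and MongoDB's query planner weighs far more than this single number.

```python
def matched_fraction(collection, query):
    """Share of documents an equality-only query matches; lower means more selective."""
    if not collection:
        return 0.0
    hits = sum(all(doc.get(field) == value for field, value in query.items())
               for doc in collection)
    return hits / len(collection)

# Hypothetical collection: 9 finished jobs and 1 failed job.
jobs = [{"status": "done"}] * 9 + [{"status": "failed"}]
print(matched_fraction(jobs, {"status": "done"}))    # 0.9, poor selectivity
print(matched_fraction(jobs, {"status": "failed"}))  # 0.1, good selectivity
```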

Index: RAM capacity needed. Not all data has to be in RAM: the working set should stay in RAM, and at a minimum the indexes should stay in RAM.
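
A back-of-the-envelope version of this rule, as a Python sketch. The bytes-per-entry figure and the RAM and working-set sizes are hypothetical inputs, and the working set is assumed to already include the frequently used indexes; in the mongo shell, db.collection.totalIndexSize() reports the real index size of a collection.

```python
GiB = 1024 ** 3

def estimate_index_bytes(num_docs, bytes_per_entry):
    """Rough index size: one entry per document times an assumed average entry size."""
    return num_docs * bytes_per_entry

def ram_check(ram_bytes, index_bytes, working_set_bytes):
    """Classify a node; working_set_bytes is assumed to include the hot indexes."""
    if working_set_bytes <= ram_bytes:
        return "working set fits in RAM"
    if index_bytes <= ram_bytes:
        return "only the indexes fit in RAM"
    return "indexes exceed RAM"

index_bytes = estimate_index_bytes(50_000_000, 40)  # hypothetical: 2,000,000,000 bytes
print(ram_check(8 * GiB, index_bytes, 12 * GiB))    # only the indexes fit in RAM
```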

Replica sets: High availability. Replication provides redundancy, backup, and automatic failover, and it is done through replica sets. Master-slave replication is the older approach; replica sets, introduced in v1.6, supersede it.

Replica sets: Concept. A replica set is a cluster of mongod instances that replicate among one another and provide automated failover. Members in a set: primary, secondary, arbiter, and special secondaries (secondary-only, hidden, delayed, and non-voting).
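
The special member types correspond to fields in each member's replica-set configuration document: arbiterOnly, priority, hidden, slaveDelay and votes (slaveDelay was renamed secondaryDelaySecs in MongoDB 5.0). The Python sketch below labels a member with the slide's terminology and checks the rule that hidden, delayed and non-voting members must have priority 0 (the non-voting rule is required in recent releases). The member documents are hypothetical.

```python
def check_member(member):
    """Return the rule violations for one replica-set member config document."""
    errors = []
    priority = member.get("priority", 1)  # default priority for a data-bearing member is 1
    if member.get("hidden") and priority != 0:
        errors.append("hidden members must have priority 0")
    if member.get("slaveDelay", 0) > 0 and priority != 0:
        errors.append("delayed members must have priority 0")
    if member.get("votes", 1) == 0 and priority != 0:
        errors.append("non-voting members must have priority 0")
    return errors

def describe(member):
    """Label the member using the slide's terms."""
    if member.get("arbiterOnly"):
        return "arbiter"
    labels = []
    if member.get("priority", 1) == 0:
        labels.append("secondary-only")
    if member.get("hidden"):
        labels.append("hidden")
    if member.get("slaveDelay", 0) > 0:
        labels.append("delayed")
    if member.get("votes", 1) == 0:
        labels.append("non-voting")
    return ", ".join(labels) or "electable (primary or secondary)"

# Hypothetical one-hour delayed backup member.
backup = {"_id": 3, "host": "localhost:30003", "priority": 0, "hidden": True, "slaveDelay": 3600}
print(describe(backup), check_member(backup))  # secondary-only, hidden, delayed []
```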

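Replica sets: Drivers know which member is the primary. If the primary goes down, a new one is elected from the secondaries. Data is replicated to the secondaries after it is written to the primary. A typical set has three members. Writes go only to the primary; reads can be served by secondaries.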
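
Why three members is the typical size: a candidate becomes primary only with votes from a strict majority of all voting members. The Python sketch below models just that arithmetic (it is not MongoDB's election protocol). An arbiter holds no data but does vote, so primary + secondary + arbiter still survives the loss of one member.

```python
def can_elect_primary(voting_members, reachable_voters):
    """A candidate needs votes from a strict majority of all voting members."""
    return reachable_voters > voting_members // 2

def failures_tolerated(voting_members):
    """How many voting members can fail while a majority remains."""
    majority = voting_members // 2 + 1
    return voting_members - majority

# Primary + secondary + arbiter = 3 voters: losing the primary leaves 2, still a majority.
print(can_elect_primary(3, 2))  # True
print(failures_tolerated(3))    # 1
print(failures_tolerated(4))    # 1: a fourth voter adds no fault tolerance
```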

Replica sets: Deploy a replica set with three nodes: primary, secondary, and arbiter. rs.initiate() rs.add("localhost:30000") rs.addArb("localhost:30002")
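
rs.initiate() can also take the whole configuration at once. The Python sketch below builds that document for the three-node layout on this slide; the set name rs0 and the first member's address localhost:30001 are hypothetical, since the slide gives only the ports of the members added later. The same dictionary could be passed to rs.initiate(config) in the shell or, with PyMongo, to client.admin.command("replSetInitiate", config); neither call is executed here.

```python
def three_member_config(set_name, first, second, arbiter):
    """Config equivalent to rs.initiate() on `first`, then rs.add(second) and rs.addArb(arbiter)."""
    return {
        "_id": set_name,
        "members": [
            {"_id": 0, "host": first},
            {"_id": 1, "host": second},
            {"_id": 2, "host": arbiter, "arbiterOnly": True},  # votes, stores no data
        ],
    }

# Hypothetical set name and first-member address.
config = three_member_config("rs0", "localhost:30001", "localhost:30000", "localhost:30002")
print(config["members"][2])  # {'_id': 2, 'host': 'localhost:30002', 'arbiterOnly': True}
```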

Replica sets: A three-member set. Test: after the primary was shut down, a new primary was elected within about 10 s and began responding to the application.

Sharding: High scalability. Sharding is MongoDB's approach to scaling out: it automatically distributes collection data across the shards, including new servers as they are added.

Sharding: The components of a sharded cluster are shards, config servers, and mongos. Shards: usually each shard is a replica set. Config servers: each config server is a mongod instance that holds metadata about the cluster. Mongos: routes reads and writes from applications to the shards, so applications don't access the shards directly.
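
To show what routing by mongos means, here is a Python sketch of range-based routing over a hypothetical chunk table with shard key user. Each chunk covers a half-open range of shard-key values; a query that fixes the shard key is sent to one shard, and any other query is broadcast to all shards. Real chunk metadata lives on the config servers and uses MinKey/MaxKey bounds, so this is only a model.

```python
from bisect import bisect_right

# Hypothetical chunk table: chunk i covers [CHUNK_LOWER_BOUNDS[i], next bound).
CHUNK_LOWER_BOUNDS = [float("-inf"), 1000, 5000]
CHUNK_SHARDS = ["shard_1", "shard_2", "shard_1"]

def route(shard_key_value):
    """Return the shard that owns the chunk containing this shard-key value."""
    i = bisect_right(CHUNK_LOWER_BOUNDS, shard_key_value) - 1
    return CHUNK_SHARDS[i]

def route_query(query, key_field):
    """Targeted if the query fixes the shard key, otherwise broadcast to every shard."""
    if key_field in query:
        return [route(query[key_field])]
    return sorted(set(CHUNK_SHARDS))

print(route_query({"user": 1200}, "user"))    # ['shard_2']
print(route_query({"exectime": 5}, "user"))   # ['shard_1', 'shard_2']
```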

Sharding: Sharding balancer. The shard key determines how the collection's documents are distributed among the cluster's shards. Within a shard, data is organized logically into chunks, and the balancer evens out the number of chunks between shards. When to use sharding: the data approaches the storage capacity of one node; the working set approaches the maximum amount of RAM; or the application has a large amount of write activity.
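
A Python sketch of balancing by chunk count, as the slide describes: move one chunk at a time from the fullest shard to the emptiest until the counts are close. The fixed threshold of 2 is a simplification; the real balancer of this era picked its migration threshold from the collection's total chunk count, and newer releases balance by data size instead.

```python
def balance(chunks_per_shard, threshold=2):
    """Migrate one chunk at a time from the fullest shard to the emptiest one
    until their chunk counts differ by less than `threshold`."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2, or chunks could bounce forever")
    counts = dict(chunks_per_shard)
    moves = []
    while True:
        fullest = max(counts, key=counts.get)
        emptiest = min(counts, key=counts.get)
        if counts[fullest] - counts[emptiest] < threshold:
            return counts, moves
        counts[fullest] -= 1
        counts[emptiest] += 1
        moves.append((fullest, emptiest))

# Hypothetical start: all chunks on shard_1 right after shard_2 was added.
counts, moves = balance({"shard_1": 8, "shard_2": 0})
print(counts, len(moves))  # {'shard_1': 4, 'shard_2': 4} 4
```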

Sharding: Deploying a sharded cluster. Two shards: shard_1 on badger01 and shard_2 on badger02. Each shard is a replica set with three mongod instances. Three config servers: two on badger02 and one on badger01. One mongos instance.

Sharding: Starting the cluster. Start shard_1, start shard_2, start the config servers, then start mongos.
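
The Python sketch below generates one plausible set of startup command lines for this topology, in the slide's order. The --shardsvr, --configsvr, --replSet, --port, --dbpath and --configdb options are real mongod/mongos options, but every port, data path and the choice of badger01 for mongos are hypothetical. The comma-separated --configdb list matches the mirrored config servers of MongoDB 2.x; current releases expect a config-server replica set instead. Each shard's replica set still has to be initiated with rs.initiate().

```python
def startup_commands():
    """Hypothetical (host, command line) pairs for the cluster on the slide."""
    cmds = []
    # Each shard: a three-member replica set on one host (ports and paths are made up).
    for shard, host in (("shard_1", "badger01"), ("shard_2", "badger02")):
        for i, port in enumerate((27018, 27028, 27038)):
            cmds.append((host, f"mongod --shardsvr --replSet {shard} --port {port} --dbpath /data/{shard}_{i}"))
    # Three config servers: one on badger01, two on badger02.
    config_servers = [("badger01", 27019), ("badger02", 27019), ("badger02", 27029)]
    for i, (host, port) in enumerate(config_servers):
        cmds.append((host, f"mongod --configsvr --port {port} --dbpath /data/config_{i}"))
    # One mongos pointed at all three config servers (2.x-style list).
    configdb = ",".join(f"{h}:{p}" for h, p in config_servers)
    cmds.append(("badger01", f"mongos --configdb {configdb} --port 27017"))
    return cmds

for host, cmd in startup_commands():
    print(host, cmd)
```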

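Sharding: Configuring the cluster. Connect to mongos, add each shard (sh.addShard()), enable sharding for the database (sh.enableSharding()), and then shard each collection (sh.shardCollection()).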
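
The shell helpers wrap admin commands named addShard, enableSharding and shardCollection. The Python sketch below lists those command documents in the order they would be sent through mongos; the database name mydb, the collection jobs, the shard key {user: 1} and the shard ports are hypothetical. A replica-set shard is added as "setName/host:port". With PyMongo each could be run as, for example, client.admin.command("enableSharding", "mydb"); nothing is executed here.

```python
def cluster_setup_commands(db_name, collection, shard_key):
    """Admin command documents, in order; hosts and ports are hypothetical."""
    return [
        {"addShard": "shard_1/badger01:27018"},  # replica-set shard: "setName/host:port"
        {"addShard": "shard_2/badger02:27018"},
        {"enableSharding": db_name},
        {"shardCollection": f"{db_name}.{collection}", "key": shard_key},
    ]

for cmd in cluster_setup_commands("mydb", "jobs", {"user": 1}):
    print(cmd)
```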

Aggregation: Querying raw data. MongoDB provides powerful and flexible tools for data aggregation tasks: group(), the aggregation framework, and map/reduce.

Aggregation: Aggregation framework. It is a pipeline: documents from a collection pass through an aggregation pipeline made up of several pipeline operators, such as $match, $group, $project, $sort, and so on.
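
The pipeline idea can be modelled as a loop that feeds each stage's output into the next. The Python sketch below is a toy evaluator, not MongoDB's engine: it supports only equality $match, $group with $sum, and $sort, and the jobs data with user, cpu and status fields is hypothetical. The same pipeline list is what db.jobs.aggregate() would receive.

```python
def run_pipeline(docs, pipeline):
    """Toy evaluator for $match (equality only), $group ($sum only) and $sort."""
    for stage in pipeline:
        (op, spec), = stage.items()  # each stage is a one-key document
        if op == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            key_field = spec["_id"][1:]  # "$user" -> "user"
            groups = {}
            for d in docs:
                out = groups.setdefault(d[key_field], {"_id": d[key_field]})
                for name, acc in spec.items():
                    if name == "_id":
                        continue
                    arg = acc["$sum"]  # "$field" sums a field; a number counts documents
                    out[name] = out.get(name, 0) + (d[arg[1:]] if isinstance(arg, str) else arg)
            docs = list(groups.values())
        elif op == "$sort":
            # Sort by the last key first; Python's stable sort keeps earlier keys dominant.
            for field, direction in reversed(list(spec.items())):
                docs = sorted(docs, key=lambda d: d[field], reverse=direction < 0)
        else:
            raise ValueError(f"stage {op} is not supported by this sketch")
    return docs

jobs = [
    {"user": 1, "cpu": 10, "status": "done"},
    {"user": 2, "cpu": 5, "status": "done"},
    {"user": 1, "cpu": 7, "status": "done"},
    {"user": 2, "cpu": 99, "status": "failed"},
]
pipeline = [
    {"$match": {"status": "done"}},
    {"$group": {"_id": "$user", "total_cpu": {"$sum": "$cpu"}, "jobs": {"$sum": 1}}},
    {"$sort": {"total_cpu": -1}},
]
print(run_pipeline(jobs, pipeline))
```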

Aggregation: SQL to Aggregation Framework Mapping Chart
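
Several rows of that mapping chart, as given in MongoDB's documentation, are recorded in the Python dictionary below, together with a small hypothetical helper that lists the stage operators for a query's clauses in SQL's logical evaluation order. Note that HAVING maps to a second $match placed after $group, and COUNT() is written as {$sum: 1}.

```python
# Selected rows of the SQL-to-aggregation mapping.
SQL_TO_AGGREGATION = {
    "WHERE": "$match",
    "GROUP BY": "$group",
    "HAVING": "$match",   # a $match stage after $group
    "SELECT": "$project",
    "ORDER BY": "$sort",
    "LIMIT": "$limit",
    "SUM()": "$sum",
    "COUNT()": "$sum",    # counting is written {"$sum": 1}
}

SQL_LOGICAL_ORDER = ["WHERE", "GROUP BY", "HAVING", "SELECT", "ORDER BY", "LIMIT"]

def stages_for(clauses):
    """Hypothetical helper: stage operators for the given clauses, in evaluation order."""
    return [SQL_TO_AGGREGATION[c] for c in SQL_LOGICAL_ORDER if c in clauses]

print(stages_for({"SELECT", "WHERE", "GROUP BY", "ORDER BY"}))
# ['$match', '$group', '$project', '$sort']
```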

Aggregation: Map/reduce can handle complex aggregation tasks; it is run with the db.collection.mapReduce() wrapper method. A map/reduce operation is composed of several tasks: reading from the input collection, executing the map function, executing the reduce function, and writing to the output collection (via a temporary collection).
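
The phases listed above can be modelled in a few lines of Python. In this sketch, each (key, value) pair returned by map_fn plays the role of emit() in a MongoDB map function, and a dictionary stands in for the output collection; the jobs data and its user/cpu fields are hypothetical. Like MongoDB, the sketch skips reduce for keys with a single emitted value. Because MongoDB may also call reduce again on partial results, a real reduce function must be associative, commutative and idempotent, as a sum is.

```python
from collections import defaultdict

def map_reduce(collection, map_fn, reduce_fn):
    """Model of the mapReduce phases: read input, map, group by key, reduce, write output."""
    emitted = defaultdict(list)
    for doc in collection:              # read from the input collection
        for key, value in map_fn(doc):  # each pair plays the role of emit(key, value)
            emitted[key].append(value)
    output = {}
    for key, values in emitted.items():
        # Skip reduce for keys that emitted a single value.
        output[key] = values[0] if len(values) == 1 else reduce_fn(key, values)
    return output                       # stands in for the output collection

jobs = [{"user": 1, "cpu": 10}, {"user": 2, "cpu": 5}, {"user": 1, "cpu": 7}]
cpu_per_user = map_reduce(jobs, lambda d: [(d["user"], d["cpu"])], lambda k, vs: sum(vs))
print(cpu_per_user)  # {1: 17, 2: 5}
```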

Aggregation: Map/reduce example

Aggregation: Test. Analyze the CPU efficiency distribution per user.

Aggregation: Script. $match: where user = 1 and exectime > 0. $project: output the fields CPUTime and cpu_efficiency. $sort: sort the results.
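
In shell syntax this script would be a single db.<collection>.aggregate([...]) call; the collection name is not given. The Python sketch below writes the same three stages as PyMongo-style documents and pairs them with a plain-Python equivalent for checking results on sample data. Two details are assumptions, not from the slide: cpu_efficiency is taken to be CPUTime / exectime, and results are sorted in ascending order. The job records are hypothetical.

```python
# Assumed: cpu_efficiency = CPUTime / exectime, sorted ascending.
pipeline = [
    {"$match": {"user": 1, "exectime": {"$gt": 0}}},
    {"$project": {"CPUTime": 1,
                  "cpu_efficiency": {"$divide": ["$CPUTime", "$exectime"]}}},
    {"$sort": {"cpu_efficiency": 1}},
]

def cpu_efficiency_report(jobs, user):
    """Plain-Python equivalent of the pipeline ($project keeps _id by default)."""
    rows = [
        {"_id": j["_id"], "CPUTime": j["CPUTime"],
         "cpu_efficiency": j["CPUTime"] / j["exectime"]}
        for j in jobs
        # exectime > 0 filters unfinished jobs and also prevents division by zero
        if j.get("user") == user and j.get("exectime", 0) > 0
    ]
    return sorted(rows, key=lambda r: r["cpu_efficiency"])

jobs = [
    {"_id": 1, "user": 1, "CPUTime": 30, "exectime": 60},
    {"_id": 2, "user": 1, "CPUTime": 45, "exectime": 50},
    {"_id": 3, "user": 1, "CPUTime": 5, "exectime": 0},
    {"_id": 4, "user": 2, "CPUTime": 10, "exectime": 10},
]
print(cpu_efficiency_report(jobs, 1))
```

Keeping exectime > 0 in $match, ahead of $project, means the division never sees a zero denominator.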

Aggregation: Total number: 103091 9166

Thanks