Learning MongoDB ZhangGang 2013.05.02
Data size Type_data on a single node with no index. MongoDB data size: about 14 GB. MySQL, for comparison: 5.6 GB.
Index Indexes provide high-performance read operations for frequently used queries. _id index: a unique index, created by default on every collection. In a sharded collection the shard key must also be indexed; sharding an empty collection creates that index automatically. Command: db.collection.ensureIndex({field:1}). A compound index: db.collection.ensureIndex({f1:1,f2:1…})
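A compound index is most useful for queries on a leading prefix of its fields. The sketch below is a simplified model of that rule, not MongoDB's actual query planner; the helper name coversAsPrefix is hypothetical.

```javascript
// Simplified model: an index is treated as usable when the queried fields
// are exactly a leading prefix of the index's key list (query order ignored).
function coversAsPrefix(indexSpec, queryFields) {
  const indexKeys = Object.keys(indexSpec);
  if (queryFields.length === 0 || queryFields.length > indexKeys.length) {
    return false;
  }
  const prefix = indexKeys.slice(0, queryFields.length);
  return prefix.every((key) => queryFields.includes(key));
}

// Same shape as the document passed to ensureIndex({f1:1,f2:1}).
const compound = { f1: 1, f2: 1 };

console.log(coversAsPrefix(compound, ["f1"]));       // true
console.log(coversAsPrefix(compound, ["f2", "f1"])); // true
console.log(coversAsPrefix(compound, ["f2"]));       // false
```

Under this model a query on f2 alone would need its own index.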
Index Indexing strategies: Create indexes to support your queries. Use indexes to sort query results. Write queries that are selective. Ensure indexes fit in RAM.
Index RAM capacity needed: Not all of the data has to fit in RAM. The working set should stay in RAM. At the very least, the indexes should stay in RAM.
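The RAM rule can be checked with simple arithmetic. This is a minimal sketch with made-up sizes; in practice the index size could come from db.collection.totalIndexSize(), and the "hot data" figure is an estimate you would have to supply yourself. The function name ramCheck is hypothetical.

```javascript
const GB = 1024 ** 3;

// Working set is modeled here as frequently accessed data plus all indexes.
function ramCheck({ ramBytes, indexBytes, hotDataBytes }) {
  const workingSetBytes = indexBytes + hotDataBytes;
  return {
    indexesFit: indexBytes <= ramBytes,
    workingSetFits: workingSetBytes <= ramBytes,
  };
}

// Hypothetical node: 8 GB RAM, 3 GB of indexes, 6 GB of hot data.
console.log(ramCheck({ ramBytes: 8 * GB, indexBytes: 3 * GB, hotDataBytes: 6 * GB }));
// { indexesFit: true, workingSetFits: false }
```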
Replica sets High availability: Replication provides redundancy, backup, and automatic failover. Replication occurs through replica sets, which were introduced in v1.6 and supersede master-slave replication.
Replica sets Concept of replica sets: a cluster of mongod instances that replicate amongst one another and ensure automated failover. Members in a set: primary, secondary, arbiter; secondaries can also be secondary-only, hidden, delayed, or non-voting.
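Replica sets Drivers know which member is the primary. If the primary goes down, a new one is elected from the secondaries. Data is replicated to the secondaries after it is written to the primary. A typical set has three members. Writes go only to the primary. Reads can also be served by secondaries.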
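A new primary can only be elected while a majority of the voting members can reach each other, which is one reason three members (or two plus an arbiter) is the typical layout. The sketch below shows only that majority arithmetic, not the real election protocol; canElectPrimary is a hypothetical name.

```javascript
// True when the reachable voting members form a strict majority of the set.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers > Math.floor(votingMembers / 2);
}

// Three-member set, primary down: the two survivors are a majority.
console.log(canElectPrimary(3, 2)); // true

// Two-member set, primary down: one survivor is not a majority.
console.log(canElectPrimary(2, 1)); // false
```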
Replica sets Deploy a replica set. Three nodes: primary, secondary, arbiter. rs.initiate() rs.add("localhost:30000") rs.addArb("localhost:30002")
Replica sets A three-member set. Test: shut down the primary; after about 10 s a new primary is elected and responds to the application.
Sharding High scalability: Sharding is MongoDB’s approach to scaling out. Sharding automatically redistributes collection data when a new server is added.
Sharding Components of a sharded cluster: shards, config servers, mongos. Shards: usually each shard is a replica set. Config servers: each config server is a mongod instance that holds metadata about the cluster. Mongos: routes reads and writes from applications to the shards; applications don’t access the shards directly.
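The routing role of mongos can be pictured as a lookup in the cluster metadata. The sketch below assumes the metadata is a list of shard key ranges (chunks), each owned by one shard, with an inclusive lower bound and exclusive upper bound; the ranges, shard names, and the routeToShard helper are illustrative, not real config server contents.

```javascript
// Hypothetical metadata: each range of shard key values belongs to one shard.
const chunkMap = [
  { min: -Infinity, max: 100, shard: "shard_1" },
  { min: 100, max: 200, shard: "shard_2" },
  { min: 200, max: Infinity, shard: "shard_1" },
];

// Return the shard whose range [min, max) contains the shard key value.
function routeToShard(chunks, keyValue) {
  const chunk = chunks.find((c) => keyValue >= c.min && keyValue < c.max);
  if (!chunk) {
    throw new Error("no chunk covers key " + keyValue);
  }
  return chunk.shard;
}

console.log(routeToShard(chunkMap, 99));  // shard_1
console.log(routeToShard(chunkMap, 100)); // shard_2
console.log(routeToShard(chunkMap, 250)); // shard_1
```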
Sharding Sharding balancer: The shard key determines the distribution of the collection’s documents among the cluster’s shards. Data in a shard is organized logically into chunks. The balancer evens out the number of chunks between shards. When to use sharding: the data approaches the storage capacity of one node; the working set approaches the maximum amount of RAM; there is a large amount of write activity.
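The balancing idea can be sketched as repeatedly moving one chunk from the fullest shard to the emptiest one until their counts are close. This is a simplified illustration, not MongoDB's actual balancer; the balance function, the threshold value, and the chunk counts are assumptions.

```javascript
// Move chunks one at a time from the most loaded shard to the least loaded
// shard until the difference in chunk counts drops below the threshold.
function balance(chunkCounts, threshold) {
  if (threshold < 2) {
    throw new Error("threshold must be at least 2 so the loop terminates");
  }
  const counts = { ...chunkCounts };
  const names = Object.keys(counts);
  let migrations = 0;
  for (;;) {
    const most = names.reduce((a, b) => (counts[b] > counts[a] ? b : a));
    const least = names.reduce((a, b) => (counts[b] < counts[a] ? b : a));
    if (counts[most] - counts[least] < threshold) {
      break;
    }
    counts[most] -= 1;
    counts[least] += 1;
    migrations += 1;
  }
  return { counts, migrations };
}

console.log(balance({ shard_1: 10, shard_2: 2 }, 2));
// { counts: { shard_1: 6, shard_2: 6 }, migrations: 4 }
```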
Sharding Deploy a sharded cluster. Two shards: shard_1 at badger01, shard_2 at badger02. Each shard is a replica set with three mongod instances. Three config servers: two on badger02, one on badger01. One mongos instance.
Sharding Start the cluster: Start shard_1. Start shard_2. Start the config servers. Start mongos.
Sharding Configure the cluster: Connect to mongos. Add the shards (addShard). Enable sharding (enableSharding).
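The configuration steps can be written down as the admin command documents that would be sent through mongos. The sketch below only builds those documents; it assumes each replica set is named after its shard, and the port 27018, the database name testdb, and the helper buildClusterCommands are hypothetical.

```javascript
// Build one addShard command per shard, then an enableSharding command.
// A replica-set shard is addressed as "<replica set name>/<host:port>".
function buildClusterCommands(shards, database) {
  const addShards = shards.map((s) => ({ addShard: `${s.replSet}/${s.host}` }));
  return [...addShards, { enableSharding: database }];
}

const commands = buildClusterCommands(
  [
    { replSet: "shard_1", host: "badger01:27018" }, // port is an assumption
    { replSet: "shard_2", host: "badger02:27018" }, // port is an assumption
  ],
  "testdb" // hypothetical database name
);

console.log(JSON.stringify(commands, null, 2));

// In a mongo shell connected to mongos, each document could be sent with:
// commands.forEach(function (cmd) { printjson(db.adminCommand(cmd)); });
```

After this, a collection would still need to be sharded on a chosen shard key before its data is distributed.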
Aggregation Querying raw data: MongoDB offers powerful and flexible tools for data aggregation tasks: group(), the aggregation framework, and map/reduce.
Aggregation Aggregation framework: It is a pipeline; documents from a collection pass through it stage by stage. A pipeline consists of several pipeline operators: $match, $group, $project, $sort, ...
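The pipeline idea can be mimicked in plain JavaScript: each stage takes a list of documents and returns a new list. This is an in-memory illustration of the concept, not how the server executes a pipeline; the stage helpers and the sample documents are made up.

```javascript
// Each helper returns a stage: a function from documents to documents.
const match = (predicate) => (docs) => docs.filter(predicate);

const groupSum = (keyField, valueField) => (docs) => {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc[keyField], (totals.get(doc[keyField]) || 0) + doc[valueField]);
  }
  return [...totals].map(([key, total]) => ({ _id: key, total }));
};

const sortBy = (compare) => (docs) => [...docs].sort(compare);

// Feed the output of each stage into the next one.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const sampleDocs = [
  { user: "a", cpu: 10 },
  { user: "b", cpu: 5 },
  { user: "a", cpu: 0 },
  { user: "b", cpu: 7 },
  { user: "c", cpu: 3 },
];

const result = runPipeline(sampleDocs, [
  match((d) => d.cpu > 0),            // like $match
  groupSum("user", "cpu"),            // like $group with $sum
  sortBy((x, y) => y.total - x.total) // like $sort, descending
]);

console.log(result);
// [ { _id: 'b', total: 12 }, { _id: 'a', total: 10 }, { _id: 'c', total: 3 } ]
```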
Aggregation SQL to Aggregation Framework Mapping Chart
Aggregation Map/reduce: Can handle complex aggregation tasks, using the db.collection.mapReduce() wrapper method. Composed of many tasks: reads from the input collection, executions of the map function, executions of the reduce function, and writes to the output collection (which may be temporary).
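The four tasks can be walked through with a small in-memory stand-in. This is a sketch of the flow only: in MongoDB the map function calls a global emit(), whereas here emit is passed in as an argument so the code runs outside the shell; mapReduceLocal and the sample documents are hypothetical.

```javascript
// Local stand-in for the map/reduce flow: read input, run map on each
// document, run reduce per key, and return documents shaped { _id, value }.
function mapReduceLocal(docs, map, reduce) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs) {
    map.call(doc, emit); // the document is available as `this`, as in MongoDB
  }
  const output = [];
  for (const [key, values] of emitted) {
    // As in MongoDB, reduce is skipped for keys with a single value.
    const value = values.length > 1 ? reduce(key, values) : values[0];
    output.push({ _id: key, value });
  }
  return output;
}

// Count documents per user.
function mapFn(emit) {
  emit(this.user, 1);
}
function reduceFn(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

const jobs = [{ user: "a" }, { user: "b" }, { user: "a" }];
console.log(mapReduceLocal(jobs, mapFn, reduceFn));
// [ { _id: 'a', value: 2 }, { _id: 'b', value: 1 } ]
```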
Aggregation Map/reduce example
Aggregation Test: Analyze the CPU efficiency distribution per user.
Aggregation Script: $match: where user=1 and exectime>0. $project: output fields CPUTime, cpu_efficiency. $sort: sort the result.
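Written out as a pipeline document, the script might look like the sketch below. Several details are assumptions rather than taken from the slides: that cpu_efficiency is a stored field, that _id is dropped from the output, and that the sort is ascending on cpu_efficiency. The collection name is left open.

```javascript
const pipeline = [
  // where user = 1 and exectime > 0
  { $match: { user: 1, exectime: { $gt: 0 } } },
  // output only CPUTime and cpu_efficiency (dropping _id is an assumption)
  { $project: { _id: 0, CPUTime: 1, cpu_efficiency: 1 } },
  // sort key and direction are assumptions; the slide only says "sort the result"
  { $sort: { cpu_efficiency: 1 } },
];

console.log(JSON.stringify(pipeline));

// In a mongo shell this would be passed to the collection's aggregate method:
// db.<collection>.aggregate(pipeline)
```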
Aggregation Total num: 103091 9166
thanks