1
Summary of Alma-OSF’s Evaluation of MongoDB for Monitoring Data
Heiko Sommer, June 13, 2013
Heavily based on the presentation by Tzu-Chiang Shen and Leonel Peña
ALMA Integrated Computing Team Coordination & Planning Meeting #1
Santiago, 17-19 April 2013
2
ICT-CPM1 17-19 April 2013
Monitoring Storage Requirement
- Expected data rate with 66 antennas:
  - 150,000 monitor points (“MP”s) total.
  - MPs get archived once per minute.
  - ~1 minute of MP data is bucketed into a “clob”.
  - ~7,000 clobs/s (2,500 clobs/s + dependent MP demultiplexing + fluctuations).
  - ~25-30 GB/day, ~10 TB/year (equivalent to ~310 KByte/s or ~2.485 Mbit/s).
- Monitoring data characteristics:
  - Simple data structure: [ID, timestamp, value]
  - But a huge amount of data
  - Read-only data
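The rates quoted on this slide can be cross-checked with a few lines of arithmetic. The sketch below is plain JavaScript, not part of the original talk, and it assumes decimal units (1 KByte = 1,000 bytes, 1 GB = 10^9 bytes), which the slide does not state.

```javascript
// Cross-check of the quoted data rates. Decimal units are an assumption.
const monitorPoints = 150000;       // total MPs with 66 antennas (from the slide)
const archivePeriodSeconds = 60;    // each MP is archived once per minute

// One clob per MP per minute gives the baseline clob rate.
const baseClobsPerSecond = monitorPoints / archivePeriodSeconds;

const kbytePerSecond = 310;         // sustained rate quoted on the slide
const mbitPerSecond = (kbytePerSecond * 8) / 1000;
const gbPerDay = (kbytePerSecond * 1000 * 86400) / 1e9;
const tbPerYear = (gbPerDay * 365) / 1000;

console.log(baseClobsPerSecond);    // baseline clobs/s before demultiplexing
console.log(mbitPerSecond.toFixed(2), gbPerDay.toFixed(1), tbPerYear.toFixed(1));
```

Under these assumptions the baseline works out to 2,500 clobs/s, and 310 KByte/s corresponds to roughly 2.5 Mbit/s, about 27 GB/day and just under 10 TB/year, which agrees with the ranges on the slide.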
3
Prior DB Investigations
- Oracle: see Alisdair’s slides.
- MySQL
  - Query problems, similar to the Oracle DB
- HBase (2011-08)
  - Got stuck with Java client problems
  - Poor support from the community
- Cassandra (2011-10)
  - Keyspace / replicator issue resolved
  - Poor insert performance: only 270 inserts / minute (unclear what size)
  - Clients froze
- These experiments were done “only” with some help from archive operators, not within the scope of a student’s thesis, as was later the case with MongoDB.
- “Administrational complexity” was also mentioned, without details.
4
Very Brief Introduction of MongoDB
- NoSQL and document-oriented.
- The storage format is BSON, a binary encoding of JSON-like documents.
- Documents within a collection can differ in structure.
  - For monitor data we don’t really need this freedom.
- Other features: sharding, replication, aggregation (Map/Reduce)

SQL        mongoDB
Database   Database
Table      Collection
Row        Document
Field      Field
Index      Index
5
Very Brief Introduction of MongoDB …
A document in mongoDB:
    {
      _id: ObjectId("509a8fb2f3f4948bd2f983a0"),
      user_id: "abc123",
      age: 55,
      status: 'A'
    }
6
Schema Alternatives 1.) One MP value per doc
- One MP value per doc.
- One MongoDB collection total, or one per antenna.
7
Schema Alternatives 2.) MP clob per doc
- One clob (~1 minute of flattened MP data) per doc.
- Collection per antenna / other device.
8
Schema Alternatives 3.) Structured MP /day/doc
- One monitor point data structure per day
- Monthly database
- Shard key = antenna + MP, which keeps matching docs on the same node.
- Updates of pre-allocated documents.
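To make variant 3 more concrete, here is a minimal sketch of what a pre-allocated per-day document and a single-sample update might look like. The field names (`metadata.*`, `hourly.<hour>.<minute>.<second>`) are taken from the queries shown elsewhere in this talk; the helper names, the `null` placeholder and the decision to pre-allocate every second are assumptions made for illustration only.

```javascript
// Hypothetical helper: build one pre-allocated document per monitor point and day.
// The hour -> minute -> second nesting mirrors the 'hourly.15.29.18' paths used
// in the talk's queries; filling every slot with null is an assumption.
function preallocateDayDocument(antenna, component, monitorPoint, date) {
  const hourly = {};
  for (let h = 0; h < 24; h++) {
    hourly[h] = {};
    for (let m = 0; m < 60; m++) {
      hourly[h][m] = {};
      for (let s = 0; s < 60; s++) {
        hourly[h][m][s] = null; // placeholder, overwritten in place later
      }
    }
  }
  return { metadata: { date, antenna, component, monitorPoint }, hourly };
}

// Hypothetical helper: the $set modifier that fills in one sample.
function sampleUpdate(hour, minute, second, value) {
  return { $set: { [`hourly.${hour}.${minute}.${second}`]: value } };
}

const doc = preallocateDayDocument("DV10", "FrontEnd/Cryostat", "GATE_VALVE_STATE", "2012-9-15");
const update = sampleUpdate(15, 29, 18, 1);
console.log(Object.keys(doc.hourly).length, JSON.stringify(update));
```

In a mongo shell, the object returned by `sampleUpdate` would be passed as the second argument of an `update` call whose first argument selects the document by its `metadata` fields. Because the slot already exists, the update overwrites a value in place instead of growing the document.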
9
Analysis
- Advantages of variant 3.):
  - Fewer documents within a collection
    - There will be ~150,000 documents per day
    - The number of index entries will be lower as well.
  - No data fragmentation problem
  - Once a specific document is identified via the index (O(log n)), access to a specific range or a single value can be done in O(1)
  - Smaller ratio of metadata / data
10
What would a query look like?
- Query to retrieve a value with seconds-level granularity:
  E.g., to get the value of FrontEnd/Cryostat/GATE_VALVE_STATE at 2012-09-15T15:29:18:
    db.monitorData_[MONTH].findOne(
      { "metadata.date": "2012-9-15",
        "metadata.monitorPoint": "GATE_VALVE_STATE",
        "metadata.antenna": "DV10",
        "metadata.component": "FrontEnd/Cryostat" },
      { 'hourly.15.29.18': 1 }
    );
11
What would a query look like …
- Query to retrieve a range of values:
  E.g., to get the values of FrontEnd/Cryostat/GATE_VALVE_STATE during minute 29 (at 2012-09-15T15:29):
    db.monitorData_[MONTH].findOne(
      { "metadata.date": "2012-9-15",
        "metadata.monitorPoint": "GATE_VALVE_STATE",
        "metadata.antenna": "DV10",
        "metadata.component": "FrontEnd/Cryostat" },
      { 'hourly.15.29': 1 }
    );
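The single-value and range queries differ only in how deep the projection path goes. A small helper could derive both the filter and the projection from a timestamp. This is a sketch, not code from the talk: the function name and the `granularity` parameter are hypothetical, and it assumes timestamps are interpreted as UTC.

```javascript
// Hypothetical helper: derive filter and projection for findOne() from a timestamp.
// granularity is "hour", "minute" or "second"; UTC is assumed throughout.
function buildMonitorQuery(antenna, component, monitorPoint, isoTimestamp, granularity) {
  const t = new Date(isoTimestamp);
  // Non-padded date string, matching the "2012-9-15" form used on the slides.
  const date = `${t.getUTCFullYear()}-${t.getUTCMonth() + 1}-${t.getUTCDate()}`;
  const parts = [t.getUTCHours(), t.getUTCMinutes(), t.getUTCSeconds()];
  const depth = { hour: 1, minute: 2, second: 3 }[granularity];
  if (!depth) throw new Error(`unknown granularity: ${granularity}`);
  // A shorter path returns the whole sub-document (e.g. one minute of values).
  const path = ["hourly", ...parts.slice(0, depth)].join(".");
  return {
    filter: {
      "metadata.date": date,
      "metadata.monitorPoint": monitorPoint,
      "metadata.antenna": antenna,
      "metadata.component": component,
    },
    projection: { [path]: 1 },
  };
}

const single = buildMonitorQuery("DV10", "FrontEnd/Cryostat", "GATE_VALVE_STATE",
                                 "2012-09-15T15:29:18Z", "second");
const minute = buildMonitorQuery("DV10", "FrontEnd/Cryostat", "GATE_VALVE_STATE",
                                 "2012-09-15T15:29:18Z", "minute");
console.log(JSON.stringify(single.projection), JSON.stringify(minute.projection));
```

The two results would be passed as the first and second arguments of `findOne` on the relevant monthly collection.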
12
Indexes
- A typical query is restricted by:
  - Antenna name
  - Component name
  - Monitor point
  - Date
    db.monitorData_[MONTH].ensureIndex(
      { "metadata.antenna": 1,
        "metadata.component": 1,
        "metadata.monitorPoint": 1,
        "metadata.date": 1 }
    );
13
Testing Hardware / Software
- A cluster of two nodes was created
  - CPU: Intel Xeon quad-core X5410
  - RAM: 16 GByte
  - SWAP: 16 GByte
- OS: RHEL 6.0, kernel 2.6.32-279.14.1.el6.x86_64
- MongoDB V2.2.1
14
Testing Data
- Real data from Sep-Nov 2012 was used initially, but:
- A tool to generate random data was implemented:
  - Month: 1 (February)
  - Number of days: 11
  - Number of antennas: 70
  - Number of components per antenna: 41
  - Monitoring points per component: 35
  - Total daily documents: 100,450
  - Total number of documents: 1,104,950
  - Average size per document: 1.3 MB
  - Size of the collection: 1,375.23 GB
  - Total index size: 193 MB
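The document counts follow directly from the generator parameters, and the average document size follows from the collection size. A quick check in plain JavaScript, assuming binary units (1 GB = 1,024 MB), which the slide does not specify:

```javascript
// Consistency check of the generated test data set.
const days = 11;
const antennas = 70;
const componentsPerAntenna = 41;
const monitorPointsPerComponent = 35;

// One document per monitor point per day.
const dailyDocuments = antennas * componentsPerAntenna * monitorPointsPerComponent;
const totalDocuments = dailyDocuments * days;

const collectionSizeGB = 1375.23;   // from the slide
// Assumes 1 GB = 1024 MB; with decimal units the average comes out slightly lower.
const averageDocumentMB = (collectionSizeGB * 1024) / totalDocuments;

console.log(dailyDocuments, totalDocuments, averageDocumentMB.toFixed(2));
```

This gives 100,450 documents per day and 1,104,950 documents in total, matching the slide, and an average of roughly 1.27 MB per document, consistent with the quoted 1.3 MB.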
15
Database Statistics
16
Data Sets
17
Data Sets …
18
Data Sets
19
Schema 1: One Sample of Monitoring Data per Document
20
Proposed Schema:
21
More tests
- For more tests, see https://adcwiki.alma.cl/bin/view/Software/HighVolumeDataTestingUsingMongoDB
22
TODO
- Test performance of aggregations / combined queries
- Use Map/Reduce to create statistics (max, min, avg, etc.) over ranges of data, to improve the performance of queries such as:
  - e.g., search for monitor points whose values are >= 10
- Test performance with a year’s worth of data
- Stress tests with a large number of concurrent queries
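The idea behind the Map/Reduce item can be illustrated without a database. The sketch below computes min, max and avg for one bucket of values in map/reduce style, using plain JavaScript arrays, not MongoDB's `mapReduce` command. The function names and sample values are made up for illustration.

```javascript
// Map step: turn one sample value into a partial summary.
function toSummary(value) {
  return { min: value, max: value, sum: value, count: 1 };
}

// Reduce step: merge two partial summaries. The merge is associative, so
// per-minute summaries could later be merged into per-hour or per-day ones.
function mergeSummaries(a, b) {
  return {
    min: Math.min(a.min, b.min),
    max: Math.max(a.max, b.max),
    sum: a.sum + b.sum,
    count: a.count + b.count,
  };
}

// Summarize one bucket, skipping slots that hold no numeric sample.
function summarize(values) {
  const numeric = values.filter((v) => typeof v === "number");
  if (numeric.length === 0) return null;
  const s = numeric.map(toSummary).reduce(mergeSummaries);
  return { min: s.min, max: s.max, avg: s.sum / s.count, count: s.count };
}

const minuteValues = [8, 12, null, 10];   // made-up sample values
const stats = summarize(minuteValues);
// A search for "values >= 10" only needs to open buckets whose max is >= 10.
const mayContainMatch = stats.max >= 10;
console.log(stats, mayContainMatch);
```

With such summaries stored alongside the raw data, a threshold search could discard most buckets by looking at `max` or `min` alone. Whether this pays off in practice is exactly what the TODO item proposes to test.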
23
Conclusion @ OSF
- MongoDB is suitable as an alternative for permanent storage of monitoring data.
  - Reported 25,000 clobs/s ingestion rate in the tests.
- The schema + indexes are fundamental to achieving millisecond-level response times.
24
Comments
- What are the requirements going to be like?
  - Only extraction by time interval and offline processing?
  - Or also “data mining” running on the DB?
  - All queries ad hoc and responsive, or also batch jobs?
  - Repair / flagging of bad data? Later reduction of redundancies?
- Can we hide the MP-to-document mapping from upserts/queries?
  - Currently queries have to patch together results at the 24-hour and monthly breaks.
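One way to hide the patching at day and month breaks would be a thin layer that splits a requested time interval into per-day pieces before any query is issued. The following is a sketch under assumptions, not something from the talk: the function name is hypothetical, UTC is assumed, and how the `month` value maps onto a monthly database or collection name is left open.

```javascript
// Hypothetical helper: split the half-open interval [startIso, endIso) into
// pieces that each fall within a single UTC day (and therefore a single month).
function splitByDay(startIso, endIso) {
  const msPerDay = 24 * 60 * 60 * 1000;
  const end = new Date(endIso).getTime();
  let cursor = new Date(startIso).getTime();
  const pieces = [];
  while (cursor < end) {
    // Start of the next UTC day after the cursor.
    const nextMidnight = (Math.floor(cursor / msPerDay) + 1) * msPerDay;
    const pieceEnd = Math.min(nextMidnight, end);
    const d = new Date(cursor);
    pieces.push({
      month: d.getUTCMonth() + 1,   // would select the monthly database/collection
      date: `${d.getUTCFullYear()}-${d.getUTCMonth() + 1}-${d.getUTCDate()}`,
      from: new Date(cursor).toISOString(),
      to: new Date(pieceEnd).toISOString(),
    });
    cursor = pieceEnd;
  }
  return pieces;
}

// Two minutes straddling both a day break and a month break.
const pieces = splitByDay("2012-09-30T23:59:00Z", "2012-10-01T00:01:00Z");
console.log(pieces);
```

Each piece maps to one per-day document, so a caller could run one query per piece and concatenate the results without knowing where the breaks fall.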