Published by Cynthia Sharp. Modified over 9 years ago.
1
ALMA Integrated Computing Team
Coordination & Planning Meeting #1
Santiago, 17-19 April 2013
Evaluation of mongoDB for Persistent Storage of Monitoring Data
Tzu-Chiang Shen, Leonel Peña
2
ICT-CPM1 17-19 April 2013
Monitoring Storage Requirement
- Expected data rate with 66 antennas:
    ~ 6,000 - 7,000 clobs/s
    ~ 25 - 30 GB/day
    ~ equivalent to 310 KByte/s or 2.485 Mbit/s
    ~ 130,000 - 150,000 monitor points
- Monitoring data characteristics:
    Simple data structure: [timestamp, value]
    But a huge amount of data
    Read-only data
    Data is sorted at the moment of insertion
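As a quick sanity check on the figures above, a daily volume can be converted into a sustained rate. The sketch below assumes decimal units (1 GB = 10^9 bytes) and uses 27 GB/day, a value picked from inside the quoted 25 - 30 GB/day range; the function name is illustrative only.

```javascript
// Convert a daily data volume into a sustained rate.
// Assumption: decimal units (1 GB = 1e9 bytes, 1 KByte = 1e3 bytes).
function dailyVolumeToRate(gigabytesPerDay) {
  const secondsPerDay = 24 * 60 * 60; // 86,400
  const bytesPerSecond = (gigabytesPerDay * 1e9) / secondsPerDay;
  return {
    kiloBytesPerSecond: bytesPerSecond / 1e3,
    megaBitsPerSecond: (bytesPerSecond * 8) / 1e6,
  };
}

// 27 GB/day is an assumed mid-range value, not a measured one.
const rate = dailyVolumeToRate(27);
console.log(rate.kiloBytesPerSecond.toFixed(1) + " KByte/s");
console.log(rate.megaBitsPerSecond.toFixed(2) + " Mbit/s");
```

Under these assumptions the result is about 312 KByte/s, or 2.5 Mbit/s, which is in line with the rate quoted on the slide.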
3
Very Brief Introduction to MongoDB
- NoSQL and document-oriented.
- The storage format is BSON, a binary variation of JSON.
- Documents within a collection are not required to have the same fields.
- Other features: sharding, replication, aggregation (Map/Reduce).

SQL         MongoDB
Database    Database
Table       Collection
Row         Document
Column      Field
Index       Index
4
Very Brief Introduction to MongoDB ...
A document in MongoDB:
    {
      _id: ObjectId("509a8fb2f3f4948bd2f983a0"),
      user_id: "abc123",
      age: 55,
      status: 'A'
    }
5
Alternatives of Schema for Monitoring Data
- One monitoring point per document
6
Alternatives of Schema ...
- A clob per document
7
Alternatives of Schema ...
- A monitor point per day per document
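The schema figure for this alternative did not survive in the transcript. The sketch below is a reconstruction inferred from the field names used by the queries in this presentation (metadata.date, metadata.antenna, metadata.component, metadata.monitorPoint, and hourly.<hour>.<minute>.<second>). The helper names, the unpadded time keys, and the sample value are assumptions, not the authors' code.

```javascript
// Sketch of the "one monitor point per day per document" layout.
// Field names follow the queries shown in this presentation; the helper
// names and the unpadded hour/minute/second keys are assumptions.
function newDayDocument(antenna, component, monitorPoint, date) {
  return {
    metadata: { antenna, component, monitorPoint, date },
    hourly: {}, // hourly[hour][minute][second] = value
  };
}

// Store one sample in the nested hour -> minute -> second structure.
function addSample(doc, hour, minute, second, value) {
  const h = String(hour), m = String(minute), s = String(second);
  doc.hourly[h] = doc.hourly[h] || {};
  doc.hourly[h][m] = doc.hourly[h][m] || {};
  doc.hourly[h][m][s] = value;
  return doc;
}

const doc = newDayDocument("DV10", "FrontEnd/Cryostat", "GATE_VALVE_STATE", "2012-9-15");
addSample(doc, 15, 29, 18, 1); // the value 1 is a placeholder sample
console.log(JSON.stringify(doc, null, 2));
```

With this layout a full day of one monitor point lives in a single document, so the number of documents grows with the number of monitor points and days rather than with the number of samples.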
8
Analysis
- Advantages:
    The number of documents within a collection is bounded
    There will be ~150,000 documents per day
    The number of indexes is bounded as well
    No data fragmentation problem
    Once a specific document is identified through the index ( O(log n) ), access to a specific range or a single value can be done in O(1)
    Smaller ratio of metadata to data
9
What would a query look like?
- Query to retrieve a value with second-level granularity:
  E.g.: to get the value of FrontEnd/Cryostat/GATE_VALVE_STATE at 2012-09-15T15:29:18:
    db.monitorData_[MONTH].findOne(
      { "metadata.date": "2012-9-15",
        "metadata.monitorPoint": "GATE_VALVE_STATE",
        "metadata.antenna": "DV10",
        "metadata.component": "FrontEnd/Cryostat" },
      { 'hourly.15.29.18': 1 }
    );
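The filter and projection used for a second-level lookup can be derived mechanically from a timestamp. The following is a minimal sketch that assumes UTC timestamps and unpadded date and time keys, as in the "2012-9-15" and 'hourly.15.29.18' examples; secondLevelQuery is a hypothetical helper, not part of MongoDB.

```javascript
// Build the filter and projection for a second-level lookup.
// Assumptions: UTC timestamps and unpadded date/time keys; the function
// name is hypothetical.
function secondLevelQuery(antenna, component, monitorPoint, isoTimestamp) {
  const t = new Date(isoTimestamp);
  const date = [t.getUTCFullYear(), t.getUTCMonth() + 1, t.getUTCDate()].join("-");
  const path = ["hourly", t.getUTCHours(), t.getUTCMinutes(), t.getUTCSeconds()].join(".");
  return {
    filter: {
      "metadata.date": date,
      "metadata.monitorPoint": monitorPoint,
      "metadata.antenna": antenna,
      "metadata.component": component,
    },
    projection: { [path]: 1 }, // dotted path into the nested document
  };
}

const q = secondLevelQuery("DV10", "FrontEnd/Cryostat", "GATE_VALVE_STATE", "2012-09-15T15:29:18Z");
console.log(q.filter["metadata.date"], Object.keys(q.projection)[0]);
```

In the mongo shell the two objects would be passed as the first and second arguments of findOne on the relevant monthly collection.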
10
What would a query look like? ...
- Query to retrieve a range of values:
  E.g.: to get the values of FrontEnd/Cryostat/GATE_VALVE_STATE during minute 29 (2012-09-15T15:29):
    db.monitorData_[MONTH].findOne(
      { "metadata.date": "2012-9-15",
        "metadata.monitorPoint": "GATE_VALVE_STATE",
        "metadata.antenna": "DV10",
        "metadata.component": "FrontEnd/Cryostat" },
      { 'hourly.15.29': 1 }
    );
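A minute-level projection returns the whole sub-document for that minute, which the client then has to unpack. The sketch below shows one way to list those samples in time order. It assumes the keys under the minute are seconds stored as strings; the result document and its values are placeholders, not real monitoring data.

```javascript
// List the samples of one minute from a document returned by a
// minute-level projection such as { 'hourly.15.29': 1 }.
// Assumption: keys under the minute are seconds stored as strings.
function samplesForMinute(doc, hour, minute) {
  const byHour = (doc.hourly || {})[String(hour)] || {};
  const bySecond = byHour[String(minute)] || {};
  return Object.keys(bySecond)
    .map(Number)
    .sort((a, b) => a - b) // order samples by second
    .map((second) => ({ second, value: bySecond[String(second)] }));
}

// Hypothetical result document; the values are placeholders.
const result = { hourly: { "15": { "29": { "18": 1, "2": 0, "40": 1 } } } };
console.log(samplesForMinute(result, 15, 29));
```

A minute with no samples yields an empty array, so callers do not need to special-case missing hours or minutes.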
11
Indexes
- A typical query is restricted by:
    Antenna name
    Component name
    Monitor point
    Date
    db.monitorData_[MONTH].ensureIndex(
      { "metadata.antenna": 1,
        "metadata.component": 1,
        "metadata.monitorPoint": 1,
        "metadata.date": 1 }
    );
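The order of keys in a compound index matters, because such an index generally supports queries that constrain a leading prefix of its keys. The sketch below illustrates that rule for the index above. matchedIndexPrefix is a hypothetical helper written for illustration; it is not a MongoDB API and does not replace inspecting a real query plan.

```javascript
// Compound index keys, in the order given to ensureIndex above.
const indexKeys = [
  "metadata.antenna",
  "metadata.component",
  "metadata.monitorPoint",
  "metadata.date",
];

// Count how many leading index keys a query filter constrains.
// A result of 0 means the filter does not constrain the leading key,
// so this index is unlikely to help with matching.
function matchedIndexPrefix(keys, filter) {
  let n = 0;
  while (n < keys.length && Object.prototype.hasOwnProperty.call(filter, keys[n])) n++;
  return n;
}

const fullFilter = {
  "metadata.date": "2012-9-15",
  "metadata.monitorPoint": "GATE_VALVE_STATE",
  "metadata.antenna": "DV10",
  "metadata.component": "FrontEnd/Cryostat",
};
console.log(matchedIndexPrefix(indexKeys, fullFilter));                     // all four keys
console.log(matchedIndexPrefix(indexKeys, { "metadata.antenna": "DV10" })); // leading key only
console.log(matchedIndexPrefix(indexKeys, { "metadata.date": "2012-9-15" })); // no prefix
```

The typical query shown earlier constrains all four fields, so it matches the full index. Note that ensureIndex is the shell helper of the MongoDB 2.2 series used in these tests; later releases provide createIndex for the same purpose.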
12
Testing Hardware / Software
- A cluster of two nodes was created:
    CPU: Intel Xeon quad-core X5410
    RAM: 16 GByte
    SWAP: 16 GByte
- OS: RHEL 6.0, kernel 2.6.32-279.14.1.el6.x86_64
- MongoDB v2.2.1
13
Testing Data
- Real data from Sep-Nov 2012 was used initially, but:
- A tool to generate random data was implemented:
    Month: 1 (February)
    Number of days: 11
    Number of antennas: 70
    Number of components per antenna: 41
    Monitoring points per component: 35
    Total daily documents: 100,450
    Total documents: 1,104,950
    Average size per document: 1.3 MB
    Size of the collection: 1,375.23 GB
    Total index size: 193 MB
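The document counts above follow directly from the generator parameters, as the short check below shows. The average-size line assumes that 1 GB is counted as 1024 MB, which is a guess about how the slide's figures were reported.

```javascript
// Generator parameters taken from the slide.
const antennas = 70;
const componentsPerAntenna = 41;
const pointsPerComponent = 35;
const days = 11;

// One document per monitor point per day.
const dailyDocuments = antennas * componentsPerAntenna * pointsPerComponent;
const totalDocuments = dailyDocuments * days;

// Assumption: 1 GB = 1024 MB when deriving the average document size.
const collectionSizeGB = 1375.23;
const averageDocumentMB = (collectionSizeGB * 1024) / totalDocuments;

console.log(dailyDocuments, totalDocuments, averageDocumentMB.toFixed(2));
```

The two counts reproduce the slide's 100,450 and 1,104,950, and the derived average is close to the quoted 1.3 MB per document.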
14
Database Statistics
15
Data Sets
16
Data Sets ...
17
Data Sets
18
Schema 1: One Sample of Monitoring Data per Document
19
Proposed Schema:
20
More tests
- For more tests, see https://adcwiki.alma.cl/bin/view/Software/HighVolumeDataTestingUsingMongoDB
21
Pending
- Test the performance of aggregations and combined queries
- Use Map/Reduce to create statistics (max, min, avg, etc.) over ranges of data, to improve the performance of queries such as:
    e.g.: search for monitoring points whose values are >= 10
- Test performance with a year's worth of data
- Stress tests with a large number of concurrent queries
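To make the statistics idea concrete, the sketch below computes min, max, and avg over one day document in plain JavaScript. It is a stand-in for what a Map/Reduce job might precompute, not an actual MongoDB mapReduce call. The nested hourly layout is inferred from the queries in this presentation, and the sample values are placeholders.

```javascript
// Summarise every sample in a day document's nested hourly structure.
// Assumption: doc.hourly[hour][minute][second] holds numeric values.
function summarizeDay(doc) {
  let min = Infinity, max = -Infinity, sum = 0, count = 0;
  for (const minutes of Object.values(doc.hourly || {})) {
    for (const seconds of Object.values(minutes)) {
      for (const value of Object.values(seconds)) {
        if (value < min) min = value;
        if (value > max) max = value;
        sum += value;
        count++;
      }
    }
  }
  // No samples: return null rather than meaningless statistics.
  return count === 0 ? null : { min, max, avg: sum / count, count };
}

// Placeholder document with three samples.
const day = { hourly: { "15": { "29": { "18": 4, "19": 10 }, "30": { "0": 16 } } } };
const stats = summarizeDay(day);
console.log(stats);
```

If such a summary were stored alongside each document, a search for values >= 10 could skip every document whose stored max is below 10 without scanning its samples.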
22
Conclusion
- MongoDB is suitable as an alternative for permanent storage of monitoring data
- The schema and the indexes are fundamental to achieving millisecond-level response times