CHEN Zheng-xu, LI Shuang-shuang, SUN Xiao-yan

Meteorological unstructured data storage service based on a distributed NoSQL technology

CHEN Zheng-xu, LI Shuang-shuang, SUN Xiao-yan
Meteorological Information Network Center of Zhejiang Province, Hangzhou

Introduction

As the scale of meteorological data grows rapidly, meteorological prediction and climate analysis place increasing demands on the timeliness, stability and convenience of data services. Traditional approaches to storing and sharing meteorological data, such as file servers and network storage, suffer from a large file base, low data access rates and I/O bottlenecks when the number of concurrent users is large. They are therefore increasingly unable to meet the urgent needs of operational meteorological applications.

Abroad, Jeong et al. verified the data processing performance of NoSQL databases by testing and comparing RDBMS and NoSQL[1]. For HDFS, Shahabinejad et al. put forward a new binary coding approach that reduces the complexity of data storage[2]. Domestically, many scholars have carried out research in related fields based on distributed databases[3~7]. Li Yongsheng et al. proposed a storage model for numerical prediction products based on Hadoop; however, they only use a direct data storage method. Chen Donghui et al. proposed distributed storage and processing of meteorological station data, which does not cover unstructured data products.

Against this background, this paper combines the data formats of meteorological unstructured data products with a column storage database model, and designs and improves the primary key columns, attribute columns and data columns to enhance the query ability of the storage system, which also provides a technical reference for meteorological development. Finally, taking grid data such as numerical prediction, satellite and radar basic data as examples, we use Aliyun table storage to verify its high availability.
In this paper, we introduce a storage method for meteorological unstructured data products based on distributed NoSQL technology. We analyze meteorological unstructured data products and their application scenarios, build a column storage database, and store the structured data converted from unstructured data in tables. First, we design the data storage models and work out the primary key columns, attribute columns and data columns. Then, according to the features of the data products, we design data compression strategies to improve the overall efficiency of data access. Finally, taking numerical prediction products as an example, we use Aliyun OTS to run simulated tests. The results show that a column storage database based on NoSQL technology can provide the underlying data support for a product service platform, which has reference value for the extension and application of key technologies such as distributed data environments and cloud computing.

Methods

NoSQL storage model of grid data

We split the above-mentioned grid data. The element type, height and forecast time together uniquely determine a two-dimensional latitude-longitude array. Each value in the array is the value of the meteorological element at that latitude and longitude. These three dimensions can therefore be regarded as the primary key information of the data.

NoSQL storage of radar and satellite data

Satellite and radar data are binary data. Compared with the NetCDF data table, primary key two has no meaning for this data, so it can be dropped. Because the data is binary, there is no need to retrieve a particular segment of it, and the whole file has a higher compression ratio. When it is stored, each row therefore has only one attribute column, which stores the compressed binary file data.
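The two row layouts described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the column names (`tile_<r>_<c>`, `rows`, `cols`, `data`), the key fields of the radar/satellite row, the use of `zlib`, and the 4-byte little-endian float encoding are all assumptions made for illustration, and the rows are plain dictionaries rather than calls to any real table storage SDK.

```python
import struct
import zlib


def split_into_tiles(grid, n_row_splits, n_col_splits):
    """Yield (tile_row, tile_col, tile) for a 2D list that divides evenly."""
    rows, cols = len(grid), len(grid[0])
    if rows % n_row_splits or cols % n_col_splits:
        raise ValueError("grid shape must be divisible by the split counts")
    tile_h, tile_w = rows // n_row_splits, cols // n_col_splits
    for tr in range(n_row_splits):
        for tc in range(n_col_splits):
            yield tr, tc, [
                row[tc * tile_w:(tc + 1) * tile_w]
                for row in grid[tr * tile_h:(tr + 1) * tile_h]
            ]


def pack_tile(tile):
    """Encode a tile as 4-byte floats (assumed encoding) and compress it."""
    flat = [value for row in tile for value in row]
    return zlib.compress(struct.pack(f"<{len(flat)}f", *flat))


def unpack_tile(blob, tile_h, tile_w):
    """Inverse of pack_tile: decompress and rebuild the 2D tile."""
    flat = struct.unpack(f"<{tile_h * tile_w}f", zlib.decompress(blob))
    return [list(flat[r * tile_w:(r + 1) * tile_w]) for r in range(tile_h)]


def build_grid_row(element, height, forecast_hour, grid, n_row_splits, n_col_splits):
    """Grid product row: three-part primary key, one data column per tile."""
    primary_key = {"element": element, "height": height,
                   "forecast_hour": forecast_hour}
    # Attribute columns keep the original shape so readers can locate a tile.
    columns = {"rows": len(grid), "cols": len(grid[0])}
    for tr, tc, tile in split_into_tiles(grid, n_row_splits, n_col_splits):
        columns[f"tile_{tr}_{tc}"] = pack_tile(tile)  # hypothetical column name
    return primary_key, columns


def build_binary_row(product, timestamp, payload):
    """Radar/satellite row: hypothetical key, one compressed whole-file column."""
    return {"product": product, "time": timestamp}, {"data": zlib.compress(payload)}
```

Storing each tile in its own column is what lets a reader fetch and decompress a single tile, while the radar/satellite row keeps the file whole because no sub-range of it is ever requested.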
Acquisition of data in storage

For data acquisition, we package a variety of functions for different scenarios, including acquiring the basic data of an element field, the data of an element field over continuous time, and radar and satellite data. With the sliced storage method of this paper, suppose we need the meteorological field value at a fixed point whose coordinates on the large grid are (50,100). We need to know the two dimensions (rows and cols) of the original data to determine which grid the point falls in. According to table 2, (50,100) is in the column Grid_81_92, so the value can be obtained by reading and unzipping only Grid_81_92. The database query returns only this column, which greatly reduces the amount of data read.

Results

The test environment for the above data is based on Aliyun ECS and the table storage service (NoSQL server). The ECS instance has 2 cores, 4GB of memory and 50MB bandwidth. The imported test data in the table total 1GB, comprising 2000 data files. The example products are numerical prediction and radar and satellite data of NetCDF format. Each grid has 1300*1100 points and is split into 10*10 two-dimensional arrays, so each split grid contains 130*110 points stored at 4 bytes each; before compression, such a sub-grid is about 57KB. A radar or satellite cloud picture is a binary file of about 3MB before compression. Taking numerical prediction and radar and satellite cloud pictures as test objects, we test each of the above data service interfaces for one minute and take the average, and we measure the average service time of the interfaces under 1, 10, 50, 100 and 200 concurrent requests respectively. The test results are shown in figure 2.
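The tile lookup from the acquisition step and the sub-grid size quoted above can be checked with a short Python sketch. Several things here are assumptions rather than details from the poster: the point is read as zero-based (row, col), the 1300*1100 grid is taken as 1300 rows by 1100 columns, and the column name `tile_<r>_<c>` is hypothetical. Table 2 is not reproduced in this transcript, so the sketch does not attempt to reproduce the Grid_81_92 label.

```python
def locate_tile(row, col, rows, cols, n_row_splits, n_col_splits):
    """Return (tile_row, tile_col, row_in_tile, col_in_tile) for a grid point."""
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("point lies outside the grid")
    tile_h = rows // n_row_splits   # rows per tile
    tile_w = cols // n_col_splits   # columns per tile
    tile_row, row_in_tile = divmod(row, tile_h)
    tile_col, col_in_tile = divmod(col, tile_w)
    return tile_row, tile_col, row_in_tile, col_in_tile


def column_name(tile_row, tile_col):
    """Hypothetical naming scheme for the data column holding one tile."""
    return f"tile_{tile_row}_{tile_col}"


def tile_size_bytes(rows, cols, n_row_splits, n_col_splits, bytes_per_value=4):
    """Uncompressed size of one tile when every value takes bytes_per_value."""
    return (rows // n_row_splits) * (cols // n_col_splits) * bytes_per_value


if __name__ == "__main__":
    tr, tc, r, c = locate_tile(50, 100, 1300, 1100, 10, 10)
    print(column_name(tr, tc), (r, c))
    print(tile_size_bytes(1300, 1100, 10, 10))  # 130 * 110 * 4 bytes
```

With these assumptions a single tile holds 130 * 110 * 4 = 57,200 bytes, consistent with the roughly 57KB stated above, and a point query only needs to request and decompress the one column named by `column_name`.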
The test results show that, during data service, the latency of acquiring meteorological data over continuous time stays steadily within 50 milliseconds and does not increase significantly as concurrent access grows, a clear improvement in service efficiency over the original traditional file storage. The meteorological grid data storage model based on table store makes full use of its low latency, high concurrency and horizontal scaling, so it can avoid I/O bottlenecks and meet actual operational needs.

Conclusions

In this paper, we design data management and storage for meteorological unstructured grid data based on distributed NoSQL technology. Combining application scenarios with the features of column databases, we design an unstructured data storage model. Tests on the Aliyun platform show that the storage methods in this paper can be applied to numerical prediction products, radar basic data, satellite data and so on. They also achieve fast distributed storage and processing, which better meets the requirements of meteorological operational systems. Using NoSQL technology as the storage and service support for unstructured product data makes it possible to process unstructured meteorological data, such as numerical prediction data, efficiently, and to improve the timeliness, stability and convenience of data services. This technology is also one of the development directions of unstructured meteorological data analysis and processing. With the application of distributed data environments and cloud computing, this method will play an important role in the modernization and informatization of meteorological operations. On the other hand, the current test covers only Aliyun table storage.
How to introduce this technology into the internal meteorological system and integrate it is the next step, and it still needs to be gradually optimized in practical applications, including the development of visual data management, the extension of supported data types and further simplification of the data service interface[6,12]. In addition, introducing real-time computing concepts and techniques is also an important research direction for the future.

Figure #1

Figure #2

Bibliography

Jeong H, Choi J, Choi C, et al. A Design of Web Log Integration Framework Using NoSQL[M]. Information and Communication Technology. Springer Berlin Heidelberg, 2014:
Shahabinejad M, Khabbazian M, Ardakani M. An efficient binary locally repairable code for hadoop distributed file system[J]. Communications Letters, IEEE, 2014, 18(8):
Yongsheng Li, Qin Zeng, Meihong Xu, et al. Design and Implementation of NWP Data Service Platform Based on Hadoop Framework[J]. Journal of Applied Meteorological Science, 2015(1):

