Application of Tabular Data Model to explore and analyze the


Application of Tabular Data Model to explore and analyze the PROTEIN DATA BANK

Alina Simón Cuevas¹,², Maité Torres Sánchez², Yasser Novo-Fernández³, Lucina García Hernández², Raudel Ravelo Suárez²

¹ Datacimex, Grupo Empresarial Cimex, 1ra y 0, Edif. Sierra Maestra, La Habana, Cuba
² Universidad de La Habana, Facultad de Matemática y Computación, La Habana, Cuba
³ Instituto Superior de Tecnología y Ciencias Aplicadas, InSTEC

INTRODUCTION

The Protein Data Bank (PDB) at Brookhaven National Laboratory is a database containing experimentally determined three-dimensional structures of proteins, nucleic acids and other biological macromolecules, with approximately [...] entries. There are numerous examples in molecular biology, medicine and drug discovery where the PDB plays an increasingly important role.

This work applies the Tabular Data Model to satisfy analytical processing needs over the data in the PDB. The Tabular Model, introduced in Microsoft SQL Server 2012 Analysis Services (SSAS), combines high analytical functionality with reasonably large data capacity and reasonable productivity. The present development effort addresses the demands of the research community that relies on the PDB. The application will support complex reporting and analysis needs and help to organize and explore disparate data from different points of view and in diverse visualization modes, increasing the efficacy and efficiency of results. The solution offers quick and easy access to the objects and data inside the protein model from desktop applications, such as Microsoft Excel, and from SharePoint web applications, such as Microsoft Power View. The system has been designed with the expectation of future application to other biological databases.

PROGRAMMING TOOLS

Visual Studio 2012 was the software development tool. SQL Server 2012 is the database management system; it offers many new features for Business Intelligence (BI). The Tabular Data Model, proposed in SQL Server 2012 Analysis Services, uses in-memory database techniques and column storage with advanced compression algorithms, which make the model an efficient and attractive alternative for analytical processing in some contexts [8]. Power View, built especially for end users, is a new ad hoc reporting tool that supports data visualization.

[Figure panel: RESULTS]

BACKGROUND

The PDB has a 26-year history of service to a global community of researchers, educators and students in a wide variety of scientific disciplines. The archives contain atomic coordinates, citations, primary and secondary structure information, and crystallographic structure experimental data, as well as hyperlinks to many other scientific databases [1]. The number of structures in the PDB has grown at an approximately exponential rate, passing the 100,000-structure milestone in 2014 [2].

SOLUTION ARCHITECTURE

PDB data are available in XML format in order to provide flexibility, extensibility, and ease of data exchange in the biological research community. In this solution we propose a data warehouse design to store and efficiently query the Protein Data Bank [3, 4].

CONCLUSIONS

The Tabular Data Model enables management of the growing Protein Data Bank using recent compression techniques. The solution allows a more efficient structure selection process based on dimensional modeling. The Tabular Data Model is an alternative solution for bioinformatic analysis by chemists and biologists without advanced computational skills. The data warehouse could be used for knowledge discovery in biological data. This computational solution gives both researchers and specialists a particular as well as a broad insight into the state of the proteins, taking advantage of the recent facilities provided by the Microsoft Business Intelligence platform [5].

REFERENCES

[1] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, et al., "The Protein Data Bank," Oxford Journals / Nucleic Acids Research, vol. 28, pp. [...], 2000.
[2] Anon, "Hard data: It has been no small feat for the Protein Data Bank to stay relevant for 100,000 structures," Nature, vol. 509, 2014.
[3] J. Westbrook, N. Ito, H. Nakamura, K. Henrick, and H. M. Berman, "PDBML: the representation of archival macromolecular structure data in XML," Oxford Journals / Bioinformatics, vol. 21, no. 7, pp. [...], 2004.
[4] G. Anders and M. Nicola. (2012, 1/6/2015). Manipulación del Banco de Datos de Proteínas con DB2 pureXML. IBM developerWorks. Available: [...]
[5] MSDN, "What's New in Analysis Services and Business Intelligence," MSDN Library, Microsoft, 2014.
[6] C. Ballard, D. M. Farrell, A. Gupta, C. Mazuela, and S. Vohnik, "Dimensional Modeling: In a Business Intelligence Environment," International Business Machines Corporation, United States of America, SG [...], 2006.
[7] T. K. M. Mertens and H.-J. Appelrath, "Utilizing Structured Information from Multiple External Sources in the Context of the Multidimensional Data Model," in 16th International Conference of Business Information Systems, 2013, pp. [...]
[8] P. Savjani, BI Solutions using SSAS Tabular Model Succinctly. Morrisville, USA: Syncfusion Inc., 2014.

[Figure panel: TABULAR DATABASE]

MULTIDIMENSIONAL SCHEMA

Multidimensional modeling enables BI professionals to create sophisticated cubes using traditional online analytical processing (OLAP). We propose an approximation of a protein dimensional design [6, 7].
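As a rough illustration of what a protein dimensional design can look like, the sketch below loads a few structure entries from XML into a tiny star schema (one fact table, one dimension table) and runs the kind of aggregation an analytical model would serve. Everything in it is an assumption made for illustration: the XML element and attribute names are a simplified, hypothetical stand-in for the much richer PDBML schema, the table names `dim_method` and `fact_structure` are invented, and SQLite is used only as a convenient substitute for SQL Server and the SSAS Tabular Model. It is not the schema proposed on the poster.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML. Real PDBML files use a far richer schema.
SAMPLE_XML = """
<entries>
  <entry id="E1" method="X-RAY DIFFRACTION" year="2012" resolution="1.8"/>
  <entry id="E2" method="SOLUTION NMR" year="2012"/>
  <entry id="E3" method="X-RAY DIFFRACTION" year="2013" resolution="2.4"/>
</entries>
"""


def load_star_schema(xml_text):
    """Parse the entries and load them into an in-memory star schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_method (
            method_id INTEGER PRIMARY KEY,
            name      TEXT UNIQUE
        );
        CREATE TABLE fact_structure (
            entry_id   TEXT PRIMARY KEY,
            method_id  INTEGER REFERENCES dim_method(method_id),
            year       INTEGER,
            resolution REAL
        );
        """
    )
    root = ET.fromstring(xml_text.strip())
    for entry in root.iter("entry"):
        method = entry.get("method")
        # Each distinct method is stored once in the dimension table.
        conn.execute(
            "INSERT OR IGNORE INTO dim_method (name) VALUES (?)", (method,)
        )
        (method_id,) = conn.execute(
            "SELECT method_id FROM dim_method WHERE name = ?", (method,)
        ).fetchone()
        resolution = entry.get("resolution")
        # The fact row holds the measures plus a key into the dimension.
        conn.execute(
            "INSERT INTO fact_structure VALUES (?, ?, ?, ?)",
            (
                entry.get("id"),
                method_id,
                int(entry.get("year")),
                float(resolution) if resolution is not None else None,
            ),
        )
    return conn


def structures_per_method(conn):
    """Count structures grouped by experimental method."""
    rows = conn.execute(
        """
        SELECT d.name, COUNT(*)
        FROM fact_structure AS f
        JOIN dim_method AS d ON d.method_id = f.method_id
        GROUP BY d.name
        ORDER BY d.name
        """
    ).fetchall()
    return dict(rows)


conn = load_star_schema(SAMPLE_XML)
counts = structures_per_method(conn)
print(counts)
```

Keeping descriptive attributes such as the experimental method in a dimension table lets the same fact rows be grouped from different points of view (by method, by year, and so on) without repeating the descriptive text on every row.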

