SERIALIZED DATA STORAGE Within a Database
James Devens (devensj)


THE IDEA
- Serialized data can be used to store the current state of objects in a database.
- It is a good alternative to deprecated object-based databases.
- Separate data values are stored together in a single byte array.
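The idea of packing separate values into one byte array can be sketched in plain Java. This is only an illustration: the project itself used Protocol Buffers, while this sketch swaps in the standard library's DataOutputStream so it runs without extra dependencies. The class name PackedRecord and its fields (surname, nameRank, occurrences) are hypothetical, since the slides do not show the actual schema.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record type; field names are assumptions, not the deck's schema.
class PackedRecord {
    final String surname;
    final int nameRank;
    final int occurrences;

    PackedRecord(String surname, int nameRank, int occurrences) {
        this.surname = surname;
        this.nameRank = nameRank;
        this.occurrences = occurrences;
    }

    // Pack all three fields into a single byte array, in a fixed order.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(surname);     // 2-byte length prefix + encoded characters
            out.writeInt(nameRank);    // 4 bytes
            out.writeInt(occurrences); // 4 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuild the object by reading the fields back in the same order.
    static PackedRecord fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String surname = in.readUTF();
            int nameRank = in.readInt();
            int occurrences = in.readInt();
            return new PackedRecord(surname, nameRank, occurrences);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The byte array returned by toBytes() is what would go into a single binary column; without fromBytes() (or an equivalent program) the individual values are opaque to the database.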

TOOLS USED
- MySQL Workbench
- DigitalOcean server hosting
- PuTTY
- WinSCP
- Microsoft Excel and PowerPoint
- Vim (Java source)
- Protocol Buffers (Google)
- JDBC (Java Database Connectivity)
- United States 2000 Census

PREDICTIONS
- Data will usually take less storage space as byte arrays.
- Basic queries will take less time (non-indexed database).
- Serialized data will be harder to access in a relational database.
- It can defeat the purpose of a relational database.

DATABASE STRUCTURE
- Census table
- Census_pb table

INSERTING DATA
- Data was inserted into both tables using JDBC prepared statements
  - Prevents SQL injection
  - Allows similar queries to execute faster
- Data was serialized using Protocol Buffers
  - Developed by Google
  - More secure and portable than Java serialization
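A minimal sketch of the two prepared-statement inserts, assuming a JDBC Connection is already open. The table names come from the slides; the column names (surname, name_rank, occurrences, payload) and the class name CensusInserts are assumptions, because the actual table definitions are not in the transcript. The payload bytes would come from the Protocol Buffers message in the real project.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper; column names are assumptions, not the deck's schema.
class CensusInserts {
    // The ? placeholders keep values out of the SQL text, which blocks SQL injection.
    static final String INSERT_PLAIN =
        "INSERT INTO Census (surname, name_rank, occurrences) VALUES (?, ?, ?)";
    static final String INSERT_SERIALIZED =
        "INSERT INTO Census_pb (surname, payload) VALUES (?, ?)";

    // Non-serialized row: one column per value.
    static int insertPlain(Connection conn, String surname, int nameRank, int occurrences)
            throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(INSERT_PLAIN)) {
            stmt.setString(1, surname);
            stmt.setInt(2, nameRank);
            stmt.setInt(3, occurrences);
            return stmt.executeUpdate();
        }
    }

    // Serialized row: the remaining values travel as one byte array.
    static int insertSerialized(Connection conn, String surname, byte[] payload)
            throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(INSERT_SERIALIZED)) {
            stmt.setString(1, surname);
            stmt.setBytes(2, payload);
            return stmt.executeUpdate();
        }
    }
}
```

For a bulk load, preparing the statement once and reusing it for every row (rather than once per call as above) is what lets similar statements run faster.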

INSERTING DATA (NON-SERIALIZED)

INSERTING DATA (SERIALIZED)

QUERYING DATA
- Use an array of names
- Each of these names is queried
- This process repeats a specified number of times (default 1000)
- Number of queries = NumLoops * Names.length * 2
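The query-count formula can be checked with a small sketch. It assumes the factor of 2 stands for one query per table (Census and Census_pb), which the slides imply but do not state. The class QueryPlan and the runQuery callback are hypothetical stand-ins for the real query class.

```java
import java.util.function.BiConsumer;

// Hypothetical sketch of the loop structure behind the query-count formula.
class QueryPlan {
    static final int DEFAULT_LOOPS = 1000;
    // Assumption: the "* 2" in the formula is one query per table.
    static final String[] TABLES = {"Census", "Census_pb"};

    // Number of queries = NumLoops * Names.length * 2
    static long queryCount(int numLoops, String[] names) {
        return (long) numLoops * names.length * TABLES.length;
    }

    // Walks the same nested loops the formula counts, calling runQuery(table, name) each time.
    static long runAll(int numLoops, String[] names, BiConsumer<String, String> runQuery) {
        long executed = 0;
        for (int loop = 0; loop < numLoops; loop++) {
            for (String name : names) {
                for (String table : TABLES) {
                    runQuery.accept(table, name);
                    executed++;
                }
            }
        }
        return executed;
    }
}
```

With the default 1000 loops and an array of 5 names, queryCount gives 1000 * 5 * 2 = 10,000 queries.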

QUERYING DATA

DATA COLLECTION
- Modified the simple query class to record data
- Exported to .csv for Microsoft Excel
- Each data sample consisted of 5 names being queried [count missing] times
- 5000 data samples were taken
- Number of queries = [value missing] * 5000 * 2 = 500,000,000 queries
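One way to record timings and export them as CSV is sketched below. The class TimingLog, its column names, and the millisecond granularity are all assumptions; the slides say only that the query class was modified to record data and that the output went to a .csv file for Excel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical timing recorder; the CSV layout is an assumption.
class TimingLog {
    private final List<String> rows = new ArrayList<>();

    TimingLog() {
        rows.add("sample,plain_ms,serialized_ms"); // header row for Excel
    }

    // Measure how long a unit of work takes, in nanoseconds.
    static long timeNanos(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return System.nanoTime() - start;
    }

    // Store one data sample: elapsed time per table, converted to whole milliseconds.
    void record(int sample, long plainNanos, long serializedNanos) {
        rows.add(sample + "," + plainNanos / 1_000_000 + "," + serializedNanos / 1_000_000);
    }

    // Render all rows as CSV text, one line per sample.
    String toCsv() {
        return String.join("\n", rows) + "\n";
    }
}
```

Each data sample would wrap the query loop for one table in timeNanos and pass both results to record; the string from toCsv can then be written to a file with java.nio.file.Files.writeString.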

DATA COLLECTION

RESULTS (INSERTS)
- Non-serialized
  - INSERT Dump Success!
  - Took: [value missing] ms to complete.
- Serialized
  - INSERT Dump Success!
  - Took: [value missing] ms to complete.

RESULTS (DATA COLLECTION)
- Took [value missing] ms to complete (7.67 hours).
- 5000 loops, and [value missing] queries executed.

RESULTS (DATA COLLECTION) Every 50,000 Queries

RESULTS (STORAGE)
- Non-serialized data space: [value missing]
- Serialized data space: [value missing]
- Difference: [value missing] bytes (4.19 MB)

CONCLUSION
- Storage was reduced by 4.19 MB in this test, making serialized data efficient to store
- Query speeds were roughly the same
- Serialization is a good way to store object states
- Serialization is NOT a good way to store frequently changing objects
  - If an object class is modified, it would ruin all of your current data
- It is NOT relational-friendly (for the most part)
  - You cannot access the original data values inside the byte array without another program's help

FUTURE WORK
- Write a program to convert the byte array back into the original object (easy)
- Use a different .proto file with many more data values (e.g., doubles)
- Find more test statistics and collect more data
- Index the data to see how it affects query speeds for both methods