Download presentation
Presentation is loading. Please wait.
Published byAmi Stanley Modified over 8 years ago
1
Advanced Computing Data Bases: Basic concepts Based in part on open access material from: Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke http://pages.cs.wisc.edu/~dbbook/index.html http://pages.cs.wisc.edu/~dbbook/index.html Databases by Timothy Griffin www.cl.cam.ac.uk/Teaching/current/Databases & www.cl.cam.ac.uk/Teaching/current/Databases Scientific Data Management by Laura Bright & Bill Howe http://web.cecs.pdx.edu/~howe/cs410/ http://web.cecs.pdx.edu/~howe/cs410/
2
What is a database? "A database is a collection of information stored (usually in a computing system) in a systematic way, such that users (usually a computer program) can consult it to answer questions.“ (see WikiPedia - The Free Encyclopedia.) Databases are in almost any medium to large system in the world today. This is useful to handle scalability. As the expectations of users and the complexity of applications continue to increase, most application systems would need a database to store and manage information. Databases have grown larger and more powerful as technology advances. It is now common to have databases which are terabytes in size and hosted on multiple servers Data is stored in database tables and SQL is used to access or modify this data.database tablesSQL
3
Database management systems A database is a collection of (usually persistent) data that model the entities of interest, by means of properties. To handle large amounts of data computers use what we call a Database Management System (DBMS) – These are software systems that support the creation, population, and querying of a database Aims in the access to data: Efficiency, Concurrency, Security
4
Database Tables Tables are the basic structures within databases that are used to store data. Each database table consists of rows and columns. Rows are called Records (the entities to be modelled), and columns are called Fields (their properties of interest). As the number of records grows, more rows are added into the table A record is expressed as a collection of fields, which describe the relevant aspects of the single unit of data stored in the table. Each column in the database table shows the fields of the different records and has a data-type associated.
5
Data types Each field in a table has a data-type associated. Types fix the way fields are represented and stored in the binary implementation inside the computing system. This allows to have efficient mechanism to reach a record in the serialized version of the table.
6
Relationships As data grow in a database, it may be more efficient to have several tables to store it. When data types become more complex, and/or the properties to attach to a field would be better expressed as records in another table, records of this tables are referenced from fields in records of the other tables. The referencing mechanism is the definition of one of the fields of the record as a key Keys are used in the referencing table as data stored in a field
7
Relational Database When data need to be expressed in several related tables, we call it a relational database The software used to manage them is a Relational Database Management System (RDBMS). It consists of a number of tables and single schema (definition of tables and attributes) Example:Students (sid, name, login, age, gpa) Students identifies the table sid, name, login, age, gpa identify attributes sid is primary key
8
Programming over Database Management Systems (DBMSs) Raw Resources (and its binary code) DBMS Applications Go Here Database abstractions allow this interface to be cleanly defined and this allows applications and data management systems to be implemented separately.
9
9 Three-level architecture Conceptual Schema … Physical level Conceptual level Internal Schema External Schema 1 External Schema 2 External Schema n External level
10
An Example Table Students (sid: string, name: string, login: string, age: integer, gpa: real) sidnameloginagegpa 50000Davedave@cs193.3 53666Jonesjones@cs183.4 53688Smithsmith@ee183.2 53650Smithsmith@math193.8 53831Madayanmadayan@music111.8 53832Gulduguldu@music122.0
11
Another example: Courses Courses (cid, instructor, quarter, dept) cidinstructorquarterdept Carnatic101JaneFall 06Music Reggae203BobSummer 06Music Topology101MarySpring 06Math History105AliceFall 06History
12
Keys Primary key – minimal subset of fields that is unique identifier for a tuple sid is primary key for Students cid is primary key for Courses Foreign key –connections between tables Courses (cid, instructor, quarter, dept) Students (sid, name, login, age, gpa) How do we express which students take each course?
13
Many to many relationships In general, need a new table Enrolled(cid, grade, studid) Studid is foreign key that references sid in Student table cidgradestudid Carnatic101C53831 Reggae203B53832 Topology112A53650 History 105B53666 sidnamelogin 50000Davedave@cs 53666Jonesjones@cs 53688Smithsmith@ee 53650Smithsmith@math 53831Madayanmadayan@music 53832Gulduguldu@music Enrolled Student Foreign key
14
Indexes When a table will be searched using different fields, it is convenient to have them pre-ordered according to those fields. To have quick access to it, it is convenient to have an additional table that links ordering values(fields) to keys of the records in the related table. Idea: speed up access to desired data “Find all students with gpa > 3.3” May need to scan entire table Index consists of a set of entries pointing to locations of each search key
15
Types of Indexes Clustered vs. Unclustered Clustered- ordering of data records same as ordering of data entries in the index Unclustered- data records in different order from index Primary vs. Secondary Primary – index on fields that include primary key Secondary – other indexes
16
Example: Clustered Index Sorted by sid sidnamegpa 50000Dave3.3 53650Smith3.8 53666Jones3.4 53688Smith3.2 53831Madayan1.8 53832Guldu2.0 50000 53600 53800
17
Example: Unclustered Index Sorted by sid Index on gpa sidnamegpa 50000Dave3.3 53650Smith3.8 53666Jones3.4 53688Smith3.2 53831Madayan1.8 53832Guldu2.0 1.8 2.0 3.2 3.3 3.4 3.8
18
Comments on Indexes Indexes can significantly speed up query execution But inserts more cost May have high storage overhead Need to choose attributes to index wisely! What queries are run most frequently? What queries could benefit most from an index?
19
Summary: Why are RDBMS useful? Data independence – provides abstract view of the data, without details of storage Efficient data access – uses techniques to store and retrieve data efficiently Reduced application development time – many important functions already supported Centralized data administration Data Integrity and Security Concurrency control and recovery
20
So, why don’t scientists use them? “I tried to use databases in my project, but they were just too [slow | hard-to-use | expensive | complex]. So I use files”. Gray and Szalay, Where Rubber Meets the Sky: Bridging the Gap Between Databases and Science Some other limitations of RDBMS Arrays Hierarchical data
21
Example: Taxonomy of Organisms Hierarchy of categories: Kingdom - phylum – class – order – family – genus - species How would you design a relational schema for this? Animals Chordates Vertebrates Arthropods birds insectsspiderscrustaceans reptilesmammals
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.