CENG 351: Introduction to Data Management and File Structures
Nihan Kesim Çiçekli
Department of Computer Engineering, METU
CENG 351-Section 2
Instructor: Nihan Kesim Çiçekli
Office: A308
Lecture Hours: Tue. 9:40; Thu. 13:40, 14:40 (BMB3)
Course Web page:
Teaching Assistants: Ömer Nebil Yaveroğlu, Nilgün Dağ
References
1. Betty Salzberg, File Structures: An Analytic Approach, Prentice Hall.
2. Raghu Ramakrishnan, Database Management Systems (3rd ed.), McGraw Hill.
3. Michael J. Folk, Bill Zoellick and Greg Riccardi, File Structures: An Object-Oriented Approach with C++, Addison-Wesley.
4. R. Elmasri, S.B. Navathe, Fundamentals of Database Systems, 4th edition, Addison-Wesley, 2004.
Course Outline
1. Introduction: Secondary storage devices
2. Fundamental File Structure Concepts: Sequential Files
3. External Sorting
4. Indexed Sequential Files (B-trees)
5. Direct access (Hashing)
6. Introduction to Database Systems: E/R modeling, relational model
7. Query languages: Relational algebra, relational calculus, SQL
8. Query Evaluation
Grading
3 written HW, 3 programming assignments: 30%
Midterm Exam 1: 20%
Midterm Exam 2: 20%
Final Exam: 30%
Tentative Exam Dates:
– Midterm Exam 1: Nov. 10, 2009
– Midterm Exam 2: Dec. 22, 2009
Grading Policies
Policy on missed midterm:
– No make-up exam
Lateness policy:
– Late assignments are penalized up to 10% per day.
All assignments and programs are to be your own work.
No group projects or assignments are allowed.
Introduction to File Management
Motivation
Most computers are used for data processing (over $100 billion/year).
A big growth area in the “information age”
This course covers data processing from a computer science perspective:
– Storage of data
– Organization of data
– Access to data
– Processing of data
Data Structures vs File Structures
Both involve:
– Representation of data
– Operations for accessing data
Difference:
– Data structures: deal with data in main memory
– File structures: deal with data in secondary storage
Where do File Structures fit in Computer Science?
[Figure: layered diagram with the labels Hardware, Operating System, DBMS, File system, Application]
Computer Architecture
Main Memory (RAM): data is manipulated here
– Semiconductors
– Fast, expensive, volatile, small
Secondary Storage: data is stored here
– Disks, tape
– Slow, cheap, stable, large
Data is transferred between main memory and secondary storage.
Advantages
– Main memory is fast.
– Secondary storage is big (because it is cheap).
– Secondary storage is stable (non-volatile), i.e. data is not lost during power failures.
Disadvantages
– Main memory is small. Many databases are too large to fit in main memory (MM).
– Main memory is volatile, i.e. data is lost during power failures.
– Secondary storage is slow (at least 10,000 times slower than MM).
How fast is main memory?
Typical time for getting info from:
– Main memory: ~120 nanosec = 120 x 10^-9 sec
– Magnetic disks: ~30 millisec = 30 x 10^-3 sec
An analogy keeping the same time proportion as above:
– Looking at the index of a book: 20 sec
versus
– Going to the library: 58 days
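The proportion behind the analogy can be checked with a few lines of C. This is only a sketch: the function and macro names are made up for illustration, and the main-memory figure is assumed to be 120 nanoseconds, which is the value consistent with the 20 sec versus 58 days comparison.

```c
#include <assert.h>

/* Assumed access times in seconds (120 ns is an assumption, see above). */
#define MEMORY_ACCESS_SEC 120e-9
#define DISK_ACCESS_SEC   30e-3

/* How many times slower a disk access is than a main-memory access. */
double slowdown_ratio(double memory_sec, double disk_sec)
{
    assert(memory_sec > 0.0);
    return disk_sec / memory_sec;
}

/* Scale a "fast" human task by the same ratio; the result is in days. */
double scaled_days(double fast_task_sec, double ratio)
{
    return fast_task_sec * ratio / 86400.0;  /* 86400 seconds per day */
}
```

With these values, slowdown_ratio(MEMORY_ACCESS_SEC, DISK_ACCESS_SEC) is about 250,000, and scaled_days(20.0, 250000.0) is about 57.9, which rounds to the 58 days on the slide.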
Normal Arrangement
Secondary storage (SS) provides reliable, long-term storage for large volumes of data.
At any given time, we are usually interested in only a small portion of the data.
This data is loaded temporarily into main memory, where it can be rapidly manipulated and processed.
As our interests shift, data is transferred automatically between MM and SS, so the data we are focused on is always in MM.
Goal of File Structures
Minimize the number of trips to the disk needed to get the desired information.
Group related information so that we are likely to get everything we need with only one trip to the disk.
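A rough way to see why grouping helps is to count trips under a simplified model in which every block read costs one trip to the disk. The helper below is hypothetical and the record counts in the note are illustrative, not figures from the course.

```c
#include <assert.h>

/* Number of block reads (trips to the disk) needed to fetch n_records
   records when records_per_block records are grouped into each block.
   This is ceil(n_records / records_per_block) in integer arithmetic. */
long blocks_needed(long n_records, long records_per_block)
{
    assert(n_records >= 0 && records_per_block > 0);
    return (n_records + records_per_block - 1) / records_per_block;
}
```

Under this model, 1000 related records stored one per block need 1000 trips, while the same records grouped 50 per block need only 20.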
Physical Files and Logical Files
Physical file: a collection of bytes stored on a disk or tape.
Logical file: a "channel" (like a telephone line) that connects the program to a physical file.
The program (application) sends bytes to (or receives bytes from) a file through the logical file. The program knows nothing about where the bytes go (or come from).
The operating system is responsible for associating a logical file in a program with a physical file on disk or tape. Writing to or reading from a file in a program is done through the operating system.
Files
The physical file has a name, for instance myfile.txt
The logical file has a logical name (a variable) inside the program.
– In C: FILE *outfile;
– In C++: fstream outfile;
Basic File Processing Operations
– Opening
– Closing
– Reading
– Writing
– Seeking
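The five operations map directly onto the C standard I/O library. The function below is a minimal sketch, not a prescribed pattern for the course; its name and parameters are invented for illustration, and it uses only fopen, fwrite, fseek, fread and fclose from stdio.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write `text` to the physical file `path`, then reopen the file, seek to
   byte `offset`, and read up to buf_size - 1 bytes into `buf`.
   Returns the number of bytes read, or -1 on error. */
long write_then_read_from(const char *path, const char *text,
                          long offset, char *buf, size_t buf_size)
{
    assert(buf != NULL && buf_size > 0);

    /* Opening: associate the logical file `outfile` with the physical file. */
    FILE *outfile = fopen(path, "wb");
    if (outfile == NULL)
        return -1;

    /* Writing: send bytes through the logical file. */
    size_t len = strlen(text);
    if (fwrite(text, 1, len, outfile) != len) {
        fclose(outfile);
        return -1;
    }

    /* Closing: flush buffered bytes and release the logical file. */
    if (fclose(outfile) != 0)
        return -1;

    FILE *infile = fopen(path, "rb");
    if (infile == NULL)
        return -1;

    /* Seeking: move the current position to `offset` bytes from the start. */
    if (fseek(infile, offset, SEEK_SET) != 0) {
        fclose(infile);
        return -1;
    }

    /* Reading: receive bytes through the logical file. */
    size_t n = fread(buf, 1, buf_size - 1, infile);
    if (ferror(infile)) {
        fclose(infile);
        return -1;
    }
    buf[n] = '\0';

    fclose(infile);
    return (long)n;
}
```

For example, writing "hello, file structures" to myfile.txt and reading from offset 7 leaves "file structures" in the buffer. The program never sees where on the disk those bytes live; it only talks to the logical files outfile and infile.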
File Systems
Data is not scattered hither and thither on disk. Instead, it is organized into files.
Files are organized into records.
Records are organized into fields.
Example
A student file may be a collection of student records, one record for each student.
Each student record may have several fields, such as
– Name
– Address
– Student number
– Gender
– Age
– GPA
Typically, each record in a file has the same fields.
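One possible way to picture the student file in C is as a sequence of fixed-length records, each a struct with one member per field. The field sizes, type choices and function names below are assumptions made for illustration; the slides do not prescribe a layout.

```c
#include <assert.h>
#include <stdio.h>

/* One fixed-length student record; field sizes are illustrative assumptions. */
struct student {
    char   name[32];
    char   address[64];
    long   student_number;
    char   gender;
    int    age;
    double gpa;
};

/* Write `count` records to the file `path`. Returns 0 on success, -1 on error. */
int write_students(const char *path, const struct student *records, size_t count)
{
    assert(records != NULL);
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(records, sizeof records[0], count, fp);
    if (fclose(fp) != 0 || written != count)
        return -1;
    return 0;
}

/* Read the record at position `index` (0-based) by seeking directly to it.
   Returns 0 on success, -1 if the record does not exist or on error. */
int read_student(const char *path, long index, struct student *out)
{
    assert(out != NULL && index >= 0);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    int ok = fseek(fp, index * (long)sizeof *out, SEEK_SET) == 0
          && fread(out, sizeof *out, 1, fp) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

Because every record has the same fields and the same size, the record at position i starts at byte i * sizeof(struct student), so it can be fetched with a single seek. Writing raw structs like this is not portable across compilers or machines, since padding and byte order may differ; it is meant only to show the file, record and field hierarchy.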
Properties of Files
1) Persistence: Data written into a file persists after the program stops, so the data can be used later.
2) Sharability: Data stored in files can be shared by many programs and users simultaneously.
3) Size: Data files can be very large. Typically, they cannot fit into main memory.