Real-World File Structures

Slides:



Advertisements
Similar presentations
CH10 Instruction Sets: Characteristics and Functions
Advertisements

Data Representation. Units & Prefixes Review kilo, mega, and giga are different in binary! bit (b) – binary digit Byte (B) – 8 binary digits KiloByte.
Goal: Write Programs in Assembly
Information Representation
1/1/ / faculty of Electrical Engineering eindhoven university of technology Introduction Part 2: Data types and addressing modes dr.ir. A.C. Verschueren.
1 Lecture 3: Instruction Set Architecture ISA types, register usage, memory addressing, endian and alignment, quantitative evaluation.
INSTRUCTION SET ARCHITECTURES
1 Chapter 2 The Digital World. 2 Digital Data Representation.
The Binary Numbering Systems
Connecting with Computer Science, 2e
8 November Forms and JavaScript. Types of Inputs Radio Buttons (select one of a list) Checkbox (select as many as wanted) Text inputs (user types text)
Portability CPSC 315 – Programming Studio Spring 2008 Material from The Practice of Programming, by Pike and Kernighan.
Data starts with width and height of image Then an array of pixel values (colors) The number of elements in this array is width times height Colors can.
Connecting with Computer Science 2 Objectives Learn why numbering systems are important to understand Refresh your knowledge of powers of numbers Learn.
Beyond Record Structures Dr. Robert J. Hammell Assistant Professor Towson University Computer and Information Sciences Department 8000 York Road - Suite.
Bits & Bytes: How Computers Represent Data
© GCSE Computing Candidates should be able to:  explain the representation of an image as a series of pixels represented in binary  explain the need.
1 Real-World File Structures by Tom Davis Asst. Professor, Computer Science St. Edward's University 3001 South Congress Avenue Austin, Texas 78704
Binary Arithmetic & Data representation
Lec 3: Data Representation Computer Organization & Assembly Language Programming.
Computer Systems Organization CS 1428 Foundations of Computer Science.
Cis303a_chapt03-2a.ppt Range Overflow Fixed length of bits to hold numeric data Can hold a maximum positive number (unsigned) X X X X X X X X X X X X X.
Text and Graphics September 26, Unit 3.
File Structures Foundations of Computer Science  Cengage Learning.
Robert Crawford, MBA West Middle School.  Explain how the binary system is used by computers.  Describe how software is written and translated  Summarize.
1 Data Representation Characters, Integers and Real Numbers Binary Number System Octal Number System Hexadecimal Number System Powered by DeSiaMore.
Graphics and Images Graphics and images are both non-textual information, that can be displayed and printed. These images may appear on screen as well.
Computer Science I Storing data. Binary numbers. Classwork/homework: Catch up. Do analysis of image types.
Lecture # 36 Computer Architecture and Organization.
Text and Images Key Revision Points.
Chapter 17 Looking “Under the Hood”
Data Representation COE 308 Computer Architecture
Lecture 1b- Introduction
Chapter 8 & 11: Representing Information Digitally
Data Representation.
NUMBER SYSTEMS.
Data Representation ICS 233
Lec 3: Data Representation
Computer Architecture & Operations I
Data representation How do we represent data in a digital system?
Data Representation.
Reading Netpbm Images.
Intermediate 2 Computing
CPSC 315 – Programming Studio Spring 2012
Other Kinds of Arrays Chapter 11
Net 323 D: Networks Protocols
An Introduction to Scanning and Storing Photographs and Graphics
Data Representation COE 301 Computer Organization
Compiler Construction
McGraw-Hill Technology Education
The University of Adelaide, School of Computer Science
Portability CPSC 315 – Programming Studio
Ch2: Data Representation
1. Explain how ASCII is used to represent text in a computer system
Net 323 D: Networks Protocols
Other ISAs Next, we’ll first we look at a longer example program, starting with some C code and translating it into our assembly language. Then we discuss.
Data representation How do we represent data in a digital system?
Homework Reading Programming Assignments Finish K&R Chapter 1
Data Representation ICS 233
Tonga Institute of Higher Education IT 141: Information Systems
Abstraction – Number Systems and Data Representation
Tonga Institute of Higher Education IT 141: Information Systems
McGraw-Hill Technology Education
Review In last lecture, done with unsigned and signed number representation. Introduced how to represent real numbers in float format.
CPU Structure CPU must:
McGraw-Hill Technology Education
Data representation How do we represent data in a digital system?
Beyond Record Structures
Data Representation COE 308 Computer Architecture
Presentation transcript:

Real-World File Structures by Tom Davis Asst. Professor, Computer Science St. Edward's University 3001 South Congress Avenue Austin, Texas 78704 http://www.stedwards.edu/

Metadata Data About Data Usually in the form of a file header Example in text Astronomy image storage format HTML format (name = value) But look on page 177: coding style makes a BIG difference Parsing this kind of data Read field name; read field value Convert ASCII value to type required for storage & use Store converted value into right variable Why use this type of header?

More Metadata Graphics Storage Formats Data Metadata Color values for each pixel in image Data compression often used (GIF, JPG) Different color “depth” possibilities Metadata Height & width of image Number of bits per pixel (color depth) If not true color (24 bits / pixel) Color look-up table Normally 256 entries Indexed by values stored for each pixel (normally 1 byte) Contains R/G/B values for color combination Often formatted to be loaded directly into graphics RAM

Mixing Kinds of Data in a File Objective Store different types of data in the same file Textbook example – mix of astronomy data Main file header (HTML-style) Sub-files of notes – lines of ASCII text Sub-files of image data – in whatever format is needed So our main file becomes a file of sub-files Each sub-file (header, notes, or image) is really a “record” in the main file These “records” are of varying length & format How do we store the actual records in the sub-file “records”? Could use another level of specified-length record software Better – do what makes sense in the situation(s)

Our Main File Organization Main File Header Notes Sub-file Image Sub-file Notes Sub-file Image Sub-file … Notes Header Text line … Terminator Image Header Image Data

More on Our Mixed-Data File Access Can we just read it sequentially? Why or why not? What if we wanted to skip a notes sub-file? What if some image didn’t even have a notes sub-file? Can we access it directly? What would the header have to include to allow that? An index of the “records” in the file We call the entries in that index “tags” Each tag in the tag list has: Type of sub-file referred to Special-case type: end of file RBA of sub-file in main file Length of sub-file (not necessary, but helpful) Key information, if any, for the sub-file

Even More on Our Mixed-Data File Access, continued So how can we access the mega-file now? Read and process the header Get information about the whole main file Build in-memory table of tags (keys + locations) for sub-files Sequential access Same as before May be able to program in some speed-ups from tag table Direct access Locate sub-file in tag table Go right to it

Extensibility Look at Our Main File Format Again Main header tells us things about the sub-files: What kinds of files they are Where to find them Sub-files themselves To the main-file processor, they are just random bytes To each sub-file processor, they are meaningful information What If We Need a New Type of Sub-File? Define a new type of main header entry Extend main header processor to understand that entry Write (or borrow or buy) code to handle new sub-file Cardinal Rule: Everything changes – file types, data types, ...

File Portability

Factors Affecting Portability - 1 Operating System Differences Example – text lines End with line-feed character End with carriage-return and line-feed Prefixed by a count of characters in the line Natural Language Differences Example – character coding Single-byte coding – ASCII, EBCDIC Double-byte coding – Unicode Programming Language Differences Pascal can’t directly process varying-length records Different C++ compilers use different byte lengths for the standard data types

Factors Affecting Portability - 2 Computer Architecture Differences Byte order in 16-bit and 32-bit integer values “Big-endian” – leftmost byte is most significant “Little-endian” – rightmost byte is most significant Storage of data in memory Most architectures require values that are N bytes long to start at a byte whose address is divisible by N 0x15 0x32 Big-endian Little-endian interpretation: interpretation: 0x1532 0x3215 Don’t ask.

How to Port Files Define Your Format C*A*R*E*F*U*L*L*Y Once a file format is defined, never change it If you need a new file format, add it so as not to invalidate the existing formats If you need to change a format, add a new one instead, and let programs that need the new version use it Decide on a standard format for data elements Text lines ASCII , EBCDIC, or Unicode? Which character(s) to end lines? Binary Tightly packed or multiple-of-N addressing? Which “endian”? You can always write code to convert to & from the standard format on a new language, computer, etc.

The Conversion Problem Only a Few Environments – do it directly: Many Env’ts. – need an intermediate form: IBM Sun Sun IBM IBM IBM Sun Sun . . . XML IA-32 IA-32 IA-64 IA-64 (or some other standard format)