Adv. UNIX:fp/10
Advanced UNIX
Special Topics in Comp. Eng. 1, Semester 2
File Processing
• Objectives of these slides:
  – a more detailed look at file processing in C

Overview
1. Background
2. Text Files
3. Error Handling
4. Binary Files
5. Direct Access
6. Temporary Files
7. Renaming & Removing
8. Character Pushback
9. Buffering
10. Redirecting I/O
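
Background
• Two types of file: text and binary
• Two access methods: sequential and direct (also called random access)
• Standard I/O is buffered
  – a stream connected to a terminal is normally line buffered: input is processed a line at a time, and output may not appear until a newline is output
  – a stream connected to a file is normally fully buffered: output is written when the buffer fills, or when fflush() or fclose() is called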

Text Files
    Standard I/O    File I/O
    printf()        fprintf()
    scanf()         fscanf()
    gets()          fgets()
    puts()          fputs()
    getchar()       getc()
    putchar()       putc()
• Most just add an 'f'.
• gets() cannot limit the length of its input and was removed in C11; always use fgets().

Function Prototypes
    int fscanf(FILE *fp, const char *format, ...);
    int fprintf(FILE *fp, const char *format, ...);
    char *fgets(char *str, int max, FILE *fp);
    int fputs(const char *str, FILE *fp);
    int getc(FILE *fp);
    int putc(int ch, FILE *fp);
• The new argument is the file pointer fp (the first argument of fscanf()/fprintf(), the last argument of the others).

Standard FILE* Constants
    Name      Meaning
    stdin     standard input
    stdout    standard output
    stderr    standard error
• These three streams are opened automatically when the program starts.
• e.g.
    if (len >= MAX_LEN)
        fprintf(stderr, "String is too long\n");
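
Opening / Closing
    FILE *fopen(const char *filename, const char *mode);
    int fclose(FILE *fp);
• fopen() returns NULL if the file cannot be opened; fclose() returns 0 on success and EOF on error.
• fopen() modes:
    Mode    Meaning
    "r"     read mode (the file must already exist)
    "w"     write mode (creates the file, or truncates an existing one to zero length)
    "a"     append mode (creates the file if needed; all writes go to the end)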

Careful Opening
    #include <stdio.h>
    #include <stdlib.h>               /* exit() */

    FILE *fp;                         /* file pointer */
    char *fname = "myfile.dat";

    if ((fp = fopen(fname, "r")) == NULL) {
        fprintf(stderr, "Error opening %s\n", fname);
        exit(1);
    }
    ...                               /* file opened okay */

Text I/O
• As with standard I/O:
  – formatted I/O (fprintf, fscanf)
  – line I/O (fgets, fputs)
  – character I/O (getc, putc)
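
Formatted I/O
    int fscanf(FILE *fp, const char *format, ...);
    int fprintf(FILE *fp, const char *format, ...);
• fscanf() returns the number of variables successfully assigned (this can be fewer than requested, even 0, if the input does not match the format), or EOF if end-of-file or a read error occurs before the first conversion.
• fprintf() returns the number of characters output, or a negative value on error.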

Line I/O
    char *fgets(char *str, int max, FILE *fp);
    int fputs(const char *str, FILE *fp);
• If okay, fgets() returns str (a pointer to the string read), and fputs() returns a non-negative integer.
• fgets() returns NULL on error, or at end-of-file when no characters were read; fputs() returns EOF on error.

Differences between fgets() and gets()
• Use of the max argument: fgets() reads in at most max-1 chars (so there is room for '\0'). gets() has no limit and can overflow its buffer.
• fgets() retains the input '\n' (gets() discards it).
• Deleting the '\n':
    len = strlen(line);
    if (len > 0 && line[len-1] == '\n')   /* no '\n' if the line was too long */
        line[len-1] = '\0';

Difference between fputs() and puts()
• puts() adds a '\n' to the output; fputs() does not (it writes the string exactly as given).

Line-by-line Echo
    #include <stdio.h>
    #include <stdlib.h>

    #define MAX 100   /* max line length */
    :
    void output_file(char *fname)
    {
        FILE *fp;
        char line[MAX];

        if ((fp = fopen(fname, "r")) == NULL) {
            fprintf(stderr, "Error opening %s\n", fname);
            exit(1);
        }
        while (fgets(line, MAX, fp) != NULL)
            fputs(line, stdout);
        fclose(fp);
    }
• Lines longer than MAX-1 chars are read in several pieces, so the echo is still exact.
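
Character I/O
    int getc(FILE *fp);
    int putc(int ch, FILE *fp);
• getc() returns the next character (as an unsigned char converted to int), or EOF at end-of-file or on error; store the result in an int, not a char, so that EOF can be detected.
• putc() returns the character written, or EOF on error.
• Can also use fgetc() and fputc(). getc() and putc() may be macros that evaluate fp more than once, so do not pass them an argument with side effects.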

Char-by-char Echo
    #include <stdio.h>
    #include <stdlib.h>
    :
    void output_file(char *fname)
    {
        FILE *fp;
        int ch;   /* int, not char, so EOF can be detected */

        if ((fp = fopen(fname, "r")) == NULL) {
            fprintf(stderr, "Error opening %s\n", fname);
            exit(1);
        }
        while ((ch = getc(fp)) != EOF)
            putc(ch, stdout);
        fclose(fp);
    }
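
Using feof()
• It is tempting to rewrite the previous while-loop as:
    while (!feof(fp)) {   /* WRONG */
        ch = getc(fp);
        putc(ch, stdout);
    }
  – this is a bug, not just an uncommon coding style: feof() only becomes true after a read has already failed, so the last getc() returns EOF and putc() writes it out as an extra byte (typically 0xFF); after a read error the loop never stops
• Test the value returned by getc() instead, and use feof() afterwards to find out why the loop stopped.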

Error Handling
    int ferror(FILE *fp);
  – checks the error status of a file stream
  – returns non-zero if an error has occurred
    void clearerr(FILE *fp);
  – resets the error (and end-of-file) status

    void perror(const char *str);
  – prints str (usually a filename), followed by a colon and the system error message for the current value of errno
    ...
    fp = fopen(fname, "r");
    if (fp == NULL) {
        perror(fname);
        exit(1);
    }
• Common in advanced coding.
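
errno
• The system error message is based on a system error number (errno), which is set when a library function reports an error.
    #include <errno.h>
    ...
    fp = fopen(fname, "r");
    if (fp == NULL && errno == ...)
        ...
• Only test errno after a call has failed: a successful call may leave an old value in errno, or even change it.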
• Many errno integer constants are defined in errno.h
  – it is better style to use the constant name instead of the number
  – on Linux the numbers themselves are defined in kernel headers such as asm/errno.h, which <errno.h> pulls in; programs should still include <errno.h>
• Example errno constants:
    EPERM     operation not permitted
    EACCES    permission denied
    ENOENT    no such file or directory

Binary Files
• For storing non-character data
  – arrays, structs, integers (as raw bytes), GIFs, compressed data
• Not portable across different systems: byte order, the sizes of types such as int, and struct padding can all differ
  – unless the format is defined byte by byte and handled by cross-platform reading/writing utilities, such as gzip
• For portability, use text files
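
fopen() modes for Binary Files
    Mode    Meaning
    "rb"    read binary file
    "wb"    write binary file
    "ab"    append to binary file
• Add a "b" to the text file modes.
• On UNIX (POSIX) the "b" has no effect, because text and binary files are the same; it matters on systems such as Windows.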
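
Reading / Writing
    size_t fread(void *buffer, size_t size, size_t num, FILE *fp);
    size_t fwrite(const void *buffer, size_t size, size_t num, FILE *fp);
• Both return the number of items (not bytes) read/written.
• The count is less than num at end-of-file or on error; they never return EOF, so use feof()/ferror() to tell which happened.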

Example
• The code writes to a binary file containing employee records with the following structure type:
    #define MAX_NAME_LEN 50

    struct employee {
        int salary;
        char name[MAX_NAME_LEN + 1];
    };

    struct employee e1, emps[MAX];
    :
    :
    /* write the struct to fp */
    fwrite(&e1, sizeof(struct employee), 1, fp);

    /* write all of the array with 1 op */
    fwrite(emps, sizeof(struct employee), MAX, fp);

Direct Access
• Direct access: move to any record in the binary file and then read it (you do not have to read the records before it).
• e.g. moving to the 5th employee record means skipping the 4 records before it, a move of 4 * sizeof(struct employee) bytes.
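
fopen() Modes for Direct Access (+)
    Mode     Meaning
    "rb+"    open an existing binary file for read/write
    "wb+"    create (or clear) a binary file for read/write
    "ab+"    open/create a binary file for read/write; every write goes to the end
• In "+" modes, call fseek(), rewind() or fflush() between a write and a following read, and fseek() or rewind() between a read and a following write (unless the read reached end-of-file).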

Employees Example
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define DF "employees.dat"
    #define MAX_NAME_LEN 50

    struct employee {
        int salary;
        char name[MAX_NAME_LEN + 1];
    };

    int num_emps = 0;   /* num. of employees in DF */
    FILE *fp;
    :
• Poor style: global variables.

Data Format
    employees.dat:  | number | e1 | e2 | e3 | e4 | empty space of the right size ... |
• The basic coding technique is to store the number of employees currently in the file (e.g. 4) at its start
  – some functions need this number in order to know where the end of the data occurs

Open the Data File
    void open_file(void)
    {
        if ((fp = fopen(DF, "rb+")) == NULL) {       /* no file yet */
            if ((fp = fopen(DF, "wb+")) == NULL) {   /* create file */
                perror(DF);
                exit(1);
            }
            num_emps = 0;                            /* initial num. */
        }
        else if (fread(&num_emps, sizeof(num_emps), 1, fp) != 1)
            num_emps = 0;   /* opened file, but it is empty */
    }

Move with fseek()
    int fseek(FILE *fp, long offset, int origin);
• Movement is specified with a starting position (the origin) and an offset from there.
• Returns 0 on success, non-zero on error.
• The current position in the file is indicated by the file position pointer (the C standard calls it the file position indicator); this is not the same as fp.

Origin and Offset
• fseek() origin values:
    Name        Value   Meaning
    SEEK_SET    0       beginning of file
    SEEK_CUR    1       current position
    SEEK_END    2       end of file
• Offset is a long integer
  – can be negative (i.e. move backwards)
  – equals the number of bytes to move

Employees Continued
    void put_rec(int posn, struct employee *ep)
    /* write an employee at position posn (1 = first record) */
    {
        long loc;

        loc = sizeof(num_emps) + ((posn-1) * sizeof(struct employee));
        fseek(fp, loc, SEEK_SET);
        fwrite(ep, sizeof(struct employee), 1, fp);
    }
• Can write anywhere.
• No checking to avoid over-writing, and num_emps is not updated here (the caller must do it).

Read in an Employee
    void get_rec(int posn, struct employee *ep)
    /* read in employee at position posn */
    {
        long loc;

        loc = sizeof(num_emps) + ((posn-1) * sizeof(struct employee));
        fseek(fp, loc, SEEK_SET);
        fread(ep, sizeof(struct employee), 1, fp);
    }
• Should really check that fread() returned 1, i.e. that a record actually exists at posn (posn <= num_emps).

Close Employees File
    void close_file(void)
    {
        rewind(fp);   /* like fseek(fp, 0L, SEEK_SET), but also clears the error status */
        /* update num. of employees */
        fwrite(&num_emps, sizeof(num_emps), 1, fp);
        fclose(fp);
    }

ftell()
• Returns the current position of the file position pointer (i.e. its offset in bytes from the start of the file), or -1L on error:
    long ftell(FILE *fp);
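
Temporary Files
    FILE *tmpfile(void);        /* create a temp file */
    char *tmpnam(char *name);   /* create a unique name */
• tmpfile() opens the file with "wb+" mode; it is removed automatically when it is closed or when the program terminates normally.
• tmpnam() only makes up a name: another process could create a file with that name before you open it, so prefer tmpfile() (or POSIX mkstemp()).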
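
Renaming & Removing
    int rename(const char *old_name, const char *new_name);
  – like mv in UNIX (but it cannot move a file to a different file system)
    int remove(const char *filename);
  – like rm in UNIX
• Both return 0 on success and non-zero on failure.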

Character Pushback
    int ungetc(int ch, FILE *fp);
• Overcomes some problems with reading too much
  – 1-character lookahead can be coded
• Only one character of pushback is guaranteed between getc() calls.
• Cannot push back EOF (ungetc(EOF, fp) fails and returns EOF).
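
Buffering
    int fflush(FILE *fp);
  – e.g. fflush(stdout);
• Writes out buffered output, including partial lines
  – overcomes output line buffering
• Returns 0 on success, EOF on error.
• stderr is not buffered.
• fflush() is for output streams only; fflush(stdin) is undefined in standard C.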

setbuf()
    void setbuf(FILE *fp, char *buffer);
• Most common use is to switch off buffering:
    setbuf(stdout, NULL);
  – equivalent to calling fflush() after every output function call
• Call it after the stream is opened but before any other operation on it.

Redirecting I/O
    FILE *freopen(const char *filename, const char *mode, FILE *fp);
  – closes the stream fp, then opens filename with mode and associates fp with it; returns fp, or NULL on failure
• Most common use is to redirect stdin, stdout or stderr to a file
• It is usually better style to use I/O redirection at the UNIX level (e.g. prog < infile).

    #include <stdio.h>
    #include <stdlib.h>

    FILE *in;
    int n;

    in = freopen("infile", "r", stdin);
    if (in == NULL) {
        perror("infile");
        exit(1);
    }
    scanf("%d", &n);   /* read from infile */
    :
    fclose(in);