> word){ // read succeeded numWords++; } cout << "number of words read = " << numWords << endl;"> > word){ // read succeeded numWords++; } cout << "number of words read = " << numWords << endl;">

Presentation is loading. Please wait.

Presentation is loading. Please wait.

A Computer Science Tapestry 7.1 Why do we read from files (chapter 6)? l Lots of data stored in files  What's a terabyte? Petabyte? Where does this much.

Similar presentations


Presentation on theme: "A Computer Science Tapestry 7.1 Why do we read from files (chapter 6)? l Lots of data stored in files  What's a terabyte? Petabyte? Where does this much."— Presentation transcript:

1 A Computer Science Tapestry 7.1 Why do we read from files (chapter 6)? l Lots of data stored in files  What's a terabyte? Petabyte? Where does this much data come from?  What's a URL? Difference between http://… and file://…  Can we read strings at a time, ints at a time, lines at a time, chars at a time?  Can we read objects at a time? Why? l What's a database?  Who knows about your Duke data? Other personal data?  Why are "database-backed websites" better?

2 A Computer Science Tapestry 7.2 Streams for reading files We’ve seen the standard input stream cin, and the standard output streams cout (and cerr )  Accessible from, used for reading from the keyboard, writing to the screen l Other streams let us read from files and write to files  Syntax of reading is the same: a stream is a stream  Syntax for writing is the same: a stream is a stream l To use a file stream the stream must be opened  Opening binds stream to a file, then I/O to/from is ok  Should close file streams (with only one or two files, doesn’t matter since closed on program exit)

3 A Computer Science Tapestry 7.3 Input file stream: Note similarity to cin string word; int numWords = 0; // # words read so far while (cin >> word) { // read succeeded numWords++; } cout << "number of words read = " << numWords << endl; ifstream input; // need for this string filename = "c:/data/poe.txt"; input.open(filename.c_str()); while (input >> word){ // read succeeded numWords++; } cout << "number of words read = " << numWords << endl;

4 A Computer Science Tapestry 7.4 Find longest word in file (most letters) l Idea for algorithm/program  Read every word, remember the longest word read so far  Each time a word is read, compare to longest-so-far, if longer then there’s a new longest-so-far l What should longest-so-far be initialized to?  Short word? Long word? First word? l In general, when solving extreme-value problems like this use the first value as the initial value  Always works, but leads to some duplicate code  For some values a default initialization is possible

5 A Computer Science Tapestry 7.5 Maximal and Minimal Values Largest int and double values are found in and, respectively ( and )  INT_MAX and INT_MIN are max and min ints  DBL_MAX and DBL_MIN are max and min doubles l What about maximal string value alphabetically? Minimal?  l What about longest string in length? Shortest? 

6 A Computer Science Tapestry 7.6 Read values, find largest (smallest) l How to write code and initialize values int current; int maxValue = ; // INT_MIN or INT_MAX? int numNums = 0; // # numbers so far while (cin >> current) { // read succeeded numNums++; if (current > maxValue) { maxValue = current; } cout << “max value = “ << maxValue << endl;

7 A Computer Science Tapestry 7.7 What about if ints are strings? l How to write code and initialize values string current; string maxValue = ; // what here? while (cin >> current) { // read succeeded if (current > maxValue) { maxValue = current; } cout << “max value = “ << maxValue << endl;

8 A Computer Science Tapestry 7.8 Using classes to solve problems l Find the word in a file that occurs most frequently  What word occurs most often in Romeo and Juliet  How do we solve this problem? l Suppose a function exists with the header below: int CountOccurrences(const string& filename, const string& s) // post: return # occurrences of s in file w/filename l How can this function be called to find the maximally occurring word in Romeo and Juliet ?  Read words, update counter?  Potential problems?

9 A Computer Science Tapestry 7.9 Complete the code below int CountOccurrences(const string& filename, const string& s) // post: return # occurrences of s in file w/filename int main() { string filename = PromptString("enter file: "); ifstream input(filename.c_str()); string word; while (input >> word) { int occs = CountOccurrences(filename, word); } cout << "maximally occurring word is " << maxWord << endl; cout << "which occurs " << max << " times" << endl; }

10 A Computer Science Tapestry 7.10 Two problems l Words appear as The and the, how can we count these as the same? Other issues?  Useful utilities in “strutils.h” ToLowerreturn a lowercase version of a string ToUpperreturn an uppercase version of a string StripPuncremove leading/trailing punctuation tostring(int)return “123” for 123, int-to-string conversion atoi(string)return 123 for “123”, string-to-int conversion l We count occurrences of “the” as many times as it occurs  Lots of effort duplicated, use the class StringSet  A set doesn’t store duplicates, read file, store in set, then loop over the set counting occurrences

11 A Computer Science Tapestry 7.11 StringSet and WordIterator l Both classes support access via iteration  Iterating over a file using a WordIterator returns one word at-a-time from the file  Iterating over a set using a StringSetIterator returns one word at-a-time from the set l Iteration is a common pattern: A pattern is a solution to a problem in a context  We’ll study more patterns later  The pattern has a name, and it’s an idea embodied in code rather than the code itself l We can write code without knowing what we’re iterating over if the supports generalization in some way

12 A Computer Science Tapestry 7.12 See setdemo.cpp #include using namespace std; #include "stringset.h" int main() { StringSet sset; sset.insert("watermelon"); sset.insert("apple"); sset.insert("banana"); sset.insert("orange"); sset.insert("banana"); sset.insert("cherry"); sset.insert("guava"); sset.insert("banana"); sset.insert("cherry"); cout << "set size = " << sset.size() << endl; StringSetIterator it(sset); for(it.Init(); it.HasMore(); it.Next()) { cout << it.Current() << endl; } if (sset.contains("banana")){ cout << "has we have a banana" << endl; } return 0; }

13 A Computer Science Tapestry 7.13 Designing and Using Classes l Class implementation, summary of what we’ve seen  Data is private and is accessible in each member function  Each object has it’s own data, so that each of five Dice objects has its own mySides and myRollCount  Member function implementations are in a.cpp file, interface is in a.h file l Compiling and linking, interface and implementation  Client programs #include a.h file, this is the interface  Client programs link the implementation, which is a compiled version of the.cpp file (.o or.obj suffix), implementations are often combined in a library, e.g., libtapestry, and the library is linked

14 A Computer Science Tapestry 7.14 Implementing Classes l Determining what classes are needed, and how they should be implemented is difficult; designing functions is difficult  Experience is a good teacher, failure is a good teacher Good design comes from experience, experience comes from bad design attributed to Fred Brooks, Henry Petroski  Design and implementation combine into a cyclical process: design, implement, re-visit design, implement, test, redesign, … Grow a working program, don’t do it all at the same time l One design methodology says “look for nouns, those are classes”, and “look for verbs or scenarios, those are member functions”  Not every noun is a class, not every verb is a method

15 A Computer Science Tapestry 7.15 Programming Tips, Heuristics, Help l Develop a core working program, add to it slowly  Iterative enhancement, test as you go, debug as you go l Do the hard part first, or do the easy part first  Which is best? It depends. l Concentrate on behavior first when designing classes, then on state  State is useful for communicating between method calls l If you’re using several classes, you’ll need to modify the Makefile or your project in an IDE: Visual C++/Eclipse

16 A Computer Science Tapestry 7.16 Common interfaces are a good thing l The class WordStreamIterator iterates over a file returning one word/string at a time string filename = PromptString("enter file name: "); WordStreamIterator ws; ws.Open(filename); for(ws.Init(); ws.HasMore(); ws.Next()) { cout << ws.Current() << endl; } l The class StringSet and StringSetIterator allow sets of strings to be iterated over one string at a time StringSet sset; sset.insert("banana"); sset.insert("cherry"); StringSetIterator it(sset); for(it.Init(); it.HasMore(); it.Next()) { cout << it.Current() << endl; }

17 A Computer Science Tapestry 7.17 Reuse concepts as well as code l Using the same syntax for iterating saves time in learning about new classes, will save coding when we learn how to exploit the commonality l We can develop different Question classes and “plug” them into a quiz program if the member functions have the same name  See quiz.cpp, mathquest.cpp, and capquest.cpp  Programs must #include different headers, and link in different implementations, but quiz.cpp doesn’t change l Random walk classes: one- and two-dimensional, can use the same driver program if the classes use the same method names

18 A Computer Science Tapestry 7.18 Random walks l Throwing darts (randomness in programs) is a technique for simulating events/phenomena that would be otherwise difficult  Molecular motion is too time-consuming to model exactly, use randomness to approximate behavior Consider the number of molecules in 10 -10 liters of a gas, each affects the other if we’re simulating motion 6.023x10 23 molecules/22.4 liters is (approx) 2.7e+12molecules  If we can do 100 megaflops, what does this mean? l Simulations are important in modeling nature, airplanes, airports, …, require pseudo-random numbers and some mathematics as well as programming

19 A Computer Science Tapestry 7.19 Simulations and solving problems We can find the area under a curve in two dimensions by integrating, see integrate.cpp  Sometimes can’t get exact solution, need approximations  Can use rectangles, trapezoids, parabolas, etc.  Can also use randomization, especially useful in multidimensional integrals where other techniques fail 3 10 y = x 2 3 10

20 A Computer Science Tapestry 7.20 Random walks l Interesting from a mathematical viewpoint, leads to understanding other random simulations l Two-dimensional random molecular simulations are fundamental to understanding physics (nuclear reactions) l Basic idea in one dimension: frog, object, molecule “decides” to step left or right at random  Where does object end up after N steps?  How many times does from visit each lily pad Assuming pads located at integer values, not real values  See frogwalk.cpp for details

21 A Computer Science Tapestry 7.21 Walking behavior (see frogwalk2.cpp) int main() { int numSteps = PromptRange("enter # steps",0,30000); RandomWalk frog(numSteps); // define two random walkers RandomWalk toad(numSteps); int samePadCount = 0; // # times at same location frog.Init(); // initialize both walks toad.Init(); while (frog.HasMore() && toad.HasMore()) { if (frog.Current() == toad.Current()) { samePadCount++; } frog.Next(); toad.Next(); } cout << "frog position = " << frog.Position() << endl; cout << "toad position = " << toad.Position() << endl; cout << "# times at same location = " << samePadCount << endl; return 0; }

22 A Computer Science Tapestry 7.22 Two-dimensional walker l One-d walker Current() returns an int as position l Two-d walker Current() returns a Point as position  Both int and Point can be compared using ==  Both int and Point can be printed using << l Same program works for two-d walker, even though underneath the implementation is very different  Since the interfaces are the same/similar, client programs are easier to write once, use many times  Client code still needs to #include a different header and must link in a different (two-d) walker implementation

23 A Computer Science Tapestry 7.23 What’s the Point? The two-dimensional walker uses #include " point.h "  This provides access to class Point declaration/interface  The class Point is actually defined using struct Point  In C++, a struct is a class in which everything is public by default In a class, everything is private by default A struct is really a hold-over from C, used in C++ for plain old data  Some programmers/designers don’t like to use structs in C++, but use classes only l We’ll use struct when data is public, when the state is really more important than the behavior  Guideline, data is private accept in a struct, other options?

24 A Computer Science Tapestry 7.24 point.h struct Point { Point(); Point(double px, double py); string tostring() const; double distanceFrom(const Point& p) const; double x; double y; }; l Two constructors, data is public, how is the (0,0) defined?  How is distance from (3,5) to (11,20) calculated?  How is a Point p printed?

25 A Computer Science Tapestry 7.25 Other details from point.h Points can be compared with each other using ==, =, etc. Point p can be printed using cout << p << endl;  Later we’ll learn how to overload operators like this  For now we’ll be clients, using Points like ints, BigInts, etc. The struct Point has constructors and other behavior  distanceFrom and tostring constitute the behavior  Some programmers think structs shouldn’t have any functions, holdover from C rather than C++ What is implemention of Point::distanceFrom like?

26 A Computer Science Tapestry 7.26 Other uses of structs l In a program using free (non-class) functions, lots of data is often passed from one function to another  In class-based programs data is often, though not always, part of a class and a class object is passed l Using structs to collect related data makes programs easier to read, modify, and maintain  Suppose you want to find mean, mode, and median of the lengths of words in a file, two alternatives: void doFileStats(const string& filename, double & mean, int & mode, int & median); void doFileStats(const string& filename, FileData& data);

27 A Computer Science Tapestry 7.27 More struct conventions It’s almost always worth including a constructor in a struct struct FileData { FileData() { myMean = 0.0; myMode = 0; myMedian = 0; } double myMean; int myMode; int myMedian; }; What other data might be included in FileData, what about other constructors?

28 A Computer Science Tapestry 7.28 Class (and struct) conventions For debugging and printing it’s useful for classes to implement a function tostring(), that " stringizes " an object  Also useful in overloading operator << for an object Point p; string s = p.tostring(); cout << s << " " << p << endl; l When initializing data in a constructor, it’s better to use an initializer list than to set values in the constructor body  Sometimes initializer lists are required (see next example), so using them at all times leads to more uniform coding that works in more situations

29 A Computer Science Tapestry 7.29 Initializer lists are sometimes required Consider a class that has a private Dice data member class Game { public: Game(); // more functions private: Dice myDie; // more data }; The instance variable myDie must be given a # sides, this cannot be given in the.h file/declaration, must be provided in the.cpp file/class implementation  It’s an error if an initializer list isn’t use

30 A Computer Science Tapestry 7.30 Initializer lists Here are two versions of an initializer list for Game::Game() Game::Game() : myDie(6) { } // if there’s more data, use initializer list Game::Game() : myDie(6), myName(“roulette”) { } l There can be code in constructor body to do more, e.g., read from a file  Sometimes it’s useful to call private, helper functions

31 A Computer Science Tapestry 7.31 Mary Shaw l Software engineering and software architecture  Tools for constructing large software systems  Development is a small piece of total cost, maintenance is larger, depends on well-designed and developed techniques l Interested in computer science, programming, curricula, and canoeing

32 A Computer Science Tapestry 7.32 Three phases of creating a program l The preprocessor is a program that processes a file, processing all #include directives (and other preprocessor commands)  Takes a file, and creates a translation unit  Replaces #include “foo.h” with contents of file foo.h, and does this recursively, for all #includes that foo includes and so on  Produces input to the next phase of program creation l The compiler has a translation unit as input and produces compiled object code as output  The object code is platform/architecture specific, the source code is (in theory at least) the same on all platforms  Some compilers require special treatment, not up to standard C++

33 A Computer Science Tapestry 7.33 From compiling to linking l The compilation phase creates an object file, but libraries and other files still need to be linked to create an executable  Header files like “dice.h” provide only the interface, enough for the compiler to know that a function call has the right parameters and is used correctly  The implemention file, “dice.cpp”, must be compiled and included in the final executable, or the program won’t work (call a dice function, but no one is home?) l Linking combines object files, some of which may be collected in a library of related files, to create an executable  Link the standard library (iostream, for example)  Link other libraries depending on program, graphics, tapestry, other application-specific libraries

34 A Computer Science Tapestry 7.34 Issues in creating a program l Programming environments create optimized or debug code  Use debug version to facilitate development  If you need optimization, use it only after a program works l Some errors are compilation errors, typically language syntax or failure to find a #include’d header file  The preprocessor looks in standard places for header files, sometimes this list needs to be changed l Other errors are linker errors, libraries or object files that are needed aren’t included  Change programming environment parameters to find the libraries


Download ppt "A Computer Science Tapestry 7.1 Why do we read from files (chapter 6)? l Lots of data stored in files  What's a terabyte? Petabyte? Where does this much."

Similar presentations


Ads by Google