Arrays
The need for arrays Up until now we have only had to deal with programs that used a small number of variables. If we wanted an integer we declared one. If we wanted a real number we declared one. Often there is a need for programs that are able to handle many variables. Our approach has serious limitations in this regard.
Example Imagine that your friend is taking Chemistry and one day approaches you with the following : “Hey, I hear that you are learning how to program computers. Would you be willing to help me out with a problem in my Chem class?”
Yes, go on. “Here’s the deal. We are running an experiment that generates 5 different values. All I need is a simple program to read 5 numbers and figure out the average. Can you do it? Oh, and the program needs to print out the data at the end so we can check it.” This is easy to do, so you agree and come up with the following program...
Chemistry program, version 1 PROGRAM Chem REAL average, sum, num1, num2, num3, num4, num5 PRINT*, “Please enter 5 numbers” READ*, num1, num2, num3, num4, num5 sum = num1 + num2 + num3 + num4 + num5 average = sum / 5 PRINT*, “The average is: “, average PRINT*, “The numbers were: “, num1, num2, num3, num4, num5 END PROGRAM Chem
Then... So you compile the program and give your friend a copy of the compiled version. Then after a week your friend is back again... “Hey thanks for the program! It works great! Can I ask you to make a small change in it for me though? This week we are supposed to run the experiment 3 times and average all 15 numbers!”
Chemistry program, version 2 PROGRAM Chem REAL average, sum, num1, num2, num3, num4, num5 REAL num6, num7, num8, num9, num10, num11, num12 REAL num13, num14, num15 PRINT*, “Please enter 15 numbers” READ*, num1, num2, num3, num4, num5, num6, num7, & num8, num9, num10, num11, num12, num13, & num14, num15 sum = num1 + num2 + num3 + num4 + num5 + num6 + & num7 + num8 + num9 + num10 + num11 + num12 + & num13 + num14 + num15 average = sum / 15 PRINT*, “The average is: “, average PRINT*, “The numbers were: “, num1, num2, num3, num4, num5, & num6, num7, num8, num9, num10, num11, num12, & num13, num14, num15 END PROGRAM Chem
What next? Clearly it was a major pain to have to rewrite the program to handle 15 numbers, then recompile it, etc. You hope you don’t have to do that again. But the next week your friend comes back once more… “Great job! Everyone loves it. Now we want to average all the numbers that the whole class has gotten.
What next? There are 8 lab groups and each one has 15 values. Can you fix the program to handle that?” OK. Now we have a problem. It is going to be VERY laborious to make this change. Plus who knows what will come next. Maybe they will want to average all values for all classes, or for several semesters. Arrrrrg!
A solution What is happening here is that we designed the initial program to suit one situation, without any thought for the future. We should have written it so that it gave us a more general solution to the problem. But what might that be? After a few moments of thought and a quick writeup of an algorithm, we come up with the following
Chemistry program, version 3 PROGRAM Chem REAL average, sum, num, n, i sum = 0 PRINT*, “How many numbers will you enter?” READ*, n DO i = 1, n READ*, num sum = sum + num ENDDO average = sum / n PRINT*, “The average is: “, average END PROGRAM Chem
Done? This solution could stand some error checking, but it is otherwise alright. If we compile it and run it we can then give it to our friend and never worry about them coming back and bothering us for changes in the number of values being entered. We should have done this the first time!
Done? Maybe not. The design still has several flaws however. One of them is that the user must know exactly how many values to enter. This is very user-unfriendly. Let’s rewrite the program so that it can accommodate any number of values without being forced to have to count them before the program can be used.
Chemistry program, version 4 PROGRAM Chem REAL average, sum, num, n sum = 0 n = 0 DO READ (*, *, END = 20) num sum = sum + num n = n + 1 ENDDO 20 average = sum / n PRINT*, “The average is: “, average END PROGRAM Chem
Done? Still maybe not. This version is much better because it does not ask the user to tell you how many values must be entered before it is used. However, both of the last two versions are unable to do something that the first two versions (the dumb ones) were easily able to do? What is it?
old vs. new version Old version What does the old version do REAL average, sum, num1, num2, num3, num4, num5 PRINT*, “Please enter 5 numbers” READ*, num1, num2, num3, num4, num5 sum = num1 + num2 + num3 + num4 + num5 average = sum / 5.0 PRINT*, “The average is: “, average PRINT*, “The numbers were: “, num1, num2, num3, num4, num5 Old version REAL average, sum, num, n sum = 0 n = 0 DO READ (*, *, END = 20) num sum = sum + num n = n + 1 ENDDO 20 average = sum / n PRINT*, “The average is: “, average What does the old version do that the new version can’t? New version
old vs. new version Old version Answer: It prints the New version REAL average, sum, num1, num2, num3, num4, num5 PRINT*, “Please enter 5 numbers” READ*, num1, num2, num3, num4, num5 sum = num1 + num2 + num3 + num4 + num5 average = sum / 5.0 PRINT*, “The average is: “, average PRINT*, “The numbers were: “, num1, num2, num3, num4, num5 Old version REAL average, sum, num, n sum = 0 n = 0 DO READ (*, *, END = 20) num sum = sum + num n = n + 1 ENDDO 20 average = sum / n PRINT*, “The average is: “, average Answer: It prints the numbers out after the average so they can be verified. New version
Now what? In order to verify that the program works correctly, it is necessary for the data to be printed out after the average has been calculated. The first program could do this. Neither of the loop versions can. This is because they destroyed the data as they read it in. By using only one variable we lose data!
The need for arrays The best solution to our dilemma involves reading in data in a general way, so that the user can enter as much as they want, while storing the data in many different memory cells so that we have it available later. The data structure for doing this is called an array.
The need for arrays The best solution to our dilemma involves reading in data in a general way, so that the user can enter as much as they want, while storing the data in many different memory cells so that we have it available later. The data structure for doing this is called an array.
Arrays An array is a contiguous block of memory cells that store data all of one type. Each individual item in the array is called an ‘element’ There is one name for the array, and we use an integer (called an index) to specify which element we want.
Array basics (1) (2) (3) (4) (5) num Array name 2 Index values 7 Elements 3 8 6
Array basics (1) (2) (3) (4) (5) To refer to any element in the array you use its name and its index value. For example: PRINT*, num(2) prints out the number 7 num (1) (2) (3) (4) (5) 2 7 3 8 6
Array basics (1) (2) (3) (4) (5) To print out all of the values in an array we use the loop control variable as the array index. For example: DO i = 1,5 PRINT*, num(i) ENDDO num (1) (2) (3) (4) (5) 2 7 3 8 6
Array loop basics (1) (2) (3) (4) (5) DO i = 1,5 PRINT*, num(i) ENDDO Output 2 (1) (2) (3) (4) (5) 2 7 3 8 6
Array loop basics (1) (2) (3) (4) (5) DO i = 1,5 PRINT*, num(i) ENDDO Output 2 7 (1) (2) (3) (4) (5) 2 7 3 8 6
Array loop basics (1) (2) (3) (4) (5) DO i = 1,5 PRINT*, num(i) ENDDO Output 2 7 3 (1) (2) (3) (4) (5) 2 7 3 8 6
Array loop basics (1) (2) (3) (4) (5) DO i = 1,5 PRINT*, num(i) ENDDO Output 2 7 3 8 (1) (2) (3) (4) (5) 2 7 3 8 6
Array loop basics (1) (2) (3) (4) (5) DO i = 1,5 PRINT*, num(i) ENDDO Output 2 7 3 8 6 (1) (2) (3) (4) (5) 2 7 3 8 6
Array declarations Arrays are declared by giving their type and size (just as with other variables) Example: INTEGER, PARAMETER :: n = 5 REAL, DIMENSION(n) :: num Once declared they can be used anywhere any other variable would, as long as you specify both the array name and the index value of an element.
General solution at hand So, now that we have the array tool at our disposal we have can come up with a solution to the problem that we were not able to before.
Chemistry program, version 5 INTEGER, PARAMETER :: n = 5 REAL, DIMENSION(n) :: num REAL average, sum sum = 0 DO i = 1, n READ*, num(i) sum = sum + num(i) ENDDO average = sum / n PRINT*, “The average is: “, average PRINT*, “The numbers were: “ PRINT*, num(i) We read in the values using the loop control variable to specify the array element index location of where the data should go. Since all the data is in the array we can print it by traversing the array.
Only one problem left We can now read in data, store it all without having to create individual variables one by one (like versions 1 and 2). We have all the data around for easy access later in the program. In addition, we can use a loop to generalize how the array of data items is read, printed, etc. Everything is perfect, except for one thing...
A step backward In order to declare the array, we must know how many elements it will have. Now we are back to a situation similar to what we started with. Every time the user has a new data set we must physically edit the settings in the program, recompile it and then re-release it. How can we get around this?
Chemistry program, version 5 INTEGER, PARAMETER :: n = 5 REAL, DIMENSION(n) :: num REAL average, sum sum = 0 DO i = 1, n READ*, num(i) sum = sum + num(i) ENDDO average = sum / n PRINT*, “The average is: “, average PRINT*, “The numbers were: “ PRINT*, num(i) Here is the problem. What if we don’t know how many numbers will be entered? OR What if we do know but the number is not 5?
A workable solution We cannot get around having to set n to some value. So what we do is as follows. We anticipate the maximum number of values this program will ever use and double it. That number is used to dimension the array.
Chemistry program, version 6 INTEGER, PARAMETER :: size = 2000 REAL, DIMENSION(size) :: num REAL average, sum, n sum = 0 DO i = 1, size READ (*,*,END=20) num(i) sum = sum + num(i) ENDDO 20 n = i - 1 average = sum / n PRINT*, “The average is: “, average PRINT*, “The numbers were: “ DO i = 1, n PRINT*, num(i) size is used to set the size of the array. n will be set to the number of values actually stored in it. Although there are size elements we only use the first n number of them.
Waste of space We now have the best of all worlds in terms of the algorithm. However, we have created a new problem for ourselves in terms of storage. The array is dimensioned to 2,000 but only n elements will actually be used. This means that 2,000 - n spaces are wasted. This is what happens when array sizes are dimensioned at compile time.
A possible solution We could get around the wasted space by waiting until we know what n was before we allocated storage for the array. Unfortunately, this is not allowed. All variable declarations must happen first, before any executable statements.
This is not valid INTEGER :: n executable statements INTEGER :: n PRINT*, “How many numbers will you enter?” READ*, n REAL, DIMENSION(n) :: num An invalid declaration
Runtime allocation FORTRAN 90 does allow you to create arrays ‘on-the-fly’ as a program executes using the keyword ALLOCATABLE This means that an array is able to be allocated once in the executable section of the program. This allows our last program segment to actually work.
This is valid INTEGER :: n REAL, DIMENSION(n), ALLOCATABLE :: num Declare an allocatable array INTEGER :: n REAL, DIMENSION(n), ALLOCATABLE :: num PRINT*, “How many numbers will you enter?” READ*, n ALLOCATE(num(n)) Allocates memory for the array num of size n
Appropriateness issues Although run-time allocation of an array would make our use of storage more efficient, it could only be used in a situation where determining what n was did not involve counting the number of items as they were being entered. Like our WHILE not end-of-file solution This is because we need to store the data in the array as we read it.
The balancing act Run-time allocation is therefore appropriate whenever the question “How many data values will you enter?” is appropriate. Usually this would only be for small values of n. In all other instances, we should declare the larger compile-time array and use only that portion that we need. Note: later in the semester we will learn of other ways to store data besides arrays that can also handle run-time allocation.