Data Types
June 23, 2016
Amy Allen
Modified from Sharon Guffy
Data Types Overview
- Primitive data types: Integers, Floating point numbers, Boolean, Characters
- Data structures
Notes: Primitive types = data types (i.e. numeric types and characters). Aggregate types: ordered or unordered. Brief discussion of how to make your own data types in some languages.
Integers
- A number with no fractional part
- Stores the exact value, so == comparisons are safe
- Storage: one bit to indicate whether the number is positive or negative, plus a binary representation of the number
- Absolute size limit (system-dependent). E.g. a 32-bit system could store integers within ±(2^31 - 1)
- Converting floating point numbers to integers truncates the decimal part (2.9 → 2, -2.1 → -2)
Notes: Integers store the data exactly, but this comes with a size limit. The limit is relatively large (i.e. a 32-bit system has 31 bits to store the value and 1 bit to store positive vs. negative). If you convert a number from float to integer, it will NOT ROUND; it just truncates.
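The truncation behavior above can be checked directly. This is a minimal Python sketch; the variable names are illustrative only, and other languages may convert differently.

```python
# Converting a float to an integer in Python truncates toward zero; it does not round.
truncated_pos = int(2.9)    # fractional part is dropped
truncated_neg = int(-2.1)   # truncates toward zero, not down to -3
rounded = round(2.9)        # ask for rounding explicitly when that is what you want

# Integers are stored exactly, so == comparisons are safe.
exact = (7 * 3 == 21)

# Upper limit of a 32-bit signed integer, as on the slide.
max_int32 = 2**31 - 1

print(truncated_pos, truncated_neg, rounded, exact, max_int32)
```

If you need rounding, call round() before (or instead of) int().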
Special types of integers
- Long integers: take up more space in memory than integers, but can store larger values
- Short integers: take up less space, but have a smaller range
- Unsigned integers: same amount of space as integers, but only store integers >= 0. Useful for values that can never be negative (count, class size, etc.)
Notes: You can adjust the size limits of integers, and each size has benefits and pitfalls. You don't want to store a small integer in a ton of space. Remember that a negative number should only be stored in a signed type (otherwise problems can arise).
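To make the size trade-off concrete, here is a rough sketch that computes the range for a given bit width. The helper int_range is hypothetical, the widths (16, 32, 64) are common choices rather than guarantees, and the signed ranges assume two's complement storage, which gives one more negative value than the simple sign-bit picture.

```python
def int_range(bits, signed=True):
    """Return (min, max) for an integer of the given width.

    Hypothetical helper for illustration; assumes two's complement
    for signed types. Actual widths are system- and language-dependent.
    """
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

# Typical (assumed) widths for each integer flavor.
for name, bits, signed in [
    ("short", 16, True),
    ("int", 32, True),
    ("unsigned int", 32, False),
    ("long", 64, True),
]:
    print(name, int_range(bits, signed))
```

Note that the unsigned 32-bit range reaches about twice as high as the signed one, because no bit is spent on the sign.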
Integer Overflow
- Overflow: an integer exceeds its maximum/minimum size
- Different systems deal with overflow differently:
  - Some systems report the problem (e.g. R returns NA with an integer overflow warning)
  - In other cases, values may wrap: <largest possible positive integer> + 1 = <most negative possible integer>
  - Others (e.g. Python) automatically convert to a larger integer representation
- How do these details affect you?
Notes: If an integer grows larger than the limit, the best thing that can happen is an error; the worst is a silent wrap-around. The computer doesn't know what to do, so it basically goes to the top limit and then restarts at the bottom (negative) limit. How the computer deals with the problem is language-specific. Example: storing 12 as an UNSIGNED integer is fine, but storing -12 as an UNSIGNED integer can give a giant number in place of the value you expected.
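Python's own integers do not overflow, so the wrap-around has to be simulated. The sketch below imitates 32-bit storage with modular arithmetic; wrap_int32 and wrap_uint32 are hypothetical helpers, not part of any library, and they assume two's complement behavior.

```python
INT32_MAX = 2**31 - 1

def wrap_int32(n):
    """Simulate storing n in a 32-bit signed integer that wraps (assumed two's complement)."""
    return (n + 2**31) % 2**32 - 2**31

def wrap_uint32(n):
    """Simulate storing n in a 32-bit unsigned integer that wraps."""
    return n % 2**32

# Largest positive value + 1 wraps around to the most negative value.
wrapped = wrap_int32(INT32_MAX + 1)      # -2147483648

# 12 fits in an unsigned integer, but -12 turns into a giant positive number.
stored_pos = wrap_uint32(12)             # 12
stored_neg = wrap_uint32(-12)            # 4294967284

# Python itself just keeps growing the integer instead of wrapping.
python_result = INT32_MAX + 1            # 2147483648

print(wrapped, stored_pos, stored_neg, python_result)
```

The giant value for -12 is exactly 2^32 - 12, which is the kind of surprise to watch for when a count that "can never be negative" briefly is.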
Floating point numbers
- Numbers stored in a form of scientific notation:
  - Sign
  - Exponent (shown here as a power of ten; binary formats use a power of two)
  - Binary representation of the fractional part
  - e.g. 10.2 = +1.02 × 10^1
- Double precision: uses additional space, so it can store larger numbers and more decimal places
Notes: These are stored in your computer much as they are written in scientific notation: some bits store the exponent, some store the value, and one stores the sign. The storage is typically larger than that of an integer. Remember, everything is restricted by storage size. The major problem is the accuracy of number storage. Because computers store everything as binary fractions, NOT ALL NUMBERS THAT CAN BE WRITTEN IN BASE TEN CAN BE STORED EXACTLY IN BINARY.

Type     Smallest Positive Value   Largest Value     Precision
float    1.17549 × 10^-38          3.40282 × 10^38   6 digits
double   2.22507 × 10^-308         1.79769 × 10^308  15 digits
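The double-precision limits in the table can be inspected from Python, whose float type is a double on typical platforms (an assumption worth checking on unusual systems). This sketch uses only the standard library; the variable names are illustrative.

```python
import sys
from decimal import Decimal

# Limits of the platform's double-precision float.
largest = sys.float_info.max          # largest finite value
smallest_normal = sys.float_info.min  # smallest positive normalized value
digits = sys.float_info.dig           # decimal digits that are reliably preserved

# Decimal(float) shows the exact value the computer actually stored.
exact_tenth = Decimal(0.1)   # 0.1 has no exact binary representation
exact_half = Decimal(0.5)    # 0.5 is a power of two, so it is stored exactly

print(largest, smallest_normal, digits)
print(exact_tenth)
print(exact_half)
```

Printing exact_tenth shows a long decimal that is close to, but not equal to, 0.1.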
Special floating point numbers
- Special values:
  - Inf (infinity) and -Inf (negative infinity)
  - NaN (not a number): produced from operations without a definite answer (Inf/Inf, etc.)
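A short Python sketch of these special values, using the standard math module. The variable names are illustrative only.

```python
import math

pos_inf = math.inf
neg_inf = -math.inf

# Operations without a definite answer produce NaN.
not_a_number = pos_inf / pos_inf
also_nan = pos_inf + neg_inf

# NaN is not equal to anything, including itself, so test with math.isnan.
nan_equals_itself = (not_a_number == not_a_number)
is_nan = math.isnan(not_a_number)

# Infinity compares larger than any finite float.
bigger_than_finite = pos_inf > 1e308

print(pos_inf, neg_inf, not_a_number, nan_equals_itself, is_nan)
```

Because NaN never equals itself, a check like value == float("nan") is always False; use math.isnan(value) instead.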
Problems with floating point numbers
Since floating point numbers are stored as fractions in binary, not all numbers can be stored precisely, so math may not give exactly the answer you expect:

    >>> x = 3.2
    >>> y = 1.1
    >>> x+y == 4.3
    False

Hmmmm, that's not right . . .

    >>> x+y
    4.300000000000001

Therefore, it is best not to check equality of floating point numbers.
Notes: Think about storing 1/3 as a decimal: 0.333333 × 3 = 0.999999, NOT 1.0 as it should be. Remember the restrictions of the tools and rules of your language and number system.
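Instead of ==, floats are usually compared within a tolerance. The sketch below reuses the slide's x and y; math.isclose is part of the Python standard library, while nearly_equal and its default tolerance are hypothetical choices for illustration.

```python
import math

x = 3.2
y = 1.1

# Exact comparison fails because of binary rounding error.
exact_equal = (x + y == 4.3)

# Compare within a tolerance instead (default relative tolerance is 1e-09).
close_enough = math.isclose(x + y, 4.3)

def nearly_equal(a, b, tol=1e-9):
    """Hypothetical hand-rolled check: true if a and b differ by at most tol."""
    return abs(a - b) <= tol

print(exact_equal, close_enough, nearly_equal(x + y, 4.3))
```

The right tolerance depends on the size of the numbers involved, so treat 1e-9 as a placeholder rather than a recommendation.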
Boolean
- Used to store true/false values
- Can be converted from any other type. False values (may vary by language):
  - Numeric types: 0
  - Empty aggregate types ([], "", {}, etc.)
  - All other values are true
- Commonly used in loops and control structures
Notes: This is what your computer loves to store: 1 or 0, yes or no, on or off. Almost anything can be stored as a Boolean value. How the computer sees it: False is nothing, none, 0; True is something.
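Here is how these conversions look in Python specifically; as the slide says, the rules may vary by language. The dictionary name truthiness and the helper describe are illustrative only.

```python
# Values that convert to False in Python: zero and empty aggregates.
falsy_values = [0, 0.0, [], "", {}, None]

# Anything else converts to True, including some values that look "empty".
truthy_values = [1, -2.5, [0], "0", {"key": None}]

truthiness = {repr(v): bool(v) for v in falsy_values + truthy_values}

def describe(items):
    """Typical use in a control structure: an empty list is treated as False."""
    if items:
        return "has data"
    return "empty"

print(truthiness)
print(describe([]), describe([1, 2, 3]))
```

Note that the string "0" and the list [0] are both True in Python, because they are non-empty.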
Characters and Strings
- Think of characters and strings as non-numeric
- A character is a single symbol, and a string is multiple symbols strung together
- Handling differs significantly between Python and R:
  - In Python, a character is simply a string of length one
  - In R, strings still fall under the character class
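The Python side of this can be seen in a few lines. This is a small sketch with illustrative variable names; it does not cover R.

```python
word = "data"

# Indexing a string gives back another string, of length one.
first = word[0]
first_type = type(first)

# A string is its characters strung together, in order.
characters = list(word)
rebuilt = "".join(characters)

# Digits inside a string are still non-numeric until converted.
text_number = "42"
as_integer = int(text_number)

print(first, first_type, characters, rebuilt, as_integer + 1)
```

Python has no separate character type, so first here is an ordinary str.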
Data Structures Overview
- A particular way of organizing data in a computer so that it can be used efficiently
- Range from very basic to very complicated:
  - Can be as simple as an ordered list of numbers
  - Can be more complicated, like a graph
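As a rough illustration of the two ends of that range, the sketch below shows an ordered list and a small graph in Python. Representing the graph as a dictionary of adjacency lists is one common choice, not the only one, and the names scores, graph, and neighbors are made up for this example.

```python
# Simple: an ordered list of numbers, accessed by position.
scores = [3, 1, 4, 1, 5]
third_score = scores[2]

# More complicated: a graph, stored here as a dict mapping each node
# to the list of nodes it connects to (an adjacency list).
graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
}

def neighbors(g, node):
    """Return the nodes directly reachable from node (empty list if unknown)."""
    return g.get(node, [])

print(third_score, neighbors(graph, "A"), neighbors(graph, "C"))
```

The list answers "what is in position 2?" quickly, while the graph answers "what is connected to A?"; choosing the structure that matches your question is what makes access efficient.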
Why is all of this important?
- You want to make sure you are storing and accessing your data as efficiently as possible.
- You need to understand the pitfalls of the data types and data structures you're using so you can troubleshoot errors.