BIG DATA: A Revolution That Will Transform How We Live, Work, and Think
Niko Sipola
CT60A Critical Thinking and Argumentation in Software Engineering
Spring 2015
CHAPTER 5: Datafication
So, datafication in the big data picture

[Diagram: TRADITIONAL PERCEPTION OF DATA and BIG DATA]
Diagram labels: Datafication; Shift in thinking; Shift in data collection; IMPROVED, EFFICIENT AND CHEAP TECHNOLOGY, TOOLS, ALGORITHMS
Datafication: “About taking information about all things under the sun”
First group of items: Collecting limited portions of data; Measurement errors allowed when all data is available; Correlation over causation
Second group of items: Exactness; Sampling small data sets; Causation
Book’s introduction to datafication

Many people in the tech industry consider technological advancement the main reason why the big data transformation has emerged. According to the book, however, it is more because more data is available and because “we are rendering more aspects of reality in a data format”.

To explain what this rendering of data means, the authors introduce Matthew Maury, a US Navy officer who lived in the 1800s. Sailing the seas in the early 1800s was more a matter of captains’ experience and intuition than of hard facts and maps. Ships would often zigzag at sea, making journeys unnecessarily long.

Maury gathered a huge number of captains’ logbooks, conducted interviews, and studied related books and charts. After analyzing the data, Maury published a tome, The Physical Geography of the Sea, comprising 1.2 million data points. The charts in the book described favorable currents, tides, and winds, and usually cut long voyages by about a third.

Maury’s work can be considered a successful pioneering example of datafication: taking as many aspects of reality as possible and rendering them as data for analysis.
What is datafication?

No good term actually existed for the kind of transformation Maury created. The concept of datafication as such is defined in the book. (*However, a similar definition can already be found by searching Wikipedia.)

DATAFICATION = To datafy a phenomenon is to put it in a quantified format so it can be tabulated and analyzed.

Datafication is not digitization, which means converting analog information into binary code.

Datafication has existed for a long time, since early civilizations (for example Mesopotamia and the invention of numbers for recording data). Measurement, data, and analysis have produced many success stories throughout history, such as the Venetian traders in the 1500s who adopted a new data recording method: the double-entry system.

Datafication requires answers to the following questions:
How to measure?
How to record what is measured? (using metadata)
Which tools to use?
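The difference between digitized text and datafied records, and the measure-and-record questions above, can be illustrated with a minimal Python sketch. It is not taken from the book: the logbook line format, the field names, and the functions `datafy_entry` and `mean_speed_by_wind` are all hypothetical and invented for illustration.

```python
import re
from statistics import mean

# Hypothetical, simplified logbook lines (invented format, not Maury's data).
# As plain strings these are merely digitized: stored, but not yet analyzable.
LOG_LINES = [
    "1848-03-02 lat=12.5 lon=-38.0 wind=NE speed_knots=6.0",
    "1848-03-03 lat=10.1 lon=-36.4 wind=NE speed_knots=7.5",
    "1848-03-04 lat=8.0 lon=-35.2 wind=E speed_knots=4.5",
]

# Pattern for the assumed line format above.
ENTRY = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"lat=(?P<lat>-?\d+\.?\d*) "
    r"lon=(?P<lon>-?\d+\.?\d*) "
    r"wind=(?P<wind>[NESW]+) "
    r"speed_knots=(?P<speed>\d+\.?\d*)"
)


def datafy_entry(line):
    """Turn one text line into a quantified record (one table row)."""
    match = ENTRY.fullmatch(line)
    if match is None:
        raise ValueError(f"unrecognized logbook line: {line!r}")
    return {
        "date": match["date"],
        "lat": float(match["lat"]),
        "lon": float(match["lon"]),
        "wind": match["wind"],
        "speed_knots": float(match["speed"]),
    }


def mean_speed_by_wind(records):
    """Tabulate the records: average ship speed per wind direction."""
    speeds = {}
    for record in records:
        speeds.setdefault(record["wind"], []).append(record["speed_knots"])
    return {wind: mean(values) for wind, values in speeds.items()}


records = [datafy_entry(line) for line in LOG_LINES]
print(mean_speed_by_wind(records))
```

Once the entries are records with named, numeric fields, questions such as "which wind direction goes with the fastest sailing?" become simple aggregations, which is the step that plain digitized text does not allow.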
Modern examples of datafication

Google Books: Google has scanned and datafied at least 20 million paper books (an estimated 130 million unique paper books have been published since the beginning of history). Various analyses can be run on the text of the books, such as tracking the use of words or phrases over time. => Culturomics (a new scientific discipline)

Tracking location with GPS: GPS doesn’t work well indoors. Google Street View cars have collected Wi-Fi router information, along with images, to obtain more precise locations.

Objects can also be tracked with wireless modules. Cars are one example. Based on this information, car insurance can be priced and even driving routes can be optimized (UPS). By tracking cell phone locations, AirSage creates real-time traffic records.

Datafication of human behaviour:
Datafication of relationships by Facebook (1 billion users with 100 billion friendships in 2012)
Datafication of thoughts and moods by Twitter (this information can be used, for example, to estimate the success of a movie)
Datafication of professional experiences by LinkedIn

Quantified self: Measuring every element of the body and living according to this data for a better life (e.g. the Zeo sleep monitor, sports trackers, health bracelets).
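The kind of word-usage analysis mentioned for Google Books can be sketched in a few lines of Python. This is only an assumption-laden illustration, not how Google Books or culturomics tooling actually works: the tiny corpus and the function `word_share_by_year` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical mini corpus of (publication year, text) pairs, for illustration.
CORPUS = [
    (1900, "The telegraph carried the news"),
    (1900, "News travelled by telegraph and post"),
    (1950, "The radio and the television carried the news"),
]


def word_share_by_year(corpus, word):
    """Return the share of all words in each year that equal `word`."""
    counts = defaultdict(Counter)
    for year, text in corpus:
        # Lowercase and split into alphabetic tokens before counting.
        counts[year].update(re.findall(r"[a-z]+", text.lower()))
    return {
        year: counter[word] / sum(counter.values())
        for year, counter in sorted(counts.items())
    }


shares = word_share_by_year(CORPUS, "telegraph")
for year, share in shares.items():
    print(year, round(share, 3))
```

A scanned page image only becomes usable for this sort of analysis after its text has been extracted and counted, which is the distinction the slides draw between digitizing and datafying books.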
“We are in the midst of a great infrastructure project that in some ways rivals those of the past, from Roman aqueducts to the Enlightenment’s Encyclopédie.” “We fail to appreciate this because today’s project is so new, because we are in the middle of it, and because unlike the water that flows on the aqueducts the product of our labors is intangible. The project is datafication.”