Presentation at the Society of the Query conference, Amsterdam, November 13-14, 2009
(original title: Learning from Google: software design as a methodology for cultural analysis)
Dr. Lev Manovich
Director, Software Studies Initiative, Calit2 + UCSD
Professor, Visual Arts Department
Follow our research: softwarestudies.com
Learning from software
we will compare common methods of cultural analysis in the humanities with the principles behind Google search, Google Earth, Google Analytics, and Google Trends
1| data size
cultural analysis: very small samples of cultural production
search engine: every accessible web document (and now also Twitter and Facebook)
2| coverage
cultural analysis: highly uneven coverage (some areas are covered in much higher resolution than others)
Google search technology / Google Earth: aiming for even coverage of all territory at the same level of detail (space in Google Earth, web in Google search)
3| zoomability
cultural analysis: document - creator - group - period - paradigm
Google search technology / Google Trends: a single interaction/page - search patterns of billions of people over a number of years
Google Earth: street view - Earth view
4| categorization
cultural analysis: cultural objects are placed into a small number of genres/categories
search engine: analysis of each web document to generate its unique description (using 200+ signals) (while significant research on automatic classification of web pages into genres exists, Google does not use it)
5| links
cultural criticism: analysis of a small number of selected links (influences) between a given object/person and others
search engine: systematic consideration of all (explicitly defined) links between a given web page and other pages
6| features (characteristics, attributes, dimensions)
cultural analysis: a small number of subjectively selected features, differing from text to text
search engine: a large number of features (always the same)
Examples:
Google: PageRank [considers] more than 500 million variables and 2 billion terms. Our technology analyzes the full content of a page and factors in fonts, subdivisions and the precise location of each word.
Sense Networks attributes 487,500 dimensions to every place in a city.
7| interaction
cultural analysis: theoretical work on reception, but in practice analysis of documents as experienced by a critic
search engine: analysis of documents, links, and search engine use
web analytics: analysis of user interactions with a web site
summary: software developed in the digital culture industry and the academy (as exemplified by the applications/services discussed above) often contains innovative theoretical ideas about culture embedded in its design (i.e., the steps taken by the software to calculate its results). However, the applications of such software are often less innovative than the steps themselves.
examples of standard applications:
- search: looking for particular members of a set
- classification of cultural content into a small number of genres
example of a new application which uses some of the same steps:
1) extract features from each document in a set;
2) instead of using the features to classify documents into a few classes, visualize the patterns and variability across the set
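The two steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any particular tool: the three text features (word count, mean word length, type-token ratio) and the function names `extract_features` and `describe_variability` are hypothetical choices made for the example, and the per-feature summary only stands in for a real visualization.

```python
from statistics import mean, pstdev

def extract_features(text):
    """Step 1: describe one document with the same fixed set of features."""
    words = text.lower().split()
    if not words:
        return {"word_count": 0, "mean_word_length": 0.0, "type_token_ratio": 0.0}
    return {
        "word_count": len(words),
        "mean_word_length": mean(len(w) for w in words),
        "type_token_ratio": len(set(words)) / len(words),
    }

def describe_variability(documents):
    """Step 2: summarize each feature's range and spread across the set
    instead of assigning documents to a few classes."""
    rows = [extract_features(d) for d in documents]
    summary = {}
    for name in rows[0]:
        values = [r[name] for r in rows]
        summary[name] = {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "spread": pstdev(values),
        }
    return rows, summary

# Hypothetical toy "collection" of three documents
documents = [
    "the cat sat on the mat",
    "software studies examines software as culture",
    "a a a a",
]
rows, summary = describe_variability(documents)
for name, stats in summary.items():
    print(name, stats)
```

Every document is measured on the same features, so the output is a table that can be plotted or sorted in any direction, rather than a list of genre labels.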
Map of Science: visualization of scientific paradigms. Can we create similar maps of cultural fields which show hundreds or thousands of clusters, instead of dividing everything into a few genres?
let's take selected principles from
search engines (and data analysis in general)
+ web analytics and Google Trends (interactive visualization of patterns)
+ Google Earth (continuous zoom and navigation)
+ manyeyes (visualization, sharing of data and analysis)
and embed them in new software tools for research, teaching, and exhibition of culture.
we can call research which uses these principles and software tools cultural analytics
goals of cultural analytics:
- better represent the complexity, diversity, variability, and uniqueness of cultural processes and artifacts
- develop techniques to describe the dimensions of cultural artifacts and cultural processes which until now have received little or no attention (such as gradual temporal change)
- create much more inclusive cultural histories and analyses, ideally taking into account all available cultural objects created in a particular cultural area and time period (art history without names)
- democratize cultural research by creating open-source tools for cultural analysis and visualization
- create interfaces for exploration of cultural data which operate across multiple scales, from details of the structure of a particular individual cultural artifact/process to massive cultural data sets/flows
cultural analytics - typical steps:
1) description (i.e., culture into data):
a) manual: annotation, tagging
b) automatic: software analysis of media; capturing user activity
our focus: easy-to-use techniques for automatic description of visual and interactive media
2) optional: statistical data analysis
3) data visualization (reduction, summarization) and data mapping (expansion, outlining, layering)
our focus: new visualization + mapping techniques appropriate for interactive exploration of large sets of visual objects
4) interpretation (humanities), or explanation (science), or correlation (social science)
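Step 1b (automatic description) followed by the simplest form of step 3 can be sketched as follows. This is an illustration under assumptions, not the project's actual software: images are represented as nested lists of grayscale values (0-255), and `describe_image`, the two features (mean brightness, and contrast measured as standard deviation), and the toy collection are all hypothetical.

```python
from statistics import mean, pstdev

def describe_image(pixels):
    """Step 1b: turn a grayscale image (rows of 0-255 values) into numbers."""
    flat = [value for row in pixels for value in row]
    return {"brightness": mean(flat), "contrast": pstdev(flat)}

# Hypothetical toy collection: title -> tiny 2x2 grayscale image
collection = {
    "dark_flat": [[10, 10], [10, 10]],
    "bright_flat": [[240, 240], [240, 240]],
    "high_contrast": [[0, 255], [255, 0]],
}

# Description: every object gets the same measured features
descriptions = {title: describe_image(img) for title, img in collection.items()}

# Step 3, simplest form: order the whole set by one measured feature
by_brightness = sorted(descriptions, key=lambda t: descriptions[t]["brightness"])
print(by_brightness)
```

A real pipeline would read image files and compute many more features, but the shape is the same: describe every object automatically, then use the numbers to arrange the set for viewing.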
visualization of cultural data - visualization types:
1) visualization without doing additional annotation / automatic analysis
a) display all objects in a set together, organized by existing metadata (for instance, dates, artist names, etc.)
b) sample and re-order (for instance: montage, slice)
2) visualization after doing additional data analysis/annotation
c) visualization of newly generated metadata (graph)
d) display objects organized by metadata (image graph)
d1) 2D sorted view
d2) 2D image graph - using a single feature for each dimension
d3) 2D image graph - using a combination of features (PCA, etc.)
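Type d3 above combines several features into two plot coordinates. The sketch below does this with a small PCA written in plain Python (power iteration plus deflation) so it runs without extra libraries; in practice a numerical library would normally be used. The feature values, the image names, and the function names are hypothetical, and the result is only the (x, y) position at which each image would be drawn.

```python
import math

def center(rows):
    """Subtract each feature's mean so the objects are centered at the origin."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    return [[r[j] - means[j] for j in range(dims)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dims = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
            for i in range(dims)]

def dominant_direction(matrix, iterations=200):
    """Power iteration: repeatedly apply the matrix and renormalize."""
    dims = len(matrix)
    vec = [float(j + 1) for j in range(dims)]  # arbitrary non-zero start
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:  # no variance left in this matrix
            break
        vec = [x / norm for x in nxt]
    return vec

def project_2d(features):
    """Map each object's feature vector to (x, y) on the first two principal axes."""
    names = list(features)
    centered = center([features[n] for n in names])
    cov = covariance(centered)
    dims = len(cov)
    first = dominant_direction(cov)
    strength = sum(first[i] * cov[i][j] * first[j]
                   for i in range(dims) for j in range(dims))
    # Deflation: remove the variance along the first axis, then find the next axis
    deflated = [[cov[i][j] - strength * first[i] * first[j] for j in range(dims)]
                for i in range(dims)]
    second = dominant_direction(deflated)

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    return {n: (dot(row, first), dot(row, second)) for n, row in zip(names, centered)}

# Hypothetical features per image: (brightness, saturation, contrast)
features = {
    "image_a": [10, 20, 5],
    "image_b": [20, 41, 4],
    "image_c": [30, 59, 6],
    "image_d": [40, 80, 5],
}
coords = project_2d(features)
for name, (x, y) in coords.items():
    print(name, round(x, 2), round(y, 2))
```

For type d2, no projection is needed: two chosen features are used directly as x and y. The sign of each axis produced by PCA is arbitrary, so a plot may appear mirrored without changing the pattern.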
theoretical issues:
- Can analysis of a cultural artifact/user experience in terms of separate features still capture the overall gestalt?
- Culture does not equal cultural artifacts. How can we automatically analyze context in a meaningful way?
- If cultural process/activity is more important than the outputs being produced, how do we conceptualize and visualize this?
- Statistical paradigm (using a sample) vs. data mining paradigm (analyzing the complete population).
- Modernity/normal distribution vs. Software Society/power law.
- Pattern as a new epistemological object. From meaning to pattern: the humanities have been focused on interpreting the meanings of a cultural artifact/process. Today we can easily uncover the meanings of each cultural artifact, but we don't know the larger patterns they form. The new scale of culture points toward a pattern as a new unit of analysis (because we cannot afford to consider the meanings of every single artifact).
- From a small number of genres to a multi-dimensional space of many features where we can look for clusters and patterns. This may be the only way to analyze contemporary design and media created with software. (See next slide.)