Text visualisation
Aims
- Examine some visualisations
- Understand the uses of visualisation and consider the advantages and disadvantages
- Critically assess particular visualisations
Some considerations
- Corpus analysis is becoming complex and multidimensional
- Corpora are heavily marked up
  - many dimensions of information
  - difficult to work with directly
  - difficult to see all the possible relations in a multidimensional space (text is linear; information is multidimensional)
(Big) data visualisation
- Some corpora are billions of words
- In the digital world, big data is common
- Some visualisation is necessary
- Use visualisation for:
  - presentation
  - query/analysis of data
  - exploration (and analysis)
How to represent data
- All visual elements should be meaningful
- E.g. no 3-D graphs to represent two dimensions of information
- Visualisation helps to reveal complex processes
Networks © Stephen Eick, Bell Labs
Edward Tufte, guru of representation of data
Simple visualisation – presentation of data
Graph
- Simple
- Easy to understand (axes are labelled)
- No distortion of the data
- Colour carries no meaning: contrasting colours are used only to distinguish the lines on the graph
Birthday visualisation
- Works well
- We are familiar with the frame (months/days)
- (What is the frame for a text?)
- Frequency is represented by intensity of colour
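The idea of encoding frequency as intensity can be sketched in a few lines. This is a minimal illustration only: the shade characters stand in for colour, the counts are invented, and the names `SHADES`, `shade_index` and `render_row` are hypothetical.

```python
# Five "shades" from lightest to darkest; a real heat map would use colour.
SHADES = " .:*#"


def shade_index(value, lo, hi, levels=len(SHADES)):
    """Scale a count linearly onto a shade index from 0 to levels - 1."""
    if hi == lo:
        return 0  # all counts equal: use the lightest shade everywhere
    return min(levels - 1, int((value - lo) * levels // (hi - lo)))


def render_row(counts):
    """Turn a list of counts into a string with one shade character per cell."""
    lo, hi = min(counts), max(counts)
    return "".join(SHADES[shade_index(c, lo, hi)] for c in counts)


# Invented counts for three days: the busiest day gets the darkest shade.
print(repr(render_row([10, 20, 30])))
```

The same scaling would apply to a text if a suitable frame (chapters, pages, speakers) took the place of months and days.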
https://www.nytimes.com/interactive/2016/07/29/us/elections/trump-clinton-pence-kaine-speeches.html
Word cloud
Visualisation – (good and bad)
Zooming in
Highlighting hapax legomena
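Hapax legomena are words that occur exactly once in a text. A minimal sketch of finding and marking them, assuming a crude regular-expression tokeniser and square brackets as the "highlight" (both are illustrative choices, not taken from the tool shown on the slide):

```python
import re
from collections import Counter

WORD = re.compile(r"[A-Za-z']+")


def hapax_legomena(text):
    """Return the set of (lower-cased) words that occur exactly once."""
    counts = Counter(w.lower() for w in WORD.findall(text))
    return {w for w, n in counts.items() if n == 1}


def highlight_hapaxes(text):
    """Wrap every hapax in square brackets, leaving other words untouched."""
    hapaxes = hapax_legomena(text)

    def mark(match):
        word = match.group(0)
        return f"[{word}]" if word.lower() in hapaxes else word

    return WORD.sub(mark, text)


print(highlight_hapaxes("the cat saw the dog and the dog saw a bird"))
```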
COCA
Part-of-speech only
Networks Email at Enron -- Jeffrey Heer
TextArc
TextArc http://www.textarc.org/
Voyant tools
Representing text
- Series of pages
- Linear block
- Circle, spiral
- Network
- Multiple layers (and links)
- Word plus connections (stats)
Corpora and text visualisation
- How can we deal with the complexity of corpora?
- Use XML structures to hide/reveal dimensions of the corpus
- Browse looking for patterns ??
- Zoom in on areas of interest
- Switch to non-visual mode for analysis
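Hiding and revealing dimensions of a marked-up corpus can be sketched as choosing which layer of the XML to display. The element name `w` and the attributes `pos` and `lemma` below are assumptions made for illustration; real corpora use their own schemas.

```python
import xml.etree.ElementTree as ET

# A made-up sentence with word form, part of speech and lemma for each token.
SAMPLE = (
    '<s>'
    '<w pos="DET" lemma="the">The</w>'
    '<w pos="NOUN" lemma="cat">cats</w>'
    '<w pos="VERB" lemma="sleep">slept</w>'
    '</s>'
)


def view(xml_text, dimension="word"):
    """Show one dimension of the markup: the word forms or a named attribute."""
    root = ET.fromstring(xml_text)
    tokens = []
    for w in root.iter("w"):
        if dimension == "word":
            tokens.append(w.text or "")
        else:
            tokens.append(w.get(dimension, "?"))  # "?" marks a missing value
    return " ".join(tokens)


print(view(SAMPLE))           # word forms only
print(view(SAMPLE, "pos"))    # part-of-speech only
print(view(SAMPLE, "lemma"))  # lemmas only
```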
Analysis metaphors
- Object-oriented -- transform yourself to show X
- Zoom and pan -- through text
Issues in visualisation
- Visualisation is a transformation of data
- How to transform in a revealing way
- How to transform without giving a false picture
- Basic problem -- how to represent a text
Text analytics
- New commercial application of text analysis
- Follows on from Google Analytics etc.
- Extracting information from unstructured documents (such as customer emails, customer complaints)
Sentiment analysis
Tolkien books http://lotrproject.com/statistics/books/sentimentanalysis
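One simple way to approach sentiment analysis is to sum word scores from a lexicon over each segment of a text, which gives a series that can then be plotted. The sketch below assumes a tiny invented lexicon (`LEXICON`) and invented segments; it makes no claim about how the linked project computes its scores.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {"good": 1, "happy": 1, "joy": 1, "bad": -1, "dark": -1, "fear": -1}


def sentiment_score(text):
    """Sum the lexicon scores of all words; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)


def sentiment_by_segment(segments):
    """Score each segment (e.g. a chapter) to obtain a plottable series."""
    return [sentiment_score(s) for s in segments]


print(sentiment_by_segment(["A good and happy day", "Dark fear, bad news"]))
```

Lexicon counting ignores negation and context ("not good" still scores +1), which is one reason to assess such visualisations critically.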
GraphColl
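A collocation network links words that occur near each other. The sketch below builds the edges of such a network from raw co-occurrence counts within a window; the counts stand in for the statistical association measures a collocation tool would normally apply, and the function name and parameters are hypothetical.

```python
import re
from collections import Counter


def collocation_edges(text, window=2, min_count=2):
    """Count word pairs co-occurring within `window` tokens of each other.

    Returns {(word_a, word_b): count} for pairs seen at least `min_count`
    times; each pair is stored in alphabetical order so it is undirected.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    pairs = Counter()
    for i, node in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != node:
                pairs[tuple(sorted((node, other)))] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_count}


edges = collocation_edges("strong tea strong tea weak tea", window=1)
for (a, b), n in sorted(edges.items()):
    print(a, "--", b, n)
```

Each returned pair is an edge, and its count could set the edge's length or thickness when the network is drawn.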