Using Voyant to Explore Text Data

Using Voyant to Explore Text Data
Connected Health Conference, December 14, 2016

Contents
- Overview
- Data
- Default Dashboard
- Review of Voyant “Tools”
  - Word Cloud
  - Reader
  - Bubble Lines
  - Links
- Break out into groups and explore tools

Overview: discuss the increasing use of unstructured data (words on websites, social media, text applications) and the need to understand that data quickly.

Overview
- Rise in unstructured text data (e.g., websites, blogs/forums, social media, SMS applications)
- Challenge: understanding what is in all that data without visualization tools
- Voyant is a browser-based tool that helps you explore themes and patterns in your data, pinpointing areas for further exploration and analysis

Need input from Jill and Nancy for good global health practitioner applications

Data
- A sample of ~4,700 tweets that mention the word “zika”, from Nov 2015 to Nov 2016
- The data is ordered chronologically, so the beginning of the file corresponds to the earliest dates
- Voyant also accepts other data formats, such as txt, htm, html, xml, doc, docx, rtf, and pdf
- What questions might you have about what people are saying about Zika?

Brainstorm questions:
- Which Zika prevention measures are people more familiar with? Mosquito avoidance? Prevention of sexual transmission?
- Did people’s understanding of prevention measures change over time (especially as governmental agencies implemented awareness campaigns)?

Retrieve Dataset and Load into Tool
- Dataset: 4,700+ tweets that mention “zika” (11/2015 – 11/2016)
- Copy/paste text or a URL into the box (text will be retrieved from the specified URLs), or click upload and find your file

Default Dashboard
5 Panels: Word Cloud, Text (Reader), Trends, Summary, Contexts
- Word Cloud is a visual representation of the word counts for each word in your data. The larger the displayed word, the more frequently it occurs in your data.
- Reader shows the actual text of the document you uploaded.
- Trends shows the distribution of the top 5 words over the course of the document. The horizontal axis shows a default of 10 equal-sized segments of your data (here, the first segment corresponds to your earliest data and the 10th segment should contain your most recent data, from Nov 2016).
- Summary provides the frequency of the top 5 most common words in the data, as well as the number of documents you uploaded, the total number of words, and the number of unique words.
- Contexts shows each occurrence of a keyword with a bit of surrounding text (the context). It can be useful for studying more closely how terms are used in different contexts.
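To make the Summary and Trends panels concrete, here is a minimal Python sketch of the kind of counting they describe. It is not Voyant's implementation: the tokenizer, the function names (tokenize, summarize, trend), and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Simplified, assumed tokenizer; Voyant's own rules may differ.
    """
    return re.findall(r"[a-z0-9]+(?:[.'][a-z0-9]+)*", text.lower())

def summarize(tokens, top_n=5):
    """Return total words, unique words, and the top_n most common terms."""
    counts = Counter(tokens)
    return {
        "total_words": len(tokens),
        "unique_words": len(counts),
        "top_terms": counts.most_common(top_n),
    }

def trend(tokens, term, segments=10):
    """Count how often `term` occurs in each of `segments` roughly equal slices."""
    n = len(tokens)
    counts = [0] * segments
    for i, tok in enumerate(tokens):
        if tok == term:
            # Map the token position to its segment index (0 = earliest data).
            counts[i * segments // n] += 1
    return counts

# Hypothetical sample text standing in for the tweet file.
sample = "zika virus spreads zika mosquito bites pregnant women zika mosquito"
tokens = tokenize(sample)
summary = summarize(tokens, top_n=2)
zika_trend = trend(tokens, "zika", segments=5)
```

Because the file is ordered by date, the first entries of the trend list correspond to the earliest tweets and the last entries to the most recent ones.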

Word Cloud: Terms to Display
- Click and slide the icon in the lower left to choose how many terms to display
- Hover to the left of the “?” in the upper right corner and click Options
- Next to “Stopwords”, click “Edit List”

Play with this word cloud:
- Adjust the number of terms you want to display
- Click on a word and see how the panels to the right of it (the text panel and the trends panel) update

Word Cloud: Edit Stopwords
- Stopwords are words that will be filtered out of the analysis. Typically, these words do not provide additional insight when exploring the data.
- Stopwords to add:
  t.co
  https
  zika
  rt
  virus

A good starting point is to see which words are used most frequently in your data. In this case, the top 5 most used words appear in the Summary box in the lower left of the default dashboard: t.co, https, zika, rt, virus. Because none of those terms provides insight for us, we can remove them (along with a few others). Remember to type only one word per line (hit Enter after each word). When you update the list of stopwords, the change is applied throughout your dashboard (in all panels).

Term Limits: 105
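The effect of the stopword list can be sketched in a few lines of Python. This is an illustration only, not how Voyant works internally: the helper names (parse_stopwords, top_terms), the tokenizer, and the sample tweet are assumptions. The stopword block mirrors the one-word-per-line format described above.

```python
import re
from collections import Counter

# One stopword per line, as typed into the "Edit List" box.
EXTRA_STOPWORDS = """
t.co
https
zika
rt
virus
"""

def tokenize(text):
    """Lowercase and split into tokens, keeping forms such as 't.co' whole.

    Simplified, assumed tokenizer; Voyant's own rules may differ.
    """
    return re.findall(r"[a-z0-9]+(?:[.'][a-z0-9]+)*", text.lower())

def parse_stopwords(block):
    """Turn a one-word-per-line block into a set of lowercase stopwords."""
    return {line.strip().lower() for line in block.splitlines() if line.strip()}

def top_terms(text, stopwords, n=5):
    """Return the n most frequent terms after removing the stopwords."""
    counts = Counter(tok for tok in tokenize(text) if tok not in stopwords)
    return counts.most_common(n)

# Hypothetical tweet text, for illustration only.
tweets = "RT Zika virus cases rise https://t.co/abc1 pregnant women warned about Zika cases"
stopwords = parse_stopwords(EXTRA_STOPWORDS)
filtered_top = top_terms(tweets, stopwords, n=1)
```

With the five stopwords removed, the most frequent remaining term in the sample is "cases" rather than "zika", which is the kind of shift you should see in the word cloud after editing the list.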

Reader
- The Reader tool provides a view of the text from the data
- Hover over a word to see the frequency of that word
- Click on a word to reveal its distribution throughout your data
- Use the search box to find specific words

The example shows the cursor hovering over three words in the Reader panel: babies (83), symptoms (22), and pregnant (251). Now click on “symptoms” and watch a line graph populate the grey ribbon below the panel. The line represents the frequency of the word “symptoms” from the start to the end of the dataset (again, the leftmost part corresponds to the earlier dates and the rightmost part to the later dates). You can use this feature when looking for specific themes in your data.

Bubble Lines
- How can bubble lines help us? They let us visualize not only how common a word is, but also where it occurs in the document and how it compares to other words
- Here we look at: mosquit* repel* pregnan* sex*
- The * is a wildcard that matches any characters after the stem, so each term captures plurals and other forms of the same word
- After entering the words, click the box next to Separate Lines for Terms
- What do you observe?

Bubble Lines displays the frequency and repetition of a word’s use in a dataset. Each line is broken up into equal parts representing the beginning, middle, and end of a document or dataset. Larger bubbles indicate higher frequency. The left side of the line corresponds to the earliest tweets, starting in Nov 2015, and the rightmost end represents the most recent data (Nov 2016).
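The wildcard terms and per-segment counts behind a bubble line can be approximated with a short Python sketch. This is an assumption-laden illustration, not Voyant's code: the function name bubble_counts, the tokenizer, and the sample sentence are hypothetical, and only a trailing * is handled.

```python
import re

def tokenize(text):
    """Lowercase and split into word tokens (simplified, assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+(?:[.'][a-z0-9]+)*", text.lower())

def bubble_counts(tokens, pattern, segments=10):
    """Count matches of `pattern` in each equal-sized segment of the tokens.

    A trailing '*' matches any ending, so 'mosquit*' matches both
    'mosquito' and 'mosquitoes'. Other wildcard forms are not handled.
    """
    if pattern.endswith("*"):
        stem = pattern[:-1]

        def matches(tok):
            return tok.startswith(stem)
    else:
        def matches(tok):
            return tok == pattern

    n = len(tokens)
    counts = [0] * segments
    for i, tok in enumerate(tokens):
        if matches(tok):
            # Position in the file decides the segment (0 = earliest tweets).
            counts[i * segments // n] += 1
    return counts

# Hypothetical text, for illustration only.
text = ("mosquito repellent helps pregnant women avoid mosquitoes "
        "pregnancy risk sexual transmission")
tokens = tokenize(text)
lines = {term: bubble_counts(tokens, term, segments=2)
         for term in ["mosquit*", "repel*", "pregnan*", "sex*"]}
```

Each list in `lines` plays the role of one bubble line: a larger number in a segment would be drawn as a larger bubble at that point along the line.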

Links
Links shows a network graph of higher-frequency terms that appear in proximity. Keywords are shown in green and collocates (words in proximity) are shown in orange. Features include:
- hovering over keywords shows their frequency in the corpus
- hovering over collocates shows their frequency in proximity (not their total frequency)
- double-clicking on any word fetches more results
- a search box for queries (hover over the magnifying icon for help with the syntax)

Let’s hover over some of the frequently occurring words, such as cases (329) and health (???). You’ll notice that words that occur close to cases and health are shown with the red web. Now let’s increase the context so that we include more words that occur close to the words we’re interested in.
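The idea of collocates and of widening the context can be sketched in Python as follows. This is a simplified illustration, not Voyant's algorithm: the function name collocates, the `context` parameter (number of words on each side of the keyword), and the sample sentence are assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into word tokens (simplified, assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+(?:[.'][a-z0-9]+)*", text.lower())

def collocates(tokens, keyword, context=2):
    """Count words that appear within `context` tokens of each `keyword` hit.

    The counts are frequencies in proximity to the keyword, not the
    words' total frequencies in the corpus.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != keyword:
            continue
        left = tokens[max(0, i - context):i]
        right = tokens[i + 1:i + 1 + context]
        counts.update(left + right)
    return counts

# Hypothetical text, for illustration only.
text = "new cases reported health officials confirm cases rising health cases"
tokens = tokenize(text)
narrow = collocates(tokens, "cases", context=1)
wide = collocates(tokens, "cases", context=2)
```

In this sample, "health" is counted once near "cases" with a context of 1 and three times with a context of 2, which is why increasing the context in Links brings more collocates into the graph.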

Resources
- Voyant Documentation: http://docs.voyant-tools.org/
- Description of Tools: http://docs.voyant-tools.org/tools/