Published by Johnathan Pitts. Modified over 8 years ago.
1
Text Mining for Music Research: Using Word Frequency to Analyze Content
Janelle Varin, The New School
Music Library Association Conference, Cincinnati, OH, March 5, 2016
2
To Start: Internet Archive and Voyant
Locate a work on Internet Archive or another digital source. For my research comparing orchestration treatises by different authors and from different years, I used “A Treatise on Modern Instrumentation and Orchestration” by Hector Berlioz, “Principles of Orchestration” by Nikolay Rimsky-Korsakov, and “Orchestration” by Cecil Forsyth.
Download the work as a PDF.
Go to Voyant-tools.org and copy and paste your text into the window, or upload the file using the link on the lower left of the center window.
4
Description of Method
Eliminate stop words: click the Options button (the first one in the upper right of the Cirrus window), choose English, and check the box to Apply Stop Words Globally.
Open the lower two windows by clicking on their titles.
To see the windows on the right of the screen, choose a word from the Words in the Entire Corpus list. Those tools show the distribution of that word throughout the entire work.
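For readers who want to see what these Voyant steps compute, here is a minimal Python sketch of word counting with stop words removed, plus a rough per-segment distribution of one word. It is not Voyant's implementation: the tiny `STOP_WORDS` set, the function names, and the equal-slice segmentation are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (an assumption for this sketch);
# a real English stop word list is much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are",
              "it", "for", "on", "with", "by"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count every token that is not a stop word."""
    return Counter(tok for tok in tokenize(text) if tok not in stop_words)


def distribution(text, word, segments=10):
    """Count occurrences of `word` in each of `segments` equal slices of the text."""
    tokens = tokenize(text)
    counts = [0] * segments
    for i, tok in enumerate(tokens):
        if tok == word:
            # Map the token position onto a segment index from 0 to segments - 1.
            counts[i * segments // len(tokens)] += 1
    return counts


if __name__ == "__main__":
    sample = "The violins and the horns play. The horns answer the violins in the coda."
    print(word_frequencies(sample).most_common(3))
    print(distribution(sample, "horns", segments=2))
```

The per-segment counts are only a rough stand-in for the distribution graphs Voyant draws when a word is selected.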
6
Description of Method, cont’d
Export the top 50 words from your text: click the second button (it looks like a floppy disk) in the Words in the Entire Corpus window, select tabular data as plain text, and copy and paste the result into a spreadsheet.
Remove all but the first two columns, then sort the first column alphabetically so that plurals and other words with the same root sit next to each other. Combine them into one entry, and be sure to add their word counts together as well. This is sometimes referred to as stemming.
Combining entries leaves you with fewer than 50 words, so go back to Voyant to get more words and repeat the steps above until you have 50 unique words.
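The spreadsheet clean-up above can also be sketched in Python. This is only an illustration under stated assumptions: the export is assumed to be tab-separated with the term in the first column and its count in the second, and `merge_plurals` uses a naive rule (drop a trailing "s" or "es" when the shorter form is also in the list) rather than a real stemmer. The function names and sample data are hypothetical.

```python
from collections import Counter


def parse_export(tsv_text):
    """Read term/count pairs from tab-separated text, skipping any header row."""
    counts = Counter()
    for line in tsv_text.strip().splitlines():
        fields = line.split("\t")
        # Keep only rows whose second column is a whole number.
        if len(fields) >= 2 and fields[1].strip().isdigit():
            counts[fields[0].strip().lower()] += int(fields[1])
    return counts


def merge_plurals(counts):
    """Fold a plural into its singular when both forms appear in the list."""
    merged = Counter()
    for word, n in counts.items():
        base = word
        for suffix in ("s", "es"):
            candidate = word[: -len(suffix)]
            if word.endswith(suffix) and candidate in counts:
                base = candidate
                break
        merged[base] += n
    return merged


if __name__ == "__main__":
    # Invented sample rows, not data from the treatises.
    export = "Term\tCount\nhorn\t5\nhorns\t3\nbass\t2\nbasses\t1\nstrings\t4\n"
    merged = merge_plurals(parse_export(export))
    for word, n in merged.most_common(50):
        print(word, n)
```

A rule this simple misses irregular forms and can merge unrelated words, so the manual check of the alphabetized list is still worth doing.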
7
Description of Method, cont’d
To compare multiple works, alphabetize the lists and place them side by side to look for words used in only one or two of the texts.
Create a chart to aid in comparison. Since I was comparing orchestration treatises, mine included the top 10 words, the unique words, and the instruments mentioned in the top 50 words of each treatise.
Your data may require more cleaning up than what I have described here.
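The side-by-side comparison can be approximated with a few lines of Python. This is a hedged sketch, not the method used for the slides: the function names are hypothetical, and the word lists under "Text A", "Text B", and "Text C" are invented placeholders rather than results from the treatises.

```python
def word_usage(word_lists):
    """Map each word to the sorted titles of the texts whose list contains it."""
    usage = {}
    for title, words in word_lists.items():
        for word in set(words):
            usage.setdefault(word, []).append(title)
    # Alphabetize the words, as in the side-by-side spreadsheet.
    return {word: sorted(titles) for word, titles in sorted(usage.items())}


def used_in_at_most(usage, n):
    """Keep only the words that appear in n or fewer texts."""
    return {word: titles for word, titles in usage.items() if len(titles) <= n}


if __name__ == "__main__":
    # Invented placeholder lists, for illustration only.
    word_lists = {
        "Text A": ["horn", "string", "tone", "voice"],
        "Text B": ["horn", "string", "chord"],
        "Text C": ["horn", "tone", "player"],
    }
    usage = word_usage(word_lists)
    for word, titles in used_in_at_most(usage, 2).items():
        print(word, titles)
```

Words that appear in every list drop out of the result, which leaves the terms that distinguish one text from another.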
8
Word frequency list, after stemming
Alphabetical comparison list
9
Summary chart
10
Take Away
Additional tools on Voyant for further research include Word Trends, Keywords in Context, Words in Documents, and the Corpus Reader (darker blue = mentioned more).
Related research:
Topic modeling identifies patterns and groups words into topics. Tools: MALLET, VUE, and the Stanford Topic Modeling Toolbox (TMT).
Sentiment analysis determines the mood of a text based on the words used (Twitter and other social media). Tools: SenticNet, the Python Natural Language Toolkit (NLTK), and GATE.
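As a rough idea of what a keywords-in-context view shows, the sketch below lists each occurrence of a word with a few words on either side. It is an illustration only and does not reproduce Voyant's tool: the function name, the `window` parameter, and the sample sentence are assumptions.

```python
import re


def keywords_in_context(text, keyword, window=3):
    """Return (left context, match, right context) for each occurrence of keyword."""
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


if __name__ == "__main__":
    sample = "The horns answer the violins in the coda."
    for left, match, right in keywords_in_context(sample, "violins", window=2):
        print(f"{left} [{match}] {right}")
```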