1
Do More with Digital Scholarship: Building Data Sets Using Social Media
November 8th, 2017
Matthew Davis and John Fink
2
What is a data set? What is a social media data set?
A data set is a collection of similar information, sharing a structure, that covers a fixed period of time:
Incoming student transcripts for the years
Medical records for a research trial
Records of bequests kept in a parish register or courthouse
A social media data set is a collection of records of entries on social media sites like Twitter or Facebook over a period of time.
The structure comes from the internal structures the particular social media site uses to organize its data, generally exposed to the world through an API.
3
What is an API?
Stands for Application Programming Interface.
The set of rules that software programs follow in order to communicate with each other.
There are a lot of APIs out there, and if you’re going to work with computer programs and data you will be using more than one. You probably already are without realizing it.
For our purposes, we’re going to work primarily with Web APIs.
4
What’s a Web API?
A Web API is basically a set of commands that you use, via regular HTTP methods, to interact with a web service.
Disqus
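As a rough illustration of what "a set of commands via regular HTTP methods" looks like in practice, the sketch below builds a GET request with query parameters using only Python's standard library. The endpoint `https://api.example.com/v1/posts` and the parameters `q` and `limit` are hypothetical and stand in for whatever a real service documents.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.com/v1/posts"


def build_get_request(base_url, params):
    """Build (but do not send) an HTTP GET request with query parameters."""
    query = urlencode(params)
    return Request(f"{base_url}?{query}", method="GET")


# The parameter names here are assumptions, not part of any real API.
request = build_get_request(BASE_URL, {"q": "digital scholarship", "limit": 10})
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it. The sketch stops short of that so it runs without network access.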
5
Some Caveats
Because you’re using a structure that the social media company has provided, you are limited to the information they’re willing to give you access to.
Social media companies make their money off of data. These APIs are designed for business purposes, not for academic use, and are structured accordingly.
Some APIs may not be publicly accessible or may be experimental. If you try to grab too much information from these, the company can and will shut you down.
Not all the data you retrieve will be valid! Social media companies have no incentive to filter fake accounts out of their results. The responsibility to verify the utility of the data you retrieve rests on you as the researcher. Be ready to do a lot of sifting.
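One very basic form of sifting might look like the sketch below, which drops records with missing fields and repeated identifiers. The field names `id` and `text` are assumptions for illustration, and real validation (for example, spotting fake accounts) needs far more than this.

```python
def sift(records, required=("id", "text")):
    """Keep records that have every required field and a not-yet-seen id."""
    seen = set()
    kept = []
    for record in records:
        # Skip records where a required field is missing or empty.
        if any(not record.get(field) for field in required):
            continue
        # Skip records whose id has already been kept.
        if record["id"] in seen:
            continue
        seen.add(record["id"])
        kept.append(record)
    return kept


# Made-up sample records, not real social media data.
sample = [
    {"id": "1", "text": "hello"},
    {"id": "1", "text": "hello"},  # duplicate id
    {"id": "2", "text": ""},       # empty text
    {"id": "3", "text": "world"},
]
clean = sift(sample)
print(len(clean), "of", len(sample), "records kept")
```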
6
Facebook (taken from https://towardsdatascience
To access the Facebook API outside of Facebook’s web interface, you have to have a developer’s token. This is because Facebook assumes you’re building an app to work with their service.
Go to developers.facebook.com and create an account there. You may find it takes you directly to the apps page. If this is the case, skip the next step.
Go to the “My Apps” drop-down in the top right corner and select “Add a New App”. Choose a display name and a category and then “Create App ID”.
Once you’ve created the account, go to developers.facebook.com/tools/explorer. You will see “Graph API Explorer” below “My Apps” in the top right corner.
From the “Graph API Explorer” drop-down, select your app.
Then, select “Tools and Support.” Click on “Access Token Tool” or navigate to developers.facebook.com/tools/accesstoken.
Select “Debug” corresponding to “User Token.” Go to “Extend Token Access.” This will ensure that your token does not expire every two hours.
Note that you may need to grant permissions on your application in order to get an access token.
Once you have an active token, navigate back to developers.facebook.com/tools/explorer.
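The Graph API Explorer issues ordinary HTTP requests on your behalf. As a hedged sketch of what such a request might look like outside the Explorer, the code below builds a Graph API URL for a node with an access token. The base URL `https://graph.facebook.com`, the `me` node, and the `fields` and `access_token` parameters are assumptions to check against Facebook's current documentation, and `YOUR_ACCESS_TOKEN` is a placeholder.

```python
from urllib.parse import urlencode

# Assumed base URL; confirm it (and any version prefix) in Facebook's docs.
GRAPH_BASE = "https://graph.facebook.com"


def graph_url(node, access_token, fields=None):
    """Build a Graph API URL for a node, optionally limiting returned fields."""
    params = {"access_token": access_token}
    if fields:
        params["fields"] = ",".join(fields)
    return f"{GRAPH_BASE}/{node}?{urlencode(params)}"


# "YOUR_ACCESS_TOKEN" is a placeholder for the token generated above.
url = graph_url("me", "YOUR_ACCESS_TOKEN", fields=["id", "name"])
print(url)
```

The sketch only constructs the URL. Fetching it requires a valid token and whatever permissions your app has been granted.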
8
Twitter
Twitter is an entirely different animal from Facebook in terms of what information is available and how it’s organized, but the way you gain access to collect it is much the same.
Navigate to and select “Create New App.” Note that Twitter requires you to have your phone registered with them to create developer tokens.
Fill out and submit the form.
Click on “Keys and Access Tokens” on the resulting page. These are what you need to access Twitter’s information via a tool.
9
Twarc
Twitter has its own internal tool, called Twurl, written in Ruby, but the Twarc Python tool is more powerful and has been around longer.
Follow the instructions at the linked page (basically, run twarc configure after you’ve installed it) and put your consumer and access keys where indicated.
The GitHub page provides information on how to perform searches and export information to .json files, .html, and more.
10
Extracting your data
Once you’ve generated your data in Facebook, you can copy and paste it into a plain text file. Save this file with the extension .json.
If you are using twarc, you just need to add >FILENAME.json to your command and it will save the results in a .json file.
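Once the data is saved, a short script can pull out the fields you care about. The sketch below assumes the file holds one JSON object per line (as twarc's output is generally described) and that each tweet carries `id_str`, `user.screen_name`, and either `full_text` or `text`. Treat those field names as assumptions and check them against your own file.

```python
import json


def read_tweets(lines):
    """Yield a small dict per tweet from an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        yield {
            # Field names are assumptions about the tweet structure.
            "id": tweet.get("id_str"),
            "user": tweet.get("user", {}).get("screen_name"),
            "text": tweet.get("full_text") or tweet.get("text"),
        }


# Made-up sample lines standing in for the contents of FILENAME.json.
sample_lines = [
    '{"id_str": "1", "user": {"screen_name": "alice"}, "full_text": "First tweet"}',
    "",
    '{"id_str": "2", "user": {"screen_name": "bob"}, "text": "Second tweet"}',
]
rows = list(read_tweets(sample_lines))
for row in rows:
    print(row["user"], row["text"])
```

To read a real file, an open file handle can be passed in place of the list, since it also yields one line at a time.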
12
Thank you!
Matthew Davis davism17@mcmaster.ca
John Fink jfink@mcmaster.ca