Scraping Facebook via API in R


1 Scraping Facebook via API in R
Shashank Hebbar, Ph.D. Student, Analytics and Data Science, Kennesaw State University

2 What is an API? API is short for Application Programming Interface. Basically, it is a way of accessing the functionality of a program from inside another program. Instead of performing an action through an interface made for humans, such as a point-and-click GUI, an API lets a program perform that action automatically. Today, "API" usually refers to web APIs built on the World Wide Web's HTTP protocol, the same protocol that web servers and browsers use to exchange data.

3 API Identification/Authorization
API key (aka token). A key identifies the user and lets the provider track and control how the API is being used (for example, to guard against malicious use). A key is often obtained by supplying basic information (such as your name) to the organization, which in return gives you a multi-digit key. OAuth is an authorization framework that provides credentials as proof of access to certain information. Many APIs are open to the public and require only an API key; however, some APIs require authorization to access account data (think personal Facebook and Twitter accounts). R has an extensive list of packages in which API data feeds have been hooked into R. You can find a slew of them scattered throughout the …
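To make the API key idea concrete, here is a minimal base-R sketch of attaching a key to a request as a query parameter. The endpoint `https://api.example.com/v1/data`, the parameter name `api_key`, and the helper `build_query` are all hypothetical; real APIs document their own parameter names, and some expect the key in a request header instead.

```r
# Hypothetical endpoint and key, used only for illustration.
base_url <- "https://api.example.com/v1/data"
api_key  <- "YOUR_API_KEY"

# Build a query string from a named list, URL-encoding each value.
build_query <- function(params) {
  encoded <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  paste(names(encoded), encoded, sep = "=", collapse = "&")
}

# The key travels with every request so the provider can identify the caller.
request_url <- paste0(
  base_url, "?",
  build_query(list(q = "climate data", api_key = api_key))
)
print(request_url)
```

OAuth-protected APIs replace the static key with a token obtained through a sign-in handshake, which is what the Facebook steps that follow rely on.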

4 Facebook API: Register a new application
From the Facebook Developer site, click Apps at the top of the page to go to the application dashboard. Click the create-new-app button near the top. Once you complete the verification process, your application is created. Note down the App ID and App Secret.

5 Create an OAuth token for the Facebook R session
fbOAuth creates a long-lived OAuth access token that enables R to make authenticated calls to the Facebook API.
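A minimal sketch of the token step, assuming the Rfacebook package is installed and an app has been registered. The wrapper name `create_fb_token` and the placeholder credentials are illustrative only; `fbOAuth` itself needs an interactive session because it completes the handshake in a browser.

```r
# Sketch only: requires the Rfacebook package and a registered Facebook app.
# The wrapper name is hypothetical; fbOAuth() is the package function.
create_fb_token <- function(app_id, app_secret) {
  # fbOAuth() opens a browser window to complete the OAuth handshake,
  # so call this from an interactive R session.
  Rfacebook::fbOAuth(app_id = app_id, app_secret = app_secret)
}

# Usage (interactive; the strings below are placeholders, not real credentials):
# fb_oauth <- create_fb_token("YOUR_APP_ID", "YOUR_APP_SECRET")
# save(fb_oauth, file = "fb_oauth")   # reuse in later sessions via load("fb_oauth")
```

Saving the token to disk avoids repeating the browser handshake in every session.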

6 Functions from Rfacebook Package
getLikes(user, n = n, token): Extracts the list of pages liked by a Facebook user, with page IDs. Arguments: user: user name or ID; n: number of liked pages to return for the user.
searchPages(string, token, n = n): Searches for pages whose names contain a string/keyword. Arguments: string: any string; n: number of pages to return.
getPage(page, token, n = n): Extracts the list of posts from a public Facebook page.
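The three calls might be combined as in the sketch below. It assumes a valid token from fbOAuth; the wrapper `fetch_examples`, the search string, and the values of `n` are arbitrary choices for illustration. Whether each call still succeeds depends on the Graph API version and on the permissions granted to the app.

```r
# Sketch only: needs the Rfacebook package, network access, and a valid token.
# fetch_examples() is a hypothetical helper, not part of Rfacebook.
fetch_examples <- function(token) {
  list(
    # Pages liked by the authenticated user ("me").
    likes = Rfacebook::getLikes(user = "me", n = 50, token = token),
    # Pages whose names match a keyword.
    pages = Rfacebook::searchPages(string = "data science", token = token, n = 20),
    # Most recent posts from a public page.
    posts = Rfacebook::getPage(page = "humansofnewyork", token = token, n = 100)
  )
}

# Usage (interactive, after creating fb_oauth):
# results <- fetch_examples(fb_oauth)
# str(results$posts)
```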

7 Analyzing data from a Facebook page
For example, assume that we're interested in learning how the Facebook page Humans of New York has become popular, and what type of audience it has. The first step would be to retrieve a data frame with information about all its posts. Using this data frame, it is relatively straightforward to visualize how the popularity of Humans of New York has grown exponentially over time.
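One way to go from a posts data frame to a popularity trend is to aggregate likes by month, sketched below in base R. The four rows are made-up stand-ins, not real Humans of New York figures, and the column names `created_time` and `likes_count` are assumed to match what getPage returns.

```r
# Toy stand-in for the data frame returned by getPage().
# The rows are invented for illustration; real data would come from the API.
posts <- data.frame(
  created_time = c("2013-01-05T14:00:00+0000", "2013-01-20T09:30:00+0000",
                   "2013-02-11T18:45:00+0000", "2013-02-25T12:15:00+0000"),
  likes_count  = c(100, 300, 1000, 3000),
  stringsAsFactors = FALSE
)

# Parse Facebook-style timestamps into POSIXct and derive a year-month label.
posts$datetime <- as.POSIXct(posts$created_time,
                             format = "%Y-%m-%dT%H:%M:%S+0000", tz = "GMT")
posts$month <- format(posts$datetime, "%Y-%m")

# Median likes per post, by month.
monthly <- aggregate(likes_count ~ month, data = posts, FUN = median)
print(monthly)

# A log-scaled y axis turns exponential growth into a roughly straight line.
if (interactive()) {
  plot(seq_along(monthly$month), monthly$likes_count, type = "b", log = "y",
       xaxt = "n", xlab = "Month", ylab = "Median likes per post")
  axis(1, at = seq_along(monthly$month), labels = monthly$month)
}
```

The median is used rather than the mean so that a single viral post does not dominate a month.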

8 Other API Packages in R
Some of the popular packages are:
blsAPI for pulling U.S. Bureau of Labor Statistics data
rnoaa for pulling NOAA climate data
rtimes for pulling data from multiple APIs offered by the New York Times
The rnoaa package allows users to request climate data from multiple data sets through the National Climatic Data Center API. Unlike blsAPI, rnoaa requires you to have an API key. To request a key, go to the key request page and provide your email address; a key will be emailed to you immediately.

9 What if there is no package for that API?!
Although numerous R API packages are available and cover a wide range of data, you may eventually run into a situation where you want to leverage an organization's API but no R package exists for it. This is where httr comes in. httr was developed by Hadley Wickham to make it easy to work with web APIs. One of its most popular functions is GET(). We use GET() to access an API, pass it some request parameters, and receive an output. httr is designed to map closely to the underlying HTTP protocol. There are two important parts to HTTP: the request, the data sent to the server, and the response, the data sent back from the server.
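The request/response pair can be sketched with httr as follows. The helper `fetch_api` and the example URL are hypothetical, and the call needs the httr package and network access, so it is wrapped in a function rather than run directly.

```r
# Sketch only: requires the httr package and network access.
# fetch_api() is a hypothetical helper for illustration.
fetch_api <- function(url, query = list()) {
  # The request: URL plus query parameters sent to the server.
  resp <- httr::GET(url, query = query)

  # Turn HTTP errors (4xx/5xx) into R errors instead of failing silently.
  httr::stop_for_status(resp)

  # The response: status code, content type, and body sent back by the server.
  list(
    status = httr::status_code(resp),
    type   = httr::http_type(resp),
    body   = httr::content(resp, as = "text", encoding = "UTF-8")
  )
}

# Usage (placeholder endpoint and key, not a real service):
# result <- fetch_api("https://api.example.com/v1/data",
#                     query = list(q = "climate", api_key = "YOUR_API_KEY"))
# result$status
```

Returning the body as text keeps the sketch format-agnostic; a JSON response could then be parsed with a JSON package of your choice.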

