
1 Mining Twitter 1.9, 1.10 1131036001 김종명

2 1.9 Making Robust Twitter Requests
Problem
– You want to write a long-running script that harvests large amounts of data, such as the friend and follower ids for a very popular Twitterer; however, the Twitter API is inherently unreliable and imposes rate limits that require you to always expect the unexpected.
Solution
– Write an abstraction for making Twitter requests that accounts for rate limiting and other types of HTTP errors so that you can focus on the problem at hand and not worry about HTTP errors or rate limits, which are just a very specific kind of HTTP error.
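The abstraction described in the Solution might look roughly like the sketch below. It is only an illustration under stated assumptions: `HTTPError`, `RETRYABLE`, and `make_robust_request` are hypothetical names, not part of any Twitter client library, and the choice of which statuses to retry is a guess rather than official guidance.

```python
import time


class HTTPError(Exception):
    """Hypothetical stand-in for a client library's HTTP error type."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


# Statuses treated as transient in this sketch (an assumption).
RETRYABLE = {429, 500, 502, 503, 504}


def make_robust_request(request, *args, max_retries=5, base_wait=2.0,
                        sleep=time.sleep, **kwargs):
    """Call request(*args, **kwargs), retrying transient HTTP errors.

    Waits base_wait, 2*base_wait, 4*base_wait, ... seconds between attempts.
    Non-retryable errors, and the last failure once the retry budget is
    spent, are re-raised to the caller.
    """
    for attempt in range(max_retries + 1):
        try:
            return request(*args, **kwargs)
        except HTTPError as err:
            if err.status not in RETRYABLE or attempt == max_retries:
                raise
            sleep(base_wait * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the wrapper testable without real waiting; a harvesting script would simply wrap each API call in `make_robust_request(...)`.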

3 Error Codes & Responses
Code  Text                   Description
200   OK                     Success!
304   Not Modified           There was no new data to return.
400   Bad Request            The request was invalid. An accompanying error message will explain why. This is the status code that will be returned during version 1.0 rate limiting. In API v1.1, a request without authentication is considered invalid and you will get this response.
401   Unauthorized           Authentication credentials were missing or incorrect.
403   Forbidden              The request is understood, but it has been refused or access is not allowed. An accompanying error message will explain why. This code is used when requests are being denied due to update limits.
404   Not Found              The URI requested is invalid or the resource requested, such as a user, does not exist. Also returned when the requested format is not supported by the requested method.
406   Not Acceptable         Returned by the Search API when an invalid format is specified in the request.
410   Gone                   This resource is gone. Used to indicate that an API endpoint has been turned off. For example: "The Twitter REST API v1 will soon stop functioning. Please migrate to API v1.1."
420   Enhance Your Calm      Returned by the version 1 Search and Trends APIs when you are being rate limited.
422   Unprocessable Entity   Returned when an image uploaded to POST account/update_profile_banner is unable to be processed.
429   Too Many Requests      Returned in API v1.1 when a request cannot be served due to the application's rate limit having been exhausted for the resource. See Rate Limiting in API v1.1.
500   Internal Server Error  Something is broken. Please post to the group so the Twitter team can investigate.
502   Bad Gateway            Twitter is down or being upgraded.
503   Service Unavailable    The Twitter servers are up, but overloaded with requests. Try again later.
504   Gateway timeout        The Twitter servers are up, but the request couldn't be serviced due to some failure within our stack. Try again later.

4 Error Messages
{"errors":[{"message":"Sorry, that page does not exist","code":34}]}
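An error body of this shape can be unpacked with the standard `json` module. The helper below is a small sketch; `parse_twitter_errors` is a hypothetical name, and it assumes the body follows the `{"errors": [{"message": ..., "code": ...}]}` layout shown on this slide.

```python
import json


def parse_twitter_errors(body):
    """Return a list of (code, message) pairs from an error response body.

    Returns an empty list when the body is not JSON or has no "errors" list.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        return []
    if not isinstance(payload, dict):
        return []
    errors = payload.get("errors", [])
    if not isinstance(errors, list):
        return []
    return [(e.get("code"), e.get("message"))
            for e in errors if isinstance(e, dict)]
```

Returning pairs rather than raising lets the caller decide whether a given code (for example 88, rate limit exceeded) should trigger a wait or an abort.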

5 Error Codes
Code  Text                             Description
32    Could not authenticate you       Your call could not be completed as dialed.
34    Sorry, that page does not exist  Corresponds with an HTTP 404 - the specified resource was not found.
88    Rate limit exceeded              The request limit for this resource has been reached for the current rate limit window.
89    Invalid or expired token         The access token used in the request is incorrect or has expired. Used in API v1.1.
130   Over capacity                    Corresponds with an HTTP 503 - Twitter is temporarily over capacity.
131   Internal error                   Corresponds with an HTTP 500 - an unknown internal error occurred.
135   Could not authenticate you       Corresponds with an HTTP 401 - it means that your oauth_timestamp is either ahead of or behind our acceptable range.
215   Bad authentication data          Typically sent with 1.1 responses with HTTP code 400. The method requires authentication but it was not presented or was wholly invalid.

6 Normal execution

7 Page does not exist: HTTP 404, error code 34

8 Rate limit reached: HTTP 429, error code 88
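When a rate limit is hit, a script usually sleeps until the current window ends. The sketch below assumes the response exposes the v1.1 `x-rate-limit-reset` header (an epoch timestamp in seconds) as a dict-like `headers` object with lowercase keys; `seconds_until_reset` and the 15-minute fallback are illustrative choices, not part of any library.

```python
import time


def seconds_until_reset(headers, now=None, default=900.0):
    """Return how long to sleep before the rate limit window reopens.

    Falls back to `default` (15 minutes, one v1.1 window) when the header
    is missing or unreadable.
    """
    now = time.time() if now is None else now
    try:
        reset_at = float(headers["x-rate-limit-reset"])
    except (KeyError, TypeError, ValueError):
        return default
    # Never return a negative wait if the window has already reset.
    return max(0.0, reset_at - now)
```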

9 URL Error: DNS replaced

10 1.10
Problem
– You want to harvest and store tweets from a collection of id values, or harvest entire timelines of tweets.
Solution
– Use the /statuses/show resource to fetch a single tweet by its id value; the various /statuses/*_timeline methods can be used to fetch timeline data. CouchDB is a great option for persistent storage, and also provides a map/reduce processing paradigm and built-in ways to share your analysis with others.
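The storage step can be sketched without a running database. In the example below a plain dict stands in for a CouchDB database, tweets are assumed to be dicts with an integer `id`, and `tweet_to_document` and `store_tweets` are hypothetical helper names. Using the tweet id as the document `_id` (CouchDB document ids are strings) makes re-harvesting the same tweet harmless.

```python
def tweet_to_document(tweet):
    """Copy a tweet dict into a document keyed by the tweet's id."""
    doc = dict(tweet)
    doc["_id"] = str(tweet["id"])  # document ids are strings
    return doc


def store_tweets(db, tweets):
    """Store each tweet once; db is any dict-like mapping of _id -> document.

    Returns the number of newly stored documents.
    """
    stored = 0
    for tweet in tweets:
        doc = tweet_to_document(tweet)
        if doc["_id"] not in db:
            db[doc["_id"]] = doc
            stored += 1
    return stored
```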

11 Document-based distributed database
– Cluster Of Unreliable Commodity Hardware

12 Document-oriented

13

14 Document-oriented DB: MongoDB (C++), RavenDB (C#), CouchDB (Erlang)

15 Document

16
{
  "_id": "tansac",
  "_rev": "1",
  "profile": {
    "nickname": "tansanc",
    "name": { "firstname": "종명", "lastname": "김" },
    "birthdate": "1987-05-31"
  }
}

17 Schema Free
{
  "_id": "tansac",
  "_rev": "2",
  "profile": {
    "nickname": "tansanc",
    "name": { "firstname": "종명", "lastname": "김" },
    "birthdate": "1987-05-31",
    "hasBrother": true
  }
}
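In code, "schema free" means a new field can be added to one document without declaring it anywhere or touching older documents. The sketch below mirrors the two revisions on these slides using plain Python dicts; `has_brother` is a hypothetical reader function showing how code can tolerate documents that lack the field.

```python
import copy

doc_v1 = {
    "_id": "tansac",
    "_rev": "1",
    "profile": {
        "nickname": "tansanc",
        "name": {"firstname": "종명", "lastname": "김"},
        "birthdate": "1987-05-31",
    },
}

# The second revision simply carries one more field; no schema change needed.
doc_v2 = copy.deepcopy(doc_v1)
doc_v2["_rev"] = "2"
doc_v2["profile"]["hasBrother"] = True


def has_brother(doc):
    """Read the optional field, defaulting to False when it is absent."""
    return doc.get("profile", {}).get("hasBrother", False)
```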

18 Typical 3-Tier Architecture

19 2-Tier Architecture with CouchDB

20 No Locking: Multi-Version Concurrency Control (MVCC)
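The idea behind MVCC can be illustrated with a toy in-memory store: readers never block, and a write succeeds only if it is based on the latest revision. This is a simplified model, not CouchDB's API; `ToyStore` and `ConflictError` are hypothetical names, and integer revisions stand in for CouchDB's real revision strings.

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class ToyStore:
    """In-memory toy model of revision-checked writes (not CouchDB's API)."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        # Hand out a copy so callers cannot mutate stored state directly.
        return dict(self._docs[doc_id])

    def put(self, doc):
        doc_id = doc["_id"]
        current = self._docs.get(doc_id)
        current_rev = current["_rev"] if current else None
        # Reject the write if the caller did not start from the latest revision.
        if doc.get("_rev") != current_rev:
            raise ConflictError(f"{doc_id}: expected rev {current_rev!r}")
        new_rev = 1 if current_rev is None else current_rev + 1
        self._docs[doc_id] = dict(doc, _rev=new_rev)
        return new_rev
```

Two clients that read the same revision can both edit their copies, but only the first `put` wins; the second gets a conflict and has to re-read and retry, which is how the store avoids locks.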

21 /statuses/show public_timeline() user_timeline() home_timeline()

22 tweepy get timeline
API.public_timeline()
– Returns the 20 most recent statuses from non-protected users who have set a custom user icon. The public timeline is cached for 60 seconds so requesting it more often than that is a waste of resources.
– Parameters: None
– Returns: list of class:Status objects
API.home_timeline()
– Returns the 20 most recent statuses, including retweets, posted by the authenticating user and that user's friends. This is the equivalent of /timeline/home on the Web.
– Parameters: since_id, max_id, count, page
– Returns: list of class:Status objects
API.friends_timeline()
– Returns the 20 most recent statuses posted by the authenticating user and that user's friends.
– Parameters: since_id, max_id, count, page
– Returns: list of class:Status objects
API.user_timeline()
– Returns the 20 most recent statuses posted from the authenticating user. It's also possible to request another user's timeline via the id parameter.
– Parameters: (id or user_id or screen_name), since_id, max_id, count, page
– Returns: list of class:Status objects
http://pythonhosted.org/tweepy/html/api.html#timeline-methods
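Because each call returns only a limited batch, harvesting a whole timeline means paging backwards with `max_id`. The sketch below shows that loop against an injected `fetch_page` callable, which is a hypothetical stand-in for a timeline method; it is assumed to return a list of dicts with an integer `id`, newest first, whereas tweepy itself returns Status objects.

```python
def harvest_timeline(fetch_page, max_pages=16, count=200):
    """Collect statuses by walking backwards through a timeline with max_id.

    fetch_page(count=..., max_id=...) is a hypothetical callable returning a
    list of dicts with an integer "id", newest first.
    """
    statuses = []
    max_id = None
    for _ in range(max_pages):
        page = fetch_page(count=count, max_id=max_id)
        if not page:
            break  # nothing older is available
        statuses.extend(page)
        # max_id is inclusive, so step just below the oldest id seen so far.
        max_id = min(s["id"] for s in page) - 1
    return statuses
```

The `max_pages` cap is a safety limit so that an unexpectedly long timeline cannot keep the loop running indefinitely.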

23 home_timeline()

24 API.friends_timeline() API.public_timeline()

25 API.user_timeline() API.mentions_timeline()

