Mining Twitter 1.9, 1.10 1131036001 김종명


Mining twitter 1.9, 김종명

1.9 Making Robust Twitter Requests
Problem
–You want to write a long-running script that harvests large amounts of data, such as the friend and follower ids for a very popular Twitterer; however, the Twitter API is inherently unreliable and imposes rate limits that require you to always expect the unexpected.
Solution
–Write an abstraction for making Twitter requests that accounts for rate limiting and other types of HTTP errors, so that you can focus on the problem at hand and not worry about HTTP errors or rate limits, which are just a very specific kind of HTTP error.
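A minimal sketch of such an abstraction, using only the standard library. The exception class HTTPStatusError, the function name make_robust_request, the 15-minute default wait and the doubling backoff are all assumptions made for illustration; a real Twitter client library raises its own exception types, which would have to be adapted.

```python
import time


class HTTPStatusError(Exception):
    """Hypothetical exception carrying the HTTP status of a failed request."""

    def __init__(self, status, retry_after=None):
        super().__init__("HTTP %d" % status)
        self.status = status
        self.retry_after = retry_after  # seconds until the rate-limit window resets, if known


RATE_LIMITED = {420, 429}          # rate limiting (v1 Search/Trends and v1.1)
TRANSIENT = {500, 502, 503, 504}   # server-side trouble that may clear up


def make_robust_request(func, *args, max_retries=5, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying on rate limits and transient errors."""
    backoff = 1  # seconds; doubled after each transient failure
    for attempt in range(max_retries + 1):
        try:
            return func(*args, **kwargs)
        except HTTPStatusError as err:
            if err.status in RATE_LIMITED:
                # Assumed default: wait 15 minutes when no hint is available.
                delay = err.retry_after if err.retry_after is not None else 15 * 60
            elif err.status in TRANSIENT:
                delay = backoff
                backoff *= 2
            else:
                raise  # e.g. 401 or 404: retrying will not help
            if attempt == max_retries:
                raise  # out of retries, surface the last error
            sleep(delay)
```

The sleep function is passed in as a parameter so that the wrapper can be exercised without actually waiting; a harvesting script would leave it at its default.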

Error Codes & Responses
Code | Text | Description
200 | OK | Success!
304 | Not Modified | There was no new data to return.
400 | Bad Request | The request was invalid. An accompanying error message will explain why. This is the status code that will be returned during version 1.0 rate limiting. In API v1.1, a request without authentication is considered invalid and you will get this response.
401 | Unauthorized | Authentication credentials were missing or incorrect.
403 | Forbidden | The request is understood, but it has been refused or access is not allowed. An accompanying error message will explain why. This code is used when requests are being denied due to update limits.
404 | Not Found | The URI requested is invalid or the resource requested, such as a user, does not exist. Also returned when the requested format is not supported by the requested method.
406 | Not Acceptable | Returned by the Search API when an invalid format is specified in the request.
410 | Gone | This resource is gone. Used to indicate that an API endpoint has been turned off. For example: "The Twitter REST API v1 will soon stop functioning. Please migrate to API v1.1."
420 | Enhance Your Calm | Returned by the version 1 Search and Trends APIs when you are being rate limited.
422 | Unprocessable Entity | Returned when an image uploaded to POST account/update_profile_banner is unable to be processed.
429 | Too Many Requests | Returned in API v1.1 when a request cannot be served due to the application's rate limit having been exhausted for the resource. See Rate Limiting in API v1.1.
500 | Internal Server Error | Something is broken. Please post to the group so the Twitter team can investigate.
502 | Bad Gateway | Twitter is down or being upgraded.
503 | Service Unavailable | The Twitter servers are up, but overloaded with requests. Try again later.
504 | Gateway Timeout | The Twitter servers are up, but the request couldn't be serviced due to some failure within our stack. Try again later.

Error Messages
{"errors":[{"message":"Sorry, that page does not exist","code":34}]}
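An error body of this shape can be unpacked with the standard json module. The sketch below is illustrative only; the helper name parse_twitter_errors is hypothetical, and it assumes the body is a JSON object with an "errors" list as in the example above.

```python
import json


def parse_twitter_errors(body):
    """Return a list of (code, message) pairs from an error response body."""
    payload = json.loads(body)
    return [(err.get("code"), err.get("message"))
            for err in payload.get("errors", [])]


body = '{"errors":[{"message":"Sorry, that page does not exist","code":34}]}'
for code, message in parse_twitter_errors(body):
    print(code, message)  # prints: 34 Sorry, that page does not exist
```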

Error Codes
Code | Text | Description
32 | Could not authenticate you | Your call could not be completed as dialed.
34 | Sorry, that page does not exist | Corresponds with an HTTP 404: the specified resource was not found.
88 | Rate limit exceeded | The request limit for this resource has been reached for the current rate limit window.
89 | Invalid or expired token | The access token used in the request is incorrect or has expired. Used in API v1.1.
130 | Over capacity | Corresponds with an HTTP 503: Twitter is temporarily over capacity.
131 | Internal error | Corresponds with an HTTP 500: an unknown internal error occurred.
135 | Could not authenticate you | Corresponds with an HTTP 401: it means that your oauth_timestamp is either ahead of or behind our acceptable range.
215 | Bad authentication data | Typically sent with 1.1 responses with HTTP code 400. The method requires authentication but it was not presented or was wholly invalid.

Normal execution

Page does not exist

Rate limit reached

URL Error: DNS changed

1.10
Problem
–You want to harvest and store tweets from a collection of id values, or harvest entire timelines of tweets.
Solution
–Use the /statuses/show resource to fetch a single tweet by its id value; the various /statuses/*_timeline methods can be used to fetch timeline data. CouchDB is a great option for persistent storage, and also provides a map/reduce processing paradigm and built-in ways to share your analysis with others.
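As a rough illustration of the storage step: CouchDB is driven over HTTP, and a document is created with a PUT to /{database}/{document id}. The sketch below only builds such a request with the standard library and does not send it. The server address, the database name "tweets", the helper name build_store_request and the choice of the tweet id as document id are all assumptions, not something the slides prescribe.

```python
import json
import urllib.request
from urllib.parse import quote

COUCH_URL = "http://localhost:5984"  # assumption: local CouchDB on its default port
DB_NAME = "tweets"                   # hypothetical database name


def build_store_request(tweet):
    """Build (but do not send) a PUT request that stores one tweet as a document."""
    doc_id = str(tweet["id"])        # CouchDB document ids are strings
    doc = dict(tweet, _id=doc_id)    # copy the tweet and add the _id field
    url = "%s/%s/%s" % (COUCH_URL, DB_NAME, quote(doc_id, safe=""))
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_store_request({"id": 12345, "text": "hello"})
print(req.get_method(), req.full_url)  # prints: PUT http://localhost:5984/tweets/12345
```

Actually sending the request would be a call to urllib.request.urlopen(req) against a running CouchDB instance in which the database already exists.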

Document-based distributed database –"Couch" stands for Cluster Of Unreliable Commodity Hardware

Document-oriented

Document-oriented DB: MongoDB (C++), RavenDB (C#), CouchDB (Erlang)

Document

{
  "_id": "tansac",
  "_rev": "1",
  "profile": {
    "nickname": "tansanc",
    "name": { "firstname": "종명", "lastname": "김" },
    "birthdate": ""
  }
}

Schema Free
{
  "_id": "tansac",
  "_rev": "2",
  "profile": {
    "nickname": "tansanc",
    "name": { "firstname": "종명", "lastname": "김" },
    "birthdate": "",
    "hasBrother": true
  }
}

Typical 3-Tier Architecture

2-Tier Architecture with CouchDB

No Locking Multi-Version Concurrency Control (MVCC)
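The idea behind lock-free updates with revisions can be sketched with a toy in-memory store: a writer must present the revision it read, and an update based on a stale revision is rejected instead of blocking. This is not CouchDB's implementation. The class names TinyStore and ConflictError are hypothetical, and revisions are plain counters here, whereas real CouchDB revision strings have a different format.

```python
import copy


class ConflictError(Exception):
    """Raised when an update is based on a stale revision."""


class TinyStore:
    """In-memory toy store with per-document revision checks."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        # Return a copy so callers cannot mutate the stored state.
        return copy.deepcopy(self._docs[doc_id])

    def put(self, doc):
        doc_id = doc["_id"]
        current = self._docs.get(doc_id)
        if current is None:
            new_rev = 1                               # first version of the document
        elif doc.get("_rev") != current["_rev"]:
            raise ConflictError("stale _rev for %r" % doc_id)
        else:
            new_rev = int(current["_rev"]) + 1        # accepted: bump the revision
        stored = copy.deepcopy(doc)
        stored["_rev"] = str(new_rev)
        self._docs[doc_id] = stored
        return stored["_rev"]


store = TinyStore()
store.put({"_id": "tansac", "profile": {"nickname": "tansanc"}})
first = store.get("tansac")    # both readers see _rev "1"
second = store.get("tansac")
first["profile"]["hasBrother"] = True
store.put(first)               # accepted, document is now at _rev "2"
try:
    store.put(second)          # still carries _rev "1"
except ConflictError:
    print("conflict")          # prints: conflict
```

Neither reader ever waits for the other; the losing writer is told to re-read the document and try again.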

/statuses/show public_timeline() user_timeline() home_timeline()

tweepy get timeline
API.public_timeline()
–Returns the 20 most recent statuses from non-protected users who have set a custom user icon. The public timeline is cached for 60 seconds, so requesting it more often than that is a waste of resources.
–Parameters: None
–Returns: list of Status objects
API.home_timeline()
–Returns the 20 most recent statuses, including retweets, posted by the authenticating user and that user's friends. This is the equivalent of /timeline/home on the Web.
–Parameters: since_id, max_id, count, page
–Returns: list of Status objects
API.friends_timeline()
–Returns the 20 most recent statuses posted by the authenticating user and that user's friends.
–Parameters: since_id, max_id, count, page
–Returns: list of Status objects
API.user_timeline()
–Returns the 20 most recent statuses posted by the authenticating user. It's also possible to request another user's timeline via the id parameter.
–Parameters: (id or user_id or screen_name), since_id, max_id, count, page
–Returns: list of Status objects
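Since each call returns only one page of statuses, harvesting an entire timeline means walking backwards with max_id. The sketch below shows one way to do that for any fetch function that accepts count and max_id, such as the timeline methods listed above. The helper name harvest_timeline and its max_pages parameter are hypothetical, and it assumes each returned status has an id attribute.

```python
def harvest_timeline(fetch, max_pages=5, count=20, **params):
    """Collect statuses page by page, walking backwards in time with max_id."""
    statuses = []
    max_id = None
    for _ in range(max_pages):
        kwargs = dict(params, count=count)
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = fetch(**kwargs)
        if not page:
            break  # nothing older is available
        statuses.extend(page)
        # max_id is inclusive, so step just below the oldest id seen so far.
        max_id = min(status.id for status in page) - 1
    return statuses
```

With an authenticated tweepy API object named api (an assumption here), a call might look like harvest_timeline(api.user_timeline, screen_name="some_user"), ideally wrapped in whatever retry logic the script uses for rate limits.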

home_timeline()

API.friends_timeline() API.public_timeline()

API.user_timeline() API.mentions_timeline()