Download presentation
Presentation is loading. Please wait.
1
SOCCER DATA WEB CRAWLER
(By Team 02) B.I 10/12/14
2
Team members SporTech B.I (Client) Trupti Sardesai Zhitao Zao Subessware Karunamoorthy Wenchen Tu Qing Hu Yan Zhang Pranshu Kumar Amir Ali Tahmasebi B.I 10/12/14
3
outline Team Strength and Weakness Technical Concerns and Solutions
Operational Concerns and Solutions Win-win Shaping Status Project Evaluation B.I 10/12/14
4
Team strength B.I 10/12/14
5
Team weakness B.I 10/12/14
6
Technical concerns and solutions
B.I 10/12/14
7
Technical concerns and solutions
B.I 10/12/14
8
Operational concerns and solutions
B.I 10/12/14
9
Win condition status 6 agreed on win conditions.
No win-conditions have open issues. B.I 10/12/14
10
Project evaluation Success critical stakeholder:
B.I 10/12/14
11
Operational concept design
B.I 10/12/14
12
outline System purpose Shared vision Benefit-chain diagram
System boundary Proposed new system Desired capabilities and goals B.I 10/12/14
13
purpose B.I 10/12/14
14
Shared vision - Program model
B.I 10/12/14
15
Benefits chain diagram
B.I 10/12/14
16
System boundary diagram
B.I 10/12/14
17
Business workflow diagram
B.I 10/12/14
18
copyrights@SporTech B.I
10/12/14
19
Desired capabilities & goals
Capability Goals Priority Level OC-1 crawls data to gather real time data of soccer players/team Must have OC-2 ingest gathered data in PostgreSQL OC-3 handle discrepancies of data between websites crawled OC-4 collect Facebook number of likes, fans, the comments of soccer players OC-5 Collect Twitter number of followers, retweets, the comments of soccer players OC-6 Collect Instagram pictures, number of likes and the comments of soccer players Would Like OC-7 Collect videos links from websites crawled OC-8 Collect youtube videos of players B.I 10/12/14
20
Referred Win Win Agreements
Level of service Level of Service Goals Priority Level Referred Win Win Agreements Compatibility: The system will Support the following browsers: IE 9+, Chrome 25+, Mozilla Firefox 23+ and Safari 6+ Should Have WC_3413 Scheduling: The system shall be able to crawl whenever specified. WC_3410 Scalability: The system can crawl any number of websites. Should have WC_3414 B.I 10/12/14
21
Organizational goals OG-1: To enable the end users to make a well-informed knowledge about the players/team. OG-2: To increase time-saving to increase operational efficiency. OG-3: To increase accessibility of real-time data/information. B.I 10/12/14
22
prototype B.I 10/12/14
23
outline Basic web crawler prototype
Social media feeds collector prototype Developer user interface prototype B.I 10/12/14
24
Basic web crawler Win conditions: WC_ The web crawler shall gather team information from the websites in the website list. WC_ The web crawler shall gather player information from the websites in the website list. WC_ The web crawler shall gather videos from the pages being crawled and ingest into STBI as is so that the coach and fans is able to watch the relevant videos PA. B.I 10/12/14
25
architecture B.I 10/12/14
26
Code & OUTPUT B.I 10/12/14
27
Code & output B.I 10/12/14
28
Social media feeds collector
Win conditions: WC_ The web crawler shall get comments, name and number of members, likes from specified Facebook pages B.I 10/12/14
29
workflow B.I 10/12/14
30
Developer user interfaces
Win conditions: WC_ As a developer, I can add, delete, update the specific websites visited, fields to capture from the website and frequency of crawler refreshes for each specified website. B.I 10/12/14
31
User interfaces Demo B.I 10/12/14
32
requirements B.I 10/12/14
33
Capability requirements
ID Requirement Win condition(s) Priority* CR-1 The web crawler shall gather team information from the websites in the website list. WC_3473 4.83 CR-2 The web crawler shall gather player information from the websites in the website list. WC_3472 CR-3 The web crawler shall get number of followers, the comments and the number of retweets for a specified twitter account. WC_3417 4.77 CR-4 The web crawler shall get comments, name and number of members, likes from specified Facebook pages. WC_3416 4.71 B.I 10/12/14
34
Capability requirements
CR-5 As a developer, I can update/revise the player data as the season progresses. WC_3409 3.06 CR-6 As a developer, I can add,delete,update the specific websites visited, fields to capture from the website and frequency of crawler refreshes for each specified website. WC_3398 3.15 B.I 10/12/14
35
Project requirements ID Requirement Win condition(s) PR-1
The developer UI of the system will be developed using HTML, CSS and JavaScript WC_3398 PR-2 The server side of the system will be developed using Python. WC_3473 WC_3472 PR-3 System will use PostgreSQL for database. PR-4 System shall use Facebook API to gather required facebook data WC_3416 PR-5 System shall use Twitter API to fetch data from twitter WC_3417 B.I 10/12/14
36
Level of service requirements
ID Requirement status Win condition(s) Priority LOS-1 The system will Support the following browsers: IE 9+, Chrome 25+, Mozilla Firefox 23+ and Safari 6+ Agreed upon & verified WC_3413 Should have LOS-2 The system shall be able to crawl whenever specified. Agreed upon and verified WC_3410 LOS-3 The system can crawl any number of websites. Agreed upon and verified WC_3414 B.I 10/12/14
37
Architecture B.I 10/12/14
38
architecture B.I 10/12/14
39
SYSTEM CONTEXT diagram
B.I 10/12/14
40
Use case diagram B.I 10/12/14
41
Life cycle plan B.I 10/12/14
42
outline Life Cycle Strategy Project Plan
Key Stakeholders’ Responsibilities Resource Estimation B.I 10/12/14
43
Foundations phase Duration: 10/15/14- 11/19/14
Concept: In this phase the team continues with planning and managing project, develops the prototype, and also begin to develop this project. Deliverables: Foundation Commitment Package, Bi-weekly Project Report and Plan. Milestone: Development Commitment Review Strategy: One Incremental Commitment Cycle B.I 10/12/14
44
Project plan* * - date format: YY/MM/DD copyrights@SporTech B.I
10/12/14
45
Key stakeholder responsibilities
Team member/Role Responsibilities Trupti Sardesai: Project Manager -Schedule meetings -Assign task, Analyze proposed system -Conceptualize system. Wencheng Tu: Prototyper -Analyze and prioritize capabilities to prototype; -Update website Subessware Selvameena Karunamoorthy : System/software Architect -Provide system architecture -provide feasibility evidence; Yan Zhang: Operational Concept Engineer -Analyze Proposed System -Assess operational concept; -Identify Shared Vision; B.I 10/12/14
46
Key stakeholders responsibilities
Team Member/Role Responsibilities Zhitao Zhou : Feasibility analyst -Analyze estimation cost. -Assess and plan to mitigate risks. -Assess feasibility evidence. Qing Hu: Life Cycle Planner -Assess development timeline. -Estimate effort, cost and schedule using COINCOMO 2.0 Pranshu Kumar: Requirements Engineer - Analyze current and proposed system. -Gather requirements. Amir ali Tahmasebi: IV &V; - Checking and verifying documents that are been generated. -Verify the bugs that are filed on bugzilla. Justin Norman/ Laura Penna: Client Provide requirements Meet with development team on a regular basis Make changes to the requirements to fit the business. B.I 10/12/14
47
Resource estimation Estimated CSCI577a Effort: 8 team members at 8 hrs/week for 12weeks. Total estimated effort: 504 hours Budget information-$10000 Project duration-12 weeks(Fall 2014) Component modules: Spider Management Data Ingestion Management Social Media Management Language used: Python, PostgreSQL B.I 10/12/14
48
Cocomo scale drivers Scale Driver Value Rationale PREC LOW
Although there are some examples about crawler, the development team have no previous experience in developing a similar system. FLEX HIGH Schedule, requirement and interface flexibility. RESL The architecture is not yet defined for future requirements. TEAM The team cohesion is good. Clients and team members have a good communication. PMAT NOMINAL The development team follows ICSM guidelines, which the processes are defined. B.I 10/12/14
49
Cost driver: spider crawler management
Value Rationale RELY VERY HIGH Failure will lead to high financial loss for clients. DATA We need to crawl information of player from specific websites. DOCU NOMINAL Because the development process follows ICSM, the document for life-cycle needs is normal. CPLX This system has complex structuring for crawling the website. RUSE This project built intends to be reused in the future. TIME The percentage of available execution time expected to be used by the system and subsystem consuming the execution time resource is less than 50% STOR It crawl the website and transform the data into structured format B.I 10/12/14
50
Cost driver: spider crawler management
Value Rationale PVOL LOW There are not major changes in platform. ACAP HIGH The analysts have the ability to analyze, design, communicate, and cooperate very well. PCAP NOMINAL The capabilities and efficiencies of Programmers are in general. But we are able to communicate and cooperate very well. PCON VERY HIGH We have 8 members in our 577A, and we completed this project only in this semester. So we all go through the whole project. APEX Most people have low programming application experience. LTEX Intermediate programming language and tool experience. PLEX Little platform experience. TOOL There is no support for life-cycle tools. B.I 10/12/14
51
Cost driver: DATA INGESTION management
Value Rationale RELY VERY HIGH The effect of failure will incur financial losses DATA we need to store the related information from Facebook, Twitter and Youtube into a structured format. DOCU NOMINAL Because the development process follows ICSM, the document for life-cycle needs is normal. CPLX Complex data structuring required. RUSE This project built intends to be reused in the future. TIME The percentage of available execution time expected to be used by the system and subsystem consuming the execution time resource is less than 50% STOR We need to store increasing data from different social medias PVOL LOW There are not major changes in platform. B.I 10/12/14
52
Cost driver: data ingestion management
Cost Drivers Value Rationale ACAP HIGH The analysts have the ability to analyze, design, communicate, and cooperate very well. PCAP NOMINAL The capabilities and efficiencies of Programmers are in general. But we are able to communicate and cooperate very well. PCON VERY HIGH We have 8 members in our 577A, and we completed this project only in this semester. So we all go through the whole project. APEX LOW Most people have low programming application experience. LTEX Intermediate programming language and tool experience. PLEX Little platform experience. TOOL There is no support for life-cycle tools. SITE The client and team members are all in the same city. B.I 10/12/14
53
Cost driver: social media management
Value Rationale RELY VERY HIGH The effect of failure will incur financial losses DATA we need to store the related information from Facebook, Twitter and You Tube into a structured format. DOCU NOMINAL Because the development process follows ICSM, the document for life-cycle needs is normal. CPLX Complex data structuring required. RUSE This project built intends to be reused in the future. TIME The percentage of available execution time expected to be used by the system and subsystem consuming the execution time resource is less than 50% STOR We need to store increasing data from different social medias PVOL LOW There are not major changes in platform. B.I 10/12/14
54
Cost driver: social media management
Value Rationale ACAP HIGH The analysts have the ability to analyze, design, communicate, and cooperate very well. PCAP NOMINAL The capabilities and efficiencies of Programmers are in general. But we are able to communicate and cooperate very well. PCON VERY HIGH We have 8 members in our 577A, and we completed this project only in this semester. So we all go through the whole project. APEX LOW Most people have low programming application experience. LTEX Intermediate programming language and tool experience. PLEX Little platform experience. TOOL There is no support for life-cycle tools. SITE The client and team members are all in the same city. B.I 10/12/14
55
Screenshot of cocomo model
B.I 10/12/14
56
Feasibility analysis B.I 10/12/14
57
OUTLINE Personal Cost Benefit and ROI analysis
LOS/Capability Feasibility Process Decision Major Risk & Mitigation Plans Use case B.I 10/12/14
58
PERSONAL COST Initial Client Meeting Win-Win Negotiation
B.I 10/12/14
59
Personal cost (2 hr/week) ARB B.I 10/12/14
60
Benefit analysis 3 million new inhabitant /year 25/1000 1 min
Update DB B.I 10/12/14
61
Roi analysis First year $2,709 Maintenance $10,320
Annual Benefit $1,925,200 Second year ROI = ( )/( ) = 146 B.I 10/12/14
62
Capability and feasibility analysis
B.I 10/12/14
63
Los and feasibility plan
B.I 10/12/14
64
Process decision B.I 10/12/14
65
Major risk and mitigation plan
Compatibility of PostgreSQL with Python Response time too long Failure to find data for some soccer player Facebook/Twitter API upgrade cause Facebook/Twitter component down Failure to gather information on some websites B.I 10/12/14
66
Use case – get Facebook data
Main success Scenario: System gets soccer player name from DB. System get authorization from facebook. System invokes API search. System get list of possible user ID and Category. System uses Facebook page and send get request. System store number of likes and talking_about. System uses edge property of page_id and sent get request. System store number of posts and comments. System ingest the data into DB. B.I 10/12/14
67
Use case – get Facebook data
Alternative courses of action: 4a. System gets response from Facebook, but the response is o_auth error. 4a.1. System get authorization for search from Facebook again. 6a. System gets response from Facebook, but the response is o_auth error. 6a.1. System get authorization for page request from Facebook again. B.I 10/12/14
68
Use case - crawling Main Success Scenario:
The crawler gets the response object from the URL Request. The crawler gets the selector based on the xpath rule. The crawler gets the data and links. The crawler identifies the required attribute. The crawler structures the data according to the schema. The structured data is ingested/updated into the DB. B.I 10/12/14
69
Use case - crawling Alternative courses of actions:
1a The server specified by the URL is down. 1a.1 Display the information indicating the server is down. 2a The structure of the response is changed. 2a.1 Print error information indicating the xpath rule is not appropriate. B.I 10/12/14
70
THANK YOU B.I 10/12/14
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.