SOCCER DATA WEB CRAWLER


1 SOCCER DATA WEB CRAWLER
(By Team 02) B.I. 11/17/2018

2 Team members
Trupti Sardesai - Program Manager
Zhitao Zhou - Feasibility Analyst
Subessware Karunamoorthy - System Architect
Wenchen Tu - Prototyper
Qing Hu - Life Cycle Planner
Yan Zhang - Operational Concept Engineer
Pranshu Kumar - Requirements Engineer
Amir Ali Tahmasebi - Shaper

3 Outline
Team Strength and Weakness
Overall Evaluation
Operational Concept Design
Requirements
SSAD
Life Cycle Plan
Feasibility Evidence
Quality Focal Point
Test Cases
Final Product Demonstration

4 Team Strength and Weakness
Team Weakness | Team Strength
Schedule conflicts | Communication
Work overlap | Collaboration
Less knowledge about associated technologies | Goal-driven

5 Overall Project Evaluation
Identified new requirements
Identified new risks as the project evolved
Developed all agreed-to win conditions
Developed the final product

6 Operational concept design

7 Current Business workflow

8 System purpose: Organizational goals
OG-1: To enable end users to gain well-informed knowledge about the players/team.
OG-2: To save time and thereby increase operational efficiency.
OG-3: To increase accessibility of real-time data/information.

9 Current Business workflow

10 Proposed Business Workflow

11 CAPABILITY GOALS
OC-1 Crawl predefined websites: The web crawler shall gather team information from the websites in the website list.
OC-2 Crawl predefined websites: The web crawler shall gather player information from the websites in the website list.
OC-3 Crawl Social Media: The web crawler shall get the comments, the name, the number of members and the likes from specified Facebook pages.
OC-4 Crawl Social Media: The web crawler shall get the number of followers, the comments and the number of retweets for a specified Twitter account.
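A minimal sketch of how OC-1 and OC-2 might pull fields out of a crawled page, assuming the page has already been downloaded and that the data sits in two-column HTML table rows. The class `TableFieldParser`, the helper `extract_fields`, the sample HTML and the field names are illustrative assumptions, not the team's implementation; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class TableFieldParser(HTMLParser):
    """Collect label/value pairs from two-column table rows (hypothetical page layout)."""

    def __init__(self):
        super().__init__()
        self.fields = {}       # label -> value
        self._cells = []       # text of the cells seen in the current row
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cells.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and len(self._cells) == 2:
            label, value = (cell.strip() for cell in self._cells)
            self.fields[label] = value

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._in_cell:
            self._cells[-1] += data


def extract_fields(html, wanted):
    """Return the requested fields; a field missing from the page maps to None."""
    parser = TableFieldParser()
    parser.feed(html)
    return {name: parser.fields.get(name) for name in wanted}


# Made-up sample page standing in for a site from the website list.
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><td>Example FC</td></tr>
  <tr><th>Coach</th><td>A. Person</td></tr>
  <tr><th>Founded</th><td>1999</td></tr>
</table>
"""

team = extract_fields(SAMPLE_HTML, ["Team", "Coach"])
print(team)
```

Returning `None` for a missing field, rather than raising, lets a later ingestion step decide how to treat incomplete pages.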

12 CAPABILITY GOALS
OC-5 Ingest Data: The crawler shall ingest crawled data into the PostgreSQL database.
OC-6 STBI Contractor UI: As an STBI contractor, I can update/revise the player data as the season progresses.
OC-7 STBI Contractor UI: As an STBI contractor, I can add, delete and update the specific websites visited, the fields to capture from each website and the frequency of crawler refreshes for each specified website.
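The website list described in OC-7 could be modeled roughly as follows. This is a sketch under stated assumptions: `CrawlTarget`, `TargetRegistry`, the `refresh_hours` attribute and the example URLs are hypothetical names chosen for illustration, and the in-memory dictionary stands in for whatever storage the real system uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class CrawlTarget:
    """One entry of the website list: where to crawl, what to capture, how often."""
    url: str
    fields: List[str]
    refresh_hours: int
    last_crawled: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        # A target that was never crawled is always due.
        if self.last_crawled is None:
            return True
        return now - self.last_crawled >= timedelta(hours=self.refresh_hours)


class TargetRegistry:
    """Add, update and delete crawl targets, keyed by URL."""

    def __init__(self) -> None:
        self._targets: Dict[str, CrawlTarget] = {}

    def add(self, target: CrawlTarget) -> None:
        self._targets[target.url] = target

    def update(self, url: str, **changes) -> None:
        target = self._targets[url]
        for name, value in changes.items():
            if not hasattr(target, name):
                raise AttributeError(f"unknown setting: {name}")
            setattr(target, name, value)

    def delete(self, url: str) -> None:
        del self._targets[url]

    def due(self, now: datetime) -> List[CrawlTarget]:
        return [t for t in self._targets.values() if t.is_due(now)]


registry = TargetRegistry()
registry.add(CrawlTarget("https://example.com/teams", ["name", "coach"], refresh_hours=24))
registry.add(CrawlTarget("https://example.com/players", ["name", "position"], refresh_hours=6))
registry.update("https://example.com/players", refresh_hours=12)
registry.delete("https://example.com/teams")

now = datetime(2018, 11, 17, 12, 0)
print([t.url for t in registry.due(now)])
```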

13 Level of service
LOS 1 Flexibility: The system can crawl and scrape any given URL into the database.
LOS 2 Efficiency: The system can crawl and scrape Facebook and Twitter data for a player in time proportional to the number of comments and posts on the player's account. The system can crawl and scrape a specific website in about an hour on average.

14 Requirements

15 WIN CONDITION SUCCESS
# | Capability Goals | Priority Level | Success/Fail
OC1 | Crawl predefined websites: The web crawler shall gather team information from the websites in the website list. | Must have (Agreed to) | SUCCESS
OC2 | Crawl predefined websites: The web crawler shall gather player information from the websites in the website list. | Must have (Agreed to) | SUCCESS
OC3 | Crawl Social Media: The web crawler shall get the comments, the name, the number of members and the likes from specified Facebook pages. | Must have (Agreed to) | SUCCESS
OC4 | Crawl Social Media: The web crawler shall get the number of followers, the comments and the number of retweets for a specified Twitter account. | Must have (Agreed to) | SUCCESS

16
# | Capability Goals | Priority Level | Success/Fail
OC5 | Ingest Data: The crawler shall ingest crawled data into the PostgreSQL database. | Must have (Agreed to) | SUCCESS
OC6 | STBI Contractor UI: As an STBI contractor, I can update/revise the player data as the season progresses. | Must have (Agreed to) | SUCCESS
OC7 | STBI Contractor UI: As an STBI contractor, I can add, delete and update the specific websites visited, the fields to capture from each website and the frequency of crawler refreshes for each specified website. | Must have (Agreed to) | SUCCESS

17
# | Capability Goals | Priority Level | Success/Fail
OC8 | Crawl Social Media: The web crawler shall gather Instagram pictures, the number of likes and the comments from a particular Instagram account. | Would like (Potentially Agree) | FAIL
OC9 | Crawl predefined websites: The web crawler shall gather videos from the pages being crawled and ingest them into STBI as is, so that the coach and fans are able to watch the relevant videos. | Would like | FAIL
OC10 | Crawl Social Media: The web crawler shall crawl YouTube to gather videos of specific players. | Would like | FAIL

18 SSAD

19 USE CASE DIAGRAM

20 Design Class Diagram

21 SEQUENCE DIAGRAM- Website

22 SEQUENCE DIAGRAM- FACEBOOK

23 LIFE CYCLE PLAN

24 ITERATION PLAN
# | Capability | Priority | Iteration
OC-1, OC-2 | Retrieve team and player data from specific website | High | 1
OC-3, OC-4 | Retrieve data from Facebook and Twitter | |
OC-5 | Storing data into Postgres database | | 2
OC-6, OC-7 | Develop user interface for the developer | |
TC-01-01, TC-02-01 | Test if the web crawler is able to gather team and player information. | |
TC-03-01 | Integration Test | | 3

25 copyrights@SporTech

26 EFFORT ESTIMATION

27 FEASIBILITY EVIDENCE

28 Activities | Time Spent (Hours)
Nonrecurring Cost
Initial Client Meeting (1 meeting * 2 people * 1.5 hours + 1 meeting * 1 person * 1 hour) | 4
Win-Win Negotiation Meetings (2 meetings * 2 people * 2 hours + 1 meeting * 1 person * 2 hours) | 10
Communication with development team (2 people * 2 hours/week * 12 weeks) | 48
Architecture Review Board Meeting (1 meeting * 1 person * 1 hour) | 1
Weekly status update (2 people * 0.5 hours * 4 weeks) | 4
Training STBI contractors | 2
Total Time | 69
Cost (estimated at $43/hour) | $2,968
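The hours in the cost table can be re-derived from the formulas printed next to each activity. The sketch below does that arithmetic; the hourly rate of 43 is an assumption (the trailing digit in the transcript's rate looks like a fused footnote marker), and at exactly $43/hour the product comes out one dollar under the $2,968 shown on the slide, which suggests the slide used an unrounded rate.

```python
# (activity, hours) pairs; each product mirrors the formula printed on the slide.
activities = [
    ("Initial client meeting", 1 * 2 * 1.5 + 1 * 1 * 1),
    ("Win-win negotiation meetings", 2 * 2 * 2 + 1 * 1 * 2),
    ("Communication with development team", 2 * 2 * 12),
    ("Architecture review board meeting", 1 * 1 * 1),
    ("Weekly status update", 2 * 0.5 * 4),
    ("Training STBI contractors", 2),  # given directly, no formula on the slide
]

total_hours = sum(hours for _, hours in activities)

HOURLY_RATE = 43  # assumption, see the note above
estimated_cost = total_hours * HOURLY_RATE

for name, hours in activities:
    print(f"{name}: {hours:g} h")
print(f"Total: {total_hours:g} h, about ${estimated_cost:,.0f} at ${HOURLY_RATE}/hour")
```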

29 Current activities & resources used | % Reduce | Money Saved (Dollars/Year)
Nonrecurring Benefit
Manual Data Entry & Data Ingestion | 80 | 19,500
Recurring Benefit
Manual Data Entry & Data Ingestion & Update database | |
Total | | 19,500

30
Year | Cost | Benefit (Effort Saved) | Cumulative Cost | Cumulative Benefit | ROI
2014 | 2,968 | 0 | 2,968 | 0 | -1
2015 | 6,500 | 19,500 | 9,468 | 19,500 | 1.06
2016 | 7,150 | 21,450 | 16,618 | 40,950 | 1.46
2017 | 7,865 | 23,595 | 24,483 | 64,545 | 1.67
2018 | 8,651 | 25,954 | 33,134 | 90,499 | 1.73
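The ROI column appears to follow ROI = (cumulative benefit - cumulative cost) / cumulative cost. The sketch below recomputes the table from the yearly cost and benefit figures under that assumption; the 2014 benefit of 0 is also an assumption, implied by the ROI of -1 for that year. With these inputs the recomputed values match the slide for every year except 2017, where the formula gives roughly 1.64 rather than the listed 1.67.

```python
# Yearly figures read from the table; the 2014 benefit of 0 is an assumption.
costs = {2014: 2968, 2015: 6500, 2016: 7150, 2017: 7865, 2018: 8651}
benefits = {2014: 0, 2015: 19500, 2016: 21450, 2017: 23595, 2018: 25954}


def roi_table(costs, benefits):
    """Return (year, cumulative cost, cumulative benefit, ROI) rows."""
    rows = []
    cum_cost = 0
    cum_benefit = 0
    for year in sorted(costs):
        cum_cost += costs[year]
        cum_benefit += benefits[year]
        roi = (cum_benefit - cum_cost) / cum_cost
        rows.append((year, cum_cost, cum_benefit, round(roi, 2)))
    return rows


rows = roi_table(costs, benefits)
for year, cum_cost, cum_benefit, roi in rows:
    print(f"{year}: cost {cum_cost:>6,}  benefit {cum_benefit:>6,}  ROI {roi:+.2f}")
```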

31 Risks
Risk 1: One player may have information on different websites, and two players may have the same name on different websites, causing data duplication or data inaccuracy in the database.
Potential Magnitude: 5 | Probability of Loss: 3 | Risk Exposure: 15
Mitigation: Mark the source of the data when it is ingested into the database, and use a duplicate attribute to indicate whether a duplicate exists for this player; the STBI contractor then resolves the duplicate manually.
Risk 2: Because the posts, and the comments on each post, may form a very long list, fetching a player's data from Facebook is slow.
Potential Magnitude: 7 | Probability of Loss: 3 | Risk Exposure: 21
Mitigation: Fetch only the posts and comments from the latest 6 months for a player.
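The risk figures and the two mitigations can be illustrated with a short sketch. It assumes risk exposure is the product of magnitude and probability (consistent with 5 x 3 = 15), that records are plain dictionaries with `name` and `source` keys, and that "6 months" is approximated as 183 days; the function names and record layout are hypothetical.

```python
from datetime import datetime, timedelta


def risk_exposure(magnitude, probability):
    """Risk exposure as magnitude times probability of loss."""
    return magnitude * probability


def flag_duplicates(records):
    """Mark a record as duplicate when the same player name comes from more than one source."""
    sources_by_name = {}
    for rec in records:
        key = rec["name"].strip().lower()
        sources_by_name.setdefault(key, set()).add(rec["source"])
    for rec in records:
        key = rec["name"].strip().lower()
        # The flag only signals that a human should review the record.
        rec["duplicate"] = len(sources_by_name[key]) > 1
    return records


def recent_only(posts, now, days=183):
    """Keep posts created within roughly the latest six months."""
    cutoff = now - timedelta(days=days)
    return [p for p in posts if p["created"] >= cutoff]


records = flag_duplicates([
    {"name": "John Doe", "source": "site-a"},
    {"name": "john doe ", "source": "site-b"},
    {"name": "Jane Roe", "source": "site-a"},
])
posts = recent_only(
    [
        {"id": 1, "created": datetime(2018, 10, 1)},
        {"id": 2, "created": datetime(2018, 1, 1)},
    ],
    now=datetime(2018, 11, 17),
)
print([r["duplicate"] for r in records], [p["id"] for p in posts])
```

Matching on a normalized name alone will also flag two different players who share a name, which is why the flag leads to manual review rather than an automatic merge.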

32 QUALITY FOCAL POINT

33 METRIC- BURN DOWN CHART

34 METRIC- TEST PASS COVERAGE

35 TECHNICAL DEBT
Causes | Solutions
Lack of Domain Experience (Python and PostgreSQL) | Learning and Training
Vague Requirements | Win book and Negotiation
Technology Volatility | Choose the most stable API and prototype

36 TRACEABILITY MATRIX
OCD Requirement | Win Condition | SSAD/Use case | Test Cases
OC-1 Crawl predefined websites to gather team information | WC_3473 | UC02 | TC-01-01
OC-2 Crawl predefined websites to gather player information | WC_3472 | UC05 | TC-02-01
OC-3 Crawl Social Media- Facebook | WC_3416 | UC06 |
OC-4 Crawl Social Media- Twitter | WC_3417 | |
OC-5 Ingest Data into PostgreSQL Database | WC_3495 | | TC-03-01
OC-6 STBI Contractor UI to update the player data | WC_3398 | UC06, UC04, UC08 |
OC-7 STBI Contractor UI to add, delete, update the specific websites visited, fields to capture from the website and frequency of crawler refreshes for each specified website | | UC01, UC02, UC03, UC07 | TC-01-01

37 TEST IDENTIFICATION
Test Identifier: TC-01 Gather team information
Test Level: Software item level
Test Class: Capability Test
Test Completion Criteria: Team information should be gathered from a webpage correctly and match the expected information that we gathered by hand.
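The completion criterion above amounts to comparing crawled fields against a hand-gathered reference. A minimal sketch of such a check follows; the function `diff_fields`, the field names and the sample values are made up for illustration and are not taken from the team's test suite.

```python
def diff_fields(crawled, expected):
    """Return {field: (expected, crawled)} for every missing or mismatching field."""
    mismatches = {}
    for key, want in expected.items():
        got = crawled.get(key)
        if got != want:
            mismatches[key] = (want, got)
    return mismatches


# Hand-gathered reference values (illustrative data only).
expected_team = {"name": "Example FC", "coach": "A. Person", "founded": "1999"}

# What a crawler run might have produced for the same page.
crawled_ok = {"name": "Example FC", "coach": "A. Person", "founded": "1999"}
crawled_bad = {"name": "Example FC", "coach": "B. Person"}

print(diff_fields(crawled_ok, expected_team))
print(diff_fields(crawled_bad, expected_team))
```

An empty result means the test passes; a non-empty result lists exactly which fields disagree with the reference, which is more useful for debugging a scraper than a bare pass/fail.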

38 TEST CASE

39 TEST IDENTIFICATION
Test Identifier: TC-02 Gather player information
Test Level: Software item level
Test Class: Capability Test
Test Completion Criteria: Player information should be gathered from a webpage correctly and match the expected information that we gathered by hand.

40 TEST CASE

41 TEST IDENTIFICATION
Test Identifier: TC-03 Update player information
Test Level: Software item level
Test Class: Capability Test
Test Completion Criteria: When player information is updated, the data in the DB should match the updated data.
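An update-then-read-back check of this kind can be sketched as below. To keep the example self-contained it uses an in-memory SQLite database from Python's standard library as a stand-in for the project's PostgreSQL database; the `players` table, its columns and the helper names are hypothetical. SQLite uses `?` placeholders, whereas PostgreSQL drivers such as psycopg2 use `%s`.

```python
import sqlite3

COLUMNS = ("name", "team", "goals")  # hypothetical schema


def update_player(conn, player_id, **changes):
    """Update the given columns of one player row."""
    unknown = set(changes) - set(COLUMNS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    # Column names come from the whitelist above; values are bound as parameters.
    assignments = ", ".join(f"{col} = ?" for col in changes)
    conn.execute(
        f"UPDATE players SET {assignments} WHERE id = ?",
        (*changes.values(), player_id),
    )
    conn.commit()


def fetch_player(conn, player_id):
    """Read one player row back as a dictionary."""
    row = conn.execute(
        "SELECT name, team, goals FROM players WHERE id = ?", (player_id,)
    ).fetchone()
    return dict(zip(COLUMNS, row))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT, team TEXT, goals INTEGER)")
conn.execute("INSERT INTO players VALUES (1, 'John Doe', 'Example FC', 3)")

update_player(conn, 1, team="Sample United", goals=5)
stored = fetch_player(conn, 1)
print(stored)
```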

42 TEST CASE

43 FINAL PRODUCT DEMO

44 THANK YOU!!!

