SOCCER DATA WEB CRAWLER (By Team 02) copyrights@SporTech B.I. 11/17/2018
Team members
- Trupti Sardesai, Program Manager
- Zhitao Zhou, Feasibility Analyst
- Subessware Karunamoorthy, System Architect
- Wenchen Tu, Prototyper
- Qing Hu, Life Cycle Planner
- Yan Zhang, Operational Concept Engineer
- Pranshu Kumar, Requirements Engineer
- Amir Ali Tahmasebi, Shaper
Outline
- Team Strength and Weakness
- Overall Evaluation
- Operational Concept Design
- Requirements
- SSAD
- Life Cycle Plan
- Feasibility Evidence
- Quality Focal Point
- Test Cases
- Final Product Demonstration
Team Weakness
- Schedule conflicts
- Work overlap
- Limited knowledge of associated technologies
Team Strength
- Communication
- Collaboration
- Goal-driven
Overall Project Evaluation
- Identified new requirements
- Identified new risks as the project evolved
- Delivered all agreed-to win conditions
- Developed the final product
Operational concept design
Current Business workflow
System purpose: Organizational goals
OG-1: To enable end users to gain well-informed knowledge about players and teams.
OG-2: To save time and thereby increase operational efficiency.
OG-3: To increase accessibility of real-time data and information.
Current Business workflow
Proposed Business Workflow
CAPABILITY GOALS
OC-1 Crawl predefined websites: The web crawler shall gather team information from the websites in the website list.
OC-2 Crawl predefined websites: The web crawler shall gather player information from the websites in the website list.
OC-3 Crawl Social Media: The web crawler shall get the comments, name, number of members, and likes from specified Facebook pages.
OC-4 Crawl Social Media: The web crawler shall get the number of followers, the comments, and the number of retweets for a specified Twitter account.
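As an illustration of OC-1 and OC-2, the sketch below pulls named fields out of a page using only Python's standard library. It is a minimal sketch under stated assumptions, not the team's crawler: the class name FieldExtractor, the helper extract_team_info, the CSS class names, and the sample page are all hypothetical, and it assumes each wanted field sits in an element whose class attribute equals the field name.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect the first text found inside elements whose class names a wanted field."""

    def __init__(self, wanted_fields):
        super().__init__()
        self.wanted = set(wanted_fields)
        self.current = None   # field currently being read, if any
        self.record = {}      # field name -> extracted text

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.wanted:
            self.current = css_class

    def handle_data(self, data):
        if self.current and data.strip():
            self.record[self.current] = data.strip()
            self.current = None

    def handle_endtag(self, tag):
        # Stop reading once any element closes, so empty cells stay empty.
        self.current = None


def extract_team_info(html, wanted_fields):
    """Return a dict of the wanted fields found in the given HTML text."""
    parser = FieldExtractor(wanted_fields)
    parser.feed(html)
    return parser.record


# Hypothetical page content; a real crawler would download this from a listed website.
SAMPLE_PAGE = """
<table>
  <tr><td class="team-name">Example FC</td><td class="coach">A. Coach</td></tr>
  <tr><td class="stadium">Example Park</td></tr>
</table>
"""

team = extract_team_info(SAMPLE_PAGE, ["team-name", "coach"])
print(team)
```

Fields that are not in the wanted list (here, stadium) are ignored, which mirrors the idea of capturing only the configured fields for each website.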
CAPABILITY GOALS (continued)
OC-5 Ingest Data: The crawler shall ingest crawled data into the PostgreSQL database.
OC-6 STBI Contractor UI: As an STBI contractor, I can update or revise the player data as the season progresses.
OC-7 STBI Contractor UI: As an STBI contractor, I can add, delete, and update the specific websites visited, the fields to capture from each website, and the frequency of crawler refreshes for each specified website.
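The per-website settings described in OC-7 (websites visited, fields to capture, refresh frequency) could be modeled roughly as follows. This is only a sketch: WebsiteConfig, CrawlRegistry, their attributes, and the example.com URLs are hypothetical names chosen for illustration, and the real system stores its configuration in whatever form the team implemented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class WebsiteConfig:
    url: str
    fields: List[str]                  # fields to capture from the website
    refresh_every: timedelta           # frequency of crawler refreshes
    last_crawled: Optional[datetime] = None


class CrawlRegistry:
    """In-memory list of websites that a contractor can add, update, or delete."""

    def __init__(self):
        self._sites: Dict[str, WebsiteConfig] = {}

    def add(self, config: WebsiteConfig) -> None:
        self._sites[config.url] = config

    def update(self, url: str, **changes) -> None:
        site = self._sites[url]
        for name, value in changes.items():
            if not hasattr(site, name):
                raise AttributeError(f"unknown setting: {name}")
            setattr(site, name, value)

    def delete(self, url: str) -> None:
        del self._sites[url]

    def due(self, now: datetime) -> List[str]:
        """Return URLs never crawled or whose refresh interval has elapsed."""
        return [
            site.url
            for site in self._sites.values()
            if site.last_crawled is None
            or now - site.last_crawled >= site.refresh_every
        ]


registry = CrawlRegistry()
registry.add(WebsiteConfig("https://example.com/teams", ["team-name", "coach"],
                           timedelta(days=7)))
registry.add(WebsiteConfig("https://example.com/players", ["player-name"],
                           timedelta(days=1),
                           last_crawled=datetime(2018, 11, 16, 12, 0)))
registry.update("https://example.com/teams",
                fields=["team-name", "coach", "stadium"])

now = datetime(2018, 11, 17, 9, 0)
due_now = registry.due(now)
print(due_now)
```

Keeping the refresh interval on each website entry lets the crawler decide per site whether a new crawl is needed, instead of re-crawling every website on every run.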
Level of service
LOS-1 Flexibility: The system can crawl and scrape any given URL into the database.
LOS-2 Efficiency: The system can crawl and scrape Facebook and Twitter data for a player in a time proportional to the number of comments and posts the player's account has. The system can crawl and scrape a specific website in one hour on average.
Requirements
WIN CONDITION SUCCESS
# | Capability Goals | Priority Level | Success/Fail
OC-1 | Crawl predefined websites: The web crawler shall gather team information from the websites in the website list. | Must have (Agreed to) | SUCCESS
OC-2 | Crawl predefined websites: The web crawler shall gather player information from the websites in the website list. | Must have (Agreed to) | SUCCESS
OC-3 | Crawl Social Media: The web crawler shall get the comments, name, number of members, and likes from specified Facebook pages. | Must have (Agreed to) | SUCCESS
OC-4 | Crawl Social Media: The web crawler shall get the number of followers, the comments, and the number of retweets for a specified Twitter account. | Must have (Agreed to) | SUCCESS
# | Capability Goals | Priority Level | Success/Fail
OC-5 | Ingest Data: The crawler shall ingest crawled data into the PostgreSQL database. | Must have (Agreed to) | SUCCESS
OC-6 | STBI Contractor UI: As an STBI contractor, I can update or revise the player data as the season progresses. | Must have (Agreed to) | SUCCESS
OC-7 | STBI Contractor UI: As an STBI contractor, I can add, delete, and update the specific websites visited, the fields to capture from each website, and the frequency of crawler refreshes for each specified website. | Must have (Agreed to) | SUCCESS
# | Capability Goals | Priority Level | Success/Fail
OC-8 | Crawl Social Media: The web crawler shall gather Instagram pictures, the number of likes, and the comments from a particular Instagram account. | Would like (Potentially Agree) | FAIL
OC-9 | Crawl predefined websites: The web crawler shall gather videos from the pages being crawled and ingest them into STBI as is, so that the coach and fans are able to watch the relevant videos. | Would like | FAIL
OC-10 | Crawl Social Media: The web crawler shall crawl YouTube to gather videos of specific players. | Would like | FAIL
SSAD
USE CASE DIAGRAM
Design Class Diagram
SEQUENCE DIAGRAM- Website
SEQUENCE DIAGRAM- FACEBOOK
LIFE CYCLE PLAN
ITERATION PLAN
# | Capability | Priority | Iteration
OC-1, OC-2 | Retrieve team and player data from specific website | High | 1
OC-3, OC-4 | Retrieve data from Facebook and Twitter | |
OC-5 | Storing data into Postgres database | | 2
OC-6, OC-7 | Develop user interface for the developer | |
TC-01-01, TC-02-01 | Test if the web crawler is able to gather team and player information. | |
TC-03-01 | Integration Test | | 3
EFFORT ESTIMATION
FEASIBILITY EVIDENCE
Activities | Time Spent (Hours)
Nonrecurring Cost
Initial Client Meeting (1 meeting * 2 people * 1.5 hours + 1 meeting * 1 person * 1 hour) | 4
Win-Win Negotiation Meetings (2 meetings * 2 people * 2 hours + 1 meeting * 1 person * 2 hours) | 10
Communication with development team (2 people * 2 hours/week * 12 weeks) | 48
Architecture Review Board Meeting (1 meeting * 1 person * 1 hour) | 1
Weekly status update (2 people * 0.5 hours * 4 weeks) | 4
Training STBI contractors | 2
Total Time | 69
Cost (estimated at $43/hour) | $2,968
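The person-hour figures above follow from the formulas given next to each activity. A small sketch that re-evaluates them is shown here; the dictionary keys are shortened labels, and the 2 hours for training is taken directly from the table since no formula is given for it.

```python
# Hours per activity, computed as meetings * people * hours (or people * hours/week * weeks).
activity_hours = {
    "Initial client meeting": 1 * 2 * 1.5 + 1 * 1 * 1,
    "Win-win negotiation meetings": 2 * 2 * 2 + 1 * 1 * 2,
    "Communication with development team": 2 * 2 * 12,
    "Architecture review board meeting": 1 * 1 * 1,
    "Weekly status update": 2 * 0.5 * 4,
    "Training STBI contractors": 2,  # stated directly, no formula given
}

total_hours = sum(activity_hours.values())

for name, hours in activity_hours.items():
    print(f"{name}: {hours:g} hours")
print(f"Total: {total_hours:g} hours")
```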
Current activities & resources used | % Reduce | Money Saved (Dollars/Year)
Nonrecurring Benefit: Manual Data Entry & Data Ingestion | 80 | 19,500
Recurring Benefit: Manual Data Entry & Data Ingestion & Update database | |
Total | | 19,500
Year | Cost | Benefit (Effort Saved) | Cumulative Cost | Cumulative Benefit | ROI
2014 | 2,968 | 0 | 2,968 | 0 | -1
2015 | 6,500 | 19,500 | 9,468 | 19,500 | 1.06
2016 | 7,150 | 21,450 | 16,618 | 40,950 | 1.46
2017 | 7,865 | 23,595 | 24,483 | 64,545 | 1.64
2018 | 8,651 | 25,954 | 33,134 | 90,499 | 1.73
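The cumulative columns and the ROI column can be recomputed from the yearly cost and benefit figures. The sketch below assumes the usual definition ROI = (cumulative benefit - cumulative cost) / cumulative cost, which the slide does not state explicitly, and it assumes a benefit of 0 for 2014, the year in which only the initial cost is incurred.

```python
from itertools import accumulate

years = [2014, 2015, 2016, 2017, 2018]
costs = [2968, 6500, 7150, 7865, 8651]
benefits = [0, 19500, 21450, 23595, 25954]   # 2014 benefit assumed to be 0

# Running totals of cost and benefit up to and including each year.
cumulative_costs = list(accumulate(costs))
cumulative_benefits = list(accumulate(benefits))

# Assumed definition: net cumulative benefit relative to cumulative cost.
roi = [
    round((benefit - cost) / cost, 2)
    for cost, benefit in zip(cumulative_costs, cumulative_benefits)
]

for year, cost, benefit, value in zip(years, cumulative_costs, cumulative_benefits, roi):
    print(f"{year}: cumulative cost {cost}, cumulative benefit {benefit}, ROI {value}")
```

Under this definition the ROI turns positive in 2015, the first year in which cumulative benefit exceeds cumulative cost.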
Risks
Risk 1: One player may have information on different websites, and two players may have the same name on different websites, causing data duplication or inaccuracy in the database.
  Risk exposure: potential magnitude 5, probability of loss 3, risk exposure 15
  Mitigation: Mark the source of the data when it is ingested into the database, and use a "duplicate" attribute to indicate whether a duplicate exists for this player. An STBI contractor then resolves the duplicate manually.
Risk 2: Because the list of posts and the comments on each post may be very long, fetching a player's data from Facebook is slow.
  Risk exposure: potential magnitude 7, probability of loss 3, risk exposure 21
  Mitigation: Fetch only the posts, and the comments on those posts, from the last 6 months for a player.
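The two mitigations can be sketched as small helper functions. This is an illustrative sketch only: the function names, the record layout (dictionaries with name, source, and created keys), and the approximation of 6 months as 183 days are assumptions, not details taken from the project.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_duplicates(players):
    """Set 'duplicate' to True on every record whose name occurs more than once."""
    by_name = defaultdict(list)
    for player in players:
        # Compare names case-insensitively and ignore surrounding whitespace.
        by_name[player["name"].strip().lower()].append(player)
    for records in by_name.values():
        is_duplicate = len(records) > 1
        for record in records:
            record["duplicate"] = is_duplicate
    return players


def recent_posts(posts, now, window=timedelta(days=183)):
    """Keep only posts created within the window (about 6 months) before now."""
    cutoff = now - window
    return [post for post in posts if post["created"] >= cutoff]


# Hypothetical records; each one keeps the source it was crawled from.
players = [
    {"name": "Alex Example", "source": "site-a.example"},
    {"name": "alex example ", "source": "site-b.example"},
    {"name": "Sam Sample", "source": "site-a.example"},
]
flagged = flag_duplicates(players)

posts = [
    {"id": 1, "created": datetime(2018, 10, 1)},
    {"id": 2, "created": datetime(2018, 1, 15)},
]
kept = recent_posts(posts, now=datetime(2018, 11, 17))

print([p["duplicate"] for p in flagged])
print([p["id"] for p in kept])
```

The duplicate flag does not decide which record is correct. It only marks candidates so that a contractor can review them, as the mitigation describes.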
QUALITY FOCAL POINT
METRIC- BURN DOWN CHART
METRIC- TEST PASS COVERAGE
TECHNICAL DEBT
Causes | Solutions
Lack of domain experience (Python and PostgreSQL) | Learning and training
Vague requirements | Winbook and negotiation
Technology volatility | Choose the most stable API and prototype
TRACEABILITY MATRIX
OCD Requirement | Win Condition | SSAD/Use case | Test Cases
OC-1 Crawl predefined websites to gather team information | WC_3473 | UC02 | TC-01-01
OC-2 Crawl predefined websites to gather player information | WC_3472 | UC05 | TC-02-01
OC-3 Crawl Social Media: Facebook | WC_3416 | UC06 |
OC-4 Crawl Social Media: Twitter | WC_3417 | |
OC-5 Ingest Data into PostgreSQL Database | WC_3495 | | TC-03-01
OC-6 STBI Contractor UI to update the player data | WC_3398 | UC06, UC04, UC08 |
OC-7 STBI Contractor UI to add, delete, update the specific websites visited, fields to capture from the website and frequency of crawler refreshes for each specified website | | UC01, UC02, UC03, UC07 | TC-01-01
TEST IDENTIFICATION
Test Identifier: TC-01 Gather team information
Test Level: Software item level
Test Class: Capability Test
Test Completion Criteria: Team information should be gathered from a webpage correctly and match the expected information that we have gathered by hand.
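The completion criterion amounts to a field-by-field comparison between the crawled record and a record gathered by hand. One way to express that check is sketched below; the function name field_mismatches and the sample team values are hypothetical and are not the project's actual test data.

```python
def field_mismatches(expected, actual):
    """Return {field: (expected_value, actual_value)} for fields that differ or are missing."""
    mismatches = {}
    for name, want in expected.items():
        got = actual.get(name)
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


# Hand-gathered reference values (hypothetical).
expected_team = {"team-name": "Example FC", "coach": "A. Coach", "stadium": "Example Park"}

# What the crawler returned (hypothetical), with one wrong and one missing field.
crawled_team = {"team-name": "Example FC", "coach": "B. Coach"}

problems = field_mismatches(expected_team, crawled_team)
test_passed = not problems

print(problems)
print("PASS" if test_passed else "FAIL")
```

Reporting the mismatched fields, rather than a single pass or fail flag, makes it easier to see which field of the webpage the crawler read incorrectly.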
TEST CASE
TEST IDENTIFICATION
Test Identifier: TC-02 Gather player information
Test Level: Software item level
Test Class: Capability Test
Test Completion Criteria: Player information should be gathered from a webpage correctly and match the expected information that we have gathered by hand.
TEST CASE
TEST IDENTIFICATION
Test Identifier: TC-03 Update player information
Test Level: Software item level
Test Class: Capability Test
Test Completion Criteria: When player information is updated, the data in the DB should match the updated data.
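A check of this kind can be sketched as an update followed by a read-back. To keep the example self-contained, it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL; the players table, its columns, and the helper names are hypothetical. With a PostgreSQL driver such as psycopg2 the placeholder style would be %s instead of ?.

```python
import sqlite3

ALLOWED_COLUMNS = ("name", "team", "position")


def update_player(conn, player_id, changes):
    """Apply the given column changes to one player row."""
    unknown = set(changes) - set(ALLOWED_COLUMNS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    # Column names come from the allow-list above; values are bound as parameters.
    assignments = ", ".join(f"{column} = ?" for column in changes)
    conn.execute(
        f"UPDATE players SET {assignments} WHERE id = ?",
        [*changes.values(), player_id],
    )
    conn.commit()


def fetch_player(conn, player_id):
    """Read one player row back as a dict."""
    row = conn.execute(
        "SELECT name, team, position FROM players WHERE id = ?", (player_id,)
    ).fetchone()
    return dict(zip(ALLOWED_COLUMNS, row))


# In-memory SQLite database standing in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT, team TEXT, position TEXT)"
)
conn.execute("INSERT INTO players VALUES (1, 'Alex Example', 'Example FC', 'Midfielder')")
conn.commit()

updated = {"team": "Sample United", "position": "Forward"}
update_player(conn, 1, updated)
stored = fetch_player(conn, 1)

# Completion criterion: every updated field in the DB matches the updated data.
matches = all(stored[column] == value for column, value in updated.items())
print(stored, matches)
```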
TEST CASE
FINAL PRODUCT DEMO
THANK YOU!!!