Social Interactome Recommender Team Final Presentation


Social Interactome Recommender Team Final Presentation
Sarah Devlin, Mary Kwiatkowski, Bridget Larcher, Timothy Tran, Matt Blumen, Wilson Rhodes
CS 4624: Multimedia, Hypertext, and Information Access
Virginia Tech, Blacksburg, VA 24061
May 2, 2017

Overview
Social Interactome Project
Deliverables
Two Methods
  Top-Down
  Diffusion
Lessons Learned
[Image: Friendica logo]

Social Interactome: a research project that examines how social networks can be used to help substance abusers recover from their addiction. The goal of our project was to evaluate friendships using the homophily-based measures that two recovery buddies have in common, and to see how those shared traits contribute to their relationship.

Dataflow Diagram

As you can see from the diagram, our code used two different sources of information. The data collected from Experiment 1 Replicate 2 was stored in a database we could query, and a number of surveys were taken separately, which we obtained from our client. These surveys required manual cleaning, mostly because of formatting issues, as well as automated cleaning to check the accuracy of user input. The top-down method depended on the automated cleaning being complete, whereas the diffusion code relied solely on clean data from the database and required no additional cleaning. This let us divide and conquer the two methods to produce the output as efficiently as possible.

Data Cleaning
1536 total buddy pairs
900 correctly reported buddies
288 incorrectly reported buddies
approx. 348 non-answers
5 very close, 50 somewhat close
Figure: Sample data from the 'different' cleaning dataframe

At the start of the study, users filled out a survey that asked them to report their 6 assigned buddies. To make the top-down analysis more accurate, we needed to check and clean this survey data. We compared the buddies reported in the survey with the buddies actually assigned in the database. From this we were able to find 288 incorrectly listed buddies, which allowed us to ignore data from users whose reports were shown to be inaccurate. We noticed some trends in the incorrect data. For example, users were more likely to report 5 or 6 incorrect buddies than just one or two. This trend can be seen in the sample data shown to the right. The survey also asked users to rate the "closeness" they felt toward each of their six buddies on a scale of not close, somewhat close, and very close. After cleaning, we had narrowed the entries down to only 5 very close and 50 somewhat close pairs.
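The comparison described above can be sketched roughly as follows. This is only an illustration, not the team's actual cleaning code: the function name `classify_reports`, the dictionary layout, and the user IDs are hypothetical, and a non-answer is assumed to be an empty or missing entry. Note that the slide totals are consistent with 256 participants reporting 6 buddies each (900 + 288 + 348 = 1536 = 256 x 6).

```python
from collections import Counter


def classify_reports(assigned, reported, slots=6):
    """Tally survey-reported buddies against the assigned buddies.

    assigned: dict mapping user id -> set of assigned buddy ids (from the database)
    reported: dict mapping user id -> list of survey answers (buddy id, "" or None)
    slots:    number of buddies each user was asked to report
    All names and the data layout are hypothetical, chosen for illustration.
    """
    counts = Counter()
    for user, true_buddies in assigned.items():
        answers = list(reported.get(user, []))[:slots]
        # Users who skipped the survey or left slots blank count as non-answers.
        answers += [None] * (slots - len(answers))
        for answer in answers:
            if answer is None or str(answer).strip() == "":
                counts["non_answer"] += 1
            elif answer in true_buddies:
                counts["correct"] += 1
            else:
                counts["incorrect"] += 1
    return counts


# Tiny made-up example with 2 slots per user instead of 6.
assigned = {"u1": {"u2", "u3"}, "u2": {"u1", "u3"}, "u3": {"u1", "u2"}}
reported = {"u1": ["u2", "u9"], "u2": ["", "u1"]}
print(classify_reports(assigned, reported, slots=2))
```

Counting per slot rather than per user keeps the total equal to users times slots, which makes it easy to check the tallies against the expected number of buddy pairs.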

Top-Down

After Mary cleaned the data, it was given to me to work on what we called the top-down approach. Using the data, I looked at users who identified recovery buddies as "somewhat close" or "very close". Among these relationships, I compared the users' demographic information, looking for trends. We expected stronger trends or clearer similarities between users, but the false data we received made that difficult. The number of somewhat close and very close relationships was also quite low, and it is hard to draw conclusions from a sample this small. We also did not account for ordinality in this comparison. For example, a user in the lowest income bracket and a user in the second-lowest bracket are treated as not similar, exactly as a user in the lowest bracket and a user in the highest bracket would be. In the future, we want users who are closer on the income spectrum to be viewed as more similar to each other.

Figure: The demographic similarities compared to the total number of close users in both networks
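To make the ordinality point concrete, here is a minimal sketch contrasting the exact-match comparison described above with one possible ordinal alternative. It is an assumption about how the future work could look, not something the team implemented: the function names, the linear scaling, and the choice of five income brackets are all hypothetical.

```python
def exact_similarity(a, b):
    """Exact-match comparison: brackets either match or they do not."""
    return 1.0 if a == b else 0.0


def ordinal_similarity(a, b, levels):
    """Hypothetical ordinal comparison: closer brackets score higher.

    a, b:   bracket indices in the range 0 .. levels - 1
    levels: total number of ordered brackets (an assumed value)
    """
    if levels < 2:
        raise ValueError("need at least two ordered levels")
    return 1.0 - abs(a - b) / (levels - 1)


# Lowest vs. second-lowest bracket, assuming 5 brackets.
print(exact_similarity(0, 1), ordinal_similarity(0, 1, levels=5))
# Lowest vs. highest bracket.
print(exact_similarity(0, 4), ordinal_similarity(0, 4, levels=5))
```

Under exact matching both pairs score the same, while the ordinal version separates neighbouring brackets from opposite ends of the scale.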

Diffusion

Diffusion, in the context of our project, is the spread of activity from one user to another. We measured the number of times each user participated in an activity after one of their recovery buddies had participated in the same activity. You can see these relationships in the network diagrams shown here, where the size of each node varies with the amount of diffusion that user experiences. You can identify the users who have the most impact on their buddies' participation in Friendica. It was really interesting to see how one activity in particular could diffuse through a group of buddies. It's also interesting to see how the diffusion mimics the structure of the network. In Lattice, users are connected with many overlapping friends, which shows up as many small diffusion networks. In Small World, users are connected so that the net distance from one friend to another is minimized, which shows up as one large interconnected network.

Figure: Diffusion network diagram, Lattice organization
Figure: Diffusion network diagram, Small World organization
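The diffusion measure described above could be computed along these lines. This is a sketch under stated assumptions rather than the team's implementation: the event tuple format, the `buddies` mapping, and the rule that each event is counted at most once are all hypothetical choices.

```python
from collections import defaultdict


def diffusion_counts(events, buddies):
    """Count, per user, the events that follow a buddy's earlier event
    of the same activity.

    events:  list of (user, activity, timestamp) tuples (hypothetical format)
    buddies: dict mapping each user to the set of their recovery buddies
    """
    # Earliest time each user took part in each activity.
    first_seen = {}
    for user, activity, ts in events:
        key = (user, activity)
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts

    counts = defaultdict(int)
    for user, activity, ts in events:
        for buddy in buddies.get(user, ()):
            earlier = first_seen.get((buddy, activity))
            if earlier is not None and earlier < ts:
                counts[user] += 1
                break  # count each event once, however many buddies came first
    return dict(counts)


# Made-up example: timestamps are plain integers.
events = [
    ("a", "video", 1),
    ("b", "video", 2),
    ("b", "video", 3),
    ("c", "quiz", 1),
    ("a", "quiz", 5),
]
buddies = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
print(diffusion_counts(events, buddies))
```

The per-user counts could then drive the node sizes in a network diagram like the ones shown on the slide.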

Lessons Learned
Changing goals and requirements call for additional communication
Team size can be a disadvantage without structured roles
Being provided with some false data meant cleaning and working with limited data

There were 3 major lessons we learned from problems we encountered over the course of our project.

Problem: Our requirements changed over the course of the semester. We were originally going to extend a previous team's friend-recommending algorithm, but as needs changed, we shifted to more of an analysis role. From there, we focused on our top-down and diffusion methods to evaluate friendships using homophily-based measures.
Solution: Because our requirements changed so much, we met with our client, Prashant, very often towards the end of the semester.

Problem: The size of our team was another drawback. It complicated communication, scheduling meetings, and delegating work. It was also difficult for all of us to meet at once and stay focused.
Solution: We learned to delegate work based on our team members' capabilities and schedules, making use of frequent communication and Doodle polls. With a more organized structure of roles and clearer communication, we overcame our initial work stagnation.

Problem: We had access to little applicable data: a single self-reported survey with only 256 participants. Some of the data we were given could not be used because participants entered false information. This may have been because portions of the experiment were incentivised. Some entries were missing, and some survey participants disregarded instructions and entered unrelated data.
Solution: We wrote code to clean the data and remove false entries. As Mary mentioned, approximately one fourth of our data was incorrect and had to be removed. Of the remaining records, only 55 suited our needs and had self-reported friendships.

Acknowledgements
Client: Prashant Chandrasekar
Professor: Dr. Edward A. Fox
Social Interactome Team
Grant: National Institutes of Health #1R01DA039456-01

Citations
https://github.com/tttran/SIFriendRecommender
https://github.com/friendica/friendica/tree/develop/images
https://vtechworks.lib.vt.edu/handle/10919/70956