Advanced Software Engineering PROJECT November 2015
1. Spark on Yarn and Mesos
- Stack: Yarn/Mesos + Spark + Spark Streaming
- Mesos: fine-grained and coarse-grained modes
- Application: retrieve Wikipedia posts as a stream
- Demonstrate how Yarn and Mesos help with resource scheduling
- Visualization through Ganglia
- Performance analysis and comparisons
- Max. 2 students
2. Sparkathon
- Build super-smart weather apps that harness the power of IBM Bluemix™ and Apache Spark™.
- Worldwide competition; entries must be submitted by January 20, 2016 (5:00 pm Eastern Time)
- Max. 4 students
- If your entry is shortlisted, the project gets full marks!
3. US Road Network Analysis
- GraphX + MLlib + Spark Streaming + Tableau
- Think about what you can do ^^
- Community detection: J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Internet Mathematics 6(1), 2009.
- Max. 3 students
4. Images on the Internet
- s/dl/p2.html
- Can you predict seasonal trends? News trends? Political trends? Or human behaviors?
- Max. 3 students
- Data visualization (Tableau)
5. Your Proposal!
- Your turn to choose one
- Technically sound
- Application oriented
- Talk with me next lecture!
Arrangement
- Nov. 23: let me know your choice in class
- Week 1-2: define your roles in the group, start literature research, and decide what to do
- Week 2-4: propose solutions (architectures, frameworks, open-source tools, steps)
- Week 5-8: implementation, performance analysis, and obtaining results
- Week 9: wrap up and spend a few days writing your report
- Jan. 16: project report to
- Jan. 18: project presentation
About Result Figures/Tables
Each Image:
- Clear x-y labeling
- At least 2 lines (one yours, one comparison)
- Each line has at least 5 connected points
Each Table:
- Clear x-y labeling
- At least 2 algorithms (one yours, one comparison)
- Each row/column has at least 5 fields
Report Outline
- Introduction: background, why this topic, and a brief explanation of what others have done
- Related Work: 10+ references (after 2012); describe what others have done and their advantages/disadvantages
- System Architecture
- Detailed Design: component design, algorithm design
- Simulation Results: figures/tables, discussions
- Conclusions and Future Work
- Written in English: +5!
Attention!
- Not just an engineering project
- Aiming to train your research capability
  - Look at what others have done. Are there existing problems? How can they be improved? How is the performance?
- Performance Analysis (10+ figures)
  - Yours: accuracy, throughput, delay, system bound
  - Comparisons: other algorithms, other systems
- OPEN SOURCE is the key!
- Grading is based on YOUR contributions!
IEEE Xplore:
Social Network Analysis Advanced Software Engineering
Key Players
- How to identify key/central nodes in a network
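One common way to look for key players is to rank nodes by a centrality score. The sketch below computes degree centrality and closeness centrality in plain Python; the slide does not say which measure is intended, so both are shown as illustrations. The function names and the `toy` network are hypothetical and not taken from the course material.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Reachable nodes divided by the sum of shortest-path distances to them."""
    result = {}
    for source in adj:
        # Breadth-first search gives shortest hop counts in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


# Hypothetical toy network: hub "a" with a short tail d - e.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d"},
}

degree = degree_centrality(toy)
closeness = closeness_centrality(toy)
key_player = max(degree, key=degree.get)
print(key_player, degree[key_player], closeness[key_player])
```

Different measures can rank nodes differently on real data, so it is worth comparing several before naming a node "central".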
Cohesion
- How to characterize a network’s structure
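Two simple numbers often used to describe how cohesive a network is are its density and its average clustering coefficient. The slide does not name specific metrics, so the following is only an illustrative sketch; the function names and the `toy` graph are assumptions made for this example.

```python
from itertools import combinations


def density(adj):
    """Existing edges divided by the number of possible edges (undirected)."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # each edge is stored twice
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to a.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

print(density(toy), average_clustering(toy))
```

A dense graph with high clustering suggests tightly knit groups, while low values of both point to a sparse, loosely connected structure.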
Example
- Facebook: 5.8 million users (2009), avg. 5.73 degrees, max. 12 degrees
- Twitter: 5.2 billion relationships, avg. 4.67 degrees
  - 50% of users are only 4 steps away
  - Almost everyone <5 steps
  - For any 1,500 random users, steps
- Erdős number: collaborative distance through paper co-authoring
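The "degrees" quoted above are shortest-path lengths between users. As a rough sketch of how an average and a maximum separation could be measured on a small graph, the code below runs a breadth-first search from every node. The function names and the `chain` graph are hypothetical; on networks with millions of users one would normally sample source nodes rather than search from all of them.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def separation_stats(adj):
    """Average and maximum shortest-path length over all connected pairs."""
    total = pairs = longest = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                longest = max(longest, d)
    if pairs == 0:
        return 0.0, 0
    return total / pairs, longest


# Hypothetical toy graph: a chain of five people, a - b - c - d - e.
chain = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

average_steps, max_steps = separation_stats(chain)
print(average_steps, max_steps)
```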
Experiment: Forwarding Letters in the US
Example: Social Evolution data set by MIT Media Lab
- 80 undergraduates with smart devices, moving around the campus
- Collects phone usage and student locations from October 2008 to June
- Phone usage:
  - 3.15 million records of Bluetooth scans
  - 3.63 million scans of WLAN access points
  - 61,100 call records
  - 47,700 logged SMS events
- Students provide offline, self-reported answers about their health habits, diet and exercise, weight changes, and political opinions during the presidential election campaign.
Contact graph: only links with more than 2,000 contacts between two students are shown. Bigger nodes indicate a higher betweenness centrality value for the corresponding participant. Thicker edges indicate a higher contact frequency between the connected nodes.
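A contact graph like this one can be approximated in two steps: drop links at or below the contact threshold, then score each remaining node by betweenness centrality. The sketch below uses Brandes' algorithm for unweighted, undirected graphs in plain Python. The student IDs and contact counts are made up for illustration and are not values from the Social Evolution data set; the function names are likewise assumptions.

```python
from collections import deque


def filter_contacts(contact_counts, threshold):
    """Keep only undirected links with more than `threshold` contacts."""
    adj = {}
    for (u, v), count in contact_counts.items():
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if count > threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted from both endpoints.
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical contact counts between four students (not real data).
contacts = {
    ("s1", "s2"): 2500,
    ("s2", "s3"): 3100,
    ("s3", "s4"): 2200,
    ("s1", "s3"): 900,
    ("s2", "s4"): 1500,
}

graph = filter_contacts(contacts, threshold=2000)
scores = betweenness_centrality(graph)
print(scores)
```

In a plot, the `scores` values would drive node size and the raw contact counts would drive edge thickness.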