What are developers talking about? An analysis of topics and trends in Stack Overflow
Dennis Portengen


Authors
- Anton Barua (MSc student, Computing Science)
- Stephen W. Thomas (PhD, Computing Science)
- Dr. Ahmed E. Hassan (Business)

Goal of the paper
“Uncovering the main discussion topics, their underlying dependencies, and trends over time.” (Barua et al., 2012)
Four research questions:
- RQ1: What are the main discussion topics?
- RQ2: Does a question in one topic trigger answers in another?
- RQ3: How does developer interest change over time?
- RQ4: How does interest in specific technologies change over time?
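The trend questions (how developer interest, and interest in specific technologies, change over time) can be made concrete once LDA has given each post a topic-proportion vector θ. A minimal sketch, assuming each post comes with a creation month and its θ, averages θ over all posts in the same month so that each topic's share can be followed over time. The function names and input format are hypothetical, and this is not necessarily the paper's exact metric.

```python
from collections import defaultdict


def monthly_topic_share(posts, num_topics):
    """Average each topic's proportion over all posts created in the same month.

    posts: iterable of (month, theta) pairs, where month is a sortable label
    such as "2009-03" and theta is a list of num_topics topic proportions.
    Returns {month: [share of topic 0, ..., share of topic K-1]} in month order.
    """
    sums = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for month, theta in posts:
        row = sums[month]
        for k in range(num_topics):
            row[k] += theta[k]
        counts[month] += 1
    return {m: [s / counts[m] for s in sums[m]] for m in sorted(sums)}


def trend(shares, topic):
    """Return one topic's share for each month, in chronological order."""
    return [shares[m][topic] for m in sorted(shares)]


if __name__ == "__main__":
    # Hypothetical toy data: three posts over two months, two topics.
    posts = [
        ("2009-01", [0.8, 0.2]),
        ("2009-01", [0.6, 0.4]),
        ("2009-02", [0.3, 0.7]),
    ]
    shares = monthly_topic_share(posts, num_topics=2)
    print(trend(shares, topic=1))
```

Averaging proportions rather than counting posts lets a post that mixes two topics contribute partially to both.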

Main topics in the article
- Topic modelling: uses word frequencies and co-occurrence frequencies to build a model of related words
- LDA (Latent Dirichlet Allocation): a statistical technique that discovers topics in a collection of documents, modelling each document as a mixture of topics and each topic as a set of related words
- Simple idea: if 'planet', 'space', 'star', and 'orbit' tend to occur together, they indicate a topic related to astronomy
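To make the LDA idea concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling, one common inference method; the slides do not say which implementation the authors used. The hyperparameters `alpha` and `beta`, the iteration count, the seed, and the function names are assumptions chosen for a toy corpus, not values from the paper.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to a list of token lists with collapsed Gibbs sampling.

    Returns (theta, phi, vocab): theta[d][k] is the proportion of topic k in
    document d, and phi[k][w] is the probability of word vocab[w] in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and total tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for word in doc:
            k, w = rng.randrange(K), index[word]
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w, k = index[word], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return theta, phi, vocab


def top_words(phi, vocab, n=4):
    """List the n most probable words of each topic."""
    return [[vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
            for row in phi]


if __name__ == "__main__":
    # Hypothetical toy corpus: two "astronomy" posts and two "sockets" posts.
    docs = [
        ["planet", "star", "orbit", "space"],
        ["star", "space", "planet", "orbit"],
        ["socket", "compil", "error", "api"],
        ["error", "api", "socket", "compil"],
    ]
    theta, phi, vocab = lda_gibbs(docs, num_topics=2)
    print(top_words(phi, vocab))
```

On a corpus this small, the two vocabularies often end up in separate topics, but the split is not guaranteed for every seed; real runs use thousands of posts and a tuned number of topics K.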

Research Methodology

PDD (process-deliverable diagram)

Example result of pre-processing
Before pre-processing: I’ve been having issues getting C sockets API to work properly in C++. Specifically, although I am including sys/socket.h, I still get compile time errors telling me that AF_INET is not defined. Am I missing something obvious, or could this be related to the fact that I’m doing this coding on z/OS and my problems are much more complicated?
After pre-processing: Issu c socket api work properly c++ specif include sy socket.h compil time error af_inet defin miss obvious relat fact code z os problem complic
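The before/after pair suggests the usual steps: lowercasing, tokenizing, removing stop words, and stemming. The sketch below is an assumption about how such a pipeline might look, not the authors' code: `STOP_WORDS` is a tiny hypothetical list, and `crude_stem` is a simple suffix stripper standing in for a real stemmer such as Porter's, so its output only approximates the slide (for example, it yields `includ` where the slide shows `include`).

```python
import re

# Hypothetical, deliberately tiny stop-word list; real pipelines use a
# standard list with hundreds of entries.
STOP_WORDS = {
    "i", "i've", "i'm", "me", "my", "am", "is", "are", "be", "been", "having",
    "getting", "get", "to", "in", "on", "of", "the", "a", "an", "and", "or",
    "that", "this", "it", "still", "although", "telling", "not", "could",
    "doing", "much", "more", "something",
}

# Crude suffix list, tried in order (longer suffixes first); not a real stemmer.
SUFFIXES = ("ically", "ations", "ation", "ating", "ated", "ings", "ing",
            "ies", "ied", "ed", "es", "s", "e")

# Tokens keep characters common in identifiers: c++, c#, af_inet, socket.h
TOKEN = re.compile(r"[a-z0-9_+#']+(?:\.[a-z0-9_]+)*")


def crude_stem(word, min_stem=4):
    """Strip the first matching suffix if at least min_stem characters remain."""
    if word.endswith(("ss", "us")):  # leave words like "class" and "obvious" alone
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem the text of a post."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    tokens = (t.strip("'") for t in TOKEN.findall(text))
    return [crude_stem(t) for t in tokens if t and t not in STOP_WORDS]


if __name__ == "__main__":
    first = "I\u2019ve been having issues getting C sockets API to work properly in C++."
    print(" ".join(preprocess(first)))  # issu c socket api work properly c++
```

For the first sentence of the example post this reproduces the slide's output up to capitalization; the paper's pipeline would also have to strip HTML and code blocks from raw post bodies, which this sketch omits.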

Example output of LDA

Related literature, categorized in four fields:
- The general study of Q&A websites
- The study of Stack Overflow specifically
- The study of other social platforms for developers
- The use of LDA to study trends in software engineering data
Difference from these studies: this paper focuses on the textual content generated by users rather than on user activity.

Opinion
Strong points:
- Combines qualitative and quantitative techniques
- Large dataset
- Methodology applicable to other developer resources
Weak points:
- Methodology does not incorporate a predictive model
- Experimentation with the value of K (the number of topics) and the threshold δ
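The threshold δ decides when a post counts as belonging to a topic. A minimal sketch, assuming membership simply means that a post's proportion θ for a topic is at least δ (the paper's exact use of δ may differ), shows why the choice matters: the same proportions yield different topic sizes at different thresholds. Function names and the example proportions are hypothetical.

```python
def topic_members(thetas, delta):
    """Map each topic index to the posts whose proportion for it is >= delta."""
    members = {}
    for post_id, theta in thetas.items():
        for k, p in enumerate(theta):
            if p >= delta:
                members.setdefault(k, []).append(post_id)
    return members


def membership_sizes(thetas, num_topics, deltas):
    """For each candidate delta, count how many posts each topic receives."""
    return {
        delta: [len(topic_members(thetas, delta).get(k, [])) for k in range(num_topics)]
        for delta in deltas
    }


if __name__ == "__main__":
    # Hypothetical topic proportions for three posts and three topics.
    thetas = {
        "q1": [0.70, 0.20, 0.10],
        "q2": [0.45, 0.45, 0.10],
        "q3": [0.05, 0.15, 0.80],
    }
    print(membership_sizes(thetas, num_topics=3, deltas=[0.1, 0.5]))
    # {0.1: [2, 3, 3], 0.5: [1, 0, 1]}
```

With δ = 0.1 the post `q2` counts toward all three topics; with δ = 0.5 it counts toward none, so topic sizes, and any trends derived from them, shift with δ just as they do with K.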

Question time!