A Research Collaboratory for Open Source Software Research Yongqin Gao, Matt van Antwerp, Scott Christley, Greg Madey Computer Science & Engineering University.

Slides:



Advertisements
Similar presentations
SACNAS, Sept 29-Oct 1, 2005, Denver, CO What is Cyberinfrastructure? The Computer Science Perspective Dr. Chaitan Baru Project Director, The Geosciences.
Advertisements

Research CU Boulder Cyberinfrastructure & Data management Thomas Hauser Director Research Computing CU-Boulder
1 Cyberinfrastructure Framework for 21st Century Science & Engineering (CIF21) NSF-wide Cyberinfrastructure Vision People, Sustainability, Innovation,
1 Cyberinfrastructure Framework for 21st Century Science & Engineering (CF21) IRNC Kick-Off Workshop July 13,
The Experience Factory May 2004 Leonardo Vaccaro.
By Chris Zachor.  Introduction  Background  Open Source Software  The SourceForge community and network  Previous Work  What can be done different?
NAACSOS 2005Scott Christley, Temporal Analysis of Social Positions An Algorithm for Temporal Analysis of Social Positions Scott Christley, Greg Madey Dept.
Community of Science The Leading Internet Site for Researchers Worldwide
MS DB Proposal Scott Canaan B. Thomas Golisano College of Computing & Information Sciences.
Is 'Designing' Cyberinfrastructure - or, Even, Defining It - Possible? Peter A. Freeman National Science Foundation January 29, 2007 The views expressed.
Online Collaboratory for NOM Research: Agent-based Simulations, Data-Mining, and Knowledge-Discovery Madey, G.R., Cabaniss, S.E., Maurice, P.A., Xiang,
Towards Understanding: A Study of the SourceForge.net Community using Modeling and Simulation Yongqin Gao Greg Madey Computer Science & Engineering University.
A Web-based Collaboratory for Supporting Environmental Science Research Xiaorong Xiang Yingping Huang Greg Madey Department of Computer Science and Engineering.
Supported in part by the National Science Foundation – ISS/Digital Science & Technology Analysis of the Open Source Software development community using.
SCHOOL OF INFORMATION UNIVERSITY OF MICHIGAN The Coming of Cyberinfrastructure Gary M. Olson.
Chapter 5 Application Software.
Analysis and Modeling of the Open Source Software Community Yongqin Gao, Greg Madey Computer Science & Engineering University of Notre Dame Vincent Freeh.
Scientific Data Infrastructure in CAS Dr. Jianhui Scientific Data Center Computer Network Information Center Chinese Academy of Sciences.
Internet Basics Dr. Norm Friesen June 22, Questions What is the Internet? What is the Web? How are they different? How do they work? How do they.
CI Days: Planning Your Campus Cyberinfrastructure Strategy Russ Hobby, Internet2 Internet2 Member Meeting 9 October 2007.
Fundamentals of Information Systems, Fifth Edition
1st Workshop on Intelligent and Knowledge oriented Technologies Universal Semantic Knowledge Middleware Marek Paralič,
VCP Virtual Community Portal Bruxelles, February 19-20, 2004 Claudio Beltrame.
Mehdi Ghayoumi Kent State University Computer Science Department Summer 2015 Exposition on Cyber Infrastructure and Big Data.
Markus Dolensky, ESO Technical Lead The AVO Project Overview & Context ASTRO-WISE ((G)A)VO Meeting, Groningen, 06-May-2004 A number of slides are based.
1 What is the history of the Internet? ARPANET (Advanced Research Projects Agency Network) TCP/IP (Transmission Control Protocol/Internet Protocol) NSFNET.
What is Cyberinfrastructure? Russ Hobby, Internet2 Clemson University CI Days 20 May 2008.
Research and Educational Networking and Cyberinfrastructure Russ Hobby, Internet2 Dan Updegrove, NLR University of Kentucky CI Days 22 February 2010.
The PROGRESS Grid Service Provider Maciej Bogdański Portals & Portlets 2003 Edinburgh, July 14th-17th.
CceHUB An Environment for Collaborative Cancer Research Ann Christine Catlin CCE Annual Retreat May 26, 2010 clinical dataobservational & scientific data.
Topology and Evolution of the Open Source Software Community Advisors: Dr. Vincent W. Freeh Dr. Kevin Bowyer Supported in part by the National Science.
© Paradigm Publishing Inc. 5-1 Chapter 5 Application Software.
© Internet 2012 Internet2 and Global Collaboration APAN 33 Chiang Mai 14 February 2012 Stephen Wolff Internet2.
Interoperability Grids, Clouds and Collaboratories Ruth Pordes Executive Director Open Science Grid, Fermilab.
Grid Architecture William E. Johnston Lawrence Berkeley National Lab and NASA Ames Research Center (These slides are available at grid.lbl.gov/~wej/Grids)
Geosciences - Observations (Bob Wilhelmson) The geosciences in NSF’s world consists of atmospheric science, ocean science, and earth science Many of the.
National Center for Supercomputing Applications Barbara S. Minsker, Ph.D. Associate Professor National Center for Supercomputing Applications and Department.
Agent 2004Scott Christley, Public Goods Theory of Open Source Community Public Goods Theory of the Open Source Development Community using Agent-based.
6/12/99 Java GrandeT. Haupt1 The Gateway System This project is a collaborative effort between Northeast Parallel Architectures Center (NPAC) Ohio Supercomputer.
This presentation describes the development and implementation of WSU Research Exchange, a permanent digital repository system that is being, adding WSU.
GEON2 and OpenEarth Framework (OEF) Bradley Wallet School of Geology and Geophysics, University of Oklahoma
Cyberinfrastructure What is it? Russ Hobby Internet2 Joint Techs, 18 July 2007.
GRID Overview Internet2 Member Meeting Spring 2003 Sandra Redman Information Technology and Systems Center and Information Technology Research Center National.
NEES Cyberinfrastructure Center at the San Diego Supercomputer Center, UCSD George E. Brown, Jr. Network for Earthquake Engineering Simulation Analyzing.
Presented by Jens Schwidder Tara D. Gibson James D. Myers Computing & Computational Sciences Directorate Oak Ridge National Laboratory Scientific Annotation.
NEES Cyberinfrastructure Center at the San Diego Supercomputer Center, UCSD George E. Brown, Jr. Network for Earthquake Engineering Simulation Enabling.
Lucas Taylor LHC Grid Fest CERN 3 rd Oct Collaborating at a distance 1 Collaborating at a Distance Lucas Taylor ▪ LHC Grid Fest ▪ 3 Oct 2008 ▪ CERN.
NEES Cyberinfrastructure Center at the San Diego Supercomputer Center, UCSD George E. Brown, Jr. Network for Earthquake Engineering Simulation NEES TeraGrid.
Mercury – A Service Oriented Web-based system for finding and retrieving Biogeochemical, Ecological and other land- based data National Aeronautics and.
1 Title: Introduction to Computer Instructor: I LTAF M EHDI.
Digital Library The networked collections of digital text, documents, images, sounds, scientific data, and software that are the core of today’s Internet.
IT and Network Organization Ecommerce. IT and Network Organization OPTIMIZING INTERNAL COLLABORATIONS IN NETWORK ORGANIZATIONS.
Project Management May 30th, Team Members Name Project Role Gint of Communications Sai
Cyberinfrastructure Overview Russ Hobby, Internet2 ECSU CI Days 4 January 2008.
Cyberinfrastructure: Many Things to Many People Russ Hobby Program Manager Internet2.
Internet2 Applications Group: Renater Group Presentation T. Charles Yun Internet2 Program Manager, Applications Group 30 October 2001.
Bio-IT World Conference and Expo ‘12, April 25, 2012 A Nation-Wide Area Networked File System for Very Large Scientific Data William K. Barnett, Ph.D.
A RESEARCH SUPPORT SYSTEM FRAMEWORK FOR WEB DATA MINING Jin Xu, Yingping Huang, Gregory Madey Department of Computer Science and Engineering University.
NORDUnet NORDUnet e-Infrastrucure: Grids and Hybrid Networks Lars Fischer CTO, NORDUnet Fall 2006 Internet2 Member Meeting, Chicago.
The Earth Information Exchange. Portal Structure Portal Functions/Capabilities Portal Content ESIP Portal and Geospatial One-Stop ESIP Portal and NOAA.
The Global Organization for Earth System Science Portals Don Middleton National Center for Atmospheric Research; Boulder, Colorado, USA Scientific Computing.
Lake Arrowhead 2005Scott Christley, Understanding Open Source Understanding the Open Source Software Community Presented by Scott Christley Dept. of Computer.
The Virtual Observatory and Ecological Informatics System (VOEIS): Using RESTful architecture and an extensible data model to provide a unique data management.
Chapter 8: Web Analytics, Web Mining, and Social Analytics
InSilicoLab – Grid Environment for Supporting Numerical Experiments in Chemistry Joanna Kocot, Daniel Harężlak, Klemens Noga, Mariusz Sterzel, Tomasz Szepieniec.
Virtual Laboratory Amsterdam L.O. (Bob) Hertzberger Computer Architecture and Parallel Systems Group Department of Computer Science Universiteit van Amsterdam.
Information Technology Ms. Abeer Helwa
Grid Application Model and Design and Implementation of Grid Services
The Database Environment
Radical Collaboration between Computer Science and Archival Science to Educate Next Generation Archivists Jane Zhang Catholic University of America 2019.
Presentation transcript:

A Research Collaboratory for Open Source Software Research Yongqin Gao, Matt van Antwerp, Scott Christley, Greg Madey Computer Science & Engineering University of Note Dame ICSE - FLOSS 2007 Minneapolis, MN May 21, 2007 Supported in part by the National Science Foundation, CISE/IIS-Digital Society & Technology, under Grant No

Presentation Outline Background Collaboratories CyberInfrastructure Research Virtual Organizations (RVOs) Research data Design Utilization statistics Summary

Background - FLOSS Research  Current FLOSS research methods include:  Simulation studies  Surveys/interviews  Software analysis  Large-scale data collection  Statistical analysis  Data mining  The next steps  Collaboratories  CyberInfrastructure  Research Virtual Organizations (RVOs)  Such virtual organizations supporting distributed communities go by numerous names: collaboratories, co-laboratories, grid communities, science gateways, science portals, and others (NSF, 2007)

Research Collaboratory What is a Collaboratory? Precursor to the idea of the NSF CyberInfrastructure (CI) Collection of shared data, information, analytical toolkits and communication technologies A networked organizational form that includes social processes, collaboration techniques and agreements on norms, principles, value, and rules Finholt, T. A., and Olson, G. M. (1997) From laboratories to collaboratories: A new organizational form for scientific collaboration. Psychological Science. 8(1),

 ~200 in a ‘05 taxonomy (  Bioinformatics - Genomic resources (data & tools) NCBI, FlyBase, Ensembl, VectorBase, WormBase, etc.  NEES - Network for Earthquake Engineering Simulation NEES is a shared national network of 15 experimental facilities, collaborative tools, a centralized data repository, and earthquake simulation software, all linked by the ultra-high- speed Internet2 connections of NEESgrid. These resources support collaboration and discovery in the form of more advanced research based on experimentation and computational simulations of the ways buildings, bridges, utility systems, coastal regions, and geomaterials perform during seismic events. Research Collaboratories

Research Collaboratories (cont) CLEANER – An environmental cyberinfrastructure that provides data archives, collaboration and networking among community members, and information technology for engineering modeling, analysis, and visualization of data – Includes a CyberCollaboratory: a collaborative space where communities of researchers, practitioners, and policy-makers, and others come together to share knowledge and information, analyze data, solve problems, and collaborate on publications. – The CLEANER Project uses the CyberCollaboratory to support over 100 researchers and educators.

Research Collaboratories (cont) FLOSS Research - examples include: – FLOSSmole “Screen scraped” data from SourceForge, FreshMeat, RubyForge, FSF, etc. – CVSAnalY - GSyC/LibreSoft CVS/Subversion statistical analysis tool – The SourceForge Research Data Archive Archive of SourceForge.net back-end database dumps Wiki-based collaboratory This presentation!

Research Data Description SourceForge.net – The largest OSS development community – 148,000+ registered projects – 1,586,000+ registered users – Project data – Downloads, bug reports, forum activity, developers, project characteristics, etc. – Developer data – Activity – Project membership

Research Data Description Our Data Set – 30 monthly dumps between January April – 488G total and growing at 12G/month. – Every dump has tables. – Tables have up to 30 million records. Hosting Environment – Dual Xeon 3.06GHz, 4G RAM, 2T RAID storage – Linux ELsmp with PostgreSQL 8.1

Design Presentation Tier Browser interface Wiki Logic Tier Authentication Schema browser Queries & download Data Tier PostgreSQL Monthly schema

Data Tier PostgreSQL Database - timeline Monthly schema: one for each dump Mirrors the SourceForge.net backend

Data Tier Connection pool Persistent connections for improved performance

Presentation Tier Various access methods Documentation and references Community support - FAQ, schema browser, table definitions Wiki interface

Schema Browser

Logic Tier Interactive web query system – Authorized user can submit query to the back end repository through the web query – Results are provided by files with various formats: text with various delimiters, XML – Dynamic web schema browser – Authorized user can access the dynamic schema of the repository through the schema browser

Utilization - Sample Monthly activity (June 2006) – Total queries submitted: 16,947 – Total data files retrieved: 13,343 – Total bytes of query data downloaded: 26,684,556,278 Monthly activity (Feb 2007) – Total queries submitted: 38,659 – Total data files retrieved: 24,422 – Total bytes of query data downloaded: 13,048,335,165

Utilization Statistics

Summary SourceForge.net archive – – Access open to all academic/scholarly researchers - sublicense Plans – Programmable access method should be provided for complicated access – Web services in testing phase – Analysis/data mining tools, preselected data sets, etc. – FLOSS CyberInfrastructure - network of collaboratories? – FLOSS Research Virtual Organization (FLOSS-RVO)

Thank You! GNU