1st Workshop for Database/Web Resource Developers, The Arabidopsis Information Resource (TAIR)


1st Workshop for Database/Web Resource Developers, The Arabidopsis Information Resource (TAIR)
(for those currently developing, planning to develop, or interested in learning about developing biological databases)
Speaker 1: Sue Rhee, Carnegie Institution, Dept. Plant Biology
Speaker 2: Dan Weems, National Center for Genome Resources
Speaker 3: Neil Miller, National Center for Genome Resources
Speaker 4: Eva Huala, Carnegie Institution, Dept. Plant Biology
Speaker 5: Marga Garcia-Hernandez, Carnegie Institution, Dept. Plant Biology
January 10, 2004, Plant & Animal Genome XII

Goals of This Workshop
- Introduce the TAIR project
- Describe the system and human resources for developing/maintaining TAIR
- Present the reasons and approaches we took to develop and maintain TAIR
- Provide future directions for the main components of TAIR
- Address specific questions from the audience
- Panel discussion on general issues brought up by the audience

The Arabidopsis Information Resource (TAIR)
- Mission: An Arabidopsis community information management system to provide facile, unrestricted, and permanent access to accurate, up-to-date information about Arabidopsis biology
- A collaboration between Carnegie Institution and National Center for Genome Resources (NCGR)
- Started in 1999
- Supported by NSF, NIH, Carnegie Institution, and NCGR

[Figure: A short history of Arabidopsis databases. Timeline of AAtDB (Harvard/MIT), AtDB (Stanford), and TAIR (Carnegie/NCGR), plotting FTEs and registered users, with milestones: AGI genome sequencing, genome sequence released, TAIR-ABRC-AIMS merged, 2010 initiative, TAIR hosts first completed 2010 project (AFGC), renewal.]

Usage Statistics
Monthly: ~900,000 page views; ~30,000 IP addresses
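The slide reports only approximate monthly totals. As a back-of-the-envelope sketch, assuming a 30-day month and treating IP addresses as a rough proxy for visitors (neither assumption is stated on the slide), the averages work out to about 30 page views per IP address and about 30,000 page views per day:

```python
# Back-of-the-envelope ratios from the approximate monthly figures on the slide.
# Assumptions (not from the slide): a 30-day month, and IP addresses as a
# rough proxy for distinct visitors.
MONTHLY_PAGE_VIEWS = 900_000
MONTHLY_IP_ADDRESSES = 30_000
DAYS_PER_MONTH = 30  # assumed


def usage_ratios(page_views: int, ip_addresses: int, days: int) -> dict:
    """Return average page views per IP address and per day."""
    return {
        "views_per_ip": page_views / ip_addresses,
        "views_per_day": page_views / days,
    }


print(usage_ratios(MONTHLY_PAGE_VIEWS, MONTHLY_IP_ADDRESSES, DAYS_PER_MONTH))
# {'views_per_ip': 30.0, 'views_per_day': 30000.0}
```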

Major Data Types

Data type            Year 4
Alias                535,994
Assignment           476,933
FeatureAssignment    1,153,647
Attribution          2,916,590
Clone                348,300
CloneEnd             254,573
Communication        30,478
GeneModel            37,450
Locus                31,212
GeneAlias            81,607
GeneticMarker        3,700
Comment              257,352
ExternalLink         132,062
Sequence             597,105
Polymorphism         244,664
Organization         4,847
Person               12,518
Journal              1,028
Publication          20,605
Protein              32,225
ProteinDomain        14,956
Stock                383,348
StockOrders          17,620
SpeciesVariant       838

Examples of New Data Types Added in One Year

Data type              Year 4
Germplasm              197,688
Image                  1,390
Keyword                16,945
Gene Annotation        34,581
Taxon                  46
ExpressionSet          115
Array Hybridizations   515
ExpressionSummary      5,948,096

And many more new data types we are currently adding and planning to add!

What do we do?
- User support
- Attend meetings, provide workshops, write web content and publications
- Research into technologies, design the logic of software structure, implement software
- Software documentation, maintenance, enhancement
- Design use cases, requirements, and specifications for software (querying, visualizing, browsing, editing, importing, exporting, analyzing data)
- Conceptual and logical design of data model
- Physical implementation of the database structure
- Identify sources of data, define data to curate, establish curation methods, and curate data
- Communicate with data providers

Organization structure and management
- Hierarchical breakdown of goals → projects → tasks → subtasks
- Establishment of a project priority list and project leaders
- Individual project team members meet ad hoc or regularly
- Establishment of a 4-week cycle (goals are broken down into pieces that can be accomplished in 4 weeks, with follow-up once every two weeks)
- Quarterly in-person meeting for 2 full days to review/revise the projects, priorities, and overall goals

TAIR Budget versus Expenditure

Current Financial and Human Resources
GRANTS
- 7 active grants
- Original TAIR budget (incl. supplements): $3,901,561
- Additional budget from 6 other grants: $1,501,693
- Total budget of active grants for 5 yrs: $5,403,254
- Annual budget (direct cost): $1,080,650
PEOPLE
- Curators and assistants: 7.55 FTE
- Programmers and DB developer: 6.4 FTE
- Postdocs: 2 FTE
- DBA & SysAd: 0.1 FTE
- Web master: 0.7 FTE
- Outreach coordinator: 0.5 FTE
- Total: … FTE
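The grant figures on this slide can be cross-checked, and the FTE total (which did not survive extraction) can be recomputed from the listed roles. A minimal sketch, assuming the annual figure is the five-year total divided evenly and truncated to whole dollars, and that the total FTE is simply the sum of the roles listed:

```python
from decimal import Decimal

# Grant figures copied from the slide (USD).
ORIGINAL_TAIR_BUDGET = 3_901_561  # incl. supplements
OTHER_GRANTS_BUDGET = 1_501_693   # from 6 other grants
GRANT_YEARS = 5

# Staffing from the slide; Decimal avoids binary floating-point rounding.
FTE_BY_ROLE = {
    "Curators and assistants": Decimal("7.55"),
    "Programmers and DB developer": Decimal("6.4"),
    "Postdocs": Decimal("2"),
    "DBA & SysAd": Decimal("0.1"),
    "Web master": Decimal("0.7"),
    "Outreach coordinator": Decimal("0.5"),
}


def total_budget() -> int:
    """Five-year total of all active grants."""
    return ORIGINAL_TAIR_BUDGET + OTHER_GRANTS_BUDGET


def annual_budget() -> int:
    # Assumption: an even split over the grant years, truncated to whole dollars.
    return total_budget() // GRANT_YEARS


def total_fte() -> Decimal:
    # Assumption: the slide's total is the plain sum of the listed roles.
    return sum(FTE_BY_ROLE.values(), Decimal("0"))


print(total_budget())   # 5403254
print(annual_budget())  # 1080650
print(total_fte())      # 17.25
```

Under these assumptions the totals match the slide's $5,403,254 and $1,080,650, and the listed roles add up to 17.25 FTE.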

Alliances, Collaborations, Outreach
Active participation in:
- Gene Ontology Consortium: controlled vocabularies
- GMOD (Generic Model Organism Database): software
- Plant Ontology Consortium: controlled vocabularies
- BioCurator: literature curation
- Bay Area Database Curator Consortium: curation issues
Close collaboration with:
- TIGR: genome annotation
- ABRC and NASC: stocks
- GARNet (UK), Gent/VIB (Belgium), AtGenExpress (Germany): microarrays
- MetaCyc: metabolic pathways, reactions, compounds
- Cold Tolerance Project: microarrays, transcriptional regulation
- Unknown GFP Localization Project: protein localization, unknown proteins
Workshops:
- 14th International Conference on Arabidopsis Research
- American Society of Plant Biologists meeting
- Plant & Animal Genome XII Conference (PAG XII)
- Local workshops at Stanford and Berkeley

General Lessons Learned
1. Don’t underestimate the time needed for planning, researching available technologies, knowledge, and people, conceptualizing, and designing. (It takes more time and pain to ‘redo’ than to start slow!)
2. Collaboration between a programming-illiterate biologist and a biology-illiterate programmer is IDEAL. Make no assumptions.
3. While a matrix organization (person *--* projects) is unavoidable, minimize the number of projects per person at a given time.
4. Nothing is ever COMPLETED. Always leave room for maintenance and enhancement.
5. Find other groups that are dealing with similar goals and collaborate. Good ideas come from talking to others.

General Future Directions/Issues
- Connection to other plant databases and other web resources
- Data exchange formats (Excel, XML)
- Data presentation formats (XML, RDF)
- DB connection methods (CORBA, SOAP, BioMoby)
- Software sharing (Open Source, GMOD)
- Community curation (unresolved)
- Long-term sustainability (unresolved)
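The slide lists Excel and XML as candidate data exchange formats. As a minimal sketch using only Python's standard library, records can be written as CSV (a plain-text format that Excel opens, not Excel's native format) and as XML that parses back to the same records. The `LocusRecord` type and its fields are hypothetical and simplified; they are not TAIR's actual schema.

```python
import csv
import io
import xml.etree.ElementTree as ET
from dataclasses import astuple, dataclass, fields


@dataclass
class LocusRecord:
    """Hypothetical, simplified record; not TAIR's actual schema."""
    name: str
    chromosome: str
    gene_model_count: int


def to_csv(records: list[LocusRecord]) -> str:
    """Serialize records as CSV text, with a header row of field names."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([f.name for f in fields(LocusRecord)])
    for rec in records:
        writer.writerow(astuple(rec))
    return buf.getvalue()


def to_xml(records: list[LocusRecord]) -> str:
    """Serialize records as a simple XML document."""
    root = ET.Element("loci")
    for rec in records:
        el = ET.SubElement(root, "locus", name=rec.name)
        ET.SubElement(el, "chromosome").text = rec.chromosome
        ET.SubElement(el, "geneModelCount").text = str(rec.gene_model_count)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> list[LocusRecord]:
    """Parse XML produced by to_xml back into records."""
    root = ET.fromstring(text)
    return [
        LocusRecord(
            name=el.get("name"),
            chromosome=el.findtext("chromosome"),
            gene_model_count=int(el.findtext("geneModelCount")),
        )
        for el in root.findall("locus")
    ]


# Hypothetical example records, for illustration only.
records = [LocusRecord("EXAMPLE_LOCUS_1", "1", 2), LocusRecord("EXAMPLE_LOCUS_2", "3", 1)]
print(to_csv(records))
print(to_xml(records))
```

Checking that `from_xml(to_xml(records))` returns the original records is a cheap way to confirm that an exchange format loses no information before sharing it with other groups.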

Current People Involved
TAIR-Carnegie
- Director: Sue Rhee
- Head curator: Eva Huala
- Curators: Tanya Berardini, Margarita Garcia-Hernandez, Nick Moseyko, Suparna Mundodi, Leonore Reiser, Peifen Zhang
- Curator assistant: Brandon Zoeckler
- Programmers: Behzad Mahini, Danny Yoo, Iris Xu, Jessie Zhang
- Web master: Julie Tacklind
- Intern: Thomas Yan (San Jose State U.)
TAIR-NCGR
- Project leader and DB developer: Dan Weems
- Senior programmer: Neil Miller
- Programmer: Mary Montoya
- DB Administrator: Faye Schilkey
- Systems Administrator: Forrest Black

Speakers
- Speaker 2: Dan Weems, National Center for Genome Resources: “Design considerations while building the TAIR database”
- Speaker 3: Neil Miller, National Center for Genome Resources: “TAIR hardware and software architecture”
- Speaker 4: Eva Huala, Carnegie Institution, Dept. Plant Biology: “Public face of TAIR: User interface design and incorporation of community feedback”
- Speaker 5: Marga Garcia-Hernandez, Carnegie Institution, Dept. Plant Biology: “Data management and curation at TAIR”

Job Opportunity!
We are looking for a programmer at Carnegie Institution, Stanford, CA, to start immediately and participate in TAIR software development.
Tasks:
- Set up and maintain a structural genome annotation pipeline using existing open-source software, in collaboration with a curator
- Develop TAIR’s curation and user web applications in collaboration with other developers and curators
- Document software
- Present work at meetings
Requirements:
- Solid skills and several years of experience with Perl, J2SE, J2EE, relational databases (we use Sybase, MySQL, PostgreSQL), UNIX/Linux, and Apache
- Excellent written and verbal communication skills in English
- A team player

History of the Arabidopsis Community Database
– AAtDB (Harvard/MIT) (1 FTE)
– AIMS & ABRC (Ohio State, NSF)
– AAtDB becomes AtDB (Stanford, NSF) (2-5 FTEs); …-300 community members in the beginning
– 1999: TAIR transitions from AtDB (9 FTEs); … community members in the beginning
– 2001: TAIR merges with AIMS; … community members
– 2002: TAIR hosts the first completed functional genomics projects (AFGC, …, NSF)
– 2004: TAIR assumes maintenance of Arabidopsis genome annotation (TIGR, …, NSF)
– 2004: TAIR up for renewal for the next five years (15 FTEs); 12,500 community members