Introduction to the Microsoft Biology Foundation (MBF) Microsoft Biology Initiative Module 02.

Presentation transcript:

Introduction to the Microsoft Biology Foundation (MBF) Microsoft Biology Initiative Module 02

Agenda Introduction to MBF ▫ What is MBF? ▫ Common usage scenarios MBF Architecture ▫ Sequences, alphabets and symbols ▫ Parsers and formatters ▫ Introduction to algorithms MBF Starter Project ▫ Creating a new C# project MBF Source Code ▫ Building the source ▫ Testing with nUnit

What is MBF? Microsoft Biology Foundation (MBF) is a bioinformatics toolkit ▫ built on top of the .NET Framework 4.0 ▫ open source under the Ms-PL license ▫ foundation upon which other tools can be built Provides various components useful for biological analysis ▫ parsers and formatters to read and write common bioinformatics formats ▫ support for DNA, RNA and protein sequences ▫ algorithm framework for analysis and transformation ▫ web connector framework for web-service interaction

What is MBF intended to do? Primarily focused on genomics ▫ reusable data structures to represent sequences + symbols ▫ I/O framework to load/save sequences ▫ algorithm framework to process loaded sequences Provides an alternative to other biology frameworks ▫ similar concepts to BioJava or BioPerl ▫ takes advantage of Microsoft developer tools and .NET ▫ will evolve as Microsoft and other contributors add features Designed to manipulate large data sets ▫ in-memory compression of sequence data ▫ data virtualization for sequences larger than memory ▫ scalable algorithms that take advantage of multiple cores
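The slides do not say how MBF compresses sequence data in memory. As a rough illustration of the general idea only, the sketch below packs unambiguous DNA into 2 bits per symbol. `TwoBitDna`, `Pack` and `Unpack` are hypothetical names, not MBF types, and the encoding is an assumption.

```csharp
using System;

// Hypothetical helper, not part of MBF: stores unambiguous DNA (A, C, G, T)
// in 2 bits per symbol, i.e. four symbols per byte.
public static class TwoBitDna
{
    private const string Symbols = "ACGT"; // index in this string = 2-bit code

    public static byte[] Pack(string dna)
    {
        if (dna == null) throw new ArgumentNullException("dna");
        byte[] packed = new byte[(dna.Length + 3) / 4];
        for (int i = 0; i < dna.Length; i++)
        {
            int code = Symbols.IndexOf(char.ToUpperInvariant(dna[i]));
            if (code < 0)
                throw new ArgumentException("Unsupported symbol: " + dna[i]);
            // Place the code in bits 0-1, 2-3, 4-5 or 6-7 of the target byte.
            packed[i / 4] |= (byte)(code << ((i % 4) * 2));
        }
        return packed;
    }

    // The symbol count must be supplied because the last byte may be partly used.
    public static string Unpack(byte[] packed, int length)
    {
        if (packed == null) throw new ArgumentNullException("packed");
        char[] result = new char[length];
        for (int i = 0; i < length; i++)
        {
            int code = (packed[i / 4] >> ((i % 4) * 2)) & 3;
            result[i] = Symbols[code];
        }
        return new string(result);
    }
}
```

For example, `TwoBitDna.Pack("ACGTAC")` needs 2 bytes instead of 6 characters, and `TwoBitDna.Unpack(packed, 6)` restores the original string. A scheme like this cannot hold gaps or ambiguity symbols, so a real framework would need a wider encoding or a fallback.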

MBF Design Goals Extensibility was a primary goal ▫ core concepts mapped as interfaces and abstract base classes (ABCs) ▫ can easily provide alternative implementations or add any missing features you need Language Neutral ▫ built on top of .NET – use any supported language ▫ supports dynamic languages such as IronPython Designed and implemented using best practices ▫ commented source code provided so nothing is a black box ▫ algorithms all cite publications Interoperability ▫ code can be run on several mainstream platforms

MBF vs. your application MBF is not an application in itself ▫ it does not provide any visualization of the data being managed ▫ it provides the basis for visualizations to be built on top of it [Diagram layers: Your Application, MBF, .NET Framework 4.0]

Creating Applications with MBF MBF allows you to work with your data however you need ▫ Console (Text) ▫ Win Forms & WPF ▫ ASP.NET / WCF ▫ Silverlight ▫ Azure ▫ NT Service

Example: Sequence Assembler ▫ sequence data is loaded from a FASTA file and assembled using MBF ▫ drawn as nucleotide symbols and graphics using WPF

Deploying your applications Possible to target non-Windows platforms [1] ▫ using Silverlight / Mono / Moonlight

Getting MBF MBF is available as an open source, free download ▫ Downloads section lists most recent build

MBF Licensing MBF is licensed under Ms-PL ▫ allows you to take the code and use it in academic or commercial products

Installing MBF Official releases are packaged as Setup Files ▫ include all the pre-built assemblies you can use immediately ▫ installs full .NET 4.0 framework if not already installed Several other tools available ▫ Sequence Assembler sample ▫ Excel add-in

Installing MBF Run the Setup program to install the MBF elements ▫ only supports Windows installs, other platforms require manual installation using source code ▫ creates %Program Files%\Microsoft Biology Initiative\1.0\MBF

MBF Installation Installer creates several files and directories ▫ \Doc directory holds documentation files ▫ \Addin directory contains optional algorithms ▫ \Sdk directory contains samples and additional documentation ▫ Bio.dll is core MBF assembly ▫ WebServiceHandlers.dll provides web service capabilities

Documentation Several documents supplied with installation (in \Doc) ▫ even more available online Two documents are required reading before you begin ▫ start with MBF_Overview.docx ▫ then read MBF_Programming_Guide.docx BioDotNet.chm help file provides API reference ▫ installed with SDK (full install)

Download the source code Source code is also available online at CodePlex ▫ can download as a pre-packaged .ZIP file [1] Can also apply to contribute to the framework [2] ▫ provides TFS credentials to get access to repository

Architecture: Namespaces
▫ Bio: Sequences, Alphabets, Alignments, Genomic Intervals, Phylogeny
▫ Bio.IO: FASTA / FASTQ, GenBank, NEXUS, …
▫ Bio.Algorithms: Translation, Alignment, Sequence Assembly, …
▫ Bio.Web: BLAST, ClustalW, BioHPC, …

MBF Core Types Bio namespace holds core types and base interfaces [Diagram labels: Alphabets, ISequence, DnaAlphabet, ISequenceItem, Nucleotide, Sequence]

Alphabets Valid symbols are defined in terms of an alphabet ▫ determines allowed characters and meaning Access standard alphabets through Instance properties ▫ supplies standard alphabets for DNA, RNA and protein sets ▫ can also access them using Alphabets static class Or create custom alphabets if necessary ▫ by implementing the IAlphabet interface
    var dnaAlphabet = DnaAlphabet.Instance;
    ...
    var dnaAlphabet2 = Alphabets.DNA;
These two statements retrieve the same alphabet
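To make the idea of "an alphabet determines the allowed characters" concrete, here is a minimal standalone sketch. `SimpleAlphabet` is a hypothetical class written for this illustration. It is not MBF's `IAlphabet` and covers only the basic symbols plus a gap, whereas a real alphabet also carries ambiguity symbols and per-symbol metadata.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an alphabet: a named set of allowed symbols.
// This is NOT MBF's IAlphabet; it only illustrates symbol validation.
public sealed class SimpleAlphabet
{
    // Shared instances, mirroring the "standard alphabet" idea from the slide.
    public static readonly SimpleAlphabet Dna = new SimpleAlphabet("DNA", "ACGT-");
    public static readonly SimpleAlphabet Rna = new SimpleAlphabet("RNA", "ACGU-");

    private readonly HashSet<char> symbols;

    public SimpleAlphabet(string name, string allowedSymbols)
    {
        if (allowedSymbols == null) throw new ArgumentNullException("allowedSymbols");
        Name = name;
        symbols = new HashSet<char>(allowedSymbols.ToUpperInvariant());
    }

    public string Name { get; private set; }

    // True when every character of the text belongs to this alphabet
    // (comparison is case-insensitive; an empty string is considered valid).
    public bool IsValid(string sequence)
    {
        if (sequence == null) throw new ArgumentNullException("sequence");
        foreach (char c in sequence)
        {
            if (!symbols.Contains(char.ToUpperInvariant(c)))
                return false;
        }
        return true;
    }
}
```

With this sketch, `SimpleAlphabet.Dna.IsValid("AGCT")` is true while `SimpleAlphabet.Dna.IsValid("AGCU")` is false, because `U` belongs only to the RNA set.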

Sequence Items ISequenceItem defines a single symbol ▫ supplies name, attributes and character used to represent symbol ▫ most common form is Nucleotide
    var dnaAlphabet = DnaAlphabet.Instance;
    ISequenceItem dnaG = dnaAlphabet.LookupBySymbol("G");
    Console.WriteLine("{0}: {1} {2} {3}, {4}, {5}",
        dnaG.Name, dnaG.IsAmbiguous, dnaG.IsGap,
        dnaG.IsTermination, dnaG.Symbol, dnaG.Value);
Output: Guanine: False False False, G, 0

Representing Sequences ISequence interface represents ordered list of sequence items ▫ store data relevant to DNA, RNA and Amino Acid structures ▫ can work with sequence as a list of items, or as a string
    public interface ISequence : IList<ISequenceItem>
    {
        string ID { get; }
        string DisplayID { get; }
        IAlphabet Alphabet { get; }
        object Documentation { get; set; }
        MoleculeType MoleculeType { get; }
        ...
        string ToString();
    }

Sequence Implementations Several ISequence implementations in the framework ▫ each optimized for a specific purpose, most common is Sequence

Creating new sequences Sequence type is most basic ISequence implementation ▫ created as read-only by default ▫ insert Nucleotide items to populate
    ISequence sequence = new Sequence(Alphabets.DNA, "AGCT");
    ...
    sequence.IsReadOnly = false;
    sequence.Add(DnaAlphabet.Instance.AC);      // ambiguity symbol, printed as M
    sequence.Add(new Nucleotide('-', "Gap"));
    ...
    sequence.RemoveAt(0);                       // drops the leading A
    Console.WriteLine(sequence);
Output: GCTM-

Working with string-based data Common to work with sequences as strings ▫ provides a readable representation of the data ▫ lose some information (gaps, terminators, etc.) ▫ not efficient for larger sequences
    // data.Reverse() is a LINQ extension method (requires using System.Linq;)
    void ProcessSequence(ISequence sequence)
    {
        string data = sequence.ToString();
        string reverse = new string(data.Reverse().ToArray());
        foreach (char symbol in reverse) { ... }
    }

Working with sequence-based data Better to work with real ISequenceItem data ▫ maintains full identity ▫ helper properties on ISequence perform common tasks [1]
    void ProcessSequence(ISequence sequence)
    {
        ISequence reverse = sequence.Reverse;
        foreach (ISequenceItem symbol in reverse) { ... }
    }
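The kind of work a helper such as `Reverse` does can be sketched on plain strings. `DnaStrings` and its methods are hypothetical names for this illustration and do not use MBF. The complement helpers are included only as further examples of "common tasks" and assume unambiguous DNA plus the gap character.

```csharp
using System;

// Hypothetical string-based helpers, not part of MBF.
public static class DnaStrings
{
    public static string Reverse(string dna)
    {
        if (dna == null) throw new ArgumentNullException("dna");
        char[] symbols = dna.ToCharArray();
        Array.Reverse(symbols);
        return new string(symbols);
    }

    // Watson-Crick complement of a single unambiguous DNA symbol.
    public static char Complement(char symbol)
    {
        switch (char.ToUpperInvariant(symbol))
        {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            case '-': return '-'; // gaps are carried through unchanged
            default:
                throw new ArgumentException("Unsupported symbol: " + symbol);
        }
    }

    public static string ReverseComplement(string dna)
    {
        char[] symbols = Reverse(dna).ToCharArray();
        for (int i = 0; i < symbols.Length; i++)
        {
            symbols[i] = Complement(symbols[i]);
        }
        return new string(symbols);
    }
}
```

Here `DnaStrings.Reverse("AGCT")` gives `"TCGA"` and `DnaStrings.ReverseComplement("AAGC")` gives `"GCTT"`. Note how the string version has to decide for itself what to do with gaps and unknown characters, which is the information an `ISequenceItem`-based API keeps for you.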

Reading and writing sequences Most common way to obtain a sequence is through a parser ▫ loads sequence data from some persistent storage ▫ tied to a specific format ▫ can load one or more sequences together ▫ can support metadata and statistics for sequence Once loaded, sequence can be processed ▫ through methods of ISequence, or by algorithms Finally, sequences are saved using formatters ▫ writes collection of ISequence objects to persistent storage MBF has several available parsers and formatters [1] ▫ contained in the Bio.IO namespace ▫ designed for extensibility – to support your formats

Loading sequences with parsers Several supplied parsers load common bio sequence formats ▫ FastA, FastQ, GenBank, Gff All sequence parsers implement ISequenceParser ▫ provides consistent interface to parsing data ▫ supports loading data from files and streams (more on this later)
    public interface ISequenceParser : IParser
    {
        IList<ISequence> Parse(string filename);
        IList<ISequence> Parse(string filename, bool isReadOnly);
        ...
        ISequence ParseOne(string filename);
        ISequence ParseOne(string filename, bool isReadOnly);
        ...
    }

Loading data from specific formats If file format is known, specific parser can be used to load data ▫ easiest and least error-prone way to load data
    private IList<ISequence> LoadSequence(string filename)
    {
        FastaParser parser = new FastaParser();
        IList<ISequence> data = parser.Parse(filename, true);
        return data;
    }
The second parameter opens the data in read-only mode for performance, indicating that change tracking is not necessary
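For readers unfamiliar with what a FASTA parser actually has to do, the following standalone sketch reads records from a `TextReader`. `FastaSketch` is a hypothetical class and is not MBF's `FastaParser`. It returns plain header/residue string pairs rather than `ISequence` objects and performs no alphabet validation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical minimal FASTA reader, not MBF's FastaParser.
public static class FastaSketch
{
    // Returns (header, residues) pairs in the order they appear in the input.
    public static List<KeyValuePair<string, string>> Parse(TextReader reader)
    {
        if (reader == null) throw new ArgumentNullException("reader");
        var records = new List<KeyValuePair<string, string>>();
        string header = null;
        var residues = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length == 0) continue; // ignore blank lines

            if (line[0] == '>')
            {
                // A new header closes the previous record, if there was one.
                if (header != null)
                    records.Add(new KeyValuePair<string, string>(header, residues.ToString()));
                header = line.Substring(1).Trim();
                residues.Length = 0;
            }
            else if (header != null)
            {
                residues.Append(line); // sequence data may span several lines
            }
            else
            {
                throw new FormatException("Sequence data found before the first '>' header line.");
            }
        }
        if (header != null)
            records.Add(new KeyValuePair<string, string>(header, residues.ToString()));
        return records;
    }
}
```

Calling `FastaSketch.Parse(new StringReader(">seq1\nAGCT\nAGGT\n>seq2\nTT\n"))` yields two records, the first with residues `AGCTAGGT`. Taking a `TextReader` keeps the sketch usable for both files and in-memory text.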

Handling multiple file formats SequenceParsers class manages built-in parser types ▫ can use FindParserByFile method to locate proper parser at runtime
    private IList<ISequence> LoadSequence(string filename)
    {
        ISequenceParser parser = SequenceParsers.FindParserByFile(filename);
        if (parser == null) return null;
        IList<ISequence> data = parser.Parse(filename, true);
        return data;
    }
FindParserByFile returns null if file could not be identified [1]
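The slides do not show how `FindParserByFile` identifies a file. One plausible approach, shown here purely as an assumption, is a lookup keyed on the file extension. `FormatLookup`, `FindFormatByFile` and the extension table are all hypothetical and are not taken from MBF.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical extension-based lookup, not MBF's SequenceParsers.
public static class FormatLookup
{
    // Assumed extension-to-format table; the real list may differ.
    private static readonly Dictionary<string, string> FormatsByExtension =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { ".fa", "FastA" },
            { ".fasta", "FastA" },
            { ".fq", "FastQ" },
            { ".fastq", "FastQ" },
            { ".gb", "GenBank" },
            { ".gbk", "GenBank" },
            { ".gff", "Gff" }
        };

    // Returns the format name, or null when the extension is not recognized,
    // mirroring the "returns null" contract described on the slide.
    public static string FindFormatByFile(string filename)
    {
        string extension = Path.GetExtension(filename);
        string format;
        if (!string.IsNullOrEmpty(extension) &&
            FormatsByExtension.TryGetValue(extension, out format))
        {
            return format;
        }
        return null;
    }
}
```

With this table, `FormatLookup.FindFormatByFile("reads.FASTQ")` returns `"FastQ"` and `FormatLookup.FindFormatByFile("notes.txt")` returns null, so callers must check for null just as in the slide's `LoadSequence` example.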

Interrogating the parser list SequenceParsers also provides enumerable list of parsers
    private IList<ISequence> TryLoadSequence(string filename)
    {
        IList<ISequenceParser> parsers = SequenceParsers.All;
        foreach (var parser in parsers)
        {
            // A parser that cannot read the file throws; try the next one.
            try { return parser.Parse(filename, true); }
            catch { }
        }
        return null;   // no parser could read the file
    }

Saving sequences back to files Formatters take sequences and persist them ▫ same formats supported: FastA, FastQ, GenBank and Gff Abstracted by ISequenceFormatter interface ▫ supports file-based and stream-based writing (more on this later)
    public interface ISequenceFormatter : IFormatter
    {
        void Format(ICollection<ISequence> sequences, string filename);
        void Format(ISequence sequence, string filename);
        string FormatString(ISequence sequence);
    }

Saving a sequence SequenceFormatters provides list of available formatters
    void SaveSequence(string filename, IList<ISequence> seqList)
    {
        ISequenceFormatter formatter = SequenceFormatters.FindFormatterByFile(filename);
        if (formatter != null)
        {
            formatter.Format(seqList, filename);
        }
    }
    void SaveFastASequence(string fname, IList<ISequence> seqList)
    {
        SequenceFormatters.Fasta.Format(seqList, fname);
    }
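As a counterpart to parsing, this standalone sketch shows the kind of output a FASTA formatter produces: a `>` header line followed by the residues wrapped at a fixed width. `FastaWriterSketch` is a hypothetical class, not MBF's formatter, and the `lineWidth` parameter is an assumption made for the illustration.

```csharp
using System;
using System.IO;

// Hypothetical minimal FASTA writer, not MBF's FASTA formatter.
public static class FastaWriterSketch
{
    // Writes one record: a '>' header line, then residues wrapped at lineWidth.
    public static void Write(TextWriter writer, string id, string residues, int lineWidth)
    {
        if (writer == null) throw new ArgumentNullException("writer");
        if (residues == null) throw new ArgumentNullException("residues");
        if (lineWidth <= 0) throw new ArgumentOutOfRangeException("lineWidth");

        writer.Write('>');
        writer.Write(id);
        writer.Write('\n');
        for (int start = 0; start < residues.Length; start += lineWidth)
        {
            int count = Math.Min(lineWidth, residues.Length - start);
            writer.Write(residues.Substring(start, count));
            writer.Write('\n');
        }
    }

    // String-returning variant, analogous in spirit to a FormatString method.
    public static string FormatString(string id, string residues, int lineWidth)
    {
        var writer = new StringWriter();
        Write(writer, id, residues, lineWidth);
        return writer.ToString();
    }
}
```

For example, `FastaWriterSketch.FormatString("seq1", "AGCTAGCTAG", 4)` produces a header line followed by the lines `AGCT`, `AGCT` and `AG`. Writing to a `TextWriter` lets the same code target a file or an in-memory string.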

Running algorithms on Sequences MBF provides a small collection of popular algorithms ▫ alignment, translation, assembly, … ▫ designed specifically to plug in new algorithms Bio.Algorithms is where all the algorithmic code is located ▫ each algorithm is given unique namespace … can also be supplied in separate assemblies that MBF locates and provides access to at runtime [1]

Using the algorithm classes Algorithms generally ▫ take one or more ISequence elements as input and return one or more ISequence elements as output Algorithms come in two forms ▫ static methods – to run simple algorithm on a single sequence ▫ instance classes – to run algorithms on 1+ sequences
    using Bio.Algorithms.Translation;
    ...
    ISequence DNAtoRNA(ISequence dnaSequence)
    {
        ISequence rnaSequence = Transcription.Transcribe(dnaSequence);
        return rnaSequence;
    }
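To show what a simple "static method" algorithm looks like in isolation, here is a string-based transcription sketch. `TranscriptionSketch` is a hypothetical class and not MBF's `Transcription`. It assumes the input is the coding strand and simply replaces thymine with uracil. The slides do not state which strand convention `Transcription.Transcribe` uses, so the results may differ.

```csharp
using System;
using System.Text;

// Hypothetical string-based transcription, not MBF's Transcription class.
// Assumption: the input is the coding strand, so T becomes U and all other
// symbols are copied unchanged.
public static class TranscriptionSketch
{
    public static string Transcribe(string dna)
    {
        if (dna == null) throw new ArgumentNullException("dna");
        var rna = new StringBuilder(dna.Length);
        foreach (char symbol in dna)
        {
            switch (symbol)
            {
                case 'T': rna.Append('U'); break;
                case 't': rna.Append('u'); break;
                case 'A': case 'C': case 'G':
                case 'a': case 'c': case 'g':
                case '-':
                    rna.Append(symbol);
                    break;
                default:
                    throw new ArgumentException("Not a DNA symbol: " + symbol);
            }
        }
        return rna.ToString();
    }
}
```

For example, `TranscriptionSketch.Transcribe("ATGCT")` returns `"AUGCU"`. The method is stateless and takes a single input, which is why a static method is a natural fit for this kind of algorithm.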

Using MBF in your applications Using MBF is as simple as adding a reference to Bio.dll ▫ can then begin consuming available types ▫ convenient to add assembly to project – to ensure it is available ▫ supplied distribution requires the full .NET 4.0 framework install

MBF Starter Project [Step 1] If you are starting fresh, you can use the MBF Starter Template ▫ added to Visual Studio 2010 when you installed MBF Select MBF Console Application from project types ▫ requires .NET Framework 4 and the Visual C# project type to be selected

MBF Starter Project [Step 2] Select options you want to use from MBF in your new app ▫ each checkbox will add standard methods for you to utilize ▫ one option provides simple text-file logging capability … click Finish to generate project

MBF Starter Project [Step 3] Add your code to the Main method ▫ call the supplied methods to get / save and manipulate the sequences
    class Program
    {
        static void Main(string[] args)
        {
            // TODO: Your Code Goes Here
        }
        // Exports a given sequence to a file in FastA format
        static void ExportFastA(ISequence sequence, string filename) { ... }
        // Parses a FastA file which has one or more sequences.
        static IList<ISequence> ParseFastA(string filename) { ... }
        // Write a given string to the application log.
        static void WriteLog(string matter) { ... }
        // Method to align two sequences using NeedlemanWunschAligner.
        public static IList AlignSequences(
            ISequence referenceSequence, ISequence querySequence) { ... }
    }

Contributing back to MBF MBF is an open source project ▫ Microsoft wants your ideas, contributions and feedback Useful to download the source code ▫ read through to get good coding ideas ▫ to extend or repurpose Consider contributing changes / features back to the project ▫ read MBF_Onboarding.doc (also available for download) All contributions must be released under Ms-PL license

Examining the Source Code Can retrieve source code from TFS repository [1] ▫ or as self-contained .zip file Solution MBI.sln contains all projects ▫ Bio – MBF core library ▫ Bio.Workflow ▫ WebServiceHandlers ▫ Unit tests ▫ Sample SDK code

Unit Testing MBF includes suite of unit tests for all components ▫ uses nUnit (www.nunit.org), included with distribution ▫ test cases are in separate unit-test assemblies ▫ any contributions to codebase must include unit tests

Running unit tests Running unit tests involves three steps
1. Execute nUnit.exe from public/ext/nunit/bin/net
2. Open the Bio.Tests.dll assembly with nUnit
3. Click the Run button to execute the unit tests

Writing your own unit tests Unit tests are just blocks of code written to test other code ▫ tests assumptions, edge cases, error cases, and functionality nUnit makes writing test cases easy ▫ uses .NET attributes to signal intent ▫ Assert class provides helper methods to test assertions
    [TestFixture]
    public class ReverseTests
    {
        [Test]
        public void TestReverse()
        {
            Sequence sequence = new Sequence(Alphabets.DNA, "AGCT");
            string reverse = sequence.Reverse.ToString();
            Assert.AreEqual("TCGA", reverse);
        }
    }
[TestFixture] identifies this as a unit test class; [Test] identifies this as a unit test method; Assert.AreEqual verifies the two strings are equal

Summary MBF framework is used to build bioinformatics applications ▫ open source ▫ highly extensible ▫ scalable ▫ flexible data architecture Based on .NET ▫ language agnostic ▫ allows any application style ▫ can use all the power and flexibility of .NET Sequences are the core concept in the framework ▫ contain sequence items [symbols] ▫ based on alphabets ▫ read/written using parsers/formatters ▫ passed as arguments and returned from algorithms