Welcome to BIO 345 Introduction to Bioinformatics Professor Andrew Michaelson
What is Bioinformatics? The Sequence and Structure of Genes and Proteins (figure labels: Life Sciences, Mathematics, Informatics). Adapted from: http://www.baylor.edu/content/imglib/1/2/7/7/127758.png
The Human Genome displayed as a Karyotype
Each chromosome contains a single DNA molecule © 1998-2014 Mayo Foundation for Medical Education and Research (MFMER)
Four bases (adenine, cytosine, guanine, and thymine) are the primary source of information responsible for all animal and plant life. © 1998-2014 Mayo Foundation for Medical Education and Research (MFMER)
A genetic map of Human Chromosome 7. Location of the gene responsible for Cystic Fibrosis in 90% of people of Northern European descent.
Sequence of the Cystic Fibrosis Gene: CFTR
The deletion of these three base pairs is responsible for the disease.
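To see why losing exactly three base pairs matters, here is a minimal Python sketch. It assumes a short illustrative fragment (`normal`) of five codons around the deletion site; the exact bases, the partial codon table, and the helper names `translate` and `delete_bases` are assumptions for illustration and should be checked against the RefSeq record for CFTR. Because three bases make one codon, the reading frame is preserved and a single amino acid is lost.

```python
# Partial codon table: only the codons that occur in this small fragment.
CODON_TO_AA = {"ATC": "I", "ATT": "I", "TTT": "F", "GGT": "G", "GTT": "V"}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string three bases at a time."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TO_AA[codon] for codon in codons)


def delete_bases(dna: str, start: int, length: int) -> str:
    """Return dna with `length` bases removed, starting at 0-based index `start`."""
    return dna[:start] + dna[start + length:]


# Illustrative fragment (an assumption, not copied from the slide).
normal = "ATCATCTTTGGTGTT"
mutant = delete_bases(normal, start=5, length=3)  # removes "CTT"

print(translate(normal))  # IIFGV
print(translate(mutant))  # IIGV
```

The mutant protein fragment is one amino acid shorter (the F is gone), while every amino acid after the deletion is unchanged. Deleting one or two bases instead would shift the reading frame and change everything downstream.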
Bioinformatics is an interdisciplinary field that uses computer programs to answer biological questions. The Sequence and Structure of Genes and Proteins (figure labels: Biology, Chemistry, Physics, Mathematics, Statistics/Computer Science). Adapted from: http://www.baylor.edu/content/imglib/1/2/7/7/127758.png
Learning Objectives for the Course: Be able to search and retrieve sequence information from sequence databases such as NCBI and Ensembl. Be able to annotate simple nucleic acid sequences. Be able to use fundamental programs such as BLAST, BLAT, and Clustal Omega. Be able to establish evolutionary relationships among species through sequence comparisons.
Course Design
1st assignment (you should do this today): Send an email from your Farmingdale account to Michaea@Farmingdale.edu. Let me know which biology and computer classes you have taken, and which areas of biology you are most interested in or your favorite classes so far.
Learning Objectives: Be aware of class policies and procedures. Establish familiarity with the NCBI website. Establish familiarity with the Ensembl website. Learn to obtain and read a FASTA sequence file. Use PubMed to generate literature searches.
Nucleic acid sequence databases are housed at NCBI
National Center for Biotechnology Information (NCBI) : http://www.ncbi.nlm.nih.gov/
Pulldown menu: displayed items will change.
NCBI Bookshelf Offers Free Textbooks
Some Recommended Textbooks
PubMed is an excellent way to find health-related journal articles
PubMed search is similar to other NCBI search options
Searching for nucleic acid sequences using NCBI Nucleotide
Using Advanced Search Builder
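The Advanced Search Builder assembles a query out of field-tagged terms. As a rough sketch of the same idea in Python, the snippet below builds a query string with the `[Gene Name]` and `[Organism]` field tags and wraps it in a URL for NCBI's E-utilities `esearch` service (database name `nuccore` for Nucleotide). The helper names `build_query` and `build_esearch_url` are hypothetical, and nothing is sent over the network here; the code only constructs the text you could paste into the search box or a browser.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(gene: str, organism: str) -> str:
    """Combine a gene name and an organism into one field-tagged query."""
    return f'{gene}[Gene Name] AND "{organism}"[Organism]'


def build_esearch_url(term: str, database: str = "nuccore", max_results: int = 20) -> str:
    """Return an esearch URL for `term`; this does not contact NCBI."""
    params = {"db": database, "term": term, "retmax": max_results}
    return f"{ESEARCH_URL}?{urlencode(params)}"


query = build_query("CFTR", "Homo sapiens")
url = build_esearch_url(query)

print(query)
print(url)
```

The printed query can be typed directly into the NCBI Nucleotide search box; the URL form is what a script would request if you later automate searches.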
Results
Sequence databases use accession numbers. An accession number is a label used to identify a sequence. It is a string of letters and/or numbers that corresponds to a molecular sequence.
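As an illustration, RefSeq accession numbers follow a recognizable pattern: a two-letter prefix, an underscore, a run of digits, and an optional version after a dot. The sketch below checks a string against that pattern and reports the molecule type for a few common prefixes. The function name `describe_accession` and the example accessions are made up for illustration, and the prefix table is deliberately incomplete.

```python
import re

# Two capital letters, underscore, digits, optional ".version".
REFSEQ_PATTERN = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")

# A few common RefSeq prefixes (not a complete list).
REFSEQ_PREFIXES = {
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
    "NC": "complete genomic molecule",
    "NG": "genomic region",
}


def describe_accession(accession: str) -> dict:
    """Report whether a string looks like a RefSeq accession and what it labels."""
    text = accession.strip()
    match = REFSEQ_PATTERN.match(text)
    if match is None:
        return {"accession": text, "is_refseq": False}
    prefix, _digits, version = match.groups()
    return {
        "accession": text,
        "is_refseq": True,
        "molecule": REFSEQ_PREFIXES.get(prefix, "other RefSeq type"),
        "version": int(version) if version else None,
    }


# Hypothetical accessions, used only to exercise the function.
print(describe_accession("NM_000000.1"))
print(describe_accession("U12345"))
```

A string that does not match is not necessarily invalid; it may simply be a non-RefSeq accession from another database.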
Filters can help restrict results. RefSeq = Reference Sequences. These are the gold standard of sequences: they are the most likely sequences to be complete and correct.
For any assignment or project you should always use the RefSeq sequence
For human gene information, Ensembl.org has a better interface. The current version of the Human Genome sequence is GRCh38 = Genome Reference Consortium (GRC) human build 38.
Searching for genes in Ensembl
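The same gene search can also be expressed as a web address. As a hedged sketch, the snippet below builds a URL for the Ensembl REST service's symbol lookup endpoint; the helper name `ensembl_lookup_url` is hypothetical, CFTR is used only because it appears earlier in these slides, and the code builds the address without contacting Ensembl.

```python
from urllib.parse import quote

# Base address of the Ensembl REST service.
ENSEMBL_REST = "https://rest.ensembl.org"


def ensembl_lookup_url(symbol: str, species: str = "homo_sapiens") -> str:
    """Return the lookup-by-symbol URL for a gene; no request is made."""
    path = f"/lookup/symbol/{quote(species)}/{quote(symbol)}"
    return f"{ENSEMBL_REST}{path}?content-type=application/json"


print(ensembl_lookup_url("CFTR"))
```

For this course the website search box is all you need; the URL form is mainly useful if you later want a script to retrieve the same record.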
Search Results
Ensembl Gene Record
Location of the gene on the chromosome
Location of the gene on the chromosome. The molecular location: the position of the gene in base pairs on chromosome 12. The cytogenetic location: the position of the gene described by the banding pattern of chromosome 12. Reading the cytogenetic location: chromosome 12, short arm, region 1, band 2, sub-band 1 (the p arm is the short arm; the q arm is the long arm).
Location of the gene on the chromosome. The molecular location: the position of the gene in base pairs on chromosome 12. The cytogenetic location: the position of the gene described by the banding pattern of chromosome 12. Reading the cytogenetic location: chromosome 12, short arm, region 1, band 1, sub-band 2, and sub-sub-band 2.
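The reading rules for a cytogenetic location can be written down as a small parser. This is a minimal sketch that assumes the two locations described above are written in the usual compact form as 12p12.1 and 12p11.22 (reconstructed from the descriptions, not copied from the slide images). The function name `parse_cytogenetic_location` is hypothetical, and the pattern covers only simple single-band locations.

```python
import re

# chromosome, arm (p or q), region digit, band digit, optional ".sub-bands"
BAND_PATTERN = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)(?:\.(\d+))?$")


def parse_cytogenetic_location(location: str) -> dict:
    """Split a location such as '12p12.1' into its named parts."""
    match = BAND_PATTERN.match(location.strip())
    if match is None:
        raise ValueError(f"unrecognized cytogenetic location: {location!r}")
    chromosome, arm, region, band, sub = match.groups()
    return {
        "chromosome": chromosome,
        "arm": "short" if arm == "p" else "long",
        "region": int(region),
        "band": int(band),
        # Each digit after the dot is one further level of sub-band.
        "sub_bands": [int(digit) for digit in sub] if sub else [],
    }


print(parse_cytogenetic_location("12p12.1"))
print(parse_cytogenetic_location("12p11.22"))
```

Note that the digits are read one at a time: "12.1" after the arm means region 1, band 2, sub-band 1, not "band twelve".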
You will need to capture image information for this course. The Snipping Tool allows you to take a snapshot of anything you want.
Downloading images from Ensembl. Move your cursor over this icon. This will allow you to download an image as various file types. The images can then be copied and pasted into your document.
The transcript table provides links to sequences
The RefSeq link will direct you to the NCBI nucleotide record for that gene. Follow this hyperlink.
NCBI nucleotide record: title for the record
NCBI nucleotide record: size of the sequence
NCBI nucleotide record: this is the accession number for this specific sequence (there are other sequence files associated with this gene).
FASTA is the universal sequence file type. The > at the start of the top line marks the definition line.
The definition line can be composed of an identifier and a description. This is a compound identifier that contains the GI (GenInfo) number and the RefSeq accession number. The information found after the final | is the description.
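To make the parts of a definition line concrete, here is a small sketch that splits a compound identifier of the form `gi|number|ref|accession| description` on the `|` character. The function name `parse_definition_line` and the example line are invented for illustration (the numbers are placeholders, not a real record). Lines without the `gi|...|ref|...|` layout fall back to treating the first word as the identifier.

```python
def parse_definition_line(line: str) -> dict:
    """Split a FASTA definition line into identifier parts and a description."""
    if not line.startswith(">"):
        raise ValueError("a definition line must start with '>'")
    body = line[1:].strip()
    fields = body.split("|")
    # Compound identifier layout: gi | GI number | ref | RefSeq accession | description
    if len(fields) >= 5 and fields[0] == "gi" and fields[2] == "ref":
        return {
            "gi": fields[1],
            "refseq": fields[3],
            "description": "|".join(fields[4:]).strip(),
        }
    # Fallback: a plain identifier followed by a space and the description.
    identifier, _, description = body.partition(" ")
    return {"gi": None, "refseq": identifier, "description": description.strip()}


# Placeholder definition line, made up for this example.
example = ">gi|000000000|ref|NM_000000.0| Hypothetical example gene, mRNA"
print(parse_definition_line(example))
```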
The sequence lines follow the identifier. These are the sequence lines for the file. For a nucleic acid sequence they will be a string of A, C, G, and T. Each line has 70 characters (the last line may be shorter).
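Putting the definition line and the sequence lines together, a FASTA file can be read with a few lines of Python. This is a minimal sketch rather than a full parser: the helper names `read_fasta` and `wrap_sequence` are hypothetical, and the example record is a made-up 160-base sequence, not real data.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Return (definition line without '>', joined sequence) for each record."""
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def wrap_sequence(sequence: str, width: int = 70) -> list[str]:
    """Break a sequence into lines of `width` characters (the last may be shorter)."""
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


# Made-up record: 160 bases written as 70-character lines.
made_up_sequence = "ACGT" * 40
fasta_text = ">example_1 made-up sequence\n" + "\n".join(wrap_sequence(made_up_sequence))

records = read_fasta(fasta_text)
name, sequence = records[0]
print(name, len(sequence))
print([len(line) for line in wrap_sequence(sequence)])
```

Joining the sequence lines before working with them matters: the line breaks are only for display and are not part of the sequence.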
To access the worksheet: Go to my website to download the worksheet: http://andrew-michaelson.com/fweb/lab_website/Bio345/Bio345.html Make sure to save the file to your USB disk and enable editing. Use your USB/thumb drive to save your work as you go. Don’t save in the bioinformatics folder! Download, complete, print out, and email the worksheet to me. It must be handed in by the end of class. All assignments and papers in this course must be emailed to me.