Kyle Tretina with a team led by Dr. Pattle P. Pun in collaboration with Mr. Ross Leung of CUHK: Analysis of the Positively Selected and Non-Positively Selected Non-Protein Coding Sequences of Chromosome 16

Presentation transcript:

Kyle Tretina with a team led by Dr. Pattle P. Pun in collaboration with Mr. Ross Leung of CUHK Analysis of the Positively Selected and Non-Positively Selected Non-Protein Coding Sequences of Chromosome 16

Introduction: Story of Evolutionary History. Bacteria < Fish < Primate < Human. Story: increasing organismal complexity as evolution proceeds

WHY? “But little Mouse, you are not alone, In proving foresight may be in vain: The best laid schemes of mice and men Go often askew, And leave us nothing but grief and pain, For promised joy!” – Robert Burns (1785)

Genetics Central Dogma: DNA → RNA → Protein. Complexity ~ Number of Genes? Humans ~30,000; Flies ~14,000

G-Value Paradox

Complexity (K) ~ Gene Number (N)? Relationship? proportional: K ~ N; polynomial: K ~ N^a; exponential: K ~ a^N; factorial: K ~ N! Jean-Michel Claverie: ON/OFF states, 2^30,000 / 2^14,000 ≈ 3 × 10^4,816
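As a rough check of the ON/OFF-state comparison above, the ratio of 2^30,000 to 2^14,000 can be evaluated with logarithms rather than by writing out the full integers. This is only a sketch using the gene counts quoted on the slides; the function name is illustrative and not from the presentation.

```python
import math

def on_off_state_ratio(genes_a: int, genes_b: int) -> tuple[float, int]:
    """Return (mantissa, exponent) such that
    2**genes_a / 2**genes_b is about mantissa * 10**exponent."""
    log10_ratio = (genes_a - genes_b) * math.log10(2)
    exponent = math.floor(log10_ratio)
    mantissa = 10 ** (log10_ratio - exponent)
    return mantissa, exponent

# Gene counts as quoted on the slides (humans ~30,000, flies ~14,000).
mantissa, exponent = on_off_state_ratio(30_000, 14_000)
print(f"2^30000 / 2^14000 is about {mantissa:.1f} x 10^{exponent}")
```

Under a pure ON/OFF model, the difference in gene number therefore corresponds to an enormous difference in the number of possible states, which is the point of the exponential relationship K ~ a^N.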

Goal Determine the role of non-coding DNA in gene regulation by looking at the functions of non-coding SNPs that are positively selected or non-positively selected on chromosome 16

Definitions. SNP: single nucleotide polymorphism; variable between populations; importance likely due to stability of variation. Selection: the phenomenon in which the organisms best adapted to their environment tend to survive and produce progeny. Gene-selection algorithm and neutral selection theory (wrench)

Methods Overview: HapMap Database → Selection Data → List of Chr16 SNPs; UCSC Genome Database Mirror → SNP flanking sequence; TRANSFAC → related transcription factor data for each SNP flanking sequence; PReMod → confirm results

HapMap Phase I Data. HapMap Project: an international effort to identify and catalog genetic similarities and differences in human beings (Haplotype Maps); also includes: Selection Data → List of Chr16 SNPs: ~25,000 non-positively selected, ~5,000 positively selected

UCSC Genome Browser. Genome.UCSC.edu: a website containing several reference sequences and tools for visual and computational analysis. Methods: enter each RSID (SNP identifier) from the list; note intersecting sequences; copy/paste sequences
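The manual step above amounts to cutting a window of bases around each SNP position. A minimal sketch of that operation on a local sequence string follows; the flank width, the toy sequence, and the function name are assumptions for illustration, since the slides do not state how many flanking bases were collected, and this is not the UCSC interface itself.

```python
def flanking_sequence(chrom_seq: str, snp_pos: int, flank: int = 25) -> tuple[str, str, str]:
    """Split a sequence around a 1-based SNP position.

    Returns (upstream, snp_base, downstream); flanks are clipped
    at the ends of the sequence.
    """
    if not 1 <= snp_pos <= len(chrom_seq):
        raise ValueError("SNP position is outside the sequence")
    i = snp_pos - 1  # convert to a 0-based index
    upstream = chrom_seq[max(0, i - flank):i]
    downstream = chrom_seq[i + 1:i + 1 + flank]
    return upstream, chrom_seq[i], downstream

# Toy sequence standing in for a stretch of chromosome 16 (hypothetical data).
toy = "ACGTACGTTAGCCGATAGCT"
up, base, down = flanking_sequence(toy, snp_pos=10, flank=4)
print(f"{up}[{base}]{down}")
```

Run over a list of SNP positions, a helper like this replaces the copy/paste step with a loop.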

UCSC Genome Browser Mirror. Efficiency: ~70 seq/hr for 1.5 yrs = ~1/3 of sequences gathered, vs. 2 hrs. Online instructions, but complicated data structure. Henry Ford: 1.1 million lines source code. Many thanks to Dr. Hayward (Wheaton College CS Faculty)

Sequences Collected Graph 1. The distributions of the positively selected SNPs used in the study across human chromosome 16 Graph 2. The distributions of the non-positively selected SNPs used in the study across human chromosome 16

TRANSFAC TRANSFAC: a relational database, available via the web as six flat files including various data concerning transcription factors, DNA-binding sites, and target genes Automation at CUHK

PReMod PReMod: a new database of genome-wide cis-regulatory module (CRM) predictions for both the human and the mouse genomes. Enter ranges for SNP sequences Look for same pattern as TRANSFAC

Analysis: MySQL tables; programmed scripts for word patterns (i.e. keywords, recurring identifiers), unique entries, progress statistics, and overlap between non-positively (N+) selected and positively (+) selected SNPs
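The unique-entry and overlap queries described above might look roughly like the following. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names, column names, and rows are all hypothetical; the SQL may need adjusting for a real MySQL schema.

```python
import sqlite3

# In-memory SQLite database standing in for the project's MySQL tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE positive_hits (rsid TEXT, factor TEXT);
    CREATE TABLE nonpositive_hits (rsid TEXT, factor TEXT);
""")
# Made-up example rows: (SNP identifier, matched transcription factor).
conn.executemany("INSERT INTO positive_hits VALUES (?, ?)",
                 [("rs1", "TF_A"), ("rs2", "TF_B"), ("rs3", "TF_B")])
conn.executemany("INSERT INTO nonpositive_hits VALUES (?, ?)",
                 [("rs4", "TF_B"), ("rs5", "TF_C")])

def column(query: str) -> list[str]:
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(query)]

# Unique entries within one dataset.
unique_positive = column(
    "SELECT DISTINCT factor FROM positive_hits ORDER BY factor")
# Entries that appear in both datasets (the overlap).
shared = column(
    "SELECT factor FROM positive_hits "
    "INTERSECT SELECT factor FROM nonpositive_hits ORDER BY factor")
# Entries found only in the positively selected set.
only_positive = column(
    "SELECT factor FROM positive_hits "
    "EXCEPT SELECT factor FROM nonpositive_hits ORDER BY factor")

print(unique_positive, shared, only_positive)
```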

Results

| SNP Selection | RS Numbers | Sequence Gathered |
|---|---|---|
| Non-Positive | 25, | (24%) |
| Positive | | (100%) |

Table 1. A summary of the manual SNP flanking sequence gathering from the UCSC Genome Browser

Results

| SNP Selection | Total | No Sites | Unique | Matches in Other Dataset | TRANSFAC Entries to Be Looked Up |
|---|---|---|---|---|---|
| Non-Positive | 25,594 | 1,611 (6%) | 3,218 (13%) | 20,765 (81%) | 82 (<1%) |
| Positive | 33,770 | 2,437 (7%) | 361 (1.0%) | 30,972 (92%) | 10,641 (32%) |
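The counts reported above can be checked for internal consistency. The sketch below assumes that the "No Sites", "Unique", and "Matches in Other Dataset" columns partition the total, which is an interpretation of the table and is not stated on the slide; the dictionary keys are illustrative names.

```python
# Counts copied from the results table; key names are illustrative.
rows = {
    "Non-Positive": {"total": 25_594, "no_sites": 1_611,
                     "unique": 3_218, "matches_other": 20_765},
    "Positive": {"total": 33_770, "no_sites": 2_437,
                 "unique": 361, "matches_other": 30_972},
}

def shares(row: dict) -> dict:
    """Percentage of the total for each category, to one decimal place."""
    total = row["total"]
    return {key: round(100 * value / total, 1)
            for key, value in row.items() if key != "total"}

for name, row in rows.items():
    parts = row["no_sites"] + row["unique"] + row["matches_other"]
    print(name, "categories sum to total:", parts == row["total"], shares(row))
```

Both rows add up exactly, and the computed shares agree with the rounded percentages on the slide.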

Conclusions: Data not all in yet. Possible implications: Central Dogma; Biology: information flow; Quantification; Genetic Natural Selection; Views of Complexity of Humans. Lesson learned: value of bioinformatics. High-volume data requires computational analysis, not manual

Acknowledgements: Many thanks to Dr. Pun, for letting me get involved in this project, for his vision and mentorship. Special thanks to Dr. Hayward, for putting in extra hours unpaid so that a student can follow his dreams of graduate school. Thanks to our collaborators at the Chinese University of Hong Kong – Dr. Tsui and Mr. Leung – for accessing the TRANSFAC database for us, and for being flexible to the demands of our project. The most thanks to God, for blessing me with the opportunity to work hard and learn. I pray that I might always be able to do these two things earnestly and voraciously.