What is bioinformatics?


Presentation on theme: "What is bioinformatics?"— Presentation transcript:

1 What is bioinformatics?
Daniel Svozil

2 Definition
NCBI: Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned.
Wikipedia.org: The application of information technology and statistics to the field of molecular biology. The creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management, analysis and interpretation of biological data.
- Extraction of biological knowledge from complex data

3 Extraction of biological knowledge from data
Data: experimental, or from public databases
- convert data to knowledge
- generate new hypotheses
- design new experiments

4 Omes
genome – DNA sequence in an organism
transcriptome – mRNA of an entire organism
proteome – all proteins in an organism
metabolome – all metabolites in an organism
interactome – all molecular interactions in an organism
Diagram labels: Genome, Transcriptome, Proteome, Reactome, Metabolome, Tissue architectures, Cell interactions, Signaling, ……, Cell, Organism

5 Omes and Omics
Genomics
- Primarily sequences (DNA and RNA)
- Databanks and search algorithms
- Supports studies of molecular evolution
Proteomics
- Sequences (protein) and structures
- Mass spectrometry, X-ray crystallography
- Databanks, knowledge bases, visualization
Functional Genomics (transcriptomics)
- Microarray data
- Databanks, analysis tools, controlled terminologies
Systems Biology (metabolomics)
- Metabolites and interacting systems (interactomics)
- Graphs, visualization, modeling, networks of entities

6 “Omics”: high-throughput, high-noise data
“Omics” includes: Genomics, Transcriptomics, Proteomics, Metabolomics, Interactomics, ……
Measured by: Sequencing, Microarrays, LC/MS, NMR, Two hybrid, ……
Their data are: high-throughput, high-noise
To reduce noise: advanced pre-processing techniques → reliable high-throughput information
Techniques to analyze high-dimensional data and knowledgebases → biological knowledge → medical knowledge → improved health
source: Bios 560R Introduction to Bioinformatics, userwww.service.emory.edu/~tyu8/560R/560R_1.pptx

7 Key research in bioinformatics
- sequence bioinformatics
- structural bioinformatics
- systems biology: analysis of biological pathways to gain e.g. the understanding of disease processes

8 21st century – complex systems
- Designing (forward-engineering)
- Understanding (reverse-engineering)
- Fixing
Why is it so complex? Can we make sense of this complexity? How is it robust?

9 Studying genomes

10 Studying DNA
All of the following techniques are basic tools of biotechnology (genetic engineering).

11 Enzymes for DNA manipulation
Before the 1970s, the only way in which individual genes could be studied was by classical genetics. In the early 1970s, biochemical research provided molecular biologists with enzymes that could be used to manipulate DNA molecules in the test tube. Molecular biologists adopted these enzymes as tools for manipulating DNA molecules in pre-determined ways: to make copies of DNA molecules, to cut DNA molecules into shorter fragments, and to join them together again in combinations that do not exist in nature. These manipulations form the basis of recombinant DNA technology.

12 Recombinant DNA technology
The enzymes available to the molecular biologist fall into four broad categories:
- DNA polymerases – synthesize new polynucleotides complementary to an existing DNA or RNA template
- Nucleases – degrade DNA molecules by breaking the phosphodiester bonds
  - restriction endonucleases (restriction enzymes) – cleave DNA molecules only when a specific DNA sequence is encountered
- Ligases – join DNA molecules together
- End-modification enzymes – make changes to the ends of DNA molecules

13 Exonucleases and endonucleases
exonucleases - remove nucleotides from the ends of DNA and/or RNA molecules
endonucleases - make cuts at internal phosphodiester bonds
source: Brown T. A., Genomes. 2nd ed.

14 DNA cloning
DNA cloning (i.e. copying) – a logical extension of the ability to manipulate DNA molecules with restriction endonucleases and ligases.
vector – DNA sequence that naturally replicates inside bacteria. It consists of an insert (transgene) and a larger sequence serving as the backbone of the vector.
- plasmid (length of insert: 1-10 kbp), cosmid (40-45 kbp), BAC ( kbp), YAC ( Mbp)
- BAC – bacterial artificial chromosome, YAC – yeast artificial chromosome
expression vector (expression construct) – generally a plasmid that is used to introduce a specific gene into a target cell. Once the expression vector is inside the cell, the protein that is encoded by the gene is produced by the cellular transcription and translation machinery (ribosomal complexes). The plasmid is frequently engineered to contain regulatory sequences that act as enhancer and promoter regions and lead to efficient transcription of the gene carried on the expression vector. The goal of a well-designed expression vector is the production of large amounts of stable mRNA, and therefore proteins. Expression vectors require sequences that encode a polyadenylation tail, a minimal UTR sequence, and a Kozak sequence.

15 Vectors
plasmid
DNA molecule that is separated from, and can replicate independently of, the chromosomal DNA. Double stranded, usually circular, occurs naturally in bacteria. Serves as an important tool in genetics and biotechnology labs, where it is commonly used to multiply (clone) or express particular genes.
BAC (bacterial artificial chromosome)
It is a particular plasmid found in E. coli. A typical BAC can carry about 250 kbp.
YAC is a eukaryotic plasmid.
more about BAC:
source: wikipedia

16 DNA cloning
Figure labels: restriction endonuclease, ligase
An animal gene has been obtained as a single restriction fragment after digestion of a larger molecule with the restriction enzyme BamHI.
A small E. coli plasmid has been purified and treated with BamHI, which cuts the plasmid in a single position. The circular plasmid has therefore been converted into a linear molecule.
Mix the two DNA molecules together and add DNA ligase. Various recombinant ligation products will be obtained, one of which comprises the circularized plasmid with the animal gene inserted into the position originally taken by the BamHI restriction site.
If the recombinant plasmid is now re-introduced into E. coli, and the inserted gene has not disrupted its replicative ability, then the plasmid plus inserted gene will be replicated and copies passed to the daughter bacteria after cell division.
More rounds of plasmid replication and cell division will result in a colony of recombinant E. coli bacteria, each bacterium containing multiple copies of the animal gene. This series of events constitutes the process called DNA or gene cloning.
source: Brown T. A., Genomes. 2nd ed.
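The cut-and-ligate steps can be mimicked on strings. This is only a minimal sketch: it tracks a single strand, assumes one insert orientation, and uses made-up toy sequences. The function names are hypothetical; the only biological facts relied on are the BamHI recognition sequence GGATCC and its cut position after the first G.

```python
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence
CUT_OFFSET = 1         # BamHI cuts between the first G and GATCC


def site_positions(circular_seq: str) -> list[int]:
    """Start positions of BamHI sites in a circular molecule (one strand)."""
    # Append the first few bases so a site spanning the origin is also found.
    wrapped = circular_seq + circular_seq[: len(BAMHI_SITE) - 1]
    return [i for i in range(len(circular_seq)) if wrapped.startswith(BAMHI_SITE, i)]


def linearize(plasmid: str) -> str:
    """Cut a circular plasmid at its single BamHI site; return the linear molecule."""
    sites = site_positions(plasmid)
    if len(sites) != 1:
        raise ValueError("expected exactly one BamHI site in the plasmid")
    cut = (sites[0] + CUT_OFFSET) % len(plasmid)
    return plasmid[cut:] + plasmid[:cut]


def ligate(linear_vector: str, fragment: str) -> str:
    """Join vector and insert; the returned string is read as a circular molecule."""
    return linear_vector + fragment


# Toy sequences (hypothetical, far shorter than real plasmids or genes).
plasmid = "AAAAGGATCCTTTT"
animal_gene_fragment = "GATCCATGCATGCG"  # a BamHI fragment: starts GATCC, ends G

linear = linearize(plasmid)                 # "GATCCTTTTAAAAG"
recombinant = ligate(linear, animal_gene_fragment)
print(len(recombinant), site_positions(recombinant))
```

In this sketch the recombinant molecule carries a BamHI site at each vector-insert junction, which is why the insert could later be cut out again with the same enzyme.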

17 PCR – Polymerase chain reaction
DNA cloning results in the purification of a single fragment of DNA from a complex mixture of DNA molecules. Major disadvantage: it is a time-consuming (several days to produce recombinants) and, in parts, difficult procedure.
The next major technical breakthrough after gene cloning was PCR (1983). It amplifies a short fragment of a DNA molecule in a much shorter time, just a few hours.
PCR is complementary to, not a replacement for, cloning because it has its own limitations: the sequence of at least part of the fragment must be known.
Unlike cloning, PCR is a test-tube reaction and does not involve the use of living cells: the copying is carried out not by cellular enzymes but by the purified, thermostable DNA polymerase of T. aquaticus.
PCR – see more at
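The requirement to know part of the sequence can be illustrated with a toy "in-silico PCR": two known primer sequences bracket the region that gets amplified. The sketch below is an illustration under simplifying assumptions (exact primer matches, one strand shown, perfect doubling every cycle); the function names and sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def in_silico_pcr(template: str, forward_primer: str, reverse_primer: str):
    """Return the region bracketed by the two primers, or None if they do not bind."""
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the strand shown
    # here we look for its reverse complement downstream of the forward primer.
    target = reverse_complement(reverse_primer)
    end = template.find(target, start + len(forward_primer))
    if end == -1:
        return None
    return template[start : end + len(target)]


def copies_after(cycles: int, starting_copies: int = 1) -> int:
    """Idealized yield: the number of copies doubles in every cycle."""
    return starting_copies * 2**cycles


template = "TTTTACGTACGGGGGGCCATTGAAAA"  # toy template
product = in_silico_pcr(template, "ACGTAC", "CAATGG")
print(product)           # ACGTACGGGGGGCCATTG
print(copies_after(30))  # 1073741824
```

Real reactions fall short of perfect doubling, so the copy count is best read as an upper bound.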

18 Mapping genomes

19 What is it about?
Assigning/locating a specific gene to a particular region of a chromosome and determining the location of and relative distances between genes on the chromosome.
There are two types of maps:
- genetic linkage map – shows the arrangement of genes (or other markers) along the chromosomes as calculated by the frequency with which they are inherited together
- physical map – representation of the chromosomes, providing the physical distance between landmarks on the chromosome, ideally measured in nucleotide bases
The ultimate physical map is the complete sequence itself.
for more information see
A genetic map provides an indirect estimate of the distance between two items and is limited to ordering certain items. A genetic map is like an interstate highway map: it serves to guide a scientist toward a gene, just as an interstate map guides a driver from city to city. Physical maps, on the other hand, give an estimate of the true distance, measured in base pairs, between items of interest. To continue the analogy, physical maps are similar to street maps, where the distance between two sites of interest may be defined more precisely in terms of city blocks or street addresses.
The different types of maps vary in their degree of resolution, that is, the ability to measure the separation of elements that are close together. The higher the resolution, the better the picture.

20 Genetic linkage map
Constructed by observing how frequently two markers (e.g. genes, but wait till the next slides) are inherited together.
Two markers located on the same chromosome can be separated only through the process of recombination. If they are separated, a child will have just one marker from the pair.
However, the closer the markers are to each other, the more tightly linked they are, and the less likely recombination will separate them. They will tend to be passed together from parent to child.
Recombination frequency provides an estimate of the distance between two markers.
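As a small worked example, recombination frequency is simply the share of offspring in which the two markers were separated. The sketch below assumes the offspring have already been classified as parental or recombinant; the function name and the counts are made up for illustration. By the usual convention, 1% recombination corresponds to a map distance of 1 cM.

```python
def recombination_frequency(parental: int, recombinant: int) -> float:
    """Percentage of offspring in which the two markers were separated."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no offspring scored")
    return 100.0 * recombinant / total


# Hypothetical family data: 90 offspring inherited both markers together,
# 10 inherited only one marker of the pair.
print(recombination_frequency(parental=90, recombinant=10))  # 10.0
```

A lower percentage means tighter linkage, so a pair scoring 2% would be placed closer together on the map than a pair scoring 10%.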

21 Genetic linkage map
On genetic maps, distances between markers are measured in centimorgans (cM), named after American geneticist Thomas Hunt Morgan.
- 1 cM apart – the markers are separated by recombination 1% of the time
- 1 cM is ROUGHLY equal to a physical distance of 1 Mbp in human
Value of genetic map – marker analysis
An inherited disease can be located on the map by following the inheritance of a DNA marker present in affected individuals (but absent in unaffected individuals), even though the molecular basis of the disease may not yet be understood nor the responsible gene identified. This represents a cornerstone of testing for genetic diseases.
Just a bit more about marker analysis:

22 Genetic markers
A genetic map must show the positions of distinctive features – markers.
- markers are recognizable components of the landscape, such as rivers, roads and buildings
Any inherited physical or molecular characteristic that differs among individuals and is easily detectable in the laboratory is a potential genetic marker.
Markers can be expressed DNA regions (genes) or DNA segments that have no known coding function but whose inheritance pattern can be followed.
genes – not ideal: in larger genomes (e.g. vertebrates) gene maps are not very detailed (low gene density)

23 Genetic markers
Must be polymorphic, i.e. alternative forms (alleles) must exist among individuals so that they are detectable among different members in family studies.
Variations within exons (genes) – lead to observable changes (e.g. eye color)
Most variations occur within introns and have little or no effect on an organism, yet they are detectable at the DNA level and can be used as markers.
- restriction fragment length polymorphisms (RFLPs)
- simple sequence length polymorphisms (SSLPs)
- single nucleotide polymorphisms (SNPs, pronounced “snips”)

24 RFLPs
Recall that restriction enzymes cut DNA molecules at specific recognition sequences. This sequence specificity means that treatment of a DNA molecule with a restriction enzyme should always produce the same set of fragments.
This is not always the case with genomic DNA molecules because some restriction sites exist as two alleles, one allele displaying the correct sequence for the restriction site and therefore being cut, and the second allele having a sequence alteration so the restriction site is no longer recognized.
source: Brown T. A., Genomes. 2nd ed.
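The effect of a lost restriction site on fragment lengths can be sketched in a few lines. The example below assumes linear DNA, a single strand, and BamHI (GGATCC, cut after the first G) as the enzyme; the two allele sequences are invented, and `digest` is a hypothetical helper, not a library function.

```python
def digest(seq: str, site: str = "GGATCC", cut_offset: int = 1) -> list[int]:
    """Cut a linear sequence at every occurrence of `site`; return fragment lengths."""
    lengths = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset        # position of the cut inside the site
        lengths.append(cut - start)   # fragment from the previous cut to this one
        start = cut
        pos = seq.find(site, pos + 1)
    lengths.append(len(seq) - start)  # final fragment up to the end of the molecule
    return lengths


# Allele 1 has two intact sites; in allele 2 the second site is altered
# (GGATCC -> GGATTC), so the enzyme no longer recognizes it.
allele_1 = "ATATAT" + "GGATCC" + "ATATATAT" + "GGATCC" + "ATAT"
allele_2 = "ATATAT" + "GGATCC" + "ATATATAT" + "GGATTC" + "ATAT"

print(digest(allele_1))  # [7, 14, 9]
print(digest(allele_2))  # [7, 23]
```

The differing fragment-length patterns are what the "length polymorphism" in RFLP refers to.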

25 SSLPs
Repeat sequences that display length variations; different alleles contain different numbers of repeat units (i.e. SSLPs are multi-allelic).
- variable number of tandem repeat sequences (VNTRs, minisatellites) – repeat unit up to 25 bp in length
- simple tandem repeats (STRs, microsatellites) – repeats are shorter, usually di- or tetranucleotide
source: Brown T. A., Genomes. 2nd ed.
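Typing an STR amounts to counting how many times the repeat unit occurs in tandem. A minimal sketch, assuming the repeat unit is known in advance and using invented sequences; `longest_tandem_run` is a hypothetical helper name.

```python
import re


def longest_tandem_run(seq: str, unit: str) -> int:
    """Number of repeat units in the longest uninterrupted tandem run of `unit`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    best = 0
    for match in pattern.finditer(seq):
        best = max(best, len(match.group(0)) // len(unit))
    return best


# Two hypothetical alleles of a dinucleotide (CA) microsatellite.
allele_short = "GGTT" + "CA" * 5 + "TTGG"
allele_long = "GGTT" + "CA" * 8 + "TTGG"

print(longest_tandem_run(allele_short, "CA"))  # 5
print(longest_tandem_run(allele_long, "CA"))   # 8
```

Because the count can take many values, one locus can distinguish many alleles, which is the multi-allelic property noted on the slide.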

26 SNPs
Positions in a genome where some individuals have one nucleotide and others have a different nucleotide.
Vast number of SNPs in every genome.
Each SNP could potentially have four alleles; most exist in just two forms.
The value of a two-allelic marker (SNP, RFLP) is limited by the high possibility that the marker shows no variability among the members of an interesting family.
The advantages of SNPs over RFLPs:
- they are abundant (human genome: 1.5 million SNPs, RFLPs)
- easier to type (i.e. easier to detect)
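At its simplest, finding candidate SNPs means comparing the same region in two individuals position by position. The sketch below assumes the two sequences are already aligned and of equal length (no insertions or deletions); the sequences and the function name are made up for illustration.

```python
def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (position, base_in_a, base_in_b) for every position where the sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


individual_1 = "ACGTACGT"
individual_2 = "ACGTTCGT"
print(find_snps(individual_1, individual_2))  # [(4, 'A', 'T')]
```

Positions are 0-based here; a real pipeline would work from aligned reads and many individuals rather than two short strings.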

27 Genome maps
more at
- genetic map – relative locations of genes are established by following inheritance patterns
- cytogenetic map – visual appearance of a chromosome when stained and examined under a microscope
- sequence map – the order and spacing of the genes, measured in base pairs
cM … centimorgan. The higher the percentage of recombinants for a pair of traits, the greater the distance separating the two loci. In fact, the percent of recombinants is arbitrarily chosen as the distance in centimorgans (cM). see
Chromosome mapping by counting recombinant phenotypes produces a genetic map of the chromosome. As a very rough rule of thumb, 1 cM on a chromosome encompasses 1 megabase of DNA.
The lowest-resolution physical map is the chromosomal or cytogenetic map, which is based on the distinctive banding patterns observed by light microscopy of stained chromosomes.
source: Talking glossary of genetic terms,

