Finding the needle in your DNAstack Ana Teresa Freitas Ciência 2010 – Encontro com a Ciência e Tecnologia em Portugal FIL, July 7, 2010 http://kdbio.inesc-id.pt KDBIO Group 26-12-2018
Finding the Needle
Biology 2.0 (The Economist, June 2010)
The new law. Computing power has grown according to Moore’s law, doubling roughly every two years. Sequencing the human genome took 13 years and $3 billion. Now, Illumina can read a genome in 8 days for about $10,000. Pacific Biosciences has a technology that, in three years’ time, will be able to map a human genome in 15 minutes for less than $1,000. (The Economist, June 2010)
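To put the figures on this slide side by side, the short sketch below computes the fold changes in cost and time and asks how long a Moore's-law pace would need to deliver the same cost reduction. It uses only the numbers quoted above; the variable names and the 365.25-day year are assumptions made for illustration.

```python
import math

# Figures quoted on the slide (The Economist, June 2010).
HGP_COST_USD = 3e9          # first human genome: $3 billion
HGP_YEARS = 13              # ... and 13 years
ILLUMINA_COST_USD = 10_000  # Illumina, 2010: about $10,000
ILLUMINA_DAYS = 8           # ... and 8 days
MOORE_DOUBLING_YEARS = 2    # "doubling roughly every two years"

DAYS_PER_YEAR = 365.25      # assumption: average calendar year


def fold_change(before, after):
    """Return how many times larger `before` is than `after`."""
    return before / after


cost_fold = fold_change(HGP_COST_USD, ILLUMINA_COST_USD)
time_fold = fold_change(HGP_YEARS * DAYS_PER_YEAR, ILLUMINA_DAYS)

# Number of halvings of cost, and the years a process that doubles
# every two years would need to achieve the same factor.
cost_doublings = math.log2(cost_fold)
moore_years_needed = cost_doublings * MOORE_DOUBLING_YEARS

if __name__ == "__main__":
    print(f"Cost fell by a factor of {cost_fold:,.0f}")
    print(f"Time fell by a factor of about {time_fold:,.0f}")
    print(f"That is about {cost_doublings:.1f} doublings, or roughly "
          f"{moore_years_needed:.0f} years at a Moore's-law pace")
```

On these numbers the cost reduction corresponds to about 18 doublings, which a two-year doubling period would take well over three decades to reach. That is one way to read the slide's title: sequencing has been improving faster than Moore's law.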
Biological databases [Figure: sequence fragments from labs, curators, algorithms, and genome assembly feeding the GenBank, RefSeq, and UniGene databases]
[Figure: the three collaborating sequence databases, each receiving submissions and updates: GenBank (NCBI, NIH; accessed via Entrez), EMBL (EBI; accessed via SRS), and DDBJ (CIB, NIG; accessed via getentry). Source: Francis Ouellette, Lecture 2.0, August 3rd, 1999]
So why do biologists care?
Three main reasons. 1. Database proliferation: hundreds at the moment. 2. More and more scientific discoveries result from inter-database analysis and mining. 3. Rising complexity of the required data combinations, e.g. translational medicine, “from bench to bedside” (genomic data vs. clinical data). Glossary: proliferation = a great and rapid increase in numbers; grid = a network of evenly spaced horizontal and vertical lines; semantic = related to meaning.
Research at the KDBIO group: Algorithms on Strings, Trees and Graphs; Programming and Database Systems; Machine Learning; Understanding genetic regulatory networks; Sequence analysis; Genome analysis; Whole genome sequencing and re-sequencing; Gene expression analysis; Haplotype inference; Genotype-phenotype linkage; Discovery of motifs in DNA and RNA; Improving clinical diagnosis; Genotyping methods; Modeling of metabolic networks; Inference and modeling of regulation networks; Information systems
YEASTRACT www.yeastract.com
YEASTRACT users. YEASTRACT: knowledge, not data
http://geneglob.inesc-id.pt/public/home.jsf PTDC/AGR-GPL/66564/2006 (Jorge Paiva PI, IICT)
SDLink: a web-based data management system for the management and analysis of heterogeneous clinical and biological data. Linked Data: “… a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.”
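As a rough illustration of the Linked Data idea quoted above, the sketch below stores a few clinical and biological facts as subject-predicate-object triples identified by URIs, then follows links across both kinds of data. It is not SDLink's implementation: the `http://example.org/sdlink/` namespace, the predicate names, and the patient, variant, and gene identifiers are all hypothetical, and plain Python sets stand in for a real RDF store.

```python
# Hypothetical namespace; every identifier below is made up for illustration.
EX = "http://example.org/sdlink/"


def uri(local_name):
    """Build a full URI from a local name in the example namespace."""
    return EX + local_name


# Each fact is one (subject, predicate, object) triple of URIs.
TRIPLES = {
    # Clinical data
    (uri("patient/P001"), uri("hasDiagnosis"), uri("disease/HCM")),
    (uri("patient/P002"), uri("hasDiagnosis"), uri("disease/HCM")),
    # Biological data
    (uri("patient/P001"), uri("hasVariant"), uri("variant/V42")),
    (uri("variant/V42"), uri("locatedInGene"), uri("gene/G1")),
}


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is a known triple."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(triples, predicate, obj):
    """All subjects s such that (s, predicate, obj) is a known triple."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def genes_for_diagnosis(triples, disease):
    """Follow diagnosis -> patient -> variant -> gene links."""
    genes = set()
    for patient in subjects(triples, uri("hasDiagnosis"), disease):
        for variant in objects(triples, patient, uri("hasVariant")):
            genes |= objects(triples, variant, uri("locatedInGene"))
    return genes


def to_ntriples(triples):
    """Serialize the triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
    print(genes_for_diagnosis(TRIPLES, uri("disease/HCM")))
```

Because every entity is named by a URI, the clinical and biological facts can live in different sources and still be joined by simple link-following, which is the property the Linked Data definition emphasizes.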
Semantic model prototype. Project: “Harnessing Genetics and Imaging to Improve Diagnosis and Management of Hypertrophic Cardiomyopathy in Portugal”, PTDC/SAU-GMG/112538/2009 (submitted; PI: Alexandra Fernandes, CQE IST)
Biology 2.0; Computer Science 2.0; Web 2.0; Medicine 2.0; DNA
KDBIO Group Members. 8 PhDs: Ana Teresa Freitas, Arlindo Oliveira, Susana Vinga, Sara Madeira, Paulo Fonseca, Sara Silva, Alexandre Francisco, Luís Russo. 3 invited researchers: João Carriço, Jonas Almeida, Marie-France Sagot. 12 PhD students. 11 graduate fellowships.