Transcription Factors
A transcription factor (TF) is a protein that binds to a specific DNA sequence. By binding there, it controls the flow of genetic information from DNA to RNA (transcription); in effect, it can turn a specific gene on or off. Approximately 2600 genes in the human genome (~10%) code for proteins with DNA-binding domains, and most of these are assumed to be transcription factors. For roughly 10% of genes to regulate the other 90%, transcription factors must often work in combination, and genes are often flanked by several transcription factor binding sites.
Project: From the literature, compile a list of human TFs, determine all of the binding sites for each TF, and classify each TF by whether it works alone or in combination with other TFs:
S to S: one TF binds to a single location.
M to S: multiple TFs bind to one gene.
S to M: one TF binds to multiple locations.
M to M: multiple TFs bind in combination to multiple genes.
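The step of determining every binding site for a TF could be sketched as a scan for the TF's recognition sequence on both DNA strands. This is a minimal sketch assuming each motif is an exact consensus string (real binding sites vary and are often modeled with position weight matrices); `find_binding_sites`, `reverse_complement` and the example sequences are hypothetical.

```python
from typing import List, Tuple

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_binding_sites(genome: str, motif: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, strand) for every exact, possibly overlapping,
    occurrence of `motif`. Coordinates are 0-based, end-exclusive, and
    refer to the forward strand. Assumes the motif is an exact consensus."""
    genome, motif = genome.upper(), motif.upper()
    rc = reverse_complement(motif)
    patterns = [("+", motif)]
    if rc != motif:  # palindromic sites would otherwise be reported twice
        patterns.append(("-", rc))

    hits = []
    for strand, pattern in patterns:
        start = genome.find(pattern)
        while start != -1:
            hits.append((start, start + len(pattern), strand))
            start = genome.find(pattern, start + 1)
    return sorted(hits)


print(find_binding_sites("AAGATAGGTATCAA", "GATA"))
# → [(2, 6, '+'), (8, 12, '-')]
```

Scanning for the reverse complement matters because a TF can bind its site on either strand; a palindromic site such as GAATTC is reported once.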
Name of TF | Gene | Binding Start | Binding End | TF Classification
ABC        | g1   | 50            | 59          | S to S
XYZ        | g3   | 101           | 105         | S to M
XYZ        | g4   | 1100          | 1108        | S to M
XWY        | g2   | 50            | 65          | M to S
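The four classes can be assigned mechanically once binding sites are tabulated. This sketch rests on an assumed reading of the definitions: a TF counts as "M" when some gene it binds is also bound by another TF, and its second letter then depends on how many genes it binds; otherwise its second letter depends on how many locations it binds. The row for the TF `QRS` is hypothetical, added because the M to S label for XWY implies that another TF also binds g2.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (TF name, gene, binding start, binding end)
Site = Tuple[str, str, int, int]


def classify_tfs(sites: List[Site]) -> Dict[str, str]:
    """Assign each TF one of the four classes (assumed interpretation)."""
    locations = defaultdict(set)    # TF -> {(gene, start, end)}
    tfs_on_gene = defaultdict(set)  # gene -> {TF}
    for tf, gene, start, end in sites:
        locations[tf].add((gene, start, end))
        tfs_on_gene[gene].add(tf)

    classes = {}
    for tf, locs in locations.items():
        genes = {gene for gene, _, _ in locs}
        # "Multiple TFs" if any gene this TF binds is also bound by another TF.
        shared = any(len(tfs_on_gene[g]) > 1 for g in genes)
        if shared:
            classes[tf] = "M to S" if len(genes) == 1 else "M to M"
        else:
            classes[tf] = "S to S" if len(locs) == 1 else "S to M"
    return classes


rows = [
    ("ABC", "g1", 50, 59),
    ("XYZ", "g3", 101, 105),
    ("XYZ", "g4", 1100, 1108),
    ("XWY", "g2", 50, 65),
    ("QRS", "g2", 70, 78),  # hypothetical second TF on g2
]
print(classify_tfs(rows))
# → {'ABC': 'S to S', 'XYZ': 'S to M', 'XWY': 'M to S', 'QRS': 'M to S'}
```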
Pseudogenes
Through a variety of processes, a gene can become corrupted and stop functioning; such a gene is called a pseudogene. If the gene is needed for the organism to survive, however, a working copy of it (the “true” gene) must still exist. Project: Read in the genetic sequences of various bacteria, distinguish pseudogenes from true genes, determine the type (or feature) of each pseudogene, and compare the number of genes with the number of pseudogenes (and pseudogene types).
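One common feature of a corrupted gene is a disabled reading frame, such as an in-frame stop codon before the end of the gene or a length that is no longer a multiple of three. The sketch below flags candidates using only those two checks; it assumes each sequence starts at its start codon and uses the standard stop codons TAA, TAG and TGA. The function names are hypothetical, and this is one possible heuristic, not the project's prescribed method.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def premature_stop(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon that
    occurs before the final codon, or None if no such stop exists."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    for i in range(n_codons - 1):  # the last codon is allowed to be a stop
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


def looks_like_pseudogene(cds: str) -> bool:
    """Heuristic flag: premature stop, or a length suggesting a frameshift."""
    return premature_stop(cds) is not None or len(cds) % 3 != 0


print(premature_stop("ATGAAATAGGGGTAA"))  # → 2 (TAG is the third codon)
```

A flagged sequence would still need to be compared against its true gene to decide what kind of change (for example, a substitution) disabled it.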
Organism 1
Genome Accession Number | Gene | Pseudogene | Type or Feature
NC_000265               | G1   | G1 P1      | Mutation
NC_000265               | G1   | G1 P2      | Substitution
NC_000265               | G3   | G3 P1      |
NC_000265               | G3   | G3 P2      |
NC_000265               | G3   | G3 P3      |
Organism 2
Genome Accession Number | Gene | Pseudogene | Type or Feature
NC_010395               | G1   | G1 P1      | Transposition
NC_010395               | G1   | G1 P2      | Substitution
NC_010395               | G1   | G1 P3      |
NC_010395               | G2   | G2 P1      |
NC_010395               | G2   | G2 P2      |
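With pseudogenes tabulated per organism, comparing genes, pseudogenes and pseudogene types reduces to counting. This sketch re-enters the two example tables; it counts only the genes that appear in the tables, and labeling rows with no recorded type as "unspecified" is an assumption.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# (gene, pseudogene, type or feature); None where the table gives no type
Row = Tuple[str, str, Optional[str]]

organisms: Dict[str, List[Row]] = {
    "Organism 1 (NC_000265)": [
        ("G1", "G1 P1", "Mutation"),
        ("G1", "G1 P2", "Substitution"),
        ("G3", "G3 P1", None),
        ("G3", "G3 P2", None),
        ("G3", "G3 P3", None),
    ],
    "Organism 2 (NC_010395)": [
        ("G1", "G1 P1", "Transposition"),
        ("G1", "G1 P2", "Substitution"),
        ("G1", "G1 P3", None),
        ("G2", "G2 P1", None),
        ("G2", "G2 P2", None),
    ],
}


def summarize(rows: List[Row]) -> Tuple[int, int, Counter]:
    """Return (number of genes, number of pseudogenes, counts per type)."""
    genes = {gene for gene, _, _ in rows}
    types = Counter(t or "unspecified" for _, _, t in rows)
    return len(genes), len(rows), types


for name, rows in organisms.items():
    n_genes, n_pseudo, types = summarize(rows)
    print(f"{name}: {n_genes} genes, {n_pseudo} pseudogenes, {dict(types)}")
```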
Protein-Protein Interactions
Software has been written to scan a genetic sequence, find motifs that could signal the presence of a gene, and translate that gene to predict the protein it might encode. In many organisms these predicted proteins have not yet been studied, so they are recorded in the data as “hypothetical proteins”. Project: Find the hypothetical proteins for different bacteria and compare them with the yeast genome, which has been well studied and so acts as the “known” reference. The program RPSBlast (Reverse Position-Specific BLAST) performs an alignment-based comparison, searching each query protein against a database of position-specific scoring matrices. When a hypothetical protein matches a yeast protein, the two probably have a similar function.
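Assuming the search is run with BLAST+ tabular output (`-outfmt 6`, whose 12 default columns end with `evalue` and `bitscore`), picking the best yeast match for each hypothetical protein could look like the sketch below. The function name `best_hits` and the e-value cutoff of 1e-5 are hypothetical choices.

```python
import csv
import io
from typing import Dict, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tsv_text: str, max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Map each query (hypothetical protein) to its highest-scoring subject
    (yeast protein) and that hit's e-value, ignoring hits above max_evalue."""
    best: Dict[str, Tuple[str, float, float]] = {}
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        evalue, bits = float(row["evalue"]), float(row["bitscore"])
        if evalue > max_evalue:
            continue  # too weak to suggest a shared function
        query = row["qseqid"]
        if query not in best or bits > best[query][2]:
            best[query] = (row["sseqid"], evalue, bits)
    return {q: (subject, e) for q, (subject, e, _) in best.items()}
```

It would be called on the text of a results file, e.g. `best_hits(open("hits.tsv").read())`, where `hits.tsv` is a hypothetical output path. Ranking by bit score rather than e-value keeps the choice independent of database size.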
Bacteria 1
Gene | Hypothetical Protein | Sequence | Yeast Protein | Function
G1   | P1                   | MVMMLG…  | G6 P10        |
G1   | P2                   | MFWEI…   | G4 P1         |
G1   | P3                   | FMILGMM… | G4 P5         |
G2   | P1                   |          |               |
G3   | P1                   |          |               |
G3   | P2                   |          |               |
Longest Common Substring
The longest common substring (LCS) is one way to look for similarities between the genetic sequences of different species (it should not be confused with the longest common subsequence, which is also often abbreviated LCS). It compares two sequences and finds the longest run of consecutive bases that appears in both. For example, TAGGTTTGACCCTGC and AGGGTTTGACCAATA share the 9-base substring GGTTTGACC. Comparing species to see which ones have the most in common genetically should indicate which ones are most closely related by evolution. Sequences for many bacteria can be found on biobase.ist.unomaha.edu at /clab_bdb/nucleotide/genomes/Bacteria/*
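The textbook way to compute a longest common substring is dynamic programming over common suffixes. This sketch (the function name is hypothetical) reproduces the 9-base example; note that it takes time proportional to the product of the two lengths, so it suits short sequences rather than whole genomes.

```python
from typing import Tuple


def longest_common_substring(a: str, b: str) -> Tuple[int, str]:
    """Return (length, substring) of the longest common substring of a and b.

    cur[j] holds the length of the longest common suffix of a[:i] and b[:j].
    Runs in O(len(a) * len(b)) time and O(len(b)) memory.
    """
    best_len, best_end = 0, 0  # best_end is an end index into a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return best_len, a[best_end - best_len:best_end]


print(longest_common_substring("TAGGTTTGACCCTGC", "AGGGTTTGACCAATA"))
# → (9, 'GGTTTGACC')
```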
For this project, take several bacterial species that live in the human mouth and several that live in the human gut, compare them two at a time, and group them by similarity.
Mouth:
Streptococcus mutans (NC_013928)
Streptococcus pneumoniae (NC_010380)
Neisseria meningitidis (NC_013016)
Haemophilus influenzae (NC_000907)
Lower GI:
Bifidobacterium bifidum (NC_014638)
Mycobacterium abscessus (NC_010397)
Bacteroides vulgatus (NC_009614)
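A straightforward dynamic-programming comparison of two genomes of about two million bases each would fill on the order of 4 × 10^12 table cells, which is impractical. One standard linear-time alternative, swapped in here because the slides do not name a method, builds a suffix automaton of one genome and streams the other through it. This is a sketch: the function name `lcs_length` is hypothetical, and pure Python uses a lot of memory at genome scale, so a compiled implementation would be more practical.

```python
from typing import Dict, List


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common substring of a and b, via a suffix
    automaton of a (linear-time build) and a linear scan of b."""
    trans: List[Dict[str, int]] = [{}]  # outgoing transitions per state
    link: List[int] = [-1]              # suffix links
    length: List[int] = [0]             # longest string reaching each state
    last = 0
    for ch in a:
        cur = len(length)
        trans.append({})
        link.append(-1)
        length.append(length[last] + 1)
        p = last
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split state q by cloning it with a shorter length.
                clone = len(length)
                trans.append(dict(trans[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur

    # Walk b through the automaton, tracking the current match length.
    state, matched, best = 0, 0, 0
    for ch in b:
        while state != 0 and ch not in trans[state]:
            state = link[state]
            matched = length[state]
        if ch in trans[state]:
            state = trans[state][ch]
            matched += 1
        else:
            state, matched = 0, 0
        best = max(best, matched)
    return best


print(lcs_length("TAGGTTTGACCCTGC", "AGGGTTTGACCAATA"))  # → 9
```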
Organism 1 | Organism 2 | LCS
NC_013928  | NC_010380  | 2,938,450
NC_013928  | NC_013016  | 2,837,505
NC_013928  | NC_014638  | 2,637,238
NC_010380  | NC_013016  | 3,098,789
NC_010380  | NC_014638  | 2,984,568
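Once every pair has a score, the organisms can be grouped. A minimal sketch, assuming single-linkage grouping: any two organisms whose LCS value meets a threshold land in the same group. The input is the pairwise table above; the function name and the threshold of 2,950,000 are arbitrary illustrations, and normalizing each LCS by genome length may be fairer because raw values depend on genome size.

```python
from typing import Dict, List, Tuple


def group_by_similarity(scores: Dict[Tuple[str, str], int], threshold: int) -> List[List[str]]:
    """Single-linkage grouping via union-find: link every pair whose score
    is >= threshold, then return the connected components, sorted."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        root_a, root_b = find(a), find(b)  # also registers both organisms
        if score >= threshold:
            parent[root_a] = root_b

    groups: Dict[str, List[str]] = {}
    for x in list(parent):
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())


scores = {
    ("NC_013928", "NC_010380"): 2_938_450,
    ("NC_013928", "NC_013016"): 2_837_505,
    ("NC_013928", "NC_014638"): 2_637_238,
    ("NC_010380", "NC_013016"): 3_098_789,
    ("NC_010380", "NC_014638"): 2_984_568,
}
print(group_by_similarity(scores, threshold=2_950_000))
# → [['NC_010380', 'NC_013016', 'NC_014638'], ['NC_013928']]
```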