The Organization of Cellular Genomes
Complexity of Genomes, Chromosomes and Chromatin, Sequences of Genomes, Bioinformatics
As we have discussed for the last 2 classes, genes direct the expression of proteins, which is essential for the normal structure and function of every living cell. The goal of most molecular biology today is to understand gene structure and function in order to establish the basis of cellular processes. Chapter 5 begins explaining gene structure by talking about genomes, the genetic backbone of organisms. What is a genome?
-In this chapter we will talk about the components that make up a genome and how they are organized. We will also discuss tools that are available to study and compare genomes.

Introduction to Cellular Genomes
The sequences of many cellular genomes are known: bacteria, yeast, Drosophila, many plants and animals, and humans. The development of gene cloning has enabled scientists to dissect complex eukaryotic genomes and probe the functions of eukaryotic genes.
With advances in modern molecular biology, we now know the genome sequences of many organisms, including E. coli, yeast, Drosophila, some plants and animals, and humans. This gives scientists a great deal of information and enables the investigation of eukaryotic genes, but it also raises even more questions. The great value of these sequences is that they provide a foundation for understanding the basis of numerous human diseases.

The Complexity of Eukaryotic Genomes
The genomes of most eukaryotes are larger and more complex than those of prokaryotes. The presence of large amounts of noncoding sequences is a general property of the genomes of complex eukaryotes.
This slide gives you a comparison of the genomes of many different types of organisms. The size of the genome is on the X-axis and the organisms are listed on the Y-axis. In general, simpler organisms have smaller and less complex genomes. However, it is important to keep in mind that complexity and genome size are not always related in eukaryotic organisms. For example, mammals are among the most complex of the higher organisms, but you can see that other eukaryotes have larger genomes.
-The genomes of salamanders and lilies contain about 10 times more DNA than the human genome. Why is there a disparity in size between these organisms? It is due to the relative amounts of non-coding sequence present in complex eukaryotes. Higher eukaryotes contain more genes than simpler organisms such as bacteria, but they also have more non-coding regions of DNA in their genomes.
-The human genome contains about 25,000 genes, which is about the same number of genes as the mouse and about 5 times more than E. coli.
Figure 5.1 Genome Structure

Introns and Exons
A gene is a segment of DNA that is expressed to yield a functional product. Spacer sequences are long DNA sequences that lie between genes. Exons are segments of coding sequence. Introns (or intervening sequences) are segments of noncoding sequence.
What is a gene? A piece of DNA that encodes a functional product. This could be either an RNA molecule, such as rRNA or tRNA, or a protein. What is the non-coding DNA?
-Spacer DNA sequences: some of the non-coding DNA is accounted for by the long DNA sequences located between genes.
-Introns: much of the non-coding DNA is present within genes. Coding sequences (exons) are separated by non-coding sequences (introns). During transcription, the entire gene is transcribed to form a primary RNA transcript. The introns are then removed by a process called splicing to produce a mature mRNA molecule. This molecule contains only the exons, the coding portions of the RNA, and is used to encode a specific protein.
Figure The structure of eukaryotic genes
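To make the splicing step concrete, here is a minimal Python sketch. It treats the primary transcript as a string and the exons as coordinate ranges. The toy sequence, the coordinates, and the function name splice are all made up for illustration; real splicing is carried out by cellular machinery that recognizes sequence signals, not by fixed coordinates.

```python
def splice(primary_transcript, exons):
    """Join the exon segments of a primary transcript, dropping the introns.

    exons: list of (start, end) positions, 0-based and half-open,
    listed in 5'-to-3' order.
    """
    return "".join(primary_transcript[start:end] for start, end in exons)


# Hypothetical toy transcript: uppercase = exon, lowercase = intron.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "GAUUAC" + "guaagcccccag" + "UGGUAA"
exons = [(0, 6), (18, 24), (36, 42)]

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)  # only the uppercase (exon) letters remain
```

The mature mRNA here is shorter than the primary transcript because both lowercase intron segments have been removed, which is the same relationship shown in the figure.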

Introns and Exons
Adenovirus is a useful model for studies of gene expression. RNA splicing is the joining of exons in a precursor molecule.
Figure Identification of introns in adenovirus RNA.
-Using the adenovirus genome as a model system, it can be shown exactly where the intron sequences are located within a gene. In part B of the figure you can visualize the introns. If you hybridize single-stranded adenoviral DNA to the mRNA, you can see the regions of DNA that do not correspond to mRNA. These are the non-coding segments of DNA, the intronic sequences. They can be visualized using an electron microscope. You would not expect to see this by eye or under a regular light microscope; you need much higher magnification than that. Of course, we know that these non-coding regions are removed from the RNA by splicing.

5.4 The mouse β-globin gene
Electron microscopic analysis of RNA-DNA hybrids. Nucleotide sequencing of cloned genomic DNAs and cDNAs. Conclusion: the coding region of the mouse β-globin gene is interrupted by two introns that are removed from the mRNA by splicing.
RNA-DNA hybrid experiment on the β-globin gene
-Observations like those made in adenovirus were also made in many cloned eukaryotic genes.

Introns and Exons
A kilobase, or kb, of genomic DNA is one thousand nucleotides or nucleotide base pairs. Most introns do not specify the synthesis of a cellular product.
What is the intron-exon structure in eukaryotes? It is actually rather complex. The amount of intron DNA is often greater than that of exon DNA. The average human gene contains 9 exons that are interrupted by 8 introns. In terms of size, most human genes are about 30,000 bp, or 30 kb. Of the 30,000 bp that make up the average gene, only a small fraction consists of exons, so most genes are nearly 90% intronic, non-coding sequence. Introns are present in most eukaryotic genes. However, some genes, such as the histone genes, lack introns, which suggests that introns are not essential to the function of all genes. In addition, introns are rare in simple eukaryotes such as yeast and are generally absent from prokaryotes such as bacteria.

5.5 Alternative splicing
Alternative splicing occurs when exons of a gene are joined in different combinations, resulting in the synthesis of different proteins from the same gene.
So now that we know many things about introns, why do they exist?
-Most introns do not encode cellular products such as RNA or protein. They do, however, have roles in regulating gene expression.
-One role of introns is that they allow multiple protein products to be produced from the same gene. This process is known as alternative splicing.
-Within a gene, individual exons can encode separate functional protein domains. As a result, alternative splicing can join exons in different combinations to produce an assortment of different proteins.
-This may account for the fact that there are not more genes present in higher eukaryotic organisms such as humans.
-In addition, a process called exon shuffling can sometimes occur. In this process, the introns facilitate recombination between exons of different genes, which can result in even more combinations of protein-coding sequences.
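To get a feel for how many products alternative splicing could produce, here is a small Python sketch. It assumes a simplified model that is not in the textbook: the first and last exons are always kept, any internal exon may be skipped, and exon order never changes. The function name isoforms is hypothetical, and real genes use only a small subset of these combinations.

```python
from itertools import combinations


def isoforms(exons):
    """List every exon combination that keeps the first and last exon.

    Simplified model: internal exons are either included or skipped,
    and the 5'-to-3' order of the exons is always preserved.
    """
    first, *internal, last = exons
    result = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            result.append((first, *kept, last))
    return result


# A hypothetical gene with four exons has two internal exons, so 2**2 = 4 products.
for product in isoforms(["E1", "E2", "E3", "E4"]):
    print("-".join(product))
```

Under this model, a gene with 9 exons, the average for a human gene, would have an upper bound of 2^7 = 128 possible combinations, which illustrates how a modest number of genes could still encode a large assortment of proteins.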

Repetitive DNA Sequences
A large portion of complex eukaryotic genomes consists of highly repeated noncoding DNA sequences. The sequencing of complete genomes has identified several types of highly repeated sequences.
Non-coding intron sequences make up a very large proportion of the genome of many organisms. In humans, 20% of the genomic DNA consists of intronic sequences.
-What is genomic DNA?
-In eukaryotic genomes, other non-coding sequences are also present. Repetitive DNA sequences make up an even larger portion of the genome than introns. Nearly half of the human genome consists of interspersed non-coding elements.
-There are specific types of highly repetitive sequence within eukaryotic genomes:
-Simple-sequence repeats
-Satellite DNA
-Interspersed repetitive sequences: SINEs and LINEs

A simple-sequence repeat consists of tandem arrays of up to thousands of copies of short sequences. Satellite DNAs are simple-sequence repetitive DNA with a buoyant density differing from the bulk of genomic DNA. Short interspersed elements, or SINEs, and long interspersed elements, or LINEs, are the two most common families of highly repeated retrotransposons in mammalian genomes.
-Simple-sequence repeats are tandem arrays of many copies (up to thousands) of a short sequence. One example might be GAATCCTA, repeated over and over to make up 1000 nucleotides.
-An interesting note about these simple-sequence repeats is that they can be separated from the rest of the genomic DNA by CsCl gradient centrifugation because their density differs from that of other DNA. Why is this? AT-rich sequences are less dense than GC-rich sequences, so depending on the sequence of the repeats they will have a different density.
-Sometimes these simple-sequence repeats are referred to as satellite DNA sequences. They are repeated millions of times throughout an organism's genome and make up about 10% of it. They are called satellite DNA because they have a different density than the rest of the genomic DNA.
-Other repetitive DNA sequences are dispersed throughout the genome. There are 2 classes of these interspersed repetitive elements, which together make up about 45% of mammalian genomes. Short interspersed elements are referred to as SINEs and long interspersed elements are referred to as LINEs.
-SINEs are transcribed into RNA but do not produce protein products, and as their name suggests they are typically short sequences. Their function is unknown.
-LINEs are also transcribed into RNA and some do produce functional proteins, but once again their function is unknown. They are typically 1-4 kb in length.
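Because the density of a repeat depends on its base composition, a quick way to compare sequences is to compute their GC fraction. The Python sketch below does this for the GAATCCTA example from the notes. The function name gc_fraction and the AT-only comparison repeat are made up for illustration, and the sketch only reports composition; it does not predict an actual buoyant density.

```python
def gc_fraction(seq):
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# The example repeat from the notes, repeated to make up 1000 nucleotides.
simple_repeat = "GAATCCTA" * 125
# A hypothetical AT-only repeat of the same length, for comparison.
at_rich_repeat = "ATAAT" * 200

print(len(simple_repeat), gc_fraction(simple_repeat))
print(len(at_rich_repeat), gc_fraction(at_rich_repeat))
```

A tandem array has the same GC fraction as its repeat unit, so a long array of an AT-rich unit forms a block of DNA that is less dense than the bulk of the genome and bands separately in the gradient.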

Retrotransposons are transposable elements that move via reverse transcription of an RNA intermediate. Retrovirus-like elements are transposons that are structurally similar to retroviruses. DNA transposons are transposable elements that move via DNA intermediates. Some interspersed repetitive elements may help regulate gene expression, but most appear not to make a useful contribution to the cell.
-Both SINEs and LINEs are transposable elements that are often referred to as retrotransposons.
-Transposons are sequences that are capable of moving to different sites within genomic DNA.
-Retrotransposons are sequences that move to a different location within the genomic DNA by way of an RNA intermediate (a transcript). Reverse transcriptase then generates retrotransposon DNA from this RNA, and the DNA integrates into a new chromosomal location.

Gene Duplication and Pseudogenes
A gene family is a group of related genes. Gene families are thought to have arisen by duplication of an original ancestral gene, with different members of the family then diverging as a consequence of mutations during evolution.
What else leads to the huge size of eukaryotic genomes? Some genes are present in multiple copies, and not all of the copies are functional. Why would multiple copies of the same gene be present in an organism?
-Sometimes gene products are needed in large quantities, such as ribosomal RNAs or histones, and more than one copy of the gene is needed to keep up with the cellular demand for the RNAs and proteins.
-In other cases, there may be groups of genes, called gene families, whose members are transcribed in different tissues. For example, C/EBPα, β, δ, γ, and ε are members of a gene family in the human genome, with specific family members being expressed in different tissues. The same is true for the alpha and beta globin genes shown here.
-Sometimes gene families are clustered together on the same chromosome. In other cases they are dispersed onto different chromosomes.
-It is likely that a single ancestral gene gave rise to each gene family by way of gene duplication, probably in response to the functional demands of different cell types in more complex organisms.
Figure Globin gene families

Gene Duplication and Pseudogenes
Pseudogenes are nonfunctional gene copies that represent evolutionary relics that increase the size of eukaryotic genomes without making a functional genetic contribution. Genes can be duplicated by reverse transcription of an mRNA, followed by integration of the cDNA copy into a new chromosomal site.
-When gene duplication occurs, it does not always result in another functional gene. Sometimes mutations occur during the process and a non-functional gene, or pseudogene, is produced. Pseudogenes are another cause of the large genome size of complex eukaryotic organisms.
-In the human genome, studies have shown that there are over 20,000 pseudogenes.
-How do gene duplications occur?
-Duplication of a segment of DNA can result from the transfer of a block of DNA sequence to a new location in the genome.
-It can also occur by reverse transcription of an mRNA and integration of the cDNA copy into a new chromosomal site. This form of gene duplication is similar to the mechanism by which retrotransposons move. Since the copy is made from mRNA, it lacks introns and many of the normal chromosomal sequences that direct transcription of the gene. This typically yields an inactive gene called a processed pseudogene.
Figure Formation of a processed pseudogene

The Composition of Higher Eukaryotic Genomes
The genomes of higher animals, such as humans, are approximately 20–30 times larger than those of C. elegans and Drosophila. The increased size of the genomes of higher eukaryotes is due far more to the presence of large amounts of repetitive sequences and introns than to an increased number of genes. Although the numbers and sizes of chromosomes vary considerably between different species, their basic structure is the same in all eukaryotes.
-Basically, the take-home message from all of this is that higher eukaryotes such as humans have much larger genomes than simple animals such as Drosophila. However, this is not because humans have so many more genes or chromosomes. As you can see in this table, chromosome number does not depend on the complexity of the organism.
-Only a little over 1% of the human genome consists of protein-coding DNA sequences, so the larger genome size is actually due to all of the non-coding sequences present in humans. Some of this is due to introns, which allow one gene to encode more than one protein by way of alternative splicing, which we will talk more about in Ch. 6. However, most of the non-coding sequence in the genomes of higher organisms is due to repetitive and duplicated DNA sequences, pseudogenes, non-repetitive spacer sequences between genes, and exon sequences at the 5' and 3' ends of mRNAs that are involved in translation but do not encode amino acids (the untranslated regions, or UTRs).

Chromatin
The complexes between eukaryotic DNA and proteins are called chromatin, which typically contains about twice as much protein as DNA. Histones are small proteins containing a high proportion of the basic amino acids arginine and lysine, which facilitate binding to the negatively charged DNA molecule.
So now that we understand that chromosome number is not an indicator of the complexity of the organism's genome, let's switch gears a bit and discuss the structure and organization of DNA within chromosomes.
-Although the number and size of genes and chromosomes may vary, the organization and structure of DNA is similar in all eukaryotic cells.
-The DNA is tightly bound to small proteins that consist primarily of basic amino acids. They are referred to as histones. This allows the DNA to be organized and stored within the nucleus of a cell.
-We talked before about the substantial size of the human genome. This type of packaging is essential in order to fit all of an organism's DNA into the nucleus of each cell (in humans, nearly 2 meters of DNA must fit into a nucleus 5-10 µm across).
-The complex between DNA and histones that occurs in the nucleus of all eukaryotic cells is called chromatin.
-How does this organizational scheme differ from what occurs in prokaryotic cells? In bacteria there is only a single circular chromosome, and it does not require this highly ordered structure because it does not have to fit within a nucleus. So there is no chromatin.

5.11 The organization of chromatin in nucleosomes
A nucleosome is the basic structural unit of chromatin. Nucleosome core particles contain 145 base pairs of DNA wrapped around an octamer consisting of two molecules each of histones H2A, H2B, H3, and H4.
As we just discussed, the complex between histones and DNA is referred to as chromatin.
-Let's talk a bit about the organization of chromatin. There are a lot of protein subunits involved, most of which are these small basic proteins called histones. Histones are extremely abundant proteins in eukaryotic cells.
-There are 5 major types of histones: H1, H2A, H2B, H3, and H4, which are part of the basic structural unit of chromatin called the nucleosome.
-Four of these histones (H2A, H2B, H3, and H4) bind together to form the core of the nucleosome core particle. The DNA winds around these cores, which tightly coils the DNA. In between nucleosome core particles there are linker DNA sequences and non-histone proteins. Each repeating unit of chromatin encompasses about 200 bp of DNA and one nucleosome core particle.
-There are also numerous non-histone chromosomal proteins present within eukaryotic cells. Some of these are also components of the nucleosome, while others have roles in replication and gene expression.
-Some early experiments provided evidence that chromatin is organized in this manner. When chromatin and naked DNA were each digested with a nuclease, an enzyme that degrades DNA, the naked DNA was cut at random into fragments of all sizes. The chromatin was cut only in the linker regions, giving fragments of about 200 bp, because the DNA bound to the histones was protected.

Chromatin
A chromatosome is a chromatin subunit that consists of DNA wrapped around the histone core and held in place by H1, a linker histone.
-Detailed studies of nucleosome core particles revealed that they contain 147 bp of DNA that wrap 1.6 times around the core histones, which consist of H2A, H2B, H3, and H4. The H1 protein is added to the DNA as it wraps around the others.
-When H1 is part of the complex, it becomes known as the chromatosome. The H1 histone, sometimes referred to as the linker histone, serves to hold the DNA in place around the histone core proteins.
Figure Structure of a chromatosome

5.13 Chromatin fibers
Collectively, chromatosomes and linker DNA (generally about 50 bp) make up a chromatin fiber. The fiber is only 10 nm in diameter.
-This means of packaging shortens the length of the DNA about sixfold.
-So this is the first level of compacting genomic DNA. The chromatin is then further condensed by coiling into 30 nm fibers, as shown here. This results in about a 50-fold total condensation of genomic DNA.
-The H1 histone is involved in this process of coiling to form the 30 nm fibers.
-Remember that ultimately the DNA has to be organized in such a way that it is accessible for processes such as replication and transcription.

Chromatin
Figure Interphase Chromosome
Euchromatin is decondensed, transcriptionally active interphase chromatin. Heterochromatin is condensed, transcriptionally inactive chromatin, and it contains highly repeated DNA sequences.
-Here we have talked a lot about chromatin condensation and how it is necessary to allow the DNA to be stored in the nucleus of eukaryotic cells. Keep in mind that the level of condensation depends on the stage of the cell cycle. For example, in interphase the cells are not dividing, and as a result the chromatin is not condensed very tightly. It is decondensed and dispersed throughout the nucleus. This type of chromatin is called euchromatin.
-This allows both transcription and DNA replication to occur.
-Remember that replication of DNA must occur before cell division can begin again. So this is the main objective of interphase: to prepare the cell for cell division.
-In contrast, a small percentage of the interphase chromatin is very highly condensed and is referred to as heterochromatin. It is very similar in appearance to chromatin during mitosis. This figure shows you chromatin in the process of condensing during mitosis. It is important that the chromatin be highly condensed because the nucleus has to hold twice the amount of chromatin until the cell divides. So the 30 nm chromatin fibers loop upon themselves to form the tightly compacted metaphase chromosome.
-Heterochromatin is essentially the opposite of euchromatin. It is transcriptionally inactive and contains highly repeated sequences, such as centromeric and telomeric sequences, which we will talk more about on Wednesday.
Figure Chromatin condensation during mitosis

Centromeres
The centromere is a specialized region of the chromosome that plays a critical role in ensuring the correct distribution of duplicated chromosomes. Centromere DNA sequences were initially defined in yeasts. Plasmids that contain functional centromeres segregate like chromosomes and are equally distributed to daughter cells following mitosis.
-Centromeric DNA sequences do not encode proteins.
-Instead, they function to regulate the distribution of replicated chromosomes to daughter cells during cell division.
-As we discussed in our last class, the chromatin decondenses and replicates during interphase. So mitosis begins with two copies of every chromosome.
-During mitosis the chromatin condenses and the two identical sister chromatids are held together at the centromere. Then, as mitosis proceeds, the sister chromatids attach to spindle fibers by way of the centromere, which allows the identical chromatids to separate into 2 distinct daughter cells.
-What is the role of centromeres? To serve as the site of attachment for the microtubules of the mitotic spindle and as the site of association of identical sister chromatids during mitosis.
Figure Chromosomes during mitosis

5.19 The centromere of a metaphase chromosome
A kinetochore is the structure formed when a number of centromere-associated proteins bind to specific centromeric DNA sequences.
-We started off saying that centromeres do not encode proteins. They do not, but they do bind proteins. Every eukaryotic chromosome contains centromeric DNA sequences to which centromere-associated proteins bind.
-When centromere-associated proteins are bound to centromeric DNA sequences, this region of the chromosome is referred to as the kinetochore. It is the binding of the microtubules to the kinetochore proteins that drives the process of mitosis and ultimately cell division. In effect, these proteins function to move the chromosomes along the spindle fibers to produce two genetically identical daughter cells.

5.20 Assay of a centromere in yeast
As we mentioned earlier, centromeric DNA sequences were originally isolated in yeast. It was possible to determine their function by using an assay that follows the segregation of plasmids at mitosis.
-In this assay, both plasmids contain a selectable marker (LEU2) and DNA sequences that serve as origins of replication in yeast (ARS). However, plasmid I lacks a centromere, while plasmid II contains centromere DNA (CEN).
-The outcome is that plasmids that contain functional centromeric DNA sequences segregate just like chromosomes do during mitosis. The plasmid without the centromere DNA does not segregate properly.
-What is the impact of this? If a chromosome were lacking centromeric DNA, it might not segregate properly. This means that some daughter cells may get 2 copies of the chromosome, while others will not get any copies. Obviously these types of genetic mistakes could have serious consequences for the normal development of the organism.

5.21 Centromeres of S. cerevisiae, S. pombe, and Drosophila melanogaster
This slide illustrates a couple of things: centromere sequences are distinct to every type of eukaryotic organism. Even two different types of yeast show differences in centromeric DNA. Much of the non-coding DNA sequence that we discussed in the last class is present in centromeres.
-S. cerevisiae centromere: consists of 2 short conserved sequences (CDE I and CDE III) that are separated by a stretch of AT-rich DNA (satellite DNA).
-S. pombe centromere: consists of a central core of unique DNA sequence that is flanked by tandem repeats of 3 repetitive sequence elements (B, K, L).
-Drosophila centromere: consists of 2 AG-rich satellite sequences, transposons, and non-repetitive DNA.

Telomeres
Telomeres are the sequences at the ends of eukaryotic chromosomes. The telomere DNA sequences of a variety of eukaryotes are similar, consisting of repeats of a simple-sequence DNA containing clusters of G residues on one strand.
In contrast to centromeres, which lie at an internal position on each chromosome, telomeres are DNA sequences that are present at the ends of chromosomes. In eukaryotes, they typically consist of a DNA sequence with a prevalence of G residues on one strand. These sequences are repeated hundreds to thousands of times and terminate with a 3' single-stranded overhang. They are present on all chromosomes and have very distinct roles in the replication of chromosomes.

5.22 Structure of a telomere
The repeated sequences of telomere DNA of some organisms, including humans, form loops at the ends of chromosomes. Telomerase is a special enzyme that uses reverse transcriptase activity to replicate telomeric DNA sequences.
This slide illustrates the structure of telomeres. What is the purpose of telomeres? The repeated sequence of telomere DNA forms loops at the ends of chromosomes and binds proteins. Sometimes this loop folds back on itself, forming a circular structure as you see here. This serves to protect the chromosome ends from degradation or linkage to other chromosomes. DNA polymerase is able to extend growing DNA chains, but it cannot initiate synthesis of a new chain at the terminus of a linear DNA molecule. Therefore, the ends of linear chromosomes cannot be replicated by the normal action of DNA polymerase, and an enzyme called telomerase must be used. This specialized enzyme uses reverse transcriptase activity to replicate telomeric DNA sequences. Ultimately, the existence and maintenance of telomeres is critical in determining the lifespan and reproductive capacity of cells. In other words, the study of telomeres may be important in understanding aging.

The Sequences of Complete Genomes
The complete nucleotide sequences of both the human genome and the genomes of several model organisms, including E. coli, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila, Arabidopsis, and the mouse, have been analyzed.
Overview of Genome Sequences:
-As I mentioned in the last class, knowing the complete genome sequences of organisms like E. coli, yeast, Drosophila, mouse, and humans has really revolutionized the field of molecular biology.
-However, it has created even more unanswered questions.
-It allows the focus to turn to gene expression and to which genome sequences function to regulate gene expression.

Prokaryotic Genomes
The first complete sequence of a cellular genome was that of the bacterium Haemophilus influenzae. A megabase, or Mb, is one million nucleotides or nucleotide base pairs. Open reading frames are long stretches of nucleotide sequence that can encode polypeptides. The H. influenzae genome has six copies of rRNA genes, 54 different tRNA genes, and 1743 potential protein-coding regions.
This portion of your book focuses on details of the genomes of several different types of organisms. For the purpose of this course, we will focus on the bacterial genomes and the human and mouse genomes.
-At present we know the complete genome sequence for about 100 different bacteria.
-H. influenzae was the first to be sequenced, in the mid-1990s. Its genome is about 1.8 million bases, or 1.8 megabases.
-The E. coli genome is more than twice the size of the H. influenzae genome.
-How can you determine which portions of the genomic DNA encode proteins just by looking at the sequence? It is possible to use computer programs like the ones that we are using in lab to identify open reading frames (ORFs). These are long stretches of nucleotide sequence that can encode polypeptides, so they are not interrupted by any stop codon (UAA, UAG, UGA). ORFs longer than 100 codons are considered to be genes. In a non-coding region you would expect to encounter one of the stop codons within 100 codons, because 3 of the 64 possible codons are stop codons.
-Using this type of computer program, scientists could determine that the H. influenzae genome has 6 copies of rRNA genes, 54 different tRNA genes, and 1743 potential protein-coding regions.
-Another reason for studying the genome sequences of different organisms is that they may provide information about the evolutionary relationships between organisms. Organisms that are derived from the same ancestor should have substantial genome sequence in common.
-Rationale for using E. coli as the organism of choice: the simplicity of the organism. Its genome is about 4.6 Mb and has about 4288 genes.
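The ORF search described above can be sketched in a few lines of Python. This is a minimal illustration, not the program used in lab: the function name find_orfs is hypothetical, it works on a DNA string (so the stop codons are written TAA, TAG, TGA), it scans only the three forward reading frames, and it does not require a start codon. It simply reports any stop-free run of at least min_codons codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA equivalents of UAA, UAG, UGA


def find_orfs(dna, min_codons=100):
    """Return (frame, start, end) for each long stop-free run of codons.

    Only the three forward reading frames are scanned. start and end are
    0-based, half-open offsets into dna; end excludes any stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        run_start = frame
        for i in range(frame, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                if (i - run_start) // 3 >= min_codons:
                    orfs.append((frame, run_start, i))
                run_start = i + 3
        # A run may also reach the end of the sequence without a stop codon.
        end = frame + ((len(dna) - frame) // 3) * 3
        if (end - run_start) // 3 >= min_codons:
            orfs.append((frame, run_start, end))
    return orfs


# Chance that 100 random codons in a row contain no stop codon (61 of 64 are not stops).
p_no_stop = (61 / 64) ** 100
print(p_no_stop)

# Hypothetical test sequence: a 121-codon stop-free run followed by a stop codon.
toy_gene = "ATG" + "GCT" * 120 + "TAA" + "CC"
print(find_orfs(toy_gene))
```

The probability printed first is under 1%, which is why a stop-free stretch of 100 codons is treated as a likely gene. A real gene finder would also scan the reverse strand and look for a start codon, which this sketch leaves out.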

The Yeast Genome
Yeasts are model eukaryotic cells that can be studied much more readily than the cells of mammals or other higher eukaryotes. Yeasts have a high density of protein-coding sequences, similar to bacterial genomes. Yeasts are particularly amenable to functional analyses of unknown genes because of the facility with which normal chromosomal loci can be inactivated by homologous recombination with cloned sequences.
As we have discussed previously, since yeast is a eukaryote, it is sometimes a better model system than bacteria. Unlike bacteria, it has the same internal cellular structures as more complex eukaryotes.
-It is possible to do structure-function studies in yeast.
-You should read about the differences between the various types of genomes, but we will not talk about the details of those differences. Just be sure to understand the value of knowing the genome sequences for all these different organisms: yeast, C. elegans, Drosophila, Arabidopsis, mouse, and humans.

The Human Genome
The human genome is distributed among 24 chromosomes, the smallest of which contains about 45 Mb of DNA.
-Let's take the next few minutes to talk about the human genome.
-The human genome is 3 × 10^9 base pairs of DNA. This is large: it is more than 10 times larger than the Drosophila genome, and the smallest human chromosome is several times larger than the entire yeast genome. Stretched end to end, one copy of the human genome is about 1 meter long.
-You can understand now why it must be coiled around histones and further condensed to fit inside the 5-10 µm nucleus of a human cell. The sequence of the human genome is distributed among 24 chromosomes: 22 autosomes and 2 sex chromosomes (X and Y). Every person has 23 pairs of chromosomes in the cells of their body, which collectively contain all of their genes (their entire genome).
-In this figure you see bands of genes located on each chromosome. It is possible to localize a particular gene to a particular region of each chromosome.
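The length figure can be checked with a little arithmetic. The Python sketch below assumes a rise of 0.34 nm per base pair, the standard value for B-form DNA; that number is not given in these notes, and the function name dna_length_m is made up for illustration.

```python
BP_RISE_NM = 0.34  # assumed length per base pair of B-form DNA, in nanometers


def dna_length_m(base_pairs):
    """Return the end-to-end length in meters of DNA with this many base pairs."""
    return base_pairs * BP_RISE_NM * 1e-9


haploid_m = dna_length_m(3e9)   # one copy of the genome
diploid_m = 2 * haploid_m       # the two copies carried by a body cell
nucleus_m = 10e-6               # upper end of the 5-10 µm nucleus size

print(round(haploid_m, 2), round(diploid_m, 2))
print(round(diploid_m / nucleus_m))  # how many nucleus-widths the DNA would span
```

One copy of the genome comes out at about 1 meter, and the two copies in a body cell at about 2 meters, which is hundreds of thousands of times longer than the nucleus is wide. That gap is what the levels of chromatin packaging have to make up.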

5.30 Fluorescence in situ hybridization
Fluorescence in situ hybridization, or FISH, localizes genes to chromosomes using probes labeled with fluorescent dyes.
One method used to localize genes to a particular chromosome is called FISH, fluorescence in situ hybridization. As you can see here, it uses probes with fluorescent dyes linked to them.
-Here a fluorescent probe for the gene encoding the lamin B receptor is hybridized to stained human metaphase chromosomes. This technology allows you to map a particular gene to a specific location, or genetic locus.

5.31 Sequence of human chromosome 1
The sequenced euchromatin portion of the human genome encompasses approximately 2.9 × 10^6 kb of DNA. The human genome consists of only 20,000–25,000 genes, which is not much larger than the number of genes in simpler animals like C. elegans and Drosophila. In this diagram, you can see how genes are distributed in bands at different genetic loci over the length of the chromosome; this example is human chromosome 1. Let's talk about the human genome for a minute. The draft sequence was obtained independently by two separate teams: the International Human Genome Sequencing Consortium (publicly funded) and Craig Venter's group at Celera Genomics (private). Both groups obtained about 90% of the genomic sequence, with only small sequence gaps. What they sequenced was the euchromatin portion of the genome, about 2.9 × 10^6 kb of DNA. A refined sequence, with the gaps closed, was published later. The remainder of the genome beyond the euchromatin corresponds to the highly repetitive portion, specifically the heterochromatin. It was not surprising that there was such a large amount of non-coding sequence in the genome, but it was a little surprising that there were not more genes or protein-coding sequences. The average human gene is about 30 kb, with only about 1.4 kb being protein-coding sequence. This means that over 90% of the gene is intronic sequence.
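The gene-size figures above can be turned into a quick worked calculation. The sketch below uses only the numbers quoted in the notes (30 kb average gene, 1.4 kb coding sequence, 25,000 genes, 2.9 × 10^6 kb of euchromatin); the variable names are illustrative, and the non-coding share of a gene is treated here as a single quantity without separating introns from untranslated regions.

```python
# Figures taken from the notes above.
AVG_GENE_LENGTH_KB = 30.0  # average human gene, including introns
AVG_CODING_KB = 1.4        # protein-coding sequence per gene
GENE_COUNT = 25_000        # upper estimate of the human gene number
EUCHROMATIN_KB = 2.9e6     # sequenced euchromatin

# Share of an average gene that actually codes for protein.
coding_fraction_of_gene = AVG_CODING_KB / AVG_GENE_LENGTH_KB
noncoding_fraction_of_gene = 1.0 - coding_fraction_of_gene

# Share of the sequenced genome that is protein-coding.
coding_fraction_of_genome = GENE_COUNT * AVG_CODING_KB / EUCHROMATIN_KB

print(f"Coding share of an average gene: {coding_fraction_of_gene:.1%}")
print(f"Non-coding share of an average gene: {noncoding_fraction_of_gene:.1%}")
print(f"Coding share of the sequenced genome: {coding_fraction_of_genome:.1%}")
```

With these inputs, less than 5% of an average gene and only a little over 1% of the sequenced genome is protein-coding, consistent with the "over 90%" statement above.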

The Genomes of Other Vertebrates
A large and growing number of vertebrate genomes have been sequenced in the last few years, including the genomes of fish, chickens, and other mammals. The mammalian genomes that have been sequenced, in addition to the human genome, include the genomes of the mouse, rat, dog, and chimpanzee. The sequence of the genome of the chimpanzee, our nearest evolutionary relative, is expected to help pinpoint the unique features of our genome that distinguish humans from other primates. Having these sequences makes it easier to use some of these animals as model systems to study human diseases. Remember, it is not always possible to do experiments in humans. Often, the experiments must first be done in animals and then moved up the line to humans. Many of these organisms are great model systems for studying gene function, meaning the role of a gene in determining cell function and perhaps disease. Over 40% of the predicted human proteins are related to proteins in other sequenced organisms; many of these conserved proteins have basic cellular functions such as metabolism, transcription, translation, or DNA repair. Of course, the human genome also contains genes that are specific to humans and other vertebrates, such as genes involved in the immune response, blood clotting, and nerve response. Mouse, rat, and human genomes show about 90% homology in their coding sequences. Human and chimpanzee genomes show about 99% homology; the differences exist primarily in the protein-coding sequences of the genome, which produces protein differences between the two organisms. This may help us understand the basis of the evolutionary divergence between these two organisms.

Bioinformatics and Systems Biology
Bioinformatics, a field of science that lies at the interface between biology and computer science, is focused on developing the computational methods needed to analyze and extract useful biological information from the sequence of billions of bases of DNA. Systems biology seeks a quantitative understanding of the integrated dynamic behavior of complex biological systems and processes. As we have discussed before, knowing the genomic sequences of numerous organisms has really changed the face of molecular and cellular biology. It has led the way to understanding the function of genes in biological processes such as cell survival and cell death. In effect, having the sequences at hand allows us to look at more than one gene at a time. In the past, it was only possible to examine one gene at a time, which made it much harder to identify the gene critical to the development of a disease, particularly without any clues as to which gene to look for. Now it is possible to use techniques such as gene expression arrays, which provide information about 30,000 different genes. This gives the researcher a lot of data at one time, and he or she then has to figure out how to analyze and interpret all of that data. This type of new technology has led to the development of two new fields of biology: bioinformatics and systems biology. Bioinformatics is basically the use of computer programs to analyze and extract information from DNA sequence comparisons. Systems biology is the combination of large-scale experiments and computational analysis of the collected data. Both are commonly used to study the function of genes, for example to identify the gene or genes at the root of a particular biological process or disease state.

Systematic Screens of Gene Function
Large-scale screens based on RNA interference (RNAi) are being used to systematically dissect gene function in a variety of organisms, including Drosophila, C. elegans, and mammalian cells in culture. In RNAi screens, double-stranded RNAs are used to induce degradation of the homologous mRNAs in cells. Let's look at one example in which you might use a systems approach to analyze gene function. In the past, you might systematically knock out one gene at a time and examine the effect on a particular biological system. As you might imagine, this could take a very long time, and you might never find the answer if the biological process you are studying requires the interaction of two genes. Now, you can do a large-scale screen using RNAi to systematically determine gene function. Double-stranded RNAs are used to induce degradation of the homologous mRNAs within the cell. In this scenario, a variety of double-stranded RNA molecules, each targeting a specific gene, are placed in individual wells. Cells are added to each well and the biological assay is performed. In this case, the question is whether cell growth occurs. If it does not, the RNA molecule is thought to have blocked growth, and the targeted gene is considered critical to the cell growth pathway in those cells. This type of assay can be used to determine the genes involved in all sorts of biological processes, including cell death, cell signaling, and protein degradation.
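The logic of scoring such a screen can be sketched in a few lines. The example below is a minimal illustration, not a description of any particular screening pipeline: the gene names, growth values, the `find_growth_genes` function, and the 50% cutoff are all hypothetical choices made for this sketch.

```python
def find_growth_genes(well_growth, control_growth, threshold=0.5):
    """Return genes whose knockdown drops growth below threshold x control.

    well_growth maps the gene targeted in each well to the growth measured there.
    The threshold is an arbitrary cutoff chosen for illustration.
    """
    hits = []
    for gene, growth in well_growth.items():
        if growth < threshold * control_growth:
            hits.append(gene)
    return sorted(hits)


# Hypothetical screen: one double-stranded RNA (one target gene) per well.
# Growth is expressed relative to an untreated control well.
well_growth = {"geneA": 0.95, "geneB": 0.20, "geneC": 1.05, "geneD": 0.45}
control_growth = 1.0

hits = find_growth_genes(well_growth, control_growth)
print("Candidate growth genes:", hits)
```

Wells where growth falls well below the control point to genes that may be required for growth; a real screen would also need replicates and controls for off-target effects before drawing that conclusion.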

5.34 Conservation of functional gene regulatory elements
Regulation of Gene Expression
Genome sequences can, in principle, reveal not only the protein-coding sequences of genes, but also the regulatory elements that control gene expression. A variety of computational approaches are being used to characterize functional regulatory elements, based on the assumption that functionally important sequences are conserved in evolution. It is possible to detect not only genes, but also their regulatory elements. Every gene contains regulatory sequences that are necessary to control gene expression, turning it on or off depending on the cellular signals. Understanding these sequences and their role in controlling gene expression is therefore as important as identifying the gene itself. However, it is generally more difficult to isolate and identify these regions than it is to identify protein-coding regions. This is because many regulatory sequences are short, generally spanning only about 10 bases, and they look no different from any other sequence of 10 bases. Identifying these sequences is a big challenge for molecular biologists. One way to identify them is by comparing the genomic sequences of similar organisms. A computer program similar to the one we are using now could be used for this purpose. Here human, mouse, rat, and dog sequences near the transcription start site of a gene are compared, and a functional regulatory element that binds the transcription factor Errα is identified. Computer algorithms are also used to detect regulatory sequences that occur in clusters.
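The comparison described above can be sketched as a search for stretches that are identical in every species. The example below assumes the sequences are already aligned to the same length with no gaps; the sequences themselves are invented for illustration and are not the ones shown in the figure, and `conserved_blocks` is a hypothetical helper rather than a function from any existing library.

```python
def conserved_blocks(aligned, min_length=8):
    """Return (start, end, sequence) for runs of columns identical in all sequences.

    Assumes the sequences are pre-aligned, equal in length, and contain no gap
    characters. Positions are 0-based and end is exclusive.
    """
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    blocks = []
    run_start = None
    # Iterate one position past the end so a run reaching the end is closed.
    for i in range(length + 1):
        same = i < length and all(s[i] == seqs[0][i] for s in seqs)
        if same:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_length:
                blocks.append((run_start, i, seqs[0][run_start:i]))
            run_start = None
    return blocks


# Invented upstream sequences; only the central stretch is shared by all four.
aligned = {
    "human": "ACGTTGCATCAAGGTCAGGATCCA",
    "mouse": "ACTTAGCATCAAGGTCAGAATCTA",
    "rat":   "ATGTAGCTTCAAGGTCACGATGTA",
    "dog":   "GCGTTGAATCAAGGTCAGGTTCCT",
}

blocks = conserved_blocks(aligned)
print(blocks)
```

Requiring perfect identity is a deliberate simplification. Real regulatory elements tolerate some variation, so practical tools score conservation rather than demanding an exact match.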

Variation among Individuals and Genomic Medicine
Variations between individual genomes underlie differences in physical and mental characteristics, including susceptibility to many diseases. The genomes of two unrelated people differ in about one of every thousand bases, and most of this variation is in the form of single base changes known as single nucleotide polymorphisms (SNPs). The accompanying figure shows SNPs in human chromosome 1. Over 1 million commonly occurring SNPs have been mapped to the human genome. They are distributed relatively evenly throughout the genome, with 90% of protein-coding sequences containing at least one SNP. They are most likely responsible for the majority of the genetic differences that exist between two humans. Knowing the location of these SNPs could allow particular SNPs or genes to be associated with susceptibility to certain genetic diseases, and may ultimately help physicians tailor their protocols for treating those diseases.
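Two small calculations make these numbers concrete: the expected count of differences between two genomes, and a position-by-position comparison of two short sequences. The sketch below uses the genome size and the one-in-a-thousand rate given in these notes; the two sequences and the `find_snps` helper are invented for illustration.

```python
GENOME_SIZE_BP = 3_000_000_000  # human genome size, from the notes
BASES_PER_DIFFERENCE = 1_000    # about one difference per thousand bases

# Expected number of single-base differences between two unrelated people.
expected_differences = GENOME_SIZE_BP // BASES_PER_DIFFERENCE


def find_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for each mismatch between aligned sequences.

    Positions are 0-based. The sequences must already be the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Invented sequences from two hypothetical individuals, differing at one base.
person_1 = "ATGGCGTACCTGAAGTCA"
person_2 = "ATGGCGTACTTGAAGTCA"

snps = find_snps(person_1, person_2)
print(f"Expected differences genome-wide: {expected_differences:,}")
print("SNPs in the example region:", snps)
```

At this rate, two unrelated genomes are expected to differ at roughly three million positions.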
