1 Gene Predictor Date:20/11/2003 Implemented By: Zohar Idelson Supervisor: Dr. Yizhar Lavner Winter - Summer 2003.

Presentation transcript:

1 Gene Predictor Date:20/11/2003 Implemented By: Zohar Idelson Supervisor: Dr. Yizhar Lavner Winter - Summer 2003

2 Genomic Signal Processing
Genomic Signal Processing is a relatively new field in bioinformatics, in which signal processing algorithms and methods are used to study functional structures in DNA.
An appropriate mapping of the DNA sequence into one or more numerical sequences enables the use of many digital signal processing tools.
[Diagram: DNA segment atgcggatttgccgtcgatgtc… → Gene Predictor → Gene]

3 DNA Basics
DNA in eukaryotes is organized in chromosomes.
The DNA in each chromosome can be read as a discrete signal over the alphabet {a,t,c,g} (for example: atgatcccaaatggaca…).
In exons (protein-coding regions), these letters are read as triplets (codons) when the cell builds proteins. Every codon signals which amino acid to add (there are 20 amino acids).
There are 6 ways of translating the DNA signal into a codon signal, called the reading frames (3 offsets * 2 directions).
Every gene starts with a start codon and ends with a stop codon. An exon cannot contain more than one stop codon.
Non-coding regions, which usually make up most of the DNA, behave far more randomly than genes.
Genes can be detected through statistical regularities such as codon usage, nucleotide usage and periodicity, and through database comparison.

4 Organisms
Classified into two types:
– Eukaryotes: contain a membrane-bound nucleus and organelles (plants, animals, fungi, …)
– Prokaryotes: lack a true membrane-bound nucleus and organelles (single-celled, includes bacteria)
Not all single-celled organisms are prokaryotes!

5 Cells
Complex system enclosed in a membrane.
Organisms are unicellular (bacteria, baker's yeast) or multicellular.
Humans:
– 60 trillion cells
– 320 cell types
[Figure: example animal cell; source: biology_intro.htm]

6 DNA Basics – cont.
DNA in eukaryotes is organized in chromosomes.

7 Chromosomes
In eukaryotes, the nucleus contains one or several double-stranded DNA molecules organized as chromosomes.
Humans:
– 22 pairs of autosomes
– 1 pair of sex chromosomes
[Figure: human karyotype; source: Session8/Session8.html]

8

9 What is DNA?
DNA: Deoxyribonucleic Acid.
A single strand is a chain of nucleotides (an oligomer or polynucleotide).
4 different nucleotides:
– Adenine (A)
– Cytosine (C)
– Guanine (G)
– Thymine (T)

10 Nucleotide Bases
Purines (A and G).
Pyrimidines (C and T).
The difference is in the base structure.
Image source: www.ebi.ac.uk/microarray/biology_intro.htm

11 DNA

12

13 Genome
The chromosomal DNA of an organism.
The number of chromosomes and the genome size vary quite significantly from one organism to another.
Genome size and number of genes do not necessarily determine organism complexity.

14 Genome Comparison
ORGANISM                              CHROMOSOMES   GENOME SIZE      GENES
Homo sapiens (Humans)                 23            3,200,000,000    ~30,000
Mus musculus (Mouse)                  20            2,600,000,000    ~30,000
Drosophila melanogaster (Fruit Fly)   4             180,000,000      ~18,000
Saccharomyces cerevisiae (Yeast)      […]           […],000,000      ~6,000
Zea mays (Corn)                       10            2,400,000,000    ???

15

16 DNA Basics – cont.
The DNA in each chromosome can be read as a discrete signal over the alphabet {a,t,c,g} (for example: atgatcccaaatggaca…).

17 DNA Basics – cont.
In genes (protein-coding regions), when proteins are built from amino acids, these nucleotides (letters) are read as triplets (codons). Every codon signals one amino acid for protein synthesis (there are 20 amino acids).

18 DNA Basics – cont.
There are 6 ways of translating the DNA signal into a codon signal, called the reading frames (3 offsets * 2 directions).
[Diagram: …CATTGCCAGT… split into codons in each frame]

19 DNA Basics – cont.
Start codon: ATG. Stop codons: TAA, TGA, TAG.
[Diagram: …CATTGCCAGT… with a gene drawn as Exon, Intron, Exon]

20 The Problem
Given unannotated DNA, find the genes.
In practice, find the exons and their reading frame (RF).
Smaller-scale problem: given some annotated DNA of an organism, find the exons in unannotated DNA of the same organism.
[Diagram: atgcggatttgccgtcgatgtc… → Gene Predictor → Exon]

21 Solution Scheme
– Analyze the sequence in windows.
– Find parameters that give a good prediction on annotated DNA (of the same organism). Learn how to distinguish exon regions from non-exon regions.
– Extract those parameters from the unannotated DNA, and use the discrimination rule to predict.
Almost all methods shown here fit this scheme.

22 Creatures in the Project
C. elegans; S. cerevisiae (yeast)

23 Existing Methods
Many methods rely on the pseudo-periodicity of 3 in genes. For that we define:
– $u_b(n)$ is the binary indicator sequence for base $b$.
– $U_b(k)$ is the STFT of $u_b(n)$.
N, the window size, is in the hundreds. Exon sizes are of the order of $10^1 \ldots 10^3$ (in S. cerevisiae).
Windows overlap.
– There is a connection between the DFT at frequency k = N/3 and nucleotide usage.

24 Calculating the DFT of a DNA sequence*
[Figure: the sequence S(n) = ATCGTACAGCTGCAAAGCATAGATTCGGTCACAGTTG… and its four binary indicator sequences $u_A(n)$, $u_T(n)$, $u_C(n)$, $u_G(n)$]
*Silverman and Linsker 1986; Voss 1992
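The mapping on this slide can be sketched in a few lines. This is a minimal illustration, not the project's code: the function names `indicator_sequences` and `dft_bin` are hypothetical, and the example sequence is the one printed on the slide, trimmed so that its length is a multiple of 3.

```python
import cmath

def indicator_sequences(dna):
    """Return the four binary indicator sequences u_A, u_T, u_C, u_G."""
    dna = dna.upper()
    return {b: [1 if ch == b else 0 for ch in dna] for b in "ATCG"}

def dft_bin(u, k):
    """Single DFT coefficient U(k) = sum_n u(n) * exp(-2j*pi*k*n/N)."""
    n_total = len(u)
    return sum(x * cmath.exp(-2j * cmath.pi * k * n / n_total)
               for n, x in enumerate(u))

# Slide example, trimmed to 36 letters so that N/3 is an integer.
seq = "ATCGTACAGCTGCAAAGCATAGATTCGGTCACAGTTG"[:36]
u = indicator_sequences(seq)
U = {b: dft_bin(u[b], len(seq) // 3) for b in "ATCG"}
```

Because the four indicators add up to 1 at every position, the four coefficients in `U` add up to zero at any non-zero frequency.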

25 Spectrogram
A way of showing the amplitudes of $U_A$, $U_C$, $U_G$ and $U_T$ together.
Linear transform to RGB; magnitude is represented by brightness.
Finding exons visually: bright horizontal lines, usually at k = N/3.
[Figure axes: position (nucleotides) versus frequency, with N/3 marked]

26 Spectrogram – cont.
DNA of C. elegans chromosome III versus completely random DNA.

27 Power Spectrum
The difference between gene and non-gene regions is about one order of magnitude.
Used at k = N/3.
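The formula on this slide did not survive in the transcript. A minimal sketch, assuming the commonly used form in which the power at k = N/3 is the sum of |U_b(N/3)|² over the four bases; the names `power_at_one_third` and `sliding_power`, the window size, the step and the toy sequences are all illustrative assumptions.

```python
import cmath
import random

def power_at_one_third(window):
    """Sum over the four bases of |U_b(N/3)|^2 (assumed form of the measure)."""
    total = 0.0
    for base in "ATCG":
        # For k = N/3 the DFT kernel reduces to exp(-2j*pi*n/3).
        coeff = sum(cmath.exp(-2j * cmath.pi * n / 3)
                    for n, ch in enumerate(window) if ch == base)
        total += abs(coeff) ** 2
    return total

def sliding_power(dna, window_size, step):
    """Power at k = N/3 for overlapping windows along the sequence."""
    return [power_at_one_third(dna[i:i + window_size])
            for i in range(0, len(dna) - window_size + 1, step)]

random.seed(0)
noise = "".join(random.choice("ATCG") for _ in range(300))  # toy non-coding
periodic = "ATG" * 100                                      # toy period-3 region
profile = sliding_power(noise + periodic, window_size=150, step=50)
```

In this toy input the last windows lie entirely in the period-3 region, so their power is far larger than that of the random windows at the start.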

28 IIR Anti-Notch Filtering
An IIR anti-notch filter is aimed at finding "peaks" at a chosen frequency.
[Figure: all-pass and anti-notch responses]
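One common way to build such a filter is from a second-order all-pass section A(z), taking G(z) = (1 − A(z)) / 2. The sketch below assumes that construction; the slide does not give the coefficients, so the pole radius 0.99 and the helper names are assumptions made for illustration.

```python
import cmath
import math

def anti_notch_coeffs(radius, w0):
    """Coefficients (b, a) of G(z) = (1 - A(z)) / 2 for a 2nd-order all-pass A(z).

    A(z) = (d2 + d1 z^-1 + z^-2) / (1 + d1 z^-1 + d2 z^-2), with d2 = radius^2.
    """
    d2 = radius ** 2
    d1 = -(1.0 + d2) * math.cos(w0)  # gives A(e^{j w0}) = -1, so |G(w0)| = 1
    gain = (1.0 - d2) / 2.0
    return [gain, 0.0, -gain], [1.0, d1, d2]

def iir_filter(b, a, x):
    """Direct-form difference equation for a second-order IIR filter."""
    y = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(3) if n - i >= 0)
        acc -= sum(a[i] * y[n - i] for i in range(1, 3) if n - i >= 0)
        y.append(acc)
    return y

def gain_at(b, a, w):
    """Magnitude response |G(e^{jw})|."""
    z1 = cmath.exp(-1j * w)
    return abs((b[0] + b[1] * z1 + b[2] * z1 ** 2) /
               (a[0] + a[1] * z1 + a[2] * z1 ** 2))

# Peak placed at the period-3 frequency 2*pi/3.
b, a = anti_notch_coeffs(radius=0.99, w0=2 * math.pi / 3)
filtered = iir_filter(b, a, [1, 0, 0] * 200)  # toy indicator with period 3
```

A radius closer to 1 narrows the peak, at the cost of a longer transient in the filtered output.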

29 Optimized Spectral Content Measure (OSCM)
Find good coefficients (a, g, t) that give high differentiation between exons and introns.
C is ignored because it is linearly dependent on the other three (the four indicator sequences sum to 1 at every position).
Ar, Tr, Gr are generated from a random DNA sequence, or from introns.
Performance: [figure]

30 OSCM Example
[Figure annotations: direction mistake; good forward detection; good reverse detection]

31 OSCM Justification
In genes, the 4 complex variables A, T, C, G are not entirely random; each tends to lie near a specific angle (phase).
In introns, the phase values appear to be purely random.
Those characteristic angles also enable us to detect the reading frame.

32 Distribution of the phase of the DFT at frequency 1/3 in the genes of S. cerevisiae
[Four histograms: distributions of arg(A), arg(T), arg(C) and arg(G) for all experimental genes in all chromosomes of S. cerevisiae, each annotated with its angular mean and angular deviation]
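The angular mean and angular deviation quoted for each histogram are circular statistics. A minimal sketch, assuming the usual definitions (mean = argument of the average unit phasor, deviation = sqrt(2(1 − R)) with R the length of that average); the slide does not state which definition was used, and `angular_stats` is a hypothetical name.

```python
import cmath
import math

def angular_stats(phases):
    """Angular mean and angular deviation of a list of phases in radians."""
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    mean = cmath.phase(resultant)           # direction of the average phasor
    r = abs(resultant)                      # 1 = fully concentrated, 0 = spread out
    deviation = math.sqrt(2.0 * (1.0 - r))  # one common definition (assumed)
    return mean, deviation

tight_mean, tight_dev = angular_stats([0.1, -0.1])
spread_mean, spread_dev = angular_stats([0.0, 2 * math.pi / 3, 4 * math.pi / 3])
```

A narrow histogram, as expected in genes, gives a deviation near 0; evenly spread phases, as expected in non-coding regions, give a deviation near sqrt(2).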

33 Distribution of the phase of the DFT at frequency 1/3 in the introns of S. cerevisiae
[Four histograms: distributions of arg(A), arg(T), arg(C) and arg(G) for non-coding regions in all chromosomes of S. cerevisiae]

34 Fourier Spectra and Position Asymmetry
f(b,i) is the frequency of base b in codon position i, i = 1, 2, 3.
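The link between the DFT at k = N/3 and position asymmetry can be checked numerically: for a window of N/3 codons, U_b(N/3) equals (N/3) times the sum over i of f(b,i)·exp(−2πj(i−1)/3). The sketch below verifies that identity on a toy window; the function names and the window are illustrative assumptions.

```python
import cmath

def codon_position_frequencies(window, base):
    """f(b, i): fraction of codons with `base` at codon position i = 1, 2, 3."""
    n_codons = len(window) // 3
    return [sum(1 for n in range(i, 3 * n_codons, 3) if window[n] == base) / n_codons
            for i in range(3)]

def dft_one_third(window, base):
    """U_b(N/3) computed directly from the indicator sequence."""
    return sum(cmath.exp(-2j * cmath.pi * n / 3)
               for n, ch in enumerate(window) if ch == base)

def dft_from_frequencies(window, base):
    """The same coefficient, rebuilt from the three position frequencies."""
    n_codons = len(window) // 3
    f = codon_position_frequencies(window, base)
    return n_codons * sum(f[i] * cmath.exp(-2j * cmath.pi * i / 3) for i in range(3))

window = "ATGGCCATTGTAATGGGCCGC"  # toy window of 7 codons
```

When a base is equally frequent in all three codon positions the three terms cancel, which is why regions without position asymmetry show no peak at N/3.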

35 Genes versus Introns
            Coding regions (genes and exons)   Introns and intergenic spacers
Magnitude   Large                              Small
Phase       Narrow distribution                Randomly distributed
[Figures: distribution of the DFT of T at 1/3 frequency; distribution of the DFT of G at 1/3 frequency (data taken from S. cerevisiae, chr. IV)]

36 Finding Reading Frame (OSCM Phase)
The phase $\theta$ is concentrated around $\theta_1$, $\theta_2$ and $\theta_3$, corresponding to the three reading frames.
The optimization lowers the variance of $\theta$.
$\theta$ is transformed to color, so the reading frame can be read off at a glance.
Color   Reading Frame
Red     1
Green   2
Blue    3
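Why three phase clusters appear can be seen from a shift property of the DFT: moving the window start by one nucleotide rotates the coefficient at k = N/3 by 120°. The sketch below demonstrates this on a toy periodic sequence, chosen so that the relation is exact; on real DNA it holds only approximately because of window-edge effects. All names and the sequence are illustrative assumptions.

```python
import cmath
import math

def phase_one_third(window, base):
    """Phase of the DFT coefficient of `base` at k = N/3."""
    coeff = sum(cmath.exp(-2j * cmath.pi * n / 3)
                for n, ch in enumerate(window) if ch == base)
    return cmath.phase(coeff)

def phase_difference(p, q):
    """Difference p - q wrapped into (-pi, pi]."""
    return cmath.phase(cmath.exp(1j * (p - q)))

coding = "ATGGCTGGTGAAGCTGGTGATGCT" * 5  # toy sequence with period 24
frame0 = coding[0:72]                    # window aligned with the codons
frame1 = coding[1:73]                    # same window, started one base later

rotation = phase_difference(phase_one_third(frame1, "G"),
                            phase_one_third(frame0, "G"))
```

Here `rotation` comes out as 2π/3, so the three possible frame offsets map to three phases 120° apart.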

37 New Methods in This Project
– Linear prediction
– Classification by clustering (CC)
– Classification by compression ratios

38 Linear Prediction
Create a walk from the indicator sequences.
For each window, find the LP coefficients. Look for differences in correlation through:
– Poles map
– Frequency response
– Prediction error
This method produced no new findings.

39 Classification by Clustering
Recall: the DFT at frequency k = N/3 correlates strongly with gene locations and reading frames (as shown in part A).
Here we attempt to use it to discriminate exons from the rest, in a 6D space.
Learning phase: clustering.
Classification phase: fuzzy KNN.

40 Classification by Clustering
Clustering stage: example. From left to right: C, G and T. S. cerevisiae, 5th chromosome.

41 Classification by Clustering
[Diagram: DNA = …atcgtgactagc… → indicator sequences $u_T$, $u_C$, $u_G$ (each starting at the window start) → DFT at k = N/3 of each indicator → new sample (T, C, G) → tested against RF = 1, RF = 2 and RF = 3 (one rotation is labeled −120°) → max → threshold → outputs: exon or not, and the reading frame if it is an exon]

42 Classification Rule
Fuzzy KNN: create a fuzzy membership function and choose the class with the highest score. Add a fuzzy clustering iteration to the LBG algorithm.
Two methods for classifying gene/non-gene:
– Add up the gene scores and the non-gene scores; the larger sum wins.
– The centroid with the maximal score wins.
The 2nd method is used (better performance). Score sums are used for the reading frame: the reading frame with the maximal sum wins.
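A minimal sketch of a fuzzy KNN rule, assuming the common inverse-distance weighting with fuzzifier m and crisp training labels. The project classifies against centroids produced by LBG clustering in a 6D space; the 2D points, the labels and the name `fuzzy_knn` below are toy assumptions used only to show the scoring.

```python
import math

def fuzzy_knn(sample, training, k=7, m=2.0):
    """Return {label: membership} for `sample`.

    training: list of (vector, label) pairs; memberships sum to 1.
    """
    neighbours = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    weights = {}
    for vector, label in neighbours:
        d = max(math.dist(sample, vector), 1e-12)  # guard against zero distance
        w = 1.0 / d ** (2.0 / (m - 1.0))           # closer neighbours weigh more
        weights[label] = weights.get(label, 0.0) + w
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

training = [((1.0, 0.0), "exon"), ((1.1, 0.1), "exon"), ((0.9, -0.1), "exon"),
            ((0.0, 0.0), "non-exon"), ((0.1, 0.1), "non-exon"),
            ((-0.1, 0.0), "non-exon"), ((0.0, -0.1), "non-exon")]
scores = fuzzy_knn((0.95, 0.05), training, k=7, m=2.0)
best = max(scores, key=scores.get)
```

Summing the memberships per class corresponds to the first method on the slide; taking only the single best-scoring centroid would correspond to the second.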

43 Results
Creature: S. cerevisiae.
Learning was done on the 5th chromosome.
Parameters:
– K = 7 and m = 2 for the fuzzy KNN.
– True exon: ≥ 50% exon.
– Thresh = 1.
Total: only 4.6% of true exons weren't detected at all.
[Results table with columns: # missed, # exons, f_n_exons, rf_true, f_n, f_p; final row: Total]

44 CC - Example

45 CC - Improving
Instead of deciding for each reading frame separately and then deciding which reading frame "won", we can replicate the centroids for the other reading frames, so that the classification rule determines [exon / non-exon] and [reading frame] at the same time. This should make the competition between the reading frames fairer.

46 Classification by Compression Rates
[Figure: a row of nucleotides ('A', 'C', 'T', 'G') mapped to a row of codon numbers (0..63)]
In forward coding, 3 different codon sequences are created.
For reverse coding, the whole DNA is first complemented and then treated like forward coding (so the results are also reversed).
At the end of this stage we have 6 codon series.
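A minimal sketch of this stage. Two things are assumptions: the numbering of codons (base-4 with the order A, C, T, G, borrowed from the order in which the slide lists the nucleotides) and the handling of the reverse direction, which here takes the reverse complement and then reads it like the forward strand. The names are hypothetical.

```python
BASE_INDEX = {"A": 0, "C": 1, "T": 2, "G": 3}  # assumed ordering
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def codon_indices(dna, offset):
    """Map dna[offset:] to codon numbers 0..63, three letters at a time."""
    out = []
    for i in range(offset, len(dna) - 2, 3):
        a, b, c = (BASE_INDEX[ch] for ch in dna[i:i + 3])
        out.append(16 * a + 4 * b + c)
    return out

def six_frame_codons(dna):
    """Three forward codon series followed by three reverse-strand series."""
    dna = dna.upper()
    reverse = "".join(COMPLEMENT[ch] for ch in reversed(dna))
    return ([codon_indices(dna, k) for k in range(3)] +
            [codon_indices(reverse, k) for k in range(3)])

frames = six_frame_codons("ATGAAATAG")
```

Each of the six series can then be passed to the compression stage independently.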

47 The Idea
If we have a dictionary of words (= codon sequences) that are popular in exons but not in non-exons, then:
– Good compression will be achieved in exons
– Good compression will not be achieved in introns
So we need a good dictionary and a good compression algorithm.

48 Building the Dictionary
Aim: the output dictionary is expected to hold short words that are popular in exons.
Uses the LZW algorithm.
Input: all exons of the learnt chromosome.
Initial dictionary: all codons.
A restriction is placed on the length of words entered into the dictionary.
Output I: a dictionary with words that appeared in exons.
Output II: the code of the exons under the dictionary.

49 LZW: Encoding
1) accum ← first input letter
2) If dict.Find(accum) == false:
   1) dict.Add(accum)
   2) code.Add(index)
   3) accum ← accum(end)
   4) Return to (2)
3) Else:
   1) index = dict.FindWhere(accum)
   2) accum.Add(next letter from input)
   3) Return to (2)
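The steps above can be sketched as a short Python function. This is an illustrative version of standard LZW over codon numbers, not the project's code: the name `lzw_encode` and the default `max_word_len` are assumptions, with the length limit standing in for the word-length restriction mentioned when building the dictionary.

```python
def lzw_encode(symbols, alphabet_size=64, max_word_len=4):
    """LZW-encode a sequence of codon numbers; returns (codes, dictionary)."""
    # Initial dictionary: every single codon maps to its own number.
    dictionary = {(s,): s for s in range(alphabet_size)}
    codes = []
    accum = ()
    for s in symbols:
        candidate = accum + (s,)
        if candidate in dictionary:
            accum = candidate              # keep extending the current word
            continue
        codes.append(dictionary[accum])    # emit the longest known word
        if len(candidate) <= max_word_len:
            dictionary[candidate] = len(dictionary)  # learn the new word
        accum = (s,)                       # restart from the last letter
    if accum:
        codes.append(dictionary[accum])    # flush the final word
    return codes, dictionary

codes, dictionary = lzw_encode([1, 2, 1, 2, 1, 2])
```

For the toy input the six symbols are coded with four words, and the dictionary gains the words (1, 2), (2, 1) and (1, 2, 1).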

50 Dictionary Pruning
The output LZW dictionary is a tree (trie).
Aim: keep the most popular words, but don't allow undesired redundancy.
Method:
– Go over every level of the tree (starting with the maximal-length words) and take a predefined number of popular words.
– Pass the number of appearances (from the output code) to the parents: pass the sum of all, OR pass the sum of the untaken words.
More variations: multiply by the entropy.

51 Using Entropy for Better Pruning
[Figure: three example trie nodes whose counts are weighted by the entropy of their children]
– […]*log(4) = 48
– 40*log(1) = 0
– […]*(-1)*[5/6*log(5/6) + 2*1/24*log(1/24) + 1/16*log(1/16)] = 20*[…] = […]
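A minimal sketch of the entropy weighting, assuming that a node's count is multiplied by the base-2 entropy of the distribution of counts over its children. The name `entropy_weighted_count` and the example counts are illustrative; they are chosen to be consistent with the surviving results 48 and 0, not taken from the figure.

```python
import math

def entropy_weighted_count(count, child_counts):
    """Count of a trie node multiplied by the entropy of its children (base 2)."""
    total = sum(child_counts)
    if total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in child_counts if c > 0)
    return count * entropy

even_split = entropy_weighted_count(24, [6, 6, 6, 6])  # 24 * log2(4)
single_child = entropy_weighted_count(40, [40])        # 40 * log2(1)
```

A word that is followed by many different continuations scores high, while a word that almost always continues the same way scores close to zero and is a weaker candidate to keep on its own.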

52 Compression Rates Classification
1. Input: DNA of a chromosome and a gene-based dictionary
2. 6 codon sequences for the 6 different reading frames
3. Compressing with the gene-based dictionary
4. 6 compression-rate vectors
5. Rf_wins = Argmax{compress_rate(rf), thresh}; Lowerthresh = Argmax{compress_rate(rf), lower-thresh}; Too_much_stops = 1 if the window has more than 1 stop codon
6. 6 binary vectors + post-processing data
7. Post processing
8. 6 binary vectors: the final classification
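Steps 3 to 5 can be sketched as follows. The slide does not define the compression rate or the parsing strategy, so this assumes a greedy longest-match parse against a fixed set of dictionary words and takes the rate to be input symbols per emitted word; `compression_rate`, `winning_frame` and the toy values are hypothetical.

```python
def compression_rate(codons, words, max_word_len):
    """Greedy longest-match parse; rate = input symbols per emitted word."""
    i, emitted = 0, 0
    while i < len(codons):
        length = min(max_word_len, len(codons) - i)
        # Shrink the candidate until it is a dictionary word or a single codon.
        while length > 1 and tuple(codons[i:i + length]) not in words:
            length -= 1
        i += length
        emitted += 1
    return len(codons) / emitted if emitted else 0.0

def winning_frame(rates, thresh):
    """Index of the best frame, or None if no rate exceeds the threshold."""
    best = max(range(len(rates)), key=lambda rf: rates[rf])
    return best if rates[best] > thresh else None

words = {(1, 2), (1, 2, 3)}  # toy multi-codon dictionary words
rate = compression_rate([1, 2, 3, 1, 2, 4], words, max_word_len=3)
frame = winning_frame([1.0, 2.0, 1.2, 1.0, 1.1, 1.0], thresh=1.5)
```

Single codons are always accepted, mirroring the initial dictionary of all codons, so a window with no dictionary words gets the baseline rate of 1.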

53 Post Processing
Lower threshold technique: tag as true every window that lies between nearby already-tagged windows, if its value is larger than the lower threshold.
Number of stop codons in the window: more than one means it is not an exon window (the window used for this check is larger than the analysis window).
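Both post-processing rules can be sketched briefly. The slide does not say how near two tagged windows must be to count as "close", so `max_gap` is an assumed parameter, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}

def count_stop_codons(window):
    """Number of in-frame stop codons in a window read from its first base."""
    return sum(1 for i in range(0, len(window) - 2, 3)
               if window[i:i + 3] in STOP_CODONS)

def fill_gaps(rates, thresh, lower_thresh, max_gap):
    """Tag windows above `thresh`, then fill gaps between nearby tagged windows.

    An untagged window is filled in when its rate exceeds `lower_thresh` and a
    tagged window lies within `max_gap` windows on both sides (assumed rule).
    """
    tagged = [r > thresh for r in rates]
    result = list(tagged)
    for i, r in enumerate(rates):
        if tagged[i] or r <= lower_thresh:
            continue
        left = any(tagged[max(0, i - max_gap):i])
        right = any(tagged[i + 1:i + 1 + max_gap])
        result[i] = left and right
    return result

filled = fill_gaps([2.0, 1.2, 2.0, 1.2, 0.5, 2.0],
                   thresh=1.5, lower_thresh=1.0, max_gap=1)
too_many_stops = count_stop_codons("ATGTAAGGGTGA") > 1
```

In the toy run the second window is filled in because both of its neighbours are tagged, while the fourth is not because its right neighbour falls below the lower threshold.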

54 Compression Rates: Example

55 Stop Codons Usage
100,000 b of the 2nd chromosome.
The value is 1 where the window contains at most one stop codon.

56 Post Processing: Stop-codon Usage
Stop codon usage cleans up many potential false positives without damaging any success measure.
Hence a lower principal threshold can be chosen, which gives better performance.
[Figure label: without stop codon usage]

57 Compression Rates: Results
Learnt chromosome = 1st, window size = 100c, dictionary size = 1381 (32 codons, branching = 3).
After choosing the best configuration, going over all the chromosomes:
[Results table with columns: THRESH, # miss, # exons, f_n_exons, rf_true, f_n, f_p, # total]

58 Compression Rates: Improving
Use a non-exon dictionary, or prune the exon dictionary taking common non-exon words into account.
Adaptive dictionary: when an exon is detected, use its common words to update the current dictionary.