
1 USC INFORMATION SCIENCES INSTITUTE | Yolanda Gil | IKRAFT
IKRAFT: Interactive Knowledge Representation and Acquisition from Text
Yolanda Gil, Varun Ratnakar
trellis.semanticweb.org
USC/Information Sciences Institute

2 Motivation: How KBs Are Built Today
[Figure: a domain expert and a knowledge engineer build a KB with knowledge acquisition tools, through activities labeled read/ask/study/listen, analyze/group/index, structure/relate/fit, and reason/deduce/solve.]

3 Motivation: The Aftermath of Knowledge Base Development
[Figure: the same diagram as slide 2 (domain expert, knowledge engineer, knowledge acquisition tools, KB, and the read/ask/study/listen, analyze/group/index, structure/relate/fit, reason/deduce/solve activities), now with a TRASH bin.]

4 Motivation: Capturing the Design of Knowledge Bases
[Figure: a spectrum of knowledge representations, labeled WWW. One end is labeled "Richer representations, More ambiguous, More versatile"; the other, illustrated by (defconcept bridge ()), is labeled "More formal, More concrete, More introspectible". Kinds of knowledge shown along the spectrum:
- introductory texts, expert hints, explanations, dialogues, comments, examples, exceptions,...
- information extraction templates, dialogue segments and pegs, filled-out forms, high-level connections,...
- alternative formalizations (KIF, MELD, RDF,…), alternative views of the same notion (e.g., what is a threat)
- descriptions augmented with prototypical examples and exceptions, problem-solving steps and substeps,...]

5 Claims
- Knowledge can be reused at any level of (in)formality
- Knowledge can be extended more easily
  - Additional documents and semi-formal structures are readily available
- Knowledge can be translated and integrated at any level to facilitate interoperability
  - KR languages can be a straitjacket for some kinds of knowledge
- Intelligent systems will provide better justifications
  - Many users want to know where axioms came from before they trust the system's reasoning
- Content providers will not need to be sophisticated programmers/knowledge engineers
  - It may be easier for end users to organize knowledge than to formalize it
  - Good symbiosis of sophisticated and unsophisticated users

6 An Example: Building a Knowledge Base from a Textbook (DARPA Rapid Knowledge Formation -- RKF)
“…The first step a cell takes in reading out part of its genetic instructions is to copy the required portion of the nucleotide sequence of DNA – the gene – into a nucleotide sequence of RNA. The process is called transcription because the information, though copied into another chemical form, is still written in essentially the same language – the language of nucleotides. Like DNA, RNA is a linear polymer made of four different types of nucleotide subunits linked together by phosphodiester bonds. It differs from DNA chemically in two respects: (1) the nucleotides in RNA are ribonucleotides – that is, they contain the sugar ribose (hence the name ribonucleic acid) rather than deoxyribose; (2) although, like DNA, RNA contains the bases adenine (A), guanine (G), and cytosine (C), it contains uracil (U) instead of the thymine (T) in DNA. Since U, like T, can base-pair by hydrogen-bonding with A, the base-pairing properties described for DNA also apply to RNA…”
-- Essential Cell Biology, Alberts et al. 1992
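To make the base-pairing rules in this excerpt concrete, here is a minimal sketch (not part of the talk) that builds the RNA bases complementary to a DNA template strand, pairing template A with U because RNA carries uracil where DNA carries thymine. The function name `transcribe_template` and the mapping table are hypothetical, and the sketch deliberately ignores strand orientation, promoters, and everything else a real cell does.

```python
# Base that each DNA template base pairs with in the RNA transcript.
# RNA has uracil (U) where DNA has thymine (T), so template A pairs with U.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template: str) -> str:
    """Return the RNA bases complementary to a DNA template strand.

    Hypothetical helper for illustration only: strand orientation (5'/3'),
    promoters, and RNA processing are not modeled.
    """
    rna = []
    for base in template.upper():
        if base not in TEMPLATE_TO_RNA:
            raise ValueError(f"not a DNA base: {base!r}")
        rna.append(TEMPLATE_TO_RNA[base])
    return "".join(rna)


if __name__ == "__main__":
    print(transcribe_template("TACGGT"))  # prints AUGCCA
```

The result equals the coding strand (here ATGCCA) with every T replaced by U, which is the sense in which the excerpt says the information is copied but "still written in essentially the same language".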

7 Protein Synthesis in RKF’s SHAKEN, Authored by a Biologist [Chaudhri et al. 2001]

8 Step 1: Selecting Relevant Knowledge Fragments
[The Essential Cell Biology excerpt from slide 6 (Alberts et al. 1992), repeated with the relevant fragments selected.]

9 Step 2: Composing Stylized Knowledge Fragments
- ribose
  - it is a kind of sugar, like deoxyribose
  - it is contained in the nucleotides of RNA
- uracil
  - it is a kind of nucleotide, like adenine and guanine
  - it can base-pair with adenine
- RNA
  - it is a kind of nucleic acid, like DNA
  - it contains uracil instead of thymine
  - it is single-stranded
  - it folds in complex 3-D shapes
  - nucleotides are linked with phosphodiester bonds, like DNA
  - there are many types of RNA
  - RNA is the template for synthesizing protein
  - its nucleotides contain the sugar ribose (DNA has deoxyribose)
- gene
  - subsequence of DNA that can be used as a template to create protein
- protein synthesis
  - non-destructive creation process: RNA and protein created from DNA
  - its speed is regulated by the cell
  - substeps (ordered in sequence):
    1) RNA transcription
       - a DNA fragment (a gene) is copied, just like DNA is copied during DNA synthesis
       - the result is an RNA chain
    2) protein translation
       - RNA is used as a template
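Because the statements above are stylized, some of them can be turned into formal relations mechanically; the talk lists a controlled grammar for statements as future work. The sketch below assumes one hypothetical pattern, "it is a kind of <parent>[, like <siblings>]", and turns matching statements into (child, parent) pairs while leaving every other statement as natural-language documentation. The regular expression, the function `extract_isa`, and the dictionary layout are illustrative assumptions, not IKRAFT's parser.

```python
import re

# Hypothetical controlled-grammar pattern for one stylized statement form:
# "it is a kind of <parent>" optionally followed by ", like <a>[, <b>] [and <c>]".
KIND_OF = re.compile(r"^it is a kind of (?P<parent>[\w\- ]+?)(?:, like (?P<siblings>.+))?$")


def extract_isa(fragments: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Turn "kind of" statements into (child, parent) pairs.

    Statements that do not match the pattern are skipped; they would stay
    as natural-language documentation only.
    """
    pairs = []
    for concept, statements in fragments.items():
        for statement in statements:
            match = KIND_OF.match(statement.strip())
            if match is None:
                continue
            parent = match.group("parent").strip()
            pairs.append((concept, parent))
            # "like X and Y" names further members of the same parent concept.
            if match.group("siblings"):
                for sibling in re.split(r",\s*|\s+and\s+", match.group("siblings")):
                    if sibling.strip():
                        pairs.append((sibling.strip(), parent))
    return pairs


FRAGMENTS = {
    "ribose": ["it is a kind of sugar, like deoxyribose",
               "it is contained in the nucleotides of RNA"],
    "uracil": ["it is a kind of nucleotide, like adenine and guanine",
               "it can base-pair with adenine"],
    "RNA": ["it is a kind of nucleic acid, like DNA",
            "it contains uracil instead of thymine",
            "it is single-stranded"],
}

if __name__ == "__main__":
    for child, parent in extract_isa(FRAGMENTS):
        print(f"{child} is-a {parent}")
```

Only three of the eight statements match here; the rest (e.g., "it can base-pair with adenine") would need further patterns or manual formalization, as in the next step.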

10 Step 3: Creating Knowledge Base Items
…
(defconcept uracil
  :is-primitive nucleotide
  :constraints (:the base-pair adenine))
(defconcept RNA
  :is (:and nucleic-acid (:some contains uracil)))
…
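The two definitions differ in kind: uracil is declared primitive (every uracil is a nucleotide, but membership must be asserted), while RNA is fully defined (anything that is a nucleic acid and contains some uracil is recognized as RNA). The following hand-rolled sketch illustrates that difference; it is not Loom's or PowerLoom's API, the names `Individual` and `concepts_of` are hypothetical, and the `:constraints (:the base-pair adenine)` clause is not modeled.

```python
from dataclasses import dataclass, field

# uracil :is-primitive nucleotide -> being a nucleotide is necessary, but
# membership in "uracil" must be asserted; it is never inferred.
PRIMITIVE_PARENT = {"uracil": "nucleotide"}


@dataclass
class Individual:
    name: str
    asserted: set[str]  # concepts the user asserted directly
    contains: list["Individual"] = field(default_factory=list)


def concepts_of(ind: Individual) -> set[str]:
    """Asserted concepts plus what the two definitions above let us infer."""
    concepts = set(ind.asserted)
    # Primitive concepts only propagate upward to their parent.
    for child, parent in PRIMITIVE_PARENT.items():
        if child in concepts:
            concepts.add(parent)
    # RNA :is (:and nucleic-acid (:some contains uracil)) -> a defined concept:
    # anything satisfying the definition is recognized as RNA.
    if "nucleic-acid" in concepts and any(
        "uracil" in concepts_of(part) for part in ind.contains
    ):
        concepts.add("RNA")
    return concepts


if __name__ == "__main__":
    u1 = Individual("u1", {"uracil"})
    x1 = Individual("x1", {"nucleic-acid"}, [u1])
    y1 = Individual("y1", {"nucleic-acid"}, [Individual("t1", {"thymine"})])
    print(sorted(concepts_of(u1)))  # ['nucleotide', 'uracil']
    print("RNA" in concepts_of(x1), "RNA" in concepts_of(y1))  # True False
```

A real description-logic classifier also places concepts in a subsumption hierarchy and checks consistency; this sketch only shows recognition of a single defined concept.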

11 IKRAFT: Interactive Knowledge Representation and Acquisition from Text
- User starts with documents and extracts a small amount of information from them
  - The text contains significant portions useful for context, reference, and recall
- IKRAFT allows users to annotate text with statements expressed in natural language
  - Highlight portions of the original text and annotate them with a statement
  - Statements tend to be stylized
- Statements are parsed, and the system generates a summary of:
  - Objects
  - Events/actions
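The core of this annotation step is a link from a concise statement back to highlighted spans in one or more source documents. A minimal sketch of such a record follows; the class and function names (`Highlight`, `Annotation`, `highlight`, `sources_of`) are hypothetical, IKRAFT's actual storage format is not described in the slides, and the second source document is invented input for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Highlight:
    """A highlighted span [start, end) in one source document."""
    doc_id: str
    start: int
    end: int


@dataclass
class Annotation:
    """A concise natural-language statement grounded in one or more highlights."""
    statement: str
    highlights: list[Highlight] = field(default_factory=list)


def highlight(docs: dict[str, str], doc_id: str, phrase: str) -> Highlight:
    """Locate the first occurrence of `phrase` in a document and return its span."""
    start = docs[doc_id].find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in {doc_id!r}")
    return Highlight(doc_id, start, start + len(phrase))


def sources_of(annotation: Annotation, docs: dict[str, str]) -> list[tuple[str, str]]:
    """Return (doc_id, highlighted text) pairs that back up a statement."""
    return [(h.doc_id, docs[h.doc_id][h.start:h.end]) for h in annotation.highlights]


if __name__ == "__main__":
    docs = {
        "alberts": "it contains uracil (U) instead of the thymine (T) in DNA.",
        "notes": "RNA uses U where DNA uses T.",  # hypothetical second source
    }
    note = Annotation(
        "RNA contains uracil instead of thymine",
        [highlight(docs, "alberts", "uracil (U) instead of the thymine (T)"),
         highlight(docs, "notes", "RNA uses U where DNA uses T")],
    )
    for doc_id, text in sources_of(note, docs):
        print(f"{doc_id}: {text}")
```

Storing offsets rather than copied text keeps each statement tied to its exact place in the source, so a reader can jump back to the surrounding context.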

12 IKRAFT: Annotating Manual Information Extraction

13 IKRAFT: Extracting Statements from Complementary/Contradictory Text Sources

14 IKRAFT: Documenting Seismic Hazard in Southern California

15 Seismic Hazard Analysis (SHA) for the Southern California Earthquake Center (SCEC)

16 DOCKER: Scientist Publishes SHA Models
User specifies:
- Types of model parameters
- Format of input messages
- Documentation
- Constraints
[Figure: DOCKER architecture, with components Web Browser, User Interface, Model Specification, Constraint Acquisition, and Wrapper Generation (WSDL, PWL); knowledge sources SCEC ontologies, AS97 message types, AS97 ontology, constraints, and documentation; and the AS97 model.]
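One way to picture what the scientist specifies is a typed parameter list that input messages can be checked against. The sketch below is an assumption throughout: `Parameter`, `ModelSpec`, and `check_message` are hypothetical names, the parameter list is invented for illustration and is not AS97's actual interface, and DOCKER's real wrappers (WSDL, PWL) are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Parameter:
    name: str
    kinds: tuple[type, ...]  # accepted Python type(s)
    doc: str                 # documentation shown to the engineer


@dataclass
class ModelSpec:
    model: str
    parameters: list[Parameter]

    def check_message(self, message: dict) -> list[str]:
        """List problems with an input message; an empty list means it conforms."""
        problems = []
        for p in self.parameters:
            if p.name not in message:
                problems.append(f"missing parameter {p.name!r}")
            elif not isinstance(message[p.name], p.kinds):
                problems.append(
                    f"{p.name!r} has type {type(message[p.name]).__name__}, "
                    f"expected {' or '.join(k.__name__ for k in p.kinds)}"
                )
        known = {p.name for p in self.parameters}
        for extra in sorted(message.keys() - known):
            problems.append(f"unexpected parameter {extra!r}")
        return problems


# Hypothetical parameter list, for illustration only.
SPEC = ModelSpec("AS97", [
    Parameter("magnitude", (int, float), "earthquake magnitude (hypothetical)"),
    Parameter("distance_km", (int, float), "site distance in km (hypothetical)"),
    Parameter("site_class", (str,), "site condition label (hypothetical)"),
])

if __name__ == "__main__":
    print(SPEC.check_message({"magnitude": 6.5, "distance_km": 10.0, "site_class": "rock"}))
    print(SPEC.check_message({"magnitude": "big", "distance_km": 10.0, "depth": 5}))
```

Type checks like these only cover the message format; the semantic constraints a model developer documents are a separate layer.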

17 Documenting the Model with IKRAFT

18 Documenting Each Constraint

19 Formalizing Simple Constraints

20 Documentation of Constraints (Some Are Formalized, Some Are Not)

21 DOCKER: Engineer Uses SHA Model
User can:
- Browse through SHA models
- Invoke SHA models
- Get help in selecting an appropriate model
[Figure: DOCKER architecture, with components Web Browser, User Interface, Constraint Reasoning, Model Reasoning, Pathway Elicitation, and KR&R (PowerLoom); knowledge sources shared ontologies, AS97 message types, AS97 ontology, constraints, and documentation; and the AS97 model.]

22 DOCKER Detects Constraint Violations
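Because only some constraints are formalized, a checker can report two things: formalized constraints that the inputs violate, and documented-only constraints the engineer should read. In both cases it can show the statement and the source it came from, so the engineer can decide whether to override. This is a minimal sketch under stated assumptions: `Constraint` and `review` are hypothetical names, the constraints and their bounds are invented for illustration and do not come from the AS97 documentation, and DOCKER itself reasons with PowerLoom rather than Python predicates.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Constraint:
    statement: str  # concise NL statement, as captured with IKRAFT
    source: str     # pointer back to the documentation it came from
    check: Optional[Callable[[dict], bool]] = None  # None: documented, not formalized


def review(constraints: list[Constraint], inputs: dict) -> tuple[list[Constraint], list[Constraint]]:
    """Split constraints into violated (formalized and failing) and unchecked (not formalized)."""
    violated, unchecked = [], []
    for c in constraints:
        if c.check is None:
            unchecked.append(c)
        elif not c.check(inputs):
            violated.append(c)
    return violated, unchecked


# Hypothetical constraints for illustration only.
CONSTRAINTS = [
    Constraint("magnitude should be between 5 and 8",
               "model documentation (hypothetical)",
               lambda x: 5 <= x["magnitude"] <= 8),
    Constraint("distance should not exceed 200 km",
               "model documentation (hypothetical)",
               lambda x: x["distance_km"] <= 200),
    Constraint("results for unusual site conditions should be used with caution",
               "model documentation (hypothetical)"),
]

if __name__ == "__main__":
    violated, unchecked = review(CONSTRAINTS, {"magnitude": 8.5, "distance_km": 30.0})
    for c in violated:
        print(f"VIOLATED: {c.statement} [{c.source}]")
    for c in unchecked:
        print(f"NOT FORMALIZED, please read: {c.statement} [{c.source}]")
```

The sketch only reports; the decision to override a violated constraint stays with the engineer, which is the point of grounding each constraint in its documentation.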

23 Should Engineer Override Constraint Specified by Model Developer?

24 Engineer Brings Up IKRAFT to Find Reasons for the Constraint

25 Engineer Can Check Additional Model Constraints (Not Formalized)

26 Constraints Grounded on Model Documentation

27 Engineer Makes an Informed Decision on Whether to Override the Constraint

28 Discussion
- Overhead in capturing the rationale?
  - Related to motivation and payoff
  - Rationale here is captured in a very simple process
- Related work:
  - Documenting design rationale [Shum 96]
  - Methodologies for knowledge base development [Schreiber et al 00]
  - Higher-level languages, e.g., KARL [Fensel et al 98]

29 Conclusions and Future Work
- IKRAFT helps users document formal expressions
  - Each formal expression is backed up by a concise NL statement that is linked back to one or more sources
  - Users can understand the justification for the system’s reasoning (e.g., SHA)
- Future work:
  - NLP techniques to extract terms from users’ concise statements
  - Controlled grammar for formulation of statements
  - Other documentation: e.g., tables, forms, exceptions
- High payoff in capturing the rationale of knowledge bases

30 Speculation: Will the (Semantic) Web End Up Looking Like This?
[Figure: the same spectrum of representations as in slide 4, from richer, more ambiguous, more versatile representations (introductory texts, expert hints, explanations, dialogues, comments, examples, exceptions,...) to more formal, more concrete, more introspectible ones such as (defconcept bridge ()).]