Discovering Structural Models Lecture 19
Structural Models in Science
Structural models encode the spatial relationships among the components of a physical object. Examples include Bohr's model of the atom, Watson and Crick's double-helix model of DNA, the composition and organization of molecules, and the geological strata of a particular region. Discovering a structural model often serves as a first step toward explanation, moving beyond descriptive knowledge. Work on the computational discovery of structural models has addressed both historical cases and current scientific needs.
Structural Discovery Systems
Structural discovery systems come in several types, and Valdes-Perez (1993) provided a unified view of one family of them. STAHL infers the components of chemical substances from reactions. DALTON discovers atomic models from chemical reactions. MECHEM discovers chemical reaction pathways. GELL-MANN discovers the structure of elementary particles. BR-3 discovers the properties of elementary particles. MENDEL discovers genotype interactions from phenotypes. The search space of each system can be viewed as a set of matrices that may grow in size as the search expands.
DALTON
The DALTON system discovers the elemental structure of molecules by reasoning about reaction equations. Starting with (hydrogen oxygen ➞ water), DALTON can ultimately determine the atomic components:
({{h h} {h h}} {{o o}} ➞ {{h h o} {h h o}})
This result asserts that hydrogen and oxygen molecules are diatomic, and that two hydrogen molecules combine with one oxygen molecule to produce two water molecules. DALTON arrives at its discoveries through heuristic search guided by knowledge available to 19th-century chemists.
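The constraint DALTON exploits can be made concrete with a small search. The Python sketch below is a brute-force enumeration, not DALTON's heuristic search, and it rests on stated assumptions: the inputs are Gay-Lussac's combining volumes (2 hydrogen + 1 oxygen → 2 water; 1 hydrogen + 1 chlorine → 2 hydrogen chloride), read as molecule counts via Avogadro's hypothesis, and the names REACTIONS, ELEMENTS, and simplest_model are hypothetical. A candidate model is a substance-by-element matrix of atom counts, and the sketch returns the smallest model that conserves atoms in every reaction.

```python
from itertools import product

# Assumed input: combining volumes, treated as molecule counts (Avogadro).
REACTIONS = [
    ({"hydrogen": 2, "oxygen": 1}, {"water": 2}),
    ({"hydrogen": 1, "chlorine": 1}, {"hydrogen_chloride": 2}),
]

# Assumed input: which elements each substance contains.
ELEMENTS = {
    "hydrogen": ["h"],
    "oxygen": ["o"],
    "chlorine": ["cl"],
    "water": ["h", "o"],
    "hydrogen_chloride": ["h", "cl"],
}

def atom_totals(side, model):
    """Count the atoms of each element on one side of a reaction."""
    totals = {}
    for substance, molecules in side.items():
        for element, atoms in model[substance].items():
            totals[element] = totals.get(element, 0) + molecules * atoms
    return totals

def consistent(model):
    """True if every reaction conserves atoms under this model."""
    return all(atom_totals(lhs, model) == atom_totals(rhs, model)
               for lhs, rhs in REACTIONS)

def simplest_model(max_atoms=3):
    """Exhaustively find the consistent model with the fewest total atoms."""
    slots = [(s, e) for s, elems in ELEMENTS.items() for e in elems]
    best_size, best_model = None, None
    for counts in product(range(1, max_atoms + 1), repeat=len(slots)):
        size = sum(counts)
        if best_size is not None and size >= best_size:
            continue  # cannot improve on the best model found so far
        model = {s: {} for s in ELEMENTS}
        for (s, e), n in zip(slots, counts):
            model[s][e] = n
        if consistent(model):
            best_size, best_model = size, model
    return best_model

model = simplest_model()
print(model["hydrogen"], model["oxygen"], model["water"])
# {'h': 2} {'o': 2} {'h': 2, 'o': 1}
```

In this sketch, dropping the hydrogen chloride reaction changes the answer: the smallest model consistent with the water reaction alone makes hydrogen monatomic and water a 1:1 compound of h and o. The second reaction is what forces diatomic hydrogen.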
DENDRAL
The DENDRAL system discovers the chemical bonds in a molecule given its formula and mass spectrum. From the formula C6H5OH and other relevant information, DENDRAL can produce structures such as that of phenol: a six-carbon ring in which five carbons each carry a hydrogen atom and the sixth carries a hydroxyl (OH) group. Like DALTON, DENDRAL relies on heuristic search to discover structural models. However, DENDRAL incorporates knowledge from 20th-century chemistry to guide its more extensive search.
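One standard piece of chemical knowledge that can prune such a search is the number of rings plus double bonds implied by a molecular formula, (2C + 2 + N - H - X) / 2, where X counts halogen atoms and divalent atoms such as oxygen do not contribute. The Python sketch below computes it; whether DENDRAL used exactly this function is not claimed here, the function names are hypothetical, and the parser handles only simple formulas without parentheses or charges.

```python
import re

HALOGENS = {"F", "Cl", "Br", "I"}

def parse_formula(formula):
    """Parse a simple formula such as 'C6H5OH' into element counts."""
    counts = {}
    for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(digits) if digits else 1)
    return counts

def rings_plus_double_bonds(counts):
    """Degree of unsaturation for formulas built from C, H, N, O, S, halogens."""
    c = counts.get("C", 0)
    h = counts.get("H", 0)
    n = counts.get("N", 0)
    x = sum(counts.get(el, 0) for el in HALOGENS)
    value = 2 * c + 2 + n - h - x
    if value < 0 or value % 2:
        raise ValueError("formula is inconsistent with standard valences")
    return value // 2

print(rings_plus_double_bonds(parse_formula("C6H5OH")))  # 4
```

For C6H5OH the value 4 matches one ring plus three double bonds, as in a benzene ring, so a structure generator can discard any candidate graph with a different count before consulting the mass spectrum.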
GELL-MANN
GELL-MANN discovers hidden structures in particle physics. As input, GELL-MANN takes a collection of observed particles and their properties. As output, the system produces a set of components, and combinations of those components, that map to the particles. For example, the system could consider a list of elementary particles, such as the members of the baryon octet in the next example. From this, it would conjecture the properties of quarks and map each baryon to an arrangement of quarks. Zytkow and Fischer's (1996) computational model of structure discovery forms the basis of GELL-MANN.
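GELL-MANN Example
Input: observed particles
particle   charge   isospin   strangeness
p           1        1/2        0
n           0       -1/2        0
Σ+          1        1         -1
Σ0          0        0         -1
Σ-         -1       -1         -1
Ξ0          0        1/2       -2
Ξ-         -1       -1/2       -2

Output: conjectured quarks
quark   charge   isospin   strangeness
u        2/3      1/2        0
d       -1/3     -1/2        0
s       -1/3      0         -1

Output: quark combinations mapped to particles
quarks   charge   isospin   strangeness   particle
uuu       2        3/2        0            (none in input)
uud       1        1/2        0            p
uus       1        1         -1            Σ+
udd       0       -1/2        0            n
uds       0        0         -1            Σ0
uss       0        1/2       -2            Ξ0
ddd      -1       -3/2        0            (none in input)
dds      -1       -1         -1            Σ-
dss      -1       -1/2       -2            Ξ-
sss      -1        0         -3            (none in input)

From information about the elementary particles, GELL-MANN can infer the standard quark model.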
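The quark-to-particle mapping in this example can be checked mechanically. The Python sketch below assumes the charge, isospin, and strangeness values given in the example, treats all three properties as additive, enumerates every three-quark combination of u, d, and s, and matches each sum against the observed particles. It verifies a given quark hypothesis rather than searching for one, so it illustrates only the mapping step, not GELL-MANN's discovery procedure; QUARKS, PARTICLES, and map_particles are hypothetical names.

```python
from fractions import Fraction as F
from itertools import combinations_with_replacement

# (charge, isospin, strangeness) for each conjectured quark.
QUARKS = {
    "u": (F(2, 3), F(1, 2), 0),
    "d": (F(-1, 3), F(-1, 2), 0),
    "s": (F(-1, 3), F(0), -1),
}

# Observed particles from the example input.
PARTICLES = {
    "p": (1, F(1, 2), 0),
    "n": (0, F(-1, 2), 0),
    "Σ+": (1, 1, -1),
    "Σ0": (0, 0, -1),
    "Σ-": (-1, -1, -1),
    "Ξ0": (0, F(1, 2), -2),
    "Ξ-": (-1, F(-1, 2), -2),
}

def combine(quarks):
    """Sum each additive property over the quarks in a combination."""
    return tuple(sum(QUARKS[q][i] for q in quarks) for i in range(3))

def map_particles():
    """Map each three-quark combination to its properties and matching particles."""
    mapping = {}
    for combo in combinations_with_replacement("uds", 3):
        props = combine(combo)
        matches = [name for name, p in PARTICLES.items() if p == props]
        mapping["".join(combo)] = (props, matches)
    return mapping

for quarks, (props, names) in map_particles().items():
    print(quarks, [str(v) for v in props], names)
```

Exact fractions avoid floating-point rounding when sums of thirds are compared for equality. Each observed particle matches exactly one combination, while uuu, ddd, and sss match nothing in the input.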
Sequence Assembly Systems
DNA sequencing technologies read a genome as a large number of DNA fragments. Sequence assembly systems treat the base sequence of each fragment as text and reconstruct the complete genome from these fragments. Informatics tools such as ARACHNE and Celera Assembler address the assembly problem by identifying repeated sequences, searching for overlapping fragments, correcting errors, and joining overlapping fragments into contiguous regions (contigs). Several checks ensure that the resulting structure is well supported by the data.
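The overlap-and-join step can be illustrated with a greedy merge. The Python sketch below is not the algorithm used by ARACHNE or Celera Assembler; it assumes error-free fragments drawn from a sequence without repeats, and its function names and minimum overlap of 3 bases are hypothetical. It repeatedly joins the pair of fragments with the longest suffix-prefix overlap until no overlap remains.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments, min_len=3):
    """Greedily merge fragments by longest overlap; return the resulting contigs."""
    # Drop duplicates and fragments wholly contained in another fragment.
    unique = list(dict.fromkeys(fragments))
    contigs = [f for f in unique if not any(f != g and f in g for g in unique)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no remaining overlaps: each string is a contig
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)

print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "CGTTAG"]))
# ['ATGCGTACGTTAG']
```

Greedy merging is easy to follow but fragile: a repeated region or a sequencing error can make it join fragments from different parts of the genome, which is why production assemblers add repeat detection, error correction, and consistency checks.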
Pathway Tools
An operon is a set of adjacent genes that are transcribed together to produce multiple proteins. Modern bioinformatics tools support operon prediction in bacteria, which yields information about gene function. Pathway Tools, which powers BioCyc and EcoCyc, uses several factors to predict operons in bacterial genomes: the distance between genes, the direction of transcription, and the functional knowledge in its knowledge base. Researchers have applied operon prediction systems to a variety of genomes, but validating the predictions has proved difficult, so experimental verification remains a necessary step.
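A toy rule that combines the three factors might look like the Python sketch below. It is not Pathway Tools' actual method: the 50-base gap threshold, the doubling of that threshold for functionally related genes, the pathway labels, and the gene names are all hypothetical. Adjacent genes are grouped into one operon when they share a transcription direction and are either close together or moderately close and annotated with the same pathway.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Gene:
    name: str
    start: int    # first base (1-based, inclusive)
    end: int      # last base (inclusive)
    strand: str   # "+" or "-": direction of transcription
    pathway: Optional[str] = None  # hypothetical functional annotation

def same_operon(a: Gene, b: Gene, max_gap: int) -> bool:
    """Hypothetical rule combining direction, distance, and function."""
    if a.strand != b.strand:
        return False
    gap = b.start - a.end - 1
    if gap <= max_gap:
        return True
    # Allow a longer gap when both genes belong to the same annotated pathway.
    return gap <= 2 * max_gap and a.pathway is not None and a.pathway == b.pathway

def predict_operons(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group genes, ordered by position, into predicted operons."""
    ordered = sorted(genes, key=lambda g: g.start)
    operons: List[List[Gene]] = [[ordered[0]]] if ordered else []
    for prev, cur in zip(ordered, ordered[1:]):
        if same_operon(prev, cur, max_gap):
            operons[-1].append(cur)
        else:
            operons.append([cur])
    return [[g.name for g in op] for op in operons]

genes = [
    Gene("A", 1, 900, "+"),
    Gene("B", 920, 1500, "+"),
    Gene("C", 1580, 2000, "+", pathway="P1"),
    Gene("D", 2070, 2600, "+", pathway="P1"),
    Gene("E", 2610, 3000, "-"),
]
print(predict_operons(genes))  # [['A', 'B'], ['C', 'D'], ['E']]
```

Here B and C are not grouped because their 79-base gap exceeds the threshold and they share no pathway, while C and D are grouped despite a 69-base gap because both carry the same pathway label. Predictions from any such rule still need the experimental verification the slide calls for.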
Structural Modeling: Summary
As we have seen, structural modeling tasks arise in many scenarios and across scientific disciplines. Beyond the few cases discussed here, current research on building structural models includes identifying anatomical structure from CT scans and determining geological structure from seismic data. Pathway Tools differs in that it uses classifiers to identify secondary structure in DNA sequences. SedSim, which infers models of geological structures, relies on physics-based, dynamic models to build them. However, most of the systems discussed carry out some form of heuristic search to build structural models.