The Need for Speed. The PDF-4+ database is designed to handle very large amounts of data and provide the user with an ability to perform extensive data.

Slides:



Advertisements
Similar presentations
Database vocabulary. Data Information entered in a database.
Advertisements

Inpainting Assigment – Tips and Hints Outline how to design a good test plan selection of dimensions to test along selection of values for each dimension.
M. Mateen Yaqoob The University of Lahore Spring 2014.
The Assembly Language Level
Chapter 12 Memory Organization
Click Here to Begin. Objectives Purchasing a PC can be a difficult process full of complex questions. This Computer Based Training Module will walk you.
Data Mining Tools Sorted Displays Histograms SIeve.
Synchrotron Diffraction. Synchrotron Applications What? Diffraction data are collected on diffractometer beam lines at the world’s synchrotron sources.
Quantitative Analysis Reference Intensity Ratio (RIR)
Electron Diffraction Applications Using the PDF-4+ Relational Database.
T.G. Fawcett, S. N. Kabbekodu, F. Needham, J. R. Blanton, D. M. Crane, J. Faber International Centre for Diffraction Data Using PDF-4+/Organics to discover.
Timothy G. Fawcett, Soorya N. Kabbekodu, Fangling Needham and Cyrus E. Crowder International Centre for Diffraction Data, Newtown Square, PA, USA Experimental.
HCI Final Project Robust Real Time Face Detection Paul Viola, Michael Jones, Robust Real-Time Face Detetion, International Journal of Computer Vision,
Web Server Hardware and Software
Chapter 8 File organization and Indices.
1 of 6 This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT. © 2007 Microsoft Corporation.
1 Presenter: Chien-Chih Chen Proceedings of the 2002 workshop on Memory system performance.
Advanced Identification Tools. This tutorial will demonstrate how a user can increase both the speed and efficiency of the material identification process.
Access Tutorial 3 Maintaining and Querying a Database
Creating a Calibration Measurement Monitoring System for Many, Ever-changing, Complex Instruments John Wilson Software QA Engineer Agilent Technologies,
COMPUTER CONCEPTS.
Electron Diffraction Search and Identification Strategies.
Lab 8 – C# Programming Adding two numbers CSCI 6303 – Principles of I.T. Dr. Abraham Fall 2012.
PDF-2 Tools and Searches. PDF-2 Release 2009 Using DDView The PDF-2 Release 2009 database requires retrieval software, such as ICDD’s DDView or developers’
WHAT IS VRAY? V-ray is a rendering engine that is used as an extension of certain 3D computer graphics software. The core developers of V-Ray are Vladimir.
Data Mining with DDView+ and the PDF-4 Databases Solid Solution Crystal Cell Parameters Some slides of this tutorial have sequentially-layered information.
XP New Perspectives on Introducing Microsoft Office XP Tutorial 1 1 Introducing Microsoft Office XP Tutorial 1.
Modularizing B+-trees: Three-Level B+-trees Work Fine Shigero Sasaki* and Takuya Araki NEC Corporation * currently with 1st Nexpire Inc.
What’s New in VRS? GUGM May 15, 2008 Presenter: Kelly P. Robinson GIL Service Georgia State University
Computing hardware CPU.
TI-Nspire™ 3.9 What’ New. Introduction Release Overview: o Scope o Supported Configurations o “Must-Know” Information Hands-On Activity Question & Answer.
SIeve+ Introduction SIeve+ is a Plug-In module to the DDView+ software which is integrated in the PDF-4 products. SIeve+ is licensed separately at an additional.
Advanced Searches Using History Advanced Searches What? For a given session, a list of Standard Format Past Searches is automatically saved each time.
Let VRS Work for You! ELUNA Conference 2008 Presenter: Kelly P. Robinson GIL Service Georgia State University
NATIONAL GEOSPATIAL-INTELLIGENCE AGENCY NGA DNC NGA DNC DNC Bulletin Improvements.
Optimizing multi-pattern searches for compressed suffix arrays Kalle Karhu Department of Computer Science and Engineering Aalto University, School of Science,
The Subfile Search In this tutorial, the following Topics are covered: 1. What is a Subfile? 2. How is a Subfile defined? 3. Learn more about a Subfile.
Hardware. Make sure you have paper and pen to hand as you will need to take notes and write down answers and thoughts that you can refer to later on.
How to Build a CPU Cache COMP25212 – Lecture 2. Learning Objectives To understand: –how cache is logically structured –how cache operates CPU reads CPU.
1 SIeve+ Introduction SIeve+ is a Plug-In module to the DDView+ software which is integrated in the PDF-4 products. SIeve+ is licensed separately at an.
Document Clustering for Forensic Analysis: An Approach for Improving Computer Inspection.
Modern Navigation Thomas Herring MW 11:00-12:30 Room
“Live” Tomographic Reconstructions Alun Ashton Mark Basham.
August 2005 TMCOps TMC Operator Requirements and Position Descriptions Phase 2 Interactive Tool Project Presentation.
Data Mining with DDView+ and the PDF-4 Databases Carbamazepine Polymorphs Some slides of this tutorial have sequentially-layered information that is best.
Fundamentals of Rietveld Refinement III. Additional Examples
Copyright © 2006 Knovel Corporation Streamline Your Science and Engineering Research
Digital Pattern Simulations. Pattern Simulations.
Collections Management Museums EMu Searching EMu Searching Explained (What’s going on under the hood!) Bernard Marshall Chief Technical Officer KE Software.
Pairwise Sequence Alignment Part 2. Outline Summary Local and Global alignments FASTA and BLAST algorithms Evaluating significance of alignments Alignment.
PDF-4 Tools and Searches. PDF The PDF database is powered by our integrated search display software, DDView+. PDF boasts 53 search.
Lecture 10 Page 1 CS 111 Summer 2013 File Systems Control Structures A file is a named collection of information Primary roles of file system: – To store.
Multilevel Caches Microprocessors are getting faster and including a small high speed cache on the same chip.
T HE G AME L OOP. A simple model How simply could we model a computer game? By separating the game in two parts: – the data inside the computer, and –
Virtual Memory Review Goal: give illusion of a large memory Allow many processes to share single memory Strategy Break physical memory up into blocks (pages)
Info Read SEGY Wavelet estimation New Project Correlate near offset far offset Display Well Tie Elog Strata Geoview Hampson-Russell References Create New.
1 IP Routing table compaction and sampling schemes to enhance TCAM cache performance Author: Ruirui Guo, Jose G. Delgado-Frias Publisher: Journal of Systems.
XRD data analysis software development. Outline  Background  Reasons for change  Conversion challenges  Status 2.
Kriging for Estimation of Mineral Resources GISELA/EPIKH School Exequiel Sepúlveda Department of Mining Engineering, University of Chile, Chile ALGES Laboratory,
1 Potential for Parallel Computation Chapter 2 – Part 2 Jordan & Alaghband.
The TDR Targets Database Prioritizing potential drug targets in complete genomes.
Track-While-Scan (TWS)
CHARACTERIZATION OF THE STRUCTURE OF SOLIDS
Chapter 2 Memory and process management
Knut Kröger & Reiner Creutzburg
ICDD Release 2008 New Features
A similarity index for comparing diffraction patterns
Specifications Clean & Match
Applying principles of computer science in a biological context
Just In Time Compilation
Presentation transcript:

The Need for Speed

The PDF-4+ database is designed to handle very large amounts of data and provide the user with an ability to perform extensive data mining. The database also incorporates many sophisticated algorithms to calculate, analyze and display diffraction data. By design, speed is sacrificed whenever there is a trade-off between speed and capability. However speed improvements are made annually and this tutorial gives many helpful hints on how to improve speed for any database product.

Speed Gobblers (Speed bumps) Sophisticated on-the fly calculations Inadequate computer hardware Sorts and searches producing or using very large tables

Speed Accelerators (Turbo) Smaller data sets Targeted and selective (smaller)searches producing small tables

Speed Basics The processor and system memory control the overall speed of the database and searches that use the database. If you exceed the specifications the system will work faster, if you do not meet the specifications certain operations will work very slowly or not work at all.

Search Speeds There are several search speeds given in this presentation. These were given to present the reader with a concept of relative speed for several types of searches. Your search times may vary depending on the exact computer specifications of your PC. The PC used in these tests was 32-bit with a Vista™ operating system 2 GB RAM and 2.00 GHz dual core processor.

How fast is this spinning ? The shark should swim in a full circle in under one second. The diffraction patterns should be smooth and continuously drawn.

Speed Basics All Release 2006 and later PDF-2 and PDF-4+ databases use Sybase as a database platform and use JAVA as a software platform. iAnywhere, the producer of Sybase, ICDD, and JAVA all continuously upgrade their systems to improve search speeds. You should see faster speeds with each product release, if you do a comparative analysis.

Database Search Options Search Options Release 2009 PDF Searches, 291,440 Data Sets PDF-2 49 Searches, 218,610 Data Sets PDF-4/Organics 48 Searches, 370,844 Data Sets Searches can be combined. Display Options PDF Display Fields PDF-224 Display Fields Default page displays 8 fields. This can be reduced to 1 or expanded to all fields. Search Display

Search Speed – Size of the Search The larger the search, the slower the speed. Display all fields (85) on all Inorganic materials [ 85 X 262,365 = 22,301,025 Fields in the table] Search takes a minute Display default fields (8) on all Inorganic Materials [ 8 x 262,365 = 2,098,920 Fields in the table] Search takes ~ 23.8 Seconds Display PDF#, chemical and mineral name for all Zeolites [ 3 x 3,155 = 9,465 Fields in the table ] Search takes ~ Blink of the eye (0.4 Seconds)

History The history panel tracks the search speed. Searches 3 to 5, displayed 3 fields for each entry so the time is directly related to the number of entries “hit”. Search 6 at 94.4 seconds was identical to search 5 (8.8 seconds), except that 44 fields were displayed instead of 3. No. of entries Search speed

Large Calculations Slow Things Down Integral Index Integral Index – does a point by point comparison of imported diffraction patterns in comparison to dynamically generated experimental and calculated patterns (see tutorial on integral index). Each pattern is thousands of points, so if large collections of patterns are used in the calculation, then the integral index calculation takes time. Digital Pattern Calculations For digital patterns, three algorithms are used depending on the type of data available in the reference material. The most resource-intensive algorithm is used by calculations of patterns from atomic coordinates. Calculation of a single pattern is done in less than a second. Calculation of thousands of patterns takes minutes.

Results Form In this example, six entries were highlighted and a “click” on the right hand button of the mouse produces all six patterns, nearly instantaneously.

Digital Patterns This calculation of 77 superimposed explosives patterns took several seconds. The more entries in the calculation, the longer it takes.

Capability Trade Offs Database searches are frequently used for data mining. Many data mining examples are provided in the tutorials. A preferred technique in data mining is to use broad search parameters, analyze the results and then apply more restrictions as you find materials of interest. For example, a researcher might analyze all zirconia-containing materials, then focus on yttria or ceria stabilized zirconia and then reduce the candidate list to tetragonally stabilized zirconias, if they were studying cutting tool compositions. To be effective in the trade-off between capability and speed, one might want to do computationally intensive calculations (i.e. pattern calculation, Integral index) in the later stages of data mining, when the candidate list has been narrowed.

Preference Options in SIeve+

SIeve+ This is the preferences Table for SIeve+. There are several options that can increase or decrease the search speed. Pattern GOM and Integral Index both involve time-consuming calculations. To increase speed, Toggle these off by “point and click”.

Search and Match Windows and GOM (Goodness of Match) The wider the search and match windows, the more candidates are reviewed and selected. The more data being compiled for display, the slower the speed. Search WindowMatch WindowGOM LimitNo of CandidatesTime ,8855 sec < 1 sec <<1 sec ,136< 1sec ,4505 sec ,1761 sec This pattern has 118 peaks identified for match. Note: In this example pattern, GOM and integral index calculations were turned OFF.

Integral Index and Pattern GOM Search Match GOM No of WindowWindowLimitCandidatesTimePreferences ,1761 secNone ,1765 minutesPattern GOM ,17650 minutesIntegral Index This pattern has 118 peaks identified for match. This is the same example as the previous slide. However, in the second case, the pattern GOM was being calculated for all candidate materials, this took 5 minutes. The integral index was calculated for 4,176 materials and this took 50 minutes.

Capability Trade Offs Large search and match windows are required when you suspect that you have poor quality data or a high chance of sample displacement errors. If you have high quality data (standardized data sets), you should be narrowing both these windows. GOM, Goodness of Match, calculations are based on a scale of Using a high GOM limit (i.e. >4000) means that you will be able to identify major phases, but you are unlikely to identify minor and trace phases where you may only have a few characteristic d-spacings above noise levels. A good strategy is to lower the GOM limit when you are looking for minor phases. Integral index, while computationally complex, is a basic similarity index that can identify materials independent of the material crystallinity. The search can also be adapted to defined crystallite size ranges. However, if the unknowns are highly crystalline, this calculation can be a significant detriment to speed and productivity, we recommend that you turn it off with highly crystalline data sets.