Welcome from Optima Systems COSMOS performance improvements Paul Grosvenor Deerfield Beach 2013 Tuesday October 22nd.

Presentation transcript:

The Problem
  Lots and lots of data (568Tb largest encountered so far)
  Even today the traditional researcher works, thinks and reports in 2D
  Analysis based on assumptions which hide meaning
  Outdated protocols
  Federated (composite) database

What is COSMOS
  Largely written in APL
  Data visualisation tool
  Top down view of the data lake
  It has been described as a Thesis generator
  Currently targeted at US electronic medical records (EMR data)
  Built in “canned queries” – e.g. survivability

COSMOS version 1

More Problems
  Scalability
  Security
  Performance
  Got to be Sexy

COSMOS now

Some Solutions to the COSMOS Problem
  Much help from Dyalog – and APL of course
  Caching enquiries
  Mapped Files
  Flash client side interface
  Syncfusion
  Special Casing vs generalisation
  Refactoring

A typical example
  drug←23
  patients←( ) ( ) ( )
  drug=patients
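The slide's patient values did not survive, so here is a minimal sketch with made-up data showing how the comparison behaves. The contents of patients and the names mask and hits are assumptions for illustration only. Because = is a scalar function, it pervades the nested vector and returns a Boolean mask of the same structure.

```apl
⍝ Hypothetical data: one vector of drug codes per patient.
drug←23
patients←(12 23 7)(23 23)(5 9 41)
mask←drug=patients        ⍝ = pervades the nesting: (0 1 0)(1 1)(0 0 0)
hits←∨/¨mask              ⍝ 1 where a patient has the drug at all: 1 1 0
```

An or-reduction over each inner mask is one way to turn the per-item result into a per-patient answer.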

A simple test
  seed←1000?1000
  counts←?nubs⍴items
  vec←counts⍴¨⊂seed
  :For x :In ⍳100
      ⍝ six ways of comparing 100 with every item of every vector
      a←100=¨vec
      b←(⊂100)=¨vec
      c←100∘=¨vec
      d←100⍷¨vec
      e←(⊂100)⍷¨vec
      f←100∘⍷¨vec
      :If ∧/a∘≡¨b c d e f
          :Continue
      :Else
          ∘      ⍝ deliberate syntax error: halt if any variant disagrees
      :EndIf
  :EndFor
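The timing slides that follow compare these six variants. A rough way to collect such timings is sketched below; it is not the harness used in the talk. The sizes, the helper name time and the use of compute time from ⎕AI are all assumptions, and ⎕IO←1 (the Dyalog default) is assumed.

```apl
⍝ Assumes ⎕IO←1. All sizes are hypothetical.
seed←1000?1000
counts←?50⍴200                 ⍝ 50 vectors, each of random length 1..200
vec←counts⍴¨⊂seed
⍝ time: CPU milliseconds taken by ⍺ executions of the expression text ⍵
time←{t←2⊃⎕AI ⋄ _←⍎¨⍺⍴⊂⍵ ⋄ (2⊃⎕AI)-t}
exprs←'100=¨vec' '(⊂100)=¨vec' '100∘=¨vec' '100⍷¨vec' '(⊂100)⍷¨vec' '100∘⍷¨vec'
ms←100 time¨exprs              ⍝ one figure per variant, in the order of exprs
```

With data this small the figures will be close to zero; increase the repeat count or the vector sizes to get usable numbers.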

[x=nVectors] timings
  (table; columns: vectors, items, 100=vec)

[x f nVectors] timings
  23=¨( ) ( ) ( )
  (⊂23)=¨( ) ( ) ( )
  ∘=¨( ) ( ) ( )
  ⍷¨( ) ( ) ( )
  (⊂23)⍷¨( ) ( ) ( )
  ∘⍷¨( ) ( ) ( )

[x f nVectors] timings
  (table; columns: vectors, items, 100=¨vec, (⊂100)=¨vec, 100∘=¨vec, 100⍷¨vec, (⊂100)⍷¨vec, 100∘⍷¨vec)

[x f nVectors] timings
  (chart; series: 100=¨vec, (⊂100)=¨vec, 100∘=¨vec, 100⍷¨vec, (⊂100)⍷¨vec, 100∘⍷¨vec, by vectors and items)

[x ⍳ y] Example
  23=( ) ( ) ( )
  =(,23)∘⍳¨( ) ( ) ( )

[x ⍳ y] Example
  (table; columns: vectors, items, 100=vec, x ⍳ y)

Index Assignment
  bool← ⍴ 0
  bool[index]←1
  int← ⍴⍳ 10
  int[index]←1
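The vector lengths on this slide were lost, so the sketch below uses assumed sizes to show the two cases being compared: indexed assignment into a Boolean vector and into an integer vector. The values of n and index are hypothetical, and ⎕IO←1 is assumed. ⎕DR and ⎕SIZE are included only to show how differently the two targets are stored.

```apl
⍝ Assumes ⎕IO←1; n and index are hypothetical.
n←1000000
index←1000?n                   ⍝ 1000 distinct positions to set
bool←n⍴0                       ⍝ Boolean target: ⎕DR bool is 11 (1 bit per item)
bool[index]←1
int←n⍴⍳10                      ⍝ integer target: ⎕DR int is 83 (8 bits per item)
int[index]←1
sizes←⎕SIZE 'bool' 'int'       ⍝ bytes occupied by each variable
```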

Index Assignment
  (table; columns: indices, bool[index]←1, int[index]←)

Boolean Operations
  bool←items ⍴
  bool=
  bool<
  bool≤
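The right-hand operands on this slide were lost. Assuming they were 0, 1 and 0 respectively, the three comparisons all compute the same thing on Boolean data, namely the negation of bool, which is presumably why they make a fair timing comparison. The sketch below checks that equivalence on made-up data; the operands and the size are assumptions.

```apl
⍝ Hypothetical size and pattern; operands 0, 1, 0 are assumed.
bool←1000⍴1 0 0
r1←bool=0
r2←bool<1
r3←bool≤0
same←∧/(~bool)∘≡¨r1 r2 r3      ⍝ 1 if all three equal ~bool
```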

Boolean Operations
  (table; columns: items, bool=0, bool<1, bool≤)

So What?
  Generalisation or Special Casing
    Up to 10x speed-up
    Be aware of your data
  Caching of previous queries
    Lots faster
  Mapped Files
    Much better memory handling
    Data shared across processes
    Up to 1.5x speed-up

A Case in Point
  Version 1 analysis – 20 million records – 15 minutes (DCF files and integer pointers)
  Version 2 analysis – 50 million records – 3 minutes (Mapped files and Boolean masks)
  Version 3 analysis – 150 million records – 45 seconds
  Latest version – >300 million records – circa 30 seconds
  n.b. SQL and federated dataset pool – 2 weeks
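The move from integer pointers to Boolean masks can be illustrated with a small sketch. This is not COSMOS code: the cohort, the criteria names onDrug and over65, and the sizes are invented, and ⎕IO←1 is assumed. With masks, combining two criteria is a single AND over bit vectors; with pointers, the same answer needs a set intersection of index lists.

```apl
⍝ Hypothetical cohort of 12 records; assumes ⎕IO←1.
n←12
onDrug←n⍴1 0 0 1               ⍝ Boolean mask: record is on the drug
over65←n⍴0 1 1 1 0 1           ⍝ Boolean mask: record is over 65
both←onDrug∧over65             ⍝ mask approach: one AND
count←+/both                   ⍝ 4 matching records
⍝ Pointer approach: lists of record numbers, combined by intersection
ptrDrug←onDrug/⍳n
ptr65←over65/⍳n
ptrBoth←ptrDrug∩ptr65          ⍝ 4 8 9 12, the same records as both/⍳n
```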

Thank You and Questions
  Contact us: Optima House, Mill Court, Spindle Way, Crawley, West Sussex RH10 1TT
  Tel: Fax: