Database Systems Carlos Ordonez

What is “Database systems” research?
Input? Large data sets, large files, relational tables
How? Fast external algorithms; RAM-efficient data structures at two storage levels
Efficiency? Desirable O(n) I/O
Hardware? Small computer, single server, parallel DBMS server, parallel cluster; 1 disk, RAID
Infrastructure? DBMS, parallel system
Boring? Theory+programming
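
To make "fast external algorithms" with "O(n) I/O" concrete, here is a minimal sketch of a single sequential scan over a file larger than RAM, reading one block at a time. The record layout (one 8-byte float per record), the function names and the block size are assumptions made for illustration; they do not come from the slides.

```python
import os
import struct
import tempfile

# Assumed record layout: one 8-byte little-endian float per record.
RECORD = struct.Struct("<d")


def scan_stats(path, records_per_block=1024):
    """Return (n, mean, variance) in one sequential pass over the file.

    Only one block is held in memory at a time, so RAM use is bounded by
    the block size while I/O grows linearly with the number of records.
    """
    n, total, total_sq = 0, 0.0, 0.0
    block_bytes = RECORD.size * records_per_block
    with open(path, "rb") as f:
        while True:
            block = f.read(block_bytes)
            if not block:
                break
            for (x,) in RECORD.iter_unpack(block):
                n += 1
                total += x
                total_sq += x * x
    if n == 0:
        raise ValueError("empty input file")
    mean = total / n
    variance = total_sq / n - mean * mean  # population variance
    return n, mean, variance


def write_demo_file(values):
    """Write values to a temporary binary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))
    return path


if __name__ == "__main__":
    demo_path = write_demo_file([1.0, 2.0, 3.0, 4.0])
    try:
        print(scan_stats(demo_path, records_per_block=3))
    finally:
        os.remove(demo_path)
```

The sum-of-squares formula keeps the scan to a single pass but can lose precision on badly scaled data; a production version would likely use a more stable update.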

Database systems research today
Transaction processing? Done
Efficient querying? Done
Fast external algorithms? Simple tasks.
Parallel computation? Well-proven shared-nothing DBMS, but still many challenges (big data).
Exploiting new hardware? Difficult, low level
Analyzing? Most difficult: data mining, statistics
Future? Big data

DB Systems involves
Core CS research: Theory+Programming
Theory we use:
  – Time complexity, I/O cost models
  – Large data structures; especially external
  – Relational model is here to stay
  – Multivariate statistics, machine learning, discrete math
  – Numerical methods: linear algebra, optimization
  – Compilers: parsing/compiling/optimizing code; recursion
Programming (even some hacking):
  – Systems in a broad sense
  – Languages: C, C++; efficiency, pointers, legacy systems code; Java, C# mainly for portability
  – Numerical libraries like LAPACK, OS thread libraries
  – DBMS SQL UDFs API with C, C++, C#

Research topics
GOAL: Integrating statistical and machine learning algorithms with a DBMS (external algorithms, queries, UDFs)
Difference from machine learning algorithms: size, external algorithms (small RAM), queries, low-level optimization, generally simpler models
Main topics by students:
  – Zhibo Chen: OLAP cubes, parametric statistical tests, cube ops on flash memory
  – Mario Navas, Naveen Mohanam: Singular Value Decomposition for PCA and ML Factor Analysis, data summarization on multicore CPUs
  – Carlos Garcia-Alvarado: keyword search across docs and db, ranking, query recommendation
  – Sasi Pitchaimalai: Bayesian classification, multithreaded summarization
  – Wellington Cabrera: stochastic search variable selection on high dimensional data, SVD on high-d data
  – David Matusevich: Hybrid EM and MCMC mixture models on large data sets, database transformations for data mining
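
One way to picture "integrating statistical algorithms with a DBMS through queries" is to let a single SQL aggregate query compute the sufficient statistics and finish the model outside the database. The sketch below fits a simple least-squares line this way using Python's built-in sqlite3 module. SQLite, the table name `points`, its columns and the function names are all assumptions chosen for illustration; the slides do not name a specific DBMS or schema.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table of (x, y) points lying on y = 2x + 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (x REAL, y REAL)")
    conn.executemany(
        "INSERT INTO points VALUES (?, ?)",
        [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)],
    )
    return conn


def fit_line_in_dbms(conn):
    """Fit y = slope * x + intercept from one aggregate query.

    The DBMS scans the table once and returns five numbers; the data
    set itself never leaves the database.
    """
    n, sx, sy, sxx, sxy = conn.execute(
        "SELECT COUNT(*), SUM(x), SUM(y), SUM(x * x), SUM(x * y) FROM points"
    ).fetchone()
    if n < 2:
        raise ValueError("need at least two points")
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


if __name__ == "__main__":
    print(fit_line_in_dbms(make_demo_db()))
```

The same pattern extends to more dimensions by summing all pairwise products, at which point a UDF or aggregate UDF usually becomes more practical than a hand-written query.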

Representative problems
OLAP cubes
Finding predictive association rules
Bayesian classification
Clustering, PCA and regression
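
As a rough illustration of the first problem, an OLAP cube aggregates a measure over every subset of the chosen dimensions. The sketch below computes such a cube in memory; the sample rows, the dimension names and the `ALL` placeholder are assumptions made for illustration, and a real system would evaluate this inside the DBMS rather than in application code.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder for a dimension that has been aggregated away


def cube(rows, dims, measure):
    """Sum `measure` for every combination of grouped dimensions.

    Returns a dict keyed by one value per dimension, where ALL marks a
    dimension that was rolled up. With d dimensions there are 2**d
    group-bys, so this is only practical for a handful of dimensions.
    """
    totals = defaultdict(float)
    for row in rows:
        for k in range(len(dims) + 1):
            for subset in combinations(dims, k):
                key = tuple(row[d] if d in subset else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)


if __name__ == "__main__":
    sales = [
        {"region": "east", "product": "a", "amount": 10.0},
        {"region": "east", "product": "b", "amount": 5.0},
        {"region": "west", "product": "a", "amount": 7.0},
    ]
    for key, total in sorted(cube(sales, ["region", "product"], "amount").items()):
        print(key, total)
```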

Why is our database systems research “cool”?
Theory+Programming
Optimization, O(f(n)), systems (external data structures, discrete math, compiler, OS)
Goes from hardware-level stuff (multi-core, cache memory) to high-level query optimization in SQL
Database systems techniques are used in search engines like Google and Yahoo (and vice versa)
DBMS technology used everywhere

Why join DBMS group?
Balance between theory (math) and programming
We target “DB systems” conferences: ACM SIGMOD and “IR/DM” conferences ACM CIKM (IR+DB+DM)
Mature and stable CS research area
Job/internship: many opportunities in DBMS and search engines; job security at any large company
Visit my web page, DBLP. Google “Ordonez SQL”