Big Data Open Source Software and Projects ABDS in Summary IX: Level 11C I590 Data Science Curriculum August 15 2014 Geoffrey Fox

Presentation transcript:

Big Data Open Source Software and Projects ABDS in Summary IX: Level 11C I590 Data Science Curriculum August Geoffrey Fox School of Informatics and Computing Digital Science Center Indiana University Bloomington

HPC-ABDS Layers
1) Message Protocols
2) Distributed Coordination
3) Security & Privacy
4) Monitoring
5) IaaS Management from HPC to hypervisors
6) DevOps
7) Interoperability
8) File systems
9) Cluster Resource Management
10) Data Transport
11) SQL / NoSQL / File management
12) In-memory databases & caches / Object-relational mapping / Extraction Tools
13) Inter-process communication: collectives, point-to-point, publish-subscribe
14) Basic Programming model and runtime: SPMD, Streaming, MapReduce, MPI
15) High level Programming
16) Application and Analytics
17) Workflow-Orchestration
Here are 17 functionalities. Technologies are presented in this order: the 4 cross-cutting ones at the top, then the other 13 in the order of the layered diagram, starting at the bottom.
SQL Technologies

Apache Derby
Apache Derby is a relational database management system written in Java and based on the SQL and JDBC standards.
Derby offers a small footprint (~2.6 megabytes) and an embedded JDBC driver, and is easy to deploy and use.
Derby originated in 1996 at Cloudscape Inc., a startup in Oakland, CA. Cloudscape was acquired by Informix and later by IBM.
IBM donated the code to Apache in 2004, creating the Derby incubator project. Derby is a subproject of Apache DB.
Derby was bundled with the JDK under the name "Java DB" (JDK 6 through JDK 8).
Typically used as an embedded database; its performance is not competitive as a standalone system.

MySQL
Popular SQL relational database management system (RDBMS), released under the GNU GPL.
– Second only to SQLite in number of installations as an open source RDBMS.
Now owned by Oracle, with open source and commercially supported versions.
Part of LAMP, the archetypal web service solution stack, originally consisting of four components: Linux, the Apache HTTP Server, the MySQL relational database management system, and the PHP programming language.
– As a solution stack, LAMP is suitable for building dynamic web sites and web applications.
Used in cloud architectures, though usually not as the central storage engine but rather for "small" data such as metadata.
Though MySQL began as a low-end alternative to more powerful proprietary databases, it has gradually evolved to support higher-scale needs as well.
It is still most commonly used in small to medium scale single-server deployments, either as a component in a LAMP-based web application or as a standalone database server.
Much of MySQL's appeal originates in its relative simplicity and ease of use, which is enabled by an ecosystem of open source tools such as phpMyAdmin.
In the medium range, MySQL can be scaled by deploying it on more powerful hardware, such as a multi-processor server.

PostgreSQL
PostgreSQL is an open source, high quality object-relational database management system (ORDBMS) with many similarities to MySQL. http://en.wikipedia.org/wiki/PostgreSQL
Originally PostgreSQL was known for more features and MySQL for better performance and ease of use, but over time the systems have become more similar.
PostgreSQL is developed by the PostgreSQL Global Development Group, a diverse group of many companies and individual contributors.
It is free and open source software, released under the terms of the PostgreSQL License, a permissive free software license.
Michael Stonebraker, a distinguished Berkeley faculty member, developed Ingres, on which PostgreSQL (Post-Ingres) is based.

SQLite
SQLite is a public-domain, lightweight RDBMS designed to be used as a library (i.e. embedded) rather than as a standalone server.
The browsers Google Chrome, Opera, Safari and the Android Browser all allow for storing information in, and retrieving it from, a SQLite database within the browser, using the Web SQL Database technology.
Mozilla Firefox and Mozilla Thunderbird store a variety of configuration data (bookmarks, cookies, contacts etc.) in internally managed SQLite databases, and even offer an add-on to manage SQLite databases.
Skype is a widely deployed application that uses SQLite.
It is used inside the main smartphone operating systems: Apple, Microsoft, Blackberry, Symbian, Android.
SQLite is ACID-compliant and implements most of the SQL standard, using a dynamically and weakly typed SQL syntax.
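The embedded-library model and the dynamic typing described above can be seen in a few lines. This is a minimal sketch using Python's standard `sqlite3` module; the `bookmarks` table and its rows are made up for illustration and do not come from any of the applications named on the slide.

```python
import sqlite3

# An in-memory database: there is no server process to start or connect to,
# the SQL engine runs inside this program as a library.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookmarks (id INTEGER PRIMARY KEY, title TEXT, visits INTEGER)"
)

# A normal insert: an integer goes into the INTEGER column.
conn.execute(
    "INSERT INTO bookmarks (title, visits) VALUES (?, ?)", ("SQLite home", 3)
)
# Dynamic typing: the declared column type is only a preference, so a text
# value that cannot be converted to an integer is stored as text.
conn.execute(
    "INSERT INTO bookmarks (title, visits) VALUES (?, ?)", ("Derby home", "many")
)
conn.commit()

# typeof() reports the storage class actually used for each stored value.
rows = conn.execute(
    "SELECT title, visits, typeof(visits) FROM bookmarks ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

Replacing `":memory:"` with a file path would persist the same database to a single ordinary file, which is how embedded use typically looks in applications.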

Oracle
Dominant commercial object-relational database management system (ORDBMS), in which objects, classes and inheritance are directly supported in database schemas and in the query language; it has a reputation for high quality and high cost.
Started in 1977 when Larry Ellison and friends founded Software Development Laboratories (SDL).
Many other proprietary and open source relational systems exist; Microsoft SQL Server, IBM DB2 and, to a lesser extent, Sybase and Teradata are the other major commercial RDBMS.
Has all sorts of extensions, such as spatial query support, and all sorts of "editions" (Enterprise, Standard, Express).
There is substantial debate comparing this classic approach to Hadoop-based approaches like Hive, which aim for greater performance by parallelizing queries across a cluster.
Oracle supports ACID (Atomicity, Consistency, Isolation, Durability), a set of properties that guarantee that database transactions are processed reliably.
– Compare with eventually consistent services, which provide BASE (Basically Available, Soft state, Eventual consistency) semantics.
– Under BASE, reads can return inconsistent answers until multiple distributed updates have converged.
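The ACID versus BASE contrast above can be illustrated with a small sketch. It uses SQLite through Python's standard `sqlite3` module as a stand-in for any ACID database (it is not Oracle-specific). The `accounts` table, the `transfer` helper and the toy `Replica` class with last-writer-wins timestamps are all assumptions made for illustration.

```python
import sqlite3

def make_bank():
    """Create a tiny in-memory bank; balances may never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Atomicity: both updates are committed together, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? "
                         "WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? "
                         "WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint was violated
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

class Replica:
    """Toy eventually consistent register (last writer wins by timestamp)."""
    def __init__(self):
        self.stamp, self.value = 0, None

    def write(self, stamp, value):
        if stamp > self.stamp:  # keep only the newest update seen so far
            self.stamp, self.value = stamp, value

    def sync_from(self, other):
        self.write(other.stamp, other.value)

bank = make_bank()
failed = transfer(bank, "alice", "bob", 500)   # overdraft: rolled back
after_failed = balances(bank)
ok = transfer(bank, "alice", "bob", 30)
after_ok = balances(bank)

a, b = Replica(), Replica()
a.write(1, "x")                  # update accepted at replica a
b.write(2, "y")                  # later update accepted at replica b
diverged = (a.value, b.value)    # readers see different answers for now
a.sync_from(b)
b.sync_from(a)
converged = (a.value, b.value)   # after exchanging updates they agree
```

In the failed transfer the credit to `bob` has already been applied inside the transaction when the debit violates the constraint, so the rollback is what keeps the data consistent. The replicas, by contrast, accept writes independently and only agree after they synchronize.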

SciDB
SciDB is an array database designed for the multidimensional data management and analytics common to scientific, geospatial, financial, and industrial applications.
– Arrays are natively supported, including parallel operations on them.
– It is developed by the company Paradigm4, co-founded by Michael Stonebraker of PostgreSQL fame.
– License is the Affero General Public License (AGPL).
Key features include:
– Support of provenance.
– Out-of-memory arrays (arrays larger than main memory).
– Massive scale math on the arrays for linear algebra and analytics.
– Uncertainty can be modeled by associating error bars with data.
– Efficient storage.
Partly motivated as a database community answer to Hadoop.
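Two of the ideas listed above, processing an array in pieces rather than all at once and carrying error bars alongside the data, can be sketched in plain Python. This is a conceptual illustration only and does not use SciDB or its query language. The function names, the chunk size and the assumption that errors are independent (so they combine in quadrature) are all choices made for this sketch.

```python
import math

def chunked(values, errors, chunk_size):
    """Yield (values, errors) slices holding at most chunk_size cells each."""
    for start in range(0, len(values), chunk_size):
        end = start + chunk_size
        yield values[start:end], errors[start:end]

def sum_with_error(values, errors, chunk_size=2):
    """Sum an array chunk by chunk, propagating per-cell error bars.

    Assumes independent errors, so the variances (squared errors) add.
    Only one chunk is examined at a time, which is the property that lets
    an array engine work on data larger than memory.
    """
    total = 0.0
    variance = 0.0
    for vals, errs in chunked(values, errors, chunk_size):
        total += sum(vals)
        variance += sum(e * e for e in errs)
    return total, math.sqrt(variance)

data = [1.0, 2.0, 3.0, 4.0]
error_bars = [0.3, 0.4, 0.0, 0.0]
total, total_error = sum_with_error(data, error_bars)
print(total, total_error)  # the sum is 10.0, with an error bar close to 0.5
```

Because each chunk contributes an independent partial sum and partial variance, the chunks could also be processed in parallel and combined at the end.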

Public Cloud SQL as a Service
Provides traditional databases as a service on clouds.
Azure SQL Service, based on SQL Server http://msdn.microsoft.com/en-us/library/azure/dn aspx
Google Cloud SQL, based on MySQL
Amazon Relational Database Service (Amazon RDS), with MySQL, PostgreSQL, Oracle and SQL Server