Implement high-level parallel API in JDK
Richard Ning – Enterprise Developer
1st June 2013
© 2013 IBM Corporation

Important Disclaimers
– THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
– WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
– ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR INFRASTRUCTURE DIFFERENCES.
– ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.
– IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM'S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.
– IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
– NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS.

Introduction to the speaker
– Developing enterprise application software since 1999 (C++, Java)
– Recent work focus: IBM JDK development
– My contact information: e-mail:

What should you get from this talk?
By the end of this session, you should be able to:
– Understand how a high-level parallel API can be implemented in the JDK
– Understand how parallel computing works on multi-core processors

Agenda
– Introduction: multi-threading, multi-core processors, parallel computing
– Case study
– Other high-level parallel APIs
– Roadmap

Introduction: multi-threading, multi-core computers, parallel computing

Case study: execute the same task for every element in a loop, using multi-threading for the execution.
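
A minimal sequential baseline for this case study, offered as a sketch: the class name `SequentialLoop` and the squaring task are illustrative assumptions, not code from the talk. Each element is passed through the same task in a plain `for` loop; this loop is what the parallel variants in the following slides try to speed up.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Hypothetical baseline: apply the same task to every element, one after another.
class SequentialLoop {
    static int[] applyToAll(int[] data, IntUnaryOperator task) {
        int[] result = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            result[i] = task.applyAsInt(data[i]); // same operation for each element
        }
        return result;
    }

    public static void main(String[] args) {
        int[] squares = applyToAll(new int[] {1, 2, 3, 4}, x -> x * x);
        System.out.println(Arrays.toString(squares)); // prints [1, 4, 9, 16]
    }
}
```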

Can it improve performance?

Figure: Multi-threading on a computer with one core. Over time, threads t1 and t2 take turns on the single CPU (t1, t2, t1, t2, t1).

On a single core, CPU usage is already 100% with a single thread, and it is still 100% with multiple threads. Performance can even decrease, because the extra threads consume resources of their own. Multi-threading therefore cannot improve performance here, and a multi-threading (parallel) API is of no use.

Figure: Multi-threading on a computer with multiple cores.

Figure: Each thread runs separately on its own core over time (threads t1 to t4 on Core1 to Core4).

Raw threads: users need to create and manage them.
Disadvantages:
– Not flexible: the number of threads is hard to configure. With more threads than cores, resources are consumed by thread context switching, which can even decrease performance; with fewer threads than cores, some cores are wasted.
– No load balancing: the calculation cannot be allocated equally to every core.
Any improvement? The Executor.
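
The raw-thread approach can be sketched as follows, assuming a hand-picked thread count (`numThreads`, a hypothetical parameter) and a contiguous split of the array. The caller creates, starts and joins every `Thread` itself, which is the management burden described above; nothing adapts the thread count to the number of cores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical raw-thread version: numThreads is chosen by hand, not from the core count.
class RawThreadLoop {
    static int[] applyToAll(int[] data, IntUnaryOperator task, int numThreads)
            throws InterruptedException {
        int[] result = new int[data.length];
        List<Thread> threads = new ArrayList<>();
        // ceiling division so that every element belongs to some chunk
        int chunk = (data.length + numThreads - 1) / numThreads;
        for (int t = 0; t < numThreads; t++) {
            final int begin = t * chunk;
            final int end = Math.min(data.length, begin + chunk);
            Thread worker = new Thread(() -> {
                for (int i = begin; i < end; i++) {
                    result[i] = task.applyAsInt(data[i]); // each thread writes only its own slice
                }
            });
            threads.add(worker);
            worker.start();
        }
        for (Thread worker : threads) {
            worker.join(); // join also makes the worker's writes visible to this thread
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] out = applyToAll(new int[] {1, 2, 3, 4, 5}, x -> x + 1, 3);
        System.out.println(Arrays.toString(out)); // prints [2, 3, 4, 5, 6]
    }
}
```

If the chosen count does not match the machine, the split is simply wrong for it: too many threads add context-switching overhead, too few leave cores idle.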

Executor: separates the creation of threads from their execution, and uses a thread pool to reuse threads.
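
A sketch of the Executor version with the standard `java.util.concurrent` classes: the pool is created once, and its threads are reused across all submitted tasks. Submitting one task per element and the pool size of 4 in `main` are illustrative assumptions; `ExecutorLoop` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntUnaryOperator;

// Thread creation (the pool) is separated from task execution (submit).
class ExecutorLoop {
    static int[] applyToAll(int[] data, IntUnaryOperator task, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        int[] result = new int[data.length];
        List<Future<?>> pending = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            final int index = i;
            // one small task per element; pooled threads pick them up and are reused
            pending.add(pool.submit(() -> { result[index] = task.applyAsInt(data[index]); }));
        }
        for (Future<?> f : pending) {
            f.get(); // wait for completion and propagate any task failure
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4); // assumed size
        try {
            int[] out = applyToAll(new int[] {1, 2, 3}, x -> x * 10, pool);
            System.out.println(Arrays.toString(out)); // prints [10, 20, 30]
        } finally {
            pool.shutdown();
        }
    }
}
```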

A high-level API: concurrent_for
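
One way such an API could look in Java, as a hypothetical sketch (`ConcurrentFor.concurrentFor` is an illustrative name, not an existing JDK method): the caller passes only the range `[begin, end)` and the task, and the helper hides the thread pool. It deliberately keeps a fixed pool of 8 threads and one task per element, so it shares the weaknesses listed for this approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

// Hypothetical concurrent_for: the caller supplies only the range and the task.
class ConcurrentFor {
    static void concurrentFor(int begin, int end, IntConsumer task) {
        // fixed size of 8, deliberately NOT aligned to the core count
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = begin; i < end; i++) {
                final int index = i;
                pending.add(pool.submit(() -> task.accept(index))); // one task per element
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every task; rethrows task failures
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] squares = new int[5];
        concurrentFor(0, squares.length, i -> squares[i] = i * i);
        System.out.println(Arrays.toString(squares)); // prints [0, 1, 4, 9, 16]
    }
}
```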

The API is easy to use: users only supply the task to execute and the data range, and do not need to care how the task is executed. However, this approach still has disadvantages:
1. The number of threads in the thread pool is not aligned with the number of cores.
2. Each task processes only one element, which is too fine-grained to be efficient.
3. Each task is bound to a single thread, which is not flexible.

Figure: a thread pool with threads 1 to n running tasks 1 to m on a CPU. With 4 cores, n threads and m tasks, the cores are overloaded when n >> 4, and the scheme is not flexible when m > n.

Problem: the number of threads does not align with the number of cores. Solution: use a fixed thread pool in which the number of threads equals the number of cores.
Figure: a thread pool with one thread per CPU core.
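
A sketch of the fixed-pool fix, assuming that `Runtime.getRuntime().availableProcessors()` is an acceptable stand-in for the core count: the pool has one thread per available processor, and the range is cut into one contiguous chunk per thread, so each task handles many elements rather than one. The class name `FixedPoolFor` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

class FixedPoolFor {
    static void concurrentFor(int begin, int end, IntConsumer task) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores); // thread number = core number
        try {
            int n = Math.max(0, end - begin);
            // one contiguous chunk per thread, so each task covers many elements
            int chunk = Math.max(1, (n + cores - 1) / cores);
            List<Future<?>> pending = new ArrayList<>();
            for (int lo = begin; lo < end; lo += chunk) {
                final int from = lo;
                final int to = Math.min(end, lo + chunk);
                pending.add(pool.submit(() -> {
                    for (int i = from; i < to; i++) {
                        task.accept(i);
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every chunk
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] out = new long[1_000];
        concurrentFor(0, out.length, i -> out[i] = (long) i * i);
        System.out.println(out[999]); // prints 998001
    }
}
```

A static, equal split like this still assumes every element costs the same; if some chunks are slower, the other threads finish early and sit idle.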

Task division: another task-division strategy, ForkJoinPool (fork/join).
Divide and conquer:
1. Divide a big task into small tasks recursively.
2. Execute the same operation for every small task.
3. Join the results of all the small tasks.
Figure: Task1 is forked into smaller tasks (Task2 to Task7), whose results are joined back together.
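
A divide-and-conquer sum with the standard `ForkJoinPool` and `RecursiveTask` classes illustrates the three steps. The class name `SumTask` and the `THRESHOLD` of 1000 elements are assumptions; note that the threshold is fixed by the user, which is the task-division limitation raised on the next slide.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 1_000; // assumed, user-chosen granularity
    private final long[] data;
    private final int begin;
    private final int end;

    SumTask(long[] data, int begin, int end) {
        this.data = data;
        this.begin = begin;
        this.end = end;
    }

    @Override
    protected Long compute() {
        if (end - begin <= THRESHOLD) {
            long sum = 0;
            for (int i = begin; i < end; i++) {
                sum += data[i]; // small enough: compute directly
            }
            return sum;
        }
        int mid = (begin + end) >>> 1;
        SumTask left = new SumTask(data, begin, mid);
        SumTask right = new SumTask(data, mid, end);
        left.fork();                     // 1. divide: schedule the left half asynchronously
        long rightSum = right.compute(); // 2. same operation on the right half, in this thread
        return left.join() + rightSum;   // 3. join the partial results
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data)); // prints 50005000
    }
}
```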

ForkJoinPool is better suited to divide-and-conquer problems. However, the previous issues (thread oversubscription and starvation, load imbalance) still exist, and the task-division strategy comes from the user rather than being configured according to runtime conditions.

New parallel API based on a task scheduler

Initial status: tasks are allocated equally, with one thread per core. Every thread maintains its own task queue, which holds the tasks assigned to it.
Figure: a thread pool with one thread per CPU core, each thread with its own task queue.

Unbalanced load: the threads' task queues end up holding different amounts of work.
Figure: per-thread task queues with unequal loads.

Balancing the load by task stealing and by adding new tasks: a thread whose own queue runs empty steals tasks from another thread's queue.
Figure: threads 1 to 4, one per CPU core, each with its own task queue.
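
A minimal work-stealing sketch, not the scheduler described in the talk: each worker owns a deque, takes tasks from its head, and when its own deque is empty steals from the tail of another worker's deque. Tasks are spread round-robin at the start to mimic the equal initial allocation. The class `WorkStealingDemo`, the use of `ConcurrentLinkedDeque` and the `Thread.yield()` idle loop are assumptions for illustration; adding new tasks while running is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;

class WorkStealingDemo {
    static void runAll(List<Runnable> tasks, int workers) throws InterruptedException {
        List<ConcurrentLinkedDeque<Runnable>> queues = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            queues.add(new ConcurrentLinkedDeque<>());
        }
        // initial status: tasks allocated equally (round-robin), one queue per worker
        for (int i = 0; i < tasks.size(); i++) {
            queues.get(i % workers).addLast(tasks.get(i));
        }
        AtomicInteger remaining = new AtomicInteger(tasks.size());
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int self = w;
            Thread worker = new Thread(() -> {
                while (remaining.get() > 0) {
                    // take from the head of this worker's own queue first
                    Runnable task = queues.get(self).pollFirst();
                    // own queue empty: try to steal from the tail of another queue
                    for (int v = 1; task == null && v < workers; v++) {
                        task = queues.get((self + v) % workers).pollLast();
                    }
                    if (task != null) {
                        task.run();
                        remaining.decrementAndGet();
                    } else {
                        Thread.yield(); // nothing to take; other workers are finishing
                    }
                }
            });
            threads.add(worker);
            worker.start();
        }
        for (Thread worker : threads) {
            worker.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        List<Runnable> tasks = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            tasks.add(done::incrementAndGet);
        }
        runAll(tasks, 4);
        System.out.println(done.get()); // prints 100
    }
}
```

Stealing from the opposite end of the deque is the usual design choice: the owner and the thief rarely compete for the same task.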

Parallel API with the new working mechanism: concurrent_for
– Range: the range of the data set, [0, n)
– Strategy: the strategy for dividing the range: automatic, or static with a given granularity
– Task: the task that executes the same operation on the range
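
A hypothetical sketch of the `concurrent_for(Range, Strategy, Task)` shape in Java, with the standard `ForkJoinPool` (which itself uses work stealing) standing in for the talk's task scheduler. The names `ParallelFor`, `Range`, `Strategy.automatic()` and `Strategy.staticWith(int)` are invented for illustration, and the automatic rule of roughly four chunks per core is an assumption, not the author's heuristic.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.function.IntConsumer;

class ParallelFor {
    // Range: the data set [begin, end)
    static final class Range {
        final int begin;
        final int end;
        Range(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Strategy: automatic, or static with a caller-chosen granularity
    static final class Strategy {
        private final int granularity; // 0 means "automatic"
        private Strategy(int granularity) {
            this.granularity = granularity;
        }
        static Strategy automatic() {
            return new Strategy(0);
        }
        static Strategy staticWith(int granularity) {
            return new Strategy(Math.max(1, granularity));
        }
        int grainFor(Range range) {
            if (granularity > 0) {
                return granularity;
            }
            // assumed heuristic: about four chunks per core, leaving work to steal
            int cores = Runtime.getRuntime().availableProcessors();
            return Math.max(1, (range.end - range.begin) / (4 * cores));
        }
    }

    // Task: the same operation, applied to every index of the range
    static void concurrentFor(Range range, Strategy strategy, IntConsumer task) {
        int grain = strategy.grainFor(range);
        ForkJoinPool.commonPool().invoke(new Chunk(range.begin, range.end, grain, task));
    }

    private static final class Chunk extends RecursiveAction {
        private final int begin;
        private final int end;
        private final int grain;
        private final IntConsumer task;

        Chunk(int begin, int end, int grain, IntConsumer task) {
            this.begin = begin;
            this.end = end;
            this.grain = grain;
            this.task = task;
        }

        @Override
        protected void compute() {
            if (end - begin <= grain) {
                for (int i = begin; i < end; i++) {
                    task.accept(i);
                }
                return;
            }
            int mid = (begin + end) >>> 1;
            invokeAll(new Chunk(begin, mid, grain, task), new Chunk(mid, end, grain, task));
        }
    }

    public static void main(String[] args) {
        int[] out = new int[1_000];
        concurrentFor(new Range(0, out.length), Strategy.staticWith(64), i -> out[i] = 2 * i);
        System.out.println(out[999]); // prints 1998
    }
}
```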

Other high-level parallel APIs
– concurrent_while: data can be added to the data set while it is being processed concurrently.
– concurrent_reduce: uses divide-and-join-based tasks to return a calculation result.
– concurrent_sort: sorts a data set concurrently.
– Math calculation: for example, multiplying one matrix by another. Given int[5][10] matrix1 and int[10][5] matrix2, instead of int[5][5] matrix3 = matrix1 * matrix2, write int[5][5] matrix3 = concurrent_multiply(matrix1, matrix2).
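
Rough equivalents of three of these APIs can be sketched with standard Java 8 facilities instead of the talk's scheduler: a row-parallel matrix multiply on a parallel `IntStream`, a parallel sum as the reduction, and `Arrays.parallelSort`. The method names `concurrentMultiply`, `concurrentReduce` and `concurrentSort` are hypothetical; the `main` method reuses the 5x10 by 10x5 shapes from the slide.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class OtherParallelApis {
    // Math calculation: each row of the product is computed as an independent parallel step
    static int[][] concurrentMultiply(int[][] a, int[][] b) {
        int n = a.length;
        int k = b.length;
        int m = b[0].length;
        if (a[0].length != k) {
            throw new IllegalArgumentException("columns of a must equal rows of b");
        }
        int[][] c = new int[n][m];
        IntStream.range(0, n).parallel().forEach(i -> {
            for (int j = 0; j < m; j++) {
                int sum = 0;
                for (int x = 0; x < k; x++) {
                    sum += a[i][x] * b[x][j];
                }
                c[i][j] = sum; // each parallel step writes only its own row i
            }
        });
        return c;
    }

    // concurrent_reduce-style: a parallel sum
    static long concurrentReduce(int[] data) {
        return Arrays.stream(data).parallel().asLongStream().sum();
    }

    // concurrent_sort-style: sort a copy with the JDK's parallel sort
    static int[] concurrentSort(int[] data) {
        int[] copy = Arrays.copyOf(data, data.length);
        Arrays.parallelSort(copy);
        return copy;
    }

    public static void main(String[] args) {
        int[][] matrix1 = new int[5][10];
        int[][] matrix2 = new int[10][5];
        for (int[] row : matrix1) Arrays.fill(row, 1);
        for (int[] row : matrix2) Arrays.fill(row, 2);
        int[][] matrix3 = concurrentMultiply(matrix1, matrix2);
        System.out.println(matrix3[0][0]); // prints 20
    }
}
```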

In summary, parallel computing on multi-core processors lets us achieve performance improvements.

Roadmap: implement a high-level parallel API in the JDK based on the new task scheduler. Goals: correct, portable, scalable, and high performance.

Review of Objectives
Now that you've completed this session, you are able to:
– Understand the design of the new task-based parallel API.
– Understand what parallel computing is and what it is good for.

Q & A

Thanks!