Optimization of System Performance using OpenMP. m5151117 Yumiko Kimezawa. May 25, 2011. RPS.

Presentation transcript:

Optimization of System Performance using OpenMP
Yumiko Kimezawa
May 25, 2011, RPS

Outline
Problems with Previous Research
What are OpenMP and MPI?
- A brief example of OpenMP
Research Goal
Future Work

Problems with Previous Research
The scale of systems is large
Accuracy of PPD Algorithm
A/D converters are not implemented
- Raw ECG data is analog data
PPD Algorithm will be optimized by using OpenMP or MPI

What are OpenMP and MPI?
OpenMP
- A shared-memory application programming interface (API)
- Facilitator of shared-memory parallel programming
- Is used on a broad range of SMP architectures
- Is usually used to parallelize loops
MPI
- Message Passing Interface
- Programming for distributed-memory architectures
- Multiple processes operate independently

A brief example of OpenMP
Fork-join programming model: the initial thread forks into a team of threads, and the team joins back into the initial thread.
Thread 0: for (i = 0; i < 25; i++) sum += i;
Thread 1: for (i = 25; i < 50; i++) sum += i;
Thread 2: for (i = 50; i < 75; i++) sum += i;
Thread 3: for (i = 75; i < 100; i++) sum += i;
At the join, the per-thread results are added (+) into Sum.
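The four per-thread loops on this slide can be written as a single OpenMP loop. The following is a minimal sketch, not code from the presentation: the function name `parallel_sum` is hypothetical, and the `reduction` clause is one standard way to give each thread a private partial sum that is combined at the join.

```c
/* Sum the integers 0 .. n-1 using an OpenMP parallel loop.
 * parallel_sum is a hypothetical name chosen for this sketch. */
long parallel_sum(int n)
{
    long sum = 0;

    /* Fork: the iterations are divided among the team of threads.
     * reduction(+:sum) gives each thread a private copy of sum and
     * adds the copies together at the join. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += i;
    }

    return sum; /* Join: only the initial thread continues here. */
}
```

Compiled with an OpenMP-enabled compiler (for example `gcc -fopenmp`), `parallel_sum(100)` returns 4950. With four threads (for example `OMP_NUM_THREADS=4`) and a static schedule, the iterations are typically split into the ranges 0-24, 25-49, 50-74, and 75-99 shown on the slide. Without the `reduction` clause, the threads would race on the shared variable `sum`.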

Research Goal
Optimization of
- the HW: implementation of the A/D converter
- the SW: parallelization of the algorithm, improvement of the accuracy of the algorithm
Real-time evaluation
- Conclusive results are sent to the web

Future Work
Continue to study OpenMP
Research of other algorithms


Period-Peaks Detection (PPD) Algorithm (2/2)
[Figure: flow of the PPD algorithm in two phases]
Period detection: Data reading, Derivation, Autocorrelation, Finding interval
Peaks processing: Extraction, Discrimination, Store of results

Period-Peaks Detection (PPD) Algorithm (2/2): Equation
[Equation and its symbols were lost in extraction]
Symbols defined on the slide:
- Current sampling data (filtered ECG signals)
- Current time (step)
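The equation on this slide did not survive extraction, so its exact form is unknown. Assuming it belongs to the Derivation stage, one common discrete derivative is the backward difference between the current sample and the previous one. The sketch below illustrates that assumption only; the function name `first_difference` and the choice of setting the first output to zero are hypothetical and may differ from the presentation's actual formula.

```c
#include <stddef.h>

/* Backward difference of a sampled signal (an assumed form of the
 * Derivation stage, not the slide's confirmed equation):
 *   d[t] = x[t] - x[t-1]  for t >= 1,  d[0] = 0.
 * x: filtered ECG samples, d: output buffer, n: number of samples. */
void first_difference(const double *x, double *d, size_t n)
{
    if (n == 0) {
        return;
    }
    d[0] = 0.0; /* no previous sample exists at t = 0 */

    /* Each output depends only on the input array, so the
     * iterations are independent and can run in parallel. */
    #pragma omp parallel for
    for (long t = 1; t < (long)n; t++) {
        d[t] = x[t] - x[t - 1];
    }
}
```

Because no iteration writes to data that another iteration reads, this loop needs no reduction or synchronization.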

Period-Peaks Detection (PPD) Algorithm (2/2): Autocorrelation function
[Equation and its symbols were lost in extraction]
Symbols defined on the slide:
- The autocorrelation function
- The number of times needed for the calculations to get the period (256 in our program)
- The filtered ECG signal
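The slide's autocorrelation formula is not recoverable, but the textbook definition sums the product of the signal with a shifted copy of itself. The sketch below uses that textbook form and shows how the loop over lags could be parallelized with OpenMP. The function name `autocorrelation`, the parameter names, and the reading of 256 as the summation length are all assumptions for illustration.

```c
/* Summation length reported on the slide ("256 in our program");
 * treating it as the window size is an assumption. */
#define PPD_WINDOW 256

/* Textbook autocorrelation of a sampled signal:
 *   r[lag] = sum over t = 0 .. window-1 of x[t] * x[t + lag]
 * for lag = 0 .. max_lag-1.
 * x must hold at least window + max_lag - 1 samples. */
void autocorrelation(const double *x, double *r, int window, int max_lag)
{
    /* Each lag is computed independently, so the outer loop can be
     * shared among threads without a reduction clause. */
    #pragma omp parallel for
    for (int lag = 0; lag < max_lag; lag++) {
        double acc = 0.0; /* private to each iteration */
        for (int t = 0; t < window; t++) {
            acc += x[t] * x[t + lag];
        }
        r[lag] = acc;
    }
}
```

A call such as `autocorrelation(ecg, r, PPD_WINDOW, max_lag)` would fill `r` for a hypothetical buffer `ecg`. For a periodic signal, the lag of the first large peak in `r` after lag 0 gives an estimate of the period, which is the role of the Finding interval stage that follows.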


System architecture
[Figure: block diagram of the system, shown on five consecutive slides]
Modules: Master Module, PPD Module
Components: Master CPU, Master CPU Memory, Slave CPU, Slave CPU Memory, Shared Memory, External Memory, ECG Data ROM, FIR Filter, Timer, Graphic LCD Controller, Graphic LCD, LED Controller, LED, JTAG UART, Avalon Bus
Legend: data flow, control signal
Steps highlighted on successive slides:
1: Signal Reading
2: Filtering
3: Analysis
4: Display