Development of Parallel Simulator for Wireless WCDMA Network
Hong Zhang
Communication Lab, Helsinki University of Technology (HUT)

Outline

1. Overview
   1.1 The Requirement for Computational Speed of Simulation for the Wireless WCDMA System
   1.2 Parallel Programming
2. Types of Parallel Computers
   2.1 Shared Memory Multiprocessor System
   2.2 Message Passing Multiprocessor with Local Memory
3. Parallel Programming Scenarios
   3.1 Ideal Parallel Computations
   3.2 Partitioning and Divide-and-Conquer Strategies
   3.3 Pipelined Computation
   3.4 Synchronous Computation
   3.5 Load Balancing
   3.6 Multiprocessor with Shared Memory
4. Progress of the Project

1. Overview

1.1 The Requirement for Computational Speed of Wireless WCDMA Network Simulation

In mobile communication, advanced signal processing techniques such as smart antennas and multiuser detection (MUD) can improve system performance, but they require signal- or system-level simulation. Simulation is an important tool for gaining insight into the problem. However, simulating these signal processing algorithms is often a very time-consuming task, so it is necessary to speed up the simulation. Parallel programming is one of the best techniques for doing so.

1.2 Parallel Programming

Parallel programming speeds up the execution of a program by dividing it into multiple fragments that can be executed simultaneously, each on its own processor. Parallel programming involves:
♦ Decomposing an algorithm or data into parts
♦ Distributing the sub-tasks to multiple processors that work on them simultaneously
♦ Coordinating the work and communication between those processors

1.2 Parallel Programming (cont.)

The requirements for parallel programming:
♦ A parallel architecture to run on
♦ Multiple processors
♦ A network
♦ An environment to create and manage parallel processing
♦ A parallel algorithm and parallel program

2. Types of Parallel Computers

2.1 Shared Memory Multiprocessor System

[Diagram: multiple CPUs connected to a single common memory]

♦ Multiple processors operate independently but share the same memory resources.
♦ Only one processor can access a given shared memory location at a time.
♦ Synchronisation is achieved by controlling READING FROM and WRITING TO the shared memory.
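As an illustration, here is a minimal sketch in C of this kind of shared-memory coordination using POSIX threads; the counter, thread count, and the use of pthreads (rather than the simulator's actual code) are assumptions for the example:

#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

/* A shared memory location, visible to all threads. */
static long chip_count = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread updates the shared counter; the mutex ensures that
 * only one processor writes to the location at a time. */
static void *worker(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);   /* synchronise writing to shared memory */
        chip_count++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    printf("chip_count = %ld\n", chip_count);
    return 0;
}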

2.1 Shared Memory Multiprocessor System (cont.)

♦ Advantages
  - Easy for the user to use efficiently
  - Data sharing among tasks is fast (fast memory access)
♦ Disadvantages
  - The size of memory might be a limiting factor: increasing the number of processors without increasing the size of memory can cause severe bottlenecks
  - The user is responsible for establishing synchronisation

2.2 Message Passing Multiprocessor with Local Memory

[Diagram: CPUs, each with its own local memory, connected by a network]

♦ Multiple processors operate independently, but each has its own local memory.
♦ Data is shared across the communication network using message passing.
♦ The user is responsible for synchronisation through message passing.
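A minimal MPI sketch in C of two such processes exchanging a block of samples over the network (run with, e.g., mpirun -np 2); the buffer size and message tag are illustrative assumptions:

#include <mpi.h>
#include <stdio.h>

#define BLOCK 1024   /* samples per message (illustrative) */

int main(int argc, char **argv)
{
    int rank;
    double samples[BLOCK];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        for (int i = 0; i < BLOCK; i++)
            samples[i] = (double)i;          /* produce data in local memory */
        MPI_Send(samples, BLOCK, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(samples, BLOCK, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);         /* data arrives via the network */
        printf("rank 1 received %d samples\n", BLOCK);
    }

    MPI_Finalize();
    return 0;
}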

2.2 Message Passing Multiprocessor with Local Memory (cont.)

♦ Advantages
  - Memory is scalable with the number of processors: adding processors, each with its own memory, increases the total memory, in contrast to the shared memory multiprocessor system
  - Each processor can rapidly access its own memory without restriction
♦ Disadvantages
  - Existing data structures can be difficult to map onto this organisation
  - The user is responsible for sending and receiving data among processors
  - To minimise overhead and latency, data should be packed into large blocks before the receiving nodes need it

3. Parallel Programming Scenarios

3.1 Ideal Parallel Computations

A computation that can readily be divided into completely independent parts, which can then be executed simultaneously.

Example: In the simulation of uplink WCDMA (single user), the signal processing at the transmitter and the receiver is
  - divided into smaller parts,
  - executed by separate processors,
as shown in the processor assignment on the next slide.

3.1 Ideal Parallel Computations (cont.)

Example: simulation of wireless communication as an ideal parallel computation.

Transmitter:
  CPU 1: source data generation (traffic/packet)
  CPU 2: channel coding and data matching
  CPU 3: modulation
  CPU 4: spreading and scrambling
  CPU 5: pulse shaping filtering
Radio channel:
  CPU 6: reconstruction of the composite signal (signal, channel, AWGN)
Receiver:
  CPU 7: matched filtering
  CPU 8: Rake combining
  CPU 9: demodulation
  CPU 10: channel decoding
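A minimal data-parallel sketch in C of the ideal case: a frame of bits is split into completely independent ranges, and each thread spreads its own range with no communication. The spreading factor, toy code, and thread count are assumptions for the example:

#include <pthread.h>
#include <stdio.h>

#define NCPU  4
#define NBITS 4096
#define SF    8                    /* spreading factor (illustrative) */

static int bits[NBITS];
static int chips[NBITS * SF];
static const int code[SF] = {1, -1, 1, 1, -1, 1, -1, -1};  /* toy code */

struct part { int first, last; };

/* Each processor spreads its own range of bits; the parts are
 * completely independent, so no synchronisation is needed. */
static void *spread(void *arg)
{
    struct part *p = arg;
    for (int i = p->first; i < p->last; i++)
        for (int j = 0; j < SF; j++)
            chips[i * SF + j] = (bits[i] ? 1 : -1) * code[j];
    return NULL;
}

int main(void)
{
    pthread_t t[NCPU];
    struct part parts[NCPU];

    for (int i = 0; i < NBITS; i++) bits[i] = i & 1;

    for (int k = 0; k < NCPU; k++) {
        parts[k].first = k * NBITS / NCPU;
        parts[k].last  = (k + 1) * NBITS / NCPU;
        pthread_create(&t[k], NULL, spread, &parts[k]);
    }
    for (int k = 0; k < NCPU; k++)
        pthread_join(t[k], NULL);

    printf("spread %d bits into %d chips\n", NBITS, NBITS * SF);
    return 0;
}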

3.2 Partitioning and Divide-and-Conquer Strategies

Partitioning: the problem is simply divided into separate parts, and each part is computed separately.

Divide-and-conquer: the task is divided into smaller and smaller subtasks, the smaller parts are solved, and the results are combined.

Example: In the simulation of the Rake combining technique in WCDMA, the problem can first be divided among the different fingers. Within each finger, the problem can be further divided into correlating, delay equalising, and MRC/EGC combining (a per-finger sketch follows the diagram on the next slide).

3.2 Partitioning and Divide-and-Conquer Strategies (cont.)

Example: simulation of wireless communication with a divide-and-conquer strategy.

Rake combining is divided into fingers 1, 2, ..., K; within each finger:
  CPU 1: correlating
  CPU 2: modification with the channel estimate
  CPU 3: combining with MRC/EGC
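A minimal divide-and-conquer sketch in C, assuming K fingers processed by one thread each and a simple weighted sum standing in for MRC; the finger count, block size, and toy channel estimates are assumptions:

#include <pthread.h>
#include <stdio.h>

#define K 4      /* number of Rake fingers (illustrative) */
#define N 256    /* symbols per block */

struct finger {
    double out[N];
    double weight;   /* channel estimate used for MRC */
    int    id;
};

/* Divide: each finger processes its own copy of the received block
 * independently (correlating, scaling by the channel estimate). */
static void *finger_task(void *arg)
{
    struct finger *f = arg;
    for (int n = 0; n < N; n++)
        f->out[n] = f->weight * 1.0;   /* placeholder for a despread symbol */
    return NULL;
}

int main(void)
{
    pthread_t t[K];
    struct finger fingers[K];
    double combined[N] = {0};

    for (int k = 0; k < K; k++) {
        fingers[k].id = k;
        fingers[k].weight = 1.0 / (k + 1);   /* toy channel estimates */
        pthread_create(&t[k], NULL, finger_task, &fingers[k]);
    }

    /* Conquer: wait for all fingers, then combine their outputs (MRC). */
    for (int k = 0; k < K; k++) {
        pthread_join(t[k], NULL);
        for (int n = 0; n < N; n++)
            combined[n] += fingers[k].out[n];
    }

    printf("combined[0] = %f\n", combined[0]);
    return 0;
}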

3.3 Pipelined Computation

The problem is divided into a series of tasks that have to be completed one after the other; each task is executed by a separate processor. The computation is partially sequential in nature.

Example: In the simulation of the WCDMA transmitter and receiver, each signal processing block needs the output of the previous block as its input. In this case, the pipelining technique is adopted to parallelise the sequential source code (a sketch follows the diagram on the next slide).

3.3 Pipelined Computation (cont.)

Example: simulation of wireless communication with pipelined computation, using the same processor assignment as in Section 3.1: CPUs 1-5 form the transmitter stages (source data generation, channel coding and data matching, modulation, spreading and scrambling, pulse shaping filtering), CPU 6 reconstructs the composite signal over the radio channel (signal, channel, AWGN), and CPUs 7-10 form the receiver stages (matched filtering, Rake combining, demodulation, channel decoding). Successive data blocks flow through the stages simultaneously.
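A minimal MPI pipeline sketch in C: each rank is one stage, receiving a block from the previous stage, processing it, and piping it to the next (run with one rank per stage, e.g. mpirun -np 4); the block size, block count, and placeholder processing are assumptions:

#include <mpi.h>
#include <stdio.h>

#define BLOCK   512   /* samples per pipeline stage (illustrative) */
#define NBLOCKS 8     /* blocks streamed through the pipeline */

/* Placeholder for a stage's signal processing (coding, modulation, ...). */
static void process(double *buf, int n, int stage)
{
    for (int i = 0; i < n; i++)
        buf[i] += stage;
}

int main(int argc, char **argv)
{
    int rank, size;
    double buf[BLOCK] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int b = 0; b < NBLOCKS; b++) {
        if (rank > 0)            /* receive the previous stage's output */
            MPI_Recv(buf, BLOCK, MPI_DOUBLE, rank - 1, b,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        process(buf, BLOCK, rank);
        if (rank < size - 1)     /* pipe the result to the next stage */
            MPI_Send(buf, BLOCK, MPI_DOUBLE, rank + 1, b, MPI_COMM_WORLD);
        else
            printf("block %d left the pipeline\n", b);
    }

    MPI_Finalize();
    return 0;
}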

3.4 Synchronous Computation

Processors need to exchange data among themselves. All processes start at the same time in a lock-step manner, and each process must wait until all processes have reached a particular reference point (barrier) in their computation.

Example: in the WCDMA system
  - Smart antenna (SA): the signal processing in each branch of the antenna elements must be finished before the branches are combined.
  - Rake combining: the signal processing in each finger must be finished before the fingers are combined.
  - Multiuser detection (MUD): since MUD for each user's signal needs the other users' signals, the processing of all users' signals must be finished before MUD starts.
A barrier-based sketch follows the two diagrams below.

3.4 Synchronous Computation (cont.)

Example: simulation of wireless communication with synchronous computation.

[Diagram: for each of users 1 to N, dedicated CPUs perform received signal reconstruction (with AWGN), matched filtering, beamforming (weights w) and combining, and Rake combining over fingers 1 to K (correlating, modification with the channel estimate); all per-user outputs then meet at the MUD stage.]

3.4 Synchronous Computation (cont.)

Example: multiuser detection with synchronous computation.

[Diagram: one CPU per user; each CPU feeds the output of that user's beamforming/combining, together with the user's signature waveform, into the multiuser detection stage.]
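A minimal barrier sketch in C with POSIX threads: one worker per user finishes its front-end processing, then all workers wait at a barrier before MUD begins. The user count and placeholder outputs are assumptions:

#define _POSIX_C_SOURCE 200112L
#include <pthread.h>
#include <stdio.h>

#define NUSERS 4   /* one worker per user (illustrative) */

static pthread_barrier_t barrier;
static double soft_output[NUSERS];

/* Per-user front-end processing must finish for ALL users before
 * multiuser detection can start, hence the barrier. */
static void *user_task(void *arg)
{
    int u = *(int *)arg;
    soft_output[u] = u + 0.5;        /* placeholder for beamforming/Rake output */
    pthread_barrier_wait(&barrier);  /* wait until every user has finished */
    if (u == 0)                      /* one thread then performs the MUD step */
        printf("all %d users ready, running MUD\n", NUSERS);
    return NULL;
}

int main(void)
{
    pthread_t t[NUSERS];
    int ids[NUSERS];

    pthread_barrier_init(&barrier, NULL, NUSERS);
    for (int u = 0; u < NUSERS; u++) {
        ids[u] = u;
        pthread_create(&t[u], NULL, user_task, &ids[u]);
    }
    for (int u = 0; u < NUSERS; u++)
        pthread_join(t[u], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}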

3.5 Load Balancing

Load balancing distributes the computation load fairly across the processors in order to obtain the highest possible execution speed.

Example: in the WCDMA system (a work-queue sketch follows the diagram on the next slide)
  - Smart antenna (SA): the direction of arrival (DOA) can vary at different speeds for different user signals, which means that the beamforming processors for different users may perform different numbers of operations. The load can be balanced by detecting whether the solution has been reached on each processor.
  - Rake combining: the number of multipath signals can differ from user to user. The load can be balanced in the same way, by detecting whether the solution has been reached on each processor.

3.5 Load Balancing (cont.)

Example: simulation of wireless communication with load balancing.

[Diagram: computation time per processor. Rake combining runs on CPUs 1..N (one per user); CPU 2 takes longer because user 2 has more multipath signals than the other users. Beamforming runs on CPUs N+1..2N; CPU N+2 takes longer because user 2's channel parameters vary faster than those of the other users.]
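A minimal dynamic load-balancing sketch in C: per-user jobs of uneven size sit in a shared work queue, and idle workers pull the next job as soon as they finish, so no processor is stuck waiting on a fixed assignment. The worker count, task sizes, and busy-loop stand-in for real processing are assumptions:

#include <pthread.h>
#include <stdio.h>

#define NWORKERS 3
#define NTASKS   12   /* per-user processing jobs of uneven size */

static int next_task = 0;                 /* shared work-queue index */
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static long work[NTASKS];

static void *worker(void *arg)
{
    long done = 0;
    for (;;) {
        pthread_mutex_lock(&qlock);
        int t = next_task < NTASKS ? next_task++ : -1;  /* grab the next job */
        pthread_mutex_unlock(&qlock);
        if (t < 0) break;
        for (volatile long i = 0; i < work[t] * 1000000L; i++)
            ;                             /* stand-in for Rake/beamforming work */
        done += work[t];
    }
    printf("worker finished %ld units of work\n", done);
    return NULL;
}

int main(void)
{
    pthread_t t[NWORKERS];
    for (int i = 0; i < NTASKS; i++)
        work[i] = (i % 4 == 2) ? 5 : 1;   /* "user 2" jobs cost 5x more */
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(t[i], NULL);
    return 0;
}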

3.6 Multiprocessor with Shared Memory

A multiprocessor with shared memory can speed up the program by storing the executable code and data in shared memory, accessible to every processor.

Example: In the simulation of WCDMA with multiple users, each part of the signal processing model may have several alternative algorithms, for example
  - adaptive beamforming: RLS, LMS, CMA, conjugate gradient method
  - multiuser detection: decorrelating detector, MMSE detector, adaptive MMSE detection, etc.
All the code for these algorithms is stored in the shared memory, the processing for each user shares this code, and the processor for each user accesses the executable code directly in shared memory, which speeds up execution (a dispatch-table sketch follows the diagram on the next slide).

3.6 Multiprocessor with Shared Memory (cont.)

Example: simulation of wireless communication on a multiprocessor with shared memory.

[Diagram: Beamforming: CPUs 1..N (one per user), each with its own cache, share memory modules holding the RLS, CMA, ... routines. Multiuser detection: CPUs 1..N, each with its own cache, share memory modules holding the decorrelating detector, MMSE, ... routines.]
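A minimal sketch in C of the shared-code idea: all per-user threads execute the same algorithm routines, selected through a shared dispatch table, so only one copy of each algorithm exists in memory. The algorithm names match the slide, but the table layout and selection rule are assumptions:

#include <pthread.h>
#include <stdio.h>

#define NUSERS 4

/* One shared copy of each algorithm; every user's thread executes
 * this same code, chosen through a shared dispatch table. */
static void beamform_rls(int user) { printf("user %d: RLS beamforming\n", user); }
static void beamform_cma(int user) { printf("user %d: CMA beamforming\n", user); }

typedef void (*algo_fn)(int);
static const algo_fn beamformers[] = { beamform_rls, beamform_cma };

static void *user_task(void *arg)
{
    int u = *(int *)arg;
    beamformers[u % 2](u);   /* each user picks an algorithm from shared code */
    return NULL;
}

int main(void)
{
    pthread_t t[NUSERS];
    int ids[NUSERS];
    for (int u = 0; u < NUSERS; u++) {
        ids[u] = u;
        pthread_create(&t[u], NULL, user_task, &ids[u]);
    }
    for (int u = 0; u < NUSERS; u++)
        pthread_join(t[u], NULL);
    return 0;
}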

4. Progress of the Project

The following models of the WCDMA system have been developed and integrated into the simulator:
  - spreader/despreader
  - spatial processing
  - RAKE receiver
  - fading radio channel
Some simulation results have been obtained for model verification, including interactions with SARG at Stanford on Rake receiver model verification. Work on the translation from MATLAB into C, with further parallelisation, has been accomplished at UCLA.