Development of Parallel Simulator for Wireless WCDMA Network
Hong Zhang
Communication Lab of HUT
Outline
1. Overview
   1.1 The Requirement for Computational Speed of Simulation for the Wireless WCDMA System
   1.2 Parallel Programming
2. Types of Parallel Computers
   2.1 Shared Memory Multiprocessor System
   2.2 Message Passing Multiprocessor with Local Memory
3. Parallel Programming Scenarios
   3.1 Ideal Parallel Computations
   3.2 Partitioning and Divide-and-Conquer Strategies
   3.3 Pipelined Computation
   3.4 Synchronous Computation
   3.5 Load Balancing
   3.6 Multiprocessor with Shared Memory
4. Progress of the Project
1. Overview
1.1 The Requirement for Computational Speed of Wireless WCDMA Network Simulation
In mobile communication, advanced signal processing techniques such as smart antennas and multiuser detection (MUD) can improve system performance, but they require signal- or system-level simulation. Simulation is an important tool for gaining insight into the problem. However, simulating these signal processing algorithms is often a very time-consuming task, so it is necessary to speed up the simulation. Parallel programming is one of the best techniques for solving this problem.
1.2 Parallel Programming
Parallel programming can speed up the execution of a program by dividing it into multiple fragments that can be executed simultaneously, each on its own processor. Parallel programming involves:
♦ Decomposing an algorithm or data into parts
♦ Distributing the sub-tasks, which are processed by multiple processors simultaneously
♦ Coordinating work and communication between those processors
1.2 Parallel Programming (cont.)
The requirements for parallel programming:
♦ A parallel architecture to run on
  - Multiple processors
  - A network
♦ An environment to create and manage parallel processing
♦ A parallel algorithm and a parallel program
2. Types of Parallel Computers
2.1 Shared Memory Multiprocessor System
[Diagram: several CPUs connected to a common memory]
♦ Multiple processors operate independently but share the same memory resources.
♦ Only one processor can access a given shared memory location at a time.
♦ Synchronization is achieved by controlling how tasks READ FROM and WRITE TO the shared memory.
2.1 Shared Memory Multiprocessor System (cont.)
♦ Advantages
  - Easy for the user to use efficiently
  - Data sharing among tasks is fast (speeds up memory access)
♦ Disadvantages
  - The size of memory can be a limiting factor: increasing the number of processors without increasing the size of memory can cause severe bottlenecks
  - The user is responsible for establishing synchronization
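The slides stop at the bullet level, so the following is only a minimal sketch of shared-memory synchronization, assuming POSIX threads (which the presentation does not name): several threads add partial results to a shared accumulator, and a mutex ensures that only one of them writes the shared location at a time. The thread count and the accumulator are illustrative choices.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

static double shared_sum = 0.0;              /* shared memory location  */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread computes a partial result and adds it to the shared sum. */
static void *worker(void *arg)
{
    long id = (long)arg;
    double partial = (double)(id + 1);       /* stand-in for real work  */

    pthread_mutex_lock(&lock);               /* one writer at a time    */
    shared_sum += partial;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];

    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    printf("shared_sum = %f\n", shared_sum);
    return 0;
}
```

Compile with `gcc -pthread`. The mutex is exactly the user-established synchronization named in the disadvantages above: without it, concurrent writes to shared_sum would race.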
2.2 Message Passing Multiprocessor with Local Memory
[Diagram: CPUs, each with its own local memory, connected by a network]
♦ Multiple processors operate independently, but each has its own local memory.
♦ Data is shared across a communication network using message passing.
♦ The user is responsible for synchronization using message passing.
2.2 Message Passing Multiprocessor with Local Memory (cont.)
♦ Advantages
  - Memory is scalable with the number of processors: adding processors, each with its own memory, increases the total memory, in contrast to the shared memory multiprocessor system
  - Each processor can rapidly access its own memory without limitation
♦ Disadvantages
  - Difficult to map existing data structures onto the distributed memory
  - The user is responsible for sending and receiving data among processors
  - To minimize overhead and latency, data should be accumulated into large blocks before the receiving nodes need it
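As a hedged illustration of the message passing model (assuming MPI, which the slides do not name), the sketch below has rank 0 send one large block of samples to rank 1, which posts a matching receive. The programmer, not the hardware, moves the data, and sending one large block rather than many small ones follows the overhead/latency advice above.

```c
#include <mpi.h>
#include <stdio.h>

#define BLOCK 1024                            /* illustrative block size */

int main(int argc, char **argv)
{
    int rank;
    double samples[BLOCK];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        for (int i = 0; i < BLOCK; i++)       /* stand-in for real data  */
            samples[i] = (double)i;
        /* One large block amortizes the per-message overhead. */
        MPI_Send(samples, BLOCK, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(samples, BLOCK, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d samples\n", BLOCK);
    }

    MPI_Finalize();
    return 0;
}
```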
3. Parallel Programming Scenarios
3.1 Ideal Parallel Computations
A computation can be readily divided into completely independent parts that can be executed simultaneously.
Example: In the simulation of uplink WCDMA (single user), the signal processing at the transmitter and the receiver is divided into smaller parts, each executed by a separate processor.
3.1 Ideal Parallel Computations (cont.)
Example: simulation of wireless communication with ideal parallel computation
Transmitter:
  CPU 1: source data generation (traffic/packet)
  CPU 2: channel coding and rate matching
  CPU 3: modulation
  CPU 4: spreading and scrambling
  CPU 5: pulse shaping filtering
Radio channel:
  CPU 6: reconstruction of the composite signal (signal, channel, AWGN)
Receiver:
  CPU 7: matched filtering
  CPU 8: Rake combining
  CPU 9: demodulation
  CPU 10: channel decoding
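A minimal sketch of such an ideal (communication-free) decomposition, again assuming MPI: each rank simulates its own independent part, with simulate_part standing in for a real transmitter or receiver block, and the only communication is the final gather of results.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Placeholder for one independent part of the simulation, e.g. one
 * of the transmitter/receiver blocks above on its own processor.     */
static double simulate_part(int part)
{
    return 1.0 / (part + 1);                 /* stand-in for real work  */
}

int main(int argc, char **argv)
{
    int rank, size;
    double local, *results = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    local = simulate_part(rank);             /* no communication needed */

    if (rank == 0)
        results = malloc(size * sizeof(double));
    MPI_Gather(&local, 1, MPI_DOUBLE, results, 1, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("part %d -> %f\n", i, results[i]);
        free(results);
    }
    MPI_Finalize();
    return 0;
}
```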
3.2 Partitioning and Divide-and-Conquer Strategies
Partitioning: the problem is simply divided into separate parts, and each part is computed separately.
Divide-and-conquer: the task is divided continually into smaller and smaller subtasks; the smallest parts are solved and the results are combined.
Example: In the simulation of the Rake combining technique in WCDMA, the problem can be continually divided among the different fingers. Within each finger, the problem can be further divided into correlating, delay equalizing, and MRC/EGC combining.
3.2 Partitioning and Divide-and-Conquer Strategies (cont.)
Example: simulation of wireless communication with the divide-and-conquer strategy
[Diagram: Rake combining is divided among fingers 1 ... K; within each finger:
  CPU 1: correlating
  CPU 2: modification with the channel estimate
  CPU 3: combining with MRC/EGC]
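The per-finger division might be sketched as follows with POSIX threads; the finger count, weights, and sample data are made up for illustration. Each thread handles one finger (correlating and weighting with the channel estimate), and the main thread joins them and sums the outputs as a stand-in for MRC.

```c
#include <pthread.h>
#include <stdio.h>

#define FINGERS 3
#define LEN     8

typedef struct {
    const double *rx;      /* received samples at this finger's delay  */
    double        weight;  /* channel estimate used to modify the path */
    double        out;     /* finger output                            */
} finger_t;

/* One Rake finger: correlate (here, accumulate) and apply the weight. */
static void *finger_worker(void *arg)
{
    finger_t *f = (finger_t *)arg;
    double acc = 0.0;
    for (int i = 0; i < LEN; i++)
        acc += f->rx[i];                     /* stand-in for correlation */
    f->out = f->weight * acc;                /* modify w/ channel estimate */
    return NULL;
}

int main(void)
{
    double rx[LEN] = {1, 1, -1, 1, -1, -1, 1, 1};   /* made-up samples */
    finger_t fingers[FINGERS] = {
        { rx, 0.8, 0.0 }, { rx, 0.5, 0.0 }, { rx, 0.3, 0.0 }
    };
    pthread_t tid[FINGERS];
    double combined = 0.0;

    for (int k = 0; k < FINGERS; k++)        /* divide among the fingers */
        pthread_create(&tid[k], NULL, finger_worker, &fingers[k]);
    for (int k = 0; k < FINGERS; k++) {      /* conquer: combine (MRC)   */
        pthread_join(tid[k], NULL);
        combined += fingers[k].out;
    }
    printf("combined output = %f\n", combined);
    return 0;
}
```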
3.3 Pipelined Computation
The problem is divided into a series of tasks that have to be completed one after the other, each task executed by a separate processor. The computation is thus partially sequential in nature.
Example: In the simulation of the WCDMA transmitter and receiver, each signal processing block needs the output of the previous block as its input. In this case, the pipelining technique is adopted to parallelize the sequential source code.
3.3 Pipelined Computation (cont.)
Example: simulation of wireless communication with pipelined computation
[Diagram: the same processing chain as in Section 3.1, now operated as a pipeline: CPU 1 (source data generation) → CPU 2 (channel coding and rate matching) → CPU 3 (modulation) → CPU 4 (spreading and scrambling) → CPU 5 (pulse shaping filtering) → CPU 6 (composite signal reconstruction with channel and AWGN) → CPU 7 (matched filtering) → CPU 8 (Rake combining) → CPU 9 (demodulation) → CPU 10 (channel decoding)]
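A hedged MPI sketch of such a pipeline: each rank implements one stage, receiving a block from the previous rank, applying placeholder processing, and forwarding the block to the next rank. Once several blocks are in flight, all stages work simultaneously; block size and count are illustrative.

```c
#include <mpi.h>
#include <stdio.h>

#define BLOCK   256                           /* illustrative block size */
#define NBLOCKS 4                             /* blocks fed through pipe */

int main(int argc, char **argv)
{
    int rank, size;
    double buf[BLOCK];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int b = 0; b < NBLOCKS; b++) {
        if (rank > 0)                         /* receive from prior stage */
            MPI_Recv(buf, BLOCK, MPI_DOUBLE, rank - 1, b,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        else
            for (int i = 0; i < BLOCK; i++)   /* first stage: make data   */
                buf[i] = (double)(b * BLOCK + i);

        for (int i = 0; i < BLOCK; i++)       /* placeholder stage work   */
            buf[i] *= 0.5;

        if (rank < size - 1)                  /* forward to next stage    */
            MPI_Send(buf, BLOCK, MPI_DOUBLE, rank + 1, b, MPI_COMM_WORLD);
        else if (b == NBLOCKS - 1)
            printf("last stage finished block %d\n", b);
    }
    MPI_Finalize();
    return 0;
}
```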
3.4 Synchronous Computation
Processors need to exchange data between themselves. All the processes start at the same time, in a lock-step manner, and each process must wait until all processes have reached a particular reference point (barrier) in their computation.
Example: WCDMA system
♦ Smart Antenna (SA): the signal processing in each branch of the antenna elements must be finished before they are combined.
♦ Rake combining: the signal processing in each finger must be finished before the fingers are combined.
♦ Multiuser Detection (MUD): since MUD for each user's signal needs the other users' signals, the processing of all users' signals must be finished before MUD.
3.4 Synchronous Computation (cont.)
Example: simulation of wireless communication with synchronous computation
[Diagram: for each of users 1 ... N, separate CPUs perform received signal reconstruction (with AWGN), matched filtering, beamforming, and Rake combining with fingers 1 ... K (correlating, then modification with the channel estimate), followed by beamforming combining; the outputs of all users must be ready before MUD.]
3.4 Synchronous Computation (cont.)
Example: simulation of wireless communication with synchronous computation
[Diagram: multiuser detection. For each user 1 ... N, a CPU takes the output of that user's beamforming/combining together with that user's signature waveform; all of them feed MUD jointly.]
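Barrier synchronization of this kind might look as follows with POSIX threads (the names and user count are hypothetical): one thread processes each user's signal, and pthread_barrier_wait guarantees that no thread begins the MUD-style step until every user's output is ready.

```c
#include <pthread.h>
#include <stdio.h>

#define USERS 4

static pthread_barrier_t barrier;
static double user_output[USERS];           /* per-user combining output */

static void *user_worker(void *arg)
{
    long u = (long)arg;

    user_output[u] = (double)(u + 1);       /* stand-in for beamforming/
                                               Rake combining of user u  */
    /* Wait until every user's signal has been processed. */
    pthread_barrier_wait(&barrier);

    /* After the barrier, each thread may safely read the other users'
     * outputs, as multiuser detection requires.                        */
    double sum = 0.0;
    for (int i = 0; i < USERS; i++)
        sum += user_output[i];
    printf("user %ld sees total %f for MUD\n", u, sum);
    return NULL;
}

int main(void)
{
    pthread_t tid[USERS];

    pthread_barrier_init(&barrier, NULL, USERS);
    for (long u = 0; u < USERS; u++)
        pthread_create(&tid[u], NULL, user_worker, (void *)u);
    for (int u = 0; u < USERS; u++)
        pthread_join(tid[u], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```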
3.5 Load Balancing
Load balancing distributes the computation load fairly across the processors in order to obtain the highest possible execution speed.
Example: WCDMA system
♦ Smart Antenna (SA): the speed of direction-of-arrival (DOA) variation can differ from one user signal to another, which means that the beamforming processors for different users can have different numbers of operations. The load of all processors can be fairly balanced by detecting whether the solution has been reached on each processor.
♦ Rake combining: the number of multipath signals for different users can differ. The load of all processors can be fairly balanced by detecting whether the solution has been reached on each processor.
3.5 Load Balancing (cont.)
Example: simulation of wireless communication with load balancing
[Diagram: Rake combining runs on CPUs 1 ... N, one per user; CPU 2 has a longer computation time because user 2 has more multipath signals than the other users. Beamforming runs on CPUs N+1 ... 2N; CPU N+2 has a longer computation time because the channel parameters of user 2 vary faster than those of the other users.]
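One common way to realize such balancing, shown below as a hypothetical pthreads sketch, is a shared work pool: instead of binding one user to one processor statically, idle workers pull the next unprocessed user from a shared counter, so a user with more multipath signals (here user 2, given a made-up higher cost) does not leave the other processors idle.

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define USERS   8
#define WORKERS 3

static int next_user = 0;                    /* shared work-pool index  */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Made-up per-user costs in microseconds: user 2 has more multipath
 * signals, so its simulated processing takes longer than the others'. */
static const int cost_us[USERS] = {100, 100, 500, 100, 100, 100, 100, 100};

static void *worker(void *arg)
{
    long id = (long)arg;
    for (;;) {
        pthread_mutex_lock(&pool_lock);
        int u = next_user < USERS ? next_user++ : -1;
        pthread_mutex_unlock(&pool_lock);
        if (u < 0)
            break;                           /* pool empty: worker done */
        usleep(cost_us[u]);                  /* stand-in for user u work */
        printf("worker %ld processed user %d\n", id, u);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[WORKERS];
    for (long w = 0; w < WORKERS; w++)
        pthread_create(&tid[w], NULL, worker, (void *)w);
    for (int w = 0; w < WORKERS; w++)
        pthread_join(tid[w], NULL);
    return 0;
}
```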
3.6 Multiprocessor with Shared Memory
A multiprocessor with shared memory can speed up the simulation by storing the executable code and data in a shared memory that every processor can reach.
Example: In the simulation of WCDMA with multiple users, each part of the signal processing model can offer a number of algorithms, for example:
♦ Adaptive beamforming: RLS, LMS, CMA, conjugate gradient method
♦ Multiuser detection: decorrelating detector, MMSE detector, adaptive MMSE detection, etc.
All the code for these algorithms is stored in the shared memory, and the processing for every user shares it: the processor for each user can access the executable code in the shared memory, which speeds up the simulation.
3.6 Multiprocessor with Shared Memory (cont.)
Example: simulation of wireless communication by a multiprocessor with shared memory
[Diagram: beamforming. CPUs 1 ... N (one per user), each with its own cache, share memory modules holding the RLS, ..., CMA algorithms. Multiuser detection: CPUs 1 ... N, each with its own cache, share memory modules holding the decorrelating detector, MMSE, ... algorithms.]
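The shared-code idea can be sketched with a dispatch table of function pointers shared by all threads; rls_beamform and the other routines below are placeholders, not the simulator's actual algorithms. Each user's thread selects an algorithm from the one shared table instead of holding a private copy of the code.

```c
#include <pthread.h>
#include <stdio.h>

#define USERS 4

/* Placeholder algorithm implementations, stored once and shared by all
 * processors; real RLS/LMS/CMA routines would go here.                */
static double rls_beamform(int user) { return 1.0 * user; }
static double lms_beamform(int user) { return 2.0 * user; }
static double cma_beamform(int user) { return 3.0 * user; }

/* Shared dispatch table: every user's processor reads the same code. */
typedef double (*beamformer_fn)(int);
static const beamformer_fn algorithms[] = {
    rls_beamform, lms_beamform, cma_beamform
};

static void *user_worker(void *arg)
{
    long u = (long)arg;
    beamformer_fn f = algorithms[u % 3];     /* pick an algorithm       */
    printf("user %ld output %f\n", u, f((int)u));
    return NULL;
}

int main(void)
{
    pthread_t tid[USERS];
    for (long u = 0; u < USERS; u++)
        pthread_create(&tid[u], NULL, user_worker, (void *)u);
    for (int u = 0; u < USERS; u++)
        pthread_join(tid[u], NULL);
    return 0;
}
```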
4. Progress of the Project
The following models of the WCDMA system have been developed and integrated into the simulator:
- Spreader/despreader
- Spatial processing
- RAKE receiver
- Fading radio channel
Some simulation results have been obtained for verification of the models.
There have been interactions with SARG at Stanford on Rake receiver model verification.
Work on the translation from MATLAB into C, with further parallelization, has been accomplished at UCLA.