Published by April Martin. Modified over 9 years ago.
1
Alex Becker
2
Multi-core is short for “multiple cores.” Advances in technology allow several discrete cores to be placed on one chip. This, however, is not the same as multi-CPU, where several separate processor chips are used.
3
A core contains the processing logic itself, but does not include the other components of the CPU. The CPU as a whole also contains things such as the front-side bus interface, shared caches, and, on some chips, video processing.
4
The number of transistors on a chip doubles roughly every two years (Moore’s Law). With that many transistors available, a chip can be split into two (or more) separate cores.
5
Multi-core chips can handle multiple tasks better than single-core chips. Case in point: SETI@home. With two cores, one core could be dedicated to SETI@home while the other took care of normal tasks. This is parallel computing: an overall speed increase at the same or lower power requirements.
6
Early multi-core chips had a major downside compared to single-core chips: each individual core was slower than a single-core chip. Tasks that could use only one core would run slower.
7
Among the first commercial multi-core chips was the dual-core Opteron from Advanced Micro Devices (AMD). It was designed for servers and provided a significant advantage over prior single-core devices: more cores meant servers could process more data. Per-core speed isn’t as critical in server applications as it is in typical desktop applications.
8
One of Intel’s early multi-core offerings was a Xeon. Another major offering was the Core Duo, which was designed for mobile computing. It also offered significant advantages: it allowed a lower thermal design power (TDP) than single-core chips, which saved power.
9
Multi-core designs are also used in video cards. Nvidia’s CUDA technology, for example, runs on GPUs with hundreds to over a thousand CUDA cores.
11
Parallelism is using multiple threads, cores, or CPUs to run parts of a program side by side. This results in a speed increase, but how much of an increase?
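To make “parts of a program side by side” concrete, here is a minimal sketch using POSIX threads (compile with -pthread). It sums the integers 1..n by giving each thread its own slice of the range and combining the partial results at the end. The names parallel_sum_range, slice_t, and N_THREADS are hypothetical, and the choice of 4 threads is an assumption (for example, one per core on a quad-core chip).

```c
#include <pthread.h>

/* Assumption: 4 worker threads, e.g. one per core on a quad-core chip. */
#define N_THREADS 4

/* One part of the work: the integers in [start, end). */
typedef struct {
    long long start;
    long long end;
    long long partial;   /* written only by the thread that owns this slice */
} slice_t;

/* Worker: sum its own slice. No two threads share a slice_t, so no lock is needed. */
static void *sum_slice(void *arg) {
    slice_t *s = (slice_t *)arg;
    long long total = 0;
    for (long long i = s->start; i < s->end; i++)
        total += i;
    s->partial = total;
    return NULL;
}

/* Sum 1..n by splitting the range into N_THREADS slices that run side by side. */
long long parallel_sum_range(long long n) {
    pthread_t threads[N_THREADS];
    int started[N_THREADS];
    slice_t slices[N_THREADS];
    long long chunk = (n + N_THREADS - 1) / N_THREADS;   /* slice size, rounded up */

    for (int t = 0; t < N_THREADS; t++) {
        slices[t].start = 1 + t * chunk;
        slices[t].end = slices[t].start + chunk;
        if (slices[t].end > n + 1)
            slices[t].end = n + 1;                       /* clamp the last slice */
        slices[t].partial = 0;
        started[t] = pthread_create(&threads[t], NULL, sum_slice, &slices[t]) == 0;
        if (!started[t])
            sum_slice(&slices[t]);   /* thread creation failed: do this slice here */
    }

    long long total = 0;
    for (int t = 0; t < N_THREADS; t++) {
        if (started[t])
            pthread_join(threads[t], NULL);   /* wait for the worker to finish */
        total += slices[t].partial;           /* combine the partial results */
    }
    return total;
}
```

For example, parallel_sum_range(10) returns 55. Each thread writes only to its own slice_t, which is why this particular split needs no locking; the only coordination is pthread_join at the end.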
12
One would expect a linear increase in speed as the amount of the program running in parallel increases.
14
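Back in the 1960s, Gene Amdahl came up with a formula to determine the speed increase that comes with parallel computing. This became known as Amdahl’s Law: X = 1 / (r_s + r_p / N), where X is the speedup, r_s is the fraction of the program that must run sequentially, r_p = 1 - r_s is the fraction that can run in parallel, and N is the number of cores.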
16
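Example: a 10-minute program, of which 9 minutes can run in parallel and 1 minute must be sequential. Based on this: r_p = 0.9, r_s = 0.1, N = 4. Then X = 1 / (0.1 + 0.9 / 4) = 1 / 0.325, so X will be around 3.08: the program takes about 3.25 minutes instead of 10.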
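The calculation above can be sketched as two small C functions implementing Amdahl’s Law, X = 1 / (r_s + r_p / N) with r_s = 1 - r_p. The function names amdahl_speedup and amdahl_runtime are hypothetical; the formula is the standard form of the law.

```c
/* Amdahl's Law: X = 1 / (r_s + r_p / N), where r_s = 1 - r_p. */
double amdahl_speedup(double r_p, int n_cores) {
    double r_s = 1.0 - r_p;               /* fraction that must stay sequential */
    return 1.0 / (r_s + r_p / n_cores);   /* only the parallel part is divided by N */
}

/* Expected running time on n_cores for a program that takes t_seq on one core. */
double amdahl_runtime(double t_seq, double r_p, int n_cores) {
    return t_seq / amdahl_speedup(r_p, n_cores);
}
```

With the slide’s numbers, amdahl_speedup(0.9, 4) is about 3.08 and amdahl_runtime(10.0, 0.9, 4) is about 3.25 minutes. Note that no value of N can push the speedup past 1 / r_s = 10, because the sequential minute never shrinks.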
18
Core count: GTX 670: 1344; GTX 680: 1536. In other words, the 670 has 7/8 the cores of the 680. Should one expect a performance drop of 1/8?
19
Obviously not. Amdahl’s Law shows that adding more cores suffers from diminishing returns. Using the example from before (a 10-minute, or 600 s, program that is 90% parallel), let’s compare the running times. GTX 670: ~10.07% of the original time, or about 60.40 s. GTX 680: ~10.06% of the original time, or about 60.35 s. The difference is far less than 12.5% (1/8). The same small gap shows up in real-world frame-rate testing.
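The GTX comparison can be reproduced with a small sketch that computes running time under Amdahl’s Law and the relative slowdown between two core counts. The function names are hypothetical, and the model is idealized: it assumes core count is the only difference between the two cards and ignores clock speed, memory bandwidth, and other real-world factors.

```c
/* Running time under Amdahl's Law: t_seq * (r_s + r_p / N), with r_s = 1 - r_p. */
double runtime_seconds(double t_seq, double r_p, long n_cores) {
    return t_seq * ((1.0 - r_p) + r_p / (double)n_cores);
}

/* How much slower (as a fraction) a chip with n_small cores is than one with
   n_large cores, if core count were the only difference between them. */
double relative_slowdown(double r_p, long n_small, long n_large) {
    return runtime_seconds(1.0, r_p, n_small) / runtime_seconds(1.0, r_p, n_large) - 1.0;
}
```

runtime_seconds(600.0, 0.9, 1344) is about 60.40 s and runtime_seconds(600.0, 0.9, 1536) is about 60.35 s, while relative_slowdown(0.9, 1344, 1536) is roughly 0.0008, i.e. under 0.1% rather than 12.5%. However many cores are added, the runtime never drops below the 60 s sequential part.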
20
Writing parallel code has its own set of challenges: thread safety, and deciding what can and can’t run in parallel.
21
A simple equation? Take y = a * x + y, applied to every element of the arrays x and y.
22
void seq_function(int n, float a, float *x, float *y)
{
    int i;
    // SAXPY: y = a * x + y, computed one element at a time
    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

// Call:
seq_function(n, 3.14, x, y);
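Before moving to CUDA, here is a sketch of how the same loop could be spread across CPU cores with OpenMP, assuming a compiler with OpenMP support (for example GCC or Clang with -fopenmp). The name omp_function is hypothetical and simply mirrors seq_function. Each iteration reads x[i] and y[i] and writes only y[i], so no two iterations touch the same element; that independence is what makes this loop safe to run in parallel.

```c
/* SAXPY (y = a * x + y) with the loop iterations divided among threads by OpenMP.
   Compiled without OpenMP support, the pragma is ignored and the loop simply
   runs sequentially, producing the same result. */
void omp_function(int n, float a, const float *x, float *y) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];   /* iteration i touches only element i */
}
```

Calling omp_function(n, 3.14f, x, y) gives the same result as seq_function; only the distribution of iterations across cores changes.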
23
CUDA is Nvidia’s implementation of parallel processing, designed to be parallel from the start. Work is divided into blocks, and each block’s index is blockIdx.x. Each block contains blockDim.x threads, and a thread’s index within its block is threadIdx.x. When launching a kernel, you specify the number of blocks and the number of threads per block.
24
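__global__ void CUDA_function(int n, float a, float *x, float *y)
{
    // Global index: block number times threads per block, plus position within the block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // The last block may contain more threads than remaining elements
    if (i < n)
        y[i] = a * x[i] + y[i];
}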
25
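int n_blocks = (n + 255) / 256;
CUDA_function<<<n_blocks, 256>>>(n, 3.14, x, y);

n_blocks is the total number of blocks (n / 256, rounded up), and there are 256 threads per block. The <<<blocks, threads per block>>> syntax is how CUDA specifies the launch configuration; x and y must point to memory the GPU can access (for example, memory allocated with cudaMalloc).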
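To see why this index formula covers every element exactly once, here is a sequential C emulation of the launch (not real CUDA: the two nested loops stand in for blockIdx.x and threadIdx.x, and on a GPU these “threads” would run in parallel). The names blocks_for, emulate_launch, and THREADS_PER_BLOCK are hypothetical.

```c
#define THREADS_PER_BLOCK 256

/* Number of blocks needed to cover n elements: ceil(n / 256) in integer math. */
int blocks_for(int n) {
    return (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
}

/* CPU emulation of CUDA_function<<<n_blocks, 256>>>(n, a, x, y). */
void emulate_launch(int n, float a, const float *x, float *y) {
    int n_blocks = blocks_for(n);
    int block_dim = THREADS_PER_BLOCK;                        /* plays blockDim.x */
    for (int block = 0; block < n_blocks; block++) {          /* plays blockIdx.x */
        for (int thread = 0; thread < block_dim; thread++) {  /* plays threadIdx.x */
            int i = block * block_dim + thread;   /* same formula as the kernel */
            if (i < n)                            /* guard: last block may be partial */
                y[i] = a * x[i] + y[i];
        }
    }
}
```

For n = 300, blocks_for(300) is 2, so 512 “threads” are launched and the if (i < n) guard leaves the extra 212 idle. Without the guard, those threads would write past the end of the arrays.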
26
Parallel code can cause race conditions (results that depend on which thread happens to run first) and other nasty side effects. There are different levels of thread safety: Unsafe, Safe, and MT-Safe.
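A classic illustration of what makes a function unsafe is hidden shared state. POSIX does not require strtok() to be thread-safe, because it remembers its position in internal static storage; strtok_r() instead keeps that position in a pointer supplied by the caller. The sketch below shows the difference with nested tokenizing in a single thread, the same hidden-state clash that breaks strtok() across threads. The function name count_pairs and the "key=value;key=value" input format are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strtok_r on POSIX systems */
#include <string.h>

/* Count "key=value" pairs in text such as "a=1;b=2;c=3" (text is modified).
   The outer loop splits on ';' and the inner calls split each pair on '='.
   With plain strtok(), the inner calls would overwrite strtok's single hidden
   position and derail the outer loop. strtok_r() keeps each position in a
   caller-owned pointer (outer_save, inner_save), so the loops don't interfere. */
int count_pairs(char *text) {
    char *outer_save = NULL;
    int count = 0;
    for (char *pair = strtok_r(text, ";", &outer_save);
         pair != NULL;
         pair = strtok_r(NULL, ";", &outer_save)) {
        char *inner_save = NULL;
        char *key = strtok_r(pair, "=", &inner_save);
        char *value = strtok_r(NULL, "=", &inner_save);
        if (key != NULL && value != NULL)   /* count only complete pairs */
            count++;
    }
    return count;
}
```

The text must be a writable array (for example char s[] = "a=1;b=2"), since strtok_r writes terminators into it.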
27
“Thread locks”: for example, when one thread is accessing data that could also be accessed by other threads, that thread locks the data. When another thread tries to access the locked data, it is blocked and has to wait. When the first thread is done and releases the lock, a waiting thread can access the data.
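A minimal sketch of a lock using a POSIX mutex (compile with -pthread): several threads increment one shared counter, and each increment happens only while the thread holds the lock. Without the lock, increments from different threads could interleave and some updates would be lost. The names run_locked_counter, add_many, N_WORKERS, and INCREMENTS are hypothetical.

```c
#include <pthread.h>

#define N_WORKERS 4
#define INCREMENTS 100000

static long counter = 0;                                   /* shared data */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* A thread calling pthread_mutex_lock() while another thread holds the lock
   blocks until the holder calls pthread_mutex_unlock(). */
static void *add_many(void *arg) {
    (void)arg;
    for (int k = 0; k < INCREMENTS; k++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                                         /* critical section */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Start the workers, wait for all of them, and return the final count. */
long run_locked_counter(void) {
    pthread_t workers[N_WORKERS];
    int started = 0;
    counter = 0;
    for (int t = 0; t < N_WORKERS; t++)
        if (pthread_create(&workers[started], NULL, add_many, NULL) == 0)
            started++;
    for (int t = 0; t < started; t++)
        pthread_join(workers[t], NULL);
    return counter;
}
```

With all four workers running, run_locked_counter() returns N_WORKERS * INCREMENTS = 400000 every time. POSIX does not promise that waiting threads acquire the lock in arrival order.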
28
An atomic operation is one that executes as a single, indivisible step: while it runs, no other thread (or interrupt) can access the data it is working on or see it half-finished. In other words, it is uninterruptible.
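The avr-libc page in the references makes a code block atomic on AVR microcontrollers by disabling interrupts. On a multi-core desktop CPU, one common alternative (a swapped-in technique, not the one from that page) is C11 &lt;stdatomic.h&gt;. This sketch repeats the shared-counter job without any lock; the names run_atomic_counter, add_atomically, N_WORKERS, and INCREMENTS are hypothetical, and it needs -pthread.

```c
#include <pthread.h>
#include <stdatomic.h>

#define N_WORKERS 4
#define INCREMENTS 100000

static atomic_long hits = 0;   /* shared counter, updated only atomically */

/* atomic_fetch_add does the read-modify-write as one indivisible step, so no
   other thread can slip in between the read and the write. */
static void *add_atomically(void *arg) {
    (void)arg;
    for (int k = 0; k < INCREMENTS; k++)
        atomic_fetch_add(&hits, 1);
    return NULL;
}

/* Start the workers, wait for all of them, and return the final count. */
long run_atomic_counter(void) {
    pthread_t workers[N_WORKERS];
    int started = 0;
    atomic_store(&hits, 0);
    for (int t = 0; t < N_WORKERS; t++)
        if (pthread_create(&workers[started], NULL, add_atomically, NULL) == 0)
            started++;
    for (int t = 0; t < started; t++)
        pthread_join(workers[t], NULL);
    return atomic_load(&hits);
}
```

Atomics suit small, single-variable updates like this counter; a mutex is still the simpler tool when several pieces of data must change together.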
29
Multi-core is the way of the future, and programming for it is a subset of parallel development. Program speeds will increase as more programs become parallel (video game engines, for example), and eventually almost every program will have some parallelism in it.
30
Intel Canada - English. Moore’s Law Inspires Intel Innovation. Retrieved from http://www.intel.com/content/www/us/en/silicon-innovations/moores-law-technology.html
Nickolls, J., Buck, I., Garland, M., & Skadron, K. Scalable Parallel Programming with CUDA. ACM Queue. Retrieved from http://queue.acm.org/detail.cfm?id=1365500
Oracle Documentation. Multithreaded Programming Guide. Retrieved from http://docs.oracle.com/cd/E19963-01/html/821-1601/docinfo.html
EECS Instructional Support Group Home Page. Retrieved from http://www-inst.eecs.berkeley.edu/~n252/paper/Amdahl.pdf
Intel ARK. Mobile Intel® Pentium® 4 Processor - M 2.30 GHz, 512K Cache, 400 MHz FSB. Retrieved from http://ark.intel.com/products/27360/Mobile-Intel-Pentium-4-Processor---M-2_30-GHz-512K-Cache-400-MHz-FSB
31
Intel ARK. Intel® Core™ Duo Processor T2700 (2M Cache, 2.33 GHz, 667 MHz FSB). Retrieved from http://ark.intel.com/products/27238/Intel-Core-Duo-Processor-T2700-2M-Cache-2_33-GHz-667-MHz-FSB
Intel ARK. Intel® Pentium® Mobile Processor (Mobile). Retrieved from http://ark.intel.com/products/family/41878/Intel-Pentium-Mobile-Processor/mobile
Intel ARK. Intel® Core™ Duo Processor (Mobile). Retrieved from http://ark.intel.com/products/family/22731/Intel-Core-Duo-Processor/mobile
Angelini, C. GeForce GTX 670 2 GB Review: Is It Already Time To Forget GTX 680? : Giving GK104 A Haircut. Tom's Hardware. Retrieved from http://www.tomshardware.com/reviews/geforce-gtx-670-review,3200.html
Robbins, D. Common threads: POSIX threads explained, Part 2. IBM. Retrieved from http://www.ibm.com/developerworks/library/l-posix2/
Savannah. Atomically and Non-Atomically Executed Code Blocks. Retrieved from http://www.nongnu.org/avr-libc/user-manual/group__util__atomic.html