Alex Becker

 Multi-core is short for “multiple cores”  Advances in technology allow for several discrete cores on one chip  This, however, is not the same as multi-CPU, where each processor is a separate chip

 A core is the part of the processor that executes instructions; it does not include the other components of the CPU  A CPU also contains things such as a front-side bus, caches, and video processing

 Moore’s Law: the number of transistors on a chip doubles roughly every two years  The extra transistors can be split into two or more separate cores

 Multi-core chips can handle multiple tasks better than single-core chips  Case in point: SETI@Home  With two cores, one could be dedicated to S@H  The other core would take care of normal tasks  Parallel computing  Overall speed increase at the same or lower power requirements

 Early multi-core chips had a major downside compared to single-core chips  Individual core speed was lower than that of a single-core chip  Tasks that could use only one core would run slower

 The first commercial multi-core chips were Advanced Micro Devices’ Opteron processors  Designed for servers  Provided a significant advantage over prior, single-core devices  More cores meant servers could process more data  Per-core speed isn’t as critical in server applications as in standard applications

 Intel’s first multi-core offering was a Xeon  The second major offering was the Core Duo  The Core Duo was designed for mobile computing  Also offered significant advantages  Allowed a lower thermal design power (TDP) than single core chips  Saved power

 Multi-core designs are also used in video cards  Nvidia’s CUDA technology  Hundreds to over a thousand CUDA cores per card

 Parallelism is running multiple threads, cores, or CPUs to run parts of a program side by side  Results in a speed increase  How much of an increase though?

 One might expect a linear increase in speed as the amount of the program running in parallel increases

 Back in the 1960s, Gene Amdahl came up with a formula to determine the speed increase that would come with parallel computing.  This became known as Amdahl’s Law

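 Example: a 10 minute program  9 minutes can run in parallel  1 minute must be sequential  Based on this:  r_p = 0.9 (parallel fraction)  r_s = 0.1 (sequential fraction)  N = 4 (cores)  The speedup X = 1 / (r_s + r_p / N) will be around 3.08, so the program finishes in about 3.25 minutes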
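As a sketch, assuming X stands for the overall speedup, r_s and r_p for the sequential and parallel fractions of the run time, and N for the number of cores, Amdahl’s Law in its usual form, applied to the 10 minute example, reads as follows (T_N, the run time on N cores, is a symbol introduced here for illustration):

```latex
\[ X = \frac{1}{r_s + \dfrac{r_p}{N}}, \qquad r_s + r_p = 1 \]

\[ X = \frac{1}{0.1 + \dfrac{0.9}{4}} = \frac{1}{0.325} \approx 3.08
   \quad\Longrightarrow\quad
   T_N = \frac{10\ \text{min}}{X} = 3.25\ \text{min} \]

\[ \lim_{N \to \infty} X = \frac{1}{r_s} = 10 \]
```

The last line is the limit that matters: however many cores are added, this program can never finish in less than its 1 minute of sequential work.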

 Core Count:  GTX 670: 1344  GTX 680: 1536  Or, the 670 has 7/8 the cores of the 680  Should one expect a performance drop of 1/8?

 Obviously no.  Amdahl’s law shows that using more cores suffers from diminishing returns  Using the example from before, let’s see the time differences  GTX 670: ~10.07% of the time, or about 60.40s  GTX 680: ~10.06% of the time, or about 60.35s  The difference is far less than 12.5% (1/8)  The same is noticed in real-world frame rate testing
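The diminishing returns can be checked numerically. The sketch below is an illustration rather than part of the original slides: the function names are hypothetical, and it assumes, as the slides do, that the 10 minute program (r_p = 0.9) scales across CUDA cores exactly as Amdahl’s Law predicts.

```c
#include <assert.h>
#include <stdio.h>

/* Fraction of the single-core run time that remains when the parallel
 * share r_p is spread evenly over n_cores cores.
 * Assumption: the sequential share is r_s = 1 - r_p. */
double amdahl_time_fraction(double r_p, int n_cores)
{
    assert(r_p >= 0.0 && r_p <= 1.0);
    assert(n_cores > 0);
    return (1.0 - r_p) + r_p / n_cores;
}

/* Speedup X over a single core: the reciprocal of the time fraction. */
double amdahl_speedup(double r_p, int n_cores)
{
    return 1.0 / amdahl_time_fraction(r_p, n_cores);
}

/* Print the projected run time of a program that takes
 * base_seconds on one core. */
void print_projection(const char *label, double r_p, int n_cores,
                      double base_seconds)
{
    double fraction = amdahl_time_fraction(r_p, n_cores);
    printf("%-8s %5d cores: %.2f%% of the time, %.2f s, speedup %.2fx\n",
           label, n_cores, 100.0 * fraction, base_seconds * fraction,
           amdahl_speedup(r_p, n_cores));
}
```

Calling `print_projection("GTX 670", 0.9, 1344, 600.0)` and `print_projection("GTX 680", 0.9, 1536, 600.0)` gives roughly 60.4 s for either card, with a gap of only a few hundredths of a second, because almost all of the remaining time is the 60 s sequential part.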

 Writing parallel code has its own set of challenges  Thread safety  Deciding what can and can’t run in parallel

A simple equation?

void seq_function(int n, float a, float *x, float *y)
{
    int i;
    /* y = a * x + y, one element at a time */
    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
// Call: seq_function(n, 3.14, x, y);

 CUDA is Nvidia’s implementation of parallel processing  Designed to be parallel from the get-go  Work is divided into blocks; a block’s index is blockIdx.x  Each block has threads; the number of threads per block is blockDim.x  A thread’s ID within its block is threadIdx.x  When calling, specify the number of blocks and the number of threads per block

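__global__ void CUDA_function(int n, float a, float *x, float *y)
{
    // Global index of this thread across all blocks
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Guard: the last block may have more threads than remaining elements
    if (i < n)
        y[i] = a * x[i] + y[i];
}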

int n_blocks = (n + 255) / 256;
CUDA_function<<<n_blocks, 256>>>(n, 3.14, x, y);
 n_blocks is the total number of blocks, rounded up so that every element is covered, and there are 256 threads per block
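To see how the block and thread indices cover the array, the launch can be imitated on the CPU. The following is a plain C sketch, not CUDA: the function names are hypothetical, the loop variables stand in for the built-in blockIdx.x and threadIdx.x, and the iterations run one after another, whereas on a GPU they would run concurrently.

```c
#include <assert.h>

#define THREADS_PER_BLOCK 256

/* Work done by one (block, thread) pair. The three index arguments
 * stand in for CUDA's blockIdx.x, blockDim.x and threadIdx.x. */
static void kernel_body(int block_idx, int block_dim, int thread_idx,
                        int n, float a, const float *x, float *y)
{
    int i = block_idx * block_dim + thread_idx;
    if (i < n)                  /* threads past the end do nothing */
        y[i] = a * x[i] + y[i];
}

/* CPU imitation of the launch: visit every thread of every block.
 * Returns the number of blocks that were needed. */
int saxpy_emulated_launch(int n, float a, const float *x, float *y)
{
    assert(n >= 0);
    /* Round up so a partly filled final block is still launched. */
    int n_blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    for (int b = 0; b < n_blocks; b++)
        for (int t = 0; t < THREADS_PER_BLOCK; t++)
            kernel_body(b, THREADS_PER_BLOCK, t, n, a, x, y);
    return n_blocks;
}
```

With n = 1000 this uses 4 blocks, or 1024 threads in total; the last 24 threads fail the `i < n` check and leave the arrays untouched, which is why the guard is needed.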

 Parallel code can cause race conditions and other nasty side effects  Different levels of safety  Thread unsafe  Thread safe (serializable)  Thread safe (MT-Safe)
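The difference between the first two levels can be sketched in C without starting any threads. The two functions below are hypothetical examples, following the familiar pattern of library pairs such as `strtok` and `strtok_r`: the unsafe version keeps its result in shared static storage, while the safe version works only on storage the caller provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Thread unsafe: every caller shares one static buffer, so a second
 * call (from this thread or any other) overwrites the earlier result. */
const char *label_unsafe(int core_id)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "core-%d", core_id);
    return buf;
}

/* Thread safe: the caller supplies the storage, so no state is shared
 * between calls and concurrent callers cannot interfere. */
const char *label_safe(int core_id, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "core-%d", core_id);
    return buf;
}
```

Two calls to `label_unsafe` return the same pointer, so the first label is silently replaced by the second. With real threads that overwrite can happen at any moment, which is the kind of side effect the slide warns about.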

 “Thread locks”  For example, when one thread is accessing data that could also be accessed by other threads, the thread locks the data  When another thread tries to access the data, that thread is made to wait in a queue  When the first thread is done and releases the lock, the waiting thread can access the data
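A minimal sketch of the locking pattern, assuming POSIX threads are available (the slides do not name a threading API, and the function and variable names here are hypothetical). Four threads increment one shared counter, and a mutex makes sure only one of them touches it at a time. Note that POSIX leaves the order in which waiting threads get the lock to the scheduler, so the “queue” is not guaranteed to be first come, first served.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 100000

static long shared_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker locks the data before touching it. Workers that reach
 * pthread_mutex_lock while the lock is held block until it is released. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Start the workers, wait for all of them, return the final count. */
long run_locked_counter(void)
{
    pthread_t threads[NUM_THREADS];
    shared_counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return shared_counter;
}
```

`run_locked_counter()` returns 400000 on every run. Without the lock and unlock calls, increments from different threads could overlap and some would be lost. On most systems this is compiled with the `-pthread` flag.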

 An atomic function is one whose work must appear to happen all at once: nothing else may run in the middle of it  Until it finishes, only the thread executing the function can access the data it works on  Uninterruptible
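One way to get an uninterruptible update in C is the C11 `<stdatomic.h>` header. This is a sketch under that assumption: the slides do not name a specific API, the function names are hypothetical, and POSIX threads are used only to create the competing threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define ATOMIC_THREADS 4
#define ADDS_PER_THREAD 100000

static atomic_long atomic_counter;

/* atomic_fetch_add performs the read, the addition and the write as one
 * indivisible step, so no other thread can see or disturb a half-done update. */
static void *atomic_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ADDS_PER_THREAD; i++)
        atomic_fetch_add(&atomic_counter, 1);
    return NULL;
}

/* Start the workers, wait for all of them, return the final count. */
long run_atomic_counter(void)
{
    pthread_t threads[ATOMIC_THREADS];
    atomic_store(&atomic_counter, 0);
    for (int i = 0; i < ATOMIC_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, atomic_worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < ATOMIC_THREADS; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&atomic_counter);
}
```

`run_atomic_counter()` returns 400000 without any explicit lock. For a single small update like this, an atomic operation is usually cheaper than locking; a lock is still the better fit when several pieces of data must change together.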

 Way of the future  Subset of parallel development  Program speeds will increase as more programs become parallel  Video game engines  Eventually almost every program will have some parallelism in it

