OPERATING SYSTEMS AND ARCHITECTURES CS-M98: COURSEWORK SOLUTION
Dr. Benjamin Mora, Swansea University

MARKING RANGE
- >97: Full understanding of the problem and solution. Ready for employment in the HPC sector. None of you (some very close, though)!
- 70 to 97: Almost there with multithreading; you just need to see and understand the solution. Most students are in this category.
- 50 to 70: Real issues with multithreading concepts, merging temporary results, and a few basic C errors. Some hard work is really needed to understand the full solution.
- <50: Issues with basic (C) programming and algorithmic concepts, including pointers and creating data structures. Catching up is crucial!

Q1 ALIGNMENT OF DATA
Similar to the lab exercise. See the CPU part marks.

Q1
void AoS_to_SoA(float *image, int x, int y)
{
    //Over-allocate each channel by PADDING floats (PADDING >= 8) so its start can be moved to a 32-byte boundary
    imageRed=new float[x*y+PADDING];
    imageGreen=new float[x*y+PADDING];
    imageBlue=new float[x*y+PADDING];
    //Misalignment in floats: cast the pointer itself (not *imageRed, which is a float value) to an integer
    unsigned long long alignR=(((unsigned long long) imageRed)&31)/4;
    unsigned long long alignG=(((unsigned long long) imageGreen)&31)/4;
    unsigned long long alignB=(((unsigned long long) imageBlue)&31)/4;
    alignedRed=imageRed+8-alignR;
    alignedGreen=imageGreen+8-alignG;
    alignedBlue=imageBlue+8-alignB;
    float *R=alignedRed;
    float *G=alignedGreen;
    float *B=alignedBlue;
    //AoS (RGBRGB...) to SoA (RRR..., GGG..., BBB...)
    for (int i=0;i<x*y;i++)
    {
        R[i]=image[3*i];
        G[i]=image[3*i+1];
        B[i]=image[3*i+2];
    }
}
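The same 32-byte alignment can also be obtained with the standard std::align, which advances a pointer inside a padded buffer. The sketch below is a minimal C++ version, assuming std::vector buffers instead of the global new[] arrays; alignedChannel and aosToSoA are hypothetical names, not part of the coursework.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical helper: sizes `buffer` to n + 8 floats (32 bytes of padding, the
// role PADDING plays on the slide) and returns a pointer to its first 32-byte
// aligned element. `buffer` must outlive the pointer and must not be resized.
float *alignedChannel(std::vector<float> &buffer, std::size_t n) {
    buffer.assign(n + 8, 0.0f);
    void *p = buffer.data();
    std::size_t space = buffer.size() * sizeof(float);
    // std::align advances p to the next 32-byte boundary if n floats still fit.
    void *aligned = std::align(32, n * sizeof(float), p, space);
    assert(aligned != nullptr);  // at most 7 floats are skipped, so it always fits
    assert(reinterpret_cast<std::uintptr_t>(aligned) % 32 == 0);
    return static_cast<float *>(aligned);
}

// AoS (RGBRGB...) to SoA (RRR..., GGG..., BBB...), one aligned array per channel.
void aosToSoA(const float *image, std::size_t nPixels, std::vector<float> &red,
              std::vector<float> &green, std::vector<float> &blue,
              float *&R, float *&G, float *&B) {
    R = alignedChannel(red, nPixels);
    G = alignedChannel(green, nPixels);
    B = alignedChannel(blue, nPixels);
    for (std::size_t i = 0; i < nPixels; ++i) {
        R[i] = image[3 * i];
        G[i] = image[3 * i + 1];
        B[i] = image[3 * i + 2];
    }
}
```

Unlike the slide's +8-alignR trick, which always skips between 1 and 8 floats, std::align skips nothing when the buffer is already aligned; both fit within 8 floats of padding.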

Q2 LOOP FOR K ITERATIONS
for (int k=0;k<knnIterations;k++)
{
    //1. Init seed sums to 0
    for (int seed=0;seed<N;seed++)
    {
        seedSums[0][seed]=0;
        seedSums[1][seed]=0;
        seedSums[2][seed]=0;
        seedCounters[seed]=0;
    }
    …

Q2 THEN
…
//2. Determine the closest seed of each pixel and compute the averages
for (int pixel=0;pixel<x*y*3;pixel+=3)
{
    float maxDistance=10;
    int found=-1;
    for (int seed=0;seed<N;seed++) //Loop to be optimized
    {
        float dx=image[pixel+0]-seeds[0][seed];
        float dy=image[pixel+1]-seeds[1][seed];
        float dz=image[pixel+2]-seeds[2][seed];
        float distanceSquare=dx*dx+dy*dy+dz*dz;
        if (distanceSquare<maxDistance)
        { //A closer seed has been found
            maxDistance=distanceSquare;
            found=seed;
        }

Q2 RECOMPUTE NEW SEEDS
//Last step for the iteration: compute the average and update the current seed list
for (int seed=0;seed<N;seed++)
{
    if (seedCounters[seed]>0.01)
    {
        seeds[0][seed]=seedSums[0][seed]/seedCounters[seed];
        seeds[1][seed]=seedSums[1][seed]/seedCounters[seed];
        seeds[2][seed]=seedSums[2][seed]/seedCounters[seed];
    }
…//End of iteration
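Put together, the three scalar steps form one k-means-style (Lloyd) update: assign every pixel to its closest seed, accumulate, then average. The C++ sketch below assumes seeds stored as a flattened [3][N] vector, and the accumulation lines (not shown on the slides) are an assumption of what step 2 does; knnIteration is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One scalar iteration: `image` is interleaved RGB (AoS) with values in [0,1];
// `seeds` is [3][N] flattened as seeds[c * N + s].
void knnIteration(const std::vector<float> &image, std::vector<float> &seeds, int N) {
    std::vector<float> seedSums(3 * N, 0.0f);   // step 1: reset sums and counters
    std::vector<float> seedCounters(N, 0.0f);
    for (std::size_t pixel = 0; pixel + 2 < image.size(); pixel += 3) {
        float maxDistance = 10.0f;  // above any squared RGB distance in [0,1]^3 (max 3)
        int found = -1;
        for (int seed = 0; seed < N; ++seed) {
            float dx = image[pixel + 0] - seeds[0 * N + seed];
            float dy = image[pixel + 1] - seeds[1 * N + seed];
            float dz = image[pixel + 2] - seeds[2 * N + seed];
            float distanceSquare = dx * dx + dy * dy + dz * dz;
            if (distanceSquare < maxDistance) {
                maxDistance = distanceSquare;
                found = seed;
            }
        }
        assert(found >= 0);  // holds for colours in [0,1]
        // step 2 (assumed): add the pixel to its closest seed's sums
        seedCounters[found] += 1.0f;
        for (int c = 0; c < 3; ++c) seedSums[c * N + found] += image[pixel + c];
    }
    for (int seed = 0; seed < N; ++seed)  // step 3: seeds with pixels become averages
        if (seedCounters[seed] > 0.01f)
            for (int c = 0; c < 3; ++c)
                seeds[c * N + seed] = seedSums[c * N + seed] / seedCounters[seed];
}
```

For example, with seeds at grey 0.1 and 0.9 and pixels 0, 0.2 and 1, the first seed stays at 0.1 and the second moves to 1.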

Q2
Optimizing the inner loop:
- Process 8 pixels at a time: compare 8 pixels against one seed! Some of you were confused and tried 8 pixels vs. 8 seeds.
- Use cmplt and blend to replace the condition. Two blend instructions are needed! Some of you replicated the mask computations.
- The part after the inner loop cannot be parallelized, though.
- Still a good speed-up using SIMD, especially when the number of seeds is > 32.
- There are many ways to do it. Extra cast computations were done by all of you!
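A portable way to see the "8 pixels vs. one seed" idea without AVX hardware is to emulate the lanes with plain arrays. In this sketch, lessThanMask and blendLanes are hypothetical stand-ins for the course's cmplt8/blend8 wrappers (whose definitions are not shown), and closestSeed8 computes the mask once and reuses it for both blends.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable emulation of 8-lane registers (plain arrays, not real AVX).
using Lanes = std::array<float, 8>;
using Mask = std::array<std::uint32_t, 8>;

// Like a compare-less-than: all bits set in lanes where a < b, zero elsewhere.
Mask lessThanMask(const Lanes &a, const Lanes &b) {
    Mask m{};
    for (int i = 0; i < 8; ++i) m[i] = (a[i] < b[i]) ? 0xFFFFFFFFu : 0u;
    return m;
}

// Like a blend: bitwise select of b where the mask is set, a elsewhere.
template <typename T>
std::array<T, 8> blendLanes(const std::array<T, 8> &a, const std::array<T, 8> &b,
                            const Mask &m) {
    static_assert(sizeof(T) == 4, "32-bit lanes");
    std::array<T, 8> r{};
    for (int i = 0; i < 8; ++i) {
        std::uint32_t ua, ub;
        std::memcpy(&ua, &a[i], 4);
        std::memcpy(&ub, &b[i], 4);
        const std::uint32_t ur = (ua & ~m[i]) | (ub & m[i]);
        std::memcpy(&r[i], &ur, 4);
    }
    return r;
}

// Closest of N seeds for 8 pixels at once (8 pixels vs ONE seed per step).
// The per-pixel `if` is replaced by one mask shared by two blends.
std::array<std::int32_t, 8> closestSeed8(const Lanes &R, const Lanes &G, const Lanes &B,
                                         const float *seedR, const float *seedG,
                                         const float *seedB, int N) {
    Lanes maxDistance;
    maxDistance.fill(10.0f);
    std::array<std::int32_t, 8> found;
    found.fill(-1);
    for (int seed = 0; seed < N; ++seed) {
        Lanes distanceSquare;
        for (int i = 0; i < 8; ++i) {  // what sub8/mul8/add8 do lane by lane
            const float dx = R[i] - seedR[seed];
            const float dy = G[i] - seedG[seed];
            const float dz = B[i] - seedB[seed];
            distanceSquare[i] = dx * dx + dy * dy + dz * dz;
        }
        const Mask closer = lessThanMask(distanceSquare, maxDistance);
        maxDistance = blendLanes(maxDistance, distanceSquare, closer);  // blend 1
        std::array<std::int32_t, 8> id;
        id.fill(seed);
        found = blendLanes(found, id, closer);  // blend 2
    }
    return found;
}
```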

Q2
Optimization comes from:
- Processing 8 pixels at a time.
- Removing the branch (no if-then).
It is still tricky to get a good speed-up. Going further:
- Loop unrolling.
- Minimize the number of computations inside the inner loop: put all constant operations, like set1, outside the loop.
- Avoid shared cache lines when multithreading!
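On the last point, a common remedy for false sharing is to give each thread's accumulator its own cache line. A minimal sketch, assuming 64-byte cache lines (typical on x86, but still an assumption) and C++17 so that std::vector honours the over-alignment; PaddedCounter and countPerThread are hypothetical names.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// alignas(64) places every slot on its own (assumed 64-byte) cache line, so a
// thread updating its slot does not invalidate the line used by another thread.
struct alignas(64) PaddedCounter {
    long long value = 0;
};

// Each thread increments only its own padded slot; results are read after join().
std::vector<long long> countPerThread(int nbThreads, long long iterations) {
    assert(nbThreads > 0);
    std::vector<PaddedCounter> slots(nbThreads);
    std::vector<std::thread> threads;
    for (int t = 0; t < nbThreads; ++t)
        threads.emplace_back([&slots, t, iterations] {
            for (long long i = 0; i < iterations; ++i) slots[t].value += 1;
        });
    for (auto &th : threads) th.join();
    std::vector<long long> result;
    for (const auto &s : slots) result.push_back(s.value);
    return result;
}
```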

Q2 LOOP FOR K ITERATIONS
float seedSums[3][N];
float seedCounters[N];
//Seed initialization
for(int j=0;j<3;j++)
    for(int i=0;i<N;i++)
        seeds[j][i]=(rand()+0.5f)/(RAND_MAX+1.f);
for (int k=0;k<knnIterations;k++)
{
    for (int seed=0;seed<N;seed++)
    {
        seedSums[0][seed]=0;
        seedSums[1][seed]=0;
        seedSums[2][seed]=0;
        seedCounters[seed]=0;
    }

Q2 LOOP FOR K ITERATIONS
float seedSums[3][N];
float seedCounters[N];
float8 seedId[N];
for (int seed=0;seed<N;seed++)
    seedId[seed]=set1((float &) seed); //Seed index stored as the bit pattern of a float
for(int j=0;j<3;j++)
    for(int i=0;i<N;i++)
        seeds[j][i]=(rand()+0.5f)/(RAND_MAX+1.f);
for (int k=0;k<knnIterations;k++)
{
    float8 seeds8[3][N]; //Seeds broadcast once per iteration, outside the pixel loop
    for (int seed=0;seed<N;seed++)
    {
        seedSums[0][seed]=0;
        seedSums[1][seed]=0;
        seedSums[2][seed]=0;
        seedCounters[seed]=0;
        seeds8[0][seed]=set1(seeds[0][seed]);
        seeds8[1][seed]=set1(seeds[1][seed]);
        seeds8[2][seed]=set1(seeds[2][seed]);
    }
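The seedId trick stores the int seed index in a float lane by reinterpreting its bits, and the lane extraction later reads it back with (int&). In standard C++ such reference casts break the strict-aliasing rule; std::memcpy expresses the same bit copy legally (C++20 also offers std::bit_cast). A small sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Bit-for-bit copy of an int index into a float (same idea as (float &) seed).
// Small indices become denormal floats, so such lanes should only be moved or
// blended, never used in arithmetic.
float indexAsFloatBits(std::int32_t index) {
    static_assert(sizeof(float) == sizeof(std::int32_t), "32-bit lanes");
    float f;
    std::memcpy(&f, &index, sizeof f);
    return f;
}

// Inverse copy (same idea as (int &) found8.m256_f32[i]).
std::int32_t floatBitsAsIndex(float f) {
    std::int32_t index;
    std::memcpy(&index, &f, sizeof index);
    return index;
}
```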

Q2 THEN
…
//2. Determine the closest seed of each pixel and compute the averages
for (int pixel=0;pixel<x*y*3;pixel+=3)
{
    float maxDistance=10;
    int found=-1;
    for (int seed=0;seed<N;seed++) //Loop to be optimized
    {
        float dx=image[pixel+0]-seeds[0][seed];
        float dy=image[pixel+1]-seeds[1][seed];
        float dz=image[pixel+2]-seeds[2][seed];
        float distanceSquare=dx*dx+dy*dy+dz*dz;
        if (distanceSquare<maxDistance)
        { //A closer seed has been found
            maxDistance=distanceSquare;
            found=seed;
        }

Q2 THEN
float8 *R=(float8 *) alignedRed;
float8 *G=(float8 *) alignedGreen;
float8 *B=(float8 *) alignedBlue;
for (int pixel=0;pixel<x*y;pixel+=8)
{
    float8 maxDistance=set1(10);
    float8 found8=set1(-1.f); //Just for initialization
    for (int seed=0;seed<N;seed++) //Loop to be optimized
    {
        float8 dx=sub8(R[0],seeds8[0][seed]);
        float8 dy=sub8(G[0],seeds8[1][seed]);
        float8 dz=sub8(B[0],seeds8[2][seed]);
        float8 distanceSquare=add8(add8(mul8(dx,dx),mul8(dy,dy)),mul8(dz,dz));
        float8 comparison=cmplt8(distanceSquare,maxDistance);
        maxDistance=blend8(maxDistance,distanceSquare,comparison);
        found8=blend8(found8,seedId[seed],comparison);
    }

Q2 THEN
    //Sum the pixel values to the appropriate seed
    for (int i=0;i<8;i++)
    {
        int found=(int&) found8.m256_f32[i]; //m256_f32 is an MSVC-specific member of __m256
        seedCounters[found]+=1.;
        seedSums[0][found]+=((float *) R)[i];
        seedSums[1][found]+=((float *) G)[i];
        seedSums[2][found]+=((float *) B)[i];
    }
    R++; G++; B++;
}
…

Q2 RECOMPUTE NEW SEEDS
Still the same!!!
//Last step for the iteration: compute the average and update the current seed list
for (int seed=0;seed<N;seed++)
{
    if (seedCounters[seed]>0.01)
    {
        seeds[0][seed]=seedSums[0][seed]/seedCounters[seed];
        seeds[1][seed]=seedSums[1][seed]/seedCounters[seed];
        seeds[2][seed]=seedSums[2][seed]/seedCounters[seed];
    }
…//End of iteration

Q3
Most of you got the principles more or less right, but the practical implementation was wrong!
- Barriers were sometimes at the wrong location.
- Most of you added extra, unneeded barriers.
- Mutexes were accepted, but putting a lock on every seed change is too much (not good)!
- Error: only using the results from one thread at each iteration.

Q3 IDEA
Break down the image into 4 pieces (one per thread). At each iteration, each thread:
1. Copies the seeds into local variables (performance).
2. Loops over its current chunk of pixels, computing seedSums and seedCounters the same way as before.
3. Copies its results into globally visible but separate variables.
4. Barrier.
5. One thread adds the results from the other threads to its own results, then computes the RGB averages and updates the seeds.
6. Barrier.
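The idea above can be sketched for a single iteration with std::thread. This is a simplified variant, not the slides' design: threads are created and joined once per iteration, so join() and the next spawn stand in for the two barriers, whereas the coursework keeps persistent pthreads and calls pthread_barrier_wait. parallelIteration is a hypothetical name, and the image is assumed to be already in SoA form.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// One parallel iteration. Image is SoA (R, G, B hold nPixels floats each);
// seeds is [3][N] flattened as seeds[c * N + s] and is only read while the
// threads run, so no lock is needed.
void parallelIteration(const std::vector<float> &R, const std::vector<float> &G,
                       const std::vector<float> &B, std::vector<float> &seeds,
                       int N, int nbThreads) {
    assert(N > 0 && nbThreads > 0);
    const std::size_t nPixels = R.size();
    // Separate slot per thread: 3*N colour sums followed by N counters.
    std::vector<std::vector<float>> partial(nbThreads);
    std::vector<std::thread> threads;
    for (int t = 0; t < nbThreads; ++t) {
        threads.emplace_back([&, t] {
            const std::size_t first = nPixels * t / nbThreads;
            const std::size_t last = nPixels * (t + 1) / nbThreads;
            std::vector<float> local(4 * N, 0.0f);  // private accumulators, no locks
            for (std::size_t p = first; p < last; ++p) {
                float best = 10.0f;
                int found = 0;  // safe default if a colour lies outside [0,1]
                for (int s = 0; s < N; ++s) {
                    const float dx = R[p] - seeds[s];
                    const float dy = G[p] - seeds[N + s];
                    const float dz = B[p] - seeds[2 * N + s];
                    const float d2 = dx * dx + dy * dy + dz * dz;
                    if (d2 < best) { best = d2; found = s; }
                }
                local[found] += R[p];
                local[N + found] += G[p];
                local[2 * N + found] += B[p];
                local[3 * N + found] += 1.0f;
            }
            partial[t] = std::move(local);  // publish into this thread's own slot
        });
    }
    for (auto &th : threads) th.join();
    // Merge ALL threads' results into slot 0, then update the seeds once.
    for (int t = 1; t < nbThreads; ++t)
        for (int i = 0; i < 4 * N; ++i) partial[0][i] += partial[t][i];
    for (int s = 0; s < N; ++s) {
        const float count = partial[0][3 * N + s];
        if (count > 0.01f)
            for (int c = 0; c < 3; ++c) seeds[c * N + s] = partial[0][c * N + s] / count;
    }
}
```

Because every slot is merged, the result does not depend on how many threads are used, which avoids the "results from one thread only" error listed above.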

Q3 CREATING THREADS
void knnCompressionSIMDPosix(float *image, int x, int y)
{
    AoS_to_SoA(image,x,y);
    threadJobSize=x*y/nbThreads;
    pthread_t threads[nbThreads];
    pthread_barrier_init(&barrier, NULL, nbThreads);
    for (int i=0;i<nbThreads;i++)
        pthread_create(&threads[i], NULL, posixThread, (void *) (long long) i);
    for (int i=0;i<nbThreads;i++) //Separate loop: join only once all threads have been created
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&barrier);
}
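Note that threadJobSize=x*y/nbThreads leaves the last x*y % nbThreads pixels unprocessed, and alignedRed+firstPixel is only 32-byte aligned when firstPixel is a multiple of 8. A hedged sketch of one way to split the work, assuming each thread is handed a half-open [first, last) range; splitPixels is a hypothetical helper, and the last range may still need a scalar tail loop when the pixel count is not a multiple of 8.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Splits nPixels into nbThreads contiguous [first, last) ranges whose starts
// are multiples of 8 (keeps float8 pointers aligned); the last range absorbs
// any remainder so every pixel is covered exactly once.
std::vector<std::pair<std::size_t, std::size_t>> splitPixels(std::size_t nPixels,
                                                             int nbThreads) {
    assert(nbThreads > 0);
    const std::size_t blocks = nPixels / 8;  // whole 8-pixel blocks
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for (int t = 0; t < nbThreads; ++t) {
        const std::size_t first = 8 * (blocks * t / nbThreads);
        const std::size_t last = (t == nbThreads - 1)
                                     ? nPixels
                                     : 8 * (blocks * (t + 1) / nbThreads);
        ranges.emplace_back(first, last);
    }
    return ranges;
}
```

For 100 pixels and 4 threads this gives [0,24), [24,48), [48,72) and [72,100).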

Q3 THREAD’S JOB
void * posixThread(void *arg)
{
    long long threadNumber=(long long) arg;
    int firstPixel=threadNumber*threadJobSize;
    int lastPixel=firstPixel+threadJobSize;
    float seedSums[3][N]; //Per-thread (local) accumulators
    float seedCounters[N];
    //Seed initialization
    float8 seedId[N];
    for (int seed=0;seed<N;seed++)
        seedId[seed]=set1((float &) seed);
    if (threadNumber==0) //Only one thread initializes the shared seeds
        for(int j=0;j<3;j++)
            for(int i=0;i<N;i++)
                seeds[j][i]=(rand()+0.5f)/(RAND_MAX+1.f);
    pthread_barrier_wait(&barrier); //The other threads wait until the seeds are ready
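pthread barriers are not available on every platform (macOS, for instance, does not provide pthread_barrier_t), and C++20 adds std::barrier. For older C++, a reusable barrier can be built from a mutex and a condition variable with a generation counter. A minimal sketch with hypothetical names Barrier and observeAfterBarrier, where thread 0 publishes a value before the barrier, as the slide does with the seeds:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Reusable barrier: the last thread to arrive starts a new generation and wakes
// the others. Waiting on a generation change (not on the counter) makes it safe
// to call wait() again in the next iteration.
class Barrier {
public:
    explicit Barrier(int count) : count_(count) { assert(count > 0); }

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const unsigned long gen = generation_;
        if (++waiting_ == count_) {
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const int count_;
    int waiting_ = 0;
    unsigned long generation_ = 0;
};

// Thread 0 updates shared data, the first barrier makes the update visible to
// all, the second stops the next update from racing with readers. Returns, per
// thread, how many rounds it observed the expected value.
std::vector<int> observeAfterBarrier(int nbThreads, int rounds) {
    Barrier barrier(nbThreads);
    int shared = -1;  // written only by thread 0
    std::vector<int> okRounds(nbThreads, 0);
    std::vector<std::thread> threads;
    for (int t = 0; t < nbThreads; ++t)
        threads.emplace_back([&, t] {
            for (int k = 0; k < rounds; ++k) {
                if (t == 0) shared = k;  // like thread 0 updating the seeds
                barrier.wait();
                if (shared == k) ++okRounds[t];
                barrier.wait();
            }
        });
    for (auto &th : threads) th.join();
    return okRounds;
}
```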

Q3 THREAD’S JOB
    for (int k=0;k<knnIterations;k++)
    {
        … Seed initialization is the same
        //firstPixel must be a multiple of 8 to keep these pointers 32-byte aligned
        float8 *R=(float8 *) (alignedRed+firstPixel);
        float8 *G=(float8 *) (alignedGreen+firstPixel);
        float8 *B=(float8 *) (alignedBlue+firstPixel);
        for (int pixel=firstPixel;pixel<lastPixel;pixel+=8)
        {
            … loop code does not change …
            R++;G++;B++;
        }

Q3 MERGING RESULTS
        for (int seed=0;seed<N;seed++)
        {
            temporaryResults[threadNumber][0][seed]=seedSums[0][seed];
            temporaryResults[threadNumber][1][seed]=seedSums[1][seed];
            temporaryResults[threadNumber][2][seed]=seedSums[2][seed];
            temporaryCounters[threadNumber][seed]=seedCounters[seed];
        }
        pthread_barrier_wait(&barrier);

Q3 MERGING RESULTS
        if (threadNumber==0)
        {
            for (int thread=1;thread<nbThreads;thread++)
                for (int seed=0;seed<N;seed++)
                {
                    temporaryResults[0][0][seed]+=temporaryResults[thread][0][seed];
                    temporaryResults[0][1][seed]+=temporaryResults[thread][1][seed];
                    temporaryResults[0][2][seed]+=temporaryResults[thread][2][seed];
                    temporaryCounters[0][seed]+=temporaryCounters[thread][seed];
                }
            …

Q3 MERGING RESULTS
            for (int seed=0;seed<N;seed++)
            {
                if (temporaryCounters[0][seed]>0.01)
                {
                    seeds[0][seed]=temporaryResults[0][0][seed]/temporaryCounters[0][seed];
                    seeds[1][seed]=temporaryResults[0][1][seed]/temporaryCounters[0][seed];
                    seeds[2][seed]=temporaryResults[0][2][seed]/temporaryCounters[0][seed];
                }
            }
        } //end condition threadNumber==0
        pthread_barrier_wait(&barrier); //end of iteration, seeds have been updated!