From Serial to Parallel

Source: fortrandev/2010/BCS_multicore_programming.pdf

Stephen Blair-Chappell, Intel Compiler Labs

www.intel.com

Agenda

• Why Parallel?
• Optimising Applications
• Steps to move from Serial to Parallel

Copyright © 2007, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners

Intel® Software Development Products Overview, 6/18/2010


Congratulations BCS FIG.wmv


Moving to Parallel – a view from some developers

• Top 5 challenges
  – Legacy
  – Education
  – Tools
  – Fear of many cores
  – Maintainability

Why Parallel?

Section 1

Why is everyone going multi-core?

[Figure: Power Density Race. Power density (W/cm²) on a log scale from 1 to 10,000, plotted against year (’70 to ’10) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486 and Pentium® processors, with reference levels marked for a hot plate, a nuclear reactor, a rocket nozzle and the Sun’s surface.]

Moore’s Law reinterpreted

• Speed no longer increasing
• Number of transistors still growing
• Number of cores, rather than clock speed, is doubling every 18 months

From K. Olukotun, L. Hammond, H. Sutter, and B. Smith

Theoretical growth of cores

[Figure: Growth of Multicore. Number of cores (log scale, 1 to 10,000) against year (2000 to 2025), with series labelled "Cores" and "Cores Act"; data labels on the chart read 1, 2, 8, 32, 128, 512 and 2048.]

Future: Multicore and Manycore

• All Large Core
• Mixed Large and Small Core
• All Small Core

Note: the above pictures don’t necessarily represent any current or future Intel products

Connections to memory bank(s), connections between processors, memory coherency models – all come into play. Diversity!

Multi-core: beating the power/performance barrier

Relative single-core frequency and Vcc

Configuration            Power    Performance
Over-clocked (+20%)      1.73x    1.13x
Design Frequency         1.00x    1.00x
Under-clocked (-20%)     0.51x    0.87x
Dual-core (-20%)         1.02x    1.73x
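The power figures on this slide (1.73x, 0.51x and 1.02x) are consistent with a simple first-order model in which dynamic power scales with frequency times voltage squared, and voltage scales with frequency, so power scales roughly with the cube of frequency. The slide does not state this model, so treat the sketch below as an assumption used only to check the arithmetic. The function name `RelativePower` is hypothetical, and the performance figures (1.13x, 0.87x, 1.73x) are taken from the slide rather than derived.

```cpp
#include <cassert>
#include <cmath>

// Relative dynamic power under an ASSUMED model: P ~ f * V^2 with V ~ f,
// i.e. P ~ f^3 per core. The slide itself does not give this formula.
double RelativePower(double freq_scale, int cores = 1) {
    assert(freq_scale > 0.0 && cores >= 1);
    return cores * std::pow(freq_scale, 3);
}
```

Under that assumption, `RelativePower(1.2)` gives about 1.73, `RelativePower(0.8)` about 0.51, and `RelativePower(0.8, 2)` about 1.02, matching the Power column: two cores at 80% frequency fit in roughly the same power budget as one core at full frequency.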

Industry’s First 45 nm High-K + Metal Gate Transistor Technology

Improved Transistor Density               ~2x
Improved Transistor Switching Speed       >20%
Reduced Transistor Switching Power        ~30%
Reduction in gate oxide leakage power     >10x

6-transistor S-RAM Cell
65 nm S-RAM Cell    0.570 µm²
45 nm S-RAM Cell    0.346 µm²

[Figure: 65 nm Transistor and 45 nm HK + MG]

Enables New Features, Higher Performance, Greater Energy Efficiency


Intel’s Teraflops Research Chip


Podtech_Intel_Research_Day_Terascale.flv

Intel’s Teraflops Research Chip

Speed (GHz)    Power (Watts)    Perf. (Teraflops)
3.16           62               1.01
5.1            175              1.63
5.7            265              1.81

Larrabee


What can we do with faster Computers?

• Solve problems faster
  – Reduce turn-around time of big jobs
  – Increase responsiveness of interactive apps
  [Figure: Time (0 to 120) against Processors (0 to 10)]

• Get better solutions in the same amount of time
  – Increase resolution of models
  – Make model more sophisticated
  [Figure: Problem Size (0 to 700) against Processors (0 to 10)]

Optimising Code

Section 2

Two points to address before you start parallelising

• Will buying a faster computer solve your problem?
  Dr Yann Golanski, York
• Maybe Serial Optimisation will be sufficient.

A Three-Tiered Tuning Model

Tuning Level: System wide
  Question being asked: Can my system be ‘tuned’ to improve the performance of my application?
  Examples of issues: Network, disk and memory performance. Intrusion by 3rd party programs such as virus scanners.

Tuning Level: Application Heuristics
  Question being asked: Can my application code or heuristics be changed to improve performance?
  Examples of issues: Code redundancy. Inefficient program algorithms. Poor memory allocation strategies. Bad or missing threading implementation.

Tuning Level: Architectural Bottlenecks
  Question being asked: Is the CPU architecture being used at its best?
  Examples of issues: Stalls in CPU pipeline. Data alignment. Cache misses. Using expensive instructions. Failing to use latest generation optimised instructions.

Compiler generated Optimisations

• Global Compiler Options
• Inter-procedural Optimisations
• Profile Guided Optimisations
• Vectorisation
• Parallelisation

Copyright © 2008, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners

Example of hand-crafted SSE instructions

bool SSEHasNumber(SUDOKU *pPuzzle, __m128i BinArray[], int i, int j)
{
    // Bitwise AND of the two 128-bit values
    __m128i Tmp1 = _mm_and_si128(pPuzzle->BinNum[j-1], BinArray[i]);
    __m128i Tmp2 = _mm_setzero_si128();

    // Compare each 32-bit lane of Tmp1 with zero:
    // a lane becomes all ones if it was zero, and zero otherwise
    Tmp2 = _mm_cmpeq_epi32(Tmp2, Tmp1);

    unsigned int p[4];
    _mm_storeu_si128((__m128i *)p, Tmp2);

    // A zero here means the corresponding lane of the AND was non-zero
    if (p[0] == 0 || p[1] == 0 || p[2] == 0)
        return true;
    return false;
}

            Time Taken    Speedup
No SSE      4.55 sec      1
With SSE    0.19 sec      24

Modern Architectures have lots of features to help speed up code

[Figure: The internals of the Intel low power IA architecture]

Intel® VTune™ Performance Analyzer

• Graphical tool
• Helps characterise runtime performance
• System-wide view of application environment
• Use to tune serial and parallel code
• Use to identify Hot Spots in code
• Use to generate a call graph

Call graph: Application workflow

The red lines show the critical path. The critical path is the most time-consuming call path. It is based on self time.

Filter view by self time

Bright orange nodes indicate functions with the highest self time.

Intel, VTune, and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries.

The life of a program instruction

[Figure: processor pipeline showing Inst. Fetch, Branch Pred, Decoder, Reservation Station, Execution Units, Reorder Buffer, Memory Sub-system and Retirement]

1. Instruction read from memory
2. Instruction fed to Decoder
3. Micro-ops (uops) generated
4. uops queued in RS
5. uops dispatched
6. Results sent to ROB
7. Instruction marked – all uops executed
8. Instruction sent for retirement

Hardware Performance Events

[Figure: the pipeline diagram (Inst. Fetch, Branch Pred, Decoder, Reservation Station, Execution Units, Reorder Buffer, Memory Sub-system, Retirement) annotated with the events below]

BUS_TRANS_ANY.ALL_AGENTS
RS_UOPS_DISPATCHED.CYCLES_NONE
CPU_CLK_UNHALTED.CORE
INST_RETIRED.ANY
MEM_LOAD_RETIRED.L2_MISS
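Raw event counts are usually combined into ratios before they are interpreted. As an illustration, the sketch below derives cycles per instruction (CPU_CLK_UNHALTED.CORE divided by INST_RETIRED.ANY) and the fraction of cycles in which no uops were dispatched. The struct and function names are hypothetical, and the second ratio is an assumed reading of RS_UOPS_DISPATCHED.CYCLES_NONE rather than something stated on the slide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for sampled event totals; the field names mirror
// the event names on the slide.
struct EventCounts {
    std::uint64_t cpu_clk_unhalted_core;
    std::uint64_t inst_retired_any;
    std::uint64_t rs_uops_dispatched_cycles_none;
};

// Cycles per instruction retired: lower generally means better pipeline use.
double CyclesPerInstruction(const EventCounts& c) {
    assert(c.inst_retired_any > 0);
    return static_cast<double>(c.cpu_clk_unhalted_core) /
           static_cast<double>(c.inst_retired_any);
}

// Fraction of unhalted cycles in which no uops were dispatched
// (an assumed interpretation of the CYCLES_NONE event).
double StalledCycleFraction(const EventCounts& c) {
    assert(c.cpu_clk_unhalted_core > 0);
    return static_cast<double>(c.rs_uops_dispatched_cycles_none) /
           static_cast<double>(c.cpu_clk_unhalted_core);
}
```

With made-up totals of 2000 cycles, 1000 instructions and 500 idle-dispatch cycles, these return 2.0 and 0.25 respectively.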

Demo 0 – Using Intel® VTune™ Performance Analyzer

From 1 to 1,000,000

Notice the System Wide View – Can you see any problems?

What is the problem here?

• VTune Sample-over-time view

The Hotspot

Four Steps in Moving to Parallel

Section 2

Steps in moving from Serial to Parallel

Serial → Architectural Analysis → Introducing Parallelism → Validating Correctness → Performance Tuning → Parallel

Key Questions

Design
• Is my program parallel?
• Where is the best place to parallelise my program?
• How can I get my program to run faster?
• What’s the expected speedup?

Code & Debug
• How?
• How difficult?
• Is my code still working?

Verify
• Is the parallelism correct?
• Do I have deadlocks or data races?
• Do I have memory errors?
• Does my program still work as intended?

Tune
• Do my tasks do equal amounts of work?
• Is my application scalable?
• Is the threading running efficiently?

Intel Software Tools Supporting Parallel Design Cycle

Serial → Architectural Analysis → Introducing Parallelism → Validating Correctness → Performance Tuning → Parallel

Tools

Step                       Existing Intel Software                 Intel Parallel Studio
Architectural Analysis     Intel® VTune™ Performance Analyzer      Advisor/Amplifier
Introducing Parallelism    Intel Compilers, Parallel Libraries     Composer
Validating Correctness     Intel® Thread Checker                   Inspector
Performance Tuning         Intel® Thread Profiler                  Amplifier

For Microsoft Visual Studio* C++ architects, developers, and software innovators creating parallel Windows* applications.

• Microsoft Visual Studio* plug-in
• End-to-end product suite for parallelism
• Forward scaling to many core

Intel® Parallel Studio includes:
• Intel® Parallel Advisor Lite **
• Intel® Parallel Composer
• Intel® Parallel Inspector
• Intel® Parallel Amplifier

** Beta – from whatif.intel.com

Step 1

Which part of my code should I make Parallel?

Serial → Architectural Analysis → Introducing Parallelism → Validating Correctness → Performance Tuning → Parallel

Key Questions - Design

• Is my program parallel?
• Where is the best place to parallelise my program?
• How can I get my program to run faster?
• What’s the expected speedup?

Identifying best parts to Parallelize

[Figure: Phase 1 and Phase 2 of the program, shown for the Serial and the Parallel versions]

We need to Identify Hotspots

Why use parallelism? Amdahl’s Law

Describes the upper bound of parallel execution speedup

Tparallel = {(1-P) + P/n} Tserial
Speedup = Tserial / Tparallel
n = number of processors

Example with (1-P) = 0.5:
n = 2: Tparallel = (0.5 + 0.25) Tserial, Speedup = 1.0/0.75 = 1.33
n = ∞: Tparallel = (0.5 + 0.0) Tserial, Speedup = 1.0/0.5 = 2.0

Serial code limits speedup
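The formula above can be checked numerically. This is a minimal sketch; the function names `AmdahlSpeedup` and `AmdahlLimit` are illustrative and do not come from the slides.

```cpp
#include <cassert>

// Amdahl's Law: Speedup = Tserial / Tparallel = 1 / ((1 - P) + P / n),
// where P is the parallel fraction and n the number of processors.
double AmdahlSpeedup(double p, int n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1);
    return 1.0 / ((1.0 - p) + p / n);
}

// Upper bound as n grows without limit: only the serial part (1 - P) remains.
double AmdahlLimit(double p) {
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}
```

For P = 0.5, `AmdahlSpeedup(0.5, 2)` is about 1.33 and `AmdahlLimit(0.5)` is 2.0, which reproduces the two cases on the slide: however many processors are added, the serial half caps the speedup at 2x.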

Some code is not worth making parallel…

• Don’t parallelise code
  – Just because it’s clever
  – With low CPU utilisation
  – That is I/O bound
• Do parallelise code that
  – Eats significant CPU cycles
• You need to get visibility of the runtime behaviour

Architectural Extensions can speed up your code

• Always optimise your code
• Even if you don’t go parallel, some architectural features can still give significant speed-up
• Example, SSE extensions

Our Application - Prime Number Generator

bool TestForPrime(int val)
{   // let’s start checking from 3
    int limit, factor = 3;
    limit = (long)(sqrtf((float)val)+0.5f);
    while( (factor <= limit) && (val % factor) )
        factor ++;
    return (factor > limit);
}

void FindPrimes(int start, int end)
{
    int range = end - start + 1;
    for( int i = start; i <= end; i += 2 )
    {
        if( TestForPrime(i) )
            globalPrimes[gPrimesFound++] = i;
        ShowProgress(i, range);
    }
}

i     factor
61    3 5 7
63    3
65    3 5
67    3 5 7
69    3
71    3 5 7
73    3 5 7 9
75    3
77    3 5 7
79    3 5 7 9

Demo 1 – Getting the Benchmark

From 1 to 1,000,000
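A baseline timing of the serial prime generator could be taken along the following lines. This is a sketch under assumptions, not the demo's actual code: `FindPrimes` is rewritten here to return a vector instead of filling the globals `globalPrimes` and `gPrimesFound`, the `ShowProgress` call is omitted, and `TimeFindPrimes` is a hypothetical helper.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Same trial-division test as the slide: try factors from 3 up to sqrt(val).
// It is written for odd val >= 3; note that val = 1 also passes the test.
bool TestForPrime(int val) {
    int factor = 3;
    const int limit = static_cast<int>(std::sqrt(static_cast<float>(val)) + 0.5f);
    while (factor <= limit && (val % factor) != 0) ++factor;
    return factor > limit;
}

// Hypothetical variant of FindPrimes: returns the values found instead of
// writing to global state. start is expected to be odd, as the loop steps by 2.
std::vector<int> FindPrimes(int start, int end) {
    assert(start % 2 != 0 && start <= end);
    std::vector<int> primes;
    for (int i = start; i <= end; i += 2)
        if (TestForPrime(i)) primes.push_back(i);
    return primes;
}

// Wall-clock seconds for one serial run over [start, end].
double TimeFindPrimes(int start, int end, std::size_t* found = nullptr) {
    const auto t0 = std::chrono::steady_clock::now();
    const std::vector<int> primes = FindPrimes(start, end);
    const auto t1 = std::chrono::steady_clock::now();
    if (found != nullptr) *found = primes.size();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling `TimeFindPrimes(1, 1000000)` gives the serial figure against which later compiler and threading changes can be compared. Repeating the run a few times and taking the best time reduces noise from other processes.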


Optimise the serial code first

• Using the Intel Compiler to automatically generate SSE instructions.
– Code ran twice as fast
– No change made to original code

Calculating Pi

            Time (secs)   Speedup
No SSE      1.29          1.00
With SSE    0.66          1.95

Example of SSE compiler-generated instructions

004018D9 movaps xmmword ptr [esp],xmm0
004018DD paddd xmm5,xmm6
004018E1 addpd xmm7,xmm3
004018E5 mulpd xmm7,xmm2
004018E9 add eax,8
004018EC mulpd xmm7,xmm7
004018F0 movaps xmm0,xmmword ptr ds:[406770h]
004018F7 addpd xmm7,xmm1
004018FB divpd xmm0,xmm7
004018FF cvtdq2pd xmm7,xmm5
00401903 paddd xmm5,xmm6
00401907 addpd xmm4,xmm0
0040190B movaps xmm0,xmmword ptr ds:[406770h]
00401912 addpd xmm7,xmm3
00401916 mulpd xmm7,xmm2
0040191A mulpd xmm7,xmm7
0040191E addpd xmm7,xmm1
00401922 divpd xmm0,xmm7
00401926 movaps xmm7,xmmword ptr [esp]
0040192A addpd xmm7,xmm0
0040192E cvtdq2pd xmm0,xmm5
00401932 movaps xmmword ptr [esp],xmm7
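As a rough illustration of the kind of source loop that could sit behind the listing above, the sketch below approximates pi by midpoint-rule integration of 4/(1+x*x). The function name calc_pi and its parameter are assumptions made for this example and are not taken from the demo.

```c
/* Hypothetical helper (not from the original demo): approximates pi by
   midpoint-rule integration of 4 / (1 + x*x) over [0, 1]. */
double calc_pi(long num_steps)
{
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;

    /* Iterations are independent apart from the running sum, which is the
       pattern an auto-vectorising compiler can usually handle. */
    for (long i = 0; i < num_steps; i++) {
        double x = ((double)i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * step;
}
```

Whether a given compiler actually emits packed instructions such as addpd, mulpd and divpd for this loop depends on its optimisation and floating-point settings, so the generated code is worth checking rather than assuming.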

Demo 2 – Using the Intel Compiler

From 1 to 1,000,000

Swapping compilers

• From solution drop-down menu
• Action is reversible

Program built and run with Intel compiler

Speedup 1.09

Identifying Hotspots

Pinpointing places where an application could be parallelised

The Big Question

“How can I make my code run faster?”

Today’s Question

“Where do I split up my code to take advantage of multiple CPU cores?”

The task, Identifying the Hot Spot…

… and Splitting up the Work.

Demo 2 – Finding the Hotspots

Finding a Hot Spot

Where to Parallelise – Amplifier

[Screenshot labels: Call Stack, Hotspot]

Design: What’s the expected speedup?

• Use Amdahl’s Law

Speedup = 1 / [ s + (1 - s)/n + H(n) ]

s is the serial part (fraction of 1)
H is the parallel overhead (ignored here)
n is the number of cores

s = 0
Speedup = 1 / [ 0 + ( 1 - 0 ) / 2 ]
        = 1 / [ 0 + 0.5 ]
Speedup = 2 (i.e. new run time ~ 0.672 seconds)

Alternate Calculation

Speedup = 1 / [ s + (1 - s)/n + H(n) ]

s is the serial part (fraction of 1)
H is the parallel overhead (ignored here)
n is the number of cores

s = 1 - (1.688 - 0.012)/1.688 = 0.007
Speedup = 1 / [ 0.007 + ( 1 - 0.007 ) / 2 ]
        = 1 / [ 0.007 + 0.4965 ]
Speedup = 1.986 (i.e. CPU time ~ 0.850 seconds)
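The two calculations above can be checked with a few lines of C. This is a minimal sketch: the function name amdahl_speedup is an assumption for illustration, and the overhead term H(n) is dropped, as the slide suggests.

```c
/* Hypothetical helper: Amdahl's Law with the overhead term H(n) ignored.
   s is the serial fraction (0..1), n is the number of cores. */
double amdahl_speedup(double s, int n)
{
    return 1.0 / (s + (1.0 - s) / (double)n);
}
```

Calling amdahl_speedup(0.0, 2) reproduces the ideal speedup of 2, and amdahl_speedup(0.007, 2) gives roughly 1.986, matching the figures on the slides.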

Step 2: Implement parallelism in code

Serial → Architectural Analysis → Introducing Parallelism → Validating Correctness → Performance Tuning → Parallel

Key Questions – Code & Debug

How?
How difficult?
Is my code still working?

Common types of parallelism

• Functional or Task Parallelism
• Data Parallelism
• Software Pipelining

Task and Data Parallelism

Task parallelism
• Different job for each thread
• e.g. one thread prints, another reads keyboard

Data parallelism
• Splitting workload between multiple identical threads
• e.g. three identical threads perform calculations on data array
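The difference between the two styles can be sketched with OpenMP pragmas. The function names square_all and sum_and_max are hypothetical and chosen only for this example. If the code is built without OpenMP support, the pragmas are ignored and it runs serially with the same results.

```c
/* Data parallelism: identical threads each process part of one array. */
void square_all(const int *in, int *out, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        out[i] = in[i] * in[i];
}

/* Task parallelism: each section is a different job that may run on its
   own thread. The two sections write to different locations. */
void sum_and_max(const int *data, int n, int *sum, int *max)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            int s = 0;
            for (int i = 0; i < n; i++)
                s += data[i];
            *sum = s;
        }
        #pragma omp section
        {
            int m = data[0];
            for (int i = 1; i < n; i++)
                if (data[i] > m)
                    m = data[i];
            *max = m;
        }
    }
}
```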

Software Pipeline

Time →
Core 1: Collect A | Collect B  | Collect C  | Collect D  | …
Core 2:           | Transfer A | Transfer B | Transfer C | Transfer D
Core 3:                        | Polish A   | Polish B   | Polish C
Core 4:                                     | Produce A  | Produce B

Question

• How many different ways can you think of to implement parallelism?
– e.g. OpenMP, …, …

Auto Parallelism

Loop-level parallelism automatically supplied by the compiler

Auto-parallelization

• Auto-parallelization: Automatic threading of loops without having to manually insert OpenMP* directives.

Windows*          Linux*            Mac*
/Qparallel        -parallel         -parallel
/Qpar_report[n]   -par_report[n]    -par_report[n]

• Compiler can identify “easy” candidates for parallelization, but large applications are difficult to analyze.
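To make the idea of an “easy” candidate concrete, here is a small sketch contrasting a loop with independent iterations against one with a loop-carried dependence. Both function names are hypothetical, and whether a particular compiler threads the first loop depends on its own analysis and thresholds.

```c
/* "Easy" candidate: every iteration is independent of the others. */
void scale(float *dst, const float *src, float k, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = k * src[i];
}

/* Harder candidate: each iteration needs the result of the previous one
   (a loop-carried dependence), so the iterations cannot simply be split
   across threads. */
void running_total(int *a, int n)
{
    for (int i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

Building with the reporting option from the table above is one way to see which loops the compiler decided to parallelise.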

Optimisation Results – pi application

Optimisation             Time Taken (secs)   Speedup
default                  0.938               1
auto-vectorisation       0.375               2.5
auto-parallelism         0.516               1.8
auto-vec. & auto-par.    0.203               4.6

OpenMP Architecture

• Fork-Join Model
• Worksharing constructs
• Synchronization constructs
• Directive/pragma-based parallelism
• Extensive API for finer control

OpenMP Runtime

[Diagram labels: User, Application, Directive Compiler, Environment Variables, Runtime Library, Threads in Operating System]

OpenMP Programming Model

Fork-Join Parallelism:
• Master thread spawns a team of threads as needed.
• Parallelism is added incrementally until performance goals are met, i.e. the sequential program evolves into a parallel program.

[Diagram labels: Sequential Parts, Parallel Regions, Master Thread in red, A Nested Parallel region]

Introducing Parallelism

#pragma omp parallel for
for( int i = start; i <= end; i += 2 ){
    if( TestForPrime(i) )
        globalPrimes[gPrimesFound++] = i;
    ShowProgress(i, range);
}

OpenMP:
• Create threads here for this parallel region
• Divide iterations of the for loop

Demo 3: Adding parallelism using #pragma omp for

Results – OpenMP

Amazing! We have a speedup!

Code: Is my code still working?

Bother! Number of primes is wrong

Questions

Are the results right?
Was the run quicker?

Step 3: Check for any problems

Serial → Architectural Analysis → Introducing Parallelism → Validating Correctness → Performance Tuning → Parallel

Key Questions – Verify

Is the parallelism correct?
Do I have deadlocks or data races?
Do I have memory errors?
Does my program still work as intended?

New paradigm requires new tools

• Using traditional debugging tools is difficult or impossible
– printf – not re-entrant
– Debugging several threads is notoriously hard
– Many debuggers and profilers are not multi-core enabled
• Multi-core tools are available

Non-deterministic Error Sources in Parallel Applications

• Shared resources require locks
• Locks can
– ‘serialize’ a program
– lead to deadlocks

Without a lock: Thread1 and Thread2 both read X=0 from shared memory X, each computes X=X+1, and X=1 is written back. Wrong result (X should be 2).

With lock L1: the two X=X+1 updates happen one after the other over time, giving X=2.
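The lost update in the diagram can be replayed by hand. The sketch below is deliberately single-threaded so that the outcome is deterministic: it imitates one unlucky interleaving, with local variables standing in for each thread's private copy of X. The function names are hypothetical.

```c
/* Replays the unsynchronised interleaving from the diagram. */
int unsynchronised_update(void)
{
    int x = 0;        /* shared memory X */
    int reg1 = x;     /* Thread1 reads X = 0 */
    int reg2 = x;     /* Thread2 reads X = 0 before Thread1 writes back */
    x = reg1 + 1;     /* Thread1 writes X = 1 */
    x = reg2 + 1;     /* Thread2 also writes X = 1: one update is lost */
    return x;
}

/* With lock L1 held around each read-modify-write, the updates cannot
   overlap, so they are applied one after the other. */
int locked_update(void)
{
    int x = 0;
    int reg1 = x;     /* Thread1 holds L1 */
    x = reg1 + 1;     /* X becomes 1 */
    int reg2 = x;     /* Thread2 holds L1 */
    x = reg2 + 1;     /* X becomes 2 */
    return x;
}
```

In a real threaded run the bad interleaving only happens some of the time, which is what makes this class of error non-deterministic and hard to catch by testing alone.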

Demo 4 – Checking for threading errors

Checking for Errors with Parallel Inspector

The Offending Sources

Protecting shared variables

#pragma omp parallel for
for( int i = start; i <= end; i += 2 ){
    if( TestForPrime(i) )
#pragma omp critical
        globalPrimes[gPrimesFound++] = i;
    ShowProgress(i, range);
}

Will create a critical section for this reference

#pragma omp critical
{
    gProgress++;
    percentDone = (int)((float)gProgress/(float)range*200.0f+0.5f);
}

Will create a critical section for both these references
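A self-contained version of the same pattern is sketched below. It is not the demo's code: is_prime is a hypothetical stand-in for TestForPrime, and count_primes only counts hits rather than storing them. The shared counter is updated only inside a critical section, so the result is the same whether or not OpenMP is enabled.

```c
/* Hypothetical stand-in for the demo's TestForPrime: trial division by
   odd factors. */
static int is_prime(int val)
{
    if (val < 2)
        return 0;
    if (val % 2 == 0)
        return val == 2;
    for (int f = 3; f * f <= val; f += 2)
        if (val % f == 0)
            return 0;
    return 1;
}

/* Counts the primes in [start, end]. */
int count_primes(int start, int end)
{
    int found = 0;

    #pragma omp parallel for
    for (int i = start; i <= end; i++) {
        if (is_prime(i)) {
            /* Only one thread at a time may update the shared counter. */
            #pragma omp critical
            found++;
        }
    }
    return found;
}
```

For a plain counter like this, OpenMP's reduction(+:found) clause is a common alternative that avoids the lock altogether. It does not directly apply to the slide's code, which also needs a unique array slot for each prime.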

Demo 5 – Fixing the threading errors

Data Races Have Disappeared!

Number of Primes is Correct

Step 4: Tune for best performance

Serial → Architectural Analysis → Introducing Parallelism → Validating Correctness → Performance Tuning → Parallel

Key Questions – Tune

Is the threading running efficiently?
Do my tasks do equal amounts of work?
Is my application scalable?

Performance Issues

Load Balancing
Synchronisation Overhead
Scalability

Difficult to examine without the right tools

A Reminder – Where are we?

Number of primes is correct
Almost as slow as the serial version

Demo 6 – Find the Threading Performance Issues

Hotspot Analysis

Source View

Improving the Performance

Before:

void ShowProgress( int val, int range )
{
    int percentDone;
    gProgress++;
    percentDone = (int)((float)gProgress/(float)range*200.0f+0.5f);
    if( percentDone % 10 == 0 )
        printf("\b\b\b\b%3d%%", percentDone);
}

The algorithm has many more updates than the 10 needed for showing progress

After:

void ShowProgress( int val, int range )
{
    int percentDone;
    static int lastPercentDone = 0;
#pragma omp critical
    {
        gProgress++;
        percentDone = (int)((float)gProgress/(float)range*200.0f+0.5f);
    }
    if( percentDone % 10 == 0 && lastPercentDone < percentDone / 10){
        printf("\b\b\b\b%3d%%", percentDone);
        lastPercentDone++;
    }
}

This change should fix the contention issue

Demo 7 – Fixing the Synchronisation issues

Superb Speedup … ???

Speedup 7.36
On a dual core?

Demo 8 – Getting New Serial Benchmark

That’s better (but disappointing)…

Speedup 1.55

Demo 9 – Correcting the Synchronisation Issue

Hotspots

Locks & Waits

Source Code View

Fixing the synchronisation issue - 1

This fix removes the need for a critical section

void FindPrimes(int start, int end)
{
    // start is always odd
    int range = end - start + 1;

    #pragma omp parallel for
    for( int i = start; i <= end; i += 2 )
    {
        if( TestForPrime(i) )
            // InterlockedIncrement returns the incremented value, so subtract 1
            // to keep the same zero-based slot that gPrimesFound++ used
            globalPrimes[InterlockedIncrement(&gPrimesFound) - 1] = i;
        ShowProgress(i, range);
    }
}

From Serial to Parallel - Fortranfortrandev/2010/BCS_multicore_programming.pdf· 0.346 8m2 66--transistortransistor65 nm S65 nm S--RAM Cell RAM Cell 0.570 8m2 Enables New Features, - [PDF Document] (106)

Fixing the synchronisation issue - 2

This fix removes the need for a critical section

void ShowProgress( int val, int range )
{
    long percentDone, localProgress;
    static int lastPercentDone = 0;

    // Atomic increment of the shared counter; each call gets its own snapshot
    localProgress = InterlockedIncrement(&gProgress);

    // Only odd values are tested (range/2 calls in total), hence the factor of 200
    percentDone = (int)((float)localProgress/(float)range*200.0f+0.5f);

    // Note: lastPercentDone is shared and updated without synchronisation,
    // so a progress update may occasionally be printed twice or skipped
    if( percentDone % 10 == 0 && lastPercentDone < percentDone / 10){
        printf("\b\b\b\b%3ld%%", percentDone);
        lastPercentDone++;
    }
}
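The factor of 200 in the percentage formula is easy to misread. The helper below isolates that one expression so it can be checked by hand; PercentDone is a hypothetical name introduced here and is not part of the deck.

```c
#include <assert.h>

/* Percentage complete after localProgress calls, using the slide's formula. */
int PercentDone(long localProgress, int range)
{
    assert(range > 0);
    /* Only every second number in the range is tested, so range/2 calls
       correspond to 100%; scaling by 200 rather than 100 accounts for that.
       Adding 0.5f before the cast rounds to the nearest integer. */
    return (int)((float)localProgress / (float)range * 200.0f + 0.5f);
}
```

With a range of 200 there are 100 candidates to test, so PercentDone(50, 200) gives 50 and PercentDone(100, 200) gives 100.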

That’s better (but still disappointing)…

Speedup 1.6

Demo 9 – Improving the Load Balancing

Threads are not doing equal work

Fixing a Load Imbalance

Distribute the work more evenly

void FindPrimes(int start, int end)
{
    // start is always odd
    int range = end - start + 1;

    // Hand out iterations round-robin in chunks of 8, so each thread
    // gets a mix of small and large candidates
    #pragma omp parallel for schedule(static, 8)
    for( int i = start; i <= end; i += 2 )
    {
        if( TestForPrime(i) )
            globalPrimes[InterlockedIncrement(&gPrimesFound)] = i;

        ShowProgress(i, range);
    }
}
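To see why schedule(static, 8) helps, it can be useful to model which thread owns each iteration. The sketch below is a toy model rather than a measurement: it assumes the cost of an iteration grows linearly with its index (a stand-in for larger candidates taking longer to test), and it assumes that a schedule without a chunk size splits the loop into contiguous blocks of ceil(n / nthreads) iterations. LoadOnThread is a hypothetical helper, not part of the deck.

```c
#include <assert.h>

/* Total toy cost assigned to thread t for a loop of n iterations.
   chunk > 0  : models schedule(static, chunk), chunks dealt round-robin.
   chunk <= 0 : models one contiguous block per thread (assumed block size
                ceil(n / nthreads)). */
long LoadOnThread(int t, long n, int nthreads, int chunk)
{
    assert(nthreads > 0);
    long block = (n + nthreads - 1) / nthreads;    /* ceil(n / nthreads) */
    long load = 0;
    for (long k = 0; k < n; ++k) {
        int owner = (chunk > 0) ? (int)((k / chunk) % nthreads)
                                : (int)(k / block);
        if (owner == t)
            load += k + 1;    /* toy cost: later iterations cost more */
    }
    return load;
}
```

For 64 iterations on 2 threads, the contiguous split gives loads of 528 and 1552 in this model, while chunks of 8 give 912 and 1168, a much closer match between the two threads.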

That’s better

Speedup 1.92

A Finely Balanced threaded Program!

Scalability

Key Questions - Tune

Is the threading running efficiently?

Do my tasks do equal amounts of work?

Is my application scalable?
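One way to put numbers on these questions is to turn a measured speedup into a parallel efficiency and an estimated serial fraction. The sketch below uses the Karp-Flatt metric for the latter; that metric is not mentioned on the slides and is brought in here only as an illustration. The core count is a parameter because the slides in this section do not state it.

```c
#include <assert.h>

/* Parallel efficiency: fraction of ideal speedup actually achieved. */
double Efficiency(double speedup, int cores)
{
    assert(cores > 0);
    return speedup / cores;
}

/* Karp-Flatt estimate of the serial fraction implied by a measured speedup. */
double SerialFraction(double speedup, int cores)
{
    assert(cores > 1 && speedup > 0.0);
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores);
}
```

Assuming, purely for illustration, a two-core machine, the speedups of 1.6 and 1.92 reported earlier correspond to efficiencies of 80% and 96%, and to estimated serial fractions of 0.25 and roughly 0.04.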

Moving to Parallel – a view from some developers

Top 5 challenges

•Legacy

•Education

•Tools

•Fear of many cores

•Maintainability

Scalability http://paralleluniverse.intel.com

The Results

Without the printfs

A run of 10 million

Tools

Development phases from Serial to Parallel, with the existing Intel software for each phase and its Intel Parallel Studio counterpart:

•Architectural Analysis: Intel® VTune™ Performance Analyzer (existing); Advisor/Amplifier (Parallel Studio)

•Introducing Parallelism: Intel Compilers, Parallel Libraries (existing); Composer (Parallel Studio)

•Validating Correctness: Intel® Thread Checker (existing); Inspector (Parallel Studio)

•Performance Tuning: Intel® Thread Profiler (existing); Amplifier (Parallel Studio)

thank you

intel.com/go/parallel

Q&A
