# System

* **Processor:** Intel(R) Xeon(R) Gold 6148 CPU
* **Base frequency:** 2.4 GHz
* **Number of sockets:** 2
* **Number of memory domains per socket:** 1
* **Number of cores per socket:** 20
* **Number of HWThreads per core:** 2
* **[MachineState](https://github.com/RRZE-HPC/MachineState) output:** NA

# Tool chain

```
+----------+---------------------------------+
| Compiler | icc (ICC)                       |
|----------|---------------------------------|
| Version  | icc (ICC) 19.0.5.281 20190815   |
+----------+---------------------------------+
```

Optimizing flags: ```-fast -xHost -qopt-streaming-stores=always -std=c99 -ffreestanding -qopenmp```

# Results

All results are in ```GB/s```.

Summary results:

```
+---------------------------------------------+
| Single core   | 19.00 (SDaxpy)              |
| Memory domain | 115.89 (Sum with 14 cores)  |
| Socket        | 115.89 (Sum with 14 cores)  |
| Node          | 229.31 (Sum with 15 cores)  |
+---------------------------------------------+
```

Results for scaling within a memory domain:

```
#nt      Init       Sum      Copy    Update     Triad     Daxpy    STriad    SDaxpy
1        7.39     14.09     10.49     16.27     13.62     18.27     15.40     19.00
2       14.57     27.25     20.52     31.74     26.53     35.37     29.96     36.73
3       21.91     40.00     30.57     47.09     39.45     52.56     44.47     54.16
4       28.94     52.09     40.19     60.48     51.44     68.01     57.77     69.96
5       36.21     63.46     49.92     74.09     63.90     81.99     71.58     83.68
6       43.04     75.01     59.44     85.45     75.58     89.70     83.40     90.53
7       50.45     84.73     68.64     91.62     84.81     93.93     89.79     94.36
8       57.51     93.43     77.54     94.11     89.01     95.54     92.84     96.28
9       64.19    100.12     83.53     97.39     93.09     98.29     96.14     98.49
10      70.88    106.13     87.67     98.40     95.39     99.16     97.28     99.03
11      76.98    110.59     90.45     99.15     97.03     99.72     97.83     99.10
12      81.95    112.96     91.72     98.14     96.61     98.78     96.99     98.16
13      84.41    115.15     92.95     98.13     97.42     99.06     97.54     98.35
14      84.86    115.89     93.90     97.72     97.14     98.66     97.08     97.85
15      84.78    114.69     93.73     96.78     96.87     98.17     96.73     97.47
16      84.03    113.83     93.17     95.97     96.15     97.32     96.07     96.62
17      84.57    113.70     94.01     96.05     96.75     97.71     96.41     97.17
18      83.99    113.39     93.70     95.74     96.54     97.59     96.62     97.13
19      83.93    112.60     93.45     95.08     96.29     97.09     96.19     96.79
20      83.30    112.49     93.06     94.28     95.83     96.56     95.85     96.42
```

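
The "Single core" and "Memory domain" entries of the summary can be derived from the single-domain scaling table. The following Python sketch is one way to do that; it is not part of the benchmark itself. The helper names (`parse`, `peak`, `cores_to_reach`) are hypothetical, and the 95% saturation threshold is an assumption chosen only for illustration.

```python
# Bandwidth in GB/s, copied verbatim from the single-domain scaling table.
TABLE = """
#nt Init Sum Copy Update Triad Daxpy STriad SDaxpy
1 7.39 14.09 10.49 16.27 13.62 18.27 15.40 19.00
2 14.57 27.25 20.52 31.74 26.53 35.37 29.96 36.73
3 21.91 40.00 30.57 47.09 39.45 52.56 44.47 54.16
4 28.94 52.09 40.19 60.48 51.44 68.01 57.77 69.96
5 36.21 63.46 49.92 74.09 63.90 81.99 71.58 83.68
6 43.04 75.01 59.44 85.45 75.58 89.70 83.40 90.53
7 50.45 84.73 68.64 91.62 84.81 93.93 89.79 94.36
8 57.51 93.43 77.54 94.11 89.01 95.54 92.84 96.28
9 64.19 100.12 83.53 97.39 93.09 98.29 96.14 98.49
10 70.88 106.13 87.67 98.40 95.39 99.16 97.28 99.03
11 76.98 110.59 90.45 99.15 97.03 99.72 97.83 99.10
12 81.95 112.96 91.72 98.14 96.61 98.78 96.99 98.16
13 84.41 115.15 92.95 98.13 97.42 99.06 97.54 98.35
14 84.86 115.89 93.90 97.72 97.14 98.66 97.08 97.85
15 84.78 114.69 93.73 96.78 96.87 98.17 96.73 97.47
16 84.03 113.83 93.17 95.97 96.15 97.32 96.07 96.62
17 84.57 113.70 94.01 96.05 96.75 97.71 96.41 97.17
18 83.99 113.39 93.70 95.74 96.54 97.59 96.62 97.13
19 83.93 112.60 93.45 95.08 96.29 97.09 96.19 96.79
20 83.30 112.49 93.06 94.28 95.83 96.56 95.85 96.42
"""


def parse(text):
    """Return {kernel: {threads: bandwidth}} from the whitespace-separated table."""
    header, *rows = text.strip().splitlines()
    kernels = header.split()[1:]  # drop the "#nt" label
    data = {kernel: {} for kernel in kernels}
    for row in rows:
        nt, *values = row.split()
        for kernel, value in zip(kernels, values):
            data[kernel][int(nt)] = float(value)
    return data


def peak(series):
    """Thread count and bandwidth of the best measurement in one column."""
    nt = max(series, key=series.get)
    return nt, series[nt]


def cores_to_reach(series, fraction=0.95):
    """Smallest thread count reaching `fraction` of the column's peak.

    The default of 0.95 is an assumed threshold, not a benchmark setting.
    """
    _, best = peak(series)
    return min(nt for nt, bw in series.items() if bw >= fraction * best)


data = parse(TABLE)
best_single = max(data, key=lambda kernel: data[kernel][1])
best_domain = max(data, key=lambda kernel: peak(data[kernel])[1])

print("Single core  :", data[best_single][1], best_single)
print("Memory domain:", peak(data[best_domain]), best_domain)
for kernel, series in data.items():
    print(f"{kernel:<7} peak at {peak(series)[0]:>2} cores, "
          f"95% of peak from {cores_to_reach(series)} cores")
```

With the numbers above, the sketch selects SDaxpy for the single-core entry and Sum with 14 cores for the memory-domain entry, matching the summary.
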
Results for scaling across memory domains. Each row gives the number of cores used per memory domain; the columns give the bandwidth for the number of memory domains used (`nm`), as listed in the header row.

Init:

```
#nm         1         2
1        7.39     14.75
2       14.57     29.35
3       21.91     44.02
4       28.94     58.19
5       36.21     72.19
6       43.04     85.75
7       50.45     99.30
8       57.51    111.99
9       64.19    123.43
10      70.88    133.82
11      76.98    139.95
12      81.95    140.73
13      84.41    144.44
14      84.86    146.38
15      84.78    147.77
16      84.03    150.37
17      84.57    150.95
18      83.99    151.23
19      83.93    150.43
20      83.30    149.43
```

Sum:

```
#nm         1         2
1       14.09     27.70
2       27.25     53.52
3       40.00     79.00
4       52.09    102.78
5       63.46    127.70
6       75.01    148.95
7       84.73    168.84
8       93.43    186.01
9      100.12    201.89
10     106.13    211.85
11     110.59    219.94
12     112.96    223.29
13     115.15    229.00
14     115.89    228.15
15     114.69    229.31
16     113.83    227.30
17     113.70    226.97
18     113.39    224.66
19     112.60    224.22
20     112.49    223.27
```

Copy:

```
#nm         1         2
1       10.49     20.87
2       20.52     41.20
3       30.57     61.05
4       40.19     80.42
5       49.92     99.51
6       59.44    118.12
7       68.64    136.09
8       77.54    153.09
9       83.53    165.66
10      87.67    173.55
11      90.45    179.98
12      91.72    182.30
13      92.95    184.18
14      93.90    184.25
15      93.73    185.11
16      93.17    184.21
17      94.01    184.01
18      93.70    184.57
19      93.45    184.32
20      93.06    182.00
```

Update:

```
#nm         1         2
1       16.27     32.21
2       31.74     63.49
3       47.09     93.18
4       60.48    121.47
5       74.09    149.08
6       85.45    169.12
7       91.62    182.68
8       94.11    189.52
9       97.39    193.93
10      98.40    197.11
11      99.15    197.96
12      98.14    197.52
13      98.13    197.27
14      97.72    193.76
15      96.78    194.84
16      95.97    193.09
17      96.05    193.44
18      95.74    191.37
19      95.08    190.67
20      94.28    188.75
```

Triad:

```
#nm         1         2
1       13.62     27.06
2       26.53     53.27
3       39.45     78.55
4       51.44    102.96
5       63.90    126.70
6       75.58    149.65
7       84.81    167.93
8       89.01    178.80
9       93.09    185.31
10      95.39    189.80
11      97.03    192.82
12      96.61    193.17
13      97.42    193.25
14      97.14    191.94
15      96.87    191.92
16      96.15    192.05
17      96.75    191.51
18      96.54    191.50
19      96.29    190.84
20      95.83    189.95
```

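
To judge how well each kernel scales from one memory domain to two, the two columns of each table can be divided. The Python sketch below does this for the 20-cores-per-domain row of the five tables above. The function name `domain_scaling` and the choice of the 20-core row are assumptions made for illustration; any other row could be compared the same way.

```python
# (1 domain, 2 domains) bandwidth in GB/s at 20 cores per memory domain,
# copied from the last row of the Init, Sum, Copy, Update and Triad tables.
AT_20_CORES = {
    "Init": (83.30, 149.43),
    "Sum": (112.49, 223.27),
    "Copy": (93.06, 182.00),
    "Update": (94.28, 188.75),
    "Triad": (95.83, 189.95),
}


def domain_scaling(one_domain, two_domains):
    """Speedup obtained by using two memory domains instead of one."""
    return two_domains / one_domain


scaling = {
    kernel: domain_scaling(one, two)
    for kernel, (one, two) in AT_20_CORES.items()
}

for kernel, factor in scaling.items():
    # An ideal result on this two-domain node would be a factor of 2.
    print(f"{kernel:<7} {factor:.2f}x ({factor / 2:.0%} of ideal)")
```

For these rows, Init reaches a factor of roughly 1.8, while Sum, Copy, Update and Triad land between roughly 1.96 and 2.00.
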
# Scaling

Memory bandwidth scaling within one memory domain:



The following plots illustrate the performance scaling over multiple memory domains using different numbers of cores per memory domain.

Memory bandwidth scaling across memory domains for Init:



Memory bandwidth scaling across memory domains for Sum:



Memory bandwidth scaling across memory domains for Copy:



Memory bandwidth scaling across memory domains for Triad:

