[EchoClient] support latency histograms
Histograms can only be collected when using a fixed number of iterations.
When the '--histogram' argument is passed, each client collects four timestamps (8 bytes each), referred to below as T1 to T4 in this order:
- Before requesting the send operation
- After requesting the send operation
- After getting unblocked and dispatched because the send operation finished
- After getting unblocked and dispatched because the recv operation finished
Timestamp collection is enabled using a template, so it introduces no runtime cost when it is not used; the only overhead is a larger binary.
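
A minimal sketch of how such a compile-time switch can look, assuming C++17. The names nowNs, TimeStamps and stamp are hypothetical and not taken from the EchoClient code; the point is only that the disabled instantiation contains no clock read at all.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: monotonic timestamp in nanoseconds, stored in 8 bytes.
inline std::uint64_t nowNs() {
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}

// Hypothetical record of the four timestamps taken per echo (4 * 8 bytes).
struct TimeStamps {
    std::uint64_t beforeSend = 0;
    std::uint64_t afterSend = 0;
    std::uint64_t afterSendDispatch = 0;
    std::uint64_t afterRecvDispatch = 0;
};

// With collectTimeStamps == false the body is discarded at compile time,
// so the call compiles to nothing and the slot is left untouched.
template <bool collectTimeStamps>
inline void stamp(std::uint64_t& slot) {
    if constexpr (collectTimeStamps) {
        slot = nowNs();
    }
}
```

A client templated on the same flag would call stamp<collectTimeStamps>(ts.beforeSend) around its send and recv operations; both variants are instantiated, which is where the extra binary size comes from.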
Before termination, three latencies are calculated for each client and each echo and written to the histogram file as CSV data:
- total_latency := (T4 - T1)
- after_send_latency := (T4 - T2)
- after_send_dispatch_latency := (T4 - T3)