Implement per-thread first-touch policy in likwid-bench
Created by: cod3monk
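likwid-bench allocates and initializes data on the first thread, not on the threads which will eventually use it. This leads to performance issues if the thread group in a domain spans multiple NUMA domains, because all pages end up on the first thread's NUMA node and the other threads access them remotely.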
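It should be sufficient to move the initialization into the threads themselves, so that each thread first touches the part of the data it will work on. This should give better performance and more intuitive behaviour.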
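To illustrate the idea, here is a minimal sketch of per-thread first-touch initialization with plain POSIX threads. It is not taken from the likwid-bench sources: `init_task_t`, `first_touch_worker` and `alloc_first_touch` are hypothetical names, the even chunking is an assumption about how work is split, and thread pinning is left out for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread work description (not a likwid-bench type). */
typedef struct {
    double *data;  /* shared buffer, allocated but not yet written */
    size_t begin;  /* first element owned by this thread */
    size_t end;    /* one past the last element owned by this thread */
    double value;  /* value used for initialization */
} init_task_t;

static void *first_touch_worker(void *arg)
{
    init_task_t *t = arg;
    /* Assumption: in a real benchmark the thread is already pinned here,
     * so the first write places each page on this thread's NUMA node. */
    for (size_t i = t->begin; i < t->end; i++)
        t->data[i] = t->value;
    return NULL;
}

/* Allocate n doubles and let nthreads threads initialize disjoint chunks.
 * Returns NULL on failure; the caller frees the result with free(). */
double *alloc_first_touch(size_t n, int nthreads, double value)
{
    assert(nthreads > 0);

    /* Plain malloc, no calloc/memset: the allocating thread must not
     * write to the buffer, otherwise it would perform the first touch. */
    double *data = malloc(n * sizeof *data);
    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    init_task_t *tasks = malloc((size_t)nthreads * sizeof *tasks);
    if (!data || !tids || !tasks) {
        free(data); free(tids); free(tasks);
        return NULL;
    }

    size_t chunk = (n + (size_t)nthreads - 1) / (size_t)nthreads;
    int started = 0;
    for (int t = 0; t < nthreads; t++) {
        size_t begin = (size_t)t * chunk;
        size_t end = begin + chunk;
        if (begin > n) begin = n;  /* clamp for trailing threads */
        if (end > n) end = n;
        tasks[t] = (init_task_t){ data, begin, end, value };
        if (pthread_create(&tids[t], NULL, first_touch_worker, &tasks[t]) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);

    free(tids);
    free(tasks);
    if (started != nthreads) {
        free(data);
        return NULL;
    }
    return data;
}
```

For this to have the intended effect, the chunk a thread initializes should match the chunk it later works on in the benchmark kernel, and the threads should be pinned before they touch the data. A real implementation would probably also align chunks to page boundaries, since placement happens per page.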