[Runtime] use pthread_getaffinity_np to determine default worker count
The value reported by std::thread::hardware_concurrency() can exceed the
number of CPUs actually available to the process, for example in a
container or under qemu. pthread_getaffinity_np fills a CPU set with all
CPUs the calling thread can be scheduled on; the number of CPUs in that
set is now used as the default worker count.
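A minimal sketch of the idea, assuming Linux with glibc. The helper name
default_worker_count is hypothetical and not necessarily the name used in
the runtime. The fallback to std::thread::hardware_concurrency() when the
affinity query fails is likewise an assumption of this sketch.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes pthread_getaffinity_np and CPU_COUNT on glibc
#endif
#include <pthread.h>
#include <sched.h>

#include <thread>

// Hypothetical helper: number of CPUs the calling thread may run on.
unsigned default_worker_count() {
  cpu_set_t cpus;
  CPU_ZERO(&cpus);
  // On success (return value 0) the set holds every CPU in the thread's
  // affinity mask, which reflects taskset and cpuset restrictions.
  if (pthread_getaffinity_np(pthread_self(), sizeof(cpus), &cpus) == 0) {
    int n = CPU_COUNT(&cpus);
    if (n > 0) return static_cast<unsigned>(n);
  }
  // Assumed fallback: hardware_concurrency() may return 0, so clamp to 1.
  unsigned n = std::thread::hardware_concurrency();
  return n > 0 ? n : 1;
}
```

Running a caller of this helper under `taskset -c 0` should yield 1 even
on a multi-core host, whereas hardware_concurrency() may still report the
full core count.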
Fixes #19 (closed).