

pixelesque 5 hours ago | on: OpenMP 6.0

> OpenMP is one of the easiest ways to make existing code run across CPU cores.

True (and the same goes for Intel TBB). However, as someone with a lot of experience optimising HPC algorithms for rendering, geometry processing and simulation, I'd say there are caveats. Quite often, existing code that has been naively parallelised this way spends a disproportionate amount of CPU time on spinlocks inside OpenMP or TBB instead of doing useful work. (I've noticed the same thing happening with Rayon in Rust.)

Sometimes I've looked at code that colleagues have "parallelised" this way, and they've said "yes, it's using multiple threads". But when you profile it with perf or VTune, it's clearly not doing much useful parallel work, and sometimes it's even slower than the single-threaded version in wall-clock terms. People didn't check whether it was faster; they just looked at the CPU usage and didn't notice the spinlocks.
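To make the "check wall-clock time, not CPU usage" point concrete, here is a minimal sketch. It is not code from the comment: the function names (`sum_contended`, `sum_reduction`, `measure`, `check_agreement`) and the toy workload are hypothetical, and a contended `#pragma omp atomic` update stands in for the runtime-internal spinning described above. It uses only OpenMP pragmas, so it should still compile (and run serially) without `-fopenmp`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <ctime>

// Naive version: every iteration updates one shared counter, so the
// threads contend on the same memory location instead of working independently.
std::int64_t sum_contended(std::int64_t n) {
    std::int64_t total = 0;
    #pragma omp parallel for
    for (std::int64_t i = 0; i < n; ++i) {
        #pragma omp atomic
        total += i % 7;
    }
    return total;
}

// Reduction version: each thread accumulates into a private copy, and the
// copies are combined once at the end of the loop.
std::int64_t sum_reduction(std::int64_t n) {
    std::int64_t total = 0;
    #pragma omp parallel for reduction(+ : total)
    for (std::int64_t i = 0; i < n; ++i) {
        total += i % 7;
    }
    return total;
}

// Both versions must agree before a timing comparison means anything.
inline void check_agreement(std::int64_t n) {
    assert(sum_contended(n) == sum_reduction(n));
}

struct Timing {
    double wall_seconds;  // elapsed real time
    double cpu_seconds;   // CPU time used by the process (assumption: POSIX
                          // semantics, where std::clock sums over all threads)
    std::int64_t result;
};

// Runs `work` once and records wall-clock time alongside CPU time.
template <typename F>
Timing measure(F&& work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();
    const std::int64_t result = work();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(wall_end - wall_start).count(),
            static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC,
            result};
}
```

One way to use it: call `measure([] { return sum_contended(100000000); })` and the same for `sum_reduction`, once built with `-fopenmp` and once without, then compare `wall_seconds` across the runs. If `cpu_seconds` grows with the thread count while `wall_seconds` does not shrink, the extra CPU usage is probably going to contention rather than useful work, which is the situation a profiler such as perf or VTune would then confirm.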

CoastalCoder 3 hours ago

Here's some reading that I personally have found helpful for optimizing parallel programs:

The best I've found so far:

https://cdn.kernel.org/pub/linux/kernel/people/paulmck/perfb...

And some other good reading:

https://www.amazon.com/Systems-Performance-Brendan-Gregg/dp/...

https://fgiesen.wordpress.com/2014/08/18/atomics-and-content...

https://travisdowns.github.io/blog/2020/07/06/concurrency-co...
