Modern CPUs are smarter than you think

When it comes to programming, most of us write code at a level of abstraction that would be just as valid for a 1960s computer. Input comes in, you process it, and you produce output. Sure, a call to strcpy might run faster on a modern CPU than on an older one, but your basic algorithms are the same. But what if there were ways to structure your programs so that they run better on modern hardware? That’s the question a pre-publication book from [Sergey Slotin] answers.

As a simple example, consider the effects of branching on pipelines. Almost all modern CPUs are pipelined. That is, one instruction fetches its data while an older instruction computes something and an even older instruction stores its results. The problem occurs when the CPU has already partially executed an instruction, only to find that an earlier instruction has caused a branch to another part of your code. Now the pipeline has to be flushed, and performance suffers while it refills. Anything that already had an effect has to be undone, and everything else has to be thrown away.

That’s bad for performance. Therefore, some CPUs try to predict whether a branch will be taken and then speculatively fill the pipeline along the predicted path. You can help, though: for example, you can structure your code so that branches are more predictable or, with some compilers, explicitly tell the compiler whether a branch is likely to be taken.
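To make both ideas concrete, here is a minimal C sketch. It is not taken from the book. The function names and the LIKELY/UNLIKELY macros are made up for illustration, and it assumes GCC or Clang, since __builtin_expect is a builtin of those compilers. The first function hints that a branch is rarely taken; the second restructures the loop so there is no data-dependent branch to mispredict at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC or Clang. __builtin_expect is their builtin;
 * other compilers may need a different spelling or no hint at all. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sum the non-negative values and count the
 * negative ones, telling the compiler that negatives are rare. */
int64_t sum_nonnegative(const int32_t *data, size_t n, size_t *bad_count)
{
    int64_t sum = 0;
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(data[i] < 0)) {   /* hinted as the cold path */
            bad++;
            continue;
        }
        sum += data[i];
    }
    *bad_count = bad;
    return sum;
}

/* Hypothetical example: sum the values below a limit without an
 * if statement in the loop body. The comparison yields 0 or 1, which
 * is turned into an all-zeros or all-ones mask. */
int64_t sum_below(const int32_t *data, size_t n, int32_t limit)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t keep = -(int64_t)(data[i] < limit);  /* 0 or -1 */
        sum += (int64_t)data[i] & keep;              /* value or 0 */
    }
    return sum;
}
```

The hint only influences how the compiler lays out the code; it does not change the result, and whether either version is actually faster depends on the data, the compiler, and the CPU.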

As you might expect, techniques like these depend on your CPU, and you need to benchmark to see what’s really going on. The text is full of graphs of execution times and analysis of the generated x86 assembly code to explain the results. For example, even something you would think of as a pretty good algorithm, like binary search, suffers on modern architectures, and you can improve its performance with a few tricks. Interestingly, the tricks work with GCC but make no difference with Clang. Again, you have to measure these things.
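One commonly described trick of this kind is to remove the hard-to-predict branch from the search loop. The following is a sketch under assumptions, not necessarily the exact code from the book: the function name is made up, and the array is assumed to be sorted in ascending order. Instead of choosing the left or right half with an if, it multiplies the comparison result (0 or 1) by the step size.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the first element in the
 * ascending-sorted array a[0..n-1] that is >= x, or n if there is none. */
size_t lower_bound_branchless(const int *a, size_t n, int x)
{
    if (n == 0)
        return 0;

    const int *base = a;   /* start of the range still being searched */
    size_t len = n;        /* number of candidates left in the range */

    while (len > 1) {
        size_t half = len / 2;
        /* The comparison is 0 or 1, so this either keeps base or
         * advances it by half, with no if statement in the loop. */
        base += (base[half - 1] < x) * half;
        len -= half;
    }

    /* One candidate remains; step past it if it is still too small. */
    return (size_t)(base - a) + (*base < x);
}
```

The loop always runs the same number of iterations for a given n, so the only branch left is the loop condition, which is easy to predict. Whether the compiler turns the multiply into a conditional move, and whether that beats the ordinary version, is something you have to check in the generated assembly and in a benchmark.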

Probably 90% of us will never need any of the optimizations in this book. But it’s a great read if you like solving puzzles and analyzing complex details. Of course, if you need to squeeze those extra microseconds out of a loop, or if you’re writing a library where performance matters, this might be just the book you’re looking for. While it doesn’t cover many different CPUs, the ideas and techniques apply to many modern CPU architectures. You’ll just have to do the work of figuring out how to apply them to another CPU.

We’ve looked at topics like this before, pipelines for example. However, sometimes optimizing your algorithm isn’t as effective as just swapping it for a better one.
