Linux Fu: a strange use for Fork()

If you’re a Star Trek fan, you probably remember the line “You have to learn why things work on a starship.” The truth is that in most episodes it wasn’t very useful to know how to override another ship’s console or make gunpowder, but when it was, it really saved the day. Linux is a lot like that. There are a few things you rarely need to know, but when you do need them, they make a huge difference. In this particular post, I want to look at a strange use of the fork system call. For many purposes, you’ll never need this particular irregular usage. But if you need it, you really need it.

This is actually based on an old client of mine who used Unix to put out a huge and very critical report every day. The report involved a lot of math because they were trying to optimize something, and then it produced a stack of reports. At the time, the output went to old green-bar paper on line printers. The problem was that the whole job took about 14 hours, including the printing. If someone found something wrong, there was no time to run the report again, because the next day’s run would have to start before the second run could finish.

The customer had a bunch of Windows programmers and – at the time – there was nothing really analogous to a real fork call in Windows. I looked at the code and realized that most of the time was probably spent waiting for the output to print. The computer had multiple CPUs and multiple printers, but that one program was tied to that one printer. There was a lot of data, so writing it to a database and then running different reports against it wasn’t a good option. The answer was to use the power of fork. With a code change that took less than 30 minutes, the report was ready in less than five hours. They were very happy.

So how did I do it? The answer lies in how fork works. Just about every time you see a fork, you see some sort of exec call to start a new program. So when you think about fork, you probably think of it as part of how you start a new program, and most of the time that’s true.

What exactly does fork() do?

However, the call itself does something very strange. It copies the entire running process into a new process. It then runs the new process. Of course, the original process keeps running, too. Normally, when you see a fork, it looks something like this:

```c
int childPID;

childPID = fork();
if (childPID == 0)
{
   exec...   /* load the child program and run it */
}
/* the parent arrives here with childPID set to the new process's PID */
...
```

In other words, fork returns zero to the child process and the child’s PID to the parent process. Some early Unix systems actually copied everything in the running process. However, that’s really inefficient, especially if you usually just load a new program right away.

Modern systems use COW, or Copy On Write, semantics. That is, the new process gets a reference to the original process’s memory and copies only the relatively small amounts of memory that the child or parent actually changes. This is good for things like instruction space, which shouldn’t change anyway, since very few people write self-modifying code. It means that both parent and child see exactly the same data right after a fork call, but any changes either one makes afterward are not reflected on the other side.

Parallel processing made easy

For my client’s lengthy report, the program was largely I/O bound. However, each report also had some pretty hairy math, in addition to all the math it took to get to the point where each report could run. Instead of running everything in one process, I split the program into several pieces. The first piece did as much of the math as possible that applied to almost everything. Then the program called fork a number of times, and each child started a report that did a little more math for itself and claimed a printer to write the output.

Since the machine had multiple CPUs, everything sped up. Report three did not have to wait for reports one and two to finish. Each process could drive its own printer at the same time. It was an overall win, and it took almost no time to implement.

Admittedly, not every problem allows a solution like this. But giving each report process a copy-on-write view of the data in memory was very fast compared to reading it from a file or database. The data didn’t change after the reports started, so actual memory consumption stayed low, too.

An example

So is it really that simple? It is. The only problem now is that with modern machines, it’s hard to find a simple problem to demonstrate the technique. I finally decided to just do something simple, but do a lot of it. My made-up task: fill a very large array of double-precision floating point numbers with some contrived but predictable data and then find the mean. By very large, I mean 55 million entries or more.

I created a program that can do the job in two ways. First, it just does it in the simplest way possible: a loop visits each item in the array, adds everything up, and divides by the count at the end. Running this a few times on my computer averaged about 458 milliseconds – using the time command to measure it.

The program can also accept an F parameter on the command line. If that’s in effect, the setup is the same, but a fork creates two processes that split the array in half and find the average of each half. I didn’t make the child communicate back to the parent, although of course you could. Instead, all you have to do is read the two printed averages, add them together, and divide by two to get the true average. I didn’t want to add the overhead of communicating the result, but it would be easy enough to do.

The time for the fork version to run? About 395 milliseconds. Of course, your results will vary, and while 60 milliseconds may not seem like much, it does show that with two processes, multiple cores can work simultaneously.

The larger the array, the greater the time savings. For example, setting the size to 155,256,000 saved about 150 milliseconds. Of course, these timings are not scientific, and there are many factors to consider, but the data clearly shows that splitting the work between two processes is faster.

The code

The code is straightforward. The work isn’t hard; there’s just a lot of it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

// compile: gcc -o stress stress.c
// run: time ./stress
//      time ./stress F

#define SIZE 55256000        // how big is the array?

double bigarray[SIZE];

// The process routine goes from llimit to ulimit
// For the single case, that's the whole array
// For the fork case, we split it in two
unsigned int ulimit = SIZE;
unsigned int llimit = 0;
double total;                // running total

// Load the array with some fake data
void setup(void)
{
   unsigned int i;
   for (i = llimit; i < ulimit; i++)
      bigarray[i] = (double)i / 100.0;
}

// Add up our part of the array
void process(void)
{
   unsigned int i;
   for (i = llimit; i < ulimit; i++)
      total += bigarray[i];
}

// Print the average of our part of the array
void output(int id)
{
   printf("%d: Average=%f\n", id, total / (ulimit - llimit));
}

int main(int argc, char *argv[])
{
   int dofork = 0;
   int pid;
   if (argc > 1 && (*argv[1] == 'f' || *argv[1] == 'F'))
      dofork = 1;            // f or F will trigger a fork
   setup();                  // load array
   if (!dofork)
   {  // single case
      // ulimit and llimit are already set
      process();
      output(0);
   }
   else
   {  // forking here
      if ((pid = fork()))
      {  // parent -- adjust ulimit to take the first half
         ulimit = SIZE / 2;
         process();
         output(1);
         waitpid(pid, NULL, 0);  // wait for the child
         exit(0);
      }
      else
      {  // child -- adjust the bounds to take the second half
         llimit = SIZE / 2;
         ulimit = SIZE;
         process();
         output(2);
      }
   }
   return 0;
}
```

Why things work on a spaceship

Now you know how fork really works. Well, sort of. There are many nuances about which resources – file handles and the like – are passed on to the child and which are not. And like I said, you won’t need this very often. But there are times when it will really save the day.

If you want to take multitasking to a higher level, try an older Linux Fu. Or check out the GNU Parallel tool.
