EDIT (5/13/2019): midproc and midprocrunner have been renamed to reap and reaper, respectively.
I ran into a situation on a project where I needed to build two separate programs that work together: a server and a long-running, client-facing process. The server is designed to kick off any number of instances of the long-running process. Before I had written any code to have the server run those processes as children, I intended the child processes to be independent of the server. That is, if the server went down, the child processes would keep functioning. They would store requests in a queue and periodically attempt to reconnect to the server. Once the connection was re-established, everything would go back to normal. It turned out that Unix processes don't quite work the way I had expected.
In reality, when a process spawns another process, the new process is called the "child" and the original is called the "parent". As long as the parent process has not exited, the child will continue to run. If for some reason the parent goes down, the child usually goes down with it. That is, unless the child is "detached". A child process is detached when it no longer has a parent (I believe that, technically, parent-less children are adopted by an init process). Thus, you can have a parent process spawn a detached child and immediately exit, which leaves the child process running on its own.
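As a rough illustration of detaching, here is a minimal Go sketch, assuming a Unix-like system. The helper name startDetached is hypothetical and not part of any package mentioned in this post; it starts a command in its own session with Setsid and then releases Go's handle to it.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// startDetached (a hypothetical helper) starts name with args in a new
// session, so the child is not tied to the caller's terminal or process
// group, and returns the child's PID.
func startDetached(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	// Setsid makes the child the leader of its own session (Unix-only).
	cmd.SysProcAttr = &syscall.SysProcAttr{Setsid: true}
	if err := cmd.Start(); err != nil {
		return 0, err
	}
	// Read the PID before releasing the handle.
	pid := cmd.Process.Pid
	// Release frees the resources Go holds for the process handle.
	// It does not wait on (reap) the child.
	if err := cmd.Process.Release(); err != nil {
		return 0, err
	}
	return pid, nil
}

func main() {
	pid, err := startDetached("sleep", "2")
	if err != nil {
		panic(err)
	}
	fmt.Printf("started detached child with PID %d\n", pid)
}
```

Note that the caller never waits on the child here, which matters for the caveat that follows.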
However, there is a caveat with detached child processes. If a detached child process is terminated (e.g. by SIGKILL) before its original parent has exited, and the parent never waits on it, it becomes a "zombie". Zombie processes consume almost no resources, but they still occupy an entry in the process ID table. Unless the original parent collects the child's exit status, the zombie is only disposed of when that parent exits.
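To make the zombie behavior concrete, here is a small Go sketch, assuming a Unix-like system with the true command on the PATH. The names inProcessTable and zombieDemo are made up for this illustration. It starts a short-lived child, gives it time to exit without waiting on it, and checks whether the kernel still lists its PID before and after the parent calls Wait.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// inProcessTable reports whether the kernel still has an entry for pid.
// Signal 0 performs the existence check without delivering a signal.
func inProcessTable(pid int) bool {
	return syscall.Kill(pid, 0) == nil
}

// zombieDemo starts a child that exits right away, does not wait on it
// immediately, and reports whether its PID is listed before and after Wait.
func zombieDemo() (before, after bool, err error) {
	cmd := exec.Command("true") // exits almost immediately
	if err = cmd.Start(); err != nil {
		return false, false, err
	}
	pid := cmd.Process.Pid

	// Give the child time to exit. Nobody has waited on it yet, so its
	// entry should linger in the process table as a zombie.
	time.Sleep(300 * time.Millisecond)
	before = inProcessTable(pid)

	// Wait collects the exit status, which lets the kernel drop the entry.
	err = cmd.Wait()
	after = inProcessTable(pid)
	return before, after, err
}

func main() {
	before, after, err := zombieDemo()
	if err != nil {
		panic(err)
	}
	fmt.Printf("listed before Wait: %v, listed after Wait: %v\n", before, after)
}
```

The entry only lingers while the parent is alive and has not waited on the child, which is why a long-running server that never waits accumulates zombies.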
Now, if you're trying to build a SaaS product where uptime is paramount, shutting down the server to periodically clean up zombie processes is not a viable solution.
At this point in my research, I decided that my goal was unachievable. I expressed my frustration to a co-worker, who gave some encouraging feedback. Still stumped, I took a few days to work on other things that needed my immediate attention. When I came back to the problem, I had a "d'oh!" moment.
I realized that I was misunderstanding how processes interact. In the example I mentioned previously, when a detached child process's top-level parent exits, the child is assigned to the init process. The same occurs when you have a nested child process. For example:
- Grandparent Process
  - Parent Process
    - Child Process
Here is where I was getting confused. I had assumed that when the parent process exits, the detached child process would be assigned to the grandparent process. I was wrong. No matter how deeply nested a detached child process is, when its parent exits, it is assigned to the init process. That makes it a top-level process. The grandparent and child processes then sit side by side, running independently:
- Grandparent Process
- Child Process
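The reparenting described above can be sketched in Go with a short-lived shell standing in for the middle "parent" process. This is only an illustration under stated assumptions (a Unix-like system with sh available); the helper spawnViaIntermediate is hypothetical and is not how the reap or reaper packages are necessarily implemented. The shell starts the command in the background, prints its PID, and exits, so the command is left without its original parent.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// spawnViaIntermediate (a hypothetical helper) runs command as a background
// job of a short-lived shell and returns the job's PID. The command string
// is handed to the shell as-is, so it must come from a trusted source.
func spawnViaIntermediate(command string) (int, error) {
	// The job's output is redirected so it does not hold our stdout pipe
	// open; otherwise Output would block until the job itself exits.
	// $! expands to the PID of the most recent background job.
	script := command + " >/dev/null 2>&1 & echo $!"

	// Output waits on the shell, so the intermediate process is reaped
	// here and does not linger as a zombie.
	out, err := exec.Command("sh", "-c", script).Output()
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(out)))
}

func main() {
	pid, err := spawnViaIntermediate("sleep 2")
	if err != nil {
		panic(err)
	}
	fmt.Printf("background process %d has outlived its intermediate parent\n", pid)
}
```

Because the caller is never the direct parent of the background process, it is not responsible for collecting that process's exit status; whichever process adopts it (normally init) does that.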
When I finally understood this, I went about creating the solution, after which I released the reap and reaper Go packages.
The reap package can be used to create intermediate process runners or, in the context of the previous example, a temporary nested-parent process.
The reaper package is an intermediate process runner built on the reap package. It's a simple implementation that stays out of the way and gets the job done.
Here's an example of using reaper:
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

func main() {
	// prepare a buffer, to which the PID will be written
	var stdout bytes.Buffer

	// prepare the command
	sleepCmd := exec.Command("reaper", "-cmd='sleep'", "-args='30'")
	sleepCmd.Stdout = &stdout

	// run the command
	if err := sleepCmd.Run(); err != nil {
		panic(err)
	}

	// convert the PID string to an integer, ignoring surrounding whitespace
	pid, err := strconv.Atoi(strings.TrimSpace(stdout.String()))
	if err != nil {
		panic(err)
	}

	fmt.Printf("Created a detached process with an ID of %d!\n", pid)
}
Compiling and running the above program, with reaper installed, will result in a detached process for the sleep command. The process stays around for the time provided (30 seconds in this example), even after its parent exits. It's also nice enough to clean up after itself when it's done. This is a simple example, but the runner could be used to detach any individual process.
For my needs, this works splendidly. If you're interested in using this, and would like to see the ability to detach multiple commands at once (e.g. sleep 30 && echo "something" | awk '{ print $0 }'), let me know!
That's all, enjoy!
Credits
Cover image by Jens Lelie on Unsplash! :D
Top comments (2)
Nice article. I'm not sure about Go, but why don't you check the server process's status from the child process and, if it's not running, simply terminate the child process? I guess this would also be a simple solution.
Howdy Mayur, thanks! Checking the status of the parent process is certainly something that can be done, but that's only possible in an ideal situation. Developing a service network in a decoupled way can increase fault tolerance and scalability. When developing for reliability, you want to reduce the damage that could be caused by a catastrophic failure. That is where detached processes can really shine. If a service network is built in this way, you could easily remove, add, or upgrade any individual service, and the network as a whole would continue to function properly with little to no data loss or effect on the end user.
Cheers!