There comes a time in every sysadmin's life when he needs to let his scripts take on a life of their own. They start working on big jobs for hours on end, and it becomes unreasonable to keep staring at the terminal to make sure everything is OK.
But, as any good helicopter parent will tell you, you can't trust your ~~kids~~ scripts to do the right thing! Your job is to bring them up correctly, point them in the right direction and ~~hope~~ make sure they make the right decisions. Here's a handy checklist to put the mental fun back in fundamental!
## 1. Be clear! Absolute paths everywhere.
One of the more common gotchas when writing scripts is assuming the script's startup directory. You can't afford to be vague like that! Don't let them guess the context, hard-code it in.
For example: you're testing your script locally, and it's looking great! You are now prepared to say "It works on my machine!" even after it breaks in production! But that won't save you once your script simply doesn't start from the crontab because of something mundane, like a bad log path.
Ways to approach this:

- Define the absolute paths as constants at the top of the script
- Pull the paths in from the environment
- Have the script work out its own directory at runtime (see the sketch below)
- Require the "start path" to be provided as an argument when running the script
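For the third option, here's a minimal bash sketch of the usual idiom for resolving the script's own directory; the log path is a made-up placeholder:

```bash
#!/usr/bin/env bash
# Resolve the directory this script actually lives in,
# no matter where it was started from.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# Build other paths off of it instead of guessing the working directory.
LOG_FILE="${SCRIPT_DIR}/logs/run.log"
echo "Running from ${SCRIPT_DIR}, logging to ${LOG_FILE}"
```

Now a cron job, a manual run from your home directory, and a CI runner all resolve the same paths.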
## 2. Keep tabs on them! Log everything.
So, while you're looking directly at the output as it's happening, it behaves wonderfully! Your child script is a well-behaved and productive member of its environment. Surely you can let them do things on their own?
WRONG
If you're not paranoid at this point, you're just unaware of all the things that can go wrong! From paths and permissions all the way to broken loops and undefined behaviour, there's a myriad of things that can break - and you should have a paper trail when they do.
The last thing you want is to debug a major outage by causing another one!
If you're running your script as a cronjob, it's easy to log the output:
```
0 9 * * * bash /var/scripts/check_messages.sh >/var/scripts/logs/check_messages.log 2>&1
```
The above will run the script once a day, but it writes to the same log file every time, overwriting it, so you only ever have the output of the most recent run. That's the least you can do. Also notice the absolute path for the log file as well! We're not messing around.
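If you'd rather keep some history, two small tweaks help (same hypothetical paths as above): append with `>>` instead of truncating, or stamp the file name with the date. Note that `%` is special in crontabs and has to be escaped as `\%`:

```
0 9 * * * bash /var/scripts/check_messages.sh >>/var/scripts/logs/check_messages.log 2>&1
0 9 * * * bash /var/scripts/check_messages.sh >/var/scripts/logs/check_messages_$(date +\%F).log 2>&1
```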
Alternatively you can have logging within the script itself. A few things to keep in mind:
- Use timestamps everywhere! You need to know when something happened (see the sketch after this list).
- If you think it's needed, include email alerts so you're notified about critical issues on time.
- If you're keeping most of your logs, use `logrotate`. You don't want to run out of disk space because of your logs.
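As a minimal sketch of in-script logging with timestamps (the log path and messages are invented for illustration):

```bash
#!/usr/bin/env bash
LOG_FILE="/var/scripts/logs/check_messages.log"  # hypothetical absolute path

log() {
    # Prefix every message with a timestamp so you know when it happened.
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

log "Starting message check"
# ... actual work goes here ...
log "Done"
```

And a matching logrotate drop-in (a hypothetical /etc/logrotate.d/check_messages) could look like:

```
/var/scripts/logs/*.log {
    weekly
    rotate 4
    compress
    missingok
}
```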
## 3. Don't assume that it won't fail! Supervise it.
Say your script is some kind of polling script, or something else that needs to run forever. That's not something you should be starting and stopping from the crontab. It's also not ideal to only get a notification when it dies - it should restart on its own as well.
Supervisord to the rescue!
This is an awesome tool you need to be using in these cases. It's like the best personal trainer out there! He never sleeps and just runs after your processes yelling "No slacking! Keep running!".
For example, I've recently had issues with the Docker daemon dying from time to time. Obviously, me jumping into the server to start it up again was unproductive, so I added this to the supervisord config file:
```
[program:dockerd]
command=dockerd
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/dockerd-supervisor.log
```
This tells it to supervise the `dockerd` process and to restart it when it dies. It also logs what's happening to a log file we can use for debugging later on.
And this works with your own scripts as well! It's also neat if you have a script that only runs for an hour and needs to start back up again - this way you avoid the potential same-process overlap you can get with cronjobs. A sketch of such a config follows.
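The same pattern works for a home-grown script; everything here except the supervisord option names is a made-up example:

```
[program:poller]
command=/var/scripts/poller.sh
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/poller-supervisor.log
```

Because `autorestart=true` restarts the program whenever it exits, a script that deliberately exits after an hour simply gets started again - and never as a second overlapping copy.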
## 4. Have a way to stop them! Teach them the signals.
Eventually, you'll want to be able to tell the fruit of your ~~loins~~ keyboard to hold on a second while you reconsider what's been happening. If you don't plan for this ahead of time, you might end up sending hard-kill signals like `SIGKILL`, which can have nasty consequences.
Bad things that can happen when you kill a script midway:
- File permissions stay wrong
- Corrupted files or database data
- Orphan processes and half-processed data
- Unreleased resources
To avoid the above, you should put in a `SIGTERM` handler. You can do this in almost any scripting language; I even have it working in my PHP scripts.
This way the script can finish what it's doing: process its current batch of data, finish writing that CSV file line, and release all the resources it's using. So when you run `killall -s SIGTERM dockerd`, you let it finish what it's doing before it stops.
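Here's a minimal bash sketch of such a handler; `process_next_batch` and `cleanup_resources` are hypothetical stand-ins for your actual work:

```bash
#!/usr/bin/env bash
# On SIGTERM (or Ctrl+C), flip a flag instead of dying on the spot.
KEEP_RUNNING=true
trap 'KEEP_RUNNING=false' SIGTERM SIGINT

while $KEEP_RUNNING; do
    process_next_batch   # hypothetical: one safe, resumable unit of work
done

cleanup_resources        # hypothetical: close files, release locks, etc.
echo "Shut down cleanly"
```

Bash only runs the trap once the current foreground command has finished, so the in-flight batch always completes before the loop checks the flag.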
Also, minor side-note: if you have `sleep()` code in your script, the sleeping will be skipped to speed things up after it receives the `SIGTERM` signal.
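In a bash script this takes a small trick, because a plain foreground `sleep` delays the trap until the nap is over. A common workaround (a sketch, not something from the original post) is to background the sleep and `wait` on it, since `wait` is interrupted by trapped signals:

```bash
# Interruptible sleep: SIGTERM wakes the script up immediately instead of
# letting it doze through the remaining 60 seconds.
sleep 60 &
wait $!
```

Speaking of sleeping...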
## 5. Make sure they're not selfish! Don't overload the system.
The great (and terrible) thing about your scripts running is that they will try to get the most out of the resources available. So if they're transferring a 1TB file over the network, you'll wish you'd changed the WiFi password!
They need to realize that they're not the only one in the ~~flat~~ server, and that they need to clean up after themselves as well as not hog the utilities.
There are a few guidelines you can follow to make sure they play nice:

- When transferring over the network, set upper limits. For example, `rsync` has the `--bwlimit` parameter for this (see the sketch after this list).
- When bombarding the database with queries, give it some leeway - sleep the script from time to time.
- When writing heavily to the disk, make sure it's not hogging the disk that the database uses.
- If possible, have it run during off-peak hours, and monitor the server performance.
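A couple of hedged examples for the first and third points; the hosts and paths are invented, but `--bwlimit`, `nice` and `ionice` are real:

```bash
# Cap rsync at roughly 5 MB/s (the value is in KiB/s) so the transfer
# doesn't saturate the link.
rsync -av --bwlimit=5120 /data/big_dump.tar.gz backup@backup-host:/backups/

# Run a disk-heavy script with low CPU and I/O priority so the database
# on the same machine keeps breathing.
nice -n 19 ionice -c2 -n7 /var/scripts/reindex_files.sh
```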
## Summary
So that's my list of "script" lessons I've learned the hard way - so you don't have to! Do share your tips in the comments as well, and point me (and others) to other nice resources on the subject.
You can never be too careful with them!
Mandatory disclaimer: I do not condone helicopter parenting when it involves real children as opposed to child processes.
## Top comments (7)
Hi! Thanks for sharing. Your post is about how to write quality scripts, and yes, the things you've mentioned are important.
The other thing not covered here is that many scripts are written for a one-time purpose and are hard to reuse (maybe not all scripts should be reused, but anyway...). One of the ideas I try to reinforce through my open-source project SparrowHub is reusable scripts.
Best
Alexey
Hey man - thanks for sharing your project! I love the idea already; it would have saved me some time in the past.
Will definitely look into it, and maybe make some of my scripts re-usable and post them there.
You are welcome, Nick.
I love this post. It makes me laugh, but is also super useful. Thanks!
Thanks for the praise! Mission accomplished then :D
Good tips! I've actually been planning to create a Python script daemon inside a low-cost router (using OpenWrt); with tips such as yours, it could save the whole mission! So thank you for sharing your experience, and thanks to Google stories for picking up yours!
Glad you found it useful!
I came up on Google stories - awesome :D