Not every company has racks of servers to devote to a single web property. While we're experiencing record numbers of sessions and pageloads on OMBE.com at present, we still haven't bitten the bullet and shelled out for load-balancing or a CDN, with the attendant headaches of moving everything from a single-server setup to multi-server-ready.
So, when we began experiencing load spikes without corresponding traffic spikes, we had some investigation to do.
What we found was that we had several labor-intensive scripts running from cron(8), doing DB maintenance and the like (such as generating static HTML from a large dataset) ... and they were all contending for system resources. Here's what our crontab file used to look like (and this is what a LOT of examples of crontab files look like):
#min hr day month wkday command
*/5 * * * * /etc/scripts/kill_lockfile
5 4 * * * /etc/scripts/dump_db
30 1 */2 * * /etc/scripts/make_sitemaps
0,30 * * * * /etc/scripts/make_all_filters
*/7 * * * * /etc/scripts/fix_brandnames
55,25 * * * * /etc/scripts/fix_foo
48,18 * * * * /etc/scripts/fix_bar
35,5 * * * * /etc/scripts/create_baz
.... and many more ...
So, make_sitemaps is DB-intensive, and so are dump_db, make_all_filters, fix_brandnames, and probably some other "fix_this" scripts. God forbid they ever run at the same time ... but sooner or later, with a crontab(5) like this, they will, and they did. It slowed pages to a virtual crawl ... and that's not good for your visitors.
So, what's the answer? Simplify!
Here's what our crontab(5) looks like now:
#min hr day month wkday command
*/5 * * * * /etc/scripts/frequent
59 * * * * /etc/scripts/hourly
02 0 * * * /etc/scripts/daily
Each of these scripts is a wrapper that calls, in sequence, all the jobs we want to run daily, hourly, or frequently. So, no two jobs ever run concurrently.
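The wrappers themselves can be nothing fancy. Here's a minimal sketch in shell (the run_jobs_in_sequence helper is a hypothetical name; the real script would pass /etc/scripts/dump_db, /etc/scripts/make_sitemaps, and so on instead of the placeholder commands):

```shell
#!/bin/sh
# Sketch of a sequential wrapper like daily. Each job must exit
# before the next one starts, so the heavy scripts can no longer
# contend with one another.
run_jobs_in_sequence() {
    for job in "$@"; do
        "$job" || echo "warning: $job failed" >&2   # log and keep going on failure
    done
}

# In the real script: run_jobs_in_sequence /etc/scripts/dump_db /etc/scripts/make_sitemaps ...
run_jobs_in_sequence true true
```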
Now, how do we keep frequent from running when hourly is running? We create and hold a lockfile in /tmp.
Here's the top of daily --- PLEASE NOTE: this is pseudo-code that resembles the love-child of PHP & Shell (perhaps with a little Perl, too); you'll want to implement it in your language of choice.
$lockfile = '/tmp/cronlock';
while (file_exists($lockfile)) {
    sleep(15); // someone else holds the lock; check again shortly
}
file_put_contents($lockfile, "daily"); // claim the lock for this run
At the end of daily we erase our lockfile.
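One caveat worth flagging: the check-then-create dance has a tiny race window, since two wrappers can both see no lockfile and then both write one. If that worries you, mkdir acquires a lock atomically. A sketch of that variant in shell (the lock directory path is an assumption for illustration, not our actual setup):

```shell
#!/bin/sh
# Atomic variant of the lockfile dance: mkdir either creates the
# directory (lock acquired) or fails (someone else holds it), with
# no gap between the check and the create.
lockdir="/tmp/cronlock.$$.d"   # hypothetical path; $$ keeps this demo from colliding
until mkdir "$lockdir" 2>/dev/null; do
    sleep 15   # another wrapper holds the lock; try again shortly
done
# ... run the daily jobs here ...
rmdir "$lockdir"   # release the lock when we're done
```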
hourly looks much the same, although we put a little insurance at the top in case our hourly jobs take ... a full hour:
// check for the lockfile; sleep if it exists
$lockfile = '/tmp/cronlock';
if (file_exists($lockfile) && file_get_contents($lockfile) == "hourly") {
    // we are waiting on an instance of ourself!
    exit;
}
while (file_exists($lockfile)) {
    sleep(60); // wait for the lockfile to disappear; check every minute
}
file_put_contents($lockfile, "hourly"); // claim the lock under our own name
frequent is a little different; since it runs several times an hour, we don't sleep if the lockfile's present. We just exit ... after all, cron will run our script again in 5 minutes!
$lockfile = '/tmp/cronlock';
if (file_exists($lockfile)) {
    exit(); // no waiting; cron will try again in 5 minutes
}
So, now we have all our maintenance scripts playing nice with each other, running in sequence and taking turns. The server's load average may still spike, but it will be because of traffic, not because of our maintenance jobs.
Let me know if this is helpful, and have a great WEYTI!