Bash script to check timestamp on cron logfile
Jun 30, 2010
For one of my current clients we've been making some changes to a script that runs nightly via cron. Things like moving over to using bundler, no longer running it via script/runner to avoid loading the full Rails stack, and a couple of other changes have introduced the risk that a dependency may not load in production, or may not be available in the context/user the script is run as. Having been caught out once, it's time to put some monitoring on the sucker to make sure it doesn't happen again.
Redirecting script output to a logfile
For most cron jobs we pipe the output to a log file somewhere to help keep our inboxes sane. It also means we can then monitor the log for events and fire off an appropriate message (IM/jabber, email, SMS) depending on the severity. So we redirect STDERR to STDOUT, and then append that to a file:
/path/to/myscript >> /var/log/myscript 2>&1
In addition to that though, we want to know the exit code for the last time it ran, so I'll append that to the end of the command:
/path/to/myscript >> /var/log/myscript 2>&1; echo $? > /var/log/myscript-error-code
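In context, the crontab entry ends up looking something like this (the 2am schedule here is just for illustration):

# m h dom mon dow command
0 2 * * * /path/to/myscript >> /var/log/myscript 2>&1; echo $? > /var/log/myscript-error-code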
Returning the exit code
We're using Zabbix to monitor services, so what I need is a means of raising an error if the exit code from the last run was not zero. That could fairly trivially be done by simply doing cat /var/log/myscript-error-code, but I know we're going to want to extend it just a little, so I'm going to put it in a shell script:
#!/bin/bash
LOG_DIR=/var/log
ERROR_CODE_FILE=myscript-error-code
cat $LOG_DIR/$ERROR_CODE_FILE
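Assuming that's saved somewhere like /usr/local/bin/check_myscript.sh (a path I've made up for the sake of example), using it is just:

chmod +x /usr/local/bin/check_myscript.sh
/usr/local/bin/check_myscript.sh   # prints the exit code from the last run, e.g. "0"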
Checking that the script has been run in the past 24 hours
There are a number of servers this codebase is deployed to, but we only want it to run on one of them nightly. That also introduces the risk that the servers won't know which one should run the script, and as a result it doesn't run at all. To mitigate against that, we want to make sure the file holding the exit code is no more than 24 hours old:
MINS_SINCE_UPDATE=-1500
if [[ -z $(find $LOG_DIR -cmin $MINS_SINCE_UPDATE -name $ERROR_CODE_FILE) ]];
then
echo "Error"
fi
I've used the cmin option because using atime is problematic for monitoring things that should occur in less than a day, and I want to be confident in the difference between it running 23 hours ago and 25 hours ago. The -z option tests whether the find command returned an empty response.
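As a quick illustration of that test (using a throwaway file name that shouldn't match anything):

if [[ -z $(find /tmp -name no-such-file) ]]; then
  echo "Error"   # find matched nothing, so the substitution is empty and -z is true
fi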
Calculating the difference between now and the file timestamp
So that I can immediately tell whether the problem lies with cron/scheduling or with the script itself, I'm going to make the monitoring send a more meaningful error when it's a scheduling problem:
LAST_MODIFIED=`stat -c %y $LOG_DIR/$ERROR_CODE_FILE | cut -f1 -d '.'`
HOUR_SINCE=$(((`date +%s` - `date -d "$LAST_MODIFIED" +%s`)/3600))
echo "Import last run $LAST_MODIFIED ($HOUR_SINCE hours ago)"
Here I'm using stat to return the date the file was last updated. I pass that into date to convert it to seconds since the epoch, and subtract it from the current seconds since the epoch. Divide that difference by 3600 and you get how many hours it has been since the file was updated.
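A quick worked example of the arithmetic, with illustrative epoch values:

NOW=1277884801     # illustrative "current" time in epoch seconds
LAST=1277773201    # illustrative last-modified time, 111600 seconds earlier
echo $(( (NOW - LAST) / 3600 ))   # prints 31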
Putting it all together
So the final script looks something like this:
#!/bin/bash
LOG_DIR=/var/log
ERROR_CODE_FILE=myscript-error-code
MINS_SINCE_UPDATE=-1500
if [[ -z $(find $LOG_DIR -cmin $MINS_SINCE_UPDATE -name $ERROR_CODE_FILE) ]];
then
echo "Error"
LAST_MODIFIED=`stat -c %y $LOG_DIR/$ERROR_CODE_FILE | cut -f1 -d '.'`
HOUR_SINCE=$(((`date +%s` - `date -d "$LAST_MODIFIED" +%s`)/3600))
echo "Import last run $LAST_MODIFIED ($HOUR_SINCE hours ago)"
else
cat $LOG_DIR/$ERROR_CODE_FILE
fi
All that is left is to plug it into Zabbix, and hope that it never fails again.
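On the Zabbix side that's just a UserParameter in zabbix_agentd.conf pointing at the script, something along these lines (the item key and path here are made up):

UserParameter=myscript.last_run,/usr/local/bin/check_myscript.sh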