MySQL does not yet provide builtin functionality to work with database jobs. But in most cases, this is not really a problem. At least, not one that can't be solved. In most cases, some generic external scheduling tool like Cron (Linux) or Scheduled Tasks (Windows) can be used to do this.
Although these tools are probably sufficient, I decided that it would be fun to build a scheduler in the MySQL stored procedure language (Yep, I'll be calling that MySPL again). For one thing, I was just curious whether it would be possible.
I mean, all the ingredients are there really:
- we can build tables to hold the information that comprises a schedule and some definition of the actual work that needs to be done by the job
- we can build procedures that access data from these tables, and have the flow of control be influenced by this data
- as of the 5.0.12 version, we've got the
SLEEP()
function, wich is really important to implement recurrence. It's a bit tucked away inside the docs, but it's there allright! - as of the 5.0.13 version, we can use the prepared statement syntax from within MySPL
For those that are interested in all the details, there's an installation script available for download right here. It contains the bare schedular, and a little API as well. This little API makes it quite easy to define a job or alter it's characteristics. There's no documentation yet, but I hope this blog post can get you started. I am planning on writing some examples and documentation in the near future. I did include COMMENTs, so that should be of some help.
How do these pieces work together?
We can set up a table to hold the data that defines the actual work as well as the data associated with actually scheduling job execution. This information can be used by a particular procedure, the job runner. The job runner can be called, passing it a reference to a particular job. Then, the job runner takes care of scheduling and executing the actual work associated with the job.
Basically, the job runner implements a loop that uses the prepared statement syntax to perform the actual work. For each such individual run, the
CURRENT_DATETIME()
for start and finish are recorded, and this is used to have the SLEEP()
function suspend the loop until the next execution is due. The scheduling information is retrieved anew for each cycle, so job execution can be influenced by changing the schedule data while the job runner is running the job.Now there's something about this job runner mechanism that needs to be clarified right away. The job runner is just an ordinary procedure that uses the information in the
job
table. So, unlike the case with Oracle database jobs, there's no background process that automatically takes care of running jobs once we define them in the job
table. We need to first define them and then call a job runner to actually run the job.Because there's no way to call the job runner procedure asynchronously, the client that's executing a job runner will block until the entire job schedule has finished.
However, once a job runner is running, it will notice particular changes made to the job schedule. So, the timewindow wherein the job is supposed to run can be manipulated, as well as the interval between job cycles. We can also have the job runner inspect a status column. With these devices, we can instruct a running job gracefully (that is, without killing the thread running the job) from inside another thread, simply by updating data in a table.
The job
base table
We'll start by The Most Important Thing First, and that's, yep you guessed it right: the data. We are going to need data to define the characteristics of a job, that is, when it should become active, what it's name is, what work it needs to do...that stuff. Once we're running the job, we will need status information: data to see if a particular job is done etc.
Here's what I came up with:
delimiter go
drop table if exists job
go
create table job (
job_id
int unsigned
NOT NULL
auto_increment
COMMENT 'Job identification for reference purposes'
, job_owner
varchar(77)
NOT NULL
COMMENT 'The name of the user that owns this job'
, job_name
varchar(129)
NOT NULL
COMMENT 'The per-owner unique name of this job.'
, job_description
varchar(1024)
COMMENT 'A description of this job.'
, job_procedure_schema
varchar(64)
NOT NULL
COMMENT 'The schema in which the job procedure resides.'
, job_procedure_name
varchar(64)
NOT NULL
COMMENT 'The name of the job procedure.'
, job_schedule_start
datetime
COMMENT 'If applicable, the start of the time window for which the job is scheduled.'
, job_schedule_finish
datetime
COMMENT 'If applicable, the end of the time window for which the job is scheduled.'
, job_schedule_interval
int unsigned
NOT NULL
COMMENT 'The number of seconds between starts of job execution instances'
, job_status
enum(
'ABORTED'
, 'DISABLED'
, 'ENABLED'
, 'ERROR'
, 'FINISHED'
, 'PAUSED'
, 'RESUMED'
, 'RUNNING'
, 'SLEEPING'
)
NOT NULL
default 'ENABLED'
COMMENT 'The current status of the job'
, job_runner_connection_id
int unsigned
COMMENT 'The CONNECTION_ID of the connection executing the runjob procedure for this job.'
, job_runner_start
datetime
COMMENT 'The last time a jobrunner was started for this job'
, job_runner_finish
datetime
COMMENT 'The last time a jobrunner finished this job'
, job_runner_cycles
int unsigned
COMMENT 'The number of cycles the jobrunner successfully performed since jobrunner_start'
, job_runner_last_cycle_start
datetime
COMMENT 'The last time a job cycle started'
, job_runner_last_cycle_finish
datetime
COMMENT 'The last time a job cycle finished'
, constraint pk_job primary key (
job_id
)
, constraint uk_job1 unique (
job_owner
, job_name
)
)
ENGINE=MyIsam
DEFAULT CHARSET=latin1
COMMENT='A table that schedules stored procedure execution'
go
Although you could put a lot more effort in defining table structures to make up a nice complete scheduler, I decided that a simple structure would do for most of the cases. The following brief explanation of the columns is meant to draw a general outline. The table contents are to be manipulated using a little API of MySPL procedures.
job definition
A job belongs to a particular database user, the
job_owner
. A job also has a job_name
, wich uniquely identifies the job per owner. Optionally, a job_description
can be entered so that an administrator or developer can read what the job is supposed to do. Also, an job_id
was added to make it easy to reference a particular job.The actual work that's performed by the job is represented as stored procedure. So, in order to define a job, you will first have to write yourself a (parameterless) procedure that does all the actual work. Then, you can enter the schema and the name of this procedure in the
job_procedure_schema
and job_procedure_name
columns of the job
table.Note that there's really no technical reason to require a stored procedure for this.
In fact, any statement that can be executed from within the job runner procedure using the prepared statement syntax would do equally well. However, I chose to require a stored procedure because I think it's nicer that way. It ensures that there's a clear seperation between the job schedule definition and the actual work performed within it.
job scheduling
We still need some information to actually schedule job execution. Therefore, we have the
job_schedule_start
and job_schedule_end
columns.Together, these define a timeframe wherein the job is supposed to be automatically executed. The
job_schedule_interval
determines the frequency of execution. Right now, scheduling is very primitive. Because you can only schedule using a fixed interval, it is impossible to schedule things at, say, every first of the month. That's because a month is not a fixed period of time. Oracle jobs use an expression that's evaluated at runtime to determine when the next execution will start. This is something I'm thinking of implementing, but for now, we're stuck with fixed intervals.
running the schedule
The other columns provide information about the current job runner executing the job.
Some are there for monitoring purposes, but some others are used to actully maintain the job runner's state. Most of these columns are updated each time a loop cycle done by the job runner starts or finishes the procedure associated with the job.
The
job_status
provides information about the state of the job: wheter it needs to be started, wheter is is currently executing the job procedure, and more.Most of the available statuses are automatically provided by the job runner once it's running a job. The
job_runner_connection_id
identifies the thread currently running the job. This is used internally to prevent different job runner instances from running the same job. This can also be useful to kill the job.The
job_runner_start
and job_runner_finish
record the start and finish of the execution of the job runner procedure itself. The job_runner_cycles
contains the number of cycles that the job runner has gone through since it was started. The job_runner_cycle_last_start
and job_runner_cycle_last_finsished
columns are used to record when the last cycle started or finished. The job$run
procedure
Once you scheduled jobs by entering data into the
job
table, you can invoke the job runner to actually execute the job according to schedule. I've implemented this in the procedure job$run
. What happens in this procedure is this:- We try to get a lock for this job runner. This prevents having more than one job runner executing the same job simultaneously
- If we succeed in acquiring the lock, we retrieve some data describing the job. We use this data decide wheter the job is eligable for running. For example, it could be the case that the
job_schedule_finish
is earlier than the current date and time, wich makes the job not eligable for running. - If we decide the job is eligable for running, we could have to wait in case the current date and time are earlier than the
job_schedule_start
. THis is achieved with theSLEEP()
function - Once we reach the
job_schedule_start
date and time, we can start cycles executing the job procedure specified by thejob_procedure_schema
andjob_procedure_name
columns. Actual execution is performed by thePREPARE
andEXECUTE
syntax. - Cycling will occur at the frequency specified by
job_schedule_interval
. Again, theSLEEP()
is used. This time, we need toSLEEP()
through the period between the finish of theEXECUTE
and start ofEXECUTE
plusjob_schedule_interval
seconds - For each cycle, some condition could arise that should terminate cycling. For example,
job_schedule_finish
could be reached, orjob_status
may be updated by another thread to actively initiate termination.
delimiter go
drop procedure if exists job$run
go
create procedure job$run(
-- the name of the job to run
-- this must correspond to an existing row in the job table
-- for which current_user() = job.job_owner
-- and for wich the p_job_name = job.job_name
in p_job_name varchar(64)
)
comment 'Activates a previously defined stored procedure execution schedule.'
BLOCK_MAIN: begin
declare JOBSTATUS_ABORTED varchar(7)
default 'ABORTED';
declare JOBSTATUS_ENABLED varchar(7)
default 'ENABLED';
declare JOBSTATUS_ERROR varchar(5)
default 'ERROR';
declare JOBSTATUS_FINISHED varchar(8)
default 'FINISHED';
declare JOBSTATUS_PAUSED varchar(6)
default 'PAUSED';
declare JOBSTATUS_RUNNING varchar(7)
default 'RUNNING';
declare JOBSTATUS_SLEEPING varchar(8)
default 'SLEEPING';
declare v_message varchar(1024);
declare MESSAGE_JOB_LOCKED varchar(1024)
default 'Job already locked by connection with id: ';
declare MESSAGE_ERROR_LOCKING_JOB varchar(1024)
default 'An error occurred while locking the job';
declare MESSAGE_JOBSTATUS_NOT_ENABLED varchar(1024)
default 'Current job status not ''ENABLED''';
declare MESSAGE_SCHEDULE_HAS_EXPIRED varchar(1024)
default 'The schedule has expired';
declare MESSAGE_INTERRUPTED varchar(1024)
default 'Interrupted while sleeping';
declare MESSAGE_NO_SUCH_JOB varchar(1024)
default 'No such job';
declare MESSAGE_PROCEDURE_FAILED varchar(1024)
default 'The job procedure raised an exception';
declare MESSAGE_JOB_ABORTED varchar(1024)
default 'Job aborted at external request';
declare MESSAGE_OK varchar(1024)
default 'Ok';
declare v_lock_status tinyint unsigned;
declare v_locked_by int unsigned;
-- get a lock to prevent simultaneous execution of the same job
call job$get_lock(
p_job_name
, v_lock_status
, v_locked_by
);
if v_lock_status = 1 then
-- Ok, we're now the one and only job runner for this job
BLOCK_LOCK_ACQUIRED: begin
declare v_current_timestamp timestamp
default current_timestamp();
declare v_job_id int unsigned;
declare v_job_schedule_start datetime;
declare v_job_status enum(
'ABORTED'
, 'DISABLED'
, 'ENABLED'
, 'ERROR'
, 'FINISHED'
, 'PAUSED'
, 'RUNNING'
, 'SLEEPING'
);
declare exit handler for not found
set v_message := MESSAGE_NO_SUCH_JOB;
-- retrieve some general data and decide whether this job is eligble to run
select j.job_id
, coalesce(
j.job_schedule_start
, v_current_timestamp
)
, case
when j.job_status != JOBSTATUS_ENABLED
then MESSAGE_JOBSTATUS_NOT_ENABLED
when j.job_schedule_finish < v_current_timestamp
then MESSAGE_SCHEDULE_HAS_EXPIRED
else MESSAGE_OK
end as message
, concat(
'CALL'
, ' '
, '`',j.job_procedure_schema,'`'
, '.'
, '`',j.job_procedure_name,'`'
, '()'
)
into v_job_id
, v_job_schedule_start
, v_message
, @job$run
from job as j
where j.job_owner = current_user()
and j.job_name = p_job_name
;
if v_message = MESSAGE_OK then
-- Yes, this job is eligble to run
BLOCK_ENTER_RUNJOBS: begin
declare v_sleep int unsigned;
declare v_job_schedule_interval int unsigned;
declare v_job_schedule_finish datetime;
declare v_job_runner_cycles int unsigned
default 0;
declare exit handler for not found
begin
set v_message := MESSAGE_NO_SUCH_JOB;
set v_job_status := JOBSTATUS_ERROR;
end;
-- preparing the job procedure
prepare stmt_jobproc
from @job$run
;
-- notify the outside we're running the job
-- it's just for monitoring purposes anyway
-- the lock is supposed the provide the actual safety
update job
set job_status = JOBSTATUS_RUNNING
, job_runner_connection_id = connection_id()
, job_runner_start = v_current_timestamp
, job_runner_finish = NULL
, job_runner_cycles = NULL
, job_runner_last_cycle_start = NULL
, job_runner_last_cycle_finish = NULL
where job_id = v_job_id
;
-- check if job is scheduled in the future in order to wait for it
if v_job_schedule_start > v_current_timestamp then
update job
set job_status = JOBSTATUS_SLEEPING
where id = v_job_id
;
-- sleep until job cycling is due
set v_sleep := timestampdiff(
SECOND
, v_current_timestamp
, v_job_schedule_start
);
if sleep(v_sleep) = 1 then
set v_message := MESSAGE_INTERRUPTED;
set v_job_status := JOBSTATUS_ERROR;
leave BLOCK_ENTER_RUNJOBS;
end if;
end if
;
-- this is the main loop
LOOP_RUNJOBS: loop
-- renew schedule data, it might've changed while we where asleep
select job_schedule_finish
, job_schedule_interval
, job_status
, current_timestamp()
into v_job_schedule_finish
, v_job_schedule_interval
, v_job_status
, v_current_timestamp
from job j
where j.job_id = v_job_id
;
-- decide wheter to finish or to continue cycling
if v_current_timestamp > v_job_schedule_finish then
set v_job_status := JOBSTATUS_FINISHED;
leave LOOP_RUNJOBS;
elseif v_job_status = JOBSTATUS_ABORTED then
leave LOOP_RUNJOBS;
else
-- skip the actual work when we're pausing
if v_job_status != JOBSTATUS_PAUSED then
update job
set job_status = JOBSTATUS_RUNNING
, job_runner_last_cycle_start = v_current_timestamp
, job_runner_cycles = v_job_runner_cycles
where job_id = v_job_id
;
-- calling the job procedure
BLOCK_EXECUTE_JOBPROC: begin
declare exit handler for sqlexception
begin
set v_job_status := JOBSTATUS_ERROR;
set v_message := MESSAGE_PROCEDURE_FAILED;
leave LOOP_RUNJOBS;
end;
execute stmt_jobproc;
end BLOCK_EXECUTE_JOBPROC
;
set v_job_runner_cycles := v_job_runner_cycles + 1
;
update job
set job_status = JOBSTATUS_SLEEPING
, job_runner_last_cycle_finish = current_timestamp()
, job_runner_cycles = v_job_runner_cycles
where job_id = v_job_id
;
end if;
-- calculate the amount of sleep until the next cycle's due
set v_sleep := timestampdiff(
SECOND
, current_timestamp()
, date_add(
v_current_timestamp
, interval v_job_schedule_interval SECOND
)
);
if v_sleep > 0 then
if sleep(v_sleep) = 1 then
set v_message := MESSAGE_INTERRUPTED;
set v_job_status := JOBSTATUS_ERROR;
leave BLOCK_ENTER_RUNJOBS;
end if;
end if;
end if;
end loop LOOP_RUNJOBS;
end BLOCK_ENTER_RUNJOBS
;
deallocate prepare stmt_jobproc
;
update job
set job_status = v_job_status
, job_runner_finish = current_timestamp()
where job_id = v_job_id
;
end if;
end BLOCK_LOCK_ACQUIRED
;
-- release the lock so another job runner can run the job
call job$release_lock(
p_job_name
, v_lock_status
, v_locked_by
);
elseif v_lock_status = 0 then
-- Rats! some other job runner was here first
set v_message := concat(MESSAGE_JOB_LOCKED,v_locked_by);
else
-- Whoops! an error occurred aqcuiring the lock
set v_message := MESSAGE_ERROR_LOCKING_JOB;
end if
;
-- Let the caller now what happened here
select p_job_name as job_name
, v_message as message
;
end BLOCK_MAIN
;
go
I used the bold typeface to highlight the usage of
SLEEP()
and the elements that are related to the prepared statement syntax. Locking is implemented using
job$get_lock
and job$release_lock
, wich use the builtin GET_LOCK()
and RELEASE_LOCK()
functions. Just download install_job.mysql to see how it works exactly.The rest is pretty straight-forward: just nesting and looping.
20 comments:
I think the fact that it works is really good - however most scheduling is not done in isolation.
Jobs are distributed around the enterprise, on different architectures, in various languages with complex dependancies on each other.
I have written a distributed scheduler with each node responsible for jobs on its own host, with a central repository available for reporting. It is all java-based, and I want to kick off an open source project to take development further, but have no idea where to start! Any ideas anyone?
Hi anonymous,
Of course, a solution like this can never solve these distributed jobs problems.
However, in my work, I have encountered quite a lot of situations where database jobs like these came in quite handy: loading a staging area or datawarehouse overnight, recording database performance data throughout the day, dumping flatfile text reports each week.
When the situation allows it, I think that keeping all the information about the job inside the database has some advantages over an external tool. For example, dumping the database also dumps the jobs.
Concerning your project - sounds like you could make a lot of people happy. I have zero experience in organizing an open source project, but it seems sensible to take a look around at
http://sourceforge.net/
Even if you do not want to make it available there, you can get some inspiration and ideas there.
Thanks, I'll look around at sourceforce.
As for automatically getting the jobs backed up with the database, given its java roots it may not surprise you that my scheduler stores all its info in a JDBC datasource - in my case, MySQL! So its already being backed up...
Hi!
In 5.1 we have the following support:
SCHEDULE
[ schedule-name ]
{ START TIME start-time | BETWEEN start-time AND end-time }
[ EVERY period { HOURS | MINUTES | SECONDS } ]
[ ON { ( day-of-week, ... ) | ( day-of-month, ... ) } ]
[ START DATE start-date ]
So we are looking at scheduled event support directly in the database.
Cheers,
-Brian
Well, what can I say?
Let me start by saying that I'm really glad there are some guys and galls out there that are taking the time to give at least some *REAL* feedback on this blog post (or this blog for that matter)...so far, I've seen lot's of bots advertising using the kind of "Hey, what a great blog you got there, check out my XXX-site" comments.
Second, let me repeat some phrases from my post:
"MySQL does not yet provide builtin functionality to work with database jobs. But in most cases, this is not really a problem...In most cases, some generic external scheduling tool...can be used to do this.
...these tools are probably sufficient, I decided that it would be fun to build a scheduler in the MySQL stored procedure language...I was just curious whether it would be possible.
"
So here it is: when it comes to some real scheduling problem, I would not be presenting this as The Solution or whatever. I was just playing around when I came up with this job thing. It started with a single procedure job runner procedure, just to have a legitimate reason for me to put my teeth into MySQL dynamic sql, and, to a lesser extend, the SLEEP() function. I put the rest of the API together in "no-time" to make it look and feel like a convincing whole.
So, anonymous (I still dont know your name), I wish you all the best with your open source project. If you feel like it, drop a comment here with the particulars so I can follow your project's development.
And Brian (I'm guessing you are an employee of MySQL AB): this syntax looks very promising! You guys keep surprising me. I am really, really considering to advise my customers of using MySQL as a database solution instead of Oracle or MS SQL.
Just a question Brian, will it be possible to specify an expressions for intervals etc? This is IMO one of the very cool features about Oracle jobs.
Hi!
Yes, this is the Brian who works for MySQL AB :)
And I do believe that you can specify intervals in the syntax. I pointed this blog entry to the developer who wrote the code, and hopefully he will respond.
Cheers!
-Brian
Hi,
I am relatively new to MySQL and working on version 5.0.
Can anyone please tell me that how can I implement SCHEDULING in this version MySQL.
thanks.
Hi!
Just download
http://www.xcdsql.org/MyJOB/install_job.mysql
here should be a read me in there that explains it.
Let me know if you need more help.
Hi,
I have some problem installing from GUI Tools on 5.0.67:
Script line: 216 This function has none of DETERMINISTIC, NO SQL, or READS SQL DATA in its declaration and binary logging is enabled (you *might* want to use the less safe log_bin_trust_function_creators variable)
Script line: 698 LEAVE with no matching label: LOOP_RUNJOBS.
Then, the script does not install job$get_lock_name and job$run.
hi, can you tell me how to use this jobs, what i'm suppose to do to make that jobs work, because i really need this scehduling execution for my company project to use job. I'm clueless after install it.
thank you for your help
@itrade: these problems are now fixed, thanks for letting me know.
@anonymous: there is now an example in the installation script. The general pattern is that you first create a stored procedure that executes the work you want to have scheduled. Then you call: job$create or job$create_and_schedule to tell the job runned when to start running the job, and how often, and optionally when the job should stop. Finally, you call job$run. job$run will then keep the connection occupied until the schedule ends (or is aborted)
Hi Roland,
First of all, thanks for posting this very useful job scheduling solution.
After successfully running the install_job I decided to give it a try.
I ran job$create_and_schedule followed by job$run. However I ended up getting a 'No such job' message.
Your help would be appreciated. Thanks.
Hi Roland,
I've successfully copy your Scheduling Store Procedures/Functions. Thanks a lot.
Now I've a question. When I run a schedule, about 24hours and interval is 5 second, is it the MySQL will take many resource or make the database run abnormal or hang?
its really very very userfull post .. i am really thankful for all
I know it is a far stretch to get an answer after all this time but...
I am trying to get this working under MySql 5.0(my host won't upgrade to a newer version...), and there are two problems. First, running the script for making the job$drop procedure returns a syntax error for around the line
declare v_message varchar(1024);
There aren't any more details regarding the error.
The second problem is really the biggest issue - when I try to run a job with job$run, the script tells me that the stored procedure raised an exception - but would not tell me what the exception is. What is worse is that I tried it with a very bare bones procedure that basically does nothing - and again, an exception and no idea what it is or how to fix it.
I would be very grateful for any help, even though it is unlikely. I really need this to work.
Hi Kiril!
thanks for your interest in this post. I'm a bit rusty but I'll do my best to help as goood as I can.
Regarding problem #1, can you please post the *exact* error message? It just works for me so i have no idea what to look for...
As for problem #2, can you please send me the code you run to create your procedure, as well as the calls to the job$ API that you use to schedule it?
Also, does the simple example in the comment section that heads the script work for you or not?
kind regards,
Roland
Hi Roland,
It is a really nice surprise to receive a response so fast. Thanks a lot for your help!
#1 Here is the full error message that I receive
[SQL]
create procedure job$drop(
in p_job_name varchar(64)
)
COMMENT 'Removes a job'
BLOCK_MAIN: begin
declare v_message varchar(1024);
[Err] 1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 7
I am not very experienced in SQL so I might be missing some syntax error there.
#2 Actually I fixed that, I copied the install_jobs.mysql script by hand as opposed to just running it and now I don't get the error - now it works perfectly. I am not sure what the problem was, I suppose I should try installing from the command line in the future. In any case, thanks for the fast reply and the help, and of course for the great tool.
Best regards,
Kiril.
Ok Kiril, thanks for the reply.
Regarding that error: it looks as if somehow the initial DELIMITER statement (at the top of the script) wasn't picked up. My original script uses %% as delimiter, which is rather unorthodox. I also noticed that in many places the %% delimiter is not on its own new line, you could try and make sure that they are - it might help.
A possible cause would be if the client you're using doesn't send the entire script as is to the server, but chops it up itself (forgetting to execute the initial DELIMITER command) although I would expect more things to go wrong if that really was the case. So basically, I'm still at a loss as to what the cause is for that problem. Anyway, it's not a big deal, you can easily drop jobs manually by simply deleting them from the table.
As for #2: great! glad to hear it works for you now. I'll just leave it here then, thanks for deploying the software, I hope it will be useful for you.
kind regards,
Roland.
Hi Roland !
Thanks for a very helpful table and procedure. I know it can take some time to put these things together. It seems the on Aug. 9, 2012 the link: http://www.xcdsql.org/MyJOB/install_job.mysql is 404 ... maybe someone could post the code somewhere? It would be a big help.
Thanks again.
The link for install_job.mysql is 404 but I found it at :
http://ubportal.googlecode.com/svn/trunk/urbanbloke/ddl/urbanbloke-jobs.sql
Thanks !
Post a Comment