Sarbjit Parmar's Hello World

Job Scheduling Best Practices

10/13/2013

While working on various data warehousing and batch jobs, I have pondered the problem, run proof-of-concept projects to evaluate job scheduling tools, and implemented jobs so that they run in minimal time, using all the resources available in the execution environment while avoiding pitfalls like excessive parallelism. I have come up with some observations/ideals/requirements in this area, described below.
  • Make the jobs ready for as quick a restart as possible.
  • Job dependencies should be managed robustly.
  • Every job should report its status back to the parent process/job scheduler. This is especially important when dependencies are managed based on the success or failure of a predecessor job.
  • Every failed job should capture the reported error so that a root cause analysis can be performed.
  • The job plan in the scheduler ideally should not hardcode the degree of parallelism.
  • Resource consumption requirements should be defined at the individual job level, and the scheduler should be able to use this information to decide which job can be scheduled next. Some resource requirements are readily visible to the scheduler because it runs those tasks itself; others are not, e.g. database resource consumption. Good scheduling tools allow resources, either actual or virtual, to be specified as metadata for running the jobs.
  • The scheduler should know the overall resources (either actually detected or virtually defined) available in the system (cluster/SMP/MPP), so that it can track which resources are in use and which are free, and run the jobs whose dependencies have been satisfied and whose resources are available. It should manage resource consumption so that jobs are not overscheduled; this is very important to avoid excessive context switching and an overall slowdown in performance. Letting the scheduler allocate resources also keeps the schedule flexible, so it does not need constant tuning as job plans change (a minimal sketch of such a dispatch loop follows this list).
  • After a job completes, its allocated resources should be released back to the scheduler.
  • Every job should be run at the earliest available opportunity and no later.
  • An ideal scheduler provides a very flexible calendar for running jobs, e.g. on the first day, nth day, or last day of the month/week/year. The day may be a regular day, working day, holiday, etc., and these day types should be easily configurable (see the calendar sketch after this list).
  • An ideal scheduler should also be able to subscribe to other events for triggering jobs, including the availability of certain files and notifications from email or other queuing systems.
  • An ideal scheduler provides the ability to transfer files, preferably with a pipelining mechanism so that files are transferred for immediate consumption rather than landed on disk.
  • Multiple notification mechanisms should be available to users when certain events occur, and these events should be easy to define/configure. Notifications may be delivered through operator consoles, email, text, etc.
  • Every batch job should take resources away from online users for the minimal possible time; make smart use of an offline processing area.
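To make the resource and dependency points above concrete, here is a minimal sketch in Python of a dispatch loop that starts a job only when its predecessors have succeeded and its declared resource units fit the remaining capacity, reports each job's status back, and releases the units on completion. The single virtual "units" pool, the job names, and the numbers are illustrative assumptions, not taken from any particular scheduling tool.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    units: int                               # declared resource requirement (virtual units)
    depends_on: frozenset = frozenset()      # names of predecessor jobs
    status: str = "PENDING"                  # PENDING -> RUNNING -> SUCCEEDED / FAILED
    error: str = ""                          # captured on failure for root cause analysis

def run_plan(jobs, capacity, execute, workers=8):
    """Start each job as soon as its predecessors succeeded and its declared
    units fit in the remaining capacity; release the units when it completes."""
    pending = {j.name: j for j in jobs}
    done, free, running = set(), capacity, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending or running:
            # Dispatch everything that is both dependency-ready and resource-ready.
            for job in list(pending.values()):
                if job.depends_on <= done and job.units <= free:
                    del pending[job.name]
                    free -= job.units         # reserve up front so jobs are never overscheduled
                    job.status = "RUNNING"
                    running[pool.submit(execute, job)] = job
            if not running:                   # nothing runnable: a predecessor failed or the plan is wrong
                break
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                job = running.pop(fut)
                free += job.units             # give the units back to the scheduler
                if fut.exception() is None:
                    job.status = "SUCCEEDED"
                    done.add(job.name)
                else:                         # report the failure and keep the error text
                    job.status, job.error = "FAILED", str(fut.exception())
    return jobs

# Hypothetical plan: the loads start only after "extract" succeeds, and because
# together they exceed the 4-unit capacity they are not both started at once.
plan = [Job("extract", 2),
        Job("load_a", 3, frozenset({"extract"})),
        Job("load_b", 3, frozenset({"extract"}))]
for j in run_plan(plan, capacity=4, execute=lambda job: None):
    print(j.name, j.status, j.error)
```

Because the declared requirements, not a hardcoded degree of parallelism, drive the dispatch decisions, the same plan adapts when the capacity or the job mix changes.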
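Along the same lines, a flexible calendar boils down to a few predicates over configurable day types. The sketch below assumes working days are Monday through Friday minus an explicit holiday set; the helper names (is_working_day, nth_working_day_of_month, is_last_working_day_of_month) and the holiday dates are hypothetical, and both rules would be configuration in a real scheduler.

```python
from datetime import date, timedelta

def is_working_day(d, holidays):
    # Assumed rule: Monday-Friday, excluding the configured holiday set.
    return d.weekday() < 5 and d not in holidays

def nth_working_day_of_month(d, holidays):
    """Which working day of its month d is (1-based), or None if d is not one."""
    if not is_working_day(d, holidays):
        return None
    return sum(1 for day in range(1, d.day + 1)
               if is_working_day(date(d.year, d.month, day), holidays))

def is_last_working_day_of_month(d, holidays):
    """True when no later working day remains in d's month."""
    if not is_working_day(d, holidays):
        return False
    nxt = d + timedelta(days=1)
    while nxt.month == d.month:
        if is_working_day(nxt, holidays):
            return False
        nxt += timedelta(days=1)
    return True

# Illustrative holiday list and dates; trigger a month-end batch on the last working day.
holidays = {date(2013, 10, 14)}
today = date(2013, 10, 31)
print(nth_working_day_of_month(today, holidays))        # 22nd working day of October 2013
if is_last_working_day_of_month(today, holidays):
    print("run the month-end jobs")
```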