One of the simple things for a DBA is when one get alerted for job failure in Oracle , the would need to start investigating the job immediatly , the cause of failure can vary from one to job to another based on what the job is doing. In some cases it would be required to stop the job from running further and causing more damage. The goal of this post is to investigate what job failed.
My script here shows an example of listing the jobs and thier run history.
https://github.com/abhilash-8/ora-tools/blob/master/jobs.sql
ORCL> @jobs
ORCL> COL owner FORMAT a15
ORCL> COL job_name FORMAT a30
ORCL> COL job_action FORMAT a50
ORCL> COL repeat_interval FORMAT a20
ORCL> COL job_type FORMAT a20
ORCL> COL state FORMAT a20
ORCL> COL start_date FORMAT a15
ORCL> COL last_start_date FORMAT a45
ORCL> COL next_run_date FORMAT a45
ORCL> COL actual_start_date FORMAT a45
ORCL> COL elapsed_time FORMAT a45
ORCL> set lines 200 pages 200
ORCL> col last_run_duration for a30
ORCL> col run_duration for a20
ORCL> col status for a10
ORCL> col cpu_used for a18
ORCL> col ADDITIONAL_INFO_JOBS for a50
ORCL> select name from v$database;
NAME
---------
ORCL
Elapsed: 00:00:00.01
ORCL> alter session set nls_date_format = 'DD-MON-YYYY HH24:MI:SS';
Session altered.
Elapsed: 00:00:00.00
ORCL> --All Jobs
ORCL> SELECT owner,job_name,job_type,enabled,state,last_start_date,last_run_duration FROM dba_scheduler_jobs order by owner,last_start_date desc;
OWNER |JOB_NAME |JOB_TYPE |ENABL|STATE |LAST_START_DATE |LAST_RUN_DURATION
---------------|------------------------------|--------------------|-----|--------------------|---------------------------------------------|------------------------------
SYS |this_some_job_1 |PLSQL_BLOCK |FALSE|DISABLED | |
SYS |this_some_job_2 | |FALSE|DISABLED | |
SYS |this_some_job_3 |STORED_PROCEDURE |FALSE|DISABLED | |
SYS |this_some_job_4 |PLSQL_BLOCK |FALSE|DISABLED | |
SYS |this_some_job_5 |PLSQL_BLOCK |FALSE|DISABLED | |
SYS |this_some_job_6 |STORED_PROCEDURE |FALSE|DISABLED | |
SYS |this_some_job_7 |PLSQL_BLOCK |FALSE|DISABLED | |
SYS |this_some_job_8 |PLSQL_BLOCK |TRUE |SCHEDULED |02-JUL-20 03.24.41.635862 AM -07:00 |+000000000 00:00:00.004980
SYS |this_some_job_9 |PLSQL_BLOCK |TRUE |SCHEDULED |02-JUL-20 03.00.03.206899 AM PST8PDT |+000000000 00:00:00.050023
23 rows selected.
Elapsed: 00:00:00.05
ORCL> --Currently running Jobs
ORCL> SELECT owner,job_name, session_id, running_instance, elapsed_time, cpu_used FROM dba_scheduler_running_jobs;
no rows selected
Elapsed: 00:00:00.01
ORCL> --Details About a Job
ORCL> def job_name='&job_name'
Enter value for job_name: ORA$AUTOTASK_CLEAN
ORCL> select * from (select instance_id,owner,job_name,status,error#,cpu_used,ACTUAL_START_DATE,run_duration from dba_scheduler_job_run_details where ACTUAL_START_DATE > sysdate - (4/24) and job_name like '&job_name' order by ACTUAL_START_DATE) where rownum<20;
INSTANCE_ID|OWNER |JOB_NAME |STATUS | ERROR#|CPU_USED |ACTUAL_START_DATE |RUN_DURATION
-----------|---------------|------------------------------|----------|----------|------------------|---------------------------------------------|--------------------
1|SYS |this_some_job_9 |FAILED | 0|+000 00:00:00.00 |02-JUL-20 03.00.03.205407 AM PST8PDT |+000 00:00:00
Elapsed: 00:00:00.01
ORCL> SELECT owner,job_name,enabled,state,repeat_interval,next_run_date,job_action FROM dba_scheduler_jobs where job_name like '&job_name' ;
OWNER |JOB_NAME |ENABL|STATE |REPEAT_INTERVAL |NEXT_RUN_DATE |JOB_ACTION
---------------|------------------------------|-----|--------------------|--------------------|---------------------------------------------|--------------------------------------------------
SYS |this_some_job_9 |TRUE |SCHEDULED | |03-JUL-20 03.00.00.207505 AM PST8PDT |DBMS_STATS.gather_table_stats('EMPLOYEES', 'DEPARTMENTS')
Elapsed: 00:00:00.21
ORCL> select instance_id,owner,job_name,status,ACTUAL_START_DATE,RUN_DURATION from dba_scheduler_job_run_details where job_name like '&job_name' order by ACTUAL_START_DATE;
INSTANCE_ID|OWNER |JOB_NAME |STATUS |ACTUAL_START_DATE |RUN_DURATION
-----------|---------------|---------------------------|----------|---------------------------------------------|--------------------
1|SYS |this_some_job_9 |SUCCEEDED |03-JUN-20 03.00.00.591966 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |04-JUN-20 03.00.00.633528 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |05-JUN-20 03.00.00.762012 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |06-JUN-20 03.00.02.583213 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |07-JUN-20 03.00.02.693093 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |08-JUN-20 03.00.00.961612 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |09-JUN-20 03.00.00.977119 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |10-JUN-20 03.00.01.013260 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |11-JUN-20 03.00.01.038857 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |12-JUN-20 03.00.01.087523 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |13-JUN-20 03.00.02.952222 AM PST8PDT |+000 00:00:00
2|SYS |this_some_job_9 |SUCCEEDED |14-JUN-20 03.00.03.006117 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |15-JUN-20 03.00.01.339363 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |16-JUN-20 03.00.01.350864 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |17-JUN-20 03.00.01.523058 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |18-JUN-20 03.00.01.557903 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |19-JUN-20 03.00.01.592501 AM PST8PDT |+000 00:00:00
2|SYS |this_some_job_9 |SUCCEEDED |20-JUN-20 03.00.03.246115 AM PST8PDT |+000 00:00:00
2|SYS |this_some_job_9 |SUCCEEDED |21-JUN-20 03.00.03.273684 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |22-JUN-20 03.00.01.775250 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |23-JUN-20 03.00.01.705317 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |24-JUN-20 03.00.01.782067 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |25-JUN-20 03.00.01.810243 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |26-JUN-20 03.00.01.841948 AM PST8PDT |+000 00:00:00
2|SYS |this_some_job_9 |SUCCEEDED |27-JUN-20 03.00.03.266193 AM PST8PDT |+000 00:00:00
2|SYS |this_some_job_9 |SUCCEEDED |28-JUN-20 03.00.03.344950 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |29-JUN-20 03.00.03.060039 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |30-JUN-20 03.00.03.107325 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |SUCCEEDED |01-JUL-20 03.00.03.157702 AM PST8PDT |+000 00:00:00
1|SYS |this_some_job_9 |FAILED |02-JUL-20 03.00.03.205407 AM PST8PDT |+000 00:00:00 <---- So you know when this failed
30 rows selected.
Some triggering alerts of dbms_scheduler jobs
are errors like ORA-06512 , ORA-27369 which are logged in the alert logs. In most cases there is a trace file associated with this job failure , you can find the trace file like this
for e.g. ORA-06512 Errors , you can see the history of traces in the alert log
# tail -100 /path/to/alertlog.log > /tmp/test.txt ; grep -rnw /tmp/test.txt -A 5 -B 5 -e"06512"
--
414-2020-07-03T01:05:14.036789-04:00
415-Archived Log entry 101030 added for T-1.S-101038 ID 0x57363bbf LAD:1
416-2020-07-03T01:29:35.590460-04:00
417-Errors in file /path/to/trace/orcl_j000_40017.trc:
418-ORA-12012: error on auto execute of job 46132
419:ORA-06512: at "SYS.DBMS_STATS", line 27989
420-ORA-12899: value too large for column "SYS"."WRI$_OPTSTAT_OPR"."JOB_NAME" (actual: 47, maximum: 32)
421:ORA-06512: at "SYS.DBMS_STATS", line 2061
422:ORA-06512: at "SYS.DBMS_STATS", line 27970
423:ORA-06512: at line 1
424-2020-07-03T01:44:45.125142-04:00
425-Thread 1 cannot allocate new log, sequence 101040
426-Private strand flush not complete
You have can see the trace file as
/path/to/trace/orcl_j000_40017.trc
You could dig deeper on the trace file to see which client , module , action driver triggered this job job failure.
# head -50 /path/to/trace/orcl_j000_40017.trc
Trace file /path/to/trace/orcl_j000_40017.trc
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
Build label: RDBMS_12.2.0.1.0_LINUX.X64_170125
ORACLE_HOME: /u01/app/oracle/product/12.2.0.1/db_1
System name: Linux
Node name: oralinux01
Release: 3.10.0-862.el7.x86_64
Version: #1 SMP Wed Mar 21 18:14:51 EDT 2018
Machine: x86_64
Instance name: ORCL
Redo thread mounted by this instance: 1
Oracle process number: 58
Unix process pid: 40017, image: oracle@*********** (J000)
*** 2020-07-03T01:29:35.590312-04:00
*** SESSION ID:(253.48256) 2020-07-03T01:29:35.590333-04:00
*** CLIENT ID:() 2020-07-03T01:29:35.590337-04:00
*** SERVICE NAME:(SYS$USERS) 2020-07-03T01:29:35.590340-04:00
*** MODULE NAME:() 2020-07-03T01:29:35.590344-04:00
*** ACTION NAME:() 2020-07-03T01:29:35.590347-04:00
*** CLIENT DRIVER:() 2020-07-03T01:29:35.590350-04:00
ORA-12012: error on auto execute of job 46132
ORA-06512: at "SYS.DBMS_STATS", line 27989
ORA-12899: value too large for column "SYS"."WRI$_OPTSTAT_OPR"."JOB_NAME" (actual: 47, maximum: 32)
ORA-06512: at "SYS.DBMS_STATS", line 2061
ORA-06512: at "SYS.DBMS_STATS", line 27970
ORA-06512: at line 1
And if you quickly need to know where your alert log is check my other post about the info.sql
https://dev.to/abhilash8/the-info-sql-for-oracle-developer-s-dba-s-pn8
Remember to identify details about what job failed and its schedule before attempting to fix it.
Top comments (1)
Agreed, quick troubleshooting is crucial and connections. Check logs and job dependencies first.