Kill process transaction rollback in progress. Estimated rollback completion: 0%.

12/10/2014 11:17

While dealing with a rollback/kill process (in SQL Server 2005 SP4 + CU3 environment) which is stuck for more than 10 days, it was a VERITAS NetBackup (DB tape backup) process,

On checking the DMV's dm_exec_requests and dm_exec_sessions using below query,

SET
TRANSACTION
ISOLATION
LEVEL
READ
UNCOMMITTED

SELECT

db_id(DB_NAME(er.[database_id])) [DBID]

,er.[session_id] AS [SessionID]

,er.[command] AS [CommandType]

,est.[text] [StatementText]

,er.[status] AS [Status]

,CONVERT(DECIMAL(5, 2), er.[percent_complete]) AS [Complete_Percent]

,CONVERT(DECIMAL(38, 2), er.[total_elapsed_time] / 60000.00) AS [ElapsedTime_m]

,CONVERT(DECIMAL(38, 2), er.[estimated_completion_time] / 60000.00) AS [EstimatedCompletionTime_m]

,er.[last_wait_type] [LastWait]

,er.[wait_resource] [CurrentWait]

FROM
sys.dm_exec_requests
AS er

INNER
JOIN
sys.dm_exec_sessions
AS es ON er.[session_id] = es.[session_id]

CROSS
APPLY sys.dm_exec_sql_text(er.[sql_handle]) est

WHERE er.[command] like
'Killed%'

Output,

As result showed no progress, hence to see if it is blocked/locked, ran sp_lock 69 to get lock info,

There seems to be no blocking/locking for this process, looked like a stale process in system, to troubleshoot further, collected the host process id using sysprocesses as,

select spid, blocked,waittime,lastwaittype,dbid,last_batch,open_tran,hostprocess,cmd from sysprocesses where spid = 69

Output showed hostprocess as 8628,

As backup runs on local system, hence I searched for this hostprocess PID in taskmanager and found nothing as 8628, so it lead me to kind of <non-existent process> situation.

Not numerous blogs states that sql restart is best option and even Microsoft recommends it, found following link which states the similar kind of issue and also a kb article to collect stack dump and send to Microsoft for such cases, https://blogs.msdn.com/b/psssql/archive/2008/09/12/sql-server-2000-2005-2008-recovery-rollback-taking-longer-than-expected.aspx

In my case SQL restart was not a quick option, hence did further investigation and found Steve's blog to fix such issues using Process Explorer from sysinternals, following which I downloaded Process Explorer and ran on the system and searched for process id 8628 and following were the results,

Now, the question was which process to kill, as most of problems happen because of child process, hence to me it seems that SQLVDI was the child process which got stale and need to be removed first because what happen in NetBackup process is, a cmd process triggers which in turn trigger the backup threads, now after close of backup thread only cmd process gets completed, in some cases if cmd get terminated then child process which is backup thread keeps on running and become stale and which can lead to such situation, so to fix it I first killed the VMBU process and then killed the <non-existent Process> (8628).

If this doesn't solve issue then restart SQL Server to fix it.

References:

https://steveverschaeve.be.s170058.gridserver.com/?p=116

https://forum.sysinternals.com/definition-of-nonexistent-process_topic10446.html

https://superuser.com/questions/215351/how-do-i-kill-a-process-that-is-dead-but-listening