Kill process transaction rollback in progress. Estimated rollback completion: 0%.
While dealing with a rollback/kill process (in SQL Server 2005 SP4 + CU3 environment) which is stuck for more than 10 days, it was a VERITAS NetBackup (DB tape backup) process,
On checking the DMV's dm_exec_requests and dm_exec_sessions using below query,
SET
TRANSACTION
ISOLATION
LEVEL
READ
UNCOMMITTED
SELECT
db_id(DB_NAME(er.[database_id])) [DBID]
,er.[session_id] AS [SessionID]
,er.[command] AS [CommandType]
,est.[text] [StatementText]
,er.[status] AS [Status]
,CONVERT(DECIMAL(5, 2), er.[percent_complete]) AS [Complete_Percent]
,CONVERT(DECIMAL(38, 2), er.[total_elapsed_time] / 60000.00) AS [ElapsedTime_m]
,CONVERT(DECIMAL(38, 2), er.[estimated_completion_time] / 60000.00) AS [EstimatedCompletionTime_m]
,er.[last_wait_type] [LastWait]
,er.[wait_resource] [CurrentWait]
FROM
sys.dm_exec_requests
AS er
INNER
JOIN
sys.dm_exec_sessions
AS es ON er.[session_id] = es.[session_id]
CROSS
APPLY sys.dm_exec_sql_text(er.[sql_handle]) est
WHERE er.[command] like
'Killed%'
Output,
As result showed no progress, hence to see if it is blocked/locked, ran sp_lock 69 to get lock info,
There seems to be no blocking/locking for this process, looked like a stale process in system, to troubleshoot further, collected the host process id using sysprocesses as,
select spid, blocked,waittime,lastwaittype,dbid,last_batch,open_tran,hostprocess,cmd from sysprocesses where spid = 69
Output showed hostprocess as 8628,
As backup runs on local system, hence I searched for this hostprocess PID in taskmanager and found nothing as 8628, so it lead me to kind of <non-existent process> situation.
Not numerous blogs states that sql restart is best option and even Microsoft recommends it, found following link which states the similar kind of issue and also a kb article to collect stack dump and send to Microsoft for such cases, https://blogs.msdn.com/b/psssql/archive/2008/09/12/sql-server-2000-2005-2008-recovery-rollback-taking-longer-than-expected.aspx
In my case SQL restart was not a quick option, hence did further investigation and found Steve's blog to fix such issues using Process Explorer from sysinternals, following which I downloaded Process Explorer and ran on the system and searched for process id 8628 and following were the results,
Now, the question was which process to kill, as most of problems happen because of child process, hence to me it seems that SQLVDI was the child process which got stale and need to be removed first because what happen in NetBackup process is, a cmd process triggers which in turn trigger the backup threads, now after close of backup thread only cmd process gets completed, in some cases if cmd get terminated then child process which is backup thread keeps on running and become stale and which can lead to such situation, so to fix it I first killed the VMBU process and then killed the <non-existent Process> (8628).
If this doesn't solve issue then restart SQL Server to fix it.
References:
https://steveverschaeve.be.s170058.gridserver.com/?p=116
https://forum.sysinternals.com/definition-of-nonexistent-process_topic10446.html
https://superuser.com/questions/215351/how-do-i-kill-a-process-that-is-dead-but-listening