O1MD1CV: crash after resuming pre-empted task

Juha
Topic 205691

Task 614658373 crashed with 0xC0000090 (an invalid floating-point operation) shortly after it was resumed. The time between resuming the task and the logged crash may have been spent downloading symbols.
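For anyone who wants to double-check the exception code, here is a minimal Python sketch that maps the floating-point NTSTATUS values to their symbolic names. The values are the ones I know from the Windows SDK header ntstatus.h; the function name describe_status is just made up for illustration. It also accepts the signed 32-bit form of the same value, since some tools print exit codes that way.

```python
# Floating-point NTSTATUS codes as defined in the Windows SDK (ntstatus.h).
FLOAT_STATUS_NAMES = {
    0xC000008D: "STATUS_FLOAT_DENORMAL_OPERAND",
    0xC000008E: "STATUS_FLOAT_DIVIDE_BY_ZERO",
    0xC000008F: "STATUS_FLOAT_INEXACT_RESULT",
    0xC0000090: "STATUS_FLOAT_INVALID_OPERATION",
    0xC0000091: "STATUS_FLOAT_OVERFLOW",
    0xC0000092: "STATUS_FLOAT_STACK_CHECK",
    0xC0000093: "STATUS_FLOAT_UNDERFLOW",
}


def describe_status(code: int) -> str:
    """Return the symbolic name for a floating-point NTSTATUS code.

    Hypothetical helper: accepts the unsigned form (0xC0000090) or the
    signed 32-bit form of the same bit pattern (-1073741680).
    """
    unsigned = code & 0xFFFFFFFF  # normalise signed values to unsigned 32-bit
    return FLOAT_STATUS_NAMES.get(unsigned, f"unknown (0x{unsigned:08X})")


print(describe_status(0xC0000090))
print(describe_status(-1073741680))
```

Both calls should print the same name, which matches the "Float Invalid Operation" wording in the dumps below.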

The task was pre-empted and resumed multiple times because the GPU was alternating between tasks that need a full CPU core for support and tasks that don't.

Copy-pasting the details here for safekeeping in case this turns out to be something other than a one-off event.

Useful stack dump:

Error occured on Thursday, February 23, 2017 at 19:13:03.

C:\BOINC data\projects\einstein.phys.uwm.edu\einstein_O1MD1CV_1.00_windows_x86_64__AVX.exe caused a Float Invalid Operation at location 0094194c in module C:\BOINC data\projects\einstein.phys.uwm.edu\einstein_O1MD1CV_1.00_windows_x86_64__AVX.exe.

Call stack:

0094194C C:\BOINC data\projects\einstein.phys.uwm.edu\einstein_O1MD1CV_1.00_windows_x86_64__AVX.exe:0094194C

0040DFD9 C:\BOINC data\projects\einstein.phys.uwm.edu\einstein_O1MD1CV_1.00_windows_x86_64__AVX.exe:0040DFD9

00942A5F C:\BOINC data\projects\einstein.phys.uwm.edu\einstein_O1MD1CV_1.00_windows_x86_64__AVX.exe:00942A5F

004013CE C:\BOINC data\projects\einstein.phys.uwm.edu\einstein_O1MD1CV_1.00_windows_x86_64__AVX.exe:004013CE

004014E8 C:\BOINC data\projects\einstein.phys.uwm.edu\einstein_O1MD1CV_1.00_windows_x86_64__AVX.exe:004014E8

EA488364 C:\WINDOWS\System32\KERNEL32.DLL:EA488364 BaseThreadInitThunk

ECEE70D1 C:\WINDOWS\SYSTEM32\ntdll.dll:ECEE70D1 RtlUserThreadStart
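Since the executable has no symbols loaded, the raw addresses are only useful relative to where the module was loaded. A rough Python sketch of that conversion follows. The base (0x00400000) and size (0x01da7000) are taken from the ModLoad line for the Einstein executable in the BOINC debugger dump in this post; the helper name module_offset is hypothetical. The offsets would only help someone who has a matching build with debug information, which I don't.

```python
# Load address and size of the science app, from its ModLoad line in the
# BOINC debugger dump (0000000000400000 0000000001da7000).
MODULE_BASE = 0x00400000
MODULE_SIZE = 0x01DA7000

# Frames from the call stack that belong to the science app, top first.
FRAMES = [0x0094194C, 0x0040DFD9, 0x00942A5F, 0x004013CE, 0x004014E8]


def module_offset(address: int, base: int = MODULE_BASE, size: int = MODULE_SIZE) -> int:
    """Return the offset of an address from the module base (hypothetical helper)."""
    if not base <= address < base + size:
        raise ValueError(f"address 0x{address:08X} is outside the module")
    return address - base


offsets = [module_offset(address) for address in FRAMES]
for address, offset in zip(FRAMES, offsets):
    print(f"{address:08X} -> module+0x{offset:X}")
```

The faulting address 0094194C works out to module+0x54194C.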

BOINC's own diagnostics dump, which looks a bit odd:

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Float Invalid Operation (0xc0000090) at address 0x000000000094194C

Engaging BOINC Windows Runtime Debugger...

********************

BOINC Windows Runtime Debugger Version 7.1.0

Dump Timestamp : 02/23/17 19:13:03
Install Directory : C:\BOINC\
Data Directory : C:\BOINC data
Project Symstore :
LoadLibraryA( C:\BOINC\\dbghelp.dll ): GetLastError = 126
Loaded Library : dbghelp.dll
LoadLibraryA( C:\BOINC\\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( C:\BOINC\\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( C:\BOINC\\version.dll ): GetLastError = 126
Loaded Library : version.dll
Debugger Engine : 4.0.5.0
Symbol Search Path: C:\BOINC data\slots\5;C:\BOINC data\projects\einstein.phys.uwm.edu

ModLoad: 0000000000400000 0000000001da7000 C:\BOINC data\projects\einstein.phys.uwm.edu\einstein_O1MD1CV_1.00_windows_x86_64__AVX.exe (-nosymbols- Symbols Loaded)

ModLoad: 00000000ece80000 00000000001d1000 C:\WINDOWS\SYSTEM32\ntdll.dll (6.2.14393.479) (-exported- Symbols Loaded)
File Version : 10.0.14393.206 (rs1_release.160915-0644)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.14393.206

ModLoad: 00000000ea480000 00000000000ab000 C:\WINDOWS\System32\KERNEL32.DLL (6.2.14393.0) (-exported- Symbols Loaded)
File Version : 10.0.14393.206 (rs1_release.160915-0644)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.14393.206

ModLoad: 00000000e9e00000 000000000021d000 C:\WINDOWS\System32\KERNELBASE.dll (6.2.14393.479) (-exported- Symbols Loaded)
File Version : 10.0.14393.206 (rs1_release.160915-0644)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.14393.206

ModLoad: 00000000eae20000 00000000000a2000 C:\WINDOWS\System32\ADVAPI32.dll (6.2.14393.0) (-exported- Symbols Loaded)
File Version : 10.0.14393.0 (rs1_release.160715-1616)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.14393.0

ModLoad: 00000000eb210000 000000000009e000 C:\WINDOWS\System32\msvcrt.dll (7.0.14393.0) (-exported- Symbols Loaded)
File Version : 7.0.14393.0 (rs1_release.160715-1616)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 7.0.14393.0

ModLoad: 00000000ecc30000 0000000000059000 C:\WINDOWS\System32\sechost.dll (6.2.14393.0) (-exported- Symbols Loaded)
File Version : 10.0.14393.0 (rs1_release.160715-1616)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.14393.0

ModLoad: 00000000ea530000 0000000000121000 C:\WINDOWS\System32\RPCRT4.dll (6.2.14393.82) (-exported- Symbols Loaded)
File Version : 10.0.14393.0 (rs1_release.160715-1616)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.14393.0

ModLoad: 00000000eb200000 0000000000008000 C:\WINDOWS\System32\PSAPI.DLL (6.2.14393.0) (-exported- Symbols Loaded)
File Version : 10.0.14393.0 (rs1_release.160715-1616)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.14393.0

ModLoad: 00000000ea870000 0000000000165000 C:\WINDOWS\System32\USER32.dll (6.2.14393.576) (-exported- Symbols Loaded)
File Version : 10.0.14393.0 (rs1_release.160715-1616)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.14393.0

ModLoad: 00000000e93c0000 000000000001e000 C:\WINDOWS\System32\win32u.dll (6.2.14393.51) (-exported- Symbols Loaded)
File Version : 10.0.14393.51 (rs1_release_inmarket.160801-1836)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.14393.51

ModLoad: 00000000ea410000 0000000000034000 C:\WINDOWS\System32\GDI32.dll (6.2.14393.206) (-exported- Symbols Loaded)
File Version : 10.0.14393.206 (rs1_release.160915-0644)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.14393.206

ModLoad: 00000000e9bc0000 0000000000182000 C:\WINDOWS\System32\gdi32full.dll (6.2.14393.576) (-exported- Symbols Loaded)
File Version : 10.0.14393.576 (rs1_release_inmarket.161208-2252)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.14393.576

ModLoad: 00000000eb0d0000 000000000002e000 C:\WINDOWS\System32\IMM32.DLL (6.2.14393.0) (-exported- Symbols Loaded)
File Version : 10.0.14393.0 (rs1_release.160715-1616)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.14393.0

ModLoad: 00000000e8690000 0000000000032000 C:\WINDOWS\SYSTEM32\ntmarta.dll (6.2.14393.0) (-exported- Symbols Loaded)
File Version : 10.0.14393.0 (rs1_release.160715-1616)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.14393.0

ModLoad: 00000000ea020000 00000000000f5000 C:\WINDOWS\System32\ucrtbase.dll (6.2.14393.0) (-exported- Symbols Loaded)
File Version : 10.0.14393.0 (rs1_release.160715-1616)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.14393.0

ModLoad: 00000000ea450000 000000000001c000 C:\WINDOWS\System32\IMAGEHLP.DLL (6.2.14393.0) (-exported- Symbols Loaded)
File Version : 10.0.14393.0 (rs1_release.160715-1616)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.14393.0

ModLoad: 00000000e5a00000 0000000000192000 C:\WINDOWS\System32\dbghelp.dll (6.2.14321.1024) (-exported- Symbols Loaded)
File Version : 10.0.14321.1024 (rs1_release.160715-1616)
Company Name : Microsoft
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.14321.1024

ModLoad: 00000000dd4b0000 000000000000a000 C:\WINDOWS\SYSTEM32\version.dll (6.2.14393.0) (-exported- Symbols Loaded)
File Version : 10.0.14393.0 (rs1_release.160715-1616)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.14393.0

*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 0, Write: 0, Other 0

- I/O Transfers Counters -
Read: 0, Write: 0, Other 0

- Paged Pool Usage -
QuotaPagedPoolUsage: 0, QuotaPeakPagedPoolUsage: 0
QuotaNonPagedPoolUsage: 0, QuotaPeakNonPagedPoolUsage: 0

- Virtual Memory Usage -
VirtualSize: 0, PeakVirtualSize: 0

- Pagefile Usage -
PagefileUsage: 0, PeakPagefileUsage: 0

- Working Set Size -
WorkingSetSize: 0, PeakWorkingSetSize: 0, PageFaultCount: 0

*** Dump of thread ID 1608 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Unhandled Exception Record -
Reason: Float Invalid Operation (0xc0000090) at address 0x000000000094194C

- Registers -
rax=0000000004d2c660 rbx=0000000000000000 rcx=0000000004c6d020 rdx=0000000000000010 rsi=0000000000000000 rdi=0000000000000000
r8=0000000004d2c660 r9=0000000002a0bc4c r10=0000000000000006 r11=000000000010a30f r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000 rip=00000000ecee70d1 rsp=00000000023aff90 rbp=0000000000000000
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202

- Callstack -
ChildEBP RetAddr Args to Child
023affd0 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0

*** Dump of thread ID 4736 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Registers -
rax=0000000000000000 rbx=0000000000000000 rcx=00000000045f9c10 rdx=00000000045f9cd0 rsi=0000000000000064 rdi=0000000000000000
r8=0000000000000000 r9=00000000eb210000 r10=0000000000000000 r11=0000000000000200 r12=0000000000000000 r13=0000000000000000
r14=00000000045ffeb0 r15=0000000000000000 rip=00000000ecf26754 rsp=00000000045ffe88 rbp=0000000000000000
cs=0033 ss=002b ds=0000 es=0000 fs=0000 gs=0000 efl=00000246

- Callstack -
ChildEBP RetAddr Args to Child
045ffe80 e9e4c4a7 045fff48 00000000 00000000 00000000 ntdll!ZwDelayExecution+0x0
045fff20 005a76a7 ea496630 00000000 00000000 00000000 KERNELBASE!SleepEx+0x0
045fff50 ea488364 00000000 00000000 00000000 00000000 einstein_O1MD1CV_1.00_windows_x!+0x0
045fff80 ecee70d1 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0
045fffd0 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0

*** Debug Message Dump ****

*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0

Exiting...

Task history:

23/02/2017 14:51:51 | Einstein@Home | Starting task h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1
23/02/2017 14:51:51 | Einstein@Home | [cpu_sched] Starting task h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1 using einstein_O1MD1CV version 100 (AVX) in slot 5
...
23/02/2017 15:23:09 | SETI@home | Computation for task 21jn08ab.15589.394233.10.37.66_0 finished
23/02/2017 15:23:09 | Einstein@Home | [cpu_sched] Preempting h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1 (left in memory)
23/02/2017 15:23:09 | Einstein@Home | [cpu_sched] Restarting task LATeah0013L_884.0_0_0.0_3990900_1 using hsgamma_FGRPB1G version 120 (FGRPopencl1K-nvidia) in slot 1
...
23/02/2017 16:00:42 | Einstein@Home | Computation for task LATeah0013L_884.0_0_0.0_3990900_1 finished
23/02/2017 16:00:42 | Einstein@Home | [cpu_sched] Resuming h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1
23/02/2017 16:00:42 | Einstein@Home | [cpu_sched] Resuming task h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1 using einstein_O1MD1CV version 100 (AVX) in slot 5
23/02/2017 16:00:42 | SETI@home | Starting task 21jn08ab.15589.394233.10.37.57_1
23/02/2017 16:00:42 | SETI@home | [cpu_sched] Starting task 21jn08ab.15589.394233.10.37.57_1 using setiathome_v8 version 800 (cuda50) in slot 1
...
23/02/2017 16:31:24 | SETI@home | Computation for task 21jn08ab.15589.394233.10.37.57_1 finished
23/02/2017 16:31:24 | Einstein@Home | [cpu_sched] Preempting h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1 (left in memory)
23/02/2017 16:31:24 | Einstein@Home | Starting task LATeah0013L_884.0_0_0.0_9919520_1
23/02/2017 16:31:24 | Einstein@Home | [cpu_sched] Starting task LATeah0013L_884.0_0_0.0_9919520_1 using hsgamma_FGRPB1G version 120 (FGRPopencl1K-nvidia) in slot 1
...
23/02/2017 17:32:17 | Einstein@Home | [cpu_sched] Preempting LATeah0013L_884.0_0_0.0_9919520_1 (removed from memory)
23/02/2017 17:32:17 | Einstein@Home | [cpu_sched] Resuming h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1
23/02/2017 17:32:17 | Einstein@Home | [cpu_sched] Resuming task h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1 using einstein_O1MD1CV version 100 (AVX) in slot 5
23/02/2017 17:32:18 | SETI@home | Starting task 21jn08ab.15589.394233.10.37.2_0
23/02/2017 17:32:18 | SETI@home | [cpu_sched] Starting task 21jn08ab.15589.394233.10.37.2_0 using setiathome_v8 version 800 (cuda50) in slot 2
...
23/02/2017 18:35:06 | SETI@home | Computation for task 21jn08ab.15589.394233.10.37.60_0 finished
23/02/2017 18:35:06 | Einstein@Home | [cpu_sched] Preempting h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1 (left in memory)
23/02/2017 18:35:06 | Einstein@Home | [cpu_sched] Restarting task LATeah0013L_884.0_0_0.0_9919520_1 using hsgamma_FGRPB1G version 120 (FGRPopencl1K-nvidia) in slot 1
...
23/02/2017 19:12:42 | Einstein@Home | Computation for task LATeah0013L_884.0_0_0.0_9919520_1 finished
23/02/2017 19:12:42 | Einstein@Home | [cpu_sched] Resuming h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1
23/02/2017 19:12:42 | Einstein@Home | [cpu_sched] Resuming task h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1 using einstein_O1MD1CV version 100 (AVX) in slot 5
23/02/2017 19:12:42 | SETI@home | Starting task 24au08ag.1941.2526.16.43.196_1
23/02/2017 19:12:42 | SETI@home | [cpu_sched] Starting task 24au08ag.1941.2526.16.43.196_1 using setiathome_v8 version 822 (opencl_nvidia_SoG) in slot 1
...
23/02/2017 19:13:05 | Einstein@Home | [sched_op] Reason: Unrecoverable error for task h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1
23/02/2017 19:13:05 | Einstein@Home | Computation for task h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1 finished
23/02/2017 19:13:05 | Einstein@Home | Output file h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1_0 for task h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1 absent
23/02/2017 19:13:05 | Einstein@Home | Output file h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1_1 for task h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1 absent
23/02/2017 19:13:05 | Einstein@Home | Output file h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1_2 for task h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1 absent
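To put a number on "shortly after it was resumed", here is a small Python sketch that measures the gap between the last resume and the error report. The two lines are copied from the task history above; the helper name parse_timestamp is made up, and it assumes every line starts with a DD/MM/YYYY HH:MM:SS timestamp followed by " | ".

```python
from datetime import datetime

# The last resume of the task and the client's error report, copied from
# the task history in this post.
LOG_LINES = [
    "23/02/2017 19:12:42 | Einstein@Home | [cpu_sched] Resuming task "
    "h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1 "
    "using einstein_O1MD1CV version 100 (AVX) in slot 5",
    "23/02/2017 19:13:05 | Einstein@Home | [sched_op] Reason: Unrecoverable error "
    "for task h1_1386.20_O1C02Cl1In0C__O1MD1CV_VelaJr1_1386.65Hz_72_1",
]


def parse_timestamp(line: str) -> datetime:
    """Parse the leading timestamp of a client log line (hypothetical helper)."""
    stamp = line.split(" | ", 1)[0]
    return datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")


resumed = parse_timestamp(LOG_LINES[0])
failed = parse_timestamp(LOG_LINES[1])
elapsed_seconds = int((failed - resumed).total_seconds())
print(f"Error reported {elapsed_seconds} s after the last resume")
```

That gives 23 seconds between the resume and the client reporting the error. The dump itself is stamped 19:13:03, so the exception was logged about 21 seconds after the resume.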