Patch skipping crashing in functions marked with [UnmanagedCallersOnly] #102767

markoostveen · 2024-05-28T15:23:21Z

Description

This is a repost of my stackoverflow post, that I have found a temporary solution for here

C# function pointers causes access violation upon calling from C++

I have tried debugging, but when the issue occurs. In this case C++ calls the callback, but before it enters the function's scope the exception is thrown, but immediately returns with an error code.

For the sake of debugging, I've tried switching the calling convention from cdecl to stdcall to see if it has to do with stack corruption.

My assumptions are that:

static functions marked with [UnmanagedCallersOnly] have a static address, especially when an entrypoint is defined.
Thread.BeginThreadAffinity() followed by Thread.EndThreadAffinity() ensures that an unmanaged thread can successfully start executing managed code.

Good to know:

The function is being invoked by a thread not created in C#, there are many threads calling this function

Reproduction Steps

In the following bit of code, I'm using C# function pointers to omit using a delegate type, for faster performance. The code works fine when I use a delegate type marshalled as a function pointer, no exception no issues. However, when changing it from a delegate to a function pointer it sometimes throws an access violation.

Code is modified to remove namespaces and prefixes not relevant to the question

    public static class Engine
    {
        // Causing issues with sometimes throwing access violation
        [DllImport("engine", CallingConvention = CallingConvention.Cdecl)]
        public static unsafe extern UInt32 EventScheduler_ScheduleLocalEvent(
            int priority, UInt64 delay, delegate* unmanaged[Cdecl]<nint, void> callback, nint obj);

        // Works fine every time, but is much slower seems due to delegate instatiations
        [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
        public delegate void EventScheduler_ScheduleLocalEvent_callbackDelegate(nint obj);
        [DllImport("engine", CallingConvention = CallingConvention.Cdecl)]
        public static unsafe extern UInt32 EventScheduler_ScheduleLocalEvent(
            int priority,
            UInt64 delay,
            [MarshalAs(UnmanagedType.FunctionPtr)] EventScheduler_ScheduleLocalEvent_callbackDelegate callback,
            nint obj);

    }

    public static class EventScheduler
    {
        public delegate void EventCallback();

        // attribute is only present in the function pointer senario
        [UnmanagedCallersOnly(CallConvs = new[] { typeof(CallConvCdecl ) }, EntryPoint = "ScheduleLocalEvent_EventCallbackFunction")]
        private static void ScheduleLocalEvent_EventCallbackFunction(nint handlePtr)
        {
            Thread.BeginThreadAffinity();
            GCHandle handle = GCHandle.FromIntPtr(handlePtr);
            var cb          = (EventScheduler.EventCallback)handle.Target!;
            cb();
            handle.Free();
            Thread.EndThreadAffinity();
        }
        public static UInt32 ScheduleLocalEvent(int priority, SimulationTime delayTime, EventCallback eventCallback)
        {
            GCHandle handle  = GCHandle.Alloc(eventCallback, GCHandleType.Normal);
            IntPtr handlePtr = GCHandle.ToIntPtr(handle);
            unsafe
            {
                return Engine.EventScheduler_ScheduleLocalEvent(
                    priority, delayTime, &ScheduleLocalEvent_EventCallbackFunction, handlePtr);
            }
        }
   }

On the C++ side

    #define API extern "C" __declspec(dllexport)
    
    API uint32_t EventScheduler_ScheduleLocalEvent(int priority, uint64_t delay, void(__cdecl* callback)(void* obj), void* obj)
    {
        ERSAssert(Ers::Core::InsideSubModel());
        Core::SyncManagerBase& syncManager = Ers::Core::GetSyncManagerBase();
        return syncManager.ScheduleEvent(priority, delay, [callback, obj]() { callback(obj); });
    }

Expected behavior

The thread should enter and execute the static function passed using a function pointer

Actual behavior

The thread doesn't enter the user-defined portion of the function. an Access Violation is thrown in a generated call visible in the IL during the native-to-managed-transition

Regression?

No response

Known Workarounds

Change the garbage collection mode by adding ```xml
true
true


### Configuration

Tested on PC's around the office with the same behavior. All were x64, issue occurred on both Windows and Linux. The issue was reproduced on .net8.0 and .net9.0.100-preview

### Other information

The issue doesn't appear always, i think it might be a race condition inside the garbage collector.

The text was updated successfully, but these errors were encountered:

hez2010 · 2024-05-28T15:45:43Z

Can you try changing GCHandle.Alloc(eventCallback, GCHandleType.Normal) to GCHandle.Alloc(eventCallback, GCHandleType.Pinned)?

jkotas · 2024-05-28T16:10:21Z

Can you try changing GCHandle.Alloc(eventCallback, GCHandleType.Normal) to GCHandle.Alloc(eventCallback, GCHandleType.Pinned)?

This may hide issue similar to how changing the GC mode hides the issue, but it won't fix the underlying problem.

Thread.BeginThreadAffinity() followed by Thread.EndThreadAffinity() ensures that an unmanaged thread can successfully start executing managed code.

Thread.Begin/EndThreadAffinity do nothing in .NET Core. You can delete these calls. (These calls were relevant in .NET Framework for low-level hosting, e.g. under MS SQL Server.)

I have tried debugging, but when the issue occurs. In this case C++ calls the callback, but before it enters the function's scope the exception is thrown, but immediately returns with an error code.

Could you please share more details about the location of the crash? Where is the origin exception thrown exactly?

markoostveen · 2024-05-28T19:04:52Z

Can you try changing GCHandle.Alloc(eventCallback, GCHandleType.Normal) to GCHandle.Alloc(eventCallback, GCHandleType.Pinned)?

I've tried this before posting on StackOverflow, this didn't fix the issue. Also seemed to have no effect on the frequency it was occuring

Thread.Begin/EndThreadAffinity do nothing in .NET Core. You can delete these calls. (These calls were relevant in .NET Framework for low-level hosting, e.g. under MS SQL Server.)

Got it, that is nice to know!

Could you please share more details about the location of the crash? Where is the origin exception thrown exactly?

Yes, this is a screenshot where the exception get's thrown, though it only get's visibly thrown in .net9.0 preview in .net8.0 is calls abort and exists with an exit code indicating the Access Violation.
A portion is hidden as I wasn't allowed to show call paths from our project

This issue though always seems to happen in this call in the disassembly which I assume has something to do with setting up garbage collection

jkotas · 2024-05-28T19:18:54Z

This issue though always seems to happen in this call in the disassembly which I assume has something to do with setting up garbage collection

Yes, this call is unmanaged->managed transition. It would be useful to know where the crash happens inside this call. I do not think that the Visual Studio will show it to you. You would need to run under low-level debugger like windbg or collect crash dump using (https://learn.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps).

markoostveen · 2024-05-28T21:16:46Z

This issue though always seems to happen in this call in the disassembly which I assume has something to do with setting up garbage collection

Yes, this call is unmanaged->managed transition. It would be useful to know where the crash happens inside this call. I do not think that the Visual Studio will show it to you. You would need to run under low-level debugger like windbg or collect crash dump using (https://learn.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps).

Now this is where it gets interesting, while running it inside windbg the issue doesn't occur at all after running it many times, i'll try to get you a crash dump, any special settings you would like to be turned on for the minidump flags?

jkotas · 2024-05-28T21:47:16Z

any special settings you would like to be turned on for the minidump flags?

Default settings should work fine, to start with at least.

markoostveen · 2024-05-28T22:39:06Z

any special settings you would like to be turned on for the minidump flags?

Default settings should work fine, to start with at least.

Here you go
AWealthOfRows.exe.11692.dmp

jkotas · 2024-05-28T23:44:03Z

Thank you for sharing the minidump. Here are the details of the crash:

00007ff9`9f1abbea 833d07ed2e0000  cmp     dword ptr [00007ff9`9f49a8f8],0 <- ACCESS_VIOLATION here
00007ff9`9f1abbf1 7405            je      Ers!Ers.EventScheduler.ScheduleLocalEvent_EventCallbackFunction()+0x38 (00007ff9`9f1abbf8)
00007ff9`9f1abbf3 e8c80ac75f      call    coreclr!JIT_DbgIsJustMyCode (00007ff9`fee1c6c0)

The crash is in "Just My Code" check. It is an extra code that gets inserted by the JIT at the start of every method when running under managed debugger like Visual Studio. It allows the managed debugger to provide better stepping experience. It explains why you are not able to reproduce the crash outside Visual Studio.

jkotas · 2024-05-28T23:49:21Z

The exception details:

0:019> dt coreclr!_EXCEPTION_RECORD 0x0000003f`265ff4b0
   +0x000 ExceptionCode    : 0xc0000005
   +0x004 ExceptionFlags   : 0
   +0x008 ExceptionRecord  : (null) 
   +0x010 ExceptionAddress : 0x000001e0`7fe90860 Void
   +0x018 NumberParameters : 2
   +0x020 ExceptionInformation : [15] 0
0:019> dx -r1 (*((coreclr!unsigned __int64 (*)[15])0x3f265ff4d0))
(*((coreclr!unsigned __int64 (*)[15])0x3f265ff4d0))                 [Type: unsigned __int64 [15]]
    [0]              : 0x0 [Type: unsigned __int64]
    [1]              : 0x1e08017f56e [Type: unsigned __int64]

Notice that the ExceptionAddress is different than the actual address of the code (see my previous comment). It suggests that this is problem with patch skipping. The patch skipping is a managed debugger feature where it copies out a few instructions to different spot to execute them in isolation. The patch skipping has to update the relative addresses in the code when it copies the code out, but it was not done here for 00007ff99f49a8f8` address.

@dotnet/dotnet-diag-contrib PTLA

dotnet-policy-service · 2024-05-28T23:49:53Z

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

markoostveen · 2024-05-29T00:09:15Z

Awesome you've been able to pinpoint it so fast, that makes sense. Do you need anything else from me?
On an unrelated note, will a patch likely make it into .net9.0 still?

hoyosjs · 2024-05-31T03:48:51Z

I can't see the bypass buffers in the dump sadly, but I think there's a few issues:

NativeWalker::DecodeInstructionForPatchSkip seems to be messing up and thinking this is a write instruction. This might have regressed with AVX512 support in .NET 8. Have you seen this in .NET 6?

I don't see anything handling RIP-relative operations for some operations. (like cmp in this case) in

runtime/src/coreclr/debug/ee/controller.cpp

Lines 4381 to 4399 in 3393b4d

    
               { 
        
                   patchBypassRW[1] = 0x8b; // MOV reg, mem 
        
                   _ASSERTE((int)sizeof(void*) <= SharedPatchBypassBuffer::cbBufferBypass); 
        
                   *(void**)bufferBypassRW = (void*)(patch->address + m_instrAttrib.m_cbInstr + dwOldDisp); 
        
               } 
        
               else 
        
               { 
        
                   _ASSERTE(m_instrAttrib.m_cOperandSize <= SharedPatchBypassBuffer::cbBufferBypass); 
        
                   // Copy the data into our buffer. 
        
                   memcpy(bufferBypassRW, patch->address + m_instrAttrib.m_cbInstr + dwOldDisp, m_instrAttrib.m_cOperandSize); 
        
                   if (m_instrAttrib.m_fIsWrite) 
        
                   { 
        
                       // save the actual destination address and size so when we TriggerSingleStep() we can update the value 
        
                       pSharedPatchBypassBufferRW->RipTargetFixup = (UINT_PTR)(patch->address + m_instrAttrib.m_cbInstr + dwOldDisp); 
        
                       pSharedPatchBypassBufferRW->RipTargetFixupSize = m_instrAttrib.m_cOperandSize; 
        
                   } 
        
               } 
        
           }

hoyosjs · 2024-05-31T06:22:49Z

No, even in 6.0 the decoder would classify the instruction as non-write. This is actually doing silly copy to and from the module data structure. But this is actually an unlikely root cause. Rather, the window of time between where we overwrite the displacement // Overwrite the *signed* displacement. and when another thread might be executing the CopyInstructionBlock but before it does the offset fixup this means we will be executing the old RIP relative addressing.

hoyosjs · 2024-05-31T06:26:46Z

@markoostveen, does the native portion run on many threads and is this the first time they execute managed code?

tommcdon · 2024-05-31T15:51:04Z

The crash is in "Just My Code" check. It is an extra code that gets inserted by the JIT at the start of every method when running under managed debugger like Visual Studio. It allows the managed debugger to provide better stepping experience. It explains why you are not able to reproduce the crash outside Visual Studio.

@markoostveen would you mind trying to disable Just My Code in the debugger to determine if issue reproduces without it enabled? Also would it be possible to share a VS solution containing a small C# console app and C++ DLL that we can use to reproduce and debug the issue?

markoostveen · 2024-05-31T20:28:20Z

@markoostveen, does the native portion run on many threads and is this the first time they execute managed code?

Yes the native portion is running on multiple threads, some started from C# but others started in native code. Those threads are executing jobs from a jobsystem(Written in my spare time) which in the end calls the managed callback as part of a job.

@markoostveen would you mind trying to disable Just My Code in the debugger to determine if issue reproduces without it enabled?

I've ran the program with both Just My Code enabled and disabled giving me the same behavior in both instances resulting in the same crash. If you want I can provide a dump where it is enabled and disabled

Also would it be possible to share a VS solution containing a small C# console app and C++ DLL that we can use to reproduce and debug the issue?

I'm not sure if I'll be able to provide this as I don't have an easy way to compress it into a smaller sharable console application. Sharing the entire app isn't possible due to company policy but I'll try to get clearance if need be.

markoostveen · 2024-05-31T21:13:22Z

Well, I got clearance to share it much faster than anticipated, although I can only share release binaries so you'll be able to run it.
I did have to build in a cutoff date so it'll only function until June 30th

It is a standalone executable running a small simulation (No GUI yet in C#)
AWealthOfRows.zip

While running you will see the following

Highlighted in blue shows the Thread ID, this ID is the thread printing the message which in this case is always the main thread as it's printing before and after the simulation not during.

The implementation of the method that is causing the issue is under namespace Ers.EventScheduler if you look at the Ers.dll in ILSpy or another IL inspection tool

To cause the issue I was encountering open visual studio using the following command in cmd

devenv /debugexe AWealthOfRows.exe

Then right click on the awealthofrows project click properties and change the debugger type to "Mixed (.Net Core, .Net 5+)"

markoostveen · 2024-06-28T15:35:39Z

Hey, is there any progress on the issue? I just want to check-in. if there isn't any progress, I'll try to get clearance to extend the period cutoff date/killswitch.

tommcdon · 2024-07-01T17:29:02Z

Hey, is there any progress on the issue? I just want to check-in. if there isn't any progress, I'll try to get clearance to extend the period cutoff date/killswitch.

It's still under investigation. In the meantime, we have created a standalone repro so the original app shouldn't be necessary any longer

    internal class Program
    {
        static volatile int count = 0;

        [UnmanagedCallersOnly(CallConvs = new[] { typeof(CallConvCdecl) }, EntryPoint = "MyFoo")]
        static void MyFoo(nint obj)
        {
            Interlocked.Increment(ref count);
        }

        [DllImport("Dll1.dll", CallingConvention = CallingConvention.Cdecl)]
        static unsafe extern void TestFunction([MarshalAs(UnmanagedType.FunctionPtr)] delegate* unmanaged[Cdecl]<nint, void> callback);

        static unsafe void Main(string[] args)
        {
            Console.WriteLine("Hello, World!");
            const int numThreads = 1000;
            for (int i = 0; i < numThreads; i++)
            {
                TestFunction(&MyFoo);
            }
            while (count < numThreads)
            {
                Thread.Sleep(1);
            }
            Console.WriteLine("Hit enter to quit");
            Console.In.ReadLine();
        }
    }

#define API extern "C" __declspec(dllexport)

volatile LONG i = 0;

typedef void(__cdecl* MyCallback)(void* obj);

void DoStuff(MyCallback callback)
{
	callback((void*)i);
	InterlockedIncrement(&i);
}

DWORD WINAPI ThreadProc(
	_In_ LPVOID lpParameter
)
{
	DoStuff((MyCallback)lpParameter);
	return 0;
}


API void __cdecl TestFunction(void(__cdecl* callback)(void* obj))
{
	CreateThread(NULL, 0, ThreadProc, callback, 0, NULL);
}

markoostveen · 2025-01-02T14:48:13Z

Hi all,

Recently after changing some seemingly unrelated pieces of code our workaround has stopped working for some reason.

I have been delving into the issue and trying to debug the issue, it seems when I run it the following assertion is being hit

Assert failure(PID 23280 [0x00005af0], Thread: 49160 [0xc008]): m_instrAttrib.m_fIsCall

CORECLR! DebuggerPatchSkip::IsInPlaceSingleStep + 0x2F (0x00007ffc`3ceb2aff)
CORECLR! DebuggerPatchSkip::DebuggerPatchSkip + 0x2A8 (0x00007ffc`3cea1398)
CORECLR! DebuggerController::ActivatePatchSkip + 0x160 (0x00007ffc`3cea3450)
CORECLR! DebuggerController::DispatchPatchOrSingleStep + 0xF46 (0x00007ffc`3ceacd46)
CORECLR! DebuggerController::DispatchNativeException + 0xAB0 (0x00007ffc`3ceabba0)
CORECLR! Debugger::FirstChanceNativeException + 0x4F8 (0x00007ffc`3cedd4b8)
CORECLR! IsDebuggerFault + 0x8D (0x00007ffc`3c5273dd)
CORECLR! CLRVectoredExceptionHandlerPhase2 + 0x18C (0x00007ffc`3c51372c)
CORECLR! CLRVectoredExceptionHandler + 0x3D2 (0x00007ffc`3c513582)
CORECLR! CLRVectoredExceptionHandlerShim + 0x245 (0x00007ffc`3c514085)
    File: C:\Projects\temp\runtime\src\coreclr\debug\ee\controller.h:1537
    Image: C:\Projects\temp\runtime\artifacts\bin\coreclr\windows.x64.Debug\corerun.exe

Though when trying to delve deeping I'm unable to place a breakpoint on the actual location, I can't find a resource how I can actually debug the coreCLR where the issue is occurring, as I seem to be getting the following message.

I have also been trying to catch the issue with post-mortem debugging in windbg but this doesn't seem to be working, anyone has any idea how to attach the debugger, since I can't just attach 2 invasive debuggers at once?

markoostveen · 2025-01-02T14:58:33Z

I've also disabled the feature FEATURE_EMULATE_SINGLESTEP to check if it was being related to the issue, but when I do, native only debugging works okay as expected, but then the breakpoint being hit is different, in that case, the 'address' member on struct DebuggerControllerPatch is NULL causing IsBound to return false in another assertion

jkotas · 2025-01-02T17:17:51Z

m_instrAttrib.m_fIsCall

This is unrelated problem that was fixed by #109603

dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 28, 2024

dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 28, 2024

am11 added area-System.Runtime.InteropServices and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels May 28, 2024

github-project-automation bot added this to AppModel May 28, 2024

AaronRobinsonMSFT added the needs-author-action An issue or pull request that requires more info or actions from the author. label May 28, 2024

dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label May 28, 2024

jkotas added area-Diagnostics-coreclr and removed area-System.Runtime.InteropServices labels May 28, 2024

jkotas changed the title ~~C# function pointers causes access violation upon calling from C++~~ Patch skipping crashing in functions marked with [UnmanagedCallersOnly] May 28, 2024

tommcdon added this to the 9.0.0 milestone Jun 4, 2024

tommcdon assigned hoyosjs Jun 4, 2024

tommcdon removed the untriaged New issue has not been triaged by the area owner label Jun 4, 2024

tommcdon mentioned this issue Jul 9, 2024

App crashes with an output "Trace/Breakpoint Trap" on Linux when a P/Invoke callback is called from a native library if the dotnet debugger is attached. #104459

Open

tommcdon unassigned hoyosjs Jul 23, 2024

tommcdon modified the milestones: 9.0.0, 10.0.0 Jul 23, 2024

tommcdon assigned mikelle-rogers Jul 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Patch skipping crashing in functions marked with [UnmanagedCallersOnly] #102767

Patch skipping crashing in functions marked with [UnmanagedCallersOnly] #102767

markoostveen commented May 28, 2024

hez2010 commented May 28, 2024

jkotas commented May 28, 2024

markoostveen commented May 28, 2024 •

edited

Loading

jkotas commented May 28, 2024

markoostveen commented May 28, 2024

jkotas commented May 28, 2024

markoostveen commented May 28, 2024

jkotas commented May 28, 2024

jkotas commented May 28, 2024 •

edited

Loading

dotnet-policy-service bot commented May 28, 2024

markoostveen commented May 29, 2024 •

edited

Loading

hoyosjs commented May 31, 2024

hoyosjs commented May 31, 2024

hoyosjs commented May 31, 2024

tommcdon commented May 31, 2024

markoostveen commented May 31, 2024 •

edited

Loading

markoostveen commented May 31, 2024 •

edited

Loading

markoostveen commented Jun 28, 2024

tommcdon commented Jul 1, 2024

markoostveen commented Jan 2, 2025 •

edited by AaronRobinsonMSFT

Loading

markoostveen commented Jan 2, 2025 •

edited

Loading

jkotas commented Jan 2, 2025

Patch skipping crashing in functions marked with [UnmanagedCallersOnly] #102767

Patch skipping crashing in functions marked with [UnmanagedCallersOnly] #102767

Comments

markoostveen commented May 28, 2024

Description

Reproduction Steps

Expected behavior

Actual behavior

Regression?

Known Workarounds

hez2010 commented May 28, 2024

jkotas commented May 28, 2024

markoostveen commented May 28, 2024 • edited Loading

jkotas commented May 28, 2024

markoostveen commented May 28, 2024

jkotas commented May 28, 2024

markoostveen commented May 28, 2024

jkotas commented May 28, 2024

jkotas commented May 28, 2024 • edited Loading

dotnet-policy-service bot commented May 28, 2024

markoostveen commented May 29, 2024 • edited Loading

hoyosjs commented May 31, 2024

hoyosjs commented May 31, 2024

hoyosjs commented May 31, 2024

tommcdon commented May 31, 2024

markoostveen commented May 31, 2024 • edited Loading

markoostveen commented May 31, 2024 • edited Loading

markoostveen commented Jun 28, 2024

tommcdon commented Jul 1, 2024

markoostveen commented Jan 2, 2025 • edited by AaronRobinsonMSFT Loading

markoostveen commented Jan 2, 2025 • edited Loading

jkotas commented Jan 2, 2025

markoostveen commented May 28, 2024 •

edited

Loading

jkotas commented May 28, 2024 •

edited

Loading

markoostveen commented May 29, 2024 •

edited

Loading

markoostveen commented May 31, 2024 •

edited

Loading

markoostveen commented May 31, 2024 •

edited

Loading

markoostveen commented Jan 2, 2025 •

edited by AaronRobinsonMSFT

Loading

markoostveen commented Jan 2, 2025 •

edited

Loading