interpreting hang 'sample' output


James Walker
 

Sometimes I get a report of a hang from a user of my product, with output from the ‘sample’ tool but not enough information to replicate a hang on my end. Let’s say it starts like this:

Call graph:
909 Thread_121713 DispatchQueue_1: com.apple.main-thread (serial)
+ 909 start (in FrameForge Storyboard Studio 4.0) + 52 [0x102ace034]
+ 909 NSApplicationMain (in AppKit) + 777 [0x7fff2fbd4b86]
+ 909 -[NSApplication run] (in AppKit) + 658 [0x7fff2fc02d7e]
+ 909 ??? (in FrameForge Storyboard Studio 4.0) load address 0x102acc000 + 0xf4017 [0x102bc0017]
+ 909 -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (in AppKit) + 1352 [0x7fff2fc11070]
+ 909 _DPSNextEvent (in AppKit) + 883 [0x7fff2fc12829]
+ 909 _BlockUntilNextEventMatchingListInModeWithFilter (in HIToolbox) + 64 [0x7fff315ca579]
+ 909 ReceiveNextEventCommon (in HIToolbox) + 584 [0x7fff315ca7d5]
+ 909 RunCurrentEventLoopInMode (in HIToolbox) + 292 [0x7fff315caabd]
+ 909 CFRunLoopRunSpecific (in CoreFoundation) + 462 [0x7fff3299bece]
+ 885 __CFRunLoopRun (in CoreFoundation) + 1319 [0x7fff3299ca52]
+ ! 885 __CFRunLoopServiceMachPort (in CoreFoundation) + 247 [0x7fff3299df85]
+ ! 885 mach_msg (in libsystem_kernel.dylib) + 60 [0x7fff6c941170]
+ ! 885 mach_msg_trap (in libsystem_kernel.dylib) + 10 [0x7fff6c940dfa]
+ 20 __CFRunLoopRun (in CoreFoundation) + 1871 [0x7fff3299cc7a]
+ ! 17 __CFRunLoopDoTimers (in CoreFoundation) + 322 [0x7fff329b7ed7]

I would have thought that once the tree starts branching, I’d see numbers that more or less add up to the 885, but no, they’re all much smaller. What does that tell me? I’m not seeing anywhere that a lot of samples are taken in my own code.

Near the end of the output, there is this, should it tell me anything?

Sort by top of stack, same collapsed (when >= 5):
__workq_kernreturn (in libsystem_kernel.dylib) 2727
mach_msg_trap (in libsystem_kernel.dylib) 2703
__psynch_cvwait (in libsystem_kernel.dylib) 1818
__select (in libsystem_kernel.dylib) 909
ftruncate (in libsystem_kernel.dylib) 7


John Brownie
 

James Walker wrote on 17/6/20 20:11:
Sometimes I get a report of a hang from a user of my product, with output from the ‘sample’ tool but not enough information to replicate a hang on my end. Let’s say it starts like this:

Call graph:
909 Thread_121713 DispatchQueue_1: com.apple.main-thread (serial)
+ 909 start (in FrameForge Storyboard Studio 4.0) + 52 [0x102ace034]
+ 909 NSApplicationMain (in AppKit) + 777 [0x7fff2fbd4b86]
+ 909 -[NSApplication run] (in AppKit) + 658 [0x7fff2fc02d7e]
+ 909 ??? (in FrameForge Storyboard Studio 4.0) load address 0x102acc000 + 0xf4017 [0x102bc0017]
+ 909 -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (in AppKit) + 1352 [0x7fff2fc11070]
+ 909 _DPSNextEvent (in AppKit) + 883 [0x7fff2fc12829]
+ 909 _BlockUntilNextEventMatchingListInModeWithFilter (in HIToolbox) + 64 [0x7fff315ca579]
+ 909 ReceiveNextEventCommon (in HIToolbox) + 584 [0x7fff315ca7d5]
+ 909 RunCurrentEventLoopInMode (in HIToolbox) + 292 [0x7fff315caabd]
+ 909 CFRunLoopRunSpecific (in CoreFoundation) + 462 [0x7fff3299bece]
+ 885 __CFRunLoopRun (in CoreFoundation) + 1319 [0x7fff3299ca52]
+ ! 885 __CFRunLoopServiceMachPort (in CoreFoundation) + 247 [0x7fff3299df85]
+ ! 885 mach_msg (in libsystem_kernel.dylib) + 60 [0x7fff6c941170]
+ ! 885 mach_msg_trap (in libsystem_kernel.dylib) + 10 [0x7fff6c940dfa]
+ 20 __CFRunLoopRun (in CoreFoundation) + 1871 [0x7fff3299cc7a]
+ ! 17 __CFRunLoopDoTimers (in CoreFoundation) + 322 [0x7fff329b7ed7]

I would have thought that once the tree starts branching, I’d see numbers that more or less add up to the 885, but no, they’re all much smaller. What does that tell me? I’m not seeing anywhere that a lot of samples are taken in my own code.
I think you're misinterpreting the tree. The 885 has two lower calls, each at 885. The next one with 20 is a child of the 909 CFRunLoopRunSpecific, bringing that 909 up to 905, with a further four branches in nodes further down. Similarly, you have 17 of the 20 in __CFRunLoopRun, with another three calls not in the snippet.

I can't help with the other question.

John
--
John Brownie
Mussau-Emira language, New Ireland Province, Papua New Guinea
Kouvola, Finland


James Walker
 

On Jun 17, 2020, at 10:39 AM, John Brownie <john_brownie@sil.org> wrote:

James Walker wrote on 17/6/20 20:11:
Sometimes I get a report of a hang from a user of my product, with output from the ‘sample’ tool but not enough information to replicate a hang on my end. Let’s say it starts like this:

Call graph:
909 Thread_121713 DispatchQueue_1: com.apple.main-thread (serial)
+ 909 start (in FrameForge Storyboard Studio 4.0) + 52 [0x102ace034]
+ 909 NSApplicationMain (in AppKit) + 777 [0x7fff2fbd4b86]
+ 909 -[NSApplication run] (in AppKit) + 658 [0x7fff2fc02d7e]
+ 909 ??? (in FrameForge Storyboard Studio 4.0) load address 0x102acc000 + 0xf4017 [0x102bc0017]
+ 909 -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (in AppKit) + 1352 [0x7fff2fc11070]
+ 909 _DPSNextEvent (in AppKit) + 883 [0x7fff2fc12829]
+ 909 _BlockUntilNextEventMatchingListInModeWithFilter (in HIToolbox) + 64 [0x7fff315ca579]
+ 909 ReceiveNextEventCommon (in HIToolbox) + 584 [0x7fff315ca7d5]
+ 909 RunCurrentEventLoopInMode (in HIToolbox) + 292 [0x7fff315caabd]
+ 909 CFRunLoopRunSpecific (in CoreFoundation) + 462 [0x7fff3299bece]
+ 885 __CFRunLoopRun (in CoreFoundation) + 1319 [0x7fff3299ca52]
+ ! 885 __CFRunLoopServiceMachPort (in CoreFoundation) + 247 [0x7fff3299df85]
+ ! 885 mach_msg (in libsystem_kernel.dylib) + 60 [0x7fff6c941170]
+ ! 885 mach_msg_trap (in libsystem_kernel.dylib) + 10 [0x7fff6c940dfa]
+ 20 __CFRunLoopRun (in CoreFoundation) + 1871 [0x7fff3299cc7a]
+ ! 17 __CFRunLoopDoTimers (in CoreFoundation) + 322 [0x7fff329b7ed7]

I would have thought that once the tree starts branching, I’d see numbers that more or less add up to the 885, but no, they’re all much smaller. What does that tell me? I’m not seeing anywhere that a lot of samples are taken in my own code.
I think you're misinterpreting the tree. The 885 has two lower calls, each at 885. The next one with 20 is a child of the 909 CFRunLoopRunSpecific, bringing that 909 up to 905, with a further four branches in nodes further down. Similarly, you have 17 of the 20 in __CFRunLoopRun, with another three calls not in the snippet.
OK, but I guess what’s really confusing me is that most of the samples reach a dead end in mach_msg_trap. Should I just ignore that, and concentrate on the stuff under __CFRunLoopRun?


Gary L. Wade
 

We really can’t tell much from just this sample, what with it just being the main thread.  One easy thing to consider is the code in your timers to see if there’s an issue there (your shown leaf call on the main thread is __CFRunLoopDoTimers), and I would look at the call graphs of the other threads to see if you're doing some main-thread-only stuff on a background thread, or possibly trying to do a sync call on the main thread at the end of a background thread callback.

On Jun 17, 2020, at 11:43 AM, James Walker <list2@...> wrote:



On Jun 17, 2020, at 10:39 AM, John Brownie <john_brownie@...> wrote:

James Walker wrote on 17/6/20 20:11:
Sometimes I get a report of a hang from a user of my product, with output from the ‘sample’ tool but not enough information to replicate a hang on my end.  Let’s say it starts like this:

Call graph:
  909 Thread_121713   DispatchQueue_1: com.apple.main-thread  (serial)
  + 909 start  (in FrameForge Storyboard Studio 4.0) + 52  [0x102ace034]
  +   909 NSApplicationMain  (in AppKit) + 777  [0x7fff2fbd4b86]
  +     909 -[NSApplication run]  (in AppKit) + 658  [0x7fff2fc02d7e]
  +       909 ???  (in FrameForge Storyboard Studio 4.0)  load address 0x102acc000 + 0xf4017  [0x102bc0017]
  +         909 -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:]  (in AppKit) + 1352  [0x7fff2fc11070]
  +           909 _DPSNextEvent  (in AppKit) + 883  [0x7fff2fc12829]
  +             909 _BlockUntilNextEventMatchingListInModeWithFilter  (in HIToolbox) + 64  [0x7fff315ca579]
  +               909 ReceiveNextEventCommon  (in HIToolbox) + 584  [0x7fff315ca7d5]
  +                 909 RunCurrentEventLoopInMode  (in HIToolbox) + 292  [0x7fff315caabd]
  +                   909 CFRunLoopRunSpecific  (in CoreFoundation) + 462  [0x7fff3299bece]
  +                     885 __CFRunLoopRun  (in CoreFoundation) + 1319  [0x7fff3299ca52]
  +                     ! 885 __CFRunLoopServiceMachPort  (in CoreFoundation) + 247  [0x7fff3299df85]
  +                     !   885 mach_msg  (in libsystem_kernel.dylib) + 60  [0x7fff6c941170]
  +                     !     885 mach_msg_trap  (in libsystem_kernel.dylib) + 10  [0x7fff6c940dfa]
  +                     20 __CFRunLoopRun  (in CoreFoundation) + 1871  [0x7fff3299cc7a]
  +                     ! 17 __CFRunLoopDoTimers  (in CoreFoundation) + 322  [0x7fff329b7ed7]

I would have thought that once the tree starts branching, I’d see numbers that more or less add up to the 885, but no, they’re all much smaller.  What does that tell me?  I’m not seeing anywhere that a lot of samples are taken in my own code.
I think you're misinterpreting the tree. The 885 has two lower calls, each at 885. The next one with 20 is a child of the 909 CFRunLoopRunSpecific, bringing that 909 up to 905, with a further four branches in nodes further down. Similarly, you have 17 of the 20 in __CFRunLoopRun, with another three calls not in the snippet.

OK, but I guess what’s really confusing me is that most of the samples reach a dead end in mach_msg_trap.  Should I just ignore that, and concentrate on the stuff under __CFRunLoopRun?



James Walker
 

On Jun 17, 2020, at 2:20 PM, Gary L. Wade <garywade@desisoftsystems.com> wrote:

We really can’t tell much from just this sample, what with it just being the main thread. One easy thing to consider is the code in your timers to see if there’s an issue there (your shown leaf call on the main thread is __CFRunLoopDoTimers), and I would look at the call graphs of the other threads to see if you're doing some main-thread-only stuff on a background thread, or possibly trying to do a sync call on the main thread at the end of a background thread callback.
--
Gary L. Wade
http://www.garywade.com/
I figured that the main thread would be the main point, because an app is hung when the main thread is not responding to events. And it doesn’t look like much of interest is going on in the other threads, but I’ll show them.

909 Thread_121790: com.apple.NSURLConnectionLoader
+ 909 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff6c9ffb8b]
+ 909 _pthread_start (in libsystem_pthread.dylib) + 148 [0x7fff6ca04109]
+ 909 __NSThread__start__ (in Foundation) + 1064 [0x7fff3502c7a2]
+ 909 ??? (in CFNetwork) load address 0x7fff3118c000 + 0x4a02a [0x7fff311d602a]
+ 909 CFRunLoopRunSpecific (in CoreFoundation) + 462 [0x7fff3299bece]
+ 909 __CFRunLoopRun (in CoreFoundation) + 1319 [0x7fff3299ca52]
+ 909 __CFRunLoopServiceMachPort (in CoreFoundation) + 247 [0x7fff3299df85]
+ 909 mach_msg (in libsystem_kernel.dylib) + 60 [0x7fff6c941170]
+ 909 mach_msg_trap (in libsystem_kernel.dylib) + 10 [0x7fff6c940dfa]
909 Thread_121819: com.apple.CFSocket.private
+ 909 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff6c9ffb8b]
+ 909 _pthread_start (in libsystem_pthread.dylib) + 148 [0x7fff6ca04109]
+ 909 __CFSocketManager (in CoreFoundation) + 641 [0x7fff329c6ce3]
+ 909 __select (in libsystem_kernel.dylib) + 10 [0x7fff6c9490fe]
909 Thread_121861: com.apple.NSEventThread
+ 909 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff6c9ffb8b]
+ 909 _pthread_start (in libsystem_pthread.dylib) + 148 [0x7fff6ca04109]
+ 909 _NSEventThread (in AppKit) + 132 [0x7fff2fdb4144]
+ 909 CFRunLoopRunSpecific (in CoreFoundation) + 462 [0x7fff3299bece]
+ 909 __CFRunLoopRun (in CoreFoundation) + 1319 [0x7fff3299ca52]
+ 909 __CFRunLoopServiceMachPort (in CoreFoundation) + 247 [0x7fff3299df85]
+ 909 mach_msg (in libsystem_kernel.dylib) + 60 [0x7fff6c941170]
+ 909 mach_msg_trap (in libsystem_kernel.dylib) + 10 [0x7fff6c940dfa]
909 Thread_121875
+ 909 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff6c9ffb8b]
+ 909 _pthread_start (in libsystem_pthread.dylib) + 148 [0x7fff6ca04109]
+ 909 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)()> >(void*) (in SBEngineLib4) + 36 [0x1035dbc1f]
+ 909 WorkerThreadLoop() (in SBEngineLib4) + 69 [0x1035db0b6]
+ 909 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) (in libc++.1.dylib) + 18 [0x7fff69ad3592]
+ 909 _pthread_cond_wait (in libsystem_pthread.dylib) + 698 [0x7fff6ca04425]
+ 909 __psynch_cvwait (in libsystem_kernel.dylib) + 10 [0x7fff6c943882]
909 Thread_122001
+ 909 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff6c9ffb8b]
+ 909 _pthread_start (in libsystem_pthread.dylib) + 148 [0x7fff6ca04109]
+ 909 ??? (in AppleVPA) load address 0x7fff46774000 + 0x28a30 [0x7fff4679ca30]
+ 909 _pthread_cond_wait (in libsystem_pthread.dylib) + 698 [0x7fff6ca04425]
+ 909 __psynch_cvwait (in libsystem_kernel.dylib) + 10 [0x7fff6c943882]
909 Thread_128889
+ 909 start_wqthread (in libsystem_pthread.dylib) + 15 [0x7fff6c9ffb77]
+ 909 _pthread_wqthread (in libsystem_pthread.dylib) + 390 [0x7fff6ca00aa1]
+ 909 __workq_kernreturn (in libsystem_kernel.dylib) + 10 [0x7fff6c9424ce]
909 Thread_129571
+ 909 start_wqthread (in libsystem_pthread.dylib) + 15 [0x7fff6c9ffb77]
+ 909 _pthread_wqthread (in libsystem_pthread.dylib) + 390 [0x7fff6ca00aa1]
+ 909 __workq_kernreturn (in libsystem_kernel.dylib) + 10 [0x7fff6c9424ce]
909 Thread_130176
909 start_wqthread (in libsystem_pthread.dylib) + 15 [0x7fff6c9ffb77]
909 _pthread_wqthread (in libsystem_pthread.dylib) + 390 [0x7fff6ca00aa1]
909 __workq_kernreturn (in libsystem_kernel.dylib) + 10 [0x7fff6c9424ce]



On Jun 17, 2020, at 11:43 AM, James Walker <list2@jwwalker.com> wrote:



On Jun 17, 2020, at 10:39 AM, John Brownie <john_brownie@sil.org> wrote:

James Walker wrote on 17/6/20 20:11:
Sometimes I get a report of a hang from a user of my product, with output from the ‘sample’ tool but not enough information to replicate a hang on my end. Let’s say it starts like this:

Call graph:
909 Thread_121713 DispatchQueue_1: com.apple.main-thread (serial)
+ 909 start (in FrameForge Storyboard Studio 4.0) + 52 [0x102ace034]
+ 909 NSApplicationMain (in AppKit) + 777 [0x7fff2fbd4b86]
+ 909 -[NSApplication run] (in AppKit) + 658 [0x7fff2fc02d7e]
+ 909 ??? (in FrameForge Storyboard Studio 4.0) load address 0x102acc000 + 0xf4017 [0x102bc0017]
+ 909 -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (in AppKit) + 1352 [0x7fff2fc11070]
+ 909 _DPSNextEvent (in AppKit) + 883 [0x7fff2fc12829]
+ 909 _BlockUntilNextEventMatchingListInModeWithFilter (in HIToolbox) + 64 [0x7fff315ca579]
+ 909 ReceiveNextEventCommon (in HIToolbox) + 584 [0x7fff315ca7d5]
+ 909 RunCurrentEventLoopInMode (in HIToolbox) + 292 [0x7fff315caabd]
+ 909 CFRunLoopRunSpecific (in CoreFoundation) + 462 [0x7fff3299bece]
+ 885 __CFRunLoopRun (in CoreFoundation) + 1319 [0x7fff3299ca52]
+ ! 885 __CFRunLoopServiceMachPort (in CoreFoundation) + 247 [0x7fff3299df85]
+ ! 885 mach_msg (in libsystem_kernel.dylib) + 60 [0x7fff6c941170]
+ ! 885 mach_msg_trap (in libsystem_kernel.dylib) + 10 [0x7fff6c940dfa]
+ 20 __CFRunLoopRun (in CoreFoundation) + 1871 [0x7fff3299cc7a]
+ ! 17 __CFRunLoopDoTimers (in CoreFoundation) + 322 [0x7fff329b7ed7]

I would have thought that once the tree starts branching, I’d see numbers that more or less add up to the 885, but no, they’re all much smaller. What does that tell me? I’m not seeing anywhere that a lot of samples are taken in my own code.
I think you're misinterpreting the tree. The 885 has two lower calls, each at 885. The next one with 20 is a child of the 909 CFRunLoopRunSpecific, bringing that 909 up to 905, with a further four branches in nodes further down. Similarly, you have 17 of the 20 in __CFRunLoopRun, with another three calls not in the snippet.
OK, but I guess what’s really confusing me is that most of the samples reach a dead end in mach_msg_trap. Should I just ignore that, and concentrate on the stuff under __CFRunLoopRun?