Ticket #33 (assigned defect)

Opened 11 months ago

Last modified 4 months ago

[PATCH] EventMachine will segfault occasionally on program end

Reported by: rogerdpack Assigned to: raggi (accepted)
Priority: minor Milestone:
Keywords: Cc: francis

Description

If something interrupts EM during an unbind callback, EM will 'jump' to its release_machine block. At that point the Descriptors array is in a 'half finished' state, since EM had been walking through it and deleting the members to be unbound. It therefore segfaults. This patch seems to do the trick [and, conveniently, might be slightly faster than the existing code, anyway]. Test included. With this patch applied, EM finally feels stable.
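The failure mode described above can be modelled in plain Ruby. This is only a rough sketch under stated assumptions: `Descriptor`, `close_all` and the `release_machine` method below are hypothetical stand-ins, not EventMachine's C++ code, and a release counter stands in for the double delete that would segfault in C++. It shows the idea of detaching each entry from the list before its unbind callback runs, so that an interrupted loop leaves behind only entries that still need cleanup.

```ruby
# Hypothetical model of the reactor's descriptor list; none of these names
# are taken from EventMachine itself.
class Descriptor
  attr_reader :name, :released

  def initialize(name, &on_unbind)
    @name = name
    @on_unbind = on_unbind
    @released = 0
  end

  # Runs the user's unbind callback, which may raise.
  def unbind
    @on_unbind.call if @on_unbind
  end

  # Counts releases; a count above 1 models a double delete.
  def release
    @released += 1
  end
end

# Detach each descriptor from the list *before* running its callback, so an
# exception leaves the list holding only entries that were never touched.
def close_all(descriptors)
  until descriptors.empty?
    d = descriptors.shift
    begin
      d.unbind
    ensure
      d.release
    end
  end
end

# Stand-in for the cleanup block that runs when the loop is abandoned.
def release_machine(descriptors)
  descriptors.each(&:release)
  descriptors.clear
end

all = [
  Descriptor.new("a"),
  Descriptor.new("b") { raise "boom" },
  Descriptor.new("c")
]
list = all.dup

begin
  close_all(list)
rescue RuntimeError
  release_machine(list) # only "c" is still in the list here
end

p all.map(&:released)
```

In this model every descriptor ends up released exactly once, even though the callback for "b" raised partway through the loop.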

Attachments

assure_ends_well.diff (3.5 kB) - added by rogerdpack on 06/26/08 18:46:33.
patch and tests
test_handles_uncaught_exception.rb (1.5 kB) - added by rogerdpack on 07/26/08 03:57:18.

Change History

06/26/08 18:46:33 changed by rogerdpack

  • attachment assure_ends_well.diff added.

patch and tests

06/28/08 10:29:10 changed by rogerdpack

I may need to submit an updated version of this patch that removes the Descriptor from the list before calling delete, in all cases. The one submitted currently works, however.

06/30/08 13:28:13 changed by rogerdpack

These changes need to be carefully examined to make sure that they remove the socket from the Descriptors list BEFORE it is deleted. It may be the case that the patch already does this. I'm not sure.

06/30/08 20:20:05 changed by raggi

  • owner set to raggi.
  • priority changed from major to minor.
  • status changed from new to assigned.

Interesting, will check back on this as soon as runtime patches are cleared, lowering priority until then, as it's an on-exit condition.

07/25/08 11:45:49 changed by raggi

  • cc set to francis.

Summary of discussion:

  • unbind callbacks need a refactor on the C++ side. Francis said he wants to move them out of the C++ destructor.
  • some of the errors that reach event_callback could do with more debugging info, will create a separate ticket for that.

It may also be relevant to add an errback system to run, which catches errors that fall through from event_callback.

07/25/08 14:26:08 changed by rogerdpack

There was an errback system but it was deemed too costly performance-wise. I would support an errback 'begin rescue' around EM.run itself, though [before the ensure that calls .release_system or what not].
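A rough sketch of that shape, with a single rescue around the whole run block that fires before the ensure does its cleanup. Everything here is an assumption for illustration: `run_reactor`, the `errback:` keyword and the `release:` lambda are hypothetical and are not part of EventMachine's API.

```ruby
# Hypothetical stand-in for the reactor entry point; not EventMachine's code.
# errback: optional callable that receives an error escaping the run block.
# release: callable that stands in for the cleanup done in the ensure.
def run_reactor(errback: nil, release: -> {})
  yield
rescue StandardError => e
  raise unless errback # no errback registered: let the error propagate
  errback.call(e)
ensure
  release.call # always runs, and runs after the rescue clause
end

events = []
run_reactor(
  errback: ->(e) { events << "errback: #{e.message}" },
  release: -> { events << "release" }
) do
  raise "unbind failed"
end

p events
```

Because the rescue wraps the run block once, rather than wrapping every callback, the errback sees the error before the cleanup step runs.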

It appears that v 0.12.0 and SVN trunk do this:

/opt/local/lib/ruby/gems/1.8/gems/eventmachine-0.12.0/lib/eventmachine.rb:226: [BUG] Segmentation fault
ruby 1.8.6 (2008-03-03) [i686-darwin9.2.0]

while with the patch they do

test_ends_well_multi_thread(DeathEnd): EventMachine::ConnectionNotBound: EventMachine::ConnectionNotBound

So I think what this is showing is that the first problem has been overcome, which revealed another problem. I'll look into it and see if it's easily fixable, as well as release a 'higher quality' version of this patch. -R

07/26/08 03:54:49 changed by rogerdpack

Francis' patch 756 seems to fix this by not allowing the "unbind newly dead sockets" loop to be interrupted, so the stack is no longer abandoned prematurely, which is what caused our old segfaults.

Unfortunately, if several connections now throw uncaught exceptions during unbind, it effectively 'discards' all but the last exception, and it carries a small performance hit. It does fix the bug in question, though.
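To make that trade-off concrete, here is a minimal Ruby model of an unbind loop that cannot be interrupted and keeps only the most recent exception. It is a guess at the observable behaviour only, not Francis' actual change (which lives in the C++ reactor), and `unbind_all` is a hypothetical name.

```ruby
# Hypothetical model of an uninterruptible unbind loop.
def unbind_all(callbacks)
  last_error = nil
  callbacks.each do |cb|
    begin
      cb.call
    rescue StandardError => e
      last_error = e # overwrites any exception caught earlier
    end
  end
  # Only the most recent exception survives the loop.
  raise last_error if last_error
end

ran = []
callbacks = [
  -> { ran << 1; raise "first" },
  -> { ran << 2 },
  -> { ran << 3; raise "third" }
]

begin
  unbind_all(callbacks)
rescue RuntimeError => e
  puts "ran #{ran.inspect}, surfaced: #{e.message}"
end
```

Every callback runs to completion, so the list is never left half processed, but the "first" error is lost along the way.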

I may be able to get my patch to work as well as his, however :) I might look at it sometime. Note that with the patch here we get 'connection unbound errors' which seems to match what happens currently when OTHER methods raise in a similar manner, as well. See next attached. Thanks! -R

07/26/08 03:57:18 changed by rogerdpack

  • attachment test_handles_uncaught_exception.rb added.

01/27/09 18:25:14 changed by raggi

Merged into:

http://github.com/eventmachine/eventmachine/tree/ends_well

Highlights some new errors on trunk, where we're doing connection unbinding before setting up the binding.