Ticket #84 (closed defect: fixed)

Opened 9 months ago

Last modified 2 months ago

epoll causes thread priority issues in multi thread Ruby

Reported by: rogerdpack Assigned to:
Priority: major Milestone: EventMachine "Framework"
Keywords: Cc: raggi, rogerdpack

Description

At least with 1.8.6, if you run it with epoll, the epoll thread seems to hog almost all the cpu time.

Attachments

noepoll+thread.gif (144.6 kB) - added by tmm1 on 04/18/09 01:34:59.
select reactor with threads
epoll+thread.gif (135.8 kB) - added by tmm1 on 04/18/09 01:37:33.
epoll reactor with threads (EM.defer usage)

Change History

09/25/08 17:45:56 changed by tmm1

  • cc set to raggi.
  • milestone set to EventMachine "Framework".

Could you elaborate? Are you talking about the following behavior:

$ time ruby -rubygems -e "require 'eventmachine'; EM.epoll; EM.run{ EM.add_timer(0){ Thread.new{ 100.times{|i| p(i)}; EM.stop } } }" > /dev/null

real	0m5.382s
user	0m0.056s
sys	0m0.004s

$ time ruby -rubygems -e "require 'eventmachine'; EM.run{ EM.add_timer(0){ Thread.new{ 100.times{|i| p(i)}; EM.stop } } }" > /dev/null

real	0m0.250s
user	0m0.048s
sys	0m0.016s

If so, you can try the following patch:

diff --git a/ext/em.cpp b/ext/em.cpp
index 4094755..0ebc2ca 100644
--- a/ext/em.cpp
+++ b/ext/em.cpp
@@ -425,8 +425,19 @@ bool EventMachine_t::_RunEpollOnce()
 {
        #ifdef HAVE_EPOLL
        assert (epfd != -1);
+
+       // default epoll timeout of -1 (wait indefinitely for an event)
+       // if there are ruby threads or pending timers, use a timeout of 0 (return immediately)
+       int timeout = -1;
+       #ifdef BUILD_FOR_RUBY
+       if (!rb_thread_alone())
+               timeout = 0;
+       #endif
+       if (Timers.size() > 0)
+               timeout = 0;
+
        struct epoll_event ev [MaxEpollDescriptors];
-       int s = epoll_wait (epfd, ev, MaxEpollDescriptors, 50);
+       int s = epoll_wait (epfd, ev, MaxEpollDescriptors, timeout);
        if (s > 0) {
                for (int i=0; i < s; i++) {
                        EventableDescriptor *ed = (EventableDescriptor*) ev[i].data.ptr;
@@ -511,8 +522,12 @@ bool EventMachine_t::_RunEpollOnce()
                }
        }
 
-       timeval tv = {0,0};
-       EmSelect (0, NULL, NULL, NULL, &tv);
+       #ifdef BUILD_FOR_RUBY
+       if (!rb_thread_alone()) {
+               timeval tv = {0,0};
+               rb_thread_select (0, NULL, NULL, NULL, &tv);
+       }
+       #endif
 
        return true;
        #else

10/10/08 17:46:24 changed by tmm1

  • cc changed from raggi to raggi, rogerdpack.

Before Patch:

$ time ruby -rubygems -e "require 'eventmachine'; EM.epoll; EM.run{ EM.add_timer(0){ Thread.new{ 100.times{|i| p(i)}; EM.stop } } }" > /dev/null

real	0m5.333s
user	0m0.048s
sys	0m0.016s

$ time ruby -rubygems -e "require 'eventmachine'; EM.run{ EM.add_timer(0){ Thread.new{ 100.times{|i| p(i)}; EM.stop } } }" > /dev/null

real	0m0.249s
user	0m0.044s
sys	0m0.020s

After Patch:

$ time ruby -rubygems -e "require 'eventmachine'; EM.epoll; EM.run{ EM.add_timer(0){ Thread.new{ 100.times{|i| p(i)}; EM.stop } } }" > /dev/null

real	0m0.084s
user	0m0.080s
sys	0m0.004s

$ time ruby -rubygems -e "require 'eventmachine'; EM.run{ EM.add_timer(0){ Thread.new{ 100.times{|i| p(i)}; EM.stop } } }" > /dev/null

real	0m0.242s
user	0m0.048s
sys	0m0.012s

10/10/08 18:26:13 changed by tmm1

And again with 100k iterations:

Before Patch:

$ time ruby -rubygems -e "require 'eventmachine'; EM.epoll; EM.run{ EM.add_timer(0){ Thread.new{ 100_000.times{|i| p(i)}; EM.stop } } }" > /dev/null

(took too long: epoll_wait is called with a 50 millisecond timeout, multiplied by 100k iterations = 83 minutes)

$ time ruby -rubygems -e "require 'eventmachine'; EM.run{ EM.add_timer(0){ Thread.new{ 100_000.times{|i| p(i)}; EM.stop } } }" > /dev/null

real	0m0.522s
user	0m0.236s
sys	0m0.108s

After Patch:

$ time ruby -rubygems -e "require 'eventmachine'; EM.epoll; EM.run{ EM.add_timer(0){ Thread.new{ 100_000.times{|i| p(i)}; EM.stop } } }" > /dev/null

real	0m21.694s
user	0m20.521s
sys	0m0.244s

$ time ruby -rubygems -e "require 'eventmachine'; EM.run{ EM.add_timer(0){ Thread.new{ 100_000.times{|i| p(i)}; EM.stop } } }" > /dev/null

real	0m0.521s
user	0m0.264s
sys	0m0.080s

10/10/08 19:10:52 changed by tmm1

Updated patch, waits 10 milliseconds when only timers exist to give them a chance to expire, and uses rb_thread_schedule:

diff --git a/ext/em.cpp b/ext/em.cpp
index f7339ee..380e4e9 100644
--- a/ext/em.cpp
+++ b/ext/em.cpp
@@ -425,8 +425,20 @@ bool EventMachine_t::_RunEpollOnce()
 {
        #ifdef HAVE_EPOLL
        assert (epfd != -1);
+
+       // default epoll timeout of -1 (wait indefinitely for an event)
+       // if there are pending timers, use a timeout of 10 (give timers 10 milliseconds to expire)
+       // if there are ruby threads, use a timeout of 0 (return immediately)
+       int timeout = -1;
+       if (Timers.size() > 0)
+               timeout = 10;
+       #ifdef BUILD_FOR_RUBY
+       if (!rb_thread_alone())
+               timeout = 0;
+       #endif
+
        struct epoll_event ev [MaxEpollDescriptors];
-       int s = epoll_wait (epfd, ev, MaxEpollDescriptors, 50);
+       int s = epoll_wait (epfd, ev, MaxEpollDescriptors, timeout);
        if (s > 0) {
                for (int i=0; i < s; i++) {
                        EventableDescriptor *ed = (EventableDescriptor*) ev[i].data.ptr;
@@ -511,8 +523,11 @@ bool EventMachine_t::_RunEpollOnce()
                }
        }
 
-       timeval tv = {0,0};
-       EmSelect (0, NULL, NULL, NULL, &tv);
+       #ifdef BUILD_FOR_RUBY
+       if (!rb_thread_alone()) {
+               rb_thread_schedule();
+       }
+       #endif
 
        return true;
        #else

rb_thread_schedule makes it a little bit faster:

$ time ruby -rubygems -e "require 'eventmachine'; EM.epoll; EM.run{ EM.add_timer(0){ Thread.new{ 100_000.times{|i| p(i)}; EM.stop } } }" > /dev/null

real	0m19.932s
user	0m19.733s
sys	0m0.172s

12/03/08 20:14:33 changed by tmm1

Updated patch again. Had to add a check for gTerminateSignalReceived to make sure we didn't block when EM.stop was called.

It is possible this patch will break comm_inactivity_timeout/heartbeats, since we could be blocking in an epoll_wait and not notice that there hasn't been any i/o for 2 seconds.

Other than that the patch seems to work and all the tests pass:

diff --git a/ext/em.cpp b/ext/em.cpp
index e222a59..1aa4c95 100644
--- a/ext/em.cpp
+++ b/ext/em.cpp
@@ -425,8 +425,23 @@ bool EventMachine_t::_RunEpollOnce()
 {
 	#ifdef HAVE_EPOLL
 	assert (epfd != -1);
+
+	// default epoll timeout of -1 (wait indefinitely for an event)
+	// if there are pending timers, use a timeout of 10 (give timers 10 milliseconds to expire)
+	// if there are ruby threads, use a timeout of 0 (return immediately)
+	// if a reactor halt is scheduled, use a timeout of 0 (return immediately)
+	int timeout = -1;
+	if (Timers.size() > 0)
+		timeout = 10;
+	#ifdef BUILD_FOR_RUBY
+	if (!rb_thread_alone())
+		timeout = 0;
+	#endif
+	if (gTerminateSignalReceived)
+		timeout = 0;
+
 	struct epoll_event ev [MaxEpollDescriptors];
-	int s = epoll_wait (epfd, ev, MaxEpollDescriptors, 50);
+	int s = epoll_wait (epfd, ev, MaxEpollDescriptors, timeout);
 	if (s > 0) {
 		for (int i=0; i < s; i++) {
 			EventableDescriptor *ed = (EventableDescriptor*) ev[i].data.ptr;
@@ -511,8 +526,11 @@ bool EventMachine_t::_RunEpollOnce()
 		}
 	}
 
-	timeval tv = {0,0};
-	EmSelect (0, NULL, NULL, NULL, &tv);
+	#ifdef BUILD_FOR_RUBY
+	if (!rb_thread_alone()) {
+		rb_thread_schedule();
+	}
+	#endif
 
 	return true;
 	#else

12/04/08 20:46:23 changed by tmm1

It appears rb_thread_schedule tends to schedule the main thread (in which the epoll reactor is running), causing a lot of latency when using epoll:

$ time ruby -I ext:lib -r eventmachine -e "EM.epoll; EM.run{ EM.add_timer(0){ Thread.new{ 100_000.times{|i| p(i)}; EM.stop } } }" > /dev/null

real	0m22.369s
user	0m22.197s
sys	0m0.108s

Setting a priority on the spawned thread fixes the issue:

$ time ruby -I ext:lib -r eventmachine -e "EM.epoll; EM.run{ EM.add_timer(0){ Thread.new{ Thread.current.priority=1; 100_000.times{|i| p(i)}; EM.stop } } }" > /dev/null

real	0m0.437s
user	0m0.360s
sys	0m0.064s

01/22/09 20:48:41 changed by tmm1

Modified the patch to calculate the time until the next timer and wait that long, instead of waiting 10 milliseconds and checking RunTimers?() over and over:

diff --git a/ext/em.cpp b/ext/em.cpp
index e222a59..de35667 100644
--- a/ext/em.cpp
+++ b/ext/em.cpp
@@ -28,6 +28,9 @@ See the file COPYING for complete licensing information.
 // This avoids the need for frequent expensive calls to time(NULL);
 time_t gCurrentLoopTime;
 
+// Number of milliseconds until the next timer, calculated on each tick in RunTimers()
+int gMillisecondsTilTimer;
+
 #ifdef OS_WIN32
 unsigned gTickCountTickover;
 unsigned gLastTickCount;
@@ -425,8 +428,23 @@ bool EventMachine_t::_RunEpollOnce()
 {
 	#ifdef HAVE_EPOLL
 	assert (epfd != -1);
+
+	// default epoll timeout of -1 (wait indefinitely for an event)
+	// if there are pending timers, wait until the next timer needs to fire
+	// if there are ruby threads, use a timeout of 0 (return immediately)
+	// if a reactor halt is scheduled, use a timeout of 0 (return immediately)
+	int timeout = -1;
+	if (Timers.size() > 0)
+		timeout = gMillisecondsTilTimer;
+	#ifdef BUILD_FOR_RUBY
+	if (!rb_thread_alone())
+		timeout = 0;
+	#endif
+	if (gTerminateSignalReceived)
+		timeout = 0;
+
 	struct epoll_event ev [MaxEpollDescriptors];
-	int s = epoll_wait (epfd, ev, MaxEpollDescriptors, 50);
+	int s = epoll_wait (epfd, ev, MaxEpollDescriptors, timeout);
 	if (s > 0) {
 		for (int i=0; i < s; i++) {
 			EventableDescriptor *ed = (EventableDescriptor*) ev[i].data.ptr;
@@ -511,8 +529,11 @@ bool EventMachine_t::_RunEpollOnce()
 		}
 	}
 
-	timeval tv = {0,0};
-	EmSelect (0, NULL, NULL, NULL, &tv);
+	#ifdef BUILD_FOR_RUBY
+	if (!rb_thread_alone()) {
+		rb_thread_schedule();
+	}
+	#endif
 
 	return true;
 	#else
@@ -861,8 +882,10 @@ bool EventMachine_t::_RunTimers()
 		multimap<Int64,Timer_t>::iterator i = Timers.begin();
 		if (i == Timers.end())
 			break;
-		if (i->first > now)
+		if (i->first > now) {
+			gMillisecondsTilTimer = (i->first - now)/1000;
 			break;
+		}
 		if (EventCallback)
 			(*EventCallback) ("", EM_TIMER_FIRED, i->second.GetBinding().c_str(), i->second.GetBinding().length());
 		Timers.erase (i);

01/22/09 21:09:12 changed by tmm1

If I don't print out anything in the thread (which was causing ruby to call select on the stdout fd and wait on it), then the epoll version is actually faster:

$ time ruby -I ext:lib -reventmachine -e "EM.run{ EM.add_timer(0){ Thread.new{ 100_000.times{|i| }; EM.stop } } }" > /dev/null

real	0m0.235s
user	0m0.044s
sys	0m0.008s

vs

$ time ruby -I ext:lib -reventmachine -e "EM.epoll; EM.run{ EM.add_timer(0){ Thread.new{ 100_000.times{|i| }; EM.stop } } }" > /dev/null

real	0m0.056s
user	0m0.036s
sys	0m0.024s

02/11/09 02:43:50 changed by tmm1

From http://rubyforge.org/pipermail/eventmachine-talk/2008-February/001683.html:

Fast (in all modes):
  def process
    send_data "whatever"
    close_connection_after_writing
  end

Super slow (only with epoll activated):
  def process
    before = lambda {
      send_data "whatever"
      close_connection_after_writing
    }

    EventMachine.defer(before, lambda { })
  end


ULTRA slow (only with epoll activated):
  def process
    before = lambda {
      LOG.info "whatever"
      send_data "whatever"
      close_connection_after_writing
    }

    EventMachine.defer(before, lambda { })
  end

The ULTRA slow happens because logging causes threads to call select() on stdout as previously discussed. I'm going to investigate the added latency in the epoll+threadpool case further.

03/07/09 02:33:29 changed by tmm1

Note that the patch breaks set_comm_inactivity_timeout.

04/18/09 01:34:59 changed by tmm1

  • attachment noepoll+thread.gif added.

select reactor with threads

04/18/09 01:37:33 changed by tmm1

  • attachment epoll+thread.gif added.

epoll reactor with threads (EM.defer usage)

04/28/09 21:00:42 changed by tmm1

  • status changed from new to closed.
  • resolution set to fixed.

google-perftools cpu profiling graphs showed a lot of time being spent in rb_thread_save/restore_context.. turns out this was because the RunEpollOnce? stack frame was almost 800kb, and that data was memcpy'd in and out of the C stack on every thread context switch.

By heap allocating the array of epoll events (as a private member var on EventMachine_t), we make GC faster (as it has less of a stack to scan for ruby VALUEs), and also make significant impact the speed of thread context switching when using epoll.

Fixed in 1f6a4c912256b8110af94e270f7dde486f3c9d75