Ticket #40 (new defect)

Opened 11 months ago

Last modified 2 months ago

Making several calls to send_data with small chunks of data is significantly slower than making a few calls with bigger chunks

Reported by: tmm1 Assigned to: tmm1
Priority: minor Milestone: EventMachine "Framework"
Keywords: Cc: francis wyhaines

Description

wyhaines: I have noticed that making several calls to send_data with small chunks of data is significantly slower than making a few calls with bigger chunks.
aman: ooh interesting
aman: any idea why
wyhaines: Yeah.  So, when I send static files, I will aggregate strings together into one bigger string and accept the cost for that instead of the cost for several small sends.
wyhaines: I had started looking at the code that handles the send queue.  It's _supposed_ to aggregate small chunks itself, but I think there's just something not working right in there.
wyhaines: I haven't looked at it enough to figure it out, though.
wyhaines: I do think there is a problem there, though.
wyhaines: The C++ code should be able to aggregate chunks more efficiently than doing it on the Ruby side.
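
For illustration, here is a minimal sketch of the Ruby-side aggregation workaround described above. The BufferedWriter module, the buffered_send_data/flush_buffer names, and the 16 KB threshold are illustrative assumptions, not part of EventMachine's API; only send_data is real.

require 'eventmachine'

# Sketch only: collect small writes into one Ruby string and hand the reactor
# a single large chunk, instead of calling send_data once per small chunk.
module BufferedWriter
  FLUSH_THRESHOLD = 16 * 1024  # assumed threshold, roughly the reactor's own 16 KB buffer

  def buffered_send_data(data)
    (@write_buffer ||= '') << data
    flush_buffer if @write_buffer.size >= FLUSH_THRESHOLD
  end

  def flush_buffer
    return if @write_buffer.nil? || @write_buffer.empty?
    send_data @write_buffer   # one big send_data call instead of many small ones
    @write_buffer = ''
  end
end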

Attachments

1741.diff (2.4 kB) - added by tmm1 on 06/30/08 18:24:26.
writev instead of send

Change History

06/30/08 18:24:05 changed by tmm1

One possible approach is to let the kernel handle the coalescing of multiple OutboundData pages instead of doing it in the reactor: http://p.ramaze.net/1741. However, in my tests, this did not make a significant difference.
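
As an aside on the general idea, gather writes are also what modern Rubies (2.5 and later) do when IO#write is given several strings at once: the chunks can be handed to the kernel in a single writev(2) call where the platform supports it, without first being copied into one buffer. A small sketch, unrelated to EventMachine's internals:

# Gather-write sketch (requires Ruby >= 2.5 for multi-argument IO#write).
# One write call submits all chunks instead of issuing 100 separate writes.
chunks = Array.new(100) { 'abc' }

File.open('/tmp/gather_write_demo', 'w') do |io|
  io.write(*chunks)
end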

06/30/08 18:24:26 changed by tmm1

  • attachment 1741.diff added.

writev instead of send

06/30/08 19:56:19 changed by raggi

Awesome, I will poke Francis on this when I next speak to him.

This may actually solve another ticket.

07/09/08 20:32:43 changed by tmm1

Fix for a memory leak in my patch:

--- a/ext/ed.cpp
+++ b/ext/ed.cpp
@@ -627,11 +627,12 @@ void ConnectionDescriptor::_WriteOutboundData()
 
                int sent = bytes_written;
                for(int i = 0; i < iovcnt; i++) {
+                       OutboundPage *op = &(OutboundPages[0]);
                        if (iov[i].iov_len <= sent) {
+                               op->Free();
                                OutboundPages.pop_front();
                                sent -= iov[i].iov_len;
                        } else {
-                               OutboundPage *op = &(OutboundPages[i]);
                                op->Offset += sent;
                        }
                }

07/22/08 12:49:25 changed by raggi

  • priority changed from critical to minor.

Demoted to minor until it comes out of research/dev.

07/22/08 15:19:35 changed by raggi

  • owner changed from raggi to tmm1.

Re-assigned to tmm1 for tracking; the patch can be applied by anyone once it's ready.

03/20/09 03:49:22 changed by tmm1

This test case demonstrates the issue. Sample output from a few connections:

# starting big at Thu Mar 19 23:35:13 -0700 2009
# big done, took 0.003532 seconds

# starting big at Thu Mar 19 23:35:14 -0700 2009
# big done, took 0.002954 seconds

# starting small at Thu Mar 19 23:35:14 -0700 2009
# small done, took 0.350519 seconds

# starting small at Thu Mar 19 23:35:15 -0700 2009
# small done, took 0.344344 seconds

require 'rubygems'
require 'eventmachine'

module MeasureSendDataTime
  SmallData = 'abc'.freeze
  BigData = ('abc'*100_000).freeze
  def post_init
    @start = Time.now
    @type = rand(2)>0 ? :small : :big
    puts "starting #{@type} at #{@start}"

    if @type == :small
      100_000.times{ send_data SmallData }
    else
      send_data BigData
    end

    close_connection_after_writing
  end
  def unbind
    puts "#{@type} done, took #{Time.now-@start} seconds"
  end
end

EM.run{
  EM.start_server '127.0.0.1', 8080, MeasureSendDataTime
}
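
Note that post_init above only runs when something connects to the server, so producing the timings requires a client hitting port 8080. A throwaway driver along these lines (not part of the original test) triggers one run per connection when started in a second process:

require 'rubygems'
require 'eventmachine'

# Illustrative client: connect once so the server's post_init fires,
# discard whatever it sends, and stop this reactor when the server closes.
module DiscardingClient
  def receive_data(data); end
  def unbind; EM.stop; end
end

EM.run{
  EM.connect '127.0.0.1', 8080, DiscardingClient
}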

03/20/09 03:50:09 changed by tmm1

Updated patch to use writev() if possible:

diff --git a/ext/ed.cpp b/ext/ed.cpp
index ae88e7f..f23602f 100644
--- a/ext/ed.cpp
+++ b/ext/ed.cpp
@@ -624,9 +624,22 @@ void ConnectionDescriptor::_WriteOutboundData()
 	}
 
 	LastIo = gCurrentLoopTime;
-	char output_buffer [16 * 1024];
 	size_t nbytes = 0;
 
+	#ifdef HAVE_WRITEV
+	int iovcnt = OutboundPages.size();
+	struct iovec iov[ iovcnt ];
+
+	for(int i = 0; i < iovcnt; i++){
+		OutboundPage *op = &(OutboundPages[i]);
+		iov[i].iov_base = (void *)(op->Buffer + op->Offset);
+		iov[i].iov_len	= op->Length - op->Offset;
+
+		nbytes += iov[i].iov_len;
+	}
+	#else
+	char output_buffer [16 * 1024];
+
 	while ((OutboundPages.size() > 0) && (nbytes < sizeof(output_buffer))) {
 		OutboundPage *op = &(OutboundPages[0]);
 		if ((nbytes + op->Length - op->Offset) < sizeof (output_buffer)) {
@@ -642,6 +655,7 @@ void ConnectionDescriptor::_WriteOutboundData()
 			nbytes += len;
 		}
 	}
+	#endif
 
 	// We should never have gotten here if there were no data to write,
 	// so assert that as a sanity check.
@@ -650,7 +664,11 @@ void ConnectionDescriptor::_WriteOutboundData()
 	assert (nbytes > 0);
 
 	assert (GetSocket() != INVALID_SOCKET);
+	#ifdef HAVE_WRITEV
+	int bytes_written = writev(GetSocket(), iov, iovcnt);
+	#else
 	int bytes_written = send (GetSocket(), output_buffer, nbytes, 0);
+	#endif
 
 	bool err = false;
 	if (bytes_written < 0) {
@@ -660,6 +678,21 @@ void ConnectionDescriptor::_WriteOutboundData()
 
 	assert (bytes_written >= 0);
 	OutboundDataSize -= bytes_written;
+
+	#ifdef HAVE_WRITEV
+	int sent = bytes_written;
+	for(int i = 0; i < iovcnt; i++) {
+		OutboundPage *op = &(OutboundPages[i]);
+
+		if (iov[i].iov_len <= sent) {
+			op->Free();
+			OutboundPages.pop_front();
+			sent -= iov[i].iov_len;
+		} else {
+			op->Offset += sent;
+		}
+	}
+	#else
 	if ((size_t)bytes_written < nbytes) {
 		int len = nbytes - bytes_written;
 		char *buffer = (char*) malloc (len + 1);
@@ -669,6 +702,7 @@ void ConnectionDescriptor::_WriteOutboundData()
 		buffer [len] = 0;
 		OutboundPages.push_front (OutboundPage (buffer, len));
 	}
+	#endif
 
 	#ifdef HAVE_EPOLL
 	EpollEvent.events = (EPOLLIN | (SelectForWrite() ? EPOLLOUT : 0));
diff --git a/ext/extconf.rb b/ext/extconf.rb
index 3f94f6f..9c9db0a 100644
--- a/ext/extconf.rb
+++ b/ext/extconf.rb
@@ -17,6 +17,7 @@ add_define 'HAVE_RBTRAP' if have_var('rb_trap_immediate', ['ruby.h', 'rubysig.h'
 add_define "HAVE_TBR" if have_func('rb_thread_blocking_region')# and have_macro('RUBY_UBF_IO', 'ruby.h')
 add_define "HAVE_INOTIFY" if inotify = have_func('inotify_init', 'sys/inotify.h')
 add_define "HAVE_OLD_INOTIFY" if !inotify && have_macro('__NR_inotify_init', 'sys/syscall.h')
+add_define 'HAVE_WRITEV' if have_header('sys/uio.h')
 
 # Minor platform details between *nix and Windows:
 
diff --git a/ext/project.h b/ext/project.h
index eff5105..494ea73 100644
--- a/ext/project.h
+++ b/ext/project.h
@@ -119,6 +119,10 @@ static inline int inotify_rm_watch (int fd, __u32 wd) { return syscall (__NR_ino
 #define INOTIFY_EVENT_SIZE  (sizeof(struct inotify_event))
 #endif
 
+#ifdef HAVE_WRITEV
+#include <sys/uio.h>
+#endif
+
 #include "binder.h"
 #include "em.h"
 #include "epoll.h"

03/20/09 04:18:26 changed by tmm1

Oops, the last patch had a nasty bug in cleaning up the OutboundPages (it failed the test suite with double-free and non-aligned free() errors):

diff --git a/ext/ed.cpp b/ext/ed.cpp
index ae88e7f..3205500 100644
--- a/ext/ed.cpp
+++ b/ext/ed.cpp
@@ -624,9 +624,22 @@ void ConnectionDescriptor::_WriteOutboundData()
 	}
 
 	LastIo = gCurrentLoopTime;
-	char output_buffer [16 * 1024];
 	size_t nbytes = 0;
 
+	#ifdef HAVE_WRITEV
+	int iovcnt = OutboundPages.size();
+	struct iovec iov[ iovcnt ];
+
+	for(int i = 0; i < iovcnt; i++){
+		OutboundPage *op = &(OutboundPages[i]);
+		iov[i].iov_base = (void *)(op->Buffer + op->Offset);
+		iov[i].iov_len	= op->Length - op->Offset;
+
+		nbytes += iov[i].iov_len;
+	}
+	#else
+	char output_buffer [16 * 1024];
+
 	while ((OutboundPages.size() > 0) && (nbytes < sizeof(output_buffer))) {
 		OutboundPage *op = &(OutboundPages[0]);
 		if ((nbytes + op->Length - op->Offset) < sizeof (output_buffer)) {
@@ -642,6 +655,7 @@ void ConnectionDescriptor::_WriteOutboundData()
 			nbytes += len;
 		}
 	}
+	#endif
 
 	// We should never have gotten here if there were no data to write,
 	// so assert that as a sanity check.
@@ -650,7 +664,11 @@ void ConnectionDescriptor::_WriteOutboundData()
 	assert (nbytes > 0);
 
 	assert (GetSocket() != INVALID_SOCKET);
+	#ifdef HAVE_WRITEV
+	int bytes_written = writev(GetSocket(), iov, iovcnt);
+	#else
 	int bytes_written = send (GetSocket(), output_buffer, nbytes, 0);
+	#endif
 
 	bool err = false;
 	if (bytes_written < 0) {
@@ -660,6 +678,23 @@ void ConnectionDescriptor::_WriteOutboundData()
 
 	assert (bytes_written >= 0);
 	OutboundDataSize -= bytes_written;
+
+	#ifdef HAVE_WRITEV
+	int sent = bytes_written;
+	deque<OutboundPage>::iterator op = OutboundPages.begin();
+	for (int i = 0; i < iovcnt; i++) {
+		if (iov[i].iov_len <= sent) {
+			OutboundPages.pop_front();
+			op->Free();
+			sent -= iov[i].iov_len;
+		} else {
+			op->Offset += sent;
+		}
+
+		if (op == OutboundPages.end()) break;
+		*op++;
+	}
+	#else
 	if ((size_t)bytes_written < nbytes) {
 		int len = nbytes - bytes_written;
 		char *buffer = (char*) malloc (len + 1);
@@ -669,6 +704,7 @@ void ConnectionDescriptor::_WriteOutboundData()
 		buffer [len] = 0;
 		OutboundPages.push_front (OutboundPage (buffer, len));
 	}
+	#endif
 
 	#ifdef HAVE_EPOLL
 	EpollEvent.events = (EPOLLIN | (SelectForWrite() ? EPOLLOUT : 0));
diff --git a/ext/extconf.rb b/ext/extconf.rb
index 3f94f6f..9c9db0a 100644
--- a/ext/extconf.rb
+++ b/ext/extconf.rb
@@ -17,6 +17,7 @@ add_define 'HAVE_RBTRAP' if have_var('rb_trap_immediate', ['ruby.h', 'rubysig.h'
 add_define "HAVE_TBR" if have_func('rb_thread_blocking_region')# and have_macro('RUBY_UBF_IO', 'ruby.h')
 add_define "HAVE_INOTIFY" if inotify = have_func('inotify_init', 'sys/inotify.h')
 add_define "HAVE_OLD_INOTIFY" if !inotify && have_macro('__NR_inotify_init', 'sys/syscall.h')
+add_define 'HAVE_WRITEV' if have_header('sys/uio.h')
 
 # Minor platform details between *nix and Windows:
 
diff --git a/ext/project.h b/ext/project.h
index eff5105..494ea73 100644
--- a/ext/project.h
+++ b/ext/project.h
@@ -119,6 +119,10 @@ static inline int inotify_rm_watch (int fd, __u32 wd) { return syscall (__NR_ino
 #define INOTIFY_EVENT_SIZE  (sizeof(struct inotify_event))
 #endif
 
+#ifdef HAVE_WRITEV
+#include <sys/uio.h>
+#endif
+
 #include "binder.h"
 #include "em.h"
 #include "epoll.h"

03/21/09 03:28:44 changed by yakischloba

In the above test, the main slowdown seems to be simply method dispatch and referencing of the string being passed. Using Ruby 1.9 with tcmalloc gave a 35% speed increase on the same test. malloc is called in the reactor once for each call to send_data, but that accounted for only a third or less of the total operation cost. This seems to be more of a Ruby issue than a reactor issue.
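
One way to see that Ruby-side cost in isolation, with no reactor or I/O involved, is to time 100,000 dispatches of a no-op method taking the small string against a single dispatch taking the big string (a rough sketch; absolute numbers will vary with Ruby version and allocator):

require 'benchmark'

SMALL = 'abc'.freeze
BIG   = ('abc' * 100_000).freeze

def noop(data); data; end  # stand-in for send_data with no I/O behind it

puts Benchmark.realtime { 100_000.times { noop SMALL } }  # many dispatches, small string
puts Benchmark.realtime { noop BIG }                      # one dispatch, big string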