Hi,
lshg/lsh (as of lsh 2.1 on GNU/Linux, x86_64) systematically fails for me when passed large data streams on stdin:
--8<---------------cut here---------------start------------->8---
$ lsh -G -B fencepost.gnu.org
Passphrase for key `xxx@yyy':
$ lshg fencepost.gnu.org uname -o
GNU/Linux
$ cat /dev/zero | lshg fencepost.gnu.org md5sum
lsh: Protocol error: Write buffer full, peer not responding.
lsh: write_buffer: Attempt to write data to closed buffer.
lsh: Protocol error: Write buffer full, peer not responding.
lsh: write_buffer: Attempt to write data to closed buffer.
[...]
lsh: Protocol error: Write buffer full, peer not responding.
lshg: write_buffer: Attempt to write data to closed buffer.
$ killall lsh
lsh: no process found
--8<---------------cut here---------------end--------------->8---
Conversely, this works well:
--8<---------------cut here---------------start------------->8---
$ cat /dev/zero | lsh fencepost.gnu.org md5sum
--8<---------------cut here---------------end--------------->8---
Any idea what’s wrong or how to debug it?
Thanks, Ludo’.
ludo@gnu.org (Ludovic Courtès) writes:
lshg/lsh (as of lsh 2.1 on GNU/Linux, x86_64) systematically fails for me when passed large data streams on stdin:
--8<---------------cut here---------------start------------->8---
$ lsh -G -B fencepost.gnu.org
Passphrase for key `xxx@yyy':
$ lshg fencepost.gnu.org uname -o
GNU/Linux
$ cat /dev/zero | lshg fencepost.gnu.org md5sum
lsh: Protocol error: Write buffer full, peer not responding.
lsh: write_buffer: Attempt to write data to closed buffer.
...
Conversely, this works well:
--8<---------------cut here---------------start------------->8---
$ cat /dev/zero | lsh fencepost.gnu.org md5sum
Any idea what’s wrong or how to debug it?
Definitely looks like something wrong in the flow control involving lsh and lshg.
You could first try increasing the WRITE_BUFFER_MARGIN in connection.c, but I doubt that will help (if that was the problem, you'd most likely see it also when lshg isn't involved).
You may also get some info by passing some or all of -v, --trace and --debug to the lsh and lshg processes.
It would also be interesting to see if the problem still exists in the latest version in the repo (which works quite differently; lshg is no longer a separate program, but you still setup the gateway with lsh -G, and then further invocations of lsh will try to use it).
It's some time since I worked on this code... The way it works, there's a soft_limit on the amount of data we're willing to keep buffered for writing to the socket. When that limit is exceeded, hard_limit is set. We will still generate new packets to be buffered if needed to respond to a key exchange, but otherwise we're not supposed to generate new packets, and this basically works because read_data.c:do_read_query_data checks whether connection->hard_limit is set.
Now, I think the problem is that the code reading from a gateway client socket doesn't check if hard_limit > 0. Not entirely trivial to fix. What needs to be done is to
1. Make gateway_commands.c:do_read_gateway check that flag,
   if (self->connection->chain->hard_limit)
     ...

In this case, return zero, but we also need to stop reading from the socket. Maybe one could have the caller, io.c:do_buffered_read, check whether the return value is zero, and call lsh_oop_cancel_read_fd?
2. Somehow use the wakeup mechanism (invoked from connection.c:do_connection_flow_controlled) to restart reading from the gateway socket.
Do you think you could write a test case? At the receiving end, it might help to have a slow receiver of data, say a data sink like
while true; do dd bs=1000 count=1 of=/dev/null; sleep 1; done
Regards, /Niels
Thanks, Ludo’.
_______________________________________________
lsh-bugs mailing list
lsh-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/lsh-bugs
nisse@lysator.liu.se (Niels Möller) writes:
ludo@gnu.org (Ludovic Courtès) writes:
lshg/lsh (as of lsh 2.1 on GNU/Linux, x86_64) systematically fails for me when passed large data streams on stdin:
--8<---------------cut here---------------start------------->8---
$ lsh -G -B fencepost.gnu.org
Passphrase for key `xxx@yyy':
$ lshg fencepost.gnu.org uname -o
GNU/Linux
$ cat /dev/zero | lshg fencepost.gnu.org md5sum
lsh: Protocol error: Write buffer full, peer not responding.
lsh: write_buffer: Attempt to write data to closed buffer.
[...]
Do you think you could write a test case?
Does the one above exhibit the problem for you? It’s 100% reproducible here.
Thanks, Ludo’.
nisse@lysator.liu.se (Niels Möller) writes:
ludo@gnu.org (Ludovic Courtès) writes:
lshg/lsh (as of lsh 2.1 on GNU/Linux, x86_64) systematically fails for me when passed large data streams on stdin:
--8<---------------cut here---------------start------------->8---
$ lsh -G -B fencepost.gnu.org
Passphrase for key `xxx@yyy':
$ lshg fencepost.gnu.org uname -o
GNU/Linux
$ cat /dev/zero | lshg fencepost.gnu.org md5sum
lsh: Protocol error: Write buffer full, peer not responding.
lsh: write_buffer: Attempt to write data to closed buffer.
...
Conversely, this works well:
--8<---------------cut here---------------start------------->8---
$ cat /dev/zero | lsh fencepost.gnu.org md5sum
Any idea what’s wrong or how to debug it?
[...]
Now, I think the problem is that the code reading from a gateway client socket doesn't check if hard_limit > 0. Not entirely trivial to fix. What needs to be done is to
Make gateway_commands.c:do_read_gateway check that flag,
   if (self->connection->chain->hard_limit)
     ...

In this case, return zero, but we also need to stop reading from the socket. Maybe one could have the caller, io.c:do_buffered_read, check whether the return value is zero, and call lsh_oop_cancel_read_fd?
Somehow use the wakeup mechanism (invoked from connection.c:do_connection_flow_controlled) to restart reading from the gateway socket.
Any update on that? :-)
Thanks, Ludo’.
ludo@gnu.org (Ludovic Courtès) writes:
nisse@lysator.liu.se (Niels Möller) writes:
Now, I think the problem is that the code reading from a gateway client socket doesn't check if hard_limit > 0. Not entirely trivial to fix. What needs to be done is to
Make gateway_commands.c:do_read_gateway check that flag,
   if (self->connection->chain->hard_limit)
     ...

In this case, return zero, but we also need to stop reading from the socket. Maybe one could have the caller, io.c:do_buffered_read, check whether the return value is zero, and call lsh_oop_cancel_read_fd?
Somehow use the wakeup mechanism (invoked from connection.c:do_connection_flow_controlled) to restart reading from the gateway socket.
Any update on that? :-)
Sorry, I doubt I'll be able to fix this problem any time soon.
I had another look now, and I think the code reading from the gateway socket needs to be converted from the io_buffered_read abstraction to io_consuming_read. With io_buffered_read, the handler is expected either to consume all available data or to replace itself with a new handler able to immediately process the remaining data. This is used for the main ssh connections, which start with one or more text lines before the first binary ssh packet, but it makes it hard to solve the problem at hand.
With io_consuming_read, on the other hand, the read callback io.c:do_consuming_read first does a READ_QUERY asking the handler how much data it wants. This method could return 0, and then the select loop stops reading from the socket. That should solve half of the problem (and the gateway code doesn't need the handler-replacement mechanism).
Then the remaining problem is waking the gateway connections up again. This should be done via the connection->wakeup callback on the main ssh connection, which currently invokes channels.c:do_channels_wakeup. That could almost be used as is, as long as each gateway client has at least one channel open. But to really get it right, the main ssh connection would need to keep a list of all gateway sockets, and the wakeup callback ought to call lsh_register_read_fd on each of them.
(I'm not as fond of all of these objects and abstractions now as when I wrote this code... The development version is organized a bit differently).
Regards, /Niels