on Red Hat 8, I can reliably crash lshd, shutting down all active sessions and leaving a pid file around so lshd refuses to restart, by running
dd if=/dev/random bs=1024 count=1|nc localhost >/dev/null
a few times. Sadly, capturing the random sequence that blows down an lshd and replaying it against another one doesn't seem to kill it; different running instances require different replays of the same random to kill them.
Oh well, back to bloody openssh with all its bugs and patches. Drat!
-Bennett
FWIW, I can reproduce it:
lshd: write_buffer: do_write length = 256 lshd: write_buffer: do_write closure->length = 293 lshd: garbage collecting... lshd: gc_mark: Memory corrupted! Aborted
I think it has nothing to do with the actual bits sent, but rather that some earlier random data caused the code to take a rarely tested execution path, which has garbage collect bugs in it, which is discovered a while later.
In general, as an alternative, perhaps a fork() solution will be less vulnerable to these problems.
Btw, is there a way to make the GC more aggressive? E.g., run it after every NEW or KILL? I could use this to track down a GC bug in my patched lsh version.
Bennett Todd bet@rahul.net writes:
on Red Hat 8, I can reliably crash lshd, shutting down all active sessions and leaving a pid file around so lshd refuses to restart, by running
dd if=/dev/random bs=1024 count=1|nc localhost >/dev/null
a few times. Sadly, capturing the random sequence that blows down an lshd and replaying it against another one doesn't seem to kill it; different running instances require different replays of the same random to kill them.
Oh well, back to bloody openssh with all its bugs and patches. Drat!
-Bennett
Summary:
PLEASE DISABLE LSHD SERVICE. Apply below patch.
Simon Josefsson jas@extundo.com writes:
FWIW, I can reproduce it:
lshd: write_buffer: do_write length = 256 lshd: write_buffer: do_write closure->length = 293 lshd: garbage collecting... lshd: gc_mark: Memory corrupted! Aborted
I think it has nothing to do with the actual bits sent, but rather that some earlier random data caused the code to take a rarely tested execution path, which has garbage collect bugs in it, which is discovered a while later.
I'm afraid it's worse than that. It seems to be a genuine buffer overrun, on the heap. It's the buffer in read_line.c,
/* GABA: (class (name read_line) (super read_handler) (vars (handler object line_handler) (e object exception_handler)
; Line buffer (pos . uint32_t) (buffer array uint8_t MAX_LINE))) */
The below patch should fix the bug. It's a case of checking for an error, reporting it, and then forgetting to return from the function. Instead the code just went on overwriting the buffer. Pretty embarrassing.
diff -u -a -r1.31 read_line.c --- src/read_line.c 16 Feb 2003 21:30:11 -0000 1.31 +++ src/read_line.c 18 Sep 2003 20:02:48 -0000 @@ -100,6 +100,7 @@ /* Too long line */ EXCEPTION_RAISE(self->e, make_protocol_exception(0, "Line too long.")); + return available; }
/* Ok, now we have a line. Copy it into the buffer. */
The buggy code was checked in a little more than four years ago, 1999-08-22, at about this time of day.
I'm *not* going to bet that it isn't exploitable. I'll try to get new releases out within a few days, until then, I recommend that you apply the above patch to lshd and recompile, or disable lshd service.
Thanks to Bennett Todd for reporting the problem.
Sorry about the trouble. Regards, /Niels
Thanks for the very timely response!
Would it be worth composing an announcement of this for venues like bugtraq and full-disclosure? Or do you think most lsh users are on lsh-bugs?
On a separate topic, I've been chatting with someone else, and he pointed out that a somewhat different abstraction could be more robust; rather than having an open, visible buffer data structure into which code can directly write, if code is obliged to go through accessor routines you can have just one place where all the bounds-checking is implemented, rather than having to get it right every place that writes into a buffer. Seems to me like it'd be a performance hit worth taking for the benefit.
-Bennett
Bennett Todd bet@rahul.net writes:
Would it be worth composing an announcement of this for venues like bugtraq and full-disclosure? Or do you think most lsh users are on lsh-bugs?
I *hope* all people that use lshd in production environments are subscribed to this mailinglist. But it would certainly make sense to post an announcement on security vulnerability fora.
But I think my top priority for lsh work the next few days will be to get new 1.4.3 and 1.5.3 releases out; writing an announcement is secondary.
On a separate topic, I've been chatting with someone else, and he pointed out that a somewhat different abstraction could be more robust; rather than having an open, visible buffer data structure into which code can directly write,
The abstractions in this piece of code with it's fixed size buffer is not typical for lsh. Most "buffer management" is in the single function ssh_format, which takes a format string and arguments, calculates and allocates the needed space, formats the data into the newly allocated space, and then at the end asserts() that actual length matches exactly the length which was computed earlier. Callers usually need not care about bounds checking.
If there are any principles behind this design it's
(1) Try to keep the allocation and bounds checking in one place. Or at least in *few* places, there are a few more but ssh_format.
(2) Keep things simple. I don't believe in complex frameworks to protect modules from each other (at least I don't believe in that if the modules are still running in their own address space). Frameworks are useful only when they make the code simpler.
Regards, /Niels
2003-09-18T16:35:38 Bennett Todd:
On a separate topic, I've been chatting with someone else, and he pointed out that a somewhat different abstraction could be more robust; rather than having an open, visible buffer data structure into which code can directly write, if code is obliged to go through accessor routines you can have just one place where all the bounds-checking is implemented, rather than having to get it right every place that writes into a buffer.
The someone, by the way, was Timo Sirainen tss@iki.fi.
-Bennett
Simon Josefsson jas@extundo.com writes:
I think it has nothing to do with the actual bits sent, but rather that some earlier random data caused the code to take a rarely tested execution path, which has garbage collect bugs in it, which is discovered a while later.
Just for the record, I hope nobody interpreted that as anything but speculation as to the cause, not as any kind of assertion that the problem wasn't more serious. (My speculation was based on experience I had when getting GC crashes in my patched version of lsh.)
Sorry if I caused anyone else to be confused, Simon
Simon Josefsson jas@extundo.com writes:
(My speculation was based on experience I had when getting GC crashes in my patched version of lsh.)
I think the gc itself is pretty stable. It's just that it examines most or all objects in the program, and it is quite picky. So if there are resource leakage or memory corruption bugs anywhere else in the code, it is quite likely to trigger some assert() in the gc next time it is run.
/Niels