memxor - nettle-bugs

13 Sep 2011

On Tue, Sep 13, 2011 at 1:17 PM, Niels Möller <nisse@lysator.liu.se> wrote:
...
> Then you shouldn't need to bother about the lsh directory again. You
> have a symlink to the shared aclocal.m4 (and to some other shared
> files).
OK, I figured it out. I've attached a working SSE2 detection.
...
> Thanks. I'll have to think some more on how to organize this. Some
> properties I'd like to have:
>
> * Don't require users to call any init function.
>
>   One could define memxor to jump via a function pointer, and have an
>   initial value for that pointer which jumps to the routine to set the
>   pointer to the right function, and then use it. Overwriting the
>   pointer should be atomic, so no locking is needed even for
>   multithreaded programs.
I don't think locking is an issue if you only call a function on
initialization. You can expect (and require) that a library isn't
going to be initialized by multiple threads. However, I don't know of a
portable way to do initialization transparently without an explicit
function call.
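The lazy function-pointer scheme quoted above can be written in standard C with no init call and no compiler extensions. A minimal sketch follows, assuming a `memxor(dst, src, n)` signature, a hypothetical `cpu_has_sse2()` probe (stubbed to return 0 so it runs anywhere), and a stand-in `memxor_sse2` that just reuses the generic loop. It uses C11 `<stdatomic.h>` so that replacing the pointer is formally atomic; with relaxed ordering, a thread that still sees the resolver simply runs it again, which is harmless because the resolver always stores the same choice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Signature assumed for this sketch: XOR n bytes of src into dst. */
typedef void *(*memxor_fn)(void *dst, const void *src, size_t n);

/* Portable byte-at-a-time fallback. */
static void *memxor_generic(void *dst, const void *src, size_t n)
{
  uint8_t *d = dst;
  const uint8_t *s = src;
  for (size_t i = 0; i < n; i++)
    d[i] ^= s[i];
  return dst;
}

/* Stand-in for an SSE2 routine; a real one would live in an assembly file. */
static void *memxor_sse2(void *dst, const void *src, size_t n)
{
  return memxor_generic(dst, src, n);
}

/* Hypothetical feature probe, stubbed out for this sketch. */
static int cpu_has_sse2(void)
{
  return 0;
}

static void *memxor_resolve(void *dst, const void *src, size_t n);

/* Starts out pointing at the resolver, so no init call is needed. */
static _Atomic(memxor_fn) memxor_ptr = memxor_resolve;

/* First call: pick an implementation, store it, then do the work. */
static void *memxor_resolve(void *dst, const void *src, size_t n)
{
  memxor_fn f = cpu_has_sse2() ? memxor_sse2 : memxor_generic;
  atomic_store_explicit(&memxor_ptr, f, memory_order_relaxed);
  return f(dst, src, n);
}

/* Exported entry point: an ordinary function that jumps via the pointer. */
void *memxor(void *dst, const void *src, size_t n)
{
  assert(n == 0 || (dst != NULL && src != NULL));
  memxor_fn f = atomic_load_explicit(&memxor_ptr, memory_order_relaxed);
  return f(dst, src, n);
}
```

In this arrangement the exported `memxor` remains an ordinary function and the pointer is private to the file; only the first call pays for the feature check.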
...

> * Avoid using gcc-specific things, including inline asm, in the C
>   source files.
The cpuid test would then have to be moved to an assembly file.
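With the cpuid instruction itself in an assembly file, the C side only needs to decode the raw register values. A sketch of that decoding, assuming the values come from a hypothetical assembly helper whose name and signature are made up here (it appears only in a comment): for CPUID leaf 1, SSE2 is bit 26 of EDX and the AES instructions (AES-NI) are bit 25 of ECX.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 1 (EAX = 1) feature bits, as documented by Intel. */
#define CPUID1_EDX_SSE2  (UINT32_C(1) << 26)
#define CPUID1_ECX_AESNI (UINT32_C(1) << 25)

struct cpu_features {
  int sse2;   /* SSE2 instructions available */
  int aesni;  /* AES round instructions available */
};

/* The raw registers would come from a small assembly file, e.g. a
   hypothetical  void _cpuid_leaf(uint32_t leaf, uint32_t regs[4]);
   so this C file needs no inline asm. */
static struct cpu_features decode_cpuid_leaf1(uint32_t ecx, uint32_t edx)
{
  struct cpu_features f;
  f.sse2 = (edx & CPUID1_EDX_SSE2) != 0;
  f.aesni = (ecx & CPUID1_ECX_AESNI) != 0;
  return f;
}
```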
...
> Other obvious uses for cpu detection in nettle:
>
> * The AES code could check for the special aes instructions.
Indeed. Once a framework for overriding functionality is in place, those
would not be very hard to add. However, setting up such a framework in
nettle seems to require substantial work, as all exported functions would
need to be replaced by function pointers, thus breaking the ABI. Doing it
gradually (and it would have to be gradual, since you never know what you
will be able to optimize on a new processor) would be even worse, since
every added optimization would break the ABI.
Maybe it makes sense to have a libgcrypt-like high-level interface, with
the optimizations used only there. The existing C API would remain an API
for accessing the C implementation. This could also address the problem
with optimized hash algorithms[0], since most CPU-assisted SHA-1 or
SHA-256 implementations work on an output=hash(data,length) basis and do
not map to the existing API.
[0]. http://www.mail-archive.com/openssl-dev@openssl.org/msg21787.html
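One way to picture such a layer: a one-shot `output = hash(data, length)` entry point built on top of an incremental init/update/digest interface, so that a CPU-assisted routine that only processes whole messages could be installed behind it. In this sketch all `toy_*` names are hypothetical, and the algorithm is 32-bit FNV-1a, chosen only because it is short; it is not a cryptographic hash and merely stands in for SHA-1 or SHA-256.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Incremental interface (hypothetical names). */
struct toy_ctx {
  uint32_t h;
};

static void toy_init(struct toy_ctx *ctx)
{
  ctx->h = UINT32_C(2166136261);      /* FNV-1a offset basis */
}

static void toy_update(struct toy_ctx *ctx, size_t length, const uint8_t *data)
{
  for (size_t i = 0; i < length; i++) {
    ctx->h ^= data[i];
    ctx->h *= UINT32_C(16777619);     /* FNV prime, wraps mod 2^32 */
  }
}

static void toy_digest(const struct toy_ctx *ctx, uint8_t digest[4])
{
  /* Big-endian serialization of the 32-bit state. */
  digest[0] = (uint8_t)(ctx->h >> 24);
  digest[1] = (uint8_t)(ctx->h >> 16);
  digest[2] = (uint8_t)(ctx->h >> 8);
  digest[3] = (uint8_t)ctx->h;
}

/* One-shot interface: output = hash(data, length). */
typedef void (*toy_oneshot_fn)(const uint8_t *data, size_t length,
                               uint8_t digest[4]);

/* Generic one-shot path built from the incremental calls. */
static void toy_hash_generic(const uint8_t *data, size_t length,
                             uint8_t digest[4])
{
  struct toy_ctx ctx;
  toy_init(&ctx);
  toy_update(&ctx, length, data);
  toy_digest(&ctx, digest);
}

/* A whole-message, CPU-specific routine could be installed here instead. */
static toy_oneshot_fn toy_hash_impl = toy_hash_generic;

void toy_hash(const uint8_t *data, size_t length, uint8_t digest[4])
{
  assert(length == 0 || data != NULL);
  toy_hash_impl(data, length, digest);
}
```

The incremental functions are untouched; only `toy_hash_impl` would change if an optimized whole-message implementation were available.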
...
> * The serpent code can use %xmm and %ymm registers, when present. On
>   x86_64, as far as I'm aware all current implementations have sse2,
>   but one could check for, and make use of, the 256-bit %ymm
>   registers.
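Using %ymm registers takes more than a CPU feature bit: the operating system must also save the upper register halves on context switches. A sketch of the usual check, assuming ECX from CPUID leaf 1 and the XCR0 value from XGETBV are supplied by hypothetical assembly helpers: AVX (ECX bit 28) and OSXSAVE (ECX bit 27) must both be set, and XCR0 bits 1 and 2 must both be enabled. Note that 256-bit integer operations, such as the shifts behind serpent's rotations, arrive only with AVX2 (CPUID leaf 7, EBX bit 5), which this sketch does not check.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 1 ECX bits and XCR0 state-component bits (Intel SDM). */
#define CPUID1_ECX_OSXSAVE (UINT32_C(1) << 27)
#define CPUID1_ECX_AVX     (UINT32_C(1) << 28)
#define XCR0_SSE_STATE     (UINT64_C(1) << 1)  /* %xmm state saved by the OS */
#define XCR0_AVX_STATE     (UINT64_C(1) << 2)  /* upper %ymm halves saved by the OS */

/* ecx: CPUID leaf 1 ECX. xcr0: result of XGETBV with ECX = 0, which may
   only be executed when OSXSAVE is set (pass 0 otherwise). Both values
   are assumed to come from hypothetical assembly helpers. */
static int avx_usable(uint32_t ecx, uint64_t xcr0)
{
  const uint64_t need = XCR0_SSE_STATE | XCR0_AVX_STATE;
  if (!(ecx & CPUID1_ECX_AVX) || !(ecx & CPUID1_ECX_OSXSAVE))
    return 0;
  return (xcr0 & need) == need;
}
```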
I wouldn't care much about serpent optimizations :)
regards,
Nikos