On Tue, Sep 13, 2011 at 1:17 PM, Niels Möller <nisse@lysator.liu.se> wrote:
Then you shouldn't need to bother about the lsh directory again. You have a symlink to the shared aclocal.m4 (and to some other shared files).
OK, I figured it out. Attached is a working SSE2 detection.
Thanks. I'll have to think some more on how to organize this. Some properties I'd like to have:
- Don't require users to call any init function.
One could define memxor to jump via a function pointer whose initial value points to a routine that sets the pointer to the right function and then calls it. Overwriting the pointer should be atomic, so no locking is needed, even for multithreaded programs.
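The self-patching function pointer described above might be sketched in C roughly as follows. All names (`example_memxor`, `memxor_resolve`, `memxor_sse2`, `cpu_has_sse2`) are hypothetical and not nettle's API; the CPU check is a stub that always reports no SSE2, and the "SSE2" variant is only a placeholder that calls the generic loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memxor-like type: XOR n bytes of src into dst, return dst. */
typedef void *memxor_func(void *dst, const void *src, size_t n);

/* Portable C fallback. */
static void *
memxor_generic(void *dst, const void *src, size_t n)
{
  uint8_t *d = dst;
  const uint8_t *s = src;
  size_t i;
  for (i = 0; i < n; i++)
    d[i] ^= s[i];
  return dst;
}

/* Placeholder for an optimized routine (a real one would live in an
   assembly file and use SSE2); here it just reuses the generic loop. */
static void *
memxor_sse2(void *dst, const void *src, size_t n)
{
  return memxor_generic(dst, src, n);
}

/* Stub CPU check: always reports "no SSE2" so the sketch runs anywhere.
   A real check would execute cpuid. */
static int
cpu_has_sse2(void)
{
  return 0;
}

static void *memxor_resolve(void *dst, const void *src, size_t n);

/* Dispatch pointer; initially points at the resolver. */
static memxor_func *memxor_vec = memxor_resolve;

/* Runs on the first call only: picks an implementation, overwrites the
   pointer, then performs the requested operation. */
static void *
memxor_resolve(void *dst, const void *src, size_t n)
{
  memxor_func *f = cpu_has_sse2() ? memxor_sse2 : memxor_generic;
  /* Plain pointer store, assumed atomic on common platforms as argued
     above; strict C11 would declare memxor_vec _Atomic to avoid a formal
     data race. */
  memxor_vec = f;
  return f(dst, src, n);
}

/* Public entry point: no init function needed by callers. */
void *
example_memxor(void *dst, const void *src, size_t n)
{
  return memxor_vec(dst, src, n);
}
```

Only the first call pays for the CPU check; later calls go straight through `memxor_vec`.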
I don't think locking is an issue if you only call a function at initialization: you can expect (and require) that a library is not initialized by multiple threads. However, I don't know of a portable way to do the initialization transparently, without an explicit function call.
- Avoid using gcc-specific things, including inline asm, in the C source files.
The cpuid test would then have to be moved to an assembly file.
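A rough C sketch of that split, assuming a hypothetical assembly routine executes cpuid with EAX=1 and hands back the four registers; the C side then only tests feature bits, with no inline asm. In CPUID leaf 1, SSE2 is reported in EDX bit 26 and the AES instructions (AES-NI) in ECX bit 25. The struct, macro, and function names are made up for illustration.

```c
#include <stdint.h>

/* Register values from cpuid leaf 1. In this sketch they would be filled
   in by a hypothetical assembly helper (not shown), so no inline asm is
   needed in C source files. */
struct example_cpuid_regs {
  uint32_t eax, ebx, ecx, edx;
};

#define EXAMPLE_EDX_SSE2  (UINT32_C(1) << 26) /* leaf 1, EDX bit 26 */
#define EXAMPLE_ECX_AESNI (UINT32_C(1) << 25) /* leaf 1, ECX bit 25 */

/* Nonzero if the CPU reports SSE2. */
static int
example_has_sse2(const struct example_cpuid_regs *r)
{
  return (r->edx & EXAMPLE_EDX_SSE2) != 0;
}

/* Nonzero if the CPU reports the AES instructions. */
static int
example_has_aesni(const struct example_cpuid_regs *r)
{
  return (r->ecx & EXAMPLE_ECX_AESNI) != 0;
}
```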
Other obvious uses for CPU detection in nettle:
* The AES code could check for the special AES instructions.
Indeed. Once a framework for overriding functionality is in place, those would not be hard to add. However, setting up such a framework in nettle seems to require substantial work, as all exported functions would need to be replaced by function pointers, thus breaking the ABI. If this is done gradually (and it has to be, since you never know what you will be able to optimize on a new processor), it would be even worse, since every added optimization would break the ABI.
Maybe it makes sense to have a libgcrypt-like high-level interface and use the optimizations only there, while the existing C API remains an API for accessing the C implementation. This could also address the problem with optimized hash algorithms [0], since most CPU-assisted SHA-1 or SHA-256 implementations work on an output=hash(data,length) basis and do not map to the existing API.
[0]. http://www.mail-archive.com/openssl-dev@openssl.org/msg21787.html
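One possible shape for such a high-level, one-shot interface is sketched below, purely as an illustration: callers look an algorithm up by name and call digest = hash(data, length), and the table entry could be swapped for a CPU-assisted routine at startup without touching the existing low-level API. All names are hypothetical, and the only table entry is a 32-bit FNV-1a checksum used as a stand-in; it is not a cryptographic hash.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One-shot interface: digest = hash(data, length). */
typedef void example_hash_func(uint8_t *digest, const uint8_t *data,
                               size_t length);

struct example_hash_algo {
  const char *name;
  size_t digest_size;
  example_hash_func *hash; /* could be replaced by an optimized routine */
};

/* Stand-in digest (32-bit FNV-1a), NOT cryptographic; a real table would
   point at SHA-1, SHA-256, and so on. Output is big-endian. */
static void
example_fnv1a32(uint8_t *digest, const uint8_t *data, size_t length)
{
  uint32_t h = UINT32_C(2166136261);
  size_t i;
  for (i = 0; i < length; i++) {
    h ^= data[i];
    h *= UINT32_C(16777619);
  }
  digest[0] = (uint8_t)(h >> 24);
  digest[1] = (uint8_t)(h >> 16);
  digest[2] = (uint8_t)(h >> 8);
  digest[3] = (uint8_t)h;
}

static struct example_hash_algo example_algos[] = {
  { "fnv1a32", 4, example_fnv1a32 },
};

/* Look up an algorithm by name; returns NULL if unknown. */
static const struct example_hash_algo *
example_hash_lookup(const char *name)
{
  size_t i;
  for (i = 0; i < sizeof example_algos / sizeof example_algos[0]; i++)
    if (strcmp(example_algos[i].name, name) == 0)
      return &example_algos[i];
  return NULL;
}
```

Because callers only ever see the table, the function pointers stay internal, which is one way to avoid the ABI churn described above.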
* The serpent code can use the %xmm and %ymm registers, when present. On x86_64, as far as I'm aware, all current implementations have SSE2, but one could check for, and make use of, the 256-bit %ymm registers.
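Checking for usable %ymm registers needs more than the AVX feature bit: the operating system must also save the upper register halves on context switches. A sketch of the decoding, assuming the raw values (ECX from CPUID leaf 1, and XCR0 as read with xgetbv) come from a hypothetical assembly helper; the function and macro names are made up.

```c
#include <stdint.h>

#define EXAMPLE_ECX_OSXSAVE  (UINT32_C(1) << 27) /* OS enabled XSAVE/xgetbv */
#define EXAMPLE_ECX_AVX      (UINT32_C(1) << 28) /* CPU supports AVX */
#define EXAMPLE_XCR0_XMM_YMM UINT64_C(0x6)       /* XCR0 bits 1 and 2 */

/* leaf1_ecx: ECX from cpuid leaf 1.
   xcr0: XCR0 value; only meaningful (and only safe to read with xgetbv)
   when OSXSAVE is set, so it is ignored otherwise. */
static int
example_ymm_usable(uint32_t leaf1_ecx, uint64_t xcr0)
{
  const uint32_t need = EXAMPLE_ECX_OSXSAVE | EXAMPLE_ECX_AVX;
  if ((leaf1_ecx & need) != need)
    return 0;
  /* Both xmm state and upper ymm state must be enabled by the OS. */
  return (xcr0 & EXAMPLE_XCR0_XMM_YMM) == EXAMPLE_XCR0_XMM_YMM;
}
```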
I wouldn't care much about serpent optimizations :)
regards, Nikos