When compiled for armv6+ and getauxval() is present (glibc 2.16+),
avoid slow and unreliable /proc/cpuinfo parsing.
For example, /proc/cpuinfo contains junk under qemu-user and can be
unavailable in some chroot environments.
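For readers unfamiliar with the mechanism, here is a minimal sketch of
feature detection through the auxiliary vector. It is not the patch
itself: the function names and the SKETCH_HWCAP_NEON macro are
hypothetical, and the bit position assumes the 32-bit ARM Linux
definition of HWCAP_NEON from <asm/hwcap.h>.

```c
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (glibc 2.16+) */

/* Assumption: bit 12 of AT_HWCAP is HWCAP_NEON on 32-bit ARM Linux.
   Defined locally under a hypothetical name so the sketch compiles on
   other architectures too, where this bit means something else. */
#define SKETCH_HWCAP_NEON (1UL << 12)

/* Pure helper: test the NEON bit in an already fetched hwcap word. */
int
hwcap_has_neon (unsigned long hwcap)
{
  return (hwcap & SKETCH_HWCAP_NEON) != 0;
}

/* Ask the kernel-supplied auxiliary vector instead of parsing
   /proc/cpuinfo.  The result is only meaningful on 32-bit ARM. */
int
cpu_has_neon (void)
{
  return hwcap_has_neon (getauxval (AT_HWCAP));
}
```

Because the auxiliary vector is handed to the process by the kernel at
exec time, this needs no file access and so keeps working in a chroot
without /proc.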
Hello,
The attached patch implements the XTS block cipher mode, as specified
in IEEE P1619. The interface is split into a generic pair of functions
for encryption and decryption and additional AES-128/AES-256 variants.
The function signatures follow the same pattern used by other
block-cipher modes such as ctr, cfb and ccm.
Basic tests using a small selection of NIST CAVS vectors are provided.
XTS is used in several disk encryption algorithms and programs because
it allows a block cipher to be used even when the input length is not
an exact multiple of the cipher's block size, by using ciphertext
stealing.
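As background for reviewers: XTS gives every 16-byte block its own
tweak, and moves from one block's tweak to the next by multiplying it
by alpha (the polynomial x) in GF(2^128). A minimal sketch of that
single step follows; the function name is hypothetical and is not
taken from the attached patch.

```c
#include <stdint.h>

#define XTS_BLOCK_SIZE 16

/* Multiply a 128-bit tweak by alpha (x) in GF(2^128), in place.
   Byte 0 is the least significant byte; a carry out of the top bit
   is folded back in with the reduction constant 0x87. */
void
xts_tweak_double (uint8_t t[XTS_BLOCK_SIZE])
{
  unsigned carry = t[XTS_BLOCK_SIZE - 1] >> 7;
  int i;

  /* Shift the whole 128-bit value left by one bit, top byte first. */
  for (i = XTS_BLOCK_SIZE - 1; i > 0; i--)
    t[i] = (uint8_t) ((t[i] << 1) | (t[i - 1] >> 7));

  /* Branch-free reduction: XOR in 0x87 only if a bit was shifted out. */
  t[0] = (uint8_t) ((t[0] << 1) ^ (0x87 & (0u - carry)));
}
```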
Thanks to Daiki Ueno for initial review.
Feedback is appreciated.
Simo.
--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc
On raspberry pi 3b+ (cortex-a53 @ 1.4GHz):
Before:
aes128 | nanosecs/byte mebibytes/sec cycles/byte
ECB enc | 39.58 ns/B 24.10 MiB/s - c/B
ECB dec | 39.57 ns/B 24.10 MiB/s - c/B
After:
ECB enc | 15.24 ns/B 62.57 MiB/s - c/B
ECB dec | 15.68 ns/B 60.80 MiB/s - c/B
Passes the nettle regression tests (little-endian only, though).
It does not use pre-rotated tables (as in AES_SMALL), which reduces
the d-cache footprint from 4.25K to 1K (enc)/1.25K (dec);
it is completely unrolled, which increases the i-cache footprint
from 948 bytes to 4416 bytes (enc)/4032 bytes (dec).
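To illustrate the table trade-off: four pre-rotated 256-entry tables of
32-bit words take 4K, while a single table takes 1K if the other three
variants are produced by rotating the looked-up word. The sketch below
shows only that idea. All names are hypothetical, and the table
contents are a stand-in, NOT the real AES round table.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t
rotl32 (uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

/* One 1 KiB table instead of four.  Stand-in contents only. */
uint32_t sketch_table[256];

void
sketch_table_init (void)
{
  unsigned i;
  for (i = 0; i < 256; i++)
    sketch_table[i] = 0x9e3779b9u * (i + 1);
}

/* Equivalent of indexing pre-rotated table k (k = 0..3) at x:
   look up the base table and rotate the result by 8*k bits. */
uint32_t
sketch_lookup (unsigned k, uint8_t x)
{
  uint32_t w = sketch_table[x];
  return k ? rotl32 (w, 8 * k) : w;
}

/* Combine four byte lookups, in the shape of one output column of a
   table-driven round. */
uint32_t
sketch_column (uint32_t w0, uint32_t w1, uint32_t w2, uint32_t w3)
{
  return sketch_lookup (0, w0 & 0xff)
    ^ sketch_lookup (1, (w1 >> 8) & 0xff)
    ^ sketch_lookup (2, (w2 >> 16) & 0xff)
    ^ sketch_lookup (3, (w3 >> 24) & 0xff);
}
```

On ARM the rotation can usually be folded into the second operand of
the eor instruction, so the smaller table should cost little or
nothing in extra instructions.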
As it completely replaces the current implementation, I have just
attached the new files (I will post the final version as a patch).
P.S. Yes, I tried to convert the macros to m4: complete failure (no named
parameters, problems with more than 9 arguments, weird expansion rules),
so I fell back to good ol' gas. Sorry.
P.P.S. With this change, gcm/neon and (to-be-published) chacha_blocks/neon,
gnutls-cli --benchmark-ciphers:
Before:
Checking cipher-MAC combinations, payload size: 16384
AES-128-GCM 13.56 MB/sec
CHACHA20-POLY1305 68.26 MB/sec
AES-128-CBC-SHA1 16.72 MB/sec
AES-128-CBC-SHA256 15.07 MB/sec
After:
AES-128-GCM 35.32 MB/sec
CHACHA20-POLY1305 94.94 MB/sec
AES-128-CBC-SHA1 27.53 MB/sec
AES-128-CBC-SHA256 23.30 MB/sec
Taken from https://github.com/floodyberry/chacha-opt (released by the
author as public-domain-or-MIT, so I guess it is OK to borrow).
On x86/sse2 and x86_64: 80 to 100% faster.
Passes the regression tests on linux/debian/stretch x86 and x86_64;
benchmarks were run with a patched nettle-3.4.1 (due to the ABI break
in 3.5).
*Not* tested on win{32,64} (important: win64 ABI difference).
chacha-opt also contains x86{,_64}-{ssse3,avx{,2},xop} optimized
code, but I don't have the hardware to test it (and there are
differences in structure/argument layout that need to be corrected
and tested).
WIP, will add armv6 and arm/neon a bit later.
P.S.
Then I will probably take a look at poly1305 and likely try to borrow
license-compatible arm asm from somewhere (the current nettle code is
painfully slow). gcrypt is somewhat faster than nettle and is LGPLv2.1+;
cryptogams definitely has the fastest crypto, but it is
BSD-3-clause-or-GPLv2+. That is, AFAIK, compatible with LGPL, but I am
not sure whether it is acceptable for nettle inclusion.
P.P.S. The previously posted arm neon gcm patch breaks x86_64
compilation; I will post a trivial fix later.
Currently ghash/gcm performance on arm in both gcrypt and nettle is a bit abysmal:
=== bench-slopes-nettle ===
GCM auth | 28.43 ns/B 33.54 MiB/s 39.81 c/B 1400.2
=== bench-slopes-gcrypt ===
GCM auth | 21.86 ns/B 43.62 MiB/s 30.52 c/B 1396.0
=== bench-slopes-openssl [1.1.1a] ===
GCM auth | 5.99 ns/B 159.3 MiB/s 8.38 c/B 1399.6
=== cut ===
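For anyone cross-checking the figures above, the columns are related by
simple arithmetic. The helper below is a sketch with hypothetical
names; it assumes the last column of each "GCM auth" line is the
measured clock in MHz, which the numbers are consistent with but the
output does not state.

```c
/* ns/byte -> MiB/s: (1e9 ns per second) / (ns per byte) / (2^20 bytes per MiB) */
double
ns_per_byte_to_mib_per_sec (double ns_per_byte)
{
  return 1e9 / ns_per_byte / 1048576.0;
}

/* ns/byte -> cycles/byte at a clock given in MHz:
   (ns per byte) * (cycles per ns), where cycles per ns = MHz / 1000. */
double
ns_per_byte_to_cycles_per_byte (double ns_per_byte, double mhz)
{
  return ns_per_byte * mhz / 1000.0;
}
```

With the nettle line as input (28.43 ns/B at 1400.2 MHz) this gives
roughly 33.5 MiB/s and 39.8 c/B, matching the reported values up to
rounding.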
The current openssl/cryptogams code is based on ideas from
https://hal.inria.fr/hal-01506572 (licensed CC BY 4.0),
and there is a linked implementation,
https://conradoplg.cryptoland.net/software/ecc-and-ae-for-arm-neon/
(licensed LGPL 2.1+), which I guess should be acceptable to borrow.
A very preliminary patch for nettle will be posted as a reply (it passes
the nettle regression tests, but needs more extensive testing):
=== bench-slopes-nettle [w/ patched nettle 3.3] ===
aes128 | nanosecs/byte mebibytes/sec cycles/byte
GCM auth | 7.07 ns/B 134.9 MiB/s 9.90 c/B
=== cut ===
(Not only is it notably faster, it should also be completely free of
cache/timing leaks.)
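To make the leak-free claim concrete: a GHASH built on polynomial
(carry-less) multiplication needs neither key-dependent table lookups
nor data-dependent branches. The sketch below is a portable C model of
one 8x8-bit polynomial multiply, the operation NEON provides per lane
with vmull.p8. The function name is hypothetical, and that the patch is
structured this way is an assumption based on the linked paper, not
something shown here.

```c
#include <stdint.h>

/* Carry-less multiply of two 8-bit polynomials over GF(2), giving a
   15-bit product.  Every iteration does the same work regardless of
   the operand values: no table lookups, no data-dependent branches. */
uint16_t
clmul8 (uint8_t a, uint8_t b)
{
  uint16_t r = 0;
  unsigned i;

  for (i = 0; i < 8; i++)
    {
      /* All-ones mask if bit i of b is set, else zero. */
      uint16_t mask = (uint16_t) (0u - ((b >> i) & 1u));
      r ^= (uint16_t) ((uint16_t) (a << i) & mask);
    }
  return r;
}
```

A full 128-bit GHASH multiply is then assembled from many such partial
products followed by a reduction, which is where the real
implementation effort lies.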