Hi,
After building Nettle natively for Apple Silicon devices, I noticed a drastic performance difference between the native build and an x86_64 build emulated via Apple's Rosetta. The latter, despite emulation, was over 10 times faster for some algorithms, e.g. AES-128-GCM.
I found out that get_arm64_features did not detect any of the CPU capabilities on Apple devices.
The attached patch fixes the issue for me. With it in place, the CPU features are correctly detected.
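For readers without the attachment, here is a rough sketch of how CPU features can be queried on macOS with sysctlbyname. This is not the patch itself, only an illustration of the general approach. The helper names has_cpu_feature and report_crypto_features are made up for this example, and the hw.optional.arm.FEAT_* key names are an assumption based on Apple's documentation for recent macOS versions; older releases may expose different keys. On non-Apple platforms the helper simply reports that the feature is absent.

```c
#include <stddef.h>
#include <stdio.h>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: returns 1 if the named sysctl key exists and
   holds a nonzero value, 0 otherwise (including on non-Apple systems). */
int
has_cpu_feature(const char *name)
{
#if defined(__APPLE__)
  int value = 0;
  size_t size = sizeof(value);

  /* sysctlbyname returns nonzero on failure, e.g. for an unknown key. */
  if (sysctlbyname(name, &value, &size, NULL, 0) != 0)
    return 0;
  return value != 0;
#else
  (void) name;
  return 0;
#endif
}

/* Hypothetical helper: prints the features relevant to AES-GCM and SHA.
   The key names are assumed, not taken from the patch. */
void
report_crypto_features(void)
{
  static const char *const keys[] = {
    "hw.optional.arm.FEAT_AES",
    "hw.optional.arm.FEAT_PMULL",
    "hw.optional.arm.FEAT_SHA1",
    "hw.optional.arm.FEAT_SHA256",
  };
  size_t i;

  for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
    printf("%s: %d\n", keys[i], has_cpu_feature(keys[i]));
}
```

Calling report_crypto_features() from a small main on an Apple Silicon machine should show which of these keys the running system reports. A key that does not exist is treated the same as a missing feature, so the code falls back to the generic implementation rather than failing.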
AES-128-GCM benchmark results:
  Native (pre-patch): 200 MB/s
  Emulated:           3.2 GB/s
  Native (patched):   5.2 GB/s
Regards,
Tim Kosse