Eric Richter <erichte@linux.ibm.com> writes:
> This patch introduces an optimized powerpc64 assembly implementation for sha512-compress, derived from the implementation for sha256-compress-n.
Thanks, I'm about to merge this. One question: When you store non-volatile registers on the stack,
    li T0, -8
    li T1, -24
    stvx v20, T0, SP
    stvx v21, T1, SP
Why the offsets -8 and -24? My understanding is that on entry, SP is 16-byte aligned ("quad word"), and that the 8 bytes starting at the SP value typically hold the caller's back chain pointer, so we shouldn't be clobbering those bytes? (I'm looking at the stack frame figure on page 34 in the v2.1.5 ABI spec downloaded from https://openpowerfoundation.org/specifications/64bitelfabi/).
I would have expected offsets -16 and -32. The file powerpc64/p8/sha256-compress-n.asm, which I merged some months ago, also uses -8 and -24, whereas powerpc64/p8/ghash-update.asm, for example, uses offsets -16 and -32.
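
For reference, a small sketch of the address arithmetic involved. It assumes the Power ISA description of stvx, where the low four bits of the effective address are ignored, and an SP that is 16-byte aligned on entry. The helper name and the SP value are made up for illustration, and this does not settle whether the unaligned offsets in the patch were intended.

```python
MASK64 = (1 << 64) - 1


def stvx_store_address(base, index):
    """Address written by stvx: (base + index) with the low 4 bits cleared.

    Hypothetical helper modelling only the address computation.
    """
    ea = (base + index) & MASK64
    return ea & ~0xF & MASK64


# Hypothetical 16-byte aligned stack pointer value on function entry.
sp = 0x7FFF_FFFF_E000

for offset in (-8, -24, -16, -32):
    addr = stvx_store_address(sp, offset)
    # Print the requested offset and the offset actually stored to.
    print(offset, addr - sp)
```

Under these assumptions, offsets -8 and -24 end up storing to SP-16 and SP-32, the same quadwords as the offsets -16 and -32 would, so the bytes at SP itself are not touched.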
I also think the GPR usage could be trimmed a bit. T0 and T1 are used only in the function prologue and epilogue, and could overlap with something else that uses volatile registers. And the two registers TC32 and TC48 could be replaced by a single register STATE32 = STATE + 32. But that can be tweaked after merge.
Regards,
/Niels