Re: [PATCH 2/2] powerpc64: Add optimized assembly for sha512-compress

25 Aug 2024


Eric Richter <erichte@linux.ibm.com> writes:
...
> This patch introduces an optimized powerpc64 assembly implementation for
> sha512-compress, derived from the implementation for sha256-compress-n.

Thanks, I'm about to merge this. One question: When you store
non-volatile registers on the stack,
...

> li	T0, -8
> li	T1, -24
> stvx	v20, T0, SP
> stvx	v21, T1, SP

Why the offsets -8 and -24? My understanding is that on entry, SP is
16-byte aligned ("quad word"), and that the 8 bytes starting at the SP
value typically hold the caller's back chain pointer, so we shouldn't
be clobbering those bytes? (I'm looking at the stack frame figure on
page 34 in the v2.1.5 ABI spec downloaded from
https://openpowerfoundation.org/specifications/64bitelfabi/).

I would have expected offsets -16 and -32. The file
powerpc64/p8/sha256-compress-n.asm, which I merged some months ago, also
uses -8 and -24, while, e.g., powerpc64/p8/ghash-update.asm uses
offsets -16 and -32.

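For reference, here is a small C model of the address arithmetic in
question. It assumes that stvx behaves as described in the Power ISA,
i.e., the low four bits of the effective address are ignored, and that
SP is 16-byte aligned on entry. The helper names are hypothetical and
only for illustration. Under those assumptions, offsets -8 and -24
would address the same quadwords as -16 and -32, which may be why both
variants appear to work; this is a sketch, not a statement about what
the patch intends.

```c
#include <stdint.h>

/* Address written by "stvx vS, RA, RB": RA + RB with the low four bits
   cleared, following the Power ISA description of stvx. */
static uint64_t
stvx_store_address (uint64_t ra, int64_t rb)
{
  return (ra + (uint64_t) rb) & ~(uint64_t) 0xF;
}

/* Displacement of the stored quadword relative to SP (hypothetical
   helper, for illustration only). */
static int64_t
stvx_slot_offset (uint64_t sp, int64_t rb)
{
  return (int64_t) (stvx_store_address (sp, rb) - sp);
}
```

With a 16-byte aligned sp, stvx_slot_offset (sp, -8) gives -16 and
stvx_slot_offset (sp, -24) gives -32, so in this model the doubleword
at SP itself is not touched.
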
I also think the GPR usage could be trimmed a bit. T0 and T1
are used only in the function prologue and epilogue, and could overlap with
something else using volatile registers. And the two registers TC32 and
TC48 could be replaced by a single register STATE32 = STATE + 32. But
that can be tweaked after merge.

Regards,
/Niels
-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.
