I'm not aware of a simple way to accomplish either approach on POWER8. I recommend using a stack-allocated buffer to handle the leftovers rather than complicating the code. Alternatively, we can use the POWER9-specific instruction 'lxvll', which loads a vector with the length passed in a general-purpose register; it also works in both endian modes without any post-load operations. Another benefit of switching to Power ISA 3.0 is that we can use 'lxvb16x/stxvb16x' to load and store the input and output data instead of 'lxvd2x/stxvd2x', which eliminates the need for post-load/pre-store permute operations in little-endian mode.
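For illustration, the stack-buffer idea can be sketched in C as follows. This is only a sketch under assumptions: the helper name load_leftover, the 16-byte block size, and the zero padding are hypothetical and not taken from the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16  /* assumed vector width in bytes */

/* Hypothetical helper: copy the final partial block into a zeroed,
   caller-provided (typically stack-allocated) buffer, so that a full
   16-byte vector load from `block` never touches memory past the end
   of the caller's data. */
static void
load_leftover(uint8_t block[BLOCK_SIZE], const uint8_t *data, size_t leftover)
{
  assert(leftover < BLOCK_SIZE);
  memset(block, 0, BLOCK_SIZE);   /* clear the padding bytes */
  memcpy(block, data, leftover);  /* copy only the bytes that exist */
}
```

The caller would declare `uint8_t block[BLOCK_SIZE]` on its stack, call the helper, and then run the normal full-block code path on `block`.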
regards, Mamone
On Sun, Nov 22, 2020 at 11:26 PM Niels Möller nisse@lysator.liu.se wrote:
Maamoun TK maamoun.tk@googlemail.com writes:
It generates a mask compatible with the length of the leftovers. For example, if the length is 1, the generated mask is 0xFF000000000000000000000000000000; the mask is then ANDed with the vector register holding the leftovers to clear the extra, unneeded bytes. It's not exactly like the first approach, but it avoids using the stack and handles the leftovers inside the assembly implementation. Sorry for mixing things up.
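The masking step described above can be sketched in C, with the vector register modeled as a 16-byte array in memory order. The function names leftover_mask and apply_mask are hypothetical; the real code does this with vector instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a mask whose first `leftover` bytes are 0xFF and whose
   remaining bytes are 0x00. For leftover == 1 this is the byte
   sequence FF 00 00 ... 00, matching the example in the text. */
static void
leftover_mask(uint8_t mask[16], size_t leftover)
{
  assert(leftover <= 16);
  for (size_t i = 0; i < 16; i++)
    mask[i] = (i < leftover) ? 0xFF : 0x00;
}

/* AND the mask into the loaded block, clearing the unneeded bytes. */
static void
apply_mask(uint8_t block[16], const uint8_t mask[16])
{
  for (size_t i = 0; i < 16; i++)
    block[i] &= mask[i];
}
```

Note that the mask only cleans up the register after the load; it does not change which bytes the load itself reads from memory.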
I see. I'm a bit worried that it may read too far. E.g., assume that the leftover size to read is 5 bytes, and those 5 bytes start at address 1ffffff8. Then the final
lxvd2x VSR(C0),0,DATA
will read 16 bytes from memory, including a few bytes starting at address 20000000, which may result in a segfault. Getting this right would need approach 2, "Round the address down to make it aligned, read an aligned word and, only if needed, the next word. And shift and mask to get the needed bytes."
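The overread concern comes down to simple address arithmetic, which the following C sketch checks. The 4096-byte page size and the function name are assumptions made for illustration; address 20000000 is also a boundary for larger power-of-two page sizes such as 64 KiB.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SIZE 4096u  /* assumed page size */

/* Return 1 if a load of `load_size` bytes starting at `addr` touches
   a page beyond the one that holds the last byte actually needed. */
static int
overread_crosses_page(uint64_t addr, uint64_t needed, uint64_t load_size)
{
  assert(needed >= 1 && needed <= load_size);
  uint64_t last_needed_page = (addr + needed - 1) / EXAMPLE_PAGE_SIZE;
  uint64_t last_loaded_page = (addr + load_size - 1) / EXAMPLE_PAGE_SIZE;
  return last_loaded_page != last_needed_page;
}
```

With the numbers from the example, `overread_crosses_page(0x1ffffff8, 5, 16)` returns 1: the 5 needed bytes end at 1ffffffc, but the 16-byte load ends at 20000007, in the next page.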
I would expect that the simplest is to go with approach two: Have a loop to read a byte at a time, and shift into a register.
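A byte-at-a-time loop of that kind could look like the following C sketch, where the 128-bit register is modeled as two 64-bit halves. The struct and function names are hypothetical, and left-aligning the result with zero padding is an assumption about what the surrounding code expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a 128-bit register: hi holds the most significant half. */
struct u128 { uint64_t hi, lo; };

/* Read `leftover` bytes one at a time, shifting each into the
   register, so that no byte past data[leftover - 1] is ever read. */
static struct u128
read_leftover_bytes(const uint8_t *data, size_t leftover)
{
  struct u128 r = { 0, 0 };
  assert(leftover < 16);
  for (size_t i = 0; i < leftover; i++)
    {
      r.hi = (r.hi << 8) | (r.lo >> 56);
      r.lo = (r.lo << 8) | data[i];
    }
  /* Shift in zero bytes so that the first input byte ends up in the
     most significant position (assumed layout). */
  for (size_t i = leftover; i < 16; i++)
    {
      r.hi = (r.hi << 8) | (r.lo >> 56);
      r.lo <<= 8;
    }
  return r;
}
```

This costs one load per leftover byte, but the leftover path runs at most once per call, so the simplicity is likely worth it.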
I made a merge request on git.lysator.liu.se; it ended up being easier for me to push patches to the repository this way. I hope you don't mind handling future patches the same way.
Thanks, that's fine. But you may need to ping me, since I don't look at the gitlab web interface that often.
Regards, /Niels
-- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance.