[tahoe-dev] Using SSE2 operations in zfec [Was (no subject)]
lloyd at randombit.net
Mon Feb 2 17:24:23 UTC 2009
On Mon, Feb 02, 2009 at 09:10:01AM -0700, zooko wrote:
> I had another random thought -- could Python or something about the
> Python<->C interface or something about your use of SSE2 be
> misaligning the stack?
The x86-64 ABI specifies that the stack must be 16-byte aligned at
every call instruction, so that (%rsp + 8) is a multiple of 16 on
function entry. It does seem possible that Python would not respect
that in all cases, or that there is some case where using alloca
throws things off. Since the worst that usually happens on x86-64 is
that things run a bit slower due to misaligned memory accesses, it is
conceivable that such a bug would go unnoticed. I added

    assert(((uintptr_t)__builtin_frame_address(0)) % 16 == 0);

at the beginning of _addmul1 (and made sure NDEBUG was not defined, so
that the assertion was active). The tests all ran without the
assertion triggering, as did my encoding benchmark.
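
For anyone who wants to repeat this kind of check outside zfec, here
is a minimal sketch of the same idea. The names is_aligned and
frame_is_16_byte_aligned are made up for illustration and are not part
of zfec. __builtin_frame_address is a GCC/Clang builtin, and what it
returns depends on the platform ABI and on how the compiler sets up
the frame, so treat this as a debugging aid rather than a guarantee.

```c
#include <assert.h>   /* for the assert() mentioned in the usage note */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from zfec): returns 1 if p is a multiple of
 * `alignment` bytes, 0 otherwise. `alignment` must be a power of two. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}

/* Hypothetical stand-in for the check placed at the top of _addmul1.
 * noinline asks the compiler to keep a separate frame for this
 * function, so the builtin reports this function's own frame address
 * rather than the caller's. */
__attribute__((noinline))
int frame_is_16_byte_aligned(void)
{
    return is_aligned(__builtin_frame_address(0), 16);
}
```

Putting assert(frame_is_16_byte_aligned()); at the top of a function,
in a build without -DNDEBUG, should behave like the assertion above on
x86-64. Other ABIs may place the frame pointer differently, so a
failure there does not by itself mean the stack is misaligned.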
I looked at the assembly GCC 4.3 generates for addmul when targeting
Opteron and Core2 processors (with -O2 and with -O2 -fPIC). In each
case it pushes 4 64-bit registers onto the stack and does not touch
the stack again until it returns, when it pops the callee-saved
registers. So even if the stack were misaligned, it is hard for me to
see how that would affect performance much.