[tahoe-dev] Using SSE2 operations in zfec [Was (no subject)]

Jack Lloyd lloyd at randombit.net
Mon Feb 2 17:24:23 UTC 2009


On Mon, Feb 02, 2009 at 09:10:01AM -0700, zooko wrote:
> I had another random thought -- could Python or something about the
> Python<->C interface or something about your use of SSE2 be
> misaligning the stack?

The x86-64 ABI specifies that the stack must be 16-byte aligned at
every call, so a function can rely on a known alignment on entry. It
does seem possible that Python would not respect that in all cases, or
that there is some case where using alloca throws things off. Since on
x86-64 the worst that would usually happen is that things run a bit
slower due to misaligned memory accesses, it is conceivable that such
a bug would be missed. I added

assert(((uintptr_t)__builtin_frame_address(0)) % 16 == 0);

at the beginning of _addmul1 (and disabled NDEBUG to ensure it was
active), and the tests all ran without the assertion triggering, as
did my encoding benchmark.
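
For anyone who wants to repeat the check outside of zfec, here is a
minimal sketch of the same idea, assuming GCC or Clang (for
__builtin_frame_address). The helper names addr_is_16_aligned,
frame_misalignment and report_frame_alignment are made up for this
example and are not part of zfec. Instead of aborting like the assert
above, it reports how far off the frame address is.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: nonzero if addr is a multiple of 16. */
static int addr_is_16_aligned(uintptr_t addr)
{
    return (addr % 16) == 0;
}

/* Hypothetical helper: this function's frame address modulo 16.
 * Relies on the GCC/Clang builtin; marked noinline so that it
 * reports its own frame rather than the caller's. */
__attribute__((noinline))
static unsigned frame_misalignment(void)
{
    return (unsigned)((uintptr_t)__builtin_frame_address(0) % 16);
}

/* Hypothetical reporting wrapper: call it at the top of the function
 * under suspicion. Returns 1 if the frame looked aligned, else 0. */
static int report_frame_alignment(const char *where)
{
    unsigned off = frame_misalignment();
    if (!addr_is_16_aligned(off)) {
        fprintf(stderr, "%s: frame address is %u bytes past a 16-byte boundary\n",
                where, off);
        return 0;
    }
    return 1;
}
```

Calling report_frame_alignment("_addmul1") at the start of the
function would log a misaligned frame instead of stopping the run,
which may be more convenient when the caller is the Python
interpreter. What counts as the expected offset depends on the
platform ABI, so treat a nonzero result as a hint to investigate
rather than proof of a bug.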

I looked at the assembly GCC 4.3 generates for Opteron and Core2
processors for addmul (-O2 and -O2 -fPIC). In each case it pushes four
64-bit registers onto the stack and does not touch the stack again
until returning, when it pops the callee-saved registers. So even if
the stack were misaligned, it is hard for me to see how it would affect
the performance that much.

-Jack


