Nice find, thank you very much! I'm not sure how that code ended up there, since everywhere else I've been pretty careful about the use of SSE2 instructions. That would explain everything, since that code was introduced with the new renderer (0.99).
I've simply changed it to call the non-SSE2 version, since I'm not sure the SSE2 version ever actually ran faster anyway.
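For anyone curious what the careful version of this usually looks like, here is a rough sketch of guarding an SSE2 path behind a runtime check with a scalar fallback. This is not the actual renderer code: the function names (`add_saturate`, `add_saturate_scalar`, `add_saturate_sse2`, `cpu_has_sse2`) and the saturating-add operation are made up for illustration, and the detection assumes GCC or Clang on x86 via `__builtin_cpu_supports`. On any other compiler or architecture the sketch just takes the scalar path.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: SSE2 detection is only attempted with GCC/Clang on x86. */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#define SKETCH_X86_GNU 1
#else
#define SKETCH_X86_GNU 0
#endif

/* Hypothetical scalar routine: per-byte add, clamped to 255. */
static void add_saturate_scalar(uint8_t *dst, const uint8_t *a,
                                const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}

#if SKETCH_X86_GNU
#include <emmintrin.h>

/* Hypothetical SSE2 routine: same result, 16 bytes per iteration. */
__attribute__((target("sse2")))
static void add_saturate_sse2(uint8_t *dst, const uint8_t *a,
                              const uint8_t *b, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
    /* Leftover bytes that do not fill a 16-byte vector. */
    add_saturate_scalar(dst + i, a + i, b + i, n - i);
}
#endif

/* Returns nonzero only when the running CPU reports SSE2 support. */
static int cpu_has_sse2(void)
{
#if SKETCH_X86_GNU
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;
#endif
}

/* Dispatcher: the SSE2 code is never reached on a CPU without SSE2. */
void add_saturate(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
#if SKETCH_X86_GNU
    if (cpu_has_sse2()) {
        add_saturate_sse2(dst, a, b, n);
        return;
    }
#endif
    add_saturate_scalar(dst, a, b, n);
}
```

Callers only ever use `add_saturate`, so an unguarded SSE2 call can't sneak in. Whether the vector path is worth keeping at all is a separate question that would need measuring.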