Even with the one-liner fix, it is still slower. BTW, this raises the question of whether Intel stole your implementation, hehehehe; good for us common folks though... Lion Cove P-cores can do 3 multiplications per cycle, so maybe that can be used.
The purpose of divllu is to implement a narrowing division in software when hardware support is absent. Of course if your CPU implements this natively, then you should use that instead.
x86-64 is unusual in that it directly supports this 128 / 64 => 64 division. I'm not aware of any other ISA which supports this.
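For reference, here is a minimal sketch of what "using the hardware instead" could look like. The helper name `narrowing_div_128_by_64` is hypothetical (it is not a libdivide function), and the sketch assumes GCC or Clang: on x86-64 it issues a single `divq`, and elsewhere it falls back to the `unsigned __int128` extension, which may compile to a library call rather than one instruction. The caller must guarantee `hi < den`, otherwise the quotient does not fit in 64 bits and `divq` faults.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (name is an assumption, not part of libdivide).
 * Divides the 128-bit value (hi:lo) by den, returns the 64-bit quotient
 * and stores the remainder in *rem.
 * Precondition: hi < den, so that the quotient fits in 64 bits. */
static uint64_t narrowing_div_128_by_64(uint64_t hi, uint64_t lo,
                                        uint64_t den, uint64_t *rem)
{
    assert(hi < den); /* otherwise the quotient overflows 64 bits */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    uint64_t quot, r;
    /* divq divides RDX:RAX by the operand; quotient -> RAX, remainder -> RDX */
    __asm__("divq %[d]"
            : "=a"(quot), "=d"(r)
            : [d] "r"(den), "a"(lo), "d"(hi)
            : "cc");
    *rem = r;
    return quot;
#else
    /* Fallback for other targets; assumes the compiler provides __int128 */
    unsigned __int128 n = ((unsigned __int128)hi << 64) | lo;
    *rem = (uint64_t)(n % den);
    return (uint64_t)(n / den);
#endif
}
```

A software routine like divllu is only needed on targets where neither branch above maps to a native narrowing divide.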