Port differential coded version to ARM NEON #17

Open
lemire opened this issue Dec 16, 2017 · 0 comments
lemire commented Dec 16, 2017

The generic codec supports both x64 and ARM NEON; however, the differentially coded version is x64 only.

It seems like it would be easy to port the differential functions over. The Delta function on ARM is almost identical to the x64 one:

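// Difference between each lane and its predecessor; lane 0 uses prev[3].
uint32x4_t Delta(uint32x4_t curr, uint32x4_t prev) {
   // vextq_u32(prev, curr, 3) yields {prev[3], curr[0], curr[1], curr[2]}
   return vsubq_u32(curr, vextq_u32(prev, curr, 3));
}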
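To make the lane arithmetic easy to check on a machine without NEON, here is a minimal scalar sketch of what Delta computes. The `u32x4_model` struct and the `ext_model`, `delta_model`, and `delta_model_example` names are hypothetical stand-ins introduced only for this illustration; they are not part of the library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a uint32x4_t register. */
typedef struct { uint32_t lane[4]; } u32x4_model;

/* Models vextq_u32(a, b, n): lanes n..3 of a, followed by lanes 0..n-1 of b. */
u32x4_model ext_model(u32x4_model a, u32x4_model b, int n) {
    u32x4_model r;
    for (int i = 0; i < 4; i++) {
        int j = i + n;
        r.lane[i] = (j < 4) ? a.lane[j] : b.lane[j - 4];
    }
    return r;
}

/* Models Delta: curr - {prev[3], curr[0], curr[1], curr[2]}, lane by lane. */
u32x4_model delta_model(u32x4_model curr, u32x4_model prev) {
    u32x4_model shifted = ext_model(prev, curr, 3);
    u32x4_model r;
    for (int i = 0; i < 4; i++) {
        r.lane[i] = curr.lane[i] - shifted.lane[i]; /* wraps modulo 2^32 */
    }
    return r;
}

/* Usage sketch: the previous block ends at 10, the current block is sorted. */
void delta_model_example(void) {
    u32x4_model prev = {{1, 2, 3, 10}};
    u32x4_model curr = {{12, 15, 15, 20}};
    u32x4_model d = delta_model(curr, prev);
    /* 12-10, 15-12, 15-15, 20-15 */
    assert(d.lane[0] == 2 && d.lane[1] == 3 && d.lane[2] == 0 && d.lane[3] == 5);
}
```

Only the last lane of `prev` matters here, so a caller can seed the first block with any vector whose last lane holds the initial value.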

So is the prefix sum, which is currently mixed with the store in _write_avx_d1 (for historical reasons, I suppose):

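// Inclusive prefix sum of curr, offset by the last lane of prev.
uint32x4_t PrefixSum(uint32x4_t curr, uint32x4_t prev) {
   uint32x4_t zero = {0, 0, 0, 0};
   uint32x4_t add = vextq_u32(zero, curr, 3);  // {0, c0, c1, c2}
   // Table indices that copy bytes 12..15 (the last 32-bit lane) into every lane.
   uint8x16_t BroadcastLast = {12,13,14,15,12,13,14,15,12,13,14,15,12,13,14,15};
   prev = vreinterpretq_u32_u8(vqtbl1q_u8(vreinterpretq_u8_u32(prev), BroadcastLast));
   curr = vaddq_u32(curr, add);                // {c0, c0+c1, c1+c2, c2+c3}
   add = vextq_u32(zero, curr, 2);             // {0, 0, c0, c0+c1}
   curr = vaddq_u32(curr, prev);
   curr = vaddq_u32(curr, add);                // running sums plus prev[3]
   return curr;
}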
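The two shift-and-add steps can be checked against a plain running sum. The following is a scalar sketch, not library code: `u32x4_model`, `prefix_sum_model`, `prefix_sum_naive`, and `prefix_sum_example` are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a uint32x4_t register. */
typedef struct { uint32_t lane[4]; } u32x4_model;

/* Mirrors the NEON PrefixSum step by step on plain integers. */
u32x4_model prefix_sum_model(u32x4_model curr, u32x4_model prev) {
    const uint32_t *d = curr.lane;
    uint32_t last = prev.lane[3]; /* value broadcast to every lane */
    /* Step 1: add {0, d0, d1, d2} to get pairwise sums. */
    uint32_t s[4] = { d[0], d[0] + d[1], d[1] + d[2], d[2] + d[3] };
    /* Step 2: add {0, 0, s0, s1} and the broadcast of prev[3]. */
    u32x4_model r = {{ s[0] + last, s[1] + last,
                       s[2] + s[0] + last, s[3] + s[1] + last }};
    return r;
}

/* Straightforward running sum, used as the reference. */
u32x4_model prefix_sum_naive(u32x4_model curr, u32x4_model prev) {
    u32x4_model r;
    uint32_t running = prev.lane[3];
    for (int i = 0; i < 4; i++) {
        running += curr.lane[i];
        r.lane[i] = running;
    }
    return r;
}

/* Usage sketch: deltas {2, 3, 0, 5} after a block ending at 10. */
void prefix_sum_example(void) {
    u32x4_model prev = {{1, 2, 3, 10}};
    u32x4_model deltas = {{2, 3, 0, 5}};
    u32x4_model a = prefix_sum_model(deltas, prev);
    u32x4_model b = prefix_sum_naive(deltas, prev);
    for (int i = 0; i < 4; i++) assert(a.lane[i] == b.lane[i]);
}
```

With these inputs both functions give {12, 15, 15, 20}, which suggests the prefix sum undoes Delta for a block whose predecessor ends at 10.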

My implementations may be suboptimal, but I think they are correct, and given these functions it should be easy to create a differentially coded codec.
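As a rough sketch of how the two functions might be driven over an array, the block below delta-encodes and decodes a buffer four values at a time. Everything here is an assumption for illustration: `delta_encode`, `delta_decode`, and `delta_roundtrip_example` are hypothetical names, the length is assumed to be a multiple of 4, and no bit packing is done. The broadcast uses `vdupq_n_u32(vgetq_lane_u32(prev, 3))` instead of `vqtbl1q_u8`, since the latter is, as far as I know, only available on AArch64. A scalar fallback is included so the sketch also builds on non-ARM machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical helper: delta-encode n values (n assumed a multiple of 4). */
void delta_encode(const uint32_t *in, uint32_t *out, size_t n, uint32_t start) {
    uint32x4_t prev = vdupq_n_u32(start);
    for (size_t i = 0; i < n; i += 4) {
        uint32x4_t curr = vld1q_u32(in + i);
        vst1q_u32(out + i, vsubq_u32(curr, vextq_u32(prev, curr, 3)));
        prev = curr;
    }
}

/* Hypothetical helper: inverse of delta_encode via a blockwise prefix sum. */
void delta_decode(const uint32_t *in, uint32_t *out, size_t n, uint32_t start) {
    const uint32x4_t zero = vdupq_n_u32(0);
    uint32x4_t prev = vdupq_n_u32(start);
    for (size_t i = 0; i < n; i += 4) {
        uint32x4_t curr = vld1q_u32(in + i);
        curr = vaddq_u32(curr, vextq_u32(zero, curr, 3));
        curr = vaddq_u32(curr, vextq_u32(zero, curr, 2));
        /* Broadcast the last decoded value of the previous block. */
        curr = vaddq_u32(curr, vdupq_n_u32(vgetq_lane_u32(prev, 3)));
        vst1q_u32(out + i, curr);
        prev = curr;
    }
}

#else /* Scalar fallback with the same behavior, for non-NEON builds. */

void delta_encode(const uint32_t *in, uint32_t *out, size_t n, uint32_t start) {
    uint32_t prev = start;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = in[i];
        out[i] = v - prev;
        prev = v;
    }
}

void delta_decode(const uint32_t *in, uint32_t *out, size_t n, uint32_t start) {
    uint32_t prev = start;
    for (size_t i = 0; i < n; i++) {
        prev += in[i];
        out[i] = prev;
    }
}

#endif

/* Usage sketch: encoding then decoding should reproduce the input. */
void delta_roundtrip_example(void) {
    const uint32_t in[8] = {3, 7, 7, 20, 21, 30, 100, 100};
    uint32_t enc[8], dec[8];
    delta_encode(in, enc, 8, 0);
    delta_decode(enc, dec, 8, 0);
    for (size_t i = 0; i < 8; i++) assert(dec[i] == in[i]);
}
```

A real codec would presumably fuse these loops with the packing and unpacking kernels rather than making a separate pass over the data.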
