-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pulley: implement vector shuffle #9910
Conversation
0a788d4
to
314ea0c
Compare
Subscribe to Label Actioncc @fitzgen
This issue or pull request has been labeled: "cranelift", "pulley"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
(rule (lower (has_type $I8X16 (shuffle a b (u128_from_immediate mask)))) | ||
(let ((vmask VReg (pulley_vconst128 mask))) | ||
(pulley_vshuffle a b vmask))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of putting the mask
into a VReg
could it be a u128
immediate to the vshuffle
instruction itself? That should cut down on opcodes slightly and more closely matches the wasm semantics of the instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could it be a
u128
immediate to thevshuffle
instruction itself?
I remember not, but let me try again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vshuffle(VReg, VReg, VReg, u128)
would raise this error
thread '<unnamed>' panicked at cranelift/codegen/src/isa/pulley_shared/inst/emit.rs:109:9:
encoded inst InstAndKind { inst: Raw { raw: VShuffle { dst: Writable { reg: VReg(p0v) }, src1: VReg(p0v), src2: VReg(p1v), mask: 41362427191743139026751447860679676176 } }, kind: PhantomData<cranelift_codegen::isa::pulley64::Pulley64> } longer than worst-case size: length: 22, Inst::worst_case_size() = 20
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's ok, I guess I could raise the inst. size limit.
Do you happen to have something along the Pully pipeline that could use an extra pair of hands? I should probably leave these exercises for others. Btw, this is a wonderful setup for helping people learn about how Pulley works :) |
Sure yeah, whatever suits you best! Perhaps the next-lowest-hanging-fruit is performance optimizations which will involve running benchmarks and understanding the performance of Pulley and where hot-spots are. The main goal would be identifying macro-opcodes that can be added for lowering. The first one off the top of my head would be a macro opcode related to bounds checks to make memory accesses faster, but I haven't done performance profiling one way or another on that yet. (we also don't have many docs yet on how to do this so it's a choose-your-own-adventure at the moment, so not necessarily great as a first project) |
part of #9783