Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some non square matrix definitions are incorrect #74

Closed
mlangerak opened this issue Aug 25, 2023 · 6 comments
Closed

Some non square matrix definitions are incorrect #74

mlangerak opened this issue Aug 25, 2023 · 6 comments
Assignees
Labels

Comments

@mlangerak
Copy link

There seems to be an issue with the matrix definitions for some non-square matrices. For example, float4x3 has a constructor taking 3 n128 and internally consists of 3 n128 (row or column)vectors. It should have a constructor taking 4 n128, and have 4 n128 internal (row or column)vectors. There are other cases too, e.g. float3x2.

@redorav
Copy link
Owner

redorav commented Aug 25, 2023

Hi @mlangerak, this is expected and by design. The reason for that is that is that even if the physical layout changes, the logical layout still works. Are you finding inconsistencies when using these?

@redorav redorav self-assigned this Aug 25, 2023
@mlangerak
Copy link
Author

mlangerak commented Aug 25, 2023

Conceptually a float4x3 consists of 4 rows (or columns for column major) of float3's so you'd expect that the matrix internally has 4 SIMD vector elements, since that would be the most natural and efficient mapping to matrix operations like constructing from 4 rows (or columns for column major), accessing a row (or column for column major), etc.

@redorav
Copy link
Owner

redorav commented Aug 25, 2023

That is true, the reason I went with the other model is so that it was more space-efficient. Would you make it work the same way if it was a 4x2 matrix also be 4 simd vectors? Changing it now to work how you need would be a huge amount of work, even if I understand the rationale for it. All the matrix operations are done taking this in mind.

@redorav
Copy link
Owner

redorav commented Aug 25, 2023

I've been looking at the code. There seems to be no "right" solution to trying to represent non 4x4 matrices by fixed-width vectors. The advantages and disadvantages to a hypothetical change I can see are:

Advantages
· Matrices are "castable" to and from the origin matrices, for example a float4x4 can become a float4x3 without issues (it can already be cast to a float3x4)
· You can poke at matrix rows via [ ] operators and assign float3 or float4

Disadvantages
· Matrices take a lot more space. In the extreme case, a float4x1 takes up four SIMD vectors if we want to make it all consistent
· It biases a physical row layout over columns
· float2x2 will become bloated as well (it's a single simd vector at the moment)

This of course ignores the large disadvantage that is to change the inner workings of all these matrices, verifying results, etc. I don't have a unit test coverage as large as I would like, it was originally more manual.

I think I can see both sides of this, it took me a lot of time to get the matrix stuff working and verified results, etc (even if there are bugs), changing it now would cause a lot more issues than it solves. If you have other ideas or suggestions I'm open to seeing your perspective or how you'd solve it

@mlangerak
Copy link
Author

mlangerak commented Aug 25, 2023

There are pros and cons to either layout, but I would argue for using the layout that causes the least surprise even if it is less optimal in some respects. By least surprise I mean which layout is more common in practice. For instance I think (I did not double check this) that when sharing constant data with the GPU, the DXC compiler will use 4 float3's for a float4x3 in row-major mode. In that case, it would be "least surprise" if you could memcopy a hlslpp::float4x3 directly when pushing constant data to the GPU.

The float2x2 case is fortunate, since then DXC uses the same packing into a single float4 as hlslpp does already. So in that case it is already consistent with DXC.

[edit]
Supporting both row and column major does seem to make this more complicated. If you really needed that flexibility you could have separate declarations for the matrix type from the matrix layout. E.g., something like:

struct Matrix4x3{...} // storage for 4 float3's, implicitly row major
struct Matrix3x4{...} // storage for 3 float4's, implicitly row major
using float4x3 = Matrix4x3; // when compiled with row major matrices
using float4x3 = Matrix3x4; // when compiled with column major matrices

It would get confusing really quickly though, so unless there is a good reason to support both row and column major, I would default to row major always since it is more natural IMO, indeed I am using hlslpp and DXC in row major mode. Incidentally I need to also support Metal shading language which is sadly column major and there is no shader compiler switch to change it either it seems. To compensate, I implemented a mul intrinsic for MSL which hides this column/row major distinction:

template<typename T, int RowN, int ColN, int N>
inline decltype(auto) mul(matrix<T, ColN, RowN> m, vec<T, N> v)
{
    return v * m;  // Note reversed order of matrix-vector product; Metal has column major matrix layout.
}

This works well to hide the row/column major confusion, and so far I've been able to write all my MSL pretending it is row-major.

@redorav
Copy link
Owner

redorav commented Aug 25, 2023

There is a common misunderstanding that this library is meant to interface with hlsl when uploading to the GPU, I'll refer you to #58 for some more discussion.

While that would be convenient there's all sorts of cases where this doesn't happen. Just as a few examples:

cbuffer ExampleCB
{
	float3 a;     // Offset: 0
	float b;      // Offset: 12
        float2x2 m;   // Offset: 16
        float4 v;     // Offset: 48
}
  1. HLSL considers float3 to be 12 bytes, and if you append a float3 and a float it will put them in the same 16 byte line. HLSL++ won't be able to do that since all SIMD vectors are 16 bytes. i.e.
HLSL:   [ a.x ][ a.y ][ a.z ][  b  ]

HLSL++: [ a.x ][ a.y ][ a.z ][ a.w ]
        [  b  ][      padding      ]

Same applies to float1 and float2, with their respective paddings.

  1. HLSL considers float2x2 to be two float4s, contrary to what you say above. The offset for the next variable v is 32 bytes and not 16 as one would expect from a single float4.
HLSL:   [ m_00 ][ m_01 ][    padding   ]
        [ m_01 ][ m_11 ][    padding   ]

HLSL++: [ m_00 ][ m_01 ][ m_01 ][ m_11 ]
  1. float3x4 and float4x3 can vary in HLSL depending on whether you specify row_major or column_major. The alignment changes depending on how you specify this parameter, between 48 bytes and 64, as you'd expect, but there is no fixed meaning to float3x4 or float4x3, it can mean 3 rows 4 columns or the other way around. This isn't something that HLSL++ can express in any way really. A row_major float3x4 maps to float3x4, and column_major float4x3 maps to float4x3, but the other combinations will fail.

The main takeaway is that there is no least surprise behavior, these things are surprising no matter what you do, even if you had no SIMD vectors C++'s packing rules can come in and do unexpected things. It is not HLSL++'s aim to solve these problems.

That said, I use a little interface to cater for this use case in my codebase. As it is still incomplete it hasn't made its way here yet, but it is possible to do with not that much work using the store() family of functions. The other thing I have basically seen at lots of codebases is to just declare aligned types in general, i.e. don't use anything other than float4, maybe row_major float3x4 and float4x4.

Hopefully that helps

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants