How well does this cover HLSL202x? #66
Comments
Hi @devshgraphicsprogramming, hlsl++ doesn't provide templated types like the ones you're looking for. Once upon a time the types were declared in terms of templates such as floatNxM and floatN. It was very slow to compile, and the code required to call functions and pass them around was complicated to write and debug. Template resolution rules are very finicky, and I had to test that functions compiled by calling them, otherwise non-instantiated templates wouldn't necessarily be validated. Lots of SFINAE and template magic. I deleted all that and never looked back. That's probably not useful for you. However, you can probably implement your own matrix<N, M> mapping to the hlsl++ types if you require it for your project and pass them into the functions as you would normally.
struct float4x4 {};
template<int N, int M>
struct matrix {};
template<>
struct matrix<4, 4> : float4x4 {};
Any function taking a float4x4 should accept that matrix<4, 4> as if it were a float4x4. If you require any support with that I can try to help out. |
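A minimal, self-contained sketch of the mapping suggested above. The float4x4 here is a stand-in struct so the snippet compiles without hlsl++, and first_element and demo are hypothetical names used only for illustration; none of this is hlsl++ API.

```cpp
#include <cassert>

// Stand-in for hlsl++'s float4x4 so the sketch compiles on its own.
// With the real library you would include its header instead.
struct float4x4 {
    float m[16] = {};
};

// Primary template: left empty, so unsupported sizes carry no data.
template <int N, int M>
struct matrix {};

// Specialization that maps matrix<4, 4> onto the concrete type.
template <>
struct matrix<4, 4> : float4x4 {};

// Hypothetical function written against the concrete type only.
inline float first_element(const float4x4& mat) {
    return mat.m[0];
}

// The derived-to-base conversion lets matrix<4, 4> bind to const float4x4&.
inline bool demo() {
    matrix<4, 4> mat;
    mat.m[0] = 2.0f;
    assert(first_element(mat) == 2.0f);
    return true;
}
```

One limitation of this approach: a function that returns float4x4 hands back the base type, so assigning the result to a matrix<4, 4> would need an extra converting constructor in the specialization.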
ok this is quite strange as @llvm-beanz says that what's going on under the hood is that in DXC (which is a fork of llvm 3.7) the And you're telling me that for every
Any chance of going back to that with |
FWIW, the https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector |
It's possible that that's what happens in DXC under the hood, this library provides compatibility for practical HLSL. To be perfectly honest, there's no real need for them either unless it makes their lives easier, as there are no types larger than 4, and on GPUs it makes no sense anyway since they aren't designed around SIMD types anymore.
I have a separate struct for every floatNxM, but I had to anyway to specialize them for the different SIMD types. Like I said in the other bug, this is a library that is aimed at making SIMD on the CPU look like hlsl, with a few extensions here and there like the quaternions and the float8 which can be useful for batch processing of things. I don't model the implementation according to DXC, but try to follow the interface, and there are things I cannot do like ternary operators. There is little chance anyone will need anything much more complicated when doing graphics, which is what this library is aimed at primarily. You've also noticed that the sizes of the types aren't what they say (float3 is a 4-float type).
There is little chance I'll be going back to templates, but you can always fork the code if it makes your life easier. I also haven't thought of an 8x8 matrix, is this something you would need? I understand that it might be a part of the language, but I've been programming HLSL for over 10 years and never had to use it in a templated manner. The swizzles are part of the language as well and they have to be emulated in awkward ways in C++. Many of the matrix swizzles aren't possible or provided. |
The need to use the
template <typename ElementTy>
void myFn(vector<ElementTy, 3> MyVec) {...}
There are other contexts where we've seen users historically use the vector and matrix templates under preprocessor macros, but I do prefer to pretend the C preprocessor doesn't exist. |
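The same pattern can be tried out in plain C++ as a rough sketch. The vector template below is a hypothetical stand-in for HLSL's vector<T, N>, placed in a made-up namespace hlsl_sketch to avoid clashing with std::vector; sum3 and demo are illustrative names, not part of DXC or hlsl++.

```cpp
#include <cassert>
#include <cstddef>

namespace hlsl_sketch {

// Hypothetical stand-in for HLSL's vector<T, N>.
template <typename T, std::size_t N>
struct vector {
    T data[N];
};

// Generic over the element type, but fixed to three components.
template <typename ElementTy>
ElementTy sum3(vector<ElementTy, 3> v) {
    return v.data[0] + v.data[1] + v.data[2];
}

inline bool demo() {
    vector<float, 3> f{{1.0f, 2.0f, 3.0f}};
    vector<int, 3> i{{1, 2, 3}};
    assert(sum3(f) == 6.0f);  // ElementTy deduced as float
    assert(sum3(i) == 6);     // ElementTy deduced as int
    // A vector<float, 4> argument would be rejected at compile time,
    // because the dimension 3 is part of the parameter type.
    return true;
}

}  // namespace hlsl_sketch
```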
I understand that for templated code you might want to restrict functions in that way, it's just that I've never had the need to write code like that. Consider that the only elements that realistically go in there are uint, int and float (maybe half/double these days, although I have never seen double shipped). For any sort of non-trivial code the reuse is minimal. If you can provide a real world use case for this (not just that it can be done and how) it would help the discussion. I am trying to see your point though, and perhaps for that kind of use case it might make sense. That said, if you were to have a header that derives from the hlsl++ types you could have this kind of template behavior without even modifying the hlsl++ internals. Templates are a recent addition to DXC and I've yet to see a shader codebase use them (which doesn't mean it isn't more common, I just draw from my experience). I get that you want to avoid the preprocessor, and all the problems surrounding it. But the thing I dislike even more is long compile times and I very much avoid templates as much as I can these days. |
Let me show you my 30 minute Vulkanized 2023 talk: https://www.youtube.com/watch?v=JGiKTy_Csv8&t=1050s Consider our statically polymorphic BxDFs (see the NDF traits and Cook-Torrance struct): https://github.com/Devsh-Graphics-Programming/Nabla/pull/475/files/0efa0574fcb32cc4566e61903dd112236528f23e..e2daa633d0fbbedd54ec498c80826cd6def8eadf Or our lower_bound, upper_bound and Workgroup and Single Dispatch Scans: https://github.com/Devsh-Graphics-Programming/Nabla/pull/438/files
explicit instantiation and I don't see how a PCH (or extern template) with explicit instantiation would be slower to compile than one handwritten struct in the header per specialization. In fact it should be faster, because codegen is performed once rather than in every translation unit that includes the header. |
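For readers unfamiliar with the mechanism mentioned above, here is a single-file sketch of extern template plus explicit instantiation. The matrix template, its trace member, and demo are hypothetical and not taken from hlsl++ or Nabla; in a real project the declaration would sit in the header and the definition in exactly one .cpp file.

```cpp
#include <cassert>

// Hypothetical templated matrix type, for illustration only.
template <typename T, int N, int M>
struct matrix {
    T m[N * M];

    // Sum of the diagonal elements (row-major storage).
    T trace() const {
        T sum = T(0);
        for (int i = 0; i < (N < M ? N : M); ++i) {
            sum += m[i * M + i];
        }
        return sum;
    }
};

// Header side: explicit instantiation declaration. It tells the compiler
// that an instantiation of the members exists in some translation unit.
extern template struct matrix<float, 4, 4>;

// Source-file side: explicit instantiation definition, written once.
template struct matrix<float, 4, 4>;

using float4x4 = matrix<float, 4, 4>;

inline bool demo() {
    float4x4 id = {{1, 0, 0, 0,
                    0, 1, 0, 0,
                    0, 0, 1, 0,
                    0, 0, 0, 1}};
    assert(id.trace() == 4.0f);
    return true;
}
```

How much compile time this saves depends on the compiler and on how much of the code is inline: compilers may still instantiate inline member functions locally in order to inline them, so measuring on the actual codebase is the only reliable guide.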
I'll take a look at the talk when I get a bit of time.
I've taken a look at the provided code but I haven't found much that relies on this restriction of the dimensions of a vector. I understand that templates can be useful and I see the templating of code like your BRDF, but I'm not sure what that has to do with matrices and vectors just yet. I've also always accomplished that via defines to the shader compiler, which you probably need to pass in anyway to select between two shaders that do different BRDFs, so I'm not quite sure what the benefit is. I've been through the compile time pain already, and I'm not rewriting it again to be templated. If we can find a solution that satisfies you I would leave it at that. I'm still not happy with the compile times of hlsl++, as swizzle instantiation is quite expensive. I've even considered providing a define that disables swizzles for those who don't need them, although tbh they are one of the main attractive points of the library and I don't even disable them myself. I know about PCHs and all the other lousy hacks for C++ and I'm not happy about any of them. One of the things I need to do at some point is provide an hlsl++ module. When I programmed the bulk of the library, modules weren't out, and even today the support is still not widespread. I've played around with them in a toy stl I'm programming. If there was interest I could consider it. It would take a bit of effort because of all the system headers that could get included, which is a bit unfortunate. |
Make sure to watch the first talk. Then watch this one to address the choice of "using different BRDFs" without defines: https://www.youtube.com/watch?v=Ru3YutCVXsM We use Nabla instead of Unreal for many reasons, and one of them is compiling 40k shader permutations at startup.
Can you share any of your old performance numbers about how much templating HLSL++ hurt your compile times? Also does everything have to be inline? Are the linkers really that dumb in your experience? |
The compile times varied between compilers but one ARM compiler for some reason took about 40 seconds to compile the unit test solution. There is a commit where I changed it all and some numbers. The compilation time was cut by almost half across the board. I wish I'd had something like Compile Score back then to give you more detailed numbers.
Not everything has to be inline, it's just that I made it header only to begin with for simplicity. Most small methods are force inline because I did measure performance differences for small functions. Some of the larger ones I tagged as inline only and it's up to the compiler to decide. I could do better in this regard for sure if need be and if you have some suggestions in this regard I'm happy to hear them. |
At the end of the day whether you explicitly specialize your templated struct, or write out separate ones and template their aliases, it doesn't matter. You'd probably want to use
as you'd want |
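The second option mentioned above (separate hand-written structs with a templated alias) could look roughly like the sketch below. It does not reconstruct the elided part of the comment; float3x3 and float4x4 are stand-in structs, and matrix_type, element_count and demo are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>

// Hand-written concrete types, standing in for per-size library structs.
struct float3x3 { float m[9] = {}; };
struct float4x4 { float m[16] = {}; };

// Hypothetical lookup from dimensions to a concrete type. The primary
// template is only declared, so unsupported sizes fail to compile.
template <int N, int M> struct matrix_type;
template <> struct matrix_type<3, 3> { using type = float3x3; };
template <> struct matrix_type<4, 4> { using type = float4x4; };

// Alias template: matrix<4, 4> is float4x4 itself, not a derived type.
template <int N, int M>
using matrix = typename matrix_type<N, M>::type;

static_assert(std::is_same<matrix<4, 4>, float4x4>::value,
              "the alias names the concrete type");

// The dimensions must be spelled out at the call site, because
// N and M cannot be deduced through matrix_type<N, M>::type.
template <int N, int M>
int element_count(const matrix<N, M>& mat) {
    return static_cast<int>(sizeof(mat.m) / sizeof(mat.m[0]));
}

inline bool demo() {
    matrix<3, 3> a;
    float4x4 b;
    assert((element_count<3, 3>(a) == 9));
    assert((element_count<4, 4>(b) == 16));
    return true;
}
```

The trade-off compared with a specialized or derived template is that values convert freely in both directions (they are the same type), but generic functions lose automatic deduction of the dimensions.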
I'm currently searching for and evaluating libraries that will let me share as much shader code with the CPU (regular functions, structs and HLSL "built-in"s) as possible.
HLSL2021 now has templates and stuff, so I'm wondering how close your type declarations, etc. are to HLSL2021 internals
For example, matrices are now defined in terms of a template in HLSL, so apparently you can do this in HLSL2021 to implement a chainRule utility function: microsoft/hlsl-specs#24 (comment)