Right now it appears that we're getting numerical overflows when casting to float16 in `float32_to_int16` for some models (RWKV-LM, T0pp, Unified QA). These models were trained in bfloat16, not float16, and bfloat16 has a higher dynamic range: it can represent much larger numbers than float16 can.

bfloat16 has the nice property that its dynamic range is effectively the same as float32's; it just has lower precision within that range. So if we save everything in bfloat16, we should essentially never convert large numbers into `inf`.
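To make the failure mode concrete, here's a minimal sketch in torch (the 70000.0 is just an illustrative value above float16's max finite value of ~65504):

```python
import torch

x = torch.tensor([70000.0])  # larger than float16's max finite value (~65504)

print(x.to(torch.float16))   # tensor([inf], dtype=torch.float16)  -- overflows
print(x.to(torch.bfloat16))  # tensor([70144.], dtype=torch.bfloat16)  -- finite, just coarser
```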
So I'm proposing we save everything in bfloat16 across the board. Since this isn't supported natively by `datasets` or its Apache Arrow backend, we'd still use the reinterpret-as-int16 hack. The other option would be to choose which precision to use dynamically based on the precision of the model's weights, but I'd prefer not to deal with that complexity right now.
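For concreteness, the bfloat16 version of the hack could look something like this (a minimal sketch assuming torch tensors; the helper names mirror the existing `float32_to_int16`, but the real implementation may differ):

```python
import torch

def float32_to_int16(t: torch.Tensor) -> torch.Tensor:
    # Round float32 to bfloat16, then reinterpret the raw 16 bits as int16
    # so that datasets / Arrow (which lack a bfloat16 type) can store them.
    return t.to(torch.bfloat16).view(torch.int16)

def int16_to_float32(t: torch.Tensor) -> torch.Tensor:
    # Reinterpret the stored int16 payload as bfloat16, then upcast to
    # float32 (lossless, since every bfloat16 is exactly representable).
    return t.view(torch.bfloat16).to(torch.float32)
```

Since bfloat16 is just the top 16 bits of a float32, the round trip loses only low mantissa bits, never the exponent, so large activations survive intact.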