implement deflate_if conditional compression #26

Open
wants to merge 7 commits into
base: master

kkent030315
Contributor

@kkent030315 kkent030315 commented Dec 26, 2023

This PR introduces conditional compression via deflate_if!, as @SOF3 suggested in the previous PR.

flate!(pub static DATA1: [u8] from "assets/random.dat" with zstd if always);
flate!(pub static DATA2: [u8] from "assets/random.dat" with deflate if less_than_original);
flate!(pub static DATA3: [u8] from "assets/random.dat" with deflate if compression_ratio_more_than 10%);

flate!(pub static DATA4: [u8] from "assets/random.dat" if always);
flate!(pub static DATA5: [u8] from "assets/random.dat" if less_than_original);
flate!(pub static DATA6: [u8] from "assets/random.dat" if compression_ratio_more_than 10%);

flate!(pub static DATA7: str from "assets/chinese.txt" with zstd if always);
flate!(pub static DATA8: str from "assets/chinese.txt" with deflate if less_than_original);
flate!(pub static DATA9: str from "assets/chinese.txt" with deflate if compression_ratio_more_than 10%);

flate!(pub static DATA10: str from "assets/chinese.txt" if always);
flate!(pub static DATA11: str from "assets/chinese.txt" if less_than_original);
flate!(pub static DATA12: str from "assets/chinese.txt" if compression_ratio_more_than 10%);
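The three conditions above could be decided roughly as in the following sketch. It is not the code from this PR: the `Condition` enum and `should_compress` function are hypothetical names, and it assumes that "compression ratio" means the percentage of bytes saved relative to the original size, which is only one of several possible readings.

```rust
/// Hypothetical model of the three `if ...` conditions accepted by `flate!`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Condition {
    /// `if always`
    Always,
    /// `if less_than_original`
    LessThanOriginal,
    /// `if compression_ratio_more_than N%` (N is a whole percentage)
    CompressionRatioMoreThan(u64),
}

/// Decide whether the compressed form should be embedded.
/// Assumption: the ratio is "bytes saved / original bytes", in percent.
pub fn should_compress(cond: Condition, original_len: u64, compressed_len: u64) -> bool {
    match cond {
        Condition::Always => true,
        Condition::LessThanOriginal => compressed_len < original_len,
        Condition::CompressionRatioMoreThan(pct) => {
            // Bytes saved by compressing; zero if the output grew.
            let saved = original_len.saturating_sub(compressed_len) as u128;
            // saved / original > pct / 100, rearranged to avoid division
            // and widened to u128 to avoid overflow. An empty file saves
            // nothing, so the comparison is false for it.
            saved * 100 > (pct as u128) * (original_len as u128)
        }
    }
}
```

With this reading, a 100-byte file that compresses to 85 bytes passes a 10% threshold, while one that compresses to exactly 90 bytes does not, because the saving must be strictly more than the threshold.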

Features

  • Implement deflate_if! proc-macro in include-flate-codegen.

The deflate_if! macro is deliberately isolated from deflate_file! (and deflate_utf8_file!), because we do not want to (and should not) determine at runtime whether the file was actually compressed. deflate_if! is a proc-macro that evaluates the condition at compile time and expands to a boolean inside Lazy::new. Because the boolean is constant, the compiler can eliminate the unreachable branch, so this feature needs no extra work in the runtime code itself.
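A minimal sketch of the idea, using only the standard library. The `should_compress!` macro is a hypothetical stand-in for deflate_if! (a declarative macro that always expands to `false`), and `toy_decode` is a toy placeholder rather than the crate's real decoder.

```rust
// Hypothetical stand-in for a proc-macro such as `deflate_if!`: it expands to
// a boolean literal, so the decision is already fixed at compile time.
macro_rules! should_compress {
    () => {
        false
    };
}

// Stand-in for the raw embedded file (real code would use `include_bytes!`).
const RAW: &[u8] = b"hello";

// Toy "decoder" for illustration only: it simply reverses the bytes.
fn toy_decode(input: &[u8]) -> Vec<u8> {
    input.iter().rev().copied().collect()
}

// The condition is a literal, so the optimizer is free to drop whichever
// branch is unreachable; no check has to survive into the final binary.
pub fn load() -> Vec<u8> {
    if should_compress!() {
        toy_decode(b"olleh")
    } else {
        RAW.to_vec()
    }
}
```

Whether the dead branch is actually removed depends on the optimization level; the language only guarantees that both branches type-check.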

Note that even if deflate_if! evaluates to false and the compiler removes the entire decompression code, once_cell::Lazy and its related runtime code remain, though this is not significant for performance. Ideally we would emit a plain include_bytes! or include_str! (without Lazy::new) when compression should not be applied, but that would require a huge refactor of this crate's core design. I decided it is not worth doing compared to what this PR achieves.

Bug Fixes

  • The changes in "Add Zstd support as optional and compile-time low ratio report" (#25) did not actually take effect in the flate! macro for types other than str.
    • I forgot to pass the macro_rules! parameters into the proc-macro implementation, but it did not raise any errors because the parameter is entirely optional. I've added a test to ensure this never happens again.

Tests

  • Added tests for deflate_if! in tests/deflate-if.rs.
  • Added tests for selective compression methods (as pointed out in Bug Fixes) in tests/with-compress.rs.
  • Added tests for syntax check in tests/syntax.rs.

Misc

  • Added an example project in examples/flate.rs. It was especially useful to me for testing the actual binary with decompilers, but it should also be useful for people who want to use this crate.

@kkent030315
Contributor Author

kkent030315 commented Dec 26, 2023

I've tested `if always` conditional compression and `with xxx` selective compression methods against the compiled binary with decompilers in a Windows MSVC environment.
Everything works as expected.

  • with deflate if always should compress with deflate, with only the minimal deflate dependency in the binary.
  • with zstd if always should compress with zstd, with only the minimal zstd dependency in the binary.
  • if less_than 10 with a very small original file should never compress and should never add any runtime code other than once_cell::Lazy.

@kkent030315
Contributor Author

Also, I would suggest squash & merge when merging PRs, to avoid adding verbose, unmeaningful commits.

@SOF3
Owner

SOF3 commented Dec 27, 2023

Would it be a bit ambiguous to write if less_than 10? It is not immediately clear what we are comparing: raw buffer size, compressed buffer size, compression ratio/percentage, 1 - compression ratio/percentage, or what?

@kkent030315
Contributor Author

Would it be a bit ambiguous to write if less_than 10? It is not immediately clear what we are comparing: raw buffer size, compressed buffer size, compression ratio/percentage, 1 - compression ratio/percentage, or what?

Yes, that is what I thought as well. if compression_ratio_more_than 10% reads really well. However, appending %, e.g. if xxx 10%, breaks parsing as syn::LitInt. We could add custom parse logic there, but then we may lose the various benefits that syn::LitInt provides in a proc-macro. It is not worth it.
Other options:

  • if compression_ratio_more_than 10
  • if compression_ratio_more_than 10 %
  • if compression_ratio_more_than 10 percent

@kkent030315
Contributor Author

Well, a combination of LitInt and Token![%] made the if compression_ratio_more_than 10% style possible.
@SOF3 How does this look to you?

@SOF3
Owner

SOF3 commented Dec 28, 2023

originally I wanted to suggest changing more_than to >, but then I realized this is a slippery slope that would prompt for other features like && and || and grouping, at which point we would be implementing a DSL parser. So I suppose it's good enough rn.

};
$crate::decode_string($crate::codegen::deflate_utf8_file!($path $($algo)?), Some($crate::CompressionMethodTy(algo)))
// Evaluate the condition at compile time to avoid unnecessary runtime checks
if $crate::codegen::deflate_if!($path $($algo)? $($($threshold)+)?) {
Owner


Could we actually use conditional compilation here? While the compiler should optimize away the constant false branch, it seems weird that we are including the file into the program twice just so that we could remove it later.

Owner


E.g. can we make deflate_if! emit all() for true and any() for false instead?
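For context, `cfg(all())` with no predicates is always true and `cfg(any())` with no predicates is always false, so the two attributes act as compile-time "keep" and "drop" switches. The sketch below shows that behavior in isolation; the function name `embedded_form` is hypothetical. It does not show a macro producing the predicate, since `cfg` predicates follow a fixed grammar and, as far as I know, cannot contain a macro invocation directly.

```rust
// `cfg(all())` lists no predicates that could fail, so it is always true
// and this item is always compiled.
#[cfg(all())]
pub fn embedded_form() -> &'static str {
    "compressed"
}

// `cfg(any())` lists no predicates that could pass, so it is always false
// and this item is removed before type checking.
#[cfg(any())]
pub fn embedded_form() -> &'static str {
    "raw"
}
```

Only the first definition survives, so there is no duplicate-definition error and `embedded_form()` returns "compressed".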

Contributor Author


Nope. Unfortunately, conditional compilation does not support such statements.

Relying on compiler optimization with a constant boolean is the only way to achieve this without the huge code refactor described in the PR description.

That said, it is still possible by either:

  • making flate! itself a proc-macro, or
  • adding a proc-macro in the codegen crate that generates
    • if deflate_if! returns true:
      • $(pub $(($($vis)+))?)? static $name: $crate::Lazy<::std::vec::Vec<u8>> = $crate::Lazy::new(...); or
    • if deflate_if! returns false:
      • $(pub $(($($vis)+))?)? static $name: &[u8] = include_bytes!(...).

Either way is a much bigger refactor/codebase change than just letting the compiler optimize the branch out.
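To illustrate why the second option changes the crate's shape, here is a rough sketch of a macro that emits two different kinds of static. It is not the proposed implementation: `embed!`, `toy_decode`, `PACKED`, and `PLAIN` are hypothetical names, the caller picks the arm by hand instead of a proc-macro measuring the file, and `std::sync::LazyLock` stands in for once_cell's Lazy.

```rust
use std::sync::LazyLock;

// Toy "decoder" for illustration only: it simply reverses the bytes.
fn toy_decode(input: &[u8]) -> Vec<u8> {
    input.iter().rev().copied().collect()
}

// Hypothetical macro with one arm per outcome. A real implementation would
// choose the arm inside a proc-macro after inspecting the file.
macro_rules! embed {
    // Compressed case: a lazily decoded, heap-allocated buffer.
    (compressed $name:ident = $bytes:expr) => {
        static $name: LazyLock<Vec<u8>> = LazyLock::new(|| toy_decode($bytes));
    };
    // Uncompressed case: a plain byte slice with no lazy wrapper at all.
    (raw $name:ident = $bytes:expr) => {
        static $name: &[u8] = $bytes;
    };
}

embed!(compressed PACKED = b"olleh");
embed!(raw PLAIN = b"hello");

// The two statics have different types, so callers only see a uniform API
// through accessors like these.
pub fn packed() -> &'static [u8] {
    PACKED.as_slice()
}

pub fn plain() -> &'static [u8] {
    PLAIN
}
```

The point of the sketch is that the type of the generated static depends on the outcome, which is why this route touches every user of the macro rather than only the code inside Lazy::new.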

Contributor Author

@kkent030315 kkent030315 Dec 29, 2023


it seems weird that we are including the file into the program twice just so that we could remove it later.

Please note that deflate_if! does not actually deflate the file. It only decides whether the compression ratio meets the given threshold. So, as a result of compiler optimization, only a single copy of the file is embedded in the binary, not two.

@SOF3 SOF3 self-requested a review December 29, 2023 02:23
/// This macro evaluates to `true` if the file should be compressed, `false` otherwise, at compile time.
/// Useful for conditional compilation without any efforts to the runtime.
///
/// Please note that unlike the macro names suggest, this macro does **not** actually compress the file.
Owner


If this is only expected for internal use, should we #[doc(hidden)] the re-export? And shall we rename this to should_deflate, since this does not actually do the imperative deflate as the name suggests?

2 participants