Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests are failing #4

Open
Janmajayamall opened this issue May 11, 2024 · 9 comments
Open

Tests are failing #4

Janmajayamall opened this issue May 11, 2024 · 9 comments

Comments

@Janmajayamall
Copy link

Hello, thanks for the implementation.

However, few tests are failing after #3 merged.

I am running the tests on m1 mac and can confirm tests pass till #2.

Test summary:

//hintless_simplepir:client_test                                         PASSED in 1.0s
//hintless_simplepir:database_benchmarks                                 PASSED in 0.8s
//hintless_simplepir:database_hwy_benchmarks                             PASSED in 1.3s
//hintless_simplepir:database_hwy_test                                   PASSED in 2.2s
//hintless_simplepir:database_test                                       PASSED in 2.5s
//hintless_simplepir:hintless_simplepir_benchmarks                       PASSED in 0.9s
//hintless_simplepir:server_test                                         PASSED in 10.6s
//hintless_simplepir:simplepir_test                                      PASSED in 0.7s
//hintless_simplepir:utils_test                                          PASSED in 1.9s
//linpir:client_test                                                     PASSED in 2.3s
//linpir:linpir_benchmarks                                               PASSED in 1.8s
//lwe:lwe_symmetric_encryption_test                                      PASSED in 0.8s
//lwe:sample_error_test                                                  PASSED in 0.6s
//hintless_simplepir:hintless_simplepir_test                             FAILED in 288.1s
//linpir:database_test                                                   FAILED in 123.6s
//linpir:linpir_test                                                     FAILED in 62.4s
//linpir:server_test                                                     FAILED in 137.2s
@Janmajayamall
Copy link
Author

Let me know if I should open a different issue to discuss the following

When records do not fit in plaintext modulus of SimplePIR, why are multiple shards created instead of stacking the records vertically within a single row, latter being more suitable for SimeplePIR? The corresponding paper says that for longer records, records were split and vertically stacked within single column across multiple rows. Which makes sense, given the download cost increases linearly with no. of shards.

I am asking just in case you found that it is more suitable for some parameter to shard the database?

@b5li
Copy link
Collaborator

b5li commented May 13, 2024

Thanks for reporting the issue! Can you please provide the build environment in which the tests failed? That could help to debug the failure.

Regarding your second comment: Yes, stacking records vertically would save download cost. By allowing to shard the database one could have the option to save on the upload cost. You can adjust the record size and the number of rows/columns to configure the database to achieve the best tradeoffs that make sense for the application.

@mayank0403
Copy link

mayank0403 commented Nov 11, 2024

Hi,

Thanks for the great work!

The tests are still failing. Dockerfile with my build env:

# Use --platform to force x86_64
FROM --platform=linux/amd64 ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

# Install essential packages including all development libraries
RUN apt-get update && apt-get install -y \
    build-essential \
    cmake \
    git \
    clang \
    llvm \
    libssl-dev \
    pkg-config \
    curl \
    gpg \
    gnupg \
    apt-transport-https \
    ca-certificates \
    software-properties-common \
    wget \
    python3 \
    g++ \
    unzip \
    openssl \
    libgsl-dev \
    libgslcblas0 \
    && rm -rf /var/lib/apt/lists/*


# Install Microsoft's GSL
RUN git clone https://github.com/microsoft/GSL.git && \
    cd GSL && \
    cmake -B build . && \
    cmake --build build --target install
# Add Bazel repository and key
RUN curl -fsSL https://bazel.build/bazel-release.pub.gpg | gpg --dearmor > /usr/share/keyrings/bazel-archive-keyring.gpg && \
    echo "deb [arch=amd64 signed-by=/usr/share/keyrings/bazel-archive-keyring.gpg] https://storage.googleapis.com/bazel-apt stable jdk1.8" | tee /etc/apt/sources.list.d/bazel.list

# Install Bazel
RUN apt-get update && \
    apt-get install -y bazel

WORKDIR /workspace

COPY . .

CMD ["/bin/bash"]

Looks like the recovered record doesn't match the DB.

Any ideas on how to fix this?

@mayank0403
Copy link

mayank0403 commented Nov 11, 2024

After looking more into it, it seems that one of the places where the issue is are these functions:
FusedAbsorbAddInPlaceWithoutPadLazily() and MergeLazyOperations(). The easiest test case to look at is //linpir:database_test. Moreover, if you look at the files changed in #3, only one file was changed under //linpir and it was linpir/database.cc where lazy multiply and add was introduced, and it broke the code. The moment I switch back to non-lazy variant, the test passes. For your reference, I have attached a screenshot below which shows the old code which passes vs the new code that fails the test //linpir:database_test.

@b5li I hope this helps in finding the root cause of the issue. I don't know the reason why all the other tests are failing, but it seems that most of these issues come from functions in SHELL.

Image
The code on the left passes; the code on the right fails.

@b5li
Copy link
Collaborator

b5li commented Nov 12, 2024

Thanks for the debugging tips! Can you try the following to see if the test passes:

bazel run --cxxopt='-std=c++17' --cxxopt='-DHWY_COMPILE_ONLY_SCALAR=1' //linpir:database_test

This will run skip the highway SIMD code and take the slow path instead.

@mayank0403
Copy link

Thanks a lot for the tip! The tests pass with the slower scalar option.

For the faster SIMD code, are there any current plans on fixing the issue? We are working on a follow-up work and would greatly appreciate if we can run the SIMD code for our evaluation.

@b5li
Copy link
Collaborator

b5li commented Nov 12, 2024

Good to know it's the SIMD code that broke the test. What CPU model were you using? I tried on Intel Tiger Lake and Ice Lake, and also an AMD Milan processor, but haven't been able to reproduce the failure.

@mayank0403
Copy link

I am using Macbook (Apple M1 Pro) with Docker (platform=linux/amd64 ubuntu:22.04).

@b5li
Copy link
Collaborator

b5li commented Nov 19, 2024

Sorry about the late response. @mayank0403 I see that you are running an x86-64 linux docker on an Apple M1 CPU, which seems like a complicated build environment and may confuse the compiler / runtime about which SIMD instruction target to use.

It looks like compiler support for SVE instructions on Apple M CPUs might be buggy (see e.g. google/highway#2356), so there might be issues even compiling and running natively on M1 machines. It could also be the case that some uint64_t specific highway code in SHELL might not be supported on those CPUs. Unfortunately I don't have access to Apple M CPU yet, so I wasn't able to tell what exactly the problem is. If you have more information about the failure (e.g. trace or exit code), it would be helpful to identify the root cause.

At the meantime I would also recommend to test on x86-64 CPUs. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants