-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
syscall filtering #14
Comments
Hi @shlomopongartz , thanks for your review, it's very appreciated ! first implementation using a table
My first implementation was an array of integers: KVM-VMI/kvm@9c3e588#diff-e4915d9c60ed6f8fec205d6349ae7ea3R21 And at each syscall, i had to find the syscall in this array: KVM-VMI/kvm@9c3e588#diff-4ab90127bb787cca2b557234276fbe47R162 So the more syscalls you had, the slower it gets to process a new syscall through the filter. An 8K tableWhat you suggest is a big 8K table for every possible syscall I guess that you define an offset for Windows and Linux maybe ? Windows has no offset, starts at 0, and Linux starts at 4K (right after Windows table). -> This forces us to add more Windows / Linux / OS logic in the kernel, and i wanted the implementation to be as simple and OS agnostic as possible. Why i choose an hastable ?At the beginning, what i wanted was a simple set structure. But i couldn't find any. (maybe could you, if you know the kernel better ?) So i fallback on the hashtable to get a constant O(1) access time, even if there is hash function overhead. Another reason is that i assume that the user only wants to monitor a small subset of all the syscalls available. (and it is the main use case for everyone using the filter) -> The best solution here would be to have a set that automatically resizes itself. mutexes
It's correct, i should protect the -> i will lock into the differences between a Can you explain a bit about the Thanks. |
Hi, I don't think you need to have different code for Linux and Windows guests as both pass the system call number in the RAX register and if you use a array of booleans then you are done. If you want you can pass a mask from user space which makes the kernel code generic, that is the kernel code will always mask the RAX value with the mask to obtain the syscall number. RCU is a little bit overkill as the updates are not done in the data path so RW locks are just fine. Best regards, S.P> |
Regarding the use of a simple array vs a hash table you have quite a valid point. Yet I realized a corner case to be considered which is affecting both the implementations. What would happen if the guest would be running a HW assisted virtualized guest itself? To pose the question in a more generic way: how do we tackle cases in which the guest OS might map more than a SysCall to the same ID? This is something the kernel can solve by simply looking at the context. In our case it might be a bit more complex though. |
I fixed Also, I will keep the With an 8K table, we would have to maintain a way to differentiate between Linux or Windows guest, and i don't see right now how this can be done easily. @noxdafox Regarding the Windows 10 Linux subsystem: Briefly: the Linux binary makes a syscall -> In Nitro, we would intercept it, but treat it as a Windows syscall unfortunately, since the logic that differentiate between the two is done after the syscall, in the kernel. |
Hi,
I have read the syscall filtering code and I have two comments.
One, In windows we have up to two SSDT (from 4 possible one bits 12-13) each of wich may have up to 4K entries (bits 0-11) and in Linux we have much less. So we may have up to 8K possible syscalls on the worst case. Now you have declared the hash table to be of size 4K entries (12 bits) so why not just use an 8K table and remove the hash table overhead.
Two, I see that you protect the insertion and deletion with mutex but you don't protect the "find" shouldn't you use RW lock, or better RCU. I also suggest using spinlock instead of mutex as mutex can cruse context switch.
Best regards,
S.P.
The text was updated successfully, but these errors were encountered: