Plateau: searching for a part of camelCased word does not yield results #23
Comments
Good point. It looks like the tokenizer should be improved to split camelCased words into distinct tokens (both in the index and in the query). cc @ronaldtse this is a search problem that could probably be delegated.
I am not sure that example shows a problem. @ReesePlews Regarding “park”, that is a problem, and I will rename the issue to reflect it. The problem is that “park” is considered part of a word (e.g., “DistributionBusinessPark”) and does not appear in the document as a standalone word, so either the entire word needs to be searched or, as you have noticed, a wildcard must be used for a partial match. We can address that by splitting camel-cased words into tokens and/or adding help text about wildcard support. This is not very typical for an English document (where “park” would be used numerous times, and therefore be found), but we should support this case.
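To make the camel-case proposal concrete, here is a minimal sketch in Python. It is not the project's tokenizer, which is not shown in this thread; `camel_tokens` and its regular expression are hypothetical names chosen for illustration. The idea is to emit the whole word plus each of its camel-cased parts, and to run the same function over both indexed text and query terms.

```python
import re

# Hypothetical pattern, not taken from the project's code:
#   [A-Z]+(?![a-z])  an acronym run such as "HTTP" in "HTTPServer"
#   [A-Z]?[a-z]+     a capitalized or lowercase word part such as "Park"
#   \d+              a run of digits
_CAMEL_PARTS = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def camel_tokens(word: str) -> list[str]:
    """Return the lowercased word followed by its lowercased camelCase parts."""
    whole = word.lower()
    parts = [p.lower() for p in _CAMEL_PARTS.findall(word)]
    # Keep the whole word first so exact-word queries still match,
    # and skip a part that merely repeats the whole word.
    return [whole] + [p for p in parts if p != whole]


print(camel_tokens("DistributionBusinessPark"))
# ['distributionbusinesspark', 'distribution', 'business', 'park']
```

With tokens like these in the index, a plain query for “park” would match without a wildcard. Keeping the unsplit word as a token is a design choice made here so that existing whole-word searches keep working.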
@ReesePlews The reason “c” is found in clause 4.25.4.6.9 is that “C” does appear as a standalone word in the table in that clause. In other circumstances it appears as part of words with numbers, and I assume the tokenizer splits a word into multiple tokens when it encounters a number in it. (E.g., in 6.3.1 it appears in “C01”, etc.)
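The assumed behaviour described above can be sketched as follows. This is only an illustration of the guess that the tokenizer breaks words at letter/digit boundaries; `split_on_digits` is a hypothetical stand-in, not the actual implementation.

```python
import re


def split_on_digits(word: str) -> list[str]:
    """Split at letter/digit boundaries and lowercase (assumed behaviour)."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+|\d+", word)]


print(split_on_digits("C"))                         # ['c']
print(split_on_digits("C01"))                       # ['c', '01']
print(split_on_digits("DistributionBusinessPark"))  # ['distributionbusinesspark']
```

Under this assumption, “c” becomes its own token in both “C” and “C01”, while “park” never does, which would be consistent with the results reported in this issue.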
Search is working much better in the dev branch than before, but there are still some questions about the search rules; the results are not always what one expects.
input of "c" returns
but input of "park" or "_park" returns
"*park" returns
How can a search for "park" pick up these matches and give better results?