Search is currently broken for finding specific names or people (search is too smart in a bad way?) #739

Open
ffeldner opened this issue Jan 24, 2024 · 2 comments

ffeldner commented Jan 24, 2024

Hi,

The media.ccc.de page has its talk authors hyperlinked. For example, on this talk: https://media.ccc.de/v/37c3-11782-smtp_smuggling_spoofing_e-mails_worldwide - the author is called Timo Longin, thus the listed speaker has a hyperlink to https://media.ccc.de/search?p=Timo+Longin

However, the way the search works, it looks for both tokens separately, so I get a wagonload of Timos and other talks.

Even when figuring out "hey, the smart solution is to search manually for the rarest token of the name, which is clearly the surname, Longin" this gets torpedoed by a "smart" search feature that I guess is there to filter out typos?

The query https://media.ccc.de/search/?q=longin returns a wagonload of results because they contain the word "login", plus a talk by a person called Longtin, so each search word seems to get completely taken apart and filled with single-character placeholders or something.
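
To illustrate why "login" and "Longtin" could show up, here is a minimal sketch that assumes the q search applies some form of edit-distance (fuzzy) matching. That is only a guess from the observed results, not something confirmed from the code. The `edit_distance` helper is written for this example and computes the plain Levenshtein distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute), case-insensitive."""
    a, b = a.lower(), b.lower()
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


# Both unwanted hits are a single edit away from the search term.
for candidate in ("login", "Longtin", "Longin"):
    print(candidate, edit_distance("longin", candidate))
```

If the search tolerates even one edit per term, both unwanted hits are within range of "longin", which would match what I am seeing.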

This does not happen when changing the q to a p (https://media.ccc.de/search/?p=longin) and using only Longin, because apparently query search tries to be smart, while person search tries to be accurate. But it would still be beneficial to allow person search to search for a name containing spaces without splitting it up.

Also, switching parameters (p instead of q) by manually rewriting the URL after a search is neither documented nor offered, so unless one knows that person search exists, one will not think to use it from the search field on media.ccc.de.

The preferred fix would be to implement a way to search for an entire name, and then use that for the hyperlinks on the speaker names on videos. A band-aid fix would be to use the search/?p= person search with only the surnames of speakers to narrow it down.
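
A rough sketch of how the speaker link could be built in both variants. `person_search_url` is a hypothetical helper, not code from the site, and taking the last whitespace-separated token as the surname is a naive assumption that will not hold for every name.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://media.ccc.de/search/"


def person_search_url(name: str, surname_only: bool = False) -> str:
    """Build a person-search link; hypothetical helper for illustration."""
    parts = name.split()
    # Naive assumption: the surname is the last whitespace-separated token.
    term = parts[-1] if surname_only and parts else name
    return f"{SEARCH_URL}?{urlencode({'p': term})}"


print(person_search_url("Timo Longin"))
print(person_search_url("Timo Longin", surname_only=True))
```

The first call produces the current style of link (`p=Timo+Longin`), the second the band-aid variant (`p=Longin`).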

@evilscientress

We just analyzed the issue a bit more and it seems like the issue lies with the Elasticsearch query.

The query uses a multi_match of type best_fields, which doesn't do phrase matching, so it splits the query term at whitespace. The query should rather use the type phrase, which would try to match the full name, or a combination of both with phrase matching boosted.
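
As a sketch of the three options, the request bodies could look roughly like this when built as Python dicts. The field names in `FIELDS` and the boost value are placeholders chosen for illustration and are not taken from the actual index or code. `multi_match`, its `best_fields` and `phrase` types, and the `boost` parameter are standard Elasticsearch query DSL.

```python
import json

# Hypothetical field names; the real index mapping may differ.
FIELDS = ["persons", "title", "description"]


def best_fields_query(text: str) -> dict:
    """Current behaviour as described: terms are matched individually."""
    return {"query": {"multi_match": {
        "query": text, "type": "best_fields", "fields": FIELDS}}}


def phrase_query(text: str) -> dict:
    """Match the terms as one phrase, in order."""
    return {"query": {"multi_match": {
        "query": text, "type": "phrase", "fields": FIELDS}}}


def combined_query(text: str, phrase_boost: float = 5.0) -> dict:
    """Keep term matching, but rank phrase matches higher (boost is a guess)."""
    return {"query": {"bool": {"should": [
        {"multi_match": {
            "query": text, "type": "best_fields", "fields": FIELDS}},
        {"multi_match": {
            "query": text, "type": "phrase", "fields": FIELDS,
            "boost": phrase_boost}},
    ]}}}


print(json.dumps(combined_query("Timo Longin"), indent=2))
```

With the combined variant, talks by other Timos would still be returned, but talks matching the full name should score higher and sort first.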

It's debatable, though, whether the search should split the search term at whitespace at all, because it will then return other speakers who, for example, share the same first name, which is not what the user expects when clicking on a speaker name.

rofl0r commented Feb 7, 2024

> The query should rather use the type phrase, which would try to match the full name, or a combination of both with phrase matching boosted.

Sounds good. Maybe this could be changed this way for some testing?

> It's debatable though if the search should try to split up the search term at white spaces at all

If it is split, it should only return results where all terms match, not any.
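
A minimal sketch of the "all terms" variant, assuming the query stays a multi_match: the standard `operator` parameter can be set to `and`. The `matches` function and the speaker list are only a local illustration of the difference. "Timo Someone" is an invented placeholder, not a real speaker.

```python
def all_terms_query(text: str, fields: list[str]) -> dict:
    """multi_match that requires every term of `text` to match (operator "and")."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "best_fields",
                "fields": fields,
                "operator": "and",
            }
        }
    }


def matches(name: str, query: str, require_all: bool) -> bool:
    """Local illustration only: compare "any term" with "all terms" matching."""
    tokens = name.lower().split()
    hits = [term in tokens for term in query.lower().split()]
    return all(hits) if require_all else any(hits)


# "Timo Someone" is an invented placeholder name, not a real speaker.
speakers = ["Timo Longin", "Timo Someone"]
print([s for s in speakers if matches(s, "Timo Longin", require_all=False)])
print([s for s in speakers if matches(s, "Timo Longin", require_all=True)])
```

Note that with best_fields the `and` operator is applied per field, so all terms have to occur in the same field, which seems fine for a speaker name.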
