Great search, but still a few things missing #1220

admtech · 2024-05-31T10:48:04Z

I found ParadeDB with "pg_search" and "pg_lakehouse" by chance and I am thrilled. You really fill a big gap in the PostgreSQL universe. It makes using one database for everything more realistic and feasible. Thanks a lot for that.

I have been working with "pg_search" for the last 24 hours and have integrated it into our website database. The performance is really very good, much faster than the standard PostgreSQL full-text search. I got stuck in a few places at first because of the documentation; in my opinion it needs a few more examples.

RANGE

For example, the documentation on range is very sparse (and its example "range => '[1,3)'::int4range" has an error). But after some trial and error, I got it to work.
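For readers less familiar with PostgreSQL range literals, the bracket notation used in the examples below decides whether each bound is included. A minimal sketch in plain PostgreSQL, with no pg_search functions involved; the values are illustrative only:

```sql
-- '[' and ']' include the bound, '(' and ')' exclude it.
SELECT 1 <@ '[1,3)'::int4range AS lower_bound_included,
       3 <@ '[1,3)'::int4range AS upper_bound_included;

-- Discrete range types such as daterange are stored in canonical '[)' form,
-- so an inclusive upper bound is shown as the following day.
SELECT '[2007-04-12,2007-04-13]'::daterange AS canonical_form;
```

The first query shows that 1 lies inside the range while 3 does not; the second prints the range with an exclusive upper bound of 2007-04-14.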

Here are some suggestions for additions to the documentation (our index: search_content):

tsrange:

SELECT id, title, stamp_create, paradedb.rank_bm25(id) FROM search_content.search(
  query => paradedb.range(
    field => 'stamp_create',
    range => '["2007-04-12 09:23:27","2007-04-12 10:23:27"]'::tsrange
  ),
  stable_sort => true
);

daterange:

SELECT id, title, stamp_create, paradedb.rank_bm25(id) FROM search_content.search(
  query => paradedb.range(
    field => 'stamp_create',
    range => '["2007-04-12","2007-04-13"]'::daterange
  ),
  stable_sort => true
);

Combinations:

SELECT id, title, stamp_create, paradedb.rank_bm25(id) FROM search_content.search(
  query => paradedb.boolean(
    SHOULD => ARRAY[
      paradedb.parse('title:"Proxmox routing"'),
      paradedb.parse('title:Proxmox OR title:routing'),
      paradedb.fuzzy_term(field => 'title', value => 'Proxmox routing')
    ],
    MUST => ARRAY[
      paradedb.range(field => 'stamp_create', range => '["2022-01-01","2024-01-01"]'::daterange)
    ]
  ),
  limit_rows => 10
);

SORT

By default, "pg_search" always sorts by BM25 score. In general, relevance is of course the first and most important sort order. However, our users have asked for other sort orders, e.g. by date: show all content that matches the search terms, but ordered by date rather than by relevance.

This does not work well at the moment because the search cannot sort by an additional field. Even if a TIMESTAMP field, e.g. "stamp", is available in the index, I cannot use it for sorting. This means that I have to fetch every match from the index first and then re-sort with ORDER BY, which performs poorly with many sessions.
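The re-sort workaround described above can be sketched roughly as follows. It reuses the search_content index and the stamp_create column from the earlier examples and assumes that search() returns every match when limit_rows is omitted; the ordering is done by plain PostgreSQL after the search, not by pg_search itself:

```sql
-- Sketch only: fetch all matches, then let PostgreSQL order them by date.
-- The sort runs over the full result set, which is the cost described above.
SELECT id, title, stamp_create
FROM search_content.search(
  query => paradedb.parse('title:Proxmox OR title:routing')
)
ORDER BY stamp_create DESC  -- date order instead of BM25 relevance
LIMIT 10;
```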

Maybe I missed something; I would be grateful for a hint. Alternatively, I can open a new feature request.

INDEX: numeric_fields arrays

INT[] and BIGINT[] arrays do not yet work for numeric_fields, so it is not possible to filter on INT or BIGINT arrays. New feature request?

TOKENIZERS

It is not possible to create your own mixed tokenizers or to apply multiple tokenizers to a field, for example a tokenizer that supports both "ngrams" and "en_stem". This may lead to better search results. There is already an issue for this: #575. Is there a workaround in the meantime?

STEMMER

Support for multiple languages. There is already an issue for this: #1062

FOUND COUNTER

A search normally comes with a counter for the total number of records found, even when limit or offset is used. I have not found such a counter here yet. Do you have any tips on how to get it without a second query?
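One plain-PostgreSQL option that might be worth trying is a window count, sketched below. It is not a pg_search feature, and it assumes that search() returns every match when limit_rows is omitted, so the same full-scan cost as re-sorting applies. The alias total_found is a name chosen for illustration:

```sql
-- Sketch only: count(*) OVER () is evaluated before LIMIT/OFFSET is applied,
-- so each returned row carries the total number of matches.
SELECT id, title, stamp_create,
       count(*) OVER () AS total_found
FROM search_content.search(
  query => paradedb.parse('title:Proxmox OR title:routing')
)
ORDER BY stamp_create DESC
LIMIT 10 OFFSET 0;
```

Note that passing limit_rows inside search() would presumably cap the count as well, which is why the limit sits in the outer query here.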

Finally, the only remaining question is: How big is my index? There is already an issue for that: #1061

Thanks and keep up the good work. We will definitely test the search and integrate it in the future.

Regards
Frank from Germany

philippemnoel (Maintainer) · 2024-06-01T02:18:08Z

Hi Frank! Thank you for reporting this. You are right that there are several areas where the documentation is sparse and confusing. I apologize.

Regarding the sort, it would be wonderful if you opened a feature request.

On the docs suggestion, if you'd like to open a PR and contribute it yourself, we would be honored to have your contribution. You can find the documentation in paradedb/paradedb under the docs/ folder. If you prefer not to, I'll take care to add your feedback to the documentation.

Please let us know how else we can help, and thank you for your kind words :)

1 reply

philippemnoel Dec 19, 2024
Maintainer

All of these should be handled by now!
