urljoin - broken links, duplication of the last base url segment #4376

Open
alexbelyaev opened this issue Dec 27, 2024 · 6 comments
Labels
Bug-Report Confirmed bug report

Comments

@alexbelyaev

It looks like there is an issue in the urljoin function.
When a relative URL is appended to a base URL, the last segment of the base path is duplicated, as in this case:

Base url + relative href = broken link with duplication
https://www.jobs-oberlausitz.de/stelle/ + stelle/detail/36993 =
https://www.jobs-oberlausitz.de/stelle/stelle/detail/36993
Correct:
https://www.jobs-oberlausitz.de/stelle/detail/36993

To Reproduce

CSS Selector bridge

URL:
https://www.jobs-oberlausitz.de/stelle/suche?filter%5Bregion_id%5D%5B0%5D=1&filter%5Btyp%5D%5B0%5D=Vollzeit&filter%5Bmerkmal_id%5D%5B0%5D=&Freitextsuche=&orderby=Datum&seitennummer=1

Article link selector (<a href="stelle/detail/37574" ... ):
#siteobjects-middle > form > div > div.siteobject-stellenliste > table > tbody > tr

Resulting links:

https://www.jobs-oberlausitz.de/stelle/stelle/detail/37574
https://www.jobs-oberlausitz.de/stelle/fileman/imgsc/fitheight/50/firmen/logo_64_St_dtisches_Klinikum_G_rlitz_gGmbH.png

Or:

urljoin( 'https://www.jobs-oberlausitz.de/stelle/', 'stelle/detail/36993')

Expected behavior

Result links:

https://www.jobs-oberlausitz.de/stelle/detail/37574
https://www.jobs-oberlausitz.de/fileman/imgsc/fitheight/50/firmen/logo_64_St_dtisches_Klinikum_G_rlitz_gGmbH.png

Desktop (please complete the following information):
Win 10
Chrome Version 131.0.6778.205 (Official Build) (64-bit)

@alexbelyaev alexbelyaev added the Bug-Report Confirmed bug report label Dec 27, 2024
@dvikan
Contributor

dvikan commented Dec 29, 2024

The first example looks correct to me. The relative part is appended to the base.
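As a cross-check, here is a minimal sketch using Python's standard `urllib.parse.urljoin` rather than the PHP `urljoin` that RSS-Bridge uses; it is assumed here only as an independent implementation of the same relative-reference rules. The page URL is a shortened form of the search URL from the report.

```python
from urllib.parse import urljoin

# Base given explicitly as a "directory" (trailing slash), as in the report.
joined_from_dir = urljoin(
    "https://www.jobs-oberlausitz.de/stelle/", "stelle/detail/36993"
)

# Base given as the page URL; the last path segment ("suche") and the query
# are dropped before the relative path is appended.
joined_from_page = urljoin(
    "https://www.jobs-oberlausitz.de/stelle/suche?seitennummer=1",
    "stelle/detail/37574",
)

print(joined_from_dir)   # https://www.jobs-oberlausitz.de/stelle/stelle/detail/36993
print(joined_from_page)  # https://www.jobs-oberlausitz.de/stelle/stelle/detail/37574
```

Both results contain `/stelle/stelle/`, which matches what the bridge produces for these inputs.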

@alexbelyaev
Author

alexbelyaev commented Jan 1, 2025

But the browser resolves it differently, so the resulting link is broken in this scenario even though it works on the original page (in Chrome). Here is part of GPT's explanation: "When a relative URL is appended to a base URL, it does not duplicate the last segment of the base path. Instead, it appends the relative URL directly to the base URL's directory." I've attached GPT's version of the library for study purposes.
urljoin.zip

@dvikan
Contributor

dvikan commented Jan 1, 2025

base: http://example.com/foo/bar/
relative: bar

Do you think it's correct to resolve this to http://example.com/foo/bar/ ?

See https://github.com/fluffy-critter/php-urljoin/blob/main/tests/cases.json for test suite
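To make the question concrete, a small sketch with Python's `urllib.parse.urljoin` (an assumption for illustration, not the PHP library linked above) shows how the trailing slash on the base changes the outcome:

```python
from urllib.parse import urljoin

# Trailing slash: "bar/" is treated as a directory, so "bar" is appended to it.
with_slash = urljoin("http://example.com/foo/bar/", "bar")

# No trailing slash: the last segment "bar" is replaced by the relative "bar".
without_slash = urljoin("http://example.com/foo/bar", "bar")

print(with_slash)     # http://example.com/foo/bar/bar
print(without_slash)  # http://example.com/foo/bar
```

Under these rules the base `http://example.com/foo/bar/` plus `bar` does not collapse back to `http://example.com/foo/bar/`.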

@alexbelyaev
Author

alexbelyaev commented Jan 1, 2025

I don't understand why the links on that page resolve the way they do (or how they should). But the resulting link in rss-bridge differs from what I get in the browser on that page, so I assumed the library was not taking something into account.

[Two screenshots attached: "url" and "url2"]

@dvikan
Contributor

dvikan commented Jan 1, 2025

The base url is https://www.jobs-oberlausitz.de/ as specified in the html tag <base>.

There still might be a bug here, but it's not in urljoin. I think the bug is that RSS-Bridge does not respect the <base> tag in HTML pages.
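A rough sketch of what respecting `<base>` could look like, written in Python for illustration only; it is not RSS-Bridge code. The names `BaseHrefFinder`, `effective_base`, and `resolve_link` are hypothetical, and the sample HTML is a made-up minimal page, not the real site's markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseHrefFinder(HTMLParser):
    """Records the href of the first <base> element, if any."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        # Only the first <base href="..."> in the document is used.
        if tag == "base" and self.base_href is None:
            href = dict(attrs).get("href")
            if href:
                self.base_href = href


def effective_base(page_url, html):
    """Return the <base> href (resolved against the page URL), else the page URL."""
    finder = BaseHrefFinder()
    finder.feed(html)
    if finder.base_href is None:
        return page_url
    return urljoin(page_url, finder.base_href)


def resolve_link(page_url, html, href):
    """Resolve a relative href against the document's effective base URL."""
    return urljoin(effective_base(page_url, html), href)


# Hypothetical minimal page that declares the base mentioned above.
page_url = "https://www.jobs-oberlausitz.de/stelle/suche?seitennummer=1"
sample_html = (
    '<html><head><base href="https://www.jobs-oberlausitz.de/"></head>'
    '<body><a href="stelle/detail/37574">job</a></body></html>'
)

print(resolve_link(page_url, sample_html, "stelle/detail/37574"))
# https://www.jobs-oberlausitz.de/stelle/detail/37574
```

With the `<base>` element taken into account, the relative link resolves to the URL given under "Expected behavior"; without it, the same call falls back to the page URL and yields the `/stelle/stelle/` form.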

@dvikan
Contributor

dvikan commented Jan 3, 2025

@ORelio @LarsStegman
