Releases: ELTE-DH/WebArticleCurator

v1.7.4

30 Nov 19:04

Fix the change that made Newspaper3k an optional dependency

v1.7.2

30 Nov 18:19

Make Newspaper3k an optional dependency
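
A common way to make a package such as Newspaper3k optional in Python is to guard the import and raise an error only when the feature is actually used. The sketch below illustrates that general pattern and is not the project's actual code; `load_optional` and `require_newspaper` are hypothetical names.

```python
import importlib


def load_optional(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require_newspaper():
    """Hypothetical helper: return the Newspaper3k module or fail clearly.

    Newspaper3k is installed as "newspaper3k" but imported as "newspaper".
    """
    module = load_optional("newspaper")
    if module is None:
        raise RuntimeError(
            "Newspaper3k is not installed; install it to use this feature."
        )
    return module
```

With this layout the rest of the package imports cleanly without Newspaper3k, and only the code paths that call `require_newspaper()` need it.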

v1.7.1

30 Nov 15:44

Fix wrong (though standard-conforming) encoding detection inherited from requests
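
For context: when a text response declares no charset, requests falls back to ISO-8859-1, as RFC 2616 prescribes, which garbles UTF-8 pages. The sketch below shows one possible workaround using only the standard library. It is an assumption about the kind of fix involved, not the project's actual code, and `declared_charset` and `decode_body` are hypothetical names.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset named in a Content-Type header, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(body, content_type):
    """Decode bytes, trusting the header only if it names a charset."""
    charset = declared_charset(content_type)
    if charset is not None:
        try:
            return body.decode(charset, errors="replace")
        except LookupError:
            pass  # Unknown codec name: fall through to the guesses below.
    try:
        # No usable declaration: try UTF-8 first instead of ISO-8859-1.
        return body.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte, so this last resort cannot fail.
        return body.decode("iso-8859-1")
```

With requests itself, a comparable effect can be had by setting `response.encoding = response.apparent_encoding` when the `Content-Type` header carries no charset.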

v1.7.0

29 Nov 17:59

Improve checkurls mode (add extract_article_urls_from_page_plus_fun)
Better config validation

v1.6.0

27 Nov 09:42

Add checkurls mode (to debug portals)
Add negative sampling (to create a new WARC by omitting some URLs)

v1.5.4

25 Oct 08:09

Fix typos and add better exception message for page numbering config

v1.5.3

13 Oct 17:12

Fix site schema for new yamale version

v1.5.2

14 Sep 16:02

Fix sampling: Handle non-existent URLs properly

v1.5.1

26 Aug 17:13

Fix iterating news archives by month or year

v1.5.0

24 Aug 14:59

Enable iterating news archives by month or year
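
As a rough illustration of stepping through an archive by month or by year, the sketch below generates one date per period between two bounds. It is a minimal sketch under assumptions, not the project's implementation; `iter_months` and `iter_years` are hypothetical names.

```python
from datetime import date


def iter_months(first, last):
    """Yield the first day of every month from first to last, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            # Roll over from December to January of the next year.
            year, month = year + 1, 1


def iter_years(first, last):
    """Yield January 1 of every year from first to last, inclusive."""
    for year in range(first.year, last.year + 1):
        yield date(year, 1, 1)
```

Each yielded date could then be formatted into an archive URL, for example with `"{:%Y/%m}".format(d)`, depending on how the portal lays out its archive pages.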