Releases: ELTE-DH/WebArticleCurator
v1.7.4
Fix making Newspaper3k an optional dependency
v1.7.2
Make Newspaper3k an optional dependency
v1.7.1
Fix bad (albeit standards-compliant) encoding detection in requests
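For context: when a `text/*` response has no `charset` in its Content-Type header, requests falls back to ISO-8859-1 as the old HTTP/1.1 specification prescribes. That is "standard" but usually wrong for modern news portals. The sketch below shows one possible fallback order (header charset, then an HTML `<meta charset>` declaration, then UTF-8). It is a hypothetical helper written for illustration, not the code of this release; the function name and the 2048-byte scan window are assumptions.

```python
import re
from email.message import Message

# Hypothetical pattern: finds charset=... inside a <meta> tag (HTML5 or http-equiv form).
META_CHARSET_RE = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def detect_encoding(content_type: str, body: bytes, default: str = "utf-8") -> str:
    """Pick a text encoding for an HTTP response body (hypothetical helper).

    1. Use an explicit charset from the Content-Type header, if present.
    2. Otherwise look for a <meta charset=...> declaration near the top of the HTML.
    3. Otherwise return `default` instead of ISO-8859-1.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased charset parameter, or None
    if charset:
        return charset
    match = META_CHARSET_RE.search(body[:2048])  # assumption: meta tag is near the start
    if match:
        return match.group(1).decode("ascii").lower()
    return default
```

For example, `detect_encoding("text/html", b'<meta charset="iso-8859-2">')` returns `"iso-8859-2"`, where plain requests would have reported ISO-8859-1.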
v1.7.0
Improve checkurls mode (add extract_article_urls_from_page_plus_fun)
Better config validation
v1.6.0
Add checkurls mode (to debug portals)
Add negative sampling (to create a new WARC file by omitting some URLs)
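Negative sampling, as described above, keeps every archived page except the listed URLs. A minimal sketch of that filtering step, assuming a hypothetical `Record` type in place of real WARC records (the actual tool reads and writes WARC files, which this sketch does not do):

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set


@dataclass
class Record:
    """Hypothetical stand-in for one WARC response record."""
    url: str
    payload: bytes


def negative_sample(records: Iterable[Record], omit_urls: Set[str]) -> Iterator[Record]:
    """Yield every record whose URL is NOT in omit_urls.

    URLs in omit_urls that never occur in the input are simply ignored.
    """
    for record in records:
        if record.url not in omit_urls:
            yield record
```

Writing the yielded records to a new file gives the reduced archive; using a set keeps each membership check O(1).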
v1.5.4
Fix typos and add a better exception message for the page-numbering config
v1.5.3
Fix site schema for the new Yamale version
v1.5.2
Fix sampling: Handle non-existent URLs properly
v1.5.1
Fix iterating news archives by month or year
v1.5.0
Enable iterating news archives by month or year
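Iterating an archive by month or year means generating one archive-page URL per period instead of per day. A minimal sketch of that idea, assuming a hypothetical URL template with `{year}` and `{month}` placeholders (WebArticleCurator's real config keys and URL handling may differ):

```python
from datetime import date
from typing import Iterator


def archive_urls(template: str, start: date, end: date, granularity: str = "month") -> Iterator[str]:
    """Yield one archive URL per month or per year from start to end, inclusive.

    `template` is a hypothetical str.format pattern, e.g.
    "https://example.com/archive/{year}/{month:02d}".
    """
    if granularity == "year":
        for year in range(start.year, end.year + 1):
            yield template.format(year=year)
    elif granularity == "month":
        year, month = start.year, start.month
        # Compare (year, month) tuples so the day of month is ignored.
        while (year, month) <= (end.year, end.month):
            yield template.format(year=year, month=month)
            month += 1
            if month > 12:
                year, month = year + 1, 1
    else:
        raise ValueError(f"unsupported granularity: {granularity!r}")
```

For instance, a range from 2019-11-05 to 2020-02-01 with monthly granularity yields four URLs, crossing the year boundary correctly.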