Commit

Add extract_article_urls_from_page_plus_fun to README.md
dlazesz committed Nov 29, 2021
1 parent d33a3b7 commit c28cb85
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -62,6 +62,7 @@ Python functions:
- `portal_specific_exctractor_functions_file`: The filename pointing to the Python file which contains the required extractor functions
- `extract_next_page_url_fun` (it can be NULL): The name of the function to be imported from the `portal_specific_exctractor_functions_file` to extract the "next page URL"
- `extract_article_urls_from_page_fun`: The name of the function to be imported from the `portal_specific_exctractor_functions_file` to extract the article URLs from the archive page
- `extract_article_urls_from_page_plus_fun`: The name of the function to be imported from the `portal_specific_exctractor_functions_file` to extract the article URLs from the archive page together with metadata from the portal's archive (for `checkurls` mode)
- `next_page_of_article_fun` (it can be NULL): The name of the function to be imported from the `portal_specific_exctractor_functions_file` if there are multipage articles. This function extracts the "next page URL" for the rest of the pages in a multipage article. (It must be used with `MultiPageArticleConverter` or similar as `corpus_converter` to work!)
- `corpus_converter_file`: The filename pointing to the Python file which contains the required corpus extractor class
- `corpus_converter`: The name of the class to be imported from the `corpus_converter_file`. The default is to do nothing (`dummy-converter`).
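
The relationship between `extract_article_urls_from_page_fun` and its `_plus` variant can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the project: the function names, the single raw-HTML argument, the return types (a set of URLs versus a dict mapping each URL to its metadata), and the `article-link` class in the archive markup are all hypothetical. Check the project's own extractor files for the actual contract.

```python
from html.parser import HTMLParser


class _ArticleLinkParser(HTMLParser):
    """Collect href and link text of <a class="article-link"> elements.

    The "article-link" class is a hypothetical piece of archive markup.
    """

    def __init__(self):
        super().__init__()
        self.links = {}       # URL -> metadata dict
        self._current = None  # URL of the link currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'a' and 'article-link' in classes and attrs.get('href'):
            self._current = attrs['href']
            self.links[self._current] = {'title': ''}

    def handle_data(self, data):
        # Text inside a matching link is stored as its title
        if self._current is not None:
            self.links[self._current]['title'] += data.strip()

    def handle_endtag(self, tag):
        if tag == 'a':
            self._current = None


def extract_article_urls_from_page_plus(archive_page_raw_html):
    """Assumed shape: return {article URL: metadata found on the archive page}."""
    parser = _ArticleLinkParser()
    parser.feed(archive_page_raw_html)
    parser.close()
    return parser.links


def extract_article_urls_from_page(archive_page_raw_html):
    """Assumed shape: return only the set of article URLs."""
    return set(extract_article_urls_from_page_plus(archive_page_raw_html))
```

In this sketch the plain variant is derived from the `_plus` variant, so the two cannot disagree about which URLs appear on an archive page.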
