About the output
Samuel Depardieu edited this page Jun 6, 2018
The output file contains a number of different fields, which can vary depending on the scraped provider.
It will always have the following attributes, though:
title: a string containing the document title
uri: the URL of the document
pdf: the name of the file
sections: a JSON object keyed by section name, containing the text extracted from each matching section
keywords: a JSON object keyed by keyword, containing the text extracted from the matching text
hash: an MD5 digest of the file
provider: the provider from which the file was downloaded
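
As an illustration, here is a minimal Python sketch of what one output record might look like, together with a check that the always-present attributes are there. Every value is made up for the example, and the helper names (`REQUIRED_FIELDS`, `missing_fields`, `md5_digest`) are hypothetical. The exact shape of the `sections` and `keywords` values, and the assumption that the hash is a hexadecimal digest of the file's raw bytes, are guesses rather than documented behaviour.

```python
import hashlib
import json

# Attributes this page lists as always present in the output.
REQUIRED_FIELDS = {"title", "uri", "pdf", "sections", "keywords", "hash", "provider"}


def missing_fields(record):
    """Return the sorted names of always-present attributes absent from record."""
    return sorted(REQUIRED_FIELDS - set(record))


def md5_digest(data):
    """Hex MD5 digest of raw bytes (assumed to be how the hash field is built)."""
    return hashlib.md5(data).hexdigest()


# Placeholder bytes standing in for a downloaded PDF.
pdf_bytes = b"%PDF-1.4 placeholder content"

# Hypothetical record: all values below are invented for illustration.
record = {
    "title": "Example document",
    "uri": "https://example.org/documents/example.pdf",
    "pdf": "example.pdf",
    "sections": {"Introduction": "Text extracted from the matching section."},
    "keywords": {"health": "Text extracted around the matching keyword."},
    "hash": md5_digest(pdf_bytes),
    "provider": "example_provider",
}

print(json.dumps(record, indent=2))
print("missing:", missing_fields(record))
```

Provider-specific fields would simply appear as extra keys next to these seven, so a consumer can rely on the common attributes and treat anything else as optional.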