We need to add the other names for sections we want to scrape references from, based on my manual tagging of 123 policy documents.
I suggest adding "Endnotes".
We should also investigate why, in some cases, "bibliography" and "references" sections aren't scraped. Is it something to do with the capitalisation or formatting of the section name, or are there multiple references sections?
You may find my manually tagged data set and notes interesting (inc. URLs so you can see the documents):
This file gives the 13 examples where the references were not in a section called "References" but in a section with a different name. You might find the notes of interest. doc_sample_20190218-1253 for issue.xlsx
This file gives the 11 examples where a reference was scraped but I didn't think there was a references section, or a reference wasn't scraped but I did think there was a references section: doc_sample_20190218-1253 for issue all mismatch.xlsx
Cool, I can do this next sprint!
The system is changing a bit, because we are splitting the scraping process and the PDF parsing process into two different tasks, and next week I'm likely to work on the PDF parsing task.
Adding these is as simple as just writing them in a file 👍
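For the capitalisation/formatting question, here is a rough sketch of how heading matching could be made more forgiving. It is not the current scraper code: `SECTION_NAMES`, `parse_section_names`, `normalise_heading` and `is_reference_heading` are hypothetical names, and I'm assuming the keyword file holds one section name per line.

```python
import re

# Hypothetical keyword list; in practice this would come from the keywords file.
SECTION_NAMES = ["references", "bibliography", "endnotes"]


def parse_section_names(file_contents: str) -> list:
    """Read one section name per line, ignoring blank lines (assumed file layout)."""
    return [line.strip().lower() for line in file_contents.splitlines() if line.strip()]


def normalise_heading(line: str) -> str:
    """Lowercase a heading and strip numbering, trailing punctuation and extra spaces."""
    text = line.strip().lower()
    # Drop a leading section number such as "7." or "7.1".
    text = re.sub(r"^\d+(\.\d+)*[.)]?\s+", "", text)
    # Collapse runs of whitespace left over from PDF extraction.
    text = re.sub(r"\s+", " ", text)
    # Drop a trailing colon, full stop or space.
    return text.rstrip(" :.")


def is_reference_heading(line: str, names=SECTION_NAMES) -> bool:
    """Return True when the normalised heading exactly matches a known section name."""
    return normalise_heading(line) in set(names)
```

With this, "REFERENCES" and "7. Bibliography:" would both match, while "End notes" would not unless that spelling is also added to the file, which might explain some of the misses in the spreadsheet.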