New extractor requirement #700
Hi @mubashar1199, typically an extractor is a Scala class that is run on every Wikipedia page and tries to extract specific information from it. For example, the existing LabelExtractor extracts the page name, and the HomepageExtractor tries to detect the homepage of the person/organization that a Wikipedia page is about. Each extractor writes the extracted triples to specific datasets. It is usually a 1-1 mapping, e.g. LabelExtractor -> label dataset, but some extractors that produce a lot of information may split the data into multiple datasets.

Are you trying to write a new extractor, or to post-process the existing datasets to form a new dataset?
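The extractor pattern described above can be sketched conceptually like this. Note this is a plain-Python illustration, not the framework's actual Scala API; all names (`label_extractor`, `run_extraction`, the page dict shape) are made up for the example.

```python
# Conceptual sketch of the extractor pattern: each extractor is a function
# that receives a page and returns triples, and each extractor writes its
# triples to its own dataset (the usual 1-1 mapping).
# All names here are illustrative, not the real DBpedia Scala API.

def label_extractor(page):
    # emits one (subject, predicate, object) triple per page
    return [(page["uri"], "rdfs:label", page["title"])]

def run_extraction(pages, extractors):
    # maps each extractor name to the "dataset" (list of triples) it produced
    datasets = {name: [] for name in extractors}
    for page in pages:
        for name, extract in extractors.items():
            datasets[name].extend(extract(page))
    return datasets

pages = [{"uri": "dbr:Berlin", "title": "Berlin"}]
result = run_extraction(pages, {"labels": label_extractor})
print(result["labels"])  # [('dbr:Berlin', 'rdfs:label', 'Berlin')]
```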
This seems like a post-processing step. Check this out: http://dev.dbpedia.org/Post-Processing
Yes, I want to post-process the dataset to generate new triples, and then either append these triples to an existing dataset or create a new dataset for the newly created triples. How can that be done using the extraction framework?
OK, I will take a look.
The approach definitely is to create a new "dataset" here. However, this post-processing does not necessarily have to be fully integrated into the extraction framework; it can also be derived from the marvin extraction on the Databus: https://databus.dbpedia.org/marvin/mappings/mappingbased-objects-uncleaned/ Please tell us what triples you would like to generate and what tools you are going to use (and any other external data dependencies); then @Vehnem can help you with how and where to integrate.
I want to use Wikipedia infobox properties and, based on some predefined rules, infer new information from those properties and append it to the already existing dataset. I want the results to appear in the public SPARQL endpoint. Please tell me how and where to integrate it.
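The rule-based inference described here could be sketched as follows. This is a minimal Python illustration under assumed data: the `ex:birthPlaceOf` inverse-property rule and all URIs are hypothetical placeholders, not actual DBpedia ontology properties.

```python
# Hedged sketch of rule-based post-processing: scan existing triples and
# infer new ones from predefined rules. The inverse-property rule below is
# purely illustrative; real rules would use actual ontology properties.

def infer(triples, rules):
    """Apply each rule to every triple; a rule maps one triple to a list
    of newly inferred triples (possibly empty)."""
    new = []
    for triple in triples:
        for rule in rules:
            new.extend(rule(triple))
    return new

def inverse_rule(predicate, inverse):
    # rule factory: (s, predicate, o) implies (o, inverse, s)
    def rule(triple):
        s, p, o = triple
        return [(o, inverse, s)] if p == predicate else []
    return rule

existing = [("dbr:Alice", "dbo:birthPlace", "dbr:Berlin")]
rules = [inverse_rule("dbo:birthPlace", "ex:birthPlaceOf")]
print(infer(existing, rules))
# [('dbr:Berlin', 'ex:birthPlaceOf', 'dbr:Alice')]
```

The inferred triples could then be serialized into a separate file, which matches the advice above to create a new dataset rather than modify an existing one in place.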
@JJ-Author post-processing is pretty much the worst place to add anything. We discussed this a lot, and the plan is to implement post-processing via the Databus and thus remove it completely. @mubashar1199, these are the insertion points for new data into DBpedia:
- More info from Wikipedia: if you think there is info in Wikipedia that is not yet covered by the extraction:
- Adding extensions based on the extracted data: very similar to post-processing, i.e. you work on one of the extracted datasets such as the mappingbased extraction.
Hello,
I want to create a new extractor, but I am unable to understand the following:
1: I want to create a new output dataset file; just creating a new dataset in Dataset.scala is not working for me.
2: I want to iterate over all the RDF triples in the mappingbased-objects-uncleaned.ttl.bz2 file, perform some processing, and then generate new RDF triples in a newly created dataset file. This also needs to run last, after all other extraction has been done.
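Step 2, done outside the framework as a post-processing pass, could look roughly like this. A Python sketch under assumptions: the file paths and the `transform` function are placeholders, and real code would parse N-Triples properly rather than pass lines through.

```python
# Sketch of post-processing a .bz2 N-Triples dump: stream it line by line,
# derive new triples, and write them to a new "dataset" file. Streaming
# avoids loading a multi-gigabyte dump into memory.
import bz2

def transform(line):
    # placeholder: derive zero or more output lines from one triple line;
    # real code would apply inference rules here
    line = line.strip()
    if not line or line.startswith("#"):
        return []
    return [line]  # identity for the sketch

def postprocess(src_path, dst_path):
    with bz2.open(src_path, "rt", encoding="utf-8") as src, \
         bz2.open(dst_path, "wt", encoding="utf-8") as dst:
        for line in src:
            for out in transform(line):
                dst.write(out + "\n")
```

Because this runs over a finished dump, it naturally executes after all other extraction is done, which sidesteps the ordering problem inside the framework.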
In the gender extractor, the following comment is written:
// Even better: in the first extraction pass, extract all types. Use them in the second pass.
How can this multi-pass functionality be implemented?
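Something like the following is what I mean by two passes, sketched in plain Python (the real framework would do this in Scala over the dump; the property names and the "keep only persons" rule are just placeholders):

```python
# Two-pass sketch matching the extractor comment: pass 1 collects rdf:type
# triples for every subject, pass 2 uses that type map during processing.

def first_pass(triples):
    # collect subject -> set of types
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return types

def second_pass(triples, types):
    # use the types collected in pass 1, e.g. only process persons
    return [(s, p, o) for s, p, o in triples
            if "dbo:Person" in types.get(s, ())]

data = [("dbr:Alice", "rdf:type", "dbo:Person"),
        ("dbr:Alice", "dbo:gender", "female"),
        ("dbr:Berlin", "dbo:country", "dbr:Germany")]
types = first_pass(data)
print(second_pass(data, types))
# [('dbr:Alice', 'rdf:type', 'dbo:Person'), ('dbr:Alice', 'dbo:gender', 'female')]
```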
Please tell me how I can perform the above operations.
Thanks