GSOC2015_Progress_Emilio

Emilio Dorigatti edited this page Aug 12, 2015 · 20 revisions

Warmup Period (until 25th of May) Worked on warm-up tickets on GitHub and built an experimental date normalization module using the ANTLR4 bindings for Python and the provided grammar. We decided to abandon this approach and write our own regular expressions directly in Python.
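To give an idea of what the regular-expression approach could look like, here is a minimal sketch. It is not the project's actual normalizer: the pattern names, the `normalize_date` function, the output dictionaries and the choice of Italian month names are all assumptions made for illustration.

```python
import re

# Hypothetical lookup table; Italian month names are assumed here.
MONTHS = {
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4,
    "maggio": 5, "giugno": 6, "luglio": 7, "agosto": 8,
    "settembre": 9, "ottobre": 10, "novembre": 11, "dicembre": 12,
}

# Full date such as "25 maggio 2015".
FULL_DATE = re.compile(
    r"\b(\d{1,2})\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)
# Year range such as "2010-2011" (e.g. a soccer season).
YEAR_RANGE = re.compile(r"\b(\d{4})\s*[-/]\s*(\d{4})\b")
# Single four-digit year between 1800 and 2099.
YEAR = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def normalize_date(text):
    """Return a dict describing the first date expression found, or None.

    Patterns are tried from most to least specific, so a full date
    wins over a bare year contained in it.
    """
    match = FULL_DATE.search(text)
    if match:
        day = int(match.group(1))
        month = MONTHS[match.group(2).lower()]
        year = int(match.group(3))
        return {"type": "date", "value": f"{year:04d}-{month:02d}-{day:02d}"}

    match = YEAR_RANGE.search(text)
    if match:
        return {"type": "interval", "start": match.group(1), "end": match.group(2)}

    match = YEAR.search(text)
    if match:
        return {"type": "year", "value": match.group(1)}

    return None


print(normalize_date("stagione 2010-2011"))
print(normalize_date("il 25 maggio 2015"))
```

Trying the most specific pattern first keeps the rules independent of each other, which is one plausible reason to prefer plain regular expressions over a full grammar for this task.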

First Week (25/5 - 31/5) Some ideas about the date normalizer, meeting at FBK with mentor Marco Fossati.

Second Week (1/6 - 7/6) First prototype of the date normalizer, reviewed the crowd-annotated gold standard.

Third Week (8/6 - 14/6) Exams!

Fourth and Fifth Weeks (15/6 - 28/6) Almost finished and tested the date normalizer as well as the code using it.

Fifth Week (29/6 - 5/7) Final refinements for the mid-term: successfully outputting reified triples, plus a script for transforming the Wikipedia dump into sentences about soccer.

Sixth Week (6/7 - 12/7) Refactoring and cleaning of the code base, experiments with the unsupervised classifier. As it turned out, the classifier depends heavily on the quality of the entities returned by the entity linker (for example, "stagione 2010-2011" was linked to Serie B) and on the mapping between frame elements and ontology types in DBpedia.

Seventh Week (13/7 - 19/7) Script to compute Fleiss' kappa on the CrowdFlower results; slowly refactoring the code base.
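For reference, Fleiss' kappa measures agreement among a fixed number of raters beyond what chance would produce. The sketch below follows the standard formula and is not the project's script; the function name and the input layout (one row per item, one column per category, each cell holding the number of raters who chose that category) are assumptions.

```python
def fleiss_kappa(counts):
    """Compute Fleiss' kappa.

    counts[i][j] is the number of raters who assigned item i to
    category j. Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per item are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")

    n_categories = len(counts[0])
    total_ratings = n_items * n_raters

    # Share of all ratings that fell into each category.
    category_share = [
        sum(row[j] for row in counts) / total_ratings
        for j in range(n_categories)
    ]

    # Agreement on each item: fraction of rater pairs that agree.
    item_agreement = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    observed = sum(item_agreement) / n_items
    expected = sum(p * p for p in category_share)

    if expected == 1.0:
        # All ratings in a single category: kappa is undefined.
        raise ValueError("kappa is undefined when only one category is used")

    return (observed - expected) / (1.0 - expected)


# Three raters, two categories, perfect agreement on both items.
print(fleiss_kappa([[3, 0], [0, 3]]))
```

In a crowdsourcing setting, each row would correspond to one annotated unit and each column to one possible label, assuming every unit received the same number of judgments.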

Eighth Week (20/7 - 26/7) Holidays.

Ninth Week (27/7 - 2/8) Created rules to run the supervised classifier, thoughts about scoring the confidence of triples, and implementation of a score for unsupervised classification based on the entity linking score.

Tenth Week (3/8 - 9/8) Scoring of facts from supervised classification and serialization of the triples' scores into a separate dataset; heavy refactoring of the classifier.