Time: March 28, 2017, 11:00 am
Location: Room Ofek, Polo scientifico e tecnologico “Fabio Ferrari”, Building Povo 1 - Povo (Trento)
Dr. Hanyu Zhang
Abstract of Dissertation
Multilingual semantic linguistic resources are critical for many applications in Natural Language Processing (NLP). However, building large-scale lexico-semantic resources manually from scratch is extremely expensive, which has promoted the use of automatic extraction and merging algorithms. These algorithms have helped in the creation of large-scale resources, but they introduce many kinds of errors as a side effect. For example, Chinese WordNet follows the WordNet structure and is generated via several algorithms. This automatic generation introduces errors such as wrong translations, typos, and false mappings between multilingual terms. The quality of a linguistic resource directly influences the performance of the applications built on it, so the higher the quality, the better. Finding these errors is therefore essential.
So far, however, there is no efficient method for finding errors in a large-scale multilingual resource. Manual validation by experts is a possible solution, but it is very expensive: the obstacles come not only from the size of the dataset but also from its multilingual nature. Crowdsourcing is a method for handling large-scale, tedious tasks, but it is still costly. Given this scenario, we aim to find an effective method that helps us find errors at low cost.
We use games as our solution and adopt the Universal Knowledge Core (UKC), with respect to the Chinese language, as our case study. UKC is a multi-layered multilingual lexico-semantic resource in which lexical elements from different languages are mapped to a common formal concept. In this dissertation, we present a non-immersive game named Concept Challenge Game for finding the errors that exist in the English-Chinese lexico-semantic resource. In this game, players face challenges posed as English synsets and have to choose the most appropriate option from the listed Chinese synsets. Players are unaware that they are finding errors in the lexico-semantic resource. Our evaluation shows that people spend a significant amount of time playing and are able to find a variety of erroneous mappings. Moreover, we extended the game to an Italian version; the results are promising as well, indicating that the game is able to detect errors in multilingual linguistic resources.