Karol Wieloch
2007
Unsupervised Methods of Topical Text Segmentation for Polish
Dominik Flejter | Karol Wieloch | Witold Abramowicz
Proceedings of the Workshop on Balto-Slavonic Natural Language Processing
2006
Linguistic Suite for Polish Cadastral System
Witold Abramowicz | Agata Filipowska | Jakub Piskorski | Krzysztof Węcel | Karol Wieloch
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper reports on an effort to create basic linguistic resources for geo-referencing Polish free-text documents. We have defined a fine-grained named-entity hierarchy, produced an exhaustive gazetteer, and developed named-entity grammars for Polish. Additionally, an annotated corpus for the cadastral domain was prepared for evaluation purposes. Our baseline approach to geo-referencing applies these resources together with a lightweight co-referencing technique that uses the Jaro-Winkler string-similarity metric. We carried out a detailed evaluation of detecting locations, organizations and persons, which revealed that the best results are obtained by applying a combined grammar for all types. Lightweight co-referencing for organizations and persons improves recall but lowers precision, and no gain is observed for locations. The paper is accompanied by a demo, a geo-referencing application capable of (a) finding documents and text fragments based on named entities and (b) populating the spatial ontology from texts.
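The abstract names the Jaro-Winkler metric but gives no implementation details, so the following is only a minimal sketch of how string-similarity-based co-referencing could look. The function names (`jaro`, `jaro_winkler`, `link_mentions`), the 0.9 threshold, and the greedy "link to the best earlier mention" strategy are illustrative assumptions, not the authors' method.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def link_mentions(mentions, threshold=0.9):
    """Greedy co-reference sketch (assumption): link each mention to the most
    similar earlier canonical mention if the score reaches the threshold;
    otherwise treat it as a new entity."""
    canonicals = []
    links = {}
    for m in mentions:
        best, best_score = None, threshold
        for c in canonicals:
            score = jaro_winkler(m, c)
            if score >= best_score:
                best, best_score = c, score
        if best is None:
            canonicals.append(m)
            best = m
        links[m] = best
    return links


# Classic textbook pair: jaro_winkler("MARTHA", "MARHTA") is about 0.961.
print(round(jaro_winkler("MARTHA", "MARHTA"), 3))
```

A lower threshold links more surface variants to the same entity, which tends to raise recall at the cost of precision; this is consistent with the trade-off the abstract reports for organizations and persons.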