The development of vocabularies of historical period names from web acquired corpora
Part of: Mediterranean Archaeology & Archaeometry: International Journal, Vol. 14, No. 4, 2014, pages 165-174
Pages: 165-174
Abstract:
Periodization is a universal and very popular system for organizing History (Petras et al., 2006): time is divided, arbitrarily and in ways specific to places and communities, into periods such as “Δικτατορία” (dictatorship). Structured collections of time period names and timelines are considered very useful for cultural content documentation and temporal information extraction. However, to the best of our knowledge, this is the first report on the systematic collection of period names of Greek History. New period names are constantly created, while others fall out of use. Aiming to capture this combination of dispersed specificity and constant evolution, we used the Focused Monolingual Crawler (FMC) (Mastropavlos and Papavassiliou, 2011) and an initial list of 25 “seed-terms” to build, from Web-retrieved documents, corpora dense in period names. Period names were manually extracted from the accumulated corpora and annotated for a set of features: the allomorphs that occurred in the collected corpora; whether the term denoted a fact, a time period or something else; and the persons, places and other period names related to the term. The linguistic environments in which the terms occurred were identified, and some of them were fed to the FMC as new “seed-terms”. This cycle was repeated three times and yielded 78 period names, with an average of 16 paradigms per term, and a corpus of 3020 valid XML documents. Some first observations on the strategies Greek communities employ to coin time period names are reported.
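The crawl, annotate and reseed cycle summarized above can be sketched as a bootstrapping loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the FMC's actual interface is not described here, so `crawl`, `extract` and `select_seeds` are hypothetical callables standing in for the crawler, the manual annotation of period names and the manual choice of new seed-terms (in the described workflow, selected linguistic environments rather than the names themselves). `PeriodName` is likewise a hypothetical record whose fields loosely mirror the annotated features, and the toy "web" with placeholder names "Period A" to "Period C" is invented purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class PeriodName:
    """Hypothetical annotation record for one period name."""
    term: str
    allomorphs: set[str] = field(default_factory=set)  # variant forms seen in the corpus
    denotes: str = "time period"                        # e.g. "fact", "time period", "other"
    related: set[str] = field(default_factory=set)      # related persons, places, period names


def bootstrap(
    seeds: Iterable[str],
    crawl: Callable[[set[str]], list[str]],             # hypothetical stand-in for the crawler
    extract: Callable[[list[str]], list[PeriodName]],   # stand-in for manual annotation
    select_seeds: Callable[[dict[str, PeriodName], set[str]], set[str]],  # stand-in for reseeding
    rounds: int = 3,
) -> tuple[dict[str, PeriodName], list[str]]:
    """Repeat crawl -> extract -> reseed for at most `rounds` cycles."""
    vocabulary: dict[str, PeriodName] = {}
    corpus: list[str] = []
    used: set[str] = set()
    current = set(seeds)
    for _ in range(rounds):
        if not current:
            break  # no new seeds, so another crawl would add nothing
        used |= current
        docs = crawl(current)
        for doc in docs:  # accumulate the corpus, keeping each document once
            if doc not in corpus:
                corpus.append(doc)
        for entry in extract(docs):  # merge new annotations into the vocabulary
            if entry.term in vocabulary:
                known = vocabulary[entry.term]
                known.allomorphs |= entry.allomorphs
                known.related |= entry.related
            else:
                vocabulary[entry.term] = entry
        current = select_seeds(vocabulary, used) - used  # never re-crawl a seed
    return vocabulary, corpus


# Toy stand-ins (hypothetical): an in-memory "web" and placeholder period names.
TOY_WEB = {
    "Period A": ["text on Period A and Period B"],
    "Period B": ["text on Period B and Period C"],
    "Period C": ["text on Period C"],
}
KNOWN = ["Period A", "Period B", "Period C"]


def toy_crawl(seeds: set[str]) -> list[str]:
    return [doc for s in sorted(seeds) for doc in TOY_WEB.get(s, [])]


def toy_extract(docs: list[str]) -> list[PeriodName]:
    return [PeriodName(term=t) for d in docs for t in KNOWN if t in d]


def toy_select(vocabulary: dict[str, PeriodName], used: set[str]) -> set[str]:
    return set(vocabulary) - used  # simplification: every new name becomes a seed


vocab, corpus = bootstrap(["Period A"], toy_crawl, toy_extract, toy_select)
print(sorted(vocab), len(corpus))  # ['Period A', 'Period B', 'Period C'] 3
```

With three rounds from the single seed "Period A", the loop reaches all three placeholder names and three documents; with `rounds=1` it stops at "Period A" and "Period B". Keeping the seed selector pluggable reflects that reseeding was a human judgement in the described workflow rather than an automatic step.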
Keywords: periodization, time period name, Focused Monolingual Crawler, unstructured Web data
Notes: Corresponding author: Maria S. Mouroutsou (msmourou@gmail.com)
References:
- Berman, M. (2011) Extending Gazetteers with Time and Entity Relationships. Historical Gazetteer Elements: Temporal Frameworks. Track on Historical Gazetteers, part of the Symposium on Space-Time Integration in Geography and GIScience, co-sponsored by Harvard University’s Center for Geographic Analysis and the AAG, Wednesday-Friday, April 13-15, AAG 2011, Seattle, WA.
- Buckland, M. and Lancaster, L. (2004) Combining Time, Place, and Topic: The Electronic Cultural Atlas Initiative. D-Lib Magazine, Vol. 10, No. 5 (May). http://www.dlib.org/dlib/may04/buckland/05buckland.html, accessed 2 June 2006.
- Crofts, N., Doerr, M., Gill, T., Stead, S. and Stiff, M. (2004) Definition of the CIDOC Conceptual Reference Model (version 4.0).
- DDBC Time Authority Database. http://authority.ddbc.edu.tw/docs/open_content/
- Doerr, M., Kritsotaki, A. and Stead, St. (2003) Thesauri of Historical Periods – A Proposal for Standardization. http://www.cidoc-crm.org/
- Feinberg, M., Mostern, R., Stone, S. and Buckland, M. (2003) Application of Geographical Gazetteer Standards to Named Time Periods. Technical Report, Electronic Cultural Atlas Initiative, Berkeley.
- Gavrilidou, M. (2002) The Hellenic National Corpus on-line. Revue Belge de Philologie et d'Histoire 80, pp. 1003-1015.
- Goutsos, D. (2010) The Corpus of Greek Texts: a reference corpus for Modern Greek. Corpora, Vol. 5, pp. 29-44.
- Harvard University. Chinese Historical GIS Project. http://www.fas.harvard.edu/~chgis/
- ISO/CD 21127 (2002) Information and documentation – A reference ontology for the interchange of cultural heritage information.
- Mastropavlos, N. and Papavassiliou, V. (2011) Automatic Acquisition of Bilingual Language Resources. In Proceedings of the 10th International Conference of Greek Linguistics, Komotini, Greece.
- Petras, V., Meiske, M., Larson, R., Zernecke, J., Carl, K. and Buckland, M. (2005) Leveraging Library of Congress Subject Headings to improve Search for Events – A Time Period Directory.
- Petras, V., Larson, R. and Buckland, M. (2006) Time Period Directories: a Metadata Infrastructure for Placing Events in Temporal and Geographic Context. Joint Conference on Digital Libraries, Chapel Hill, NC, USA.
- Prokopidis, P., Desipri, E., Papageorgiou, H. and Markopoulos, G. (2009) TimeEL: Recognition of Temporal Expressions in Greek texts. In Proceedings of the 9th International Conference of Greek Linguistics, Chicago, Illinois, USA.
- Skadiņa, I., Aker, A., Mastropavlos, N., Su, F., Tufis, D., Mateja, V. et al. (2012) Collecting and Using Comparable Corpora for Statistical Machine Translation. In On-Line Proceedings of the LREC 2012 Conference on Language Resources and Evaluation, pp. 438-445. Istanbul, Turkey.
- Support for the Learner: Time Periods. http://ecai.org/imls2004/timeperiods.html, accessed 2 June 2006.
- Wikipedia (2004) List of Themed Timelines. http://en.wikipedia.org/wiki/List_of_themed_timelines