Past Event! Note: this event has already taken place.

Enhancing language technology resources in minoritized language contexts: roles and applications of corpora

December 12, 2022 at 12:00 PM to 1:30 PM

Cost: Free



Dr. Dawn Knight, Cardiff University


About the Speaker

Dawn's research interests and expertise are in the areas of corpus linguistics, discourse analysis, digital interaction and non-verbal communication. Dawn has led and/or contributed to a range of UKRI (UK Research and Innovation) and Welsh Government-funded multidisciplinary and cross-institutional projects, including the National Corpus of Contemporary Welsh (CorCenCC), FreeTxt, and Interactional Variation Online (IVO) projects. From 2018 to 2021, Dawn was the Chair of the British Association for Applied Linguistics.

Abstract

The creation of language technology resources in a minoritized language context poses interesting challenges, but also presents opportunities that are not always available to developers of such resources for larger languages. In this presentation I will demonstrate how scrutiny of the unique context of a specific minoritized language, and meaningful collaboration with potential user groups, can determine the design and construction of language resources. I take a case study approach in this talk, focusing on the rationale for, and realisation of, three key resources in the Welsh language: CorCenCC (the National Corpus of Contemporary Welsh; Corpws Cenedlaethol Cymraeg Cyfoes), the Geifan word list, and FreeTxt, a novel open-source bilingual free-text analysis toolkit.

The creation of these resources involved the development of important new tools and processes, including, in the case of CorCenCC, a unique user-driven corpus design in which language data was collected and validated through crowdsourcing, and an in-built pedagogic toolkit (Y Tiwtiadur) developed in consultation with representatives of all anticipated academic and community user groups. The approaches used to construct the resources mentioned in this talk provide an invaluable template for those researching other minoritized or minority languages. How this template might inform corpus construction in such languages will be discussed in more depth during my presentation.

This event is sponsored by the School of Linguistics and Language Studies