Peer-Reviewed Science Publications

I am a peer-reviewed author and speaker at the world’s largest defense training conference (I/ITSEC), as well as at international human-computer interaction (HCI) conferences whose proceedings are published by Springer Nature, one of the most prestigious, highest-impact, and oldest continuously operating academic publishers.

BLEU Skies for Endangered Language Revitalization: Lemko Rusyn and Ukrainian Neural AI Translation Accuracy Soars (2023)

Abstract

Accelerating global language loss, associated with elevated incidence of illicit substance use, type 2 diabetes, binge drinking, and assault, as well as sixfold higher youth suicide rates, poses a mounting challenge for minority, Indigenous, refugee, colonized, and immigrant communities. In environments where intergenerational transmission is often disrupted, artificial intelligence neural machine translation systems have the potential to revitalize heritage languages and empower new speakers by allowing them to understand and be understood via instantaneous translation. Yet, artificial intelligence solutions pose problems, such as prohibitive cost and output quality issues. A solution is to couple neural engines to classical, rule-based ones, which empower engineers to purge loanwords and neutralize interference from dominant languages. This work describes an overhaul of the engine deployed at LemkoTran.com to enable translation into and out of Lemko, a severely endangered, minority lect of Ukrainian genetic classificability indigenous to borderlands between Poland and Slovakia (where it is also referred to as Rusyn). Dictionary-based translation modules were fitted with morphologically and syntactically informed noun, verb, and adjective generators fueled by 877 lemmata together with 708 glossary entries, and the entire system was riveted by 9,518 automatic, codification-referencing, must-pass quality-control tests. The fruits of this labor are a 23% improvement since last publication in translation quality into English and 35% increase in quality translating from English into Lemko, providing translations that outperform every Google Translate service by every metric, and score 396% higher than Google’s Ukrainian service when translating into Lemko.

Preprint

Please Cite

Orynycz, P. (2023). BLEU Skies for Endangered Language Revitalization: Lemko Rusyn and Ukrainian Neural AI Translation Accuracy Soars. In: Degen, H., Ntoa, S. (eds) Artificial Intelligence in HCI. HCII 2023. Lecture Notes in Computer Science, vol 14051. Springer, Cham. https://doi.org/10.1007/978-3-031-35894-4_10
@inproceedings{orynycz2023bleu,
title={BLEU Skies for Endangered Language Revitalization: Lemko Rusyn and Ukrainian Neural AI Translation Accuracy Soars},
author={Orynycz, Petro},
booktitle={International Conference on Human-Computer Interaction},
pages={135--149},
year={2023},
organization={Springer}
}

Winning Hearts & Tongues: A Polish to Lemko Case Study (2023)

Abstract

When minority and local languages are lost, national security suffers: not only are significant increases in suicidality, depression, diabetes, assault, and substance abuse often documented, a void is created that has historically been exploited by adversaries. For example, millions from minority language communities ahistorically assume the Russian language and/or identity as their own in Ukraine, Belarus, NATO allies, and even the United States. If native language communication gaps remain in the hands of adversaries only, using their long experience with these languages, NATO remains at a major disadvantage attempting to engage these communities. In Europe, psychic wounds inflicted in part by language loss have not been closed by assimilation. Instead, cities experience bursts of isolating tensions in the West and eastern populations are convinced by adversarial powers that those powers are their true allies, who understand and respect them. Nor is education in the official language a panacea: in the case of Ukraine (and even Spain), non-trivial differences between local lects and the official language create openings for adversaries to fan the flames of separatism.

Using machine translation engines to empower NATO and its partners in training recruits or acting on the ground in the language closest to their hearts and minds can win immediate ‘us’-ness and showcase NATO’s embraced polycultural vision. Artificial intelligence and rule-based engines were assembled to translate between the official language of Poland and that of its indigenous Lemko minority, which has long been targeted by foreign powers. Engines were scored translating from Lemko to Polish using metrics developed with support from DARPA, producing a bilingual evaluation understudy (BLEU) score of 31.13 and translation edit rate (TER) of 54.10. Meanwhile, in the other direction, the engines scored TER 53.73 and BLEU 29.49, a score 6.5 times better than that of Google Translate’s Polish-Ukrainian service.

Preprint

Please Cite

Orynycz, P., & Dobry, T. (2023). Winning Hearts & Tongues: A Polish to Lemko Case Study. In Proceedings of the Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC).

Say It Right: AI Neural Machine Translation Empowers New Speakers To Revitalize Lemko (2022)

Abstract

Artificial-intelligence-powered neural machine translation might soon resuscitate endangered languages by empowering new speakers to communicate in real time using sentences quantifiably closer to the literary norm than those of native speakers, and starting from day one of their language reclamation journey. While Silicon Valley has been investing enormous resources into neural translation technology capable of superhuman speed and accuracy for the world’s most widely used languages, 98% have been left behind, for want of corpora: neural machine translation models train on millions of words of bilingual text, which simply do not exist for most languages, and cost upwards of a hundred thousand United States dollars per tongue to assemble.

For low-resource languages, there is a more resourceful approach, if not a more effective one: transfer learning, which enables lower-resource languages to benefit from achievements among higher-resource ones. In this experiment, Google’s English-Polish neural translation service was coupled with my classical, rule-based engine to translate from English into the endangered, low-resource, East Slavic language of Lemko. The system achieved a bilingual evaluation understudy (BLEU) quality score of 6.28, several times better than Google Translate’s English to Standard Ukrainian (BLEU 2.17), Russian (BLEU 1.10), and Polish (BLEU 1.70) services. Finally, the fruit of this experiment, the world’s first English to Lemko translation service, was made available at the web address www.LemkoTran.com to empower new speakers to revitalize their language.

New speakers are key to language revitalization, and the power to “say it right” in Lemko is now at their fingertips.
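As a quick check on the phrase "several times better", the ratios can be recomputed from the BLEU scores quoted in the abstract above. The following is only a minimal sketch: the variable and function names are illustrative and do not come from the paper, and it uses nothing beyond the four scores stated in the abstract.

```python
# BLEU scores as quoted in the abstract above (translation from English).
# All names below are illustrative; they are not taken from the paper.
LEMKOTRAN_BLEU = 6.28
GOOGLE_BLEU = {
    "Ukrainian": 2.17,
    "Russian": 1.10,
    "Polish": 1.70,
}


def bleu_ratios(system_score, baselines):
    """Return how many times higher system_score is than each baseline score."""
    return {name: system_score / score for name, score in baselines.items()}


ratios = bleu_ratios(LEMKOTRAN_BLEU, GOOGLE_BLEU)
for name, ratio in ratios.items():
    print(f"vs Google Translate {name}: {ratio:.2f}x")
```

On these figures the coupled engine scores roughly 2.9, 5.7, and 3.7 times the BLEU of the Ukrainian, Russian, and Polish services, respectively.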

Preprint

Please Cite

Orynycz, P. (2022). Say It Right: AI Neural Machine Translation Empowers New Speakers to Revitalize Lemko. In: Degen, H., Ntoa, S. (eds) Artificial Intelligence in HCI. HCII 2022. Lecture Notes in Computer Science, vol 13336. Springer, Cham. https://doi.org/10.1007/978-3-031-05643-7_37
@InProceedings{10.1007/978-3-031-05643-7_37,
author="Orynycz, Petro",
editor="Degen, Helmut and Ntoa, Stavroula",
title="Say It Right: AI Neural Machine Translation Empowers New Speakers to Revitalize Lemko",
booktitle="Artificial Intelligence in HCI",
year="2022",
publisher="Springer International Publishing",
address="Cham",
pages="567--580",
abstract="Artificial-intelligence-powered neural machine translation might soon resuscitate endangered languages by empowering new speakers to communicate in real time using sentences quantifiably closer to the literary norm than those of native speakers, and starting from day one of their language reclamation journey. While Silicon Valley has been investing enormous resources into neural translation technology capable of superhuman speed and accuracy for the world's most widely used languages, 98{\%} have been left behind, for want of corpora: neural machine translation models train on millions of words of bilingual text, which simply do not exist for most languages, and cost upwards of a hundred thousand United States dollars per tongue to assemble.",
isbn="978-3-031-05643-7"
}