AI-Enhanced Transdisciplinary Data Encoding for LLMs Training
Rusudan Makhachashvili, Natalia Bober
Proceedings of the 29th World Multi-Conference on Systemics, Cybernetics and Informatics: WMSCI 2025, pp. 327-333 (2025); https://doi.org/10.54808/WMSCI2025.01.327
The 29th World Multi-Conference on Systemics, Cybernetics and Informatics: WMSCI 2025
Virtual Conference, September 9-12, 2025. Proceedings of WMSCI 2025. ISSN: 2771-0947 (Print). ISBN (Volume): 978-1-950492-85-5 (Print).
Abstract
The rapid advancement of artificial intelligence (AI) has reshaped linguistic data encoding, particularly for Large Language Models (LLMs). AI-driven annotation techniques enable efficient lexical processing, semantic disambiguation, and automated neology tagging, refining computational language modeling across transdisciplinary domains.
This study explores AI-enhanced methodologies for encoding linguistic data for LLM training. AI-assisted lexicographic workflows enable LLMs to adjust dynamically to linguistic evolution while ensuring scalable annotation across diverse transdisciplinary corpora. LLMs trained on transdisciplinary lexicons can generate cross-modal language interpretations, refining machine-generated discourse across domains. The objective of the inquiry is to investigate the innovative philosophical aspects of cyberspace through the lens of language development processes as they inform AI model development, LLM training, and digital communication. The study design discloses cyberspace as both an ontology model and a logosphere model. Two data encoding projects, developed by the authors, serve as foundational elements for this investigation. A methodology and AI-augmented, AI-performed protocols for identifying the phenomenological features of innovative computer vocabulary elements are introduced, supplying the template for a new field of study: phenomenological, AI-enhanced digital neology, neography, and neosemiotics. Transdisciplinary educational applications of these approaches to data encoding include: training AI-enhanced NLP models for transdisciplinary communication; developing standardized linguistic annotation protocols that ensure interoperability across AI-driven lexicographic systems; and integrating transdisciplinary discourse structures into machine-learning lexicons to refine adaptive AI language comprehension.