1st Edition

Machine Learning in Translation Corpora Processing

By Krzysztof Wolk Copyright 2019
    280 Pages 4 Color & 37 B/W Illustrations
    by CRC Press

    This book reviews ways to improve statistical machine speech translation between Polish and English. Research has been conducted mostly on dictionary-based, rule-based, and syntax-based machine translation techniques. Most popular methodologies and tools are not well suited to the Polish language and therefore require adaptation, and both parallel and monolingual language resources are scarce. The main objective of this volume is to develop an automatic and robust Polish-to-English translation system that meets specific translation requirements, and to develop bilingual textual resources by mining comparable corpora.



     

    Table of contents

    Preface

    Introduction
        Background and context
        Machine translation (MT)

    Statistical machine translation and comparable corpora
        Overview of SMT
        Textual Components and Corpora
        Moses Tool Environment For SMT
        Aspects of SMT processing
        Evaluation of SMT Quality

    State of the Art
        Current methods and results in spoken language translation
        Recent methods in comparable corpora exploration

    Author’s solutions to PL-EN corpora processing problems
        Parallel data mining improvements
        Multi-threaded, Tuned and GPU-accelerated Yalign
        Tuning of Yalign method
        Minor improvements in mining for Wikipedia exploration
        Parallel data mining using other methods
        SMT Metric Enhancements
        Alignment and filtering of corpora
        Baseline system training
        Description of experiments

    Results and conclusions
        Machine translation results
        Evaluation of obtained comparable corpora
        Quasi comparable corpora exploration
        Other fields of MT techniques application

    Final conclusions

    References

    Biography

    Krzysztof Wołk holds a PhD Eng. degree in Computer Science and is a graduate of the Polish-Japanese Academy of Information Technology. He is currently an associate professor in the Department of Multimedia at the same university. His research mostly concerns natural language processing and machine learning based on statistical methods, neural networks and deep learning. He is also interested in IT and its challenges, and engages in interdisciplinary projects, particularly those related to HCI, UX, medicine and psychology.



    In addition, he has worked as a lecturer at the Warsaw School of Photography & Graphic Design and as an IT trainer. As a teacher, he specializes primarily in deep learning, machine learning, natural language processing, computational linguistics, multimedia, HCI, UX, mobile applications, HTML5, Adobe applications, and server products from Apple and Microsoft.



    He teaches classes at the Faculty of Computer Science and the New Media Art Department of the Polish-Japanese Academy of Information Technology, and he previously taught classes and gave lectures at the Warsaw School of Photography & Graphic Design.