My research interests revolve around Natural Language Processing (NLP), Deep Learning (DL) and Computational Linguistics (CL). Most of my work is about Machine Translation (MT) and Quality Estimation (QE) to enrich end users' experience. I have previously worked on Post-Editing (PE), incremental training and adaptation through time.
Past Projects & Grants
Browser-based Multilingual Translation (Bergamot) -- (2019-2021) Funded by the European Commission, the Bergamot project will add and improve client-side machine translation in a web browser. Unlike current cloud-based options, running directly on users’ machines empowers citizens to preserve their privacy and increases the uptake of language technologies in Europe in various sectors that require confidentiality. Free software integrated with an open-source web browser, such as Mozilla Firefox, will enable bottom-up adoption by non-experts, resulting in cost savings for private and public sector users who would otherwise procure translation or operate monolingually. Our combined research on user experience, domain adaptation, quality estimation, outbound translation, and efficiency supports a broad browser-based innovation plan.
Predicting Relevance and Quality of Machine Translation for Product Reviews -- (2018) Funded by the Amazon Academic Research Awards (AARA) program, this project was to devise a Quality Estimation (QE) approach for the machine translation (MT) of product reviews. On online market platforms such as Amazon, product reviews are abundant but written in a single language (often English). Automatically translating such reviews could better enable products to reach foreign markets. However, this type of content introduces important challenges to state-of-the-art machine translation, which often results in far-from-perfect translations, and thus automatic quality estimation becomes paramount.
Quality Translation 21 (QT21) -- (April 2015 - January 2018) Quality Translation 21 is a machine translation project which has received funding from the European Union’s Horizon 2020 Research and Innovation program. Many of the languages not supported by our current technologies show common traits: they are morphologically complex, with free and diverse word order. Often there are not enough training resources and/or processing tools. Together, these factors result in drastic drops in translation quality. The combined challenges of linguistic phenomena and resource scenarios have created a large and under-explored grey area in the language technology map of European languages. Combining support from key stakeholders, QT21 addressed this grey area through substantially improved statistical and machine-learning-based translation models, improved evaluation, and continuous learning from mistakes, all with a strong focus on scalability.
MateCAT -- (October 2011 - October 2014) European project led by the Bruno Kessler Foundation (FBK), and conducted with the Computer Science laboratory of Le Mans University (LIUM), The University of Edinburgh and Translated Srl. It aimed at reducing post-editing costs for professional translators through the use of an optimized web-based CAT tool. To improve the user’s productivity, the project partners worked on in-domain adaptation, project adaptation, automatic quality estimation, and both online and incremental adaptation from user feedback. MateCAT is now used by thousands of professional translators to deliver translations in more than 100 languages to 10,000 active users all over the world.
COSMAT -- (October 2009 - October 2012) Led by the LIUM, working with SYSTRAN and INRIA, the project aimed at providing a collaborative translation service for scientific documents to the scientific community. The result of this project was planned to be hosted on HAL, an open archive where authors can deposit scholarly documents from all academic fields. Beyond the challenges specific to scientific documents (domain adaptation, entity recognition, etc.), the collaborative aspect of this project relied on both translated and reviewed versions of the scientific documents (PhD theses, articles, etc.), which are used to improve the quality of the machine translation system through an analysis based on post-editing.