Research Interests

I am interested in Natural Language Processing in general and, more specifically, in:

  • Machine translation, particularly optimisation, continuous adaptation and post-editing
  • Quality evaluation and estimation of machine translation
 

Research Associate at USFD

I currently hold a Research Associate position in Lucia Specia's team and am a member of the NLP group at The University of Sheffield (USFD). I am funded by the European project "Quality Translation 21" (QT21) to work on discriminative training and quality estimation of machine translation.

QT21 is a machine translation project funded by the European Union's Horizon 2020 Research and Innovation programme. It addresses the translation of morphologically rich and under-resourced European languages, which suffer from poor translation quality.

In addition to my current funding, I hold two research grants:

  • Amazon AARA 2016, co-written with Lucia Specia (starting in April 2017)
  • EAMT 2016, co-written with Lucia Specia (starting in April 2017)
 

Past projects

Before coming to Sheffield, I did a PhD in Computer Science (2009-2013), followed by a 15-month postdoc at Le Mans University, France. In both positions at this university, I was supervised by Holger Schwenk, now at Facebook AI Research (FAIR) in Paris. In parallel with my PhD, I worked as a research engineer at SYSTRAN under the supervision of Jean Senellart.

MateCAT -- (October 2011 - October 2014)

A European project led by the Bruno Kessler Foundation (FBK) and conducted with the Computer Science laboratory of Le Mans University (LIUM), The University of Edinburgh and Translated Srl. Aimed at professional translators, it sought to reduce post-editing costs through an optimised web-based CAT tool. To improve user productivity, the project partners worked on in-domain adaptation, project adaptation, automatic quality estimation, and both online and incremental adaptation from user feedback. MateCAT is now used by thousands of professional translators, delivering translations in more than 100 languages to 10,000 active users worldwide.

COSMAT -- (October 2009 - October 2012)

Led by the LIUM in partnership with SYSTRAN and INRIA, the project aimed to provide the scientific community with a collaborative translation service for scientific documents. The resulting service was planned to be hosted on HAL, an open archive where authors can deposit scholarly documents from all academic fields. Beyond the challenges specific to scientific documents (domain adaptation, named-entity recognition, etc.), the collaborative aspect of the project relied on both the translated and the reviewed versions of scientific documents (PhD theses, articles, etc.), which were used to improve the quality of the machine translation system through an analysis of post-editing.

PhD Thesis

Abstract -- Although machine translation research has made great progress in recent years, the output of an automated system (i.e. a raw translation) cannot be published without prior revision by human annotators. Building on this observation, we sought to exploit user feedback from the review process to incrementally adapt our statistical system over time. This thesis therefore focuses on post-editing, one of the most active fields of research and a practice widely used in the translation and localization industry. However, integrating user feedback is not straightforward. On the one hand, among all the changes made by the user, we must identify the information that will be useful to the system. To address this problem, I introduced the concept of "Post-Editing Actions" and proposed an analysis methodology to automatically identify this information in post-edited data. On the other hand, for the continuous integration of user feedback, I developed an algorithm for the incremental adaptation of a statistical machine translation system, which outperforms the standard procedure. This is all the more valuable because developing and optimizing this type of translation system has a very high computational cost, sometimes requiring several days of computation. Conducted jointly with SYSTRAN and the LIUM, the research in this thesis was part of COSMAT, a project of the French Government Research Agency. This project aimed to provide the scientific community with a collaborative machine translation service for scientific content. The collaborative nature of this service, which allows users to review raw translations, provided an application framework for my research.

For more details about my research work, read my publications and CV.

Contact

Address:
Department of Computer Science
University of Sheffield
Regent Court, 211 Portobello
Sheffield, S1 4DP, UK
Phone:
+44 (0)114 222 1892

Email:
f.blain [at] sheffield [dot] ac.uk