Many businesses want to adopt machine translation (MT), but face a seemingly impenetrable set of barriers: the cost of MT licenses, knowing which engines are available, understanding how easy they are to customize, and working out how to measure ROI. The recent TAUS Executive Forum in Copenhagen helped shed light on how to break through.
Making machine translation easier
Jaap van der Meer opened by summarizing the TAUS vision of overcoming barriers to help the world communicate better with the birth of a thousand MT engines.
Sharing the investment
Achim Ruopp of Digital Silk Road followed with a call to action for the translation industry: learn from the numerous successful open-source initiatives in other industries, then organize and contribute back to the Moses statistical machine translation (SMT) initiative by filling the gaps left by the academic research community. Moses is by far the most widely used open-source MT engine. This government-funded project provides well-supported, stable, state-of-the-art SMT under the LGPL license. A growing body of use cases proves its viability as a commercial engine. No need for those expensive licenses then? Not quite: the free toolkit still lacks certain features needed for commercial use. A relatively minor effort would help ensure much broader usage. The graphic below identifies the gaps.
Where to look?
It is widely understood that no one MT solution is the best in all scenarios. Engines that specialize in particular language pairs and are customized for specific domains tend to shine. But which magic wand is right for me? How do I benchmark which is the right MT option?
Two related TAUS initiatives seek to address these issues. The first, the TAUS Tracker, is a directory of MT engines with detailed system overviews. It will be available on this site within the next few weeks, helping buyers to create shortlists of potential providers.
Results of a pilot project to confirm viability of the second, the MT Trainer & Evaluator, were presented in Copenhagen. Yan Yu gave an overview of the successful TAUS Data Association (TDA) MT Trainer pilot to automate workflow for MT customization using client data and data from TDA.
Adobe, eBay and McAfee were the three prospective buyers seeking trained engines and metrics to measure the quality of output. Languagelens, Pangea MT, and Tilde turned around customized MT engines in 24 hours or less, and the output of those engines was measured for quality (in this pilot) using BLEU scores. The pilot moves the industry one step closer to creating a marketplace that connects buyers and providers, with the added benefit of objective reporting to benchmark quality.
A giant awakens
Spyros Pilos explained the European Commission’s MT roadmap, which seeks to implement a best-of-breed approach to meet the massive demand for multilingual content at the EC. We learned that each EU citizen pays €2 per year for translation and that it would take 8,500 full-time translators per year to make europa.eu fully multilingual.
The EC’s existing rule-based engines were diligently improved from the 1970s to 2006, but are slow and expensive to develop in comparison to data-driven solutions. The coming months will see the EC conduct a giant benchmarking exercise to systematically assess MT engines by language coverage and type of use, whilst considering output quality, total cost of ownership and feasibility.
What to measure?
The quality of MT output can be measured by humans or by automated metrics. Human evaluation is costly and time-consuming, but is useful for reviewing adequacy and fluency right down to the sentence level. Automated metrics are quicker, cheaper and more scalable, but aren’t intuitive or reliably granular. Alon Lavie of Carnegie Mellon University and Safaba ended the session with a breakdown of the challenges in creating better metrics to measure MT output quality. The graphic below identifies the gaps.
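To make the idea of an automated metric concrete, here is a minimal sketch of BLEU, the metric used in the TDA pilot described above. It is an illustration only, not the scoring code used in the pilot: it assumes a single reference translation, plain whitespace tokenization and no smoothing, and the function names `ngrams` and `bleu` are made up for this example. Production evaluations normally score a whole test set at once with an established toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed, single-reference BLEU for one segment.

    Assumption: both inputs are strings that can be tokenized on whitespace.
    Returns a score between 0.0 and 1.0.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: an n-gram is credited at most as many times
        # as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if matches == 0 or total == 0:
            # Without smoothing, one empty n-gram order zeroes the score.
            return 0.0
        log_precisions.append(math.log(matches / total))

    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on a mat"
    print(bleu("the cat sat on a mat", reference))    # identical output
    print(bleu("the cat sat on the mat", reference))  # one word differs
    print(bleu("a feline rested there", reference))   # no shared 4-grams
```

The third call hints at why such metrics are not reliably granular: a short segment that shares no four-word sequence with the reference scores zero even if a human would accept it, which is one reason BLEU is usually reported over a whole test set rather than per sentence.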
Unlocking language resources
Two years ago TAUS shone a spotlight on a then closed and proprietary industry with its Localization Business Innovation White Paper. Major stakeholders responded with gusto, transforming the industry’s landscape irrevocably. Open standards and openness to connecting are now common practice. The success of Moses and the GlobalSight Initiative proves that open source is a viable business strategy. From the TAUS perspective, the agenda now moves from opening up translation platforms to unlocking the potential of shared language resources. Language data has largely moved from the desktop to the enterprise server, and is now moving to the cloud.
Megatrends
Paula Shannon outlined the megatrends of ubiquity and immediacy that motivated the creation of Lionbridge’s Translator Workspace and the partnership with IBM. A cloud-based Software-as-a-Service model and the potential to create customized MT engines using IBM’s technology are the two pillars that serve these megatrends. Integration with the TAUS Data Association’s super-cloud is planned for completion by the end of July.
Standards, sharing and growth
At last year’s TAUS Executive Forum in Edinburgh, participants’ imaginations were sparked by Lingotek’s introduction of social networking dynamics to the business of translation. Their platform also allows users to share translations for reuse in public or private (limited sharing) vaults. At this event Willem Stoeller drew a long breath before listing new partnerships and integrations for the Lingotek Collaborative Translation Platform. The list currently includes SharePoint, Drupal, Alfresco, social CRM systems (Jive, Lithium), Google, PROMT, Microsoft Bing, Language Weaver, and Moses in partnership with Pangea MT. Jeremy Harpham outlined the ways in which SDL is open: by being involved in setting standards and by connecting via APIs. David Filip of Moravia explained that metadata is important for creating ontologies to get the most out of shared language data once it moves to the cloud.