Dictionaries Conversion Framework Reduced Dictionary Conversion Time from 3 Months to One Week
Oxford’s Global Languages initiative will empower millions of people across the globe with digital lexical data in 100 of the world’s languages. Lexical information, held in a single linked repository, will be made available free of charge to speakers and learners, and will also be licensed for use in technology products and applications.
Initially, OUP performed all dictionary conversions in-house. The process was not unified and took a minimum of 3 weeks, in some cases as long as 3 months, depending on the initial data format. A new converter had to be created for each dictionary. In addition, 15-20% of the data was lost during conversion and had to be fixed afterwards.
Digiteum created a Dictionaries Conversion Framework (DCF) to ensure the quality and consistency of output data. It reduced the conversion time to 2-5 working days with 100% data accuracy.
Figuratively speaking, the creation of the Framework can be compared to the invention of the printing press, applied to dictionary conversion. It is a “one converter fits all dictionaries” framework that unifies the output lexical data regardless of its initial format and structure.
With 30 dictionaries converted so far, the project has moved into Phase 2, continuing OUP’s tradition of digital innovation.
- Dictionary data conversion from arbitrary input formats (e.g. XML, RTF, HTML, plain text) into specified output formats (e.g. LeXml, DTD6).
- One dictionary conversion takes 2 to 5 business days (compared with up to 3 months before the project started).
- 100% data accuracy, compared with 15-20% data loss previously.
- DCF-based conversion runs 10 times faster than the earlier XSL-based conversions.
- Over 30 dictionaries in 18 languages of Europe, Asia and Africa have been converted to date.
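To illustrate the “one converter fits all dictionaries” idea from the first result above, here is a minimal sketch in Python. It is not the DCF implementation: the `Entry` model, the `PARSERS` registry, the toy input layouts and the output element names are all hypothetical and do not reflect the real LeXml or DTD6 schemas. It only shows the general pattern of pairing a small format-specific parser with one shared serializer, so that every source format yields the same output structure.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Entry:
    """Format-neutral intermediate record (hypothetical model)."""
    headword: str
    definition: str


def parse_plain_text(raw: str) -> List[Entry]:
    """Parse lines of the assumed form 'headword: definition'."""
    entries = []
    for line in raw.splitlines():
        if ":" in line:
            headword, definition = line.split(":", 1)
            entries.append(Entry(headword.strip(), definition.strip()))
    return entries


def parse_xml(raw: str) -> List[Entry]:
    """Parse assumed <e hw="...">definition</e> elements under any root."""
    root = ET.fromstring(raw)
    return [Entry(e.get("hw", ""), (e.text or "").strip()) for e in root.iter("e")]


# One small parser per input format; adding a format means adding one entry here.
PARSERS: Dict[str, Callable[[str], List[Entry]]] = {
    "txt": parse_plain_text,
    "xml": parse_xml,
}


def convert(raw: str, input_format: str) -> str:
    """Parse with the format-specific parser, then serialize in one shared way."""
    entries = PARSERS[input_format](raw)
    root = ET.Element("dictionary")  # hypothetical output element names
    for entry in entries:
        node = ET.SubElement(root, "entry")
        ET.SubElement(node, "headword").text = entry.headword
        ET.SubElement(node, "definition").text = entry.definition
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    from_text = convert("cat: a small domesticated feline", "txt")
    from_xml = convert('<d><e hw="cat">a small domesticated feline</e></d>', "xml")
    print(from_text == from_xml)  # the two sources produce identical output
```

Because every parser targets the same intermediate model, output consistency is enforced in a single place (the serializer) rather than re-implemented in a new converter for each dictionary.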
CLIENT: Oxford University Press, Global Languages team
KEYWORDS: dictionaries conversion framework, Lexical Engine, computational linguistics, LeXml, DTD6.