Traditionally, TM tools have used the sentence as the basis for full and fuzzy matches. As long as a translation practice concentrates on revising and updating documents and products, the leveraging score of TM tools can be very high.
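To make sentence-level fuzzy matching concrete, here is a minimal sketch in Python using the standard library's `difflib.SequenceMatcher` for a character-based similarity ratio. The tiny in-memory TM and the 75% threshold are illustrative assumptions; commercial tools use far more sophisticated scoring and indexing.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory translation memory: source sentence -> target sentence
tm = {
    "Press the power button to start the device.":
        "Drücken Sie die Ein/Aus-Taste, um das Gerät zu starten.",
    "Save your changes before closing the file.":
        "Speichern Sie Ihre Änderungen, bevor Sie die Datei schließen.",
}

def best_fuzzy_match(sentence, memory, threshold=0.75):
    """Return the best (source, target, score) at or above threshold, else None."""
    best = None
    for src, tgt in memory.items():
        score = SequenceMatcher(None, sentence.lower(), src.lower()).ratio()
        if best is None or score > best[2]:
            best = (src, tgt, score)
    return best if best and best[2] >= threshold else None

# A revised sentence ("restart" instead of "start") still leverages the TM entry.
match = best_fuzzy_match("Press the power button to restart the device.", tm)
if match:
    print(f"{match[2]:.0%} fuzzy match: {match[1]}")
```

A near-identical revision scores well above the threshold and reuses the stored translation, while an unrelated sentence returns no match at all, which is exactly why revision-heavy workflows leverage so well.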
However, much of the market has changed. An increasing need for rapid-turnaround translation of smaller pieces of content has brought developers back to the design table. Their focus is not only the productivity of the translator, but also the agility of the enterprise.
They must rely on advanced statistical approaches, as already applied in statistical and hybrid MT systems, and they must bring sophisticated linguistic intelligence into the mix as well. They are not looking to leverage TM from a single document or project, but to use as much domain-specific text and data as possible. Advanced subsegment leveraging can further increase translation productivity. Harnessing big data sets is a core requirement.
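The idea behind subsegment leveraging can be sketched with a toy word-trigram index: instead of matching whole sentences, the TM is indexed at the phrase level so that fragments of a new sentence can be retrieved from any segment in a large data set. The corpus, trigram length, and retrieval logic here are illustrative assumptions; real systems rely on statistical alignment and much richer models.

```python
from collections import defaultdict

def ngrams(tokens, n):
    """All contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_subsegment_index(segments, n=3):
    """Map each word trigram to the IDs of TM segments containing it."""
    index = defaultdict(set)
    for seg_id, text in enumerate(segments):
        for gram in ngrams(text.lower().split(), n):
            index[gram].add(seg_id)
    return index

tm_sources = [
    "click the save button to store your settings",
    "the save button is disabled until you make a change",
]
index = build_subsegment_index(tm_sources)

# A new sentence shares only a fragment with the TM, yet still finds it.
query = "to store your settings press enter"
hits = set()
for gram in ngrams(query.lower().split(), 3):
    hits |= index.get(gram, set())
print(sorted(hits))
```

Even though no full sentence matches, the shared phrase "to store your settings" retrieves segment 0, illustrating how phrase-level indexing extracts value from large domain corpora that sentence-level matching would miss.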
Today, glossaries are built by terminologists: the best-in-class language specialists. It is laborious and frustrating work. Because language keeps changing, the terminologist is always behind and the glossary is often ignored. But, in fact, terminology can be harvested in real time by accessing high volumes of translation data. Synonyms and related terms can be identified automatically. Parts of speech can be tagged, contexts listed, sources quoted and meanings described. The technology to do this work in a largely automated fashion, with linguists and users involved as validators, already exists.
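A crude illustration of automated terminology harvesting: count stopword-trimmed n-grams across a corpus and surface the most frequent candidates for a human validator, mirroring the automated-harvest-plus-validation workflow described above. The stopword list, corpus, and frequency cutoff are simplifying assumptions; production systems add part-of-speech tagging, synonym detection and bilingual alignment.

```python
import re
from collections import Counter

# Assumption: a tiny stopword list; real systems use full linguistic resources.
STOPWORDS = {"the", "a", "an", "of", "to", "is", "are", "and", "in", "for",
             "before", "when", "every"}

def candidate_terms(corpus, max_len=3, min_freq=2):
    """Harvest frequent n-grams that neither start nor end with a stopword,
    as glossary candidates for a terminologist to validate."""
    counts = Counter()
    for sentence in corpus:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_freq]

corpus = [
    "Open the translation memory before starting the project.",
    "The translation memory stores every validated segment.",
    "Export the translation memory when the project ends.",
]
for term, freq in candidate_terms(corpus)[:5]:
    print(term, freq)
```

Here "translation memory" surfaces as the strongest candidate simply because it recurs across the data, which is the core intuition: high-volume translation data makes terms discoverable without a terminologist starting from scratch.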