When a 50% match isn't a 50% match?
Thread poster: Christopher Schröder
I did a rare CAT job today and noticed this:

Segment in TM: Hvis der udtages biologisk materiale til en forskningsbiobank: (English: "If biological material is taken for a research biobank:")
Segment to be translated: Samtykke til at udtage og opbevare biologisk materiale i forbindelse med forsøget (en forskningsbiobank) (English: "Consent to take and store biological material in connection with the trial (a research biobank)")

This came up as a 50% match. Since when do four words out of 14 make a 50% match? On the other hand, those four words do make up 50% of the segment in the TM. Is that what is happening? Is that normal?!

Report it to support | Nov 16, 2018
I would report it to the CAT tool provider's support. I don't give reductions below 75%, though, as fuzzy matches below that threshold are too unreliable (at least in MemoQ) and don't generally justify any reduction.

Lincoln Hui | Hong Kong | Member | Chinese to English
Most CAT tools don't even display matches below 60% by default. I sometimes adjust the threshold so that certain things actually have a better chance of showing up.

Roman Karabaev | Russian Federation | English to Russian
Well, it's normal nowadays | Nov 16, 2018
A screenshot from MemoQ. Zero words out of two make... a 68% match.
megane_wang | Spain | Member (2007) | English to Spanish
I don't think I ever trusted even an 80% match... | Nov 16, 2018
...can you imagine a "50%"? I don't care about them at all. CAT tools are a help, not THE ultimate tool (fortunately for us translators).
Ruth

Which CAT tool? | Nov 16, 2018
Out of curiosity, which CAT tool did you use? Mine gives a 33% match, also counting the word "til".

A program that clearly can't count | Nov 16, 2018
It's a large agency's own system. I got paid 100% for this, so that's not the issue. What bothers me is how a computer can possibly add up wrong... and how much else it might get wrong...

almost funny | Nov 16, 2018
Roman Karabaev wrote: A screenshot from MemoQ. Zero words out of two make... a 68% match.

(I)nstruction -> (co)nstruction... it's weird how your MemoQ "thinks"!
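The gap between a word-level and a character-level view is easy to see on the bare word pair. Below is a minimal sketch assuming plain Levenshtein edit distance normalised by the longer string's length; the function names are hypothetical, and this is not claimed to be MemoQ's formula (the thread doesn't say what that is), so it is not expected to reproduce the 68% figure.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when the characters match)
            ))
        prev = curr
    return prev[-1]


def char_similarity(a: str, b: str) -> float:
    """1 - distance / longer length: 1.0 for identical strings, lower as edits accumulate."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def word_similarity(a: str, b: str) -> float:
    """Share of distinct lowercased words common to both segments (Jaccard)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return 1.0 if not union else len(wa & wb) / len(union)


if __name__ == "__main__":
    print(levenshtein("instruction", "construction"))                 # 2
    print(round(char_similarity("Instruction", "construction"), 3))  # 0.833
    print(word_similarity("Instruction", "construction"))            # 0.0
```

On this pair the character score is 10/12 (about 83%) while the word score is zero: nothing is shared at the word level, yet almost everything is shared at the character level. Which of the two a tool leans on, and how it blends them, largely decides the number it displays.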
Configuration | Nov 16, 2018
Jean Dimitriadis wrote: Out of curiosity, which CAT tool did you use? Mine gives a 33% match, also counting the word "til".

Most CAT tools allow you to set your preferred minimum match rate. I personally set it to 75%; below that, it feels (at least in most cases) easier to just translate from scratch.

Samuel Murray | Netherlands | Member (2006) | English to Afrikaans
It's a bit of science and a bit of magic | Nov 16, 2018
Chris S wrote: I did a rare CAT job today and noticed this: ... This came up as a 50% match.

By character, 65% of the segment in the TM matches 40% of the segment in the source text. I believe CAT tools that can't do proper morphological stemming/tokenization may try to strike a balance between word matching and character matching. My own CAT tool, WFC, favours character matching when the segment is short, and word matching when the segment is long. This leads to things similar to Roman's construction/instruction.

Chris S wrote: Since when do four words out of 14 make a 50% match?

Is the proposed translation 50% useful to you? If yes, then it is a true 50% match. If not, then they didn't get the magic quite right, but magic isn't precise anyway.

Jean Dimitriadis wrote: Out of curiosity, which CAT tool did you use? Mine gives a 33% match, also counting the word "til".

Without any morphological analysis (i.e. default tokenizer, "language unknown"), OmegaT says it's below the match threshold (i.e. below 30%). With an English tokenizer, OmegaT says it's a 38% match. But with the Danish tokenizer, OmegaT says it's a 50% match.
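The competing denominators in this thread are easy to check by hand, but a few lines of code also make the effect of tokenization visible. This is a minimal sketch, not OmegaT's or any other tool's algorithm: it counts distinct shared words, and `toy_stem` is a hypothetical, deliberately crude suffix stripper standing in for a real Danish stemmer.

```python
import re

TM = "Hvis der udtages biologisk materiale til en forskningsbiobank:"
NEW = ("Samtykke til at udtage og opbevare biologisk materiale "
       "i forbindelse med forsøget (en forskningsbiobank)")

# Hypothetical suffix list for illustration only; a real Danish stemmer is far more careful.
SUFFIXES = ("ene", "et", "es", "e", "s")


def words(segment: str) -> list[str]:
    # \w is Unicode-aware in Python 3, so ø/æ/å stay inside words; punctuation is dropped.
    return re.findall(r"\w+", segment.lower())


def toy_stem(word: str) -> str:
    # Strip the first listed suffix that matches, keeping a stem of at least 3 letters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def overlap(tm_segment, new_segment, stem=lambda w: w, ignore=()):
    tm = [stem(w) for w in words(tm_segment)]
    new = [stem(w) for w in words(new_segment)]
    # Distinct shared words only; repeated words are counted once.
    shared = (set(tm) & set(new)) - set(ignore)
    n = len(shared)
    return {
        "shared": sorted(shared),
        "of_tm": n / len(tm),                  # share of the TM segment
        "of_new": n / len(new),                # share of the segment being translated
        "dice": 2 * n / (len(tm) + len(new)),  # symmetric measure over both segments
    }


if __name__ == "__main__":
    print(overlap(TM, NEW, ignore={"til"}))  # the four content words from the original post
    print(overlap(TM, NEW))                  # also counting "til"
    print(overlap(TM, NEW, stem=toy_stem))   # with crude stemming
```

Ignoring "til", the four shared words are exactly 50% of the eight-word TM segment but only about 29% (4/14) of the new one. Counting "til" raises the shared count to five, and the toy stemmer raises it to six because "udtages" and "udtage" both reduce to "udtag". The same pair of segments can therefore produce quite different percentages depending on the denominator and the tokenizer.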
[Edited at 2018-11-16 18:34 GMT]

Endre Both | Germany | English to German
Matches only start getting useful at 70-80% | Nov 16, 2018
As Thomas and Nadia have mentioned, it’s usually only somewhere above 70% that matches actually have a chance to be useful in the sense of saving any time at all compared to a fresh translation. Lower matches can help with terminology consistency (although there are better tools for that), but they don't make you quicker. It would be interesting to compare the development of match ratings in CAT tools over time. Unfortunately the calculation of match rates is a race to the bottom. For obvious reasons, agencies are interested in pushing match rates upwards until they are just this side of indefensible - higher matches are a windfall to them. Even translators who have the occasion and the inclination to compare CAT tools tend to assume that higher match rates equate to better TM leveraging, when in fact there is scant correlation between the two, the differences mostly boiling down to the audacity of the CAT tool’s marketing team. As displayed by the examples in this thread, they are getting pretty audacious.

DZiW (X) | Ukraine | English to Russian
culture-dependents: half-full is half-empty | Nov 16, 2018
Without dwelling too much on such "secret vendors' know-how" as hashes, checksums, shingles, clusters, vectors, Levenshtein distances, encoders, Soundex and other weird stuff, it's all just an attempt to obscure the fact that the very idea of "similar sentences" (let alone in different languages) is nothing but an expensive miscalculation. Little by little, modern approaches are moving toward per-language aggregation of structural [subject-predicate-object] parts, considering synonyms and weighting antonyms while sacrificing functional parts. A couple of years ago I was pleasantly surprised to watch a demonstration where an app analyzed simple, complex and compound sentences and could judge the similarity of the context, noting the antecedents (the meaning). However, even with a new or small TM I have never used a 50% fuzzy match, because I doubt that so many 'false positives' are of any use for speeding up the process.
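Of the techniques in that list, shingling is the easiest to show in isolation. A minimal sketch, assuming character trigrams compared with Jaccard similarity; the helper names are hypothetical and no vendor is claimed to use exactly this scheme.

```python
def shingles(text: str, n: int = 3) -> set[str]:
    """Overlapping character n-grams ("shingles") of a lowercased, whitespace-normalised string."""
    s = " ".join(text.lower().split())
    if len(s) < n:
        # Too short to cut into n-grams: treat the whole string as one shingle.
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def shingle_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two shingle sets: shared shingles / distinct shingles."""
    sa, sb = shingles(a, n), shingles(b, n)
    union = sa | sb
    return 1.0 if not union else len(sa & sb) / len(union)


if __name__ == "__main__":
    print(sorted(shingles("udtage")))               # ['age', 'dta', 'tag', 'udt']
    print(shingle_similarity("udtages", "udtage"))  # 0.8
    print(round(shingle_similarity("instruction", "construction"), 3))
```

Without any stemming, "udtages" and "udtage" share 4 of 5 distinct trigrams (a similarity of 0.8), which is one way character-based schemes partly compensate for missing morphological analysis, and also one way they produce surprising scores for unrelated words that merely look alike.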
Samuel Murray | Netherlands | Member (2006) | English to Afrikaans
Endre Both wrote: It’s usually only somewhere above 70% that matches actually have a chance to be useful in the sense of saving any time at all compared to a fresh translation. Lower matches can help with terminology consistency (although there are better tools for that), but they don't make you quicker.

I've had the opposite experience. Especially with lengthier segments, a low match would save me time if it involves a repeated phrase. I can recall several instances when my CAT tool yielded no match but the first result in a concordance search was something I would very much have liked to see suggested as a fuzzy match. This is particularly true for matches consisting of consecutive words. Here's a hypothetical example of such a non-match that would have saved time and sanity:

Segment 1: Thinking of your experience with Company X over the past 7 days, please rate the following on a scale of 1 to 10:
Segment 2: Thinking of your experience with Company X over the past 7 days, and considering how the company's Y compares with that of other companies mentioned in question Z, please tell in your own words how satisfied you were with the following:
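Samuel's two segments share a run of twelve consecutive words, yet that run is a small part of the longer segment, which is why a whole-segment fuzzy score can rate the pair low. A minimal sketch of fragment detection, assuming simple whitespace tokenisation with punctuation left attached; `longest_common_run` is a hypothetical helper, not any CAT tool's API.

```python
def tokens(segment: str) -> list[str]:
    # Whitespace split; punctuation stays attached ("days," is one token).
    return segment.lower().split()


def longest_common_run(a: list[str], b: list[str]) -> list[str]:
    """Longest sequence of consecutive words appearing in both lists (DP over word pairs)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the run ending at a[i-1] / b[j-1]
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


SEG1 = ("Thinking of your experience with Company X over the past 7 days, "
        "please rate the following on a scale of 1 to 10:")
SEG2 = ("Thinking of your experience with Company X over the past 7 days, "
        "and considering how the company's Y compares with that of other companies "
        "mentioned in question Z, please tell in your own words how satisfied "
        "you were with the following:")

if __name__ == "__main__":
    a, b = tokens(SEG1), tokens(SEG2)
    run = longest_common_run(a, b)
    print(" ".join(run))
    print(f"{len(run)} words; {len(run) / max(len(a), len(b)):.0%} of the longer segment")
```

This prints the twelve-word opening and reports that it covers about 29% of the 41-word segment. A tool that only scores whole segments may hide such a pair, while a fragment or subsegment lookup would surface the shared phrase.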
The helpdesk tells me the reason this shows as a 50% match is that otherwise this 30% match wouldn't show up as a match at all. In other words, they're trying to help me, not rip me off. This seems reasonable. The construction/instruction thing made me laugh/cry, though.

Lincoln Hui | Hong Kong | Member | Chinese to English
Samuel Murray wrote: ... I've had the opposite experience. ... This is particularly true for matches consisting of consecutive words. ...

I think we call them fragments rather than matches. Most CAT tools definitely use them, and I often wish they were more robust at detecting them.