Meta's Omnilingual MT for 1,600 Languages

(ai.meta.com)

77 points | by j0e1 3 days ago

11 comments

ks2048 1 minute ago
Meta released No Language Left Behind (NLLB) [1], I think in 2022. I wonder why this is not "NLLB 2.0"? These companies love introducing new names to confuse things.
[1] https://ai.meta.com/research/no-language-left-behind/
ks2048 5 minutes ago
I'll be looking at this in detail. I've started a company to do similar things, https://6k.ai
I'm currently concentrating on better data gathering for low-resource languages.
When you look in detail at data like Common Crawl, finepdfs, and fineweb, (1) they are missing a lot of quality data sources that exist if you know where to look, and (2) the sources they do have are not processed "finely" enough (e.g. finepdfs classifies each page of a PDF as a single language, whereas many language-learning sources contain language pairs, etc.).
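To make point (2) concrete, here is a minimal, hypothetical sketch (not finepdfs's actual pipeline) that uses Unicode script as a crude stand-in for a real language-ID model, assuming a Khmer/English phrasebook page. The function names and the sample page are made up for illustration. Labelling the whole page collapses it to one language, while labelling line by line keeps the parallel pairs usable as translation data.

```python
from collections import Counter

# Khmer Unicode block, U+1780..U+17FF (letters, vowel signs, digits)
KHMER_BLOCK = range(0x1780, 0x1800)


def line_script(line: str) -> str:
    """Crude stand-in for language ID: 'khmer', 'latin', or 'other' by majority script."""
    khmer = sum(1 for ch in line if ord(ch) in KHMER_BLOCK)
    latin = sum(1 for ch in line if ch.isascii() and ch.isalpha())
    if khmer == 0 and latin == 0:
        return "other"
    return "khmer" if khmer >= latin else "latin"


def page_label(lines: list[str]) -> str:
    """Page-level labelling: the whole page gets its most common line label."""
    counts = Counter(line_script(l) for l in lines if l.strip())
    return counts.most_common(1)[0][0]


def extract_pairs(lines: list[str]) -> list[tuple[str, str]]:
    """Line-level labelling: keep each Khmer line followed directly by a Latin-script line."""
    tagged = [(line_script(l), l.strip()) for l in lines if l.strip()]
    return [
        (a, b)
        for (s1, a), (s2, b) in zip(tagged, tagged[1:])
        if s1 == "khmer" and s2 == "latin"
    ]


# A made-up phrasebook page: one Khmer heading, then Khmer/English pairs.
SAMPLE_PAGE = ["មេរៀនទី១", "សួស្តី", "Hello", "អរគុណ", "Thank you"]

print(page_label(SAMPLE_PAGE))     # one label for the page; the English half is lost
print(extract_pairs(SAMPLE_PAGE))  # the parallel Khmer/English pairs are recovered
```

A real pipeline would swap `line_script` for a proper language identifier and would also need to handle layouts where the pairs sit side by side in columns rather than on consecutive lines.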
stingraycharles 3 hours ago
I find that Meta’s translations are very poor compared to others, at least for relatively obscure languages, which I figured was relevant given the article.
Google Translate is a good default, but LLMs are really good at translations, as they’re better at understanding context and providing culturally appropriate translations.
(I live in Cambodia where they speak Khmer)
- ks2048 11 minutes ago
  So, LLMs are noticeably better in Khmer than Google Translate? I wonder why Google Translate doesn't use Gemini under the hood. Perhaps it's more prone to hallucinations.
  I'm interested in finding some thorough testing of translations on different LLMs vs. translation APIs.
  - pattilupone 5 minutes ago
    There's a dropdown on Google Translate that lets you choose "Advanced" mode or "Classic" mode. Advanced mode uses Gemini but it's only available for select languages.
- djsamseng 2 hours ago
  Hello from Siem Reap, Cambodia! Awesome to see a fellow tech enthusiast from Cambodia.
  I actually found Facebook’s translations pretty good (better than Google Translate for anything longer than a sentence). From my understanding of Khmer, it is a bit more verbose and context-dependent, so LLMs would be a big help in understanding those nuances.
  In the inverse case (LLMs generating Khmer from English), I heard from locals that it sounds formal and “robotic”, which I found quite interesting.
- pseudocomposer 1 hour ago
  Kagi Translate is fantastic. Multilingual support is honestly one of the best things about LLMs, imo.
- yellow_lead 1 hour ago
  It's not even good for Chinese
- smallerize 3 hours ago
  *they're
  (Sorry I had to)
  - stingraycharles 3 hours ago
    I could have sworn I edited it! I did notice it myself as well, but thanks for the correction.
  - tomrod 2 hours ago
    *ពួកគេគឺជា (Khmer for "they are")
djoldman 1 hour ago
Just spent a long time trying to find where you can download any of these weights.
Is it open weight? If so, why isn't there just a straight link to the models?
garyclarke27 1 hour ago
They can translate 1600 languages, but they cannot do basic text formatting, where are the paragraphs?
psychoslave 3 hours ago
That's a high count, but still some way from "Omni". The usual count is between 4k and 8k, depending on the source. But the first 1k might be the hardest, certainly.
- simultsop 1 hour ago
  When you market, you use "frontier" and "edge" terms, so it sounds "pro max".
croes 3 hours ago
Off topic, but since the AI craze, MS’ documentation translation has ridiculous errors, like translating the try/catch keywords to "versuchen" and "fangen" on German pages.
- Tarq0n 2 hours ago
  Yes, their translations offer negative value, which is annoying because at work you usually can't choose your locale settings.
  And the errors are really basic, like translating "shortly" as "short". Not the same thing at all!
bikeshaving 1 hour ago
I’m very wary of celebrating Meta’s language work when the company was credibly found to have contributed to the genocide against the Rohingya in Myanmar, and separately, to human rights abuses against Tigrayans during the conflict in northern Ethiopia. Be careful whose sins you’re laundering.
https://www.amnesty.org/en/latest/news/2025/02/meta-new-poli... https://www.amnesty.org/en/latest/news/2023/10/meta-failure-...
- 0x3f 1 hour ago
  Do you also boycott Toyota for the Hilux?
  - bikeshaving 1 hour ago
    I don’t own a car :)