Key Points: IBM releases Granite Embedding Multilingual R2 with two new open-source embedding models (97M and 311M parameters). The 97M model leads all sub-100M multilingual embedders. Both models support 200+ languages, a 32K-token context window, and code retrieval for 9 programming languages.
IBM introduces two new multilingual embedding models based on ModernBERT. The compact 97-million-parameter model outperforms all open sub-100M multilingual embedders on the MTEB benchmark, while the 311-million model ranks among the best open-source models under 500M parameters.
The IBM Granite team has released Granite Embedding Multilingual R2, the successor to its R1 embedding models. Both models are available under the Apache 2.0 license.
The compact 97-million-parameter model achieves an MTEB multilingual retrieval score of 60.3, the highest among open multilingual embedders with fewer than 100 million parameters. The larger 311-million-parameter model achieves a score of 65.2, making it the second-best open model under 500 million parameters.
Both models support over 200 languages and were specifically tuned for 52 of them. A distinctive feature is the significantly extended context window of 32,768 tokens (32K), a 64-fold increase compared to the previous R1 version. The models also support code retrieval across nine programming languages.
Both models are trained with Matryoshka embeddings, so their output vectors can be truncated to fewer dimensions, trading some accuracy for lower storage cost and faster search. The models were developed with enterprise requirements in mind, and their integration options are documented for framework developers.
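To make the Matryoshka idea concrete, here is a minimal sketch in Python of the usual way such embeddings are shortened: keep the first k dimensions and rescale the result to unit length. The 8-dimensional vectors are made-up toy values, and the helper names `truncate_embedding` and `cosine` are hypothetical; none of this is taken from IBM's code or API, and the dimensions the Granite models actually support are not stated here.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for model output (assumed values, not real embeddings).
query = [0.9, 0.1, -0.3, 0.2, 0.05, -0.02, 0.01, 0.03]
doc = [0.8, 0.2, -0.25, 0.1, -0.04, 0.03, 0.02, -0.01]

# Similarity using all 8 dimensions versus only the first 4.
full_score = cosine(query, doc)
short_score = cosine(truncate_embedding(query, 4), truncate_embedding(doc, 4))

print(f"full: {full_score:.4f}  truncated: {short_score:.4f}")
```

With a model trained for Matryoshka embeddings, the truncated score is expected to rank documents similarly to the full score, which is what makes the smaller vectors usable for cheaper indexes. How much accuracy is lost at a given size depends on the model and should be measured on your own data.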