
Data-Centric Method Boosts Multilingual E-commerce Search

A new approach improves multilingual e-commerce search by refining the training data rather than modifying the model, which makes search results more relevant.


Researchers have made significant strides in enhancing multilingual e-commerce search performance. A data-centric approach, introduced by Yabo Yin et al., has shown promising results in improving search relevance. This method focuses on refining the data used to train language models, rather than relying solely on complex model modifications.

The approach combines several techniques: translation-based data augmentation to enrich multilingual datasets, semantic negative sampling to improve search relevance, and self-validation filtering to remove inaccurate labels. This method has been evaluated on the CIKM AnalytiCup 2025 dataset, demonstrating consistent improvements in F1 scores for both query-category and query-item tasks.
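The article does not describe how these steps are implemented. The following is a minimal, hypothetical Python sketch of what translation-based augmentation, one common form of semantic negative sampling (hard negatives chosen by embedding similarity), self-validation filtering, and binary F1 could look like. It assumes toy hand-made 2-D embeddings instead of a multilingual encoder, a dictionary lookup instead of a machine-translation model, and a stub classifier. The function names, the 0.95 similarity cap, and the 0.9 confidence threshold are illustrative assumptions, not the authors' method.

```python
import math

# Hypothetical sketch only. Assumptions (not from the paper): toy 2-D
# embeddings instead of a multilingual encoder, a dictionary lookup instead
# of machine translation, and a stub classifier. Names and thresholds are
# illustrative.


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def translate_augment(pairs, translate):
    """Translation-based augmentation: append a translated copy of each
    (query, item, label) example, reusing the original label."""
    return list(pairs) + [(translate(q), item, label) for q, item, label in pairs]


def semantic_negatives(query_vec, item_vecs, positives, k, max_sim=0.95):
    """Hard-negative sampling: return the k non-positive items most similar
    to the query, skipping near-duplicates (similarity >= max_sim) that may
    be unlabeled positives."""
    scored = [
        (cosine(query_vec, vec), item)
        for item, vec in item_vecs.items()
        if item not in positives
    ]
    scored = [pair for pair in scored if pair[0] < max_sim]
    scored.sort(reverse=True)  # most similar first
    return [item for _, item in scored[:k]]


def self_validate(examples, predict_proba, threshold=0.9):
    """Self-validation filtering: drop examples whose label the model
    contradicts with probability >= threshold."""
    kept = []
    for query, item, label in examples:
        p_relevant = predict_proba(query, item)
        p_label_wrong = p_relevant if label == 0 else 1.0 - p_relevant
        if p_label_wrong < threshold:
            kept.append((query, item, label))
    return kept


def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Usage with toy data
toy_dictionary = {"red shoes": "zapatos rojos"}
augmented = translate_augment(
    [("red shoes", "item_1", 1)], lambda q: toy_dictionary.get(q, q)
)
# augmented == [("red shoes", "item_1", 1), ("zapatos rojos", "item_1", 1)]

item_vecs = {"item_1": [1.0, 0.0], "item_2": [0.6, 0.8], "item_3": [0.0, 1.0]}
negatives = semantic_negatives([1.0, 0.0], item_vecs, positives={"item_1"}, k=1)
# negatives == ["item_2"]


def stub_model(query, item):
    """Stand-in classifier: confidently rejects item_9, accepts the rest."""
    return 0.02 if item == "item_9" else 0.8


clean = self_validate([("q", "item_1", 1), ("q", "item_9", 1)], stub_model)
# clean == [("q", "item_1", 1)]

score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
# score == 0.5
```

In this sketch, the similarity cap in `semantic_negatives` is a guard against picking items so close to the query that they are probably relevant but unlabeled. The filtering threshold trades recall of training data against label noise.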

Related work is also advancing. Frameworks such as CSRM-LLM and LREF are being developed to apply multilingual large language models (LLMs) to e-commerce search, and techniques such as PagedAttention make LLM serving more memory-efficient. These advances, together with the data-centric approach, have outperformed existing language-model-based search systems on a large e-commerce dataset.
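PagedAttention, introduced with the vLLM serving system, stores each sequence's attention key-value (KV) cache in fixed-size blocks that need not be contiguous in memory, with a per-sequence block table mapping logical positions to physical blocks. The toy Python class below illustrates only that bookkeeping idea under simplifying assumptions: it stores no tensors, and the class and method names are hypothetical, not vLLM's API.

```python
class ToyPagedKVCache:
    """Conceptual paged KV-cache allocator (hypothetical; not vLLM's API).

    Tracks which physical block and offset each new token's keys/values
    would be written to; no tensors are stored."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}  # seq_id -> number of tokens cached

    def append_token(self, seq_id: str) -> tuple[int, int]:
        """Reserve a slot for one new token; return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % self.block_size == 0:  # no block yet, or last block full
            if not self.free_blocks:
                raise MemoryError("KV cache has no free blocks")
            table.append(self.free_blocks.pop())  # allocate on demand
        self.seq_lens[seq_id] = length + 1
        return table[-1], length % self.block_size

    def free(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


# Usage: four physical blocks, two tokens per block
cache = ToyPagedKVCache(num_blocks=4, block_size=2)
slots = [cache.append_token("seq_a") for _ in range(3)]
# slots == [(3, 0), (3, 1), (2, 0)]: seq_a spans two non-contiguous blocks
cache.append_token("seq_b")  # (1, 0)
cache.free("seq_a")  # blocks 3 and 2 return to the pool for reuse
```

Because blocks are allocated only as tokens arrive and returned when a sequence finishes, no memory is reserved up front for each request's maximum length. Avoiding that over-reservation is the source of the savings PagedAttention targets.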

In short, the data-centric approach has proven effective at improving relevance in multilingual e-commerce search. By addressing challenges such as the need for accurate machine translation, code-switching (queries that mix languages), and relevance matching, it offers a practical solution for multilingual e-commerce search systems.
