Search engines drive product discovery in e-commerce, but traditional keyword-based search often fails to understand context, synonyms, and user intent. While working on a product search system for an e-commerce services company, we faced challenges where users couldn’t find relevant products due to limitations in exact-match keyword searches. To solve this, we built a Hybrid Search solution that combines Lexical Search (Elasticsearch) and Semantic Search (FAISS) with AI-powered ranking (LLMs), delivering faster, more accurate, and context-aware search results.

This article explains how we implemented the first cut of Hybrid Search to enhance e-commerce product discovery.


🔹 Why Hybrid Search?

Traditional keyword-based search (BM25 in Elasticsearch) is great for exact text matches but struggles with:

Synonyms & Variations → “Cheap sneakers” vs. “Affordable running shoes”

Context & Intent → Searching for “wireless headphones” might miss “Bluetooth earbuds”

Ranking Relevance → The best results don’t always appear first

🔹 Hybrid Search fixes this by merging:

Lexical Search (Elasticsearch BM25) → Matches exact words

Semantic Search (FAISS Vectors) → Matches meaning & context

AI Re-Ranking (Cross-Encoders/LLMs) → Ensures best results appear first

Let’s explore how we built this system. 🚀


🔹 1️⃣ Lexical Search with Elasticsearch (Keyword-Based Matching)

Elasticsearch uses BM25 scoring to rank search results based on keyword relevance.

✔️ Fast retrieval

✔️ Works well for structured data

❌ Fails when the query uses different words than the product text (synonyms, paraphrases)
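To make the lexical side concrete, here is a minimal sketch of a search body that Elasticsearch would score with BM25, its default similarity. The field names `title` and `description` and the `^2` title boost are assumptions for illustration, not values taken from our production mapping.

```python
def build_bm25_query(query_text, size=10):
    """Build an Elasticsearch search body for a keyword (BM25-scored) match."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": query_text,
                # Hypothetical field names; "^2" weights title matches higher.
                "fields": ["title^2", "description"],
            }
        },
    }


body = build_bm25_query("wireless headphones", size=5)
```

The resulting dictionary can be sent to the `_search` endpoint of a product index. Because the match is term-based, a product titled “Bluetooth earbuds” shares no terms with this query and would not be returned.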


🔹 2️⃣ Semantic Search with FAISS (Vector-Based Matching)

FAISS (Facebook AI Similarity Search) enables fast Approximate Nearest Neighbor (ANN) search by storing product embeddings as high-dimensional vectors.

How We Store Products as Vectors

Convert product titles & descriptions into vector embeddings using multi-qa-mpnet-base-dot-v1 (optimized for Q&A & search).

Store embeddings in FAISS for quick similarity matching.

✔️ Captures meaning & intent

✔️ Finds relevant products even when words differ

❌ Can be slower than BM25 on large datasets
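The idea behind the vector lookup can be sketched without any dependencies. The snippet below is a plain-Python stand-in, not FAISS itself: it runs an exact inner-product search over toy 3-dimensional vectors, which is the same comparison an exact inner-product FAISS index (`IndexFlatIP`) performs over real embeddings. The product IDs and vector values are invented for illustration; real embeddings from multi-qa-mpnet-base-dot-v1 have 768 dimensions.

```python
import heapq


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_by_inner_product(query_vec, product_vecs, k=3):
    """Return the k (product_id, score) pairs with the largest inner product."""
    scored = ((pid, dot(query_vec, vec)) for pid, vec in product_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy embeddings, invented for illustration only.
product_vecs = {
    "bluetooth-earbuds": [0.9, 0.1, 0.0],
    "wireless-headphones": [0.8, 0.3, 0.1],
    "running-shoes": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of "wireless headphones"

hits = top_k_by_inner_product(query_vec, product_vecs, k=2)
```

Both audio products land ahead of the shoes even though “bluetooth-earbuds” shares no words with the query, which is exactly the gap that keyword search leaves open.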


🔹 3️⃣ Merging Lexical & Semantic Search (Hybrid Ranking)

After retrieving results from both Elasticsearch (BM25) and FAISS (Semantic Search), we merge them using weighted scoring.

✔️ Balances exact match & contextual relevance

✔️ Boosts results appearing in both BM25 & FAISS searches

❌ Doesn’t guarantee the best ranking order yet (solved in the next step)
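One way to implement weighted scoring is sketched below. BM25 scores and vector similarities live on different scales, so this sketch first min-max normalizes each result set to [0, 1] and then blends them. The normalization step, the 0.5 default weight, and the sample scores are all assumptions for illustration, not the exact values from our system.

```python
def min_max(scores):
    """Scale a {doc_id: score} dict to [0, 1]; all-equal scores map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_scores(bm25_scores, vector_scores, lexical_weight=0.5):
    """Blend normalized lexical and semantic scores into one ranked list."""
    bm25_n, vec_n = min_max(bm25_scores), min_max(vector_scores)
    merged = {}
    for doc in set(bm25_n) | set(vec_n):
        # A document missing from one result set contributes 0 for that side.
        merged[doc] = (lexical_weight * bm25_n.get(doc, 0.0)
                       + (1 - lexical_weight) * vec_n.get(doc, 0.0))
    return sorted(merged.items(), key=lambda pair: pair[1], reverse=True)


# Made-up scores for illustration.
bm25_scores = {"wireless-headphones": 7.2, "bluetooth-speaker": 3.1}
vector_scores = {
    "wireless-headphones": 0.86,
    "bluetooth-earbuds": 0.92,
    "bluetooth-speaker": 0.40,
}
ranked = merge_scores(bm25_scores, vector_scores)
```

Here “wireless-headphones” comes out on top because it scores well in both lists, while “bluetooth-earbuds” is carried by the semantic side alone. One quirk of min-max normalization is that the lowest-scoring document in each list is mapped to 0, so a product at the bottom of both lists ends up with a merged score of 0.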


🔹 4️⃣ AI-Powered Re-Ranking with Cross-Encoders (LLMs)

Even after merging Elasticsearch & FAISS, the results may still be misranked. To fix this, we re-rank results using a pre-trained cross-encoder model (cross-encoder/ms-marco-MiniLM-L-12-v2).

How AI Re-Ranking Works

Each result is paired with the query and passed through the cross-encoder.

The model predicts a relevance score for each query-product pair.

We normalize scores using Min-Max Scaling to get a consistent ranking.

✔️ Ensures most relevant products rank highest

✔️ Filters out low-relevance results

❌ Adds extra compute cost, since every query-product pair goes through the model
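The re-ranking steps above can be sketched as follows. To keep the example runnable without downloading a model, the scorer is passed in as a function, and `toy_score` is a hypothetical word-overlap stand-in for the cross-encoder. In a real setup, the raw scores would come from the model (for instance, `CrossEncoder.predict` in the sentence-transformers library accepts a list of query-text pairs). The 0 to 10 output range is also an assumption.

```python
def rerank(query, candidates, score_pair, scale=10.0):
    """Score each (query, candidate) pair, min-max scale, sort best first."""
    if not candidates:
        return []
    raw = [score_pair(query, text) for text in candidates]
    lo, hi = min(raw), max(raw)
    span = hi - lo
    # If every raw score is identical, give all candidates the top value.
    scaled = [scale if span == 0 else scale * (r - lo) / span for r in raw]
    return sorted(zip(candidates, scaled), key=lambda pair: pair[1], reverse=True)


def toy_score(query, text):
    """Hypothetical stand-in scorer: fraction of query words found in the text."""
    q_words = set(query.lower().split())
    return len(q_words & set(text.lower().split())) / len(q_words)


candidates = ["Wireless Mouse", "Running Shoes", "Wireless Headphones"]
results = rerank("wireless headphones", candidates, toy_score)
```

Passing the scorer in as a parameter keeps the scaling and sorting logic independent of the model, so the model can be swapped without touching the ranking code.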


🚀 Test Drive: Running the Hybrid Search

✅ Search Results

🔍 What These Results Mean

1️⃣ “Wireless Headphones” ranks at the top with a perfect score (~10) because it matches both keywords (“headphones”) and semantic meaning (“Bluetooth”).

2️⃣ “Bluetooth Speaker” appears second because it is related to the query (“Bluetooth”), but it is not a direct match for “headphones”, hence the lower score (~6.48).

Key Takeaways from the Test Drive

✔️ Keyword-based and semantic matches are combined for smarter search

✔️ AI Re-Ranking ensures the best product appears first

✔️ Search is highly relevant, filtering out irrelevant products


🚀 Final Benefits of AI-Powered Hybrid Search

Combines BM25 (Elasticsearch) + FAISS (Vectors) for better retrieval

Dynamic weighted scoring balances exact and contextual matches

AI-powered re-ranking ensures best results appear first

Scales efficiently with FAISS inverted-file (IVF) indexing

This AI-powered search system provides a smarter, more efficient way to handle product discovery in e-commerce. 🚀🔥

The complete working Hybrid Search implementation is available on GitHub: 🔗 fistix/ai

Would you like to integrate this into your SaaS product or optimize further? Let’s discuss! 
