Metarank ESCI search demo

This is a demo of hybrid search over the ESCI/ESCI-S datasets, with final re-ranking done by Metarank. To run it locally, see github.com/metarank/demo2

Supported retrieval methods:

  • BM25 title: a typical ES search request over the `title` field.
  • BM25 title, bullets, desc: ES search over the `title`, `bullets` and `desc` fields without field boosting.
  • all-MiniLM-L6-v2: approximate kNN vector search in ES over embeddings generated with the sentence-transformers/all-MiniLM-L6-v2 model.
  • esci-MiniLM-L6-v2: approximate kNN vector search in ES over embeddings generated with metarank/esci-MiniLM-L6-v2, a custom model fine-tuned on the ESCI dataset (see the retrieval sketch after this list).
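
A minimal sketch of the two retrieval modes, assuming a local Elasticsearch instance with an index named `esci`, text fields `title`, `bullets` and `desc`, and a dense-vector field `title_embedding`. The index and field names here are assumptions for illustration, not taken from the demo code.

```python
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

query = "wireless headphones"

# BM25 retrieval: multi_match over title, bullets and desc, no field boosting.
bm25_hits = es.search(
    index="esci",
    query={"multi_match": {"query": query, "fields": ["title", "bullets", "desc"]}},
    size=20,
)

# Vector retrieval: approximate kNN over MiniLM embeddings of the same query.
knn_hits = es.search(
    index="esci",
    knn={
        "field": "title_embedding",
        "query_vector": encoder.encode(query).tolist(),
        "k": 20,
        "num_candidates": 100,
    },
)
```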

Supported re-ranking methods:

  • BM25 with optimal boosts: a LambdaMART model over separate per-field BM25 scores, similar to what ES-LTR does.
  • Cross-encoder: ce-msmarco-MiniLM-L6: the cross-encoder sentence-transformers/ms-marco-MiniLM-L-6-v2, trained on the MS MARCO dataset (see the re-ranking sketch after this list).
  • Cross-encoder: ce-esci-MiniLM-L12: a custom cross-encoder, metarank/ce-esci-MiniLM-L12-v2, fine-tuned on the ESCI dataset.
  • LambdaMART: BM25, metadata: LambdaMART over all per-field BM25 scores and all document metadata fields from the ESCI-S dataset.
  • LambdaMART: esci-MiniLM-L6-v2, BM25, metadata: LambdaMART over all ranking features except cross-encoder scores, for the sake of performance.
  • LambdaMART: ce-esci-MiniLM-L12, esci-MiniLM-L6-v2, BM25, metadata: LambdaMART over all ranking features; can be quite slow.
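
A minimal sketch of the cross-encoder re-ranking step using the sentence-transformers CrossEncoder API. The HuggingFace model id `cross-encoder/ms-marco-MiniLM-L-6-v2` and the toy candidate titles below are assumptions for illustration; in the demo the candidates come from one of the retrieval methods above and the re-ranking is orchestrated by Metarank.

```python
from sentence_transformers import CrossEncoder

# Assumed HuggingFace id for the MS MARCO cross-encoder named in the list above.
ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "wireless headphones"
# Hypothetical candidate titles; in the demo these come from the ES retrieval step.
candidates = [
    "Wireless Bluetooth Over-Ear Headphones",
    "Wired In-Ear Earbuds with Microphone",
    "Headphone Stand, Aluminium",
]

# Score each (query, title) pair and sort candidates by descending relevance.
scores = ce.predict([(query, title) for title in candidates])
reranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
for score, title in reranked:
    print(f"{score:.3f}  {title}")
```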