Eval Runs
Each row is one evaluation run: a search backend (lexical, vector, or a hybrid fusion config) scored against the same 323 benchmark queries. The columns are quality scores (higher = better) and response time (lower = faster). Pick two runs below to compare them query-by-query.
What do these scores mean?
Each backend is scored against NFCorpus, a benchmark of 323 medical search queries, where humans have judged which documents are actually relevant to each query. Every quality score runs from 0 to 1 and higher is better; for latency, lower is better.
- nDCG@10: Overall quality of the top-10 ranking. It rewards putting more-relevant documents nearer the top. This is the headline metric.
- precision@10 (P@k): Of the 10 results shown, the fraction that are relevant.
- recall@10: Of all the relevant documents that exist for a query, the fraction that made it into the top 10.
- MRR: How high the first relevant result lands, on average (1.0 = always at rank 1).
- p50 / p95 ms: Response time: the median (p50) and the slow-tail 95th percentile (p95), in milliseconds.
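The definitions above can be sketched in a few lines of Python. This is an illustration only, not the code that produced the table: it assumes graded judgments stored as a `{doc_id: grade}` dict, linear gain for nDCG (some tools use `2**grade - 1` instead), and a nearest-rank percentile. All function names are hypothetical.

```python
import math

def dcg(gains):
    # Gain at 1-based rank r is discounted by log2(r + 1).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranked_ids, judgments, k=10):
    # Assumption: linear gain; the ideal ranking sorts all judged grades descending.
    gains = [judgments.get(d, 0) for d in ranked_ids[:k]]
    ideal = dcg(sorted(judgments.values(), reverse=True)[:k])
    return dcg(gains) / ideal if ideal > 0 else 0.0

def precision_at_k(ranked_ids, judgments, k=10):
    # Fraction of the k result slots filled by a relevant document.
    hits = sum(1 for d in ranked_ids[:k] if judgments.get(d, 0) > 0)
    return hits / k

def recall_at_k(ranked_ids, judgments, k=10):
    # Fraction of all relevant documents that appear in the top k.
    relevant = {d for d, g in judgments.items() if g > 0}
    if not relevant:
        return 0.0
    return sum(1 for d in ranked_ids[:k] if d in relevant) / len(relevant)

def reciprocal_rank(ranked_ids, judgments):
    # 1 / rank of the first relevant result; 0 if none was returned.
    for rank, d in enumerate(ranked_ids, start=1):
        if judgments.get(d, 0) > 0:
            return 1.0 / rank
    return 0.0

def percentile(values, p):
    # Nearest-rank percentile (an assumption; other definitions interpolate).
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Toy query: d1 is highly relevant, d2 and d5 are relevant, d5 was never retrieved.
ranked = ["d3", "d1", "d7", "d2"]
judgments = {"d1": 2, "d2": 1, "d5": 1}
print(ndcg_at_k(ranked, judgments, k=4))
print(precision_at_k(ranked, judgments, k=4))
print(recall_at_k(ranked, judgments, k=4))
print(reciprocal_rank(ranked, judgments))

# Toy latencies in milliseconds for five queries.
latencies_ms = [100, 120, 110, 900, 130]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))
```

A run-level score would then be the mean of the per-query values over all queries; MRR is the mean of `reciprocal_rank`.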
| id | backend | k | model | fusion | nDCG | P@k | recall | MRR | p50 ms | p95 ms | n |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | hybrid | 10 | BAAI/bge-small-en-v1.5 | rrf k=10 | 0.3384 | 0.2529 | 0.1711 | 0.5281 | 1219.7 | 1688.6 | 323 |
| 19 | hybrid | 10 | BAAI/bge-small-en-v1.5 | weighted α=0.3 | 0.3557 | 0.2613 | 0.1719 | 0.5501 | 1210.8 | 1704.9 | 323 |
| 18 | hybrid | 10 | BAAI/bge-small-en-v1.5 | weighted α=0.5 | 0.3412 | 0.2573 | 0.1749 | 0.5196 | 1189.5 | 1743.3 | 323 |
| 17 | hybrid | 10 | BAAI/bge-small-en-v1.5 | rrf k=60 | 0.3389 | 0.2542 | 0.1706 | 0.5293 | 1170.7 | 1650.4 | 323 |
| 16 | vector | 10 | BAAI/bge-small-en-v1.5 | - | 0.3428 | 0.2554 | 0.1618 | 0.5272 | 1067.2 | 1599.3 | 323 |
| 15 | lexical | 10 | - | - | 0.2235 | 0.1517 | 0.0967 | 0.4112 | 124.3 | 232.0 | 323 |
| 14 | hybrid | 10 | BAAI/bge-small-en-v1.5 | weighted α=0.5 | 0.3412 | 0.2573 | 0.1749 | 0.5196 | 1140.4 | 1600.1 | 323 |
| 13 | hybrid | 10 | BAAI/bge-small-en-v1.5 | rrf k=60 | 0.3389 | 0.2542 | 0.1706 | 0.5293 | 1147.4 | 1748.1 | 323 |
| 12 | vector | 10 | BAAI/bge-small-en-v1.5 | - | 0.3428 | 0.2554 | 0.1618 | 0.5272 | 1195.6 | 1633.9 | 323 |
| 11 | lexical | 10 | - | - | 0.2235 | 0.1517 | 0.0967 | 0.4112 | 115.6 | 301.0 | 323 |
| 10 | hybrid | 10 | BAAI/bge-small-en-v1.5 | weighted α=0.5 | 0.3412 | 0.2573 | 0.1749 | 0.5196 | 1257.9 | 1683.0 | 323 |
| 9 | hybrid | 10 | BAAI/bge-small-en-v1.5 | rrf k=60 | 0.3389 | 0.2542 | 0.1706 | 0.5293 | 1309.9 | 1733.1 | 323 |
| 8 | vector | 10 | BAAI/bge-small-en-v1.5 | - | 0.3428 | 0.2554 | 0.1618 | 0.5272 | 1069.2 | 1608.0 | 323 |
| 7 | lexical | 10 | - | - | 0.2235 | 0.1517 | 0.0967 | 0.4112 | 136.2 | 243.2 | 323 |
| 6 | vector | 10 | BAAI/bge-small-en-v1.5 | - | 0.3433 | 0.2557 | 0.1619 | 0.5288 | 1107.3 | 1657.0 | 323 |
| 5 | lexical | 10 | - | - | 0.2235 | 0.1517 | 0.0967 | 0.4112 | 136.9 | 254.2 | 323 |
| 4 | vector | 10 | BAAI/bge-small-en-v1.5 | - | 0.3433 | 0.2557 | 0.1619 | 0.5288 | 1142.7 | 1662.4 | 323 |
| 3 | lexical | 10 | - | - | 0.2149 | 0.1396 | 0.0933 | 0.4100 | 91.8 | 156.9 | 323 |
| 2 | vector | 10 | BAAI/bge-small-en-v1.5 | - | 0.3433 | 0.2557 | 0.1619 | 0.5288 | 875.7 | 1280.5 | 323 |
| 1 | lexical | 10 | - | - | 0.2150 | 0.1390 | 0.0933 | 0.4121 | 112.3 | 225.3 | 323 |
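
The fusion column names two ways of combining the lexical and vector result lists. The sketch below shows the common textbook forms of each, as a rough guide to what `rrf k=60` and `weighted α=0.5` might mean. It is not taken from this tool's code: the function names are hypothetical, and both the min-max normalization and the choice that α weights the lexical side (with 1 - α on the vector side) are assumptions.

```python
def rrf_fuse(rankings, k=60, top_n=10):
    # Reciprocal rank fusion: each list contributes 1 / (k + rank) per document.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]

def min_max(scores):
    # Rescale raw scores to [0, 1] so the two backends are comparable.
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def weighted_fuse(lexical_scores, vector_scores, alpha=0.5, top_n=10):
    # Assumption: alpha weights the lexical score, (1 - alpha) the vector score.
    lex, vec = min_max(lexical_scores), min_max(vector_scores)
    fused = {
        d: alpha * lex.get(d, 0.0) + (1 - alpha) * vec.get(d, 0.0)
        for d in set(lex) | set(vec)
    }
    return sorted(fused, key=lambda d: (-fused[d], d))[:top_n]

# Toy inputs: ranked lists for RRF, raw scores for weighted fusion.
print(rrf_fuse([["a", "b", "c"], ["b", "d", "a"]], k=60))
lexical_scores = {"a": 10.0, "b": 5.0, "c": 0.0}
vector_scores = {"b": 0.9, "d": 0.8, "a": 0.5}
print(weighted_fuse(lexical_scores, vector_scores, alpha=0.5))
print(weighted_fuse(lexical_scores, vector_scores, alpha=0.3))
```

RRF uses only rank positions, so it needs no score normalization; weighted fusion depends on how the raw scores are rescaled, which is why that step is flagged as an assumption here.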