Eval Runs

Each row is one evaluation run: a search backend (lexical, vector, or a hybrid fusion configuration) scored against the same 323 benchmark queries. The columns report quality scores (higher is better) and response time (lower is better). Pick two runs below to compare them query by query.

What do these scores mean?

Each backend is scored against NFCorpus — a benchmark of 323 medical search queries, where humans have judged which documents are actually relevant to each query. Every quality score runs 0–1 and higher is better; for latency, lower is better.

nDCG@10: Normalized discounted cumulative gain over the top 10 results. It measures the overall quality of the ranking and rewards putting more-relevant documents nearer the top. This is the headline metric.
precision@10 (P@k): Of the 10 results shown, the fraction that are relevant.
recall@10: Of all the relevant documents that exist for a query, the fraction that made it into the top 10.
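MRR: Mean reciprocal rank, the average over all queries of 1/rank of the first relevant result. A query whose first relevant result sits at rank 1 contributes 1.0; at rank 2 it contributes 0.5.
p50 / p95 ms: Median (p50) and 95th-percentile (p95) response time, in milliseconds. The p95 value captures the slow tail.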
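
As a rough illustration of the definitions above, here is a minimal Python sketch that scores one query's ranked results and summarizes a list of latencies. It is not the dashboard's actual evaluator. The function names, the `qrels` dict (document id mapped to a graded relevance judgment), the linear-gain DCG variant, and the nearest-rank percentile are all assumptions made for this example; a real evaluator may use exponential gain, cut MRR off at rank k, or interpolate percentiles. The per-run numbers in the table are presumably averages of such per-query scores over the 323 queries.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_ids, qrels, k=10):
    """DCG of the top-k results divided by the DCG of an ideal top-k ordering."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def precision_at_k(ranked_ids, qrels, k=10):
    """Fraction of the k result slots filled by a relevant document."""
    hits = sum(1 for doc_id in ranked_ids[:k] if qrels.get(doc_id, 0) > 0)
    return hits / k


def recall_at_k(ranked_ids, qrels, k=10):
    """Fraction of all relevant documents that appear in the top k."""
    relevant = {doc_id for doc_id, grade in qrels.items() if grade > 0}
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked_ids, qrels):
    """1 / rank of the first relevant result, or 0.0 if none was returned."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if qrels.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0


def percentile(values, pct):
    """Nearest-rank percentile (assumes a non-empty list): pct=50 for p50, pct=95 for p95."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical data for one query: graded judgments and a ranked result list.
qrels = {"d1": 2, "d2": 1, "d5": 1}
ranked = ["d3", "d1", "d9", "d2"]

scores = {
    "nDCG@10": ndcg_at_k(ranked, qrels),
    "P@10": precision_at_k(ranked, qrels),
    "recall@10": recall_at_k(ranked, qrels),
    "RR": reciprocal_rank(ranked, qrels),
}
```

Note that recall@10 cannot reach 1.0 for a query with more than 10 relevant documents, so low recall values do not by themselves indicate a poor ranking.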

id	backend	k	model	fusion	nDCG@k	P@k	recall@k	MRR	p50 ms	p95 ms	n queries
20	hybrid	10	BAAI/bge-small-en-v1.5	rrf k=10	0.3384	0.2529	0.1711	0.5281	1219.7	1688.6	323
19	hybrid	10	BAAI/bge-small-en-v1.5	weighted α=0.3	0.3557	0.2613	0.1719	0.5501	1210.8	1704.9	323
18	hybrid	10	BAAI/bge-small-en-v1.5	weighted α=0.5	0.3412	0.2573	0.1749	0.5196	1189.5	1743.3	323
17	hybrid	10	BAAI/bge-small-en-v1.5	rrf k=60	0.3389	0.2542	0.1706	0.5293	1170.7	1650.4	323
16	vector	10	BAAI/bge-small-en-v1.5	—	0.3428	0.2554	0.1618	0.5272	1067.2	1599.3	323
15	lexical	10	—	—	0.2235	0.1517	0.0967	0.4112	124.3	232.0	323
14	hybrid	10	BAAI/bge-small-en-v1.5	weighted α=0.5	0.3412	0.2573	0.1749	0.5196	1140.4	1600.1	323
13	hybrid	10	BAAI/bge-small-en-v1.5	rrf k=60	0.3389	0.2542	0.1706	0.5293	1147.4	1748.1	323
12	vector	10	BAAI/bge-small-en-v1.5	—	0.3428	0.2554	0.1618	0.5272	1195.6	1633.9	323
11	lexical	10	—	—	0.2235	0.1517	0.0967	0.4112	115.6	301.0	323
10	hybrid	10	BAAI/bge-small-en-v1.5	weighted α=0.5	0.3412	0.2573	0.1749	0.5196	1257.9	1683.0	323
9	hybrid	10	BAAI/bge-small-en-v1.5	rrf k=60	0.3389	0.2542	0.1706	0.5293	1309.9	1733.1	323
8	vector	10	BAAI/bge-small-en-v1.5	—	0.3428	0.2554	0.1618	0.5272	1069.2	1608.0	323
7	lexical	10	—	—	0.2235	0.1517	0.0967	0.4112	136.2	243.2	323
6	vector	10	BAAI/bge-small-en-v1.5	—	0.3433	0.2557	0.1619	0.5288	1107.3	1657.0	323
5	lexical	10	—	—	0.2235	0.1517	0.0967	0.4112	136.9	254.2	323
4	vector	10	BAAI/bge-small-en-v1.5	—	0.3433	0.2557	0.1619	0.5288	1142.7	1662.4	323
3	lexical	10	—	—	0.2149	0.1396	0.0933	0.4100	91.8	156.9	323
2	vector	10	BAAI/bge-small-en-v1.5	—	0.3433	0.2557	0.1619	0.5288	875.7	1280.5	323
1	lexical	10	—	—	0.2150	0.1390	0.0933	0.4121	112.3	225.3	323
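
The fusion column distinguishes two ways of combining the lexical and vector result lists: "rrf k=…" and "weighted α=…". The sketch below shows one common reading of each, purely as an illustration. The page does not say how the hybrid backend is implemented, so the function names, the min-max score normalization, and in particular the convention that α weights the lexical side (with 1 − α on the vector side) are assumptions. The standard reciprocal rank fusion formula scores each document as the sum of 1 / (k + rank) over the input rankings; the k here is that smoothing constant, not the top-k cutoff from the k column.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=10):
    """Reciprocal rank fusion: sum 1 / (k + rank) across the input rankings."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


def _min_max(scores):
    """Rescale raw scores to [0, 1] so the two backends are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {d: (s - lo) / span if span > 0 else 1.0 for d, s in scores.items()}


def weighted_fuse(lexical_scores, vector_scores, alpha=0.5, top_n=10):
    """Weighted sum of normalized scores.

    ASSUMPTION: alpha weights the lexical score and (1 - alpha) the vector
    score; the real system may use the opposite convention.
    """
    lex = _min_max(lexical_scores)
    vec = _min_max(vector_scores)
    fused = {
        d: alpha * lex.get(d, 0.0) + (1 - alpha) * vec.get(d, 0.0)
        for d in set(lex) | set(vec)
    }
    return sorted(fused, key=lambda d: (-fused[d], d))[:top_n]


# Hypothetical inputs: two ranked lists, and raw scores from each backend.
rrf_result = rrf_fuse([["a", "b", "c"], ["b", "c", "a"]], k=60)
weighted_result = weighted_fuse(
    {"a": 10.0, "b": 5.0, "c": 0.0},   # e.g. lexical (BM25-style) scores
    {"a": 0.2, "b": 0.9, "d": 0.6},    # e.g. cosine similarities
    alpha=0.3,
)
```

Rank-based fusion ignores the raw score scales entirely, which is why it needs no normalization step; the weighted variant depends on how the two score distributions are rescaled before mixing.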