Use a real (probing-generated) scorer in benchmarks