Do your LLMs pass the SMELL test? 👃 Introducing SMELL (Subject-Matter Expert Language Liaison): a new framework for aligning LLM evaluations with human feedback. SMELL integrates subject-matter expertise into AI evaluations, ensuring more accurate and domain-specific assessments.

Traditional LLM evaluations often miss the subtle judgments that humans bring to specific tasks. SMELL captures a small set of expert feedback and synthesizes it into rubrics for LLM judges, allowing AI developers to evaluate domain-specific use cases at scale.

Inspired by Databricks Mosaic Research's Grading Notes framework, SMELL refines LLM evaluation by incorporating concise, task-specific grading guidelines. By adapting judge rubrics from human feedback, SMELL creates a scalable, nuanced system that aligns closely with human assessments.

SMELL achieved strong results on various preference datasets: a Cohen's Kappa of 0.92, 96% accuracy, and a 0.96 F1 score on the preferences from UltraFeedback, all using just 5 in-sample data points. SMELL's ability to align with human judgments, even with limited feedback, highlights its efficiency and strong predictive power.

Curious about integrating SMELL into your AI workflow?
📓 Explore our detailed Jupyter notebook for insights into SMELL's implementation and results: https://lnkd.in/eN3UfJMF
🔗 Try SMELL in your own projects with our preview API: https://lnkd.in/eJX8YV-7
🤝 Want to build a custom judge for your specific use case or contribute to our research? Share your datasets with us at [email protected], and we'll publish the results with full attribution.
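For readers checking judge–human alignment on their own data: the Cohen's Kappa figure above measures agreement corrected for chance. A minimal sketch of the metric (this is an illustrative implementation, not code from SMELL's notebook; the label lists are hypothetical):

```python
from collections import Counter


def cohens_kappa(human_labels, judge_labels):
    """Chance-corrected agreement between human and LLM-judge labels."""
    assert len(human_labels) == len(judge_labels) and human_labels
    n = len(human_labels)
    # Observed agreement: fraction of items where judge matches human.
    p_observed = sum(h == j for h, j in zip(human_labels, judge_labels)) / n
    # Expected chance agreement from the two raters' marginal label frequencies.
    h_counts, j_counts = Counter(human_labels), Counter(judge_labels)
    labels = set(human_labels) | set(judge_labels)
    p_chance = sum(h_counts[l] * j_counts[l] for l in labels) / n**2
    return (p_observed - p_chance) / (1 - p_chance)


# Example: judge disagrees with the human on 1 of 4 preference labels.
kappa = cohens_kappa(["A", "A", "B", "B"], ["A", "A", "B", "A"])
print(kappa)  # 0.5: 75% raw agreement, but chance alone would give 50%
```

A kappa near 0.92, as reported for SMELL on UltraFeedback, indicates the judge's agreement with humans far exceeds what label frequencies alone would produce.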
CEO & Co-Founder | Quotient AI
Check out the full research blog post here: https://www.quotientai.co/post/subject-matter-expert-language-liaison-smell-a-framework-for-aligning-llm-evaluators-to-human-feedback