Learning to critically analyze the output of large language models (LLMs) is an important component of developing A.I. literacy and responsible A.I. use.
Check out the tools below to begin evaluating output from LLMs like OpenAI ChatGPT, Google Gemini, and Anthropic Claude:
LLM Arena - https://llmarena.ai/
LMSYS Chatbot Arena - https://lmarena.ai/
ChatArena - https://www.chatarena.ai/
For folks who are familiar with Git, check out LLM Comparator, an "interactive visualization tool with a python library, for analyzing side-by-side LLM evaluation results." You can learn more in the video below or in the paper "LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models."
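To get a feel for the kind of data a side-by-side evaluation tool works with, the sketch below pairs one prompt with a response from each of two models and saves the result as JSON. This is a minimal illustration only: the helper name build_side_by_side and field names such as input_text, output_text_a, and output_text_b are assumptions for this example, not LLM Comparator's documented input schema.

```python
import json

# Illustrative sketch: collect prompts and responses from two models into a
# simple side-by-side record for later review or visualization.
# Field names below are assumptions for this example, not a documented schema.

def build_side_by_side(prompts, responses_a, responses_b,
                       model_a="model_a", model_b="model_b"):
    """Pair each prompt with one response from each of two models."""
    examples = []
    for prompt, resp_a, resp_b in zip(prompts, responses_a, responses_b):
        examples.append({
            "input_text": prompt,
            "output_text_a": resp_a,
            "output_text_b": resp_b,
        })
    return {"models": [{"name": model_a}, {"name": model_b}],
            "examples": examples}

if __name__ == "__main__":
    data = build_side_by_side(
        prompts=["Summarize the water cycle in one sentence."],
        responses_a=["Water evaporates, condenses into clouds, and falls as precipitation."],
        responses_b=["The water cycle moves water between the ocean, air, and land."],
        model_a="chatbot_a",
        model_b="chatbot_b",
    )
    # Save as JSON so the comparison can be loaded into an evaluation tool.
    with open("side_by_side_eval.json", "w") as f:
        json.dump(data, f, indent=2)
```

Collecting the same prompts and responses from two different chatbots this way makes it easier to compare their answers directly, which is the basic idea behind the side-by-side evaluation tools listed above.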
References
ACM SIGCHI. (2024, May 9). LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models [Video]. YouTube. https://www.youtube.com/watch?v=mnCvEHVc3ac
Kahng, M., Tenney, I., Pushkarna, M., Liu, M. X., Wexler, J., Reif, E., ... & Dixon, L. (2024, May). LLM Comparator: Visual analytics for side-by-side evaluation of large language models. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (pp. 1-7).