LLM self-play on 20 Questions. gpt-3.5-turbo has a score of 68
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
BDCC, Free Full-Text
AI #49: Bioweapon Testing Begins — LessWrong
Harness the Power of LLMs: Zero-shot and Few-shot Prompting
(PDF) Evaluating the use of GPT-3.5-turbo to provide clinical
(PDF) GPT-3.5/4 - Is the programming performance declining over time?
AgentTuning: Enabling Generalized Agent Abilities for LLMs
10 sec for a call of gpt-3.5-turbo - API - OpenAI Developer Forum
I scored the top Open LLM Leaderboard models with my own benchmark
arxiv-sanity