LLMs

May 03

PGN2FEN: A Benchmark for Evaluating LLM Chess Reasoning

Introducing PGN2FEN, a benchmark for evaluating language models' ability to understand and transcribe chess game move sequences.
6 min read

Aug 12

The Convergence of Proprietary and Open Source LLMs

Open and private models are becoming more similar than they are different.
6 min read

Apr 26

How to Beat Proprietary LLMs With Smaller Open Source Models

Building your AI applications around open source models can make them better, cheaper, and faster.
14 min read