Readable summaries of new papers, leaderboards, datasets, and eval methods. We track real capability shifts, costs, and trade-offs, separate breakthroughs from hype, and provide downloadable charts and tables for teams that need receipts.