Three frontier models. One practical filter to pick yours.

A calm look at the week three frontier models shipped, the 30-minute prompt to test them on your real work, and the quiet skill that matters more than any benchmark.

Section 01 of 08: Front Page

Three frontier models, one calm answer

Three frontier-class models shipped in a single week: Claude Opus 4.8 from Anthropic on Thursday, and Gemini Omni and Gemini 3.5 from Google on the heels of I/O.

The question is no longer which model is best. It is which model fits the work you actually do.

Writers and researchers picking between Opus and Gemini 3.5 will feel a real choice for the first time. Creators get a new option in Omni: one model that natively writes, speaks, and watches video.

Section 02 of 08: What Happened

Three updates worth a slow read

Anthropic shipped Claude Opus 4.8 with sharper coding and agentic performance. The upgrade lands inside the existing Claude apps and API.

Google released Gemini Omni and Gemini 3.5 in the same week. Omni is one model that natively handles text, audio, and video. Gemini 3.5 is the frontier reasoning model with stronger action support.

Hacker News updated its guidelines to ban AI-generated and AI-edited comments. First major developer community to set explicit human-only conversation rules.

Section 03 of 08: Why It Matters

Choice fatigue is the new problem

Until this week, the AI conversation was about adoption. Pick a tool. Get started. Catch up.

Now there are three frontier-class models in active use at the same time. Choosing wrong costs a workflow rebuild.

The winning skill this month is not prompting. It is evaluation. Knowing how to test which model fits your task in 30 minutes, then committing.

Section 04 of 08: Who Should Care

Anyone whose week touches writing, code, or content

Writers and content creators weighing Opus against Gemini Omni for next quarter's workflow.

Developers deciding whether Opus 4.8 changes the agentic stack they shipped last month.

Small business owners who hear the hospital-bill story (one HN user reported negotiating a $195K bill down to $33K with AI) and realise their highest-leverage use is not a website chatbot. It is the next contract or invoice they have to handle.

Section 05 of 08: Tool Watch

Try Claude on your hardest current task this week

Claude Opus 4.8 is the cleanest tool to test against your real work this week. The upgrade is already live in the apps and API: no new account, no migration.

Pick a task you finished last week. Re-run it in Opus 4.8 the way you would normally. Compare time, quality, and how much you had to rewrite.

Full Claude entry lives in the Tool Cupboard. Free to start, paid for serious use.

Section 06 of 08: Prompt Corner

A prompt to test any new model in 30 minutes

"Below is a task I actually completed last week (paste it). Solve it in your best style. After the solution, list three concrete reasons your output would or would not save me time compared to the way I currently work."

Run it in two different models. Compare the outputs. Compare the self-critique. The model that helps you most is rarely the one with the best benchmark. It is the one whose self-critique you actually trust.
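If you plan to run the same test more than once, it helps to keep the prompt wording identical across models so the comparison stays fair. Here is a minimal sketch of that idea in Python. The names `PROMPT_TEMPLATE` and `build_eval_prompt` are made up for illustration; the script only builds the text, and you still paste the result into each model yourself.

```python
# Hypothetical helper: fills the Prompt Corner wording with your own task,
# so every model receives exactly the same text.
PROMPT_TEMPLATE = (
    "Below is a task I actually completed last week:\n\n"
    "{task}\n\n"
    "Solve it in your best style. After the solution, list three concrete "
    "reasons your output would or would not save me time compared to the "
    "way I currently work."
)


def build_eval_prompt(task: str) -> str:
    """Return the evaluation prompt with the pasted task filled in."""
    task = task.strip()
    if not task:
        # An empty task would test nothing, so fail early.
        raise ValueError("Paste a real task before running the test.")
    return PROMPT_TEMPLATE.format(task=task)
```

Call `build_eval_prompt("your task text here")`, copy the returned string into two models, and compare both the solutions and the self-critiques.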

Section 07 of 08: Skill Column

Evaluation beats adoption this year

Last year the skill was prompting. This year it is evaluation.

Three models, three pricing tiers, three context windows, three sets of strengths. The reader who runs a fair 30-minute test on a real task wins the year.

Start a simple notebook: task, model, output quality (1 to 5), time saved (minutes), would-use-again (yes or no). Five entries beat every benchmark blog you will read this month.
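A paper notebook or spreadsheet works fine for this. If you prefer a script, the five columns above could be sketched like this in Python. Everything here is an assumption for illustration: the `Entry` class, the `summarize` function, and the field names are not from any particular tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Entry:
    """One row of the evaluation notebook (hypothetical field names)."""
    task: str
    model: str
    quality: int            # output quality, 1 to 5
    minutes_saved: float    # time saved, in minutes
    would_use_again: bool   # yes or no

    def __post_init__(self):
        # Keep the quality score on the 1-to-5 scale.
        if not 1 <= self.quality <= 5:
            raise ValueError("quality must be between 1 and 5")


def summarize(entries):
    """Group entries by model and return simple per-model averages."""
    by_model = {}
    for entry in entries:
        by_model.setdefault(entry.model, []).append(entry)
    return {
        model: {
            "entries": len(rows),
            "avg_quality": mean(r.quality for r in rows),
            "avg_minutes_saved": mean(r.minutes_saved for r in rows),
            "use_again_rate": sum(r.would_use_again for r in rows) / len(rows),
        }
        for model, rows in by_model.items()
    }
```

Add an `Entry` after each real task, then call `summarize(entries)` once you have five or so. The averages are only a prompt for your own judgment, not a score to optimise.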

Section 08 of 08: The Chai Takeaway

Don’t pick the best model. Pick yours.

Three models. One question. Which helps you do your actual work today?

End of Issue No. 002

Next edition lands in your inbox next week.