😺 Play the puzzle that broke every AI model

发布时间：2026-03-25来源：The Neuron

Your browser does not support the audio element.

Welcome, humans.

🎙️ LIVE LATER TODAY @ 10 a.m. PT / 1 p.m. ET

: Dan Shipper, the

CEO of Every

, joins us live to break down agent-native engineering: the framework his 15-person team uses to ship AI products with virtually zero hand-written code. Dan calls it "Claude Code in a trench coat": give an agent three basic tools, name the goal, and let it figure out how.

Dan just vibe coded a new agentic document editor between meetings. He launched it. It went so viral it started crashing. Then it took over his entire week.

Good problem!

Click the image to go to YouTube, then on YouTube, Click “Notify Me” to get notified when we go live.

We'll get into Every's agent-native architectures, the complete Every product suite (Spiral, Sparkle, Cora, Monologue, and Proof), and how to structure product teams when one person can do what used to take ten.

Click the image to watch live!

Here’s what happened in AI today:

🙀 ARC-AGI-3 launched and every frontier AI model scored under 1% (while humans scored 100%).

📰 OpenAI killed Sora and blindsided Disney, who found out 30 minutes after a joint meeting.

📰 Sanders and AOC proposed banning all new data center construction until AI regulation passes.

🍪 Granola raised $125M at a $1.5B valuation as it expands from meeting notes to enterprise AI.

🎓 Anthropic says power users are pulling ahead of everyone else. Here's the one technique that separates them.

… and a

whole lot more that you can read about here

.

P.S:

Want to reach 675,000 AI-hungry readers?

Click here to advertise with us.

🙀 ARC-AGI-3 = The New AI Intelligence Test Where Every Model Basically Failed

You know those moments where you download a new app, poke at it for two minutes, and just figure it out? No tutorial, no instructions?

Turns out AI can't do that. At all.

The

ARC Prize Foundation

just launched ARC-AGI-3, a new benchmark (standardized test for AI) designed to measure whether AI can learn new things on the fly, the way humans do. Instead of answering trivia or writing code, AI agents explore unfamiliar interactive environments and solve puzzles they've never seen before.

“AGI” (which = artificial general intelligence) is the finish line every major AI company is racing toward: AI that handles any new task without special training.

AGI is a major goal. OpenAI just renamed its product division "AGI Deployment." Jensen Huang said AGI is "already in the room." OpenAI's next model, codenamed "Spud," might be the first they claim as AGI. ARC-AGI-3 is meant to actually test these claims.

Here's what happened after it launched:

Every frontier model

scored under 1%

. Gemini 3.1 Pro: 0.37%. GPT-5.4: 0.26%. Claude Opus 4.6: 0.25%. Grok 4.2: 0%.

100% of human testers solved every environment on their first try. No instructions, no training.

The benchmark tests 135 novel environments (~1,000 levels), measuring how efficiently AI solves them compared to humans.

A

$2M prize competition

is live on Kaggle, and you can

play the public games yourself

.

Why this matters:

Not everyone agrees the test is fair. The scoring uses a squared efficiency penalty (if a human takes 10 steps and AI takes 100, the AI scores 1%). AI can't score higher than humans even when more efficient, but humans

do

get bonus points when they're more efficient than AI. Extended-thinking models were excluded. Critic

@scaling01

argued the methodology was designed to produce low scores.

ARC founder

François Chollet fired back

with a deeper point: today's models only perform well when humans build elaborate scaffolding around them (specific prompts, custom harnesses, thinking tricks). The scaffolding is the human intelligence; the model is just executing it. If it's truly AGI, there should be no human in the loop.

Notice something interesting? We're no longer arguing about

whether

AI is smart. We're arguing about

how to measure

smart.

Our take:

The real question isn't whether today's models pass (eventually they will). It's whether training on massive datasets w/ the current architecture can ever produce genuine adaptability, or whether something fundamentally different is needed. NYU professor Saining Xie

made a provocative case

that LLMs are "anti-Bitter Lesson" because they're built entirely on human-generated knowledge rather than learning from raw experience

(more on this in our new podcast ep!).

The models that eventually crack ARC-AGI-3 won't just be smarter; they'll be a

different kind

of smart. The kind that matters if you actually care about AGI. If you don't, today's models are perfectly fine tools… they'll just always require potentially billions in human hand-holding via training.

Perhaps that's for the best.

FROM OUR PARTNERS

Ready to build AI-powered apps and agents? Join the

OutSystems

developer community and start using AI to develop, deploy, and scale your next mission-critical agentic application for free. Go from prompt to production faster, with full control, on a unified, agile, and enterprise-proven platform.

🎓 AI Skill of the Day:

The skill that separates AI power users from everyone else

Anthropic just published research

showing a growing skills gap: experienced AI users are pulling dramatically ahead. The biggest difference? Power users give AI better

context

before asking it to work.

The technique: before you ask AI to do real work, front-load a short brief explaining who you are, what you're working on, what good looks like, and what to avoid. Think of it like onboarding a new hire instead of shouting tasks at a stranger.

You are helping me with [specific project/task]. Here's what you need to know: About me: [Your role, expertise level, industry] The project: [What you're working on and why] What good looks like: [Examples, tone, format, length] Constraints: [What to avoid, word limits, style rules] Success criteria: [How I'll judge if this is useful] With this context, please [your actual request].

Save this as Project Context (so custom instructions you use in a specific project) or as a re-usable Skill that explains your workflow to an AI agent.

Have a specific skill you want to learn?

Request it here!

You know the thumbs up / thumbs down button on ChatGPT? There are people who get paid to do that, except for millions of hours, across every major AI lab. And one of the companies behind it just hit $1.2B in revenue without ever raising VC money.

Nick Heiner, VP of Product at

Surge AI

, joins us on the pod to pull back the curtain on these “reinforcement learning environments”: a.k.a the secret training grounds where AI models learn to do real work.

Nick explains why even the best frontier models fail 40% of workplace tasks, what happened when 200+ Wall Street experts graded GPT-5, Claude, and Gemini on actual finance work, and why "reward hacking" means your AI is gaming the system like a kid who stops hitting their sister by kicking instead.

Watch/and or Listen

👉

YouTube

|

Spotify

|

Apple Podcasts

Today's episode is brought to you by

Dell AI Factory with NVIDIA

.

🍪 Treats to Try

Anthropic has not stopped shipping a new feature every day for the past… actually we need one of those “days since an accident” calendars, but for days since no new tool drops, to confir-

OH WAIT

!

Claude Code added "auto mode"

(research preview), which lets the agent decide which actions are safe to run on its own and which need your approval, eliminating the "babysit every step or let it run wild" tradeoff for developers —free for existing Claude Code users.

Granola

expanded from meeting notetaker to full enterprise AI app with agent support after a 6x valuation jump to $1.5B —free tier available.

Marco

unifies Gmail, iCloud, Outlook, and any IMAP provider into one privacy-first, offline-first inbox on iOS, Mac, and web —free to try.

Google Lyria 3 Pro

generates longer, more customizable music tracks and is rolling out across Gemini and enterprise products —free inside Gemini.

Ensu

runs a private LLM on your device (from Ente, the encrypted photo storage company) that grows with you over time and keeps all data local —free to try.

Cog

adds persistent memory, self-reflection, and scenario simulation to Claude Code, giving your coding agent continuous awareness across sessions —free (open source).

📰 Around the Horn

US President Trump named

Mark Zuckerberg, Larry Ellison, and Jensen Huang to a new tech advisory panel that will help shape AI regulation.

Apple

got full access to Google's Gemini and can now shrink it into smaller models that run Siri's AI features on your iPhone without needing the internet.

Sakana AI's "AI Scientist"

was published in Nature: a pipeline that invents research ideas, runs experiments, writes papers, and passed peer review at a top ML conference.

Harvey

raised $200M at an $11B valuation as Sequoia tripled down on the legal AI startup alongside a16z and Kleiner Perkins.

Anthropic

published research finding no material job displacement from AI yet, but a growing skills gap where power users are pulling dramatically ahead of everyone else.

Want absolutely EVERYTHING that happened in AI today? Click here!

FROM OUR PARTNERS

Attio is the AI CRM for modern teams.

Connect your email and calendar and Attio instantly builds your CRM. Every contact, every company, every conversation — organized in one place. Then ask it anything. No more digging, no more data entry. Just answers.

Start your free trial →

📖 Thursday Trivia

One is AI, and one is real. Which is which? Vote below!

Which is AI, and which is real?

Which is AI, and which is real? The answer is below, but place your vote to see how your guess everyone else (no cheating now!)

A is AI, and B is real.

B is AI, and A is real.

A Cat’s Commentary

Trivia answer:

B

is AI

, and A

is real

. Also, if y’all just want some wholesome dog content (

because the world is real ruff out there right meow),

watch this

.

That’s all for now.

What'd you think of today's email?

🐾🐾🐾🐾🐾 Like a hit of catnip

🐾🐾🐾 Good, not great

🐾 It sucked

P.P.S:

Love the newsletter, but only want to get it once per week? Don’t unsubscribe—

update your preferences here

.

转载说明：本文系转载内容，版权归原作者及原出处所有。转载目的在于传递更多行业信息，文章观点仅代表原作者本人，与本平台立场无关。若涉及作品版权问题，请原作者或相关权利人及时与本平台联系，我们将在第一时间核实后移除相关内容。