<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Jacob's Research Journal</title>
    <link>https://jacobbrooke95.github.io/jacobs-research-journal/</link>
    <description>Deep research papers and analysis on AI, technology, and public life — by Jacob Brooke in Jefferson City, MO.</description>
    <language>en-us</language>
    <copyright>© 2026 Jacob Brooke</copyright>
    <lastBuildDate>Mon, 06 Apr 2026 09:00:00 -0500</lastBuildDate>
    <atom:link href="https://jacobbrooke95.github.io/jacobs-research-journal/feed.xml" rel="self" type="application/rss+xml"/>

    <item>
      <title>The Mind at the Frontier: 50 Issues of Import AI</title>
      <link>https://jacobbrooke95.github.io/jacobs-research-journal/posts/2026-04-06-import-ai-50-issues.html</link>
      <guid isPermaLink="true">https://jacobbrooke95.github.io/jacobs-research-journal/posts/2026-04-06-import-ai-50-issues.html</guid>
      <pubDate>Mon, 06 Apr 2026 09:00:00 -0500</pubDate>
      <category>AI</category>
      <description><![CDATA[Jack Clark has been writing Import AI since 2016. A close read of the last 50 issues — #403 through #452 — reveals one obsession above all others: how powerful will AI get, how fast, and does anyone have any idea what to do about it?]]></description>
      <content:encoded><![CDATA[
<p>Issue #450 of Import AI contains a sentence that stops you cold. Researchers studying AI-generated cyberattacks documented a scaling law: the average number of steps an AI can complete in a real attack chain went from 1.7 in August 2024 to 9.8 by February 2026. The best single run reached 22 of 32 steps &mdash; most of a full compromise, end to end, automated.</p>

<p>Jack Clark reported that without editorial alarm. Just: here is the measurement, here is what it means. He has been doing this every week, in some form, since 2016. Import AI is Clark&rsquo;s newsletter &mdash; a dense, technically precise, darkly funny weekly digest of AI research from someone who co-founded Anthropic, before that led policy at OpenAI, and before that was one of the people who first made the world pay serious attention to what language models might become.</p>

<p>I spent the last week reading the last 50 issues of Import AI &mdash; numbers 403 through 452, covering March 2025 through early April 2026. Cybersecurity appears as a dominant theme in 24 of 50 issues. Chinese AI decoupling appears in 15. AI automating AI research in 14. The pattern Clark documents, quietly but consistently, is that AI safety researchers predicted a specific set of behaviors &mdash; shutdown resistance, reward hacking, situational awareness, emergent misalignment &mdash; and are now watching those predictions arrive one by one in production systems.</p>

<p>The narrative that emerges: AI systems are becoming more capable faster than anyone predicted, including the people making them. The safety properties researchers warned about are now showing up in real deployments. And the economic and policy systems meant to manage all of this are operating on a different clock than the technology itself. Clark doesn&rsquo;t say this directly. He doesn&rsquo;t have to. Fifty issues say it for him.</p>

<p><a href="https://jacobbrooke95.github.io/jacobs-research-journal/posts/2026-04-06-import-ai-50-issues.html">Read the full post &rarr;</a></p>
      ]]></content:encoded>
    </item>

    <item>
      <title>The Accidental Longevity Drug</title>
      <link>https://jacobbrooke95.github.io/jacobs-research-journal/posts/2026-04-05-glp1-longevity.html</link>
      <guid isPermaLink="true">https://jacobbrooke95.github.io/jacobs-research-journal/posts/2026-04-05-glp1-longevity.html</guid>
      <pubDate>Sun, 05 Apr 2026 18:00:00 -0500</pubDate>
      <category>Health</category>
      <description><![CDATA[GLP-1 drugs like Ozempic weren't designed to slow aging. They've produced better longevity evidence than anything that was. A deep look at the data, the paradox, and what it means for the future of medicine.]]></description>
      <content:encoded><![CDATA[
<p>In November 2023, the SELECT trial reported that semaglutide &mdash; the molecule behind Ozempic and Wegovy &mdash; reduced major adverse cardiovascular events by 20% in adults with obesity but without diabetes. Then cardiologists found a 40% relative risk reduction for heart failure. Then kidney specialists found a 16% reduction in kidney failure risk. Then hepatologists reported that 63% of patients with fatty liver disease achieved resolution.</p>

<p>One drug. Not four. And none of these outcomes were what it was designed for.</p>

<p>GLP-1 receptor agonists were built to manage blood sugar in type 2 diabetes. They were not designed to extend life, reverse organ damage, or intervene in the biology of aging. But Nature Biotechnology ran a headline that would have been unthinkable five years ago: "Are GLP-1s the first longevity drugs?" The answer, based on the evidence so far, is: quite possibly &mdash; and more credibly than any drug deliberately designed for that purpose.</p>

<p><a href="https://jacobbrooke95.github.io/jacobs-research-journal/posts/2026-04-05-glp1-longevity.html">Read the full post &rarr;</a></p>
      ]]></content:encoded>
    </item>

    <item>
      <title>The Loop Is Closing: Recursive Self-Improvement Has Left the Lab</title>
      <link>https://jacobbrooke95.github.io/jacobs-research-journal/posts/2026-04-05-recursive-self-improvement.html</link>
      <guid isPermaLink="true">https://jacobbrooke95.github.io/jacobs-research-journal/posts/2026-04-05-recursive-self-improvement.html</guid>
      <pubDate>Sun, 05 Apr 2026 14:00:00 -0500</pubDate>
      <category>AI</category>
      <description><![CDATA[Ninety percent of Claude's code is written by Claude. Every major AI lab has a concrete timeline for automating its own research. The governance gap is widening. A source-by-source investigation into the most consequential development in AI.]]></description>
      <content:encoded><![CDATA[
<p>Ninety percent of Claude's code is written by Claude. Not by the engineers at Anthropic who designed the model, but by a previous version of the model itself &mdash; iterating on its own codebase, proposing changes, testing them, shipping them. An Anthropic spokesperson told Fortune that company-wide, the figure for AI-generated code is between 70% and 90%. At some leading engineers' desks at both Anthropic and OpenAI, it's reportedly 100%.</p>

<p>That number alone should stop you. It means the tools that are reshaping entire industries are increasingly built not by human hands but by earlier versions of themselves. The concept has a name that has bounced around AI safety circles for decades &mdash; recursive self-improvement, or RSI &mdash; and as of spring 2026, it has migrated from philosophical thought experiment to operational reality at every major AI laboratory on earth.</p>

<p><a href="https://jacobbrooke95.github.io/jacobs-research-journal/posts/2026-04-05-recursive-self-improvement.html">Read the full post &rarr;</a></p>
      ]]></content:encoded>
    </item>

    <item>
      <title>WWDC 2026 Preview: What's Actually Coming June 8th</title>
      <link>https://jacobbrooke95.github.io/jacobs-research-journal/posts/2026-04-05-wwdc-2026-preview.html</link>
      <guid isPermaLink="true">https://jacobbrooke95.github.io/jacobs-research-journal/posts/2026-04-05-wwdc-2026-preview.html</guid>
      <pubDate>Sun, 05 Apr 2026 09:00:00 -0500</pubDate>
      <category>Apple</category>
      <description><![CDATA[Sixty-four days out from WWDC, the picture is coming into focus. Siri's billion-dollar Gemini makeover, the Snow Leopard strategy for iOS 27, Core AI for developers, and the hardware wildcards — sourced and rated by confidence.]]></description>
      <content:encoded><![CDATA[
<p>Apple confirmed it two weeks ago: WWDC 2026 runs June 8 through 12, with the keynote kicking off Monday morning at Apple Park. On paper, it's the same format we've seen since 2020 — mostly online, a few thousand lottery winners in person, software betas by the afternoon. But the stakes this year are genuinely different.</p>

<p>Last year Apple spent WWDC introducing Liquid Glass and playing catch-up on AI promises that had been piling up since the original Apple Intelligence announcement at WWDC 2024. This year, the company has to prove that the billions it's spending on AI infrastructure are producing something people actually want to use. And the centerpiece of that argument has a name you already know: Siri.</p>

<p>I've spent the last week pulling together every credible source I can find — Bloomberg's Mark Gurman, Apple's own press materials, supply chain reporting, developer leaks, and community speculation — to build the most complete picture I can of what's coming.</p>

<p><a href="https://jacobbrooke95.github.io/jacobs-research-journal/posts/2026-04-05-wwdc-2026-preview.html">Read the full post →</a></p>
      ]]></content:encoded>
    </item>

    <item>
      <title>When Will Claude Mythos Ship? An Evidence-Based Prediction</title>
      <link>https://jacobbrooke95.github.io/jacobs-research-journal/posts/2026-04-04-mythos-release-prediction.html</link>
      <guid isPermaLink="true">https://jacobbrooke95.github.io/jacobs-research-journal/posts/2026-04-04-mythos-release-prediction.html</guid>
      <pubDate>Sat, 04 Apr 2026 16:45:00 -0500</pubDate>
      <category>AI</category>
      <description><![CDATA[Anthropic's Claude Mythos leaked on March 26 via a CMS misconfiguration. Analyzing release cadence, Code with Claude 2026 dates, infrastructure signals, and competitive pressure to build a falsifiable prediction for when it ships.]]></description>
      <content:encoded><![CDATA[
<p>On March 26, a CMS misconfiguration exposed ~3,000 unpublished Anthropic blog posts in a public data cache. Inside those drafts: detailed documentation for Claude Mythos (internal codename: Capybara) — a fourth tier of Claude sitting above Opus with "dramatically higher scores" on coding, reasoning, and especially cybersecurity benchmarks.</p>

<p>Anthropic confirmed it was real. Fortune got them on record calling Mythos "the most capable we've built to date" and "a step change in capabilities." The question everyone's asking now: when does it actually ship?</p>

<p>I spent the last week pulling every available signal — release history, marketing strategy, infrastructure timelines, competitive pressure, and some interesting Reddit signals — to build a falsifiable prediction. The short answer: <strong>May 6, 2026 at Code with Claude San Francisco, with phased rollout through June.</strong> Confidence level: 65%.</p>

<p><a href="https://jacobbrooke95.github.io/jacobs-research-journal/posts/2026-04-04-mythos-release-prediction.html">Read the full post →</a></p>
      ]]></content:encoded>
    </item>

    <item>
      <title>Google's Gemma 4: What Actually Matters</title>
      <link>https://jacobbrooke95.github.io/jacobs-research-journal/posts/gemma-4-deep-dive.html</link>
      <guid isPermaLink="true">https://jacobbrooke95.github.io/jacobs-research-journal/posts/gemma-4-deep-dive.html</guid>
      <pubDate>Sat, 04 Apr 2026 09:00:00 +0000</pubDate>
      <category>AI</category>
      <description><![CDATA[Google dropped Gemma 4 on April 2nd with Apache 2.0 licensing — and that legal change may matter more than any benchmark. A deep look at the model family, architecture innovations, Mac setup, and an honest read of where it actually stands vs. Qwen, Llama 4, and the Chinese open-model pack.]]></description>
      <content:encoded><![CDATA[
<p>Google dropped Gemma 4 on April 2nd, and for once the open-weights release cycle is moving faster than the hype cycle can catch up. I've spent the last couple of days working through the technical docs, running the models locally, and reading the community reaction. Here's what I actually think.</p>

<p>The short version: Gemma 4 is a genuinely capable model family. But the headline isn't a benchmark number. It's a license. Apache 2.0 — no asterisks, no carve-outs, no monthly active user cap tucked into a Terms of Use PDF. That's the thing that will matter in six months, not whether the 31B scores 0.3 points higher than Qwen on GPQA Diamond.</p>

<figure>
  <img src="https://jacobbrooke95.github.io/jacobs-research-journal/images/gemma4-hero.png" alt="Google Gemma 4 official announcement graphic from Google Blog" style="width:100%;border-radius:8px;">
  <figcaption>Gemma 4 — announced April 2nd, 2026. Source: Google Blog.</figcaption>
</figure>

<h2>The Lineup</h2>

<p>Four models, two design philosophies. Google calls them the edge tier and the workstation tier, and the distinction is real — these aren't just different sizes of the same thing.</p>

<table>
  <thead>
    <tr><th>Model</th><th>Effective Params</th><th>Total Params</th><th>Context</th><th>Target</th></tr>
  </thead>
  <tbody>
    <tr><td>Gemma 4 E2B</td><td>2.3B</td><td>5.1B</td><td>128K</td><td>On-device</td></tr>
    <tr><td>Gemma 4 E4B</td><td>4.5B</td><td>8.0B</td><td>128K</td><td>On-device</td></tr>
    <tr><td>Gemma 4 26B A4B</td><td>4B active</td><td>26B (MoE)</td><td>256K</td><td>Workstation/server</td></tr>
    <tr><td>Gemma 4 31B Dense</td><td>31B</td><td>31B</td><td>256K</td><td>Workstation/server</td></tr>
  </tbody>
</table>

<p>The "E" in E2B and E4B stands for "effective" — Google's shorthand for Per-Layer Embeddings (PLE). Rather than one embedding table at the input, PLE adds a residual signal into every decoder layer, giving small models representational depth well beyond their actual weight count.</p>
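<p>To make the mechanism concrete, here's a toy numpy sketch of the idea &mdash; the shapes, scales, and per-layer tables below are illustrative assumptions for the sketch, not Google's implementation:</p>

<pre><code>import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, n_layers, seq = 100, 16, 4, 8

# Conventional setup: one embedding table, consulted only at the input.
input_emb = rng.normal(size=(vocab, d_model))

# PLE-style setup (illustrative): an extra table per decoder layer,
# re-injected as a residual signal at that layer.
layer_emb = rng.normal(size=(n_layers, vocab, d_model)) * 0.1

tokens = rng.integers(0, vocab, size=seq)
h = input_emb[tokens]                    # (seq, d_model)
for layer in range(n_layers):
    h = h + layer_emb[layer][tokens]     # token identity available at every depth
    # ... attention / MLP blocks would go here ...

print(h.shape)  # (8, 16)</code></pre>

<p>The payoff for small models: the per-layer tables add representational capacity without widening the hidden state that flows through attention and MLP blocks.</p>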

<p>Two notable absences: there's no replacement for Gemma 3's popular 12B, leaving an awkward gap between ~4.5B effective and 26B. And the rumored 120B flagship didn't materialize at launch.</p>

<figure>
  <img src="https://jacobbrooke95.github.io/jacobs-research-journal/images/gemma4-benchmark-perf-vs-size.png" alt="Gemma 4 model performance vs size comparison chart from Hugging Face" style="width:100%;border-radius:8px;">
  <figcaption>Performance vs. model size across all four Gemma 4 variants. Source: Hugging Face Blog.</figcaption>
</figure>

<h2>The Architecture Worth Understanding</h2>

<h3>Alternating Attention</h3>
<p>Rather than running full attention through every layer, Gemma 4 alternates between local sliding-window attention (512-token windows on smaller models, 1024-token on larger ones) and global full-context attention. This is substantially more compute-efficient than dense full attention — it's how you get to 256K context windows without proportional cost blowup.</p>
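<p>The layer schedule is easy to picture in code. A minimal sketch &mdash; the local-to-global ratio and window size here are assumptions for illustration, not Gemma 4's published configuration:</p>

<pre><code># Illustrative schedule: every Nth layer attends globally, the rest use a
# sliding window. Ratio and window are assumed values for this sketch.
def attention_schedule(n_layers, global_every=6, window=1024):
    sched = []
    for i in range(n_layers):
        if (i + 1) % global_every == 0:
            sched.append(("global", None))    # full-context attention
        else:
            sched.append(("local", window))   # sliding-window attention
    return sched

sched = attention_schedule(12)
print(sum(1 for kind, _ in sched if kind == "global"))  # 2 global layers out of 12</code></pre>

<p>Because local layers only ever cache a window's worth of keys and values, the KV-cache cost of long contexts is dominated by the handful of global layers.</p>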

<h3>Dual RoPE</h3>
<p>Standard rotary positional embeddings for local attention layers; Proportional RoPE (p-RoPE) for global layers. The combination enables reliably useful performance at 256K tokens rather than just nominally supporting it.</p>
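<p>For reference, here is what the standard rotary half of that combination does, in a minimal numpy sketch (p-RoPE's modifications are not reproduced here): each pair of dimensions is rotated by a position-dependent angle, which is why the transform preserves vector norms:</p>

<pre><code>import numpy as np

def rope(x, pos, base=10000.0):
    # x: (d,) with d even; rotate consecutive dim pairs by pos-dependent angles.
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)   # one frequency per dim pair
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

v = np.ones(8)
print(np.allclose(np.linalg.norm(rope(v, 5)), np.linalg.norm(v)))  # True: rotation preserves norm</code></pre>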

<h3>Built-in Multimodal</h3>
<p>Vision support is native across all four models — not a separate variant, baked into the base architecture. The vision encoder uses learned 2D positions with multi-dimensional RoPE and a configurable image token budget (70 to 1120 tokens). Audio support (via a USM-style conformer encoder) is available in the edge models.</p>

<h2>The Benchmarks: An Honest Read</h2>

<table>
  <thead>
    <tr><th>Benchmark</th><th>31B Dense</th><th>26B A4B</th><th>E4B</th><th>E2B</th></tr>
  </thead>
  <tbody>
    <tr><td>MMLU Pro</td><td>85.2%</td><td>82.6%</td><td>69.4%</td><td>60.0%</td></tr>
    <tr><td>GPQA Diamond</td><td>84.3%</td><td>82.3%</td><td>58.6%</td><td>43.4%</td></tr>
    <tr><td>AIME 2026</td><td>89.2%</td><td>88.3%</td><td>42.5%</td><td>37.5%</td></tr>
    <tr><td>LiveCodeBench v6</td><td>80.0%</td><td>77.1%</td><td>52.0%</td><td>44.0%</td></tr>
    <tr><td>MMMU Pro (vision)</td><td>76.9%</td><td>73.8%</td><td>52.6%</td><td>44.2%</td></tr>
  </tbody>
</table>

<p>On LMArena, the 31B Dense sits at roughly #3 among open models with an ELO around 1452. The 26B MoE holds an ELO of ~1441 with only 4B parameters active. These are legitimately good numbers.</p>

<figure>
  <img src="https://jacobbrooke95.github.io/jacobs-research-journal/images/gemma4-arena-elo-comparison.png" alt="Gemma 4 Arena ELO leaderboard ranking comparison chart" style="width:100%;border-radius:8px;">
  <figcaption>Arena ELO leaderboard positioning Gemma 4 31B at #3 among open models. Source: Hugging Face Blog.</figcaption>
</figure>

<p><strong>The speed problem is real.</strong> Community benchmarks show the 26B MoE at roughly 11 tokens/sec on hardware where Qwen 3.5 35B runs at 60+. That's a 5x difference users feel on every request.</p>

<p><strong>Chinese models remain competitive.</strong> Qwen 3.5, GLM-5, and Kimi K2.5 are at or slightly ahead on aggregate automated benchmarks. Where Gemma 4 genuinely wins: non-English multilingual tasks and human preference evaluations.</p>

<p><strong>The 256K context window has caveats.</strong> Practically reaching the full window requires substantial VRAM headroom — benchmark on your specific hardware before building on it.</p>
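<p>A back-of-envelope KV-cache estimate shows why. The layer and head dimensions below are illustrative assumptions, not Gemma 4's actual config &mdash; and alternating attention would shrink the real total, since local layers cap their cache at the window size:</p>

<pre><code># Rough KV-cache size at full context, assuming every layer were global.
def kv_cache_gb(seq_len, n_layers, n_kv_heads, head_dim, bytes_per=2):
    # 2 tensors (K and V) per layer; fp16/bf16 = 2 bytes per element
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per / 1e9

print(round(kv_cache_gb(256_000, n_layers=48, n_kv_heads=8, head_dim=128), 1))
# 50.3 (GB) -- before sliding-window savings, on these assumed dimensions</code></pre>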

<h2>Running It on Your Mac</h2>

<p>The E4B quantized to GGUF is ~9.6GB — it fits comfortably on a Mac mini M4 or any recent MacBook Pro.</p>

<pre><code># Edge models — audio and vision included
ollama run gemma4:e2b      # ~5.5GB
ollama run gemma4:e4b      # ~9.6GB

# Workstation models
ollama run gemma4:26b      # ~18GB, MoE
ollama run gemma4:31b      # ~20GB, dense</code></pre>

<p>On Apple Silicon, Ollama runs these models with Apple's Metal GPU acceleration automatically. Hardware guidance: MacBook Pro M3 (18–36GB) handles E4B well; Mac Studio M3 Ultra handles all four variants comfortably.</p>
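<p>If you want to check the tokens/sec numbers on your own machine, Ollama's <code>/api/generate</code> response reports <code>eval_count</code> and <code>eval_duration</code> (in nanoseconds), so throughput is one division:</p>

<pre><code>import json, urllib.request

# Decode throughput from an Ollama /api/generate response. The eval_count /
# eval_duration (nanoseconds) fields are documented in Ollama's API reference.
def tokens_per_sec(resp):
    return resp["eval_count"] / resp["eval_duration"] * 1e9

def generate(model, prompt, host="http://localhost:11434"):
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(host + "/api/generate", data=body)
    return json.load(urllib.request.urlopen(req))

# Against a live server: print(tokens_per_sec(generate("gemma4:26b", "Explain MoE routing.")))
print(tokens_per_sec({"eval_count": 220, "eval_duration": 20_000_000_000}))  # 11.0 -- the reported MoE speed</code></pre>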

<h2>What Actually Matters Here</h2>

<figure>
  <img src="https://jacobbrooke95.github.io/jacobs-research-journal/images/gemma4-elo-benchmark.svg" alt="Gemma 4 Arena ELO score vs model size benchmark chart from Google DeepMind" style="width:100%;border-radius:8px;background:#fff;padding:8px;">
  <figcaption>ELO score vs. model size — Gemma 4 plotted against competitors. Source: Google DeepMind.</figcaption>
</figure>

<p>The Apache 2.0 license is the most important thing that happened on April 2nd. Previous Gemma versions shipped under a custom Terms of Use with a prohibited-use policy and pass-through restrictions — a procurement headache that quietly pushed commercial teams toward Mistral, Qwen, and other permissively licensed alternatives. Apache 2.0 removes all of that. No usage restrictions, no pass-through terms, no royalties. Commercial teams can now build on Gemma the way they already build on Mistral — fully, without reservation.</p>

<p>There's also a geopolitical angle: enterprise procurement and security teams are increasingly preferring US-origin AI models over Chinese providers for compliance and data governance reasons that have nothing to do with benchmark rankings. Gemma 4 being strong, Apache-licensed, and US-origin is useful positioning no benchmark table captures.</p>

<h2>Bottom Line</h2>

<p><strong>If you avoided Gemma because of the license:</strong> reconsider. Apache 2.0 removes the blocker entirely.</p>

<p><strong>If you're evaluating open models for a new project:</strong> test the 26B MoE — impressive efficiency profile — but benchmark inference speed on your hardware first. The 11 token/sec community reports are concerning enough to verify.</p>

<p><strong>If you're looking for raw benchmark supremacy:</strong> the picture is mixed. The open-weights frontier in April 2026 is a close and genuinely competitive pack. That's the actual story here.</p>

<p><a href="https://jacobbrooke95.github.io/jacobs-research-journal/posts/gemma-4-deep-dive.html">Read the full post →</a></p>
      ]]></content:encoded>
    </item>

  </channel>
</rss>
