Data Science Briefing #321



Jun 2nd

Next webinar:
Jun 3, 2026 - CrewAI for Production-Ready Multi‑Agent Systems
Count down to 2026-06-03T17:00:00.000Z

Dear Reader,

Announcements

 

Ready to level up your understanding of AI agents? 🤖

We all see the impressive capabilities of tools like Claude Code, Open Clawd, and Hermes, but what actually powers them behind the scenes?

In our latest Substack post, we break down the "secret sauce" of modern AI assistants by walking through how to build a basic agentic harness. If you're building with LLMs, exploring agentic workflows, or just want to understand the infrastructure that makes these tools tick, this is a must-read!

👉 Building a Basic Agentic Harness

Check it out and subscribe so you don't miss another post.

This week brings two strong learning resources for builders. A full Stanford course walks through language modeling from scratch, covering tokenization, model architecture, training, and evaluation. Students write real code and train small models step by step, not just read about them. For a hands-on security track, one developer turned personal study notes into a structured course based on Linux Basics for Hackers. Both fit readers who learn by doing.

Two pieces look at the harder side of adoption. Big companies now ration AI access as monthly bills climb faster than planned. Some teams cap tokens, cut seats, or pause projects to keep spend in check. A separate report finds that safety controls on certain Meta and Google models can be stripped in minutes. That result raises sharp questions about how well today’s guardrails actually hold.

On the research side, a new experimental model from Google DeepMind builds text with diffusion instead of standard next-token prediction. Early numbers point to very fast generation and a fresh way to think about how text models work. That speed theme returns in a short post on prototyping in the age of AI, which argues that one person can now build and test an idea in hours rather than weeks. Read together, the two links sketch a field moving fast on both methods and tools.

On the academic front, we have two papers that test how far AI agents can push real research. One team built a system where many agents work like a decentralized lab. They form teams around promising ideas, critique each other before spending compute, and share both wins and dead ends, so the group avoids repeating work. On a 24-task biomedical benchmark it reached the 74th percentile, beating the strongest prior agent by more than 8 points. A second project aimed language models at unsolved math, asking them to write proofs in Lean, a language that checks every step. The strongest agent resolved 9 of 353 open Erdős problems at a few hundred dollars each and proved 44 of 492 sequence conjectures, with the work now feeding research in graph theory, optimization, and algebraic geometry.
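To give a feel for what "checks every step" means, here is a minimal Lean 4 sketch. It is our own toy illustration, not taken from the paper, and the theorem names are made up for the example. Lean accepts a theorem only if the proof term type-checks, so a wrong or incomplete argument is rejected rather than silently passed.

```lean
-- A concrete arithmetic fact. `rfl` asks Lean to confirm that both sides
-- compute to the same value; if they did not, the file would fail to compile.
theorem two_add_two : 2 + 2 = 4 := rfl

-- A general statement about natural numbers, proved by citing the
-- core library lemma `Nat.add_comm`.
theorem add_comm_example (a b : Nat) : a + b = b + a := Nat.add_comm a b
```

The open problems in the paper are far harder than these, but the acceptance criterion is the same: the proof either compiles or it does not.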

Other work looked at how people connect. Researchers mapped follower ties across more than 1.6 million U.S. voters on Twitter, drawn from daily samples between 2014 and 2017. Physical distance turned out to be the strongest predictor of who follows whom, ahead of age and race, with party affiliation playing a surprisingly small role. Living near someone, the data show, pulls people toward following others of the same background. So online attention still tracks the map more than the ballot box.

A field study carried that question to one of the harshest places on Earth. For ten months, twelve crew members at Antarctica’s Concordia Station wore proximity sensors. They worked through the polar winter, when the base sits cut off and temperatures drop past minus 80 Celsius. The team expected close contact to build support. The opposite happened. Frequent physical proximity lined up with more conflict, more mistrust, and lower perceived performance, and the multicultural crew slowly split into national subgroups. The result matters for long missions to the Moon and Mars, where small teams will live packed together for years.

Our current book recommendation is "LLMs in Production: From Language Models to Successful Products" by C. Brousseau and M. Sharp. In this week's video, we have a keynote by M. Ceglowski on Superintelligence: The Idea That Eats Smart People.

Data shows that the best way for a newsletter to grow is by word of mouth, so if you think one of your friends or colleagues would enjoy this newsletter, go ahead and forward this email to them. This will help us spread the word!

Semper discentes,

The D4S Team


"LLMs in Production: From Language Models to Successful Products" by C. Brousseau and M. Sharp is for data scientists and machine learning engineers who have moved past the “cool demo” phase and now need to ship something people can use. The book focuses on the real work behind LLM products: choosing models, preparing data, building RAG systems, evaluating outputs, controlling cost, managing latency, and deploying reliably.

Its biggest strength is that it treats LLMs as production software, not magic. The authors connect familiar ML concerns—measurement, data quality, feedback loops, monitoring, and trade-offs—to newer LLM-specific patterns such as prompt design, fine-tuning, LoRA, RLHF, hosted APIs, Kubernetes deployment, and edge inference. The hands-on projects help ground the material, especially for readers who want more than another conceptual overview.

The book is not perfect. Some sections move quickly, and experienced MLOps engineers may wish for more depth on architecture, observability, or failure analysis. Its tooling choices may also date quickly, as LLM infrastructure continues to shift. Still, the core value holds: this is a practical guide to thinking like an engineer when working with language models. For anyone trying to turn LLM experiments into durable products, it is an easy book to justify buying.


  1. Language Modeling from Scratch [cs336.stanford.edu]
  2. Corporate America Is Starting to Ration AI as Cost Skyrockets [wsj.com]
  3. Gemini Diffusion: Google DeepMind’s experimental research model [blog.google]
  4. Why I Made a Journal for AI-Generated Papers [cesarhidalgo.com]
  5. AI guardrails stripped from Meta and Google models in minutes [ft.com]
  6. The Speed of Prototyping in the Age of AI [darylcecile.net]
  7. A structured course built from personal study notes of the book Linux Basics for Hackers [github.com/ahegazy0]


Superintelligence: The Idea That Eats Smart People


All the videos of the week are available in our YouTube playlist.

Upcoming Events:

Opportunities to learn from us

On-Demand Videos:

Long-form tutorials

Data For Science, Inc
