Data Science Briefing #301



Jan 14th

Next webinar:
Jan 21, 2026 - LangChain for Generative AI Pipelines
Count down to 2026-01-21T18:00:00.000Z

Dear Reader,

Welcome to the Jan 14th issue of our newsletter!

We're proud to announce that our roundup of the best books of 2025 is now up over on the Data For Science Substack. Check it out!

The next edition of the LangChain for Generative AI Pipelines webinar is coming in just a few days, but there are still a few spots open, so don't forget to register.

This week’s standout theme is that the AI moment is moving from “novel demo” to “structural shift.” One essay from a veteran builder makes the uncomfortable point that refusing the tools won’t stop the change: with today’s coding agents, a well-specified goal plus careful review can compress weeks of work into hours, so the scarce skill becomes clarifying intent and judging output, not just typing code.

Another piece takes that idea to an extreme, with a “library” that ships as a spec and language-agnostic tests: pick a language, paste a short prompt, and generate a fresh implementation on demand! Practical agent-building guidance echoes the same direction: make control flow explicit with graphs (nodes, edges, states), so multi-step behavior is observable and debuggable rather than hidden in prompt prose.
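
To make the "explicit control flow" idea concrete, here is a minimal sketch in plain Python. It is not the LangGraph API and is not taken from the linked guide: the node functions (draft, review), the routing rule, and the run helper are all hypothetical names chosen for illustration. The point is only that when nodes, edges, and state are ordinary data, every step the agent takes can be logged and inspected.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]
Node = Callable[[State], State]
Router = Callable[[State], Optional[str]]


def draft(state: State) -> State:
    # Hypothetical node: stands in for an LLM call that writes a draft.
    attempts = int(state.get("attempts", 0)) + 1
    return {**state, "draft": f"answer to: {state['question']}", "attempts": attempts}


def review(state: State) -> State:
    # Hypothetical node: stands in for a check; approves on the second attempt.
    return {**state, "approved": int(state["attempts"]) >= 2}


def route_after_review(state: State) -> Optional[str]:
    # Conditional edge: loop back to "draft" until the review approves.
    return None if state["approved"] else "draft"


NODES: Dict[str, Node] = {"draft": draft, "review": review}
EDGES: Dict[str, Router] = {
    "draft": lambda state: "review",  # unconditional edge
    "review": route_after_review,     # conditional edge
}


def run(start: str, state: State, max_steps: int = 10) -> Tuple[State, List[str]]:
    """Walk the graph from `start`, recording every node visited."""
    trace: List[str] = []
    current: Optional[str] = start
    for _ in range(max_steps):  # hard cap so a bad routing rule cannot loop forever
        if current is None:
            break
        trace.append(current)
        state = NODES[current](state)
        current = EDGES[current](state)
    return state, trace


final_state, trace = run("draft", {"question": "What is RAG?"})
print(trace)
```

The trace is the payoff: the retry loop is visible as data instead of being implied by a prompt, which is what makes the behavior debuggable.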

In parallel, an “agents-in-your-pipeline” project frames LLM work like ordinary data tooling. Zooming out, a sharp business take argues AI is a stress test that commoditizes what you can fully specify, pushing durable value toward operations—deployment, reliability, security, and the messy work you can’t prompt into existence. And in research, the same multi-agent framing shows up as an “AI co-scientist” designed to help navigate exploding literatures and generate novel, evidence-grounded hypotheses—less autocomplete, more structured collaborator.

On the academic front, the through-line is that structure—in data, prompts, objectives, and networks—quietly determines what systems learn and how they behave. One paper warns that training language models on seemingly narrow “safe” tasks can backfire, producing surprisingly broad misalignment: optimize too hard on a constrained proxy, and you may teach the model the wrong general rule. Another shows a more benign version of the same phenomenon: simple prompt repetition can boost performance in non-reasoning models, suggesting that a lot of “capability” is really about stabilizing the model’s internal trajectory rather than adding new information.

Moving from single models to societies, the network angle becomes explicit: consensus isn’t just about what individuals believe, but about how local decision rules interact with connections, echoing work on the hidden structure of innovation networks, where breakthroughs (and bottlenecks) emerge from the topology of collaboration and recombination.

Even platform measurement gets pulled into this story: careful ID-sampling to reconstruct a complete slice of TikTok is a reminder that the networks we study are often distorted by what we can observe, and those distortions can drive the conclusions. Meanwhile, in the physical world, a specialized loss function for precipitation forecasting underscores how much progress hinges on aligning the training signal with rare, high-impact events rather than average-case accuracy.
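
As a rough illustration of what aligning the training signal with rare events can mean, the sketch below weights squared errors more heavily when the observed rainfall is above a threshold. This is not the loss from the paper mentioned above: the function name weighted_mse, the 10.0 threshold, and the weight of 5.0 are assumptions picked only to show the mechanism.

```python
from typing import Sequence


def weighted_mse(
    pred: Sequence[float],
    obs: Sequence[float],
    threshold: float = 10.0,    # assumed cutoff for a "heavy" event
    heavy_weight: float = 5.0,  # assumed extra weight for heavy events
) -> float:
    """Mean squared error where heavy-rain observations count more."""
    if len(pred) != len(obs) or not obs:
        raise ValueError("pred and obs must be non-empty and the same length")
    total = 0.0
    weight_sum = 0.0
    for p, o in zip(pred, obs):
        w = heavy_weight if o >= threshold else 1.0
        total += w * (p - o) ** 2
        weight_sum += w
    return total / weight_sum


obs = [0.0, 1.0, 2.0, 30.0]           # one rare heavy event
misses_storm = [0.0, 1.0, 2.0, 10.0]  # perfect on dry days, badly low on the storm

plain = weighted_mse(misses_storm, obs, heavy_weight=1.0)  # ordinary MSE
weighted = weighted_mse(misses_storm, obs)
print(plain, weighted)
```

With equal weights the missed storm is averaged together with three perfect dry-day predictions; with the heavier weight the same miss dominates the loss, so a model trained on it is pushed to get the rare event right.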

Put it all together with a proposal for a global web of autonomous scientific agents, and you get a crisp thesis for 2026: the next wave won’t be about ever-larger monoliths, but about designing objectives and connected systems that stay robust when they scale, because once you wire agents, people, and information into networks, small local choices become global outcomes.

Our current book recommendation is "Building AI Agents with LLMs, RAG, and Knowledge Graphs" by S. Raieli and G. Iuculano. You can find all the previous book reviews on our website. In this week's video, we have a discussion with Terry Tao on how "LLMs Are Simpler Than You Think – The Real Mystery Is Why They Work!"

Data shows that the best way for a newsletter to grow is by word of mouth, so if you think one of your friends or colleagues would enjoy this newsletter, go ahead and forward this email to them. This will help us spread the word!

Semper discentes,

The D4S Team


"Building AI Agents with LLMs, RAG, and Knowledge Graphs" by S. Raieli and G. Iuculano is a clear-headed guide for anyone trying to turn a “cool LLM demo” into an agent that can retrieve facts, use tools, and stay anchored to real information. Raieli and Iuculano keep the focus on what matters in practice: how RAG and knowledge graphs change the reliability profile of an agent, and when you need more structure than “just prompt it better.”

For data scientists and ML engineers, the best part is the build-oriented progression. It connects core concepts to concrete patterns—single-agent tool use, retrieval pipelines, and multi-agent coordination—without drowning you in theory. The examples feel like things you’d actually adapt into a prototype at work, and the overall framing consistently nudges you toward grounded, auditable behavior instead of vibes-based generation.

The tradeoff is breadth: if you already know transformers cold, some early sections may read like a warm-up, and the “production” angle is more of a practical starting line than a full MLOps reliability handbook. Still, as a one-stop map of modern agent building—especially where RAG and knowledge graphs stop being buzzwords and start being design choices—it’s an intense, usable read that tends to leave you with a short list of things you want to try next.


  1. Don't fall into the anti-AI hype [antirez.com]
  2. A Software Library with No Code [dbreunig.com]
  3. Building AI Agents with LangGraph (2026 Edition): A Step-by-Step Guide [ai.gopubby.com]
  4. Flash Learn - Agents made simple [github.com/Pravko-Solutions]
  5. AI is a business model stress test [dri.es]
  6. Accelerating scientific breakthroughs with an AI co-scientist [research.google]
  7. Multi-Agent Systems: The Architecture Shift from Monolithic LLMs to Collaborative Intelligence [comet.com]


Terry Tao: "LLMs Are Simpler Than You Think – The Real Mystery Is Why They Work!"


All the videos of the week are now available in our YouTube playlist.

Upcoming Events:

Opportunities to learn from us

On-Demand Videos:

Long-form tutorials

Data For Science, Inc
