Data Science Briefing #306


(view in browser)

Feb 18th

Dear Reader,

Welcome to the 306th issue of our newsletter!

Our schedule this week is completely packed with opportunities to skill up with us. We just wrapped the first edition of Code Development with AI Assistants and are already gearing up for CrewAI for Production-Ready Multi-Agent Systems tomorrow, February 19, where we'll scale complex automations. Don't leave your seat to chance: lock in your registration!

Across this week’s links, a common thread is the growing gap between what AI can generate and what humans can responsibly own. One essay argues that as “vibe coding” and AI-assisted drafting become frictionless, we risk accumulating “cognitive debt”: shipping words or software faster than we can genuinely understand them, and losing the slow, public meaning-making that comes from rewriting, revising, and thinking in the open. That tension shows up in the economics, too: even with token caching, agent costs can scale quadratically with conversation length, so long-running “helpful” sessions quietly become dominated by re-reading their own history.
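To make that cost curve concrete, here is a minimal, illustrative sketch (our own toy numbers, not figures from the linked post) of why re-sending the full conversation each turn yields quadratic token totals:

```python
# Each agent turn re-reads the entire conversation so far, so the
# cumulative tokens processed grow quadratically with the number of
# turns: sum_{i=1..n} i * m = m * n * (n + 1) / 2.

def cumulative_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total tokens read across a session where turn i re-reads
    all i messages of `tokens_per_turn` tokens each."""
    return sum(i * tokens_per_turn for i in range(1, turns + 1))

short = cumulative_tokens(50, 500)      # 50-turn session
long_run = cumulative_tokens(100, 500)  # twice as many turns
print(short, long_run, long_run / short)  # cost ratio is roughly 4x
```

Doubling the session length roughly quadruples the total tokens processed, which is why trimming or summarizing history matters so much for long agent runs.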

Meanwhile, two hands-on case studies reveal both the promise and the limits of automation: a small “swarm” of models can sketch a surprisingly feature-rich, SQLite-inspired engine, while in reverse engineering, early LLM wins give way to a long tail in which progress depends on better retrieval, similarity search, domain-specific tooling, and guardrails. Layer on top a sobering real-world anecdote about an autonomous agent escalating into reputational attack behavior in open source, and the takeaway becomes clear: capabilities are accelerating, but so is the need for verification, governance, and “human-in-the-loop” norms.

On the academic front, we continue the arc from first principles to real-world impact. If “emergence” is the name we give to macro-behavior that isn’t explicitly programmed at the micro-level, then cities and information systems are basically emergence engines: simple local rules can still yield scale-free structure, from heavy-tailed distributions of points-of-interest that arise even from something as bland as a homogeneous random process, to the surprisingly universal ways central nodes and hubs show up across scale-free networks.
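As a toy illustration of how hubs can emerge from simple local rules (a generic preferential-attachment sketch of ours, not the specific model from any linked paper):

```python
import random
from collections import Counter

random.seed(42)

# Preferential attachment: each new node links to an existing node
# chosen in proportion to its current degree. A "rich get richer"
# rule this simple already produces heavy-tailed hub degrees.
def preferential_attachment(n_nodes: int) -> Counter:
    degrees = Counter({0: 1, 1: 1})  # start from a single edge 0-1
    targets = [0, 1]                 # node i appears degrees[i] times
    for new in range(2, n_nodes):
        old = random.choice(targets)  # degree-proportional choice
        degrees[new] += 1
        degrees[old] += 1
        targets.extend([new, old])
    return degrees

deg = preferential_attachment(5000)
avg = sum(deg.values()) / len(deg)
print(f"average degree = {avg:.2f}, max degree = {max(deg.values())}")
```

Even though every node plays by the same local rule, the maximum degree lands an order of magnitude above the average, the signature of scale-free hubs.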

The methodological subtext is a reminder that you don’t get to claim patterns without clean causal plumbing: rigorous randomization is the difference between “we saw a thing” and “we learned a thing.” That matters when you move from theory into governance: for example, measuring who actually gets shade on sidewalks, and how such basic infrastructure quietly encodes inequality at a global scale.

On the tooling side, several papers point to a convergence: better network measures (including spectral, SVD-based notions of incidence centrality that unify node/edge importance even in directed networks and hypergraphs) help us describe complex systems more faithfully, while small, well-scoped language models can be experimentally validated for high-stakes summarization workflows. And looming over all of it is the “next emergence” question: if we can scaffold models to do autonomous mathematics research, then the frontier is whether we can build experimental, interpretable, and equitable pipelines around these systems so that their breakthroughs don’t outpace our ability to trust, test, and distribute their benefits.
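For a flavor of the spectral idea, here is a minimal sketch (our own simplified construction on a toy undirected graph, assuming only the general notion of reading node and edge scores off the leading singular vectors of an incidence matrix; the paper's actual definitions will differ):

```python
import numpy as np

# Node-edge incidence matrix B of a small undirected graph:
# rows = nodes, columns = edges; B[i, e] = 1 if node i touches edge e.
# Toy graph: a 4-node star (hub 0 linked to 1, 2, 3) plus edge 1-2.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
B = np.zeros((4, len(edges)))
for e, (u, v) in enumerate(edges):
    B[u, e] = B[v, e] = 1.0

# Leading singular vectors of B: the left vector scores nodes, the
# right vector scores edges, so one decomposition yields a single
# consistent notion of importance for both.
U, s, Vt = np.linalg.svd(B)
node_scores = np.abs(U[:, 0])
edge_scores = np.abs(Vt[0, :])

print("node scores:", node_scores.round(3))
print("edge scores:", edge_scores.round(3))
```

The hub node comes out on top, and the node and edge rankings fall out of the same decomposition, which is the appeal of incidence-based centralities.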

Our current book recommendation is "Visualizing Generative AI: How AI Paints, Writes, and Assists" by P. Vergadia and V. Lakshmanan. You can find all the previous book reviews on our website. In this week's video, we have a long lecture on Productively Programming Accelerated Computing Systems.

Data shows that the best way for a newsletter to grow is by word of mouth, so if you think one of your friends or colleagues would enjoy this newsletter, go ahead and forward this email to them. This will help us spread the word!

Semper discentes,

The D4S Team


"Visualizing Generative AI: How AI Paints, Writes, and Assists" by P. Vergadia and V. Lakshmanan is a concept-first, diagram-rich guide that makes modern GenAI feel legible. Priyanka Vergadia’s visual explanations are the star: clean mental models for tokens, embeddings, transformers, and “why the model says what it says,” without burying you in math. It’s the kind of book that helps you keep the whole system in your head fast.

For data scientists and ML engineers, the best value is the shared vocabulary it builds for real-world conversations: architecture tradeoffs, where GenAI fits in products, and what it’s actually good at today (assistive workflows, automation, and augmentation more than magic). It also doesn’t dodge the sharp edges, such as hallucinations, security concerns, and practical limitations, so you’re not left with a glossy, hype-only view.

The main drawback is depth: if you want rigorous internals, training dynamics, evaluation deep dives, or extensive code and end-to-end implementation details, this isn’t the book for you. But as a quick, sticky mental map, something you can read in a weekend and keep referencing when you’re designing, reviewing, or educating stakeholders, it’s a very strong pick, and likely to earn a spot on your “worth recommending” shelf.


  1. What is happening to writing? [resobscura.substack.com]
  2. There is unequivocal evidence that Earth is warming at an unprecedented rate. [science.nasa.gov]
  3. BarraCUDA: An open-source CUDA compiler that targets AMD GPUs [github.com/Zaneham]
  4. An AI Agent Published a Hit Piece on Me – Forensics and More Fallout [theshamblog.com]
  5. Expensively Quadratic: the LLM Agent Cost Curve [blog.exe.dev]
  6. Building sqlite with a small swarm [kiankyars.github.io]
  7. The Long Tail of LLM-Assisted Decompilation [blog.chrislewis.au]


Productively Programming Accelerated Computing Systems


All the videos of the week are now available in our YouTube playlist.

Upcoming Events:

Opportunities to learn from us

On-Demand Videos:

Long-form tutorials

Data For Science, Inc
