Data Science Briefing #296


Nov 7th

Next webinar:
Dec 10, 2025 - Claude API for Python Developers

Dear Reader,

Welcome to the November 7th edition of the Data Science Briefing!

We're proud to announce that a brand new Data Visualization with Python on-demand video is now available on the O'Reilly website: Python Data Visualization: Create impactful visuals, animations and dashboards. This in-depth tutorial is almost 7 hours long and covers fundamental and advanced usage of matplotlib, seaborn, plotly, and bokeh, as well as tips on how to use Jupyter widgets. Check it out!

The latest blog post in the Epidemiology series is also out: Demographic Processes. In this post, we explore how to include birth and death rates in your epidemik models. Check it out!

This week’s links trace the full AI stack, from gritty data plumbing to big-picture ethics. Netflix’s write-ahead log post is a masterclass in designing resilient pipelines (think exactly-once semantics, fast recovery, and taming out-of-order events). At the same time, Karpathy’s “Yes, you should understand backprop” is the clearest case yet for owning the math that actually shapes your gradients.

On the product side, Fly.io’s “You Should Write an Agent” argues for shipping small, focused agents now, wiring tools, memory, and guardrails instead of chasing grand abstractions. Meanwhile, the data economy behind those agents is under scrutiny: The Atlantic’s look at Common Crawl spotlights the invisible infrastructure (and messy provenance) that feeds modern models, and Lee Fang tracks how copyright enforcement has faded just as AI scraping has surged, shifting old piracy debates into the training-data era.

Zoom out to the grid, and the New Yorker’s tour of AI data centers underscores the real-world costs of scale. And if you need a north star for why all this machinery matters, Quanta reports on language models matching expert-level analysis on specific tasks, an achievement that’s as much about disciplined engineering and data stewardship as it is about model size.

From micro-level ties to macro-level contagion, this week’s papers sketch a unified playbook for modeling how information and misinformation move. Work on intersectional inequalities in social networks shows that who connects to whom still gates opportunity and exposure, setting the initial conditions for any diffusion. A physics-inspired take on news, rumors, and opinions then models how those signals propagate.

At the same time, human mobility studies remind us that geography and movement patterns still serve as the hidden coupling layer across communities. Methodologically, neural symbolic regression enables the direct discovery of governing equations from network data, and new null models for information decomposition provide principled baselines for separating synergy from redundancy when multiple signals interact.

On the language-model front, evidence that accumulating context shifts model “beliefs” raises sharp questions about prompt hygiene and evaluation drift, just as continuous autoregressive models push beyond token clocks toward smoother dynamics. Together, they argue for pipeline designs that respect structure (inequality and mobility), dynamics (learned equations, robust baselines), and cognition (context-sensitive LMs) if we want forecasting, moderation, and AI-assistant behavior to hold up outside the lab.

Our current book recommendation is Sinan Ozdemir’s "Quick Start Guide to Large Language Models". You can find all the previous book reviews on our website. In this week's video, we have an overview of "But what is a Laplace Transform?".

Data shows that the best way for a newsletter to grow is by word of mouth, so if you think one of your friends or colleagues would enjoy this newsletter, go ahead and forward this email to them. This will help us spread the word!

Semper discentes,

The D4S Team


Sinan Ozdemir’s "Quick Start Guide to Large Language Models" lives up to its name. It moves quickly from core concepts (tokens, context windows, and prompt structure) to working patterns like chat apps, RAG, summarization, and lightweight agents. The sequencing is pragmatic: read a chapter, ship a prototype.

The standout value for DS/ML folks is its treatment of embeddings and retrieval. Ozdemir shows when embeddings beat fine-tuning, how to chunk and index, and how to trade off accuracy, latency, and cost with clear, reusable checklists. His sections on prompt patterns, tool use/function-calling, and interface design treat prompting like API design (constrain inputs, structure outputs, plan for failure modes), which makes the resulting components easy to slot into existing services.

In short: an excellent on-ramp and onboarding text. Pair it with heavier resources for evaluation, alignment, and production-grade deployments.
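
To make the "prompting like API design" idea concrete, here is a minimal sketch of our own, not code from the book. It uses only the Python standard library, and every name in it (MAX_INPUT_CHARS, ALLOWED_LABELS, build_prompt, parse_response) is a hypothetical choice for illustration. It checks the input before building a prompt, asks for a fixed JSON shape, and falls back to a safe default when the reply does not match.

```python
import json
from dataclasses import dataclass

# Both values are assumptions made for this sketch; tune them for your use case.
MAX_INPUT_CHARS = 2000
ALLOWED_LABELS = {"positive", "negative", "neutral"}


@dataclass
class Result:
    label: str
    ok: bool  # False when the reply was unusable and we fell back to a default


def build_prompt(text: str) -> str:
    """Constrain inputs: reject empty or oversized text before any model call."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("input text is empty")
    if len(cleaned) > MAX_INPUT_CHARS:
        raise ValueError("input text is too long")
    return (
        "Classify the sentiment of the text below.\n"
        'Reply with JSON only, in the form {"label": "<positive|negative|neutral>"}.\n'
        f"Text: {cleaned}"
    )


def parse_response(raw: str) -> Result:
    """Structure outputs and plan for failure: validate the reply or fall back."""
    try:
        label = json.loads(raw)["label"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return Result(label="neutral", ok=False)
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        return Result(label="neutral", ok=False)
    return Result(label=label, ok=True)
```

The raw string passed to parse_response would come from whichever model client you already use; the sketch deliberately leaves that call out so that the validation logic can be tested on its own.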


  1. Building a Resilient Data Platform with Write-Ahead Log at Netflix [netflixtechblog.com]
  2. You Should Write An Agent [fly.io]
  3. Common Crawl Is Doing the AI Industry’s Dirty Work [theatlantic.com]
  4. Yes you should understand backprop [karpathy.medium.com]
  5. What Happened to Piracy? Copyright Enforcement Fades as AI Giants Rise [leefang.com]
  6. Inside the Data Centers That Train A.I. and Drain the Electrical Grid [newyorker.com]
  7. In a First, AI Models Analyze Language As Well As a Human Expert [quantamagazine.org]


But what is a Laplace Transform?

All the videos of the week are now available in our YouTube playlist.

Upcoming Events:

Opportunities to learn from us

On-Demand Videos:

Long-form tutorials

Data For Science, Inc
