Sept 4th

Next webinar:
Sep 17, 2025 - Machine Learning with PyTorch for Developers [Register]

Dear Reader,

Welcome to the Sept 4th edition of the Data Science Briefing!

We're proud to announce that a brand new Data Visualization with Python on-demand video is now available on the O'Reilly website: Python Data Visualization: Create impactful visuals, animations and dashboards. This in-depth tutorial is almost 7h long and covers fundamental and advanced usage of matplotlib, seaborn, plotly and bokeh, as well as tips on how to use Jupyter widgets. Check it out!

The latest blog post in the Epidemiology series is also out: Demographic Processes. In this post we explore how to include birth and death rates in your epidemik models. Check it out!

From nuts-and-bolts scaling to policy turbulence, this week’s picks favor results over rhetoric: “How To Scale Your Model” turns TPU/GPU lore into actionable guidance on parallelism and communication bottlenecks for training and inference at scale, a must-read if you’re pushing beyond a single node.

Graph Transformers get a clean, industry-grounded primer arguing that attention over graphs is fast becoming the default for structured data in finance, bio, and recsys. On the ground, a staff engineer’s six-week sprint with Claude Code provides a pragmatic playbook for embracing the “95% garbage” first pass, then iterating with tighter prompts, specifications, and tests.

In applied science, OpenAI and Retro Biosciences report a 50-fold increase in stem-cell reprogramming markers via a specialized model—evidence that domain-tuned AI can deliver real laboratory gains. Meanwhile, Mistral’s Le Chat adds Memories and 20+ MCP connectors, making enterprise assistants stickier inside everyday tools. And in the policy file, 85+ climate scientists released a detailed review of DOE’s Climate Working Group report as EPA reconsiders its Endangerment Finding.

On the research front, we explore the through-line separating true novelty from comfortable echoes. One team quantifies how LLMs fall into “plot templates,” proposing metrics and mitigations for story diversity, which is nicely complemented by work that measures what models actually memorize and how deduplication and extraction risk shape deployments.

On the learning side, “active reading” frames factual acquisition as targeted retrieval and synthesis rather than brute-force scale. At the same time, a sweeping survey of physical neural networks argues that pushing compute into photonics and other analog substrates can buy massive efficiency.

Outside the lab, a clever mobility study teases apart geography from human choice, showing how much of our movement is constrained by the map versus our willingness to explore; a 65-year analysis of U.S. music charts similarly suggests the road to cultural “hits” is steeper in a long-tail attention economy. And in the background, a multi-country longitudinal study on education and brain aging nudges us toward nuance: schooling seems to raise the baseline of cognitive performance more than it slows the slope of decline, an uncomfortable but clarifying distinction for anyone betting on interventions to move the needle.

This week's book is "Behavioral Network Science: Language, Mind, and Society" by T. T. Hills. You can find all the previous book recommendations on our website. In this week's video, we have an interview with Dr. Peter Hotez on Understanding Vaccine Misinformation.

Data shows that the best way for a newsletter to grow is by word of mouth, so if you think one of your friends or colleagues would enjoy this newsletter, go ahead and forward this email to them. This will help us spread the word!

Semper discentes,

The D4S Team

Michael Lanham's book, "AI Agents in Action", is a practical guide for developers who want to build autonomous AI agents using large language models (LLMs) and open-source frameworks. The book focuses on real-world engineering rather than abstract theory, offering a step-by-step approach to building agent architectures, managing multi-agent systems, and using LLMs to solve business problems. It's written for developers and technical professionals who have the necessary foundational skills in Python and want to move from theoretical knowledge to hands-on development.

The book's strength lies in its gradual layering of complexity, starting with basic concepts and moving to advanced topics like multi-agent orchestration and prompt engineering. Lanham uses open-source tools like CrewAI, AutoGen, and Nexus, and includes annotated code examples to help readers follow along. This approach effectively bridges the gap between academic theory and practical development, making it a valuable toolkit for machine learning engineers who want to create production-ready solutions for tasks like workflow automation and customer service bots. The book also provides insightful commentary on integrating key components like memory and feedback loops into agent-based systems.

However, the book has some notable limitations. A major critique is its optimistic portrayal of the tools and techniques, often overlooking critical discussions about their limitations, trade-offs, and performance at scale. It focuses on illustrative projects rather than addressing issues of robustness and reliability, which are crucial for high-stakes, enterprise-grade deployments. Another drawback is the lack of extended use cases or full-scale system integration examples, which would provide a more complete understanding of an agent system's lifecycle, maintenance, and long-term performance in a real-world business environment.

Echoes in AI: Quantifying lack of plot diversity in LLM outputs (W. Xu, N. Jojic, S. Rao, C. Brockett, B. Dolan)
Decoupling geographical constraints from human mobility (L. Boucherie, B. F. Maier, S. Lehmann)
Reevaluating the role of education on cognitive decline and brain aging in longitudinal cohorts across 33 Western countries (A. M. Fjell, O. Rogeberg, Ø. Sørensen, I. K. Amlien, D. Bartrés-Faz, A. M. Brandmaier, G. Cattaneo, S. Düzel, H. Grydeland, R. N. Henson, S. Kühn, U. Lindenberger, T. H. Lyngstad, A. M. Mowinckel, L. Nyberg, A. Pascual-Leone, C. Solé-Padullés, M. H. Sneve, J. Solana, M. Strømstad, L. O. Watne, K. B. Walhovd, D. Vidal-Piñeiro)
Training of physical neural networks (A. Momeni, B. Rahmani, B. Scellier, L. G. Wright, P. L. McMahon, C. C. Wanjura, Y. Li, A. Skalli, N. G. Berloff, T. Onodera, I. Oguz, F. Morichetti, P. del Hougne, M. Le Gallo, A. Sebastian, A. Mirhoseini, C. Zhang, D. Marković, D. Brunner, C. Moser, S. Gigan, F. Marquardt, A. Ozcan, J. Grollier, A. J. Liu, D. Psaltis, A. Alù, R. Fleury)
Is it getting harder to make a hit? Evidence from 65 years of US music chart history (M. E. Lech, S. Lehmann, J. L. Juul)
Learning Facts at Scale with Active Reading (J. Lin, V.-P. Berges, X. Chen, W.-T. Yih, G. Ghosh, B. Oğuz)
How much do language models memorize? (J. X. Morris, C. Sitawarin, C. Guo, N. Kokhlikyan, G. E. Suh, A. M. Rush, K. Chaudhuri, S. Mahloujifar)

Understanding Vaccine Misinformation: A Conversation with Dr. Peter Hotez

All the videos of the week are now available in our YouTube playlist.

Upcoming Events:

Opportunities to learn from us

Sep 17, 2025 - Machine Learning with PyTorch for Developers [Register]
Oct 1, 2025 - LLMs for Data Science [Register]
Oct 15, 2025 - LangChain for Generative AI Pipelines [Register]

On-Demand Videos:

Long-form tutorials

Natural Language Processing 7h, covering basic and advanced techniques using NLTK and PyTorch.
Python Data Visualization 7h, covering basic and advanced visualization with matplotlib, ipywidgets, seaborn, plotly, and bokeh.
Times Series Analysis for Everyone 6h, covering data pre-processing, visualization, ARIMA, ARCH, and Deep Learning models.

Data For Science, Inc

Data Science Briefing #291
