Dear Reader,
Welcome to the 292nd edition of the Data Science Briefing!
We're proud to announce that a brand new Data Visualization with Python on-demand video is now available on the O'Reilly website: Python Data Visualization: Create impactful visuals, animations and dashboards. This in-depth tutorial is almost 7 hours long and covers fundamental and advanced usage of matplotlib, seaborn, plotly, and bokeh, as well as tips on how to use Jupyter widgets. Check it out!
The latest blog post in the Epidemiology series is also out: Demographic Processes. In this post, we explore how to include birth and death rates in your epidemik models. Check it out!
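For a flavor of what the post covers, here is a minimal sketch of SIR dynamics with births and deaths, written against plain scipy rather than epidemik's own API (all parameter values are illustrative assumptions):

```python
import numpy as np
from scipy.integrate import odeint

def sir_demography(y, t, beta, gamma, mu):
    """SIR model with an equal per-capita birth and death rate mu."""
    S, I, R = y
    N = S + I + R
    dS = mu * N - beta * S * I / N - mu * S   # births replenish susceptibles
    dI = beta * S * I / N - gamma * I - mu * I
    dR = gamma * I - mu * R
    return [dS, dI, dR]

# Illustrative parameters (per day): transmission, recovery, and vital rates.
beta, gamma, mu = 0.3, 0.1, 1 / (70 * 365)
y0 = [990.0, 10.0, 0.0]          # initial S, I, R
t = np.linspace(0, 365, 366)     # one year, daily steps
S, I, R = odeint(sir_demography, y0, t, args=(beta, gamma, mu)).T
print(f"Peak infections: {I.max():.0f}")
```

The blog post shows how the same vital dynamics are expressed with epidemik itself.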
This week, we start with a fascinating look at why cities behave like living organisms that suggests urban planning can borrow from biology. If you’re building with AI, A. B. Vijay Kumar’s “Product Requirements Prompts” demonstrates how structured prompts transform fuzzy ideas into machine-executable specifications and more robust, agentic workflows.
For the hands-on crowd, we have an overview of running local LLMs on macOS for private, offline tinkering. On the modeling front, Singlelunch offers a refreshingly skeptical take on GNNs: useful, yes, but hardly a panacea. Quanta’s clear primer on the Fourier transform reminds us why frequency space still underpins everything from compression to physics intuition. Similarly, Scientific American’s tour of card-shuffling math doubles as a cautionary tale for randomness in online systems.
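Speaking of frequency space, here is a minimal numpy sketch of the core idea behind the Fourier transform (the toy signal and sampling rate are our own choices): decompose a time-domain signal and read off the frequencies it contains.

```python
import numpy as np

# Sample a 1-second signal made of 5 Hz and 20 Hz sine waves.
fs = 1000                      # sampling rate in Hz (toy choice)
t = np.arange(0, 1, 1 / fs)    # time axis: 1000 samples over 1 second
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

# The FFT maps the time-domain samples into frequency space.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The two strongest peaks recover the frequencies we put in.
top2 = freqs[np.argsort(np.abs(spectrum))[-2:]]
print(np.sort(top2))  # -> [ 5. 20.]
```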
On the research front, a new mechanistic study finds that “theory-of-mind” skill in LLMs hinges on ultra-sparse parameter substructures, hinting at compact, targetable circuits for social reasoning. At the same time, reliability remains front-and-center: a fresh analysis by OpenAI argues hallucinations are structurally incentivized by today’s training and evaluation, which reward confident guesses over calibrated “I don’t know,” while a survey maps trustworthiness in LLM reasoning across truthfulness, safety, robustness, fairness, and privacy.
Adding friction, a perspective contends there’s a scaling “wall” that limits how much bigger models can actually reduce uncertainty to scientific standards. Enter self-improving systems: R-Zero pits a Challenger against a Solver to generate tasks from scratch and bootstrap reasoning without external data, while “Bootstrapping Task Spaces” (ExIt) builds autocurricula that improve performance on math, multi-turn tool use, and ML-engineering tasks. As a framing contrast, work in animal behavior formalizes the causal drivers of social networks, reminding us that explicit causal structure, and not just scale, can make complex systems legible and steerable.
Our current book recommendation is Mark Carrigan’s "Generative AI for Academics". You can find all the previous book reviews on our website. In this week's video, we have a documentary on the history of the Python programming language.
Data shows that the best way for a newsletter to grow is by word of mouth, so if you think one of your friends or colleagues would enjoy this newsletter, go ahead and forward this email to them. This will help us spread the word!
Semper discentes,
The D4S Team
Mark Carrigan’s "Generative AI for Academics" is a brisk, sensible map for using LLMs in scholarly life. It avoids both hype and doom, treating generative AI as a set of tools that demand judgment, not blind adoption. The tone is practical and reflective—ideal for faculty, PIs, and grad students who need shared language and guardrails.
The book shines in how it organizes academic work (Thinking, Collaborating, Communicating, Engaging), then pairs each with concrete practices (rubber-ducking, draft refinement, critical oversight). It isn’t a prompt cookbook or a windy manifesto; it’s a clear framework for responsible use, culture-setting, and policy discussions in departments and labs.
Data scientists and ML engineers will find valuable takeaways for literature synthesis, design reviews, code docs, and stakeholder comms. But if you want model internals, rigorous eval protocols, threat modeling, or MLOps patterns, the book skims the surface. Bottom line: keep it close for norms, ethics, and mentoring; pair it with technical playbooks when you need depth.
- Cities Obey the Laws of Living Things [nautil.us]
- Product Requirements Prompts [abvijaykumar.medium.com]
- Experimenting with local LLMs on macOS [blog.6nok.org]
- Why I'm lukewarm on graph neural networks [singlelunch.com]
- How the Math of Shuffling Cards Almost Brought Down an Online Poker Empire [scientificamerican.com]
- Some thoughts on personal git hosting [shkspr.mobi]
- What Is the Fourier Transform? [quantamagazine.org]
- How large language models encode theory-of-mind: a study on sparse parameter patterns (Y. Wu, W. Guo, Z. Liu, H. Ji, Z. Xu, D. Zhang)
- A causal framework for the drivers of animal social network structure (B. Kawam, J. Ostner, R. McElreath, O. Schülke, D. Redhead)
- Why Language Models Hallucinate (A. T. Kalai, O. Nachum, S. S. Vempala, E. Zhang)
- A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models (Y. Wang, Y. Yu, J. Liang, R. He)
- The wall confronting large language models (P. V. Coveney, S. Succi)
- R-Zero: Self-Evolving Reasoning LLM from Zero Data (C. Huang, W. Yu, X. Wang, H. Zhang, Z. Li, R. Li, J. Huang, H. Mi, D. Yu)
- Bootstrapping Task Spaces for Self-Improvement (M. Jiang, A. Lupu, Y. Bachrach)
Python: The Documentary
All the videos of the week are now available in our YouTube playlist.
Upcoming Events:
Opportunities to learn from us
On-Demand Videos:
Long-form tutorials
- Natural Language Processing 7h, covering basic and advanced techniques using NLTK and PyTorch.
- Python Data Visualization 7h, covering basic and advanced visualization with matplotlib, ipywidgets, seaborn, plotly, and bokeh.
- Time Series Analysis for Everyone 6h, covering data pre-processing, visualization, ARIMA, ARCH, and Deep Learning models.