Dear Reader,
Welcome to the Oct 15th edition of the Data Science Briefing!
We're proud to announce that a brand new Data Visualization with Python on-demand video is now available on the O'Reilly website: Python Data Visualization: Create impactful visuals, animations and dashboards. This in-depth tutorial is almost 7h long and covers fundamental and advanced usage of matplotlib, seaborn, plotly and bokeh, as well as tips on how to use Jupyter widgets. Check it out!
The latest blog post in the Epidemiology series is also out:
Demographic Processes. In this post we explore how to include birth and death rates in your epidemik models. Check it out!
This week, we're particularly proud to announce a brand new series of webinars: Claude API for Python Developers, where we'll take a deep dive into Anthropic's powerful API and how to best leverage it. The first edition is coming up on Dec 1st, but registrations are already open.
In our regular content, we sketch a pragmatic arc for agents and AI tooling: Anthropic argues that “context” is a scarce resource and offers concrete tactics for curating retrieval, compressing histories, and budgeting tokens, treating context engineering as first-class infra for reliable agents. Under the hood, IBM reminds us why scaling compute still hurts: the classic von Neumann bottleneck of shuttling colossal weight tensors between memory and compute wastes energy and time, so architectural shifts (and near-memory/analog ideas) matter as much as model tweaks.
Outside the lab, Simon Willison’s field notes on running multiple coding agents capture a rising workflow: generation is parallelized, and human review becomes the bottleneck. At the same time, Chris Loy warns that over-outsourcing can atrophy the hard part of programming: the thinking. As a counterpoint to “only bigger models,” Apple’s SimpleFold shows that elegant, general-purpose transformers plus flow-matching can rival specialized protein-folding stacks, evidence that clever design can beat baroque complexity.
Nature’s “rise of LLMs” frames foundation models as a new scientific instrument, setting the stage for work that asks not whether they replace linguistics, but how their generative machinery aligns with generative grammar’s abstractions. On the ground, GDPVAL shifts the benchmarking game from puzzles to profit, measuring whether model-plus-human teams can deliver economically valuable work faster and cheaper.
Meanwhile, a “physics of learning” lens derives classic algorithms from a Lagrangian, suggesting that better training recipes may emerge from first principles rather than ad-hoc tricks, and “AI agent economics” sketches incentives and market dynamics for fleets of autonomous workers. Zooming out, new macro models of an AGI world warn that growth may decouple from traditional labor, raising hard questions about bottlenecks, wages, and distribution. At the same time, bandit theory offers a sober playbook for exploration vs. exploitation as we operationalize these systems in policy and product.
Our current book recommendation is Mark Carrigan’s "Generative AI for Academics". You can find all the previous book reviews on our website. In this week's video, we have a hands-on primer on Fine-Tuning Open-Weight Models.
Data shows that the best way for a newsletter to grow is by word of mouth, so if you think one of your friends or colleagues would enjoy this newsletter, go ahead and forward this email to them. This will help us spread the word!
Semper discentes,
The D4S Team
Mark Carrigan’s "Generative AI for Academics" is a brisk, sensible map for using LLMs in scholarly life. It avoids both hype and doom, treating generative AI as a set of tools that demand judgment, not blind adoption. The tone is practical and reflective—ideal for faculty, PIs, and grad students who need shared language and guardrails.
The book shines in how it organizes academic work (Thinking, Collaborating, Communicating, Engaging), then pairs each with concrete practices (rubber-ducking, draft refinement, critical oversight). It isn’t a prompt cookbook or a windy manifesto; it’s a clear framework for responsible use, culture-setting, and policy discussions in departments and labs.
Data scientists and ML engineers will find valuable takeaways for literature synthesis, design reviews, code docs, and stakeholder comms. But if you want model internals, rigorous eval protocols, threat modeling, or MLOps patterns, the book skims the surface. Bottom line: keep it close for norms, ethics, and mentoring; pair it with technical playbooks when you need depth.
- Effective context engineering for AI agents [anthropic.com]
- AI Agents from First Principles [goyalpramod.github.io]
- Failing to Understand the Exponential, Again [julian.ac]
- How the von Neumann bottleneck is impeding AI computing [research.ibm.com]
- Apple SimpleFold: Folding Proteins is Simpler than You Think [github.com/apple]
- Embracing the parallel coding agent lifestyle [simonwillison.net]
- The AI coding trap [chrisloy.dev]
- The rise of large language models (Nature)
- On the compatibility of generative AI and generative linguistics (E. Portelance, M. Jasbi)
- GDPVAL: Evaluating AI Model Performance On Real-World Economically Valuable Tasks (T. Patwardhan, R. Dias, E. Proehl, G. Kim, M. Wang, O. Watkins, S. P. Fishman, M. Aljubeh, P. Thacker)
- Physics of Learning: A Lagrangian perspective to different learning paradigms (S. Guo, B. Schölkopf)
- Ten Principles of AI Agent Economics (K. Yang, C.X. Zhai)
- We Won't Be Missed: Work and Growth in the Era of AGI (P. Restrepo)
- Introduction to Multi-Armed Bandits (A. Slivkins)
- Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors (A. Didolkar, N. Ballas, S. Arora, A. Goyal)
Fine-Tuning Open-Weight Models: A Hands-On Deep Learning Primer
All the videos of the week are now available in our YouTube playlist.
Upcoming Events:
Opportunities to learn from us
On-Demand Videos:
Long-form tutorials
- Natural Language Processing 7h, covering basic and advanced techniques using NLTK and PyTorch.
- Python Data Visualization 7h, covering basic and advanced visualization with matplotlib, ipywidgets, seaborn, plotly, and bokeh.
- Time Series Analysis for Everyone 6h, covering data pre-processing, visualization, ARIMA, ARCH, and Deep Learning models.