Dear Reader,
Welcome to the Thanksgiving edition of the Data Science Briefing!
We're proud to announce that a brand-new Data Visualization with Python on-demand video is now available on the O'Reilly website: Python Data Visualization: Create impactful visuals, animations and dashboards. This in-depth tutorial is almost 7 hours long and covers fundamental and advanced usage of matplotlib, seaborn, plotly, and bokeh, as well as tips on how to use Jupyter widgets. Check it out!
The latest blog post in the Epidemiology series is also out:
Demographic Processes. In this post, we explore how to include birth and death rates in your epidemik models. Check it out!
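If you want the gist before reading the post: demographic processes add a birth inflow to the susceptible compartment and a per-capita death rate to every compartment. Below is a minimal SIR-with-vital-dynamics sketch using scipy, not epidemik's actual API; all parameter values are illustrative.

```python
from scipy.integrate import solve_ivp

# SIR with vital dynamics: births enter S at per-capita rate mu,
# and every compartment dies at the same rate mu, so the total
# population stays (approximately) constant.
def sir_vital(t, y, beta, gamma, mu):
    S, I, R = y
    N = S + I + R
    dS = mu * N - beta * S * I / N - mu * S
    dI = beta * S * I / N - gamma * I - mu * I
    dR = gamma * I - mu * R
    return [dS, dI, dR]

# Illustrative parameters (per day): R0 = beta / (gamma + mu) ~ 2.5,
# a 10-day infectious period, and a ~70-year average lifespan.
beta, gamma, mu = 0.25, 0.1, 1 / (70 * 365)
sol = solve_ivp(sir_vital, (0, 20 * 365), [9990, 10, 0],
                args=(beta, gamma, mu))
print(f"infectious after 20 years: {sol.y[1, -1]:.0f}")
```

With births continually replenishing the susceptible pool, the outbreak no longer simply burns out; over long runs the system can settle into an endemic equilibrium instead.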
The very first edition of the Claude API for Python Developers webinar is coming up in just over a week and a half. You won't want to miss our deep dive into all things Claude, so don't forget to Sign Up! We're also proud to announce that the first webinar of 2026 will be LangChain for Generative AI Pipelines on January 21st, and that you can already register.
This week’s issue zooms out to question whether the whole AI gold rush is resting on the category error of treating fluency in language as proof of genuine intelligence, and what that means for an industry pouring trillions into systems that might be spectacular mimics rather than thinkers. At the same time, we drop down to the hardware-level realities: a deep dive into continuous batching shows how clever scheduling around attention and KV caches is quietly improving real-world throughput more than most headline benchmarks ever will.
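For the intuition behind that deep dive, here is a deliberately simplified, framework-free sketch of the scheduling idea (all names here are ours, and real engines such as vLLM add paged KV-cache management and prefill handling on top): finished sequences free their batch slot immediately, and queued requests are admitted mid-flight instead of waiting for the whole batch to drain.

```python
from collections import deque

# Toy batching scheduler. Each loop iteration is one decode step:
# every in-flight sequence emits one token. With continuous batching,
# finished sequences free their slot immediately and queued requests
# are admitted mid-flight rather than when the batch empties.
def decode_steps(request_lengths, max_batch=4, continuous=True):
    queue = deque(request_lengths)   # output tokens still owed per request
    running = []
    steps = 0
    while queue or running:
        if continuous or not running:
            # Static batching only refills once the batch is empty;
            # continuous batching refills any free slot every step.
            while queue and len(running) < max_batch:
                running.append(queue.popleft())
        running = [r - 1 for r in running if r > 1]
        steps += 1
    return steps

# Mixed short and long requests: with continuous batching, short
# requests no longer wait behind the longest sequence in their batch.
reqs = [3, 50, 4, 5, 48, 2]
print(decode_steps(reqs, continuous=False), decode_steps(reqs, continuous=True))
```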
On the open-source front, a new model family built as a fully documented “model flow” rather than a single opaque checkpoint offers long-context reasoning and full training-phase traceability, hinting at a future where transparency is as important as raw capability. And because every new capability creates a new attack surface, we also look at the first widely reported AI-orchestrated cyber-espionage campaign, underscoring how quickly agents are moving from demo toys to tools in real offensive and defensive operations.
On the academic front, this week’s research highlights how much of modern modeling comes down to getting the right notions of contact, cognition, and scale. One study on respiratory syncytial virus shows that the shape of urban contact networks (who meets whom, where, and how often) can completely reshape epidemic curves and vaccination strategies, a theme echoed by new work on spatiotemporal, activity-driven networks that treat time and movement as first-class citizens rather than afterthoughts.
In parallel, an open-source epidemic modeling toolkit with built-in Bayesian calibration makes it far easier for practitioners to turn those messy realities into calibrated, policy-relevant simulations instead of toy models. But as we get better at mapping real-world interactions, social science is discovering a new problem: large language models are becoming an existential threat to traditional online surveys at precisely the moment we’re learning how badly humans misperceive their own centrality in social networks, raising the question of what “ground truth” even means when both respondents and respondents’ AIs can systematically distort it.
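For a flavor of what "integrated approximate Bayesian calibration" means in practice, here is a generic ABC rejection sampler run against a toy stochastic SIR simulator. This is our own illustration, not Epydemix's API; the prior, tolerance, and distance metric are all placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Crude stochastic SIR simulator returning a weekly incidence curve.
def simulate_incidence(beta, gamma=0.2, n=10_000, i0=10, weeks=12):
    s, i = n - i0, i0
    incidence = []
    for _ in range(weeks):
        new_inf = rng.binomial(s, 1 - np.exp(-beta * i / n))
        new_rec = rng.binomial(i, 1 - np.exp(-gamma))
        s, i = s - new_inf, i + new_inf - new_rec
        incidence.append(new_inf)
    return np.array(incidence)

# "Observed" data generated with a beta we then pretend not to know.
observed = simulate_incidence(beta=0.5)

# ABC rejection: draw beta from the prior, simulate, and keep draws
# whose simulated curve lands within tolerance of the observed one.
accepted = []
while len(accepted) < 100:
    beta = rng.uniform(0.1, 1.0)                       # flat prior
    distance = np.linalg.norm(simulate_incidence(beta) - observed)
    if distance < 100:                                  # tolerance epsilon
        accepted.append(beta)

print(f"posterior mean beta ~= {np.mean(accepted):.2f}")
```

The accepted draws approximate the posterior over beta without ever writing down a likelihood, which is exactly why ABC is attractive for messy, simulation-defined epidemic models.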
Zooming out, early experiments with next-generation language models hint at genuine acceleration of scientific workflows, while theoretical work on phase transitions from linear to nonlinear information processing suggests that both brains and artificial networks may operate near critical points where small architecture or data changes trigger qualitatively new reasoning regimes.
Our current book recommendation is Sinan Ozdemir’s "Quick Start Guide to Large Language Models". You can find all the previous book reviews on our website. This week's video is Something Strange Happens When You Trace How Connected We Are.
Data shows that the best way for a newsletter to grow is by word of mouth, so if you think one of your friends or colleagues would enjoy this newsletter, go ahead and forward this email to them. This will help us spread the word!
Semper discentes,
The D4S Team
Sinan Ozdemir’s "Quick Start Guide to Large Language Models" lives up to its name. It moves quickly from core concepts (tokens, context windows, and prompt structure) to working patterns like chat apps, RAG, summarization, and lightweight agents. The sequencing is pragmatic: read a chapter, ship a prototype.
The standout value for DS/ML folks is its treatment of embeddings and retrieval. Ozdemir shows when embeddings beat fine-tuning, how to chunk and index, and how to trade off accuracy, latency, and cost with clear, reusable checklists. His sections on prompt patterns, tool use/function-calling, and interface design treat prompting like API design: constrain inputs, structure outputs, and plan for failure modes. That discipline makes the patterns easy to slot into existing services.
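To make the chunk-index-retrieve loop concrete, here is a minimal sketch; `embed` is a stand-in for whatever embedding model you actually use (the random projection exists only to keep the example self-contained).

```python
import numpy as np

def chunk(text, size=200, overlap=40):
    # Fixed-size character windows with overlap: the simplest
    # chunking strategy, and often a fine baseline.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(texts):
    # Stand-in embedding: a random projection of character counts,
    # normalized to unit length. Swap in a real embedding model here.
    rng = np.random.default_rng(0)
    proj = rng.normal(size=(256, 64))
    counts = np.array([[t.count(chr(c)) for c in range(256)] for t in texts])
    vecs = counts @ proj
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def retrieve(query, chunks, index, k=2):
    # On unit vectors, cosine similarity is just a dot product.
    scores = index @ embed([query])[0]
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

doc = ("RAG pipelines chunk documents, embed the chunks, "
       "and retrieve the closest ones at query time. ") * 5
chunks = chunk(doc)
index = embed(chunks)   # precompute once; this is your "vector store"
print(retrieve("how does retrieval work?", chunks, index))
```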
In short: an excellent on-ramp and onboarding text. Pair it with heavier resources for evaluation, alignment, and production-grade deployments.
- The AI boom is based on a fundamental mistake [theverge.com]
- Gemini CLI Tips and Tricks [github.com]
- A New Bridge Links the Strange Math of Infinity to Computer Science [quantamagazine.org]
- AI is coming for the world of competitive Excel [thehustle.co]
- Continuous Batching From First Principles [huggingface.co]
- Olmo 3: Charting a path through the model flow to lead open-source AI [allenai.org]
- Disrupting the first reported AI-orchestrated cyber espionage campaign [anthropic.com]
- Urban contact patterns shape respiratory syncytial virus epidemics with implications for vaccination (P. Kimball, J.-S. Casalegno, P. P. Martinez, A. S. Mahmud, B. Sandstede, R. E. Baker)
- Epydemix: An open-source Python package for epidemic modeling with integrated approximate Bayesian calibration (N. Gozzi, M. Chinazzi, J. T. Davis, C. Gioannini, L. Rossi, M. Ajelli, N. Perra, A. Vespignani)
- The potential existential threat of large language models to online survey research (S. J. Westwood)
- Perception of own centrality in social networks (J. Kovářík, J. Ozaita, A. Sánchez, P. Brañas-Garza)
- Early science acceleration experiments with GPT-5 (S. Bubeck, C. Coester, R. Eldan, T. Gowers, Yin T. Lee, A. Lupsasca, M. Sawhney, R. Scherrer, M. Sellke, B. K. Spears, D. Unutmaz, K. Weil, S. Yin, N. Zhivotovskiy)
- Spatiotemporal Activity-Driven Networks (Z. Simon, J. Saramäki)
- Phase transitions from linear to nonlinear information processing in neural networks (M. Matsumura, T. Haga)
Something Strange Happens When You Trace How Connected We Are
All the videos of the week are now available in our YouTube playlist.
Upcoming Events:
Opportunities to learn from us
On-Demand Videos:
Long-form tutorials
- Natural Language Processing (7h), covering basic and advanced techniques using NLTK and PyTorch.
- Python Data Visualization (7h), covering basic and advanced visualization with matplotlib, ipywidgets, seaborn, plotly, and bokeh.
- Time Series Analysis for Everyone (6h), covering data pre-processing, visualization, ARIMA, ARCH, and Deep Learning models.