Dear Reader,
Welcome to the 295th edition of the Data Science Briefing!
We're proud to announce that a brand-new Data Visualization with Python on-demand video is now available on the O'Reilly website: Python Data Visualization: Create impactful visuals, animations and dashboards. This in-depth tutorial is almost 7 hours long and covers fundamental and advanced usage of matplotlib, seaborn, plotly, and bokeh, as well as tips on how to use Jupyter widgets. Check it out!
The latest blog post in the Epidemiology series is also out: Demographic Processes. In this post, we explore how to include birth and death rates in your epidemik models. Check it out!
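If you want to tinker before reading the post, the sketch below shows the underlying idea with plain scipy: adding per-capita birth and death rates to the standard SIR equations. The parameter values and function names are our own illustrative choices, not epidemik's API, which the post walks through properly.

```python
# Illustrative sketch only: an SIR model with constant per-capita birth and
# death rates, written against plain scipy (the post shows the epidemik way).
import numpy as np
from scipy.integrate import odeint

def sir_with_demography(y, t, beta, gamma, mu, nu):
    """SIR dynamics with birth rate nu (into S) and death rate mu (from all compartments)."""
    S, I, R = y
    N = S + I + R
    dS = nu * N - beta * S * I / N - mu * S
    dI = beta * S * I / N - gamma * I - mu * I
    dR = gamma * I - mu * R
    return [dS, dI, dR]

t = np.linspace(0, 365, 366)          # one year, daily steps
y0 = [990, 10, 0]                     # initial S, I, R
mu = nu = 1 / (70 * 365)              # roughly a 70-year life expectancy, per day
S, I, R = odeint(sir_with_demography, y0, t, args=(0.3, 0.1, mu, nu)).T
print(f"Peak infections: {I.max():.0f} on day {I.argmax()}")
```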
This week’s picks trace a clear arc from model “smarts” to practical, ship-it-now tooling: the Transformer Circuits team’s piece on emergent introspective awareness nudges the debate from “can LLMs reason?” to “can they monitor and correct themselves?”. At the same time, Anthropic’s Claude on Vertex AI makes that capability deployable in enterprise pipelines without wrangling infra.
On the product side, Claude Skills hint at a cleaner pattern for task-specific orchestration than heavyweight protocol layers, and PyTorch Monarch showcases how the ecosystem is standardizing scalable building blocks for large-scale distributed workloads. DeepSeek-OCR's 2D optical mapping reframes OCR as layout-aware context compression, useful for long, messy documents. And JupyterLite brings GNU Octave into the browser for frictionless computation anywhere.
Rounding it out, “Why I code as a CTO” argues for hands-on technical leadership, and Anil Dash’s “Majority AI View” reminds us that adoption is as much cultural as it is technical: shipping wins when capabilities, ergonomics, and values line up.
From social physics to model mechanics, this week's roundup of the latest academic papers maps where human behavior meets algorithmic design: new evidence that physical proximity trumps online ties in predicting U.S. voting nudges political analytics back to geography-first features, while multiobjective approaches to bias mitigation show how to tune recommender objectives beyond click-through without tanking utility.
On the modeling front, “Language Models are Injective and Hence Invertible” reframes pretraining as a two-way street with implications for editing, attribution, and safety, just as “Reasoning with Sampling” demonstrates that careful decoding can unlock latent problem-solving in base models you already have.
A survey of “vibe coding” captures the emerging craft of prompt-as-interface, and work on prediction-market arbitrage exposes the probabilistic seams where incentives and information leak. And when experiments sprawl, “When goodbye comes too soon” offers a pragmatic playbook for sunsetting projects fast—freeing up cycles to double down where the signal’s strongest.
Our current book recommendation is Sinan Ozdemir’s "Quick Start Guide to Large Language Models". You can find all the previous book reviews on our website. In this week's video, we have an interview with Andrej Karpathy: “We’re summoning ghosts, not building animals”.
Data shows that the best way for a newsletter to grow is by word of mouth, so if you think one of your friends or colleagues would enjoy this newsletter, go ahead and forward this email to them. This will help us spread the word!
Semper discentes,
The D4S Team
Sinan Ozdemir's "Quick Start Guide to Large Language Models" lives up to its name. It moves quickly from core concepts (tokens, context windows, and prompt structure) to working patterns like chat apps, RAG, summarization, and lightweight agents. The sequencing is pragmatic: read a chapter, ship a prototype.
The standout value for DS/ML folks is its treatment of embeddings and retrieval. Ozdemir shows when embeddings beat fine-tuning, how to chunk and index, and how to trade off accuracy, latency, and cost with clear, reusable checklists. His sections on prompt patterns, tool use/function-calling, and interface design treat prompting like API design: constrain inputs, structure outputs, and plan for failure modes, which makes it easy to slot into existing services.
In short: an excellent on-ramp and onboarding text. Pair it with heavier resources for evaluation, alignment, and production-grade deployments.
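As a taste of the retrieval material, here is a minimal chunk-embed-retrieve sketch in the spirit of the book's checklists. The library (sentence-transformers), model name, and chunk sizes are our own assumptions for illustration, not code from the book.

```python
# Minimal chunk -> embed -> retrieve loop (illustrative, not the book's code).
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk(text, size=200, overlap=50):
    """Split text into overlapping windows of `size` words; sizes here are arbitrary."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")          # any embedding model will do
docs = ["Replace this with your own documents..."]
chunks = [c for doc in docs for c in chunk(doc)]
index = model.encode(chunks, normalize_embeddings=True)  # shape: (n_chunks, dim)

def retrieve(query, k=3):
    """Return the k chunks closest to the query by cosine similarity."""
    q = model.encode([query], normalize_embeddings=True)
    scores = (index @ q.T).ravel()                       # cosine, since vectors are unit norm
    top = np.argsort(-scores)[:k]
    return [(chunks[i], float(scores[i])) for i in top]

print(retrieve("What is this corpus about?"))
```

Swapping in a vector database or a different embedding model changes the plumbing, not the loop, which is why the accuracy/latency/cost checklists travel so well between stacks.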
- Emergent Introspective Awareness in Large Language Models [transformer-circuits.pub]
- Claude on Vertex AI [docs.claude.com]
- Why I code as a CTO [assembled.com]
- Introducing PyTorch Monarch [pytorch.org]
- GNU Octave Meets JupyterLite: Compute Anywhere, Anytime! [blog.jupyter.org]
- DeepSeek-OCR: Revolutionary Context Compression Through Optical 2D Mapping [deepseek.ai]
- Claude Skills are awesome, maybe a bigger deal than MCP [simonwillison.net]
- The Majority AI View [www.anildash.com]
- Physical partisan proximity outweighs online ties in predicting US voting outcomes (M. Tonin, B. Lepri, M. Tizzoni)
- Detecting bias in algorithms used to disseminate information in social networks and mitigating it using multiobjective optimization (V. Sekara, I. Dotu, M. Cebrian, E. Moro, M. Garcia-Herranz)
- Language Models are Injective and Hence Invertible (G. Nikolaou, T. Mencattini, D. Crisostomi, A. Santilli, Y. Panagakis, E. Rodolà)
- Unraveling the Probabilistic Forest: Arbitrage in Prediction Markets (O. Saguillo, V. Ghafouri, L. Kiffer, G. Suarez-Tangil)
- Reasoning with Sampling: Your Base Model is Smarter Than You Think (A. Karan, Y. Du)
- A Survey of Vibe Coding with Large Language Models (Y. Ge, L. Mei, Z. Duan, T. Li, Y. Zheng, Y. Wang, L. Wang, J. Yao, T. Liu, Y. Cai, B. Bi, F. Guo, J. Guo, S. Liu, X. Cheng)
- When goodbye comes too soon: How to wrap up science projects quickly (M. H. Hagenauer, S. J. Winham, A. L. J. Freeman, P. W. Sternberg, B. J. Kolber)
Andrej Karpathy: “We’re summoning ghosts, not building animals.”
All the videos of the week are now available in our YouTube playlist.
Upcoming Events:
Opportunities to learn from us
On-Demand Videos:
Long-form tutorials
- Natural Language Processing (7h), covering basic and advanced techniques using NLTK and PyTorch.
- Python Data Visualization (7h), covering basic and advanced visualization with matplotlib, ipywidgets, seaborn, plotly, and bokeh.
- Time Series Analysis for Everyone (6h), covering data pre-processing, visualization, ARIMA, ARCH, and Deep Learning models.