Dear Reader,
Welcome to the Feb 11th issue of our newsletter!
The clock is ticking, and our schedule for next week is completely packed with opportunities to skill up with us. Whether you're looking to turbocharge your workflow during Code Development with AI Assistants on February 18 or ready to scale complex automations with CrewAI for Production-Ready Multi-Agent Systems on February 19, these sessions are designed to take you from "just curious" to "production-ready." Don't leave your seat to chance: lock in your registration today!
This week, our attention was focused on an increasingly practical question: what does it take to move from “cool model” to “reliable system”? One deep dive walks through RLHF end-to-end, connecting the math to the gritty implementation details of reward modeling, preference data, policy optimization, and the failure modes that arise when theory meets real-world training runs. Another reframes “human ergonomics” for the agent era, arguing that the bottleneck isn’t just better models, but better interfaces, workflows, and guardrails that make agents legible and steerable under pressure.
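To make the reward-modeling step concrete, here is a toy sketch of the Bradley-Terry preference loss that underlies most RLHF reward models: the model is trained so that the response humans preferred scores higher than the one they rejected. This is a minimal illustration under our own naming, not the implementation from any particular article above:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry negative log-likelihood for one preference pair.

    The loss is small when the reward model scores the human-preferred
    response above the rejected one, and grows when the ordering flips.
    """
    return -math.log(sigmoid(reward_chosen - reward_rejected))

# Correct ordering with a wide margin gives a low loss;
# an inverted ordering is penalized heavily.
low = preference_loss(2.0, 0.0)
high = preference_loss(0.0, 2.0)
```

Summing this loss over a dataset of preference pairs and minimizing it by gradient descent is, in essence, the "reward modeling" stage; the resulting scores then become the training signal for policy optimization.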
On the engineering front, there’s sharp advice on using AI to write higher-quality code without surrendering correctness: treat the model like a fast collaborator, keep tight feedback loops with tests, and make intent explicit so you’re not debugging “helpful” hallucinations. And if you’re building agents in production, a standout piece makes the case for operational memory so your systems can actually learn from events, not just talk about them.
Finally, for teams craving causal answers instead of predictive correlations, a Bayesian starter guide offers a concrete on-ramp to causal discovery, emphasizing uncertainty, assumptions, and the discipline needed to avoid confident-but-wrong narratives.
On the academic front, we explore how “alignment” and “reasoning” aren’t abstract model virtues but rather measurable system behaviors with real-world consequences. Several works stress-test the popular story that bigger language models simply “reason better”: some failures look like brittle planning (the model can talk its way through a task but can’t execute the necessary state transitions), while others look like systematic blind spots where the chain of thought feels plausible yet collapses under counterfactual tweaks.
That’s where the reinforcement-learning cluster lands: rather than treating feedback as a thin preference layer, these papers push toward more principled objectives (including a maximum-likelihood framing of RL), better-behaved training signals, and richer reward representations, such as reward models that can generate and generalize, closing the gap between fluent explanations and reliable action.
Our current book recommendation is "Building AI Agents with LLMs, RAG, and Knowledge Graphs" by S. Raieli and G. Iuculano. You can find all the previous book reviews on our website. In this week's video, we have a long lecture on Robust and Interactable World Models in Computer Vision.
Data shows that the best way for a newsletter to grow is by word of mouth, so if you think one of your friends or colleagues would enjoy this newsletter, go ahead and forward this email to them. This will help us spread the word!
Semper discentes,
The D4S Team
"Building AI Agents with LLMs, RAG, and Knowledge Graphs" by S. Raieli and G. Iuculano is a clear-headed guide for anyone trying to turn “cool LLM demo” into an agent that can retrieve facts, use tools, and stay anchored to real information. Raieli and Iuculano keep the focus on what matters in practice: how RAG and knowledge graphs change the reliability profile of an agent, and when you need more structure than “just prompt it better.”
For data scientists and ML engineers, the best part is the build-oriented progression. It connects core concepts to concrete patterns—single-agent tool use, retrieval pipelines, and multi-agent coordination—without drowning you in theory. The examples feel like things you’d actually adapt into a prototype at work, and the overall framing consistently nudges you toward grounded, auditable behavior instead of vibes-based generation.
The tradeoff is breadth: if you already know transformers cold, some early sections may read like a warm-up, and the “production” angle is more of a practical starting line than a full MLOps reliability handbook. Still, as a one-stop map of modern agent building—especially where RAG and knowledge graphs stop being buzzwords and start being design choices—it’s an intense, usable read that tends to leave you with a short list of things you want to try next.
- RLHF From Scratch: A theoretical and practical deep dive into Reinforcement Learning with Human Feedback [github.com]
- From Human Ergonomics to Agent Ergonomics [wesmckinney.com]
- How to effectively write quality code with AI [heidenstedt.org]
- Is Graph Machine Learning the New Cryptocurrency Police? [medium.com/@nm8144]
- A Guide to Effective Prompt Engineering [blog.bytebytego.com]
- Why Your AI Agents Need Operational Memory, Not Just Conversational Memory [gradientflow.substack.com]
- The Complete Starter Guide For Causal Discovery Using Bayesian Modeling [medium.com/data-science-collective]
- Effects of antivaccine tweets on COVID-19 vaccinations, cases, and deaths (J. Bollenbacher, F. Menczer, J. Bryden)
- LLMs can’t jump (T. Zahavy)
- Large Language Model Reasoning Failures (P. Song, P. Han, N. Goodman)
- Maximum Likelihood Reinforcement Learning (F. Tajwar, G. Zeng, Y. Zhou, Y. Song, D. Arora, Y. Jiang, J. Schneider, R. Salakhutdinov, H. Feng, A. Zanette)
- Correcting temporal bias in mobility data using time-use surveys (S. A. Sanchez, H. Gibbs, T. Yabe, D. T. O'Brien, E. Moro)
- Reinforcement Learning from Human Feedback (N. Lambert)
- Generative Reward Models (D. Mahan, D. V. Phung, R. Rafailov, C. Blagden, N. Lile, L. Castricato, J.-P. Fränken, C. Finn, A. Albalak)
Robust and Interactable World Models in Computer Vision
All the videos of the week are now available in our YouTube playlist.
Upcoming Events:
Opportunities to learn from us
On-Demand Videos:
Long-form tutorials
- Natural Language Processing 7h, covering basic and advanced techniques using NLTK and PyTorch.
- Python Data Visualization 7h, covering basic and advanced visualization with matplotlib, ipywidgets, seaborn, plotly, and bokeh.
- Time Series Analysis for Everyone 6h, covering data pre-processing, visualization, ARIMA, ARCH, and Deep Learning models.