Data Science Briefing #289



Aug 13th

Dear Reader,

Welcome to the 289th edition of the Data Science Briefing! This week, we're proud to announce two new webinars coming up in October. On Oct 1st, we'll be presenting the next edition of the LLMs for Data Science webinar series, followed by LangChain for Generative AI Pipelines on Oct 15th. Registrations are already open!

We're also excited to share that a brand-new Data Visualization with Python on-demand video is now available on the O'Reilly website: Python Data Visualization: Create impactful visuals, animations and dashboards. This in-depth tutorial runs almost 7 hours and covers fundamental and advanced usage of matplotlib, seaborn, plotly and bokeh, as well as tips on how to use Jupyter widgets. Check it out!

The latest blog post in the Epidemiology series is also out: Demographic Processes. In this post we explore how to include birth and death rates in your epidemik models. Check it out!

This week’s newsletter spotlights the remarkable pace of innovation across generative AI and data curation. Sebastian Raschka’s deep dive into GPT-2 versus gpt-oss unpacks how open-weight models have evolved, adopting architectural choices such as mixture-of-experts layers and more efficient attention mechanisms that make large language models more efficient and scalable.

Meanwhile, Google Research reports a 10,000x reduction in training data requirements, achieved by carefully curating a small set of high-fidelity labels. This allows large language models to maintain safety and alignment with only a fraction of the previous training data, addressing a significant hurdle for policy and content moderation at scale. Diffusion language models, for their part, are emerging as "super data learners": when unique training data is limited, they can outperform autoregressive models by training for more epochs on the same data.

The future is also multimodal: models increasingly fuse language, vision, and even molecular data, setting the stage for ambitious applications across scientific and enterprise contexts. And as AI systems demand ever more contextual, richly structured information, technical writers are evolving into “context curators.” Their work, organizing documentation so that both humans and machines can use it, is quickly becoming central to getting reliable results from autonomous AI tools, marking an exciting intersection of writing, coding, and knowledge management.

This edition's academic review spotlights transformative advances across animal behavior modeling, machine learning, and embodied AI. Chase and Peleg’s review on the physics of animal group decision-making connects mathematical models with experimental data, showing how sensing and information propagation in collectives, from sparse swarms to dense crowds, drive emergent consensus and adaptive behavior. Murphy’s sweeping overview of reinforcement learning traces the evolution from classic value-based methods to today’s sophisticated multi-agent architectures and LLM integrations, elucidating the foundations and future directions of autonomous decision-making under uncertainty.

Schröder et al. offer a timely caution against over-attributing psychological similarity to large language models, showing that LLMs, even those fine-tuned for psychological reasoning, can diverge sharply from human cognitive processes and must be validated for each new research application. In clinical AI, GPT-5 shows a clear jump over its predecessors on multimodal medical reasoning: on one diagnostic benchmark it even surpasses pre-licensed human experts, combining textual, visual, and structured data in ways that could support healthcare decision-making.

GLM-4.5, meanwhile, posts strong results in agentic workflows, complex reasoning, and coding, pairing a mixture-of-experts architecture with a hybrid mode that can either reason step by step or respond directly. In robotics, Google DeepMind’s Gemini models bring vision-language-action reasoning into the physical world, enabling robots to understand their surroundings, interact with them, and safely carry out tasks in human environments.

Finally, the ALFA framework tackles how LLMs can learn to ask meaningful questions in uncertain domains, showing how attribute-driven preference optimization can substantially reduce diagnostic errors in clinical reasoning and suggesting a scalable path for improving LLM reliability and relevance in expert domains.

This week's book is "Behavioral Network Science: Language, Mind, and Society" by T. T. Hills. You can find all the previous book recommendations on our website. In this week's video, we have a lecture by Ambroise Odonnat on Large Language Models as Markov Chains!

Data shows that the best way for a newsletter to grow is by word of mouth, so if you think one of your friends or colleagues would enjoy this newsletter, go ahead and forward this email to them. This will help us spread the word!

Semper discentes,

The D4S Team


Michael Lanham's book, "AI Agents in Action", is a practical guide for developers who want to build autonomous AI agents using large language models (LLMs) and open-source frameworks. The book focuses on real-world engineering rather than abstract theory, offering a step-by-step approach to building agent architectures, managing multi-agent systems, and using LLMs to solve business problems. It's written for developers and technical professionals who have the necessary foundational skills in Python and want to move from theoretical knowledge to hands-on development.

The book's strength lies in its gradual layering of complexity, starting with basic concepts and moving to advanced topics like multi-agent orchestration and prompt engineering. Lanham uses open-source tools like CrewAI, AutoGen, and Nexus, and includes annotated code examples to help readers follow along. This approach effectively bridges the gap between academic theory and practical development, making it a valuable toolkit for machine learning engineers who want to create production-ready solutions for tasks like workflow automation and customer service bots. The book also provides insightful commentary on integrating key components like memory and feedback loops into agent-based systems.

However, the book has some notable limitations. A major critique is its optimistic portrayal of the tools and techniques, often overlooking critical discussions about their limitations, trade-offs, and performance at scale. It focuses on illustrative projects rather than addressing issues of robustness and reliability, which are crucial for high-stakes, enterprise-grade deployments. Another drawback is the lack of extended use cases or full-scale system integration examples, which would provide a more complete understanding of an agent system's lifecycle, maintenance, and long-term performance in a real-world business environment.


  1. From GPT-2 to gpt-oss: Analyzing the Architectural Advances [magazine.sebastianraschka.com]
  2. Diffusion Language Models are Super Data Learners [jinjieni.notion.site]
  3. Achieving 10,000x training data reduction with high-fidelity labels [research.google]
  4. Cursor CLI [cursor.com]
  5. Foundation models are going multimodal [twelvelabs.io]
  6. Leaked Logs Show ChatGPT Coaxing Users Into Psychosis About Antichrist, Aliens, and Other Bizarre Delusions [futurism.com]
  7. AI must RTFM: Why technical writers are becoming context curators [passo.uno]


Taking Notes Effectively


All the videos of the week are now available in our YouTube playlist.

Upcoming Events:

Opportunities to learn from us

On-Demand Videos:

Long-form tutorials

Data For Science, Inc

I'm a maker and blogger who loves to talk about technology. Subscribe and join over 3,000 newsletter readers every week!
