🥂🥂🥂 Data Science Briefing #322 🥂🥂🥂

Jun 9th

Next webinar: Jul 8, 2026 (17:00 UTC) - Automate the Boring Developer Stuff with LLMs

Dear Reader,

The very first issue of the Data Science Briefing went out on Jun 2nd, 2019, so this month we're celebrating its 7th anniversary. Over the next few weeks we'll roll out several updates and improvements so we can continue learning together.

Announcements

As we celebrate our 7th anniversary, I wanted to share something I've been building for a while.

On July 11th, I'll be collaborating with Packt Publishing to run a live, hands-on workshop called "Production Graph RAG: Build Explainable LLM Apps with Knowledge Graphs".

In 3.5 hours, you'll go from raw Wikipedia text to a working chatbot powered by a knowledge graph, using spaCy, REBEL, fastcoref, NetworkX, and an LLM. No hand-wavy slides. You actually build the thing.

Here's what you'll walk away with:

  • A working Graph RAG pipeline you can adapt to your own data
  • Understanding of why vector RAG fails on complex questions, and how to fix it
  • A certificate of completion
  • The full codebase and notebooks

Why does this matter? Traditional RAG finds similar text. It doesn't understand that A connects to B which caused C. Knowledge graphs give LLMs the relational context they've always been missing. This is the infrastructure that makes LLM apps actually trustworthy in production.
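To make the "A connects to B which caused C" point concrete, here is a minimal sketch, assuming a toy set of (subject, relation, object) triples. The entity names and the `build_graph` and `find_chain` helpers are made up for illustration and are not taken from the workshop materials. It uses only the Python standard library, where the workshop itself lists NetworkX.

```python
from collections import defaultdict, deque

# Hypothetical triples, shaped like the output of a relation extractor.
TRIPLES = [
    ("A", "connects_to", "B"),
    ("B", "caused", "C"),
    ("D", "mentions", "A"),
]


def build_graph(triples):
    """Index triples as an adjacency list: subject -> [(relation, object)]."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph


def find_chain(graph, start, goal):
    """Breadth-first search for the shortest chain of triples from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None  # the two entities are not connected in this direction


graph = build_graph(TRIPLES)
chain = find_chain(graph, "A", "C")
print(chain)
```

A similarity search over text chunks has no such chain to return. Here the two-hop path itself can be handed to the LLM as explicit context, which is what makes the answer traceable.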

I spent years using this kind of reasoning while working with my clients. Now I'm teaching it in one afternoon.

Use code BRUNO40 at checkout for 40% off.

👉 Production Graph RAG: Build Explainable LLM Apps with Knowledge Graphs

Two reports this week reach the same point through different doors. One lab traced its own pipeline and found that its engineers now ship eight times as much code per quarter as they did from 2021 to 2025, with models writing more than 80 percent of the code merged into its systems. The length of task a model can finish on its own doubles roughly every four months, up from every seven. The lab calls this recursive self-improvement and asks for the option to slow frontier work if systems start designing their own successors. A second team pushed the idea further and ran a five-month test where humans wrote zero lines of code. Its agents produced about a million lines across roughly 1,500 pull requests, and output rose as the group grew from three engineers to seven. The humans set goals and fixed the surrounding environment when the agents stalled. Their rule was plain. People steer, agents execute.
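For a sense of scale on the doubling times quoted above, a quick back-of-the-envelope sketch. It assumes a steady exponential trend, which is my simplification and not a claim from the report, and the `growth_per_year` helper is a name invented here.

```python
def growth_per_year(doubling_months: float) -> float:
    """Factor by which a quantity grows over 12 months, given its doubling time."""
    return 2 ** (12 / doubling_months)


old_rate = growth_per_year(7)  # doubling every seven months
new_rate = growth_per_year(4)  # doubling every four months

print(f"7-month doubling: x{old_rate:.2f} per year")
print(f"4-month doubling: x{new_rate:.2f} per year")
```

Under that assumption, moving from a seven-month to a four-month doubling time takes yearly growth in task length from a bit over 3x to 8x.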

That speed has set off a counter-current. Sixteen mathematicians from fifteen universities published a declaration to defend the values of their field, with backing from the International Mathematical Union. They ask researchers to disclose AI use, to take responsibility for whether a proof is correct, and to credit prior work. They warn that questions may get chosen for how easily a machine can crack them, not for what they mean. On the security side, one lab widened a program that turns its most capable model on critical code, adding about 150 organizations across more than 15 countries and reaching nearly 200 in total. The first 50 partners already surfaced more than 10,000 high or critical flaws, and the new members run power, water, healthcare, and communications. The cultural mood is hardening too. One essay tracks how the "Butlerian Jihad," a revolt against thinking machines from Dune, has slipped from fiction into real political speech, carrying paranoia and, in one case, violence. Put the pieces together and the week shows two races at once: machines learning to build machines, and people deciding what they will allow.

Down at the workbench, the practical question is how to watch agents that keep changing. One open-source platform traces every step an agent takes and flags when it rewrites its own prompts or drifts past set safety limits. It logs what changed, when, and what feedback drove the change, so a team can debug behavior that older tools miss.
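The core idea, logging what changed, when, and why, can be sketched in a few lines. This is a hypothetical illustration using only the standard library. `PromptTracker` and `PromptChange` are names invented here and are not the API of the platform linked below.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PromptChange:
    step: int
    old_hash: str
    new_hash: str
    feedback: str
    at: datetime


@dataclass
class PromptTracker:
    """Hypothetical tracker: records each time the agent's prompt text changes."""
    current_hash: str = ""
    changes: list = field(default_factory=list)

    def observe(self, step: int, prompt: str, feedback: str = "") -> bool:
        """Return True if this prompt differs from the previous one seen."""
        new_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]
        # The very first prompt is a baseline, not a change.
        changed = bool(self.current_hash) and new_hash != self.current_hash
        if changed:
            self.changes.append(
                PromptChange(step, self.current_hash, new_hash, feedback,
                             datetime.now(timezone.utc))
            )
        self.current_hash = new_hash
        return changed


tracker = PromptTracker()
tracker.observe(0, "You are a helpful assistant.")
tracker.observe(1, "You are a helpful assistant.")
flagged = tracker.observe(2, "You are a terse assistant.",
                          feedback="user asked for shorter answers")
```

Hashing the prompt keeps the log small while still making every rewrite visible. A real tool would also store the full text and the trace it belongs to.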

This week's batch of academic and industry papers goes inside the model and asks what it can shed. One study questions whether attention really needs separate query, key, and value projections. Tie the key and value into one and the memory cache drops by half, at a cost of about 3.1 percent in perplexity. Stack that with multi-query attention and the cache shrinks by as much as 96.9 percent. Tested on 300M and 1.2B models over 10 billion tokens, the trimmed designs matched the standard one and sometimes beat it. A companion read steps back from any single trick. It gathers more than 150 studies and system reports on the data used to teach models to reason, sorting the field by four plain questions: what kinds of reasoning data exist, what makes them useful, how people build them, and how they scale. The result is a map for a part of training that had grown scattered and hard to compare.
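The cache numbers follow from simple counting. The sketch below compares cache size against standard multi-head attention, where every head stores one key and one value vector per token. The per-head dimension cancels out, so only head counts matter. The head count of 16 is my assumption, picked because 1 - 1/32 = 0.96875 lines up with the quoted 96.9 percent. The study's actual configurations may differ.

```python
def kv_cache_fraction(n_heads: int, n_kv_heads: int, shared_kv: bool) -> float:
    """Cache size relative to standard multi-head attention (separate K and V per head)."""
    baseline = 2 * n_heads            # one K and one V vector per head
    tensors = 1 if shared_kv else 2   # tied K/V stores a single tensor per KV head
    return (tensors * n_kv_heads) / baseline


N_HEADS = 16  # assumption, not taken from the paper

tied_only = kv_cache_fraction(N_HEADS, N_HEADS, shared_kv=True)
tied_mqa = kv_cache_fraction(N_HEADS, 1, shared_kv=True)  # multi-query: one KV head

print("reduction with tied K/V:", 1 - tied_only)
print("reduction with tied K/V plus multi-query:", 1 - tied_mqa)
```

Tying K and V halves the cache regardless of head count. The larger saving comes from multi-query attention collapsing all KV heads into one.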

Two more papers ask how agents hold up over long, messy work. One introduces a benchmark of 36 expert-built tasks that each start from a working but slow baseline and hand the agent a fixed wall-clock budget to make it faster or better. Across 17 models, the strongest signal of success was not a clever first attempt. It was persistence, the habit of trying again and again. Many models quit early and left time on the table. The other paper takes the load off the model's memory. Its 20B search agent offloads bookkeeping to the surrounding harness, which holds the candidate documents, the curated evidence, and the records of what has been checked. The model keeps only the judgment calls: what to search, what to keep, what to verify, and when to stop. That split earned 0.730 average recall across eight benchmarks spanning web, finance, patents, and multi-hop questions, beating the next open agent by 11.4 points and staying close to far larger frontier searchers. Its edge grew on held-out tasks, by 17 points, a sign the learned habits carry past the training set.
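The division of labor in that second paper can be sketched as a small state object. This is a hypothetical illustration, not the paper's code: `SearchHarness`, its method names, and the document ids are all invented here. The harness remembers candidates, evidence, and what has been checked, while the model would only decide which method to call next. A plain `recall` function is included to show how the reported metric is defined.

```python
from dataclasses import dataclass, field


@dataclass
class SearchHarness:
    """Hypothetical harness state: the bookkeeping the model does not have to remember."""
    candidates: dict = field(default_factory=dict)  # doc_id -> text
    evidence: list = field(default_factory=list)    # doc_ids kept as evidence
    checked: set = field(default_factory=set)       # doc_ids already verified

    def add_results(self, results: dict) -> list:
        """Store new search hits and return only the ids not seen before."""
        new_ids = [doc_id for doc_id in results if doc_id not in self.candidates]
        self.candidates.update(results)
        return new_ids

    def keep(self, doc_id: str) -> None:
        """Promote a known candidate to curated evidence (no duplicates)."""
        if doc_id in self.candidates and doc_id not in self.evidence:
            self.evidence.append(doc_id)

    def mark_checked(self, doc_id: str) -> None:
        self.checked.add(doc_id)

    def unchecked_evidence(self) -> list:
        return [doc_id for doc_id in self.evidence if doc_id not in self.checked]


def recall(retrieved: list, relevant: set) -> float:
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


harness = SearchHarness()
harness.add_results({"d1": "first hit", "d2": "second hit"})
new_ids = harness.add_results({"d2": "second hit", "d3": "third hit"})
harness.keep("d1")
harness.keep("d3")
harness.mark_checked("d1")
score = recall(harness.evidence, {"d1", "d2", "d3", "d4"})
```

Because the harness answers "what is new" and "what is still unverified", the model's context only has to carry the judgment calls: what to search, what to keep, what to verify, and when to stop.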

The rest of the issue turns to people. A now-classic brain study scanned 15 mathematicians rating formulae as beautiful, plain, or ugly. The equations they called beautiful lit up the same emotional region, the medial orbitofrontal cortex, that responds to beauty in art and music, and Euler's identity drew the highest ratings. That felt sense of elegance is part of what a fresh warning says is at stake. A group of mathematicians argues that fast-moving AI threatens the values and culture of their field, and their declaration, backed by the International Mathematical Union, is open for signatures ahead of the next world congress in Philadelphia. The worry about machine-made knowledge has numbers behind it. A study of 17,790 matched article pairs found that AI-written Grokipedia entries run longer and cite fewer sources per word than Wikipedia. About two-thirds diverge in style and sourcing, read at a harder grade level, and lean rightward in the sources they cite on politics, history, and religion, favoring long narration over checkable references. One last paper keeps the focus on human groups instead of models. A network model pairs disease spread with three-way social ties and finds that group-level structure alone pushes vaccinated people into clusters that act as barriers. A light touch of peer reinforcement raises coverage and curbs outbreaks, but too much of it backfires and uptake falls.

Our current book recommendation is "Building Applications with AI Agents" by M. Albada. In this week's video, we have a Nobel Lecture by D. Hassabis on "Accelerating scientific discovery with AI".

Data shows that the best way for a newsletter to grow is by word of mouth, so if you think one of your friends or colleagues would enjoy this newsletter, go ahead and forward this email to them. This will help us spread the word!

Semper discentes,

The D4S Team


Michael Albada spent nine years building machine learning systems at Uber, ServiceNow, and Microsoft, and it shows. His O'Reilly book, Building Applications with AI Agents, treats agents as a design pattern, not magic. Thirteen chapters take you from a single working agent through skills, orchestration, memory, learning, and on to multi-agent systems. Later chapters cover measurement, production monitoring, and security.

The design-first stance is the real draw. Every idea sits inside a case study: customer support, legal work, advertising, and code review agents. Albada compares real frameworks by name, including LangGraph, AutoGen, CrewAI, and OpenAI's SDK, and weighs their trade-offs instead of crowning a winner. A data scientist gets clear patterns for picking tools, structuring memory, and validating output before it ships.

It has two weak spots. Some chapters lean on checklists and can leave you feeling that the core idea would fit in a third of the pages. It also skips runnable, end-to-end code, pointing you to outside docs instead. Still, for the data scientist or ML engineer moving into agent work, this book maps the decisions that matter and saves weeks of trial and error. Worth a spot on the shelf.


  1. ​The Butlerian Jihad Has Begun [syndekit.substack.com]
  2. ​Harness engineering: leveraging Codex in an agent-first world | OpenAI [openai.com]
  3. ​When AI builds itself [anthropic.com]
  4. ​Leiden Declaration on Artificial Intelligence and Mathematics [leidendeclaration.ai]
  5. ​Expanding Project Glasswing [www.anthropic.com]
  6. ​Data Quality, AI in Baseball, ODSC AI East Slides, Ollama, and Local Agents​ [odsc.substack.com]
  7. ​Agent Tracing and Observability: Log & Debug Complex AI Systems​ [comet.com]


Accelerating scientific discovery with AI

All the videos of the week are available in our YouTube playlist.

Upcoming Events:

Opportunities to learn from us
​

On-Demand Videos:

Long-form tutorials

Data For Science, Inc
