Dear Reader,
Welcome to the 292nd edition of the Data Science Briefing!
We're proud to announce that a brand-new Data Visualization with Python on-demand video is now available on the O'Reilly website: Python Data Visualization: Create impactful visuals, animations and dashboards. This in-depth tutorial is almost 7 hours long and covers fundamental and advanced usage of matplotlib, seaborn, plotly, and bokeh, as well as tips on how to use Jupyter widgets. Check it out!
The latest blog post in the Epidemiology series is also out: Demographic Processes. In this post we explore how to include birth and death rates in your epidemik models. Check it out!
This week’s picks show AI maturing from flashy demos to hard choices about governance, infrastructure, and pedagogy: funders are now piloting algorithms to triage grant proposals, promising speed but raising fresh worries about bias and opacity in how “promising science” gets defined. In the classroom, Google’s “Learn Your Way” turns static textbooks into adaptive, interactive study paths and reports improved learning outcomes in early tests by Google Research.
On the policy front, Meredith Whittaker warns that agentic AI could erode privacy and competition by normalizing cross-app data access and automation without robust safeguards. Meanwhile, the era of unchecked scraping looks to be ending as publishers and infrastructure providers push licensing regimes and emerging standards, tightening access to training data.
Inside academia, Nature flags a new tool spotting undisclosed LLM-generated text in manuscripts and peer reviews—evidence that disclosure norms still lag practice. And under the hood, engineers are tackling inference nondeterminism itself, tracing variance to concurrency and floating-point quirks and proposing ways to make outputs reproducible at scale.
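As a minimal illustration of the floating-point quirks mentioned above (our own toy sketch, not taken from the linked article): floating-point addition is not associative, so a concurrent reduction that combines partial sums in a different order can produce a different result for the same inputs.

```python
# Floating-point addition is not associative: near 1e16 the spacing
# between representable doubles is 2.0, so adding 1.0 can be lost
# to rounding depending on where it happens in the sum.
vals = [1e16, 1.0, -1e16, 1.0]

# Sequential left-to-right sum: 1e16 + 1.0 rounds back to 1e16,
# silently dropping one of the ones along the way.
sequential = ((vals[0] + vals[1]) + vals[2]) + vals[3]

# Pairwise sum, as a parallel tree reduction might compute it:
# both inner sums round away their 1.0 terms entirely.
pairwise = (vals[0] + vals[1]) + (vals[2] + vals[3])

print(sequential)  # 1.0
print(pairwise)    # 0.0  (the exact answer is 2.0)
```

Scaled up to the huge reductions inside matrix multiplies and attention, this is one reason the same prompt can yield different outputs run to run when thread or batch scheduling changes the summation order.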
This week’s research traces a provocative arc from “making models think” to managing the messy worlds they enter: reinforcement learning is being retooled to reward multi-step reasoning rather than short-horizon shortcuts, while a comprehensive survey maps the emerging toolbox for Large Reasoning Models to push reliability beyond prompt tricks.
On the foundations side, new bounds on optimal time estimation in stochastic processes suggest hard limits for any system attempting to maintain stable clocks, with implications for inference scheduling and reproducible evaluation. Biology presents both promise and peril, as genome language models can now co-design viable bacteriophages, foreshadowing programmable therapeutics and raising thorny biosecurity questions.
Socially, the line between utility and intimacy blurs as large-scale analyses of “AI companionship” communities reveal attachment patterns that product teams and policymakers can’t ignore. Meanwhile, adversaries exploit the same cognitive hooks: pig-butchering scams follow a predictable lifecycle ripe for automated detection, and “LLM hacking” exposes how seemingly benign annotation workflows can be subverted by prompt injection and data poisoning.
The overall takeaway: better reasoning isn’t enough; we need robust timing, safety, and governance to match the new capabilities.
Our current book recommendation is Mark Carrigan’s "Generative AI for Academics". You can find all the previous book reviews on our website. This week's video compares Machine Learning vs Human Learning and explores why they're not alike.
Data shows that the best way for a newsletter to grow is by word of mouth, so if you think one of your friends or colleagues would enjoy this newsletter, go ahead and forward this email to them. This will help us spread the word!
Semper discentes,
The D4S Team
Mark Carrigan’s "Generative AI for Academics" is a brisk, sensible map for using LLMs in scholarly life. It avoids both hype and doom, treating generative AI as a set of tools that demand judgment, not blind adoption. The tone is practical and reflective—ideal for faculty, PIs, and grad students who need shared language and guardrails.
The book shines in how it organizes academic work (Thinking, Collaborating, Communicating, Engaging), then pairs each with concrete practices (rubber-ducking, draft refinement, critical oversight). It isn’t a prompt cookbook or a windy manifesto; it’s a clear framework for responsible use, culture-setting, and policy discussions in departments and labs.
Data scientists and ML engineers will find valuable takeaways for literature synthesis, design reviews, code docs, and stakeholder comms. But if you want model internals, rigorous eval protocols, threat modeling, or MLOps patterns, the book skims the surface. Bottom line: keep it close for norms, ethics, and mentoring; pair it with technical playbooks when you need depth.
- AI enters the grant game, picking winners [science.org]
- Learn Your Way: Reimagining textbooks with generative AI [research.google]
- What Meta learned from Galactica, the doomed model launched two weeks before ChatGPT [venturebeat.com]
- AI agents are coming for your privacy, warns Meredith Whittaker [economist.com]
- AI-Scraping Free-for-All by OpenAI, Google, and Meta Is Over [nymag.com]
- AI tool detects LLM-generated text in research papers and peer reviews [nature.com]
- Defeating Nondeterminism in LLM Inference [thinkingmachines.ai]
- DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning (D. Guo, D. Yang, H. Zhang, J. Song, P. Wang, Q. Zhu, R. Xu, R. Zhang, S. Ma, X. Bi, X. Zhang, X. Yu, Y. Wu et al.)
- Optimal Time Estimation and the Clock Uncertainty Relation for Stochastic Processes (K. Prech, G. T. Landi, F. Meier, N. Nurgalieva, P. P. Potts, R. Silva, M. T. Mitchison)
- Generative design of novel bacteriophages with genome language models (S. H. King, C. L. Driscoll, D. B. Li, D. Guo, A. T. Merchant, G. Brixi, M. E. Wilkinson, B. L. Hie)
- A Survey of Reinforcement Learning for Large Reasoning Models (K. Zhang, Y. Zuo, B. He, Y. Sun, R. Liu, C. Jiang, Y. Fan, K. Tian, G. Jia, P. Li, Y. Fu, X. Lv, Y. Zhang, S. Zeng, S. Qu, H. Li, S. Wang, Y. Wang, X. Long, F. Liu, X. Xu, J. Ma, X. Zhu, E. Hua, Y. Liu, Z. Li, H. Chen)
- "My Boyfriend is AI": A Computational Analysis of Human-AI Companionship in Reddit's AI Community (P. Pataranutaporn, S. Karny, C. Archiwaranguprok, C. Albrecht, A. R. Liu, P. Maes)
- "Hello, is this Anna?": Unpacking the Lifecycle of Pig-Butchering Scams (R. Oak, Z. Shafiq)
- Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation (J. Baumann, P. Röttger, A. Urman, A. Wendsjö, F. M. Plaza-del-Arco, J. B. Gruber, D. Hovy)
Machine Learning vs Human Learning: They’re Not Alike
All the videos of the week are now available in our YouTube playlist.
Upcoming Events:
Opportunities to learn from us
On-Demand Videos:
Long-form tutorials
- Natural Language Processing 7h, covering basic and advanced techniques using NLTK and PyTorch.
- Python Data Visualization 7h, covering basic and advanced visualization with matplotlib, ipywidgets, seaborn, plotly, and bokeh.
- Time Series Analysis for Everyone 6h, covering data pre-processing, visualization, ARIMA, ARCH, and Deep Learning models.