Signal, not noise.
Releases, benchmarks, and analysis from a curated list of labs, newsletters, and community feeds. Updated every four hours.
New OpenAI Academy courses for the next era of work
OpenAI introduces three Academy courses that help people build practical AI skills, create repeatable workflows, and apply agents in everyday work.
Thousand Token Wood: shipping a multi-agent economy on a 3B model
CyberSecQwen-4B: Why Defensive Cyber Needs Small, Specialized, Locally-Runnable Models
OpenAI WebRTC Audio Session, now with document context
I built the first version of this tool in December 2024 to try out the then-new OpenAI WebRTC API for interacting with their realtime audio…
Ire identifies another LOTUSLITE specimen
Project Ire examined a timely malware sample and determined its intent through reverse engineering, identifying LOTUSLITE characteristics even as most major EDR tools did not detect it.
olmo-eval: An evaluation workbench for the model development loop
BBVA puts AI at the core of banking with OpenAI
Learn how BBVA scaled ChatGPT Enterprise to 100,000 employees and partnered with OpenAI to accelerate AI-powered banking transformation worldwide.
OpenAI to acquire Ona
OpenAI plans to acquire Ona to expand Codex with secure, persistent cloud environments, enabling long-running AI agents across enterprise workflows.
Access OpenAI models and Codex through your Oracle cloud commitment
Access OpenAI models and Codex through Oracle Cloud, using existing commitments to build and deploy AI with enterprise security and governance.
Amazing Digital Dentures (a failed project)
Five labs, five minds: building a multi-model finance drama on small models
MachinaCheck: Building a Multi-Agent CNC Manufacturability System on AMD MI300X
"OncoAgent: A Dual-Tier Multi-Agent Framework for Privacy-Preserving Oncology Clinical Decision Support"
Adaptive Ultrasound Imaging with Physics-Informed NV-Raw2Insights-US AI
How to Ground a Korean AI Agent in Real Demographics with Synthetic Personas
Rigel: Reverse-Engineering the Metal 4.1 Tensor Compute Path on the Apple M4 Max GPU
arXiv:2606.12765v1 Announce Type: new Abstract: Apple's Metal 4.1 exposes a tensor compute path: the Metal Performance Primitives (MPP) matmul2d operation over cooperative_tensor fragments, whose…
Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review
arXiv:2606.12716v1 Announce Type: new Abstract: The integration of Large Language Models (LLMs) and Multimodal LLMs (MLLMs) into scientific peer-review workflows introduces novel and significant…
Direct Preference Optimization for Chatbot Fine-Tuning: An Empirical Study
arXiv:2606.12881v1 Announce Type: new Abstract: We present an approach to fine-tuning large language models using Direct Preference Optimization (DPO), a reinforcement learning technique. Our…
Shopping Reasoning Bench: An Expert-Authored Benchmark for Multi-Turn Conversational Shopping Assistants
arXiv:2606.12608v1 Announce Type: new Abstract: Conversational shopping assistants now serve hundreds of millions of customers, yet no existing benchmark jointly evaluates the open-ended multi-turn…
Helping Figures Tell their Story! Paper-Grounded Video Generation Explaining Complex Scientific Figures
arXiv:2606.12576v1 Announce Type: new Abstract: Scientific figures compress complex pipelines into a single canvas, yet understanding them requires paper-grounded, step-by-step narration aligned with…
MIRAGE: A Polarity-Flipping Encoding Subspace in LLM Agents
arXiv:2606.10304v1 Announce Type: new Abstract: When LLM agents are coerced into covertly encoding sensitive data (Base64, ROT13, acrostic, synonym chains, and beyond), the resulting outputs evade…
The Order Matters: Sequential Fine-Tuning of LLaMA for Coherent Automated Essay Scoring
arXiv:2606.10327v1 Announce Type: new Abstract: Automated Essay Scoring (AES) systems must judge interdependent discourse elements (e.g., lead, claim, evidence, conclusion), yet most approaches treat…
Does Topic Sentiment Cause Perceived Ideology? Comparing Human and LLM Annotations in Political News Articles
arXiv:2606.06715v1 Announce Type: new Abstract: We ask whether topic sentiment has a causal effect on perceived political ideology, and whether the answer depends on who assigns the ideology label.…
The Piggyback Hypothesis of Generalization: Explaining and Mitigating Emergent Misalignment
arXiv:2606.06667v1 Announce Type: new Abstract: The mechanisms behind LLMs' broad over-generalization beyond training examples remain unclear. Emergent misalignment (EM) offers a striking case study:…
Korean Culture into LLM Alignment: Toward Cultural Coherence
arXiv:2606.06797v1 Announce Type: new Abstract: Cultural-aspect work on large language models is dominated by a negative target: which outputs to suppress. We argue that a constructive counterpart is…
What Do People Actually Want From AI? Mapping Preference Plurality
arXiv:2606.06674v1 Announce Type: new Abstract: Large Language Models (LLMs) are often fine-tuned through Reinforcement Learning from Human Feedback (RLHF) to align with people's preferences and…
Epidemiology of Model Collapse: Modeling Synthetic Data Contamination via Bilayer SIR Dynamics
arXiv:2606.05168v1 Announce Type: new Abstract: Training on synthetic data causes model collapse, but existing analyses treat this as single-chain degradation. In reality, the AI ecosystem involves…
LANTERN: Layered Archival and Temporal Episodic Retrieval Network for Long-Context LLM Conversations
arXiv:2606.05182v1 Announce Type: new Abstract: Large language models discard critical details when conversation history is compacted to fit within finite context windows. We present LANTERN (Layered…
Topics as Proxies for Sociodemographics: How Conversational Context Affects LLM Answers
arXiv:2606.02776v1 Announce Type: new Abstract: When large language models (LLMs) are used in high-stakes scenarios, such as legal, medical and financial advice, even a single conversation history is…
Cognitive-Linguistic Indicators of Depression in Online Communities: Analysed by DistilBERT and Holographic Reduced Representation
arXiv:2606.00026v1 Announce Type: new Abstract: This paper investigates whether combining cognitively grounded linguistic features with transformer-based embeddings improves automated detection of…
Knowledge Graph-Enhanced Zero-Shot Topic Classification: A Multi-Strategy Comparative Study
arXiv:2605.30465v1 Announce Type: new Abstract: Multi-label topic classification without labeled training data is a challenging task, especially when documents contain complex relational information.…
Generalistic or Specific Embeddings, Which is Better? An Empirical Study on Search for Clinical Coding in Non-English Languages
arXiv:2605.30529v1 Announce Type: new Abstract: Sentence-embedding models for semantic search are overwhelmingly developed and evaluated on English corpora. When applied to clinical retrieval in…
A comparative study of transformer-based embeddings for topic coherence
arXiv:2605.28832v1 Announce Type: new Abstract: Topic modeling is a branch of Natural Language Processing (NLP) that aims to organize large collections of texts into coherent groups according to word…
Assessing Dutch Syllabification Algorithms and Improving Accuracy by Combining Phonetic and Orthographic Information through Deep Learning
arXiv:2605.28834v1 Announce Type: new Abstract: Syllabification describes the task of dividing words into syllables. Due to many rules and exceptions, training an algorithm to perform syllabification…
SERC: LDPC-Inspired Semantic Error Correction for Retrieval-Augmented Generation
arXiv:2605.28837v1 Announce Type: new Abstract: While Large Language Models (LLMs) have demonstrated remarkable capabilities, their reliability is significantly compromised by hallucinations.…
Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions
arXiv:2605.28833v1 Announce Type: new Abstract: Automatic speech recognition (ASR) has the potential to substantially reduce manual annotation effort in child speech research by generating automatic…
How Consistent Are LLM Agents? Measuring Behavioral Reproducibility in Multi-Step Tool-Calling Pipelines
arXiv:2605.28840v1 Announce Type: new Abstract: Large language model (LLM) agents with tool-calling capabilities are increasingly deployed in production systems, yet a fundamental reliability…
Self-Verified Distillation: Your Language Model Is Secretly Its Own Synthetic Data Pipeline
arXiv:2605.26132v1 Announce Type: new Abstract: Can post-trained large language models (LLMs) further improve themselves using only unlabeled prompts, without external teachers or feedback from…
Memory Architectures for Multi-Turn Text-to-SQL: A Benchmark and Empirical Study
arXiv:2605.26394v1 Announce Type: new Abstract: Multi-turn Text-to-SQL is central to enterprise analytics yet remains predominantly evaluated in single-turn settings. We introduce…
Slide Deck Q&A Quality Assurance App: A Multi-Stage Pipeline for Pedagogical Question Generation
arXiv:2605.26428v1 Announce Type: new Abstract: Generating high-quality, pedagogically useful questions from lecture slide decks is difficult because important instructional content is distributed…