Blog | Bilel S.

360° Attention

May 28, 2026 · 2 min read

PhD Student in Machine Learning

Everyone needs attention, even an LLM

The attention mechanism allows an LLM to identify what is relevant within a sequence, which largely explains its impressive performance. However, while it revolutionized the field a few years ago, it is also the primary driver of significant computational and memory costs. Google Research recently unveiled TurboQuant, a method designed to compress the attention cache.

The Attention Cache?

During the generation phase (producing tokens one by one), recomputing the key and value projections of the entire context at each step would be computationally prohibitive. To avoid this, the model stores the projections of previous tokens in what is known as the KV Cache (Key-Value Cache). While this mechanism cuts redundant computation, it shifts the burden to memory: the size of these key and value matrices grows linearly with both the sequence length and the number of simultaneous requests (batch size). This growth is the primary bottleneck that saturates GPU VRAM.
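To get a feel for the numbers, here is a rough back-of-the-envelope sketch of the KV cache footprint. The model shape below (32 layers, 32 KV heads, head dimension 128, 16-bit values) is a hypothetical configuration picked only for illustration, and `kv_cache_bytes` is a helper name made up for this post, not a library function.

```python
def kv_cache_bytes(batch_size, seq_len, num_layers, num_kv_heads, head_dim,
                   bytes_per_value=2):
    """Approximate KV cache size in bytes (ignores allocator overhead)."""
    # Factor 2: every layer stores one tensor for keys and one for values.
    return (2 * batch_size * seq_len * num_layers
            * num_kv_heads * head_dim * bytes_per_value)


# Hypothetical model shape, assumed for illustration only.
config = dict(num_layers=32, num_kv_heads=32, head_dim=128)

one_request = kv_cache_bytes(batch_size=1, seq_len=4096, **config)
longer = kv_cache_bytes(batch_size=1, seq_len=8192, **config)   # 2x the context
batched = kv_cache_bytes(batch_size=8, seq_len=4096, **config)  # 8x the requests

for label, size in [("1 x 4096", one_request),
                    ("1 x 8192", longer),
                    ("8 x 4096", batched)]:
    print(f"{label}: {size / 2**30:.1f} GiB")
```

Doubling the context doubles the cache, and so does doubling the batch, which is why serving many long conversations at once fills the GPU so quickly.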

Cache is Money

The idea behind TurboQuant is simple: instead of describing each cached vector with Cartesian coordinates, we use polar coordinates. In other words, instead of saying “Go 3 blocks east and 4 blocks north,” we say “Go 5 blocks at a 37° angle from north” (PolarQuant).
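The street analogy can be checked in a few lines. This is only the textbook 2D Cartesian-to-polar conversion, not PolarQuant itself; the function names are made up for this post. Note that the 37° figure corresponds to an angle measured from north, while `atan2` measures from the east axis (about 53°).

```python
import math


def to_polar(x, y):
    """Return (radius, angle in degrees measured counterclockwise from east)."""
    radius = math.hypot(x, y)
    angle_from_east = math.degrees(math.atan2(y, x))
    return radius, angle_from_east


def to_cartesian(radius, angle_from_east):
    """Inverse of to_polar."""
    theta = math.radians(angle_from_east)
    return radius * math.cos(theta), radius * math.sin(theta)


# "3 blocks east and 4 blocks north"
radius, angle_from_east = to_polar(3.0, 4.0)
angle_from_north = 90.0 - angle_from_east

print(f"{radius:.0f} blocks, {angle_from_east:.1f} deg from east, "
      f"{angle_from_north:.1f} deg from north")

# Going back recovers the original coordinates (up to float rounding).
x, y = to_cartesian(radius, angle_from_east)
```

Both descriptions carry exactly the same information; the polar form simply separates "how far" from "in which direction".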

This reformulation allows the essence of the information (the vector's direction) to be captured using very few bits. A correction mechanism, QJL (Quantized Johnson-Lindenstrauss), then compensates for the remaining quantization error. The reported result is a substantial reduction in memory usage and processing time, without any meaningful loss in model quality.
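To see why a direction can survive on very few bits, here is a toy 2D sketch that stores the angle as a small integer index on a uniform grid. It is an assumption-laden illustration, not the TurboQuant or PolarQuant algorithm (which works on high-dimensional vectors and adds the QJL correction); all function names are hypothetical, and the radius is kept in full precision for simplicity.

```python
import math


def quantize_angle(angle_rad, bits):
    """Map an angle to the nearest of 2**bits evenly spaced directions."""
    levels = 2 ** bits
    step = 2 * math.pi / levels
    # atan2 returns values in (-pi, pi]; wrap into [0, 2*pi) before rounding.
    return round((angle_rad % (2 * math.pi)) / step) % levels


def dequantize_angle(index, bits):
    """Recover the grid angle that an index stands for."""
    return index * (2 * math.pi / 2 ** bits)


def compress_2d(x, y, bits):
    """Keep the radius as-is and the direction as a small integer."""
    return math.hypot(x, y), quantize_angle(math.atan2(y, x), bits)


def decompress_2d(radius, index, bits):
    angle = dequantize_angle(index, bits)
    return radius * math.cos(angle), radius * math.sin(angle)


def reconstruction_error(x, y, bits):
    radius, index = compress_2d(x, y, bits)
    x_hat, y_hat = decompress_2d(radius, index, bits)
    return math.hypot(x - x_hat, y - y_hat)


for bits in (2, 4, 6, 8):
    print(bits, "bits ->", round(reconstruction_error(3.0, 4.0, bits), 4))
```

With a uniform grid the angular error is at most half a step, so the reconstruction error is bounded by `radius * pi / 2**bits` and halves with every extra bit.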

📚 Sources

TurboQuant: Redefining AI efficiency with extreme compression. Amir Zandieh (Research Scientist) and Vahab Mirrokni (VP and Google Fellow), Google Research. March 24, 2026.

Is Stack Overflow Dead? LLMs too?

November 12, 2025 · 5 min read

Bilel Saghrouchni

PhD Student in Machine Learning

It's no secret that LLMs need an enormous amount of text data to be trained effectively. Due to its immense volume, the web quickly became the primary data source, and the majority of training datasets are now based on it. Beyond the sheer amount of data, the diversity of sources is also a key factor in ensuring good coverage of different writing styles, topics, and contexts. Among these sources, developer forums like Stack Overflow play a crucial role. But in recent years, activity on Stack Overflow has declined significantly, raising questions about its future and its potential impact on LLM training.

A Security Agent for Your Network? Reinforcement Learning for Detecting Anomalies in Network Traffic

July 21, 2025 · 5 min read

Bilel Saghrouchni

PhD Student in Machine Learning

Our daily lives are becoming increasingly digital, and computer networks continue to grow in both size and complexity. The traffic they carry is becoming denser and more diverse: banking transactions, health data, private communications, etc.
With the rise of so-called intelligent technologies — smart homes, smart cities, personalized medicine — the number of connected devices is expected to exceed 50 billion by the end of 2025. These devices, though ubiquitous, are often poorly secured: neglected updates, weak passwords, outdated protocols.
In this context, computer networks are prime targets for cybercriminals, who benefit from an ever-expanding attack surface. Designing effective and robust cybersecurity solutions has thus become a critical challenge — a fast-moving research field, but also an increasingly complex one.

Agents — A Real Innovation?

May 15, 2025 · 3 min read

Bilel Saghrouchni

PhD Student in Machine Learning

We’ve been hearing a lot about agents lately — articles and posts are overflowing with the term “Agentic AI” (and now, this one too 👀). But is it really something new? Back in the 1990s, people were already talking about agents and describing them as entities that operate continuously and autonomously within a dynamic and evolving environment. Others took a more philosophical approach, defining agents as entities whose internal state is represented by mental concepts such as beliefs, abilities, choices, and commitments. Their actions are constrained and governed by fixed rules.

Model Collapse

February 15, 2025 · 2 min read

Bilel Saghrouchni

PhD Student in Machine Learning

The Silent Invasion of AI Content

A growing share of content on the Internet is AI-generated. The gradual integration of LLMs (Large Language Models) into our daily lives is democratizing and simplifying their use. Social media platforms have become true gold mines for this type of content: more and more users are relying on ChatGPT or Mistral’s Le Chat to write or rewrite their posts before publishing them (maybe this post went through that too? 👀). A striking example is Quora, where AI-generated content rose from 2% in 2022 to ~38% in 2024!
