Distillation Attack on LLMs (Model Extraction via Knowledge Distillation)

What is a Distillation Attack? In standard machine learning, knowledge distillation is a training method: a large “teacher” model teaches a smaller “student” model to reproduce its outputs, making the student faster and cheaper to run (Hinton et al., 2015). In a distillation attack, the attacker turns this technique into a tool for stealing a model. The black-box API becomes the teacher: the attacker sends queries to the API, records the outputs, and uses these query–output pairs to train a student model that mimics the teacher’s behavior. This is also known as model extraction or model stealing (Tramèr et al., arXiv:1609.02943). ...
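The query-then-train loop described above can be sketched with toy models. This is a minimal illustration, not a real attack: the “teacher” here is a tiny linear softmax classifier standing in for a black-box API, and the attacker only ever sees its output probabilities, never its weights.

```python
import numpy as np

rng = np.random.default_rng(0)
W_teacher = rng.normal(size=(4, 3))  # hidden teacher weights (unknown to the attacker)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def teacher_api(x):
    """Black-box oracle: returns only probability vectors, never parameters."""
    return softmax(x @ W_teacher)

# 1. Attacker crafts queries and records the teacher's soft outputs.
queries = rng.normal(size=(500, 4))
soft_labels = teacher_api(queries)

# 2. Attacker trains a student to match those soft labels
#    (cross-entropy against the teacher's distribution, plain gradient descent).
W_student = np.zeros((4, 3))
for _ in range(300):
    p = softmax(queries @ W_student)
    grad = queries.T @ (p - soft_labels) / len(queries)
    W_student -= 0.5 * grad

# 3. The student now approximates the teacher's behavior on unseen inputs.
test = rng.normal(size=(100, 4))
agreement = np.mean(
    teacher_api(test).argmax(1) == softmax(test @ W_student).argmax(1)
)
```

The soft labels are what make distillation efficient as an attack: each query leaks a full probability vector rather than a single hard label, so far fewer queries are needed to copy the decision surface.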

03/01/2026 · 9 min · Digenaldo Neto

LLM Security: A Scientific Taxonomy of Attack Vectors

Introduction Security in Large Language Models (LLMs) is no longer a niche topic within NLP (Natural Language Processing); it has become a field of computer security in its own right. Between 2021 and 2025, research moved from adversarial examples in classifiers to broader risks: alignment failures, memorization, context contamination, and persistently harmful model behavior. The problem today is not just a bug in the code; it is rooted in how the system is built: the architecture, the training data, the alignment methods, and how the model is connected to other systems. ...

02/13/2026 · 13 min · Digenaldo Neto

How a Large Language Model (LLM) Works

1. Fundamental Architecture: The Transformer The foundation of modern LLMs (Large Language Models) is the Transformer architecture [1]. Unlike RNNs (Recurrent Neural Networks) and LSTMs (Long Short-Term Memory networks), which process text sequentially, the Transformer processes the entire sequence in parallel, enabling better modeling of long-range dependencies and faster training [1]. Figure: Flow of the Transformer architecture (attention, encoder/decoder, feed-forward). Source: Vaswani et al. [1]. ...
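The parallel processing mentioned above comes from scaled dot-product attention, the core Transformer operation [1]. A minimal sketch, with arbitrary shapes and random values chosen purely for illustration:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, per Vaswani et al. [1]."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ V                               # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))              # a toy token sequence

# In a real Transformer, Q, K, V are learned linear projections of X.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
# Every output row mixes information from all positions in one matrix
# multiplication -- no step-by-step recurrence as in RNNs/LSTMs.
```

Because the whole sequence is handled in a few matrix products, every token can attend to every other token directly, which is what gives the Transformer its long-range modeling ability and training speed.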

02/12/2026 · 14 min · Digenaldo Neto