This is the intro to the «Essential Mathematics» series — an overview of which math areas are needed for AI and machine learning, and what they are used for.
Linear Algebra
What you need: vectors, matrices, matrix multiplication, linear transformations, dot product, vector norm, eigenvalues and eigenvectors (advanced).
Why:
- Vectors — the basic way to represent data: a word, an image, or a model’s «thought» is encoded as a vector of numbers
- Matrices — all neural network operations (convolutions, attention, linear layers) boil down to matrix multiplication
- Linear transformations — understanding how a layer «mixes» and combines features
- Without linear algebra you cannot read formulas in papers or understand what the code does (e.g. W @ x is the weight matrix times the input vector)¹
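A minimal NumPy sketch of that last point (the shapes here are made up purely for illustration): a linear layer is just a matrix-vector product plus a bias.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=3)        # input vector: 3 features
W = rng.normal(size=(2, 3))   # weight matrix: maps 3 features to 2 outputs
b = np.zeros(2)               # bias vector

y = W @ x + b                 # each output is the dot product of a row of W with x
print(y.shape)                # (2,)
```

Every linear layer in a network is this same operation, only with larger shapes and a batch dimension.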
Derivatives
What you need: derivatives, partial derivatives, gradient, chain rule, function optimization.
Why:
- Neural networks are trained via gradient descent: we move weights in the direction that reduces the error
- Gradient — a vector of partial derivatives showing which way to «tweak» each parameter
- Chain rule is the backbone of backpropagation — the algorithm that computes gradients for all layers
- Without calculus you cannot understand where weight updates come from or why training works at all
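To make these bullets concrete, here is a toy gradient-descent loop for a single weight, with the gradient written out by hand via the chain rule (the numbers are invented for illustration):

```python
x, y = 2.0, 6.0      # one made-up training example
w = 0.0              # initial weight
lr = 0.1             # learning rate

for step in range(50):
    pred = w * x                   # forward pass
    loss = (pred - y) ** 2         # squared error
    # chain rule: dloss/dw = dloss/dpred * dpred/dw = 2 * (pred - y) * x
    grad = 2 * (pred - y) * x
    w -= lr * grad                 # move the weight against the gradient

print(round(w, 3))   # approaches 3.0, since 3.0 * 2.0 == 6.0
```

Backpropagation is the same idea applied layer by layer: the chain rule strings local derivatives together so that every parameter gets its own gradient.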
Probability and Statistics
What you need: distributions (normal, Bernoulli, etc.), expectation, variance, conditional probability, Bayes’ theorem, maximum likelihood estimation (MLE).
Why:
- Models are often formulated probabilistically: «probability of the next word», «how confident is the model»
- Loss functions often derive from likelihood (cross-entropy, MSE, etc.)
- Statistics for working with data: normalization, quality assessment, understanding metrics (precision, recall, AUC)
- Bayes — foundation of many classical algorithms and the way to «update beliefs» on new data
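Two of these points fit in a few lines of Python (all the numbers are invented): cross-entropy is just the negative log-likelihood of the true class, and Bayes' theorem turns a prior and new evidence into a posterior.

```python
import math

# Cross-entropy as negative log-likelihood: predicted probabilities over 3 classes,
# with the true class at index 1
probs = [0.1, 0.7, 0.2]
loss = -math.log(probs[1])
print(round(loss, 3))            # 0.357: low loss when the model is confident and correct

# Bayes' theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_disease = 0.01                 # prior
p_pos_given_disease = 0.95       # test sensitivity
p_pos_given_healthy = 0.05       # false-positive rate
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
posterior = p_pos_given_disease * p_disease / p_pos
print(round(posterior, 3))       # 0.161: a positive test is far from a sure thing
```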
Computer Vision (CV)
On top of the general minimum: discrete convolution (2D), affine and projective transformations, basic image geometry (camera calibration, epipolar geometry).
Why:
- Convolution — the core of CNNs: understanding how a filter «slides» over an image, and what kernel, padding, stride mean (see the sketch after this list)
- Affine transformations (rotation, scale, shift) — for augmentations and understanding invariance
- Projective geometry and homographies — for stereo vision, 3D reconstruction, AR
- For classical methods (SIFT, optical flow) — image derivatives and per-pixel gradients are useful
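A naive version of the 2D convolution from the first bullet, with kernel, padding and stride made explicit (the tiny edge-detecting kernel is just an example):

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Single-channel 2D convolution (cross-correlation, as deep learning frameworks compute it)."""
    if padding:
        image = np.pad(image, padding)          # zero-pad all four sides
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1    # output height
    ow = (image.shape[1] - kw) // stride + 1    # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)  # dot product of patch and kernel
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, -1.0]])                # crude horizontal-gradient filter
print(conv2d(image, kernel, padding=1).shape)   # (7, 6)
```

A convolutional layer is this loop vectorized and repeated over many kernels and channels, which is why it still reduces to matrix multiplication under the hood.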
Optional but helpful
- Information theory (entropy, cross-entropy) — deeper understanding of loss functions and data compression (see the sketch after this list)
- Convex optimization — when the problem is «nice» and when SGD converges
- Differential geometry — for advanced topics (diffusion models, representations)
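For the information-theory item, a tiny sketch with made-up distributions (natural logarithms, so the units are nats): cross-entropy is never smaller than entropy, and the gap is the KL divergence.

```python
import math

p = [0.5, 0.25, 0.25]   # "true" distribution
q = [0.4, 0.4, 0.2]     # model's distribution

entropy = -sum(pi * math.log(pi) for pi in p)
cross_entropy = -sum(pi * math.log(qi) for pi, qi in zip(p, q))
kl = cross_entropy - entropy     # the extra nats paid for using q instead of p

print(round(entropy, 3), round(cross_entropy, 3), round(kl, 3))   # 1.04 1.09 0.05
```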
Practical minimum
To comfortably read papers and code:
- Be able to multiply matrices and understand what a linear layer does
- Understand what a gradient is and why it matters for training
- Know basic distributions and quality metrics
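For the metrics point, a toy example with invented labels: precision and recall are just ratios of counts.

```python
y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # made-up ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # made-up model predictions

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)   # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)   # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)   # false negatives

precision = tp / (tp + fp)   # of everything predicted positive, how much was right
recall = tp / (tp + fn)      # of all actual positives, how much was found

print(precision, recall)     # 0.75 0.75
```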
You can pick up the rest as needed — when it appears in a specific problem or paper.
¹ W @ x: in Python (NumPy, PyTorch), @ is the matrix multiplication operator; the weight matrix W is multiplied by the input vector x, and the result is the output vector. A linear layer essentially computes W·x + b.