Setup:
When I was introduced to the idea of a support vector machine (SVM), we started with a binary linear classification problem written as: $$\hat{w} := \text{argmax}_{w \in \mathbb{R}^d}\{\frac{\text{margin}(w)}{||w||}\}$$
where $\text{margin}(w) := \min_{i = 1, \dots, n}\{y_i \cdot w^T x_i\}$ for the class labels $y_i \in \{-1, 1\}$.
This makes sense, as we intuitively want to maximize the minimum distance between the decision boundary and any point in our training set $D := \{(x_i, y_i)\}_{i = 1}^n$.
Naturally, scaling the vector $w$ by some scalar $\alpha > 0$ does not change the fraction, i.e.: $$\forall \alpha > 0.\ \frac{\text{margin}(\alpha w)}{||\alpha w||} = \frac{\text{margin}(w)}{||w||}$$
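Spelled out, the computation behind this (assuming $\alpha > 0$, so the minimum scales linearly) is: $$\frac{\text{margin}(\alpha w)}{||\alpha w||} = \frac{\min_{i = 1, \dots, n}\{y_i \cdot \alpha w^T x_i\}}{\alpha ||w||} = \frac{\alpha \cdot \min_{i = 1, \dots, n}\{y_i \cdot w^T x_i\}}{\alpha ||w||} = \frac{\text{margin}(w)}{||w||}$$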
Armed with this knowledge, we set $\alpha := \frac{1}{\text{margin}(w)}$ (this requires $\text{margin}(w) > 0$, which holds for the maximizer when the data are linearly separable), which in turn simplifies the fraction to $\frac{1}{||\alpha w||}$.
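Explicitly, with this choice of $\alpha$: $$\frac{\text{margin}(\alpha w)}{||\alpha w||} = \frac{\alpha \cdot \text{margin}(w)}{||\alpha w||} = \frac{\frac{1}{\text{margin}(w)} \cdot \text{margin}(w)}{||\alpha w||} = \frac{1}{||\alpha w||},$$ so maximizing the original ratio over $w$ amounts to minimizing $||\alpha w||$.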
Now, my confusion starts:
After setting $\tilde{w} := \alpha \cdot w$, we define the hard-margin SVM as the problem:
$$ \min_{\tilde{w}} ||\tilde{w}|| \quad \text{s.t.} \quad y_i \cdot \tilde{w}^T x_i \geq 1 \text{ for } i = 1, \dots, n $$
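(As a numerical sanity check, not something from the lecture: the sketch below solves this constrained problem on a small hand-made, linearly separable toy dataset with `cvxpy`; the dataset, the absence of a bias term, and the solver choice are all my own assumptions.)

```python
# Sanity-check sketch (my own toy data and solver choice, not from the lecture):
# solve  min ||w_tilde||  s.t.  y_i * w_tilde^T x_i >= 1  and inspect the margins.
import numpy as np
import cvxpy as cp

# Linearly separable toy data with labels in {-1, +1}; no bias term, so the
# decision boundary passes through the origin, matching the formulation above.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w_tilde = cp.Variable(2)
constraints = [cp.multiply(y, X @ w_tilde) >= 1]   # y_i * w_tilde^T x_i >= 1
problem = cp.Problem(cp.Minimize(cp.norm(w_tilde, 2)), constraints)
problem.solve()

w_hat = w_tilde.value
per_point = y * (X @ w_hat)                        # values y_i * w_hat^T x_i
print("w_hat             =", w_hat)
print("y_i * w_hat^T x_i =", per_point)
print("margin(w_hat)     =", per_point.min())      # comes out as 1 (up to tolerance)
print("geometric margin  =", per_point.min() / np.linalg.norm(w_hat))
```

On this toy data the smallest value $\min_i\, y_i \cdot \hat{w}^T x_i$ comes out as $1$ (up to solver tolerance), consistent with the $\text{margin}(\tilde{w}) = 1$ observation below.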
Problem:
I don't get the constraint of the problem, i.e. $ y_i \cdot \tilde{w}^T x_i \geq 1 \text{ for } i = 1, \dots, n$. I understand that the condition comes from the fact that the margin is the minimum distance and that, by our choice of $\alpha$, we get $\text{margin}(\tilde{w}) = \text{margin}\left(\frac{w}{\text{margin}(w)}\right) = 1$; nonetheless, if I were asked to explain the condition in words, I would have no idea what to say.
Question: What does the constraint of the problem formulation achieve? Why do we need it and what is the intuition behind it?
N.B.: I have checked online, though the resources I found use a different way of motivating/deriving SVMs (they take two parallel hyperplanes and use the one halfway between them as the decision boundary), so they don't answer my question.