I am reading the Wikipedia article about SVM and there is something I don't understand. When they say:
These hyperplanes can be described by the equations
$$ wx - b=1 $$ and
$$wx - b=-1$$
I was wondering where the +1 and -1 come from.
I found two papers which explain that this is an arbitrary choice:
We can write the following equations for the support hyperplanes:
$$w^T x = b + \delta$$ $$w^T x = b - \delta$$ We now note that we have over-parameterized the problem: if we scale $w$, $b$ and $\delta$ by a constant factor $\alpha$, the equations for $x$ are still satisfied.
To remove this ambiguity we will require that $\delta = 1$, this sets the scale of the problem, i.e. if we measure distance in millimeters or meters.
but I don't understand what he means when he says "this sets the scale of the problem"
and
Note that if the equation $f(x) = wx + b$ defines a discriminant function
(so that the output is $\operatorname{sgn}(f(x))$),
then the hyperplane $cwx + cb$ defines the same discriminant function for any $c > 0$.
Thus we have the freedom to choose the scaling of $w$ so that $\min_{x_i} |wx_i + b| = 1$.
but I don't understand why he introduces $\min_{x_i} |wx_i + b| = 1$.
My understanding is that we can do this because $w$, $b$ and $\delta$ are linked together: scaling all three by the same factor leaves the two hyperplanes unchanged.
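To check that understanding, I tried a small numerical sketch. The values of `w`, `b` and `delta` below are made up for illustration, and `margin_width` and `on_plane` are helper names of my own. It assumes the distance between the hyperplanes $wx - b = \delta$ and $wx - b = -\delta$ is $2\delta / \|w\|$.

```python
import numpy as np

# Hypothetical 2-D example: these numbers are made up for illustration.
w = np.array([3.0, 4.0])  # normal vector, ||w|| = 5
b = 2.0
delta = 1.0


def margin_width(w, delta):
    """Distance between the hyperplanes w.x - b = +delta and w.x - b = -delta."""
    return 2.0 * delta / np.linalg.norm(w)


def on_plane(w, b, delta, x):
    """True if x satisfies w.x - b = delta (up to floating-point rounding)."""
    return bool(np.isclose(w @ x - b, delta))


# One point on the upper support hyperplane w.x - b = delta:
# choose x = t * w with t = (b + delta) / ||w||^2, so that w.x = b + delta.
x = (b + delta) / (w @ w) * w

# Scale w, b and delta by the same factor alpha: x stays on the plane
# and the geometric width of the margin does not change.
for alpha in (0.5, 1.0, 2.0, 10.0):
    ws, bs, ds = alpha * w, alpha * b, alpha * delta
    print(alpha, on_plane(ws, bs, ds, x), margin_width(ws, ds))

# Changing delta alone (same w and b) does move the hyperplane.
print(on_plane(w, b, 2.0 * delta, x))
```

In this sketch, every value of `alpha` keeps `x` on the plane and gives the same margin width of about 0.4. Changing `delta` alone while keeping `w` and `b` fixed does not.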
Can we change the Wikipedia definition and say
These hyperplanes can be described by the equations $$ wx - b= 2 $$ and $$wx - b=- 2$$
or is this incorrect, so that we must instead say:
These hyperplanes can be described by the equations $$ 2wx - 2b= 2 $$ and $$2wx - 2b=- 2$$
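Here is my own attempt at comparing the two candidates. The symbols $w'$ and $b'$ are my notation, not the article's, and I am assuming the distance between $wx - b = \delta$ and $wx - b = -\delta$ is $2\delta / \|w\|$:

$$ 2wx - 2b = \pm 2 \iff wx - b = \pm 1 $$

$$ wx - b = \pm 2 \iff w'x - b' = \pm 1, \quad \text{with } w' = \tfrac{1}{2} w,\ b' = \tfrac{1}{2} b $$

If this is right, the second form is exactly the Wikipedia pair for the same $w$ and $b$. The first form is the Wikipedia pair for the rescaled $(w', b')$, with margin $2 \cdot 2 / \|w\| = 2 / \|w'\|$.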
Could you clarify this for me?