$\begingroup$

I am reading the Wikipedia article about SVM and there is something I don't understand. When they say:

These hyperplanes can be described by the equations

$$ wx - b=1 $$ and

$$wx - b=-1$$

Where do the +1 and -1 come from?

I found two papers which explain that this is an arbitrary choice:

We can write the following equations for the support hyperplanes:

$$w^T x = b + \delta$$

$$w^T x = b - \delta$$

We now note that we have over-parameterized the problem: if we scale $w$, $b$ and $\delta$ by a constant factor $\alpha$, the equations for $x$ are still satisfied.

To remove this ambiguity we will require that $\delta = 1$, this sets the scale of the problem, i.e. if we measure distance in millimeters or meters.

Source

but I don't understand what he means when he says "this sets the scale of the problem"

and

Note that if the equation $f(x) = wx + b$ defines a discriminant function (so that the output is $\operatorname{sgn}(f(x))$), then the hyperplane $cwx + cb$ defines the same discriminant function for any $c > 0$. Thus we have the freedom to choose the scaling of $w$ so that $\min_{x_i} |wx_i + b| = 1$.

Source

but I don't understand why he introduces $\min_{x_i} |wx_i + b| = 1$.

My understanding is that we can do it because variables $w$, $b$ and $\delta$ are kind of linked together.

Can we change the Wikipedia definition and say

These hyperplanes can be described by the equations $$ wx - b= 2 $$ and $$wx - b=- 2$$

or is this incorrect, so that we must instead say:

These hyperplanes can be described by the equations $$ 2wx - 2b= 2 $$ and $$2wx - 2b=- 2$$

Could you clarify this for me?

$\endgroup$
    $\begingroup$ For readers' information: the Wikipedia article referred to is "Support vector machine". $\endgroup$ Commented Mar 28, 2018 at 14:00

1 Answer

$\begingroup$

Very simply, if you choose any number other than 1, you can simply scale it away again. Consider

$$ \mathbf{w}\cdot\mathbf{x}-b=\pm\delta $$

Now, divide both sides of the equation by $\delta$, and we get

$$ \left(\frac1\delta\mathbf{w}\right)\cdot\mathbf{x}-\frac{b}{\delta}=\pm1 $$

Which means that we can define $\mathbf{\hat w}=\frac1\delta\mathbf{w}$ and $\hat b=\frac{b}{\delta}$, and we have

$$ \mathbf{\hat w}\cdot\mathbf{x}-\hat b=\pm1 $$

And because the goal is to minimise $\|\mathbf{w}\|$ (in order to maximise the size of the margin, $\frac2{\|\mathbf{w}\|}$), it doesn't matter if we scale $\mathbf{w}$ by some constant such as $\delta$ first - and so, we can use $\mathbf{\hat w}$ in place of $\mathbf{w}$, and the choice of $\delta$ is irrelevant (aside from needing to be a positive real number).

Since it's irrelevant, might as well make it the simplest possible positive real number, 1.
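To spell out the margin step, here is a short worked derivation. It relies only on the standard fact that the distance between two parallel hyperplanes $\mathbf{w}\cdot\mathbf{x}=c_1$ and $\mathbf{w}\cdot\mathbf{x}=c_2$ is $\frac{|c_1-c_2|}{\|\mathbf{w}\|}$; the symbols are the ones used above.

$$ \text{margin} = \frac{|(b+\delta)-(b-\delta)|}{\|\mathbf{w}\|} = \frac{2\delta}{\|\mathbf{w}\|} = \frac{2}{\left\|\frac1\delta\mathbf{w}\right\|} = \frac{2}{\|\mathbf{\hat w}\|} $$

So the margin depends only on the ratio $\|\mathbf{w}\|/\delta$. Fixing $\delta=1$ just picks one representative $(\mathbf{\hat w},\hat b)$ out of all the rescaled pairs that describe the same two hyperplanes, and for that representative maximising the margin is the same as minimising $\|\mathbf{\hat w}\|$.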

As for why it "sets the scale" of the problem, think of it this way: changing $\delta$ would change the scaling of $\mathbf{w}$ (that is, choosing $\delta=2$ would make $\mathbf{w}$ twice as big, for example). And so, just as changing $\delta$ changes the scale, so too does setting $\delta$ set the scale - it keeps it fixed, rather than having it vary from instance to instance.
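A quick numerical check of the above may help. This is only a minimal sketch: the values of `w`, `b` and `delta` are made up for illustration, and `margin_width` and `rescale` are hypothetical helper names, not functions from any SVM library. It also covers the two variants asked about in the question.

```python
import numpy as np


def margin_width(w, delta):
    """Distance between the hyperplanes w.x - b = +delta and w.x - b = -delta."""
    return 2.0 * delta / np.linalg.norm(w)


def rescale(w, b, delta):
    """Divide the equation through by delta so the right-hand sides become +1 and -1."""
    return w / delta, b / delta


# Illustrative values only (assumptions, not taken from the question).
w = np.array([3.0, 4.0])
b = 2.0
delta = 2.0

w_hat, b_hat = rescale(w, b, delta)

# A point lying on the hyperplane w.x - b = +delta ...
x = np.array([0.0, 1.0])
on_original = np.isclose(w @ x - b, delta)
# ... also lies on the rescaled hyperplane w_hat.x - b_hat = +1.
on_rescaled = np.isclose(w_hat @ x - b_hat, 1.0)

# The margin is unchanged by the rescaling.
margin_original = margin_width(w, delta)
margin_rescaled = margin_width(w_hat, 1.0)

# The two variants from the question, compared with w.x - b = +/-1:
margin_unit = margin_width(w, 1.0)             # w.x - b = +/-1
margin_rhs_only = margin_width(w, 2.0)         # w.x - b = +/-2 (same w, different planes)
margin_all_scaled = margin_width(2.0 * w, 2.0) # 2w.x - 2b = +/-2 (same planes as +/-1)

print(on_original, on_rescaled)
print(margin_original, margin_rescaled)
print(margin_unit, margin_rhs_only, margin_all_scaled)
```

With the same $\mathbf{w}$ and $b$, writing $\pm2$ on the right-hand side describes a different, wider pair of hyperplanes, whereas scaling $\mathbf{w}$, $b$ and the right-hand side together describes exactly the same pair as the $\pm1$ form.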

$\endgroup$
  • $\begingroup$ Thanks Glen....! $\endgroup$ Commented Mar 29, 2018 at 13:19
  • $\begingroup$ Perfect! You cleared my confusions. +1 $\endgroup$ Commented Apr 1, 2019 at 8:28
  • $\begingroup$ "it doesn't matter if we scale $\mathbf{w}$ by some constant such as $\delta$" What makes you think that $\delta$ is a constant?? $\endgroup$ Commented Feb 16, 2023 at 18:29
  • $\begingroup$ "and because the goal is to minimise $\|\mathbf{w}\|$ " No, the goal is to minimize $\|\mathbf{\hat w}\|$, which is $\frac{\|\mathbf{w}\|}{\delta}$. Minimizing $\|\mathbf{w}\|$ is the goal when $\delta$ is assumed to be equal to 1, not in actuality. $\endgroup$ Commented Feb 16, 2023 at 18:32
  • $\begingroup$ @mehdicharife - if $\delta$ isn't a constant, then we don't have support hyperplanes parallel to the main hyperplane - either the support hyperplanes will not be parallel, or they won't be hyperplanes (because they're not "flat"). And because $\delta$ is constant, minimising $\|\mathbf{w}\|$ is the same as minimising $\|\mathbf{\hat{w}}\|$ $\endgroup$ Commented Apr 6, 2023 at 1:20
