An additional issue is that hyperplanes may not be the right shape for separating complicated data. However, by Mercer's Theorem [Courant53] we can replace the inner product x_i · x in the classifier formula with any kernel (i.e., symmetric positive definite) function. Applying such a kernel function in the data's native space automatically corresponds to performing hyperplane separation in some other space. This is essentially a form of feature selection. Thus, we can use SVMs to find non-linear separators in the data space that correspond to optimal hyperplane separators in some other (usually higher-dimensional) space. We use Gaussian kernels of the form...
Kazuhiko's interpretation: We might not be modelling this right at all, but that's OK, since we can hand-wave away any differences between our model and those other models by changing this one function. I especially like the line "This is essentially a form of feature selection.", which, to me, reads "It's not a bug, it's a feature!".
(PeterTaylor) Not at all. Feature selection is just another term for classification.
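For the curious, the kernel substitution the quoted passage describes can be sketched in a few lines. This is only an illustrative toy, not the paper's implementation: the `sigma` parameter, the hand-picked `alphas`, and the helper names are assumptions here; a real SVM trainer (e.g. SMO) would fit the alphas and bias from data.

```python
import math

def rbf_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - z||^2 / (2 * sigma^2)).
    Swapping this in for the plain inner product x_i . x is the
    'kernel trick' the passage refers to."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def decision(x, support_vectors, labels, alphas, b=0.0, sigma=1.0):
    """SVM-style decision rule: sign(sum_i alpha_i * y_i * K(x_i, x) + b).
    The alphas here are supplied by hand for illustration only."""
    s = sum(a * y * rbf_kernel(sv, x, sigma)
            for a, y, sv in zip(alphas, labels, support_vectors))
    return 1 if s + b >= 0 else -1

# Toy usage: two "support vectors" with opposite labels.
print(decision([0, 0], [[0, 0], [2, 2]], [1, -1], [1.0, 1.0]))
```

The point of the sketch: `decision` never maps points into the higher-dimensional space explicitly; the kernel evaluations stand in for inner products there.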
There's a lot of CollaborativeFiltering? going on out there: