Feature Selection
Suppose that we are given some set of d features, which comprise the components
of a feature vector x = [x1, x2, ..., xd]'. It often happens that many of these
features have relatively little value in discriminating between the different classes.
For example, we might have d = 100 features, but we might be able to classify
x nearly as well using only 5 or 6 of them. Including a lot of weak or
irrelevant features not only slows down both training and classification but
can also degrade classification performance. How do we find the good features?
Unfortunately, the only methods guaranteed to find the best subset of features are exhaustive: they must evaluate every possible subset of the d features.
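To make the cost of exhaustive search concrete, here is a minimal Python sketch that scores every nonempty subset of the d features and keeps the best one. The scoring criterion (leave-one-out 1-nearest-neighbor accuracy), the toy data, and the function names exhaustive_feature_search and loo_1nn_accuracy are assumptions made for illustration; any criterion that rates a feature subset could be plugged in. With d features there are 2^d - 1 nonempty subsets, so for d = 100 the loop would have to visit about 1.3 x 10^30 of them.

```python
from __future__ import annotations

import math
from itertools import combinations
from typing import Callable, Optional, Sequence


def loo_1nn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                     subset: tuple[int, ...]) -> float:
    """Hypothetical criterion: leave-one-out 1-nearest-neighbor accuracy
    computed using only the feature indices in `subset`."""
    correct = 0
    for i in range(len(X)):
        best_j, best_dist = -1, math.inf
        for j in range(len(X)):
            if j == i:
                continue  # leave sample i out of its own neighbor search
            # Squared Euclidean distance restricted to the chosen features.
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in subset)
            if dist < best_dist:
                best_j, best_dist = j, dist
        correct += int(y[best_j] == y[i])
    return correct / len(X)


def exhaustive_feature_search(
    d: int,
    score: Callable[[tuple[int, ...]], float],
    max_size: Optional[int] = None,
) -> tuple[tuple[int, ...], float]:
    """Hypothetical helper: score every nonempty subset of features 0..d-1
    (up to max_size features) and return the best subset with its score."""
    max_size = d if max_size is None else max_size
    best_subset: tuple[int, ...] = ()
    best_score = -math.inf
    for k in range(1, max_size + 1):
        for subset in combinations(range(d), k):  # all C(d, k) subsets of size k
            s = score(subset)
            if s > best_score:  # strict '>' keeps the smaller subset on ties
                best_subset, best_score = subset, s
    return best_subset, best_score


# Made-up toy data: feature 0 separates the classes, feature 1 is large-scale noise.
X = [[0.0, 5.0], [0.1, 0.0], [0.2, 9.0],
     [1.0, 0.5], [1.1, 9.5], [1.2, 4.5]]
y = [0, 0, 0, 1, 1, 1]

best, acc = exhaustive_feature_search(2, lambda s: loo_1nn_accuracy(X, y, s))
print(best, acc)  # (0,) 1.0
```

On this made-up six-point data set, feature 0 alone classifies every point correctly, while adding the noisy feature 1 drops the leave-one-out accuracy to 0.0, a small instance of the degradation described above. Because subsets are visited in order of increasing size and only a strictly better score replaces the current best, ties are broken in favor of smaller subsets.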
However, there are some heuristic approaches that are often useful. We will
look at the following approaches: