These notes provide background on feature selection and clustering for
the new NSF-sponsored course entitled Human
Computer Interface Design.
We often want to recognize patterns in the signals that we get from input
sensors, and other notes for this course describe some statistically based
procedures for pattern classification. The standard feature-vector model
for classification assumes that, one way or another, the designer has identified
the features upon which the classification will be based. The classifier
then uses all of these features to assign a feature vector to a class.
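To make the feature-vector model concrete, here is a minimal sketch in which each sample is a fixed-length list of numeric features and the classifier maps that list to a class label. It assumes a simple nearest-class-mean rule as a stand-in for the statistically based classifiers covered in the other notes; the functions `class_means` and `classify` and the toy data are hypothetical.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def class_means(samples):
    """Compute the mean feature vector of each class.

    samples: dict mapping class label -> list of equal-length feature vectors.
    """
    means = {}
    for label, vectors in samples.items():
        n = len(vectors)
        # zip(*vectors) walks the vectors one feature (column) at a time.
        means[label] = [sum(column) / n for column in zip(*vectors)]
    return means


def classify(x, means):
    """Assign feature vector x to the class whose mean vector is nearest."""
    return min(means, key=lambda label: dist(x, means[label]))


# Toy two-feature training data (illustrative values only).
training = {
    "A": [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]],
    "B": [[4.0, 4.0], [4.2, 3.9], [3.8, 4.1]],
}
means = class_means(training)
print(classify([1.1, 0.9], means))  # -> A
print(classify([3.9, 4.2], means))  # -> B
```

Every feature contributes to the distance, which is why a poorly chosen or redundant feature directly affects the decision.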
Because good features are so problem-specific, there is no general
theory for designing an effective feature set. However, there are some useful
procedures for improving the performance obtainable with a given set
of features:
- Feature selection. If the number of features is too large, one can speed up classification, and often improve accuracy, by using only a small subset of the most informative features.
- Clustering. If the problem possesses natural subcategories, one can improve accuracy by finding those clusters and classifying in two stages: first assign the feature vector to a subcategory, then map that subcategory to its final class.
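One common way to pick a small subset of features is greedy sequential forward selection: start with no features and repeatedly add the single feature that most improves some score. The sketch below assumes leave-one-out accuracy of a nearest-class-mean classifier as that score; the notes do not prescribe a particular search strategy or criterion, and the names `loo_accuracy` and `forward_select` and the toy data are hypothetical.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def loo_accuracy(samples, features):
    """Leave-one-out accuracy of a nearest-class-mean classifier that
    looks only at the feature indices listed in `features`."""
    correct = total = 0
    for label, vectors in samples.items():
        for i, x in enumerate(vectors):
            means = {}
            for other, vecs in samples.items():
                # Exclude the held-out vector from its own class mean.
                pool = [v for j, v in enumerate(vecs)
                        if not (other == label and j == i)]
                means[other] = [sum(v[f] for v in pool) / len(pool)
                                for f in features]
            xs = [x[f] for f in features]
            guess = min(means, key=lambda c: dist(xs, means[c]))
            correct += guess == label
            total += 1
    return correct / total


def forward_select(samples, n_features, k):
    """Greedily add the feature that gives the best score until k are chosen."""
    selected, remaining = [], list(range(n_features))
    while len(selected) < k and remaining:
        # Try each unused feature alongside those already selected.
        best = max(remaining,
                   key=lambda f: loo_accuracy(samples, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 0 separates the classes, feature 1 is widely scattered,
# feature 2 is nearly constant (illustrative values only).
data = {
    "A": [[1.0, 5.0, 0.0], [1.2, 2.0, 0.1], [0.8, 8.0, 0.2]],
    "B": [[3.0, 4.0, 0.1], [3.2, 7.0, 0.0], [2.8, 3.0, 0.2]],
}
print(forward_select(data, n_features=3, k=2))  # -> [0, 2]
```

Feature 0 alone classifies every held-out sample correctly, so it is chosen first; adding the widely scattered feature 1 lowers the score, so feature 2 is chosen second. Because the search is greedy, it can miss features that are only useful in combination.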
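The two-stage scheme in the Clustering item can be sketched as follows, assuming k-means as the clustering method (the notes do not name one) and a nearest-center rule in both stages. The functions `kmeans`, `train_two_stage`, and `classify_two_stage` and the toy data are hypothetical; seeding k-means with the first k points of each class is a simplification that keeps the example deterministic, not a recommended practice.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm), seeded with the first k points."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centers[c]))
            groups[nearest].append(p)
        for c, group in enumerate(groups):
            if group:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def train_two_stage(samples, k):
    """Cluster each class into k subcategories and remember which class
    each subcategory center belongs to."""
    return [(center, label)
            for label, vectors in samples.items()
            for center in kmeans(vectors, k)]


def classify_two_stage(x, subcategories):
    """Stage 1: find the nearest subcategory center.
    Stage 2: report the class that subcategory belongs to."""
    _, label = min(subcategories, key=lambda s: dist(x, s[0]))
    return label


# Class A has two natural subcategories, near (0, 0) and near (10, 10);
# class B sits between them near (5, 5). Illustrative values only.
training = {
    "A": [[0.0, 0.0], [10.0, 10.0], [0.5, 0.0], [10.0, 9.5]],
    "B": [[5.0, 5.0], [5.5, 5.0], [5.0, 4.5], [4.5, 5.0]],
}
model = train_two_stage(training, k=2)
print(classify_two_stage([1.0, 0.5], model))  # -> A
print(classify_two_stage([9.0, 9.0], model))  # -> A
print(classify_two_stage([5.2, 5.1], model))  # -> B
```

With a single mean per class, class A's mean would land near (5.1, 4.9), almost on top of class B's, so a one-stage nearest-mean rule would confuse the two; modeling A's two subcategories separately avoids that.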