- distance-weighted k-nearest neighbor is a highly effective
algorithm for many practical problems
- robust to noisy data when the training set is sufficiently large
- its inductive bias is the assumption that the classification of an
instance will be most similar to the classification of other
instances that are nearby in Euclidean distance
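The classifier described above can be sketched as follows (a minimal illustration; the function name and the toy data are mine, not from the notes):

```python
import math
from collections import defaultdict

def distance_weighted_knn(query, examples, k=3):
    """Classify `query` from labeled `examples` = [(point, label), ...].

    Each of the k nearest neighbors votes with weight 1/d^2, so closer
    neighbors count more; an exact match (d == 0) decides outright.
    """
    # Sort training examples by Euclidean distance to the query.
    by_dist = sorted(examples, key=lambda ex: math.dist(query, ex[0]))
    votes = defaultdict(float)
    for point, label in by_dist[:k]:
        d = math.dist(query, point)
        if d == 0.0:                 # exact match: return its label
            return label
        votes[label] += 1.0 / d**2   # inverse-square distance weighting
    return max(votes, key=votes.get)

# Two small clusters around (0,0) and (5,5)
train = [((0, 0), "a"), ((0.5, 0.4), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((4.6, 5.2), "b"), ((5, 4), "b")]
print(distance_weighted_knn((0.8, 0.3), train, k=3))  # -> a
```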
- because distance is calculated over all attributes, irrelevant
attributes are a problem - the curse of dimensionality
- some approaches weight each attribute differently to overcome this,
which amounts to stretching the axes of the Euclidean space - the
weights can be determined automatically using cross-validation
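Stretching the space is just a per-attribute weight inside the distance calculation; a weight of zero removes an irrelevant attribute entirely. A minimal sketch (the function name is mine; in practice the weights would be chosen by cross-validation, as the notes say):

```python
import math

def weighted_dist(x, y, weights):
    """Euclidean distance with one stretch factor per attribute.

    A weight of 0 drops that attribute entirely; a weight > 1
    stretches its axis so differences on it matter more.
    """
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

x, y = (1.0, 2.0, 10.0), (1.0, 5.0, -3.0)
print(weighted_dist(x, y, (1, 1, 1)))  # plain Euclidean distance
print(weighted_dist(x, y, (1, 1, 0)))  # third (irrelevant) attribute ignored -> 3.0
```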
- alternatively, eliminate the least relevant attributes entirely -
leave-one-out cross-validation has been used for this, and it is
ideal for IBL because there is no model to retrain for each
held-out instance
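The reason leave-one-out cross-validation is so cheap for an instance-based learner can be seen directly in code: for each held-out example we just search the remaining examples, with no retraining step. A sketch scoring a subset of attributes by 1-NN error (function name and toy data are illustrative, not from the notes):

```python
import math

def loocv_error(examples, attrs):
    """Leave-one-out error of 1-NN using only the attributes in `attrs`.

    No retraining is needed: each held-out example is classified by
    searching the remaining examples, which is why LOOCV suits IBL.
    """
    def dist(x, y):
        return math.dist([x[i] for i in attrs], [y[i] for i in attrs])

    errors = 0
    for i, (x, label) in enumerate(examples):
        rest = examples[:i] + examples[i + 1:]
        _, nearest_label = min(rest, key=lambda ex: dist(x, ex[0]))
        errors += nearest_label != label
    return errors / len(examples)

# Attribute 0 separates the classes; attribute 1 is random noise.
data = [((0.0, 7.1), "a"), ((0.2, 1.3), "a"), ((0.1, 9.9), "a"),
        ((5.0, 2.2), "b"), ((5.3, 8.4), "b"), ((4.9, 0.5), "b")]
print(loocv_error(data, attrs=[0]))  # relevant attribute only -> 0.0
print(loocv_error(data, attrs=[1]))  # noise attribute only: high error
```

Scoring each candidate attribute subset this way and keeping the subset with the lowest error implements the elimination idea above.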
- an axis could also be stretched locally, but that introduces more
degrees of freedom and hence a greater risk of overfitting, so it
is much less common
- efficient indexing of the stored instances, so that nearest
neighbors can be found without scanning the whole training set, can
be done with kd-trees
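A kd-tree splits the instances on one coordinate axis per level, so a nearest-neighbor search can prune whole subtrees that cannot contain a closer point. A compact sketch (my own dict-based representation, returning the single nearest point):

```python
import math

def build_kd(points, depth=0):
    """Build a kd-tree: split on one coordinate axis per level."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kd(points[:mid], depth + 1),
            "right": build_kd(points[mid + 1:], depth + 1)}

def nearest(node, query, best=None):
    """Return the stored point nearest to `query`, pruning subtrees
    that cannot contain anything closer than the current best."""
    if node is None:
        return best
    if best is None or math.dist(query, node["point"]) < math.dist(query, best):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than `best`.
    if abs(diff) < math.dist(query, best):
        best = nearest(far, query, best)
    return best

tree = build_kd([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2)))  # -> (8, 1)
```

Building the tree costs O(n log n), and a search typically touches only O(log n) nodes, compared with O(n) for a linear scan over the training set.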
Patricia Jean Riddle
Wed Jun 23 13:06:34 NZST 1999