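LMS: For each training example $\langle b, V_{train}(b) \rangle$:
use the current weights to calculate the estimate $\hat{V}(b)$.
For each weight $w_i$, update it as $w_i \leftarrow w_i + \eta \, (V_{train}(b) - \hat{V}(b)) \, x_i$,
where $x_i$ is the value of the feature that $w_i$ multiplies and $\eta$ is a small constant (e.g., 0.01) that moderates the size of the weight update.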
To get an intuitive understanding, notice that when the error (the training value minus the current estimate) is 0, no weights are changed. When the error is positive, meaning the estimate is too low, each weight is increased in proportion to the value of its corresponding feature.
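A single update step can be worked through by hand. The numbers below are made up for illustration and are not from the notes: two weights $w_0 = w_1 = 0.5$, feature values $x_0 = 1$, $x_1 = 2$, a training value of 2.0, and a step size $\eta = 0.01$.

```latex
\begin{align*}
\text{estimate} &= w_0 x_0 + w_1 x_1 = 0.5 \cdot 1 + 0.5 \cdot 2 = 1.5 \\
\text{error}    &= 2.0 - 1.5 = 0.5 \\
w_0 &\leftarrow 0.5 + 0.01 \cdot 0.5 \cdot 1 = 0.505 \\
w_1 &\leftarrow 0.5 + 0.01 \cdot 0.5 \cdot 2 = 0.51 \\
\text{new estimate} &= 0.505 \cdot 1 + 0.51 \cdot 2 = 1.525
\end{align*}
```

The error is positive, so both weights grow, and the weight attached to the larger feature value grows by more. The new estimate is slightly closer to the training value.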
Surprisingly, in certain settings this simple method can be proven to converge to the least-squared-error approximation to the training values. In how many training instances? How understandable is the result? (Datamining)
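The rule and its behavior on a small data set can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names (`predict`, `lms_update`, `lms_train`), the toy data, and the number of passes are all hypothetical choices made for illustration. The data is exactly linear (target = 1 + 2x, with a constant first feature acting as a bias), so the weights should approach 1 and 2.

```python
def predict(weights, features):
    """Linear estimate: the weighted sum of the feature values."""
    return sum(w * x for w, x in zip(weights, features))


def lms_update(weights, features, target, eta=0.01):
    """One LMS step: w_i <- w_i + eta * (target - estimate) * x_i."""
    error = target - predict(weights, features)
    return [w + eta * error * x for w, x in zip(weights, features)]


def lms_train(examples, n_features, eta=0.01, epochs=1000):
    """Apply the LMS step to every example, for a fixed number of passes."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for features, target in examples:
            weights = lms_update(weights, features, target, eta)
    return weights


# Toy data (assumed, not from the notes): target = 1 + 2*x,
# where features = [1.0, x] and the constant 1.0 plays the role of a bias.
examples = [
    ([1.0, 0.0], 1.0),
    ([1.0, 1.0], 3.0),
    ([1.0, 2.0], 5.0),
    ([1.0, 3.0], 7.0),
]

weights = lms_train(examples, n_features=2, eta=0.01, epochs=5000)
print(weights)  # expected to be close to [1.0, 2.0]
```

How many passes are needed depends on the step size and on the feature values. With a larger `eta` the weights move faster but can overshoot and diverge, which is one way to explore the "in how many training instances?" question empirically.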