SNoW implements a function approximation version of Perceptron's additive update rule - a stochastic approximation to the Gradient Descent algorithm (GD) - and of Winnow's multiplicative update rule - an exponentiated Gradient Descent algorithm (EGD) [Kivinen and Warmuth, 1997]. See option -G for usage details.
Both use the same training policy. They are not mistake-driven: each target node to which an example is presented performs an update on that example whether or not the network can be said to have made a mistake on it. They are also on-line, as opposed to the true Gradient Descent algorithm, in which the total error incurred by a weight vector over the entire training set, and the corresponding total update to that vector, are computed before any weights are changed. Mitchell calls this on-line version a stochastic approximation. A disadvantage of the approximation is that it requires a smaller learning rate to guarantee convergence; an advantage is that it is less susceptible to ``getting stuck'' in a local minimum that is not the global minimum [Mitchell, 1997].
Specifically, let $\mathcal{A}_t$ be the set of active features in a given example that are linked to target node $t$, let $w_{t,i}$ be the weight of feature $i \in \mathcal{A}_t$, let $\alpha$ be the learning rate, let $s_i$ be the strength of feature $i$ in the given example, and let $s_t$ be the strength of target $t$ in that same example. Then $\Omega_t = \sum_{i \in \mathcal{A}_t} w_{t,i} \, s_i$ is the predicted activation of target node $t$ before updating, and $s_t$ will represent the correct activation for this example while updating.
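As a concrete illustration, the predicted activation is a sparse dot product over the features that are both active in the example and linked to the target. The sketch below uses hypothetical names (`weights`, `example`) and plain dictionaries; it is not SNoW's internal representation:

```python
# Sketch: predicted activation Omega_t of one target node, computed over
# the active features of a sparse example. Names are illustrative only.

def predicted_activation(weights, example):
    """weights: {feature_id: w_ti} for features linked to this target.
    example: {feature_id: strength s_i} for features active in the example.
    Returns Omega_t = sum over shared features i of w_ti * s_i."""
    return sum(w * example[i] for i, w in weights.items() if i in example)

weights = {1: 0.5, 2: -0.2, 7: 1.0}
example = {1: 1.0, 7: 2.0, 9: 1.0}   # feature 9 is not linked to this target
print(predicted_activation(weights, example))  # 0.5*1.0 + 1.0*2.0 = 2.5
```

Features active in the example but not linked to the target (feature 9 above) contribute nothing, as do linked features that are inactive (feature 2).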
A regression update in SNoW proceeds as follows:
GD: When enabled in conjunction with Perceptron, the GD update rule becomes:

$w_{t,i} \leftarrow w_{t,i} + \alpha \, (s_t - \Omega_t) \, s_i \quad \forall i \in \mathcal{A}_t$
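The additive GD update can be sketched as follows, assuming the standard Widrow-Hoff form in which each active feature's weight moves by the learning rate times the prediction error times the feature's strength. Names (`gd_update`, `alpha`) are illustrative, not SNoW options:

```python
# Sketch of the additive (GD) update for one target node on one example.
# alpha is the learning rate; dictionaries stand in for SNoW's sparse network.

def gd_update(weights, example, target_strength, alpha):
    """In-place additive update: w_ti += alpha * (s_t - Omega_t) * s_i."""
    omega = sum(w * example[i] for i, w in weights.items() if i in example)
    error = target_strength - omega
    for i in weights:
        if i in example:
            weights[i] += alpha * error * example[i]
    return weights

w = {1: 0.5, 2: 0.5}
gd_update(w, {1: 1.0, 2: 1.0}, target_strength=2.0, alpha=0.5)
# omega = 1.0, error = 1.0, each active weight += 0.5 * 1.0 * 1.0
print(w)  # {1: 1.0, 2: 1.0}
```

Note that the update fires regardless of whether the prediction would count as a mistake, matching the non-mistake-driven policy described above.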
EGD: When enabled in conjunction with Winnow, the corresponding algorithm is referred to as Exponentiated Gradient Descent in the literature:

$w_{t,i} \leftarrow w_{t,i} \cdot \alpha^{(s_t - \Omega_t) \, s_i} \quad \forall i \in \mathcal{A}_t$
(Note that $\beta$ is never used in this case.)
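A corresponding multiplicative sketch, assuming the EGD form in which each active weight is scaled by the promotion parameter raised to the (error times feature strength) power; Winnow's demotion parameter is assumed unused, and all names here are illustrative:

```python
# Sketch of the multiplicative (EGD) update for one target node.
# alpha plays the role of Winnow's promotion parameter (> 1);
# the demotion parameter (beta) is not used in this mode.

def egd_update(weights, example, target_strength, alpha):
    """In-place multiplicative update: w_ti *= alpha ** ((s_t - Omega_t) * s_i)."""
    omega = sum(w * example[i] for i, w in weights.items() if i in example)
    error = target_strength - omega
    for i in weights:
        if i in example:
            weights[i] *= alpha ** (error * example[i])
    return weights

w = {1: 1.0, 2: 1.0}
egd_update(w, {1: 1.0, 2: 1.0}, target_strength=3.0, alpha=2.0)
# omega = 2.0, error = 1.0, each active weight *= 2.0 ** 1.0
print(w)  # {1: 2.0, 2: 2.0}
```

Because the update is multiplicative, positive weights stay positive, as in Winnow: an error of the opposite sign would scale the weights down (by a factor of $\alpha^{-1}$ here) rather than make them negative.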
Naive Bayes does not have a Gradient Descent update rule. In both Winnow and Perceptron, each example is presented only to targets active in that example.