Perceptron
==================

Gradient-free training of a linear model.

Each example is a feature vector with a label of -1 or +1.
The model consists of a vector of linear weights and a bias.

If the model predicts correctly, nothing happens.
Otherwise, add or subtract the feature vector to/from the weights and add or subtract 1 to/from the bias, depending on the label.

* The margin is the distance between the hyperplane and the closest point to it

Voted perceptron
-----------------

Keep all the weight vectors seen over time; at prediction time, let each of them predict, then take a vote over their predictions.

Problems
**************

Computationally intensive

Average weighted perceptron
-----------------------------

Keep all the weight vectors seen over time, average them, then make a single prediction with the averaged weights.

Perceptron proof
------------------

Inspiration
**************

Every time we update the perceptron weights from :math:`w_k` to :math:`w_{k+1}`, we want to make sure that :math:`w_{k+1}` and :math:`w_*` become more similar.
We measure this in 2 ways:

1. Ensure :math:`w_{k+1} \cdot w_*` grows at each iteration
2. Ensure that :math:`w_{k+1} \cdot w_*` grows faster than :math:`||w_{k+1}||`

Demonstrating that :math:`w_{k+1} \cdot w_*` grows at each iteration
************************************************************************

.. math::

    w_{k+1} = w_k + yx

    \text{Multiply both sides by } w_*:

    w_* \cdot w_{k+1} = w_* \cdot (w_k + yx)

    w_* \cdot w_{k+1} = w_* \cdot w_k + y (w_* \cdot x)

    \text{Recall the definition of the margin: } \gamma = \inf_{(x, y)} y (w_* \cdot x)

    \text{so } w_* \cdot w_{k+1} \ge w_* \cdot w_k + \gamma

Practice
-----------

* Derive convergence proof
* Derive voted, average weighted formula
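
Code sketch: perceptron update rule
------------------------------------

A minimal Python/NumPy sketch of the update rule described at the top of these notes. The function name ``train_perceptron``, the fixed number of epochs, and treating an activation of exactly zero as a mistake are illustrative assumptions, not part of the notes.

.. code-block:: python

    import numpy as np

    def train_perceptron(X, y, epochs=10):
        """Illustrative perceptron training loop (hypothetical helper).

        X: (n_samples, n_features) array of feature vectors.
        y: (n_samples,) array of labels in {-1, +1}.
        """
        w = np.zeros(X.shape[1])  # linear weights
        b = 0.0                   # bias

        for _ in range(epochs):
            for x_i, y_i in zip(X, y):
                # Mistake: the sign of the activation disagrees with the label.
                if y_i * (w @ x_i + b) <= 0:
                    # Add/subtract the feature vector and 1, depending on the label.
                    w += y_i * x_i
                    b += y_i
                # If the prediction is correct, nothing happens.
        return w, b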
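Code sketch: average weighted perceptron
-----------------------------------------

A sketch of the "Average weighted perceptron" idea above, under the same assumptions: keep a running sum of the weights after every example and use the average as the single final predictor. The running-sum trick is a design choice that avoids storing every intermediate weight vector.

.. code-block:: python

    import numpy as np

    def train_averaged_perceptron(X, y, epochs=10):
        """Illustrative averaged-perceptron loop (hypothetical helper)."""
        w, b = np.zeros(X.shape[1]), 0.0
        w_sum, b_sum, count = np.zeros(X.shape[1]), 0.0, 0

        for _ in range(epochs):
            for x_i, y_i in zip(X, y):
                if y_i * (w @ x_i + b) <= 0:  # mistake: apply the usual update
                    w += y_i * x_i
                    b += y_i
                # Accumulate the current weights so we can average them at the end.
                w_sum += w
                b_sum += b
                count += 1
        return w_sum / count, b_sum / count

The voted perceptron, by contrast, would keep every weight vector together with how long it survived and take a weighted vote of their individual predictions at test time, which is why it is more computationally intensive than a single averaged predictor.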
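Proof sketch: finishing the bound
----------------------------------

A sketch of how the convergence bound is usually completed, complementing the "Demonstrating that :math:`w_{k+1} \cdot w_*` grows" step above. It assumes :math:`||w_*|| = 1`, :math:`w_0 = 0`, and :math:`||x|| \le R` for every example; none of these assumptions are stated in the notes above, and the bias is folded into the weights for simplicity.

.. math::

    \text{Assume } \|w_*\| = 1,\ w_0 = 0,\ \|x\| \le R.

    \|w_{k+1}\|^2 = \|w_k + yx\|^2
                  = \|w_k\|^2 + 2y(w_k \cdot x) + \|x\|^2
                  \le \|w_k\|^2 + R^2
    \quad \text{(updates only happen on mistakes, so } y(w_k \cdot x) \le 0\text{)}

    \text{After } k \text{ mistakes:} \quad
    w_* \cdot w_k \ge k\gamma
    \quad \text{and} \quad
    \|w_k\|^2 \le kR^2

    k\gamma \le w_* \cdot w_k \le \|w_*\|\,\|w_k\| \le \sqrt{k}\,R
    \quad \Rightarrow \quad
    k \le \frac{R^2}{\gamma^2}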