Basic Terms
- normalization
scale values into [0,1]
- standardization
mean 0, standard deviation 1 (sketch covering both normalization and standardization after this list)
- inductive bias
the assumptions a model builds in about the target function, e.g. “the decision boundary is linear”
- Entropy
measures the expected amount of information (in bits) needed to encode the outcome of a random variable. For example, either result of a fair coin flip takes -log_2(0.50) = 1 bit to encode, and each result occurs 50% of the time, so the entropy of the flip is 0.50 * -log_2(0.50) + 0.50 * -log_2(0.50) = 1 bit, i.e. the average number of bits needed to encode the outcome (general formula and worked example below).
- Information gain
Expected reduction in entropy caused by partitioning the data on an attribute a (e.g. a = “weather”)
High gain means a large reduction in entropy (toy computation after this list)
\[\text{Gain}(\text{data}, a) = \text{Entropy}(\text{data}) - \sum_{v \in \text{values}(a)} \frac{|\text{data}_v|}{|\text{data}|} \cdot \text{Entropy}(\text{data}_v)\]
where data_v is the subset of data for which attribute a takes the value v
- representation learning
Learning algorithms that automatically learn useful feature representations
- Accuracy
(tp + tn) / (tp + tn + fp + fn)
- precision
% of predicted positives that are actually positive: tp / (tp + fp)
- recall
% of actual positives that are predicted positive: tp / (tp + fn)
- F-score
weighted harmonic mean of precision, recall
\[\frac{1}{a \cdot \frac{1}{P} + (1 - a) \cdot \frac{1}{R}}\]
- F1-score
a = 0.5, which gives F1 = 2PR / (P + R) (see the metrics sketch after this list)
- bias
Set of possible models (hypothesis space) allowed by your configuration
- k-fold cross validation for hyperparam selection
for each hyperparameter combination, run k-fold cross-validation and compare the average error across folds to pick the best combination (see the sketch after this list)
the error can be decomposed into estimation error and approximation error: Error(f) = [Error(f) - minimum possible error] + (minimum possible error), where the bracketed term is the estimation error and the parenthesized term is the approximation error
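To make the normalization and standardization entries concrete, here is a minimal plain-Python sketch (the function names and toy values are just for illustration):

```python
def normalize(xs):
    """Min-max scale values into [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Rescale values to mean 0 and standard deviation 1."""
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return [(x - mean) / std for x in xs]

print(normalize([2.0, 4.0, 6.0]))    # [0.0, 0.5, 1.0]
print(standardize([2.0, 4.0, 6.0]))  # values with mean 0 and std 1
```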
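For entropy, the general form behind the coin example above (standard definition; X denotes the random variable and p its distribution):

\[H(X) = -\sum_{x} p(x) \log_2 p(x), \qquad H(\text{fair coin}) = -\big(0.5 \log_2 0.5 + 0.5 \log_2 0.5\big) = 1 \text{ bit}\]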
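A toy computation of entropy and information gain matching the Gain formula above; the `weather`/`play` data and column names are made up for illustration:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Expected reduction in entropy of `target` from splitting `rows` on `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += (len(subset) / len(rows)) * entropy(subset)
    return base - remainder

# Made-up data: whether someone plays outside given the weather.
data = [
    {"weather": "sunny", "play": "yes"},
    {"weather": "sunny", "play": "yes"},
    {"weather": "rainy", "play": "no"},
    {"weather": "rainy", "play": "yes"},
]
print(entropy([r["play"] for r in data]))         # ~0.81 bits
print(information_gain(data, "weather", "play"))  # ~0.31 bits
```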
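A small sketch of accuracy, precision, recall, and the weighted F-score computed from confusion-matrix counts; the counts in the usage line are made up:

```python
def classification_metrics(tp, tn, fp, fn, a=0.5):
    """Accuracy, precision, recall, and weighted F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Weighted harmonic mean of precision and recall; a = 0.5 gives the usual F1.
    f_score = 1 / (a * (1 / precision) + (1 - a) * (1 / recall))
    return accuracy, precision, recall, f_score

print(classification_metrics(tp=8, tn=5, fp=2, fn=1))
# (0.8125, 0.8, 0.888..., 0.842...)
```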
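A sketch of k-fold cross-validation for hyperparameter selection, as referenced in the last entry; `train_fn`, `error_fn`, and the shrunken-mean toy model are hypothetical placeholders rather than any real library API:

```python
import random

def k_fold_splits(data, k):
    """Shuffle the data and partition it into k folds."""
    data = data[:]
    random.shuffle(data)
    return [data[i::k] for i in range(k)]

def cv_error(data, k, train_fn, error_fn, params):
    """Average validation error of one hyperparameter combination over k folds."""
    folds = k_fold_splits(data, k)
    errors = []
    for i in range(k):
        valid = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train_fn(train, **params)
        errors.append(error_fn(model, valid))
    return sum(errors) / k

def select_hyperparams(data, k, train_fn, error_fn, param_grid):
    """Pick the hyperparameter combination with the lowest average k-fold error."""
    return min(param_grid, key=lambda p: cv_error(data, k, train_fn, error_fn, p))

# Toy usage: pick the shrinkage weight w for a "shrunken mean" predictor.
data = [(x, 2.0 * x) for x in range(20)]
train_fn = lambda train, w: w * sum(y for _, y in train) / len(train)
error_fn = lambda model, valid: sum((y - model) ** 2 for _, y in valid) / len(valid)
grid = [{"w": w} for w in (0.5, 0.9, 1.0)]
print(select_hyperparams(data, 5, train_fn, error_fn, grid))  # {'w': 1.0} in this toy setup
```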
Exercises
Derive entropy, information gain, compute on sample data
Derive F1