Basic Terms#

  • normalization
    • scale values into [0,1] via min-max scaling (see the sketch after this list)

  • standardization
    • rescale to mean 0, std 1 by subtracting the mean and dividing by the standard deviation

  • inductive bias
    • the assumptions a model builds in so it can generalize, e.g. “the decision boundary is linear”

  • Entropy

    measures the expected amount of information (in bits) needed to encode the outcome of a random event:

    \[H(X) = - \sum_{x} p(x) \log_2 p(x)\]

    For example, each outcome of a fair coin flip takes -log_2(0.50) = 1 bit to encode and occurs 50% of the time, so it contributes -log_2(0.50) * 0.50 = 0.5 bits to the sum; adding the two outcomes gives an entropy of 1 bit, i.e. the average number of bits needed to encode a flip (see the sketch after this list).

  • Information gain

    Expected reduction in entropy caused by partitioning the data on an attribute a, e.g. “weather”; for each value v of a, data_v denotes the subset of examples with a = v

    High gain means a large reduction in entropy

    \[\text{Gain}(\text{data}, a) = \text{Entropy}(\text{data}) - \sum_{v \in \text{values}(a)} \frac{|\text{data}_v|}{|\text{data}|} \cdot \text{Entropy}(\text{data}_v)\]
  • representation learning

    Learning algorithms that automatically learn useful feature representations

  • Accuracy

    (tp + tn) / (tp + tn + fp + fn)

  • precision

    % of predicted positives that are actually positive

  • recall

    % of actual positives predicted to be positive

  • F-score

    weighted harmonic mean of precision and recall (see the sketch after this list)

    \[F = \frac{1}{a \cdot \frac{1}{P} + (1 - a) \cdot \frac{1}{R}}\]
  • F1-score

    the F-score with a = 0.5, which simplifies to 2PR / (P + R)

  • bias

    determined by the set of possible models your configuration can express; a more restrictive set of models means higher bias

  • k-fold cross validation for hyperparam selection

    for each hyperparameter combination, run k-fold cross-validation and compare the average validation error across folds to pick the best combination (see the sketch after this list)

  • the error can be decomposed into [estimation error] and (approximation error), where the minimum possible error is the lowest error any model in the model class can achieve:

    \[\text{error}(f) = [\text{error}(f) - \text{min possible error}] + (\text{min possible error})\]
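
The sketches below are minimal Python illustrations of several of the terms above; all data, attribute names, and parameter values in them are made-up assumptions, not material from these notes.

First, normalization (min-max scaling) and standardization (z-score):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 10.0])

# normalization (min-max scaling): map values into [0, 1]
x_norm = (x - x.min()) / (x.max() - x.min())

# standardization (z-score): subtract the mean, divide by the std -> mean 0, std 1
x_std = (x - x.mean()) / x.std()

print(x_norm)                      # [0.   0.25 0.5  1.  ]
print(x_std.mean(), x_std.std())   # ~0.0 and 1.0
```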
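
Entropy and information gain, computed on a toy “weather” dataset (the same computation the exercises below ask for):

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum_i p_i * log2(p_i) over the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy of the whole set minus the size-weighted entropy of each
    partition induced by the attribute's values."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Made-up toy data: play tennis or not, given the weather.
rows = [{"weather": "sunny"}, {"weather": "sunny"},
        {"weather": "rainy"}, {"weather": "overcast"}]
labels = ["no", "no", "yes", "yes"]

print(entropy(labels))                            # 1.0 bit (2 "yes" / 2 "no")
print(information_gain(rows, labels, "weather"))  # 1.0: weather splits the labels perfectly
```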
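
Accuracy, precision, recall, and the F-score, computed directly from confusion-matrix counts:

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    # fraction of predicted positives that are actually positive
    return tp / (tp + fp)

def recall(tp, fn):
    # fraction of actual positives that are predicted positive
    return tp / (tp + fn)

def f_score(p, r, a=0.5):
    """Weighted harmonic mean of precision and recall; a = 0.5 gives F1."""
    return 1.0 / (a * (1.0 / p) + (1.0 - a) * (1.0 / r))

tp, tn, fp, fn = 40, 30, 10, 20      # made-up confusion-matrix counts
p, r = precision(tp, fp), recall(tp, fn)
print(accuracy(tp, tn, fp, fn))      # 0.7
print(p, r)                          # 0.8, ~0.667
print(f_score(p, r))                 # F1 = 2PR / (P + R) ~= 0.727
```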
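
k-fold cross-validation for hyperparameter selection, here with closed-form ridge regression standing in as the model and a made-up lambda grid:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression weights."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def kfold_mse(X, y, lam, k=5):
    """Average validation MSE over k folds for one hyperparameter value."""
    idx = np.random.RandomState(0).permutation(len(y))  # same folds for every lam
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((X[val] @ w - y[val]) ** 2))
    return np.mean(errors)

# Made-up regression data.
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = X @ rng.randn(5) + 0.1 * rng.randn(100)

# For each hyperparameter value, run k-fold CV and keep the lowest average error.
grid = [0.01, 0.1, 1.0, 10.0]
best_lam = min(grid, key=lambda lam: kfold_mse(X, y, lam))
print("best lambda:", best_lam)
```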

Exercises#

  • Derive entropy, information gain, compute on sample data

  • Derive F1