Neural Embeddings#

More efficient than a neural LM since there is no big softmax over the vocabulary.

Skip-gram with negative sampling#

Self-supervised method that trains a binary classifier.

Uses (target, context) pairs as positive examples and (target, random word) pairs as negative examples.

Separate vectors are trained for context and target words.

Similarity is computed as the dot product c · w, passed through a sigmoid to give a probability.
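
A minimal sketch of the per-pair objective, assuming NumPy; the function name `sgns_loss` and the choice of k negative samples per positive pair are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(w, c_pos, c_negs):
    """Negative-sampling loss for one (target, context) pair.

    w      : target word vector, shape (d,)
    c_pos  : positive context vector, shape (d,)
    c_negs : k sampled negative context vectors, shape (k, d)
    """
    # Positive pair: push sigmoid(c_pos . w) towards 1
    pos_term = np.log(sigmoid(c_pos @ w))
    # Negative pairs: push sigmoid(c_neg . w) towards 0
    neg_term = np.sum(np.log(sigmoid(-c_negs @ w)))
    return -(pos_term + neg_term)
```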

Weighted unigram sampling#

Smooth the unigram probabilities when sampling random (negative) words:

\[p_\alpha(w) = \frac{\mathrm{count}(w)^\alpha}{\sum_{w'}\mathrm{count}(w')^\alpha}\]
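
A small sketch of the smoothed distribution (word2vec uses α = 0.75); `smoothed_unigram_probs` is an illustrative helper name:

```python
import numpy as np

def smoothed_unigram_probs(counts, alpha=0.75):
    """Raise raw counts to alpha and renormalise; alpha < 1 boosts rare words."""
    freqs = np.asarray(counts, dtype=float) ** alpha
    return freqs / freqs.sum()

# Rare words get a relatively larger share than under the raw unigram distribution
print(smoothed_unigram_probs([100, 10, 1]))
```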

fastText#

A different neural embedding method that represents words as themselves + their subwords (character n-grams).

e.g. where -> [<wh, whe, her, ere, re>]
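
A quick sketch of extracting the boundary-marked trigrams (fastText also keeps the whole word <where> as its own token and uses n-gram lengths beyond 3):

```python
def char_ngrams(word, n=3):
    """Character n-grams of a word with boundary markers < and >."""
    marked = f"<{word}>"
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]

print(char_ngrams("where"))  # ['<wh', 'whe', 'her', 'ere', 're>']
```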

Exercises#

  • Derive the skip-gram loss