embedding
embedding refers to 2 things:
- a map from raw data to a vector (usually of lower dimension)
- the resulting vector of that map
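Both senses can be sketched in a few lines. The lookup table below is made up for illustration; real embeddings are learned, not hand-written:

```python
# Hypothetical table: each raw token maps to a 2-dimensional vector.
embedding_table = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "car": [0.1, 0.9],
}

def embed(token):
    """Sense 1: the map from raw data to a vector."""
    return embedding_table[token]

vector = embed("cat")  # sense 2: the resulting vector of that map
```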
1. training embedding
embedding could be trained/learned from data with a learning signal. However, that learning signal may not be relevant when the generated embedding is used: it can be just a means of producing the embedding, i.e. of having the raw data represented by their corresponding embedding vectors.
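A minimal sketch of this idea: learn 2-d word embeddings with a throwaway learning signal (predicting whether two words co-occur, in the spirit of skip-gram with negative samples). The corpus, pairs, and hyperparameters are made up; after training, the prediction task is discarded and only the vectors in `emb` are kept.

```python
import math
import random

random.seed(0)
vocab = ["cat", "dog", "car", "road"]
positive_pairs = [("cat", "dog"), ("car", "road")]  # co-occurring words
negative_pairs = [("cat", "car"), ("dog", "road")]  # non-co-occurring words

# Randomly initialized 2-d embedding for each word.
emb = {w: [random.uniform(-0.5, 0.5) for _ in range(2)] for w in vocab}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

lr = 0.5
for _ in range(500):
    for pairs, label in ((positive_pairs, 1.0), (negative_pairs, 0.0)):
        for a, b in pairs:
            # Learning signal: logistic loss on co-occurrence prediction.
            g = sigmoid(dot(emb[a], emb[b])) - label
            grad_a = [g * x for x in emb[b]]
            grad_b = [g * x for x in emb[a]]
            for i in range(2):
                emb[a][i] -= lr * grad_a[i]
                emb[b][i] -= lr * grad_b[i]

# The classifier is now thrown away; emb["cat"] etc. are the reusable
# representation, with co-occurring words ending up with similar vectors.
```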
2. link
Backlinks
websites worth noting
(good tools)
- better explained
- blog (mostly tutorials) of Kalid Azada, who believes in this (allegedly) Einstein quote: “If you can’t explain it simply, you don’t understand it well enough.”
- hypothes.is search
- a place where users’ highlights (of web pages) are publicly searchable. Great place to discover new websites and solid insights from comments.
- Nomic Atlas
- a tool to explore unstructured (raw text/img/…) datasets via embeddings of their entries. Numerous dataset visualizations, including one of decades of machine learning papers, are hosted publicly
with a foundation model embedding map (using an embedding model downloaded from an open-source large model):
- map titles (perhaps also abstracts?) of the literature into embeddings
- map the query string (“foundation model and …”) into an embedding
- search with vector similarity between the query string’s embedding and the embeddings of the literature titles.
Brought up by John Kitchen’s proof of concept of embedding literature titles.
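The recipe above can be sketched with cosine similarity. The title vectors and query vector below are made up; in practice both would come from the same open-source embedding model:

```python
import math

# Hypothetical precomputed embeddings of literature titles.
title_embeddings = {
    "Foundation models for chemistry": [0.9, 0.3, 0.1],
    "Deep learning in catalysis":      [0.7, 0.5, 0.1],
    "A history of thermodynamics":     [0.1, 0.2, 0.9],
}
# Hypothetical embedding of the query string "foundation model and ...".
query_embedding = [0.8, 0.3, 0.1]

def cosine(u, v):
    """Vector similarity: cosine of the angle between u and v."""
    num = sum(a * b for a, b in zip(u, v))
    den = (math.sqrt(sum(a * a for a in u))
           * math.sqrt(sum(b * b for b in v)))
    return num / den

# Rank titles by similarity to the query; the top hit is the search result.
ranked = sorted(title_embeddings,
                key=lambda t: cosine(query_embedding, title_embeddings[t]),
                reverse=True)
```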
transformer encoder
in a transformer encoder, several steps are performed:
- tokens are mapped to their embeddings (normally with a fully connected neural network layer and the identity as activation function)
- position of each token in the sentence is fused in with the transformer position encoder -> positioned embedding
- inter-token similarity/correlation is measured with a query neural net and a key neural net, using the positioned embeddings as input
  - first, for each positioned embedding, its query embedding and key embedding are computed with the respective neural nets
  - then the key embeddings are queried with the query embedding to output correlations, which are passed through a softmax to output a percentage of contribution/relation from each word in the sentence (including self)
  - this finally outputs a vector looking like [.4 .3 .3], giving each token’s influence on the token currently being computed: if we passed in the embedding of the second token in the sentence, this vector means the first token contributes to/represents .4 of the second token’s meaning, the second token .3, and the third .3. This is the self-attention vector
- self-attention is fused with the value neural net and the self-attention vector
  - another neural net, similar to query and key, maps the positioned embeddings to value embeddings
  - the value embeddings (one per token in the sentence) are then mixed with the self-attention vector ([.4 .3 .3] -> .4 * valueEmbedding1 + .3 * valueEmbedding2 + .3 * valueEmbedding3, where each valueEmbedding_n is a vector) to output the self-attention values (the self-attention embedding)
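The steps above can be sketched numerically for a 3-token sentence with 2-d embeddings. The matrices W_q, W_k, W_v stand in for the query, key, and value neural nets; all numbers are made up, and the 1/sqrt(d) score scaling used in real transformers is omitted for brevity:

```python
import math

# Hypothetical positioned embeddings (token embedding + position encoding).
positioned = [[1.0, 0.0],   # token 1
              [0.0, 1.0],   # token 2
              [1.0, 1.0]]   # token 3

# Stand-ins for the query, key, and value neural nets (linear maps).
W_q = [[1.0, 0.0], [0.0, 1.0]]
W_k = [[0.5, 0.0], [0.0, 0.5]]
W_v = [[1.0, 1.0], [0.0, 1.0]]

def matvec(W, x):
    return [sum(W[j][i] * x[i] for i in range(len(x))) for j in range(len(W))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

# Query, key, and value embeddings for every token.
Q = [matvec(W_q, x) for x in positioned]
K = [matvec(W_k, x) for x in positioned]
V = [matvec(W_v, x) for x in positioned]

# For token 2: query the keys for correlations, softmax them into
# the self-attention vector (contribution percentages summing to 1),
# then mix the value embeddings with those weights.
scores = [dot(Q[1], k) for k in K]
weights = softmax(scores)            # the self-attention vector
attended = [sum(w * v[i] for w, v in zip(weights, V))
            for i in range(2)]       # the self-attention embedding
```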
neural networks
- recurrent neural network
- embedding - to represent object/symbol with tensors.