embedding


embedding refers to 2 things:


1. training embedding

an embedding can be trained/learned from data with a learning signal. However, the learning signal may not be relevant when the generated embedding is used: it can be just a means to produce the embedding, so that the raw data end up represented by their corresponding embedding vectors.
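A minimal sketch of this idea, with a hypothetical co-occurrence signal ("cat" appears with "dog", not with "car") and a hand-written update rule standing in for gradient descent on a real loss. After training, the signal is discarded and only the vectors are kept:

```python
import numpy as np

# toy vocabulary with one 4-d embedding vector per word
vocab = ["cat", "dog", "car"]
emb = np.eye(3, 4)  # deterministic initial embeddings

# hypothetical learning signal: pull "cat" toward "dog" (they co-occur)
# and push it away from "car"; stands in for real gradient updates
for _ in range(100):
    emb[0] += 0.1 * (emb[1] - emb[0]) - 0.05 * (emb[2] - emb[0])

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# the signal is now irrelevant; the vectors alone represent the words
print(cos(emb[0], emb[1]) > cos(emb[0], emb[2]))  # True: cat closer to dog
```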

2. link

Backlinks

websites worth noting

(good tools)

better explained
blog (mostly tutorials) of Kalid Azad, who believes in this (allegedly Einstein) quote: “If you can’t explain it simply, you don’t understand it well enough.”
hypothes.is search
a place where user highlights are publicly searchable. great place to discover new websites and solid insights from comments.
Nomic Atlas
a tool to explore unstructured (raw text/img/…) datasets via embeddings of their entries. Numerous dataset visualizations, including one of decades of machine learning papers, are hosted publicly

with a foundation-model embedding map (downloaded from an open-source large model):

  1. map titles (perhaps abstracts?) of the literature into embeddings
  2. map the query string (“foundation model and …”) into an embedding
  3. search by vector similarity between the query string’s embedding and the embeddings of the literature titles.

Brought up by John Kitchin’s proof of concept on embedding literature titles.
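The three steps above can be sketched as follows. The titles are made up, and the bag-of-words `embed` function is a stand-in for a real foundation-model encoder (a real setup would call an open-source embedding model instead):

```python
import numpy as np

# stand-in for a foundation-model encoder: bag-of-words counts
def embed(text, vocab):
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

titles = [  # hypothetical literature titles
    "Attention is all you need",
    "Foundation models and their risks",
    "A survey of graph neural networks",
]
vocab = sorted({w for t in titles for w in t.lower().split()})
title_vecs = np.array([embed(t, vocab) for t in titles])  # step 1

def search(query):
    q = embed(query, vocab)                               # step 2
    # step 3: cosine similarity between query and title embeddings
    sims = title_vecs @ q / (
        np.linalg.norm(title_vecs, axis=1) * max(np.linalg.norm(q), 1e-9)
    )
    return titles[int(np.argmax(sims))]

print(search("foundation model and language"))
# -> Foundation models and their risks
```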

transformer encoder

in a transformer encoder, several steps are performed:

  1. tokens are mapped to their embeddings (normally via a fully connected neural network layer with identity as the activation function)
  2. the token’s position in the sentence is fused in by the transformer’s position encoder -> positioned embedding
  3. inter-token similarity/correlation is measured with a query neural net and a key neural net, using the positioned embeddings as input
    1. first, from each positioned embedding, its query embedding and key embedding are computed with the respective neural nets
    2. then the key embeddings are queried with the query embedding to output correlations, which are passed through a softmax to give each word’s percentage of contribution/relation among all words in the sentence (including the token itself)
    3. this finally outputs a vector like [.4 .3 .3], giving each token’s influence on the token currently being computed; so if we passed in the second token’s embedding, this vector means the first token contributes to/represents .4 of the second token’s meaning, the second token .3, and the third .3. This is the self-attention vector
  4. the self-attention vector is fused with a value neural net
    1. another neural net, similar to query and key, maps the positioned embeddings to value embeddings
    2. the value embeddings (one per token in the sentence) are then mixed according to the self-attention vector ([.4 .3 .3] -> .4 * valueEmbedding1 + .3 * valueEmbedding2 + .3 * valueEmbedding3, where each valueEmbeddingN is a vector), to output the self-attention values (the self-attention embedding)
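Steps 3 and 4 above can be sketched with single-head self-attention in NumPy; the sentence length (3 tokens), embedding dimension, and random weights are all illustrative assumptions, and the query/key/value "neural nets" are reduced to single linear maps:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                          # embedding dimension (illustrative)
X = rng.normal(size=(3, d))    # positioned embeddings of a 3-token sentence

# the query, key, and value "neural nets", reduced to linear maps
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d, d))
Wv = rng.normal(size=(d, d))

Q, K, V = X @ Wq, X @ Wk, X @ Wv         # steps 3.1 and 4.1

scores = Q @ K.T / np.sqrt(d)            # step 3.2: query the keys
attn = np.exp(scores)
attn /= attn.sum(axis=1, keepdims=True)  # step 3.3: softmax, each row sums to 1

out = attn @ V                           # step 4.2: mix the value embeddings

# each row of attn is a contribution vector like [.4 .3 .3];
# each row of out is one token's self-attention embedding
print(attn.shape, out.shape)  # (3, 3) (3, 4)
```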

neural networks

Author: Linfeng He

Created: 2024-04-03 Wed 20:21