Tan, Zhen and Beigi, Alimohammad and Wang, Song and Guo, Ruocheng and Bhattacharjee, Amrita and Jiang, Bohan and Karami, Mansooreh and Li, Jundong and Cheng, Lu and Liu, Huan ::: Large Language Models for Data Annotation: A Survey
Table of Contents
- 1. Core Aspects
- 2. Other Contents
- 3. Typical Data Annotation Tasks
- 4. Prompt and Tuning Techniques for LLMs
- 5. LLM-Based Data Annotation
- 6. Assessing LLM-Generated Annotations
- 7. Learning with LLM-Generated Annotations
- Bibliography
- Backlinks
1. Core Aspects
- LLM-Based Data Annotation
- Assessing LLM-Generated Annotations
- Learning with LLM-Generated Annotations
2. Other Contents
- a taxonomy of methods for using LLMs for data annotation
- a review of learning strategies for models trained on LLM-generated annotations
- challenges and limitations of using LLMs for data annotation
3. Typical Data Annotation Tasks
- [basic classification] categorizing raw data -> class/task label
- [depth] intermediate labels for contextual depth (Yu et al. 2022) [1]
- [reliability] assigning confidence scores to gauge annotation reliability (Lin et al. 2022) [2]
- [output engineering] applying alignment/preference labels to tailor outputs to requirements (industrial criteria, user needs)
- annotating entity relationships
- marking semantic roles - the role an entity plays in a sentence
- tagging temporal sequences to capture the order of events (a combined example follows this list)
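To make these layers concrete, one annotated record can carry several of them at once. A minimal illustration in Python (the field names are mine, not the survey's):

```python
# One hypothetical record combining several annotation layers from the
# list above: a class label, an intermediate rationale for contextual
# depth, a confidence score, and an alignment/preference tag.
annotated_record = {
    "text": "The driver honked for a full minute at a green light.",
    "label": "angry driver",                    # basic classification
    "rationale": "prolonged honking signals frustration",  # contextual depth
    "confidence": 0.92,                         # annotation reliability
    "preference_tag": "meets_style_guideline",  # alignment/preference label
}
```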
4. Prompt and Tuning Techniques for LLMs
- Input-Output Prompting - query the model with a plain prompt
- In-Context Learning - include demonstrations in the prompt
- Chain-of-Thought Prompting - elicit an explicit reasoning pathway (the first three formats are sketched after this list)
- Instruction Tuning - fine-tune the model on examples that begin with a task instruction, so it learns to follow instructions
- Alignment Tuning - sample many outputs, have humans label the good ones, and tune the model toward the preferred behaviour
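A rough sketch of what the first three prompt formats look like as raw strings (the sentiment task and wording are illustrative, not from the survey):

```python
# Input-Output prompting: just the query, nothing else.
io_prompt = "Classify the sentiment of: 'The movie was dull.'\nSentiment:"

# In-Context Learning: prepend labeled demonstrations to the query.
icl_prompt = (
    "Review: 'Loved every minute.' Sentiment: positive\n"
    "Review: 'A complete waste of time.' Sentiment: negative\n"
    "Review: 'The movie was dull.' Sentiment:"
)

# Chain-of-Thought: ask for the reasoning pathway before the final label.
cot_prompt = (
    "Classify the sentiment of: 'The movie was dull.'\n"
    "Let's reason step by step, then give the final label."
)
```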
5. LLM-Based Data Annotation
The annotation task can be described as \(F(x) = y\), where
- \(F\) represents the process of the LLM receiving a prompt \(x\) and generating an output \(y\)
- \(x\) is the dataset being annotated, or a single data point from it, e.g. \((1, 22.3, 0.1)\)
- \(y\) is the label the LLM produces for the data point(s), e.g. "class 1" for a single point, or ("class 1", "class 1", "class 2") for a batch
The key point is that the generated labels should align with common sense, i.e., the annotation should split the dataset into the kind of categories a human would use, such as "angry driver" vs. "sunday driver".
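A minimal sketch of \(F\) in code, assuming a generic `call_llm(prompt) -> str` helper and an invented two-label task:

```python
from typing import Callable

def annotate(x: str, call_llm: Callable[[str], str],
             labels=("angry driver", "sunday driver")) -> str:
    """F: wrap the data point x in a prompt and return the LLM's label y."""
    prompt = (
        f"Classify the following description into one of {list(labels)}.\n"
        f"Description: {x}\nLabel:"
    )
    y = call_llm(prompt).strip()
    return y if y in labels else labels[0]  # naive fallback for off-list outputs

# y = annotate("tailgates and honks at everyone", call_llm=my_model)
```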
5.1. Manually Engineered Prompts
5.1.1. Zero-Shot
no demonstrations in the prompt
- ZeroGen - generate a dataset from scratch with PLMs (Ye et al. 2022) [3] (sketch below)
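A rough sketch of the ZeroGen idea: prompt the PLM with only the desired label and let it synthesize the matching input, yielding a synthetic training set (the template and `call_llm` helper are assumptions, not the paper's API):

```python
# Zero-shot dataset generation: no demonstrations, only a label-conditioned
# instruction; each generated text becomes a synthetic (x, y) pair.
def generate_dataset(call_llm, labels=("positive", "negative"), n_per_label=100):
    data = []
    for label in labels:
        for _ in range(n_per_label):
            text = call_llm(f"Write a movie review expressing {label} sentiment:\n")
            data.append((text.strip(), label))
    return data
```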
5.1.2. Few-Shot
with demonstrations included in the prompt
- the selection of demonstration samples is crucial
- let GPT-3 select random samples from the training set as demonstrations [4]
- use another LLM to score the potential usefulness of demonstration samples
- incorporate other types of annotations into ICL
- SuperICL - feed confidence scores from a small language model into the demonstrations (selection sketch below)
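A minimal sketch of usefulness-scored demonstration selection, assuming a hypothetical `score_demo(demo, query)` function (e.g. another LLM rating each candidate):

```python
# Rank candidate demonstrations by estimated usefulness for this query,
# keep the top k, and prepend them to build the few-shot prompt.
def build_few_shot_prompt(query, candidates, score_demo, k=4):
    ranked = sorted(candidates, key=lambda d: score_demo(d, query), reverse=True)
    demos = "\n".join(f"Input: {x}\nLabel: {y}" for x, y in ranked[:k])
    return f"{demos}\nInput: {query}\nLabel:"
```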
5.2. Alignment via Pairwise Feedback
align LLM behaviour with human preferences.
5.2.1. Human Feedback
- humans give feedback on / rate LLM responses - quite expensive, takes a lot of effort
5.2.2. Automated Feedback
- an LLM functioning as a reward model
Furthermore, Askell et al. (2021) [5] evaluated different training goals for the reward model, finding that ranked preference modeling tends to improve with model size more effectively than imitation learning.
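The usual objective behind ranked preference modeling is a pairwise (Bradley-Terry style) loss over comparison pairs; a minimal PyTorch sketch, not tied to any specific paper cited here:

```python
import torch
import torch.nn.functional as F

def pairwise_preference_loss(r_chosen: torch.Tensor,
                             r_rejected: torch.Tensor) -> torch.Tensor:
    # r_chosen / r_rejected: scalar rewards the model assigns to the
    # preferred and dispreferred response of each comparison pair.
    # Minimizing -log sigmoid(r_chosen - r_rejected) trains the reward
    # model to rank preferred responses higher.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```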
6. Assessing LLM-Generated Annotations
6.1. Evaluating LLM-Generated Annotations
6.1.1. General Approaches
6.1.2. Task-Specific Evaluations
6.2. Data Selection via Active Learning
6.2.1. LLM as Acquisition Functions
6.2.2. LLM as Oracle Annotators
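A heavily simplified loop showing both roles, assuming hypothetical `llm_score` (informativeness estimate) and `llm_annotate` (label generator) helpers:

```python
# One active-learning round: the LLM acts as acquisition function
# (pick the most informative unlabeled points) and as oracle annotator
# (label the picked points in place of a human).
def active_learning_round(unlabeled, labeled, llm_score, llm_annotate, budget=10):
    picked = sorted(unlabeled, key=llm_score, reverse=True)[:budget]
    labeled += [(x, llm_annotate(x)) for x in picked]
    return [x for x in unlabeled if x not in picked], labeled
```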
7. Learning with LLM-Generated Annotations
7.1. Target Domain Inference
7.1.1. Predicting Labels
7.1.2. Inferring Additional Attributes
7.2. Knowledge Distillation
7.2.1. Model Enhancement
7.2.2. KD Innovations
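A sketch of vanilla soft-label distillation, where a small student is trained against the larger teacher/annotator's output distribution; temperature and loss mix are conventional defaults, not values from the survey:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      T=2.0, alpha=0.5):
    # Soft part: match the teacher's temperature-softened distribution.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    # Hard part: ordinary cross-entropy on the (LLM-generated) labels.
    hard = F.cross_entropy(student_logits, hard_labels)
    return alpha * soft + (1 - alpha) * hard
```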
7.3. Harnessing LLM Annotations for Fine-Tuning and Prompting
7.3.1. In-Context Learning
7.3.2. Chain-of-Thought Prompting
7.3.3. Instruction Tuning
7.3.4. Alignment Tuning
Bibliography
[1]
W. Yu et al., “Generate rather than retrieve: Large language models are strong context generators,” Arxiv, vol. abs/2209.10063, 2022, Available: https://api.semanticscholar.org/CorpusID:252408513
[2]
S. C. Lin, J. Hilton, and O. Evans, “Teaching models to express their uncertainty in words,” Transactions on Machine Learning Research, 2022, Available: https://api.semanticscholar.org/CorpusID:249191391
[3]
J. Ye et al., “ZeroGen: Efficient Zero-shot Learning via Dataset Generation,” Proceedings of the 2022 conference on empirical methods in natural language processing, 2022.
[4]
R. Shin et al., “Constrained Language Models Yield Few-Shot Semantic Parsers,” Proceedings of the 2021 conference on empirical methods in natural language processing, pp. 7699–7715, 2021, doi: 10.18653/v1/2021.emnlp-main.608.
[5]
A. Askell et al., “A general language assistant as a laboratory for alignment,” Arxiv, vol. abs/2112.00861, 2021, Available: https://api.semanticscholar.org/CorpusID:244799619
[6]
P. Liu, W. Yuan, J. Fu, Z. Jiang, H. Hayashi, and G. Neubig, “Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing,” Acm computing surveys, vol. 55, no. 9, pp. 1–35, Sep. 2023, doi: 10.1145/3560815.
Backlinks
survey
(example)
- Tan: Large Language Models for Data Annotation: A Survey
- Aubret, Arthur: An Information-Theoretic Perspective on Intrinsic Motivation in Reinforcement Learning: A Survey
- a survey of various of designs of mob farms in minecraft
- a survey of learning strategies(mindmap, speed reading, zettelkasten, flash card, intermittent spacing, active recall, survey, practice and drill,…)
- Tan, Zhen, et al ::: Large Language Models for Data Annotation: A Survey
- [6] - this survey is very good