Tan, Zhen and Beigi, Alimohammad and Wang, Song and Guo, Ruocheng and Bhattacharjee, Amrita and Jiang, Bohan and Karami, Mansooreh and Li, Jundong and Cheng, Lu and Liu, Huan ::: Large Language Models for Data Annotation: A Survey

Table of Contents

(screenshot of the paper's table of contents)

1. core aspects

  • LLM-Based Data Annotation
  • Assessing LLM-generated annotations
  • learning with LLM-generated annotations

2. other content

  • taxonomy of methods for using LLMs for data annotation
  • review of learning strategies for models trained with LLM-generated annotations
  • challenges and limitations of using LLMs for data annotation

3. typical Data annotation tasks

  1. [basic classification] categorizing raw data -> class/task label
  2. [depth] intermediate labels for contextual depth (Yu et al. 2022) [1]
  3. [reliability] assigning confidence scores to gauge annotation reliability (Lin et al. 2022) [2] (see the record sketch after this list)
  4. [output engineering] applying alignment/preference labels to tailor outputs to requirements (industrial criteria, user needs)
  5. annotating entity relationships
  6. marking semantic roles - the role an entity plays in a sentence
  7. tagging temporal sequences to capture the order of events
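
A minimal sketch of what one annotated record could look like if the first three task types above were combined; field names and values are illustrative, not from the survey:

  # hypothetical annotation record: class label (task 1), intermediate
  # rationale (task 2), and confidence score (task 3)
  annotated_example = {
      "text": "The driver honked for a full minute at a green light.",
      "label": "angry driver",                                # basic classification
      "rationale": "Prolonged honking signals frustration.",  # contextual depth
      "confidence": 0.87,                                     # reliability
  }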

4. prompt and tuning techniques for LLMs

  • Input-Output Prompting - plain prompt in, answer out
  • In-Context Learning - give demonstrations in the prompt
  • Chain-of-Thought Prompting - elicit a reasoning pathway before the answer (see the template sketch after this list)
  • Instruction Tuning - fine-tune the model on (instruction, response) pairs
  • Alignment Tuning - generate a bunch of outputs, have humans label the good ones, and tune on that feedback
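
The first three techniques differ only in what the prompt contains. A rough sketch with made-up wording (not templates from the survey):

  # illustrative prompt styles for an annotation task
  input_output = "Classify the sentiment of: 'Great battery life.'\nLabel:"

  in_context = (
      "Review: 'Terrible screen.' Label: negative\n"   # demonstration 1
      "Review: 'Love the camera.' Label: positive\n"   # demonstration 2
      "Review: 'Great battery life.' Label:"           # the query
  )

  chain_of_thought = (
      "Review: 'Great battery life.'\n"
      "Think step by step about the reviewer's attitude, "
      "then give a one-word sentiment label."
  )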

5. LLM-Based Data Annotation

The annotation task can be described as \(F(x) = y\), where

  • \(F\) represents the LLM receiving a prompt \(x\) and generating an output \(y\)
  • \(x\) is the dataset being annotated, or a single data point from it, e.g. \((1, 22.3, 0.1)\)
  • \(y\) is the label the LLM produces for the data point(s), e.g. class 1, or (class 1, class 1, class 2) for a batch

The key point is that the generated labels should align with common sense, i.e. the annotation should split the dataset into categories a human would also use, like “angry driver” and “Sunday driver”.
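
A minimal sketch of \(F(x) = y\) in code; `query_llm` is a placeholder for whatever completion API is available, not a real library call:

  # F(x) = y: build a prompt from data point x, let the LLM return label y
  def annotate(x, classes=("class 1", "class 2")):
      prompt = (
          f"Assign exactly one of the labels {list(classes)} "
          f"to the following data point:\n{x}\nLabel:"
      )
      y = query_llm(prompt).strip()  # hypothetical LLM call
      return y

  # labels = [annotate(x) for x in dataset]
  # e.g. ["class 1", "class 1", "class 2"]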

5.1. manually engineered prompts

5.1.1. zero-shot

no demonstration

  • ZeroGen [3] - generate a dataset from scratch with pretrained language models (PLMs)
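
A rough sketch of the ZeroGen idea [3]: instead of labelling existing data, prompt the PLM with a label and let it generate the input text, yielding synthetic (text, label) pairs with no demonstrations. `query_llm` is the same hypothetical call as above:

  # zero-shot dataset generation: the label conditions the generation
  def zerogen_pair(label):
      prompt = f"Write a movie review expressing {label} sentiment:\n"
      text = query_llm(prompt)  # hypothetical LLM call
      return text, label

  # synthetic_dataset = [zerogen_pair(l) for l in ("positive", "negative") * 500]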

5.1.2. few-shot

with demonstrations, where

  • selection of demonstration samples is crucial
    • let GPT-3 select random samples from the training set as demonstrations [4]
    • use another LLM to score the potential usefulness of demonstration samples (sketched after this list)
    • incorporate other types of annotations into ICL
      • SuperICL - adds confidence scores from a small language model to the demonstrations
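
A sketch of the scorer-LLM strategy above; `score_with_llm` stands in for a prompt that asks a second LLM to rate a candidate demonstration, e.g. on a 1-10 usefulness scale:

  # keep the k candidate demonstrations the scorer LLM rates highest
  def select_demonstrations(candidates, k=4):
      scored = [(score_with_llm(c), c) for c in candidates]  # hypothetical scorer
      scored.sort(key=lambda pair: pair[0], reverse=True)
      return [demo for _, demo in scored[:k]]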

5.2. alignment via pairwise feedback

align LLM behaviour with human preferences.

5.2.1. human feedback

  • humans give feedback on / rate LLM responses - quite expensive, takes a lot of effort

5.2.2. automated feedback

  • an LLM functioning as a reward model

Furthermore, Askell et al. (2021) [5] evaluated different training goals for the reward model, discovering that ranked preference modeling tends to improve with model size more effectively than imitation learning.
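
Ranked preference modeling typically trains the reward model to score the preferred response above the rejected one. A minimal sketch of that pairwise (Bradley-Terry style) loss; shapes and names are illustrative:

  import torch
  import torch.nn.functional as F

  def pairwise_preference_loss(r_chosen, r_rejected):
      # r_chosen, r_rejected: reward-model scores for the preferred and
      # rejected responses, shape (batch,); minimising this pushes
      # r_chosen above r_rejected
      return -F.logsigmoid(r_chosen - r_rejected).mean()

  loss = pairwise_preference_loss(torch.tensor([1.2]), torch.tensor([0.3]))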

6. Assessing LLM-Generated Annotations

6.1. Evaluating LLM-generated annotations

6.1.1. general approaches

6.1.2. task-specific evaluations

6.2. data selection via active learning

6.2.1. LLM as Acquisition Functions

6.2.2. LLM as Oracle Annotators

7. learning with LLM-generated annotations

7.1. target domain inference

7.1.1. Predicting Labels

7.1.2. inferring additional attributes

7.2. knowledge distillation

7.2.1. model enhancement

7.2.2. KD innovations

7.3. harnessing LLM annotation for fine-tuning and prompting

7.3.1. In-Context Learning

7.3.2. Chain-of-Thought Prompting

7.3.3. Instruction Tuning

7.3.4. Alignment Tuning

Bibliography

[1]
W. Yu et al., “Generate rather than retrieve: Large language models are strong context generators,” Arxiv, vol. abs/2209.10063, 2022, Available: https://api.semanticscholar.org/CorpusID:252408513
[2]
S. C. Lin, J. Hilton, and O. Evans, “Teaching models to express their uncertainty in words,” Transactions on Machine Learning Research, 2022, Available: https://api.semanticscholar.org/CorpusID:249191391
[3]
J. Ye et al., “ZeroGen: Efficient Zero-shot Learning via Dataset Generation,” Proceedings of the 2022 conference on empirical methods in natural language processing, 2022, Available: https://arxiv.org/abs/2202.07922
[4]
R. Shin et al., “Constrained Language Models Yield Few-Shot Semantic Parsers,” Proceedings of the 2021 conference on empirical methods in natural language processing, pp. 7699–7715, 2021, doi: 10.18653/v1/2021.emnlp-main.608.
[5]
A. Askell et al., “A general language assistant as a laboratory for alignment,” Arxiv, vol. abs/2112.00861, 2021, Available: https://api.semanticscholar.org/CorpusID:244799619
[6]
P. Liu, W. Yuan, J. Fu, Z. Jiang, H. Hayashi, and G. Neubig, “Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing,” Acm computing surveys, vol. 55, no. 9, pp. 1–35, Sep. 2023, doi: 10.1145/3560815.


Author: Linfeng He

Created: 2024-04-03 Wed 20:16