In search of linguistic concepts: investigating BERT's context vectors
A downloadable research paper
Vision Transformers, like other neural networks for image classification, have long been amenable to interpretation: techniques such as guided backpropagation let us visualise what drives a network's predictions. Simply by inspecting these visualisations, it is clear that individual neurons in vision transformers learn to detect features that align with human-interpretable features we might ourselves use for classification.
Language models have lacked this kind of interpretability because linguistic patterns resist intuitive visualisation: they cannot easily be 'averaged over' to produce a representative sample of a linguistic feature. There is no human-understandable 'average' humorous sentence. As a result, we have been unable to tell whether the neurons in language models likewise learn and represent linguistic features that align with human concepts we might use for classification.
We investigate whether large language models learn and represent linguistic features, some of which correspond to human-interpretable linguistic concepts such as 'sarcasm', 'formality', and 'humour', that are intuitively relevant to a given classification task.
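As a sketch of what such an investigation can look like (not necessarily the method used in this work), the snippet below extracts BERT context vectors with the HuggingFace transformers library and fits a linear probe for a concept label. The 'formality' examples, their labels, and the mean-pooling choice are illustrative assumptions.

import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def context_vector(sentence: str) -> torch.Tensor:
    """Mean-pool the final-layer hidden states into one sentence vector."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

# Hypothetical labelled examples for a 'formality' concept.
sentences = ["Dear Sir or Madam, I write to enquire about...", "hey what's up"]
labels = [1, 0]  # 1 = formal, 0 = informal

X = torch.stack([context_vector(s) for s in sentences]).numpy()
probe = LogisticRegression().fit(X, labels)

High probe accuracy on held-out data would suggest the concept is (linearly) encoded in BERT's context vectors; in practice this would be run over a much larger labelled set with a proper train/test split.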
Status | Released
Category | Other
Author | roksanagow