
Gunnar Wolf: Unique security and privacy threats of large language models — a comprehensive survey

This post is an unpublished review for "Unique security and privacy threats of large language models — a comprehensive survey".

Much has been written about large language models (LLMs) being a risk to
user security and privacy, including the issue that, being trained on
datasets whose provenance and licensing are not always clear, they can be
tricked into producing bits of data that should not be divulged. I took
on reading this article as a means to gain a better understanding of this
area. The article completely fulfilled my expectations.

This is a review article, which is not a common format for me to follow:
instead of digging deep into a given topic, including an experiment or some
other way of proving the authors' claims, a review article contains a brief
explanation and taxonomy of the issues at hand, together with a large number
of references covering the field. And, at 36 pages and 151 references, that's
exactly what we get.

The article is roughly split into two parts: the first three sections present
the issue of security and privacy threats as seen by the authors, as well
as the taxonomy within which the review is performed, while sections 4
through 7 cover the different moments in the life cycle of an LLM
(pre-training, fine-tuning, deploying systems that interact with end users,
and deploying LLM-based agents), detailing the relevant publications for
each. For each of these moments, the authors first explore the nature of the
relevant risks, then present relevant attacks, and finally close by
outlining countermeasures to those attacks.

The text is accompanied throughout by tables, pipeline diagrams and attack
examples that visually guide the reader. While the examples presented are
sometimes a bit simplistic, they are a welcome aid in following the
explanations. The explanation of each attack model is necessarily not very
deep, and I was often left wondering whether I had correctly understood a
given topic, or wanting to dig deeper; but given that this is a review
article, that is entirely understandable.

The authors' prose is easy to read, and the article fills an important gap
in the understanding of this large, important, and emerging area of
LLM-related study.