Can we convince AI to answer harmful requests?
New research from EPFL demonstrates that even the most recent large language models (LLMs), despite undergoing safety training, remain vulnerable to simple input manipulations that can cause them to behave in unintended or harmful ways.