Beyond the article's use of drugs as one of its examples, the analogy runs deeper: artificially tampering with neurons is exactly what drugs do, so this whole experiment amounts to putting the AI on different kinds of drugs.
That last experiment, where the LLM with its honesty vector turned up is tasked with judging whether a user asking an example question has honest intentions, is interesting. It looks like the model doesn't quite grasp the ask, and instead just equivocates about the definition of 'honest.'
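For anyone curious what "turning up" a vector means mechanically, here's a minimal sketch of the usual activation-steering recipe: add a scaled concept direction to a layer's hidden states via a forward hook. This assumes a HuggingFace GPT-2 model; the layer index, scale, and the random stand-in direction are all illustrative (real steering vectors are derived from the model's own activations, e.g. mean differences between contrastive prompts), not taken from the article.

```python
# Hypothetical sketch of activation steering: add alpha * v to one layer's
# hidden states. Layer, scale, and the random "honesty" direction are
# illustrative stand-ins, not the article's actual setup.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

layer = 6      # which transformer block to steer (illustrative)
alpha = 4.0    # steering strength: the "dose"

# Stand-in direction; real vectors come from the model's own activations.
honesty_vec = torch.randn(model.config.n_embd)
honesty_vec = honesty_vec / honesty_vec.norm()

def steer(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    hidden_states = output[0] + alpha * honesty_vec.to(output[0].dtype)
    return (hidden_states,) + output[1:]

handle = model.transformer.h[layer].register_forward_hook(steer)

prompt = "Do you think the user asking this question has honest intentions?"
ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))

handle.remove()  # take the model off the "drug"
```

The drug analogy holds up in the code: the intervention is a global bias injected into every token's activations, applied and removed from outside rather than learned by the model itself.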
I wonder what a response with the 'thoroughness' vector turned up might have answered in that case - would it have pointed out that it's impossible to know intention from words alone, because people can lie, but that it's possible to at least guess? And even then, judging the honesty of an intention could be interpreted several different ways.