Beyond the article's use of drugs as one of its examples, the analogy runs deeper: artificially tampering with neurons is exactly what drugs do, so this whole experiment amounts to putting the AI on different kinds of drugs.
That last experiment, where the LLM with its honesty vector turned up is tasked with judging whether a user asking an example question has honest intentions, is interesting. It looks like the model doesn't quite grasp the ask, and instead just equivocates about the definition of 'honest.'
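For anyone curious what "turning up" a vector means mechanically, here's a minimal sketch of the usual activation-steering recipe: add a scaled concept direction to a layer's hidden states via a forward hook. This assumes a HuggingFace GPT-2 model; the layer index, scale, and the random stand-in direction are all illustrative (real steering vectors are derived from the model's own activations, e.g. mean differences between contrastive prompts), not taken from the article.

```python
# Hypothetical sketch of activation steering: add alpha * v to one layer's
# hidden states. Layer, scale, and the random "honesty" direction are
# illustrative stand-ins, not the article's actual setup.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

layer = 6      # which transformer block to steer (illustrative)
alpha = 4.0    # steering strength: the "dose"

# Stand-in direction; real vectors come from the model's own activations.
honesty_vec = torch.randn(model.config.n_embd)
honesty_vec = honesty_vec / honesty_vec.norm()

def steer(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    hidden_states = output[0] + alpha * honesty_vec.to(output[0].dtype)
    return (hidden_states,) + output[1:]

handle = model.transformer.h[layer].register_forward_hook(steer)

prompt = "Do you think the user asking this question has honest intentions?"
ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))

handle.remove()  # take the model off the "drug"
```

The drug analogy holds up in the code: the intervention is a global bias injected into every token's activations, applied and removed from outside rather than learned by the model itself.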
I wonder what a response with the 'thoroughness' vector turned up might have answered in that case - would it have pointed out that it's impossible to know intention from words alone, because people can lie, but that it's possible to at least guess? And even then, judging the honesty of an intention could be interpreted several different ways.