I'm surprised the article doesn't mention the biggest use of steering vectors, which is the potential to remove refusals from models (a.k.a. abliteration or uncensoring).
There was an earlier paper that found that "most refusals are on a single vector", and you can identify and "nerf" that vector so the model will skip refusals and answer "any" request normally. This was very doable for earlier models trained with SFT for refusals, seems to be a bit more complicated for newer models, but still doable to some extent.
There are already some libraries to automate this process and reduce refusals, but usually they focus on identifying and then modifying the models and releasing them as uncensored models. This technique of steering lets you enable this vector changing dynamically, so you don't need to change models if the abliteration process somehow hurts accuracy on other unrelated tasks.
> inspired to write this post by antirez’s recent project DwarfStar 4, which is a version of llama.cpp that’s been stripped down to run only DeepSeek-V4-Flash
This is not true, it is its own project.
Indebted to llama.cpp, sure, but not a stripped down version
>instead of fiddling with the prompt (adding or removing qualifiers like “you MUST”), couldn’t we just have a control panel of sliders like “succinctness/verbosity” or “conscientiousness/speed” and move them around directly?
how is the latter better. i am not getting why figuring out steering vector is useful vs just typing in prompts.
There was an earlier paper that found that "most refusals are on a single vector", and you can identify and "nerf" that vector so the model will skip refusals and answer "any" request normally. This was very doable for earlier models trained with SFT for refusals, seems to be a bit more complicated for newer models, but still doable to some extent.
There are already some libraries to automate this process and reduce refusals, but usually they focus on identifying and then modifying the models and releasing them as uncensored models. This technique of steering lets you enable this vector changing dynamically, so you don't need to change models if the abliteration process somehow hurts accuracy on other unrelated tasks.
This is not true, it is its own project.
Indebted to llama.cpp, sure, but not a stripped down version
how is the latter better. i am not getting why figuring out steering vector is useful vs just typing in prompts.