By Shawn Graham.
For many years, I built agent-based models of various phenomena based on archaeological data. These are models of interacting software agents in a simulated environment, with many different parameters; the first step in understanding what your model implies for the culture under study is to understand how your model behaves. That is to say, history was run but once, but given a set of x, y, z conditions in the past, there was a whole range of possible outcomes. So you run the model hundreds of times at each combination of variables. You account for the randomness, the contingency, of your model by running it over and over again, and by doing that for every combination of variables, you end up with a landscape of possible outcomes. (Then you’d match your actual archaeological evidence against this landscape and see which combination(s) of values map best to your observations, and in this way you’d know something new about the phenomenon.)
So why can’t we do that with large language models? They’re probabilistic! They’re stochastic! Why don’t we run them over and over and over again, varying temperature and top-p and other values, so that we end up with a landscape of possibilities for particular prompts?
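For readers who haven't met these two knobs, here is a minimal sketch of what they do to a next-token distribution, using the textbook definitions of temperature scaling and nucleus (top-p) filtering. The function names and the toy logits are made up for illustration, and how any particular hosted model applies these internally is an assumption on my part.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose mass reaches top_p,
    zero out the rest, and renormalise what is kept."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three tokens
    for t in (0.5, 1.0, 2.0):
        probs = apply_temperature(toy_logits, t)
        print(t, [round(p, 3) for p in probs], top_p_filter(probs, 0.8))
```

Low temperature piles probability onto the likeliest token and low top-p trims the tail, so both shrink the space of outputs the model can wander into. That is why they are the natural axes for a sweep.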
I wrote up a quick notebook using the Cohere API. Give it a prompt and a range of temperature and top-p values to test, and it’ll generate several iterations of responses at each setting. Then it measures the cosine similarity of every pair of responses. Et voilà: the behaviour space of the command-r model when given the prompt ‘Write a haiku about Bob Dole’. You could write a library of ‘standard’ prompts, without all of the alchemical extras, and see how models compare. What might we find?
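The shape of that experiment can be sketched in plain Python. This is not the notebook's code: `fake_generate` is a hypothetical stand-in for a real model call (it is not the Cohere API), the TF-IDF weighting is my assumption about how responses get turned into vectors, and all the function names are invented for illustration.

```python
import itertools
import math
import random
from collections import Counter


def fake_generate(prompt, temperature, top_p, rng):
    """Hypothetical stand-in for a model call; swap in a real client here."""
    vocab = ["senator", "kansas", "autumn", "podium", "wheat", "bob", "dole", "wind"]
    # Toy behaviour: higher temperature and top_p widen the pool of words.
    pool = max(2, round(len(vocab) * min(1.0, temperature) * top_p))
    return " ".join(rng.choice(vocab[:pool]) for _ in range(6))


def tfidf_vectors(texts):
    """Sparse TF-IDF vectors (dicts) using a smoothed idf."""
    docs = [Counter(t.lower().split()) for t in texts]
    n = len(docs)
    df = Counter(word for doc in docs for word in doc)  # document frequency
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in df}
    return [{w: count * idf[w] for w, count in doc.items()} for doc in docs]


def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_pairwise_similarity(texts):
    """Average cosine similarity over every pair of texts (needs >= 2 texts)."""
    vectors = tfidf_vectors(texts)
    pairs = list(itertools.combinations(vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def sweep(prompt, temperatures, top_ps, runs, generate, seed=0):
    """Run the prompt `runs` times at every (temperature, top_p) combination."""
    rng = random.Random(seed)
    landscape = {}
    for t, p in itertools.product(temperatures, top_ps):
        responses = [generate(prompt, t, p, rng) for _ in range(runs)]
        landscape[(t, p)] = mean_pairwise_similarity(responses)
    return landscape


if __name__ == "__main__":
    grid = sweep("Write a haiku about Bob Dole",
                 temperatures=[0.2, 0.6, 1.0], top_ps=[0.5, 1.0],
                 runs=10, generate=fake_generate)
    for (t, p), score in sorted(grid.items()):
        print(f"temperature={t} top_p={p} mean similarity={score:.3f}")
```

Each cell of the resulting grid is one point in the behaviour space: a score near 1 means the model says much the same thing every time at that setting, and a lower score means the responses scatter. Keeping the generator as a plain function argument means the same sweep can be pointed at different models for comparison.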
Notebook Link. Update, Apr 16: there’s something a bit wonky with my code for generating variations, so when I have a moment I’ll delve back into it. In the meantime, Max Woolf conducted a similar kind of experiment on slight modulations to prompts. I’ll probably use his code for generating variations, modify it to sweep parameters, then bolt the tf-idf on as a scorer. See his post here.