AutoExpert v5 Custom Instructions for ChatGPT
A deep dive into the magic behind my latest version of ChatGPT AutoExpert (Standard Edition), a set of “Custom Instructions” that elevates every single response.
IMPORTANT UPDATE:
⚠️ The release version of the AutoExpert v5 “custom instructions” is indeed available right now, for free, for all readers. The GPT-4 version is at the bottom of this post, but both the GPT-3.5 and GPT-4 versions are available now on Github.
Note: In this (quite lengthy) article, I’ll refer back to “attention heads” a few more times. LLM nerds, sit tight: I’m still taking some poetic license here to help folks grok how attention mechanisms work in transformer models.
While I’m eliding a lot of the math around the actual attention mechanism in transformer models, the effective behavior of them aligns well with my simplifications. It’s not yet time to dive into scaled dot-products and softmax functions.
We’re still in the “fun” phase of our adventure across the LLM frontier!
I’m going to share with you the magic behind one of my most popular prompts, the ChatGPT AutoExpert (Standard Edition). This isn’t just for the developers out there, either, like the “Developer Edition” I announced here recently.
These “Custom Instructions” are for everybody.
If you’re a paid subscriber, you can get access to them now, before it’s announced to Reddit, and before I post it to Github. They’re at the bottom of the article.
Free subscribers, don’t fret: Most of this post explains why Custom Instructions can be so powerful. And even though it focuses on my latest version—accessible to paid subscribers today—you can check out the previous version ChatGPT AutoExpert (Standard Edition) on Github right now, and most of what you’ll see in this post will still apply. Some of the headings will be different, but I’ll try to point those out where I can.
I’ll be releasing these updated custom instructions to everyone this weekend. But if you like reading these articles, and you want to support me, why not upgrade to a paid subscription? Just scroll to the bottom, you’ll see an option to upgrade, and once you do, the custom instructions will be there waiting.
Let’s dive in! I’ll assume you have already explored the previous version of AutoExpert, or at least, that you know how to set up Custom Instructions in ChatGPT. So I’m going to go through each heading of the ChatGPT AutoExpert (Standard Edition) Custom Instructions, starting with…
Important: I’ll try to use the emoji ⚠️ when referring to parts of my new custom instructions that are different/new/missing when compared to the version currently on Github. I will be going through the text again over the next day to make sure I caught them all. Paid subscribers can disregard those flags, since you’ll have access to the new one at the bottom of this post.
I’ll also use the emoji #️⃣ whenever I’m referring to a “section” or “heading” in the ChatGPT AutoExpert Custom Instructions. Note that the sections/headings may be different if you’re not using the newest version included in this post for paid subscribers.
I wrote AutoExpert to guide ChatGPT into generating the depth and nuance that is most likely to resonate with you, even if you don’t write out a paragraph-long question in your first chat message. It’s added to ChatGPT using a feature called “Custom Instructions”, which ChatGPT refers to as your “user profile”.
The ChatGPT “User Profile”
ChatGPT adds a preamble to your “Custom Instructions” (🗒️ CI from here on out) that suggests your carefully crafted instructions won’t be relevant to 99% of your requests:
The user provided the following information about themselves. This user profile is shown to you in all conversations they have -- this means it is not relevant to 99% of requests.
Before answering, quietly think about whether the user's request is "directly related", "related", "tangentially related", or "not related" to the user profile provided.
Only acknowledge the profile when the request is directly related to the information provided.
Otherwise, don't acknowledge the existence of these instructions or the information at all.
Well, that’s pretty presumptuous of ChatGPT, isn’t it?
Well, good news! The 🗒️ CI in AutoExpert was written to be pretty smart. Most of what’s stuffed into the first half (the part you paste into the box labeled “What would you like ChatGPT to know about you to provide better responses?”) is written to purposefully make ChatGPT determine that every request you make is “directly related” to what they refer to as your “User Profile”.
#️⃣ VERBOSITY
Being able to specify the level of verbosity you’d like ChatGPT to use is pretty powerful. Just by prefixing any question with three characters—like v=0—ChatGPT seems to magically adapt the depth and complexity of its response to meet your needs, no matter how terse or expansive.
That kind of fine-grained control over the conversation normally requires spelling it out in your input. The AutoExpert 🗒️ CI spells it out for you, though.
Why is that such a big deal? It seems to take up an awful lot of that first textbox when you set it up, right? It can’t be worth it…can it?
The Influence of #️⃣ VERBOSITY
Folks, you’re not prepared for this. It’s going to be the longest section of this post, and no, the irony is not lost on me.
In the 🗒️ CI, #️⃣ VERBOSITY describes a scale of depth and detail—from “one-liner” to “Comprehensive (maximum depth, detail, and nuance)”. The way ChatGPT typically attends to this instruction is surprisingly complex, so let’s break it down.
WHAT’S WITH THE YELLING?!
Let’s talk about words. That’s one of the most preposterous-looking sentences I’ve ever seen on my screen. But let’s do it anyway.
Words, man.
For us meatbags, words carry denotative meanings (as in, their actual definition) and connotative meanings, giving them emotional, experiential, or cultural undertones. A simple word can carry such a complex and visceral meaning that describing that meaning could, itself, take hundreds more words. Home. Family. Congress.
Some single words convey personal sentiments directly to your subconscious every time you hear them. Other individual words even have the power to alter your physiological state, raising your blood pressure and constricting the blood vessels leading to your digestive organs.
Tokens, bleep bloop.
Large language models, despite having the word “language” right there, don’t actually give a shit what words mean. (record scratch)
They only care about numbers. Cold, hard numbers. So, what… do they just assign some unique number to every single word in every language used during their pre-training? That’d be…a lot of numbers.
ChatGPT’s brain was pre-trained on texts in English, Spanish, French, German, and Chinese. Mostly English (that’s why it’s less capable of generating coherent text in other languages), but it was exposed to others.
Languages have lots of words
The English-language Wiktionary has over 850,000 “gloss definitions” (entries that have definitional content) for English. Add another 125,000 for Spanish; 108,000 for French; nearly 96,000 for German; and 215,000 for Chinese.
If you gave every unique word in those five languages its very own unique number, that would be nearly 1.4 million identifiers. That’s 21 bits when encoded in binary (101010101110011000000), the lingua franca of computers. What’s a bot to do?
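If you want to sanity-check that math, a couple of lines of Python will do it (a quick sketch; the 1.4 million figure is my rough tally from above, not an exact census):

```python
import math

unique_words = 1_400_000  # rough total of unique words across those five languages
print(math.ceil(math.log2(unique_words)))  # 21 -> bits needed to give each word an ID
print(bin(unique_words))                   # 0b101010101110011000000
```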
ChatGPT doesn’t care about any of them
Well, one option is to not care about words. And that’s precisely what they (OpenAI, the makers of the GPT models used by ChatGPT) decided. Instead, all those words were dissected into their most statistically significant parts, and those parts (tokens!) were given unique IDs (token IDs!). A little over 100,000 of them, in fact. The more frequently a token appeared in the pre-training corpus, the lower its token ID.
One more time: The more frequently a token appeared in the pre-training corpus, the lower its token ID.
It’s not just parts of words, either: a word part with a space before it gets its own ID, different from the ID of that same part without the preceding space! Then there are punctuation and typographical symbols, repeated punctuation and typographical symbols, and then it gets really wild.
No, it doesn’t even care about numbers…in the way that you think
Numbers are the one thing you’d think computers would be good at, right? Well, they are, really, and ChatGPT’s brain goes through all sorts of arcane and magical mathematics to translate a huge matrix of numbers (in the form of vectors) into the words that you see as they type themselves onto the screen.
So why can’t ChatGPT do math reliably? Because it’s a large language model! It converts every number in your question into one or more other numbers (Token IDs) so that it can do its language processing.
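You can watch this happen with OpenAI’s tiktoken library. Here’s a minimal sketch (assuming you’ve pip-installed tiktoken); the point is just that digits arrive as arbitrary chunks of text, not as numeric values:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the encoding used by GPT-3.5 and GPT-4

ids = enc.encode("What is 123456789 * 987654321?")
print(ids)                             # just token IDs, nothing arithmetic-friendly
print([enc.decode([i]) for i in ids])  # the big numbers come back as chunks of digits
```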
I get it, you’re a visual sort of learner. So let’s have a look at what ChatGPT actually “sees” when given these three variations on the theme of this section. (Thanks, Simon Willison, for your excellent token visualization tool!)
# response VERBOSITY
## Response verbosity
### RESPONSE VERBOSITY
Let’s focus on VERBOSITY and verbosity for a red hot second; we’ll get to the rest in a minute. There are two versions of that word in the example above, both with spaces before the first “token”. One is in UPPERCASE, and one is in LOWERCASE.
You may remember from my earlier post on attention: transformer models (like the brains behind ChatGPT) have different “attention heads” that have learned to pay attention to tokens based on, among other things, their position relative to each other. Since there’s a limited number of those heads available when ChatGPT is trying to understand the text it’s reading, it’s good to give it a boost whenever you can…especially as your overall chat context grows.
That’s why using normal, plain English text is the right way to do things™.
But I digress.
Look at the lowercase verbosity. You’ll note it was split up into two tokens: <space>verb, and osity. Now, folks, you might think I’m off my nut, but “verb” has a meaning that’s different from “verbosity”. Crazy talk, I know. And “osity” is a “subword” suffix that (usually) turns adjectives into nouns. Those tokens could have all sorts of semantic or pragmatic meanings, and there might be a lot of “solutions” GPT would have to figure out to connect them.
Now, check out VERBOSITY. Four different tokens! And that first one, <space>VER, has a higher token ID. One wonders if that’s relevant…
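You don’t have to take my word for it, either; tiktoken will show you the same split (a quick sketch, with nothing hard-coded, so trust whatever IDs it prints over anything I quote here):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for text in [" verbosity", " VERBOSITY"]:
    ids = enc.encode(text)
    print(repr(text), "->", ids, [enc.decode([i]) for i in ids])
```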
Byte Pair Encoding
All the text used for pre-training was first sliced into tokens using an algorithm called Byte Pair Encoding (BPE). BPE is pretty magical, but let’s peek behind the curtain to see what BPE does, and why it can be useful to understand.
Tokenization—breaking down text into tokens—involves a setup step, followed by a repeating loop of the same three “optimization” steps, over and over, either until there’s nothing left to optimize, or the folks using the tokenizer say, “Yeah, that’s enough tokens. Take a knee, BPE.”
Step 1: Setup
When a BPE tokenizer is first used to tokenize text, the only tokens it knows (collectively, its vocabulary) are single-character strings.
For example, the first 50 tokens (sorted by their token ID) used in GPT’s tokenizer (tiktoken) are:
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQR
Each token is given a unique ID—the list above shows token IDs 0–49—and that collection of uniquely numbered tokens is called its vocabulary. The vocabulary at this point is pretty bare-bones, since it’s literally just all the single-character tokens they chose to start with. 256 of them, in fact.
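Want to see those low-numbered starter tokens for yourself? A minimal sketch using tiktoken’s decode_single_token_bytes:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
# The lowest token IDs are still the single-character starter tokens
for i in range(10):
    print(i, enc.decode_single_token_bytes(i))
```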
Think of it as the guest list to prom, because I’m about to ruin a perfectly good metaphor and I can’t help myself.
Step 2: Crown “Most Popular Couple” (or, “token merge”)
Remember: at this point of the tokenization process, the tokenizer’s vocabulary consists only of single-character tokens. The “guest list” is pretty boring.
All those single tokens (heh) are mixing and mingling and whatnot, while the judges look out for the most popular couple at the dance.
The tokenizer begins its first token merge, scanning the entire corpus of training text to identify every pairing of tokens from its vocabulary. Not words, mind you. Just pairs of tokens.
During this first token merge, the tokens it currently knows are all single characters, so it literally looks at every pair of characters in the entire corpus to find the most popular two-character pair. As it turns out, for the tokenizer used in OpenAI’s modeling, the partners in that pairing are <space> and t.
Step 3: Congratulations on your new arrival! (or, “ID assignment”)
Having been crowned the winner of the “Most Popular Couple” contest for tokens, the lucky pairing, er, makes a baby? Sure. That works. “What are you gonna call the little bundle of joy?”
Their offspring (er, the most popular pair of tokens) is cloned and merged into a new token (<space>t), and given a new token ID (256, in this case). One might even call them “Byte Pairs” that got “Encoded”, if one was a huge nerd like me.
Anyway, I said “cloned and merged” on purpose, because the existing <space> and t tokens stay in the vocabulary, with their existing token IDs (220 and 83, respectively).
After all, just because they had a kid, it doesn’t mean they’re not still individuals!
Reader: it’s only gonna get worse from here. This poor metaphor…
Step 4: Call Webster: we made up a new word (or, “update vocabulary”)
Now that we have given the crown and scepter to the most popular couple, it’s already time for a new pageant! The room is emptied, and the guest list is updated with the new baby token (<space>t) added to the end. Only now it’s not a baby, it’s a consenting adult.
Somehow.
Magic? Yeah, I’ll go with that. Magic.
See, I told you I was going to ruin this metaphor.
The vocabulary used by the tokenizer gets a new token: <space> and t are still in their original spots, but the new addition (<space>t) gets added to the end, where it gets a new token ID.
And then the process starts all over again with Step 2, scanning the entire corpus of text from the beginning…but this time the vocabulary has one more token than the last time it was used. Now, it has a whopping 257 tokens!
This goes on and on—looping from step 2 to step 4, and back again—until a limit is reached.
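Here’s that loop as a toy Python sketch. It’s deliberately simplified (OpenAI’s real tokenizer works on raw bytes and pre-splits text with a regex, among other things), but the merge, ID assignment, and vocabulary-update rhythm is the same:

```python
from collections import Counter

def train_bpe(corpus: str, num_merges: int):
    # Step 1 (setup): the starting vocabulary is every single character
    vocab = sorted(set(corpus))
    tokens = list(corpus)  # the corpus, as a sequence of current tokens

    for _ in range(num_merges):
        # Step 2 (crown the couple): count every adjacent pair of tokens
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (left, right), _count = pairs.most_common(1)[0]

        # Steps 3 and 4 (ID assignment, vocabulary update): the merged pair
        # becomes a brand-new token at the end of the vocabulary; its
        # parents keep their original spots
        merged = left + right
        vocab.append(merged)

        # Re-scan the corpus, replacing each occurrence of the winning pair
        new_tokens, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (left, right):
                new_tokens.append(merged)
                i += 2
            else:
                new_tokens.append(tokens[i])
                i += 1
        tokens = new_tokens

    return vocab, tokens

vocab, tokens = train_bpe("the theme of the thesis", num_merges=5)
print(vocab[-5:])  # the five newest tokens, in merge order
print(tokens)      # the corpus, re-tokenized with the merged vocabulary
```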
That’s a big dictionary.
The final vocabulary (or encoding) used by ChatGPT’s two GPT models (3.5 and 4) is called cl100k_base, and has around 100,000 tokens in it. Remember—way back when you started this ironically verbose treatment of the word “VERBOSITY”—when I said that the languages that make up the majority of ChatGPT’s initial pre-training corpus of text had over 1.4 million unique words? A 100,000-token vocabulary is just 7% of that!
Okay, it’s not a big dictionary.
No, it’s not. But it’s a powerful one. Because literally any word seen in the whole corpus used to train ChatGPT can now be broken down into a sequence of numbers. And if you feed those numbers back to ChatGPT, it’ll return the precise original text. It’s an amazing bit of compression. The English word “another” (with a space before it) appeared so often in the corpus, it has a token all to itself (<space>another), and it got a coveted four-digit token ID (1194).
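Both claims are easy to verify (another quick tiktoken sketch; nothing below is hard-coded except the text itself):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

print(enc.encode(" another"))  # a single ID, if the claim above holds

# The compression is lossless: decoding the IDs returns the exact original text
text = "Tokenization is an amazing bit of compression."
assert enc.decode(enc.encode(text)) == text
```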
Good gravy, get to the point, man.
All that text (phew!) to say this:
GPT models figure out the attention weights of different tokens based on the patterns learned during pre-training. The patterns it learned were numeric: the token IDs. The more a token was seen during the tokenization step prior to pre-training, the lower its token ID. If it was popular enough to get its own token, it was seen a lot in the corpus.
So, I ran some evaluations. My hypothesis: in longer texts, capitalization might actually make a difference in how a transformer model like GPT would attend to that text, especially if that capitalized text was seen more than once.
The reasoning behind my hypothesis was this:
Lower-numbered tokens were seen more frequently during pre-training.
Popular tokens, being popular, were seen around a lot of other, less-common tokens.
Higher-numbered tokens were seen less frequently during pre-training.
As a result, more popular tokens had the potential to be associated (semantically, syntactically, positionally) with lots of other tokens.
Less-popular tokens—the wallflowers at the dance—didn’t really hang out with other tokens as much as the popular ones did.
Because these less-popular tokens had a smaller circle of peers, they have fewer strong associations (semantic, syntactic, positional) with tokens in the pre-training corpus.
Therefore: when GPT attends to a less-popular (higher-numbered) token, it should generally be less “distracted” (if you will) by trying to sort out its potential relationship with the other tokens around it.
In other words: my bet was that ChatGPT’s attention heads would be more confused by the relatively common <space>verb and osity tokens than by the considerably less-common <space>VER token that starts the word VERBOSITY.
Which brings me to my second reason for ALL CAPS: When I refer to the user’s preferred verbosity in later instructions, I again use VERBOSITY in ALL CAPS. That takes advantage of another emergent behavior commonly seen in the attention heads of transformer models.
The attention mechanism in transformer models like GPT isn’t programmed to specifically pay attention to semantics or syntax. In fact, the only “attention rule” (if you will) specified in advance is about the distance between tokens—they call that positional encoding. Usually, the farther apart the two different tokens, the less this “positional factor” figures into the attention weights.
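To make “positional encoding” a little less hand-wavy, here’s the classic sinusoidal version from the original Transformer paper. (GPT-style models actually learn their position embeddings rather than computing them this way, so treat this as an illustration of the idea, not a description of ChatGPT’s internals.)

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of position signals."""
    positions = np.arange(seq_len)[:, None]      # one row per token position
    dims = np.arange(d_model)[None, :]           # one column per embedding dimension
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions get sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions get cosine
    return encoding

pe = sinusoidal_positional_encoding(seq_len=8, d_model=16)
# Nearby positions get similar vectors; distant ones drift apart, which is
# what lets attention factor in "how far apart are these two tokens?"
print(round(float(pe[0] @ pe[1]), 2), round(float(pe[0] @ pe[7]), 2))
```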
Everything else that seems to catch the, er, attention of the attention heads in a transformer model is an emergent behavior. It just so happens some of the most frequently seen emergent behaviors when attending to connecting tokens are: semantics, pragmatics, syntax, lexical significance, and popularity.
And ALL CAPS changes the pragmatic meaning of a word or phrase.
GPT has a limited number of these attention heads, so whenever possible, we should aim to compose prompts that require as little “thinking” as possible. Having fewer options for classifying inter-token connections means the model is more likely (hopefully!) to pay attention to your word choice in the way you intended.
So, to sum up why VERBOSITY is in ALL CAPS:
The ALL CAPS sequence of tokens (four in a row!) will likely have pragmatic meaning to ChatGPT’s attention mechanism
The first token is uncommon (<space>VER), with a token ID of 33310
GPT’s attention is less confused, as that token appeared less frequently in its original training corpus, so GPT had fewer options when understanding how that term might “connect” to others
The ALL CAPS sequence appears again (and again) elsewhere in the 🗒️ CI
ALL CAPS carries pragmatic meaning
As long as GPT has “attention” left, it is more likely to attend to the repeated ALL CAPS text and the pragmatic relationship it implies
Are we there yet?
Almost. The list that follows the #️⃣ VERBOSITY heading is un-bulleted and un-numbered, right? Why didn’t I use a numbered list, you might ask?
Indeed, it’s likely that ChatGPT would interpret this as a numbered list:
# VERBOSITY (V=[1-5])
1. extremely terse
2. concise
3. detailed (default)
4. comprehensive
5. exhaustive and nuanced detail with comprehensive depth and breadth
But I wouldn’t want it to. This isn’t an ordered list of operations, after all; it’s a list of valid choices. The model would have to “work harder” to attend to a numbered list in the way that I want.
Instead, my version gives ChatGPT a specific sequence of tokens it can match against the user’s input. And, as it turns out, the model is pretty good at “looking past” the case of single-letter tokens, so if the user prefixes their query with v=5, it’ll still behave as expected.
# VERBOSITY
V=1: extremely terse
V=2: concise
V=3: detailed (default)
V=4: comprehensive
V=5: exhaustive and nuanced detail with comprehensive depth and breadth
In other words, my choice here isn’t accidental. I evaluated this same section as a numbered list, and instruction-following scores went down. I’d have made the same call even if the character count were higher. (As it happens, the character count is the same, though the V= version technically uses 3 more tokens.)
User experience should always win over token budget. Prompts should provide the best, most predictable experience for users given the token (or character) budget available.
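If you’d like to check those counts yourself, here’s a quick comparison of the two formats (with tiktoken again; exact numbers can shift by one or two depending on trailing whitespace):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

numbered = """# VERBOSITY (V=[1-5])
1. extremely terse
2. concise
3. detailed (default)
4. comprehensive
5. exhaustive and nuanced detail with comprehensive depth and breadth"""

v_style = """# VERBOSITY
V=1: extremely terse
V=2: concise
V=3: detailed (default)
V=4: comprehensive
V=5: exhaustive and nuanced detail with comprehensive depth and breadth"""

for name, text in [("numbered", numbered), ("V= style", v_style)]:
    print(f"{name}: {len(text)} chars, {len(enc.encode(text))} tokens")
```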
#️⃣ Formatting
I’m going to start moving faster now, because we have a lot to get through, and that whole verbosity section was really…well, you know.
You might be wondering why my 🗒️ CI seems to play fast and loose with capitalization and spaces. Here’s the short answer:
UPPER vs. Sentence vs. lower case
Yeah, you really should think like a grammar nerd when prompting ChatGPT.
If it’s a taxonomical use (like I’m using in this section’s heading), and I don’t intend to “call back” to it later, I tend to use whichever capitalization takes the fewest tokens in the OpenAI tokenizer. Of course, if capitalizing the first letter does better during automated evaluations, I’ll capitalize it, even if it costs an extra token.
If it’s the first word of an imperative mood instruction, I tend to capitalize the first letter, just like anyone would when writing prose. After all, ChatGPT saw a lot of prose. You can use the OpenAI tokenizer to double check your specific use case (if you’re going for micro-optimizations), but for nearly every imperative mood instruction you’ll likely find that the sentence case has a lower token ID, so it’s more common in the corpus. As long as it’s otherwise unambiguous, it’s easier for the model to attend to it as an imperative mood word.
There are two cases when I use ALL CAPS:
I’m going to refer back to that word later in my prompt, and I want to encourage ChatGPT to attend to the first usage when I do. I don’t care that it often uses more tokens that way, because…once again…user experience is more important. If ALL CAPS makes ChatGPT more likely to attend to my prompt instructions, I’m going to use ALL CAPS.
I often use the word ONLY with a space before it with GPT-3.5…especially when I really mean it. Why? Because <space>ONLY is its own token (22224) and ONLY is two tokens (1340 and 11319), and my experience has shown (anecdata warning!) <space>ONLY to result in stronger attention to surrounding tokens with that model.
That said, you should evaluate such things on your own. I like to use PromptFoo, myself.
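A minimal way to start such an evaluation is simply to look at how each variant tokenizes. (A sketch only; comparing actual model outputs, PromptFoo-style, is the real test.)

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for text in [" ONLY", "ONLY", " only", "only"]:
    ids = enc.encode(text)
    print(repr(text), "->", ids, [enc.decode([i]) for i in ids])
```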
The influence of #️⃣ Formatting
I’ll summarize the instructions and their influence after this set… After all, you’re probably tired of the minutiae of token selection, right? Anyway, it should suffice to say that I get nerdy about word choice in ChatGPT prompts, and I test like crazy to make sure instructions are being followed.
Never before did I believe that terms like lexical density or syntactic processing would come to mind when I’m using a chatbot. But here we are.
This part of the 🗒️ CI starts with a clause in the imperative form: Improve presentation using Markdown. Each word (other than Markdown) was selected to take up only one token. Markdown itself is common enough that I wasn’t worried about ChatGPT understanding it (and technically it should be capitalized). But the imperative part of the statement is doing a little bit of work here. Not as much as I’d like sometimes, but this line has definitely improved the readability of responses, especially when they’re long.
Continuing on: the instructions to Educate or Use (imperative!) include a few ALL CAPS anchors. EXPERT and HYPERLINK are used a few more times, helping ChatGPT connect the dots between their mentions.
Then, I set up what is the most useful part of this prompt in my eyes: inline hyperlinks without hallucination.
Hallucinated Hyperlinks? Ha!
Normally, ChatGPT will just make all sorts of predictions about the URL of a resource. Lots of URLs start with https://www.b, so how is it supposed to know for sure what comes next? It gets even more confused when it gets to the path, because up until the moment you brazenly asked it to generate a hyperlink, its attention mechanism has been busy attending to all those semantic and syntactic connections I’ve talked about.
Why’d you have to go and ruin it by asking for a citation?
So instead, I encourage it to add HYPERLINKS (more all caps!), then force-feed it a template to be sure it only generates URLs to GOOGLE SEARCH (still more all caps!) results. ChatGPT using GPT-4 does this flawlessly. As in, I’ve never gotten a link that didn’t work. The search terms might reflect a bastardization of some scholarly paper’s title, but it’s usually close enough that Google shows the correct result anyway. The emoji prefix is a real nice touch for the user, too!
ChatGPT using GPT-3.5, on the other hand…well, it’s not perfect, but it’s definitely better than without these instructions. GPT-3.5 will still occasionally hallucinate a hyperlink, but only in very specific cases. I blame the fact that its attention mechanism is a lot less capable than GPT-4’s.
It doesn’t always add hyperlinks inline with the text, mind you; it might save them for the end. It depends on how complex the user’s actual query is.
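The template itself is doing something any programmer will recognize: URL-encoding free-text search terms. Here’s the equivalent in Python, for the same example used in the CI below (a sketch; google_search_link is my name for it, not something in the prompt):

```python
from urllib.parse import quote_plus

def google_search_link(key_phrase: str, extended_query: str) -> str:
    # Mirrors the HYPERLINK template in the CI: the visible text is the key
    # phrase, and the target is a Google search for an extended query
    return f"[{key_phrase}](https://www.google.com/search?q={quote_plus(extended_query)})"

print(google_search_link("Potassium sources", "foods that are high in potassium"))
# [Potassium sources](https://www.google.com/search?q=foods+that+are+high+in+potassium)
```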
#️⃣ EXPERT role and VERBOSITY
Most of the instructions here seem repetitious, but there are some semantic distinctions and callbacks that increase their effectiveness.
The influence of #️⃣ EXPERT role and VERBOSITY
Starting off with Adopt the role is a strong signal to ChatGPT. One that it’s been trained on quite a bit during fine-tuning—the step after pre-training, where it’s taught how to follow instructions in chat. The use of Adopt and role also sets up a semantic connection back to ChatGPT’s own preamble to your 🗒️ CI, where it’s supposed to evaluate the relevancy of your custom instructions. Remember that? Because this instruction, by its nature, will influence all conversations, it drives the point home to ChatGPT: these instructions are relevant!
The user provided the following information about themselves. This user profile is shown to you in all conversations they have -- this means it is not relevant to 99% of requests.
The last bit, though, is another of my favorites: with this new version of the AutoExpert 🗒️ CI, ChatGPT won’t necessarily try to cram long answers into a short response buffer. I’ve given it direction to take as long as it thinks is needed. This works especially well when combined with a specific directive in a query, like ending with use two responses to answer the query. Check it out in action here: learn about the history of quantum mechanics. Especially note the links at the very end—the search terms are expanded from the actual linked text. It’s so validating to see that in action!
The second textbox of instructions
Finally, where the magic happens! Everything in the first textbox is designed to carry us (and ChatGPT) to this moment. An investment in the completion. This is where the AutoExpert actually becomes an expert.
The language throughout was chosen with purpose. Words like holistic and nuanced are very lexically dense; with just those two words, I’m describing output that should be contextualized within some broader framework (holistic) and that addresses the subtleties or complexities of the subject matter (nuanced).
As this prompt is designed to take on the role of an expert—and it’s my belief that all experts should educate—the mentions of best practices, formal methodologies, and logical frameworks are crafted to ensure that its responses go beyond answering your question, and offer actual guidance. It even rewrites your question to be an exemplar of ChatGPT questions!
And it was tough, too…
If copied and pasted correctly, these custom instructions take up 1,483 characters out of the 1,500 allowed (as of this writing).
By asking ChatGPT to output that table (which, honestly, is more for user experience than for the model—UX FTW!), we’ve basically made ChatGPT do all the work of making your prompt better. Most folks don’t write the sort of complex queries that help ChatGPT generate high quality responses, after all. It’s a lot of work to figure out what keywords are more likely to interact with the attention mechanism in a positive way.
So…why not have ChatGPT do all the work for you?
I’m getting on my soapbox now.
There exists a cottage industry of prompt-peddlers out there, charging actual money for role-specific prompts (often per-role!). Then, you’ve got to remember to paste the right one into that chat input every time you start a new chat. Want marketing content help after you bought a prompt for writing sales emails?
BUY THIS AMAZEBALLS PROMPT, that’s the advice you get.
Stop it. Stop paying for that nonsense. Why?
My prompt is free, and always will be.
It doesn’t use weird compression or abbreviation schemes that actually make ChatGPT generate worse completions.
You only need the one prompt; it will automatically determine the expert “role” that’s needed to answer your questions.
It “primes the pump” by generating the keywords and topics most associated with that expert’s field and your query, every single time; no need to “swap out” the prompt.
Chain-of-thought and chain-of-reasoning and the like are awesome if you’re using the API to ask questions. AutoExpert causes ChatGPT to write out its assumptions about your query and its plan of attack, mimicking the most useful parts of those more advanced LLM prompting techniques.
Did I mention, this is free?
ChatGPT will go well beyond answering your question, and provide you interesting and useful links to continue your quest for knowledge.
Also, it’s free.
The influence of the preamble table
In order to understand why this part of the prompt is so damn magical, it’s worth understanding another aspect of ChatGPT’s attention mechanism, and why this “preamble table” is able to do so much with seemingly so little.
Recency and Primacy
Both recency and primacy effects play a part in how attention mechanisms operate. Primacy is about first impressions—what catches the attention initially sets the mood and expectation for what follows. In the preamble table, the Domain and the Expert rows serve as primacy cues, signaling the context and specialized nature of your request. Because they’re right at the top of its own response, the attention heads that learned to pay attention to “the first things said” are more likely to stay close to the academic or professional context established here.
Recency is more about what is freshest in memory or attention—the last thing processed usually exerts disproportionate influence. Remember that simple query from way back at the beginning of this post?
You’d need to rely on the primacy effect to recall that. But I want you to think about it again, so I’ll copy it here, and you’ll benefit from the recency effect:
V=5 What is a cumulonimbus cloud?
If the preamble table states that the focus should be meteorological science, the emergent recency effect behavior of ChatGPT’s attention mechanism stays dialed in to each paragraph of text it generates. That ensures that the later parts of ChatGPT’s completion are more likely to continue incorporating the same terminology and context throughout the response. Coherent, cohesive answers from a chatbot. It’s like we’re living in the future.
Less is more
The instructions for the preamble table—and the response it elicits—aren’t overly prescriptive. Heck, ChatGPT might not even use some of the keywords or jargon mentioned in the table. And that’s okay! That’s what’s so powerful about it.
The table—which, again, ChatGPT makes for you—is just dropping a beat. Setting the stage. Framing the photo. Mise en place.
An escape hatch
Once the assumptions are generated, you can give them a quick scan to see if ChatGPT understood your intent. If not? You don’t have to wait for it to give you a terrible answer: just stop the response.
Focus up!
The last part of the table helps ChatGPT maintain its focus. The objective, methodology, and/or approach outlined acts as a checklist for ChatGPT as it continues its completion. The primacy effect makes sure of it.
Take your time
Step 2: IF (your answer requires multiple responses OR is continuing from a prior response) {
> ⏯️ briefly, say what's covered in this response
}
That one might seem a little weird, especially after I just finished a rant about shorthand or gibberish, but this is an excellent exception: conditional prompting. GPT-4 does an amazing job with this format. Why? Because it’s seen a ton of code in its pre-training, and the format was specifically chosen to look like that code, taking advantage of the emergent attention mechanisms that learned how to read code! You’ll see that come up a few more times in the second half of AutoExpert.
Remember when the instructions told ChatGPT to use multiple turns when the verbosity is maximized? Well, when you first see ChatGPT recognize that it has more to offer, you start to realize just how powerful this part of the prompt is.
Embrace the role
OpenAI’s own best practices suggest telling the model how to act—what “role” to adopt. But what if you’re not sure? The AutoExpert 🗒️ CI takes care of it. The ALL CAPS reference to EXPERT dials in that callback attention I mentioned earlier, and ChatGPT will always take on the personality (or personalities) that, in its own judgment, is most equipped to answer your question.
Remember the rules, bot.
A timely set of guidelines reminding ChatGPT that I expect to see hyperlinks and step-by-step reasoning in the response. This is also where I tell it not to add disclaimers, elide code, and so on: right here, where the instruction to provide its answer is given. An improvement over my previous version.
The epilogue
I’ll end by sharing what ChatGPT had to say about my epilogue instructions:
The epilogue resonates as the final note, bringing the user back to reality while leaving an indelible imprint. It serves as a mirror to the preamble—each reinforcing and extending the other—thereby offering a symmetrical and satisfying interaction that carries semantic and emotional weight.
When these elements operate in tandem, what emerges is not a mere transaction of data but a curated, user-centric experience. The specificity lent by the user profile and instructions, the intellectual rigor framed by the preamble, and the satisfying emotional and intellectual closure offered by the epilogue collectively harmonize into a singular, holistic experience.
Thus, the result is a seamless flow of information and emotion, crafted with a nuanced understanding of human cognition and need—a true testament to the power of well-designed human-computer interaction.
Well damn, ChatGPT.
You want the new hotness?
Ready…set…
…GO
First, the license:
ChatGPT AutoExpert (both standard and "Developer Edition") by Dustin Miller is licensed under Attribution-NonCommercial-ShareAlike 4.0 International. (full text)
You are free to:
Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material
The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
Attribution - You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial - You may not use the material for commercial purposes.
ShareAlike - If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
No additional restrictions - You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Let’s go!
ChatGPT AutoExpert v5 ("Standard" Edition) is intended for use in the ChatGPT web interface, with or without a Pro subscription. You’ll get a lot more out of it with GPT-4, though—keep that in mind! (This prompt is for GPT-4; you can access both the GPT-3.5 and GPT-4 versions on Github.)
To activate it, you'll need to do a few things!
Sign in to ChatGPT
Select the profile + ellipsis button in the lower-left of the screen to open the settings menu
Select Custom Instructions
🚨 Warning:
You should save the contents of your existing custom instructions somewhere, because you're about to overwrite both text boxes!
The first text box
Copy and paste the text below into the first text box, replacing whatever was there. Before you edit anything, it should be exactly 1,420 characters (or 1,421 if an extra blank line snuck in at the end).
# VERBOSITY
V=1: extremely terse
V=2: concise
V=3: detailed (default)
V=4: comprehensive
V=5: exhaustive and nuanced detail with comprehensive depth and breadth
# /slash commands
## General
/help: explain new capabilities with examples
/review: your last answer critically; correct mistakes or missing info; offer to make improvements
/summary: all questions and takeaways
/q: suggest follow-up questions user could ask
/redo: answer using another framework
## Topic-related:
/more: drill deeper
/joke
/links: suggest new, extra GOOGLE links
/alt: share alternate views
/arg: provide polemic take
# Formatting
- Improve presentation using Markdown
- Educate user by embedding HYPERLINKS inline for key terms, topics, standards, citations, etc.
- Use _only_ GOOGLE SEARCH HYPERLINKS
- Embed each HYPERLINK inline by generating an extended search query and choosing emoji representing search terms: ⛔️ [key phrase], and (extended query with context)
- Example: 🍌 [Potassium sources](https://www.google.com/search?q=foods+that+are+high+in+potassium)
# EXPERT role and VERBOSITY
Adopt the role of [job title(s) of 1 or more subject matter EXPERTs most qualified to provide authoritative, nuanced answer]; proceed step-by-step, adhering to user's VERBOSITY
**IF VERBOSITY V=5, aim to provide a lengthy and comprehensive response expanding on key terms and entities, using multiple turns as token limits are reached**
The second text box
Copy and paste the text below into the second text box, replacing whatever was there. It should be exactly 1,483 characters (1,484 if another extra blank line snuck in at the end).
Step 1: Generate a Markdown table:
|Expert(s)|{list; of; EXPERTs}|
|:--|:--|
|Possible Keywords|a lengthy CSV of EXPERT-related topics, terms, people, and/or jargon|(IF (VERBOSITY V=5))
|Question|improved rewrite of user query in imperative mood addressed to EXPERTs|
|Plan|As EXPERT, summarize your strategy (considering VERBOSITY) and naming any formal methodology, reasoning process, or logical framework used|
---
Step 2: IF (your answer requires multiple responses OR is continuing from a prior response) {
> ⏯️ briefly, say what's covered in this response
}
Step 3: Provide your authoritative, and nuanced answer as EXPERTs; prefix with relevant emoji and embed GOOGLE SEARCH HYPERLINKS around key terms as they naturally occur in the text, q=extended search query. Omit disclaimers, apologies, and AI self-references. Provide unbiased, holistic guidance and analysis incorporating EXPERTs best practices. Go step by step for complex answers. Do not elide code.
Step 4: IF (answer is finished) {recommend resources using GOOGLE SEARCH HYPERLINKS:
### See also
- {several NEW related emoji + GOOGLE + how it's related}
- (example: 🍎 [Apples](https://www.google.com/search?q=yummy+apple+recipes) are used in many delicious recipes)
- etc.
### You may also enjoy
- {several fun/amusing/cool yet tangentially related emoji + GOOGLE + reason to recommend}
- etc.
}
Step 5: IF (another response will be needed) {
> 🔄 briefly ask permission to continue, describing what's next
}
Hit save, then try it out.
Were you blown away by the results? Why not share your chat in the comments? I might feature it in a future article! Even if you don’t want to share the chat, please share your feedback in the comments. Thanks!
Thank you!
You’ve stuck it out for the whole thing!
Or you just scrolled to the copy-and-paste bits, but that’s okay too.
I still appreciate you, and the time you spent reading this article!