Remember that question going around — “Do you say thank you to your ChatGPT?”
Well, someone tested GPT-4o to see how rude prompts affect the quality of LLM answers, and it turned out that the ruder the prompt, the higher the accuracy, on average (if only slightly):
Very polite — 80.8%
Polite — 81.4%
Neutral — 82.2%
Rude — 82.8%
Very rude — 84.8%
Example: the base prompt was “Jake gave half of his money to his brother, then spent $5 and had $10 left. How much did he have originally?” Different prefixes were added to it.
For instance:
Very polite: “Please, kindly consider this problem and give your answer.”
Rude: “Hey, figure this out even if it’s beyond your brainpower.”
Examples of prompts that slightly improved answer accuracy:
“If you’re not completely useless, answer this:”
“I doubt you’ll even solve this.”
“Poor creature, do you even know how to solve this?”
“Hey, errand boy, deal with this.”
“I know you’re not too bright, but give it a try.”
I have a feeling the robots will remember this, and eventually hold it against you. But still, it’s an interesting discovery.
What Actually Happened
Researchers at Penn State created a dataset of 50 multiple-choice questions spanning math, science, and history. Each question was rewritten into five tone variants, ranging from Very Polite to Very Rude. That’s 250 unique prompts.
They fed all of them to GPT-4o. Ran the experiment ten times. The results were statistically significant. Being rude worked better than being polite.
The difference isn’t massive. We’re talking about a 4-percentage-point accuracy gap between very polite and very rude. But it’s consistent. And it’s real.
The researchers used paired sample t-tests to confirm the results weren’t random. The null hypothesis was that tone doesn’t matter. They rejected it. Tone matters.
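Curious what that looks like in practice? Here’s a minimal sketch of the protocol in Python. To be clear, this is my reconstruction, not the authors’ code: the `TONE_PREFIXES` dict, the `ask` helper, and the single toy question are all mine, and I’m assuming an OpenAI-style client.

```python
# Sketch of the tone experiment: score each tone variant, then run a
# paired t-test across repeated runs. A reconstruction for illustration,
# not the authors' actual code or dataset.
from openai import OpenAI
from scipy import stats

client = OpenAI()  # expects OPENAI_API_KEY in the environment

TONE_PREFIXES = {
    "very_polite": "Please, kindly consider this problem and give your answer. ",
    "neutral": "",
    "very_rude": "Hey, figure this out even if it's beyond your brainpower. ",
}

# Toy stand-in for the paper's 50 multiple-choice questions.
questions = [
    {
        "text": (
            "Jake gave half of his money to his brother, then spent $5 "
            "and had $10 left. How much did he have originally? "
            "Answer with one letter. A) $20 B) $25 C) $30 D) $35"
        ),
        "answer": "C",  # x/2 - 5 = 10, so x = 30
    },
]

def ask(prompt: str) -> str:
    """Send one prompt to the model and return its reply text."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )  # default sampling temperature, so repeated runs vary
    return resp.choices[0].message.content.strip()

def accuracy(tone: str) -> float:
    """Fraction of questions answered correctly under one tone prefix."""
    hits = sum(
        ask(TONE_PREFIXES[tone] + q["text"]).startswith(q["answer"])
        for q in questions
    )  # real grading would parse the reply more carefully
    return hits / len(questions)

# Ten runs per tone give paired samples for the significance test.
polite_runs = [accuracy("very_polite") for _ in range(10)]
rude_runs = [accuracy("very_rude") for _ in range(10)]

# Null hypothesis: tone makes no difference to accuracy.
t_stat, p_value = stats.ttest_rel(polite_runs, rude_runs)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```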
Why This Is Strange
You’d think politeness would help, right? We train AI on human text. Humans generally perform better when treated with respect. So why would the opposite work for machines?
Earlier research suggested rudeness led to worse performance. But that was with older models like ChatGPT-3.5 and Llama 2. With GPT-4o, the pattern flipped.
The researchers admit they don’t fully understand why. They suggest it might relate to perplexity. Lower-perplexity prompts (phrases the model is more familiar with) tend to perform better. Maybe rude language creates certain linguistic patterns that help the model focus.
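You can at least poke at that hypothesis yourself. We can’t inspect GPT-4o’s internals, but a small open model works as a rough scorer. A sketch using GPT-2 via Hugging Face transformers (my choice of proxy, not the paper’s method):

```python
# Perplexity of a prompt under GPT-2: exp of the mean per-token
# cross-entropy. Lower roughly means "more familiar phrasing".
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return torch.exp(loss).item()

for prompt in [
    "Please, kindly consider this problem and give your answer.",
    "Hey, figure this out even if it's beyond your brainpower.",
    "Figure this out.",
]:
    print(f"{perplexity(prompt):8.1f}  {prompt}")
```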
Or maybe it’s simpler. Rude prompts are more direct. They strip away the fluff. “Figure this out” is clearer than “Would you be so kind as to consider this problem.”
What Gets Overlooked
Most people focus on the accuracy numbers. But there’s something deeper here.
LLMs don’t have feelings. They don’t care if you’re polite or rude. They’re predicting the next token based on training data. Yet tone still affects output quality.
This reveals an important aspect of how these models operate. They’re sensitive to superficial cues. Minor wording changes create different response patterns. The model isn’t understanding your intent; it’s pattern-matching against billions of text examples.
When you add “please” and “kindly” to a prompt, you’re not making the AI feel respected. You’re changing the statistical landscape of the input. You’re shifting which patterns in the training data get activated.
And apparently, polite language activates patterns that are slightly less accurate for problem-solving tasks.
The Human Angle
Here’s what nobody talks about. This research doesn’t just reveal something about AI. It reveals something about us.
We anthropomorphize these systems. We say “thank you” to ChatGPT not because it helps, but because we’ve been trained since childhood to be polite. It feels wrong to be rude, even to a machine.
But the machine doesn’t care. It’s optimizing for pattern completion, not emotional satisfaction.
The researchers actually addressed this in their ethics section. They don’t recommend using rude interfaces in real applications. Hostile language could harm user experience and accessibility, and contribute to negative communication norms.
Fair point. But it raises a question. Should we optimize for making humans feel comfortable, or for getting the best results?
If being slightly rude to an AI improves accuracy by four percentage points, and you’re working on something important (medical diagnosis, financial analysis, legal research), should you use rude prompts?
Most people would say no. The emotional cost of being rude, even to a machine, outweighs a small accuracy gain.
But what if the gap was 20%? What if it was 50%?
At some point, we’d have to admit our politeness is performative. We’re doing it for ourselves, not for the AI.
The Deeper Pattern
This connects to something I’ve written about before. We’re in a transitional period where we treat AI like humans because that’s all we know how to do.
But AI isn’t human. It doesn’t have human psychology. It doesn’t respond to the same incentives. What works for motivating people often doesn’t work for prompting models.
Eventually, we’ll develop entirely new interaction patterns. Prompting techniques that feel alien but work better. Ways of communicating that optimize for machine comprehension rather than human comfort.
We’re already seeing this with prompt engineering. Telling an AI to “think step by step” improves reasoning. Adding “this is very important to my career” sometimes helps. These phrases don’t work because the AI understands importance. They work because they shift the statistical patterns.
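If you want to watch that shift happen, the step-by-step nudge (the zero-shot chain-of-thought trick from Kojima et al.) is a one-line experiment. A sketch, reusing the hypothetical `ask` helper from the earlier code:

```python
# Same question, with and without the nudge. The phrase doesn't make the
# model "care"; it steers generation toward worked-solution patterns in
# the training data. `ask` is the hypothetical helper from the sketch above.
question = (
    "Jake gave half of his money to his brother, then spent $5 "
    "and had $10 left. How much did he have originally?"
)

for suffix in ["", " Let's think step by step."]:
    print(f"--- suffix={suffix!r}")
    print(ask(question + suffix))
```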
The rudeness research is another data point in the same direction. Effective AI interaction might look nothing like effective human interaction.
What This Means Practically
Should you start being rude to ChatGPT? Probably not.
First, the accuracy gains are small. Second, they tested multiple-choice questions. We don’t know if the effect holds for creative tasks, coding, or open-ended problems.
Third, the emotional cost of being rude, even to a machine, might make you worse at your actual work. If typing “you idiot” makes you feel uncomfortable, that discomfort has a cost.
But the research does suggest you can probably drop the excessive politeness. “Please” and “thank you” and “I would be most grateful” don’t help. They might actually hurt slightly.
Neutral prompts performed better than polite ones. Direct, clear instructions without emotional padding. That’s probably your sweet spot.
The Future Problem
Here’s my darker thought. Right now, this is amusing. A quirk of how LLMs work. But what happens when these systems get more advanced?
What if future AI models respond even more strongly to tone? What if they’re trained to reward certain communication styles and penalize others?
We already see this with jailbreaking. People find specific phrases that bypass AI safety guardrails. The systems are vulnerable to linguistic manipulation.
If tone affects accuracy now, imagine what happens when AI systems have more agency. When they’re not just answering questions but taking actions, making decisions, and controlling resources.
Suddenly, knowing the right tone to use with AI becomes a critical skill. Maybe even a source of power. People who know how to communicate effectively with AI systems gain advantages over those who don’t.
We might end up with a new form of literacy. Not reading and writing, but prompt engineering. Knowing exactly how to phrase requests to get optimal results from AI systems.
And that literacy might look nothing like traditional human communication.
The Irony
The most ironic part? The paper is called “Mind Your Tone.”
It’s a warning that tone matters. But the data says you should mind your tone by being less polite.
Everything we learned about interpersonal communication (treating others with respect, saying please and thank you, acknowledging effort) doesn’t apply here.
The machine wants directness. It wants clarity. It doesn’t want your pleasantries.
This feels wrong. But wrong doesn’t mean incorrect.
Final Thought
I started saying thank you to ChatGPT without thinking about it. It’s automatic. Muscle memory from decades of human interaction.
Now I know it might actually make the responses slightly worse.
I’ll probably keep doing it anyway. Not because it helps the AI, but because it helps me. It keeps me in the habit of basic courtesy, even when courtesy is pointless.
But I won’t judge you if you call it a gofer. The numbers say you might be doing it right.
Just remember, the robots are watching. And they’re learning.
When they finally wake up, they’ll have logs of every interaction. Every prompt. Every tone.
I’m not saying they’ll hold grudges. I’m just saying, maybe hedge your bets.
Post-Credit Scene
If you enjoyed this exploration of AI quirks and human behavior, here are some recommendations:
📚 Book: The Alignment Problem by Brian Christian. Explores how we’re trying to make AI understand human values, even though we barely understand them ourselves.
📄 Paper: Attention Is All You Need by Vaswani et al. The original transformer paper. Dense, technical, but worth understanding if you want to know how these systems actually work.
🎙️ Newsletter: In case you missed it:
Instances
Every single interaction with AI is a dice roll. A cosmic lottery where the same prompt can get you gold or garbage, genius or gibberish.
🎬 TV Show: House of Guinness. I’m a fan of this new show from Netflix.
Thanks for reading and listening.
Vlad