What prompt engineering actually is.
Prompt engineering is the practice of designing inputs to large language models so they produce reliable, high-quality, on-target outputs. It's part programming, part interviewing, part writing.
An LLM is a probability machine. At each step it assigns a probability to every possible next token and samples a continuation of your text from that distribution, which is not always the single most likely one. Everything you put in front of it (your role assignment, the context you give, the structure you impose, the examples you provide) shifts those probabilities. The prompt isn't a request. It's a configuration.
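To make "shifts those probabilities" concrete, here is a toy sketch of next-token sampling. The vocabulary and the scores are invented for illustration, and real models work over tens of thousands of tokens, so treat it as a picture of the idea rather than how any particular model is implemented.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(vocab, logits, temperature=1.0, rng=random):
    """Draw one token in proportion to its probability, not simply the top one."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical next-token scores for the same question under two prompts.
vocab = ["Paris", "London", "banana"]
bare_prompt = [2.0, 1.5, 0.5]     # vague prompt: probability is spread out
with_context = [5.0, 1.0, -2.0]   # role + context + example: one answer dominates

print(softmax(bare_prompt))
print(softmax(with_context))
print(sample_token(vocab, with_context))
```

The second distribution puts far more weight on one token, which is the sense in which a prompt configures the model instead of merely asking it.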
"You are not asking the model. You are programming the distribution it samples from."
Why it matters now
- Cost: a tight prompt uses fewer tokens and avoids retries.
- Quality: the gap between a casual prompt and an engineered one is often the gap between a useless answer and a deployable one.
- Reliability: engineered prompts make output more consistent across runs.
- Portability: structured prompts transfer more readily between models, though each target model still needs its own testing.
The six components of a great prompt.
Almost every world-class prompt has these in some form. Drop one and quality drops with it.
1. Role / Persona
Anchor the model in an identity. Be specific — "senior copy editor with 15 years at major broadsheets" beats "good writer".
2. Context
What the model needs to know: the audience, the situation, the surrounding data.
3. Task
The actual instruction in imperative voice. One core task per prompt.
4. Constraints
Length, tone, things to avoid, things to always include.
5. Output format
Bullet list? JSON? Markdown table? Models will improvise a format unless told which one to use.
6. Examples (few-shot)
Adding one or two input→output pairs is often the single highest-leverage move you can make.
// Universal skeleton
## Role
You are a [specific role].
## Context
[What the model needs to know.]
## Task
[Imperative instruction.]
## Constraints
- [Length, tone, taboos.]
## Output Format
[Exact structure.]
## Example
Input: [...]
Output: [...]
## Now do this:
{{INPUT}}
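One way to use the skeleton above in practice is to fill it from code. This is a minimal sketch: `build_prompt` and its parameter names are hypothetical helpers, not part of any library, and the sample values are made up.

```python
SKELETON = """## Role
You are a {role}.
## Context
{context}
## Task
{task}
## Constraints
{constraints}
## Output Format
{output_format}
## Example
Input: {example_input}
Output: {example_output}
## Now do this:
{user_input}"""

def build_prompt(role, context, task, constraints, output_format, example, user_input):
    """Fill the six-part skeleton; `constraints` is a list, `example` an (input, output) pair."""
    example_input, example_output = example
    return SKELETON.format(
        role=role,
        context=context,
        task=task,
        constraints="\n".join(f"- {c}" for c in constraints),  # one bullet per constraint
        output_format=output_format,
        example_input=example_input,
        example_output=example_output,
        user_input=user_input,
    )

prompt = build_prompt(
    role="senior copy editor with 15 years at major broadsheets",
    context="The audience is busy newsletter readers.",
    task="Rewrite the headline below.",
    constraints=["Under 12 words", "No clickbait"],
    output_format="A single line of plain text.",
    example=("Local man does thing", "Resident rescues neighbor's cat from storm drain"),
    user_input="Council meeting happens",
)
print(prompt)
```

Keeping the template in one place means a change to the structure reaches every prompt built from it.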
Techniques, ranked by leverage.
Not all techniques are equal. These move the needle most.
Few-shot prompting
Provide 1–5 input→output examples. If you only do one thing, do this.
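A few-shot prompt is simple to assemble by hand. The sketch below assumes a plain `Input:` / `Output:` layout, and `few_shot_prompt` is an illustrative name. Ending on an open `Output:` line invites the model to continue the pattern.

```python
def few_shot_prompt(instruction, examples, query):
    """Build a prompt from an instruction, (input, output) example pairs, and a new input."""
    parts = [instruction, ""]
    for source, target in examples:
        parts.append(f"Input: {source}")
        parts.append(f"Output: {target}")
        parts.append("")  # blank line between examples
    parts.append(f"Input: {query}")
    parts.append("Output:")  # left open for the model to complete
    return "\n".join(parts)

prompt = few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [
        ("The battery lasts all week.", "positive"),
        ("It broke after two days.", "negative"),
    ],
    "Setup took five minutes and it just works.",
)
print(prompt)
```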
Chain-of-Thought (CoT)
Add "think step by step." This often improves results on math, logic, and multi-step problems.
Role prompting
"You are a..." with specificity. Pulls the model into the right region of its training.
Negative prompting
Tell it what NOT to do. "Do not use the words synergy, leverage, or robust."
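Negative instructions are easy to verify after the fact. Here is a small sketch of a post-check you might run on the model's output. `find_banned_words` is a hypothetical helper, and whole-word matching is an assumption about what counts as a violation.

```python
import re

def find_banned_words(text, banned):
    """Return the banned words that appear in `text` as whole words, ignoring case."""
    hits = []
    for word in banned:
        if re.search(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE):
            hits.append(word)
    return hits

banned = ["synergy", "leverage", "robust"]
draft = "We leverage Synergy across teams."
print(find_banned_words(draft, banned))
```

If the list comes back non-empty, you can retry the prompt or feed the violations back to the model as a correction.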
ReAct (Reason + Act)
Interleave reasoning steps with tool calls. The pattern underlies many tool-using agents.
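The ReAct loop can be sketched in a few lines. Everything here is an assumption made for illustration: the `Action: tool[input]` and `Final Answer:` text conventions, the toy `calculator` tool, and `scripted_model`, which stands in for a real LLM call so the example runs offline.

```python
import re

def calculator(expression):
    """Toy tool: evaluates 'a + b' or 'a * b' for two integers."""
    left, op, right = expression.split()
    a, b = int(left), int(right)
    if op == "+":
        return str(a + b)
    if op == "*":
        return str(a * b)
    raise ValueError(f"unsupported operator: {op}")

TOOLS = {"calculator": calculator}
ACTION = re.compile(r"Action: (\w+)\[(.*?)\]")
FINAL = re.compile(r"Final Answer: (.*)")

def react(question, model, max_steps=5):
    """Alternate model steps and tool observations until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += step + "\n"
        final = FINAL.search(step)
        if final:
            return final.group(1).strip(), transcript
        action = ACTION.search(step)
        if action:
            name, arg = action.groups()
            tool = TOOLS.get(name)
            observation = tool(arg) if tool else f"unknown tool {name}"
            transcript += f"Observation: {observation}\n"  # fed back on the next step
    return None, transcript

def scripted_model(transcript):
    """Stand-in for an LLM: acts first, then answers once an observation exists."""
    if "Observation:" not in transcript:
        return "Thought: I need to multiply.\nAction: calculator[6 * 7]"
    return "Thought: I have the result.\nFinal Answer: 42"

answer, trace = react("What is 6 times 7?", scripted_model)
print(trace)
```

The step cap matters in practice: without `max_steps`, a model that never emits a final answer would loop forever.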
Prompts that think.
For reasoning, math, or multi-step logic, you must give the model space to work.
The think-then-answer pattern
First, think through the problem inside <scratchpad> tags.
Consider edge cases. List your assumptions.
Then provide your final answer inside <answer> tags.
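Tagged output pays off when you parse the reply. A minimal sketch, assuming the model follows the tag instructions above; `extract_tag` is an illustrative helper and the sample reply is hand-written, not real model output.

```python
import re

THINK_THEN_ANSWER = (
    "First, think through the problem inside <scratchpad> tags.\n"
    "Consider edge cases. List your assumptions.\n"
    "Then provide your final answer inside <answer> tags.\n"
)

def extract_tag(text, tag):
    """Return the content of the first <tag>...</tag> pair, or None if it is absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None

# Hand-written stand-in for a model reply.
reply = "<scratchpad>\n17 * 3 = 51. No edge cases.\n</scratchpad>\n<answer>51</answer>"
print(extract_tag(reply, "answer"))
```

Returning `None` when the tag is missing gives you a clean signal to retry instead of silently passing the scratchpad downstream.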
Self-critique
Ask for an answer. Ask the model to critique it. Ask it to revise. Quality often improves noticeably.
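The answer, critique, revise sequence maps onto three model calls. In this sketch `generate` is any function that takes a prompt string and returns text. It is a placeholder for whichever client you use, and the prompt wording is only one possible phrasing.

```python
def self_critique(task, generate, rounds=1):
    """Draft an answer, then critique and revise it `rounds` times."""
    draft = generate(f"Task: {task}\nWrite your best answer.")
    for _ in range(rounds):
        critique = generate(
            f"Task: {task}\nAnswer: {draft}\nList concrete weaknesses of this answer."
        )
        draft = generate(
            f"Task: {task}\nAnswer: {draft}\nCritique: {critique}\n"
            "Rewrite the answer to address the critique."
        )
    return draft

# Stand-in model that records each prompt, so the flow can be inspected offline.
calls = []
def fake_generate(prompt):
    calls.append(prompt)
    return f"reply {len(calls)}"

result = self_critique("Summarize the report in one sentence.", fake_generate)
print(len(calls), result)
```

Each round costs two extra calls, so one round is a sensible default until your evaluation shows more rounds help.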
"Reasoning is just giving the model permission to write more before deciding."
Per-model quirks.
Claude
Loves XML tags: <role>, <task>, <context>, <examples>. Most important instruction first.
GPT-4 / 5
Markdown-first. ## headers, numbered lists. Responds well to "step 1, step 2" instructions.
Gemini
Examples-first. Lead with 1–2 demonstrations before stating the task. Strong at multimodal.
Llama / open
Keep it tight. Smaller open-weight models tend to lose track of long prompts. One example, clear task, exit.
Mistral
Direct, instructional. Dislikes role-play wrappers. Prefers brevity.
Universal
Markdown sections, no model-specific syntax, explicit format, one example.
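If you keep the prompt's components as data, switching between the XML and Markdown styles described above becomes a rendering choice. This is a sketch with a hypothetical `render` helper. Whether a given model really prefers one style is something to confirm by testing, not something the code decides.

```python
def render(sections, style="markdown"):
    """Render {name: text} sections as Markdown headers or XML-style tags."""
    blocks = []
    for name, body in sections.items():
        if style == "xml":
            blocks.append(f"<{name}>\n{body}\n</{name}>")
        elif style == "markdown":
            blocks.append(f"## {name.capitalize()}\n{body}")
        else:
            raise ValueError(f"unknown style: {style}")
    return "\n".join(blocks)

sections = {
    "role": "You are a senior copy editor.",
    "task": "Tighten the paragraph below.",
}
print(render(sections, "xml"))
print(render(sections, "markdown"))
```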
Iteration and evaluation.
You don't write a great prompt. You iterate to one.
The iteration loop
- Run the prompt on 5–10 representative inputs.
- Score each output against a rubric you write down.
- Diagnose the worst output. Ask why it failed.
- Patch the prompt to fix that specific failure.
- Re-run. Make sure the patch didn't break the working cases.
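The loop above can be sketched as a small harness. The rubric here is a dictionary of named pass/fail checks, which is one simple way to write a rubric down. `evaluate` and `fake_model` are illustrative names, and the fake model exists only so the example runs without an API.

```python
def evaluate(prompt_template, cases, model, rubric):
    """Run every case, score it against the rubric, and return results worst-first."""
    results = []
    for case in cases:
        output = model(prompt_template.format(input=case))
        failed = [name for name, check in rubric.items() if not check(output)]
        results.append({
            "input": case,
            "output": output,
            "score": 1 - len(failed) / len(rubric),  # fraction of checks passed
            "failed": failed,
        })
    results.sort(key=lambda r: r["score"])  # worst output first, ready to diagnose
    return results

def fake_model(prompt):
    """Stand-in for a model call: echoes the last line, in capitals if it is long."""
    text = prompt.splitlines()[-1]
    return text.upper() if len(text) > 10 else text

rubric = {
    "under_20_chars": lambda out: len(out) <= 20,
    "not_all_caps": lambda out: not out.isupper(),
}
cases = ["hello", "a much longer input sentence"]
results = evaluate("Rewrite politely:\n{input}", cases, fake_model, rubric)
print(results[0]["input"], results[0]["failed"])
```

Re-running the same harness after each patch is what catches a fix that breaks the cases that used to pass.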
"The best prompt engineers are also the best at admitting their prompt is bad and fixing it."
Pitfalls and how to avoid them.
Vague roles
"You are an expert" is empty. Specify domain, seniority, context.
Compound tasks
Six things in one prompt → mediocre output on all six. Split it.
No format spec
If you don't specify format, the model invents one. Always state structure.
Implicit assumptions
Make every assumption explicit — audience, tone, taboos, constraints.
Trusting the first run
Run the prompt 5 times. The variance is what tells you whether it's reliable.
Ignoring the model
A prompt for GPT-4 may flop on Gemini. Test on the actual model you'll deploy on.
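For the "trusting the first run" pitfall, a quick consistency check makes the variance visible. A minimal sketch: `consistency` is a hypothetical helper, exact string equality is a crude assumption about what counts as "the same answer", and the scripted replies stand in for five real model calls.

```python
from collections import Counter

def consistency(prompt, model, runs=5):
    """Call the model `runs` times; return the most common output and its share of runs."""
    outputs = [model(prompt) for _ in range(runs)]
    counts = Counter(outputs)
    top_output, top_count = counts.most_common(1)[0]
    return top_output, top_count / runs, counts

# Scripted replies standing in for five runs of the same prompt.
replies = iter(["yes", "yes", "no", "yes", "yes"])
top, agreement, counts = consistency("Is 91 prime?", lambda prompt: next(replies))
print(top, agreement)
```

For free-form text you would normalize or score outputs before comparing them, since two correct answers rarely match character for character.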