how-it-works ai-writing-voice voice-profile ai-writing-style

Wilfred OkajevoApril 8, 20266 min read

How to train AI on your writing style: inside the extraction engine

5-10 samples per format. One extraction. About 90 seconds. Here's what happens inside.

Your writing samples in, your voice profile out. 5-10 samples per format, one extraction, about 90 seconds.

What comes out is a layered voice profile in Markdown: readable, portable, editable, and entirely yours to edit or change.

This post explains what goes in, what the engine analyzes, and what prevents it from making things up.

What goes in, what comes out

You give the engine writing samples: tweets, blog posts, emails, whatever you write regularly. 5-10 per format is enough. The engine needs signal, not volume.

What comes back is a Markdown voice profile with two layers:

Core identity. The patterns that hold across everything you write. Word choices, sentence rhythms, argument tendencies, analogy sources. These are stable. They show up in your tweets and your blog posts and your emails.
Format-specific contexts. The patterns that shift depending on what you're writing. Your tweets have a different density than your essays. Your emails shift register. The engine captures those differences separately so generation stays accurate per format.

The whole process takes about 90 seconds: no training, no fine-tuning, one extraction pass over your samples.

What the engine analyzes

Six dimensions. Each one captures a different layer of how you write.

Word choices and vocabulary. Which words you reach for. Which ones you avoid. Not just the obvious stuff (formal vs. casual). The specific verbs, qualifiers, and transitions that show up consistently. The words you never use are just as revealing as the ones you do.

Sentence patterns. Length distribution. How you open sentences. How you end them. Punctuation habits. Whether you write in fragments or compound constructions. These patterns are automatic for most writers. You don't choose them consciously. That's why they're so distinctive.

Argument structures. Do you lead with the conclusion or build to it? How many supporting points before you move on? Do you use concessions ("sure, but...") or just assert? The shape of your arguments is a fingerprint.

Rhetorical moves. Named patterns the engine identifies and labels. How you use analogies. How you transition between ideas. How you open and close pieces. These moves repeat across your writing, and they're specific enough to name: "opens with a counterintuitive claim," "uses a concrete counting move to ground abstractions."

Analogy domains. Where your metaphors come from. Cooking, construction, sports, software, physics. Most writers draw from 2-3 domains habitually. The engine maps these and notes which domains you never touch.

Format-specific pacing. How your sentence structure shifts between tweets and long-form. Tweets compress. Blog posts breathe. The pacing is measurably different, and the engine captures the specific ways yours diverge.

6 dimensions of voice extraction: word choices, sentence patterns, argument structure, rhetorical moves, analogy domains, format-specific pacing

Every finding traces back to real writing

This is the part that matters most for trust.

Every pattern in the profile includes cited examples from your actual writing. The engine doesn't just say "you tend to use short declarative sentences." It points to the specific sentences. Real quotes from your real samples.

Then the quality check runs. It verifies that every cited example actually exists in the source material. If something doesn't check out, the profile gets regenerated.

Nothing fabricated, nothing inferred from vibes, nothing aspirational. If the engine says you do something, it can show you where.

Across all test runs: 0 fabricated examples. Every finding traces back to real writing.

The benchmark

We tested the extraction engine against a 300-line hand-crafted voice guide we'd built manually over several weeks. Same writing samples. Line-by-line comparison.

The results:

90% coverage. Nine out of ten hand-documented patterns appeared in the engine's output. Different phrasing sometimes. Same underlying observations.

8 patterns the engine caught that the human author missed. Consistent punctuation avoidance. Analogy clustering across pieces. Concession-specific sentence constructions. Patterns that were too granular or too automatic for us to notice ourselves.

0 fabrications. Every pattern the engine produced traced back to evidence in the source text.

The engine doesn't just match human analysis. It extends it. Finds threads you didn't know were there. And it does it in 90 seconds instead of several weeks.

Why Markdown profiles beat the alternatives

There are other ways to get AI to write like you. Each one trades something important.

Approach	Input required	Voice depth	Portability	Inspectable
System prompts (manual)	Hours of self-analysis	~10% of patterns	Copy-paste	Yes
Fine-tuning	Thousands of examples	Deep but opaque	Locked to one model	No
ChatGPT memory	Conversations over time	Captures what you tell it, not what it discovers	Locked to ChatGPT	Partially
Training over time	Ongoing corrections	Degrades, resets	Not portable	No
Noren voice profile	5-10 samples per format	50+ named patterns	Works with any LLM	Fully readable and editable

System prompts capture what you can articulate about your own writing. That's about 10%. Fine-tuning captures depth but locks you to one model and produces a black box you can't inspect or edit. ChatGPT memory knows what you've told it, not what it's discovered. Training-over-time approaches degrade and don't transfer.

A Noren profile is a Markdown file. Open it. Read every pattern. Change anything you disagree with. Feed it to Claude, GPT, Gemini, or a local model. It's yours.

Technical choices

Noren is a Tauri app (Rust backend, web frontend). About 10MB binary. Sits in your menu bar.

Provider-agnostic. Bring your own API key for Anthropic, OpenAI, Gemini, or run local models through Ollama. Your writing samples and your voice profile stay on your machine. No telemetry. No data collection.

Profiles are Markdown because Markdown is durable. It works with every LLM. It works in five years. It works if we disappear tomorrow. The fabric of your voice shouldn't depend on our infrastructure existing.

macOS-first. Deep accessibility API integration for Cmd+K overlay. Write in whatever app you're already using.

Open source app, your data

The desktop app is open source. The extraction engine is proprietary, but the profile it produces is yours. Markdown file. Read it, edit it, take it anywhere.

Data stays local. Nothing leaves your machine unless you choose to use a cloud API provider. No telemetry.

Try Noren free or see pricing details.

FAQ

How many writing samples do I need?

5-10 per format where you write: tweets, blog posts, emails, etc. The engine needs signal variety, not volume. Include samples from different time periods and contexts so it sees what's stable versus what shifts.

What if I don't write enough to have 5-10 samples?

Start with what you have. The more samples, the richer the profile. But even 3 solid samples in a format can produce usable patterns. Add more as you write.

Can I edit the profile after extraction?

Yes. The profile is a readable Markdown file. You can change any pattern, remove ones you disagree with, or add new ones you notice. The profile is yours to inspect and customize.

Does the profile work with all AI models?

Yes. It's a system prompt in Markdown format. Works with Claude, GPT, Gemini, Ollama, or any LLM that accepts text system prompts. If the model is available as an API, the profile works.

How often should I re-extract?

Re-extract when your writing changes noticeably. Roughly every 6 months if you're actively writing. The extraction captures what's current, so periodic updates keep the profile aligned with your actual voice.

What if the profile has a pattern I disagree with?

Change it. The profile is a tool for you, not a constraint. If the engine found a pattern that doesn't feel like you, edit it out or modify the description.

Yes. It's a Markdown file. Share it with a ghostwriter, colleague, or team member. They can load it into their preferred AI tool and generate in your voice.

Wilfred Okajevo

ML and cognitive science engineer. Spent 5 months researching how writing patterns encode identity. Built a voice extraction engine that scores 90% coverage with 0 fabrications. Cofounder, Noren.

XLinkedIn

Your pattern is waiting.

Extract your writing patterns. Generate text that sounds like you.