Slugify, Case-convert and Clean Text for the Web
Text that looks fine on screen is often quietly messy underneath: a double space here, a stray tab there, a title in the wrong case, a URL slug with characters that break links. None of it is dramatic, but it causes the small, maddening bugs that waste an afternoon. A short, repeatable cleaning routine fixes the lot before it reaches your page or database.
Step 1: tame the whitespace
Whitespace is the silent troublemaker. Copying from a PDF, a spreadsheet or a chat app drags along double spaces, tab characters, non-breaking spaces and trailing blanks at the ends of lines. These break visual alignment, throw off exact-match searches and inflate character counts. Collapsing repeated spaces into one and trimming the ends is the first thing to do. A whitespace remover does it in a click, and a letter counter lets you confirm the length afterwards.
Always clean whitespace first. If you slugify or case-convert before trimming, stray spaces carry through to the result. In a slug, for instance, they turn into extra hyphens.
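For anyone scripting this step rather than using a tool, here is a minimal sketch in Python. The function name clean_whitespace is made up for illustration, and the sketch assumes you want to keep line breaks while tidying each line on its own.

```python
import re


def clean_whitespace(text: str) -> str:
    """Collapse runs of whitespace to one space and trim each line."""
    cleaned_lines = []
    for line in text.splitlines():
        # On Python 3 strings, \s matches Unicode whitespace, which covers
        # tabs and the non-breaking space (U+00A0) as well as plain spaces.
        collapsed = re.sub(r"\s+", " ", line)
        cleaned_lines.append(collapsed.strip())
    return "\n".join(cleaned_lines)


messy = "  Hello\t\tworld\u00a0 \n second   line  "
tidy = clean_whitespace(messy)
print(repr(tidy))  # 'Hello world\nsecond line'
print(len(tidy))   # 23, the length check a letter counter would give you
```

If the text is a single title rather than a block of lines, applying the same substitution to the whole string would also fold line breaks into spaces, which is usually what you want there.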
Step 2: fix the case
Inconsistent capitalisation reads as careless and can break sorting and matching. Decide on a convention for the context (sentence case for body copy, title case for headings, lowercase for tags and slugs) and apply it uniformly. A case converter switches a whole block between sentence case, Title Case, UPPERCASE and lowercase without retyping, which is far less error-prone than fixing words by hand.
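A rough sketch of what a case converter does under the hood, in Python. The function names and the style keys are illustrative assumptions, and both conversions are deliberately naive: sentence case only capitalises the very first letter (so proper nouns are lost), and title case capitalises every word, including short ones that a style guide might leave lowercase.

```python
def to_sentence_case(text: str) -> str:
    """Lowercase everything, then capitalise the first character."""
    lowered = text.lower()
    return lowered[:1].upper() + lowered[1:]


def to_title_case(text: str) -> str:
    """Capitalise the first letter of each space-separated word."""
    # str.title() is avoided here because it also capitalises the letter
    # after an apostrophe, turning "it's" into "It'S".
    words = text.split(" ")
    return " ".join(word[:1].upper() + word[1:].lower() for word in words)


# Hypothetical style names mapped to the function that applies them.
CONVERTERS = {
    "sentence": to_sentence_case,
    "title": to_title_case,
    "upper": str.upper,
    "lower": str.lower,
}


def convert_case(text: str, style: str) -> str:
    """Apply one convention uniformly to a whole block of text."""
    return CONVERTERS[style](text)


print(convert_case("CLEAN text FOR the web", "sentence"))  # Clean text for the web
print(convert_case("it's a CLEAN title", "title"))         # It's A Clean Title
print(convert_case("Web Tips", "lower"))                   # web tips
```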
Step 3: generate a clean slug
A slug is the readable tail of a URL, like clean-text-for-the-web. A good one is lowercase, uses hyphens between words, and contains only URL-safe characters. Making one means lowercasing the title, replacing spaces with hyphens, and stripping punctuation and accents that have no business in a link. A slug generator handles the edge cases, like accented letters, ampersands and double hyphens, that hand-typed slugs usually get wrong.
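The steps above can be sketched in a few lines of Python using only the standard library. This is one possible approach, not how any particular slug generator works: the function name slugify is chosen for illustration, spelling out "&" as "and" is an English-only assumption, and characters with no ASCII equivalent (non-Latin scripts, for example) are simply dropped.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphenated, ASCII-only slug."""
    # Assumption: spell out ampersands so "R&D" keeps its meaning.
    text = title.replace("&", " and ")
    # Split accented letters into base letter + combining mark, then
    # discard anything that is not plain ASCII (the marks fall away).
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.lower()
    # Every run of anything other than a-z or 0-9 becomes one hyphen,
    # so spaces, punctuation and double hyphens all collapse together.
    text = re.sub(r"[^a-z0-9]+", "-", text)
    # Drop hyphens left at either end.
    return text.strip("-")


print(slugify("Clean Text for the Web"))            # clean-text-for-the-web
print(slugify("  Crème Brûlée & Café -- Tips!  "))  # creme-brulee-and-cafe-tips
```

Note that an apostrophe counts as punctuation here, so "Don't" comes out as don-t. Removing apostrophes before the substitution is a reasonable variation if that matters for your titles.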
Why lowercase slugs matter
On some servers, URL paths are case-sensitive, which means /About and /about can be treated as two different pages. That invites duplicate-content problems and broken links. Sticking to lowercase, hyphenated slugs sidesteps the whole class of issues and keeps links predictable.
The routine, in order
- Collapse repeated whitespace and trim the ends.
- Convert to the case the context calls for.
- Generate the slug last, from the cleaned title.
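To see why the order matters, compare a literal "lowercase and swap spaces for hyphens" slug on raw pasted text with the same title run through the three steps. This is an illustrative Python sketch; naive_slug and run_routine are made-up names, and the case step assumes the lowercase convention used for slugs.

```python
import re


def naive_slug(text: str) -> str:
    """Lowercase and replace each space with a hyphen, with no cleaning."""
    return text.lower().replace(" ", "-")


def run_routine(raw: str) -> str:
    """Whitespace first, then case, then the slug from the cleaned title."""
    cleaned = re.sub(r"\s+", " ", raw).strip()           # 1. collapse and trim
    cased = cleaned.lower()                              # 2. case for a slug
    return re.sub(r"[^a-z0-9]+", "-", cased).strip("-")  # 3. slug last


pasted = "  Clean  Text for the Web "
print(naive_slug(pasted))   # --clean--text-for-the-web-
print(run_routine(pasted))  # clean-text-for-the-web
```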
Run in that order every time and pasted text stops being a source of surprises. If your raw text is structured data rather than prose, a JSON formatter is the equivalent tidy-up step before you store or display it.
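For the structured-data case, a JSON tidy-up can be as small as a parse followed by a re-serialise. A minimal Python sketch using the standard json module; the function name format_json and the choice to sort keys are assumptions for illustration.

```python
import json


def format_json(raw: str) -> str:
    """Parse a JSON string and re-emit it with consistent indentation."""
    # json.loads raises json.JSONDecodeError if the input is not valid JSON,
    # which doubles as a validity check before storing or displaying it.
    data = json.loads(raw)
    # ensure_ascii=False keeps accented and non-Latin characters readable.
    return json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False)


print(format_json('{"b":1,"a":[1,2]}'))
```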
Frequently asked questions
What is a slug?
A slug is the human-readable part of a URL that identifies a page, usually lowercase words separated by hyphens. It is made by lowercasing a title, replacing spaces with hyphens and removing characters that are unsafe in URLs.
Why does pasted text break my layout or search?
Pasted text often carries invisible extras: double spaces, tab characters, non-breaking spaces and trailing blanks. These can break alignment, throw off exact matches and inflate character counts, so trimming them first prevents subtle bugs.
Should slugs be uppercase or lowercase?
Lowercase. URLs can be case-sensitive on some servers, so mixing cases risks ending up with two distinct URLs, such as /About and /about, for what should be one page. Lowercase, hyphenated slugs avoid duplicate-content and broken-link problems.
What order should I clean text in?
Collapse and trim whitespace first, then fix the case, then generate the slug last. Cleaning whitespace before slugifying ensures stray spaces do not turn into stray hyphens in the final URL.