Regex Basics Every Developer Should Know

Regular expressions look like line noise until you learn that they are built from a small number of pieces. Master maybe eight of them and you can handle the vast majority of everyday matching: finding patterns, validating input loosely, and search-and-replace across a file. This is the core set, with examples you can paste straight into a regex tester to watch them work.

Literal characters and the dot

Most characters in a regex match themselves: cat matches the text “cat”. The exceptions are a handful of special characters. The most important is the dot ., which matches any single character except a line break. That power is also the most common beginner trap: to match a literal full stop you must escape it as \., otherwise it matches any character in that position.
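To see the trap in action, here is a small sketch using Python's built-in re module. The sample string is made up for illustration; any tester will show the same behaviour.

```python
import re

sample = "3.14 3x14 3\n14"

# An unescaped dot accepts any single character except a line break,
# so "3x14" matches but "3\n14" does not.
unescaped = re.findall(r"3.14", sample)

# Escaping the dot restricts the match to a literal full stop.
escaped = re.findall(r"3\.14", sample)

print(unescaped)  # ['3.14', '3x14']
print(escaped)    # ['3.14']
```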

Character classes

Square brackets define a set of allowed characters. [aeiou] matches any one vowel; [0-9] matches a digit; [a-zA-Z] matches any unaccented letter. A caret as the first character inside the brackets negates the set, so [^0-9] matches anything that is not a digit. There are shorthands too: \d for a digit, \w for a word character (a letter, digit, or underscore), and \s for whitespace.
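A quick sketch of those classes in Python, run against an invented sample string:

```python
import re

text = "Room 42b, floor 7"

vowels = re.findall(r"[aeiou]", text)     # one match per lowercase vowel
digits = re.findall(r"[0-9]", text)       # one match per digit
only_digits = re.sub(r"[^0-9]", "", text) # delete everything that is not a digit
words = re.findall(r"\w+", text)          # runs of word characters

print(vowels)       # ['o', 'o', 'o', 'o']
print(digits)       # ['4', '2', '7']
print(only_digits)  # 427
print(words)        # ['Room', '42b', 'floor', '7']
```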

Quantifiers: how many

Quantifiers say how many times the previous item may repeat:

  • *: zero or more
  • +: one or more
  • ?: zero or one (optional)
  • {2,4}: between two and four times

So \d{3}-\d{4} matches three digits, a hyphen, then four digits (such as 555-1234), and colou?r matches both “color” and “colour”.
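The quantifiers above can be tried out like this in Python. The sample strings are invented, and fullmatch is used where the whole string has to fit the pattern.

```python
import re

# {3} and {4} are exact counts: three digits, a hyphen, four digits.
phone = re.search(r"\d{3}-\d{4}", "call 555-1234 now")
phone_text = phone.group() if phone else None

# ? makes the preceding "u" optional; a doubled "u" does not match.
spellings = re.findall(r"colou?r", "color colour colouur")

# * allows zero repeats, + requires at least one.
star_ok = re.fullmatch(r"ab*", "a") is not None
plus_ok = re.fullmatch(r"ab+", "a") is not None

# {2,4} accepts two to four repeats and nothing outside that range.
candidates = ["a", "aa", "aaaa", "aaaaa"]
ranged = [s for s in candidates if re.fullmatch(r"a{2,4}", s)]

print(phone_text)        # 555-1234
print(spellings)         # ['color', 'colour']
print(star_ok, plus_ok)  # True False
print(ranged)            # ['aa', 'aaaa']
```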

Anchors: where it matches

Anchors pin a match to a position rather than a character. ^ means the start and $ means the end. In most engines that is the start and end of the whole input by default, and of each line once multiline mode is switched on. ^\d+$ matches input that is entirely digits, which is the usual way to say “this whole field must be a number.” Without anchors, a pattern matches anywhere inside the text.

Many regex bugs come from forgetting anchors. \d+ matches the digits inside “abc123”; ^\d+$ rejects it. Decide up front: do you mean “contains” or “is”?
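The “contains” versus “is” distinction might look like this in Python. The function names are hypothetical helpers, not part of any library. One Python-specific detail worth knowing: $ also matches just before a trailing newline, so re.fullmatch is the stricter way to say “is”.

```python
import re

def contains_digits(text: str) -> bool:
    """True if a run of digits appears anywhere in the text."""
    return re.search(r"\d+", text) is not None

def is_all_digits(text: str) -> bool:
    """Anchored check: the text must be digits from start to end."""
    return re.search(r"^\d+$", text) is not None

def is_all_digits_strict(text: str) -> bool:
    """Like is_all_digits, but also rejects a trailing newline."""
    return re.fullmatch(r"\d+", text) is not None

print(contains_digits("abc123"))       # True
print(is_all_digits("abc123"))         # False
print(is_all_digits("123"))            # True
print(is_all_digits("123\n"))          # True: $ matches before the final newline
print(is_all_digits_strict("123\n"))   # False
```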

Groups and alternation

Parentheses group part of a pattern so a quantifier applies to the whole group, and they also capture what matched for reuse. The pipe | means “or.” So (cat|dog)s? matches cat, cats, dog or dogs. Capturing groups are what power search-and-replace: you match with a group and refer back to it in the replacement.
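A short Python sketch of both uses of parentheses: grouping for alternation, and capturing for search-and-replace. The date-reordering pattern is an invented example; in Python replacements, \1, \2 and \3 refer back to the first, second and third groups.

```python
import re

text = "cat cats dog dogs"

# group(0) is the whole match; group(1) is what the parentheses captured.
matches = [m.group(0) for m in re.finditer(r"(cat|dog)s?", text)]
captured = [m.group(1) for m in re.finditer(r"(cat|dog)s?", text)]

# Capture year, month and day, then reorder them in the replacement.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "Due 2024-01-31")

print(matches)    # ['cat', 'cats', 'dog', 'dogs']
print(captured)   # ['cat', 'cat', 'dog', 'dog']
print(reordered)  # Due 31/01/2024
```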

Greedy vs lazy

By default quantifiers are greedy: they match as much as possible. <.*> against <a><b> matches the whole thing, not just the first tag. Add a ? to make it lazy (<.*?>) and it matches the smallest piece, one tag at a time. When a pattern “matches too much,” greed is usually the culprit.
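Here is the same tag example as a Python sketch, with a third variant that avoids the issue by using a negated character class instead of the dot:

```python
import re

html = "<a><b>"

greedy = re.findall(r"<.*>", html)       # stretches to the last ">"
lazy = re.findall(r"<.*?>", html)        # stops at the first ">"
specific = re.findall(r"<[^>]*>", html)  # cannot cross a ">" at all

print(greedy)    # ['<a><b>']
print(lazy)      # ['<a>', '<b>']
print(specific)  # ['<a>', '<b>']
```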

A word on email validation

Resist the urge to write the perfect email regex. The full specification is so permissive that a complete pattern is monstrous and still cannot confirm the address exists. A loose check for an @ and a dotted domain catches typos; the only real validation is sending a confirmation message. Test your pattern against real and fake samples in the tester before trusting it.
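A loose check along those lines could be sketched as follows. The pattern and the function name looks_like_email are illustrative assumptions, not a standard; the check deliberately only confirms the rough shape “something@something.something” and says nothing about whether the address exists.

```python
import re

# Loose shape check: no spaces, exactly one @, and a dot somewhere in the domain.
LOOSE_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def looks_like_email(text: str) -> bool:
    """Catch obvious typos only; real validation is a confirmation message."""
    return LOOSE_EMAIL.fullmatch(text) is not None

print(looks_like_email("user@example.com"))  # True
print(looks_like_email("user@example"))      # False: no dot in the domain
print(looks_like_email("user example.com"))  # False: no @ sign
print(looks_like_email("a@b@c.com"))         # False: two @ signs
```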

Frequently asked questions

What does the dot mean in regex?

A plain dot matches any single character except a line break. To match a literal full stop, escape it with a backslash (\.), otherwise it will match far more than you expect.

What is the difference between greedy and lazy matching?

Greedy quantifiers grab as much text as possible and give characters back only when the rest of the pattern needs them to match. Adding a question mark makes a quantifier lazy, so it grabs as little as possible and extends only when it has to. Lazy matching is often what you want when matching between delimiters.

Why is my regex matching too much?

Usually because of a greedy quantifier or an unescaped dot. A pattern like .* is greedy and will stretch across more than one item on a line. Make the quantifier lazy or use a more specific character class.

Should I validate email addresses with regex?

Only loosely. A short pattern that checks for an at sign and a dot in the domain catches obvious mistakes. Fully validating every legal email with regex is impractical, so confirm real addresses by sending a verification message.