Regular Expressions for Developers: A Practical Guide with Real-World Patterns
Regular expressions are one of the most powerful tools in a developer's toolkit. Learn regex syntax, quantifiers, groups, lookaheads, and a collection of real-world patterns for validating email, URLs, dates, and more.
What Are Regular Expressions?
A regular expression (regex) is a sequence of characters that defines a search pattern. Regex engines can test whether a string matches a pattern, extract matching substrings, or replace parts of a string.
Regex is supported in virtually every programming language (with minor syntax differences) and in most text editors, command-line tools (grep, sed), and database engines.
Core Syntax
Literal Characters
Most characters match themselves: cat matches the string "cat".
Character Classes
| Syntax | Matches |
|---|---|
| [abc] | a, b, or c |
| [^abc] | Any character except a, b, c |
| [a-z] | Any lowercase letter |
| [0-9] | Any digit |
| . | Any character except newline |
| \d | Digit [0-9] |
| \w | Word character [a-zA-Z0-9_] |
| \s | Whitespace [ \t\n\r\f\v] |
| \D, \W, \S | Negated versions |
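As a quick illustration of the classes in the table, the sketch below runs a few of them in JavaScript. The sample strings and variable names are made up for this example.

```javascript
// Character classes in action (all inputs are illustrative).
const hasDigit = /\d/.test("room 42");              // true: "4" is a digit
const onlyWordChars = /^\w+$/.test("snake_case1");  // true: letters, digits, underscore
const vowels = "regular".match(/[aeiou]/g);         // ["e", "u", "a"]
const vowelsOnly = "regex".replace(/[^aeiou]/g, ""); // "ee": everything else removed
const fields = "a b\tc\nd".split(/\s/);             // ["a", "b", "c", "d"]

console.log(hasDigit, onlyWordChars, vowels, vowelsOnly, fields);
```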
Quantifiers
| Syntax | Meaning |
|---|---|
| * | 0 or more |
| + | 1 or more |
| ? | 0 or 1 (optional) |
| {n} | Exactly n |
| {n,} | At least n |
| {n,m} | Between n and m |
Quantifiers are greedy by default (match as much as possible). Append ? to make them lazy (match as little as possible): .*?
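The difference between greedy and lazy matching is easiest to see side by side. The following is a small sketch in JavaScript; the sample markup and variable names are invented for illustration.

```javascript
const markup = "<b>one</b> and <b>two</b>";

// Greedy: .* runs to the last possible </b>.
const greedyMatch = markup.match(/<b>.*<\/b>/)[0];   // "<b>one</b> and <b>two</b>"

// Lazy: .*? stops at the first </b> that lets the pattern succeed.
const lazyMatch = markup.match(/<b>.*?<\/b>/)[0];    // "<b>one</b>"

// The same applies to counted quantifiers.
const greedyCount = "2026".match(/\d{2,3}/)[0];      // "202"
const lazyCount = "2026".match(/\d{2,3}?/)[0];       // "20"

console.log(greedyMatch, lazyMatch, greedyCount, lazyCount);
```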
Anchors
| Syntax | Meaning |
|---|---|
| ^ | Start of string (or line in multiline mode) |
| $ | End of string (or line in multiline mode) |
| \b | Word boundary |
| \B | Non-word boundary |
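Anchors match positions rather than characters. The sketch below shows word boundaries and the effect of multiline mode on ^ in JavaScript; the inputs are illustrative only.

```javascript
// \b matches only where a word starts or ends.
const wholeWord = /\bcat\b/.test("a cat sat");       // true
const insideWord = /\bcat\b/.test("concatenate");    // false: "cat" sits inside a word
const innerOnly = /\Bcat\B/.test("concatenate");     // true: no boundary on either side

// ^ matches only at the start of the string unless the m flag is set.
const twoLines = "first line\nsecond line";
const stringStart = twoLines.match(/^\w+/g);         // ["first"]
const lineStarts = twoLines.match(/^\w+/gm);         // ["first", "second"]

console.log(wholeWord, insideWord, innerOnly, stringStart, lineStarts);
```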
Groups and References
(abc) — capturing group
(?:abc) — non-capturing group
(?<name>abc) — named capturing group
\1 — backreference to group 1
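A short JavaScript sketch of the constructs listed above. The inputs are invented; the \k<name> form, which JavaScript uses to refer back to a named group, is not listed above and is included here as an extra.

```javascript
// Backreference: \1 must repeat exactly what group 1 captured.
const repeatedWord = /\b(\w+) \1\b/;
const hasRepeat = repeatedWord.test("it is is done");   // true ("is is")
const noRepeat = repeatedWord.test("it is done");       // false

// Non-capturing group: groups without taking a capture slot.
const monthOnly = "2026-04".match(/(?:\d{4})-(\d{2})/)[1];  // "04" is group 1

// Captures can be reused in a replacement string as $1, $2, ...
const swapped = "Doe, Jane".replace(/(\w+), (\w+)/, "$2 $1"); // "Jane Doe"

// Named group with a named backreference: closing quote must equal opening quote.
const quoted = "say 'hi' now".match(/(?<quote>['"]).*?\k<quote>/)[0]; // "'hi'"

console.log(hasRepeat, noRepeat, monthOnly, swapped, quoted);
```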
Lookaheads and Lookbehinds
(?=...) — positive lookahead: followed by
(?!...) — negative lookahead: not followed by
(?<=...) — positive lookbehind: preceded by
(?<!...) — negative lookbehind: not preceded by
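Lookarounds assert what comes before or after a position without including it in the match. The sketch below uses made-up price strings; note that lookbehind needs a reasonably recent JavaScript engine.

```javascript
const prices = "price: 100 USD, 200 EUR";

// Positive lookahead: digits followed by " USD" (the suffix is not part of the match).
const usdAmounts = prices.match(/\d+(?= USD)/g);        // ["100"]

// Negative lookahead: whole numbers not followed by " USD".
// The \b stops the engine from backtracking to a partial number such as "10".
const otherAmounts = prices.match(/\d+\b(?! USD)/g);    // ["200"]

const costs = "cost $15 or 20 units";

// Positive lookbehind: digits preceded by a dollar sign.
const dollarAmounts = costs.match(/(?<=\$)\d+/g);       // ["15"]

// Negative lookbehind: whole numbers not preceded by a dollar sign.
const plainNumbers = costs.match(/(?<!\$)\b\d+/g);      // ["20"]

console.log(usdAmounts, otherAmounts, dollarAmounts, plainNumbers);
```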
Flags
| Flag | Meaning |
|---|---|
| i | Case-insensitive |
| g | Global (find all matches, not just first) |
| m | Multiline (^ and $ match line start/end) |
| s | Dotall (. matches newlines too) |
| u | Unicode mode |
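To make the effect of each flag concrete, here is a small JavaScript sketch comparing the same pattern with and without a flag. The inputs are illustrative.

```javascript
// i: ignore case.
const ignoreCase = /hello/i.test("HeLLo");           // true

// g: act on every match instead of only the first.
const firstOnly = "a-b-c".replace(/-/, "+");         // "a+b-c"
const everyMatch = "a-b-c".replace(/-/g, "+");       // "a+b+c"

// s: let . match a newline.
const dotDefault = /a.b/.test("a\nb");               // false
const dotAll = /a.b/s.test("a\nb");                  // true

// u: treat the string as code points rather than UTF-16 code units.
const emoji = "\u{1F600}";                           // one code point, two code units
const withoutUnicode = /^.$/.test(emoji);            // false
const withUnicode = /^.$/u.test(emoji);              // true

console.log(ignoreCase, firstOnly, everyMatch, dotDefault, dotAll, withoutUnicode, withUnicode);
```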
Using Regex in JavaScript
// Test for a match
/^\d+$/.test("12345"); // true
/^\d+$/.test("123a5"); // false
// Extract first match
"hello world".match(/\w+/); // ["hello"]
// Extract all matches (global flag)
"hello world".match(/\w+/g); // ["hello", "world"]
// Replace
"foo bar".replace(/\b\w/g, (c) => c.toUpperCase()); // "Foo Bar"
// Named groups
const { year, month, day } =
"2026-04-01".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/)?.groups ?? {};
Common Patterns
Email (simple)
^[^\s@]+@[^\s@]+\.[^\s@]+$
Note: true email validation per RFC 5321 is extremely complex. Use this simple pattern as a basic format check on user input, but always verify the address by sending a confirmation link.
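A minimal sketch of how the simple pattern might be wrapped in a helper. The function name isPlausibleEmail and the sample addresses are hypothetical; the check only confirms the rough shape "something@something.something".

```javascript
const SIMPLE_EMAIL = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Hypothetical helper: a format check only, not proof that the mailbox exists.
function isPlausibleEmail(input) {
  return typeof input === "string" && SIMPLE_EMAIL.test(input);
}

console.log(isPlausibleEmail("user@example.com"));   // true
console.log(isPlausibleEmail("user@example"));       // false: no dot after the @
console.log(isPlausibleEmail("a b@example.com"));    // false: contains whitespace
console.log(isPlausibleEmail("a@b@example.com"));    // false: more than one @
```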
URL
^https?:\/\/([\w-]+\.)+[\w-]+(\/[\w\-./?%&=]*)?$
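It can help to probe a pattern like this with a few inputs to see where its limits are. The sketch below uses invented URLs; as written, the pattern requires at least one dot in the host and does not allow a port or a fragment.

```javascript
const SIMPLE_URL = /^https?:\/\/([\w-]+\.)+[\w-]+(\/[\w\-./?%&=]*)?$/;

const acceptsHttps = SIMPLE_URL.test("https://example.com/path?x=1");  // true
const acceptsFtp = SIMPLE_URL.test("ftp://example.com");               // false: scheme must be http or https
const acceptsPort = SIMPLE_URL.test("http://localhost:3000");          // false: no dot in host, port not allowed
const acceptsFragment = SIMPLE_URL.test("https://example.com/#top");   // false: "#" is not in the path class

console.log(acceptsHttps, acceptsFtp, acceptsPort, acceptsFragment);
```

If those cases matter, parsing with the built-in URL constructor and then checking the protocol is usually a more robust starting point than extending the pattern.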
UK Postcode
^[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}$
IPv4 Address
^(\d{1,3}\.){3}\d{1,3}$
(This allows invalid ranges like 999.999.999.999 — add numeric range checks in code.)
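One way to add the numeric range check is to let the regex confirm the shape and then test each octet in code. This is a sketch; the function name isValidIPv4 is hypothetical, and it deliberately does not reject leading zeros such as 01.2.3.4.

```javascript
const IPV4_SHAPE = /^(\d{1,3}\.){3}\d{1,3}$/;

// Hypothetical helper: shape check via regex, range check via arithmetic.
function isValidIPv4(input) {
  if (!IPV4_SHAPE.test(input)) return false;
  // The regex guarantees four all-digit parts, so Number() is safe here.
  return input.split(".").every((octet) => Number(octet) <= 255);
}

console.log(isValidIPv4("192.168.0.1"));       // true
console.log(isValidIPv4("999.999.999.999"));   // false: octets out of range
console.log(isValidIPv4("1.2.3"));             // false: only three octets
```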
ISO 8601 Date (YYYY-MM-DD)
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
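The pattern above checks the shape of a date but still accepts impossible days such as 2026-02-31. A possible follow-up check, sketched below with a hypothetical isRealDate helper, builds a Date from the captured parts and confirms that nothing rolled over into the next month.

```javascript
// Same pattern as above, with the year wrapped in a capturing group.
const ISO_DATE = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

// Hypothetical helper: regex for the shape, Date for the calendar.
function isRealDate(input) {
  const match = ISO_DATE.exec(input);
  if (!match) return false;
  const [year, month, day] = match.slice(1).map(Number);
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day);  // overflowing days roll into the next month
  return date.getUTCMonth() === month - 1 && date.getUTCDate() === day;
}

console.log(isRealDate("2024-02-29"));   // true: 2024 is a leap year
console.log(isRealDate("2026-02-29"));   // false: 2026 is not
console.log(isRealDate("2026-02-31"));   // false: passes the regex, fails the calendar
```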
Hex Colour Code
^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$
Semantic Version
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-([\w.-]+))?(?:\+([\w.-]+))?$
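The capture groups in this pattern make it straightforward to pull a version apart. The sketch below assumes a hypothetical parseSemver helper. Note that \w also admits underscores, so this pattern is somewhat looser than the identifiers the Semantic Versioning specification allows.

```javascript
const SEMVER = /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-([\w.-]+))?(?:\+([\w.-]+))?$/;

// Hypothetical helper: returns the parsed parts, or null if the input does not match.
function parseSemver(version) {
  const match = SEMVER.exec(version);
  if (!match) return null;
  // Optional groups that did not participate are undefined, so defaults apply.
  const [, major, minor, patch, prerelease = null, build = null] = match;
  return {
    major: Number(major),
    minor: Number(minor),
    patch: Number(patch),
    prerelease,
    build,
  };
}

console.log(parseSemver("1.2.3-beta.1+build.5"));
// { major: 1, minor: 2, patch: 3, prerelease: "beta.1", build: "build.5" }
console.log(parseSemver("01.2.3"));   // null: leading zero in major
```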
Performance Pitfalls: Catastrophic Backtracking
Poorly written regex can cause exponential backtracking, hanging the regex engine.
Danger pattern:
(a+)+b
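Against the input "aaaaaaaaaa" (no trailing b), the match must fail, but before giving up the engine tries every way to partition the a characters between the inner and outer quantifiers. The number of attempts roughly doubles with each additional a, which is what makes the backtracking catastrophic.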
Avoid:
- Nested quantifiers with overlapping matches
- Unbounded .* in the middle of a pattern: use anchors or make quantifiers possessive (*+ in PCRE)
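For the danger pattern above, one fix is simply to remove the nesting: (a+)+ can only ever match a run of one or more a characters, so a+b accepts the same strings with a single quantifier. The sketch below checks that equivalence on a few short, made-up inputs (kept short on purpose so the nested version still finishes quickly).

```javascript
const nestedPattern = /(a+)+b/;   // nested quantifiers: exponential on failing input
const flatPattern = /a+b/;        // same language, one quantifier

const samples = ["ab", "aaab", "b", "aaaaaaaaaa", "xaab"];

// Both patterns should agree on every sample.
const patternsAgree = samples.every(
  (sample) => nestedPattern.test(sample) === flatPattern.test(sample)
);

console.log(patternsAgree);                    // true
console.log(flatPattern.test("aaaaaaaaaa"));   // false
```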
For user-supplied regex patterns (e.g., search features), always validate the pattern or run it with a timeout to prevent ReDoS (Regular expression Denial of Service).
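When users only need plain-text search, a simpler alternative to validation or timeouts is to avoid running their input as a pattern at all: escape every metacharacter and match the text literally. The sketch below takes that different approach; the helper names escapeRegExp and buildLiteralSearch and the 100-character limit are assumptions for illustration.

```javascript
// Escape regex metacharacters so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns a case-insensitive literal search, or null for
// input that is empty, not a string, or longer than maxLength (an arbitrary cap).
function buildLiteralSearch(userText, maxLength = 100) {
  if (typeof userText !== "string") return null;
  if (userText.length === 0 || userText.length > maxLength) return null;
  return new RegExp(escapeRegExp(userText), "i");
}

const search = buildLiteralSearch("(a+)+b");
console.log(search.test("the pattern (A+)+B is risky"));  // true: matched as plain text
console.log(search.test("aaab"));                          // false: not run as a pattern
```

If users genuinely need to write their own patterns, this does not apply, and the validation or timeout advice above still stands.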