Regex Explained

A regular expression (regex) is a pattern that describes text. It lets you search for, match, and extract specific patterns from strings, for example finding every email address in a document or checking that a phone number has the right format. Python's `re` module provides regex support.

Why This Matters

String methods like `.find()` and `.startswith()` work for simple cases, but they get unwieldy when the pattern is complex. "Find every word that starts with a capital letter and ends with a digit" is one short regex, while the manual version takes several lines of string handling and tends to break on details like punctuation. Regex is also a portable skill: the core syntax carries over to JavaScript, many SQL databases, and most text editors, although each dialect differs in details such as named-group syntax and supported flags.
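Here is a sketch of that search written both ways. The sample sentence is made up for illustration, and the manual version is just one plausible approach (splitting on whitespace), not the only one.

```python
import re

sample = "Try Route66, route9, B2 or Alpha."

# Regex: a word boundary, one capital letter, any word characters, then a digit at the word's end.
regex_hits = re.findall(r"\b[A-Z]\w*\d\b", sample)

# Manual approach: split on whitespace and test the first and last characters.
# Punctuation stays attached to the words, so "Route66," is missed.
manual_hits = [w for w in sample.split() if w[:1].isupper() and w[-1:].isdigit()]

print(regex_hits)   # ['Route66', 'B2']
print(manual_hits)  # ['B2']
```

The manual version can be patched to strip punctuation, but each patch adds code, while the regex already treats `\b` as the edge of a word.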

The `re` module basics

```python
import re

text = "My phone number is 555-123-4567 and my zip is 97201"

# search: find the first match
match = re.search(r"\d{3}-\d{3}-\d{4}", text)
if match:
    print(match.group())    # "555-123-4567"

# findall: find ALL matches
numbers = re.findall(r"\d+", text)
print(numbers)    # ["555", "123", "4567", "97201"]

# sub: replace matches
cleaned = re.sub(r"\d", "X", text)
print(cleaned)    # "My phone number is XXX-XXX-XXXX and my zip is XXXXX"
```

Always use raw strings (r"...") for regex patterns — this prevents Python from interpreting backslashes before the regex engine sees them.
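`re.search()` returns a Match object rather than a plain string. Besides `group()`, a Match records where the text was found, which helps when you need to slice or highlight the original string. A short sketch using the same sentence as the example above:

```python
import re

text = "My phone number is 555-123-4567 and my zip is 97201"
match = re.search(r"\d{3}-\d{3}-\d{4}", text)

if match:
    print(match.group())  # 555-123-4567
    print(match.start())  # 19: index of the first matched character
    print(match.end())    # 31: index just past the last matched character
    print(match.span())   # (19, 31)
    # start/end slice the original string back to the matched text:
    print(text[match.start():match.end()] == match.group())  # True
```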

Character classes

| Pattern | Matches | Example |
|---|---|---|
| `\d` | Any digit: 0-9, plus other Unicode decimal digits in `str` patterns | `\d+` matches `"42"` |
| `\D` | Any non-digit | `\D+` matches `"hello"` |
| `\w` | Word character: letter, digit, or underscore (Unicode letters included) | `\w+` matches `"hello_42"` |
| `\W` | Non-word character | `\W+` matches `"!! "` |
| `\s` | Whitespace (space, tab, newline, and other Unicode whitespace) | `\s+` matches `" \t"` |
| `\S` | Non-whitespace | `\S+` matches `"hello"` |
| `.` | Any character except newline | `a.c` matches `"abc"`, `"a3c"` |
| `[abc]` | One character: a, b, or c | `[aeiou]` matches one lowercase vowel |
| `[^abc]` | One character that is NOT a, b, or c | `[^0-9]` matches one non-digit |
| `[a-z]` | One lowercase letter in the range a to z | `[A-Za-z]` matches one ASCII letter |
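A few of the classes above in action. The sample strings are arbitrary; the last two calls show the Unicode behavior of `\d` for `str` patterns in Python 3, and how the `re.ASCII` flag restricts it to 0-9.

```python
import re

vowels = re.findall(r"[aeiou]", "regex")         # ['e', 'e']
letters = re.findall(r"[^0-9]+", "abc123def")    # ['abc', 'def']
punct = re.findall(r"\W+", "hi!! there")         # ['!! ']

# "\u0662" is ARABIC-INDIC DIGIT TWO, a Unicode decimal digit.
unicode_digits = re.findall(r"\d+", "4\u0662")          # one match: both characters count as digits
ascii_digits = re.findall(r"\d+", "4\u0662", re.ASCII)  # ['4']: only 0-9 counts

print(vowels, letters, punct)
print(unicode_digits, ascii_digits)
```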

Quantifiers: how many?

| Pattern | Meaning | Example |
|---|---|---|
| `*` | Zero or more | `\d*` matches `""`, `"5"`, `"42"` |
| `+` | One or more | `\d+` matches `"5"`, `"42"` but not `""` |
| `?` | Zero or one | `colou?r` matches `"color"` and `"colour"` |
| `{3}` | Exactly 3 | `\d{3}` matches `"123"` |
| `{2,4}` | Between 2 and 4 | `\d{2,4}` matches `"12"`, `"123"`, `"1234"` |
| `{2,}` | 2 or more | `\d{2,}` matches `"12"`, `"123456"` |
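The quantifiers can be checked directly against a few made-up inputs. `re.fullmatch()` is used for the first check so that the whole word has to fit the pattern.

```python
import re

# ? makes the "u" optional; fullmatch requires the whole string to match.
for word in ["color", "colour", "colouur"]:
    print(word, bool(re.fullmatch(r"colou?r", word)))
# color True / colour True / colouur False

# {2,4} is greedy: "12345" yields "1234", and the leftover "5" is too short to match.
print(re.findall(r"\d{2,4}", "1 12 123 12345"))  # ['12', '123', '1234']

# * can match zero characters, so findall also reports empty matches; + cannot.
print(re.findall(r"\d*", "a1"))  # ['', '1', '']
print(re.findall(r"\d+", "a1"))  # ['1']
```

The empty strings in the `\d*` result are a common surprise; `+` is usually what you want when extracting values.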

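Anchors: where in the string?

| Pattern | Meaning |
|---|---|
| `^` | Start of string (start of each line with `re.MULTILINE`) |
| `$` | End of string, or just before a trailing newline (end of each line with `re.MULTILINE`) |
| `\b` | Word boundary: the position between a word character and a non-word character or the string's edge |

```python
import re

# Only match if the string is all digits:
re.match(r"^\d+$", "12345")    # Match: "12345"
re.match(r"^\d+$", "123abc")   # None

# Word boundaries match whole words only:
re.findall(r"\bcat\b", "the cat sat on the catalog")
# ["cat"]: the "cat" inside "catalog" is NOT matched
```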
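One subtlety: without `re.MULTILINE`, `$` also matches just before a single trailing newline, so the `^\d+$` check above accepts `"123\n"`. `re.fullmatch()` (or ending the pattern with `\Z`) requires the match to reach the true end of the string. A small sketch; `is_all_digits` is a hypothetical helper name:

```python
import re

def is_all_digits(s: str) -> bool:
    """Hypothetical helper: True only if every character of s is a digit."""
    return re.fullmatch(r"\d+", s) is not None

# $ tolerates a trailing newline...
print(bool(re.match(r"^\d+$", "123\n")))   # True
# ...but fullmatch and \Z do not.
print(is_all_digits("123\n"))              # False
print(bool(re.match(r"\d+\Z", "123\n")))   # False
print(is_all_digits("12345"))              # True
```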

Groups: capturing parts of a match

Parentheses `()` create groups that capture parts of the match:

```python
import re

text = "2024-01-15"
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)

if match:
    print(match.group())     # "2024-01-15" (entire match)
    print(match.group(1))    # "2024" (first group)
    print(match.group(2))    # "01" (second group)
    print(match.group(3))    # "15" (third group)
```

Named groups make code more readable:

```python
match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", text)

if match:
    print(match.group("year"))     # "2024"
    print(match.group("month"))    # "01"
    print(match.group("day"))      # "15"
```
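When you need every group at once, `groups()` returns them as a tuple and `groupdict()` returns the named ones as a dictionary. A sketch on the same date string; converting the parts to `int` is just one illustrative follow-up step:

```python
import re

text = "2024-01-15"
match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", text)

if match:
    print(match.groups())     # ('2024', '01', '15'): all groups as a tuple
    print(match.groupdict())  # {'year': '2024', 'month': '01', 'day': '15'}
    print(match.group(1) == match.group("year"))  # True: named groups keep their numbers
    year, month, day = (int(part) for part in match.groups())
    print(year, month, day)   # 2024 1 15
```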

`match` vs `search` vs `findall`

```python
import re

text = "hello 42 world 99"

# match: only checks the BEGINNING of the string
re.match(r"\d+", text)         # None (string starts with "hello")
re.match(r"\d+", "42 cats")    # Match: "42"

# search: finds the FIRST match anywhere
re.search(r"\d+", text)        # Match: "42"

# findall: finds ALL matches, returns a list
re.findall(r"\d+", text)       # ["42", "99"]

# finditer: like findall, but yields match objects one at a time
for m in re.finditer(r"\d+", text):
    print(f"Found {m.group()} at position {m.start()}")
# Found 42 at position 6
# Found 99 at position 15
```
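The shape of the list `findall` returns depends on the groups in the pattern: no groups gives whole matches, one group gives just that group, and several groups give tuples. `finditer` keeps full Match objects, so the whole match stays available either way. A sketch on the same text:

```python
import re

text = "hello 42 world 99"

no_groups = re.findall(r"\d+", text)          # ['42', '99']: whole matches
one_group = re.findall(r"\w+ (\d+)", text)    # ['42', '99']: only the group's text
two_groups = re.findall(r"(\w+) (\d+)", text) # [('hello', '42'), ('world', '99')]

# finditer still exposes the whole match even when the pattern has groups:
whole = [m.group() for m in re.finditer(r"\w+ (\d+)", text)]  # ['hello 42', 'world 99']

print(no_groups, one_group, two_groups, whole)
```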

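`re.compile`: precompile a reusable pattern

If you use the same pattern many times, compile it once:

```python
import re
pattern = re.compile(r"\d{3}-\d{3}-\d{4}")

# Sample inputs (illustrative):
text1 = "Call 555-123-4567"
text2 = "Office 555-000-1111, cell 555-222-3333"
text3 = "Fax 555-987-6543"

# Now use the pattern object:
pattern.search(text1)                 # Match: "555-123-4567"
pattern.findall(text2)                # ["555-000-1111", "555-222-3333"]
pattern.sub("XXX-XXX-XXXX", text3)    # "Fax XXX-XXX-XXXX"
```

Compiling can save a little time when the pattern runs inside a loop, but the gain is usually small because module-level functions like `re.search()` already cache recently used patterns. The bigger benefit is that the pattern is defined once, in one named place.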
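A sketch of the typical use: one compiled pattern applied to many inputs. The `lines` list is made-up sample data.

```python
import re

phone = re.compile(r"\d{3}-\d{3}-\d{4}")

lines = [
    "call 555-123-4567 today",
    "no phone on this line",
    "fax: 555-987-6543",
]

found = []
for line in lines:
    match = phone.search(line)  # the same compiled pattern is reused on every line
    if match:
        found.append(match.group())

print(found)          # ['555-123-4567', '555-987-6543']
print(phone.pattern)  # \d{3}-\d{3}-\d{4}: the original pattern string
```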

Common patterns

```python
import re

# Email (simplified; real addresses allow more variation):
email_pattern = r"[\w.+-]+@[\w-]+\.[\w.]+"
re.findall(email_pattern, "Contact alice@example.com or bob@test.org")
# ["alice@example.com", "bob@test.org"]

# URL:
url_pattern = r"https?://[\w./\-?=&#]+"
re.findall(url_pattern, "Visit https://example.com/page?id=1")
# ["https://example.com/page?id=1"]

# Phone number (US), e.g. "555-123-4567" or "(555) 123-4567":
phone_pattern = r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"

# IP address (checks the shape only; also accepts values like 999.1.1.1):
ip_pattern = r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"

# Extract key=value pairs (two groups, so findall returns tuples):
kv_pattern = r"(\w+)=(\w+)"
re.findall(kv_pattern, "name=Alice age=30 city=Portland")
# [("name", "Alice"), ("age", "30"), ("city", "Portland")]
```
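Because the IP pattern only checks the shape, it accepts impossible addresses. One common approach, sketched here with the standard-library `ipaddress` module, is to let the regex find candidates and then validate each one. `find_valid_ipv4` is a hypothetical helper name and the log line is sample data.

```python
import ipaddress
import re

# Shape-only IPv4 pattern from above: four groups of 1-3 digits.
ip_pattern = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")

def find_valid_ipv4(text):
    """Hypothetical helper: regex finds candidates, ipaddress rejects invalid ones."""
    valid = []
    for candidate in ip_pattern.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # raises if any part is out of range
        except ValueError:  # AddressValueError is a subclass of ValueError
            continue
        valid.append(candidate)
    return valid

log = "ok 192.168.1.10, bad 999.1.1.1, ok 10.0.0.1"
print(ip_pattern.findall(log))  # ['192.168.1.10', '999.1.1.1', '10.0.0.1']
print(find_valid_ipv4(log))     # ['192.168.1.10', '10.0.0.1']
```

Splitting the job this way keeps the regex readable; encoding the 0-255 range in the pattern itself is possible but much harder to read.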

Flags

```python
import re

# Case-insensitive matching:
re.findall(r"python", "Python PYTHON python", re.IGNORECASE)
# ["Python", "PYTHON", "python"]

# MULTILINE: ^ and $ match at the start/end of each line:
re.findall(r"^\w+", "hello\nworld\nfoo", re.MULTILINE)
# ["hello", "world", "foo"]

# DOTALL: . matches newlines too:
re.search(r"hello.world", "hello\nworld", re.DOTALL)
# Match: "hello\nworld"

# VERBOSE: whitespace is ignored and # starts a comment, so complex patterns can be annotated:
pattern = re.compile(r"""
    (\d{4})    # year
    -
    (\d{2})    # month
    -
    (\d{2})    # day
""", re.VERBOSE)
pattern.match("2024-01-15").groups()
# ("2024", "01", "15")
```

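Flags can be combined with the `|` operator, or written inline at the very start of the pattern (`(?i)` for IGNORECASE, `(?m)` for MULTILINE). A sketch on a made-up log snippet:

```python
import re

log = "Error: disk\nerror: net\nwarning: cpu"

# Combine flags with the | operator:
combined = re.findall(r"^error: (\w+)", log, re.IGNORECASE | re.MULTILINE)

# Same effect with inline flags at the start of the pattern:
inline = re.findall(r"(?im)^error: (\w+)", log)

print(combined)  # ['disk', 'net']
print(inline)    # ['disk', 'net']
```

Inline flags keep the pattern self-describing when it is stored as a plain string, for example in a config file.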
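Common Mistakes

Forgetting the raw string:

```python
import re

text = "find the word here"

# WRONG: Python turns "\b" into a backspace character, so nothing matches:
re.search("\bword\b", text)    # None

# RIGHT: with a raw string, the regex engine sees \b (word boundary):
re.search(r"\bword\b", text)   # Match: "word"
```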
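To see what actually reaches the regex engine, compare the two string literals directly. The plain literal contains real backspace characters, which is why the WRONG call looks for text that is not there.

```python
plain = "\bword\b"   # "\b" here is the backspace character, chr(8)
raw = r"\bword\b"    # backslash and "b" are kept as two separate characters

print(len(plain), len(raw))   # 6 8
print(plain[0] == chr(8))     # True
print(raw[:2] == "\\b")       # True
```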

Greedy vs lazy matching:

```python
text = "<b>bold</b> and <b>more bold</b>"

# Greedy (default) matches as MUCH as possible:
re.search(r"<b>.*</b>", text).group()
# "<b>bold</b> and <b>more bold</b>"

# Lazy (add ?) matches as LITTLE as possible:
re.search(r"<b>.*?</b>", text).group()
# "<b>bold</b>"
```
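With `findall` and a group, the lazy version pulls out each bold section separately. A negated character class such as `[^<]*` is another common way to stop at the next tag; it is an alternative technique, not something the lazy quantifier requires. For arbitrary HTML, an HTML parser is usually the safer choice.

```python
import re

text = "<b>bold</b> and <b>more bold</b>"

greedy = re.findall(r"<b>(.*)</b>", text)      # ['bold</b> and <b>more bold']
lazy = re.findall(r"<b>(.*?)</b>", text)       # ['bold', 'more bold']
negated = re.findall(r"<b>([^<]*)</b>", text)  # ['bold', 'more bold']

print(greedy)
print(lazy)
print(negated)
```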

Not checking for None from search():

```python
# WRONG: search() returns None when nothing matches, so .group() raises AttributeError:
result = re.search(r"\d+", "no numbers here").group()

# RIGHT: check for a match first:
match = re.search(r"\d+", "no numbers here")
if match:
    result = match.group()
```
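On Python 3.8 and later, an assignment expression (`:=`) folds the call and the None check into one line. `first_number` is a hypothetical helper used only for illustration:

```python
import re

def first_number(text):
    """Hypothetical helper: return the first run of digits in text, or None."""
    if match := re.search(r"\d+", text):
        return match.group()
    return None

print(first_number("order 66 shipped"))  # 66
print(first_number("no numbers here"))   # None
```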

Practice

Level 1 / 08 Log Level Counter
Module 01 Web Scraping — extracting data from HTML
Module 02 CLI Tools — parsing user input
