Regular Expressions (Regex) for text pattern matching. Current versions: Python 3.12, JavaScript (ECMAScript 2024/ES15).

Core Syntax

Regex literals (JS) or compiled patterns (Python) define the search pattern. Flags modify behavior.

// JavaScript: Regex literal (preferred for static patterns)
const reLiteral = /abc/i; // Matches "abc" case-insensitively

// JavaScript: RegExp constructor (for dynamic patterns from strings)
const patternString = "xyz";
const reConstructor = new RegExp(patternString, 'g'); // Matches "xyz" globally

// Test method returns boolean
console.log(reLiteral.test("ABC")); // true
console.log(reConstructor.test("0xyz1xyz2")); // true (with 'g', test() advances lastIndex, so repeated calls can differ)

import re

# Python: Compile a regex pattern (preferred for repeated use)
pattern_compiled = re.compile(r"abc", re.IGNORECASE) # Matches "abc" case-insensitively

# Python: Direct function call (for one-off uses)
match_obj = re.search(r"xyz", "0xyz1xyz2") # Finds the first "xyz" anywhere in the string

# Match object evaluates to True if a match is found
print(bool(pattern_compiled.search("ABC"))) # True
print(bool(match_obj)) # True
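
To make "flags modify behavior" concrete, here is a minimal sketch of combining two flags in Python. The `log` string is a made-up sample, not taken from any real system.

```python
import re

# Hypothetical three-line log used only for illustration
log = "info: started\nERROR: disk full\nerror: retrying"

# No flags: ^ anchors only at the very start of the string, and case matters
no_flags = re.findall(r"^error", log)

# IGNORECASE | MULTILINE: ^ also matches after each newline, and case is ignored
with_flags = re.findall(r"^error", log, re.IGNORECASE | re.MULTILINE)

print(no_flags)    # []
print(with_flags)  # ['ERROR', 'error']
```

Flags are combined with the bitwise OR operator; the same combination can be passed to re.compile().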

Essential Patterns

Character classes simplify matching common types of characters. Quantifiers specify how many times a character or group must appear.

// JavaScript: Character classes and quantifiers
const text = "The quick brown fox jumps over 12 lazy dogs.";

// \d+ : one or more digits
console.log(text.match(/\d+/g)); // ["12"]

// \w+ : one or more word characters (alphanumeric + underscore)
console.log(text.match(/\w+/g)); // ["The", "quick", "brown", "fox", "jumps", "over", "12", "lazy", "dogs"]

// \s+ : one or more whitespace characters
console.log(text.match(/\s+/g)); // [" ", " ", " ", " ", " ", " ", " ", " "] (8 single spaces)

// . : any character (except newline)
console.log("a.b".match(/a.b/)); // ["a.b"]

// {n,m} : between n and m occurrences
console.log("aaaaabbb".match(/a{2,4}/)); // ["aaaa"] (greedy by default)

import re

text = "The quick brown fox jumps over 12 lazy dogs."

# Python: Character classes and quantifiers
# \d+ : one or more digits
print(re.findall(r"\d+", text)) # ['12']

# \w+ : one or more word characters (alphanumeric + underscore)
print(re.findall(r"\w+", text)) # ['The', 'quick', 'brown', 'fox', 'jumps', 'over', '12', 'lazy', 'dogs']

# \s+ : one or more whitespace characters
print(re.findall(r"\s+", text)) # [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '] (8 single spaces)

# . : any character (except newline)
print(re.search(r"a.b", "a.b").group(0)) # a.b

# {n,m} : between n and m occurrences
print(re.search(r"a{2,4}", "aaaaabbb").group(0)) # aaaa (greedy by default)
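
The classes and quantifiers above can be combined with custom sets and the optional quantifier. The following is a small sketch; the `sample` string is invented for the example.

```python
import re

# Invented sample text
sample = "color colour colouur #fff #A1B2C3"

# u? : the "u" is optional (0 or 1 occurrences)
spellings = re.findall(r"\bcolou?r\b", sample)
print(spellings)  # ['color', 'colour']

# [0-9a-fA-F] : a custom set of hex digits, repeated exactly 6 or exactly 3 times
hex_codes = re.findall(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b", sample)
print(hex_codes)  # ['#fff', '#A1B2C3']

# {2,4}? : the lazy form stops at the minimum count
lazy = re.search(r"a{2,4}?", "aaaaabbb").group(0)
print(lazy)  # aa
```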

Common Use Cases

Regex excels at data validation, extraction, and string manipulation like find-and-replace.

// JavaScript: Email validation (basic)
const email = "user@example.com"; // placeholder address
const invalidEmail = "invalid-email";
const emailPattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

console.log(emailPattern.test(email)); // true
console.log(emailPattern.test(invalidEmail)); // false

// JavaScript: Extracting specific data with capturing groups
const logEntry = "ERROR 2025-12-27 User 'admin' failed login from 192.168.1.100";
const logPattern = /ERROR (\d{4}-\d{2}-\d{2}) User '(\w+)' failed login from (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/;
const match = logEntry.match(logPattern);

if (match) {
    console.log(`Date: ${match[1]}, User: ${match[2]}, IP: ${match[3]}`);
}

// JavaScript: Replace all occurrences
const sentence = "The dog chased the cat. The cat ran away.";
console.log(sentence.replace(/the/gi, "a")); // "a dog chased a cat. a cat ran away."

import re

# Python: Email validation (basic)
email = "user@example.com"  # placeholder address
invalid_email = "invalid-email"
email_pattern = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

print(bool(email_pattern.match(email))) # True
print(bool(email_pattern.match(invalid_email))) # False

# Python: Extracting specific data with capturing groups
log_entry = "ERROR 2025-12-27 User 'admin' failed login from 192.168.1.100"
log_pattern = re.compile(r"ERROR (\d{4}-\d{2}-\d{2}) User '(\w+)' failed login from (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")
match = log_pattern.search(log_entry)

if match:
    # Access captured groups by index (group(0) is the whole match)
    print(f"Date: {match.group(1)}, User: {match.group(2)}, IP: {match.group(3)}")

# Python: Replace all occurrences
sentence = "The dog chased the cat. The cat ran away."
print(re.sub(r"the", "a", sentence, flags=re.IGNORECASE)) # a dog chased a cat. a cat ran away.
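
For find-and-replace where the replacement depends on the matched text, re.sub() also accepts a function or a template with backreferences. This is a sketch only; `swap_article` is a hypothetical helper name introduced for illustration.

```python
import re

sentence = "The dog chased the cat. The cat ran away."

def swap_article(m: re.Match) -> str:
    # Hypothetical helper: keep the capitalisation of the matched word
    return "A" if m.group(0)[0].isupper() else "a"

# \b keeps the pattern from touching words that merely contain "the"
result = re.sub(r"\bthe\b", swap_article, sentence, flags=re.IGNORECASE)
print(result)  # A dog chased a cat. A cat ran away.

# Backreferences (\1, \2, \3) reorder captured groups in the replacement
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "Released 2025-12-27")
print(reordered)  # Released 27/12/2025
```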

Gotchas & Best Practices

Be aware of greedy vs. non-greedy matching, backslash escaping, and catastrophic backtracking.
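
Escaping matters most when a pattern is built from text that is not under your control. A minimal Python sketch, assuming a made-up `user_input` value, shows how re.escape() turns arbitrary text into a literal pattern:

```python
import re

# Made-up input containing several metacharacters: + ( ) ?
user_input = "1+1=2 (really?)"
text = "Is 1+1=2 (really?) true"

# Used directly, the metacharacters are interpreted and the literal text is not found
unsafe = re.search(user_input, text)
print(unsafe)  # None

# re.escape() backslash-escapes the metacharacters so the text matches literally
safe = re.search(re.escape(user_input), text)
print(safe.group(0))  # 1+1=2 (really?)
```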

// JavaScript: Greedy vs. Non-greedy quantifiers
const html = "<div><span>Hello</span><span>World</span></div>";

// Greedy: Matches the longest possible string
console.log(html.match(/<span>.*<\/span>/)); // ["<span>Hello</span><span>World</span>"]

// Non-greedy (add '?'): Matches the shortest possible string
console.log(html.match(/<span>.*?<\/span>/g)); // ["<span>Hello</span>", "<span>World</span>"]

// JavaScript: Escaping special characters
// To match a literal dot, use \.
console.log("file.txt".match(/file\.txt/)); // ["file.txt"]
console.log("file.txt".match(/file.txt/)); // ["file.txt"], but the unescaped dot would also match "fileXtxt"

import re

# Python: Greedy vs. Non-greedy quantifiers
html = "<div><span>Hello</span><span>World</span></div>"

# Greedy: Matches the longest possible string
print(re.search(r"<span>.*</span>", html).group(0)) # <span>Hello</span><span>World</span>

# Non-greedy (add '?'): Matches the shortest possible string
print(re.findall(r"<span>.*?</span>", html)) # ['<span>Hello</span>', '<span>World</span>']

# Python: Raw strings for regex patterns
# Use r"" so that Python's string-literal parser does not interpret backslashes as escape sequences.
print(re.search(r"\bword\b", "a word is here").group(0)) # word
# print(re.search("\bword\b", "a word is here")) # Warning: \b is backspace character in string literal

# Python: Catastrophic backtracking example (avoid patterns like (a+)+)
# This pattern is highly inefficient and can lead to ReDoS attacks or timeouts.
# re.match(r"(a+)+b", "a" * 64 + "c") # Effectively hangs: backtracking grows exponentially with the number of a's before the match fails
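
One common way to avoid this kind of backtracking is to remove the nested quantifier. The sketch below compares the risky pattern with a flattened equivalent on deliberately short inputs; the timings depend on the machine and the interpreter, so no expected values are shown for them.

```python
import re
import time

risky = re.compile(r"(a+)+b")  # nested quantifiers: many ways to split a run of a's
safer = re.compile(r"a+b")     # a single quantifier that accepts the same strings

# Both patterns agree on which of these inputs contain a match
samples = ["ab", "aaab", "b", "aaac", "xaab"]
agree = all(bool(risky.search(s)) == bool(safer.search(s)) for s in samples)
print(agree)  # True

# Time the failing case on short inputs only; longer inputs may take very long
for n in (12, 15, 18):
    subject = "a" * n + "c"

    start = time.perf_counter()
    risky.match(subject)
    risky_seconds = time.perf_counter() - start

    start = time.perf_counter()
    safer.match(subject)
    safer_seconds = time.perf_counter() - start

    print(n, f"risky={risky_seconds:.5f}s", f"safer={safer_seconds:.5f}s")
```

The time for the risky pattern typically grows much faster than the input length, while the flattened pattern stays roughly flat.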

Advanced Techniques

Lookaheads and lookbehinds assert conditions without consuming characters. Named groups improve readability.

// JavaScript (ES2018+): Named Capturing Groups
const dateString = "2025-12-27";
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const matchDate = dateString.match(datePattern);

if (matchDate) {
    console.log(`Year: ${matchDate.groups.year}, Month: ${matchDate.groups.month}, Day: ${matchDate.groups.day}`);
}

// JavaScript: Positive Lookahead (?=...)
// Matches "foo" only if it's followed by "bar" (but "bar" is not included in the match)
console.log("foobar".match(/foo(?=bar)/)); // ["foo"]
console.log("foobaz".match(/foo(?=bar)/)); // null

// JavaScript: Negative Lookahead (?!...)
// Matches "foo" only if it's NOT followed by "bar"
console.log("foobaz".match(/foo(?!bar)/)); // ["foo"]
console.log("foobar".match(/foo(?!bar)/)); // null

import re

# Python: Named Capturing Groups
date_string = "2025-12-27"
date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
match_date = date_pattern.search(date_string)

if match_date:
    print(f"Year: {match_date.group('year')}, Month: {match_date.group('month')}, Day: {match_date.group('day')}")

# Python: Positive Lookahead (?=...)
# Matches "foo" only if it's followed by "bar"
print(re.search(r"foo(?=bar)", "foobar").group(0)) # foo
print(re.search(r"foo(?=bar)", "foobaz")) # None

# Python: Negative Lookahead (?!...)
# Matches "foo" only if it's NOT followed by "bar"
print(re.search(r"foo(?!bar)", "foobaz").group(0)) # foo
print(re.search(r"foo(?!bar)", "foobar")) # None

# Python: Positive Lookbehind (?<=...)
# Matches "bar" only if it's preceded by "foo"
print(re.search(r"(?<=foo)bar", "foobar").group(0)) # bar

# Python: Negative Lookbehind (?<!...)
# Matches "bar" only if it's NOT preceded by "foo"
print(re.search(r"(?<!foo)bar", "bazbar").group(0)) # bar
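
Lookarounds are often combined in practice. The following sketch stacks several lookaheads into one check and uses a lookbehind for extraction; the password rule and the sample strings are assumptions made up for the example, not a recommended policy.

```python
import re

# Each lookahead tests one condition from the start without consuming input:
# at least one digit, one lowercase letter, one uppercase letter, and 8+ characters
strong = re.compile(r"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$")
print(bool(strong.search("Passw0rdX")))  # True
print(bool(strong.search("password")))   # False

# Lookbehind: capture amounts only when preceded by a dollar sign
prices = re.findall(r"(?<=\$)\d+(?:\.\d{2})?", "Cost: $15.99, was $20, code 42")
print(prices)  # ['15.99', '20']

# Python's re module requires lookbehind patterns to have a fixed width
try:
    re.compile(r"(?<=a+)b")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False
print(lookbehind_ok)  # False
```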

Quick Reference

  • Anchors: ^ (start), $ (end), \b (word boundary), \B (non-word boundary)
  • Quantifiers: * (0+), + (1+), ? (0 or 1), {n} (exactly n), {n,} (n+), {n,m} (n to m); append ? to any quantifier for non-greedy matching (*?, +?, ??, {n,m}?)
  • Character Classes: . (any char except newline), \d (digit), \D (non-digit), \w (word char), \W (non-word char), \s (whitespace), \S (non-whitespace), [abc] (any of a,b,c), [^abc] (not a,b,c)
  • Groups: (...) (capturing), (?:...) (non-capturing), (?<name>...) (JS named capturing), (?P<name>...) (Python named capturing)
  • Alternation: | (OR)
  • Flags (JS): i (ignore case), g (global), m (multiline), u (unicode), s (dotall), y (sticky), d (match indices), v (unicodeSets)
  • Flags (Python): re.IGNORECASE, re.MULTILINE, re.DOTALL, re.VERBOSE, re.ASCII, re.UNICODE
  • JS Methods: test(), match(), matchAll(), search(), replace(), split()
  • Python Methods (re module): search(), match(), fullmatch(), findall(), finditer(), sub(), split()
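
The Python functions listed above differ mainly in where they anchor and what they return. A short sketch on an invented string `s`:

```python
import re

# Invented sample string
s = "id=42; id=7"

# match() only succeeds at the start of the string
at_start = re.match(r"\d+", s)
print(at_start)  # None

# search() scans forward and returns the first match
first = re.search(r"\d+", s).group(0)
print(first)  # 42

# fullmatch() requires the whole string to match
print(bool(re.fullmatch(r"\d+", "42")))     # True
print(bool(re.fullmatch(r"\d+", "42abc")))  # False

# finditer() yields match objects, which carry positions as well as text
found = [(m.group(0), m.start()) for m in re.finditer(r"\d+", s)]
print(found)  # [('42', 3), ('7', 10)]

# split() cuts the string wherever the pattern matches
parts = re.split(r";\s*", s)
print(parts)  # ['id=42', 'id=7']
```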

References

  1. MDN Web Docs: Regular expressions - JavaScript
  2. Python Docs: re — Regular expression operations
  3. Regex101.com - Online regex tester and debugger (useful for learning and testing patterns)

This page is AI-assisted. References official documentation.