Regular Expressions (Regex) for text pattern matching. Current versions: Python 3.12, JavaScript (ECMAScript 2024/ES15).
Core Syntax
Regex literals (JS) or compiled patterns (Python) define the search pattern. Flags modify behavior.
// JavaScript: Regex literal (preferred for static patterns)
const reLiteral = /abc/i; // Matches "abc" case-insensitively
// JavaScript: RegExp constructor (for dynamic patterns from strings)
const patternString = "xyz";
const reConstructor = new RegExp(patternString, 'g'); // Matches "xyz" globally
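// test() returns a boolean
console.log(reLiteral.test("ABC")); // true
console.log(reConstructor.test("0xyz1xyz2")); // true (with the g flag, test() advances lastIndex, so repeated calls on the same regex can give different results)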
import re
# Python: Compile a regex pattern (preferred for repeated use)
pattern_compiled = re.compile(r"abc", re.IGNORECASE) # Matches "abc" case-insensitively
# Python: Direct function call (for one-off uses)
match_obj = re.search(r"xyz", "0xyz1xyz2") # Finds the first "xyz" anywhere in the string
# Match object evaluates to True if a match is found
print(bool(pattern_compiled.search("ABC"))) # True
print(bool(match_obj)) # True
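To illustrate how flags modify behavior, here is a minimal Python sketch. The sample string and variable names are made up for illustration; only `re.MULTILINE`, `re.IGNORECASE`, and `re.DOTALL` come from the standard `re` module.

```python
import re

text = "first line\nSecond line"

# Without re.MULTILINE, ^ anchors only at the very start of the string.
default_starts = re.findall(r"^\w+", text)
print(default_starts)    # ['first']

# With re.MULTILINE, ^ also matches right after each newline.
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)
print(multiline_starts)  # ['first', 'Second']

# Flags combine with the bitwise OR operator.
combined = re.compile(r"^second", re.IGNORECASE | re.MULTILINE)
combined_found = bool(combined.search(text))
print(combined_found)    # True

# By default . does not match a newline; re.DOTALL lifts that restriction.
dot_default = bool(re.search(r"line.Second", text))
dot_dotall = bool(re.search(r"line.Second", text, re.DOTALL))
print(dot_default, dot_dotall)  # False True
```

The same pattern can therefore match different text depending on its flags, so it helps to keep the flags next to the pattern (for example in the `re.compile` call) rather than scattered across call sites.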
Essential Patterns
Character classes simplify matching common types of characters. Quantifiers specify how many times a character or group must appear.
// JavaScript: Character classes and quantifiers
const text = "The quick brown fox jumps over 12 lazy dogs.";
// \d+ : one or more digits
console.log(text.match(/\d+/g)); // ["12"]
// \w+ : one or more word characters (alphanumeric + underscore)
console.log(text.match(/\w+/g)); // ["The", "quick", "brown", "fox", "jumps", "over", "12", "lazy", "dogs"]
// \s+ : one or more whitespace characters
console.log(text.match(/\s+/g)); // [" ", " ", " ", " ", " ", " ", " ", " "] (8 single spaces between 9 words)
// . : any character (except newline)
console.log("a.b".match(/a.b/)); // ["a.b"]
// {n,m} : between n and m occurrences
console.log("aaaaabbb".match(/a{2,4}/)); // ["aaaa"] (greedy by default)
import re
text = "The quick brown fox jumps over 12 lazy dogs."
# Python: Character classes and quantifiers
# \d+ : one or more digits
print(re.findall(r"\d+", text)) # ['12']
# \w+ : one or more word characters (alphanumeric + underscore)
print(re.findall(r"\w+", text)) # ['The', 'quick', 'brown', 'fox', 'jumps', 'over', '12', 'lazy', 'dogs']
# \s+ : one or more whitespace characters
print(re.findall(r"\s+", text)) # [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '] (8 single spaces between 9 words)
# . : any character (except newline)
print(re.search(r"a.b", "a.b").group(0)) # a.b
# {n,m} : between n and m occurrences
print(re.search(r"a{2,4}", "aaaaabbb").group(0)) # aaaa (greedy by default)
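The examples above use only the shorthand classes. As a complementary sketch, the following shows bracketed character sets, a negated set, the optional quantifier `?`, and an exact count `{n}`. All sample strings and variable names are illustrative assumptions.

```python
import re

# ? makes the preceding token optional (0 or 1 occurrences).
spellings = re.findall(r"colou?r", "color colour colr")
print(spellings)    # ['color', 'colour']

# [...] matches any one of the listed characters.
vowel_runs = re.findall(r"[aeiou]+", "queue")
print(vowel_runs)   # ['ueue']

# [^...] matches any one character NOT in the set (here: not a digit or space).
letters = re.findall(r"[^0-9 ]+", "a1 b22 c")
print(letters)      # ['a', 'b', 'c']

# {n} requires exactly n repetitions; \b stops longer digit runs from matching.
four_digit = re.findall(r"\b\d{4}\b", "7 2024 123 99999")
print(four_digit)   # ['2024']
```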
Common Use Cases
Regex excels at data validation, extraction, and string manipulation like find-and-replace.
// JavaScript: Email validation (basic)
const email = "user@example.com"; // placeholder address
const invalidEmail = "invalid-email";
const emailPattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
console.log(emailPattern.test(email)); // true
console.log(emailPattern.test(invalidEmail)); // false
// JavaScript: Extracting specific data with capturing groups
const logEntry = "ERROR 2025-12-27 User 'admin' failed login from 192.168.1.100";
const logPattern = /ERROR (\d{4}-\d{2}-\d{2}) User '(\w+)' failed login from (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/;
const match = logEntry.match(logPattern);
if (match) {
  console.log(`Date: ${match[1]}, User: ${match[2]}, IP: ${match[3]}`);
}
// JavaScript: Replace all occurrences
const sentence = "The dog chased the cat. The cat ran away.";
console.log(sentence.replace(/the/gi, "a")); // "a dog chased a cat. a cat ran away." (the replacement text is inserted as-is, so capitalization is lost)
import re
# Python: Email validation (basic)
email = "user@example.com" # placeholder address
invalid_email = "invalid-email"
email_pattern = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
print(bool(email_pattern.match(email))) # True
print(bool(email_pattern.match(invalid_email))) # False
# Python: Extracting specific data with capturing groups
log_entry = "ERROR 2025-12-27 User 'admin' failed login from 192.168.1.100"
log_pattern = re.compile(r"ERROR (\d{4}-\d{2}-\d{2}) User '(\w+)' failed login from (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")
match = log_pattern.search(log_entry)
if match:
    # Access unnamed groups by index (group 0 is the whole match)
    print(f"Date: {match.group(1)}, User: {match.group(2)}, IP: {match.group(3)}")
# Python: Replace all occurrences
sentence = "The dog chased the cat. The cat ran away."
print(re.sub(r"the", "a", sentence, flags=re.IGNORECASE)) # "a dog chased a cat. a cat ran away." (the replacement text is inserted as-is, so capitalization is lost)
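A plain-string replacement inserts the same text for every match, so it cannot keep the capitalization of what it replaces. One possible approach, sketched here, is to pass a function to `re.sub`. The helper name `match_case` is hypothetical, and the `\b` word boundaries are an added safeguard that is not part of the pattern above.

```python
import re

sentence = "The dog chased the cat. The cat ran away."

def match_case(m: re.Match) -> str:
    """Return 'A' when the matched word is capitalized, 'a' otherwise."""
    return "A" if m.group(0)[0].isupper() else "a"

# \b keeps the pattern from matching inside longer words such as "other".
result = re.sub(r"\bthe\b", match_case, sentence, flags=re.IGNORECASE)
print(result)  # A dog chased a cat. A cat ran away.
```

`re.sub` calls the function once per match and substitutes whatever string it returns, which makes it a reasonable fit whenever the replacement depends on the matched text.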
Gotchas & Best Practices
Be aware of greedy vs. non-greedy matching, backslash escaping, and catastrophic backtracking.
// JavaScript: Greedy vs. Non-greedy quantifiers
const html = "<div><span>Hello</span><span>World</span></div>";
// Greedy: Matches the longest possible string
console.log(html.match(/<span>.*<\/span>/)); // ["<span>Hello</span><span>World</span>"]
// Non-greedy (add '?'): Matches the shortest possible string
console.log(html.match(/<span>.*?<\/span>/g)); // ["<span>Hello</span>", "<span>World</span>"]
// JavaScript: Escaping special characters
// To match a literal dot, use \.
console.log("file.txt".match(/file\.txt/)); // ["file.txt"]
console.log("file.txt".match(/file.txt/)); // ["file.txt"] - also matches "fileXtxt"
import re
# Python: Greedy vs. Non-greedy quantifiers
html = "<div><span>Hello</span><span>World</span></div>"
# Greedy: Matches the longest possible string
print(re.search(r"<span>.*</span>", html).group(0)) # <span>Hello</span><span>World</span>
# Non-greedy (add '?'): Matches the shortest possible string
print(re.findall(r"<span>.*?</span>", html)) # ['<span>Hello</span>', '<span>World</span>']
# Python: Raw strings for regex patterns
# Use r"" so the Python string-literal parser does not interpret backslashes as escape sequences before the regex engine sees them.
print(re.search(r"\bword\b", "a word is here").group(0)) # word
# print(re.search("\bword\b", "a word is here")) # Warning: \b is backspace character in string literal
# Python: Catastrophic backtracking example (avoid patterns like (a+)+)
# This pattern is highly inefficient and can lead to ReDoS attacks or timeouts.
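# re.match(r"(a+)+b", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac") # No match is possible, but the engine tries exponentially many ways to split the "a"s before giving up, so this effectively hangs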
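Two defensive habits follow from these gotchas. The sketch below is illustrative rather than a complete ReDoS defense: it rewrites the nested quantifier into an equivalent flat one, and it uses `re.escape` for text that should match literally. The subject string and the `user_input` value are assumptions made for the example.

```python
import re

subject = "a" * 40 + "c"

# (a+)+b and a+b accept the same strings, but the flat version has only one
# way to consume a run of "a"s, so a failed match is rejected quickly.
safe_pattern = re.compile(r"a+b")
no_match = safe_pattern.match(subject)
print(no_match)                       # None
good_match = safe_pattern.match("aaab").group(0)
print(good_match)                     # aaab

# re.escape() backslash-escapes regex metacharacters such as + ( ) ?
# so that dynamic text is matched literally.
user_input = "1+1=2 (really?)"
literal = re.compile(re.escape(user_input))
found_literal = bool(literal.search("Proof: 1+1=2 (really?)"))
print(found_literal)                  # True
```

Without `re.escape`, the `+`, `(`, `)`, and `?` in `user_input` would be treated as regex syntax and the pattern would no longer match the literal text.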
Advanced Techniques
Lookaheads and lookbehinds assert conditions without consuming characters. Named groups improve readability.
// JavaScript (ES2018+): Named Capturing Groups
const dateString = "2025-12-27";
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const matchDate = dateString.match(datePattern);
if (matchDate) {
  console.log(`Year: ${matchDate.groups.year}, Month: ${matchDate.groups.month}, Day: ${matchDate.groups.day}`);
}
// JavaScript: Positive Lookahead (?=...)
// Matches "foo" only if it's followed by "bar" (but "bar" is not included in the match)
console.log("foobar".match(/foo(?=bar)/)); // ["foo"]
console.log("foobaz".match(/foo(?=bar)/)); // null
// JavaScript: Negative Lookahead (?!...)
// Matches "foo" only if it's NOT followed by "bar"
console.log("foobaz".match(/foo(?!bar)/)); // ["foo"]
console.log("foobar".match(/foo(?!bar)/)); // null
import re
# Python: Named Capturing Groups
date_string = "2025-12-27"
date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
match_date = date_pattern.search(date_string)
if match_date:
    print(f"Year: {match_date.group('year')}, Month: {match_date.group('month')}, Day: {match_date.group('day')}")
# Python: Positive Lookahead (?=...)
# Matches "foo" only if it's followed by "bar"
print(re.search(r"foo(?=bar)", "foobar").group(0)) # foo
print(re.search(r"foo(?=bar)", "foobaz")) # None
# Python: Negative Lookahead (?!...)
# Matches "foo" only if it's NOT followed by "bar"
print(re.search(r"foo(?!bar)", "foobaz").group(0)) # foo
print(re.search(r"foo(?!bar)", "foobar")) # None
# Python: Positive Lookbehind (?<=...)
# Matches "bar" only if it's preceded by "foo"
print(re.search(r"(?<=foo)bar", "foobar").group(0)) # bar
# Python: Negative Lookbehind (?<!...)
# Matches "bar" only if it's NOT preceded by "foo"
print(re.search(r"(?<!foo)bar", "bazbar").group(0)) # bar
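Lookarounds are easier to judge on slightly more realistic input. The following sketch extracts amounts that follow a dollar sign, stacks two lookaheads into a simple password check, and shows that Python's `re` module rejects variable-width lookbehind. The helper `is_strong` and its rules (8+ characters, one digit, one uppercase letter) are hypothetical choices for illustration.

```python
import re

# Lookbehind: match the number after "$" without including the "$" itself.
prices = re.findall(r"(?<=\$)\d+(?:\.\d{2})?", "Cost: $15.99, tax $2, code 404")
print(prices)  # ['15.99', '2']

def is_strong(password: str) -> bool:
    """Hypothetical rule: 8+ characters with at least one digit and one uppercase letter."""
    # Each lookahead checks one condition from the start without consuming text.
    return re.fullmatch(r"(?=.*\d)(?=.*[A-Z]).{8,}", password) is not None

print(is_strong("Passw0rdX"))  # True
print(is_strong("password1"))  # False (no uppercase letter)

# Python's re module requires lookbehind patterns to have a fixed width.
try:
    re.compile(r"(?<=a+)b")
    lookbehind_rejected = False
except re.error:
    lookbehind_rejected = True
print(lookbehind_rejected)     # True
```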
Quick Reference
- Anchors: ^ (start), $ (end), \b (word boundary), \B (non-word boundary)
- Quantifiers: * (0+), + (1+), ? (0 or 1), {n} (exactly n), {n,} (n+), {n,m} (n to m), ? after a quantifier (non-greedy, e.g. *? or +?)
- Character Classes: . (any char except newline), \d (digit), \D (non-digit), \w (word char), \W (non-word char), \s (whitespace), \S (non-whitespace), [abc] (any of a, b, c), [^abc] (not a, b, c)
- Groups: (...) (capturing), (?:...) (non-capturing), (?<name>...) (JS named capturing), (?P<name>...) (Python named capturing)
- Alternation: | (OR)
- Flags (JS): i (ignore case), g (global), m (multiline), u (unicode), s (dotall)
- Flags (Python): re.IGNORECASE, re.MULTILINE, re.DOTALL, re.VERBOSE, re.ASCII, re.UNICODE
- JS Methods: test(), match(), matchAll(), search(), replace(), split()
- Python Methods (re module): search(), match(), fullmatch(), findall(), finditer(), sub(), split()
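The Python methods listed above differ mainly in where they anchor and what they return. A brief sketch, using made-up sample strings, of how `match()`, `search()`, and `fullmatch()` compare, plus `finditer()` and `split()`:

```python
import re

digits = re.compile(r"\d+")

# match() anchors at the start only, search() scans the whole string,
# and fullmatch() requires the pattern to cover the entire string.
match_start = digits.match("123abc").group(0)   # '123'
match_late = digits.match("abc123")             # None
search_late = digits.search("abc123").group(0)  # '123'
full_partial = digits.fullmatch("123abc")       # None
full_exact = digits.fullmatch("123").group(0)   # '123'
print(match_start, match_late, search_late, full_partial, full_exact)

# finditer() yields match objects, which also carry positions.
spans = [(m.group(0), m.start(), m.end()) for m in digits.finditer("a1b22c333")]
print(spans)  # [('1', 1, 2), ('22', 3, 5), ('333', 6, 9)]

# split() cuts the string wherever the pattern matches.
parts = re.split(r"\s*,\s*", "a , b,c ,d")
print(parts)  # ['a', 'b', 'c', 'd']
```

For validation, `fullmatch()` is often the safer choice because it does not depend on remembering `^` and `$` anchors in the pattern.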
References
- MDN Web Docs: Regular expressions - JavaScript
- Python Docs: re — Regular expression operations
- Regex101.com - Online regex tester and debugger (useful for learning and testing patterns)
This page is AI-assisted. References official documentation.