About Regular Expressions
Regular expressions (regex or regexp) are patterns used to match, search, and manipulate text. Most programming languages support them, as do many command-line tools such as grep and sed. The exact syntax and features vary between implementations (often called "flavors"). The syntax can look intimidating at first, but a small set of building blocks covers most everyday text-processing tasks.
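Here is a small TypeScript sketch of the three jobs named above: match, search, and manipulate. It assumes JavaScript's built-in RegExp engine; other languages offer equivalent calls under different names. The sample string is invented for illustration.

```typescript
// Sketch: match, search, and manipulate with JavaScript's RegExp.
// The sample text and patterns are made up for illustration.
const sampleText = "Order 42 shipped on 2024-03-15; order 7 is pending.";

// Match: does the text contain something shaped like a YYYY-MM-DD date?
const hasDate = /\d{4}-\d{2}-\d{2}/.test(sampleText); // true

// Search: at which index does the first such date start?
const dateIndex = sampleText.search(/\d{4}-\d{2}-\d{2}/); // 20

// Manipulate: replace every run of digits (the g flag means "all matches").
const masked = sampleText.replace(/\d+/g, "#");
// "Order # shipped on #-#-#; order # is pending."

console.log(hasDate, dateIndex, masked);
```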
Common Regex Patterns
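Here are some frequently used patterns. They are simple approximations, not strict validators. For instance, the IPv4 pattern also accepts 999.999.999.999, and the date pattern accepts 2024-13-45: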
- Email: \b[\w.-]+@[\w.-]+\.\w{2,}\b
- URL: https?://[\w.-]+(?:\.[\w]{2,})(?:/[\w./-]*)*
- Phone (US): \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
- IP Address (v4): \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
- Date (YYYY-MM-DD): \d{4}-\d{2}-\d{2}
- Hex Color: #[0-9a-fA-F]{3,8}
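The following TypeScript sketch compiles the patterns above as JavaScript RegExp literals and runs them on a few invented inputs. The helper firstMatch and the keys of commonPatterns are hypothetical names introduced only for this example. Inside a literal, "/" must be written as "\/". Apart from that, the patterns are unchanged. The last samples show that the patterns check the shape of the text, not whether the value is actually valid.

```typescript
// Sketch: the listed patterns as JavaScript RegExp literals.
// "/" is escaped as "\/" inside literals; otherwise the patterns are as listed.
const commonPatterns = {
  email: /\b[\w.-]+@[\w.-]+\.\w{2,}\b/,
  url: /https?:\/\/[\w.-]+(?:\.[\w]{2,})(?:\/[\w.\/-]*)*/,
  phoneUS: /\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/,
  ipv4: /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/,
  date: /\d{4}-\d{2}-\d{2}/,
  hexColor: /#[0-9a-fA-F]{3,8}/,
};

type PatternName = keyof typeof commonPatterns;

// Hypothetical helper: first substring the named pattern finds in `input`, or null.
function firstMatch(name: PatternName, input: string): string | null {
  const m = commonPatterns[name].exec(input);
  return m ? m[0] : null;
}

// Invented sample inputs.
const emailHit = firstMatch("email", "Contact: jane.doe@example.com today"); // "jane.doe@example.com"
const urlHit = firstMatch("url", "See https://example.com/docs/regex.html for details");
const phoneHit = firstMatch("phoneUS", "Call (555) 123-4567 now"); // "(555) 123-4567"
const hexHit = firstMatch("hexColor", "color: #1a2b3c;"); // "#1a2b3c"

// Shape, not validity: these still match.
const badIp = firstMatch("ipv4", "Bogus: 999.999.999.999"); // "999.999.999.999"
const badDate = firstMatch("date", "Due 2024-13-45"); // "2024-13-45"
const noDate = firstMatch("date", "no dates here"); // null

console.log(emailHit, urlHit, phoneHit, hexHit, badIp, badDate, noDate);
```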
Regex Syntax Quick Reference
- . matches any character except newline
- \d matches any digit (0-9), \D matches non-digits
- \w matches word characters (letters, digits, underscore), \W matches non-word characters
- \s matches whitespace, \S matches non-whitespace
- * zero or more, + one or more, ? zero or one
- {n} exactly n, {n,} n or more, {n,m} between n and m occurrences (inclusive)
- [abc] character class, [^abc] negated class
- ^ start of string, $ end of string
- (group) capturing group, (?:group) non-capturing
- \b word boundary
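Below is a short TypeScript sketch that exercises several of these tokens with JavaScript's RegExp. The sample strings are invented. Other regex flavors can differ in details. For example, JavaScript's \d matches only ASCII 0-9, while Python's \d also matches other Unicode decimal digits in str patterns.

```typescript
// Sketch: a few quick-reference tokens in action, using JavaScript's RegExp.
// Sample strings are made up for illustration.

// ^ and $ anchor the whole string; (\d{4}) etc. are capturing groups.
const isoDate = /^(\d{4})-(\d{2})-(\d{2})$/;
const dateMatch = isoDate.exec("2024-03-15");
const dateParts = dateMatch ? dateMatch.slice(1) : []; // ["2024", "03", "15"]
const anchoredMiss = isoDate.test("Due 2024-03-15"); // false: ^ needs the date at the start

// (?:ab)+ repeats a group without capturing it, so there is no group 1.
const nonCapturing = /(?:ab)+/.exec("ababab!"); // [0] is "ababab", length 1
const capturing = /(ab)+/.exec("ababab!"); // [1] is "ab" (the last repetition)

// \b stops "cat" from matching inside longer words.
const wholeWord = "cat concatenate catalog".match(/\bcat\b/g) ?? []; // ["cat"]
const anywhere = "cat concatenate catalog".match(/cat/g) ?? []; // 3 matches

// \w+ finds runs of letters, digits, and underscores; [^a-z] is a negated class.
const wordRuns = "hello_world-42".match(/\w+/g) ?? []; // ["hello_world", "42"]
const lettersOnly = "a1-b2_c3".replace(/[^a-z]/g, ""); // "abc"

console.log(dateParts, anchoredMiss, wholeWord, anywhere.length, wordRuns, lettersOnly);
```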
Regex Flags
- g (global): Find all matches, not just the first
- i (case-insensitive): Ignore uppercase/lowercase differences
- m (multiline): ^ and $ match at the start/end of each line, not just at the start/end of the whole string
- s (dotAll): Makes . match newline characters too
- u (unicode): Treat pattern as a sequence of Unicode code points
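These single-letter flags follow JavaScript's notation. They go after the closing slash of a regex literal or are passed as the second argument to the RegExp constructor. The TypeScript sketch below shows the effect of each flag on invented inputs. It builds the s and u examples with the constructor, which behaves the same as the literals /a.b/s and /./gu.

```typescript
// Sketch of each flag's effect, using JavaScript's RegExp.
// Sample strings are made up for illustration.

// g: match() returns every match instead of only the first.
const allDigits = "a1b2c3".match(/\d/g) ?? []; // ["1", "2", "3"]
const firstDigit = "a1b2c3".match(/\d/); // first match only: "1" at index 1

// i: case-insensitive comparison.
const caseless = /hello/i.test("HeLLo there"); // true

// m: ^ and $ apply to each line, so both lines match.
const twoLines = "first\nsecond";
const perLine = twoLines.match(/^\w+$/gm) ?? []; // ["first", "second"]
const wholeString = twoLines.match(/^\w+$/g); // null: the whole string is not one word

// s: . also matches "\n".
const withoutS = /a.b/.test("a\nb"); // false
const withS = new RegExp("a.b", "s").test("a\nb"); // true

// u: . matches a whole code point, so an emoji (two UTF-16 units) counts once.
const emoji = "\u{1F600}";
const unitsWithoutU = (emoji.match(/./g) ?? []).length; // 2
const pointsWithU = (emoji.match(new RegExp(".", "gu")) ?? []).length; // 1

console.log(allDigits, firstDigit?.index, caseless, perLine, wholeString, withoutS, withS, unitsWithoutU, pointsWithU);
```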