Comparison of regular expression engines
Libraries
Languages
Language | Official website | Software license | Remarks |
---|---|---|---|
.NET | MSDN | Proprietary | |
C++ | since ISO/IEC 14882:2011(E) | ||
D | D | Boost Software License[Note 1] | |
Go | Golang.org | BSD-style license | |
Haskell | Haskell.org | BSD3 | Not included in the language report; nor in GHC's Hierarchical Libraries |
Java | Java | GNU General Public License | REs are written as strings in source code (all backslashes must be doubled, hurting readability). |
JavaScript/ECMAScript | ? | | Limited, but REs are first-class citizens of the language with a specific /.../mod syntax. |
Lua | Lua.org | MIT License | Uses a simplified, limited dialect. Can be bound to a more powerful library, like PCRE or an alternative parser like LPeg. |
Object Pascal (Free Pascal) | www.freepascal.org | LGPL with static linking exception | Free Pascal 2.6+ ships with TRegExpr from Sorokin as well as with 2 other regular expression libraries. See http://wiki.lazarus.freepascal.org/Regexpr |
Objective-C (Cocoa on iOS only) | Apple | Proprietary | Currently only available on iOS 4+ |
OCaml | Caml | LGPL | |
Perl | Perl.com | Artistic License or the GNU General Public License | Full, central part of the language. |
PHP | PHP.net | PHP License | Has two implementations, with PCRE being the more efficient (speed, functionalities). |
Python | python.org | Python Software Foundation License | |
Ruby | ruby-doc.org | GNU Library General Public License | Ruby 1.8 and 1.9 use different engines; Ruby 1.9 integrates Oniguruma. |
SAP ABAP | SAP.com | ? | |
Tcl 8.4 | tcl.tk | Tcl/Tk License (permissive, similar to BSD) | |
ActionScript 3 | ? | ? |
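The readability cost noted in the Java row (patterns written as ordinary string literals need every backslash doubled) can be illustrated with a minimal sketch. Python is used here only for illustration; it has the same issue with regular string literals and avoids it with raw string literals. The sample pattern and input are made up for the example.

```python
import re

# The same pattern written two ways: a regular string literal needs every
# backslash doubled, while a raw string literal does not.
escaped = "\\bfoo\\.bar\\b"
raw = r"\bfoo\.bar\b"

# Both literals produce the identical pattern text.
same_pattern = (escaped == raw)

# The escaped dot matches a literal "." only.
found = re.search(raw, "see foo.bar here").group(0)
not_found = re.search(raw, "see fooxbar here")
```

Languages with a dedicated regex literal syntax, such as the /.../mod form noted for JavaScript, sidestep the doubling in a similar way.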
Language features
NOTE: An application that uses a library for regular expression support does not necessarily offer the library's full feature set. For example, GNU Grep, which uses PCRE, does not offer lookahead support, though PCRE does.
Part 1
| "+" quantifier | Negated character classes | Non-greedy quantifiers[Note 1] | Shy groups[Note 2] | Recursion | Lookahead | Lookbehind | Backreferences[Note 3] | >9 indexable captures |
---|---|---|---|---|---|---|---|---|---|
Boost.Regex | Yes | Yes | Yes | Yes | Yes [Note 4] | Yes | Yes | Yes | Yes |
Boost.Xpressive | Yes | Yes | Yes | Yes | Yes [Note 5] | Yes | Yes | Yes | Yes |
CL-PPCRE | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
EmEditor | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No |
FREJ | No [Note 6] | No | Some [Note 6] | Yes | No | No | No | Yes | Yes |
GLib/GRegex | Yes | ? | Yes | ? | No | ? | ? | ? | ? |
GNU Grep | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | ? |
Haskell | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
Java | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
ICU Regex | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
JGsoft | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
.NET | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
OCaml | Yes | Yes | No | No | No | No | No | Yes | No |
OmniOutliner 3.6.2 | Yes | Yes | Yes | No | No | No | No | ? | ? |
PCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Perl | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
PHP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Python | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
Qt/QRegExp | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes |
re2 | Yes | Yes | Yes | Yes | No | No | No | No | Yes |
Ruby | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
TRE | Yes | Yes | Yes | Yes | No | No | No | Yes | No |
Vim | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No |
RGX | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
TRegExpr | Yes | ? | Yes | ? | ? | ? | ? | ? | ? |
- ^ Non-greedy quantifiers match as few characters as possible, instead of the default of as many as possible. Note that many older, pre-POSIX engines were non-greedy and had no greedy quantifiers at all.
- ^ Shy groups, also called non-capturing groups, cannot be referred to with backreferences; they are used to speed up matching where the group's content need not be accessed later.
- ^ Backreferences enable referring to previously matched groups in later parts of the regex and/or replacement string (where applicable). For instance, when required to match the whole string, ([ab]+)\1 matches "abab" but not "abaab".
- ^ http://www.boost.org/doc/libs/1_47_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.recursive_expressions
- ^ http://www.boost.org/doc/libs/1_47_0/doc/html/xpressive/user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.embedding_a_regex_by_reference
- ^ FREJ has no repetition quantifiers, but has an "optional" element that behaves similarly to a simple "?" quantifier.
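The first three notes above can be made concrete with a short sketch. It uses Python's `re` module purely as one example engine (the Python row marks all three features as supported); the sample strings are invented for illustration.

```python
import re

text = "<b>one</b> and <b>two</b>"

# Greedy: ".*" takes as many characters as possible.
greedy = re.search(r"<b>.*</b>", text).group(0)
# Non-greedy: ".*?" takes as few characters as possible.
lazy = re.search(r"<b>.*?</b>", text).group(0)

# Shy (non-capturing) group: (?:...) groups without creating a capture,
# so the only numbered group here is the run of digits.
shy_groups = re.fullmatch(r"(?:ab)+(\d+)", "abab42").groups()

# Backreference: \1 must repeat exactly what group 1 matched.
backref = re.compile(r"([ab]+)\1")
whole_abab = backref.fullmatch("abab") is not None    # "ab" + "ab"
whole_abaab = backref.fullmatch("abaab") is not None  # no way to split evenly
# An unanchored search can still find a repeated substring inside "abaab".
inner = backref.search("abaab").group(0)

print(greedy)       # <b>one</b> and <b>two</b>
print(lazy)         # <b>one</b>
print(shy_groups)   # ('42',)
print(whole_abab, whole_abaab, inner)  # True False aa
```

The last line shows why the backreference example is stated in terms of matching the whole string: an unanchored search finds "aa" inside "abaab".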
Part 2
| Directives [Note 1] | Conditionals | Atomic groups [Note 2] | Named capture [Note 3] | Comments | Embedded code | Partial matching | Fuzzy matching | Unicode property support [3] |
---|---|---|---|---|---|---|---|---|---|
Boost.Regex | Yes | Yes | Yes | Yes | Yes | No | Yes | No | Some [Note 4] [Note 5] |
Boost.Xpressive | Yes | No | Yes | Yes | Yes | No | Yes | No | No |
CL-PPCRE | Yes | Yes | Yes | Yes | Yes | Yes | ? | No | No |
EmEditor | Yes | Yes | ? | ? | Yes | No | Yes | No | ? |
FREJ | No | No | Yes | Yes | Yes | No | No | Yes | ? |
GLib/GRegex | Yes | Yes | Yes | Yes | Yes | No | Yes | No | Some [Note 4] [Note 5] |
GNU Grep | Yes | Yes | ? | Yes | Yes | No | ? | No | No |
Haskell | ? | ? | ? | ? | ? | No | ? | No | No |
Java | Yes | No | Yes | Yes [Note 6] | No | No | ? | No | Some [Note 5] |
ICU Regex | Yes | No | Yes | No | Yes | No | No | No | Yes [Note 7] |
JGsoft | Yes | Yes | Yes | Yes | Yes | No | Yes | ? | Some [Note 5] |
.NET | Yes | Yes | Yes | Yes | Yes | No | ? | No | Some [Note 5] |
OCaml | No | No | No | No | No | No | ? | No | No |
OmniOutliner 3.6.2 | ? | ? | ? | ? | No | No | ? | No | ? |
PCRE | Yes | Yes | Yes | Yes [Note 8] | Yes | Yes | Yes | No | Some [Note 4] [Note 5] |
Perl | Yes | Yes | Yes | Yes [Note 9] | Yes | Yes | No | No | Yes [Note 7] |
PHP | Yes | Yes | Yes | Yes | Yes | No | No | No | No |
Python | Yes | Yes | No | Yes | Yes | No | Yes | No | No |
Qt/QRegExp | No | No | No | No | No | No | Yes | No | No |
re2 | Yes | No | ? | Yes | No | No | No | No | Some [Note 5] |
Ruby | Yes | No | Yes | Yes | Yes | Yes | No | No | Some [Note 5] |
TRE | Yes | No | No | No | Yes | No | No | Yes | ? |
Vim | Yes | No | Yes | No | No | No | Yes | No | No |
RGX | Yes | Yes | Yes | Yes | Yes | No | No | No | Yes |
- ^ Also known as flag modifiers or option letters. Example pattern: "(?i:test)"
- ^ Also called independent sub-expressions.
- ^ Similar to backreferences, but with names instead of indices.
- ^ Requires optional Unicode support to be enabled.
- ^ Supports only a subset of Unicode properties, not all of them.
- ^ Available as of JDK 7.
- ^ Supports all Unicode properties, including non-binary properties.
- ^ Available as of PCRE 7.0 (as of PCRE 4.0 with Python-like syntax (?P<name>...)).
- ^ Available as of Perl 5.9.5.
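As a hedged illustration of directives, named capture, and comments from the notes above, the following sketch uses Python's `re` module, whose `(?P<name>...)` syntax is the Python-like form mentioned for PCRE. The patterns and inputs are invented for the example, and the scoped directive form assumes Python 3.6 or later.

```python
import re

# Directive scoped to a group: only "test" is matched case-insensitively.
scoped = re.compile(r"(?i:test)-case")
upper_prefix = scoped.fullmatch("TEST-case") is not None
upper_all = scoped.fullmatch("TEST-CASE") is not None

# Named capture with (?P<name>...), plus an inline comment with (?#...).
dated = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})(?# day omitted)")
m = dated.search("released 2011-09")
year, month = m.group("year"), m.group("month")

# Named backreference with (?P=name): find a word repeated twice in a row.
doubled = re.compile(r"\b(?P<word>\w+) (?P=word)\b")
repeated = doubled.search("this is is a test").group("word")

print(upper_prefix, upper_all)  # True False
print(year, month)              # 2011 09
print(repeated)                 # is
```

Other engines in the table spell these constructs differently or omit them, so the syntax here should not be read as portable.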
API features
| Native UTF-16 support | Native UTF-8 support | Non-linear input support | Dot-matches-newline option | Anchor-matches-newline option |
---|---|---|---|---|---|
Boost.Regex | No | No | Yes | Yes | Yes |
GLib/GRegex | No | Yes [Note 1] | No | Yes | Yes |
ICU Regex | Yes | No | Yes | Yes | Yes |
Java | Yes | No | Yes | Yes | Yes |
.NET | No [Note 2] | No | Yes | Yes | Yes |
PCRE | No | Yes [Note 1] | No | Yes | Yes |
Qt/QRegExp | Yes | No | No | No | No |
TRE | No | ? | Yes | Yes | Yes |
RGX | No | No | ? | Yes | Yes |
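The last two columns describe options that most engines expose under different names. As a minimal sketch of what they change, here is how they behave in Python's `re` module, where they are spelled `re.DOTALL` and `re.MULTILINE`. Python is not one of the rows in this table, so this only illustrates the concept; the sample text is made up.

```python
import re

text = "first line\nsecond line"

# Without the dot-matches-newline option, "." stops at the line break.
default_dot = re.search(r"first.*", text).group(0)
# With re.DOTALL, "." also matches "\n", so the match runs to the end.
dotall = re.search(r"first.*", text, re.DOTALL).group(0)

# Without the anchor-matches-newline option, "^" matches only at the
# very start of the input.
default_anchor = re.findall(r"^\w+", text)
# With re.MULTILINE, "^" also matches right after each newline.
multiline = re.findall(r"^\w+", text, re.MULTILINE)

print(repr(default_dot))  # 'first line'
print(repr(dotall))       # 'first line\nsecond line'
print(default_anchor)     # ['first']
print(multiline)          # ['first', 'second']
```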
See also
External links
- Regular Expression Flavor Comparison — Detailed comparison of the most popular regular expression flavors
- Regexp Syntax Summary