XRegExp

Extended JavaScript regular expressions

XRegExp 5.1.1

XRegExp provides augmented (and extensible) JavaScript regular expressions. You get modern syntax and flags beyond what browsers support natively. XRegExp is also a regex utility belt with tools to make your grepping and parsing easier, while freeing you from regex cross-browser inconsistencies and other annoyances.

XRegExp supports all native ES6 regular expression syntax. It supports ES5+ browsers, and you can use it with Node.js or as a RequireJS module. Over the years, many of XRegExp's features have been adopted by new JavaScript standards (named capturing, Unicode properties/scripts/categories, flag s, sticky matching, etc.), so using XRegExp can be a way to extend these features into older browsers.

Performance


XRegExp compiles to native RegExp objects, so regexes built with XRegExp run just as fast as native regular expressions. The only overhead is a small one-time cost when a pattern is first compiled.

Named capture breaking change in XRegExp 5


XRegExp 5 introduced a breaking change where named backreference properties now appear on the result's groups object (following ES2018), rather than directly on the result. To restore the old handling so you don't need to update old code, run the following line after importing XRegExp:

```js
XRegExp.uninstall('namespacing');
```

XRegExp 4.1.0 and later allow introducing the new behavior without upgrading to XRegExp 5 by running XRegExp.install('namespacing').

The following is the most commonly needed change to update code for the new behavior:

```js
// Change this
const name = XRegExp.exec(str, regexWithNamedCapture).name;

// To this
const name = XRegExp.exec(str, regexWithNamedCapture).groups.name;
```

See below for more examples of using named capture with XRegExp.exec and XRegExp.replace.
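
For comparison, native ES2018 named capture produces the same result shape that XRegExp 5 follows. The sketch below deliberately uses only a native RegExp, so it runs without XRegExp installed; the `isoDate` name is purely illustrative. It shows that named captures live on the result's `groups` object and not on the result itself:

```js
// Native ES2018 named capture (no XRegExp import needed for this sketch)
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = isoDate.exec('2021-02-22');

// Named captures are properties of match.groups
const viaGroups = match.groups.year; // '2021'

// They are not copied onto the match array, which is the old XRegExp layout
const viaResult = match.year; // undefined
```

Code written against this shape should therefore behave the same way with XRegExp 5's default handling.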

Usage examples


```js
// Using named capture and flag x for free-spacing and line comments
const date = XRegExp(
    `(?<year>  [0-9]{4} ) -?  # year
     (?<month> [0-9]{2} ) -?  # month
     (?<day>   [0-9]{2} )     # day`, 'x');

// XRegExp.exec provides named backreferences on the result's groups property
let match = XRegExp.exec('2021-02-22', date);
match.groups.year; // -> '2021'

// It also includes optional pos and sticky arguments
let pos = 3;
const result = [];
while (match = XRegExp.exec('<1><2><3>4<5>', /<(\d+)>/, pos, 'sticky')) {
    result.push(match[1]);
    pos = match.index + match[0].length;
}
// result -> ['2', '3']

// XRegExp.replace allows named backreferences in replacements
XRegExp.replace('2021-02-22', date, '$<month>/$<day>/$<year>');
// -> '02/22/2021'
XRegExp.replace('2021-02-22', date, (...args) => {
    // Named backreferences are on the last argument
    const groups = args[args.length - 1];
    return `${groups.month}/${groups.day}/${groups.year}`;
});
// -> '02/22/2021'

// XRegExps compile to RegExps and work with native methods
date.test('2021-02-22');
// -> true
// However, named captures must be referenced using numbered backreferences
// if used with native methods
'2021-02-22'.replace(date, '$2/$3/$1');
// -> '02/22/2021'

// Use XRegExp.forEach to extract every other digit from a string
const evens = [];
XRegExp.forEach('1a2345', /\d/, (match, i) => {
    if (i % 2) evens.push(+match[0]);
});
// evens -> [2, 4]

// Use XRegExp.matchChain to get numbers within <b> tags
XRegExp.matchChain('1 <b>2</b> 3 <B>4 \n 56</B>', [
    XRegExp('<b>.*?</b>', 'is'),
    /\d+/
]);
// -> ['2', '4', '56']

// You can also pass forward and return specific backreferences
const html =
    `<a href="https://xregexp.com/">XRegExp</a>
     <a href="https://www.google.com/">Google</a>`;
XRegExp.matchChain(html, [
    {regex: /<a href="([^"]+)">/i, backref: 1},
    {regex: XRegExp('(?i)^https?://(?<domain>[^/?#]+)'), backref: 'domain'}
]);
// -> ['xregexp.com', 'www.google.com']

// Merge strings and regexes, with updated backreferences
XRegExp.union(['m+a*n', /(bear)\1/, /(pig)\1/], 'i', {conjunction: 'or'});
// -> /m\+a\*n|(bear)\1|(pig)\2/i
```


Addons


You can either load addons individually, or bundle all addons with XRegExp by loading xregexp-all.js from https://unpkg.com/xregexp/xregexp-all.js.

Unicode


If not using xregexp-all.js, first include the Unicode Base script and then one or more of the addons for Unicode categories, properties, or scripts.

Then you can do this:

```js
// Test some Unicode scripts
// Can also use the Script= prefix to match ES2018: \p{Script=Hiragana}
XRegExp('^\\p{Hiragana}+$').test('ひらがな'); // -> true
XRegExp('^[\\p{Latin}\\p{Common}]+$').test('Über Café.'); // -> true

// Test the Unicode categories Letter and Mark
// Can also use the short names \p{L} and \p{M}
const unicodeWord = XRegExp.tag()`^\p{Letter}[\p{Letter}\p{Mark}]*$`;
unicodeWord.test('Русский'); // -> true
unicodeWord.test('日本語'); // -> true
unicodeWord.test('العربية'); // -> true
```

By default, \p{…} and \P{…} support the Basic Multilingual Plane (i.e. code points up to U+FFFF). You can opt in to full 21-bit Unicode support (with code points up to U+10FFFF) on a per-regex basis by using flag A. This is called astral mode. You can automatically add flag A for all new regexes by running XRegExp.install('astral'). When in astral mode, \p{…} and \P{…} always match a full code point rather than a code unit, using surrogate pairs for code points above U+FFFF.

```js
// Using flag A to match astral code points
XRegExp('^\\p{S}$').test('💩'); // -> false
XRegExp('^\\p{S}$', 'A').test('💩'); // -> true
// Using surrogate pair U+D83D U+DCA9 to represent U+1F4A9 (pile of poo)
XRegExp('^\\p{S}$', 'A').test('\uD83D\uDCA9'); // -> true

// Implicit flag A
XRegExp.install('astral');
XRegExp('^\\p{S}$').test('💩'); // -> true
```

Opting in to astral mode disables the use of \p{…} and \P{…} within character classes. In astral mode, use e.g. (\pL|[0-9_])+ instead of [\pL0-9_]+.
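
The code unit versus code point distinction behind astral mode can be seen with plain JavaScript strings. The following sketch uses only native features so that it runs without XRegExp; the native `u` flag stands in for flag A here as a rough analogy, not an exact equivalent, and the variable names are illustrative:

```js
// U+1F4A9 written as the surrogate pair U+D83D U+DCA9
const poo = '\uD83D\uDCA9';

// One code point, but two UTF-16 code units
const codeUnits = poo.length; // 2
const codePoints = [...poo].length; // 1
const sameString = poo === String.fromCodePoint(0x1F4A9); // true

// Without the u flag, "." consumes a single code unit, so one "." cannot span the pair
const matchesAsUnit = /^.$/.test(poo); // false

// With the u flag, "." consumes a whole code point
const matchesAsPoint = /^.$/u.test(poo); // true
```

This is why a BMP-only `\p{…}` test fails on an astral character: the pattern sees two lone surrogates rather than one symbol.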

XRegExp uses Unicode 14.0.0.

XRegExp.build


Build regular expressions using named subpatterns, for readability and pattern reuse:

```js
const time = XRegExp.build('(?x)^ {{hours}} ({{minutes}}) $', {
    hours: XRegExp.build('{{h12}} : | {{h24}}', {
        h12: /1[0-2]|0?[1-9]/,
        h24: /2[0-3]|[01][0-9]/
    }),
    minutes: /^[0-5][0-9]$/
});

time.test('10:59'); // -> true
XRegExp.exec('10:59', time).groups.minutes; // -> '59'
```

Named subpatterns can be provided as strings or regex objects. A leading `^` and trailing unescaped `$` are stripped from subpatterns if both are present, which allows embedding independently-useful anchored patterns. `{{…}}` tokens can be quantified as a single unit. Any backreferences in the outer pattern or provided subpatterns are automatically renumbered to work correctly within the larger combined pattern. The syntax `({{name}})` works as shorthand for named capture via `(?<name>{{name}})`. Named subpatterns cannot be embedded within character classes.
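
To make the anchor-stripping and single-unit rules concrete, here is a minimal sketch of the idea using only native RegExp sources. It is not XRegExp's implementation: `embeddable` is a hypothetical helper, its escaped-`$` check is simplified, and it does no backreference renumbering.

```js
// Hypothetical helper (not part of XRegExp): prepares a regex for embedding
function embeddable(regex) {
    let source = regex.source;
    const anchoredStart = source.startsWith('^');
    // Simplified check: any "$" preceded by a backslash is treated as escaped
    const anchoredEnd = source.endsWith('$') && !source.endsWith('\\$');
    // Strip the anchors only when both are present
    if (anchoredStart && anchoredEnd) {
        source = source.slice(1, -1);
    }
    // Wrap in a non-capturing group so alternation and quantifiers
    // apply to the subpattern as a single unit
    return `(?:${source})`;
}

const hours = /1[0-2]|0?[1-9]/;
const minutes = /^[0-5][0-9]$/; // independently useful anchored pattern

const time = new RegExp(
    `^${embeddable(hours)}:(?<minutes>${embeddable(minutes)})$`
);

const isTime = time.test('10:59'); // true
const capturedMinutes = time.exec('10:59').groups.minutes; // '59'
```

Without the non-capturing wrapper, the `|` inside `hours` would split the whole combined pattern into two alternatives, which is the kind of surprise that treating each subpattern as one unit avoids.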

XRegExp.tag (included with XRegExp.build)


Provides tagged template literals that create regexes with XRegExp syntax and flags:

```js
XRegExp.tag()`\b\w+\b`.test('word'); // -> true

const hours = /1[0-2]|0?[1-9]/;
const minutes = /(?<minutes>[0-5][0-9])/;
const time = XRegExp.tag('x')`\b ${hours} : ${minutes} \b`;
time.test('10:59'); // -> true
XRegExp.exec('10:59', time).groups.minutes; // -> '59'

const backref1 = /(a)\1/;
const backref2 = /(b)\1/;
XRegExp.tag()`${backref1}${backref2}`.test('aabb'); // -> true
```

XRegExp.tag does more than just interpolation. You get all the XRegExp syntax and flags, and since it reads patterns as raw strings, you no longer need to escape all your backslashes. XRegExp.tag also uses XRegExp.build under the hood, so you get all of its extras for free. Leading ^ and trailing unescaped $ are stripped from interpolated patterns if both are present (to allow embedding independently useful anchored regexes), interpolating into a character class is an error (to avoid unintended meaning in edge cases), interpolated patterns are treated as atomic units when quantified, interpolated strings have their special characters escaped, and any backreferences within an interpolated regex are rewritten to work within the overall pattern.
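
The raw-string benefit can be illustrated with the native `String.raw` tag, which exposes the same raw template text that any tag function receives. This sketch uses no XRegExp calls and only demonstrates the escaping difference; the variable names are illustrative:

```js
// In an ordinary string literal, every regex backslash must be doubled
const escaped = '\\b\\w+\\b';

// A raw template keeps backslashes exactly as typed
const raw = String.raw`\b\w+\b`;
const sameSource = escaped === raw; // true

// An untagged template processes the escapes instead, so "\b" becomes
// a backspace character and the pattern text is lost
const cooked = `\b\w+\b`;
const cookedDiffers = cooked !== raw; // true

const isWord = new RegExp(raw).test('word'); // true
```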

XRegExp.matchRecursive


A robust and flexible API for matching recursive constructs using XRegExp pattern strings as left and right delimiters:

```js
const str1 = '(t((e))s)t()(ing)';
XRegExp.matchRecursive(str1, '\\(', '\\)', 'g');
// -> ['t((e))s', '', 'ing']

// Extended information mode with valueNames
const str2 = 'Here is <div> <div>an</div></div> example';
XRegExp.matchRecursive(str2, '<div\\s*>', '</div>', 'gi', {
    valueNames: ['between', 'left', 'match', 'right']
});
/* -> [
{name: 'between', value: 'Here is ',       start: 0,  end: 8},
{name: 'left',    value: '<div>',          start: 8,  end: 13},
{name: 'match',   value: ' <div>an</div>', start: 13, end: 27},
{name: 'right',   value: '</div>',         start: 27, end: 33},
{name: 'between', value: ' example',       start: 33, end: 41}
] */

// Omitting unneeded parts with null valueNames, and using escapeChar
const str3 = '...{1}.\\{{function(x,y){return {y:x}}}';
XRegExp.matchRecursive(str3, '{', '}', 'g', {
    valueNames: ['literal', null, 'value', null],
    escapeChar: '\\'
});
/* -> [
{name: 'literal', value: '...',  start: 0, end: 3},
{name: 'value',   value: '1',    start: 4, end: 5},
{name: 'literal', value: '.\\{', start: 6, end: 9},
{name: 'value',   value: 'function(x,y){return {y:x}}', start: 10, end: 37}
] */

// Sticky mode via flag y
const str4 = '<1><<<2>>><3>4<5>';
XRegExp.matchRecursive(str4, '<', '>', 'gy');
// -> ['1', '<<2>>', '3']

// Skipping unbalanced delimiters instead of erroring
const str5 = 'Here is <div> <div>an</div> unbalanced example';
XRegExp.matchRecursive(str5, '<div\\s*>', '</div>', 'gi', {
    unbalanced: 'skip'
});
// -> ['an']
```

By default, XRegExp.matchRecursive throws an error if it scans past an unbalanced delimiter in the target string. Multiple alternative options are available for handling unbalanced delimiters.
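
The depth-counting idea behind balanced matching, including the default of throwing on an unbalanced delimiter, can be sketched in a few lines of plain JavaScript. This is a simplified illustration rather than XRegExp's algorithm: `matchBalanced` is a hypothetical helper that handles only single-character delimiters and has no flags, escape characters, or `valueNames`.

```js
// Hypothetical helper (not part of XRegExp): returns the contents of each
// outermost balanced pair of single-character delimiters
function matchBalanced(str, open, close) {
    const results = [];
    let depth = 0;
    let start = -1;
    for (let i = 0; i < str.length; i++) {
        if (str[i] === open) {
            // Remember where the outermost pair's content begins
            if (depth === 0) start = i + 1;
            depth++;
        } else if (str[i] === close) {
            if (depth === 0) {
                throw new Error(`Unbalanced right delimiter at index ${i}`);
            }
            depth--;
            // Back at depth 0: the outermost pair just closed
            if (depth === 0) results.push(str.slice(start, i));
        }
    }
    if (depth !== 0) {
        throw new Error(`Unbalanced left delimiter at index ${start - 1}`);
    }
    return results;
}

const parts = matchBalanced('(t((e))s)t()(ing)', '(', ')');
// parts -> ['t((e))s', '', 'ing']
```

Only the outermost pairs are reported; nested delimiters change the depth counter but stay inside the returned text.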

Installation and usage


In browsers (bundle XRegExp with all of its addons):

```html
<script src="https://unpkg.com/xregexp/xregexp-all.js"></script>
```

Using npm:

```sh
npm install xregexp
```


```js
const XRegExp = require('xregexp');
```

Contribution guide


1. Fork the repository and clone the forked version locally.
2. Ensure you have the typescript module installed globally.
3. Run npm install.
4. Ensure all tests pass with npm test.
5. Add tests that cover the new functionality, or that fail because of the bug you are fixing.
6. Implement the functionality or bug fix so that the new tests pass.

Credits


XRegExp project collaborators are:


Thanks to all contributors and others who have submitted code, provided feedback, reported bugs, and inspired new features.

XRegExp is released under the MIT License. Learn more at xregexp.com.