regex

Context-aware regex template tag with advanced features and best practices ...

README

regex logo

regex creates *readable, high performance, native JavaScript regular expressions* with advanced features and best practices built-in. It's lightweight (6.5 KB minified and brotlied) and supports all ES2024+ regex functionality. It can also be used dependency-free as a Babel plugin.

Highlights include support for free spacing and comments, atomic groups via `(?>…)` which can help you avoid [ReDoS](https://en.wikipedia.org/wiki/ReDoS), subroutines via `\g` which enable powerful composition, and context-aware interpolation of `RegExp` instances, escaped strings, and partial patterns.

With the regex package, JavaScript steps up as one of the best regex flavors alongside PCRE and Perl, and maybe surpassing C++, Java, .NET, and Python.

💎 Features


- A modern regex baseline so you don't need to continually opt-in to best practices.
- Always-on flag v gives you the best level of Unicode support and strict errors. In environments without v, flag u is used with v's rules applied. - Always-on implicit flag x allows you to freely add whitespace and comments to your regexes. - Always-on implicit flag n (*named capture only* mode) improves regex readability and efficiency.
  - No unreadable escaped backslashes \\\\ since it's a raw string template tag.
- New regex syntax.
  - Atomic groups via (?>…) can dramatically improve performance and prevent ReDoS.
- Subroutines via `\g` enable powerful composition, improving readability and maintainability.
  - Recursive matching is enabled by an extension.
- Context-aware and safe interpolation of regexes, strings, and partial patterns.
  - Interpolated strings have their special characters escaped.
  - Interpolated regexes locally preserve the meaning of their own flags (or their absense), and any numbered backreferences are adjusted to work within the overall pattern.

🪧 Examples


  1. ```js
  2. import {regex, partial} from 'regex';

  3. // Subroutines
  4. const record = regex('gm')`^
  5.   Born: (?<date> \d{4}-\d{2}-\d{2} ) \n
  6.   Admitted: \g<date> \n
  7.   Released: \g<date>
  8. $`;

  9. // Atomic groups; avoid ReDoS from the nested, overlapping quantifier
  10. const words = regex`^(?>\w+\s?)+$`;

  11. // Context-aware and safe interpolation
  12. const re = regex('m')`
  13.   # Only the inner regex is case insensitive (flag i)
  14.   # Also, the outer regex's flag m is not applied to it
  15.   ${/^a.b$/i}
  16.   |
  17.   # Strings are contextually escaped and repeated as complete units
  18.   ^ ${'a.b'}+ $
  19.   |
  20.   # This string is contextually sandboxed but not escaped
  21.   ${partial('^ a.b $')}
  22. `;

  23. // Adjusts backreferences in interpolated regexes
  24. regex`^ ${/(dog)\1/} ${/(cat)\1/} $`;
  25. // → /^(dog)\1(cat)\2$/v
  26. ```

🕹️ Install and use


  1. ```sh
  2. npm install regex
  3. ```

  1. ```js
  2. import {regex, partial} from 'regex';
  3. ```

In browsers:

  1. ```html
  2. <script src="https://cdn.jsdelivr.net/npm/regex/dist/regex.min.js"></script>
  3. <script>
  4.   const {regex, partial} = Regex;
  5. </script>
  6. ```

❓ Context


Due to years of legacy and backward compatibility, regular expression syntax in JavaScript is a bit of a mess. There are four different sets of incompatible syntax and behavior rules that might apply to your regexes depending on the flags and features you use. The differences are just plain hard to fully grok and can easily create subtle bugs.

See the four parsing modes

1. Unicode-unaware (legacy) mode is the default and can easily and silently create Unicode-related bugs.
2. Named capture mode changes the meaning of \k when a named capture appears anywhere in a regex.
3. Unicode mode with flag u adds strict errors (for unreserved letter escapes, octal escapes, escaped literal digits, and unescaped special characters in some contexts), switches to code-point-based matching (changing the potential handling of the dot, negated sets like `\W`, character class ranges, and quantifiers), makes flag i use Unicode case-folding, and adds new features/syntax.4. UnicodeSets mode with flag v (an upgrade to u) incompatibly changes escaping rules within character classes, fixes case-insensitive matching for doubly-negated `[^\P{…}]`, and adds new features/syntax.