DEV Community

Matt Ellen

One Byte Explainer: Regular Expressions

This is a submission for DEV Computer Science Challenge v24.06.12: One Byte Explainer.

Explainer

A regular expression (regex) finds patterns in strings with one character of memory. It has an alphabet & defines a language. The alphabet can be any set of characters, including the empty string. Regexes can be joined, joining the alphabets and languages.

Additional Context

Because the original regular expressions only allowed for one character of memory, they had no lookaheads or lookbehinds.

A language that is defined by a regular expression is called a regular language.
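
To make the "one character of memory" idea concrete, here is a minimal sketch of a hand-built state machine for the pattern abc*. The names `TRANSITIONS`, `ACCEPTING` and `accepts` are hypothetical, chosen for illustration only; real regex engines are built differently. The set of strings for which `accepts` returns `True` is the regular language defined by abc*.

```python
# Hypothetical hand-built automaton for the pattern abc* (illustrative names).
# States: 0 = start, 1 = seen "a", 2 = seen "ab" followed by zero or more "c".
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 2,
    (2, "c"): 2,
}
ACCEPTING = {2}


def accepts(text: str) -> bool:
    """Return True if the whole of text is in the language of abc*."""
    state = 0
    for char in text:
        # Only the current state and the current character are needed.
        key = (state, char)
        if key not in TRANSITIONS:
            return False  # no valid move, so the string is rejected
        state = TRANSITIONS[key]
    return state in ACCEPTING
```

The loop never looks back at earlier characters or ahead at later ones, which is why this style of matcher has no room for lookaheads or lookbehinds.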

Regular expressions have notations that let you define them succinctly. These notations vary between implementations, but usually take the following forms:

  • * - the preceding character or group may appear zero or more times. e.g. abc* would match ab, abc, abcc, etc.
  • + - the preceding character or group must appear at least once. e.g. abc+ would match abc, abcc, etc.
  • ? - the preceding character or group may appear at most once. e.g. abc? would match ab or abc.
  • . - this matches any single character (in many implementations, any character except a newline). e.g. . would match a, b, c, etc.
  • [] - match any one of the characters inside the square brackets. e.g. [hjk] would match h, j, or k.
  • [^] - match any one character that is not inside the square brackets. e.g. [^abc] would not match a, b, or c, but would match any other character.
  • () - the pattern inside the parentheses is a group. e.g. (abc) would match abc, and the regular expression engine would capture that match as a group.
  • (|) - the group can match either what's on the left or what's on the right of the |. e.g. (abc|def) would match abc or def.
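
The notations above can be tried out with Python's standard `re` module. This is a small sketch rather than a full tour: the helper `matches` is a hypothetical name introduced here, and it uses `re.fullmatch` so that the whole string has to match the pattern. Other implementations may behave differently in the details.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# * : zero or more of the preceding character
assert matches(r"abc*", "ab") and matches(r"abc*", "abcc")
# + : at least one of the preceding character
assert matches(r"abc+", "abc") and not matches(r"abc+", "ab")
# ? : at most one of the preceding character
assert matches(r"abc?", "ab") and not matches(r"abc?", "abcc")
# . : any single character (by default Python excludes the newline)
assert matches(r".", "a") and not matches(r".", "\n")
# [] : any one of the listed characters
assert matches(r"[hjk]", "j") and not matches(r"[hjk]", "a")
# [^] : any one character that is not listed
assert matches(r"[^abc]", "d") and not matches(r"[^abc]", "a")
# () and | : a captured group that can be either alternative
result = re.fullmatch(r"(abc|def)", "def")
assert result is not None and result.group(1) == "def"
```

Each `assert` passes silently, so running the script with no output means every example behaved as the list describes.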
