REGEXEXTRACT in Google Sheets
The regex functions of Google Sheets are useful when you manage a huge amount of data.
They are based on regular expressions (regex for short) that are widely used among programmers, but it’s also helpful for the users of Google Sheets.
A regular expression is used to search for and match specific information. The regular expressions are built up from such rules and you can express basically anything with them.
Out of the three main regex functions, the REGEXEXTRACT is used to extract matching substrings according to a regular expression.
What is the REGEXEXTRACT Function?
The syntax of the function is:
Its syntax might look scary at first, but you can find many resources online that help you to build your own regular expressions.
Once you understood how to use it, it’s a really powerful tool and speeds up your processes a lot.
You can always use it to search for exact matches, but the real power of this function lies in the special characters that allow you to customize your searches.
Let’s see an example. A regular expression usually looks something like this: ([A-Z]\w+), which means:
- A-Z: the substrings we search for starts with an uppercase letter
- \w: it has to be a word (not a digit or whitespace)
- +: it can be followed by 1 or more characters
When to use REGEXMATCH for marketing stuff?
- Extracting specific data from a list
- Extracting emails or links from text.
- Extracting a domain from a URL.
How To Use The REGXMATCH Function in Google Sheets
The first argument, text, is the input text. The second argument, regular_expression, is the expression that should be matched. The first part of the text that matches this expression will be returned.
This function allows you to extract a part of a string. It only works with text.
Suppose you have a list of URLs and you have to extract certain words from them. For example, the URLs contain the language of the pages and you want to extract these languages:
In regex, the ‘|’ pipe character stands for the logical OR. You can search for the specific text by connecting all of the possible words with OR relations. The formula will look for the word spanish or french or german and so on.
The function to do this:
It doesn’t matter where the word is within the text, the function searches for it and returns it as the result of the function.
It starts reading the source string from left to right, and it outputs the first correct result that it finds. For example, it found Italian first in the fifth row and then ignored the fact that English was also there.
REGEXEXTRACT Formula Example
It’s also common to extract the base domains of the full URLs. The REGEXEXTRACT function is suitable for this task too.
The regular expression is a lot more complex here, but you can learn the basics to get familiar with such expressions, or you can just google it and write them on the go.
This regex here excludes “https://” and then excludes anything that is after the first “/” character. This way the only remaining part is the base domain.
REGEXEXTRACT can be a massive time saver.
Once you understood how to work with it, you can replace most of your LEFT, RIGHT, MID or FIND functions with this regex function.