Google_Analytics

What is RegEx?

RegEx is short for regular expression, a technical syntax for matching a specific pattern or string. Yes, it sounds very complicated, but you will see that RegEx is logical and easy to use.

Why do we need RegEx?

I mostly use RegEx to create filters and goals in Google Analytics. The most common uses are excluding an IP address from reports and setting up complex destination URLs for goals.

The special characters

Wildcards

. (the dot): Matches any single character (letter, number or symbol). Example: goo.gle matches gooogle, goodgle, goo8gle.

* (the asterisk): Matches zero or more of the previous item. The default previous item is the previous character. Example: goo*gle matches gogle, google, gooogle, goooogle.

+ (the plus): Just like the asterisk, except that a plus sign must match at least one of the previous item. Example: gooo+gle matches gooogle and goooogle, but never google.

? (the question mark): Matches zero or one of the previous item. Example: labou?r matches both labor and labour.

| (the pipe): Lets you do an “or” match. Example: a|b matches a or b.

.* (the dot star): Matches everything.
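If you want to try the wildcards outside Google Analytics, a small Python sketch like the one below can help. It assumes that Python's re module treats these basic characters the same way Google Analytics does; the matches helper is just a name made up for this example.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

# (pattern, text) pairs taken from the wildcard descriptions above
checks = [
    (r"goo.gle", "goo8gle"),    # the dot stands for the "8"
    (r"goo.gle", "google"),     # no match: the dot needs one extra character
    (r"goo*gle", "gogle"),      # "o*" is allowed to match zero "o"s
    (r"gooo+gle", "goooogle"),  # "o+" needs at least one more "o"
    (r"gooo+gle", "google"),    # no match: not enough "o"s
    (r"labou?r", "labor"),      # the "u" is optional
    (r"a|b", "b"),              # either side of the pipe
]

for pattern, text in checks:
    print(f"{pattern!r} vs {text!r}: {matches(pattern, text)}")
```

Changing the text in any pair and running it again is a quick way to get a feel for each character.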

Anchors

^ (the caret): Requires that your data be at the beginning of its field. Example: ^site matches site but not mysite.

$ (the dollar): Requires that your data be at the end of its field. Example: site$ matches site but not sitescan.
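The anchors can be checked the same way. This is a minimal sketch, again assuming Python's re module as a stand-in for Google Analytics; the found helper is a hypothetical name.

```python
import re

def found(pattern: str, field: str) -> bool:
    """Return True if the pattern matches somewhere in the field."""
    return re.search(pattern, field) is not None

print(found(r"^site", "site"))      # True: "site" is at the beginning
print(found(r"^site", "mysite"))    # False: "my" comes first
print(found(r"site$", "site"))      # True: "site" is at the end
print(found(r"site$", "sitescan"))  # False: "scan" comes after
print(found(r"^site$", "site"))     # True: both anchors, exact match only
```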

Grouping

() (the parentheses): Use parentheses to create an item, instead of accepting the default. Example: Thank(s|you) will match both Thanks and Thankyou.

[] (the brackets): Use brackets to create a list of items to match. Example: [abc] creates a list with a, b and c in it.

- (the dash): Use dashes with brackets to extend your list. Example: [A-Z] creates a list for the uppercase English alphabet.

{} (the braces): Braces repeat the last item a specific number of times. With two numbers, such as {x,y}, they mean: repeat the last item at least x times and no more than y times. With only one number, such as {z}, they mean: repeat the last item exactly z times.
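Here is a short sketch of the grouping characters, under the same assumption that Python's re module behaves like Google Analytics for these basics. The braces examples use ^ and $ so that only the whole string counts; those patterns are made up for illustration.

```python
import re

def ok(pattern: str, text: str) -> bool:
    """Return True if the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None

# Parentheses: group the alternatives after "Thank"
print(ok(r"Thank(s|you)", "Thanks"), ok(r"Thank(s|you)", "Thankyou"))

# Brackets: any one character from the list
print(ok(r"^[abc]$", "b"), ok(r"^[abc]$", "d"))

# Dash inside brackets: a range of characters
print(ok(r"^[A-Z]$", "Q"), ok(r"^[A-Z]$", "q"))

# Braces: {2,3} means two or three digits, {3} means exactly three
for text in ["1", "12", "123", "1234"]:
    print(text, ok(r"^[0-9]{2,3}$", text), ok(r"^[0-9]{3}$", text))
```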

Other

\ (the backslash): Turns a regular expression character into an everyday character. Example: mysite\.com keeps the dot from being a wildcard.
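To see the difference the backslash makes, compare an escaped and an unescaped dot. This sketch uses Python's re module; re.escape is a Python convenience for escaping a whole string and is not something Google Analytics offers.

```python
import re

unescaped = r"mysite.com"   # the dot is still a wildcard
escaped = r"mysite\.com"    # the dot is a literal dot

for host in ["mysite.com", "mysite-com", "mysiteXcom"]:
    print(
        host,
        bool(re.search(unescaped, host)),
        bool(re.search(escaped, host)),
    )

# Python can add the backslashes for you
auto_escaped = re.escape("mysite.com")
print(auto_escaped)
```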

Some examples of usage

For instance, I want to exclude an IP address range. Let’s take my IP, 80.97.15.85. We have several IP addresses in the range 80.97.15.80 to 80.97.15.89, and we want to exclude our own traffic from Google Analytics reports.

The RegEx will be this:

80\.97\.15\.8[0-9]

You may wonder why I didn’t use a RegEx like this:

80\.97\.15\.8.

The dot matches any character, not only digits, so it would also match 80.97.15.8@ or 80.97.15.8y. In practice that’s not a problem because an IP address contains only digits. Nevertheless, the first syntax is the correct one because it matches exactly the range.
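A quick way to convince yourself is to run both patterns against a few addresses. The sketch below uses Python's re module as a stand-in for the Google Analytics filter field. The ANCHORED variant is my own addition, not part of the filter above: it shows what adding ^ and $ would change if you ever need an exact match.

```python
import re

RANGE = r"80\.97\.15\.8[0-9]"        # the pattern from this post
DOT = r"80\.97\.15\.8."              # the looser alternative
ANCHORED = r"^80\.97\.15\.8[0-9]$"   # assumption: stricter variant for comparison

def is_match(pattern: str, ip: str) -> bool:
    """Return True if the pattern is found anywhere in the address."""
    return re.search(pattern, ip) is not None

office_ips = [f"80.97.15.{n}" for n in range(80, 90)]

print(all(is_match(RANGE, ip) for ip in office_ips))  # every office IP
print(is_match(RANGE, "80.97.15.90"))                 # just outside the range
print(is_match(DOT, "80.97.15.8y"), is_match(RANGE, "80.97.15.8y"))
print(is_match(RANGE, "180.97.15.85"), is_match(ANCHORED, "180.97.15.85"))
```

The last line shows that an unanchored pattern also finds the range inside a longer address such as 180.97.15.85, which the anchored variant rejects.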

Let’s take another example.

I want a RegEx to match the following pages for a destination URL Goal:

http://domain.com/thank-you/
http://domain.com/thank-you2/
http://domain.com/thank-you?=valid
http://domain.com/thank-you2?=valid

and the same pages in all the subdirectories like de, es, nl, hr, pr-br, plus the parameters for each one. See the examples below:

http://domain.com/de/thank-you/
http://domain.com/de/thank-you2/
http://domain.com/de/thank-you?=valid
http://domain.com/de/thank-you2?=valid
etc.

This is the RegEx that I have created:

(^/thank-you.*)|(/(de|es|nl|hr|pr\-br)/thank-you.*)

Does this look right to you?
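One way to answer that question is to run the pattern against a handful of sample paths. This is only a sketch: it assumes the goal is matched against the path part of the URL (everything after the domain) and that Python's re module is close enough to Google Analytics for this pattern. The last three paths are made up to probe the edges.

```python
import re

GOAL = re.compile(r"(^/thank-you.*)|(/(de|es|nl|hr|pr\-br)/thank-you.*)")

def is_goal(path: str) -> bool:
    """Return True if the goal pattern is found anywhere in the path."""
    return GOAL.search(path) is not None

paths = [
    "/thank-you/",
    "/thank-you2/",
    "/thank-you?=valid",
    "/de/thank-you/",
    "/pr-br/thank-you2?=valid",
    "/fr/thank-you/",          # hypothetical: a subdirectory not in the list
    "/blog/de/thank-you/",     # hypothetical: a language code deeper in the path
    "/thank-you-page-draft/",  # hypothetical: a different page with the same start
]

for path in paths:
    print(path, is_goal(path))
```

The first five paths and the last two match, while /fr/thank-you/ does not. Whether the last two should count as goal completions depends on your site: the second half of the pattern has no ^, so it matches the language code anywhere in the path.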

Before you go

There are a few things to consider when it comes to RegEx.

  1. Before creating a RegEx filter, play with it first. Don’t apply the filter directly on your Google Analytics property or View because it can’t be undone. You can test a filter in reports and then apply it to your account.
  2. RegEx is greedy. For instance, say you want to exclude all traffic coming from www.website.com. If you add just website to a RegEx field, it will match any value that contains website, including www.website.com. You don’t have to create a syntax like this: www\.website\.com.*
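The second point is easy to see with a small sketch. It uses Python's re.search, which looks for the pattern anywhere in the value; I am assuming that this mirrors how a Google Analytics RegEx field behaves. The hostnames are invented for the example.

```python
import re

hostnames = [
    "www.website.com",
    "website.com",
    "blog.website.com",
    "mywebsite.org",
    "example.com",
]

def hits(pattern: str) -> list[str]:
    """Return the hostnames in which the pattern is found."""
    return [h for h in hostnames if re.search(pattern, h)]

print(hits(r"website"))                # everything containing "website"
print(hits(r"www\.website\.com"))      # only the www host
print(hits(r"www\.website\.com.*"))    # same result: the trailing .* adds nothing
```

Note that the short pattern also catches mywebsite.org, so it is worth checking what else a loose pattern matches before relying on it.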

Resources:

For a better understanding follow the links below and always play with RegEx.

via Google

via LunaMetrics

Tools:

Rubular, RegEx Coach, RegExr, RegExPal