Excluding URLs, Focusing on specific URLs - Negative and Positive Lookaheads

Sometimes you may want to target a wide range of pages but leave out a few that the same pattern would otherwise catch. For example, you might want mysite.com/products/item-### and /products/seasonal-### while excluding /products/promo-###.

Regular expressions have a feature called "lookaround," which comes in two forms. A lookahead checks the text that comes after the current position in the URL, and a lookbehind checks the text that comes before it. Neither one consumes any characters: they only test whether a pattern is (or is not) there. Each can be used in a "positive" or "negative" sense. A positive lookaround requires the pattern to be present (include), and a negative one requires it to be absent (exclude).
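To see that lookarounds only test the text and never consume it, you can try them in Python's `re` module. This is a sketch for illustration only. It assumes Python's regex syntax behaves like the one your targeting fields use for these simple patterns, and it uses `re.fullmatch` on the assumption that a pattern has to cover the whole path. The path `products/promo-001` is borrowed from this article's examples.

```python
import re

# A path in the style of this article's examples.
path = "products/promo-001"

# Negative lookahead: right after "products/", the text must NOT start with
# "promo". The lookahead consumes nothing, so ".*" still has to match the rest.
negative = re.fullmatch(r"products/(?!promo).*", path)

# Positive lookahead: right after "products/", the text MUST start with "promo".
positive = re.fullmatch(r"products/(?=promo).*", path)

# Positive lookbehind: the word that directly follows "products/",
# without "products/" itself being part of the match.
after_folder = re.search(r"(?<=products/)\w+", path).group()

print(negative)          # None
print(positive.group())  # products/promo-001
print(after_folder)      # promo
```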

Negative Lookahead - How to exclude a portion of your site

One of the most useful of these is the negative lookahead. It lets you exclude whole sets of pages, files, subdomains, or any other part of the URL you don't want to target. In regex terms, it succeeds only if the text at that point is NOT followed by the pattern inside it. You put what you DON'T want to match inside these characters: (?!StuffYouDontWant)

Example 1: Excluding a Section

I want to target my survey to all the pages in http://mysite.com/photos/, /cats/, and /documentation/, but not to pages in /users/ or to single pages that aren't in a subfolder.

To use this example, you would add the following to your regex fields:

Subdomain: (www)?

TLD: mysite.com

Path: (?!users\/)[^\/]+\/.+

This targets any page on my site that sits in a subfolder, EXCEPT pages in the /users/ section; anything not in a subfolder is left out as well. For example, the survey shows on these pages:

  • http://mysite.com/photos/NevadaDesert.html
  • http://mysite.com/photos/DeathValley.html
  • http://mysite.com/photos/Carrum.html
  • http://mysite.com/cats/PeggySue.html
  • http://mysite.com/cats/Turbo.html
  • http://mysite.com/documentation/Qualaroo.html
  • http://mysite.com/documentation/personalwebsite.txt
  • http://mysite.com/documentation/NextBigAndroidApp.php
  • http://mysite.com/documentation/1337Resume.html 

These pages do not show the survey (blog.mysite.com is excluded by the Subdomain field, which allows only www or no subdomain):

  • http://mysite.com/users/admin
  • http://mysite.com/users/user1234
  • http://mysite.com/users/mom
  • http://mysite.com/contact
  • http://mysite.com/about-us
  • http://mysite.com/pricing/
  • http://blog.mysite.com/
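The sketch below runs some of the example URLs through Python's `re` and `urllib.parse`. How Qualaroo actually combines the Subdomain, TLD, and Path fields isn't described here, so the host pattern `(www\.)?mysite\.com`, the hypothetical `shows_survey` helper, and the rule that the path (without its leading slash) must match in full are all assumptions. The path pattern `(?!users\/)[^\/]+\/.+` pairs the lookahead with a required folder/page shape, which is what keeps top-level pages such as /contact and /pricing/ out.

```python
import re
from urllib.parse import urlsplit

# Assumption: Subdomain "(www)?" and TLD "mysite.com" combine into this host pattern.
HOST = r"(www\.)?mysite\.com"
# Any folder other than users/, then a slash, then at least one more character.
PATH = r"(?!users\/)[^\/]+\/.+"

def shows_survey(url: str) -> bool:
    """Hypothetical check: host and path (leading slash removed) must both match in full."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    return bool(re.fullmatch(HOST, parts.hostname or "")) and bool(re.fullmatch(PATH, path))

shown = [
    "http://mysite.com/photos/NevadaDesert.html",
    "http://mysite.com/cats/Turbo.html",
    "http://mysite.com/documentation/personalwebsite.txt",
    "http://mysite.com/documentation/NextBigAndroidApp.php",
]
hidden = [
    "http://mysite.com/users/admin",
    "http://mysite.com/contact",
    "http://mysite.com/about-us",
    "http://mysite.com/pricing/",
    "http://blog.mysite.com/",
]

for url in shown + hidden:
    print(shows_survey(url), url)
```

Including the slash in `(?!users\/)` means only the users folder itself is excluded; a folder such as /users-guide/ would still be targeted.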

Example 2: Excluding Groups of Pages

To return to the example from the start of this section (using snacktastic.com as the site), say you want to target the item pages at snacktastic.com/products/item-### and snacktastic.com/products/seasonal-###, but not snacktastic.com/products/promo-###.

To use this example, you would add the following to your regex fields:

Subdomain: (www)?

TLD: snacktastic.com

Path: products\/(?!promo).*-\d+\/?

The survey will show up on:

  • http://snacktastic.com/products/item-0733
  • http://snacktastic.com/products/item-561211
  • http://snacktastic.com/products/seasonal-559
  • http://snacktastic.com/products/seasonal-01223

But not on:

  • http://snacktastic.com/products/promo-001
  • http://snacktastic.com/products/promo-55776
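Here is a quick Python check of the Example 2 pattern against the URLs above. Like the other sketches, it assumes the Path field has to match the whole path, written without its leading slash. The negative lookahead sits right after `products\/`, so it only looks at the start of the page name; `.*-\d+` then requires a dash followed by digits.

```python
import re

# Path field from Example 2.
PATH = r"products\/(?!promo).*-\d+\/?"

paths = [
    "products/item-0733",
    "products/item-561211",
    "products/seasonal-559",
    "products/seasonal-01223",
    "products/promo-001",
    "products/promo-55776",
]

# Assumption: the pattern must match the whole path (no leading slash).
results = {p: bool(re.fullmatch(PATH, p)) for p in paths}
for p, shown in results.items():
    print("show" if shown else "skip", p)
```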

You could also get the same results for these URLs with the | character, which lists the prefixes you want to include instead of the one you want to exclude.

To use this example, you would add the following to your regex fields:

Subdomain: (www)?

TLD: snacktastic.com

Path: products\/(item|seasonal)-\d+\/?

There are lots of ways to reach the same answer with regular expressions. You might find yourself using one set of tools more often than another, and that's fine.
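The lookahead and | versions agree on every URL in this example, but they aren't identical. The Python sketch below (full-path matching assumed; `products/clearance-42` is a made-up URL) shows where they part ways: the lookahead excludes only promo pages, while the | version includes only item and seasonal pages.

```python
import re

LOOKAHEAD = r"products\/(?!promo).*-\d+\/?"
ALTERNATION = r"products\/(item|seasonal)-\d+\/?"

def matches(pattern: str, path: str) -> bool:
    # Assumption: the whole path must match.
    return re.fullmatch(pattern, path) is not None

# On the article's URLs the two patterns agree.
for p in ["products/item-0733", "products/seasonal-559", "products/promo-001"]:
    assert matches(LOOKAHEAD, p) == matches(ALTERNATION, p)

# A hypothetical prefix the article never mentions.
new_page = "products/clearance-42"
print(matches(LOOKAHEAD, new_page))    # True: anything except promo
print(matches(ALTERNATION, new_page))  # False: only item and seasonal
```

Which one fits depends on whether new product prefixes should be surveyed automatically (lookahead) or only once you add them to the list (|).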

Positive Lookahead - Focusing on a specific portion of your site

A positive lookahead is basically the opposite of a negative lookahead: it defines a pattern that MUST appear in the URL for the page to be targeted. You write it by wrapping whatever you want to require in (?= ).

If you want to target any page on your site with "dragonfly" in the path, you can do so very easily.

To use this example, you would add the following to your regex fields:

Subdomain: (www)?

TLD: naturaljewelrydesigns.com

Path: .*(?=dragonfly)

This regex will match any page with "dragonfly" anywhere in the URL path:

  • http://www.naturaljewelrydesigns.com/dragonfly
  • http://www.naturaljewelrydesigns.com/products/rings/dragonfly
  • http://www.naturaljewelrydesigns.com/new_designs/greendragonfly.php
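Because a lookahead only checks what comes next, `.*(?=dragonfly)` stops matching just before "dragonfly" rather than covering the whole path. That works if the Path field only has to match from the start of the path, but not if it has to match all of it. This article doesn't say which rule applies, so the Python sketch below tries both, using `re.match` (start only) and `re.fullmatch` (whole path) as stand-ins. Moving `.*` inside the lookahead, as in `(?=.*dragonfly).*`, works under either rule. The butterfly path is a made-up non-matching case.

```python
import re

TRAILING = r".*(?=dragonfly)"   # lookahead at the end, as in the example
LEADING = r"(?=.*dragonfly).*"  # ".*" moved inside the lookahead

paths = [
    "dragonfly",
    "products/rings/dragonfly",
    "new_designs/greendragonfly.php",
    "products/rings/butterfly",  # hypothetical non-matching path
]

for p in paths:
    print(
        p,
        re.match(TRAILING, p) is not None,      # anchored at the start only
        re.fullmatch(TRAILING, p) is not None,  # must cover the whole path
        re.fullmatch(LEADING, p) is not None,   # lookahead-first form
    )
```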

You can also combine a positive lookahead with other regex features, such as a group of alternatives:

To use this example, you would add the following to your regex fields:

Subdomain: (www)?

TLD: naturaljewelrydesigns.com

Path: .*(?=dragonf(ly|lies|ire))

A regex like this will match any page with dragonfly, dragonflies, or dragonfire in the URL path.
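A group of alternatives works the same way inside a lookahead. This Python sketch writes the pattern in lookahead-first form, `(?=.*dragonf(ly|lies|ire)).*`, so it matches the whole path whether or not the field requires that (an assumption, as the article doesn't say). The paths `products/dragonfire-pendant` and `dragonfruit` are made up; "dragonfruit" fails because "dragonf" must be followed by one of the three endings.

```python
import re

# Lookahead-first form of the dragonf(ly|lies|ire) example.
PATTERN = r"(?=.*dragonf(ly|lies|ire)).*"

paths = [
    "dragonflies",
    "products/dragonfire-pendant",     # hypothetical
    "new_designs/greendragonfly.php",
    "dragonfruit",                     # hypothetical: "dragonf" alone isn't enough
]

results = {p: re.fullmatch(PATTERN, p) is not None for p in paths}
print(results)
```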
