Regular Expressions Usage Guide

 

Regular expressions are patterns used in computer programming to search text for a specific piece of data or a precise set of information that would otherwise take hours of manual searching and sifting.

 

At Qualaroo, we use regular expressions (regex for short; plural regexes) to let you target your surveys to a specific set of pages or URLs that are complex or dynamic.

 

Key Benefits

 

  • Targeting Specific Pages: Customize your surveys to display only on designated pages using regex patterns.
  • Dynamic URL Matching: Seamlessly adapt your survey targeting to dynamic URLs that follow a pattern.
  • Complex URL Structures: Effectively handle intricate URL structures and still achieve accurate targeting.

 

In this article:

1. Excluding URLs, Focusing on specific URLs - Negative and Positive Lookaheads

2. Backslash - Escape!

3. Question Mark - Not Required

4. Digits and Word Characters

5. Dot-Star: Anything Goes!

6. The OR Pipe (|) - How to Target Several Specific Pages on Your Site

7. Using Parentheses, Brackets, and Sets

 

The sections below explain the use of basic characters in regular expressions:
 
  • Excluding URLs, Focusing on specific URLs - Negative and Positive Lookaheads


     

    Excluding a portion of your site with Negative Lookahead

     

    Negative lookahead allows you to exclude whole sets of pages, files, subdomains, or any other part of the URL you don't want to target. In regex terms, it checks that the text at the current position is NOT followed by a given pattern. You specify what you DON'T wish to include and put it inside these characters: (?!StuffYouDontWant)

     

    Understanding the Negative Lookahead with examples

    1. Excluding a Section

     

    If you want to target your survey to all the pages in http://mysite.com/photos/, /cats/, and /documentation/ but not /users/ or any page outside a subfolder, enter the following URL components in your regex fields:

     

    Subdomain: (www)?

    TLD: mysite.com

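    Path: (?!users\/)[^\/]+\/.+

     

    With this, you can target any page on your site that sits in a subfolder, EXCEPT for all pages in the /users/ section. The (?!users\/) part rules out the /users/ section, and [^\/]+\/.+ requires a folder name, a slash, and a page name, so pages that are not in a subfolder are excluded too. For example, the survey will display on: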

    • http://mysite.com/photos/NevadaDesert.html

    • http://mysite.com/photos/DeathValley.html

    • http://mysite.com/photos/Carrum.html

    • http://mysite.com/cats/PeggySue.html

    • http://mysite.com/cats/Turbo.html

    • http://mysite.com/documentation/Qualaroo.html

    • http://mysite.com/documentation/personalwebsite.txt

    • http://mysite.com/documentation/NextBigAndroidApp.php

    • http://mysite.com/documentation/1337Resume.html 

     

    But the following pages will not display the survey:

    • http://mysite.com/users/admin

    • http://mysite.com/users/user1234

    • http://mysite.com/users/mom

    • http://mysite.com/contact

    • http://mysite.com/about-us

    • http://mysite.com/pricing/

    • http://blog.mysite.com/
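    The example above can be checked with a short Python sketch. It assumes the three fields are joined into one pattern of the form ^https?://SUBDOMAIN.TLD/PATH that is matched from the start of the URL with no end anchor; that join is a guess for illustration, not Qualaroo's documented behavior. The path used here, (?!users\/)[^\/]+\/.+, rejects the /users/ section and also requires a subfolder followed by a page name, so top-level pages such as /contact are excluded as well.

```python
import re

# ASSUMPTION: hypothetical join of Subdomain "(www)?", TLD "mysite.com", and
# the Path field; Qualaroo's real construction may differ.
# (?!users\/)  -> the first folder must not be "users/"
# [^\/]+\/.+   -> a folder name, a slash, then at least one more character
TARGET = re.compile(r"^https?://(www\.)?mysite\.com/(?!users\/)[^\/]+\/.+")


def shows_survey(url):
    """Return True if the pattern matches from the start of the URL."""
    return TARGET.match(url) is not None


for url in [
    "http://mysite.com/photos/NevadaDesert.html",
    "http://mysite.com/cats/Turbo.html",
    "http://mysite.com/users/admin",
    "http://mysite.com/contact",
    "http://mysite.com/pricing/",
    "http://blog.mysite.com/",
]:
    print(shows_survey(url), url)
```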

     

    2. Excluding Groups of Pages

     

    If you want to target all the product pages at snacktastic.com/products/item-### and snacktastic.com/products/seasonal-### but exclude snacktastic.com/products/promo-###, add the following to your regex fields:

     

    Subdomain: (www)?

    TLD: snacktastic.com

    Path: products/(?!promo).*-\d+/?

     

    The survey will be displayed on the following pages:

    • http://snacktastic.com/products/item-0733

    • http://snacktastic.com/products/item-561211

    • http://snacktastic.com/products/seasonal-559

    • http://snacktastic.com/products/seasonal-01223

     

    But not on:

    • http://snacktastic.com/products/promo-001

    • http://snacktastic.com/products/promo-55776

     

    You could also use the OR pipe (|) to get the same results for these pages.

     

    Subdomain: (www)?

    TLD: snacktastic.com

    Path: products/(item|seasonal)-\d+/?

     

    There are multiple ways to get the same results with regular expressions. You might use one set of tools more frequently than another, and that's fine.
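    A Python sketch can compare the two approaches on the example URLs. As before, how the fields are joined (^https?://(www\.)?snacktastic\.com/ followed by the path, matched from the start of the URL) is an assumption for illustration, not Qualaroo's documented behavior.

```python
import re

# ASSUMPTION: hypothetical join of the Subdomain and TLD fields.
PREFIX = r"^https?://(www\.)?snacktastic\.com/"

lookahead = re.compile(PREFIX + r"products/(?!promo).*-\d+/?")
alternation = re.compile(PREFIX + r"products/(item|seasonal)-\d+/?")


def both(url):
    """Return (lookahead result, OR-pipe result) for one URL."""
    return (lookahead.match(url) is not None, alternation.match(url) is not None)


for url in [
    "http://snacktastic.com/products/item-0733",
    "http://snacktastic.com/products/seasonal-559",
    "http://snacktastic.com/products/promo-001",
    "http://snacktastic.com/products/clearance-12",  # hypothetical page
]:
    print(both(url), url)
```

    The two patterns agree on the pages listed in this guide but are not identical: a hypothetical page such as /products/clearance-12 matches the lookahead version, which excludes only promo pages, but not the OR version, which allows only item and seasonal pages.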

     

    Including a specific portion of your site with a Positive Lookahead

     

    A positive lookahead is the opposite of a negative lookahead: it defines a pattern that MUST appear in the URL for the page to be targeted. To use one, wrap whatever you want to require in (?= ).

     

    If you want to target any page on your site with "dragonfly" in the path, you can do so very easily:

     

    Subdomain: (www)?

    TLD: naturaljewelrydesigns.com

    Path: .*(?=dragonfly)

     

    This regex will match any page with "dragonfly" anywhere in the URL path:

    • http://www.naturaljewelrydesigns.com/dragonfly

    • http://www.naturaljewelrydesigns.com/products/rings/dragonfly

    • http://www.naturaljewelrydesigns.com/new_designs/greendragonfly.php
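    A lookahead only checks what comes next without consuming any characters, so the .* in front of it lets the check happen at any position in the path. A Python sketch, assuming a hypothetical join of the fields that is matched from the start of the URL and not anchored at the end:

```python
import re

# ASSUMPTION: hypothetical join of Subdomain "(www)?", TLD, and Path,
# matched from the start of the URL with no end anchor.
DRAGONFLY = re.compile(
    r"^https?://(www\.)?naturaljewelrydesigns\.com/.*(?=dragonfly)"
)

for url in [
    "http://www.naturaljewelrydesigns.com/dragonfly",
    "http://www.naturaljewelrydesigns.com/products/rings/dragonfly",
    "http://www.naturaljewelrydesigns.com/new_designs/greendragonfly.php",
    "http://www.naturaljewelrydesigns.com/products/rings/butterfly",
]:
    print(DRAGONFLY.match(url) is not None, url)
```

    If a pattern must match all the way to the end of the URL, this form stops working, because the lookahead consumes nothing and .* cannot reach the end. Putting the lookahead first, as in (?=.*dragonfly).*, checks the same condition and still consumes the whole path.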

     

    You can also combine the positive lookahead with other regex characters, such as the OR pipe, by adding the following value to the regex fields:

     

    Subdomain: (www)?

    TLD: naturaljewelrydesigns.com

    Path: .*(?=dragonf(ly|lies|ire))

     

    A regex like this will match any page with the words dragonfly, dragonflies, or dragonfire in the URL path.
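    A quick Python check of the combined pattern, again assuming a hypothetical join of the fields that is matched from the start of the URL; the page names below are made up for illustration:

```python
import re

# ASSUMPTION: hypothetical join of the three fields, matched from the start.
PATTERN = re.compile(
    r"^https?://(www\.)?naturaljewelrydesigns\.com/.*(?=dragonf(ly|lies|ire))"
)


def matches(word):
    """Check a hypothetical page named after `word`."""
    url = f"http://www.naturaljewelrydesigns.com/products/{word}"
    return PATTERN.match(url) is not None


for word in ["dragonfly", "dragonflies", "dragonfire", "dragonfruit"]:
    print(matches(word), word)
```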


     

  • Backslash - Escape!


     

    How to Target a Path Using the Backslash

     

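    A backslash escapes a special character so that it matches literally. In a regex, a period matches any character and a question mark makes the preceding character optional, so to match an actual period or question mark in a URL path, put a backslash in front of it.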

     

    Step 1: Navigate to WHERE in the TARGETING section.

     

    Step 2: Select the radio button in front of the “Use an advanced URL” option.

     

    Step 3: To target: http://staging.company.com/section/cart?promo=749387493, enter the following URL components in the regex fields:

     

    Subdomain: staging

    TLD: company.com

    Path: section\/cart\?promo=.*

     

    [Screenshot: Advanced URL fields]

     

    Validate the path using the regex validator to see how the whole URL appears in Qualaroo:

     

    [Screenshot: Regex URL path validator]

     

    In the validator:

     

    • The purple arrows show where Qualaroo automatically escapes the periods and slashes between the three fields.

    • The pink arrows show where you escape the slashes, periods, and question marks that you want to match literally in the URL.

    • The green arrow at the end of the regex shows where the period-asterisk (.*) pattern is left unescaped, because here those characters are meant to act as special regex characters that match anything.
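    The effect of escaping can be seen in a Python sketch. The prefix ^https?://staging\.company\.com/ stands in for the part built from the Subdomain and TLD fields; that exact form is an assumption. Python's standard re.escape is also shown as one way to escape a literal string automatically.

```python
import re

URL = "http://staging.company.com/section/cart?promo=749387493"

# ASSUMPTION: hypothetical prefix built from the Subdomain and TLD fields.
PREFIX = r"^https?://staging\.company\.com/"

escaped = re.compile(PREFIX + r"section\/cart\?promo=.*")
unescaped = re.compile(PREFIX + r"section/cart?promo=.*")
# re.escape adds backslashes before regex special characters such as "?".
auto = re.compile(PREFIX + re.escape("section/cart?promo=") + ".*")

print(escaped.match(URL) is not None)    # True: \? matches a literal "?"
print(unescaped.match(URL) is not None)  # False: t? means "an optional t"
print(auto.match(URL) is not None)       # True
```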


     

  • Question Mark - Not Required


     

    How to Use a Question Mark for Targeting Subdomain

     

    Some websites are set up to load both the “www.site.com” and “site.com” versions. If you use the Simple URL targeting field, Qualaroo handles both versions automatically.

     

    But if you use a regex, you need the question mark to make “www” optional so that pages on both versions are matched.

     

    Targeting the www Subdomain

     

    Step 1: Navigate to WHERE in the TARGETING section.

     

    Step 2: Select the radio button in front of the “Use an advanced URL” option.

     

    Step 3: Enter the following URL components in the regex fields:

     

    Subdomain: (www)?

    TLD: site.com

    Path:

     

    This will target both http://www.site.com and http://site.com.
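    In a Python sketch that spells out the whole URL, the optional group has to cover the dot after www as well, so it is written (www\.)? here. This full-URL form is an assumption about how the fields are combined, not Qualaroo's documented behavior; in the Subdomain field you still enter (www)? as shown above.

```python
import re

# ASSUMPTION: hypothetical full-URL form of Subdomain "(www)?" + TLD "site.com".
SITE = re.compile(r"^https?://(www\.)?site\.com/")

for url in ["http://www.site.com/", "http://site.com/", "http://shop.site.com/"]:
    # The ? after the group makes "www." optional; other subdomains still fail.
    print(SITE.match(url) is not None, url)
```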

     

    For websites that always use www, add the following URL components in the regex fields instead:

     

    Subdomain: www

    TLD: company.com

    Path:

     

    NOTE: You don't have to escape the period after "www", as Qualaroo automatically adds the escaped period between the subdomain and the TLD.


     

  • Digits and Word Characters


     

    How to Use Digit Characters in Regular Expression

     

    \d matches any single digit (0-9). Using product-\d\.html in the regex field matches these pages:

     

    • product-0.html

    • product-1.html

    • product-2.html

    • ...

    • product-9.html

     

    If the numbers have more than one digit (10 and above), add a plus sign (+) after \d, which means "one or more", and enter the following URL components in the regex fields:

     

    Subdomain: (www)?

    TLD: website.com

    Path: product-\d+\.html

     

    With this regex, the matched pages include:

     

    • product-0.html

    • product-1.html

    • product-2.html

    • product-10.html

    • product-2450.html

    • ...
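    The difference between \d and \d+ can be checked on the page names alone with Python's re.fullmatch, which requires the whole string to match. Python's re stands in here for whatever regex engine Qualaroo uses; these basic constructs are assumed to behave the standard way.

```python
import re

single = re.compile(r"product-\d\.html")  # exactly one digit
multi = re.compile(r"product-\d+\.html")  # one or more digits

for page in ["product-0.html", "product-9.html", "product-10.html", "product-2450.html"]:
    print(page, single.fullmatch(page) is not None, multi.fullmatch(page) is not None)
```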

     

    How to Use Words in Regular Expression

     

    Add the following URL components in the regex fields to use \w, which matches any word character (a letter, a digit, or an underscore):

     

    Subdomain: (www)?

    TLD: website.com

    Path: \w+\.php

     

    This matches pages such as:

    • paris.php

    • Melbourne.php

    • McMurdo_Field_Work_Presentation.php

    • premium_plan_2016Sept.php

    • Loflo_washer_23998.php

     

    With \w+\.php, you can target pages with human-readable names that don't use special characters such as hyphens or spaces. These can be photo album folders, documents that your users have created, or product pages that include the name and ID of the product.

     

    NOTE: \w matches both uppercase and lowercase letters, but literal letters in a regex are case-sensitive: paris\.php matches paris.php but not Paris.php.
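    A Python sketch of \w+\.php on page names, using re.fullmatch so the whole name must fit the pattern. One caveat: Python's \w is Unicode-aware by default, while JavaScript's \w covers only ASCII letters, digits, and underscore; the re.ASCII flag below gives the narrower behavior. Which engine Qualaroo uses is not stated here.

```python
import re

WORD_PAGE = re.compile(r"\w+\.php", re.ASCII)

for page in [
    "paris.php",
    "Melbourne.php",
    "premium_plan_2016Sept.php",
    "new-york.php",  # hypothetical: "-" is not a word character
]:
    print(WORD_PAGE.fullmatch(page) is not None, page)

# Literal letters are case-sensitive, unlike the \w class.
print(re.fullmatch(r"paris\.php", "Paris.php") is not None)  # False
```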


     

  • Dot-Star: Anything Goes!


     

    How to Use the Dot-Star Combination

     

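    Add the following URL components to the regex fields:

     

    Subdomain: (www)?

    TLD: website.com

    Path: products\/.*\/help

     

    In a regex, the dot (.) matches any single character and the star (*) repeats it zero or more times, so .* matches anything. The path products\/.*\/help will match the following URLs: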

     

    • website.com/products/iphone_case_blue-346610/help

    • website.com/products/photoalbum-8x12/help

    • website.com/products/spatulas/help

    • website.com/products/any-P0ss1ble_characters/help
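    On the path alone, the dot-star behaves as follows in Python (used here as a stand-in engine). Because * means zero or more, .* also accepts an empty segment; .+ would require at least one character.

```python
import re

HELP = re.compile(r"products\/.*\/help")

for path in [
    "products/iphone_case_blue-346610/help",
    "products/spatulas/help",
    "products//help",         # .* may match nothing
    "products/spatulas/faq",  # hypothetical: no /help at the end
]:
    print(HELP.fullmatch(path) is not None, path)

# With .+ instead of .*, the empty segment is rejected.
print(re.fullmatch(r"products\/.+\/help", "products//help") is not None)  # False
```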

     

    NOTE: If you want to target every page on your website or every page in a specific section, you can also use the Simple URL Targeting field. Just put a star (*) in the appropriate part of your URL, and Qualaroo will do the rest.


     

  • The OR Pipe (|) - How to Target Several Specific Pages on Your Site


     

    How to Target Multiple Pages

     

    Say you want to match these pages:

     

    blog.mycats.com/peggysue.html

    blog.mycats.com/turbo.html

     

    Add the following URL components in the regex fields:

     

    Subdomain: blog
    TLD: mycats.com
    Path: (peggysue|turbo)\.html

     

    to match those pages.
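    A short Python check of the alternation on the page names alone; whiskers.html and turbo.htm are hypothetical counterexamples:

```python
import re

PAGES = re.compile(r"(peggysue|turbo)\.html")

for page in ["peggysue.html", "turbo.html", "whiskers.html", "turbo.htm"]:
    # The group accepts exactly one of the listed names.
    print(PAGES.fullmatch(page) is not None, page)
```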

     

    How to Target Multiple Subdomains

     

    If you want to target the gallery pages of the Brazil, Chile, and Argentina sections on largetravelcompany.com, using the OR pipe makes this very easy.

     

    Add the following URL components in the regex fields:

     

    Subdomain: (brazil|chile|argentina)

    TLD: largetravelcompany.com

    Path: gallery

     

    to target the required gallery pages of Brazil, Chile, and Argentina.
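    A Python sketch, assuming the fields are joined as ^https?://SUBDOMAIN\.largetravelcompany\.com/gallery and matched from the start of the URL (a hypothetical form for illustration):

```python
import re

# ASSUMPTION: hypothetical join of the Subdomain, TLD, and Path fields.
GALLERY = re.compile(
    r"^https?://(brazil|chile|argentina)\.largetravelcompany\.com/gallery"
)

for sub in ["brazil", "chile", "argentina", "peru", "www"]:
    url = f"http://{sub}.largetravelcompany.com/gallery"
    print(GALLERY.match(url) is not None, sub)
```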

     

    How to Target Multiple Pages Across Multiple Subdomains

     

    You can even use multiple OR pipes in the same regex to target several pages across multiple subdomains.

     

    Add the following URL components in the regex fields:

     

    Subdomain: (library|parks)

    TLD: smalltown.gov

    Path: (kids|special_events|holiday)-activities\/sign_up_form

     

    to target several pages across multiple subdomains.
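    Every combination of the two alternations is accepted. A Python sketch that enumerates them with itertools.product, assuming a hypothetical join of the fields that is matched from the start of the URL:

```python
import itertools
import re

# ASSUMPTION: hypothetical join of the Subdomain, TLD, and Path fields.
SIGNUP = re.compile(
    r"^https?://(library|parks)\.smalltown\.gov/"
    r"(kids|special_events|holiday)-activities\/sign_up_form"
)

subdomains = ["library", "parks"]
pages = ["kids", "special_events", "holiday"]

urls = [
    f"http://{sub}.smalltown.gov/{page}-activities/sign_up_form"
    for sub, page in itertools.product(subdomains, pages)
]
matched = [url for url in urls if SIGNUP.match(url)]
print(len(matched), "of", len(urls), "URLs match")  # 6 of 6 URLs match
```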


     

  • Using Parentheses, Brackets, and Sets


     

    Here is how you can use parentheses ( ), brackets [ ], and curly brackets { } in regular expressions:

     

    • (a|b) - Matches a OR b

    • [xyz] – Matches any single character in the brackets: x, y, OR z.

    • [^a-z] – When inside of a character class, the ^ means NOT. Here, match anything that is NOT a lowercase letter.

    • [A-Z] – Capital A through Capital Z.

    • [a-z]{2} – Exactly 2 a-z letters.
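    Each construct in the list can be tried on its own with Python's re.fullmatch; the sample strings are hypothetical:

```python
import re

# (pattern, sample text, expected result)
checks = [
    (r"(a|b)", "b", True),
    (r"[xyz]", "y", True),
    (r"[xyz]", "w", False),
    (r"[^a-z]", "Q", True),      # uppercase is NOT a lowercase letter
    (r"[^a-z]", "q", False),
    (r"[A-Z]", "M", True),
    (r"[a-z]{2}", "en", True),
    (r"[a-z]{2}", "eng", False),  # three letters, not exactly two
]

for pattern, text, expected in checks:
    result = re.fullmatch(pattern, text) is not None
    print(f"{pattern:10} {text!r:6} {result}")
    assert result == expected
```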

     

    How to Use Parentheses

     

    Using the parentheses and the OR pipe, you can tell your regex to target one word (sometimes called a "string") or another in your URL.

     

    Add the following characters in the regex fields:

     

    Subdomain : blog

    TLD : mycats.com

    Path : (peggysue|turbo)\.html

     

    to target the URLs: blog.mycats.com/peggysue.html AND blog.mycats.com/turbo.html.

     

    How to Use Brackets and Curly Brackets

     

    Use:

     

    • Brackets to target a range of letters (like a-z or a-f) or numbers (0-9, 1-5).

    • Curly brackets to require a specific number of letters or numbers, or a range of counts.

     

    For example, to combine the letter range [a-z] in brackets with a count of exactly two letters in curly brackets ({2}), add the following characters in the regex fields:

     

    Subdomain : www

    TLD : international.com

    Path : [a-z]{2}\/products

     

    to target the following sections:

     

    • www.international.com/en/products

    • www.international.com/ca/products

    • www.international.com/uk/products

    • www.international.com/au/products

    • www.international.com/nz/products
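    On the path alone, [a-z]{2}\/products accepts any two lowercase letters before /products. A Python check, with hypothetical counterexamples:

```python
import re

SECTION = re.compile(r"[a-z]{2}\/products")

for path in ["en/products", "uk/products", "nz/products", "usa/products", "EN/products"]:
    # "usa" has three letters; "EN" is uppercase, outside [a-z].
    print(SECTION.fullmatch(path) is not None, path)
```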

     

    In this way, to target a survey across several sections of your website, you can use the OR pipe, or brackets if the section names share a similar format.

     

    Further, if you want to target pages with a specific URL format, like six letters followed by a hyphen and eight digits, add the following characters to the regex fields:

     

    Subdomain : (www)?

    TLD : gifts-for-everyone.org

    Path : holiday\/special_deals\/[a-z]{6}-[0-9]{8}

     

    to match the following pages:

     

    • www.gifts-for-everyone.org/holiday/special_deals/lawnmo-45061367

    • www.gifts-for-everyone.org/holiday/special_deals/hairdr-00002239

    • www.gifts-for-everyone.org/holiday/special_deals/poster-08825041

     

    and ignore the following pages:

     

    • development.gifts-for-everyone.org/holiday/special_deals/lawnmo-45061367 (wrong subdomain)

    • www.gifts-for-everyone.org/holiday/special_deals/lawnmower-45061367 (wrong number of letters)

    • www.gifts-for-everyone.org/holiday/special_deals/lawnmo-451367 (wrong number of digits)
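    A Python sketch of this check, assuming the fields are joined into ^https?://(www\.)?gifts-for-everyone\.org/ followed by the path and matched from the start of the URL (a hypothetical form). Because the sketch does not anchor the end, extra characters after the eighth digit would still match; in the sketch, adding $ to the end of the pattern would rule that out.

```python
import re

# ASSUMPTION: hypothetical join of Subdomain "(www)?", the TLD, and the Path.
DEALS = re.compile(
    r"^https?://(www\.)?gifts-for-everyone\.org/"
    r"holiday\/special_deals\/[a-z]{6}-[0-9]{8}"
)

base = "gifts-for-everyone.org/holiday/special_deals/"
for url in [
    "http://www." + base + "lawnmo-45061367",
    "http://www." + base + "poster-08825041",
    "http://development." + base + "lawnmo-45061367",  # wrong subdomain
    "http://www." + base + "lawnmower-45061367",       # wrong number of letters
    "http://www." + base + "lawnmo-451367",            # wrong number of digits
]:
    print(DEALS.match(url) is not None, url)
```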

     

    NOTE: If you'd like more flexibility in the number of letters and digits in the pages you want to target, curly brackets can also specify a range in the form {min,max}.

     

    To target the same kinds of pages mentioned above, but with 4-8 letters and 2-8 numbers, add the following characters to the regex fields:

     

    Subdomain : (www)?

    TLD : gifts-for-everyone.org

    Path : holiday\/special_deals\/[a-z]{4,8}-[0-9]{2,8}

     

    to allow the survey to be targeted at a wider range of pages, including:

     

    • www.gifts-for-everyone.org/holiday/special_deals/bowl-27

    • www.gifts-for-everyone.org/holiday/special_deals/heatlamp-00019223

    • www.gifts-for-everyone.org/holiday/special_deals/boots-4512

    • www.gifts-for-everyone.org/holiday/special_deals/catmitte-123380
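    With {min,max}, the counts become ranges. A Python check on the part of the path after special_deals/, with two hypothetical pages that fall outside the ranges:

```python
import re

FLEXIBLE = re.compile(r"[a-z]{4,8}-[0-9]{2,8}")

for slug in [
    "bowl-27",
    "heatlamp-00019223",
    "boots-4512",
    "catmitte-123380",
    "cat-12",   # only 3 letters
    "bowl-7",   # only 1 digit
]:
    print(FLEXIBLE.fullmatch(slug) is not None, slug)
```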

     

    How to Use Multi-Digit Number Ranges

     

    Regular expressions match characters, not numeric values. A range such as [0-9] covers a single digit, so you cannot write a range like [25-50] for numbers greater than 9. To set ranges in the double or triple digits, you must specify the range of each digit.

     

    For example, to target pages numbered 25-50, split the range into parts that can be written digit by digit: 25-29, then 30-49, and finally 50.

     

    First, define each range, and then combine them into a single regex:

    • 2[5-9] will match 25-29

    • (3|4)[0-9] will match 30-39 and 40-49

    • 50 will match 50

     

    Join the ranges with the OR pipe, (2[5-9]|(3|4)[0-9]|50), and add the following characters to your regex fields:

     

    Subdomain : www

    TLD : learning_math_is_fun.com

    Path : chapter_(2[5-9]|(3|4)[0-9]|50)
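    Python can confirm which chapter numbers the combined range accepts. The sketch checks chapters 1-100 with re.fullmatch on the path alone; Python's re stands in for whatever engine Qualaroo uses.

```python
import re

CHAPTER = re.compile(r"chapter_(2[5-9]|(3|4)[0-9]|50)")

matched = [n for n in range(1, 101) if CHAPTER.fullmatch(f"chapter_{n}")]
print(matched[0], matched[-1], len(matched))  # 25 50 26
```

    The check uses fullmatch. If a pattern is only matched from the start of a string, chapter_500 would also be accepted, because chapter_50 is a prefix of it; an end anchor ($) prevents that.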

     


     

 

You can also download this guide in a single PDF from the link below:

 

 

That concludes this introduction to regular expressions for URL targeting.

 

 

© 2005 - 2024 ProProfs