Updated Regex to exclude mailto: links only

The rationale is that this pattern covers all of the cases while still performing well.
Tracking a non-standard URL is not really a user-experience problem, since the URL is already embedded in an <a href> tag.
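A quick way to see what the change buys is to run the removed pattern and the added pattern over a few one-link snippets. The sketch below is in TypeScript rather than CoffeeScript; CoffeeScript passes regex literals through to JavaScript, so the same engine evaluates them. The sample hrefs and the `matchesHref` helper are illustrative assumptions, not the commit's regex101 test cases.

```typescript
// Removed by the commit: Gruber-style URL regex, copied verbatim from the diff.
const oldUrlLinkTagRegex = (): RegExp =>
  new RegExp(/(<a.*?href\s*?=\s*?['"])((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))(['"].*?>)([\s\S]*?)(<\/a>)/gim);

// Added by the commit: any non-empty href that does not start with "mailto".
const newUrlLinkTagRegex = (): RegExp =>
  new RegExp(/(<a.*?href\s*?=\s*?['"])((?!mailto).+)(['"].*?>)([\s\S]*?)(<\/a>)/gim);

// Hypothetical helper: wraps an href in a minimal one-line <a> tag and tests it.
// A fresh RegExp is built per call because the g flag makes test() stateful (lastIndex).
function matchesHref(makeRegex: () => RegExp, href: string): boolean {
  return makeRegex().test(`<a href="${href}">link</a>`);
}

// Illustrative hrefs only.
const hrefs: string[] = [
  "https://example.com/a",
  "example.com",
  "/relative/path",
  "mailto:someone@example.com",
];

for (const href of hrefs) {
  const oldResult = matchesHref(oldUrlLinkTagRegex, href);
  const newResult = matchesHref(newUrlLinkTagRegex, href);
  console.log(`${href}: old=${oldResult} new=${newResult}`);
}
```

With these inputs the sketch prints `old=true new=true` for the absolute URL, `old=false new=true` for `example.com` and `/relative/path`, and `old=false new=false` for the `mailto:` link. The Gruber-based pattern only accepted hrefs shaped like absolute URLs or domain-plus-path URLs. The new one accepts any non-empty href that does not begin with `mailto`.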
Mehdi Rejraji 2016-03-03 15:30:24 +01:00 committed by Evan Morikawa
parent 2d2b9aefa6
commit cc145132da

@@ -34,15 +34,17 @@ RegExpUtils =
   # 5. the closing tag
   linkTagRegex: -> new RegExp(/(<a.*?href\s*?=\s*?['"])(.*?)(['"].*?>)([\s\S]*?)(<\/a>)/gim)
-  # Test cases: https://regex101.com/r/cK0zD8/2
-  # Catches link tags containing a valid URL using the Gruber Regex.
+  # Test cases: https://regex101.com/r/cK0zD8/3
+  # Catches link tags containing which are:
+  # - Non empty
+  # - Not a mailto: link
   # Returns the following capturing groups:
   # 1. start of the opening a tag to href="
-  # 2. The contents of the href without quotes if it's a valid URL
+  # 2. The contents of the href without quotes
   # 3. the rest of the opening a tag
   # 4. the contents of the a tag
   # 5. the closing tag
-  urlLinkTagRegex: -> new RegExp(/(<a.*?href\s*?=\s*?['"])((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))(['"].*?>)([\s\S]*?)(<\/a>)/gim)
+  urlLinkTagRegex: -> new RegExp(/(<a.*?href\s*?=\s*?['"])((?!mailto).+)(['"].*?>)([\s\S]*?)(<\/a>)/gim)
   # https://regex101.com/r/zG7aW4/3
   imageTagRegex: -> /<img\s+[^>]*src="([^"]*)"[^>]*>/g
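The new `urlLinkTagRegex` returns its five capturing groups in a fixed order, so a caller can reassemble each tag around a rewritten href. The TypeScript sketch below assumes a hypothetical tracking endpoint (`TRACKING_PREFIX`) and a hypothetical `wrapLinksForTracking` function. Neither comes from this commit, and the sketch is not necessarily how the project itself consumes the regex.

```typescript
// Pattern added by the commit. It is wrapped in a function so each use gets a
// fresh RegExp, because the g flag makes the object stateful.
const urlLinkTagRegex = (): RegExp =>
  new RegExp(/(<a.*?href\s*?=\s*?['"])((?!mailto).+)(['"].*?>)([\s\S]*?)(<\/a>)/gim);

// Hypothetical tracking endpoint, for illustration only.
const TRACKING_PREFIX = "https://tracking.example.com/click?url=";

// Routes each matched href through TRACKING_PREFIX and rebuilds the tag from
// groups 1-5 (opening tag up to href=", href, rest of opening tag, body, closing tag).
function wrapLinksForTracking(html: string): string {
  return html.replace(
    urlLinkTagRegex(),
    (_match: string, openTag: string, href: string, restOfOpenTag: string, body: string, closeTag: string): string =>
      `${openTag}${TRACKING_PREFIX}${encodeURIComponent(href)}${restOfOpenTag}${body}${closeTag}`
  );
}

const html: string = [
  '<a href="https://example.com/a">Example</a>',
  '<a href="mailto:someone@example.com">Email</a>',
  "<a href='/relative/path'>Relative</a>",
].join("\n");

console.log(wrapLinksForTracking(html));
```

The first and third links are routed through the prefix, and the `mailto:` link is left untouched. One caveat shows up when experimenting: group 2's `.+` is greedy and `.` does not match newlines. Two `<a>` tags on the same line are therefore captured by a single match, with group 2 running from the first href to the last quote on that line. The sample input keeps one link per line for that reason.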