[pornhub] Fix video url regular expression.
author George Brighton <george@gebn.co.uk>
Sun, 2 Aug 2015 18:21:10 +0000 (19:21 +0100)
committer Sergey M․ <dstftw@gmail.com>
Sun, 2 Aug 2015 20:35:06 +0000 (02:35 +0600)
PornHub seems to have subtly changed their JavaScript. Before, video URL strings were embedded directly in the video's `flashvars_*` object, but they are now assigned to variables of the form `player_quality_*`, which are then added to this object later under the relevant quality key.

youtube_dl/extractor/pornhub.py

index 0b7886840fbced3d9fa6fb219050f40ac709c080..fbaa830d6b8e958f0213fcbea2cb503d91f3f422 100644
@@ -81,7 +81,7 @@ class PornHubIE(InfoExtractor):
         comment_count = self._extract_count(
             r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment')
 
-        video_urls = list(map(compat_urllib_parse_unquote, re.findall(r'"quality_[0-9]{3}p":"([^"]+)', webpage)))
+        video_urls = list(map(compat_urllib_parse_unquote, re.findall(r"var player_quality_[0-9]{3}p = '([^']+)'", webpage)))
         if webpage.find('"encrypted":true') != -1:
             password = compat_urllib_parse_unquote_plus(
                 self._search_regex(r'"video_title":"([^"]+)', webpage, 'password'))
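The effect of swapping the regular expression can be checked in isolation. The sketch below runs the old and new patterns from the hunk against two small page fragments. The fragments are hypothetical, modelled only on the commit message's description (URLs inside the `flashvars_*` object before, `player_quality_*` variables after). The real PornHub markup is not shown in this commit. The helper `extract_video_urls` is also hypothetical. It uses the standard library's `urllib.parse.unquote` in place of youtube-dl's `compat_urllib_parse_unquote`, which wraps that function on Python 3.

```python
import re
from urllib.parse import unquote

# Hypothetical page fragments (assumed markup, not copied from PornHub).
# Old style: URLs embedded directly in the flashvars_* object.
OLD_STYLE_PAGE = (
    'var flashvars_123 = {"quality_720p":"https%3A%2F%2Fexample.com%2F720.mp4",'
    '"quality_480p":"https%3A%2F%2Fexample.com%2F480.mp4"};'
)
# New style: URLs assigned to player_quality_* variables, added to the object later.
NEW_STYLE_PAGE = """
var player_quality_720p = 'https%3A%2F%2Fexample.com%2F720.mp4';
var player_quality_480p = 'https%3A%2F%2Fexample.com%2F480.mp4';
flashvars_123['quality_720p'] = player_quality_720p;
"""

# Patterns taken verbatim from the removed (-) and added (+) lines of the diff.
OLD_PATTERN = r'"quality_[0-9]{3}p":"([^"]+)'
NEW_PATTERN = r"var player_quality_[0-9]{3}p = '([^']+)'"


def extract_video_urls(webpage, pattern):
    """Hypothetical helper: find every captured URL and percent-decode it."""
    return [unquote(url) for url in re.findall(pattern, webpage)]


if __name__ == "__main__":
    # The old pattern finds nothing once URLs move into player_quality_* variables.
    print(extract_video_urls(NEW_STYLE_PAGE, OLD_PATTERN))  # → []
    # The new pattern recovers both decoded URLs in page order.
    print(extract_video_urls(NEW_STYLE_PAGE, NEW_PATTERN))
    # → ['https://example.com/720.mp4', 'https://example.com/480.mp4']
```

Because the new pattern matches the literal text `var player_quality_` and exactly one space on each side of `=`, it depends more tightly on the page's formatting than the old one did. `[0-9]{3}p` also matches only three-digit quality labels, so a hypothetical `player_quality_1080p` variable would not be picked up.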