[youtube:playlist] Work around buggy playlists (fixes #4449)
authorJaime Marquínez Ferrándiz <jaime.marquinez.ferrandiz@gmail.com>
Mon, 15 Dec 2014 18:19:15 +0000 (19:19 +0100)
committerJaime Marquínez Ferrándiz <jaime.marquinez.ferrandiz@gmail.com>
Mon, 15 Dec 2014 18:19:15 +0000 (19:19 +0100)
They show a "Load more" button, but they don't have more videos.
The continuation url in the json file was a link to itself, so we ended up in an infinite loop.

youtube_dl/extractor/youtube.py

index 3319d6bff95afeef6222cbbda941b6d71c83134d..aaa07b5eb31f1c15a36a5d05c66dbd29dd85c0e9 100644 (file)
@@ -1128,6 +1128,13 @@ class YoutubePlaylistIE(YoutubeBaseInfoExtractor):
         'info_dict': {
             'title': 'JODA7',
         }
+    }, {
+        'note': 'Buggy playlist: the webpage has a "Load more" button but it doesn\'t have more videos',
+        'url': 'https://www.youtube.com/playlist?list=UUXw-G3eDE9trcvY2sBMM_aA',
+        'info_dict': {
+                'title': 'Uploads from Interstellar Movie',
+        },
+        'playlist_mincout': 21,
     }]
 
     def _real_initialize(self):
@@ -1212,6 +1219,10 @@ class YoutubePlaylistIE(YoutubeBaseInfoExtractor):
                 'Downloading page #%s' % page_num,
                 transform_source=uppercase_escape)
             content_html = more['content_html']
+            if not content_html.strip():
+                # Some webpages show a "Load more" button but they don't
+                # have more videos
+                break
             more_widget_html = more['load_more_widget_html']
 
         playlist_title = self._html_search_regex(