Merge pull request #10788 from TRox1972/instagram_comments
author Yen Chi Hsuan <yan12125@gmail.com>
Thu, 29 Sep 2016 13:54:39 +0000 (21:54 +0800)
committer GitHub <noreply@github.com>
Thu, 29 Sep 2016 13:54:39 +0000 (21:54 +0800)
[Instagram] Extract comments

31 files changed:
.github/ISSUE_TEMPLATE.md
.github/PULL_REQUEST_TEMPLATE.md
.gitignore
ChangeLog
Makefile
docs/supportedsites.md
youtube_dl/downloader/hls.py
youtube_dl/extractor/awaan.py
youtube_dl/extractor/brightcove.py
youtube_dl/extractor/cbsnews.py
youtube_dl/extractor/common.py
youtube_dl/extractor/einthusan.py
youtube_dl/extractor/extractors.py
youtube_dl/extractor/formula1.py
youtube_dl/extractor/kaltura.py
youtube_dl/extractor/ketnet.py
youtube_dl/extractor/leeco.py
youtube_dl/extractor/limelight.py
youtube_dl/extractor/mtv.py
youtube_dl/extractor/mwave.py
youtube_dl/extractor/npo.py
youtube_dl/extractor/openload.py
youtube_dl/extractor/periscope.py
youtube_dl/extractor/promptfile.py
youtube_dl/extractor/prosiebensat1.py
youtube_dl/extractor/soundcloud.py
youtube_dl/extractor/twitter.py
youtube_dl/extractor/vk.py
youtube_dl/extractor/voxmedia.py
youtube_dl/extractor/youtube.py
youtube_dl/version.py

index 8b28d784a4dc23b3271f5ad16645a03ba39980e0..273eb8c0b11b3b05031ada446745b54da4951269 100644
--- a/.github/ISSUE_TEMPLATE.md
+++ b/.github/ISSUE_TEMPLATE.md
@@ -6,8 +6,8 @@
 
 ---
 
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.09.19*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.09.19**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.09.27*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.09.27**
 
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.09.19
+[debug] youtube-dl version 2016.09.27
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}
index f24bb4b09c184302cbfc34c5c777d58cd3d8cdf0..46fa26f02d97ff8cc93a6fe2bc1a3864a9044280 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
 - [ ] At least skimmed through [adding new extractor tutorial](https://github.com/rg3/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/rg3/youtube-dl#youtube-dl-coding-conventions) sections
 - [ ] [Searched](https://github.com/rg3/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
 
+### In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check one of the following options:
+- [ ] I am the original author of this code and I am willing to release it under [Unlicense](http://unlicense.org/)
+- [ ] I am not the original author of this code but it is in public domain or released under [Unlicense](http://unlicense.org/) (provide reliable evidence)
+
 ### What is the purpose of your *pull request*?
 - [ ] Bug fix
+- [ ] Improvement
 - [ ] New extractor
 - [ ] New feature
 
index a802c75a10225f53f8da414fa34fd5422de5bea2..002b700f5a982fc0da1c7bc11efb40cbb8e553a6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -29,6 +29,7 @@ updates_key.pem
 *.m4a
 *.m4v
 *.mp3
+*.3gp
 *.part
 *.swp
 test/testdata
index 6c72bae90225ba068dca52dbbde59b3a8f2029d7..70da55c903570e892acefdff9aae839633389e27 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,8 +1,54 @@
 version <unreleased>
 
 Extractors
++ [leeco] Recognize more Le Sports URLs (#10794)
+
+
+version 2016.09.27
+
+Core
++ Add hdcore query parameter to akamai f4m formats
++ Delegate HLS live streams downloading to ffmpeg
++ Improved support for HTML5 subtitles
+
+Extractors
++ [vk] Add support for dailymotion embeds (#10661)
+* [promptfile] Fix extraction (#10634)
+* [kaltura] Speed up embed regular expressions (#10764)
++ [npo] Add support for anderetijden.nl (#10754)
++ [prosiebensat1] Add support for advopedia sites
+* [mwave] Relax URL regular expression (#10735, #10748)
+* [prosiebensat1] Fix playlist support (#10745)
++ [prosiebensat1] Add support for sat1gold sites (#10745)
++ [cbsnews:livevideo] Fix extraction and extract m3u8 formats
++ [brightcove:new] Add support for live streams
+* [soundcloud] Generalize playlist entries extraction (#10733)
++ [mtv] Add support for new URL schema (#8169, #9808)
+* [einthusan] Fix extraction (#10714)
++ [twitter] Support Periscope embeds (#10737)
++ [openload] Support subtitles (#10625)
+
+
+version 2016.09.24
+
+Core
++ Add support for watchTVeverywhere.com authentication provider based MSOs for
+  Adobe Pass authentication (#10709)
+
+Extractors
++ [soundcloud:playlist] Provide video id for early playlist entries (#10733)
++ [prosiebensat1] Add support for kabeleinsdoku (#10732)
+* [cbs] Extract info from thunder videoPlayerService (#10728)
 * [openload] Fix extraction (#10408)
 + [ustream] Support the new HLS streams (#10698)
++ [ooyala] Extract all HLS formats
++ [cartoonnetwork] Add support for Adobe Pass authentication
++ [soundcloud] Extract license metadata
++ [fox] Add support for Adobe Pass authentication (#8584)
++ [tbs] Add support for Adobe Pass authentication (#10642, #10222)
++ [trutv] Add support for Adobe Pass authentication (#10519)
++ [turner] Add support for Adobe Pass authentication
+
 
 version 2016.09.19
 
index ac234fcb016ecf22d311ade03b19d30bff6c2e5e..a2763a664188102662cc4f2c5b69518cc6664693 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
 all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
 
 clean:
-       rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
+       rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
        find . -name "*.pyc" -delete
        find . -name "*.class" -delete
 
index 95a137393d6e3df959e2a0fbe9d92c39ed55c84f..26f27557713ce6ccfbee28cf3b0d92dc1000beff 100644
--- a/docs/supportedsites.md
+++ b/docs/supportedsites.md
@@ -40,6 +40,7 @@
  - **Allocine**
  - **AlphaPorno**
  - **AMCNetworks**
+ - **anderetijden**: npo.nl and ntr.nl
  - **AnimeOnDemand**
  - **anitube.se**
  - **AnySex**
  - **CBS**
  - **CBSInteractive**
  - **CBSLocal**
- - **CBSNews**: CBS News
- - **CBSNewsLiveVideo**: CBS News Live Videos
+ - **cbsnews**: CBS News
+ - **cbsnews:livevideo**: CBS News Live Videos
  - **CBSSports**
  - **CCTV**
  - **CDA**
  - **MPORA**
  - **MSN**
  - **mtg**: MTG services
- - **MTV**
+ - **mtv**
  - **mtv.de**
+ - **mtv:video**
  - **mtvservices:embedded**
  - **MuenchenTV**: münchen.tv
  - **MusicPlayOn**
  - **wholecloud**: WholeCloud
  - **Wimp**
  - **Wistia**
- - **WNL**
+ - **wnl**: npo.nl and ntr.nl
  - **WorldStarHipHop**
  - **wrzuta.pl**
  - **wrzuta.pl:playlist**
index 5d70abf62f5bb0e45a83685136cb404912d3947c..541b92ee122261f8230ede54e57c07b68dc40cac 100644
--- a/youtube_dl/downloader/hls.py
+++ b/youtube_dl/downloader/hls.py
@@ -31,7 +31,7 @@ class HlsFD(FragmentFD):
     FD_NAME = 'hlsnative'
 
     @staticmethod
-    def can_download(manifest):
+    def can_download(manifest, info_dict):
         UNSUPPORTED_FEATURES = (
             r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)',  # encrypted streams [1]
             r'#EXT-X-BYTERANGE',  # playlists composed of byte ranges of media files [2]
@@ -53,6 +53,7 @@ class HlsFD(FragmentFD):
         )
         check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
         check_results.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
+        check_results.append(not info_dict.get('is_live'))
         return all(check_results)
 
     def real_download(self, filename, info_dict):
@@ -62,7 +63,7 @@ class HlsFD(FragmentFD):
 
         s = manifest.decode('utf-8', 'ignore')
 
-        if not self.can_download(s):
+        if not self.can_download(s, info_dict):
             self.report_warning(
                 'hlsnative has detected features it does not support, '
                 'extraction will be delegated to ffmpeg')
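
Reviewer note: the hunks above add one more veto to the native HLS downloader, so live streams fall through to ffmpeg. A minimal standalone sketch of that check follows. It assumes a shortened `UNSUPPORTED_FEATURES` tuple and turns `can_decrypt_frag` (a module-level flag in the real file) into a parameter purely for illustration.

```python
import re

# Illustrative subset only; the tuple in hls.py lists more patterns.
UNSUPPORTED_FEATURES = (
    r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)',  # encryption other than AES-128
    r'#EXT-X-BYTERANGE',                    # playlists built from byte ranges
)


def can_download(manifest, info_dict, can_decrypt_frag=False):
    """Return True only if no check vetoes the native HLS downloader."""
    # One boolean per unsupported feature: True means "feature absent".
    check_results = [
        not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
    # AES-128 is acceptable only when fragment decryption is available.
    check_results.append(
        can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
    # New in this change: live streams are never handled natively.
    check_results.append(not info_dict.get('is_live'))
    return all(check_results)
```

Because every check lands in the same list, a further restriction only needs one more `append` before the final `all()`.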
index 66d7515bc5068b08dcf05c94d6bc27c7e1164031..a2603bbffef454481143cb253a24cdb0f4909ec1 100644
--- a/youtube_dl/extractor/awaan.py
+++ b/youtube_dl/extractor/awaan.py
@@ -66,6 +66,7 @@ class AWAANVideoIE(AWAANBaseIE):
             'duration': 2041,
             'timestamp': 1227504126,
             'upload_date': '20081124',
+            'uploader_id': '71',
         },
     }, {
         'url': 'http://awaan.ae/video/26723981/%D8%AF%D8%A7%D8%B1-%D8%A7%D9%84%D8%B3%D9%84%D8%A7%D9%85:-%D8%AE%D9%8A%D8%B1-%D8%AF%D9%88%D8%B1-%D8%A7%D9%84%D8%A3%D9%86%D8%B5%D8%A7%D8%B1',
index aeb22be168402fd4e7e134c164339db74f95a0f8..2ec55b185509da92eeb39577b601d61fa304a887 100644
--- a/youtube_dl/extractor/brightcove.py
+++ b/youtube_dl/extractor/brightcove.py
@@ -621,15 +621,21 @@ class BrightcoveNewIE(InfoExtractor):
                     'url': text_track['src'],
                 })
 
+        is_live = False
+        duration = float_or_none(json_data.get('duration'), 1000)
+        if duration and duration < 0:
+            is_live = True
+
         return {
             'id': video_id,
-            'title': title,
+            'title': self._live_title(title) if is_live else title,
             'description': clean_html(json_data.get('description')),
             'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
-            'duration': float_or_none(json_data.get('duration'), 1000),
+            'duration': duration,
             'timestamp': parse_iso8601(json_data.get('published_at')),
             'uploader_id': account_id,
             'formats': formats,
             'subtitles': subtitles,
             'tags': json_data.get('tags', []),
+            'is_live': is_live,
         }
index 4aa6917a0b494df6465d810b237040f4a7f30c78..216989230c3a20046045c2c0b658af23385c464b 100644
--- a/youtube_dl/extractor/cbsnews.py
+++ b/youtube_dl/extractor/cbsnews.py
@@ -9,6 +9,7 @@ from ..utils import (
 
 
 class CBSNewsIE(CBSIE):
+    IE_NAME = 'cbsnews'
     IE_DESC = 'CBS News'
     _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
 
@@ -68,15 +69,16 @@ class CBSNewsIE(CBSIE):
 
 
 class CBSNewsLiveVideoIE(InfoExtractor):
+    IE_NAME = 'cbsnews:livevideo'
     IE_DESC = 'CBS News Live Videos'
-    _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
+    _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[^/?#]+)'
 
     # Live videos get deleted soon. See http://www.cbsnews.com/live/ for the latest examples
     _TEST = {
         'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
         'info_dict': {
             'id': 'clinton-sanders-prepare-to-face-off-in-nh',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Clinton, Sanders Prepare To Face Off In NH',
             'duration': 334,
         },
@@ -84,25 +86,22 @@ class CBSNewsLiveVideoIE(InfoExtractor):
     }
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
+        display_id = self._match_id(url)
 
-        video_info = self._parse_json(self._html_search_regex(
-            r'data-story-obj=\'({.+?})\'', webpage, 'video JSON info'), video_id)['story']
+        video_info = self._download_json(
+            'http://feeds.cbsn.cbsnews.com/rundown/story', display_id, query={
+                'device': 'desktop',
+                'dvr_slug': display_id,
+            })
 
-        hdcore_sign = 'hdcore=3.3.1'
-        f4m_formats = self._extract_f4m_formats(video_info['url'] + '&' + hdcore_sign, video_id)
-        if f4m_formats:
-            for entry in f4m_formats:
-                # URLs without the extra param induce an 404 error
-                entry.update({'extra_param_to_segment_url': hdcore_sign})
-        self._sort_formats(f4m_formats)
+        formats = self._extract_akamai_formats(video_info['url'], display_id)
+        self._sort_formats(formats)
 
         return {
-            'id': video_id,
+            'id': display_id,
+            'display_id': display_id,
             'title': video_info['headline'],
             'thumbnail': video_info.get('thumbnail_url_hd') or video_info.get('thumbnail_url_sd'),
             'duration': parse_duration(video_info.get('segmentDur')),
-            'formats': f4m_formats,
+            'formats': formats,
         }
index 9c8991542d02f46c8c228e120afd921c344b182b..1076b46da773b5c90cf0c898202f9a8fc5279dbf 100644
--- a/youtube_dl/extractor/common.py
+++ b/youtube_dl/extractor/common.py
@@ -1828,7 +1828,7 @@ class InfoExtractor(object):
                 for track_tag in re.findall(r'<track[^>]+>', media_content):
                     track_attributes = extract_attributes(track_tag)
                     kind = track_attributes.get('kind')
-                    if not kind or kind == 'subtitles':
+                    if not kind or kind in ('subtitles', 'captions'):
                         src = track_attributes.get('src')
                         if not src:
                             continue
@@ -1836,16 +1836,21 @@ class InfoExtractor(object):
                         media_info['subtitles'].setdefault(lang, []).append({
                             'url': absolute_url(src),
                         })
-            if media_info['formats']:
+            if media_info['formats'] or media_info['subtitles']:
                 entries.append(media_info)
         return entries
 
     def _extract_akamai_formats(self, manifest_url, video_id):
         formats = []
+        hdcore_sign = 'hdcore=3.7.0'
         f4m_url = re.sub(r'(https?://.+?)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m')
-        formats.extend(self._extract_f4m_formats(
-            update_url_query(f4m_url, {'hdcore': '3.7.0'}),
-            video_id, f4m_id='hds', fatal=False))
+        if 'hdcore=' not in f4m_url:
+            f4m_url += ('&' if '?' in f4m_url else '?') + hdcore_sign
+        f4m_formats = self._extract_f4m_formats(
+            f4m_url, video_id, f4m_id='hds', fatal=False)
+        for entry in f4m_formats:
+            entry.update({'extra_param_to_segment_url': hdcore_sign})
+        formats.extend(f4m_formats)
         m3u8_url = re.sub(r'(https?://.+?)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8')
         formats.extend(self._extract_m3u8_formats(
             m3u8_url, video_id, 'mp4', 'm3u8_native',
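
Reviewer note: the `_extract_akamai_formats` hunk replaces `update_url_query` with manual string concatenation and skips the parameter when the URL already carries one. The helper below is a hypothetical standalone sketch of just that step; the name `add_hdcore` is not part of youtube-dl.

```python
HDCORE_SIGN = 'hdcore=3.7.0'


def add_hdcore(f4m_url, hdcore_sign=HDCORE_SIGN):
    """Append the hdcore parameter unless the URL already carries one."""
    if 'hdcore=' not in f4m_url:
        # Use '&' when a query string exists, otherwise start one with '?'.
        f4m_url += ('&' if '?' in f4m_url else '?') + hdcore_sign
    return f4m_url
```

In the diff the same `hdcore_sign` string is also stored on each f4m format as `extra_param_to_segment_url`, matching what the removed CBS News code did by hand.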
index f7339702cad3ed2804fe276b9d1fc6857c368206..443865ad27ba96eea8f78c56d14b72a54bc86389 100644
--- a/youtube_dl/extractor/einthusan.py
+++ b/youtube_dl/extractor/einthusan.py
@@ -14,7 +14,7 @@ class EinthusanIE(InfoExtractor):
     _TESTS = [
         {
             'url': 'http://www.einthusan.com/movies/watch.php?id=2447',
-            'md5': 'af244f4458cd667205e513d75da5b8b1',
+            'md5': 'd71379996ff5b7f217eca034c34e3461',
             'info_dict': {
                 'id': '2447',
                 'ext': 'mp4',
@@ -25,13 +25,13 @@ class EinthusanIE(InfoExtractor):
         },
         {
             'url': 'http://www.einthusan.com/movies/watch.php?id=1671',
-            'md5': 'ef63c7a803e22315880ed182c10d1c5c',
+            'md5': 'b16a6fd3c67c06eb7c79c8a8615f4213',
             'info_dict': {
                 'id': '1671',
                 'ext': 'mp4',
                 'title': 'Soodhu Kavvuum',
                 'thumbnail': 're:^https?://.*\.jpg$',
-                'description': 'md5:05d8a0c0281a4240d86d76e14f2f4d51',
+                'description': 'md5:b40f2bf7320b4f9414f3780817b2af8c',
             }
         },
     ]
@@ -50,9 +50,11 @@ class EinthusanIE(InfoExtractor):
         video_id = self._search_regex(
             r'data-movieid=["\'](\d+)', webpage, 'video id', default=video_id)
 
-        video_url = self._download_webpage(
+        m3u8_url = self._download_webpage(
             'http://cdn.einthusan.com/geturl/%s/hd/London,Washington,Toronto,Dallas,San,Sydney/'
-            % video_id, video_id)
+            % video_id, video_id, headers={'Referer': url})
+        formats = self._extract_m3u8_formats(
+            m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native')
 
         description = self._html_search_meta('description', webpage)
         thumbnail = self._html_search_regex(
@@ -64,7 +66,7 @@ class EinthusanIE(InfoExtractor):
         return {
             'id': video_id,
             'title': title,
-            'url': video_url,
+            'formats': formats,
             'thumbnail': thumbnail,
             'description': description,
         }
index 8166fd4f9fcbb69e22b4a2fd8ae881ff27f991ee..23fd2a3083dcafbd2ce17c0859b48578b470228d 100644
--- a/youtube_dl/extractor/extractors.py
+++ b/youtube_dl/extractor/extractors.py
@@ -516,6 +516,7 @@ from .movingimage import MovingImageIE
 from .msn import MSNIE
 from .mtv import (
     MTVIE,
+    MTVVideoIE,
     MTVServicesEmbeddedIE,
     MTVDEIE,
 )
@@ -611,13 +612,14 @@ from .nowtv import (
 )
 from .noz import NozIE
 from .npo import (
+    AndereTijdenIE,
     NPOIE,
     NPOLiveIE,
     NPORadioIE,
     NPORadioFragmentIE,
     SchoolTVIE,
     VPROIE,
-    WNLIE
+    WNLIE,
 )
 from .npr import NprIE
 from .nrk import (
index 8c417ab65b0478025ae92929830d03767448ebfa..fecfc28ae9667c128a7edf2e46bd83f123f46b4f 100644
--- a/youtube_dl/extractor/formula1.py
+++ b/youtube_dl/extractor/formula1.py
@@ -11,9 +11,13 @@ class Formula1IE(InfoExtractor):
         'md5': '8c79e54be72078b26b89e0e111c0502b',
         'info_dict': {
             'id': 'JvYXJpMzE6pArfHWm5ARp5AiUmD-gibV',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Race highlights - Spain 2016',
         },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
         'add_ie': ['Ooyala'],
     }, {
         'url': 'http://www.formula1.com/en/video/2016/5/Race_highlights_-_Spain_2016.html',
index 5a8403777ad45c547632ff3ad239876d7b3cc68c..91bc3a0a7c0af4690cf1a16713de1e76bccaa67a 100644
--- a/youtube_dl/extractor/kaltura.py
+++ b/youtube_dl/extractor/kaltura.py
@@ -105,20 +105,20 @@ class KalturaIE(InfoExtractor):
                     kWidget\.(?:thumb)?[Ee]mbed\(
                     \{.*?
                         (?P<q1>['\"])wid(?P=q1)\s*:\s*
-                        (?P<q2>['\"])_?(?P<partner_id>[^'\"]+)(?P=q2),.*?
+                        (?P<q2>['\"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*?
                         (?P<q3>['\"])entry_?[Ii]d(?P=q3)\s*:\s*
-                        (?P<q4>['\"])(?P<id>[^'\"]+)(?P=q4),
+                        (?P<q4>['\"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4),
                 """, webpage) or
             re.search(
                 r'''(?xs)
                     (?P<q1>["\'])
-                        (?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?
+                        (?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/(?:(?!(?P=q1)).)*(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
                     (?P=q1).*?
                     (?:
                         entry_?[Ii]d|
                         (?P<q2>["\'])entry_?[Ii]d(?P=q2)
                     )\s*:\s*
-                    (?P<q3>["\'])(?P<id>.+?)(?P=q3)
+                    (?P<q3>["\'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
                 ''', webpage))
         if mobj:
             embed_info = mobj.groupdict()
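
Reviewer note: the Kaltura hunk swaps `[^'\"]+` and `.+?` for the pattern `(?:(?!(?P=qN)).)+`, which consumes characters until the same quote that opened the value appears again. A minimal sketch of the idiom on its own follows, with a hypothetical helper name and group names chosen only for this example.

```python
import re

# Capture a quoted value whose body may contain the *other* quote character.
# (?P=q) inside the negative lookahead refers back to the opening quote.
QUOTED = re.compile(r'''(?P<q>["'])(?P<value>(?:(?!(?P=q)).)+)(?P=q)''')


def first_quoted(text):
    """Return the body of the first quoted string in text, or None."""
    mobj = QUOTED.search(text)
    return mobj.group('value') if mobj else None
```

Unlike a lazy `.+?`, the body can never run past the closing quote, so a failed match further along the pattern does not cause the engine to retry ever-longer captures.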
index aaf3f807a9217b2b3ce50f269dcdc6ebc3d29656..eb0a160089b395736a1370171ca7460e32f4e7e2 100644
--- a/youtube_dl/extractor/ketnet.py
+++ b/youtube_dl/extractor/ketnet.py
@@ -21,6 +21,10 @@ class KetnetIE(InfoExtractor):
     }, {
         'url': 'https://www.ketnet.be/achter-de-schermen/sien-repeteert-voor-stars-for-life',
         'only_matching': True,
+    }, {
+        # mzsource, geo restricted to Belgium
+        'url': 'https://www.ketnet.be/kijken/nachtwacht/de-bermadoe',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
@@ -36,9 +40,25 @@ class KetnetIE(InfoExtractor):
 
         title = config['title']
 
-        formats = self._extract_m3u8_formats(
-            config['source']['hls'], video_id, 'mp4',
-            entry_protocol='m3u8_native', m3u8_id='hls')
+        formats = []
+        for source_key in ('', 'mz'):
+            source = config.get('%ssource' % source_key)
+            if not isinstance(source, dict):
+                continue
+            for format_id, format_url in source.items():
+                if format_id == 'hls':
+                    formats.extend(self._extract_m3u8_formats(
+                        format_url, video_id, 'mp4',
+                        entry_protocol='m3u8_native', m3u8_id=format_id,
+                        fatal=False))
+                elif format_id == 'hds':
+                    formats.extend(self._extract_f4m_formats(
+                        format_url, video_id, f4m_id=format_id, fatal=False))
+                else:
+                    formats.append({
+                        'url': format_url,
+                        'format_id': format_id,
+                    })
         self._sort_formats(formats)
 
         return {
index e9cc9aa5983967861b08a2d9ee79297ae3a1726e..c48a5aad17ad36324b3cf70956d0ed234ffa522b 100644
--- a/youtube_dl/extractor/leeco.py
+++ b/youtube_dl/extractor/leeco.py
@@ -29,7 +29,7 @@ from ..utils import (
 
 class LeIE(InfoExtractor):
     IE_DESC = '乐视网'
-    _VALID_URL = r'https?://(?:www\.le\.com/ptv/vplay|sports\.le\.com/video)/(?P<id>\d+)\.html'
+    _VALID_URL = r'https?://(?:www\.le\.com/ptv/vplay|(?:sports\.le|(?:www\.)?lesports)\.com/(?:match|video))/(?P<id>\d+)\.html'
 
     _URL_TEMPLATE = 'http://www.le.com/ptv/vplay/%s.html'
 
@@ -73,6 +73,12 @@ class LeIE(InfoExtractor):
     }, {
         'url': 'http://sports.le.com/video/25737697.html',
         'only_matching': True,
+    }, {
+        'url': 'http://www.lesports.com/match/1023203003.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://sports.le.com/match/1023203003.html',
+        'only_matching': True,
     }]
 
     # ror() and calc_time_key() are reversed from a embedded swf file in KLetvPlayer.swf
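
Reviewer note: the widened `_VALID_URL` can be checked outside the test suite with plain `re`. The sketch below copies the new pattern from the hunk above; `match_id` is a hypothetical stand-in for `InfoExtractor._match_id`, not the real method.

```python
import re

# New pattern from this change, copied verbatim.
_VALID_URL = r'https?://(?:www\.le\.com/ptv/vplay|(?:sports\.le|(?:www\.)?lesports)\.com/(?:match|video))/(?P<id>\d+)\.html'


def match_id(url):
    """Return the numeric id if the URL is recognized, else None."""
    mobj = re.match(_VALID_URL, url)
    return mobj.group('id') if mobj else None
```

Both new `only_matching` test URLs resolve to id `1023203003`, while a `/match/` path on `www.le.com` is still rejected because that host only pairs with `/ptv/vplay`.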
index 6752ffee23140b17389be127ae2a4e3c11ee5582..b7bfa7a6d524e4a5ebd190947b52a369a211e753 100644
--- a/youtube_dl/extractor/limelight.py
+++ b/youtube_dl/extractor/limelight.py
@@ -59,7 +59,7 @@ class LimelightBaseIE(InfoExtractor):
                     format_id = 'rtmp'
                     if stream.get('videoBitRate'):
                         format_id += '-%d' % int_or_none(stream['videoBitRate'])
-                    http_url = 'http://%s/%s' % (rtmp.group('host').replace('csl.', 'cpl.'), rtmp.group('playpath')[4:])
+                    http_url = 'http://cpl.delvenetworks.com/' + rtmp.group('playpath')[4:]
                     urls.append(http_url)
                     http_fmt = fmt.copy()
                     http_fmt.update({
index bdda6881964f6abe4443182d2315387f6ac013a1..74a3a035e771803154b6685fcd4cbfd3dbb20a9b 100644
--- a/youtube_dl/extractor/mtv.py
+++ b/youtube_dl/extractor/mtv.py
@@ -270,6 +270,29 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
 
 
 class MTVIE(MTVServicesInfoExtractor):
+    IE_NAME = 'mtv'
+    _VALID_URL = r'https?://(?:www\.)?mtv\.com/(?:video-clips|full-episodes)/(?P<id>[^/?#.]+)'
+    _FEED_URL = 'http://www.mtv.com/feeds/mrss/'
+
+    _TESTS = [{
+        'url': 'http://www.mtv.com/video-clips/vl8qof/unlocking-the-truth-trailer',
+        'md5': '1edbcdf1e7628e414a8c5dcebca3d32b',
+        'info_dict': {
+            'id': '5e14040d-18a4-47c4-a582-43ff602de88e',
+            'ext': 'mp4',
+            'title': 'Unlocking The Truth|July 18, 2016|1|101|Trailer',
+            'description': '"Unlocking the Truth" premieres August 17th at 11/10c.',
+            'timestamp': 1468846800,
+            'upload_date': '20160718',
+        },
+    }, {
+        'url': 'http://www.mtv.com/full-episodes/94tujl/unlocking-the-truth-gates-of-hell-season-1-ep-101',
+        'only_matching': True,
+    }]
+
+
+class MTVVideoIE(MTVServicesInfoExtractor):
+    IE_NAME = 'mtv:video'
     _VALID_URL = r'''(?x)^https?://
         (?:(?:www\.)?mtv\.com/videos/.+?/(?P<videoid>[0-9]+)/[^/]+$|
            m\.mtv\.com/videos/video\.rbml\?.*?id=(?P<mgid>[^&]+))'''
index a103e0323a6c62e4b0d283afdb6d4f5662bb1869..fea1caf478b2a862ae3a028b4a80041b734a5e1b 100644
--- a/youtube_dl/extractor/mwave.py
+++ b/youtube_dl/extractor/mwave.py
@@ -9,9 +9,9 @@ from ..utils import (
 
 
 class MwaveIE(InfoExtractor):
-    _VALID_URL = r'https?://mwave\.interest\.me/mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://mwave\.interest\.me/(?:[^/]+/)?mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)'
     _URL_TEMPLATE = 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=%s'
-    _TEST = {
+    _TESTS = [{
         'url': 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=168859',
         # md5 is unstable
         'info_dict': {
@@ -23,7 +23,10 @@ class MwaveIE(InfoExtractor):
             'duration': 206,
             'view_count': int,
         }
-    }
+    }, {
+        'url': 'http://mwave.interest.me/en/mnettv/videodetail.m?searchVideoDetailVO.clip_id=176199',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
@@ -60,8 +63,8 @@ class MwaveIE(InfoExtractor):
 
 
 class MwaveMeetGreetIE(InfoExtractor):
-    _VALID_URL = r'https?://mwave\.interest\.me/meetgreet/view/(?P<id>\d+)'
-    _TEST = {
+    _VALID_URL = r'https?://mwave\.interest\.me/(?:[^/]+/)?meetgreet/view/(?P<id>\d+)'
+    _TESTS = [{
         'url': 'http://mwave.interest.me/meetgreet/view/256',
         'info_dict': {
             'id': '173294',
@@ -72,7 +75,10 @@ class MwaveMeetGreetIE(InfoExtractor):
             'duration': 3634,
             'view_count': int,
         }
-    }
+    }, {
+        'url': 'http://mwave.interest.me/en/meetgreet/view/256',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
index 3293bdb17bbf75c5bfb96d0da7b72df8a6591057..9c7cc777b4297051628cc1f0cee78f84c59dff83 100644
--- a/youtube_dl/extractor/npo.py
+++ b/youtube_dl/extractor/npo.py
@@ -5,6 +5,7 @@ import re
 from .common import InfoExtractor
 from ..utils import (
     fix_xml_ampersands,
+    orderedSet,
     parse_duration,
     qualities,
     strip_jsonp,
@@ -438,9 +439,29 @@ class SchoolTVIE(InfoExtractor):
         }
 
 
-class VPROIE(NPOIE):
+class NPOPlaylistBaseIE(NPOIE):
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, playlist_id)
+
+        entries = [
+            self.url_result('npo:%s' % video_id if not video_id.startswith('http') else video_id)
+            for video_id in orderedSet(re.findall(self._PLAYLIST_ENTRY_RE, webpage))
+        ]
+
+        playlist_title = self._html_search_regex(
+            self._PLAYLIST_TITLE_RE, webpage, 'playlist title',
+            default=None) or self._og_search_title(webpage)
+
+        return self.playlist_result(entries, playlist_id, playlist_title)
+
+
+class VPROIE(NPOPlaylistBaseIE):
     IE_NAME = 'vpro'
     _VALID_URL = r'https?://(?:www\.)?(?:tegenlicht\.)?vpro\.nl/(?:[^/]+/){2,}(?P<id>[^/]+)\.html'
+    _PLAYLIST_TITLE_RE = r'<h1[^>]+class=["\'].*?\bmedia-platform-title\b.*?["\'][^>]*>([^<]+)'
+    _PLAYLIST_ENTRY_RE = r'data-media-id="([^"]+)"'
 
     _TESTS = [
         {
@@ -453,12 +474,13 @@ class VPROIE(NPOIE):
                 'description': 'md5:52cf4eefbc96fffcbdc06d024147abea',
                 'upload_date': '20130225',
             },
+            'skip': 'Video gone',
         },
         {
             'url': 'http://www.vpro.nl/programmas/2doc/2015/sergio-herman.html',
             'info_dict': {
                 'id': 'sergio-herman',
-                'title': 'Sergio Herman: Fucking perfect',
+                'title': 'sergio herman: fucking perfect',
             },
             'playlist_count': 2,
         },
@@ -467,54 +489,40 @@ class VPROIE(NPOIE):
             'url': 'http://www.vpro.nl/programmas/2doc/2015/education-education.html',
             'info_dict': {
                 'id': 'education-education',
-                'title': '2Doc',
+                'title': 'education education',
             },
             'playlist_count': 2,
         }
     ]
 
-    def _real_extract(self, url):
-        playlist_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, playlist_id)
 
-        entries = [
-            self.url_result('npo:%s' % video_id if not video_id.startswith('http') else video_id)
-            for video_id in re.findall(r'data-media-id="([^"]+)"', webpage)
-        ]
-
-        playlist_title = self._search_regex(
-            r'<title>\s*([^>]+?)\s*-\s*Teledoc\s*-\s*VPRO\s*</title>',
-            webpage, 'playlist title', default=None) or self._og_search_title(webpage)
-
-        return self.playlist_result(entries, playlist_id, playlist_title)
-
-
-class WNLIE(InfoExtractor):
+class WNLIE(NPOPlaylistBaseIE):
+    IE_NAME = 'wnl'
     _VALID_URL = r'https?://(?:www\.)?omroepwnl\.nl/video/detail/(?P<id>[^/]+)__\d+'
+    _PLAYLIST_TITLE_RE = r'(?s)<h1[^>]+class="subject"[^>]*>(.+?)</h1>'
+    _PLAYLIST_ENTRY_RE = r'<a[^>]+href="([^"]+)"[^>]+class="js-mid"[^>]*>Deel \d+'
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://www.omroepwnl.nl/video/detail/vandaag-de-dag-6-mei__060515',
         'info_dict': {
             'id': 'vandaag-de-dag-6-mei',
             'title': 'Vandaag de Dag 6 mei',
         },
         'playlist_count': 4,
-    }
+    }]
 
-    def _real_extract(self, url):
-        playlist_id = self._match_id(url)
 
-        webpage = self._download_webpage(url, playlist_id)
+class AndereTijdenIE(NPOPlaylistBaseIE):
+    IE_NAME = 'anderetijden'
+    _VALID_URL = r'https?://(?:www\.)?anderetijden\.nl/programma/(?:[^/]+/)+(?P<id>[^/?#&]+)'
+    _PLAYLIST_TITLE_RE = r'(?s)<h1[^>]+class=["\'].*?\bpage-title\b.*?["\'][^>]*>(.+?)</h1>'
+    _PLAYLIST_ENTRY_RE = r'<figure[^>]+class=["\']episode-container episode-page["\'][^>]+data-prid=["\'](.+?)["\']'
 
-        entries = [
-            self.url_result('npo:%s' % video_id, 'NPO')
-            for video_id, part in re.findall(
-                r'<a[^>]+href="([^"]+)"[^>]+class="js-mid"[^>]*>(Deel \d+)', webpage)
-        ]
-
-        playlist_title = self._html_search_regex(
-            r'(?s)<h1[^>]+class="subject"[^>]*>(.+?)</h1>',
-            webpage, 'playlist title')
-
-        return self.playlist_result(entries, playlist_id, playlist_title)
+    _TESTS = [{
+        'url': 'http://anderetijden.nl/programma/1/Andere-Tijden/aflevering/676/Duitse-soldaten-over-de-Slag-bij-Arnhem',
+        'info_dict': {
+            'id': 'Duitse-soldaten-over-de-Slag-bij-Arnhem',
+            'title': 'Duitse soldaten over de Slag bij Arnhem',
+        },
+        'playlist_count': 3,
+    }]
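
Reviewer note: `NPOPlaylistBaseIE` now passes the `re.findall` results through `orderedSet`, so a media id that appears twice on a page yields a single playlist entry. The sketch below shows that step in isolation. `ordered_set` is a simplified stand-in written for this example (not the `youtube_dl.utils` implementation), and `playlist_entry_urls` is a hypothetical helper that returns plain strings instead of `url_result` dicts.

```python
import re


def ordered_set(iterable):
    """Drop duplicates while keeping first-seen order (stand-in helper)."""
    res = []
    for el in iterable:
        if el not in res:
            res.append(el)
    return res


def playlist_entry_urls(webpage, entry_re):
    """Build one URL per unique media id found by entry_re."""
    return [
        # Absolute URLs pass through; bare ids get the npo: prefix.
        video_id if video_id.startswith('http') else 'npo:%s' % video_id
        for video_id in ordered_set(re.findall(entry_re, webpage))
    ]
```

With the VPRO pattern `data-media-id="([^"]+)"`, a page that repeats the same id in two elements produces one `npo:` entry rather than two.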
index b6e3ac25037656547974011d1dfffbf67694ad9d..4f5175136bb177f8bf21423a4f7b89fd4c29bbd8 100644
--- a/youtube_dl/extractor/openload.py
+++ b/youtube_dl/extractor/openload.py
@@ -24,6 +24,22 @@ class OpenloadIE(InfoExtractor):
             'title': 'skyrim_no-audio_1080.mp4',
             'thumbnail': 're:^https?://.*\.jpg$',
         },
+    }, {
+        'url': 'https://openload.co/embed/rjC09fkPLYs',
+        'info_dict': {
+            'id': 'rjC09fkPLYs',
+            'ext': 'mp4',
+            'title': 'movie.mp4',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'subtitles': {
+                'en': [{
+                    'ext': 'vtt',
+                }],
+            },
+        },
+        'params': {
+            'skip_download': True,  # test subtitles only
+        },
     }, {
         'url': 'https://openload.co/embed/kUEfGclsU9o/skyrim_no-audio_1080.mp4',
         'only_matching': True,
@@ -71,11 +87,17 @@ class OpenloadIE(InfoExtractor):
             'title', default=None) or self._html_search_meta(
             'description', webpage, 'title', fatal=True)
 
-        return {
+        entries = self._parse_html5_media_entries(url, webpage, video_id)
+        subtitles = entries[0]['subtitles'] if entries else None
+
+        info_dict = {
             'id': video_id,
             'title': title,
             'thumbnail': self._og_search_thumbnail(webpage, default=None),
             'url': video_url,
             # Seems all videos have extensions in their titles
             'ext': determine_ext(title),
+            'subtitles': subtitles,
         }
+
+        return info_dict
index eb1aeba46848cee3ea652c01b284ff44fe25292e..61043cad5c23880abba1666842cdf7a6490c44b3 100644 (file)
@@ -1,6 +1,8 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import re
+
 from .common import InfoExtractor
 from ..utils import (
     parse_iso8601,
@@ -41,6 +43,13 @@ class PeriscopeIE(PeriscopeBaseIE):
         'only_matching': True,
     }]
 
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+src=([\'"])(?P<url>(?:https?:)?//(?:www\.)?periscope\.tv/(?:(?!\1).)+)\1', webpage)
+        if mobj:
+            return mobj.group('url')
+
     def _real_extract(self, url):
         token = self._match_id(url)
 
@@ -78,7 +87,7 @@ class PeriscopeIE(PeriscopeBaseIE):
                 'ext': 'flv' if format_id == 'rtmp' else 'mp4',
             }
             if format_id != 'rtmp':
-                f['protocol'] = 'm3u8_native' if state == 'ended' else 'm3u8'
+                f['protocol'] = 'm3u8_native' if state in ('ended', 'timed_out') else 'm3u8'
             formats.append(f)
         self._sort_formats(formats)
 
index f93bd19ff6dde40c87672b4fd18a3f1aab11382e..d40cca06f989b7c99329e1650497a06e9a6390e4 100644 (file)
@@ -7,7 +7,6 @@ from .common import InfoExtractor
 from ..utils import (
     determine_ext,
     ExtractorError,
-    sanitized_Request,
     urlencode_postdata,
 )
 
@@ -15,12 +14,12 @@ from ..utils import (
 class PromptFileIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?promptfile\.com/l/(?P<id>[0-9A-Z\-]+)'
     _TEST = {
-        'url': 'http://www.promptfile.com/l/D21B4746E9-F01462F0FF',
-        'md5': 'd1451b6302da7215485837aaea882c4c',
+        'url': 'http://www.promptfile.com/l/86D1CE8462-576CAAE416',
+        'md5': '5a7e285a26e0d66d9a263fae91bc92ce',
         'info_dict': {
-            'id': 'D21B4746E9-F01462F0FF',
+            'id': '86D1CE8462-576CAAE416',
             'ext': 'mp4',
-            'title': 'Birds.mp4',
+            'title': 'oceans.mp4',
             'thumbnail': 're:^https?://.*\.jpg$',
         }
     }
@@ -33,14 +32,23 @@ class PromptFileIE(InfoExtractor):
             raise ExtractorError('Video %s does not exist' % video_id,
                                  expected=True)
 
+        chash = self._search_regex(
+            r'val\("([^"]*)"\s*\+\s*\$\("#chash"\)', webpage, 'chash')
         fields = self._hidden_inputs(webpage)
-        post = urlencode_postdata(fields)
-        req = sanitized_Request(url, post)
-        req.add_header('Content-type', 'application/x-www-form-urlencoded')
+        keys = list(fields.keys())
+        chash_key = keys[0] if len(keys) == 1 else next(
+            key for key in keys if key.startswith('cha'))
+        fields[chash_key] = chash + fields[chash_key]
+
         webpage = self._download_webpage(
-            req, video_id, 'Downloading video page')
+            url, video_id, 'Downloading video page',
+            data=urlencode_postdata(fields),
+            headers={'Content-type': 'application/x-www-form-urlencoded'})
 
-        url = self._html_search_regex(r'url:\s*\'([^\']+)\'', webpage, 'URL')
+        video_url = self._search_regex(
+            (r'<a[^>]+href=(["\'])(?P<url>(?:(?!\1).)+)\1[^>]*>\s*Download File',
+             r'<a[^>]+href=(["\'])(?P<url>https?://(?:www\.)?promptfile\.com/file/(?:(?!\1).)+)\1'),
+            webpage, 'video url', group='url')
         title = self._html_search_regex(
             r'<span.+title="([^"]+)">', webpage, 'title')
         thumbnail = self._html_search_regex(
@@ -49,7 +57,7 @@ class PromptFileIE(InfoExtractor):
 
         formats = [{
             'format_id': 'sd',
-            'url': url,
+            'url': video_url,
             'ext': determine_ext(title),
         }]
         self._sort_formats(formats)
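
The chash handling in the PromptFile hunk can be sketched outside the extractor. This is a minimal standalone sketch, not the extractor's code: `apply_chash` and the sample field names are hypothetical. It assumes, as the diff does, that the hidden-input dict either has a single key or contains a key starting with `cha`.

```python
def apply_chash(fields, chash):
    """Return a copy of fields with chash prepended to the chash field's value.

    Hypothetical helper mirroring the key selection in the hunk:
    a lone key is used as-is, otherwise the first key starting with 'cha'.
    """
    keys = list(fields.keys())
    if len(keys) == 1:
        chash_key = keys[0]
    else:
        # A default of None avoids StopIteration when no key matches.
        chash_key = next((key for key in keys if key.startswith('cha')), None)
    if chash_key is None:
        raise KeyError('no chash field found')
    updated = dict(fields)
    updated[chash_key] = chash + updated[chash_key]
    return updated


# Sample values are made up for illustration.
single = apply_chash({'chash': 'abc'}, 'xyz-')
multi = apply_chash({'token': 't', 'chash2': 'abc'}, 'xyz-')
```

Unlike the hunk, the sketch copies the dict instead of mutating it and raises an explicit error when no candidate key exists. Both are choices of the sketch, not behaviour taken from the commit.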
index 7335dc2af971d4bc546de15b8a24e01359037048..873d4f981d90303dd06ded3d114bad7714c7b4ae 100644 (file)
@@ -122,7 +122,17 @@ class ProSiebenSat1BaseIE(InfoExtractor):
 class ProSiebenSat1IE(ProSiebenSat1BaseIE):
     IE_NAME = 'prosiebensat1'
     IE_DESC = 'ProSiebenSat.1 Digital'
-    _VALID_URL = r'https?://(?:www\.)?(?:(?:prosieben|prosiebenmaxx|sixx|sat1|kabeleins|the-voice-of-germany|7tv)\.(?:de|at|ch)|ran\.de|fem\.com)/(?P<id>.+)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?
+                        (?:
+                            (?:
+                                prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|7tv|advopedia
+                            )\.(?:de|at|ch)|
+                            ran\.de|fem\.com|advopedia\.de
+                        )
+                        /(?P<id>.+)
+                    '''
 
     _TESTS = [
         {
@@ -290,6 +300,24 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
                 'skip_download': True,
             },
         },
+        {
+            # geo restricted to Germany
+            'url': 'http://www.kabeleinsdoku.de/tv/mayday-alarm-im-cockpit/video/102-notlandung-im-hudson-river-ganze-folge',
+            'only_matching': True,
+        },
+        {
+            # geo restricted to Germany
+            'url': 'http://www.sat1gold.de/tv/edel-starck/video/11-staffel-1-episode-1-partner-wider-willen-ganze-folge',
+            'only_matching': True,
+        },
+        {
+            'url': 'http://www.sat1gold.de/tv/edel-starck/playlist/die-gesamte-1-staffel',
+            'only_matching': True,
+        },
+        {
+            'url': 'http://www.advopedia.de/videos/lenssen-klaert-auf/lenssen-klaert-auf-folge-8-staffel-3-feiertage-und-freie-tage',
+            'only_matching': True,
+        },
     ]
 
     _TOKEN = 'prosieben'
@@ -361,19 +389,28 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
     def _extract_playlist(self, url, webpage):
         playlist_id = self._html_search_regex(
             self._PLAYLIST_ID_REGEXES, webpage, 'playlist id')
-        for regex in self._PLAYLIST_CLIP_REGEXES:
-            playlist_clips = re.findall(regex, webpage)
-            if playlist_clips:
-                title = self._html_search_regex(
-                    self._TITLE_REGEXES, webpage, 'title')
-                description = self._html_search_regex(
-                    self._DESCRIPTION_REGEXES, webpage, 'description', fatal=False)
-                entries = [
-                    self.url_result(
-                        re.match('(.+?//.+?)/', url).group(1) + clip_path,
-                        'ProSiebenSat1')
-                    for clip_path in playlist_clips]
-                return self.playlist_result(entries, playlist_id, title, description)
+        playlist = self._parse_json(
+            self._search_regex(
+                r'var\s+contentResources\s*=\s*(\[.+?\]);\s*</script',
+                webpage, 'playlist'),
+            playlist_id)
+        entries = []
+        for item in playlist:
+            clip_id = item.get('id') or item.get('upc')
+            if not clip_id:
+                continue
+            info = self._extract_video_info(url, clip_id)
+            info.update({
+                'id': clip_id,
+                'title': item.get('title') or item.get('teaser', {}).get('headline'),
+                'description': item.get('teaser', {}).get('description'),
+                'thumbnail': item.get('poster'),
+                'duration': float_or_none(item.get('duration')),
+                'series': item.get('tvShowTitle'),
+                'uploader': item.get('broadcastPublisher'),
+            })
+            entries.append(info)
+        return self.playlist_result(entries, playlist_id)
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
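
The new `_extract_playlist` reads clips from a `contentResources` JavaScript array instead of scraping clip links. A rough standalone sketch of that parsing step follows, using only `re` and `json`. The page fragment and the `parse_content_resources` name are made up for illustration; the regex is the one from the hunk, written as a raw string.

```python
import json
import re

# Made-up page fragment; real pages may be laid out differently.
WEBPAGE = (
    '<script>var contentResources = '
    '[{"id": 101, "title": "Clip A"}, '
    '{"upc": "x-202", "teaser": {"headline": "Clip B"}}, '
    '{"title": "no id, skipped"}];'
    '</script>'
)


def parse_content_resources(webpage):
    """Return (clip_id, title) pairs found in the contentResources array."""
    mobj = re.search(
        r'var\s+contentResources\s*=\s*(\[.+?\]);\s*</script', webpage)
    if not mobj:
        return []
    clips = []
    for item in json.loads(mobj.group(1)):
        # Same fallback order as the hunk: 'id' first, then 'upc'.
        clip_id = item.get('id') or item.get('upc')
        if not clip_id:
            continue
        title = item.get('title') or item.get('teaser', {}).get('headline')
        clips.append((clip_id, title))
    return clips


CLIPS = parse_content_resources(WEBPAGE)
```

Because the pattern has no `(?s)` flag, `.+?` does not cross newlines, so the sketch keeps the array on one line.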
index 513c548290cec6952252f0c71387567ceb306fae..1a8114aa7d197ffa1da08a0560dde2736765db66 100644 (file)
@@ -260,7 +260,20 @@ class SoundcloudIE(InfoExtractor):
         return self._extract_info_dict(info, full_title, secret_token=token)
 
 
-class SoundcloudSetIE(SoundcloudIE):
+class SoundcloudPlaylistBaseIE(SoundcloudIE):
+    @staticmethod
+    def _extract_id(e):
+        return compat_str(e['id']) if e.get('id') else None
+
+    def _extract_track_entries(self, tracks):
+        return [
+            self.url_result(
+                track['permalink_url'], SoundcloudIE.ie_key(),
+                video_id=self._extract_id(track))
+            for track in tracks if track.get('permalink_url')]
+
+
+class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
     _VALID_URL = r'https?://(?:(?:www|m)\.)?soundcloud\.com/(?P<uploader>[\w\d-]+)/sets/(?P<slug_title>[\w\d-]+)(?:/(?P<token>[^?/]+))?'
     IE_NAME = 'soundcloud:set'
     _TESTS = [{
@@ -299,7 +312,7 @@ class SoundcloudSetIE(SoundcloudIE):
             msgs = (compat_str(err['error_message']) for err in info['errors'])
             raise ExtractorError('unable to download video webpage: %s' % ','.join(msgs))
 
-        entries = [self.url_result(track['permalink_url'], 'Soundcloud') for track in info['tracks']]
+        entries = self._extract_track_entries(info['tracks'])
 
         return {
             '_type': 'playlist',
@@ -309,7 +322,7 @@ class SoundcloudSetIE(SoundcloudIE):
         }
 
 
-class SoundcloudUserIE(SoundcloudIE):
+class SoundcloudUserIE(SoundcloudPlaylistBaseIE):
     _VALID_URL = r'''(?x)
                         https?://
                             (?:(?:www|m)\.)?soundcloud\.com/
@@ -326,21 +339,21 @@ class SoundcloudUserIE(SoundcloudIE):
             'id': '114582580',
             'title': 'The Akashic Chronicler (All)',
         },
-        'playlist_mincount': 111,
+        'playlist_mincount': 74,
     }, {
         'url': 'https://soundcloud.com/the-akashic-chronicler/tracks',
         'info_dict': {
             'id': '114582580',
             'title': 'The Akashic Chronicler (Tracks)',
         },
-        'playlist_mincount': 50,
+        'playlist_mincount': 37,
     }, {
         'url': 'https://soundcloud.com/the-akashic-chronicler/sets',
         'info_dict': {
             'id': '114582580',
             'title': 'The Akashic Chronicler (Playlists)',
         },
-        'playlist_mincount': 3,
+        'playlist_mincount': 2,
     }, {
         'url': 'https://soundcloud.com/the-akashic-chronicler/reposts',
         'info_dict': {
@@ -359,7 +372,7 @@ class SoundcloudUserIE(SoundcloudIE):
         'url': 'https://soundcloud.com/grynpyret/spotlight',
         'info_dict': {
             'id': '7098329',
-            'title': 'Grynpyret (Spotlight)',
+            'title': 'GRYNPYRET (Spotlight)',
         },
         'playlist_mincount': 1,
     }]
@@ -421,13 +434,14 @@ class SoundcloudUserIE(SoundcloudIE):
                 for cand in candidates:
                     if isinstance(cand, dict):
                         permalink_url = cand.get('permalink_url')
+                        entry_id = self._extract_id(cand)
                         if permalink_url and permalink_url.startswith('http'):
-                            return permalink_url
+                            return permalink_url, entry_id
 
             for e in collection:
-                permalink_url = resolve_permalink_url((e, e.get('track'), e.get('playlist')))
+                permalink_url, entry_id = resolve_permalink_url((e, e.get('track'), e.get('playlist'))) or (None, None)
                 if permalink_url:
-                    entries.append(self.url_result(permalink_url))
+                    entries.append(self.url_result(permalink_url, video_id=entry_id))
 
             next_href = response.get('next_href')
             if not next_href:
@@ -447,7 +461,7 @@ class SoundcloudUserIE(SoundcloudIE):
         }
 
 
-class SoundcloudPlaylistIE(SoundcloudIE):
+class SoundcloudPlaylistIE(SoundcloudPlaylistBaseIE):
     _VALID_URL = r'https?://api\.soundcloud\.com/playlists/(?P<id>[0-9]+)(?:/?\?secret_token=(?P<token>[^&]+?))?$'
     IE_NAME = 'soundcloud:playlist'
     _TESTS = [{
@@ -477,7 +491,7 @@ class SoundcloudPlaylistIE(SoundcloudIE):
         data = self._download_json(
             base_url + data, playlist_id, 'Downloading playlist')
 
-        entries = [self.url_result(track['permalink_url'], 'Soundcloud') for track in data['tracks']]
+        entries = self._extract_track_entries(data['tracks'])
 
         return {
             '_type': 'playlist',
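
The candidate lookup used by `SoundcloudUserIE` can be sketched as a standalone function. This is an illustrative version, not the extractor's code: it uses `str` in place of `compat_str`, the sample collection is made up, and it adds an explicit `(None, None)` return as one possible way to keep tuple unpacking safe when no candidate matches.

```python
def resolve_permalink_url(candidates):
    """Return (permalink_url, entry_id) for the first usable candidate.

    Always returns a 2-tuple so callers can unpack the result.
    """
    for cand in candidates:
        if isinstance(cand, dict):
            permalink_url = cand.get('permalink_url')
            entry_id = str(cand['id']) if cand.get('id') else None
            if permalink_url and permalink_url.startswith('http'):
                return permalink_url, entry_id
    return None, None


# Made-up API items: a plain track, a wrapped track, and an empty item.
collection = [
    {'permalink_url': 'https://soundcloud.com/a/one', 'id': 1},
    {'track': {'permalink_url': 'https://soundcloud.com/a/two', 'id': 2}},
    {'playlist': None},
]

entries = []
for e in collection:
    url, entry_id = resolve_permalink_url((e, e.get('track'), e.get('playlist')))
    if url:
        entries.append((url, entry_id))
```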
index c5a5843b6107b993b9a36f902e32c2cb97611dd1..3411fcf7eb753154aa034474641b5327be7ea127 100644 (file)
@@ -4,6 +4,7 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
+from ..compat import compat_urlparse
 from ..utils import (
     determine_ext,
     float_or_none,
@@ -13,6 +14,8 @@ from ..utils import (
     ExtractorError,
 )
 
+from .periscope import PeriscopeIE
+
 
 class TwitterBaseIE(InfoExtractor):
     def _get_vmap_video_url(self, vmap_url, video_id):
@@ -48,12 +51,12 @@ class TwitterCardIE(TwitterBaseIE):
         },
         {
             'url': 'https://twitter.com/i/cards/tfw/v1/654001591733886977',
-            'md5': 'd4724ffe6d2437886d004fa5de1043b3',
+            'md5': 'b6d9683dd3f48e340ded81c0e917ad46',
             'info_dict': {
                 'id': 'dq4Oj5quskI',
                 'ext': 'mp4',
                 'title': 'Ubuntu 11.10 Overview',
-                'description': 'Take a quick peek at what\'s new and improved in Ubuntu 11.10.\n\nOnce installed take a look at 10 Things to Do After Installing: http://www.omgubuntu.co.uk/2011/10/10...',
+                'description': 'md5:a831e97fa384863d6e26ce48d1c43376',
                 'upload_date': '20111013',
                 'uploader': 'OMG! Ubuntu!',
                 'uploader_id': 'omgubuntu',
@@ -100,12 +103,17 @@ class TwitterCardIE(TwitterBaseIE):
             return self.url_result(iframe_url)
 
         config = self._parse_json(self._html_search_regex(
-            r'data-(?:player-)?config="([^"]+)"', webpage, 'data player config'),
+            r'data-(?:player-)?config="([^"]+)"', webpage,
+            'data player config', default='{}'),
             video_id)
 
         if config.get('source_type') == 'vine':
             return self.url_result(config['player_url'], 'Vine')
 
+        periscope_url = PeriscopeIE._extract_url(webpage)
+        if periscope_url:
+            return self.url_result(periscope_url, PeriscopeIE.ie_key())
+
         def _search_dimensions_in_video_url(a_format, video_url):
             m = re.search(r'/(?P<width>\d+)x(?P<height>\d+)/', video_url)
             if m:
@@ -244,10 +252,10 @@ class TwitterIE(InfoExtractor):
         'info_dict': {
             'id': '700207533655363584',
             'ext': 'mp4',
-            'title': 'Donte The Dumbass - BEAT PROD: @suhmeduh #Damndaniel',
-            'description': 'Donte The Dumbass on Twitter: "BEAT PROD: @suhmeduh  https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"',
+            'title': 'JG - BEAT PROD: @suhmeduh #Damndaniel',
+            'description': 'JG on Twitter: "BEAT PROD: @suhmeduh  https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"',
             'thumbnail': 're:^https?://.*\.jpg',
-            'uploader': 'Donte The Dumbass',
+            'uploader': 'JG',
             'uploader_id': 'jaydingeer',
         },
         'params': {
@@ -278,6 +286,18 @@ class TwitterIE(InfoExtractor):
         'params': {
             'skip_download': True,  # requires ffmpeg
         },
+    }, {
+        'url': 'https://twitter.com/OPP_HSD/status/779210622571536384',
+        'info_dict': {
+            'id': '1zqKVVlkqLaKB',
+            'ext': 'mp4',
+            'title': 'Sgt Kerry Schmidt - Ontario Provincial Police - Road rage, mischief, assault, rollover and fire in one occurrence',
+            'upload_date': '20160923',
+            'uploader_id': 'OPP_HSD',
+            'uploader': 'Sgt Kerry Schmidt - Ontario Provincial Police',
+            'timestamp': 1474613214,
+        },
+        'add_ie': ['Periscope'],
     }]
 
     def _real_extract(self, url):
@@ -328,13 +348,22 @@ class TwitterIE(InfoExtractor):
             })
             return info
 
+        twitter_card_url = None
         if 'class="PlayableMedia' in webpage:
+            twitter_card_url = '%s//twitter.com/i/videos/tweet/%s' % (self.http_scheme(), twid)
+        else:
+            twitter_card_iframe_url = self._search_regex(
+                r'data-full-card-iframe-url=([\'"])(?P<url>(?:(?!\1).)+)\1',
+                webpage, 'Twitter card iframe URL', default=None, group='url')
+            if twitter_card_iframe_url:
+                twitter_card_url = compat_urlparse.urljoin(url, twitter_card_iframe_url)
+
+        if twitter_card_url:
             info.update({
                 '_type': 'url_transparent',
                 'ie_key': 'TwitterCard',
-                'url': '%s//twitter.com/i/videos/tweet/%s' % (self.http_scheme(), twid),
+                'url': twitter_card_url,
             })
-
             return info
 
         raise ExtractorError('There\'s no video in this tweet.')
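
The `data-full-card-iframe-url` fallback pairs a quote-aware attribute regex with `urljoin`. A small sketch under stated assumptions: it uses Python 3's `urllib.parse.urljoin` rather than `compat_urlparse`, and `find_card_url` plus the sample markup are hypothetical.

```python
import re
from urllib.parse import urljoin

# ([\'"]) captures the opening quote; (?:(?!\1).)+ consumes characters
# until that same quote appears again; \1 matches the closing quote.
CARD_IFRAME_RE = r'data-full-card-iframe-url=([\'"])(?P<url>(?:(?!\1).)+)\1'


def find_card_url(page_url, webpage):
    """Return the absolute card iframe URL, or None if the attribute is absent."""
    mobj = re.search(CARD_IFRAME_RE, webpage)
    if not mobj:
        return None
    # The attribute may hold a relative path, so resolve it against the page URL.
    return urljoin(page_url, mobj.group('url'))


PAGE_URL = 'https://twitter.com/OPP_HSD/status/779210622571536384'
# Made-up markup for illustration.
HTML = '<div data-full-card-iframe-url="/i/cards/tfw/v1/779210622571536384?cardname=player">'
CARD_URL = find_card_url(PAGE_URL, HTML)
```

The same quote-backreference idiom appears in `PeriscopeIE._extract_url` and in the PromptFile download-link patterns in this commit.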
index cd22df25a2623e63ccbdabe791e2c497ac09c656..f26e0732c2b0693456acec3e9fb2390b36016d97 100644 (file)
@@ -23,8 +23,9 @@ from ..utils import (
     unified_strdate,
     urlencode_postdata,
 )
-from .vimeo import VimeoIE
+from .dailymotion import DailymotionIE
 from .pladform import PladformIE
+from .vimeo import VimeoIE
 
 
 class VKBaseIE(InfoExtractor):
@@ -210,6 +211,23 @@ class VKIE(VKBaseIE):
                 'view_count': int,
             },
         },
+        {
+            # dailymotion embed
+            'url': 'https://vk.com/video-37468416_456239855',
+            'info_dict': {
+                'id': 'k3lz2cmXyRuJQSjGHUv',
+                'ext': 'mp4',
+                'title': 'md5:d52606645c20b0ddbb21655adaa4f56f',
+                'description': 'md5:c651358f03c56f1150b555c26d90a0fd',
+                'uploader': 'AniLibria.Tv',
+                'upload_date': '20160914',
+                'uploader_id': 'x1p5vl5',
+                'timestamp': 1473877246,
+            },
+            'params': {
+                'skip_download': True,
+            }
+        },
         {
             # video key is extra_data not url\d+
             'url': 'http://vk.com/video-110305615_171782105',
@@ -315,6 +333,10 @@ class VKIE(VKBaseIE):
                 m_rutube.group(1).replace('\\', ''))
             return self.url_result(rutube_url)
 
+        dailymotion_urls = DailymotionIE._extract_urls(info_page)
+        if dailymotion_urls:
+            return self.url_result(dailymotion_urls[0], DailymotionIE.ie_key())
+
         m_opts = re.search(r'(?s)var\s+opts\s*=\s*({.+?});', info_page)
         if m_opts:
             m_opts_url = re.search(r"url\s*:\s*'((?!/\b)[^']+)", m_opts.group(1))
index b1b32ad44ecfd796e46219a87ea71caa3587face..f8e33149398bde16115114bb0323d2f286ee9d42 100644 (file)
@@ -9,13 +9,16 @@ class VoxMediaIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?(?:theverge|vox|sbnation|eater|polygon|curbed|racked)\.com/(?:[^/]+/)*(?P<id>[^/?]+)'
     _TESTS = [{
         'url': 'http://www.theverge.com/2014/6/27/5849272/material-world-how-google-discovered-what-software-is-made-of',
-        'md5': '73856edf3e89a711e70d5cf7cb280b37',
         'info_dict': {
             'id': '11eXZobjrG8DCSTgrNjVinU-YmmdYjhe',
             'ext': 'mp4',
             'title': 'Google\'s new material design direction',
             'description': 'md5:2f44f74c4d14a1f800ea73e1c6832ad2',
         },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
         'add_ie': ['Ooyala'],
     }, {
         # data-ooyala-id
@@ -31,13 +34,16 @@ class VoxMediaIE(InfoExtractor):
     }, {
         # volume embed
         'url': 'http://www.vox.com/2016/3/31/11336640/mississippi-lgbt-religious-freedom-bill',
-        'md5': '375c483c5080ab8cd85c9c84cfc2d1e4',
         'info_dict': {
             'id': 'wydzk3dDpmRz7PQoXRsTIX6XTkPjYL0b',
             'ext': 'mp4',
             'title': 'The new frontier of LGBTQ civil rights, explained',
             'description': 'md5:0dc58e94a465cbe91d02950f770eb93f',
         },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
         'add_ie': ['Ooyala'],
     }, {
         # youtube embed
index 5ca903825243f4427ae6a6e5819a9c00376f7510..f86823112297d40aeb9e847d182f905bcf7afb76 100644 (file)
@@ -369,7 +369,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
     IE_NAME = 'youtube'
     _TESTS = [
         {
-            'url': 'http://www.youtube.com/watch?v=BaW_jenozKc&t=1s&end=9',
+            'url': 'https://www.youtube.com/watch?v=BaW_jenozKc&t=1s&end=9',
             'info_dict': {
                 'id': 'BaW_jenozKc',
                 'ext': 'mp4',
@@ -389,7 +389,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             }
         },
         {
-            'url': 'http://www.youtube.com/watch?v=UxxajLWwzqY',
+            'url': 'https://www.youtube.com/watch?v=UxxajLWwzqY',
             'note': 'Test generic use_cipher_signature video (#897)',
             'info_dict': {
                 'id': 'UxxajLWwzqY',
@@ -443,7 +443,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             }
         },
         {
-            'url': 'http://www.youtube.com/watch?v=BaW_jenozKc&v=UxxajLWwzqY',
+            'url': 'https://www.youtube.com/watch?v=BaW_jenozKc&v=UxxajLWwzqY',
             'note': 'Use the first video ID in the URL',
             'info_dict': {
                 'id': 'BaW_jenozKc',
@@ -465,7 +465,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             },
         },
         {
-            'url': 'http://www.youtube.com/watch?v=a9LDPn-MO4I',
+            'url': 'https://www.youtube.com/watch?v=a9LDPn-MO4I',
             'note': '256k DASH audio (format 141) via DASH manifest',
             'info_dict': {
                 'id': 'a9LDPn-MO4I',
@@ -539,7 +539,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         },
         # Normal age-gate video (No vevo, embed allowed)
         {
-            'url': 'http://youtube.com/watch?v=HtVdAasjOgU',
+            'url': 'https://youtube.com/watch?v=HtVdAasjOgU',
             'info_dict': {
                 'id': 'HtVdAasjOgU',
                 'ext': 'mp4',
@@ -555,7 +555,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         },
         # Age-gate video with encrypted signature
         {
-            'url': 'http://www.youtube.com/watch?v=6kLq3WMV1nU',
+            'url': 'https://www.youtube.com/watch?v=6kLq3WMV1nU',
             'info_dict': {
                 'id': '6kLq3WMV1nU',
                 'ext': 'mp4',
@@ -748,11 +748,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'skip': 'Not multifeed anymore',
         },
         {
-            'url': 'http://vid.plus/FlRa-iH7PGw',
+            'url': 'https://vid.plus/FlRa-iH7PGw',
             'only_matching': True,
         },
         {
-            'url': 'http://zwearz.com/watch/9lWxNJF-ufM/electra-woman-dyna-girl-official-trailer-grace-helbig.html',
+            'url': 'https://zwearz.com/watch/9lWxNJF-ufM/electra-woman-dyna-girl-official-trailer-grace-helbig.html',
             'only_matching': True,
         },
         {
@@ -1846,7 +1846,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
         'playlist_count': 2,
     }, {
         'note': 'embedded',
-        'url': 'http://www.youtube.com/embed/videoseries?list=PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
+        'url': 'https://www.youtube.com/embed/videoseries?list=PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
         'playlist_count': 4,
         'info_dict': {
             'title': 'JODA15',
@@ -1854,7 +1854,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
         }
     }, {
         'note': 'Embedded SWF player',
-        'url': 'http://www.youtube.com/p/YN5VISEtHet5D4NEvfTd0zcgFk84NqFZ?hl=en_US&fs=1&rel=0',
+        'url': 'https://www.youtube.com/p/YN5VISEtHet5D4NEvfTd0zcgFk84NqFZ?hl=en_US&fs=1&rel=0',
         'playlist_count': 4,
         'info_dict': {
             'title': 'JODA7',
@@ -2156,7 +2156,7 @@ class YoutubeLiveIE(YoutubeBaseInfoExtractor):
     IE_NAME = 'youtube:live'
 
     _TESTS = [{
-        'url': 'http://www.youtube.com/user/TheYoungTurks/live',
+        'url': 'https://www.youtube.com/user/TheYoungTurks/live',
         'info_dict': {
             'id': 'a48o2S1cPoo',
             'ext': 'mp4',
@@ -2176,7 +2176,7 @@ class YoutubeLiveIE(YoutubeBaseInfoExtractor):
             'skip_download': True,
         },
     }, {
-        'url': 'http://www.youtube.com/channel/UC1yBKRuGpC1tSM73A0ZjYjQ/live',
+        'url': 'https://www.youtube.com/channel/UC1yBKRuGpC1tSM73A0ZjYjQ/live',
         'only_matching': True,
     }]
 
@@ -2201,7 +2201,7 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
     IE_NAME = 'youtube:playlists'
 
     _TESTS = [{
-        'url': 'http://www.youtube.com/user/ThirstForScience/playlists',
+        'url': 'https://www.youtube.com/user/ThirstForScience/playlists',
         'playlist_mincount': 4,
         'info_dict': {
             'id': 'ThirstForScience',
@@ -2209,7 +2209,7 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
         },
     }, {
         # with "Load more" button
-        'url': 'http://www.youtube.com/user/igorkle1/playlists?view=1&sort=dd',
+        'url': 'https://www.youtube.com/user/igorkle1/playlists?view=1&sort=dd',
         'playlist_mincount': 70,
         'info_dict': {
             'id': 'igorkle1',
@@ -2442,10 +2442,10 @@ class YoutubeTruncatedURLIE(InfoExtractor):
     '''
 
     _TESTS = [{
-        'url': 'http://www.youtube.com/watch?annotation_id=annotation_3951667041',
+        'url': 'https://www.youtube.com/watch?annotation_id=annotation_3951667041',
         'only_matching': True,
     }, {
-        'url': 'http://www.youtube.com/watch?',
+        'url': 'https://www.youtube.com/watch?',
         'only_matching': True,
     }, {
         'url': 'https://www.youtube.com/watch?x-yt-cl=84503534',
@@ -2466,7 +2466,7 @@ class YoutubeTruncatedURLIE(InfoExtractor):
             'Did you forget to quote the URL? Remember that & is a meta '
             'character in most shells, so you want to put the URL in quotes, '
             'like  youtube-dl '
-            '"http://www.youtube.com/watch?feature=foo&v=BaW_jenozKc" '
+            '"https://www.youtube.com/watch?feature=foo&v=BaW_jenozKc" '
             ' or simply  youtube-dl BaW_jenozKc  .',
             expected=True)
 
index 9d31381814978e2b27c817c3dce50d35a74306d8..af0c2cfc4e14360c526cd629e9c41db38c8ea2ea 100644 (file)
@@ -1,3 +1,3 @@
 from __future__ import unicode_literals
 
-__version__ = '2016.09.19'
+__version__ = '2016.09.27'