Anssi Hannula
Lukáš Lalinský
Qijiang Fan
+Rémy Léone
+Marco Ferragina
+reiv
chmod a+x youtube-dl
README.md: youtube_dl/*.py youtube_dl/*/*.py
- COLUMNS=80 python youtube_dl/__main__.py --help | python devscripts/make_readme.py
+ COLUMNS=80 $(PYTHON) youtube_dl/__main__.py --help | $(PYTHON) devscripts/make_readme.py
CONTRIBUTING.md: README.md
- python devscripts/make_contributing.py README.md CONTRIBUTING.md
+ $(PYTHON) devscripts/make_contributing.py README.md CONTRIBUTING.md
supportedsites:
- python devscripts/make_supportedsites.py docs/supportedsites.md
+ $(PYTHON) devscripts/make_supportedsites.py docs/supportedsites.md
README.txt: README.md
pandoc -f markdown -t plain README.md -o README.txt
youtube-dl.1: README.md
- python devscripts/prepare_manpage.py >youtube-dl.1.temp.md
+ $(PYTHON) devscripts/prepare_manpage.py >youtube-dl.1.temp.md
pandoc -s -f markdown -t man youtube-dl.1.temp.md -o youtube-dl.1
rm -f youtube-dl.1.temp.md
youtube-dl.bash-completion: youtube_dl/*.py youtube_dl/*/*.py devscripts/bash-completion.in
- python devscripts/bash-completion.py
+ $(PYTHON) devscripts/bash-completion.py
bash-completion: youtube-dl.bash-completion
youtube-dl.zsh: youtube_dl/*.py youtube_dl/*/*.py devscripts/zsh-completion.in
- python devscripts/zsh-completion.py
+ $(PYTHON) devscripts/zsh-completion.py
zsh-completion: youtube-dl.zsh
youtube-dl.fish: youtube_dl/*.py youtube_dl/*/*.py devscripts/fish-completion.in
- python devscripts/fish-completion.py
+ $(PYTHON) devscripts/fish-completion.py
fish-completion: youtube-dl.fish
--all-formats Download all available video formats
--prefer-free-formats Prefer free video formats unless a specific
one is requested
- -F, --list-formats List all available formats
+ -F, --list-formats List all available formats of specified
+ videos
--youtube-skip-dash-manifest Do not download the DASH manifests and
related data on YouTube videos
--merge-output-format FORMAT If a merge is required (e.g.
## Subtitle Options:
--write-sub Write subtitle file
- --write-auto-sub Write automatic subtitle file (YouTube
- only)
+ --write-auto-sub Write automatically generated subtitle file
+ (YouTube only)
--all-subs Download all the available subtitles of the
video
--list-subs List all available subtitles for the video
Apparently YouTube requires you to pass a CAPTCHA test if you download too much. We're [considering providing a way to let you solve the CAPTCHA](https://github.com/rg3/youtube-dl/issues/154), but at the moment, your best course of action is to point a web browser at the YouTube URL, solve the CAPTCHA, and then restart youtube-dl.
+### Do I need any other programs?
+
+youtube-dl works fine on its own on most sites. However, if you want to convert video/audio, you'll need [avconv](https://libav.org/) or [ffmpeg](https://www.ffmpeg.org/). On some sites - most notably YouTube - videos can be retrieved in a higher quality format without sound. youtube-dl will detect whether avconv/ffmpeg is present and automatically pick the best option.
+
+Videos or video formats streamed via RTMP protocol can only be downloaded when [rtmpdump](https://rtmpdump.mplayerhq.hu/) is installed. Downloading MMS and RTSP videos requires either [mplayer](http://mplayerhq.hu/) or [mpv](https://mpv.io/) to be installed.
+
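For reference, the same conversion can also be driven from Python through youtube-dl's embedding API; a minimal sketch (the URL is only an example, and ffmpeg or avconv must be installed):

```python
from __future__ import unicode_literals
import youtube_dl

# Extract the audio track and let the FFmpeg post-processor convert it to mp3.
ydl_opts = {
    'format': 'bestaudio/best',
    'postprocessors': [{
        'key': 'FFmpegExtractAudio',
        'preferredcodec': 'mp3',
        'preferredquality': '192',
    }],
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
```
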
### I have downloaded a video but how can I play it?
Once the video is fully downloaded, use any video player, such as [vlc](http://www.videolan.org) or [mplayer](http://www.mplayerhq.hu/).
- **Bpb**: Bundeszentrale für politische Bildung
- **BR**: Bayerischer Rundfunk Mediathek
- **Break**
- - **Brightcove**
+ - **brightcove:legacy**
+ - **brightcove:new**
- **bt:article**: Bergens Tidende Articles
- **bt:vestlendingen**: Bergens Tidende - Vestlendingen
- **BuzzFeed**
- **Clipsyndicate**
- **Cloudy**
- **Clubic**
+ - **Clyp**
- **cmt.com**
- **CNET**
- **CNN**
- **DctpTv**
- **DeezerPlaylist**
- **defense.gouv.fr**
+ - **democracynow**
- **DHM**: Filmarchiv - Deutsches Historisches Museum
- **Discovery**
- **Dotsub**
- **DouyuTV**: 斗鱼
+ - **DPlay**
- **dramafever**
- **dramafever:series**
- **DRBonanza**
- **Giga**
- **Glide**: Glide mobile video messages (glide.me)
- **Globo**
+ - **GloboArticle**
- **GodTube**
- **GoldenMoustache**
- **Golem**
- - **GorillaVid**: GorillaVid.in, daclips.in, movpod.in, fastvideo.in, realvid.net and filehoot.com
- **Goshgay**
- **Groupon**
- **Hark**
- **macgamestore**: MacGameStore trailers
- **mailru**: Видео@Mail.Ru
- **Malemotion**
- - **MDR**
+ - **MDR**: MDR.DE and KiKA
- **media.ccc.de**
- **metacafe**
- **Metacritic**
- **nowness:playlist**
- **nowness:series**
- **NowTV**
+ - **NowTVList**
- **nowvideo**: NowVideo
- **npo**: npo.nl and ntr.nl
- **npo.nl:live**
- **qqmusic:playlist**: QQ音乐 - 歌单
- **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜
- - **Quickscope**: Quick Scope
- **QuickVid**
- **R7**
- **radio.de**
- **soompi:show**
- **soundcloud**
- **soundcloud:playlist**
+ - **soundcloud:search**: Soundcloud search
- **soundcloud:set**
- **soundcloud:user**
- **soundgasm**
- **video.mit.edu**
- **VideoDetective**
- **videofy.me**
- - **videolectures.net**
- **VideoMega**
- **VideoPremium**
- **VideoTt**: video.tt - Your True Tube
- **vier**
- **vier:videos**
- **Viewster**
+ - **Viidea**
- **viki**
- **viki:channel**
- **vimeo**
- **WSJ**: Wall Street Journal
- **XBef**
- **XboxClips**
+ - **XFileShare**: XFileShare based sites: GorillaVid.in, daclips.in, movpod.in, fastvideo.in, realvid.net, filehoot.com and vidto.me
- **XHamster**
- **XHamsterEmbed**
- **XMinus**
- **youtube:show**: YouTube.com (multi-season) shows
- **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication)
- **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword)
+ - **youtube:user:playlists**: YouTube.com user playlists
- **youtube:watchlater**: Youtube watch later list, ":ytwatchlater" for short (requires authentication)
- **Zapiks**
- **ZDF**
from youtube_dl.utils import get_filesystem_encoding
from youtube_dl.compat import (
compat_getenv,
+ compat_etree_fromstring,
compat_expanduser,
compat_shlex_split,
+ compat_str,
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
)
def test_compat_shlex_split(self):
self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two'])
+ def test_compat_etree_fromstring(self):
+ xml = '''
+ <root foo="bar" spam="中文">
+ <normal>foo</normal>
+ <chinese>中文</chinese>
+ <foo><bar>spam</bar></foo>
+ </root>
+ '''
+ doc = compat_etree_fromstring(xml.encode('utf-8'))
+ self.assertTrue(isinstance(doc.attrib['foo'], compat_str))
+ self.assertTrue(isinstance(doc.attrib['spam'], compat_str))
+ self.assertTrue(isinstance(doc.find('normal').text, compat_str))
+ self.assertTrue(isinstance(doc.find('chinese').text, compat_str))
+ self.assertTrue(isinstance(doc.find('foo/bar').text, compat_str))
+
if __name__ == '__main__':
unittest.main()
jsi = JSInterpreter('function x3(){return 42;}')
self.assertEqual(jsi.call_function('x3'), 42)
+ jsi = JSInterpreter('var x5 = function(){return 42;}')
+ self.assertEqual(jsi.call_function('x5'), 42)
+
def test_calc(self):
jsi = JSInterpreter('function x4(a){return 2*a+1;}')
self.assertEqual(jsi.call_function('x4', 3), 7)
ThePlatformFeedIE,
RTVEALaCartaIE,
FunnyOrDieIE,
+ DemocracynowIE,
)
self.assertEqual(md5(subtitles['en']), 'c5593c193eacd353596c11c2d4f9ecc4')
+class TestDemocracynowSubtitles(BaseTestSubtitles):
+ url = 'http://www.democracynow.org/shows/2015/7/3'
+ IE = DemocracynowIE
+
+ def test_allsubtitles(self):
+ self.DL.params['writesubtitles'] = True
+ self.DL.params['allsubtitles'] = True
+ subtitles = self.getSubtitles()
+ self.assertEqual(set(subtitles.keys()), set(['en']))
+ self.assertEqual(md5(subtitles['en']), 'acaca989e24a9e45a6719c9b3d60815c')
+
+ def test_subtitles_in_page(self):
+ self.url = 'http://www.democracynow.org/2015/7/3/this_flag_comes_down_today_bree'
+ self.DL.params['writesubtitles'] = True
+ self.DL.params['allsubtitles'] = True
+ subtitles = self.getSubtitles()
+ self.assertEqual(set(subtitles.keys()), set(['en']))
+ self.assertEqual(md5(subtitles['en']), 'acaca989e24a9e45a6719c9b3d60815c')
+
+
if __name__ == '__main__':
unittest.main()
clean_html,
DateRange,
detect_exe_version,
+ determine_ext,
encodeFilename,
escape_rfc3986,
escape_url,
cli_valueless_option,
cli_bool_option,
)
+from youtube_dl.compat import (
+ compat_etree_fromstring,
+)
class TestUtil(unittest.TestCase):
self.assertEqual(unescapeHTML('%20;'), '%20;')
self.assertEqual(unescapeHTML('&#x2F;'), '/')
self.assertEqual(unescapeHTML('&#47;'), '/')
-        self.assertEqual(
-            unescapeHTML('&eacute;'), 'é')
+        self.assertEqual(unescapeHTML('&eacute;'), 'é')
+        self.assertEqual(unescapeHTML('&#2013266066;'), '&#2013266066;')
def test_daterange(self):
_20century = DateRange("19000101", "20000101")
unified_strdate('2/2/2015 6:47:40 PM', day_first=False),
'20150202')
self.assertEqual(unified_strdate('25-09-2014'), '20140925')
+ self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None)
+
+ def test_determine_ext(self):
+ self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
+ self.assertEqual(determine_ext('http://example.com/foo/bar/?download', None), None)
+ self.assertEqual(determine_ext('http://example.com/foo/bar.nonext/?download', None), None)
+ self.assertEqual(determine_ext('http://example.com/foo/bar/mp4?download', None), None)
+ self.assertEqual(determine_ext('http://example.com/foo/bar.m3u8//?download'), 'm3u8')
def test_find_xpath_attr(self):
testxml = '''<root>
<node x="b" y="d" />
<node x="" />
</root>'''
- doc = xml.etree.ElementTree.fromstring(testxml)
+ doc = compat_etree_fromstring(testxml)
self.assertEqual(find_xpath_attr(doc, './/fourohfour', 'n'), None)
self.assertEqual(find_xpath_attr(doc, './/fourohfour', 'n', 'v'), None)
<url>http://server.com/download.mp3</url>
</media:song>
</root>'''
- doc = xml.etree.ElementTree.fromstring(testxml)
+ doc = compat_etree_fromstring(testxml)
find = lambda p: doc.find(xpath_with_ns(p, {'media': 'http://example.com/'}))
self.assertTrue(find('media:song') is not None)
self.assertEqual(find('media:song/media:author').text, 'The Author')
p = xml.etree.ElementTree.SubElement(div, 'p')
p.text = 'Foo'
self.assertEqual(xpath_element(doc, 'div/p'), p)
+ self.assertEqual(xpath_element(doc, ['div/p']), p)
+ self.assertEqual(xpath_element(doc, ['div/bar', 'div/p']), p)
self.assertEqual(xpath_element(doc, 'div/bar', default='default'), 'default')
+ self.assertEqual(xpath_element(doc, ['div/bar'], default='default'), 'default')
self.assertTrue(xpath_element(doc, 'div/bar') is None)
+ self.assertTrue(xpath_element(doc, ['div/bar']) is None)
+ self.assertTrue(xpath_element(doc, ['div/bar'], 'div/baz') is None)
self.assertRaises(ExtractorError, xpath_element, doc, 'div/bar', fatal=True)
+ self.assertRaises(ExtractorError, xpath_element, doc, ['div/bar'], fatal=True)
+ self.assertRaises(ExtractorError, xpath_element, doc, ['div/bar', 'div/baz'], fatal=True)
def test_xpath_text(self):
testxml = '''<root>
<p>Foo</p>
</div>
</root>'''
- doc = xml.etree.ElementTree.fromstring(testxml)
+ doc = compat_etree_fromstring(testxml)
self.assertEqual(xpath_text(doc, 'div/p'), 'Foo')
self.assertEqual(xpath_text(doc, 'div/bar', default='default'), 'default')
self.assertTrue(xpath_text(doc, 'div/bar') is None)
<p x="a">Foo</p>
</div>
</root>'''
- doc = xml.etree.ElementTree.fromstring(testxml)
+ doc = compat_etree_fromstring(testxml)
self.assertEqual(xpath_attr(doc, 'div/p', 'x'), 'a')
self.assertEqual(xpath_attr(doc, 'div/bar', 'x'), None)
self.assertEqual(xpath_attr(doc, 'div/p', 'y'), None)
import ctypes
from .compat import (
+ compat_basestring,
compat_cookiejar,
compat_expanduser,
compat_get_terminal_size,
SameFileError,
sanitize_filename,
sanitize_path,
+ sanitized_Request,
std_headers,
subtitles_filename,
UnavailableVideoError,
writethumbnail: Write the thumbnail image to a file
write_all_thumbnails: Write all thumbnail formats to files
writesubtitles: Write the video subtitles to a file
- writeautomaticsub: Write the automatic subtitles to a file
+ writeautomaticsub: Write the automatically generated subtitles to a file
allsubtitles: Downloads all the subtitles of the video
(requires writesubtitles or writeautomaticsub)
listsubtitles: Lists all available subtitles for the video
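These docstring entries correspond to keys of the options dict passed to `YoutubeDL` (they back the `--write-sub`/`--write-auto-sub` command-line flags); a minimal, hypothetical sketch, with the URL and option values as placeholders:

```python
from __future__ import unicode_literals
from youtube_dl import YoutubeDL

# Hypothetical example: write both the manually created and the automatically
# generated (YouTube only) English subtitles next to the downloaded video.
ydl_opts = {
    'writesubtitles': True,      # manually created subtitles
    'writeautomaticsub': True,   # automatically generated subtitles
    'subtitleslangs': ['en'],
    'writethumbnail': True,
}
with YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
```
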
if v is not None)
template_dict = collections.defaultdict(lambda: 'NA', template_dict)
- outtmpl = sanitize_path(self.params.get('outtmpl', DEFAULT_OUTTMPL))
+ outtmpl = self.params.get('outtmpl', DEFAULT_OUTTMPL)
tmpl = compat_expanduser(outtmpl)
filename = tmpl % template_dict
# Temporary fix for #4787
# to workaround encoding issues with subprocess on python2 @ Windows
if sys.version_info < (3, 0) and sys.platform == 'win32':
filename = encodeFilename(filename, True).decode(preferredencoding())
- return filename
+ return sanitize_path(filename)
except ValueError as err:
self.report_error('Error in output template: ' + str(err) + ' (encoding: ' + repr(preferredencoding()) + ')')
return None
extra_info=extra)
playlist_results.append(entry_result)
ie_result['entries'] = playlist_results
+ self.to_screen('[download] Finished downloading playlist: %s' % playlist)
return ie_result
elif result_type == 'compat_list':
self.report_warning(
filter_parts.append(string)
def _remove_unused_ops(tokens):
- # Remove operators that we don't use and join them with the sourrounding strings
+ # Remove operators that we don't use and join them with the surrounding strings
# for example: 'mp4' '-' 'baseline' '-' '16x9' is converted to 'mp4-baseline-16x9'
ALLOWED_OPS = ('/', '+', ',', '(', ')')
last_string, last_start, last_end, last_line = None, None, None, None
return res
def _calc_cookies(self, info_dict):
- pr = compat_urllib_request.Request(info_dict['url'])
+ pr = sanitized_Request(info_dict['url'])
self.cookiejar.add_cookie_header(pr)
return pr.get_header('Cookie')
def urlopen(self, req):
""" Start an HTTP download """
+ if isinstance(req, compat_basestring):
+ req = sanitized_Request(req)
return self._opener.open(req, timeout=self._socket_timeout)
def print_debug_header(self):
with YoutubeDL(ydl_opts) as ydl:
# Update version
if opts.update_self:
- update_self(ydl.to_screen, opts.verbose)
+ update_self(ydl.to_screen, opts.verbose, ydl._opener)
# Remove cache dir
if opts.rm_cachedir:
import subprocess
import sys
import itertools
+import xml.etree.ElementTree
try:
except ImportError: # Python 2.6
from xml.parsers.expat import ExpatError as compat_xml_parse_error
+if sys.version_info[0] >= 3:
+ compat_etree_fromstring = xml.etree.ElementTree.fromstring
+else:
+ # python 2.x tries to encode unicode strings with ascii (see the
+ # XMLParser._fixtext method)
+ etree = xml.etree.ElementTree
+
+ try:
+ _etree_iter = etree.Element.iter
+ except AttributeError: # Python <=2.6
+ def _etree_iter(root):
+ for el in root.findall('*'):
+ yield el
+ for sub in _etree_iter(el):
+ yield sub
+
+ # on 2.6 XML doesn't have a parser argument, function copied from CPython
+ # 2.7 source
+ def _XML(text, parser=None):
+ if not parser:
+ parser = etree.XMLParser(target=etree.TreeBuilder())
+ parser.feed(text)
+ return parser.close()
+
+ def _element_factory(*args, **kwargs):
+ el = etree.Element(*args, **kwargs)
+ for k, v in el.items():
+ if isinstance(v, bytes):
+ el.set(k, v.decode('utf-8'))
+ return el
+
+ def compat_etree_fromstring(text):
+ doc = _XML(text, parser=etree.XMLParser(target=etree.TreeBuilder(element_factory=_element_factory)))
+ for el in _etree_iter(doc):
+ if el.text is not None and isinstance(el.text, bytes):
+ el.text = el.text.decode('utf-8')
+ return doc
try:
from urllib.parse import parse_qs as compat_parse_qs
'compat_chr',
'compat_cookiejar',
'compat_cookies',
+ 'compat_etree_fromstring',
'compat_expanduser',
'compat_get_terminal_size',
'compat_getenv',
min_filesize: Skip files smaller than this size
max_filesize: Skip files larger than this size
xattr_set_filesize: Set ytdl.filesize user xattribute with expected size.
- (experimenatal)
+ (experimental)
external_downloader_args: A list of additional command-line arguments for the
external downloader.
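`external_downloader_args` is normally populated from the `YoutubeDL` option of the same name; a hedged sketch assuming aria2c is installed, with the URL and the extra arguments as placeholders:

```python
from __future__ import unicode_literals
from youtube_dl import YoutubeDL

# Hypothetical example: delegate the transfer to aria2c and pass it extra
# command-line arguments (8 connections, 1M chunk size).
ydl_opts = {
    'external_downloader': 'aria2c',
    'external_downloader_args': ['-x', '8', '-k', '1M'],
}
with YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
```
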
import re
from .common import FileDownloader
-from ..compat import compat_urllib_request
+from ..utils import sanitized_Request
class DashSegmentsFD(FileDownloader):
def append_url_to_file(outf, target_url, target_name, remaining_bytes=None):
self.to_screen('[DashSegments] %s: Downloading %s' % (info_dict['id'], target_name))
- req = compat_urllib_request.Request(target_url)
+ req = sanitized_Request(target_url)
if remaining_bytes is not None:
req.add_header('Range', 'bytes=0-%d' % (remaining_bytes - 1))
import itertools
import os
import time
-import xml.etree.ElementTree as etree
from .fragment import FragmentFD
from ..compat import (
+ compat_etree_fromstring,
compat_urlparse,
compat_urllib_error,
compat_urllib_parse_urlparse,
man_url = urlh.geturl()
manifest = urlh.read()
- doc = etree.fromstring(manifest)
+ doc = compat_etree_fromstring(manifest)
formats = [(int(f.attrib.get('bitrate', -1)), f)
for f in self._get_unencrypted_media(doc)]
if requested_bitrate is None:
encodeArgument,
encodeFilename,
sanitize_open,
+ handle_youtubedl_headers,
)
if info_dict['http_headers'] and re.match(r'^https?://', url):
# Trailing \r\n after each HTTP header is important to prevent warning from ffmpeg/avconv:
# [http @ 00000000003d2fa0] No trailing CRLF found in HTTP header.
+ headers = handle_youtubedl_headers(info_dict['http_headers'])
args += [
'-headers',
- ''.join('%s: %s\r\n' % (key, val) for key, val in info_dict['http_headers'].items())]
+ ''.join('%s: %s\r\n' % (key, val) for key, val in headers.items())]
args += ['-i', url, '-f', 'mp4', '-c', 'copy', '-bsf:a', 'aac_adtstoasc']
import re
from .common import FileDownloader
-from ..compat import (
- compat_urllib_request,
- compat_urllib_error,
-)
+from ..compat import compat_urllib_error
from ..utils import (
ContentTooShortError,
encodeFilename,
sanitize_open,
+ sanitized_Request,
)
add_headers = info_dict.get('http_headers')
if add_headers:
headers.update(add_headers)
- basic_request = compat_urllib_request.Request(url, None, headers)
- request = compat_urllib_request.Request(url, None, headers)
+ basic_request = sanitized_Request(url, None, headers)
+ request = sanitized_Request(url, None, headers)
is_test = self.params.get('test', False)
return False
# Download using rtmpdump. rtmpdump returns exit code 2 when
- # the connection was interrumpted and resuming appears to be
+ # the connection was interrupted and resuming appears to be
# possible. This is part of rtmpdump's normal usage, AFAIK.
basic_args = [
'rtmpdump', '--verbose', '-r', url,
)
from .atresplayer import AtresPlayerIE
from .atttechchannel import ATTTechChannelIE
+from .audimedia import AudiMediaIE
from .audiomack import AudiomackIE, AudiomackAlbumIE
from .azubu import AzubuIE
from .baidu import BaiduVideoIE
from .bpb import BpbIE
from .br import BRIE
from .breakcom import BreakIE
-from .brightcove import BrightcoveIE
+from .brightcove import (
+ BrightcoveLegacyIE,
+ BrightcoveNewIE,
+)
from .buzzfeed import BuzzFeedIE
from .byutv import BYUtvIE
from .c56 import C56IE
from .dcn import DCNIE
from .dctp import DctpTvIE
from .deezer import DeezerPlaylistIE
+from .democracynow import DemocracynowIE
from .dfb import DFBIE
from .dhm import DHMIE
from .dotsub import DotsubIE
from .douyutv import DouyuTVIE
+from .dplay import DPlayIE
from .dramafever import (
DramaFeverIE,
DramaFeverSeriesIE,
from .giantbomb import GiantBombIE
from .giga import GigaIE
from .glide import GlideIE
-from .globo import GloboIE
+from .globo import (
+ GloboIE,
+ GloboArticleIE,
+)
from .godtube import GodTubeIE
from .goldenmoustache import GoldenMoustacheIE
from .golem import GolemIE
from .googleplus import GooglePlusIE
from .googlesearch import GoogleSearchIE
-from .gorillavid import GorillaVidIE
from .goshgay import GoshgayIE
from .groupon import GrouponIE
from .hark import HarkIE
NownessPlaylistIE,
NownessSeriesIE,
)
-from .nowtv import NowTVIE
+from .nowtv import (
+ NowTVIE,
+ NowTVListIE,
+)
from .nowvideo import NowVideoIE
from .npo import (
NPOIE,
from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
from .pbs import PBSIE
-from .periscope import (
- PeriscopeIE,
- QuickscopeIE,
-)
+from .periscope import PeriscopeIE
from .philharmoniedeparis import PhilharmonieDeParisIE
from .phoenix import PhoenixIE
from .photobucket import PhotobucketIE
from .shared import SharedIE
from .sharesix import ShareSixIE
from .sina import SinaIE
+from .skynewsarabia import (
+ SkyNewsArabiaIE,
+ SkyNewsArabiaArticleIE,
+)
from .slideshare import SlideshareIE
from .slutload import SlutloadIE
from .smotri import (
SoundcloudIE,
SoundcloudSetIE,
SoundcloudUserIE,
- SoundcloudPlaylistIE
+ SoundcloudPlaylistIE,
+ SoundcloudSearchIE
)
from .soundgasm import (
SoundgasmIE,
from .vice import ViceIE
from .viddler import ViddlerIE
from .videodetective import VideoDetectiveIE
-from .videolecturesnet import VideoLecturesNetIE
from .videofyme import VideofyMeIE
from .videomega import VideoMegaIE
from .videopremium import VideoPremiumIE
from .vidzi import VidziIE
from .vier import VierIE, VierVideosIE
from .viewster import ViewsterIE
+from .viidea import ViideaIE
from .vimeo import (
VimeoIE,
VimeoAlbumIE,
from .wsj import WSJIE
from .xbef import XBefIE
from .xboxclips import XboxClipsIE
+from .xfileshare import XFileShareIE
from .xhamster import (
XHamsterIE,
XHamsterEmbedIE,
YoutubeTruncatedIDIE,
YoutubeTruncatedURLIE,
YoutubeUserIE,
+ YoutubeUserPlaylistsIE,
YoutubeWatchLaterIE,
)
from .zapiks import ZapiksIE
'description': 'As a birth attendant advocating for family planning, Remy is on the frontline of Tondo\'s battle with overcrowding.',
'uploader': 'Al Jazeera English',
},
- 'add_ie': ['Brightcove'],
+ 'add_ie': ['BrightcoveLegacy'],
'skip': 'Not accessible from Travis CI server',
}
'playerKey=AQ~~%2CAAAAmtVJIFk~%2CTVGOQ5ZTwJbeMWnq5d_H4MOM57xfzApc'
'&%40videoPlayer={0}'.format(brightcove_id)
),
- 'ie_key': 'Brightcove',
+ 'ie_key': 'BrightcoveLegacy',
}
parse_duration,
unified_strdate,
xpath_text,
- parse_xml,
)
+from ..compat import compat_etree_fromstring
class ARDMediathekIE(InfoExtractor):
raise ExtractorError('This program is only suitable for those aged 12 and older. Video %s is therefore only available between 20 pm and 6 am.' % video_id, expected=True)
if re.search(r'[\?&]rss($|[=&])', url):
- doc = parse_xml(webpage)
+ doc = compat_etree_fromstring(webpage.encode('utf-8'))
if doc.tag == 'rss':
return GenericIE()._extract_rss(url, video_id, doc)
from ..compat import (
compat_str,
compat_urllib_parse,
- compat_urllib_request,
)
from ..utils import (
int_or_none,
float_or_none,
+ sanitized_Request,
xpath_text,
ExtractorError,
)
'j_password': password,
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
response = self._download_webpage(
formats = []
for fmt in ['windows', 'android_tablet']:
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
self._URL_VIDEO_TEMPLATE.format(fmt, episode_id, timestamp_shifted, token))
request.add_header('User-Agent', self._USER_AGENT)
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+ int_or_none,
+ parse_iso8601,
+ sanitized_Request,
+)
+
+
+class AudiMediaIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?audimedia\.tv/(?:en|de)/vid/(?P<id>[^/?#]+)'
+ _TEST = {
+ 'url': 'https://audimedia.tv/en/vid/60-seconds-of-audi-sport-104-2015-wec-bahrain-rookie-test',
+ 'md5': '79a8b71c46d49042609795ab59779b66',
+ 'info_dict': {
+ 'id': '1564',
+ 'ext': 'mp4',
+ 'title': '60 Seconds of Audi Sport 104/2015 - WEC Bahrain, Rookie Test',
+ 'description': 'md5:60e5d30a78ced725f7b8d34370762941',
+ 'upload_date': '20151124',
+ 'timestamp': 1448354940,
+ 'duration': 74022,
+ 'view_count': int,
+ }
+ }
+ # extracted from https://audimedia.tv/assets/embed/embedded-player.js (dataSourceAuthToken)
+ _AUTH_TOKEN = 'e25b42847dba18c6c8816d5d8ce94c326e06823ebf0859ed164b3ba169be97f2'
+
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+ webpage = self._download_webpage(url, display_id)
+
+ raw_payload = self._search_regex(r'<script[^>]+class="amtv-embed"[^>]+id="([^"]+)"', webpage, 'raw payload')
+ _, stage_mode, video_id, lang = raw_payload.split('-')
+
+ # TODO: handle s and e stage_mode (live streams and ended live streams)
+ if stage_mode not in ('s', 'e'):
+ request = sanitized_Request(
+ 'https://audimedia.tv/api/video/v1/videos/%s?embed[]=video_versions&embed[]=thumbnail_image&where[content_language_iso]=%s' % (video_id, lang),
+ headers={'X-Auth-Token': self._AUTH_TOKEN})
+ json_data = self._download_json(request, video_id)['results']
+ formats = []
+
+ stream_url_hls = json_data.get('stream_url_hls')
+ if stream_url_hls:
+ m3u8_formats = self._extract_m3u8_formats(stream_url_hls, video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)
+ if m3u8_formats:
+ formats.extend(m3u8_formats)
+
+ stream_url_hds = json_data.get('stream_url_hds')
+ if stream_url_hds:
+ f4m_formats = self._extract_f4m_formats(json_data.get('stream_url_hds') + '?hdcore=3.4.0', video_id, -1, f4m_id='hds', fatal=False)
+ if f4m_formats:
+ formats.extend(f4m_formats)
+
+ for video_version in json_data.get('video_versions'):
+ video_version_url = video_version.get('download_url') or video_version.get('stream_url')
+ if not video_version_url:
+ continue
+ formats.append({
+ 'url': video_version_url,
+ 'width': int_or_none(video_version.get('width')),
+ 'height': int_or_none(video_version.get('height')),
+ 'abr': int_or_none(video_version.get('audio_bitrate')),
+ 'vbr': int_or_none(video_version.get('video_bitrate')),
+ })
+ self._sort_formats(formats)
+
+ return {
+ 'id': video_id,
+ 'title': json_data['title'],
+ 'description': json_data.get('subtitle'),
+ 'thumbnail': json_data.get('thumbnail_image', {}).get('file'),
+ 'timestamp': parse_iso8601(json_data.get('publication_date')),
+ 'duration': int_or_none(json_data.get('duration')),
+ 'view_count': int_or_none(json_data.get('view_count')),
+ 'formats': formats,
+ }
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
- compat_urllib_request,
compat_str,
)
from ..utils import (
ExtractorError,
int_or_none,
float_or_none,
+ sanitized_Request,
)
'pass': password,
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
request.add_header('Referer', self._LOGIN_URL)
response = self._download_webpage(
'&sort=created&access_mode=0%2C1%2C2&limit={count}'
'&method=broadcast&format=json&vid_older_than={last}'
).format(user=user, count=self._STEP, last=last_id)
- req = compat_urllib_request.Request(req_url)
+ req = sanitized_Request(req_url)
# Without setting this header, we wouldn't get any result
req.add_header('Referer', 'http://bambuser.com/channel/%s' % user)
data = self._download_json(
from __future__ import unicode_literals
import re
-import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
remove_end,
unescapeHTML,
)
-from ..compat import compat_HTTPError
+from ..compat import (
+ compat_etree_fromstring,
+ compat_HTTPError,
+)
class BBCCoUkIE(InfoExtractor):
IE_NAME = 'bbc.co.uk'
IE_DESC = 'BBC iPlayer'
- _VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:(?:programmes/(?!articles/)|iplayer(?:/[^/]+)?/(?:episode/|playlist/))|music/clips[/#])(?P<id>[\da-z]{8})'
+ _ID_REGEX = r'[pb][\da-z]{7}'
+ _VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:(?:programmes/(?!articles/)|iplayer(?:/[^/]+)?/(?:episode/|playlist/))|music/clips[/#])(?P<id>%s)' % _ID_REGEX
_MEDIASELECTOR_URLS = [
# Provides HQ HLS streams with even better quality than pc mediaset but fails
# with geolocation in some cases even when it's not geo restricted at all (e.g.
- # http://www.bbc.co.uk/programmes/b06bp7lf)
+ # http://www.bbc.co.uk/programmes/b06bp7lf). Also may fail with selectionunavailable.
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/%s',
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/pc/vpid/%s',
]
return self._download_media_selector_url(
mediaselector_url % programme_id, programme_id)
except BBCCoUkIE.MediaSelectionError as e:
- if e.id in ('notukerror', 'geolocation'):
+ if e.id in ('notukerror', 'geolocation', 'selectionunavailable'):
last_exception = e
continue
self._raise_extractor_error(e)
media_selection = self._download_xml(
url, programme_id, 'Downloading media selection XML')
except ExtractorError as ee:
- if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
- media_selection = xml.etree.ElementTree.fromstring(ee.cause.read().decode('utf-8'))
+ if isinstance(ee.cause, compat_HTTPError) and ee.cause.code in (403, 404):
+ media_selection = compat_etree_fromstring(ee.cause.read().decode('utf-8'))
else:
raise
return self._process_media_selector(media_selection, programme_id)
if not programme_id:
programme_id = self._search_regex(
- r'"vpid"\s*:\s*"([\da-z]{8})"', webpage, 'vpid', fatal=False, default=None)
+ r'"vpid"\s*:\s*"(%s)"' % self._ID_REGEX, webpage, 'vpid', fatal=False, default=None)
if programme_id:
formats, subtitles = self._download_media_selector(programme_id)
# single video story (e.g. http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
programme_id = self._search_regex(
- [r'data-video-player-vpid="([\da-z]{8})"',
- r'<param[^>]+name="externalIdentifier"[^>]+value="([\da-z]{8})"'],
+ [r'data-video-player-vpid="(%s)"' % self._ID_REGEX,
+ r'<param[^>]+name="externalIdentifier"[^>]+value="(%s)"' % self._ID_REGEX,
+ r'videoId\s*:\s*["\'](%s)["\']' % self._ID_REGEX],
webpage, 'vpid', default=None)
if programme_id:
# Multiple video article (e.g.
# http://www.bbc.co.uk/blogs/adamcurtis/entries/3662a707-0af9-3149-963f-47bea720b460)
- EMBED_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:[^/]+/)+[\da-z]{8}(?:\b[^"]+)?'
+ EMBED_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:[^/]+/)+%s(?:\b[^"]+)?' % self._ID_REGEX
entries = []
for match in extract_all(r'new\s+SMP\(({.+?})\)'):
embed_url = match.get('playerSettings', {}).get('externalEmbedUrl')
from __future__ import unicode_literals
from .common import InfoExtractor
+from ..compat import (
+ compat_chr,
+ compat_ord,
+ compat_urllib_parse_unquote,
+)
from ..utils import (
int_or_none,
parse_iso8601,
video_id = self._match_id(url)
video = self._download_json(
- 'http://beeg.com/api/v1/video/%s' % video_id, video_id)
+ 'http://beeg.com/api/v3/video/%s' % video_id, video_id)
+
+ def decrypt_key(key):
+ # Reverse engineered from http://static.beeg.com/cpl/1067.js
+ a = '8RPUUCS35ZWp3ADnKcSmpH71ZusrROo'
+ e = compat_urllib_parse_unquote(key)
+ return ''.join([
+ compat_chr(compat_ord(e[n]) - compat_ord(a[n % len(a)]) % 25)
+ for n in range(len(e))])
+
+ def decrypt_url(encrypted_url):
+ encrypted_url = self._proto_relative_url(
+ encrypted_url.replace('{DATA_MARKERS}', ''), 'http:')
+ key = self._search_regex(
+ r'/key=(.*?)%2Cend=', encrypted_url, 'key', default=None)
+ if not key:
+ return encrypted_url
+ return encrypted_url.replace(key, decrypt_key(key))
formats = []
for format_id, video_url in video.items():
+ if not video_url:
+ continue
height = self._search_regex(
r'^(\d+)[pP]$', format_id, 'height', default=None)
if not height:
continue
formats.append({
- 'url': self._proto_relative_url(video_url.replace('{DATA_MARKERS}', ''), 'http:'),
+ 'url': decrypt_url(video_url),
'format_id': format_id,
'height': int(height),
})
from __future__ import unicode_literals
import re
-import itertools
-import json
-import xml.etree.ElementTree as ET
from .common import InfoExtractor
+from ..compat import compat_str
from ..utils import (
int_or_none,
- unified_strdate,
+ unescapeHTML,
ExtractorError,
+ xpath_text,
)
class BiliBiliIE(InfoExtractor):
- _VALID_URL = r'http://www\.bilibili\.(?:tv|com)/video/av(?P<id>[0-9]+)/'
+ _VALID_URL = r'http://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)(?:/index_(?P<page_num>\d+).html)?'
_TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/',
'md5': '2c301e4dab317596e837c3e7633e7d86',
'info_dict': {
- 'id': '1074402_part1',
+ 'id': '1554319',
'ext': 'flv',
'title': '【金坷垃】金泡沫',
- 'duration': 308,
+ 'duration': 308313,
'upload_date': '20140420',
'thumbnail': 're:^https?://.+\.jpg',
+ 'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
+ 'timestamp': 1397983878,
+ 'uploader': '菊子桑',
},
}, {
'url': 'http://www.bilibili.com/video/av1041170/',
'info_dict': {
'id': '1041170',
'title': '【BD1080P】刀语【诸神&异域】',
+ 'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~',
+ 'uploader': '枫叶逝去',
+ 'timestamp': 1396501299,
},
'playlist_count': 9,
}]
def _real_extract(self, url):
- video_id = self._match_id(url)
- webpage = self._download_webpage(url, video_id)
-
- if '(此视频不存在或被删除)' in webpage:
- raise ExtractorError(
- 'The video does not exist or was deleted', expected=True)
-
- if '>你没有权限浏览! 由于版权相关问题 我们不对您所在的地区提供服务<' in webpage:
- raise ExtractorError(
- 'The video is not available in your region due to copyright reasons',
- expected=True)
-
- video_code = self._search_regex(
- r'(?s)<div itemprop="video".*?>(.*?)</div>', webpage, 'video code')
-
- title = self._html_search_meta(
- 'media:title', video_code, 'title', fatal=True)
- duration_str = self._html_search_meta(
- 'duration', video_code, 'duration')
- if duration_str is None:
- duration = None
- else:
- duration_mobj = re.match(
- r'^T(?:(?P<hours>[0-9]+)H)?(?P<minutes>[0-9]+)M(?P<seconds>[0-9]+)S$',
- duration_str)
- duration = (
- int_or_none(duration_mobj.group('hours'), default=0) * 3600 +
- int(duration_mobj.group('minutes')) * 60 +
- int(duration_mobj.group('seconds')))
- upload_date = unified_strdate(self._html_search_meta(
- 'uploadDate', video_code, fatal=False))
- thumbnail = self._html_search_meta(
- 'thumbnailUrl', video_code, 'thumbnail', fatal=False)
-
- cid = self._search_regex(r'cid=(\d+)', webpage, 'cid')
-
- entries = []
-
- lq_page = self._download_webpage(
- 'http://interface.bilibili.com/v_cdn_play?appkey=1&cid=%s' % cid,
- video_id,
- note='Downloading LQ video info'
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id')
+ page_num = mobj.group('page_num') or '1'
+
+ view_data = self._download_json(
+ 'http://api.bilibili.com/view?type=json&appkey=8e9fc618fbd41e28&id=%s&page=%s' % (video_id, page_num),
+ video_id)
+ if 'error' in view_data:
+ raise ExtractorError('%s said: %s' % (self.IE_NAME, view_data['error']), expected=True)
+
+ cid = view_data['cid']
+ title = unescapeHTML(view_data['title'])
+
+ doc = self._download_xml(
+ 'http://interface.bilibili.com/v_cdn_play?appkey=8e9fc618fbd41e28&cid=%s' % cid,
+ cid,
+ 'Downloading page %s/%s' % (page_num, view_data['pages'])
)
- try:
- err_info = json.loads(lq_page)
- raise ExtractorError(
- 'BiliBili said: ' + err_info['error_text'], expected=True)
- except ValueError:
- pass
- lq_doc = ET.fromstring(lq_page)
- lq_durls = lq_doc.findall('./durl')
+ if xpath_text(doc, './result') == 'error':
+ raise ExtractorError('%s said: %s' % (self.IE_NAME, xpath_text(doc, './message')), expected=True)
- hq_doc = self._download_xml(
- 'http://interface.bilibili.com/playurl?appkey=1&cid=%s' % cid,
- video_id,
- note='Downloading HQ video info',
- fatal=False,
- )
- if hq_doc is not False:
- hq_durls = hq_doc.findall('./durl')
- assert len(lq_durls) == len(hq_durls)
- else:
- hq_durls = itertools.repeat(None)
+ entries = []
- i = 1
- for lq_durl, hq_durl in zip(lq_durls, hq_durls):
+ for durl in doc.findall('./durl'):
+ size = xpath_text(durl, ['./filesize', './size'])
formats = [{
- 'format_id': 'lq',
- 'quality': 1,
- 'url': lq_durl.find('./url').text,
- 'filesize': int_or_none(
- lq_durl.find('./size'), get_attr='text'),
+ 'url': durl.find('./url').text,
+ 'filesize': int_or_none(size),
+ 'ext': 'flv',
}]
- if hq_durl is not None:
- formats.append({
- 'format_id': 'hq',
- 'quality': 2,
- 'ext': 'flv',
- 'url': hq_durl.find('./url').text,
- 'filesize': int_or_none(
- hq_durl.find('./size'), get_attr='text'),
- })
- self._sort_formats(formats)
+ backup_urls = durl.find('./backup_url')
+ if backup_urls is not None:
+ for backup_url in backup_urls.findall('./url'):
+ formats.append({'url': backup_url.text})
+ formats.reverse()
entries.append({
- 'id': '%s_part%d' % (video_id, i),
+ 'id': '%s_part%s' % (cid, xpath_text(durl, './order')),
'title': title,
+ 'duration': int_or_none(xpath_text(durl, './length'), 1000),
'formats': formats,
- 'duration': duration,
- 'upload_date': upload_date,
- 'thumbnail': thumbnail,
})
- i += 1
-
- return {
- '_type': 'multi_video',
- 'entries': entries,
- 'id': video_id,
- 'title': title
+ info = {
+ 'id': compat_str(cid),
+ 'title': title,
+ 'description': view_data.get('description'),
+ 'thumbnail': view_data.get('pic'),
+ 'uploader': view_data.get('author'),
+ 'timestamp': int_or_none(view_data.get('created')),
+ 'view_count': int_or_none(view_data.get('play')),
+ 'duration': int_or_none(xpath_text(doc, './timelength')),
}
+
+ if len(entries) == 1:
+ entries[0].update(info)
+ return entries[0]
+ else:
+ info.update({
+ '_type': 'multi_video',
+ 'id': video_id,
+ 'entries': entries,
+ })
+ return info
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
- compat_urlparse,
-)
+from ..compat import compat_urlparse
from ..utils import (
clean_html,
int_or_none,
parse_iso8601,
+ sanitized_Request,
unescapeHTML,
xpath_text,
xpath_with_ns,
for lang, url in subtitles_urls.items():
# For some weird reason, blip.tv serves a video instead of subtitles
# when we request with a common UA
- req = compat_urllib_request.Request(url)
+ req = sanitized_Request(url)
req.add_header('User-Agent', 'youtube-dl')
subtitles[lang] = [{
# The extension is 'srt' but it's actually an 'ass' file
class BloombergIE(InfoExtractor):
- _VALID_URL = r'https?://www\.bloomberg\.com/news/videos/[^/]+/(?P<id>[^/?#]+)'
+ _VALID_URL = r'https?://(?:www\.)?bloomberg\.com/(?:[^/]+/)*(?P<id>[^/?#]+)'
- _TEST = {
+ _TESTS = [{
'url': 'http://www.bloomberg.com/news/videos/b/aaeae121-5949-481e-a1ce-4562db6f5df2',
# The md5 checksum changes
'info_dict': {
'title': 'Shah\'s Presentation on Foreign-Exchange Strategies',
'description': 'md5:a8ba0302912d03d246979735c17d2761',
},
- }
+ }, {
+ 'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://www.bloomberg.com/politics/videos/2015-11-25/karl-rove-on-jeb-bush-s-struggles-stopping-trump',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
name = self._match_id(url)
webpage = self._download_webpage(url, name)
- video_id = self._search_regex(r'"bmmrId":"(.+?)"', webpage, 'id')
+ video_id = self._search_regex(
+ r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>.+?)\1',
+ webpage, 'id', group='url')
title = re.sub(': Video$', '', self._og_search_title(webpage))
embed_info = self._download_json(
'http://www.bloomberg.com/api/embed?id=%s' % video_id, video_id)
formats = []
for stream in embed_info['streams']:
- if stream["muxing_format"] == "TS":
- formats.extend(self._extract_m3u8_formats(stream['url'], video_id))
+ stream_url = stream.get('url')
+ if not stream_url:
+ continue
+ if stream['muxing_format'] == 'TS':
+ m3u8_formats = self._extract_m3u8_formats(
+ stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
+ if m3u8_formats:
+ formats.extend(m3u8_formats)
else:
- formats.extend(self._extract_f4m_formats(stream['url'], video_id))
+ f4m_formats = self._extract_f4m_formats(
+ stream_url, video_id, f4m_id='hds', fatal=False)
+ if f4m_formats:
+ formats.extend(f4m_formats)
self._sort_formats(formats)
return {
import re
import json
-import xml.etree.ElementTree
from .common import InfoExtractor
from ..compat import (
+ compat_etree_fromstring,
compat_parse_qs,
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlparse,
- compat_urllib_request,
compat_urlparse,
compat_xml_parse_error,
)
ExtractorError,
find_xpath_attr,
fix_xml_ampersands,
+ float_or_none,
+ js_to_json,
+ int_or_none,
+ parse_iso8601,
+ sanitized_Request,
unescapeHTML,
unsmuggle_url,
)
-class BrightcoveIE(InfoExtractor):
+class BrightcoveLegacyIE(InfoExtractor):
+ IE_NAME = 'brightcove:legacy'
_VALID_URL = r'(?:https?://.*brightcove\.com/(services|viewer).*?\?|brightcove:)(?P<query>.*)'
_FEDERATED_URL_TEMPLATE = 'http://c.brightcove.com/services/viewer/htmlFederated?%s'
object_str = fix_xml_ampersands(object_str)
try:
- object_doc = xml.etree.ElementTree.fromstring(object_str.encode('utf-8'))
+ object_doc = compat_etree_fromstring(object_str.encode('utf-8'))
except compat_xml_parse_error:
return
def _get_video_info(self, video_id, query_str, query, referer=None):
request_url = self._FEDERATED_URL_TEMPLATE % query_str
- req = compat_urllib_request.Request(request_url)
+ req = sanitized_Request(request_url)
linkBase = query.get('linkBaseURL')
if linkBase is not None:
referer = linkBase[0]
if 'url' not in info and not info.get('formats'):
raise ExtractorError('Unable to extract video url for %s' % info['id'])
return info
+
+
+class BrightcoveNewIE(InfoExtractor):
+ IE_NAME = 'brightcove:new'
+ _VALID_URL = r'https?://players\.brightcove\.net/(?P<account_id>\d+)/(?P<player_id>[^/]+)_(?P<embed>[^/]+)/index\.html\?.*videoId=(?P<video_id>\d+)'
+ _TESTS = [{
+ 'url': 'http://players.brightcove.net/929656772001/e41d32dc-ec74-459e-a845-6c69f7b724ea_default/index.html?videoId=4463358922001',
+ 'md5': 'c8100925723840d4b0d243f7025703be',
+ 'info_dict': {
+ 'id': '4463358922001',
+ 'ext': 'mp4',
+ 'title': 'Meet the man behind Popcorn Time',
+ 'description': 'md5:eac376a4fe366edc70279bfb681aea16',
+ 'duration': 165.768,
+ 'timestamp': 1441391203,
+ 'upload_date': '20150904',
+ 'uploader_id': '929656772001',
+ 'formats': 'mincount:22',
+ },
+ }, {
+ # with rtmp streams
+ 'url': 'http://players.brightcove.net/4036320279001/5d112ed9-283f-485f-a7f9-33f42e8bc042_default/index.html?videoId=4279049078001',
+ 'info_dict': {
+ 'id': '4279049078001',
+ 'ext': 'mp4',
+ 'title': 'Titansgrave: Chapter 0',
+ 'description': 'Titansgrave: Chapter 0',
+ 'duration': 1242.058,
+ 'timestamp': 1433556729,
+ 'upload_date': '20150606',
+ 'uploader_id': '4036320279001',
+ 'formats': 'mincount:41',
+ },
+ 'params': {
+ 'skip_download': True,
+ }
+ }]
+
+ @staticmethod
+ def _extract_urls(webpage):
+ # Reference:
+ # 1. http://docs.brightcove.com/en/video-cloud/brightcove-player/guides/publish-video.html#setvideoiniframe
+ # 2. http://docs.brightcove.com/en/video-cloud/brightcove-player/guides/publish-video.html#setvideousingjavascript)
+ # 3. http://docs.brightcove.com/en/video-cloud/brightcove-player/guides/embed-in-page.html
+
+ entries = []
+
+ # Look for iframe embeds [1]
+ for _, url in re.findall(
+ r'<iframe[^>]+src=(["\'])((?:https?:)//players\.brightcove\.net/\d+/[^/]+/index\.html.+?)\1', webpage):
+ entries.append(url)
+
+ # Look for embed_in_page embeds [2]
+ for video_id, account_id, player_id, embed in re.findall(
+ # According to examples from [3] it's unclear whether video id
+ # may be optional and what to do when it is
+ r'''(?sx)
+ <video[^>]+
+ data-video-id=["\'](\d+)["\'][^>]*>.*?
+ </video>.*?
+ <script[^>]+
+ src=["\'](?:https?:)?//players\.brightcove\.net/
+ (\d+)/([\da-f-]+)_([^/]+)/index\.min\.js
+ ''', webpage):
+ entries.append(
+ 'http://players.brightcove.net/%s/%s_%s/index.html?videoId=%s'
+ % (account_id, player_id, embed, video_id))
+
+ return entries
+
+ def _real_extract(self, url):
+ account_id, player_id, embed, video_id = re.match(self._VALID_URL, url).groups()
+
+ webpage = self._download_webpage(
+ 'http://players.brightcove.net/%s/%s_%s/index.min.js'
+ % (account_id, player_id, embed), video_id)
+
+ policy_key = None
+
+ catalog = self._search_regex(
+ r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
+ if catalog:
+ catalog = self._parse_json(
+ js_to_json(catalog), video_id, fatal=False)
+ if catalog:
+ policy_key = catalog.get('policyKey')
+
+ if not policy_key:
+ policy_key = self._search_regex(
+ r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
+ webpage, 'policy key', group='pk')
+
+ req = sanitized_Request(
+ 'https://edge.api.brightcove.com/playback/v1/accounts/%s/videos/%s'
+ % (account_id, video_id),
+ headers={'Accept': 'application/json;pk=%s' % policy_key})
+ json_data = self._download_json(req, video_id)
+
+ title = json_data['name']
+
+ formats = []
+ for source in json_data.get('sources', []):
+ source_type = source.get('type')
+ src = source.get('src')
+ if source_type == 'application/x-mpegURL':
+ if not src:
+ continue
+ m3u8_formats = self._extract_m3u8_formats(
+ src, video_id, 'mp4', entry_protocol='m3u8_native',
+ m3u8_id='hls', fatal=False)
+ if m3u8_formats:
+ formats.extend(m3u8_formats)
+ else:
+ streaming_src = source.get('streaming_src')
+ stream_name, app_name = source.get('stream_name'), source.get('app_name')
+ if not src and not streaming_src and (not stream_name or not app_name):
+ continue
+ tbr = float_or_none(source.get('avg_bitrate'), 1000)
+ height = int_or_none(source.get('height'))
+ f = {
+ 'tbr': tbr,
+ 'width': int_or_none(source.get('width')),
+ 'height': height,
+ 'filesize': int_or_none(source.get('size')),
+ 'container': source.get('container'),
+ 'vcodec': source.get('codec'),
+ 'ext': source.get('container').lower(),
+ }
+
+ def build_format_id(kind):
+ format_id = kind
+ if tbr:
+ format_id += '-%dk' % int(tbr)
+ if height:
+ format_id += '-%dp' % height
+ return format_id
+
+ if src or streaming_src:
+ f.update({
+ 'url': src or streaming_src,
+ 'format_id': build_format_id('http' if src else 'http-streaming'),
+ 'preference': 2 if src else 1,
+ })
+ else:
+ f.update({
+ 'url': app_name,
+ 'play_path': stream_name,
+ 'format_id': build_format_id('rtmp'),
+ })
+ formats.append(f)
+ self._sort_formats(formats)
+
+ description = json_data.get('description')
+ thumbnail = json_data.get('thumbnail')
+ timestamp = parse_iso8601(json_data.get('published_at'))
+ duration = float_or_none(json_data.get('duration'), 1000)
+ tags = json_data.get('tags', [])
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'description': description,
+ 'thumbnail': thumbnail,
+ 'duration': duration,
+ 'timestamp': timestamp,
+ 'uploader_id': account_id,
+ 'formats': formats,
+ 'tags': tags,
+ }
from __future__ import unicode_literals
from .common import InfoExtractor
+from ..utils import (
+ sanitized_Request,
+ smuggle_url,
+)
class CBSIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
- webpage = self._download_webpage(url, display_id)
+ request = sanitized_Request(url)
+ # Android UA is served with higher quality (720p) streams (see
+ # https://github.com/rg3/youtube-dl/issues/7490)
+ request.add_header('User-Agent', 'Mozilla/5.0 (Linux; Android 4.4; Nexus 5)')
+ webpage = self._download_webpage(request, display_id)
real_id = self._search_regex(
[r"video\.settings\.pid\s*=\s*'([^']+)';", r"cbsplayer\.pid\s*=\s*'([^']+)';"],
webpage, 'real video ID')
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
- 'url': 'theplatform:%s' % real_id,
+ 'url': smuggle_url(
+ 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true&manifest=m3u' % real_id,
+ {'force_smil_url': True}),
'display_id': display_id,
}
'format_id': format_id,
}
if uri.startswith('rtmp'):
+ play_path = re.sub(
+ r'{slistFilePath}', '',
+ uri.split('<break>')[-1].split('{break}')[-1])
fmt.update({
'app': 'ondemand?auth=cbs',
- 'play_path': 'mp4:' + uri.split('<break>')[-1],
+ 'play_path': 'mp4:' + play_path,
'player_url': 'http://www.cbsnews.com/[[IMPORT]]/vidtech.cbsinteractive.com/player/3_3_0/CBSI_PLAYER_HD.swf',
'page_url': 'http://www.cbsnews.com',
'ext': 'flv',
from .common import InfoExtractor
from ..compat import (
- compat_urllib_request,
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
from ..utils import (
ExtractorError,
float_or_none,
+ sanitized_Request,
)
'requestSource': 'iVysilani',
}
- req = compat_urllib_request.Request(
+ req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=compat_urllib_parse.urlencode(data))
if playlist_url == 'error_region':
raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
- req = compat_urllib_request.Request(compat_urllib_parse_unquote(playlist_url))
+ req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
req.add_header('Referer', url)
playlist_title = self._og_search_title(webpage)
class CMTIE(MTVIE):
IE_NAME = 'cmt.com'
- _VALID_URL = r'https?://www\.cmt\.com/videos/.+?/(?P<videoid>[^/]+)\.jhtml'
+ _VALID_URL = r'https?://www\.cmt\.com/(?:videos|shows)/(?:[^/]+/)*(?P<videoid>\d+)'
_FEED_URL = 'http://www.cmt.com/sitewide/apps/player/embed/rss/'
_TESTS = [{
'title': 'Garth Brooks - "The Call (featuring Trisha Yearwood)"',
'description': 'Blame It All On My Roots',
},
+ }, {
+ 'url': 'http://www.cmt.com/shows/party-down-south/party-down-south-ep-407-gone-girl/1738172/playlist/#id=1738172',
+ 'only_matching': True,
}]
import json
from .common import InfoExtractor
-from ..compat import compat_urllib_request
from ..utils import (
float_or_none,
int_or_none,
+ sanitized_Request,
)
}
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'http://collegerama.tudelft.nl/Mediasite/PlayerService/PlayerService.svc/json/GetPlayerOptions',
json.dumps(player_options_request))
request.add_header('Content-Type', 'application/json')
import socket
import sys
import time
-import xml.etree.ElementTree
from ..compat import (
compat_cookiejar,
compat_urllib_error,
compat_urllib_parse,
compat_urllib_parse_urlparse,
- compat_urllib_request,
compat_urlparse,
compat_str,
+ compat_etree_fromstring,
)
from ..utils import (
NO_DEFAULT,
int_or_none,
RegexNotFoundError,
sanitize_filename,
+ sanitized_Request,
unescapeHTML,
unified_strdate,
url_basename,
"ext" will be calculated from URL if missing
automatic_captions: Like 'subtitles', used by the YoutubeIE for
automatically generated captions
- duration: Length of the video in seconds, as an integer.
+ duration: Length of the video in seconds, as an integer or float.
view_count: How many users have watched the video on the platform.
like_count: Number of positive ratings of the video
dislike_count: Number of negative ratings of the video
@classmethod
def ie_key(cls):
"""A string for getting the InfoExtractor with get_info_extractor"""
- return cls.__name__[:-2]
+ return compat_str(cls.__name__[:-2])
@property
def IE_NAME(self):
- return type(self).__name__[:-2]
+ return compat_str(type(self).__name__[:-2])
def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True):
""" Returns the response handle """
return xml_string
if transform_source:
xml_string = transform_source(xml_string)
- return xml.etree.ElementTree.fromstring(xml_string.encode('utf-8'))
+ return compat_etree_fromstring(xml_string.encode('utf-8'))
def _download_json(self, url_or_request, video_id,
note='Downloading JSON metadata',
if not media_nodes:
manifest_version = '2.0'
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/2.0}media')
+ base_url = xpath_text(
+ manifest, ['{http://ns.adobe.com/f4m/1.0}baseURL', '{http://ns.adobe.com/f4m/2.0}baseURL'],
+ 'base URL', default=None)
+ if base_url:
+ base_url = base_url.strip()
for i, media_el in enumerate(media_nodes):
if manifest_version == '2.0':
media_url = media_el.attrib.get('href') or media_el.attrib.get('url')
continue
manifest_url = (
media_url if media_url.startswith('http://') or media_url.startswith('https://')
- else ('/'.join(manifest_url.split('/')[:-1]) + '/' + media_url))
+ else ((base_url or '/'.join(manifest_url.split('/')[:-1])) + '/' + media_url))
# If media_url is itself a f4m manifest do the recursive extraction
# since bitrates in parent manifest (this one) and media_url manifest
# may differ leading to inability to resolve the format by requested
if re.match(r'^https?://', u)
else compat_urlparse.urljoin(m3u8_url, u))
- m3u8_doc, urlh = self._download_webpage_handle(
+ res = self._download_webpage_handle(
m3u8_url, video_id,
note=note or 'Downloading m3u8 information',
errnote=errnote or 'Failed to download m3u8 information',
fatal=fatal)
- if m3u8_doc is False:
- return m3u8_doc
+ if res is False:
+ return res
+ m3u8_doc, urlh = res
m3u8_url = urlh.geturl()
last_info = None
last_media = None
def _get_cookies(self, url):
""" Return a compat_cookies.SimpleCookie with the cookies for the url """
- req = compat_urllib_request.Request(url)
+ req = sanitized_Request(url)
self._downloader.cookiejar.add_cookie_header(req)
return compat_cookies.SimpleCookie(req.get_header('Cookie'))
import json
import base64
import zlib
-import xml.etree.ElementTree
from hashlib import sha1
from math import pow, sqrt, floor
from .common import InfoExtractor
from ..compat import (
+ compat_etree_fromstring,
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_request,
bytes_to_intlist,
intlist_to_bytes,
int_or_none,
+ lowercase_escape,
remove_end,
+ sanitized_Request,
unified_strdate,
urlencode_postdata,
xpath_text,
'name': username,
'password': password,
})
- login_request = compat_urllib_request.Request(login_url, data)
+ login_request = sanitized_Request(login_url, data)
login_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self._download_webpage(login_request, None, False, 'Wrong login info')
def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None):
request = (url_or_request if isinstance(url_or_request, compat_urllib_request.Request)
- else compat_urllib_request.Request(url_or_request))
+ else sanitized_Request(url_or_request))
# Accept-Language must be set explicitly to accept any language to avoid issues
# similar to https://github.com/rg3/youtube-dl/issues/6797.
# Along with IP address Crunchyroll uses Accept-Language to guess whether georestriction
'id': '589804',
'ext': 'flv',
'title': 'Culture Japan Episode 1 – Rebuilding Japan after the 3.11',
- 'description': 'md5:fe2743efedb49d279552926d0bd0cd9e',
+ 'description': 'md5:2fbc01f90b87e8e9137296f37b461c12',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Danny Choo Network',
'upload_date': '20120213',
return output
def _extract_subtitles(self, subtitle):
- sub_root = xml.etree.ElementTree.fromstring(subtitle)
+ sub_root = compat_etree_fromstring(subtitle)
return [{
'ext': 'srt',
'data': self._convert_subtitles_to_srt(sub_root),
if 'To view this, please log in to verify you are 18 or older.' in webpage:
self.raise_login_required()
- video_title = self._html_search_regex(r'<h1[^>]*>(.+?)</h1>', webpage, 'video_title', flags=re.DOTALL)
+ video_title = self._html_search_regex(
+ r'(?s)<h1[^>]*>((?:(?!<h1).)*?<span[^>]+itemprop=["\']title["\'][^>]*>(?:(?!<h1).)+?)</h1>',
+ webpage, 'video_title')
video_title = re.sub(r' {2,}', ' ', video_title)
- video_description = self._html_search_regex(r'"description":"([^"]+)', webpage, 'video_description', default='')
- if not video_description:
- video_description = None
+ video_description = self._html_search_regex(
+ r'<script[^>]*>\s*.+?\[media_id=%s\].+?"description"\s*:\s*"([^"]+)' % video_id,
+ webpage, 'description', default=None)
+ if video_description:
+ video_description = lowercase_escape(video_description.replace(r'\r\n', '\n'))
video_upload_date = self._html_search_regex(
[r'<div>Availability for free users:(.+?)</div>', r'<div>[^<>]+<span>\s*(.+?\d{4})\s*</span></div>'],
webpage, 'video_upload_date', fatal=False, flags=re.DOTALL)
'video_uploader', fatal=False)
playerdata_url = compat_urllib_parse_unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, 'playerdata_url'))
- playerdata_req = compat_urllib_request.Request(playerdata_url)
+ playerdata_req = sanitized_Request(playerdata_url)
playerdata_req.data = compat_urllib_parse.urlencode({'current_page': webpage_url})
playerdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
playerdata = self._download_webpage(playerdata_req, video_id, note='Downloading media info')
for fmt in re.findall(r'showmedia\.([0-9]{3,4})p', webpage):
stream_quality, stream_format = self._FORMAT_IDS[fmt]
video_format = fmt + 'p'
- streamdata_req = compat_urllib_request.Request(
+ streamdata_req = sanitized_Request(
'http://www.crunchyroll.com/xml/?req=RpcApiVideoPlayer_GetStandardConfig&media_id=%s&video_format=%s&video_quality=%s'
% (stream_id, stream_format, stream_quality),
compat_urllib_parse.urlencode({'current_page': url}).encode('utf-8'))
find_xpath_attr,
smuggle_url,
determine_ext,
+ ExtractorError,
)
from .senateisvp import SenateISVPIE
IE_DESC = 'C-SPAN'
_TESTS = [{
'url': 'http://www.c-span.org/video/?313572-1/HolderonV',
- 'md5': '8e44ce11f0f725527daccc453f553eb0',
+ 'md5': '94b29a4f131ff03d23471dd6f60b6a1d',
'info_dict': {
'id': '315139',
'ext': 'mp4',
'title': 'Attorney General Eric Holder on Voting Rights Act Decision',
- 'description': 'Attorney General Eric Holder spoke to reporters following the Supreme Court decision in Shelby County v. Holder in which the court ruled that the preclearance provisions of the Voting Rights Act could not be enforced until Congress established new guidelines for review.',
+ 'description': 'Attorney General Eric Holder speaks to reporters following the Supreme Court decision in [Shelby County v. Holder], in which the court ruled that the preclearance provisions of the Voting Rights Act could not be enforced.',
},
'skip': 'Regularly fails on travis, for unknown reasons',
}, {
'url': 'http://www.c-span.org/video/?c4486943/cspan-international-health-care-models',
- # For whatever reason, the served video alternates between
- # two different ones
+ 'md5': '8e5fbfabe6ad0f89f3012a7943c1287b',
'info_dict': {
- 'id': '340723',
+ 'id': 'c4486943',
'ext': 'mp4',
- 'title': 'International Health Care Models',
+ 'title': 'CSPAN - International Health Care Models',
'description': 'md5:7a985a2d595dba00af3d9c9f0783c967',
}
}, {
'url': 'http://www.c-span.org/video/?318608-1/gm-ignition-switch-recall',
- 'md5': '446562a736c6bf97118e389433ed88d4',
+ 'md5': '2ae5051559169baadba13fc35345ae74',
'info_dict': {
'id': '342759',
'ext': 'mp4',
'title': 'General Motors Ignition Switch Recall',
'duration': 14848,
- 'description': 'md5:70c7c3b8fa63fa60d42772440596034c'
+ 'description': 'md5:118081aedd24bf1d3b68b3803344e7f3'
},
}, {
# Video from senate.gov
}]
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- page_id = mobj.group('id')
- webpage = self._download_webpage(url, page_id)
- video_id = self._search_regex(r'progid=\'?([0-9]+)\'?>', webpage, 'video id')
+ video_id = self._match_id(url)
+ webpage = self._download_webpage(url, video_id)
+ matches = re.search(r'data-(prog|clip)id=\'([0-9]+)\'', webpage)
+ if matches:
+ video_type, video_id = matches.groups()
+ if video_type == 'prog':
+ video_type = 'program'
+ else:
+ senate_isvp_url = SenateISVPIE._search_iframe_url(webpage)
+ if senate_isvp_url:
+ title = self._og_search_title(webpage)
+ surl = smuggle_url(senate_isvp_url, {'force_title': title})
+ return self.url_result(surl, 'SenateISVP', video_id, title)
- description = self._html_search_regex(
- [
- # The full description
- r'<div class=\'expandable\'>(.*?)<a href=\'#\'',
- # If the description is small enough the other div is not
- # present, otherwise this is a stripped version
- r'<p class=\'initial\'>(.*?)</p>'
- ],
- webpage, 'description', flags=re.DOTALL, default=None)
+ def get_text_attr(d, attr):
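+            # values in the player JSON come wrapped as {'#text': value}; this helper unwraps them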
+ return d.get(attr, {}).get('#text')
- info_url = 'http://c-spanvideo.org/videoLibrary/assets/player/ajax-player.php?os=android&html5=program&id=' + video_id
- data = self._download_json(info_url, video_id)
+ data = self._download_json(
+ 'http://www.c-span.org/assets/player/ajax-player.php?os=android&html5=%s&id=%s' % (video_type, video_id),
+ video_id)['video']
+ if data['@status'] != 'Success':
+ raise ExtractorError('%s said: %s' % (self.IE_NAME, get_text_attr(data, 'error')), expected=True)
doc = self._download_xml(
- 'http://www.c-span.org/common/services/flashXml.php?programid=' + video_id,
+ 'http://www.c-span.org/common/services/flashXml.php?%sid=%s' % (video_type, video_id),
video_id)
+ description = self._html_search_meta('description', webpage)
+
title = find_xpath_attr(doc, './/string', 'name', 'title').text
thumbnail = find_xpath_attr(doc, './/string', 'name', 'poster').text
- senate_isvp_url = SenateISVPIE._search_iframe_url(webpage)
- if senate_isvp_url:
- surl = smuggle_url(senate_isvp_url, {'force_title': title})
- return self.url_result(surl, 'SenateISVP', video_id, title)
-
- files = data['video']['files']
- try:
- capfile = data['video']['capfile']['#text']
- except KeyError:
- capfile = None
+ files = data['files']
+ capfile = get_text_attr(data, 'capfile')
- entries = [{
- 'id': '%s_%d' % (video_id, partnum + 1),
- 'title': (
- title if len(files) == 1 else
- '%s part %d' % (title, partnum + 1)),
- 'url': unescapeHTML(f['path']['#text']),
- 'description': description,
- 'thumbnail': thumbnail,
- 'duration': int_or_none(f.get('length', {}).get('#text')),
- 'subtitles': {
- 'en': [{
- 'url': capfile,
- 'ext': determine_ext(capfile, 'dfxp')
- }],
- } if capfile else None,
- } for partnum, f in enumerate(files)]
+ entries = []
+ for partnum, f in enumerate(files):
+ formats = []
+ for quality in f['qualities']:
+ formats.append({
+ 'format_id': '%s-%sp' % (get_text_attr(quality, 'bitrate'), get_text_attr(quality, 'height')),
+ 'url': unescapeHTML(get_text_attr(quality, 'file')),
+ 'height': int_or_none(get_text_attr(quality, 'height')),
+ 'tbr': int_or_none(get_text_attr(quality, 'bitrate')),
+ })
+ self._sort_formats(formats)
+ entries.append({
+ 'id': '%s_%d' % (video_id, partnum + 1),
+ 'title': (
+ title if len(files) == 1 else
+ '%s part %d' % (title, partnum + 1)),
+ 'formats': formats,
+ 'description': description,
+ 'thumbnail': thumbnail,
+ 'duration': int_or_none(get_text_attr(f, 'length')),
+ 'subtitles': {
+ 'en': [{
+ 'url': capfile,
+ 'ext': determine_ext(capfile, 'dfxp')
+ }],
+ } if capfile else None,
+ })
if len(entries) == 1:
entry = dict(entries[0])
- entry['id'] = video_id
+ entry['id'] = 'c' + video_id if video_type == 'clip' else video_id
return entry
else:
return {
'_type': 'playlist',
'entries': entries,
'title': title,
- 'id': video_id,
+ 'id': 'c' + video_id if video_type == 'clip' else video_id,
}
from .common import InfoExtractor
-from ..compat import (
- compat_str,
- compat_urllib_request,
-)
+from ..compat import compat_str
from ..utils import (
ExtractorError,
determine_ext,
int_or_none,
parse_iso8601,
+ sanitized_Request,
str_to_int,
unescapeHTML,
)
@staticmethod
def _build_request(url):
"""Build a request with the family filter disabled"""
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
request.add_header('Cookie', 'family_filter=off; ff=off')
return request
class DBTVIE(InfoExtractor):
- _VALID_URL = r'http://dbtv\.no/(?P<id>[0-9]+)#(?P<display_id>.+)'
- _TEST = {
+ _VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:(?:lazyplayer|player)/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?'
+ _TESTS = [{
'url': 'http://dbtv.no/3649835190001#Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
'md5': 'b89953ed25dacb6edb3ef6c6f430f8bc',
'info_dict': {
'view_count': int,
'categories': list,
}
- }
+ }, {
+ 'url': 'http://dbtv.no/3649835190001',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://www.dbtv.no/lazyplayer/4631135248001',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
- display_id = mobj.group('display_id')
+ display_id = mobj.group('display_id') or video_id
data = self._download_json(
'http://api.dbtv.no/discovery/%s' % video_id, display_id)
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
int_or_none,
parse_iso8601,
+ sanitized_Request,
)
def _real_extract(self, url):
video_id = self._match_id(url)
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'http://admin.mangomolo.com/analytics/index.php/plus/video?id=%s' % video_id,
headers={'Origin': 'http://www.dcndigital.ae'})
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+import os.path
+
+from .common import InfoExtractor
+from ..compat import compat_urlparse
+from ..utils import (
+ url_basename,
+ remove_start,
+)
+
+
+class DemocracynowIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?democracynow.org/(?P<id>[^\?]*)'
+ IE_NAME = 'democracynow'
+ _TESTS = [{
+ 'url': 'http://www.democracynow.org/shows/2015/7/3',
+ 'md5': 'fbb8fe3d7a56a5e12431ce2f9b2fab0d',
+ 'info_dict': {
+ 'id': '2015-0703-001',
+ 'ext': 'mp4',
+ 'title': 'July 03, 2015 - Democracy Now!',
+ 'description': 'A daily independent global news hour with Amy Goodman & Juan González "What to the Slave is 4th of July?": James Earl Jones Reads Frederick Douglass\u2019 Historic Speech : "This Flag Comes Down Today": Bree Newsome Scales SC Capitol Flagpole, Takes Down Confederate Flag : "We Shall Overcome": Remembering Folk Icon, Activist Pete Seeger in His Own Words & Songs',
+ },
+ }, {
+ 'url': 'http://www.democracynow.org/2015/7/3/this_flag_comes_down_today_bree',
+ 'md5': 'fbb8fe3d7a56a5e12431ce2f9b2fab0d',
+ 'info_dict': {
+ 'id': '2015-0703-001',
+ 'ext': 'mp4',
+ 'title': '"This Flag Comes Down Today": Bree Newsome Scales SC Capitol Flagpole, Takes Down Confederate Flag',
+ 'description': 'md5:4d2bc4f0d29f5553c2210a4bc7761a21',
+ },
+ }]
+
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+ webpage = self._download_webpage(url, display_id)
+ description = self._og_search_description(webpage)
+
+ json_data = self._parse_json(self._search_regex(
+ r'<script[^>]+type="text/json"[^>]*>\s*({[^>]+})', webpage, 'json'),
+ display_id)
+ video_id = None
+ formats = []
+
+ default_lang = 'en'
+
+ subtitles = {}
+
+ def add_subtitle_item(lang, info_dict):
+ if lang not in subtitles:
+ subtitles[lang] = []
+ subtitles[lang].append(info_dict)
+
+        # chapter_file entries are not subtitles
+ if 'caption_file' in json_data:
+ add_subtitle_item(default_lang, {
+ 'url': compat_urlparse.urljoin(url, json_data['caption_file']),
+ })
+
+ for subtitle_item in json_data.get('captions', []):
+ lang = subtitle_item.get('language', '').lower() or default_lang
+ add_subtitle_item(lang, {
+ 'url': compat_urlparse.urljoin(url, subtitle_item['url']),
+ })
+
+ for key in ('file', 'audio', 'video'):
+ media_url = json_data.get(key, '')
+ if not media_url:
+ continue
+ media_url = re.sub(r'\?.*', '', compat_urlparse.urljoin(url, media_url))
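+            # derive the id from the media file's basename, stripping its leading 'dn'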
+ video_id = video_id or remove_start(os.path.splitext(url_basename(media_url))[0], 'dn')
+ formats.append({
+ 'url': media_url,
+ })
+
+ self._sort_formats(formats)
+
+ return {
+ 'id': video_id or display_id,
+ 'title': json_data['title'],
+ 'description': description,
+ 'subtitles': subtitles,
+ 'formats': formats,
+ }
--- /dev/null
+# encoding: utf-8
+from __future__ import unicode_literals
+
+import time
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class DPlayIE(InfoExtractor):
+ _VALID_URL = r'http://www\.dplay\.se/[^/]+/(?P<id>[^/?#]+)'
+
+ _TEST = {
+ 'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
+ 'info_dict': {
+ 'id': '3172',
+ 'ext': 'mp4',
+ 'display_id': 'season-1-svensken-lar-sig-njuta-av-livet',
+ 'title': 'Svensken lär sig njuta av livet',
+ 'duration': 2650,
+ },
+ }
+
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+ webpage = self._download_webpage(url, display_id)
+ video_id = self._search_regex(
+ r'data-video-id="(\d+)"', webpage, 'video id')
+
+ info = self._download_json(
+ 'http://www.dplay.se/api/v2/ajax/videos?video_id=' + video_id,
+ video_id)['data'][0]
+
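+        # the dsc-geo cookie holds a country code plus an expiry timestamp in milliseconds (20 minutes ahead)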
+ self._set_cookie(
+ 'secure.dplay.se', 'dsc-geo',
+ '{"countryCode":"NL","expiry":%d}' % ((time.time() + 20 * 60) * 1000))
+ # TODO: consider adding support for 'stream_type=hds', it seems to
+ # require setting some cookies
+ manifest_url = self._download_json(
+ 'https://secure.dplay.se/secure/api/v2/user/authorization/stream/%s?stream_type=hls' % video_id,
+ video_id, 'Getting manifest url for hls stream')['hls']
+ formats = self._extract_m3u8_formats(
+ manifest_url, video_id, ext='mp4', entry_protocol='m3u8_native')
+
+ return {
+ 'id': video_id,
+ 'display_id': display_id,
+ 'title': info['title'],
+ 'formats': formats,
+ 'duration': int_or_none(info.get('video_metadata_length'), scale=1000),
+ }
from ..compat import (
compat_HTTPError,
compat_urllib_parse,
- compat_urllib_request,
compat_urlparse,
)
from ..utils import (
determine_ext,
int_or_none,
parse_iso8601,
+ sanitized_Request,
)
'password': password,
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
response = self._download_webpage(
request, None, 'Logging in as %s' % username)
from __future__ import unicode_literals
import base64
+import re
from .common import InfoExtractor
-from ..compat import compat_urllib_request
-from ..utils import qualities
+from ..utils import (
+ qualities,
+ sanitized_Request,
+)
class DumpertIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?dumpert\.nl/(?:mediabase|embed)/(?P<id>[0-9]+/[0-9a-zA-Z]+)'
+ _VALID_URL = r'(?P<protocol>https?)://(?:www\.)?dumpert\.nl/(?:mediabase|embed)/(?P<id>[0-9]+/[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'http://www.dumpert.nl/mediabase/6646981/951bc60f/',
'md5': '1b9318d7d5054e7dcb9dc7654f21d643',
}]
def _real_extract(self, url):
- video_id = self._match_id(url)
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id')
+ protocol = mobj.group('protocol')
- url = 'https://www.dumpert.nl/mediabase/' + video_id
- req = compat_urllib_request.Request(url)
+ url = '%s://www.dumpert.nl/mediabase/%s' % (protocol, video_id)
+ req = sanitized_Request(url)
req.add_header('Cookie', 'nsfw=1; cpc=10')
webpage = self._download_webpage(req, video_id)
# encoding: utf-8
from __future__ import unicode_literals
-import re
-
from .common import InfoExtractor
-from .brightcove import BrightcoveIE
-from ..utils import ExtractorError
+from ..utils import (
+ float_or_none,
+ int_or_none,
+ parse_iso8601,
+ sanitized_Request,
+)
class EitbIE(InfoExtractor):
IE_NAME = 'eitb.tv'
- _VALID_URL = r'https?://www\.eitb\.tv/(eu/bideoa|es/video)/[^/]+/(?P<playlist_id>\d+)/(?P<chapter_id>\d+)'
+ _VALID_URL = r'https?://(?:www\.)?eitb\.tv/(?:eu/bideoa|es/video)/[^/]+/\d+/(?P<id>\d+)'
_TEST = {
- 'add_ie': ['Brightcove'],
- 'url': 'http://www.eitb.tv/es/video/60-minutos-60-minutos-2013-2014/2677100210001/2743577154001/lasa-y-zabala-30-anos/',
+ 'url': 'http://www.eitb.tv/es/video/60-minutos-60-minutos-2013-2014/4104995148001/4090227752001/lasa-y-zabala-30-anos/',
'md5': 'edf4436247185adee3ea18ce64c47998',
'info_dict': {
- 'id': '2743577154001',
+ 'id': '4090227752001',
'ext': 'mp4',
'title': '60 minutos (Lasa y Zabala, 30 años)',
- # All videos from eitb has this description in the brightcove info
- 'description': '.',
- 'uploader': 'Euskal Telebista',
+ 'description': 'Programa de reportajes de actualidad.',
+ 'duration': 3996.76,
+ 'timestamp': 1381789200,
+ 'upload_date': '20131014',
+ 'tags': list,
},
}
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- chapter_id = mobj.group('chapter_id')
- webpage = self._download_webpage(url, chapter_id)
- bc_url = BrightcoveIE._extract_brightcove_url(webpage)
- if bc_url is None:
- raise ExtractorError('Could not extract the Brightcove url')
- # The BrightcoveExperience object doesn't contain the video id, we set
- # it manually
- bc_url += '&%40videoPlayer={0}'.format(chapter_id)
- return self.url_result(bc_url, BrightcoveIE.ie_key())
+ video_id = self._match_id(url)
+
+ video = self._download_json(
+ 'http://mam.eitb.eus/mam/REST/ServiceMultiweb/Video/MULTIWEBTV/%s/' % video_id,
+ video_id, 'Downloading video JSON')
+
+ media = video['web_media'][0]
+
+ formats = []
+ for rendition in media['RENDITIONS']:
+ video_url = rendition.get('PMD_URL')
+ if not video_url:
+ continue
+ tbr = float_or_none(rendition.get('ENCODING_RATE'), 1000)
+ format_id = 'http'
+ if tbr:
+ format_id += '-%d' % int(tbr)
+ formats.append({
+ 'url': rendition['PMD_URL'],
+ 'format_id': format_id,
+ 'width': int_or_none(rendition.get('FRAME_WIDTH')),
+ 'height': int_or_none(rendition.get('FRAME_HEIGHT')),
+ 'tbr': tbr,
+ })
+
+ hls_url = media.get('HLS_SURL')
+ if hls_url:
+ request = sanitized_Request(
+ 'http://mam.eitb.eus/mam/REST/ServiceMultiweb/DomainRestrictedSecurity/TokenAuth/',
+ headers={'Referer': url})
+ token_data = self._download_json(
+ request, video_id, 'Downloading auth token', fatal=False)
+ if token_data:
+ token = token_data.get('token')
+ if token:
+ m3u8_formats = self._extract_m3u8_formats(
+ '%s?hdnts=%s' % (hls_url, token), video_id, m3u8_id='hls', fatal=False)
+ if m3u8_formats:
+ formats.extend(m3u8_formats)
+
+ hds_url = media.get('HDS_SURL')
+ if hds_url:
+ f4m_formats = self._extract_f4m_formats(
+ '%s?hdcore=3.7.0' % hds_url.replace('euskalsvod', 'euskalvod'),
+ video_id, f4m_id='hds', fatal=False)
+ if f4m_formats:
+ formats.extend(f4m_formats)
+
+ self._sort_formats(formats)
+
+ return {
+ 'id': video_id,
+ 'title': media.get('NAME_ES') or media.get('name') or media['NAME_EU'],
+ 'description': media.get('SHORT_DESC_ES') or video.get('desc_group') or media.get('SHORT_DESC_EU'),
+ 'thumbnail': media.get('STILL_URL') or media.get('THUMBNAIL_URL'),
+ 'duration': float_or_none(media.get('LENGTH'), 1000),
+ 'timestamp': parse_iso8601(media.get('BROADCST_DATE'), ' '),
+ 'tags': media.get('TAGS'),
+ 'formats': formats,
+ }
import json
from .common import InfoExtractor
-from ..compat import compat_urllib_request
-
from ..utils import (
determine_ext,
clean_html,
int_or_none,
float_or_none,
+ sanitized_Request,
)
video_id = ims_video['videoID']
key = ims_video['hash']
- config_req = compat_urllib_request.Request(
+ config_req = sanitized_Request(
'http://www.escapistmagazine.com/videos/'
'vidconfig.php?videoID=%s&hash=%s' % (video_id, key))
config_req.add_header('Referer', url)
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
ExtractorError,
+ sanitized_Request,
)
playlist_id = mobj.group('id')
pllist_url = 'http://everyonesmixtape.com/mixtape.php?a=getMixes&u=-1&linked=%s&explore=' % playlist_id
- pllist_req = compat_urllib_request.Request(pllist_url)
+ pllist_req = sanitized_Request(pllist_url)
pllist_req.add_header('X-Requested-With', 'XMLHttpRequest')
playlist_list = self._download_json(
raise ExtractorError('Playlist id not found')
pl_url = 'http://everyonesmixtape.com/mixtape.php?a=getMix&id=%s&userId=null&code=' % playlist_no
- pl_req = compat_urllib_request.Request(pl_url)
+ pl_req = sanitized_Request(pl_url)
pl_req.add_header('X-Requested-With', 'XMLHttpRequest')
playlist = self._download_json(
pl_req, playlist_id, note='Downloading playlist info')
import re
from .common import InfoExtractor
-from ..compat import (
- compat_parse_qs,
- compat_urllib_request,
-)
from ..utils import (
- qualities,
+ int_or_none,
+ sanitized_Request,
str_to_int,
)
class ExtremeTubeIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?(?P<url>extremetube\.com/.*?video/.+?(?P<id>[0-9]+))(?:[/?&]|$)'
+ _VALID_URL = r'https?://(?:www\.)?extremetube\.com/(?:[^/]+/)?video/(?P<id>[^/#?&]+)'
_TESTS = [{
'url': 'http://www.extremetube.com/video/music-video-14-british-euro-brit-european-cumshots-swallow-652431',
'md5': '344d0c6d50e2f16b06e49ca011d8ac69',
'info_dict': {
- 'id': '652431',
+ 'id': 'music-video-14-british-euro-brit-european-cumshots-swallow-652431',
'ext': 'mp4',
'title': 'Music Video 14 british euro brit european cumshots swallow',
'uploader': 'unknown',
}, {
'url': 'http://www.extremetube.com/gay/video/abcde-1234',
'only_matching': True,
+ }, {
+ 'url': 'http://www.extremetube.com/video/latina-slut-fucked-by-fat-black-dick',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://www.extremetube.com/video/652431',
+ 'only_matching': True,
}]
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- video_id = mobj.group('id')
- url = 'http://www.' + mobj.group('url')
+ video_id = self._match_id(url)
- req = compat_urllib_request.Request(url)
+ req = sanitized_Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
r'Views:\s*</strong>\s*<span>([\d,\.]+)</span>',
webpage, 'view count', fatal=False))
- flash_vars = compat_parse_qs(self._search_regex(
- r'<param[^>]+?name="flashvars"[^>]+?value="([^"]+)"', webpage, 'flash vars'))
+ flash_vars = self._parse_json(
+ self._search_regex(
+ r'var\s+flashvars\s*=\s*({.+?});', webpage, 'flash vars'),
+ video_id)
formats = []
- quality = qualities(['180p', '240p', '360p', '480p', '720p', '1080p'])
- for k, vals in flash_vars.items():
- m = re.match(r'quality_(?P<quality>[0-9]+p)$', k)
- if m is not None:
- formats.append({
- 'format_id': m.group('quality'),
- 'quality': quality(m.group('quality')),
- 'url': vals[0],
+ for quality_key, video_url in flash_vars.items():
+ height = int_or_none(self._search_regex(
+ r'quality_(\d+)[pP]$', quality_key, 'height', default=None))
+ if not height:
+ continue
+ f = {
+ 'url': video_url,
+ }
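+            # some stream URLs also encode height and bitrate in the path (e.g. /720p_1500k_...); prefer those values when present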
+ mobj = re.search(
+ r'/(?P<height>\d{3,4})[pP]_(?P<bitrate>\d+)[kK]_\d+', video_url)
+ if mobj:
+ height = int(mobj.group('height'))
+ bitrate = int(mobj.group('bitrate'))
+ f.update({
+ 'format_id': '%dp-%dk' % (height, bitrate),
+ 'height': height,
+ 'tbr': bitrate,
})
-
+ else:
+ f.update({
+ 'format_id': '%dp' % height,
+ 'height': height,
+ })
+ formats.append(f)
self._sort_formats(formats)
return {
compat_str,
compat_urllib_error,
compat_urllib_parse_unquote,
- compat_urllib_request,
)
from ..utils import (
ExtractorError,
limit_length,
+ sanitized_Request,
urlencode_postdata,
get_element_by_id,
clean_html,
if useremail is None:
return
- login_page_req = compat_urllib_request.Request(self._LOGIN_URL)
+ login_page_req = sanitized_Request(self._LOGIN_URL)
login_page_req.add_header('Cookie', 'locale=en_US')
login_page = self._download_webpage(login_page_req, None,
note='Downloading login page',
'timezone': '-60',
'trynum': '1',
}
- request = compat_urllib_request.Request(self._LOGIN_URL, urlencode_postdata(login_form))
+ request = sanitized_Request(self._LOGIN_URL, urlencode_postdata(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
try:
login_results = self._download_webpage(request, None,
r'name="h"\s+(?:\w+="[^"]+"\s+)*?value="([^"]+)"', login_results, 'h'),
'name_action_selected': 'dont_save',
}
- check_req = compat_urllib_request.Request(self._CHECKPOINT_URL, urlencode_postdata(check_form))
+ check_req = sanitized_Request(self._CHECKPOINT_URL, urlencode_postdata(check_form))
check_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
check_response = self._download_webpage(check_req, None,
note='Confirming login')
if not video_title:
video_title = self._html_search_regex(
r'(?s)<span class="fbPhotosPhotoCaption".*?id="fbPhotoPageCaption"><span class="hasCaption">(.*?)</span>',
- webpage, 'alternative title', fatal=False)
+ webpage, 'alternative title', default=None)
video_title = limit_length(video_title, 80)
if not video_title:
video_title = 'Facebook video #%s' % video_id
from ..utils import (
encode_dict,
ExtractorError,
+ sanitized_Request,
)
}
login_data = compat_urllib_parse.urlencode(encode_dict(login_form_strs)).encode('utf-8')
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'https://secure.id.fc2.com/index.php?mode=login&switch_language=en', login_data)
login_results = self._download_webpage(request, None, note='Logging in', errnote='Unable to log in')
return False
# this is also needed
- login_redir = compat_urllib_request.Request('http://id.fc2.com/?mode=redirect&login=done')
+ login_redir = sanitized_Request('http://id.fc2.com/?mode=redirect&login=done')
self._download_webpage(
login_redir, None, note='Login redirect', errnote='Login redirect failed')
import re
from .common import InfoExtractor
-from ..compat import compat_urllib_request
from ..utils import (
ExtractorError,
find_xpath_attr,
+ sanitized_Request,
)
video_id = mobj.group('id')
video_uploader_id = mobj.group('uploader_id')
webpage_url = 'http://www.flickr.com/photos/' + video_uploader_id + '/' + video_id
- req = compat_urllib_request.Request(webpage_url)
+ req = sanitized_Request(webpage_url)
req.add_header(
'User-Agent',
# it needs a more recent version
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
parse_duration,
parse_iso8601,
+ sanitized_Request,
str_to_int,
)
b'Content-Type': b'application/x-www-form-urlencoded',
b'Origin': b'http://www.4tube.com',
}
- token_req = compat_urllib_request.Request(token_url, b'{}', headers)
+ token_req = sanitized_Request(token_url, b'{}', headers)
tokens = self._download_json(token_req, video_id)
formats = [{
'url': tokens[format]['token'],
links.sort(key=lambda link: 1 if link[1] == 'mp4' else 0)
- bitrates = self._html_search_regex(r'<source src="[^"]+/v,((?:\d+,)+)\.mp4\.csmil', webpage, 'video bitrates')
- bitrates = [int(b) for b in bitrates.rstrip(',').split(',')]
- bitrates.sort()
+ m3u8_url = self._search_regex(
+ r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8)\1',
+ webpage, 'm3u8 url', default=None, group='url')
formats = []
+
+ m3u8_formats = self._extract_m3u8_formats(
+ m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)
+ if m3u8_formats:
+ formats.extend(m3u8_formats)
+
+ bitrates = [int(bitrate) for bitrate in re.findall(r'[,/]v(\d+)[,/]', m3u8_url)]
+ bitrates.sort()
+
for bitrate in bitrates:
for link in links:
formats.append({
from __future__ import unicode_literals
-from .mtv import MTVServicesInfoExtractor
+from .common import InfoExtractor
+from ..utils import (
+ int_or_none,
+ parse_age_limit,
+ url_basename,
+)
-class GametrailersIE(MTVServicesInfoExtractor):
- _VALID_URL = r'http://www\.gametrailers\.com/(?P<type>videos|reviews|full-episodes)/(?P<id>.*?)/(?P<title>.*)'
+class GametrailersIE(InfoExtractor):
+ _VALID_URL = r'http://www\.gametrailers\.com/videos/view/[^/]+/(?P<id>.+)'
+
_TEST = {
- 'url': 'http://www.gametrailers.com/videos/zbvr8i/mirror-s-edge-2-e3-2013--debut-trailer',
- 'md5': '4c8e67681a0ea7ec241e8c09b3ea8cf7',
+ 'url': 'http://www.gametrailers.com/videos/view/gametrailers-com/116437-Just-Cause-3-Review',
+ 'md5': 'f28c4efa0bdfaf9b760f6507955b6a6a',
'info_dict': {
- 'id': '70e9a5d7-cf25-4a10-9104-6f3e7342ae0d',
+ 'id': '2983958',
'ext': 'mp4',
- 'title': 'E3 2013: Debut Trailer',
- 'description': 'Faith is back! Check out the World Premiere trailer for Mirror\'s Edge 2 straight from the EA Press Conference at E3 2013!',
+ 'display_id': '116437-Just-Cause-3-Review',
+ 'title': 'Just Cause 3 - Review',
+ 'description': 'It\'s a lot of fun to shoot at things and then watch them explode in Just Cause 3, but should there be more to the experience than that?',
},
}
- _FEED_URL = 'http://www.gametrailers.com/feeds/mrss'
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+ webpage = self._download_webpage(url, display_id)
+ title = self._html_search_regex(
+ r'<title>(.+?)\|', webpage, 'title').strip()
+ embed_url = self._proto_relative_url(
+ self._search_regex(
+ r'src=\'(//embed.gametrailers.com/embed/[^\']+)\'', webpage,
+ 'embed url'),
+ scheme='http:')
+ video_id = url_basename(embed_url)
+ embed_page = self._download_webpage(embed_url, video_id)
+ embed_vars_json = self._search_regex(
+ r'(?s)var embedVars = (\{.*?\})\s*</script>', embed_page,
+ 'embed vars')
+ info = self._parse_json(embed_vars_json, video_id)
+
+ formats = []
+ for media in info['media']:
+ if media['mediaPurpose'] == 'play':
+ formats.append({
+ 'url': media['uri'],
+ 'height': media['height'],
+                    'width': media['width'],
+ })
+ self._sort_formats(formats)
+
+ return {
+ 'id': video_id,
+ 'display_id': display_id,
+ 'title': title,
+ 'formats': formats,
+ 'thumbnail': info.get('thumbUri'),
+ 'description': self._og_search_description(webpage),
+ 'duration': int_or_none(info.get('videoLengthInSeconds')),
+ 'age_limit': parse_age_limit(info.get('audienceRating')),
+ }
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
remove_end,
HEADRequest,
+ sanitized_Request,
)
'password': password,
}
- request = compat_urllib_request.Request(login_url, compat_urllib_parse.urlencode(login_form))
+ request = sanitized_Request(login_url, compat_urllib_parse.urlencode(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self._download_webpage(request, display_id, 'Logging in')
start_page = self._download_webpage(webpage_url, display_id, 'Getting authenticated video page')
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..compat import (
+ compat_etree_fromstring,
compat_urllib_parse_unquote,
- compat_urllib_request,
compat_urlparse,
compat_xml_parse_error,
)
HEADRequest,
is_html,
orderedSet,
- parse_xml,
+ sanitized_Request,
smuggle_url,
unescapeHTML,
unified_strdate,
url_basename,
xpath_text,
)
-from .brightcove import BrightcoveIE
+from .brightcove import (
+ BrightcoveLegacyIE,
+ BrightcoveNewIE,
+)
from .nbc import NBCSportsVPlayerIE
from .ooyala import OoyalaIE
from .rutv import RUTVIE
'ext': 'mp4',
'title': 'Automatics, robotics and biocybernetics',
'description': 'md5:815fc1deb6b3a2bff99de2d5325be482',
+ 'upload_date': '20130627',
'formats': 'mincount:16',
'subtitles': 'mincount:1',
},
# it also tests brightcove videos that need to set the 'Referer' in the
# http requests
{
- 'add_ie': ['Brightcove'],
+ 'add_ie': ['BrightcoveLegacy'],
'url': 'http://www.bfmtv.com/video/bfmbusiness/cours-bourse/cours-bourse-l-analyse-technique-154522/',
'info_dict': {
'id': '2765128793001',
'uploader': 'thestar.com',
'description': 'Mississauga resident David Farmer is still out of power as a result of the ice storm a month ago. To keep the house warm, Farmer cuts wood from his property for a wood burning stove downstairs.',
},
- 'add_ie': ['Brightcove'],
+ 'add_ie': ['BrightcoveLegacy'],
},
{
'url': 'http://www.championat.com/video/football/v/87/87499.html',
},
{
# https://github.com/rg3/youtube-dl/issues/3541
- 'add_ie': ['Brightcove'],
+ 'add_ie': ['BrightcoveLegacy'],
'url': 'http://www.kijk.nl/sbs6/leermijvrouwenkennen/videos/jqMiXKAYan2S/aflevering-1',
'info_dict': {
'id': '3866516442001',
'title': 'Os Guinness // Is It Fools Talk? // Unbelievable? Conference 2014',
},
},
+ # Kaltura embed protected with referrer
+ {
+ 'url': 'http://www.disney.nl/disney-channel/filmpjes/achter-de-schermen#/videoId/violetta-achter-de-schermen-ruggero',
+ 'info_dict': {
+ 'id': '1_g4fbemnq',
+ 'ext': 'mp4',
+ 'title': 'Violetta - Achter De Schermen - Ruggero',
+ 'description': 'Achter de schermen met Ruggero',
+ 'timestamp': 1435133761,
+ 'upload_date': '20150624',
+ 'uploader_id': 'echojecka',
+ },
+ },
# Eagle.Platform embed (generic URL)
{
'url': 'http://lenta.ru/news/2015/03/06/navalny/',
'ext': 'mp4',
'title': 'cinemasnob',
},
+ },
+ # BrightcoveInPageEmbed embed
+ {
+ 'url': 'http://www.geekandsundry.com/tabletop-bonus-wils-final-thoughts-on-dread/',
+ 'info_dict': {
+ 'id': '4238694884001',
+ 'ext': 'flv',
+ 'title': 'Tabletop: Dread, Last Thoughts',
+ 'description': 'Tabletop: Dread, Last Thoughts',
+ 'duration': 51690,
+ },
+ },
+ # JWPlayer with M3U8
+ {
+ 'url': 'http://ren.tv/novosti/2015-09-25/sluchaynyy-prohozhiy-poymal-avtougonshchika-v-murmanske-video',
+ 'info_dict': {
+ 'id': 'playlist',
+ 'ext': 'mp4',
+ 'title': 'Случайный прохожий поймал автоугонщика в Мурманске. ВИДЕО | РЕН ТВ',
+ 'uploader': 'ren.tv',
+ },
+ 'params': {
+ # m3u8 downloads
+ 'skip_download': True,
+ }
}
]
full_response = None
if head_response is False:
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
request.add_header('Accept-Encoding', '*')
full_response = self._request_webpage(request, video_id)
head_response = full_response
'%s on generic information extractor.' % ('Forcing' if force else 'Falling back'))
if not full_response:
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
# Some webservers may serve compressed content of rather big size (e.g. gzipped flac)
            # making it impossible to download only a chunk of the file (yet we need only 512kB to
# test whether it's HTML or not). According to youtube-dl default Accept-Encoding
# Is it an RSS feed, a SMIL file or a XSPF playlist?
try:
- doc = parse_xml(webpage)
+ doc = compat_etree_fromstring(webpage.encode('utf-8'))
if doc.tag == 'rss':
return self._extract_rss(url, video_id, doc)
elif re.match(r'^(?:{[^}]+})?smil$', doc.tag):
return self.playlist_result(
urlrs, playlist_id=video_id, playlist_title=video_title)
- # Look for BrightCove:
- bc_urls = BrightcoveIE._extract_brightcove_urls(webpage)
+ # Look for Brightcove Legacy Studio embeds
+ bc_urls = BrightcoveLegacyIE._extract_brightcove_urls(webpage)
if bc_urls:
self.to_screen('Brightcove video detected.')
entries = [{
'_type': 'url',
'url': smuggle_url(bc_url, {'Referer': url}),
- 'ie_key': 'Brightcove'
+ 'ie_key': 'BrightcoveLegacy'
} for bc_url in bc_urls]
return {
'entries': entries,
}
+ # Look for Brightcove New Studio embeds
+ bc_urls = BrightcoveNewIE._extract_urls(webpage)
+ if bc_urls:
+ return _playlist_from_matches(bc_urls, ie='BrightcoveNew')
+
# Look for embedded rtl.nl player
matches = re.findall(
r'<iframe[^>]+?src="((?:https?:)?//(?:www\.)?rtl\.nl/system/videoplayer/[^"]+(?:video_)?embed[^"]+)"',
return self.url_result(mobj.group('url'), 'Zapiks')
# Look for Kaltura embeds
- mobj = (re.search(r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?'wid'\s*:\s*'_?(?P<partner_id>[^']+)',.*?'entry_id'\s*:\s*'(?P<id>[^']+)',", webpage) or
- re.search(r'(?s)(["\'])(?:https?:)?//cdnapisec\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?\1.*?entry_id\s*:\s*(["\'])(?P<id>[^\2]+?)\2', webpage))
+ mobj = (re.search(r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?'wid'\s*:\s*'_?(?P<partner_id>[^']+)',.*?'entry_?[Ii]d'\s*:\s*'(?P<id>[^']+)',", webpage) or
+ re.search(r'(?s)(?P<q1>["\'])(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?(?P=q1).*?entry_?[Ii]d\s*:\s*(?P<q2>["\'])(?P<id>.+?)(?P=q2)', webpage))
if mobj is not None:
- return self.url_result('kaltura:%(partner_id)s:%(id)s' % mobj.groupdict(), 'Kaltura')
+ return self.url_result(smuggle_url(
+ 'kaltura:%(partner_id)s:%(id)s' % mobj.groupdict(),
+ {'source_url': url}), 'Kaltura')
# Look for Eagle.Platform embeds
mobj = re.search(
# Look for UDN embeds
mobj = re.search(
- r'<iframe[^>]+src="(?P<url>%s)"' % UDNEmbedIE._VALID_URL, webpage)
+ r'<iframe[^>]+src="(?P<url>%s)"' % UDNEmbedIE._PROTOCOL_RELATIVE_VALID_URL, webpage)
if mobj is not None:
return self.url_result(
compat_urlparse.urljoin(url, mobj.group('url')), 'UDNEmbed')
entries = []
for video_url in found:
+ video_url = video_url.replace('\\/', '/')
video_url = compat_urlparse.urljoin(url, video_url)
video_id = compat_urllib_parse_unquote(os.path.basename(video_url))
# here's a fun little line of code for you:
video_id = os.path.splitext(video_id)[0]
+ entry_info_dict = {
+ 'id': video_id,
+ 'uploader': video_uploader,
+ 'title': video_title,
+ 'age_limit': age_limit,
+ }
+
ext = determine_ext(video_url)
if ext == 'smil':
- entries.append({
- 'id': video_id,
- 'formats': self._extract_smil_formats(video_url, video_id),
- 'uploader': video_uploader,
- 'title': video_title,
- 'age_limit': age_limit,
- })
+ entry_info_dict['formats'] = self._extract_smil_formats(video_url, video_id)
elif ext == 'xspf':
return self.playlist_result(self._extract_xspf_playlist(video_url, video_id), video_id)
+ elif ext == 'm3u8':
+ entry_info_dict['formats'] = self._extract_m3u8_formats(video_url, video_id, ext='mp4')
else:
- entries.append({
- 'id': video_id,
- 'url': video_url,
- 'uploader': video_uploader,
- 'title': video_title,
- 'age_limit': age_limit,
- })
+ entry_info_dict['url'] = video_url
+
+ entries.append(entry_info_dict)
if len(entries) == 1:
return entries[0]
ExtractorError,
float_or_none,
int_or_none,
+ str_or_none,
)
class GloboIE(InfoExtractor):
- _VALID_URL = 'https?://.+?\.globo\.com/(?P<id>.+)'
+ _VALID_URL = '(?:globo:|https?://.+?\.globo\.com/(?:[^/]+/)*(?:v/(?:[^/]+/)?|videos/))(?P<id>\d{7,})'
_API_URL_TEMPLATE = 'http://api.globovideos.com/videos/%s/playlist'
_SECURITY_URL_TEMPLATE = 'http://security.video.globo.com/videos/%s/hash?player=flash&version=17.0.0.132&resource_id=%s'
- _VIDEOID_REGEXES = [
- r'\bdata-video-id="(\d+)"',
- r'\bdata-player-videosids="(\d+)"',
- r'<div[^>]+\bid="(\d+)"',
- ]
-
_RESIGN_EXPIRATION = 86400
- _TESTS = [
- {
- 'url': 'http://globotv.globo.com/sportv/futebol-nacional/v/os-gols-de-atletico-mg-3-x-2-santos-pela-24a-rodada-do-brasileirao/3654973/',
- 'md5': '03ebf41cb7ade43581608b7d9b71fab0',
- 'info_dict': {
- 'id': '3654973',
- 'ext': 'mp4',
- 'title': 'Os gols de Atlético-MG 3 x 2 Santos pela 24ª rodada do Brasileirão',
- 'duration': 251.585,
- 'uploader': 'SporTV',
- 'uploader_id': 698,
- 'like_count': int,
- }
- },
- {
- 'url': 'http://g1.globo.com/carros/autoesporte/videos/t/exclusivos-do-g1/v/mercedes-benz-gla-passa-por-teste-de-colisao-na-europa/3607726/',
- 'md5': 'b3ccc801f75cd04a914d51dadb83a78d',
- 'info_dict': {
- 'id': '3607726',
- 'ext': 'mp4',
- 'title': 'Mercedes-Benz GLA passa por teste de colisão na Europa',
- 'duration': 103.204,
- 'uploader': 'Globo.com',
- 'uploader_id': 265,
- 'like_count': int,
- }
- },
- {
- 'url': 'http://g1.globo.com/jornal-nacional/noticia/2014/09/novidade-na-fiscalizacao-de-bagagem-pela-receita-provoca-discussoes.html',
- 'md5': '307fdeae4390ccfe6ba1aa198cf6e72b',
- 'info_dict': {
- 'id': '3652183',
- 'ext': 'mp4',
- 'title': 'Receita Federal explica como vai fiscalizar bagagens de quem retorna ao Brasil de avião',
- 'duration': 110.711,
- 'uploader': 'Rede Globo',
- 'uploader_id': 196,
- 'like_count': int,
- }
+ _TESTS = [{
+ 'url': 'http://g1.globo.com/carros/autoesporte/videos/t/exclusivos-do-g1/v/mercedes-benz-gla-passa-por-teste-de-colisao-na-europa/3607726/',
+ 'md5': 'b3ccc801f75cd04a914d51dadb83a78d',
+ 'info_dict': {
+ 'id': '3607726',
+ 'ext': 'mp4',
+ 'title': 'Mercedes-Benz GLA passa por teste de colisão na Europa',
+ 'duration': 103.204,
+ 'uploader': 'Globo.com',
+ 'uploader_id': '265',
},
- {
- 'url': 'http://globotv.globo.com/canal-brasil/sangue-latino/t/todos-os-videos/v/ator-e-diretor-argentino-ricado-darin-fala-sobre-utopias-e-suas-perdas/3928201/',
- 'md5': 'c1defca721ce25b2354e927d3e4b3dec',
- 'info_dict': {
- 'id': '3928201',
- 'ext': 'mp4',
- 'title': 'Ator e diretor argentino, Ricado Darín fala sobre utopias e suas perdas',
- 'duration': 1472.906,
- 'uploader': 'Canal Brasil',
- 'uploader_id': 705,
- 'like_count': int,
- }
+ }, {
+ 'url': 'http://globoplay.globo.com/v/4581987/',
+ 'md5': 'f36a1ecd6a50da1577eee6dd17f67eff',
+ 'info_dict': {
+ 'id': '4581987',
+ 'ext': 'mp4',
+ 'title': 'Acidentes de trânsito estão entre as maiores causas de queda de energia em SP',
+ 'duration': 137.973,
+ 'uploader': 'Rede Globo',
+ 'uploader_id': '196',
},
- ]
-
- class MD5():
+ }, {
+ 'url': 'http://canalbrasil.globo.com/programas/sangue-latino/videos/3928201.html',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://globosatplay.globo.com/globonews/v/4472924/',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://globotv.globo.com/t/programa/v/clipe-sexo-e-as-negas-adeus/3836166/',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://globotv.globo.com/canal-brasil/sangue-latino/t/todos-os-videos/v/ator-e-diretor-argentino-ricado-darin-fala-sobre-utopias-e-suas-perdas/3928201/',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://canaloff.globo.com/programas/desejar-profundo/videos/4518560.html',
+ 'only_matching': True,
+ }]
+
+ class MD5:
HEX_FORMAT_LOWERCASE = 0
HEX_FORMAT_UPPERCASE = 1
BASE64_PAD_CHARACTER_DEFAULT_COMPLIANCE = ''
def _real_extract(self, url):
video_id = self._match_id(url)
- webpage = self._download_webpage(url, video_id)
- video_id = self._search_regex(self._VIDEOID_REGEXES, webpage, 'video id')
-
video = self._download_json(
self._API_URL_TEMPLATE % video_id, video_id)['videos'][0]
formats = []
for resource in video['resources']:
resource_id = resource.get('_id')
- if not resource_id:
+ if not resource_id or resource_id.endswith('manifest'):
continue
security = self._download_json(
resource_url = resource['url']
signed_url = '%s?h=%s&k=%s' % (resource_url, signed_hash, 'flash')
if resource_id.endswith('m3u8') or resource_url.endswith('.m3u8'):
- formats.extend(self._extract_m3u8_formats(signed_url, resource_id, 'mp4'))
+ m3u8_formats = self._extract_m3u8_formats(
+ signed_url, resource_id, 'mp4', entry_protocol='m3u8_native',
+ m3u8_id='hls', fatal=False)
+ if m3u8_formats:
+ formats.extend(m3u8_formats)
else:
formats.append({
'url': signed_url,
- 'format_id': resource_id,
- 'height': resource.get('height'),
+ 'format_id': 'http-%s' % resource_id,
+ 'height': int_or_none(resource.get('height')),
})
self._sort_formats(formats)
duration = float_or_none(video.get('duration'), 1000)
- like_count = int_or_none(video.get('likes'))
uploader = video.get('channel')
- uploader_id = video.get('channel_id')
+ uploader_id = str_or_none(video.get('channel_id'))
return {
'id': video_id,
'duration': duration,
'uploader': uploader,
'uploader_id': uploader_id,
- 'like_count': like_count,
'formats': formats
}
+
+
+class GloboArticleIE(InfoExtractor):
+ _VALID_URL = 'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/]+)\.html'
+
+ _VIDEOID_REGEXES = [
+ r'\bdata-video-id=["\'](\d{7,})',
+ r'\bdata-player-videosids=["\'](\d{7,})',
+ r'\bvideosIDs\s*:\s*["\'](\d{7,})',
+ r'\bdata-id=["\'](\d{7,})',
+ r'<div[^>]+\bid=["\'](\d{7,})',
+ ]
+
+ _TESTS = [{
+ 'url': 'http://g1.globo.com/jornal-nacional/noticia/2014/09/novidade-na-fiscalizacao-de-bagagem-pela-receita-provoca-discussoes.html',
+ 'md5': '307fdeae4390ccfe6ba1aa198cf6e72b',
+ 'info_dict': {
+ 'id': '3652183',
+ 'ext': 'mp4',
+ 'title': 'Receita Federal explica como vai fiscalizar bagagens de quem retorna ao Brasil de avião',
+ 'duration': 110.711,
+ 'uploader': 'Rede Globo',
+ 'uploader_id': '196',
+ }
+ }, {
+ 'url': 'http://gq.globo.com/Prazeres/Poder/noticia/2015/10/all-o-desafio-assista-ao-segundo-capitulo-da-serie.html',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://gshow.globo.com/programas/tv-xuxa/O-Programa/noticia/2014/01/xuxa-e-junno-namoram-muuuito-em-luau-de-zeze-di-camargo-e-luciano.html',
+ 'only_matching': True,
+ }]
+
+ @classmethod
+ def suitable(cls, url):
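+        # defer to GloboIE for URLs it already matches (direct video URLs)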
+ return False if GloboIE.suitable(url) else super(GloboArticleIE, cls).suitable(url)
+
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+ webpage = self._download_webpage(url, display_id)
+ video_id = self._search_regex(self._VIDEOID_REGEXES, webpage, 'video id')
+ return self.url_result('globo:%s' % video_id, 'Globo')
'width': int(width),
'height': int(height),
} for width, height, video_url in re.findall(
- r'\d+,(\d+),(\d+),"(https?://redirector\.googlevideo\.com.*?)"', webpage)]
+ r'\d+,(\d+),(\d+),"(https?://[^.]+\.googleusercontent.com.*?)"', webpage)]
self._sort_formats(formats)
return {
+++ /dev/null
-# -*- coding: utf-8 -*-
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
-from ..utils import (
- ExtractorError,
- encode_dict,
- int_or_none,
-)
-
-
-class GorillaVidIE(InfoExtractor):
- IE_DESC = 'GorillaVid.in, daclips.in, movpod.in, fastvideo.in, realvid.net and filehoot.com'
- _VALID_URL = r'''(?x)
- https?://(?P<host>(?:www\.)?
- (?:daclips\.in|gorillavid\.in|movpod\.in|fastvideo\.in|realvid\.net|filehoot\.com))/
- (?:embed-)?(?P<id>[0-9a-zA-Z]+)(?:-[0-9]+x[0-9]+\.html)?
- '''
-
- _FILE_NOT_FOUND_REGEX = r'>(?:404 - )?File Not Found<'
-
- _TESTS = [{
- 'url': 'http://gorillavid.in/06y9juieqpmi',
- 'md5': '5ae4a3580620380619678ee4875893ba',
- 'info_dict': {
- 'id': '06y9juieqpmi',
- 'ext': 'flv',
- 'title': 'Rebecca Black My Moment Official Music Video Reaction-6GK87Rc8bzQ',
- 'thumbnail': 're:http://.*\.jpg',
- },
- }, {
- 'url': 'http://gorillavid.in/embed-z08zf8le23c6-960x480.html',
- 'only_matching': True,
- }, {
- 'url': 'http://daclips.in/3rso4kdn6f9m',
- 'md5': '1ad8fd39bb976eeb66004d3a4895f106',
- 'info_dict': {
- 'id': '3rso4kdn6f9m',
- 'ext': 'mp4',
- 'title': 'Micro Pig piglets ready on 16th July 2009-bG0PdrCdxUc',
- 'thumbnail': 're:http://.*\.jpg',
- }
- }, {
- # video with countdown timeout
- 'url': 'http://fastvideo.in/1qmdn1lmsmbw',
- 'md5': '8b87ec3f6564a3108a0e8e66594842ba',
- 'info_dict': {
- 'id': '1qmdn1lmsmbw',
- 'ext': 'mp4',
- 'title': 'Man of Steel - Trailer',
- 'thumbnail': 're:http://.*\.jpg',
- },
- }, {
- 'url': 'http://realvid.net/ctn2y6p2eviw',
- 'md5': 'b2166d2cf192efd6b6d764c18fd3710e',
- 'info_dict': {
- 'id': 'ctn2y6p2eviw',
- 'ext': 'flv',
- 'title': 'rdx 1955',
- 'thumbnail': 're:http://.*\.jpg',
- },
- }, {
- 'url': 'http://movpod.in/0wguyyxi1yca',
- 'only_matching': True,
- }, {
- 'url': 'http://filehoot.com/3ivfabn7573c.html',
- 'info_dict': {
- 'id': '3ivfabn7573c',
- 'ext': 'mp4',
- 'title': 'youtube-dl test video \'äBaW_jenozKc.mp4.mp4',
- 'thumbnail': 're:http://.*\.jpg',
- }
- }]
-
- def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- video_id = mobj.group('id')
-
- url = 'http://%s/%s' % (mobj.group('host'), video_id)
- webpage = self._download_webpage(url, video_id)
-
- if re.search(self._FILE_NOT_FOUND_REGEX, webpage) is not None:
- raise ExtractorError('Video %s does not exist' % video_id, expected=True)
-
- fields = self._hidden_inputs(webpage)
-
- if fields['op'] == 'download1':
- countdown = int_or_none(self._search_regex(
- r'<span id="countdown_str">(?:[Ww]ait)?\s*<span id="cxc">(\d+)</span>\s*(?:seconds?)?</span>',
- webpage, 'countdown', default=None))
- if countdown:
- self._sleep(countdown, video_id)
-
- post = compat_urllib_parse.urlencode(encode_dict(fields))
-
- req = compat_urllib_request.Request(url, post)
- req.add_header('Content-type', 'application/x-www-form-urlencoded')
-
- webpage = self._download_webpage(req, video_id, 'Downloading video page')
-
- title = self._search_regex(
- [r'style="z-index: [0-9]+;">([^<]+)</span>', r'<td nowrap>([^<]+)</td>', r'>Watch (.+) '],
- webpage, 'title', default=None) or self._og_search_title(webpage)
- video_url = self._search_regex(
- r'file\s*:\s*["\'](http[^"\']+)["\'],', webpage, 'file url')
- thumbnail = self._search_regex(
- r'image\s*:\s*["\'](http[^"\']+)["\'],', webpage, 'thumbnail', fatal=False)
-
- formats = [{
- 'format_id': 'sd',
- 'url': video_url,
- 'quality': 1,
- }]
-
- return {
- 'id': video_id,
- 'title': title,
- 'thumbnail': thumbnail,
- 'formats': formats,
- }
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
- compat_urlparse,
-)
+from ..compat import compat_urlparse
from ..utils import (
HEADRequest,
+ sanitized_Request,
str_to_int,
urlencode_postdata,
urlhandle_detect_ext,
r'intTrackId\s*=\s*(\d+)', webpage, 'track ID')
payload = urlencode_postdata({'tracks[]': track_id})
- req = compat_urllib_request.Request(self._PLAYLIST_URL, payload)
+ req = sanitized_Request(self._PLAYLIST_URL, payload)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
track = self._download_json(req, track_id, 'Downloading playlist')[0]
import base64
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
HEADRequest,
+ sanitized_Request,
)
('mediaType', 's'),
('mediaId', video_id),
])
- r = compat_urllib_request.Request(
+ r = sanitized_Request(
'http://www.hotnewhiphop.com/ajax/media/getActions/', data=reqdata)
r.add_header('Content-Type', 'application/x-www-form-urlencoded')
mkd = self._download_json(
import time
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
+ sanitized_Request,
)
data = {'ax': 1, 'ts': time.time()}
data_encoded = compat_urllib_parse.urlencode(data)
complete_url = url + "?" + data_encoded
- request = compat_urllib_request.Request(complete_url)
+ request = sanitized_Request(complete_url)
response, urlh = self._download_webpage_handle(
request, track_id, 'Downloading webpage with the url')
cookie = urlh.headers.get('Set-Cookie', '')
title = track['song']
serve_url = "http://hypem.com/serve/source/%s/%s" % (track_id, key)
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
serve_url, '', {'Content-Type': 'application/json'})
request.add_header('cookie', cookie)
song_data = self._download_json(request, track_id, 'Downloading metadata')
class InstagramIE(InfoExtractor):
- _VALID_URL = r'https://instagram\.com/p/(?P<id>[\da-zA-Z]+)'
- _TEST = {
+ _VALID_URL = r'https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+)'
+ _TESTS = [{
'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
'md5': '0d2da106a9d2631273e192b372806516',
'info_dict': {
'title': 'Video by naomipq',
'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
}
- }
+ }, {
+ 'url': 'https://instagram.com/p/-Cmh1cukG2/',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
video_id = self._match_id(url)
from math import floor
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
ExtractorError,
remove_end,
+ sanitized_Request,
)
(floor(random() * 1073741824), floor(random() * 1073741824))
)
- req = compat_urllib_request.Request(player_url)
+ req = sanitized_Request(player_url)
req.add_header('Referer', url)
playerpage = self._download_webpage(req, video_id)
import json
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
ExtractorError,
+ sanitized_Request,
)
]
}
- request = compat_urllib_request.Request(api_url, json.dumps(data))
+ request = sanitized_Request(api_url, json.dumps(data))
video_json_page = self._download_webpage(
request, video_id, 'Downloading video JSON')
from __future__ import unicode_literals
import re
+import base64
from .common import InfoExtractor
-from ..compat import compat_urllib_parse
+from ..compat import (
+ compat_urllib_parse,
+ compat_urlparse,
+)
from ..utils import (
+ clean_html,
ExtractorError,
int_or_none,
+ unsmuggle_url,
)
(?:
kaltura:(?P<partner_id_s>\d+):(?P<id_s>[0-9a-z_]+)|
https?://
- (:?(?:www|cdnapisec)\.)?kaltura\.com/
+ (:?(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/
(?:
(?:
# flash player
video_id, actions, note='Downloading video info JSON')
def _real_extract(self, url):
+ url, smuggled_data = unsmuggle_url(url, {})
+
mobj = re.match(self._VALID_URL, url)
partner_id = mobj.group('partner_id_s') or mobj.group('partner_id') or mobj.group('partner_id_html5')
entry_id = mobj.group('id_s') or mobj.group('id') or mobj.group('id_html5')
info, source_data = self._get_video_info(entry_id, partner_id)
- formats = [{
- 'format_id': '%(fileExt)s-%(bitrate)s' % f,
- 'ext': f['fileExt'],
- 'tbr': f['bitrate'],
- 'fps': f.get('frameRate'),
- 'filesize_approx': int_or_none(f.get('size'), invscale=1024),
- 'container': f.get('containerFormat'),
- 'vcodec': f.get('videoCodecId'),
- 'height': f.get('height'),
- 'width': f.get('width'),
- 'url': '%s/flavorId/%s' % (info['dataUrl'], f['id']),
- } for f in source_data['flavorAssets']]
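+        # referrer-protected embeds expect the embedding page's origin (scheme://host), base64-encoded, appended to each flavor URL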
+ source_url = smuggled_data.get('source_url')
+ if source_url:
+ referrer = base64.b64encode(
+ '://'.join(compat_urlparse.urlparse(source_url)[:2])
+ .encode('utf-8')).decode('utf-8')
+ else:
+ referrer = None
+
+ formats = []
+ for f in source_data['flavorAssets']:
+ video_url = '%s/flavorId/%s' % (info['dataUrl'], f['id'])
+ if referrer:
+ video_url += '?referrer=%s' % referrer
+ formats.append({
+ 'format_id': '%(fileExt)s-%(bitrate)s' % f,
+ 'ext': f.get('fileExt'),
+ 'tbr': int_or_none(f['bitrate']),
+ 'fps': int_or_none(f.get('frameRate')),
+ 'filesize_approx': int_or_none(f.get('size'), invscale=1024),
+ 'container': f.get('containerFormat'),
+ 'vcodec': f.get('videoCodecId'),
+ 'height': int_or_none(f.get('height')),
+ 'width': int_or_none(f.get('width')),
+ 'url': video_url,
+ })
+ self._check_formats(formats, entry_id)
self._sort_formats(formats)
return {
'id': entry_id,
'title': info['name'],
'formats': formats,
- 'description': info.get('description'),
+ 'description': clean_html(info.get('description')),
'thumbnail': info.get('thumbnailUrl'),
'duration': info.get('duration'),
'timestamp': info.get('createdAt'),
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse_urlparse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse_urlparse
+from ..utils import sanitized_Request
class KeezMoviesIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
- req = compat_urllib_request.Request(url)
+ req = sanitized_Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
- compat_urllib_request,
compat_ord,
)
from ..utils import (
determine_ext,
ExtractorError,
parse_iso8601,
+ sanitized_Request,
int_or_none,
encode_data_uri,
)
'tkey': self.calc_time_key(int(time.time())),
'domain': 'www.letv.com'
}
- play_json_req = compat_urllib_request.Request(
+ play_json_req = sanitized_Request(
'http://api.letv.com/mms/out/video/playJson?' + compat_urllib_parse.urlencode(params)
)
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
from ..compat import (
compat_str,
compat_urllib_parse,
- compat_urllib_request,
)
from ..utils import (
ExtractorError,
clean_html,
int_or_none,
+ sanitized_Request,
)
self._login()
def _login(self):
- (username, password) = self._get_login_info()
+ username, password = self._get_login_info()
if username is None:
return
'remember': 'false',
'stayPut': 'false'
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
login_page = self._download_webpage(
request, None, 'Logging in as %s' % username)
'remember': 'false',
'stayPut': 'false',
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(confirm_form).encode('utf-8'))
login_page = self._download_webpage(
request, None,
expected=True)
raise ExtractorError('Unable to log in')
+ def _logout(self):
+ username, _ = self._get_login_info()
+ if username is None:
+ return
+
+ self._download_webpage(
+ 'http://www.lynda.com/ajax/logout.aspx', None,
+ 'Logging out', 'Unable to log out', fatal=False)
+
class LyndaIE(LyndaBaseIE):
IE_NAME = 'lynda'
def _real_extract(self, url):
video_id = self._match_id(url)
- page = self._download_webpage(
+ video = self._download_json(
'http://www.lynda.com/ajax/player?videoId=%s&type=video' % video_id,
video_id, 'Downloading video JSON')
- video_json = json.loads(page)
- if 'Status' in video_json:
+ if 'Status' in video:
raise ExtractorError(
- 'lynda returned error: %s' % video_json['Message'], expected=True)
+ 'lynda returned error: %s' % video['Message'], expected=True)
- if video_json['HasAccess'] is False:
+ if video.get('HasAccess') is False:
self.raise_login_required('Video %s is only available for members' % video_id)
- video_id = compat_str(video_json['ID'])
- duration = video_json['DurationInSeconds']
- title = video_json['Title']
+ video_id = compat_str(video.get('ID') or video_id)
+ duration = int_or_none(video.get('DurationInSeconds'))
+ title = video['Title']
formats = []
- fmts = video_json.get('Formats')
+ fmts = video.get('Formats')
if fmts:
- formats.extend([
- {
- 'url': fmt['Url'],
- 'ext': fmt['Extension'],
- 'width': fmt['Width'],
- 'height': fmt['Height'],
- 'filesize': fmt['FileSize'],
- 'format_id': str(fmt['Resolution'])
- } for fmt in fmts])
-
- prioritized_streams = video_json.get('PrioritizedStreams')
+ formats.extend([{
+ 'url': f['Url'],
+ 'ext': f.get('Extension'),
+ 'width': int_or_none(f.get('Width')),
+ 'height': int_or_none(f.get('Height')),
+ 'filesize': int_or_none(f.get('FileSize')),
+ 'format_id': compat_str(f.get('Resolution')) if f.get('Resolution') else None,
+ } for f in fmts if f.get('Url')])
+
+ prioritized_streams = video.get('PrioritizedStreams')
if prioritized_streams:
for prioritized_stream_id, prioritized_stream in prioritized_streams.items():
- formats.extend([
- {
- 'url': video_url,
- 'width': int_or_none(format_id),
- 'format_id': '%s-%s' % (prioritized_stream_id, format_id),
- } for format_id, video_url in prioritized_stream.items()
- ])
+ formats.extend([{
+ 'url': video_url,
+ 'width': int_or_none(format_id),
+ 'format_id': '%s-%s' % (prioritized_stream_id, format_id),
+ } for format_id, video_url in prioritized_stream.items()])
self._check_formats(formats, video_id)
self._sort_formats(formats)
- subtitles = self.extract_subtitles(video_id, page)
+ subtitles = self.extract_subtitles(video_id)
return {
'id': video_id,
if srt:
return srt
- def _get_subtitles(self, video_id, webpage):
+ def _get_subtitles(self, video_id):
url = 'http://www.lynda.com/ajax/player?videoId=%s&type=transcript' % video_id
subs = self._download_json(url, None, False)
if subs:
course_path = mobj.group('coursepath')
course_id = mobj.group('courseid')
- page = self._download_webpage(
+ course = self._download_json(
'http://www.lynda.com/ajax/player?courseId=%s&type=course' % course_id,
course_id, 'Downloading course JSON')
- course_json = json.loads(page)
- if 'Status' in course_json and course_json['Status'] == 'NotFound':
+ self._logout()
+
+ if course.get('Status') == 'NotFound':
raise ExtractorError(
'Course %s does not exist' % course_id, expected=True)
# Might want to extract videos right here from video['Formats'] as it seems 'Formats' is not provided
# by the single video API anymore
- for chapter in course_json['Chapters']:
- for video in chapter['Videos']:
- if video['HasAccess'] is False:
+ for chapter in course['Chapters']:
+ for video in chapter.get('Videos', []):
+ if video.get('HasAccess') is False:
unaccessible_videos += 1
continue
- videos.append(video['ID'])
+ if video.get('ID'):
+ videos.append(video['ID'])
if unaccessible_videos > 0:
self._downloader.report_warning(
'Lynda')
for video_id in videos]
- course_title = course_json['Title']
+ course_title = course.get('Title')
return self.playlist_result(entries, course_id, course_title)
+# coding: utf-8
from __future__ import unicode_literals
-import re
-
from .common import InfoExtractor
+from ..compat import compat_urlparse
+from ..utils import (
+ determine_ext,
+ int_or_none,
+ parse_duration,
+ parse_iso8601,
+ xpath_text,
+)
class MDRIE(InfoExtractor):
- _VALID_URL = r'^(?P<domain>https?://(?:www\.)?mdr\.de)/(?:.*)/(?P<type>video|audio)(?P<video_id>[^/_]+)(?:_|\.html)'
+ IE_DESC = 'MDR.DE and KiKA'
+ _VALID_URL = r'https?://(?:www\.)?(?:mdr|kika)\.de/(?:.*)/[a-z]+(?P<id>\d+)(?:_.+?)?\.html'
- # No tests, MDR regularily deletes its videos
- _TEST = {
+ _TESTS = [{
+        # MDR regularly deletes its videos
'url': 'http://www.mdr.de/fakt/video189002.html',
'only_matching': True,
- }
+ }, {
+ # audio
+ 'url': 'http://www.mdr.de/kultur/audio1312272_zc-15948bad_zs-86171fdd.html',
+ 'md5': '64c4ee50f0a791deb9479cd7bbe9d2fa',
+ 'info_dict': {
+ 'id': '1312272',
+ 'ext': 'mp3',
+ 'title': 'Feuilleton vom 30. Oktober 2015',
+ 'duration': 250,
+ 'uploader': 'MITTELDEUTSCHER RUNDFUNK',
+ },
+ }, {
+ 'url': 'http://www.kika.de/baumhaus/videos/video19636.html',
+ 'md5': '4930515e36b06c111213e80d1e4aad0e',
+ 'info_dict': {
+ 'id': '19636',
+ 'ext': 'mp4',
+ 'title': 'Baumhaus vom 30. Oktober 2015',
+ 'duration': 134,
+ 'uploader': 'KIKA',
+ },
+ }, {
+ 'url': 'http://www.kika.de/sendungen/einzelsendungen/weihnachtsprogramm/videos/video8182.html',
+ 'md5': '5fe9c4dd7d71e3b238f04b8fdd588357',
+ 'info_dict': {
+ 'id': '8182',
+ 'ext': 'mp4',
+ 'title': 'Beutolomäus und der geheime Weihnachtswunsch',
+ 'description': 'md5:b69d32d7b2c55cbe86945ab309d39bbd',
+ 'timestamp': 1419047100,
+ 'upload_date': '20141220',
+ 'duration': 4628,
+ 'uploader': 'KIKA',
+ },
+ }, {
+ 'url': 'http://www.kika.de/baumhaus/sendungen/video19636_zc-fea7f8a0_zs-4bf89c60.html',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://www.kika.de/sendungen/einzelsendungen/weihnachtsprogramm/einzelsendung2534.html',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
- m = re.match(self._VALID_URL, url)
- video_id = m.group('video_id')
- domain = m.group('domain')
+ video_id = self._match_id(url)
+
+ webpage = self._download_webpage(url, video_id)
+
+ data_url = self._search_regex(
+ r'dataURL\s*:\s*(["\'])(?P<url>/.+/(?:video|audio)[0-9]+-avCustom\.xml)\1',
+ webpage, 'data url', group='url')
- # determine title and media streams from webpage
- html = self._download_webpage(url, video_id)
+ doc = self._download_xml(
+ compat_urlparse.urljoin(url, data_url), video_id)
- title = self._html_search_regex(r'<h[12]>(.*?)</h[12]>', html, 'title')
- xmlurl = self._search_regex(
- r'dataURL:\'(/(?:.+)/(?:video|audio)[0-9]+-avCustom.xml)', html, 'XML URL')
+ title = xpath_text(doc, ['./title', './broadcast/broadcastName'], 'title', fatal=True)
- doc = self._download_xml(domain + xmlurl, video_id)
formats = []
- for a in doc.findall('./assets/asset'):
- url_el = a.find('./progressiveDownloadUrl')
- if url_el is None:
- continue
- abr = int(a.find('bitrateAudio').text) // 1000
- media_type = a.find('mediaType').text
- format = {
- 'abr': abr,
- 'filesize': int(a.find('fileSize').text),
- 'url': url_el.text,
- }
-
- vbr_el = a.find('bitrateVideo')
- if vbr_el is None:
- format.update({
- 'vcodec': 'none',
- 'format_id': '%s-%d' % (media_type, abr),
- })
- else:
- vbr = int(vbr_el.text) // 1000
- format.update({
- 'vbr': vbr,
- 'width': int(a.find('frameWidth').text),
- 'height': int(a.find('frameHeight').text),
- 'format_id': '%s-%d' % (media_type, vbr),
- })
- formats.append(format)
+ processed_urls = []
+ for asset in doc.findall('./assets/asset'):
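+            # an asset may expose the same media URL through several of its *Url
+            # elements; processed_urls prevents duplicate formats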
+ for source in (
+ 'progressiveDownload',
+ 'dynamicHttpStreamingRedirector',
+ 'adaptiveHttpStreamingRedirector'):
+ url_el = asset.find('./%sUrl' % source)
+ if url_el is None:
+ continue
+
+ video_url = url_el.text
+ if video_url in processed_urls:
+ continue
+
+ processed_urls.append(video_url)
+
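+                # bitrate values in the XML appear to be in bit/s; the scale argument converts them to kbit/s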
+ vbr = int_or_none(xpath_text(asset, './bitrateVideo', 'vbr'), 1000)
+ abr = int_or_none(xpath_text(asset, './bitrateAudio', 'abr'), 1000)
+
+ ext = determine_ext(url_el.text)
+ if ext == 'm3u8':
+ url_formats = self._extract_m3u8_formats(
+ video_url, video_id, 'mp4', entry_protocol='m3u8_native',
+ preference=0, m3u8_id='HLS', fatal=False)
+ elif ext == 'f4m':
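+                    # Akamai HDS manifests typically require the hdcore/plugin query parameters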
+ url_formats = self._extract_f4m_formats(
+ video_url + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id,
+ preference=0, f4m_id='HDS', fatal=False)
+ else:
+ media_type = xpath_text(asset, './mediaType', 'media type', default='MP4')
+ vbr = int_or_none(xpath_text(asset, './bitrateVideo', 'vbr'), 1000)
+ abr = int_or_none(xpath_text(asset, './bitrateAudio', 'abr'), 1000)
+ filesize = int_or_none(xpath_text(asset, './fileSize', 'file size'))
+
+ f = {
+ 'url': video_url,
+ 'format_id': '%s-%d' % (media_type, vbr or abr),
+ 'filesize': filesize,
+ 'abr': abr,
+ 'preference': 1,
+ }
+
+ if vbr:
+ width = int_or_none(xpath_text(asset, './frameWidth', 'width'))
+ height = int_or_none(xpath_text(asset, './frameHeight', 'height'))
+ f.update({
+ 'vbr': vbr,
+ 'width': width,
+ 'height': height,
+ })
+
+ url_formats = [f]
+
+ if not url_formats:
+ continue
+
+ if not vbr:
+ for f in url_formats:
+ abr = f.get('tbr') or abr
+ if 'tbr' in f:
+ del f['tbr']
+ f.update({
+ 'abr': abr,
+ 'vcodec': 'none',
+ })
+
+ formats.extend(url_formats)
+
self._sort_formats(formats)
+ description = xpath_text(doc, './broadcast/broadcastDescription', 'description')
+ timestamp = parse_iso8601(
+ xpath_text(
+ doc, [
+ './broadcast/broadcastDate',
+ './broadcast/broadcastStartDate',
+ './broadcast/broadcastEndDate'],
+ 'timestamp', default=None))
+ duration = parse_duration(xpath_text(doc, './duration', 'duration'))
+ uploader = xpath_text(doc, './rights', 'uploader')
+
return {
'id': video_id,
'title': title,
+ 'description': description,
+ 'timestamp': timestamp,
+ 'duration': duration,
+ 'uploader': uploader,
'formats': formats,
}
compat_parse_qs,
compat_urllib_parse,
compat_urllib_parse_unquote,
- compat_urllib_request,
)
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
+ sanitized_Request,
)
'filters': '0',
'submit': "Continue - I'm over 18",
}
- request = compat_urllib_request.Request(self._FILTER_POST, compat_urllib_parse.urlencode(disclaimer_form))
+ request = sanitized_Request(self._FILTER_POST, compat_urllib_parse.urlencode(disclaimer_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self.report_age_confirmation()
self._download_webpage(request, None, False, 'Unable to confirm age')
return self.url_result('theplatform:%s' % ext_id, 'ThePlatform')
# Retrieve video webpage to extract further information
- req = compat_urllib_request.Request('http://www.metacafe.com/watch/%s/' % video_id)
+ req = sanitized_Request('http://www.metacafe.com/watch/%s/' % video_id)
# AnyClip videos require the flashversion cookie so that we get the link
# to the mp4 file
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
int_or_none,
parse_duration,
parse_filesize,
+ sanitized_Request,
)
('fileId', video_id),
('__RequestVerificationToken', token),
]
- req = compat_urllib_request.Request(
+ req = sanitized_Request(
'http://minhateca.com.br/action/License/Download',
data=compat_urllib_parse.urlencode(token_data))
req.add_header('Content-Type', 'application/x-www-form-urlencoded')
xpath_text,
int_or_none,
ExtractorError,
+ sanitized_Request,
)
mioplayer_path = self._search_regex(
r'src="(/mioplayer/[^"]+)"', webpage, 'ref_path')
+ http_headers = {'Referer': 'http://www.miomio.tv%s' % mioplayer_path}
+
xml_config = self._search_regex(
r'flashvars="type=(?:sina|video)&(.+?)&',
webpage, 'xml config')
'http://www.miomio.tv/mioplayer/mioplayerconfigfiles/xml.php?id=%s&r=%s' % (id, random.randint(100, 999)),
video_id)
- # the following xml contains the actual configuration information on the video file(s)
- vid_config = self._download_xml(
+ vid_config_request = sanitized_Request(
'http://www.miomio.tv/mioplayer/mioplayerconfigfiles/sina.php?{0}'.format(xml_config),
- video_id)
+ headers=http_headers)
- http_headers = {
- 'Referer': 'http://www.miomio.tv%s' % mioplayer_path,
- }
+ # the following xml contains the actual configuration information on the video file(s)
+ vid_config = self._download_xml(vid_config_request, video_id)
if not int_or_none(xpath_text(vid_config, 'timelength')):
raise ExtractorError('Unable to load videos!', expected=True)
webpage = self._download_webpage(url, page_title)
embed_url = self._search_regex(
r'<iframe .*?src="(.+?)"', webpage, 'embed url')
- return self.url_result(embed_url, ie='TechTVMIT')
+ return self.url_result(embed_url)
class OCWMITIE(InfoExtractor):
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import compat_urllib_parse
+from ..compat import (
+ compat_urllib_parse,
+ compat_urlparse,
+)
from ..utils import (
encode_dict,
get_element_by_attribute,
_TESTS = [{
'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/',
- 'md5': '757b0b66cbd7e0a97226d7d3156cb3e9',
+ 'md5': '0ff1a13aebb35d9bc14081ff633dd324',
'info_dict': {
'id': '0NF1jJnxS1Wu3pHrmvFyw2',
'display_id': 'programa-144',
config_url = self._search_regex(
r'data-config\s*=\s*"([^"]+)"', webpage, 'data config url')
+ config_url = compat_urlparse.urljoin(url, config_url)
config = self._download_json(
config_url, display_id, 'Downloading config JSON')
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
int_or_none,
+ sanitized_Request,
)
]
r_json = json.dumps(r)
post = compat_urllib_parse.urlencode({'r': r_json})
- req = compat_urllib_request.Request(self._API_URL, post)
+ req = sanitized_Request(self._API_URL, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
response = self._download_json(req, video_id)
from ..compat import (
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
- compat_urllib_request,
)
+from ..utils import sanitized_Request
class MofosexIE(InfoExtractor):
video_id = mobj.group('id')
url = 'http://www.' + mobj.group('url')
- req = compat_urllib_request.Request(url)
+ req = sanitized_Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
remove_start,
+ sanitized_Request,
)
orig_webpage, 'builtin URL', default=None, group='url')
if builtin_url:
- req = compat_urllib_request.Request(builtin_url)
+ req = sanitized_Request(builtin_url)
req.add_header('Referer', url)
webpage = self._download_webpage(req, video_id, 'Downloading builtin page')
title = self._og_search_title(orig_webpage).strip()
headers = {
b'Content-Type': b'application/x-www-form-urlencoded',
}
- req = compat_urllib_request.Request(url, post, headers)
+ req = sanitized_Request(url, post, headers)
webpage = self._download_webpage(
req, video_id, note='Downloading video page ...')
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
- compat_urllib_parse,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
+ sanitized_Request,
)
'hash': hash_key,
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'http://mooshare.biz/%s' % video_id, compat_urllib_parse.urlencode(download_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
+# coding: utf-8
from __future__ import unicode_literals
-import re
-
from .common import InfoExtractor
-from ..compat import (
- compat_str,
-)
-from ..utils import (
- ExtractorError,
- clean_html,
-)
+from ..utils import sanitized_Request
class MovieClipsIE(InfoExtractor):
- _VALID_URL = r'https?://movieclips\.com/(?P<id>[\da-zA-Z]+)(?:-(?P<display_id>[\da-z-]+))?'
+ _VALID_URL = r'https?://(?:www.)?movieclips\.com/videos/(?P<id>[^/?#]+)'
_TEST = {
- 'url': 'http://movieclips.com/Wy7ZU-my-week-with-marilyn-movie-do-you-love-me/',
+ 'url': 'http://www.movieclips.com/videos/warcraft-trailer-1-561180739597?autoPlay=true&playlistId=5',
'info_dict': {
- 'id': 'Wy7ZU',
- 'display_id': 'my-week-with-marilyn-movie-do-you-love-me',
+ 'id': 'pKIGmG83AqD9',
+ 'display_id': 'warcraft-trailer-1-561180739597',
'ext': 'mp4',
- 'title': 'My Week with Marilyn - Do You Love Me?',
- 'description': 'md5:e86795bd332fe3cff461e7c8dc542acb',
+ 'title': 'Warcraft Trailer 1',
+ 'description': 'Watch Trailer 1 from Warcraft (2016). Legendary’s WARCRAFT is a 3D epic adventure of world-colliding conflict based.',
'thumbnail': 're:^https?://.*\.jpg$',
},
- 'params': {
- # rtmp download
- 'skip_download': True,
- }
+ 'add_ie': ['ThePlatform'],
}
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- video_id = mobj.group('id')
- display_id = mobj.group('display_id')
- show_id = display_id or video_id
-
- config = self._download_xml(
- 'http://config.movieclips.com/player/config/%s' % video_id,
- show_id, 'Downloading player config')
-
- if config.find('./country-region').text == 'false':
- raise ExtractorError(
- '%s said: %s' % (self.IE_NAME, config.find('./region_alert').text), expected=True)
-
- properties = config.find('./video/properties')
- smil_file = properties.attrib['smil_file']
+ display_id = self._match_id(url)
- smil = self._download_xml(smil_file, show_id, 'Downloading SMIL')
- base_url = smil.find('./head/meta').attrib['base']
-
- formats = []
- for video in smil.findall('./body/switch/video'):
- vbr = int(video.attrib['system-bitrate']) / 1000
- src = video.attrib['src']
- formats.append({
- 'url': base_url,
- 'play_path': src,
- 'ext': src.split(':')[0],
- 'vbr': vbr,
- 'format_id': '%dk' % vbr,
- })
-
- self._sort_formats(formats)
-
- title = '%s - %s' % (properties.attrib['clip_movie_title'], properties.attrib['clip_title'])
- description = clean_html(compat_str(properties.attrib['clip_description']))
- thumbnail = properties.attrib['image']
- categories = properties.attrib['clip_categories'].split(',')
+ req = sanitized_Request(url)
+        # it doesn't work if it thinks the browser is too old
+ req.add_header('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/43.0 (Chrome)')
+ webpage = self._download_webpage(req, display_id)
+ theplatform_link = self._html_search_regex(r'src="(http://player.theplatform.com/p/.*?)"', webpage, 'theplatform link')
+ title = self._html_search_regex(r'<title[^>]*>([^>]+)-\s*\d+\s*|\s*Movieclips.com</title>', webpage, 'title')
+ description = self._html_search_meta('description', webpage)
return {
- 'id': video_id,
- 'display_id': display_id,
+ '_type': 'url_transparent',
+ 'url': theplatform_link,
'title': title,
+ 'display_id': display_id,
'description': description,
- 'thumbnail': thumbnail,
- 'categories': categories,
- 'formats': formats,
}
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
- compat_urllib_request,
compat_str,
)
from ..utils import (
find_xpath_attr,
fix_xml_ampersands,
HEADRequest,
+ sanitized_Request,
unescapeHTML,
url_basename,
RegexNotFoundError,
def _extract_mobile_video_formats(self, mtvn_id):
webpage_url = self._MOBILE_TEMPLATE % mtvn_id
- req = compat_urllib_request.Request(webpage_url)
+ req = sanitized_Request(webpage_url)
# Otherwise we get a webpage that would execute some javascript
req.add_header('User-Agent', 'curl/7')
webpage = self._download_webpage(req, mtvn_id,
compat_ord,
compat_urllib_parse,
compat_urllib_parse_unquote,
- compat_urllib_request,
)
from ..utils import (
ExtractorError,
+ sanitized_Request,
)
mobj = re.search(r'data-video-service="/service/data/video/%s/config' % video_id, webpage)
if mobj is not None:
- request = compat_urllib_request.Request('http://www.myvideo.de/service/data/video/%s/config' % video_id, '')
+ request = sanitized_Request('http://www.myvideo.de/service/data/video/%s/config' % video_id, '')
response = self._download_webpage(request, video_id,
'Downloading video info')
info = json.loads(base64.b64decode(response).decode('utf-8'))
from __future__ import unicode_literals
+import re
+
from .common import InfoExtractor
from ..utils import (
- remove_end,
parse_duration,
+ int_or_none,
+ xpath_text,
+ xpath_attr,
)
class NBAIE(InfoExtractor):
- _VALID_URL = r'https?://(?:watch\.|www\.)?nba\.com/(?:nba/)?video(?P<id>/[^?]*?)/?(?:/index\.html)?(?:\?.*)?$'
+ _VALID_URL = r'https?://(?:watch\.|www\.)?nba\.com/(?P<path>(?:[^/]+/)?video/(?P<id>[^?]*?))/?(?:/index\.html)?(?:\?.*)?$'
_TESTS = [{
'url': 'http://www.nba.com/video/games/nets/2012/12/04/0021200253-okc-bkn-recap.nba/index.html',
- 'md5': 'c0edcfc37607344e2ff8f13c378c88a4',
+ 'md5': '9e7729d3010a9c71506fd1248f74e4f4',
'info_dict': {
- 'id': '0021200253-okc-bkn-recap.nba',
- 'ext': 'mp4',
+ 'id': '0021200253-okc-bkn-recap',
+ 'ext': 'flv',
'title': 'Thunder vs. Nets',
'description': 'Kevin Durant scores 32 points and dishes out six assists as the Thunder beat the Nets in Brooklyn.',
'duration': 181,
+ 'timestamp': 1354638466,
+ 'upload_date': '20121204',
},
}, {
'url': 'http://www.nba.com/video/games/hornets/2014/12/05/0021400276-nyk-cha-play5.nba/',
'only_matching': True,
}, {
- 'url': 'http://watch.nba.com/nba/video/channels/playoffs/2015/05/20/0041400301-cle-atl-recap.nba',
+ 'url': 'http://watch.nba.com/video/channels/playoffs/2015/05/20/0041400301-cle-atl-recap.nba',
+ 'md5': 'b2b39b81cf28615ae0c3360a3f9668c4',
'info_dict': {
- 'id': '0041400301-cle-atl-recap.nba',
+ 'id': '0041400301-cle-atl-recap',
'ext': 'mp4',
- 'title': 'NBA GAME TIME | Video: Hawks vs. Cavaliers Game 1',
+ 'title': 'Hawks vs. Cavaliers Game 1',
'description': 'md5:8094c3498d35a9bd6b1a8c396a071b4d',
'duration': 228,
- },
- 'params': {
- 'skip_download': True,
+ 'timestamp': 1432134543,
+ 'upload_date': '20150520',
}
}]
def _real_extract(self, url):
- video_id = self._match_id(url)
- webpage = self._download_webpage(url, video_id)
-
- video_url = 'http://ht-mobile.cdn.turner.com/nba/big' + video_id + '_nba_1280x720.mp4'
+ path, video_id = re.match(self._VALID_URL, url).groups()
+ video_info = self._download_xml('http://www.nba.com/%s.xml' % path, video_id)
+ video_id = xpath_text(video_info, 'slug')
+ title = xpath_text(video_info, 'headline')
+ description = xpath_text(video_info, 'description')
+ duration = parse_duration(xpath_text(video_info, 'length'))
+ timestamp = int_or_none(xpath_attr(video_info, 'dateCreated', 'uts'))
- shortened_video_id = video_id.rpartition('/')[2]
- title = remove_end(
- self._og_search_title(webpage, default=shortened_video_id), ' : NBA.com')
+ thumbnails = []
+ for image in video_info.find('images'):
+ thumbnails.append({
+ 'id': image.attrib.get('cut'),
+ 'url': image.text,
+ 'width': int_or_none(image.attrib.get('width')),
+ 'height': int_or_none(image.attrib.get('height')),
+ })
- description = self._og_search_description(webpage)
- duration_str = self._html_search_meta(
- 'duration', webpage, 'duration', default=None)
- if not duration_str:
- duration_str = self._html_search_regex(
- r'Duration:</b>\s*(\d+:\d+)', webpage, 'duration', fatal=False)
- duration = parse_duration(duration_str)
+ formats = []
+ for video_file in video_info.find('files').iter('file'):
+ video_url = video_file.text
+ if video_url.startswith('/'):
+ continue
+ if video_url.endswith('.m3u8'):
+ formats.extend(self._extract_m3u8_formats(video_url, video_id, m3u8_id='hls'))
+ elif video_url.endswith('.f4m'):
+ formats.extend(self._extract_f4m_formats(video_url + '?hdcore=3.4.1.1', video_id, f4m_id='hds'))
+ else:
+ key = video_file.attrib.get('bitrate')
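+                # the bitrate attribute has the form WIDTHxHEIGHT with an optional _TBR suffix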
+ width, height, bitrate = re.search(r'(\d+)x(\d+)(?:_(\d+))?', key).groups()
+ formats.append({
+ 'format_id': key,
+ 'url': video_url,
+ 'width': int_or_none(width),
+ 'height': int_or_none(height),
+ 'tbr': int_or_none(bitrate),
+ })
+ self._sort_formats(formats)
return {
- 'id': shortened_video_id,
- 'url': video_url,
+ 'id': video_id,
'title': title,
'description': description,
'duration': duration,
+ 'timestamp': timestamp,
+ 'thumbnails': thumbnails,
+ 'formats': formats,
}
class NDRIE(NDRBaseIE):
IE_NAME = 'ndr'
IE_DESC = 'NDR.de - Norddeutscher Rundfunk'
- _VALID_URL = r'https?://www\.ndr\.de/(?:[^/]+/)+(?P<id>[^/?#]+),[\da-z]+\.html'
+ _VALID_URL = r'https?://www\.ndr\.de/(?:[^/]+/)*(?P<id>[^/?#]+),[\da-z]+\.html'
_TESTS = [{
# httpVideo, same content id
'url': 'http://www.ndr.de/fernsehen/Party-Poette-und-Parade,hafengeburtstag988.html',
'params': {
'skip_download': True,
},
+ }, {
+ 'url': 'https://www.ndr.de/Fettes-Brot-Ferris-MC-und-Thees-Uhlmann-live-on-stage,festivalsommer116.html',
+ 'only_matching': True,
}]
def _extract_embed(self, webpage, display_id):
class NJoyIE(NDRBaseIE):
IE_NAME = 'njoy'
IE_DESC = 'N-JOY'
- _VALID_URL = r'https?://www\.n-joy\.de/(?:[^/]+/)+(?:(?P<display_id>[^/?#]+),)?(?P<id>[\da-z]+)\.html'
+ _VALID_URL = r'https?://www\.n-joy\.de/(?:[^/]+/)*(?:(?P<display_id>[^/?#]+),)?(?P<id>[\da-z]+)\.html'
_TESTS = [{
# httpVideo, same content id
'url': 'http://www.n-joy.de/entertainment/comedy/comedy_contest/Benaissa-beim-NDR-Comedy-Contest,comedycontest2480.html',
class NDREmbedIE(NDREmbedBaseIE):
IE_NAME = 'ndr:embed'
- _VALID_URL = r'https?://www\.ndr\.de/(?:[^/]+/)+(?P<id>[\da-z]+)-(?:player|externalPlayer)\.html'
+ _VALID_URL = r'https?://www\.ndr\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)\.html'
_TESTS = [{
'url': 'http://www.ndr.de/fernsehen/sendungen/ndr_aktuell/ndraktuell28488-player.html',
'md5': '8b9306142fe65bbdefb5ce24edb6b0a9',
class NJoyEmbedIE(NDREmbedBaseIE):
IE_NAME = 'njoy:embed'
- _VALID_URL = r'https?://www\.n-joy\.de/(?:[^/]+/)+(?P<id>[\da-z]+)-(?:player|externalPlayer)_[^/]+\.html'
+ _VALID_URL = r'https?://www\.n-joy\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)_[^/]+\.html'
_TESTS = [{
# httpVideo
'url': 'http://www.n-joy.de/events/reeperbahnfestival/doku948-player_image-bc168e87-5263-4d6d-bd27-bb643005a6de_theme-n-joy.html',
from .common import InfoExtractor
from ..compat import (
- compat_urllib_request,
compat_urllib_parse,
compat_str,
compat_itertools_count,
)
+from ..utils import sanitized_Request
class NetEaseMusicBaseIE(InfoExtractor):
if not details:
continue
formats.append({
- 'url': 'http://m1.music.126.net/%s/%s.%s' %
+ 'url': 'http://m5.music.126.net/%s/%s.%s' %
(cls._encrypt(details['dfsId']), details['dfsId'],
details['extension']),
'ext': details.get('extension'),
return int(round(ms / 1000.0))
def query_api(self, endpoint, video_id, note):
- req = compat_urllib_request.Request('%s%s' % (self._API_BASE, endpoint))
+ req = sanitized_Request('%s%s' % (self._API_BASE, endpoint))
req.add_header('Referer', self._API_BASE)
return self._download_json(req, video_id, note)
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
- compat_urllib_parse,
-)
+from ..compat import compat_urllib_parse
+from ..utils import sanitized_Request
class NFBIE(InfoExtractor):
uploader = self._html_search_regex(r'<em class="director-name" itemprop="name">([^<]+)</em>',
page, 'director name', fatal=False)
- request = compat_urllib_request.Request('https://www.nfb.ca/film/%s/player_config' % video_id,
- compat_urllib_parse.urlencode({'getConfig': 'true'}).encode('ascii'))
+ request = sanitized_Request(
+ 'https://www.nfb.ca/film/%s/player_config' % video_id,
+ compat_urllib_parse.urlencode({'getConfig': 'true'}).encode('ascii'))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
request.add_header('X-NFB-Referer', 'http://www.nfb.ca/medias/flash/NFBVideoPlayer.swf')
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
- compat_urllib_request,
compat_urlparse,
)
from ..utils import (
int_or_none,
parse_duration,
parse_iso8601,
+ sanitized_Request,
xpath_text,
determine_ext,
)
'password': password,
}
login_data = compat_urllib_parse.urlencode(encode_dict(login_form_strs)).encode('utf-8')
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'https://secure.nicovideo.jp/secure/login', login_data)
login_results = self._download_webpage(
request, None, note='Logging in', errnote='Unable to log in')
'k': thumb_play_key,
'v': video_id
})
- flv_info_request = compat_urllib_request.Request(
+ flv_info_request = sanitized_Request(
'http://ext.nicovideo.jp/thumb_watch', flv_info_data,
{'Content-Type': 'application/x-www-form-urlencoded'})
flv_info_webpage = self._download_webpage(
from ..compat import (
compat_str,
compat_urllib_parse,
- compat_urllib_request,
)
from ..utils import (
clean_html,
int_or_none,
float_or_none,
parse_iso8601,
+ sanitized_Request,
)
'username': username,
'password': password,
}
- request = compat_urllib_request.Request(self._LOGIN_URL, compat_urllib_parse.urlencode(login_form))
+ request = sanitized_Request(self._LOGIN_URL, compat_urllib_parse.urlencode(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded; charset=UTF-8')
login = self._download_json(request, None, 'Logging in as %s' % username)
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
ExtractorError,
+ sanitized_Request,
urlencode_postdata,
xpath_text,
xpath_with_ns,
'op': 'download1',
'method_free': 'Continue to Video',
}
- req = compat_urllib_request.Request(url, urlencode_postdata(fields))
+ req = sanitized_Request(url, urlencode_postdata(fields))
req.add_header('Content-type', 'application/x-www-form-urlencoded')
webpage = self._download_webpage(req, video_id,
'Downloading download page')
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urlparse,
-)
+from ..compat import compat_urlparse
from ..utils import (
ExtractorError,
+ NO_DEFAULT,
+ encode_dict,
+ sanitized_Request,
+ urlencode_postdata,
)
}
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- video_id = mobj.group('id')
+ video_id = self._match_id(url)
- page = self._download_webpage(
- 'http://%s/video/%s' % (self._HOST, video_id), video_id, 'Downloading video page')
+ url = 'http://%s/video/%s' % (self._HOST, video_id)
- if re.search(self._FILE_DELETED_REGEX, page) is not None:
- raise ExtractorError('Video %s does not exist' % video_id, expected=True)
+ webpage = self._download_webpage(
+ url, video_id, 'Downloading video page')
- filekey = self._search_regex(self._FILEKEY_REGEX, page, 'filekey')
+ if re.search(self._FILE_DELETED_REGEX, webpage) is not None:
+ raise ExtractorError('Video %s does not exist' % video_id, expected=True)
- title = self._html_search_regex(self._TITLE_REGEX, page, 'title', fatal=False)
- description = self._html_search_regex(self._DESCRIPTION_REGEX, page, 'description', default='', fatal=False)
+ def extract_filekey(default=NO_DEFAULT):
+ return self._search_regex(
+ self._FILEKEY_REGEX, webpage, 'filekey', default=default)
+
+ filekey = extract_filekey(default=None)
+
+ if not filekey:
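+            # the page may first show a confirmation form; submit its hidden inputs
+            # to reach the actual video page and retry the filekey extraction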
+ fields = self._hidden_inputs(webpage)
+ post_url = self._search_regex(
+ r'<form[^>]+action=(["\'])(?P<url>.+?)\1', webpage,
+ 'post url', default=url, group='url')
+ if not post_url.startswith('http'):
+ post_url = compat_urlparse.urljoin(url, post_url)
+ request = sanitized_Request(
+ post_url, urlencode_postdata(encode_dict(fields)))
+ request.add_header('Content-Type', 'application/x-www-form-urlencoded')
+ request.add_header('Referer', post_url)
+ webpage = self._download_webpage(
+ request, video_id, 'Downloading continue to the video page')
+
+ filekey = extract_filekey()
+
+ title = self._html_search_regex(self._TITLE_REGEX, webpage, 'title', fatal=False)
+ description = self._html_search_regex(self._DESCRIPTION_REGEX, webpage, 'description', default='', fatal=False)
api_response = self._download_webpage(
'http://%s/api/player.api.php?key=%s&file=%s' % (self._HOST, filekey, video_id), video_id,
# encoding: utf-8
from __future__ import unicode_literals
-from .brightcove import BrightcoveIE
+from .brightcove import BrightcoveLegacyIE
from .common import InfoExtractor
-from ..utils import ExtractorError
-from ..compat import (
- compat_str,
- compat_urllib_request,
+from ..compat import compat_str
+from ..utils import (
+ ExtractorError,
+ sanitized_Request,
)
'http://www.nowness.com/iframe?id=%s' % video_id, video_id,
note='Downloading player JavaScript',
errnote='Unable to download player JavaScript')
- bc_url = BrightcoveIE._extract_brightcove_url(player_code)
+ bc_url = BrightcoveLegacyIE._extract_brightcove_url(player_code)
if bc_url is None:
raise ExtractorError('Could not find player definition')
- return self.url_result(bc_url, 'Brightcove')
+ return self.url_result(bc_url, 'BrightcoveLegacy')
elif source == 'vimeo':
return self.url_result('http://vimeo.com/%s' % video_id, 'Vimeo')
elif source == 'youtube':
def _api_request(self, url, request_path):
display_id = self._match_id(url)
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'http://api.nowness.com/api/' + request_path % display_id,
headers={
'X-Nowness-Language': 'zh-cn' if 'cn.nowness.com' in url else 'en-us',
# coding: utf-8
from __future__ import unicode_literals
+import re
+
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
)
-class NowTVIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?nowtv\.(?:de|at|ch)/(?:rtl|rtl2|rtlnitro|superrtl|ntv|vox)/(?P<id>.+?)/(?:player|preview)'
+class NowTVBaseIE(InfoExtractor):
+ _VIDEO_FIELDS = (
+ 'id', 'title', 'free', 'geoblocked', 'articleLong', 'articleShort',
+ 'broadcastStartDate', 'seoUrl', 'duration', 'files',
+ 'format.defaultImage169Format', 'format.defaultImage169Logo')
+
+ def _extract_video(self, info, display_id=None):
+ video_id = compat_str(info['id'])
+
+ files = info['files']
+ if not files:
+ if info.get('geoblocked', False):
+ raise ExtractorError(
+ 'Video %s is not available from your location due to geo restriction' % video_id,
+ expected=True)
+ if not info.get('free', True):
+ raise ExtractorError(
+ 'Video %s is not available for free' % video_id, expected=True)
+
+ formats = []
+ for item in files['items']:
+ if determine_ext(item['path']) != 'f4v':
+ continue
+ app, play_path = remove_start(item['path'], '/').split('/', 1)
+ formats.append({
+ 'url': 'rtmpe://fms.rtl.de',
+ 'app': app,
+ 'play_path': 'mp4:%s' % play_path,
+ 'ext': 'flv',
+ 'page_url': 'http://rtlnow.rtl.de',
+ 'player_url': 'http://cdn.static-fra.de/now/vodplayer.swf',
+ 'tbr': int_or_none(item.get('bitrate')),
+ })
+ self._sort_formats(formats)
+
+ title = info['title']
+ description = info.get('articleLong') or info.get('articleShort')
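+        # broadcastStartDate separates date and time with a space, hence the explicit delimiter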
+ timestamp = parse_iso8601(info.get('broadcastStartDate'), ' ')
+ duration = parse_duration(info.get('duration'))
+
+ f = info.get('format', {})
+ thumbnail = f.get('defaultImage169Format') or f.get('defaultImage169Logo')
+
+ return {
+ 'id': video_id,
+ 'display_id': display_id or info.get('seoUrl'),
+ 'title': title,
+ 'description': description,
+ 'thumbnail': thumbnail,
+ 'timestamp': timestamp,
+ 'duration': duration,
+ 'formats': formats,
+ }
+
+
+class NowTVIE(NowTVBaseIE):
+ _VALID_URL = r'https?://(?:www\.)?nowtv\.(?:de|at|ch)/(?:rtl|rtl2|rtlnitro|superrtl|ntv|vox)/(?P<show_id>[^/]+)/(?:list/[^/]+/)?(?P<id>[^/]+)/(?:player|preview)'
_TESTS = [{
# rtl
'id': '203519',
'display_id': 'bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit',
'ext': 'flv',
- 'title': 'Die neuen Bauern und eine Hochzeit',
+ 'title': 'Inka Bause stellt die neuen Bauern vor',
'description': 'md5:e234e1ed6d63cf06be5c070442612e7e',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1432580700,
}]
def _real_extract(self, url):
- display_id = self._match_id(url)
- display_id_split = display_id.split('/')
- if len(display_id) > 2:
- display_id = '/'.join((display_id_split[0], display_id_split[-1]))
+ mobj = re.match(self._VALID_URL, url)
+ display_id = '%s/%s' % (mobj.group('show_id'), mobj.group('id'))
info = self._download_json(
- 'https://api.nowtv.de/v3/movies/%s?fields=id,title,free,geoblocked,articleLong,articleShort,broadcastStartDate,seoUrl,duration,format,files' % display_id,
- display_id)
+ 'https://api.nowtv.de/v3/movies/%s?fields=%s'
+ % (display_id, ','.join(self._VIDEO_FIELDS)), display_id)
- video_id = compat_str(info['id'])
+ return self._extract_video(info, display_id)
- files = info['files']
- if not files:
- if info.get('geoblocked', False):
- raise ExtractorError(
- 'Video %s is not available from your location due to geo restriction' % video_id,
- expected=True)
- if not info.get('free', True):
- raise ExtractorError(
- 'Video %s is not available for free' % video_id, expected=True)
- formats = []
- for item in files['items']:
- if determine_ext(item['path']) != 'f4v':
- continue
- app, play_path = remove_start(item['path'], '/').split('/', 1)
- formats.append({
- 'url': 'rtmpe://fms.rtl.de',
- 'app': app,
- 'play_path': 'mp4:%s' % play_path,
- 'ext': 'flv',
- 'page_url': 'http://rtlnow.rtl.de',
- 'player_url': 'http://cdn.static-fra.de/now/vodplayer.swf',
- 'tbr': int_or_none(item.get('bitrate')),
- })
- self._sort_formats(formats)
+class NowTVListIE(NowTVBaseIE):
+ _VALID_URL = r'https?://(?:www\.)?nowtv\.(?:de|at|ch)/(?:rtl|rtl2|rtlnitro|superrtl|ntv|vox)/(?P<show_id>[^/]+)/list/(?P<id>[^?/#&]+)$'
- title = info['title']
- description = info.get('articleLong') or info.get('articleShort')
- timestamp = parse_iso8601(info.get('broadcastStartDate'), ' ')
- duration = parse_duration(info.get('duration'))
+ _SHOW_FIELDS = ('title', )
+ _SEASON_FIELDS = ('id', 'headline', 'seoheadline', )
- f = info.get('format', {})
- thumbnail = f.get('defaultImage169Format') or f.get('defaultImage169Logo')
+ _TESTS = [{
+ 'url': 'http://www.nowtv.at/rtl/stern-tv/list/aktuell',
+ 'info_dict': {
+ 'id': '17006',
+ 'title': 'stern TV - Aktuell',
+ },
+ 'playlist_count': 1,
+ }, {
+ 'url': 'http://www.nowtv.at/rtl/das-supertalent/list/free-staffel-8',
+ 'info_dict': {
+ 'id': '20716',
+ 'title': 'Das Supertalent - FREE Staffel 8',
+ },
+ 'playlist_count': 14,
+ }]
- return {
- 'id': video_id,
- 'display_id': display_id,
- 'title': title,
- 'description': description,
- 'thumbnail': thumbnail,
- 'timestamp': timestamp,
- 'duration': duration,
- 'formats': formats,
- }
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ show_id = mobj.group('show_id')
+ season_id = mobj.group('id')
+
+ fields = []
+ fields.extend(self._SHOW_FIELDS)
+ fields.extend('formatTabs.%s' % field for field in self._SEASON_FIELDS)
+ fields.extend(
+ 'formatTabs.formatTabPages.container.movies.%s' % field
+ for field in self._VIDEO_FIELDS)
+
+ list_info = self._download_json(
+ 'https://api.nowtv.de/v3/formats/seo?fields=%s&name=%s.php'
+ % (','.join(fields), show_id),
+ season_id)
+
+ season = next(
+ season for season in list_info['formatTabs']['items']
+ if season.get('seoheadline') == season_id)
+
+ title = '%s - %s' % (list_info['title'], season['headline'])
+
+ entries = []
+ for container in season['formatTabPages']['items']:
+ for info in ((container.get('container') or {}).get('movies') or {}).get('items') or []:
+ entries.append(self._extract_video(info))
+
+ return self.playlist_result(
+ entries, compat_str(season.get('id') or season_id), title)
IE_NAME = 'nowvideo'
IE_DESC = 'NowVideo'
- _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'nowvideo\.(?:ch|ec|sx|eu|at|ag|co|li)'}
+ _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'nowvideo\.(?:to|ch|ec|sx|eu|at|ag|co|li)'}
- _HOST = 'www.nowvideo.ch'
+ _HOST = 'www.nowvideo.to'
_FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
_FILEKEY_REGEX = r'var fkzd="([^"]+)";'
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
+ determine_ext,
ExtractorError,
float_or_none,
parse_duration,
'http://v8.psapi.nrk.no/mediaelement/%s' % video_id,
video_id, 'Downloading media JSON')
- if data['usageRights']['isGeoBlocked']:
- raise ExtractorError(
- 'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
- expected=True)
+ media_url = data.get('mediaUrl')
- video_url = data['mediaUrl'] + '?hdcore=3.5.0&plugin=aasp-3.5.0.151.81'
+ if not media_url:
+ if data['usageRights']['isGeoBlocked']:
+ raise ExtractorError(
+ 'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
+ expected=True)
+
+ if determine_ext(media_url) == 'f4m':
+ formats = self._extract_f4m_formats(
+ media_url + '?hdcore=3.5.0&plugin=aasp-3.5.0.151.81', video_id, f4m_id='hds')
+ else:
+ formats = [{
+ 'url': media_url,
+ 'ext': 'flv',
+ }]
duration = parse_duration(data.get('duration'))
return {
'id': video_id,
- 'url': video_url,
- 'ext': 'flv',
'title': data['title'],
'description': data['description'],
'duration': duration,
'thumbnail': thumbnail,
+ 'formats': formats,
}
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
parse_duration,
+ sanitized_Request,
unified_strdate,
)
formats = []
for dwnld_speed, format_id in [(0, '3gp'), (5, 'mp4')]:
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'http://m.nuvid.com/play/%s' % video_id)
request.add_header('Cookie', 'skip_download_page=1; dwnld_speed=%d; adv_show=1' % dwnld_speed)
webpage = self._download_webpage(
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..utils import (
- js_to_json,
-)
+from ..utils import js_to_json
class PatreonIE(InfoExtractor):
'password': password,
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'https://www.patreon.com/processLogin',
compat_urllib_parse.urlencode(login_form).encode('utf-8')
)
ExtractorError,
determine_ext,
int_or_none,
+ strip_jsonp,
unified_strdate,
US_RATINGS,
)
# Article with embedded player (or direct video)
(?:www\.)?pbs\.org/(?:[^/]+/){2,5}(?P<presumptive_id>[^/]+?)(?:\.html)?/?(?:$|[?\#]) |
# Player
- video\.pbs\.org/(?:widget/)?partnerplayer/(?P<player_id>[^/]+)/
+ (?:video|player)\.pbs\.org/(?:widget/)?partnerplayer/(?P<player_id>[^/]+)/
)
'''
'params': {
'skip_download': True, # requires ffmpeg
},
+ },
+ {
+ # Frontline video embedded via flp2012.js
+ 'url': 'http://www.pbs.org/wgbh/pages/frontline/the-atomic-artists',
+ 'info_dict': {
+ 'id': '2070868960',
+ 'display_id': 'the-atomic-artists',
+ 'ext': 'mp4',
+ 'title': 'FRONTLINE - The Atomic Artists',
+ 'description': 'md5:f5bfbefadf421e8bb8647602011caf8e',
+ 'duration': 723,
+ 'thumbnail': 're:^https?://.*\.jpg$',
+ },
+ 'params': {
+ 'skip_download': True, # requires ffmpeg
+ },
+ },
+ {
+ 'url': 'http://player.pbs.org/widget/partnerplayer/2365297708/?start=0&end=0&chapterbar=false&endscreen=false&topbar=true',
+ 'only_matching': True,
}
]
_ERRORS = {
if media_id:
return media_id, presumptive_id, upload_date
- url = self._search_regex(
- r'(?s)<iframe[^>]+?(?:[a-z-]+?=["\'].*?["\'][^>]+?)*?\bsrc=["\']([^\'"]+partnerplayer[^\'"]+)["\']',
- webpage, 'player URL')
+        # Frontline video embedded via flp
+ video_id = self._search_regex(
+ r'videoid\s*:\s*"([\d+a-z]{7,})"', webpage, 'videoid', default=None)
+ if video_id:
+ # pkg_id calculation is reverse engineered from
+ # http://www.pbs.org/wgbh/pages/frontline/js/flp2012.js
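+            # (drop the first 7 characters of the videoid, keep the part after an
+            # optional 'q', then parse the remainder as a hexadecimal number)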
+ prg_id = self._search_regex(
+ r'videoid\s*:\s*"([\d+a-z]{7,})"', webpage, 'videoid')[7:]
+ if 'q' in prg_id:
+ prg_id = prg_id.split('q')[1]
+ prg_id = int(prg_id, 16)
+ getdir = self._download_json(
+ 'http://www.pbs.org/wgbh/pages/frontline/.json/getdir/getdir%d.json' % prg_id,
+ presumptive_id, 'Downloading getdir JSON',
+ transform_source=strip_jsonp)
+ return getdir['mid'], presumptive_id, upload_date
+
+ for iframe in re.findall(r'(?s)<iframe(.+?)></iframe>', webpage):
+ url = self._search_regex(
+ r'src=(["\'])(?P<url>.+?partnerplayer.+?)\1', iframe,
+ 'player URL', default=None, group='url')
+ if url:
+ break
+
mobj = re.match(self._VALID_URL, url)
player_id = mobj.group('player_id')
return self.playlist_result(entries, display_id)
info = self._download_json(
- 'http://video.pbs.org/videoInfo/%s?format=json&type=partner' % video_id,
+ 'http://player.pbs.org/videoInfo/%s?format=json&type=partner' % video_id,
display_id)
formats = []
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
from ..utils import parse_iso8601
class PeriscopeIE(InfoExtractor):
IE_DESC = 'Periscope'
- _VALID_URL = r'https?://(?:www\.)?periscope\.tv/w/(?P<id>[^/?#]+)'
- _TEST = {
+ _VALID_URL = r'https?://(?:www\.)?periscope\.tv/[^/]+/(?P<id>[^/?#]+)'
+    # Live (not yet expired) example URLs can be found at http://onperiscope.com/
+ _TESTS = [{
'url': 'https://www.periscope.tv/w/aJUQnjY3MjA3ODF8NTYxMDIyMDl2zCg2pECBgwTqRpQuQD352EMPTKQjT4uqlM3cgWFA-g==',
'md5': '65b57957972e503fcbbaeed8f4fa04ca',
'info_dict': {
'uploader_id': '1465763',
},
'skip': 'Expires in 24 hours',
- }
+ }, {
+ 'url': 'https://www.periscope.tv/w/1ZkKzPbMVggJv',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.periscope.tv/bastaakanoggano/1OdKrlkZZjOJX',
+ 'only_matching': True,
+ }]
- def _call_api(self, method, token):
+ def _call_api(self, method, value):
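+        # values longer than 13 characters are session tokens, shorter ones are broadcast ids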
+ attribute = 'token' if len(value) > 13 else 'broadcast_id'
return self._download_json(
- 'https://api.periscope.tv/api/v2/%s?token=%s' % (method, token), token)
+ 'https://api.periscope.tv/api/v2/%s?%s=%s' % (method, attribute, value), value)
def _real_extract(self, url):
token = self._match_id(url)
'thumbnails': thumbnails,
'formats': formats,
}
-
-
-class QuickscopeIE(InfoExtractor):
- IE_DESC = 'Quick Scope'
- _VALID_URL = r'https?://watchonperiscope\.com/broadcast/(?P<id>\d+)'
- _TEST = {
- 'url': 'https://watchonperiscope.com/broadcast/56180087',
- 'only_matching': True,
- }
-
- def _real_extract(self, url):
- broadcast_id = self._match_id(url)
- request = compat_urllib_request.Request(
- 'https://watchonperiscope.com/api/accessChannel', compat_urllib_parse.urlencode({
- 'broadcast_id': broadcast_id,
- 'entry_ticket': '',
- 'from_push': 'false',
- 'uses_sessions': 'true',
- }).encode('utf-8'))
- return self.url_result(
- self._download_json(request, broadcast_id)['share_url'], 'Periscope')
import os.path
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
+ sanitized_Request,
)
headers = {
b'Content-Type': b'application/x-www-form-urlencoded',
}
- req = compat_urllib_request.Request(url, post, headers)
+ req = sanitized_Request(url, post, headers)
webpage = self._download_webpage(
req, video_id, note='Downloading video page ...')
from __future__ import unicode_literals
-import re
import json
+import random
+import collections
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urllib_parse,
- compat_urllib_request,
compat_urlparse,
)
from ..utils import (
ExtractorError,
int_or_none,
parse_duration,
+ sanitized_Request,
)
-class PluralsightIE(InfoExtractor):
+class PluralsightBaseIE(InfoExtractor):
+ _API_BASE = 'http://app.pluralsight.com'
+
+
+class PluralsightIE(PluralsightBaseIE):
IE_NAME = 'pluralsight'
- _VALID_URL = r'https?://(?:www\.)?pluralsight\.com/training/player\?author=(?P<author>[^&]+)&name=(?P<name>[^&]+)(?:&mode=live)?&clip=(?P<clip>\d+)&course=(?P<course>[^&]+)'
- _LOGIN_URL = 'https://www.pluralsight.com/id/'
+ _VALID_URL = r'https?://(?:(?:www|app)\.)?pluralsight\.com/training/player\?'
+ _LOGIN_URL = 'https://app.pluralsight.com/id/'
+
_NETRC_MACHINE = 'pluralsight'
- _TEST = {
+ _TESTS = [{
'url': 'http://www.pluralsight.com/training/player?author=mike-mckeown&name=hosting-sql-server-windows-azure-iaas-m7-mgmt&mode=live&clip=3&course=hosting-sql-server-windows-azure-iaas',
'md5': '4d458cf5cf4c593788672419a8dd4cf8',
'info_dict': {
'duration': 338,
},
'skip': 'Requires pluralsight account credentials',
- }
+ }, {
+ 'url': 'https://app.pluralsight.com/training/player?course=angularjs-get-started&author=scott-allen&name=angularjs-get-started-m1-introduction&clip=0&mode=live',
+ 'only_matching': True,
+ }, {
+        # available without a Pluralsight account
+ 'url': 'http://app.pluralsight.com/training/player?author=scott-allen&name=angularjs-get-started-m1-introduction&mode=live&clip=0&course=angularjs-get-started',
+ 'only_matching': True,
+ }]
def _real_initialize(self):
self._login()
def _login(self):
(username, password) = self._get_login_info()
if username is None:
- self.raise_login_required('Pluralsight account is required')
+ return
login_page = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login page')
if not post_url.startswith('http'):
post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
post_url, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
if error:
raise ExtractorError('Unable to login: %s' % error, expected=True)
+ if all(p not in response for p in ('__INITIAL_STATE__', '"currentUser"')):
+ raise ExtractorError('Unable to log in')
+
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- author = mobj.group('author')
- name = mobj.group('name')
- clip_id = mobj.group('clip')
- course = mobj.group('course')
+ qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+
+ author = qs.get('author', [None])[0]
+ name = qs.get('name', [None])[0]
+ clip_id = qs.get('clip', [None])[0]
+ course = qs.get('course', [None])[0]
+
+ if any(not f for f in (author, name, clip_id, course,)):
+ raise ExtractorError('Invalid URL', expected=True)
display_id = '%s-%s' % (name, clip_id)
webpage = self._download_webpage(url, display_id)
- collection = self._parse_json(
- self._search_regex(
- r'moduleCollection\s*:\s*new\s+ModuleCollection\((\[.+?\])\s*,\s*\$rootScope\)',
- webpage, 'modules'),
- display_id)
+ modules = self._search_regex(
+ r'moduleCollection\s*:\s*new\s+ModuleCollection\((\[.+?\])\s*,\s*\$rootScope\)',
+ webpage, 'modules', default=None)
+
+ if modules:
+ collection = self._parse_json(modules, display_id)
+ else:
+ # Webpage may be served in different layout (see
+ # https://github.com/rg3/youtube-dl/issues/7607)
+ collection = self._parse_json(
+ self._search_regex(
+ r'var\s+initialState\s*=\s*({.+?});\n', webpage, 'initial state'),
+ display_id)['course']['modules']
module, clip = None, None
for module_ in collection:
- if module_.get('moduleName') == name:
+ if name in (module_.get('moduleName'), module_.get('name')):
module = module_
for clip_ in module_.get('clips', []):
clip_index = clip_.get('clipIndex')
+ if clip_index is None:
+ clip_index = clip_.get('index')
if clip_index is None:
continue
if compat_str(clip_index) == clip_id:
'high': {'width': 1024, 'height': 768},
}
+ AllowedQuality = collections.namedtuple('AllowedQuality', ['ext', 'qualities'])
+
ALLOWED_QUALITIES = (
- ('webm', ('high',)),
- ('mp4', ('low', 'medium', 'high',)),
+ AllowedQuality('webm', ('high',)),
+ AllowedQuality('mp4', ('low', 'medium', 'high',)),
)
+        # In order to minimize the number of calls to the ViewClip API and reduce
+        # the probability of being throttled or banned by Pluralsight, we request
+        # only a single format unless a formats listing was explicitly requested.
+ if self._downloader.params.get('listformats', False):
+ allowed_qualities = ALLOWED_QUALITIES
+ else:
+ def guess_allowed_qualities():
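+                # a requested format such as 'mp4-high' narrows the probe to that single
+                # ext/quality pair; otherwise fall back to a single 'high' quality format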
+ req_format = self._downloader.params.get('format') or 'best'
+ req_format_split = req_format.split('-')
+ if len(req_format_split) > 1:
+ req_ext, req_quality = req_format_split
+ for allowed_quality in ALLOWED_QUALITIES:
+ if req_ext == allowed_quality.ext and req_quality in allowed_quality.qualities:
+ return (AllowedQuality(req_ext, (req_quality, )), )
+ req_ext = 'webm' if self._downloader.params.get('prefer_free_formats') else 'mp4'
+ return (AllowedQuality(req_ext, ('high', )), )
+ allowed_qualities = guess_allowed_qualities()
+
formats = []
- for ext, qualities in ALLOWED_QUALITIES:
+ for ext, qualities in allowed_qualities:
for quality in qualities:
f = QUALITIES[quality].copy()
clip_post = {
'mt': ext,
'q': '%dx%d' % (f['width'], f['height']),
}
- request = compat_urllib_request.Request(
- 'http://www.pluralsight.com/training/Player/ViewClip',
+ request = sanitized_Request(
+ '%s/training/Player/ViewClip' % self._API_BASE,
json.dumps(clip_post).encode('utf-8'))
request.add_header('Content-Type', 'application/json;charset=utf-8')
format_id = '%s-%s' % (ext, quality)
clip_url = self._download_webpage(
request, display_id, 'Downloading %s URL' % format_id, fatal=False)
+
+                # Pluralsight tracks multiple sequential calls to the ViewClip API and starts
+                # to return 429 HTTP errors after some time (see
+                # https://github.com/rg3/youtube-dl/pull/6989). Moreover, it may even lead
+                # to an account ban (see https://github.com/rg3/youtube-dl/issues/6842).
+                # To somewhat reduce the probability of these consequences
+                # we sleep for a random amount of time before each call to ViewClip.
+ self._sleep(
+ random.randint(2, 5), display_id,
+ '%(video_id)s: Waiting for %(timeout)s seconds to avoid throttling')
+
if not clip_url:
continue
f.update({
}
-class PluralsightCourseIE(InfoExtractor):
+class PluralsightCourseIE(PluralsightBaseIE):
IE_NAME = 'pluralsight:course'
- _VALID_URL = r'https?://(?:www\.)?pluralsight\.com/courses/(?P<id>[^/]+)'
- _TEST = {
+ _VALID_URL = r'https?://(?:(?:www|app)\.)?pluralsight\.com/(?:library/)?courses/(?P<id>[^/]+)'
+ _TESTS = [{
# Free course from Pluralsight Starter Subscription for Microsoft TechNet
# https://offers.pluralsight.com/technet?loc=zTS3z&prod=zOTprodz&tech=zOttechz&prog=zOTprogz&type=zSOz&media=zOTmediaz&country=zUSz
'url': 'http://www.pluralsight.com/courses/hosting-sql-server-windows-azure-iaas',
'description': 'md5:61b37e60f21c4b2f91dc621a977d0986',
},
'playlist_count': 31,
- }
+ }, {
+        # available without a Pluralsight account
+ 'url': 'https://www.pluralsight.com/courses/angularjs-get-started',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://app.pluralsight.com/library/courses/understanding-microsoft-azure-amazon-aws/table-of-contents',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
course_id = self._match_id(url)
# TODO: PSM cookie
course = self._download_json(
- 'http://www.pluralsight.com/data/course/%s' % course_id,
+ '%s/data/course/%s' % (self._API_BASE, course_id),
course_id, 'Downloading course JSON')
title = course['title']
description = course.get('description') or course.get('shortDescription')
course_data = self._download_json(
- 'http://www.pluralsight.com/data/course/content/%s' % course_id,
+ '%s/data/course/content/%s' % (self._API_BASE, course_id),
course_id, 'Downloading course data JSON')
entries = []
if not player_parameters:
continue
entries.append(self.url_result(
- 'http://www.pluralsight.com/training/player?%s' % player_parameters,
+ '%s/training/player?%s' % (self._API_BASE, player_parameters),
'Pluralsight'))
return self.playlist_result(entries, course_id, title, description)
webpage = self._download_webpage(url, display_id or video_id)
title = self._html_search_regex(
- r'<title>(.+) porn HD.+?</title>', webpage, 'title')
+ [r'<span[^>]+class=["\']video-name["\'][^>]*>([^<]+)',
+ r'<title>(.+?) - .*?[Pp]ornHD.*?</title>'], webpage, 'title')
description = self._html_search_regex(
r'<div class="description">([^<]+)</div>', webpage, 'description', fatal=False)
view_count = int_or_none(self._html_search_regex(
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlparse,
- compat_urllib_request,
)
from ..utils import (
ExtractorError,
+ sanitized_Request,
str_to_int,
)
from ..aes import (
def _real_extract(self, url):
video_id = self._match_id(url)
- req = compat_urllib_request.Request(
+ req = sanitized_Request(
'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
entries = [
self.url_result('http://www.pornhub.com/%s' % video_url, 'PornHub')
- for video_url in set(re.findall('href="/?(view_video\.php\?viewkey=\d+[^"]*)"', webpage))
+ for video_url in set(re.findall(
+ r'href="/?(view_video\.php\?.*\bviewkey=[\da-z]+[^"]*)"', webpage))
]
playlist = self._parse_json(
import json
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
int_or_none,
+ sanitized_Request,
)
'authenticationSpaceKey': originAuthenticationSpaceKey,
'credentials': 'Clip Application',
}
- token_req = compat_urllib_request.Request(
+ token_req = sanitized_Request(
'https://api.aebn.net/auth/v1/token/primal',
data=json.dumps(token_req_data).encode('utf-8'))
token_req.add_header('Content-Type', 'application/json')
token = token_answer['tokenKey']
# Get video URL
- delivery_req = compat_urllib_request.Request(
+ delivery_req = sanitized_Request(
'https://api.aebn.net/delivery/v1/clips/%s/MP4' % video_id)
delivery_req.add_header('Authorization', token)
delivery_info = self._download_json(
video_url = delivery_info['mediaUrl']
# Get additional info (title etc.)
- info_req = compat_urllib_request.Request(
+ info_req = sanitized_Request(
'https://api.aebn.net/content/v1/clips/%s?expand='
'title,description,primaryImageNumber,startSecond,endSecond,'
'movie.title,movie.MovieId,movie.boxCoverFront,movie.stars,'
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
+from ..compat import compat_urllib_parse
+from ..utils import (
+ ExtractorError,
+ sanitized_Request,
)
-from ..utils import ExtractorError
class PrimeShareTVIE(InfoExtractor):
webpage, 'wait time', default=7)) + 1
self._sleep(wait_time, video_id)
- req = compat_urllib_request.Request(
+ req = sanitized_Request(
url, compat_urllib_parse.urlencode(fields), headers)
video_page = self._download_webpage(
req, video_id, 'Downloading video page')
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
determine_ext,
ExtractorError,
+ sanitized_Request,
)
fields = self._hidden_inputs(webpage)
post = compat_urllib_parse.urlencode(fields)
- req = compat_urllib_request.Request(url, post)
+ req = sanitized_Request(url, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
webpage = self._download_webpage(
req, video_id, 'Downloading video page')
class ProSiebenSat1IE(InfoExtractor):
IE_NAME = 'prosiebensat1'
IE_DESC = 'ProSiebenSat.1 Digital'
- _VALID_URL = r'https?://(?:www\.)?(?:(?:prosieben|prosiebenmaxx|sixx|sat1|kabeleins|the-voice-of-germany)\.(?:de|at)|ran\.de|fem\.com)/(?P<id>.+)'
+ _VALID_URL = r'https?://(?:www\.)?(?:(?:prosieben|prosiebenmaxx|sixx|sat1|kabeleins|the-voice-of-germany)\.(?:de|at|ch)|ran\.de|fem\.com)/(?P<id>.+)'
_TESTS = [
{
from .common import InfoExtractor
from ..utils import (
+ sanitized_Request,
strip_jsonp,
unescapeHTML,
clean_html,
)
-from ..compat import compat_urllib_request
class QQMusicIE(InfoExtractor):
singer_desc = None
if singer_id:
- req = compat_urllib_request.Request(
+ req = sanitized_Request(
'http://s.plcloud.music.qq.com/fcgi-bin/fcg_get_singer_desc.fcg?utf8=1&outCharset=utf-8&format=xml&singerid=%s' % singer_id)
req.add_header(
'Referer', 'http://s.plcloud.music.qq.com/xhr_proxy_utf8.html')
class RTBFIE(InfoExtractor):
- _VALID_URL = r'https?://www.rtbf.be/video/[^\?]+\?id=(?P<id>\d+)'
- _TEST = {
+ _VALID_URL = r'https?://(?:www\.)?rtbf\.be/(?:video/[^?]+\?.*\bid=|ouftivi/(?:[^/]+/)*[^?]+\?.*\bvideoId=)(?P<id>\d+)'
+ _TESTS = [{
'url': 'https://www.rtbf.be/video/detail_les-diables-au-coeur-episode-2?id=1921274',
'md5': '799f334ddf2c0a582ba80c44655be570',
'info_dict': {
'title': 'Les Diables au coeur (épisode 2)',
'duration': 3099,
}
- }
+ }, {
+ # geo restricted
+ 'url': 'http://www.rtbf.be/ouftivi/heros/detail_scooby-doo-mysteres-associes?id=1097&videoId=2057442',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://www.rtbf.be/ouftivi/niouzz?videoId=2055858',
+ 'only_matching': True,
+ }]
_QUALITIES = [
('mobile', 'mobile'),
import time
from .common import InfoExtractor
-from ..compat import compat_urllib_request, compat_urlparse
from ..utils import (
ExtractorError,
float_or_none,
remove_end,
+ sanitized_Request,
std_headers,
struct_unpack,
)
if info['state'] == 'DESPU':
raise ExtractorError('The video is no longer available', expected=True)
png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/%s/videos/%s.png' % (self._manager, video_id)
- png_request = compat_urllib_request.Request(png_url)
+ png_request = sanitized_Request(png_url)
png_request.add_header('Referer', url)
png = self._download_webpage(png_request, video_id, 'Downloading url information')
video_url = _decrypt_url(png)
if not video_url.endswith('.f4m'):
- auth_url = video_url.replace(
+ video_url = video_url.replace(
'resources/', 'auth/resources/'
).replace('.net.rtve', '.multimedia.cdn.rtve')
- video_path = self._download_webpage(
- auth_url, video_id, 'Getting video url')
- # Use mvod1.akcdn instead of flash.akamaihd.multimedia.cdn to get
- # the right Content-Length header and the mp4 format
- video_url = compat_urlparse.urljoin(
- 'http://mvod1.akcdn.rtve.es/', video_path)
subtitles = None
if info.get('sbtFile') is not None:
compat_str,
)
from ..utils import (
- ExtractorError,
+ determine_ext,
unified_strdate,
)
'http://rutube.ru/api/play/options/%s/?format=json' % video_id,
video_id, 'Downloading options JSON')
- m3u8_url = options['video_balancer'].get('m3u8')
- if m3u8_url is None:
- raise ExtractorError('Couldn\'t find m3u8 manifest url')
- formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
+ formats = []
+ for format_id, format_url in options['video_balancer'].items():
+ ext = determine_ext(format_url)
+ if ext == 'm3u8':
+ m3u8_formats = self._extract_m3u8_formats(
+ format_url, video_id, 'mp4', m3u8_id=format_id, fatal=False)
+ if m3u8_formats:
+ formats.extend(m3u8_formats)
+ elif ext == 'f4m':
+ f4m_formats = self._extract_f4m_formats(
+ format_url, video_id, f4m_id=format_id, fatal=False)
+ if f4m_formats:
+ formats.extend(f4m_formats)
+ else:
+ formats.append({
+ 'url': format_url,
+ 'format_id': format_id,
+ })
+ self._sort_formats(formats)
return {
'id': video['id'],
class RutubeEmbedIE(InfoExtractor):
IE_NAME = 'rutube:embed'
IE_DESC = 'Rutube embedded videos'
- _VALID_URL = 'https?://rutube\.ru/video/embed/(?P<id>[0-9]+)'
+ _VALID_URL = r'https?://rutube\.ru/(?:video|play)/embed/(?P<id>[0-9]+)'
- _TEST = {
+ _TESTS = [{
'url': 'http://rutube.ru/video/embed/6722881?vk_puid37=&vk_puid38=',
'info_dict': {
'id': 'a10e53b86e8f349080f718582ce4c661',
'params': {
'skip_download': 'Requires ffmpeg',
},
- }
+ }, {
+ 'url': 'http://rutube.ru/play/embed/8083783',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
embed_id = self._match_id(url)
extract_formats(child)
elif child.tag.endswith('File'):
video_url = child.text
- if not video_url or video_url in processed_urls or 'NOT_USED' in video_url:
+ if (not video_url or video_url in processed_urls or
+ any(p in video_url for p in ('NOT_USED', 'NOT-USED'))):
return
processed_urls.append(video_url)
ext = determine_ext(video_url)
if ext == 'm3u8':
- formats.extend(self._extract_m3u8_formats(
- video_url, video_id, 'mp4', m3u8_id='hls'))
+ m3u8_formats = self._extract_m3u8_formats(
+ video_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
+ if m3u8_formats:
+ formats.extend(m3u8_formats)
elif ext == 'f4m':
- formats.extend(self._extract_f4m_formats(
- video_url, video_id, f4m_id='hds'))
+ f4m_formats = self._extract_f4m_formats(
+ video_url, video_id, f4m_id='hds', fatal=False)
+ if f4m_formats:
+ formats.extend(f4m_formats)
else:
proto = compat_urllib_parse_urlparse(video_url).scheme
if not child.tag.startswith('HTTP') and proto != 'rtmp':
import re
from .common import InfoExtractor
-from .brightcove import BrightcoveIE
+from .brightcove import BrightcoveLegacyIE
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
+ sanitized_Request,
smuggle_url,
std_headers,
)
'next': '',
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form), headers=headers)
login_page = self._download_webpage(
request, None, 'Logging in as %s' % username)
'%s/%s/chapter-content/%s.html' % (self._API_BASE, course_id, part),
part)
- bc_url = BrightcoveIE._extract_brightcove_url(webpage)
+ bc_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
if not bc_url:
raise ExtractorError('Could not extract Brightcove URL from %s' % url, expected=True)
- return self.url_result(smuggle_url(bc_url, {'Referer': url}), 'Brightcove')
+ return self.url_result(smuggle_url(bc_url, {'Referer': url}), 'BrightcoveLegacy')
class SafariCourseIE(SafariBaseIE):
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
- compat_urlparse,
-)
+from ..compat import compat_urlparse
from ..utils import (
int_or_none,
js_to_json,
mimetype2ext,
+ sanitized_Request,
unified_strdate,
)
def _real_extract(self, url):
video_id = self._match_id(url)
- req = compat_urllib_request.Request(url)
+ req = sanitized_Request(url)
req.add_header('Cookie', 'MediasitePlayerCaps=ClientPlugins=4')
webpage = self._download_webpage(req, video_id)
import base64
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
int_or_none,
+ sanitized_Request,
)
'Video %s does not exist' % video_id, expected=True)
download_form = self._hidden_inputs(webpage)
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
url, compat_urllib_parse.urlencode(download_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
parse_duration,
+ sanitized_Request,
)
'method_free': 'Free'
}
post = compat_urllib_parse.urlencode(fields)
- req = compat_urllib_request.Request(url, post)
+ req = sanitized_Request(url, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
webpage = self._download_webpage(req, video_id,
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
- compat_urllib_parse,
-)
+from ..compat import compat_urllib_parse
+from ..utils import sanitized_Request
class SinaIE(InfoExtractor):
if mobj.group('token') is not None:
# The video id is in the redirected url
self.to_screen('Getting video id')
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
request.get_method = lambda: 'HEAD'
(_, urlh) = self._download_webpage_handle(request, 'NA', False)
return self._real_extract(urlh.geturl())
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+ parse_iso8601,
+ parse_duration,
+)
+
+
+class SkyNewsArabiaBaseIE(InfoExtractor):
+ _IMAGE_BASE_URL = 'http://www.skynewsarabia.com/web/images'
+
+ def _call_api(self, path, value):
+ return self._download_json('http://api.skynewsarabia.com/web/rest/v2/%s/%s.json' % (path, value), value)
+
+ def _get_limelight_media_id(self, url):
+ return self._search_regex(r'/media/[^/]+/([a-z0-9]{32})', url, 'limelight media id')
+
+ def _get_image_url(self, image_path_template, width='1600', height='1200'):
+ return self._IMAGE_BASE_URL + image_path_template.format(width=width, height=height)
+
+ def _extract_video_info(self, video_data):
+ video_id = compat_str(video_data['id'])
+ topic = video_data.get('topicTitle')
+ return {
+ '_type': 'url_transparent',
+ 'url': 'limelight:media:%s' % self._get_limelight_media_id(video_data['videoUrl'][0]['url']),
+ 'id': video_id,
+ 'title': video_data['headline'],
+ 'description': video_data.get('summary'),
+ 'thumbnail': self._get_image_url(video_data['mediaAsset']['imageUrl']),
+ 'timestamp': parse_iso8601(video_data.get('date')),
+ 'duration': parse_duration(video_data.get('runTime')),
+ 'tags': video_data.get('tags', []),
+ 'categories': [topic] if topic else [],
+ 'webpage_url': 'http://www.skynewsarabia.com/web/video/%s' % video_id,
+ 'ie_key': 'LimelightMedia',
+ }
+
+
+class SkyNewsArabiaIE(SkyNewsArabiaBaseIE):
+ IE_NAME = 'skynewsarabia:video'
+ _VALID_URL = r'https?://(?:www\.)?skynewsarabia\.com/web/video/(?P<id>[0-9]+)'
+ _TEST = {
+ 'url': 'http://www.skynewsarabia.com/web/video/794902/%D9%86%D8%B5%D9%81-%D9%85%D9%84%D9%8A%D9%88%D9%86-%D9%85%D8%B5%D8%A8%D8%A7%D8%AD-%D8%B4%D8%AC%D8%B1%D8%A9-%D9%83%D8%B1%D9%8A%D8%B3%D9%85%D8%A7%D8%B3',
+ 'info_dict': {
+ 'id': '794902',
+ 'ext': 'flv',
+ 'title': 'نصف مليون مصباح على شجرة كريسماس',
+ 'description': 'md5:22f1b27f0850eeb10c7e59b1f16eb7c6',
+ 'upload_date': '20151128',
+ 'timestamp': 1448697198,
+ 'duration': 2119,
+ },
+ 'params': {
+ # rtmp download
+ 'skip_download': True,
+ },
+ }
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+ video_data = self._call_api('video', video_id)
+ return self._extract_video_info(video_data)
+
+
+class SkyNewsArabiaArticleIE(SkyNewsArabiaBaseIE):
+ IE_NAME = 'skynewsarabia:article'
+ _VALID_URL = r'https?://(?:www\.)?skynewsarabia\.com/web/article/(?P<id>[0-9]+)'
+ _TESTS = [{
+ 'url': 'http://www.skynewsarabia.com/web/article/794549/%D8%A7%D9%94%D8%AD%D8%AF%D8%A7%D8%AB-%D8%A7%D9%84%D8%B4%D8%B1%D9%82-%D8%A7%D9%84%D8%A7%D9%94%D9%88%D8%B3%D8%B7-%D8%AE%D8%B1%D9%8A%D8%B7%D8%A9-%D8%A7%D9%84%D8%A7%D9%94%D9%84%D8%B9%D8%A7%D8%A8-%D8%A7%D9%84%D8%B0%D9%83%D9%8A%D8%A9',
+ 'info_dict': {
+ 'id': '794549',
+ 'ext': 'flv',
+ 'title': 'بالفيديو.. ألعاب ذكية تحاكي واقع المنطقة',
+ 'description': 'md5:0c373d29919a851e080ee4edd0c5d97f',
+ 'upload_date': '20151126',
+ 'timestamp': 1448559336,
+ 'duration': 281.6,
+ },
+ 'params': {
+ # rtmp download
+ 'skip_download': True,
+ },
+ }, {
+ 'url': 'http://www.skynewsarabia.com/web/article/794844/%D8%A7%D8%B3%D8%AA%D9%87%D8%AF%D8%A7%D9%81-%D9%82%D9%88%D8%A7%D8%B1%D8%A8-%D8%A7%D9%94%D8%B3%D9%84%D8%AD%D8%A9-%D9%84%D9%85%D9%8A%D9%84%D9%8A%D8%B4%D9%8A%D8%A7%D8%AA-%D8%A7%D9%84%D8%AD%D9%88%D8%AB%D9%8A-%D9%88%D8%B5%D8%A7%D9%84%D8%AD',
+ 'info_dict': {
+ 'id': '794844',
+ 'title': 'إحباط تهريب أسلحة لميليشيات الحوثي وصالح بجنوب اليمن',
+ 'description': 'md5:5c927b8b2e805796e7f693538d96fc7e',
+ },
+ 'playlist_mincount': 2,
+ }]
+
+ def _real_extract(self, url):
+ article_id = self._match_id(url)
+ article_data = self._call_api('article', article_id)
+ media_asset = article_data['mediaAsset']
+ if media_asset['type'] == 'VIDEO':
+ topic = article_data.get('topicTitle')
+ return {
+ '_type': 'url_transparent',
+ 'url': 'limelight:media:%s' % self._get_limelight_media_id(media_asset['videoUrl'][0]['url']),
+ 'id': article_id,
+ 'title': article_data['headline'],
+ 'description': article_data.get('summary'),
+ 'thumbnail': self._get_image_url(media_asset['imageUrl']),
+ 'timestamp': parse_iso8601(article_data.get('date')),
+ 'tags': article_data.get('tags', []),
+ 'categories': [topic] if topic else [],
+ 'webpage_url': url,
+ 'ie_key': 'LimelightMedia',
+ }
+ entries = [self._extract_video_info(item) for item in article_data.get('inlineItems', []) if item['type'] == 'VIDEO']
+ return self.playlist_result(entries, article_id, article_data['headline'], article_data.get('summary'))
import uuid
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
int_or_none,
+ sanitized_Request,
unified_strdate,
)
if video_password:
video_form['pass'] = hashlib.md5(video_password.encode('utf-8')).hexdigest()
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'http://smotri.com/video/view/url/bot/', compat_urllib_parse.urlencode(video_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
'password': password,
}
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
broadcast_url + '/?no_redirect=1', compat_urllib_parse.urlencode(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
broadcast_page = self._download_webpage(
from .common import InfoExtractor
from ..compat import (
compat_str,
- compat_urllib_request,
compat_urllib_parse,
)
from ..utils import (
ExtractorError,
+ sanitized_Request,
)
else:
base_data_url = 'http://hot.vrs.sohu.com/vrs_flash.action?vid='
- req = compat_urllib_request.Request(base_data_url + vid_id)
+ req = sanitized_Request(base_data_url + vid_id)
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy:
import re
import itertools
-from .common import InfoExtractor
+from .common import (
+ InfoExtractor,
+ SearchInfoExtractor
+)
from ..compat import (
compat_str,
compat_urlparse,
compat_urllib_parse,
)
from ..utils import (
+ encode_dict,
ExtractorError,
int_or_none,
unified_strdate,
'description': data.get('description'),
'entries': entries,
}
+
+
+class SoundcloudSearchIE(SearchInfoExtractor, SoundcloudIE):
+ IE_NAME = 'soundcloud:search'
+ IE_DESC = 'Soundcloud search'
+ _MAX_RESULTS = float('inf')
+ _TESTS = [{
+ 'url': 'scsearch15:post-avant jazzcore',
+ 'info_dict': {
+ 'title': 'post-avant jazzcore',
+ },
+ 'playlist_count': 15,
+ }]
+
+ _SEARCH_KEY = 'scsearch'
+ _MAX_RESULTS_PER_PAGE = 200
+ _DEFAULT_RESULTS_PER_PAGE = 50
+ _API_V2_BASE = 'https://api-v2.soundcloud.com'
+
+ def _get_collection(self, endpoint, collection_id, **query):
+ limit = min(
+ query.get('limit', self._DEFAULT_RESULTS_PER_PAGE),
+ self._MAX_RESULTS_PER_PAGE)
+ query['limit'] = limit
+ query['client_id'] = self._CLIENT_ID
+ query['linked_partitioning'] = '1'
+ query['offset'] = 0
+ data = compat_urllib_parse.urlencode(encode_dict(query))
+ next_url = '{0}{1}?{2}'.format(self._API_V2_BASE, endpoint, data)
+
+ collected_results = 0
+
+ for i in itertools.count(1):
+ response = self._download_json(
+ next_url, collection_id, 'Downloading page {0}'.format(i),
+ 'Unable to download API page')
+
+ collection = response.get('collection', [])
+ if not collection:
+ break
+
+ collection = list(filter(bool, collection))
+ collected_results += len(collection)
+
+ for item in collection:
+ yield self.url_result(item['uri'], SoundcloudIE.ie_key())
+
+ if not collection or collected_results >= limit:
+ break
+
+ next_url = response.get('next_href')
+ if not next_url:
+ break
+
+ def _get_n_results(self, query, n):
+ tracks = self._get_collection('/search/tracks', query, limit=n, q=query)
+ return self.playlist_result(tracks, playlist_title=query)
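(How the new search key is used: `SoundcloudSearchIE` above only implements `_get_n_results`; parsing of the `scsearchN:` prefix is handled by `SearchInfoExtractor` in `common.py`, which is not shown in this diff. A rough, assumed sketch of that dispatch for a query such as `scsearch15:post-avant jazzcore` from the test above:)

    # Hedged sketch of the assumed prefix handling -- the helper name
    # `dispatch_search` is made up for illustration only.
    def dispatch_search(ie, search_url):
        prefix, query = search_url.split(':', 1)  # 'scsearch15', 'post-avant jazzcore'
        count = prefix[len('scsearch'):]
        if not count:
            n = 1                    # bare 'scsearch:' presumably means one result
        elif count == 'all':
            n = ie._MAX_RESULTS
        else:
            n = int(count)
        return ie._get_n_results(query, n)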
import re
from .common import InfoExtractor
-from .brightcove import BrightcoveIE
+from .brightcove import BrightcoveLegacyIE
from ..utils import RegexNotFoundError, ExtractorError
class SpaceIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|m)\.)?space\.com/\d+-(?P<title>[^/\.\?]*?)-video\.html'
_TEST = {
- 'add_ie': ['Brightcove'],
+ 'add_ie': ['BrightcoveLegacy'],
'url': 'http://www.space.com/23373-huge-martian-landforms-detail-revealed-by-european-probe-video.html',
'info_dict': {
'id': '2780937028001',
brightcove_url = self._og_search_video_url(webpage)
except RegexNotFoundError:
# Other videos works fine with the info from the object
- brightcove_url = BrightcoveIE._extract_brightcove_url(webpage)
+ brightcove_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
if brightcove_url is None:
raise ExtractorError(
'The webpage does not contain a video', expected=True)
- return self.url_result(brightcove_url, BrightcoveIE.ie_key())
+ return self.url_result(brightcove_url, BrightcoveLegacyIE.ie_key())
from ..compat import (
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
- compat_urllib_request,
)
from ..utils import (
+ sanitized_Request,
str_to_int,
unified_strdate,
)
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
- req = compat_urllib_request.Request('http://www.' + mobj.group('url'))
+ req = sanitized_Request('http://www.' + mobj.group('url'))
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
description = self._html_search_meta('description', webpage, 'description')
base_url = self._search_regex(
- r'var\s+server\s*=\s*"([^"]+)\"', webpage, 'server URL')
+ [r'server\s*:\s*(["\'])(?P<url>.+?)\1', r'var\s+server\s*=\s*"(?P<url>[^"]+)\"'],
+ webpage, 'server URL', group='url')
xml_url = base_url + video_id + '.xml'
idoc = self._download_xml(xml_url, video_id)
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
parse_iso8601,
+ sanitized_Request,
)
api_url = 'http://proxy.vidibusdynamic.net/sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
sport_id, video_id)
- req = compat_urllib_request.Request(api_url, headers={
+ req = sanitized_Request(api_url, headers={
'Accept': 'application/vnd.vidibus.v2.html+json',
'Referer': url,
})
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse
+from ..utils import sanitized_Request
class StreamcloudIE(InfoExtractor):
headers = {
b'Content-Type': b'application/x-www-form-urlencoded',
}
- req = compat_urllib_request.Request(url, post, headers)
+ req = sanitized_Request(url, post, headers)
webpage = self._download_webpage(
req, video_id, note='Downloading video page ...')
import time
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
int_or_none,
+ sanitized_Request,
)
video_id = self._match_id(url)
api_path = '/episode/%s' % video_id
- req = compat_urllib_request.Request(self._API_URL + api_path)
+ req = sanitized_Request(self._API_URL + api_path)
req.add_header('Api-Password', _get_api_key(api_path))
data = self._download_json(req, video_id)
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
clean_html,
ExtractorError,
float_or_none,
parse_iso8601,
+ sanitized_Request,
)
display_id = mobj.group('id')
playlist_url = self._API_URL.format(display_id)
- request = compat_urllib_request.Request(playlist_url)
+ request = sanitized_Request(playlist_url)
request.add_header('X-Requested-With', 'XMLHttpRequest')
request.add_header('Accept', 'application/json')
request.add_header('Referer', url)
'upload_date': '20150701',
'categories': ['Today/Shows/Orange Room', 'Today/Sections/Money', 'Today/Topics/Tech', "Today/Topics/Editor's picks"],
},
+ }, {
+ # From http://www.nbc.com/the-blacklist/video/sir-crispin-crandall/2928790?onid=137781#vc137781=1
+ # geo-restricted (US), HLS encrypted with AES-128
+ 'url': 'http://player.theplatform.com/p/NnzsPC/onsite_universal/select/media/guid/2410887629/2928790?fwsitesection=nbc_the_blacklist_video_library&autoPlay=true&carouselID=137781',
+ 'only_matching': True,
}]
@staticmethod
# There seems to be no pattern for the script filename of interest,
# so try them one by one
for script in reversed(scripts):
- feed_script = self._download_webpage(script, video_id, 'Downloading feed script')
- feed_id = self._search_regex(r'defaultFeedId\s*:\s*"([^"]+)"', feed_script, 'default feed id', default=None)
+ feed_script = self._download_webpage(
+ self._proto_relative_url(script, 'http:'),
+ video_id, 'Downloading feed script')
+ feed_id = self._search_regex(
+ r'defaultFeedId\s*:\s*"([^"]+)"', feed_script,
+ 'default feed id', default=None)
if feed_id is not None:
break
if feed_id is None:
if smuggled_data.get('force_smil_url', False):
smil_url = url
+ # Explicitly specified SMIL (see https://github.com/rg3/youtube-dl/issues/7385)
+ elif '/guid/' in url:
+ webpage = self._download_webpage(url, video_id)
+ smil_url = self._search_regex(
+ r'<link[^>]+href=(["\'])(?P<url>.+?)\1[^>]+type=["\']application/smil\+xml',
+ webpage, 'smil url', group='url')
+ path = self._search_regex(
+ r'link\.theplatform\.com/s/((?:[^/?#&]+/)+[^/?#&]+)', smil_url, 'path')
+ smil_url += ('?' if '?' not in smil_url else '&') + 'formats=m3u,mpeg4&format=SMIL'
elif mobj.group('config'):
config_url = url + '&form=json'
config_url = config_url.replace('swf/', 'config/')
import re
from .common import InfoExtractor
-from .brightcove import BrightcoveIE
+from .brightcove import BrightcoveLegacyIE
from .discovery import DiscoveryIE
from ..compat import compat_urlparse
return {
'_type': 'url',
- 'url': BrightcoveIE._extract_brightcove_url(iframe),
- 'ie': BrightcoveIE.ie_key(),
+ 'url': BrightcoveLegacyIE._extract_brightcove_url(iframe),
+ 'ie': BrightcoveLegacyIE.ie_key(),
}
# coding: utf-8
from __future__ import unicode_literals
-import re
-
from .common import InfoExtractor
-from ..utils import ExtractorError
+from ..utils import (
+ ExtractorError,
+ int_or_none,
+ parse_iso8601,
+)
class TriluliluIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?trilulilu\.ro/(?:video-[^/]+/)?(?P<id>[^/#\?]+)'
- _TEST = {
- 'url': 'http://www.trilulilu.ro/video-animatie/big-buck-bunny-1',
- 'md5': 'c1450a00da251e2769b74b9005601cac',
+ _VALID_URL = r'https?://(?:(?:www|m)\.)?trilulilu\.ro/(?:[^/]+/)?(?P<id>[^/#\?]+)'
+ _TESTS = [{
+ 'url': 'http://www.trilulilu.ro/big-buck-bunny-1',
+ 'md5': '68da087b676a6196a413549212f60cc6',
'info_dict': {
'id': 'ae2899e124140b',
'ext': 'mp4',
'title': 'Big Buck Bunny',
'description': ':) pentru copilul din noi',
+ 'uploader_id': 'chipy',
+ 'upload_date': '20120304',
+ 'timestamp': 1330830647,
+ 'uploader': 'chipy',
+ 'view_count': int,
+ 'like_count': int,
+ 'comment_count': int,
},
- }
+ }, {
+ 'url': 'http://www.trilulilu.ro/adena-ft-morreti-inocenta',
+ 'md5': '929dfb8729dc71750463af88bbbbf4a4',
+ 'info_dict': {
+ 'id': 'f299710e3c91c5',
+ 'ext': 'mp4',
+ 'title': 'Adena ft. Morreti - Inocenta',
+ 'description': 'pop music',
+ 'uploader_id': 'VEVOmixt',
+ 'upload_date': '20151204',
+ 'uploader': 'VEVOmixt',
+ 'timestamp': 1449187937,
+ 'view_count': int,
+ 'like_count': int,
+ 'comment_count': int,
+ },
+ }]
def _real_extract(self, url):
display_id = self._match_id(url)
- webpage = self._download_webpage(url, display_id)
-
- if re.search(r'Fişierul nu este disponibil pentru vizionare în ţara dumneavoastră', webpage):
- raise ExtractorError(
- 'This video is not available in your country.', expected=True)
- elif re.search('Fişierul poate fi accesat doar de către prietenii lui', webpage):
- raise ExtractorError('This video is private.', expected=True)
+ media_info = self._download_json('http://m.trilulilu.ro/%s?format=json' % display_id, display_id)
- flashvars_str = self._search_regex(
- r'block_flash_vars\s*=\s*(\{[^\}]+\})', webpage, 'flashvars', fatal=False, default=None)
+ media_class = media_info.get('class')
+ if media_class not in ('video', 'audio'):
+ raise ExtractorError('not a video or audio file')
- if flashvars_str:
- flashvars = self._parse_json(flashvars_str, display_id)
- else:
- raise ExtractorError(
- 'This page does not contain videos', expected=True)
-
- if flashvars['isMP3'] == 'true':
- raise ExtractorError(
- 'Audio downloads are currently not supported', expected=True)
-
- video_id = flashvars['hash']
- title = self._og_search_title(webpage)
- thumbnail = self._og_search_thumbnail(webpage)
- description = self._og_search_description(webpage, default=None)
+ user = media_info.get('user', {})
- format_url = ('http://fs%(server)s.trilulilu.ro/%(hash)s/'
- 'video-formats2' % flashvars)
- format_doc = self._download_xml(
- format_url, video_id,
- note='Downloading formats',
- errnote='Error while downloading formats')
+ thumbnail = media_info.get('cover_url')
+ if thumbnail:
+ thumbnail = thumbnail.format(width='1600', height='1200')
- video_url_template = (
- 'http://fs%(server)s.trilulilu.ro/stream.php?type=video'
- '&source=site&hash=%(hash)s&username=%(userid)s&'
- 'key=ministhebest&format=%%s&sig=&exp=' %
- flashvars)
- formats = [
- {
- 'format_id': fnode.text.partition('-')[2],
- 'url': video_url_template % fnode.text,
- 'ext': fnode.text.partition('-')[0]
- }
-
- for fnode in format_doc.findall('./formats/format')
- ]
+ # TODO: get correct ext for audio files
+ stream_type = media_info.get('stream_type')
+ formats = [{
+ 'url': media_info['href'],
+ 'ext': stream_type,
+ }]
+ if media_info.get('is_hd'):
+ formats.append({
+ 'format_id': 'hd',
+ 'url': media_info['hrefhd'],
+ 'ext': stream_type,
+ })
+ if media_class == 'audio':
+ formats[0]['vcodec'] = 'none'
+ else:
+ formats[0]['format_id'] = 'sd'
return {
- 'id': video_id,
+ 'id': media_info['identifier'].split('|')[1],
'display_id': display_id,
'formats': formats,
- 'title': title,
- 'description': description,
+ 'title': media_info['title'],
+ 'description': media_info.get('description'),
'thumbnail': thumbnail,
+ 'uploader_id': user.get('username'),
+ 'uploader': user.get('fullname'),
+ 'timestamp': parse_iso8601(media_info.get('published'), ' '),
+ 'duration': int_or_none(media_info.get('duration')),
+ 'view_count': int_or_none(media_info.get('count_views')),
+ 'like_count': int_or_none(media_info.get('count_likes')),
+ 'comment_count': int_or_none(media_info.get('count_comments')),
}
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse_urlparse,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse_urlparse
from ..utils import (
int_or_none,
+ sanitized_Request,
str_to_int,
)
from ..aes import aes_decrypt_text
video_id = mobj.group('id')
display_id = mobj.group('display_id')
- req = compat_urllib_request.Request(url)
+ req = sanitized_Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, display_id)
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request
-)
+from ..compat import compat_urllib_parse
from ..utils import (
ExtractorError,
int_or_none,
+ sanitized_Request,
)
'password': password,
}
payload = compat_urllib_parse.urlencode(form_data).encode('utf-8')
- request = compat_urllib_request.Request(self._LOGIN_URL, payload)
+ request = sanitized_Request(self._LOGIN_URL, payload)
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
login_page = self._download_webpage(
request, None, False, 'Wrong login info')
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlparse,
- compat_urllib_request,
compat_urlparse,
)
from ..utils import (
int_or_none,
parse_duration,
parse_iso8601,
+ sanitized_Request,
)
for cookie in self._downloader.cookiejar:
if cookie.name == 'api_token':
headers['Twitch-Api-Token'] = cookie.value
- request = compat_urllib_request.Request(url, headers=headers)
+ request = sanitized_Request(url, headers=headers)
response = super(TwitchBaseIE, self)._download_json(request, video_id, note)
self._handle_error(response)
return response
if not post_url.startswith('http'):
post_url = compat_urlparse.urljoin(redirect_url, post_url)
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
post_url, compat_urllib_parse.urlencode(encode_dict(login_form)).encode('utf-8'))
request.add_header('Referer', redirect_url)
response = self._download_webpage(
import re
from .common import InfoExtractor
-from ..compat import compat_urllib_request
from ..utils import (
float_or_none,
xpath_text,
remove_end,
+ int_or_none,
+ ExtractorError,
+ sanitized_Request,
)
_TESTS = [
{
'url': 'https://twitter.com/i/cards/tfw/v1/560070183650213889',
- 'md5': '7d2f6b4d2eb841a7ccc893d479bfceb4',
+ 'md5': '4fa26a35f9d1bf4b646590ba8e84be19',
'info_dict': {
'id': '560070183650213889',
'ext': 'mp4',
'uploader': 'OMG! Ubuntu!',
'uploader_id': 'omgubuntu',
},
+ 'add_ie': ['Youtube'],
+ },
+ {
+ 'url': 'https://twitter.com/i/cards/tfw/v1/665289828897005568',
+ 'md5': 'ab2745d0b0ce53319a534fccaa986439',
+ 'info_dict': {
+ 'id': 'iBb2x00UVlv',
+ 'ext': 'mp4',
+ 'upload_date': '20151113',
+ 'uploader_id': '1189339351084113920',
+ 'uploader': '@ArsenalTerje',
+ 'title': 'Vine by @ArsenalTerje',
+ },
+ 'add_ie': ['Vine'],
}
]
config = None
formats = []
for user_agent in USER_AGENTS:
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
request.add_header('User-Agent', user_agent)
webpage = self._download_webpage(request, video_id)
- youtube_url = self._html_search_regex(
- r'<iframe[^>]+src="((?:https?:)?//www.youtube.com/embed/[^"]+)"',
- webpage, 'youtube iframe', default=None)
- if youtube_url:
- return self.url_result(youtube_url, 'Youtube')
+ iframe_url = self._html_search_regex(
+ r'<iframe[^>]+src="((?:https?:)?//(?:www.youtube.com/embed/[^"]+|(?:www\.)?vine\.co/v/\w+/card))"',
+ webpage, 'video iframe', default=None)
+ if iframe_url:
+ return self.url_result(iframe_url)
config = self._parse_json(self._html_search_regex(
r'data-player-config="([^"]+)"', webpage, 'data player config'),
_VALID_URL = r'https?://(?:www\.|m\.|mobile\.)?twitter\.com/(?P<user_id>[^/]+)/status/(?P<id>\d+)'
_TEMPLATE_URL = 'https://twitter.com/%s/status/%s'
- _TEST = {
+ _TESTS = [{
'url': 'https://twitter.com/freethenipple/status/643211948184596480',
- 'md5': '31cd83a116fc41f99ae3d909d4caf6a0',
+ 'md5': 'db6612ec5d03355953c3ca9250c97e5e',
'info_dict': {
'id': '643211948184596480',
'ext': 'mp4',
'uploader': 'FREE THE NIPPLE',
'uploader_id': 'freethenipple',
},
- }
+ }, {
+ 'url': 'https://twitter.com/giphz/status/657991469417025536/photo/1',
+ 'md5': 'f36dcd5fb92bf7057f155e7d927eeb42',
+ 'info_dict': {
+ 'id': '657991469417025536',
+ 'ext': 'mp4',
+ 'title': 'Gifs - tu vai cai tu vai cai tu nao eh capaz disso tu vai cai',
+ 'description': 'Gifs on Twitter: "tu vai cai tu vai cai tu nao eh capaz disso tu vai cai https://t.co/tM46VHFlO5"',
+ 'thumbnail': 're:^https?://.*\.png',
+ 'uploader': 'Gifs',
+ 'uploader_id': 'giphz',
+ },
+ }, {
+ 'url': 'https://twitter.com/starwars/status/665052190608723968',
+ 'md5': '39b7199856dee6cd4432e72c74bc69d4',
+ 'info_dict': {
+ 'id': '665052190608723968',
+ 'ext': 'mp4',
+ 'title': 'Star Wars - A new beginning is coming December 18. Watch the official 60 second #TV spot for #StarWars: #TheForceAwakens.',
+ 'description': 'Star Wars on Twitter: "A new beginning is coming December 18. Watch the official 60 second #TV spot for #StarWars: #TheForceAwakens."',
+ 'uploader_id': 'starwars',
+ 'uploader': 'Star Wars',
+ },
+ }]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
username = remove_end(self._og_search_title(webpage), ' on Twitter')
- title = self._og_search_description(webpage).strip('').replace('\n', ' ')
+ title = description = self._og_search_description(webpage).strip('').replace('\n', ' ').strip('“”')
# strip 'https -_t.co_BJYgOjSeGA' junk from filenames
- mobj = re.match(r'“(.*)\s+(https?://[^ ]+)”', title)
- title, short_url = mobj.groups()
-
- card_id = self._search_regex(
- r'["\']/i/cards/tfw/v1/(\d+)', webpage, 'twitter card url')
- card_url = 'https://twitter.com/i/cards/tfw/v1/' + card_id
+ title = re.sub(r'\s+(https?://[^ ]+)', '', title)
- return {
- '_type': 'url_transparent',
- 'ie_key': 'TwitterCard',
+ info = {
'uploader_id': user_id,
'uploader': username,
- 'url': card_url,
'webpage_url': url,
- 'description': '%s on Twitter: "%s %s"' % (username, title, short_url),
+ 'description': '%s on Twitter: "%s"' % (username, description),
'title': username + ' - ' + title,
}
+
+ card_id = self._search_regex(
+ r'["\']/i/cards/tfw/v1/(\d+)', webpage, 'twitter card url', default=None)
+ if card_id:
+ card_url = 'https://twitter.com/i/cards/tfw/v1/' + card_id
+ info.update({
+ '_type': 'url_transparent',
+ 'ie_key': 'TwitterCard',
+ 'url': card_url,
+ })
+ return info
+
+ mobj = re.search(r'''(?x)
+ <video[^>]+class="animated-gif"[^>]+
+ (?:data-height="(?P<height>\d+)")?[^>]+
+ (?:data-width="(?P<width>\d+)")?[^>]+
+ (?:poster="(?P<poster>[^"]+)")?[^>]*>\s*
+ <source[^>]+video-src="(?P<url>[^"]+)"
+ ''', webpage)
+
+ if mobj:
+ info.update({
+ 'id': twid,
+ 'url': mobj.group('url'),
+ 'height': int_or_none(mobj.group('height')),
+ 'width': int_or_none(mobj.group('width')),
+ 'thumbnail': mobj.group('poster'),
+ })
+ return info
+
+ raise ExtractorError('There\'s no video in this tweet.')
from __future__ import unicode_literals
-import re
-
from .common import InfoExtractor
from ..compat import (
+ compat_HTTPError,
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
ExtractorError,
+ float_or_none,
+ int_or_none,
+ sanitized_Request,
)
_VALID_URL = r'https?://www\.udemy\.com/(?:[^#]+#/lecture/|lecture/view/?\?lectureId=)(?P<id>\d+)'
_LOGIN_URL = 'https://www.udemy.com/join/login-popup/?displayType=ajax&showSkipButton=1'
_ORIGIN_URL = 'https://www.udemy.com'
+ _SUCCESSFULLY_ENROLLED = '>You have enrolled in this course!<'
+ _ALREADY_ENROLLED = '>You are already taking this course.<'
_NETRC_MACHINE = 'udemy'
_TESTS = [{
'skip': 'Requires udemy account credentials',
}]
+ def _enroll_course(self, webpage, course_id):
+ enroll_url = self._search_regex(
+ r'href=(["\'])(?P<url>https?://(?:www\.)?udemy\.com/course/subscribe/.+?)\1',
+ webpage, 'enroll url', group='url',
+ default='https://www.udemy.com/course/subscribe/?courseId=%s' % course_id)
+ webpage = self._download_webpage(enroll_url, course_id, 'Enrolling in the course')
+ if self._SUCCESSFULLY_ENROLLED in webpage:
+ self.to_screen('%s: Successfully enrolled in the course' % course_id)
+ elif self._ALREADY_ENROLLED in webpage:
+ self.to_screen('%s: Already enrolled in the course' % course_id)
+
+ def _download_lecture(self, course_id, lecture_id):
+ return self._download_json(
+ 'https://www.udemy.com/api-2.0/users/me/subscribed-courses/%s/lectures/%s?%s' % (
+ course_id, lecture_id, compat_urllib_parse.urlencode({
+ 'video_only': '',
+ 'auto_play': '',
+ 'fields[lecture]': 'title,description,asset',
+ 'fields[asset]': 'asset_type,stream_url,thumbnail_url,download_urls,data',
+ 'instructorPreviewMode': 'False',
+ })),
+ lecture_id, 'Downloading lecture JSON')
+
def _handle_error(self, response):
if not isinstance(response, dict):
return
headers['X-Udemy-Client-Id'] = cookie.value
elif cookie.name == 'access_token':
headers['X-Udemy-Bearer-Token'] = cookie.value
+ headers['X-Udemy-Authorization'] = 'Bearer %s' % cookie.value
if isinstance(url_or_request, compat_urllib_request.Request):
for header, value in headers.items():
url_or_request.add_header(header, value)
else:
- url_or_request = compat_urllib_request.Request(url_or_request, headers=headers)
+ url_or_request = sanitized_Request(url_or_request, headers=headers)
response = super(UdemyIE, self)._download_json(url_or_request, video_id, note)
self._handle_error(response)
def _login(self):
(username, password) = self._get_login_info()
if username is None:
- self.raise_login_required('Udemy account is required')
+ return
login_popup = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login popup')
'password': password.encode('utf-8'),
})
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
request.add_header('Referer', self._ORIGIN_URL)
request.add_header('Origin', self._ORIGIN_URL)
def _real_extract(self, url):
lecture_id = self._match_id(url)
- lecture = self._download_json(
- 'https://www.udemy.com/api-1.1/lectures/%s' % lecture_id,
- lecture_id, 'Downloading lecture JSON')
+ webpage = self._download_webpage(url, lecture_id)
+
+ course_id = self._search_regex(
+ r'data-course-id=["\'](\d+)', webpage, 'course id')
+
+ try:
+ lecture = self._download_lecture(course_id, lecture_id)
+ except ExtractorError as e:
+ # Error could possibly mean we are not enrolled in the course
+ if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+ self._enroll_course(webpage, course_id)
+ lecture = self._download_lecture(course_id, lecture_id)
+ else:
+ raise
+
+ title = lecture['title']
+ description = lecture.get('description')
- asset_type = lecture.get('assetType') or lecture.get('asset_type')
+ asset = lecture['asset']
+
+ asset_type = asset.get('assetType') or asset.get('asset_type')
if asset_type != 'Video':
raise ExtractorError(
'Lecture %s is not a video' % lecture_id, expected=True)
- asset = lecture['asset']
-
stream_url = asset.get('streamUrl') or asset.get('stream_url')
- mobj = re.search(r'(https?://www\.youtube\.com/watch\?v=.*)', stream_url)
- if mobj:
- return self.url_result(mobj.group(1), 'Youtube')
+ if stream_url:
+ youtube_url = self._search_regex(
+ r'(https?://www\.youtube\.com/watch\?v=.*)', stream_url, 'youtube URL', default=None)
+ if youtube_url:
+ return self.url_result(youtube_url, 'Youtube')
video_id = asset['id']
thumbnail = asset.get('thumbnailUrl') or asset.get('thumbnail_url')
- duration = asset['data']['duration']
-
- download_url = asset.get('downloadUrl') or asset.get('download_url')
-
- video = download_url.get('Video') or download_url.get('video')
- video_480p = download_url.get('Video480p') or download_url.get('video_480p')
-
- formats = [
- {
- 'url': video_480p[0],
- 'format_id': '360p',
- },
- {
- 'url': video[0],
- 'format_id': '720p',
- },
- ]
-
- title = lecture['title']
- description = lecture['description']
+ duration = float_or_none(asset.get('data', {}).get('duration'))
+ outputs = asset.get('data', {}).get('outputs', {})
+
+ formats = []
+ for format_ in asset.get('download_urls', {}).get('Video', []):
+ video_url = format_.get('file')
+ if not video_url:
+ continue
+ format_id = format_.get('label')
+ f = {
+ 'url': format_['file'],
+ 'height': int_or_none(format_id),
+ }
+ if format_id:
+ # Some videos contain additional metadata (e.g.
+ # https://www.udemy.com/ios9-swift/learn/#/lecture/3383208)
+ output = outputs.get(format_id)
+ if isinstance(output, dict):
+ f.update({
+ 'format_id': '%sp' % (output.get('label') or format_id),
+ 'width': int_or_none(output.get('width')),
+ 'height': int_or_none(output.get('height')),
+ 'vbr': int_or_none(output.get('video_bitrate_in_kbps')),
+ 'vcodec': output.get('video_codec'),
+ 'fps': int_or_none(output.get('frame_rate')),
+ 'abr': int_or_none(output.get('audio_bitrate_in_kbps')),
+ 'acodec': output.get('audio_codec'),
+ 'asr': int_or_none(output.get('audio_sample_rate')),
+ 'tbr': int_or_none(output.get('total_bitrate_in_kbps')),
+ 'filesize': int_or_none(output.get('file_size_in_bytes')),
+ })
+ else:
+ f['format_id'] = '%sp' % format_id
+ formats.append(f)
+
+ self._sort_formats(formats)
return {
'id': video_id,
class UdemyCourseIE(UdemyIE):
IE_NAME = 'udemy:course'
- _VALID_URL = r'https?://www\.udemy\.com/(?P<coursepath>[\da-z-]+)'
- _SUCCESSFULLY_ENROLLED = '>You have enrolled in this course!<'
- _ALREADY_ENROLLED = '>You are already taking this course.<'
+ _VALID_URL = r'https?://www\.udemy\.com/(?P<id>[\da-z-]+)'
_TESTS = []
@classmethod
return False if UdemyIE.suitable(url) else super(UdemyCourseIE, cls).suitable(url)
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- course_path = mobj.group('coursepath')
+ course_path = self._match_id(url)
+
+ webpage = self._download_webpage(url, course_path)
response = self._download_json(
'https://www.udemy.com/api-1.1/courses/%s' % course_path,
course_path, 'Downloading course JSON')
- course_id = int(response['id'])
- course_title = response['title']
+ course_id = response['id']
+ course_title = response.get('title')
- webpage = self._download_webpage(
- 'https://www.udemy.com/course/subscribe/?courseId=%s' % course_id,
- course_id, 'Enrolling in the course')
-
- if self._SUCCESSFULLY_ENROLLED in webpage:
- self.to_screen('%s: Successfully enrolled in' % course_id)
- elif self._ALREADY_ENROLLED in webpage:
- self.to_screen('%s: Already enrolled in' % course_id)
+ self._enroll_course(webpage, course_id)
response = self._download_json(
'https://www.udemy.com/api-1.1/courses/%s/curriculum' % course_id,
class UDNEmbedIE(InfoExtractor):
IE_DESC = '聯合影音'
- _VALID_URL = r'https?://video\.udn\.com/(?:embed|play)/news/(?P<id>\d+)'
+ _PROTOCOL_RELATIVE_VALID_URL = r'//video\.udn\.com/(?:embed|play)/news/(?P<id>\d+)'
+ _VALID_URL = r'https?:' + _PROTOCOL_RELATIVE_VALID_URL
_TESTS = [{
'url': 'http://video.udn.com/embed/news/300040',
'md5': 'de06b4c90b042c128395a88f0384817e',
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
- compat_urllib_request,
compat_urlparse,
)
from ..utils import (
ExtractorError,
+ sanitized_Request,
)
info_url = "http://vbox7.com/play/magare.do"
data = compat_urllib_parse.urlencode({'as3': '1', 'vid': video_id})
- info_request = compat_urllib_request.Request(info_url, data)
+ info_request = sanitized_Request(info_url, data)
info_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
info_response = self._download_webpage(info_request, video_id, 'Downloading info webpage')
if info_response is None:
import json
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
from ..utils import (
int_or_none,
ExtractorError,
+ sanitized_Request,
)
if 'class="adultwarning-container"' in webpage:
self.report_age_confirmation()
age_limit = 18
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
request.add_header('Cookie', 'confirmedAdult=true')
webpage = self._download_webpage(request, video_id)
import json
from .common import InfoExtractor
-from ..compat import compat_urllib_request
from ..utils import (
ExtractorError,
parse_iso8601,
+ sanitized_Request,
)
@staticmethod
def make_json_request(url, data):
payload = json.dumps(data).encode('utf-8')
- req = compat_urllib_request.Request(url, payload)
+ req = sanitized_Request(url, payload)
req.add_header('Content-Type', 'application/json; charset=utf-8')
return req
from __future__ import unicode_literals
import re
-import xml.etree.ElementTree
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
-)
+from ..compat import compat_etree_fromstring
from ..utils import (
ExtractorError,
int_or_none,
+ sanitized_Request,
)
_SMIL_BASE_URL = 'http://smil.lvl3.vevo.com/'
def _real_initialize(self):
- req = compat_urllib_request.Request(
+ req = sanitized_Request(
'http://www.vevo.com/auth', data=b'')
webpage = self._download_webpage(
req, None,
if last_version['version'] == -1:
raise ExtractorError('Unable to extract last version of the video')
- renditions = xml.etree.ElementTree.fromstring(last_version['data'])
+ renditions = compat_etree_fromstring(last_version['data'])
formats = []
# Already sorted from worst to best quality
for rend in renditions.findall('rendition'):
def _formats_from_smil(self, smil_xml):
formats = []
- smil_doc = xml.etree.ElementTree.fromstring(smil_xml.encode('utf-8'))
+ smil_doc = compat_etree_fromstring(smil_xml.encode('utf-8'))
els = smil_doc.findall('.//{http://www.w3.org/2001/SMIL20/Language}video')
for el in els:
src = el.attrib['src']
from ..utils import (
float_or_none,
int_or_none,
-)
-from ..compat import (
- compat_urllib_request
+ sanitized_Request,
)
'http://api.viddler.com/api/v2/viddler.videos.getPlaybackDetails.json?video_id=%s&key=v0vhrt7bg2xq1vyxhkct' %
video_id)
headers = {'Referer': 'http://static.cdn-ec.viddler.com/js/arpeggio/v2/embed.html'}
- request = compat_urllib_request.Request(json_url, None, headers)
+ request = sanitized_Request(json_url, None, headers)
data = self._download_json(request, video_id)['video']
formats = []
from .common import InfoExtractor
from ..utils import (
- find_xpath_attr,
int_or_none,
+ parse_iso8601,
)
'id': '1100701',
'ext': 'mp4',
'title': 'This is VideofyMe',
- 'description': None,
+ 'description': '',
+ 'upload_date': '20130326',
+ 'timestamp': 1364288959,
'uploader': 'VideofyMe',
'uploader_id': 'thisisvideofyme',
'view_count': int,
+ 'likes': int,
+ 'comment_count': int,
},
-
}
def _real_extract(self, url):
video_id = self._match_id(url)
- config = self._download_xml('http://sunshine.videofy.me/?videoId=%s' % video_id,
- video_id)
- video = config.find('video')
- sources = video.find('sources')
- url_node = next(node for node in [find_xpath_attr(sources, 'source', 'id', 'HQ %s' % key)
- for key in ['on', 'av', 'off']] if node is not None)
- video_url = url_node.find('url').text
- view_count = int_or_none(self._search_regex(
- r'([0-9]+)', video.find('views').text, 'view count', fatal=False))
+
+ config = self._download_json('http://vf-player-info-loader.herokuapp.com/%s.json' % video_id, video_id)['videoinfo']
+
+ video = config.get('video')
+ blog = config.get('blog', {})
return {
'id': video_id,
- 'title': video.find('title').text,
- 'url': video_url,
- 'thumbnail': video.find('thumb').text,
- 'description': video.find('description').text,
- 'uploader': config.find('blog/name').text,
- 'uploader_id': video.find('identifier').text,
- 'view_count': view_count,
+ 'title': video['title'],
+ 'url': video['sources']['source']['url'],
+ 'thumbnail': video.get('thumb'),
+ 'description': video.get('description'),
+ 'timestamp': parse_iso8601(video.get('date')),
+ 'uploader': blog.get('name'),
+ 'uploader_id': blog.get('identifier'),
+ 'view_count': int_or_none(self._search_regex(r'([0-9]+)', video.get('views'), 'view count', fatal=False)),
+ 'likes': int_or_none(video.get('likes')),
+ 'comment_count': int_or_none(video.get('nrOfComments')),
}
+++ /dev/null
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import (
- compat_HTTPError,
- compat_urlparse,
-)
-from ..utils import (
- ExtractorError,
- parse_duration,
-)
-
-
-class VideoLecturesNetIE(InfoExtractor):
- _VALID_URL = r'http://(?:www\.)?videolectures\.net/(?P<id>[^/#?]+)/*(?:[#?].*)?$'
- IE_NAME = 'videolectures.net'
-
- _TESTS = [{
- 'url': 'http://videolectures.net/promogram_igor_mekjavic_eng/',
- 'info_dict': {
- 'id': 'promogram_igor_mekjavic_eng',
- 'ext': 'mp4',
- 'title': 'Automatics, robotics and biocybernetics',
- 'description': 'md5:815fc1deb6b3a2bff99de2d5325be482',
- 'upload_date': '20130627',
- 'duration': 565,
- 'thumbnail': 're:http://.*\.jpg',
- },
- }, {
- # video with invalid direct format links (HTTP 403)
- 'url': 'http://videolectures.net/russir2010_filippova_nlp/',
- 'info_dict': {
- 'id': 'russir2010_filippova_nlp',
- 'ext': 'flv',
- 'title': 'NLP at Google',
- 'description': 'md5:fc7a6d9bf0302d7cc0e53f7ca23747b3',
- 'duration': 5352,
- 'thumbnail': 're:http://.*\.jpg',
- },
- 'params': {
- # rtmp download
- 'skip_download': True,
- },
- }, {
- 'url': 'http://videolectures.net/deeplearning2015_montreal/',
- 'info_dict': {
- 'id': 'deeplearning2015_montreal',
- 'title': 'Deep Learning Summer School, Montreal 2015',
- 'description': 'md5:90121a40cc6926df1bf04dcd8563ed3b',
- },
- 'playlist_count': 30,
- }]
-
- def _real_extract(self, url):
- video_id = self._match_id(url)
-
- smil_url = 'http://videolectures.net/%s/video/1/smil.xml' % video_id
-
- try:
- smil = self._download_smil(smil_url, video_id)
- except ExtractorError as e:
- if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
- # Probably a playlist
- webpage = self._download_webpage(url, video_id)
- entries = [
- self.url_result(compat_urlparse.urljoin(url, video_url), 'VideoLecturesNet')
- for _, video_url in re.findall(r'<a[^>]+href=(["\'])(.+?)\1[^>]+id=["\']lec=\d+', webpage)]
- playlist_title = self._html_search_meta('title', webpage, 'title', fatal=True)
- playlist_description = self._html_search_meta('description', webpage, 'description')
- return self.playlist_result(entries, video_id, playlist_title, playlist_description)
-
- info = self._parse_smil(smil, smil_url, video_id)
-
- info['id'] = video_id
-
- switch = smil.find('.//switch')
- if switch is not None:
- info['duration'] = parse_duration(switch.attrib.get('dur'))
-
- return info
import re
from .common import InfoExtractor
-from ..compat import compat_urllib_request
+from ..utils import sanitized_Request
class VideoMegaIE(InfoExtractor):
video_id = self._match_id(url)
iframe_url = 'http://videomega.tv/cdn.php?ref=%s' % video_id
- req = compat_urllib_request.Request(iframe_url)
+ req = sanitized_Request(iframe_url)
req.add_header('Referer', url)
req.add_header('Cookie', 'noadvtday=0')
webpage = self._download_webpage(req, video_id)
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
- video_url = self._html_search_regex(
- r'{\s*file\s*:\s*"([^"]+)"\s*}', webpage, 'video url')
+ video_host = self._html_search_regex(
+ r'id=\'vplayer\'><img src="http://(.*?)/i', webpage,
+ 'video host')
+ video_hash = self._html_search_regex(
+ r'\|([a-z0-9]+)\|hls\|type', webpage, 'video_hash')
+ ext = self._html_search_regex(
+ r'\|tracks\|([a-z0-9]+)\|', webpage, 'video ext')
+ video_url = 'http://' + video_host + '/' + video_hash + '/v.' + ext
title = self._html_search_regex(
r'(?s)<h2 class="video-title">(.*?)</h2>', webpage, 'title')
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
- compat_urllib_request,
compat_urllib_parse,
compat_urllib_parse_unquote,
)
ExtractorError,
int_or_none,
parse_iso8601,
+ sanitized_Request,
HEADRequest,
)
_ACCEPT_HEADER = 'application/json, text/javascript, */*; q=0.01'
def _download_json(self, url, video_id, note='Downloading JSON metadata', fatal=True):
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
request.add_header('Accept', self._ACCEPT_HEADER)
request.add_header('Auth-token', self._AUTH_TOKEN)
return super(ViewsterIE, self)._download_json(request, video_id, note, fatal=fatal)
--- /dev/null
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import (
+ compat_urlparse,
+ compat_str,
+)
+from ..utils import (
+ parse_duration,
+ js_to_json,
+ parse_iso8601,
+)
+
+
+class ViideaIE(InfoExtractor):
+ _VALID_URL = r'''(?x)http://(?:www\.)?(?:
+ videolectures\.net|
+ flexilearn\.viidea\.net|
+ presentations\.ocwconsortium\.org|
+ video\.travel-zoom\.si|
+ video\.pomp-forum\.si|
+ tv\.nil\.si|
+ video\.hekovnik\.com|
+ video\.szko\.si|
+ kpk\.viidea\.com|
+ inside\.viidea\.net|
+ video\.kiberpipa\.org|
+ bvvideo\.si|
+ kongres\.viidea\.net|
+ edemokracija\.viidea\.com
+ )(?:/lecture)?/(?P<id>[^/]+)(?:/video/(?P<part>\d+))?/*(?:[#?].*)?$'''
+
+ _TESTS = [{
+ 'url': 'http://videolectures.net/promogram_igor_mekjavic_eng/',
+ 'info_dict': {
+ 'id': '20171',
+ 'display_id': 'promogram_igor_mekjavic_eng',
+ 'ext': 'mp4',
+ 'title': 'Automatics, robotics and biocybernetics',
+ 'description': 'md5:815fc1deb6b3a2bff99de2d5325be482',
+ 'thumbnail': 're:http://.*\.jpg',
+ 'timestamp': 1372349289,
+ 'upload_date': '20130627',
+ 'duration': 565,
+ },
+ }, {
+ # video with invalid direct format links (HTTP 403)
+ 'url': 'http://videolectures.net/russir2010_filippova_nlp/',
+ 'info_dict': {
+ 'id': '14891',
+ 'display_id': 'russir2010_filippova_nlp',
+ 'ext': 'flv',
+ 'title': 'NLP at Google',
+ 'description': 'md5:fc7a6d9bf0302d7cc0e53f7ca23747b3',
+ 'thumbnail': 're:http://.*\.jpg',
+ 'timestamp': 1284375600,
+ 'upload_date': '20100913',
+ 'duration': 5352,
+ },
+ 'params': {
+ # rtmp download
+ 'skip_download': True,
+ },
+ }, {
+ # event playlist
+ 'url': 'http://videolectures.net/deeplearning2015_montreal/',
+ 'info_dict': {
+ 'id': '23181',
+ 'title': 'Deep Learning Summer School, Montreal 2015',
+ 'description': 'md5:0533a85e4bd918df52a01f0e1ebe87b7',
+ 'thumbnail': 're:http://.*\.jpg',
+ 'timestamp': 1438560000,
+ },
+ 'playlist_count': 30,
+ }, {
+ # multi part lecture
+ 'url': 'http://videolectures.net/mlss09uk_bishop_ibi/',
+ 'info_dict': {
+ 'id': '9737',
+ 'display_id': 'mlss09uk_bishop_ibi',
+ 'title': 'Introduction To Bayesian Inference',
+ 'thumbnail': 're:http://.*\.jpg',
+ 'timestamp': 1251622800,
+ },
+ 'playlist': [{
+ 'info_dict': {
+ 'id': '9737_part1',
+ 'display_id': 'mlss09uk_bishop_ibi_part1',
+ 'ext': 'wmv',
+ 'title': 'Introduction To Bayesian Inference (Part 1)',
+ 'thumbnail': 're:http://.*\.jpg',
+ 'duration': 4622,
+ 'timestamp': 1251622800,
+ 'upload_date': '20090830',
+ },
+ }, {
+ 'info_dict': {
+ 'id': '9737_part2',
+ 'display_id': 'mlss09uk_bishop_ibi_part2',
+ 'ext': 'wmv',
+ 'title': 'Introduction To Bayesian Inference (Part 2)',
+ 'thumbnail': 're:http://.*\.jpg',
+ 'duration': 5641,
+ 'timestamp': 1251622800,
+ 'upload_date': '20090830',
+ },
+ }],
+ 'playlist_count': 2,
+ }]
+
+ def _real_extract(self, url):
+ lecture_slug, explicit_part_id = re.match(self._VALID_URL, url).groups()
+
+ webpage = self._download_webpage(url, lecture_slug)
+
+ cfg = self._parse_json(self._search_regex(
+ [r'cfg\s*:\s*({.+?})\s*,\s*[\da-zA-Z_]+\s*:\s*\(?\s*function',
+ r'cfg\s*:\s*({[^}]+})'],
+ webpage, 'cfg'), lecture_slug, js_to_json)
+
+ lecture_id = compat_str(cfg['obj_id'])
+
+ base_url = self._proto_relative_url(cfg['livepipe'], 'http:')
+
+ lecture_data = self._download_json(
+ '%s/site/api/lecture/%s?format=json' % (base_url, lecture_id),
+ lecture_id)['lecture'][0]
+
+ lecture_info = {
+ 'id': lecture_id,
+ 'display_id': lecture_slug,
+ 'title': lecture_data['title'],
+ 'timestamp': parse_iso8601(lecture_data.get('time')),
+ 'description': lecture_data.get('description_wiki'),
+ 'thumbnail': lecture_data.get('thumb'),
+ }
+
+ playlist_entries = []
+ lecture_type = lecture_data.get('type')
+ parts = [compat_str(video) for video in cfg.get('videos', [])]
+ if parts:
+ multipart = len(parts) > 1
+
+ def extract_part(part_id):
+ smil_url = '%s/%s/video/%s/smil.xml' % (base_url, lecture_slug, part_id)
+ smil = self._download_smil(smil_url, lecture_id)
+ info = self._parse_smil(smil, smil_url, lecture_id)
+ info['id'] = lecture_id if not multipart else '%s_part%s' % (lecture_id, part_id)
+ info['display_id'] = lecture_slug if not multipart else '%s_part%s' % (lecture_slug, part_id)
+ if multipart:
+ info['title'] += ' (Part %s)' % part_id
+ switch = smil.find('.//switch')
+ if switch is not None:
+ info['duration'] = parse_duration(switch.attrib.get('dur'))
+ item_info = lecture_info.copy()
+ item_info.update(info)
+ return item_info
+
+ if explicit_part_id or not multipart:
+ result = extract_part(explicit_part_id or parts[0])
+ else:
+ result = {
+ '_type': 'multi_video',
+ 'entries': [extract_part(part) for part in parts],
+ }
+ result.update(lecture_info)
+
+ # Immediately return the explicitly requested part or a non-event item
+ if explicit_part_id or lecture_type != 'evt':
+ return result
+
+ playlist_entries.append(result)
+
+ # It's probably a playlist
+ if not parts or lecture_type == 'evt':
+ playlist_webpage = self._download_webpage(
+ '%s/site/ajax/drilldown/?id=%s' % (base_url, lecture_id), lecture_id)
+ entries = [
+ self.url_result(compat_urlparse.urljoin(url, video_url), 'Viidea')
+ for _, video_url in re.findall(
+ r'<a[^>]+href=(["\'])(.+?)\1[^>]+id=["\']lec=\d+', playlist_webpage)]
+ playlist_entries.extend(entries)
+
+ playlist = self.playlist_result(playlist_entries, lecture_id)
+ playlist.update(lecture_info)
+ return playlist
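(For readers unfamiliar with the two result types used above: a `'_type': 'multi_video'` result groups the parts of a single lecture into one logical item, while `playlist_result` wraps an event page as a list of independent lecture URLs that this same extractor resolves later. Roughly, using the ids from the tests above as placeholders:)

    # Multi-part lecture -- one item assembled from its parts:
    # {'_type': 'multi_video', 'id': '9737', 'entries': [part1_info, part2_info], ...}
    # Event page -- a playlist of separate lecture URL results:
    # {'_type': 'playlist', 'id': '23181', 'entries': [url_result, url_result, ...], ...}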
import hashlib
import itertools
+from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
parse_age_limit,
parse_iso8601,
+ sanitized_Request,
)
-from ..compat import compat_urllib_request
-from .common import InfoExtractor
class VikiBaseIE(InfoExtractor):
hashlib.sha1
).hexdigest()
url = self._API_URL_TEMPLATE % (query, sig)
- return compat_urllib_request.Request(
+ return sanitized_Request(
url, json.dumps(post_data).encode('utf-8')) if post_data else url
def _call_api(self, path, video_id, note, timestamp=None, post_data=None):
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
- compat_urllib_parse,
- compat_urllib_request,
compat_urlparse,
)
from ..utils import (
+ encode_dict,
ExtractorError,
InAdvancePagedList,
int_or_none,
RegexNotFoundError,
+ sanitized_Request,
smuggle_url,
std_headers,
unified_strdate,
self.report_login()
webpage = self._download_webpage(self._LOGIN_URL, None, False)
token, vuid = self._extract_xsrft_and_vuid(webpage)
- data = urlencode_postdata({
+ data = urlencode_postdata(encode_dict({
'action': 'login',
'email': username,
'password': password,
'service': 'vimeo',
'token': token,
- })
- login_request = compat_urllib_request.Request(self._LOGIN_URL, data)
+ }))
+ login_request = sanitized_Request(self._LOGIN_URL, data)
login_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
- login_request.add_header('Cookie', 'vuid=%s' % vuid)
login_request.add_header('Referer', self._LOGIN_URL)
+ self._set_vimeo_cookie('vuid', vuid)
self._download_webpage(login_request, None, False, 'Wrong login info')
def _extract_xsrft_and_vuid(self, webpage):
webpage, 'vuid', group='vuid')
return xsrft, vuid
+ def _set_vimeo_cookie(self, name, value):
+ self._set_cookie('vimeo.com', name, value)
+
class VimeoIE(VimeoBaseInfoExtractor):
"""Information extractor for vimeo.com."""
'note': 'Video not completely processed, "failed" seed status',
'only_matching': True,
},
+ {
+ 'url': 'https://vimeo.com/groups/travelhd/videos/22439234',
+ 'only_matching': True,
+ },
]
@staticmethod
if password is None:
raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
token, vuid = self._extract_xsrft_and_vuid(webpage)
- data = urlencode_postdata({
+ data = urlencode_postdata(encode_dict({
'password': password,
'token': token,
- })
+ }))
if url.startswith('http://'):
# vimeo only supports https now, but the user can give an http url
url = url.replace('http://', 'https://')
- password_request = compat_urllib_request.Request(url + '/password', data)
+ password_request = sanitized_Request(url + '/password', data)
password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
- password_request.add_header('Cookie', 'clip_test2=1; vuid=%s' % vuid)
password_request.add_header('Referer', url)
+ self._set_vimeo_cookie('vuid', vuid)
return self._download_webpage(
password_request, video_id,
'Verifying the password', 'Wrong password')
password = self._downloader.params.get('videopassword', None)
if password is None:
raise ExtractorError('This video is protected by a password, use the --video-password option')
- data = compat_urllib_parse.urlencode({'password': password})
+ data = urlencode_postdata(encode_dict({'password': password}))
pass_url = url + '/check-password'
- password_request = compat_urllib_request.Request(pass_url, data)
+ password_request = sanitized_Request(pass_url, data)
password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
return self._download_json(
password_request, video_id,
url = 'https://vimeo.com/' + video_id
# Retrieve video webpage to extract further information
- request = compat_urllib_request.Request(url, None, headers)
+ request = sanitized_Request(url, None, headers)
try:
webpage = self._download_webpage(request, video_id)
except ExtractorError as ee:
like_count = None
comment_count = None
- # Vimeo specific: extract request signature and timestamp
- sig = config['request']['signature']
- timestamp = config['request']['timestamp']
-
- # Vimeo specific: extract video codec and quality information
- # First consider quality, then codecs, then take everything
- codecs = [('vp6', 'flv'), ('vp8', 'flv'), ('h264', 'mp4')]
- files = {'hd': [], 'sd': [], 'other': []}
- config_files = config["video"].get("files") or config["request"].get("files")
- for codec_name, codec_extension in codecs:
- for quality in config_files.get(codec_name, []):
- format_id = '-'.join((codec_name, quality)).lower()
- key = quality if quality in files else 'other'
- video_url = None
- if isinstance(config_files[codec_name], dict):
- file_info = config_files[codec_name][quality]
- video_url = file_info.get('url')
- else:
- file_info = {}
- if video_url is None:
- video_url = "http://player.vimeo.com/play_redirect?clip_id=%s&sig=%s&time=%s&quality=%s&codecs=%s&type=moogaloop_local&embed_location=" \
- % (video_id, sig, timestamp, quality, codec_name.upper())
-
- files[key].append({
- 'ext': codec_extension,
- 'url': video_url,
- 'format_id': format_id,
- 'width': int_or_none(file_info.get('width')),
- 'height': int_or_none(file_info.get('height')),
- 'tbr': int_or_none(file_info.get('bitrate')),
- })
formats = []
- m3u8_url = config_files.get('hls', {}).get('all')
+ config_files = config['video'].get('files') or config['request'].get('files', {})
+ for f in config_files.get('progressive', []):
+ video_url = f.get('url')
+ if not video_url:
+ continue
+ formats.append({
+ 'url': video_url,
+ 'format_id': 'http-%s' % f.get('quality'),
+ 'width': int_or_none(f.get('width')),
+ 'height': int_or_none(f.get('height')),
+ 'fps': int_or_none(f.get('fps')),
+ 'tbr': int_or_none(f.get('bitrate')),
+ })
+ m3u8_url = config_files.get('hls', {}).get('url')
if m3u8_url:
m3u8_formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', 0, 'hls', fatal=False)
if m3u8_formats:
formats.extend(m3u8_formats)
- for key in ('other', 'sd', 'hd'):
- formats += files[key]
- self._sort_formats(formats)
+ # Bitrates are completely broken: a single m3u8 may contain entries in kbps and bps
+ # at the same time without actual units specified. This leads to wrong sorting.
+ self._sort_formats(formats, field_preference=('height', 'width', 'fps', 'format_id'))
subtitles = {}
text_tracks = config['request'].get('text_tracks')
token, vuid = self._extract_xsrft_and_vuid(webpage)
fields['token'] = token
fields['password'] = password
- post = urlencode_postdata(fields)
+ post = urlencode_postdata(encode_dict(fields))
password_path = self._search_regex(
r'action="([^"]+)"', login_form, 'password URL')
password_url = compat_urlparse.urljoin(page_url, password_path)
- password_request = compat_urllib_request.Request(password_url, post)
+ password_request = sanitized_Request(password_url, post)
password_request.add_header('Content-type', 'application/x-www-form-urlencoded')
- password_request.add_header('Cookie', 'vuid=%s' % vuid)
- self._set_cookie('vimeo.com', 'xsrft', token)
+ self._set_vimeo_cookie('vuid', vuid)
+ self._set_vimeo_cookie('xsrft', token)
return self._download_webpage(
password_request, list_id,
'Verifying the password', 'Wrong password')
- def _extract_videos(self, list_id, base_url):
- video_ids = []
+ def _title_and_entries(self, list_id, base_url):
for pagenum in itertools.count(1):
page_url = self._page_url(base_url, pagenum)
webpage = self._download_webpage(
if pagenum == 1:
webpage = self._login_list_password(page_url, list_id, webpage)
+ yield self._extract_list_title(webpage)
+
+ for video_id in re.findall(r'id="clip_(\d+?)"', webpage):
+ yield self.url_result('https://vimeo.com/%s' % video_id, 'Vimeo')
- video_ids.extend(re.findall(r'id="clip_(\d+?)"', webpage))
if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
break
- entries = [self.url_result('https://vimeo.com/%s' % video_id, 'Vimeo')
- for video_id in video_ids]
- return {'_type': 'playlist',
- 'id': list_id,
- 'title': self._extract_list_title(webpage),
- 'entries': entries,
- }
+ def _extract_videos(self, list_id, base_url):
+ title_and_entries = self._title_and_entries(list_id, base_url)
+ list_title = next(title_and_entries)
+ return self.playlist_result(title_and_entries, list_id, list_title)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
class VimeoGroupsIE(VimeoAlbumIE):
IE_NAME = 'vimeo:group'
- _VALID_URL = r'https://vimeo\.com/groups/(?P<name>[^/]+)'
+ _VALID_URL = r'https://vimeo\.com/groups/(?P<name>[^/]+)(?:/(?!videos?/\d+)|$)'
_TESTS = [{
'url': 'https://vimeo.com/groups/rolexawards',
'info_dict': {
def _page_url(self, base_url, pagenum):
url = '%s/page:%d/' % (base_url, pagenum)
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
# Set the header to get a partial html page with the ids,
# the normal page doesn't contain them.
request.add_header('X-Requested-With', 'XMLHttpRequest')
from ..compat import (
compat_str,
compat_urllib_parse,
- compat_urllib_request,
)
from ..utils import (
ExtractorError,
orderedSet,
+ sanitized_Request,
str_to_int,
unescapeHTML,
unified_strdate,
'pass': password.encode('cp1251'),
})
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'https://login.vk.com/?act=login',
compat_urllib_parse.urlencode(login_form).encode('utf-8'))
login_page = self._download_webpage(
mobj.group(1) + ' ' + mobj.group(2)
upload_date = unified_strdate(mobj.group(1) + ' ' + mobj.group(2))
- view_count = str_to_int(self._search_regex(
- r'"mv_views_count_number"[^>]*>([\d,.]+) views<',
- info_page, 'view count', fatal=False))
+ view_count = None
+ views = self._html_search_regex(
+ r'"mv_views_count_number"[^>]*>(.+?\bviews?)<',
+ info_page, 'view count', fatal=False)
+ if views:
+ view_count = str_to_int(self._search_regex(
+ r'([\d,.]+)', views, 'view count', fatal=False))
formats = [{
'format_id': k,
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse,
- compat_urllib_request,
+from ..compat import compat_urllib_parse
+from ..utils import (
+ ExtractorError,
+ sanitized_Request,
)
class VodlockerIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?vodlocker\.com/(?P<id>[0-9a-zA-Z]+)(?:\..*?)?'
+ _VALID_URL = r'https?://(?:www\.)?vodlocker\.com/(?:embed-)?(?P<id>[0-9a-zA-Z]+)(?:\..*?)?'
_TESTS = [{
'url': 'http://vodlocker.com/e8wvyzz4sl42',
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
+ if any(p in webpage for p in (
+ '>THIS FILE WAS DELETED<',
+ '>File Not Found<',
+ 'The file you were looking for could not be found, sorry for any inconvenience.<')):
+ raise ExtractorError('Video %s does not exist' % video_id, expected=True)
+
fields = self._hidden_inputs(webpage)
if fields['op'] == 'download1':
self._sleep(3, video_id) # they do detect when requests happen too fast!
post = compat_urllib_parse.urlencode(fields)
- req = compat_urllib_request.Request(url, post)
+ req = sanitized_Request(url, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
webpage = self._download_webpage(
req, video_id, 'Downloading video page')
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
- compat_urlparse,
-)
+from ..compat import compat_urlparse
from ..utils import (
ExtractorError,
determine_ext,
int_or_none,
+ sanitized_Request,
)
def _real_extract(self, url):
display_id = self._match_id(url)
- req = compat_urllib_request.Request(
+ req = sanitized_Request(
compat_urlparse.urljoin(url, '/talks/%s' % display_id))
# Older versions of Firefox get redirected to an "upgrade browser" page
req.add_header('User-Agent', 'youtube-dl')
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import compat_urllib_request
-from ..utils import ExtractorError
+from ..utils import (
+ ExtractorError,
+ sanitized_Request,
+)
class WistiaIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
- request = compat_urllib_request.Request(self._API_URL.format(video_id))
+ request = sanitized_Request(self._API_URL.format(video_id))
request.add_header('Referer', url) # Some videos require this.
data_json = self._download_json(request, video_id)
if data_json.get('error'):
'duration': duration,
'upload_date': upload_date,
'title': title,
- 'formats': formats,
'categories': categories,
}
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_urllib_parse
+from ..utils import (
+ ExtractorError,
+ encode_dict,
+ int_or_none,
+ sanitized_Request,
+)
+
+
+class XFileShareIE(InfoExtractor):
+ IE_DESC = 'XFileShare based sites: GorillaVid.in, daclips.in, movpod.in, fastvideo.in, realvid.net, filehoot.com and vidto.me'
+ _VALID_URL = r'''(?x)
+ https?://(?P<host>(?:www\.)?
+ (?:daclips\.in|gorillavid\.in|movpod\.in|fastvideo\.in|realvid\.net|filehoot\.com|vidto\.me))/
+ (?:embed-)?(?P<id>[0-9a-zA-Z]+)(?:-[0-9]+x[0-9]+\.html)?
+ '''
+
+ _FILE_NOT_FOUND_REGEX = r'>(?:404 - )?File Not Found<'
+
+ _TESTS = [{
+ 'url': 'http://gorillavid.in/06y9juieqpmi',
+ 'md5': '5ae4a3580620380619678ee4875893ba',
+ 'info_dict': {
+ 'id': '06y9juieqpmi',
+ 'ext': 'flv',
+ 'title': 'Rebecca Black My Moment Official Music Video Reaction-6GK87Rc8bzQ',
+ 'thumbnail': 're:http://.*\.jpg',
+ },
+ }, {
+ 'url': 'http://gorillavid.in/embed-z08zf8le23c6-960x480.html',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://daclips.in/3rso4kdn6f9m',
+ 'md5': '1ad8fd39bb976eeb66004d3a4895f106',
+ 'info_dict': {
+ 'id': '3rso4kdn6f9m',
+ 'ext': 'mp4',
+ 'title': 'Micro Pig piglets ready on 16th July 2009-bG0PdrCdxUc',
+ 'thumbnail': 're:http://.*\.jpg',
+ }
+ }, {
+ # video with countdown timeout
+ 'url': 'http://fastvideo.in/1qmdn1lmsmbw',
+ 'md5': '8b87ec3f6564a3108a0e8e66594842ba',
+ 'info_dict': {
+ 'id': '1qmdn1lmsmbw',
+ 'ext': 'mp4',
+ 'title': 'Man of Steel - Trailer',
+ 'thumbnail': 're:http://.*\.jpg',
+ },
+ }, {
+ 'url': 'http://realvid.net/ctn2y6p2eviw',
+ 'md5': 'b2166d2cf192efd6b6d764c18fd3710e',
+ 'info_dict': {
+ 'id': 'ctn2y6p2eviw',
+ 'ext': 'flv',
+ 'title': 'rdx 1955',
+ 'thumbnail': 're:http://.*\.jpg',
+ },
+ }, {
+ 'url': 'http://movpod.in/0wguyyxi1yca',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://filehoot.com/3ivfabn7573c.html',
+ 'info_dict': {
+ 'id': '3ivfabn7573c',
+ 'ext': 'mp4',
+ 'title': 'youtube-dl test video \'äBaW_jenozKc.mp4.mp4',
+ 'thumbnail': 're:http://.*\.jpg',
+ }
+ }, {
+ 'url': 'http://vidto.me/ku5glz52nqe1.html',
+ 'info_dict': {
+ 'id': 'ku5glz52nqe1',
+ 'ext': 'mp4',
+ 'title': 'test'
+ }
+ }]
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('id')
+
+ url = 'http://%s/%s' % (mobj.group('host'), video_id)
+ webpage = self._download_webpage(url, video_id)
+
+ if re.search(self._FILE_NOT_FOUND_REGEX, webpage) is not None:
+ raise ExtractorError('Video %s does not exist' % video_id, expected=True)
+
+ fields = self._hidden_inputs(webpage)
+
+ if fields['op'] == 'download1':
+ countdown = int_or_none(self._search_regex(
+ r'<span id="countdown_str">(?:[Ww]ait)?\s*<span id="cxc">(\d+)</span>\s*(?:seconds?)?</span>',
+ webpage, 'countdown', default=None))
+ if countdown:
+ self._sleep(countdown, video_id)
+
+ post = compat_urllib_parse.urlencode(encode_dict(fields))
+
+ req = sanitized_Request(url, post)
+ req.add_header('Content-type', 'application/x-www-form-urlencoded')
+
+ webpage = self._download_webpage(req, video_id, 'Downloading video page')
+
+ title = (self._search_regex(
+ [r'style="z-index: [0-9]+;">([^<]+)</span>',
+ r'<td nowrap>([^<]+)</td>',
+ r'>Watch (.+) ',
+ r'<h2 class="video-page-head">([^<]+)</h2>'],
+ webpage, 'title', default=None) or self._og_search_title(webpage)).strip()
+ video_url = self._search_regex(
+ [r'file\s*:\s*["\'](http[^"\']+)["\'],',
+ r'file_link\s*=\s*\'(https?:\/\/[0-9a-zA-z.\/\-_]+)'],
+ webpage, 'file url')
+ thumbnail = self._search_regex(
+ r'image\s*:\s*["\'](http[^"\']+)["\'],', webpage, 'thumbnail', default=None)
+
+ formats = [{
+ 'format_id': 'sd',
+ 'url': video_url,
+ 'quality': 1,
+ }]
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'thumbnail': thumbnail,
+ 'formats': formats,
+ }
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_request,
- compat_urllib_parse_unquote,
-)
+from ..compat import compat_urllib_parse_unquote
from ..utils import (
parse_duration,
+ sanitized_Request,
str_to_int,
)
def _real_extract(self, url):
video_id = self._match_id(url)
- req = compat_urllib_request.Request(url)
+ req = sanitized_Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
import re
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse_unquote,
- compat_urllib_request,
-)
+from ..compat import compat_urllib_parse_unquote
from ..utils import (
clean_html,
ExtractorError,
determine_ext,
+ sanitized_Request,
)
'url': video_url,
}]
- android_req = compat_urllib_request.Request(url)
+ android_req = sanitized_Request(url)
android_req.add_header('User-Agent', self._ANDROID_USER_AGENT)
android_webpage = self._download_webpage(android_req, video_id, fatal=False)
from ..compat import (
compat_str,
compat_urllib_parse,
- compat_urllib_request,
)
from ..utils import (
int_or_none,
float_or_none,
+ sanitized_Request,
)
if len(tracks) < len(track_ids):
present_track_ids = set([compat_str(track['id']) for track in tracks if track.get('id')])
missing_track_ids = set(map(compat_str, track_ids)) - set(present_track_ids)
- request = compat_urllib_request.Request(
+ request = sanitized_Request(
'https://music.yandex.ru/handlers/track-entries.jsx',
compat_urllib_parse.urlencode({
'entries': ','.join(missing_track_ids),
import base64
from .common import InfoExtractor
-from ..utils import ExtractorError
-
from ..compat import (
compat_urllib_parse,
compat_ord,
- compat_urllib_request,
+)
+from ..utils import (
+ ExtractorError,
+ sanitized_Request,
)
video_id = self._match_id(url)
def retrieve_data(req_url, note):
- req = compat_urllib_request.Request(req_url)
+ req = sanitized_Request(req_url)
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy:
import re
from .common import InfoExtractor
-from ..compat import compat_urllib_request
from ..utils import (
int_or_none,
+ sanitized_Request,
str_to_int,
unescapeHTML,
unified_strdate,
video_id = mobj.group('id')
display_id = mobj.group('display_id')
- request = compat_urllib_request.Request(url)
+ request = sanitized_Request(url)
request.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(request, display_id)
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlparse,
- compat_urllib_request,
compat_urlparse,
compat_str,
)
orderedSet,
parse_duration,
remove_start,
+ sanitized_Request,
smuggle_url,
str_to_int,
unescapeHTML,
login_data = compat_urllib_parse.urlencode(encode_dict(login_form_strs)).encode('ascii')
- req = compat_urllib_request.Request(self._LOGIN_URL, login_data)
+ req = sanitized_Request(self._LOGIN_URL, login_data)
login_results = self._download_webpage(
req, None,
note='Logging in', errnote='unable to log in', fatal=False)
tfa_data = compat_urllib_parse.urlencode(encode_dict(tfa_form_strs)).encode('ascii')
- tfa_req = compat_urllib_request.Request(self._TWOFACTOR_URL, tfa_data)
+ tfa_req = sanitized_Request(self._TWOFACTOR_URL, tfa_data)
tfa_results = self._download_webpage(
tfa_req, None,
note='Submitting TFA code', errnote='unable to submit tfa', fatal=False)
return
-class YoutubePlaylistBaseInfoExtractor(InfoExtractor):
- # Extract the video ids from the playlist pages
+class YoutubeEntryListBaseInfoExtractor(InfoExtractor):
+ # Extract entries from page with "Load more" button
def _entries(self, page, playlist_id):
more_widget_html = content_html = page
for page_num in itertools.count(1):
- for video_id, video_title in self.extract_videos_from_page(content_html):
- yield self.url_result(
- video_id, 'Youtube', video_id=video_id,
- video_title=video_title)
+ for entry in self._process_page(content_html):
+ yield entry
mobj = re.search(r'data-uix-load-more-href="/?(?P<more>[^"]+)"', more_widget_html)
if not mobj:
break
more_widget_html = more['load_more_widget_html']
+
+class YoutubePlaylistBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor):
+ def _process_page(self, content):
+ for video_id, video_title in self.extract_videos_from_page(content):
+ yield self.url_result(video_id, 'Youtube', video_id, video_title)
+
def extract_videos_from_page(self, page):
ids_in_page = []
titles_in_page = []
return zip(ids_in_page, titles_in_page)
+class YoutubePlaylistsBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor):
+ def _process_page(self, content):
+ for playlist_id in re.findall(r'href="/?playlist\?list=(.+?)"', content):
+ yield self.url_result(
+ 'https://www.youtube.com/playlist?list=%s' % playlist_id, 'YoutubePlaylist')
+
+ def _real_extract(self, url):
+ playlist_id = self._match_id(url)
+ webpage = self._download_webpage(url, playlist_id)
+ title = self._og_search_title(webpage, fatal=False)
+ return self.playlist_result(self._entries(webpage, playlist_id), playlist_id, title)
+
+
class YoutubeIE(YoutubeBaseInfoExtractor):
IE_DESC = 'YouTube.com'
_VALID_URL = r"""(?x)^
|(?: # or the v= param in all its forms
(?:(?:watch|movie)(?:_popup)?(?:\.php)?/?)? # preceding watch(_popup|.php) or nothing (like /?v=xxxx)
(?:\?|\#!?) # the params delimiter ? or # or #!
- (?:.*?&)?? # any other preceding param (like /?s=tuff&v=xxxx)
+ (?:.*?[&;])?? # any other preceding param (like /?s=tuff&v=xxxx or ?s=tuff&v=V36LpHqtcDY)
v=
)
))
'247': {'ext': 'webm', 'height': 720, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'248': {'ext': 'webm', 'height': 1080, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'271': {'ext': 'webm', 'height': 1440, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
+ # itag 272 videos are either 3840x2160 (e.g. RtoitU2A-3E) or 7680x4320 (sLprVF6d7Ug)
'272': {'ext': 'webm', 'height': 2160, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'302': {'ext': 'webm', 'height': 720, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40, 'fps': 60, 'vcodec': 'vp9'},
'303': {'ext': 'webm', 'height': 1080, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40, 'fps': 60, 'vcodec': 'vp9'},
'title': 'Principal Sexually Assaults A Teacher - Episode 117 - 8th June 2012',
'description': 'md5:09b78bd971f1e3e289601dfba15ca4f7',
'uploader': 'SET India',
- 'uploader_id': 'setindia'
+ 'uploader_id': 'setindia',
+ 'age_limit': 18,
}
},
{
'info_dict': {
'id': 'lqQg6PlCWgI',
'ext': 'mp4',
- 'upload_date': '20120724',
+ 'upload_date': '20150827',
'uploader_id': 'olympic',
'description': 'HO09 - Women - GER-AUS - Hockey - 31 July 2012 - London 2012 Olympic Games',
'uploader': 'Olympics',
{
'url': 'http://vid.plus/FlRa-iH7PGw',
'only_matching': True,
+ },
+ {
+ # Title with JS-like syntax "};" (see https://github.com/rg3/youtube-dl/issues/7468)
+ 'url': 'https://www.youtube.com/watch?v=lsguqyKfVQg',
+ 'info_dict': {
+ 'id': 'lsguqyKfVQg',
+ 'ext': 'mp4',
+ 'title': '{dark walk}; Loki/AC/Dishonored; collab w/Elflover21',
+ 'description': 'md5:8085699c11dc3f597ce0410b0dcbb34a',
+ 'upload_date': '20151119',
+ 'uploader_id': 'IronSoulElf',
+ 'uploader': 'IronSoulElf',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ },
+ {
+ # Tags with '};' (see https://github.com/rg3/youtube-dl/issues/7468)
+ 'url': 'https://www.youtube.com/watch?v=Ms7iBXnlUO8',
+ 'only_matching': True,
+ },
+ {
+ # Video with yt:stretch=17:0
+ 'url': 'https://www.youtube.com/watch?v=Q39EVAstoRM',
+ 'info_dict': {
+ 'id': 'Q39EVAstoRM',
+ 'ext': 'mp4',
+ 'title': 'Clash Of Clans#14 Dicas De Ataque Para CV 4',
+ 'description': 'md5:ee18a25c350637c8faff806845bddee9',
+ 'upload_date': '20151107',
+ 'uploader_id': 'UCCr7TALkRbo3EtFzETQF1LA',
+ 'uploader': 'CH GAMER DROID',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ },
+ {
+ 'url': 'https://www.youtube.com/watch?feature=player_embedded&amp;v=V36LpHqtcDY',
+ 'only_matching': True,
}
]
def _extract_signature_function(self, video_id, player_url, example_sig):
id_m = re.match(
- r'.*?-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player(?:-new)?)?\.(?P<ext>[a-z]+)$',
+ r'.*?-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player(?:-new)?|/base)?\.(?P<ext>[a-z]+)$',
player_url)
if not id_m:
raise ExtractorError('Cannot identify player %r' % player_url)
return {}
return sub_lang_list
+ def _get_ytplayer_config(self, video_id, webpage):
+ patterns = (
+ # User data may contain arbitrary character sequences that may affect
+ # JSON extraction with regex, e.g. when the data contains '};' the second
+ # regex won't capture the whole JSON. We work around this by trying the more
+ # specific regex first; proper quoted-string handling, to be implemented in
+ # the future, will replace this workaround (see
+ # https://github.com/rg3/youtube-dl/issues/7468,
+ # https://github.com/rg3/youtube-dl/pull/7599)
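+ # For instance, a title such as '{dark walk}; ...' puts a literal '};' inside
+ # the JSON, so the lazy second pattern stops there and yields truncated,
+ # unparseable JSON, while the first pattern's ';ytplayer' anchor survives it.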
+ r';ytplayer\.config\s*=\s*({.+?});ytplayer',
+ r';ytplayer\.config\s*=\s*({.+?});',
+ )
+ config = self._search_regex(
+ patterns, webpage, 'ytplayer.config', default=None)
+ if config:
+ return self._parse_json(
+ uppercase_escape(config), video_id, fatal=False)
+
def _get_automatic_captions(self, video_id, webpage):
"""We need the webpage for getting the captions url, pass it as an
argument to speed up the process."""
self.to_screen('%s: Looking for automatic captions' % video_id)
- mobj = re.search(r';ytplayer.config = ({.*?});', webpage)
+ player_config = self._get_ytplayer_config(video_id, webpage)
err_msg = 'Couldn\'t find automatic captions for %s' % video_id
- if mobj is None:
+ if not player_config:
self._downloader.report_warning(err_msg)
return {}
- player_config = json.loads(mobj.group(1))
try:
args = player_config['args']
caption_url = args['ttsurl']
age_gate = False
video_info = None
# Try looking directly into the video webpage
- mobj = re.search(r';ytplayer\.config\s*=\s*({.*?});', video_webpage)
- if mobj:
- json_code = uppercase_escape(mobj.group(1))
- ytplayer_config = json.loads(json_code)
+ ytplayer_config = self._get_ytplayer_config(video_id, video_webpage)
+ if ytplayer_config:
args = ytplayer_config['args']
if args.get('url_encoded_fmt_stream_map'):
# Convert to the same format returned by compat_parse_qs
if not video_info:
video_info = get_video_info
if 'token' in get_video_info:
+ # Different get_video_info requests may report different results, e.g.
+ # some may report video unavailability, but some may serve it without
+ # any complaint (see https://github.com/rg3/youtube-dl/issues/7362, where
+ # the original webpage as well as el=info and el=embedded get_video_info
+ # requests report video unavailability due to geo restriction while
+ # el=detailpage succeeds and returns valid data). This is probably
+ # due to YouTube measures against IP ranges of hosting providers.
+ # We work around this by preferring the first successful video_info
+ # containing a token if no such video_info has been found yet.
+ if 'token' not in video_info:
+ video_info = get_video_info
break
if 'token' not in video_info:
if 'reason' in video_info:
player_desc = 'flash player %s' % player_version
else:
player_version = self._search_regex(
- r'html5player-([^/]+?)(?:/html5player(?:-new)?)?\.js',
+ [r'html5player-([^/]+?)(?:/html5player(?:-new)?)?\.js', r'(?:www|player)-([^/]+)/base\.js'],
player_url,
'html5 player', fatal=False)
player_desc = 'html5 player %s' % player_version
manifest_url = video_info['hlsvp'][0]
url_map = self._extract_from_m3u8(manifest_url, video_id)
formats = _map_to_format_list(url_map)
+ # Accept-Encoding header causes failures in live streams on Youtube and Youtube Gaming
+ for a_format in formats:
+ a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True'
else:
raise ExtractorError('no conn, hlsvp or url_encoded_fmt_stream_map information found in video info')
r'<meta\s+property="og:video:tag".*?content="yt:stretch=(?P<w>[0-9]+):(?P<h>[0-9]+)">',
video_webpage)
if stretched_m:
- ratio = float(stretched_m.group('w')) / float(stretched_m.group('h'))
- for f in formats:
- if f.get('vcodec') != 'none':
- f['stretched_ratio'] = ratio
+ w = float(stretched_m.group('w'))
+ h = float(stretched_m.group('h'))
+ # yt:stretch may hold invalid ratio data (e.g. for Q39EVAstoRM ratio is 17:0).
+ # We will only process correct ratios.
+ if w > 0 and h > 0:
+ ratio = w / h
+ for f in formats:
+ if f.get('vcodec') != 'none':
+ f['stretched_ratio'] = ratio
self._sort_formats(formats)
youtube\.com/
(?:
(?:course|view_play_list|my_playlists|artist|playlist|watch|embed/videoseries)
- \? (?:.*?&)*? (?:p|a|list)=
+ \? (?:.*?[&;])*? (?:p|a|list)=
| p/
)
(
self.report_warning('Youtube gives an alert message: ' + match)
playlist_title = self._html_search_regex(
- r'(?s)<h1 class="pl-header-title[^"]*">\s*(.*?)\s*</h1>',
+ r'(?s)<h1 class="pl-header-title[^"]*"[^>]*>\s*(.*?)\s*</h1>',
page, 'title')
return self.playlist_result(self._entries(page, playlist_id), playlist_id, playlist_title)
return super(YoutubeUserIE, cls).suitable(url)
+class YoutubeUserPlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
+ IE_DESC = 'YouTube.com user playlists'
+ _VALID_URL = r'https?://(?:\w+\.)?youtube\.com/user/(?P<id>[^/]+)/playlists'
+ IE_NAME = 'youtube:user:playlists'
+
+ _TESTS = [{
+ 'url': 'http://www.youtube.com/user/ThirstForScience/playlists',
+ 'playlist_mincount': 4,
+ 'info_dict': {
+ 'id': 'ThirstForScience',
+ 'title': 'Thirst for Science',
+ },
+ }, {
+ # with "Load more" button
+ 'url': 'http://www.youtube.com/user/igorkle1/playlists?view=1&sort=dd',
+ 'playlist_mincount': 70,
+ 'info_dict': {
+ 'id': 'igorkle1',
+ 'title': 'Игорь Клейнер',
+ },
+ }]
+
+
class YoutubeSearchIE(SearchInfoExtractor, YoutubePlaylistIE):
IE_DESC = 'YouTube.com searches'
# there doesn't appear to be a real limit, for example if you search for
}
-class YoutubeShowIE(InfoExtractor):
+class YoutubeShowIE(YoutubePlaylistsBaseInfoExtractor):
IE_DESC = 'YouTube.com (multi-season) shows'
_VALID_URL = r'https?://www\.youtube\.com/show/(?P<id>[^?#]*)'
IE_NAME = 'youtube:show'
}]
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- playlist_id = mobj.group('id')
- webpage = self._download_webpage(
- 'https://www.youtube.com/show/%s/playlists' % playlist_id, playlist_id, 'Downloading show webpage')
- # There's one playlist for each season of the show
- m_seasons = list(re.finditer(r'href="(/playlist\?list=.*?)"', webpage))
- self.to_screen('%s: Found %s seasons' % (playlist_id, len(m_seasons)))
- entries = [
- self.url_result(
- 'https://www.youtube.com' + season.group(1), 'YoutubePlaylist')
- for season in m_seasons
- ]
- title = self._og_search_title(webpage, fatal=False)
-
- return {
- '_type': 'playlist',
- 'id': playlist_id,
- 'title': title,
- 'entries': entries,
- }
+ playlist_id = self._match_id(url)
+ return super(YoutubeShowIE, self)._real_extract(
+ 'https://www.youtube.com/show/%s/playlists' % playlist_id)
class YoutubeFeedsInfoExtractor(YoutubeBaseInfoExtractor):
obj = {}
obj_m = re.search(
(r'(?:var\s+)?%s\s*=\s*\{' % re.escape(objname)) +
- r'\s*(?P<fields>([a-zA-Z$0-9]+\s*:\s*function\(.*?\)\s*\{.*?\})*)' +
+ r'\s*(?P<fields>([a-zA-Z$0-9]+\s*:\s*function\(.*?\)\s*\{.*?\}(?:,\s*)?)*)' +
r'\}\s*;',
self.code)
fields = obj_m.group('fields')
def extract_function(self, funcname):
func_m = re.search(
r'''(?x)
- (?:function\s+%s|[{;]%s\s*=\s*function)\s*
+ (?:function\s+%s|[{;]%s\s*=\s*function|var\s+%s\s*=\s*function)\s*
\((?P<args>[^)]*)\)\s*
\{(?P<code>[^}]+)\}''' % (
- re.escape(funcname), re.escape(funcname)),
+ re.escape(funcname), re.escape(funcname), re.escape(funcname)),
self.code)
if func_m is None:
raise ExtractorError('Could not find JS function %r' % funcname)
video_format.add_option(
'-F', '--list-formats',
action='store_true', dest='listformats',
- help='List all available formats')
+ help='List all available formats of requested videos')
video_format.add_option(
'--youtube-include-dash-manifest',
action='store_true', dest='youtube_include_dash_manifest', default=True,
subtitles.add_option(
'--write-auto-sub', '--write-automatic-sub',
action='store_true', dest='writeautomaticsub', default=False,
- help='Write automatic subtitle file (YouTube only)')
+ help='Write automatically generated subtitle file (YouTube only)')
subtitles.add_option(
'--all-subs',
action='store_true', dest='allsubtitles', default=False,
return [], information
try:
- self._downloader.to_screen('[' + self.basename + '] Destination: ' + new_path)
+ self._downloader.to_screen('[ffmpeg] Destination: ' + new_path)
self.run_ffmpeg(path, new_path, acodec, more_opts)
except AudioConversionError as e:
raise PostProcessingError(
import sys
from zipimport import zipimporter
-from .compat import (
- compat_str,
- compat_urllib_request,
-)
-from .utils import make_HTTPS_handler
+from .compat import compat_str
+
from .version import __version__
return True
-def update_self(to_screen, verbose):
+def update_self(to_screen, verbose, opener):
"""Update the program file with the latest version from the repository"""
UPDATE_URL = "https://rg3.github.io/youtube-dl/update/"
to_screen('It looks like you installed youtube-dl with a package manager, pip, setup.py or a tarball. Please use that to update.')
return
- https_handler = make_HTTPS_handler({})
- opener = compat_urllib_request.build_opener(https_handler)
-
# Check if there is a new version
try:
newversion = opener.open(VERSION_URL).read().decode('utf-8').strip()
from .compat import (
compat_basestring,
compat_chr,
+ compat_etree_fromstring,
compat_html_entities,
compat_http_client,
compat_kwargs,
def xpath_element(node, xpath, name=None, fatal=False, default=NO_DEFAULT):
- if sys.version_info < (2, 7): # Crazy 2.6
- xpath = xpath.encode('ascii')
+ def _find_xpath(xpath):
+ if sys.version_info < (2, 7): # Crazy 2.6
+ xpath = xpath.encode('ascii')
+ return node.find(xpath)
+
+ if isinstance(xpath, (str, compat_str)):
+ n = _find_xpath(xpath)
+ else:
+ for xp in xpath:
+ n = _find_xpath(xp)
+ if n is not None:
+ break
- n = node.find(xpath)
if n is None:
if default is not NO_DEFAULT:
return default
if drive_or_unc:
norm_path.pop(0)
sanitized_path = [
- path_part if path_part in ['.', '..'] else re.sub('(?:[/<>:"\\|\\\\?\\*]|\.$)', '#', path_part)
+ path_part if path_part in ['.', '..'] else re.sub('(?:[/<>:"\\|\\\\?\\*]|[\s.]$)', '#', path_part)
for path_part in norm_path]
if drive_or_unc:
sanitized_path.insert(0, drive_or_unc + os.path.sep)
return os.path.join(*sanitized_path)
+# Prepend the `http:` scheme to protocol-relative URLs in order to reduce the number
+# of unwanted failures caused by a missing protocol
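+# For instance (hypothetical URL), sanitized_Request('//example.com/video') builds a
+# request for 'http://example.com/video', while URLs that already include a scheme are
+# passed to compat_urllib_request.Request unchanged.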
+def sanitized_Request(url, *args, **kwargs):
+ return compat_urllib_request.Request(
+ 'http:%s' % url if url.startswith('//') else url, *args, **kwargs)
+
+
def orderedSet(iterable):
""" Remove all duplicates from the input iterable """
res = []
numstr = '0%s' % numstr
else:
base = 10
- return compat_chr(int(numstr, base))
+ # See https://github.com/rg3/youtube-dl/issues/7518
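+ # (e.g. a numeric entity whose code point is outside the Unicode range, such as a
+ # hypothetical '&#11184810;', makes compat_chr raise ValueError; in that case we
+ # fall through to the literal representation below)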
+ try:
+ return compat_chr(int(numstr, base))
+ except ValueError:
+ pass
# Unknown entity in name, return its literal representation
- return ('&%s;' % entity)
+ return '&%s;' % entity
def unescapeHTML(s):
return hc
+def handle_youtubedl_headers(headers):
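+ # Example (hypothetical values): {'Youtubedl-no-compression': 'True', 'Accept-Encoding': 'gzip'}
+ # comes back as {}: the internal marker is consumed and Accept-Encoding is dropped so
+ # the server is not asked to compress the response.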
+ filtered_headers = headers
+
+ if 'Youtubedl-no-compression' in filtered_headers:
+ filtered_headers = dict((k, v) for k, v in filtered_headers.items() if k.lower() != 'accept-encoding')
+ del filtered_headers['Youtubedl-no-compression']
+
+ return filtered_headers
+
+
class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
"""Handler for HTTP requests and responses.
the standard headers to every HTTP request and handles gzipped and
deflated responses from web servers. If compression is to be avoided in
a particular request, the original request in the program code only has
- to include the HTTP header "Youtubedl-No-Compression", which will be
+ to include the HTTP header "Youtubedl-no-compression", which will be
removed before making the real request.
Part of this code was copied from:
# The dict keys are capitalized because of this bug by urllib
if h.capitalize() not in req.headers:
req.add_header(h, v)
- if 'Youtubedl-no-compression' in req.headers:
- if 'Accept-encoding' in req.headers:
- del req.headers['Accept-encoding']
- del req.headers['Youtubedl-no-compression']
+
+ req.headers = handle_youtubedl_headers(req.headers)
if sys.version_info < (2, 7) and '#' in req.get_full_url():
# Python 2.6 is brain-dead when it comes to fragments
timetuple = email.utils.parsedate_tz(date_str)
if timetuple:
upload_date = datetime.datetime(*timetuple[:6]).strftime('%Y%m%d')
- return upload_date
+ if upload_date is not None:
+ return compat_str(upload_date)
def determine_ext(url, default_ext='unknown_video'):
guess = url.partition('?')[0].rpartition('.')[2]
if re.match(r'^[A-Za-z0-9]+$', guess):
return guess
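+ # A trailing slash (e.g. a hypothetical '.../video.mp4/' URL) leaves a guess of 'mp4/',
+ # which fails the alphanumeric check above; well-known extensions are still recognized
+ # once the slash is stripped.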
+ elif guess.rstrip('/') in (
+ 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v', 'aac',
+ 'flv', 'f4v', 'f4a', 'f4b',
+ 'webm', 'ogg', 'ogv', 'oga', 'ogx', 'spx', 'opus',
+ 'mkv', 'mka', 'mk3d',
+ 'avi', 'divx',
+ 'mov',
+ 'asf', 'wmv', 'wma',
+ '3gp', '3g2',
+ 'mp3',
+ 'flac',
+ 'ape',
+ 'wav',
+ 'f4f', 'f4m', 'm3u8', 'smil'):
+ return guess.rstrip('/')
else:
return default_ext
def encode_dict(d, encoding='utf-8'):
- return dict((k.encode(encoding), v.encode(encoding)) for k, v in d.items())
-
-
-try:
- etree_iter = xml.etree.ElementTree.Element.iter
-except AttributeError: # Python <=2.6
- etree_iter = lambda n: n.findall('.//*')
-
-
-def parse_xml(s):
- class TreeBuilder(xml.etree.ElementTree.TreeBuilder):
- def doctype(self, name, pubid, system):
- pass # Ignore doctypes
-
- parser = xml.etree.ElementTree.XMLParser(target=TreeBuilder())
- kwargs = {'parser': parser} if sys.version_info >= (2, 7) else {}
- tree = xml.etree.ElementTree.XML(s.encode('utf-8'), **kwargs)
- # Fix up XML parser in Python 2.x
- if sys.version_info < (3, 0):
- for n in etree_iter(tree):
- if n.text is not None:
- if not isinstance(n.text, compat_str):
- n.text = n.text.decode('utf-8')
- return tree
+ def encode(v):
+ return v.encode(encoding) if isinstance(v, compat_basestring) else v
+ return dict((encode(k), encode(v)) for k, v in d.items())
US_RATINGS = {
return out
- dfxp = xml.etree.ElementTree.fromstring(dfxp_data.encode('utf-8'))
+ dfxp = compat_etree_fromstring(dfxp_data.encode('utf-8'))
out = []
paras = dfxp.findall(_x('.//ttml:p')) or dfxp.findall(_x('.//ttaf1:p')) or dfxp.findall('.//p')
from __future__ import unicode_literals
-__version__ = '2015.10.24'
+__version__ = '2015.11.27.1'