Merge branch 'akamai_pv' of https://github.com/remitamine/youtube-dl into remitamine...
author Yen Chi Hsuan <yan12125@gmail.com>
Mon, 25 Apr 2016 13:02:02 +0000 (21:02 +0800)
committer Yen Chi Hsuan <yan12125@gmail.com>
Mon, 25 Apr 2016 13:02:02 +0000 (21:02 +0800)
583 files changed:
.github/ISSUE_TEMPLATE.md [new file with mode: 0644]
.github/ISSUE_TEMPLATE_tmpl.md [new file with mode: 0644]
.gitignore
.travis.yml
AUTHORS
CONTRIBUTING.md
Makefile
README.md
devscripts/bash-completion.py
devscripts/fish-completion.py
devscripts/gh-pages/update-copyright.py
devscripts/gh-pages/update-sites.py
devscripts/lazy_load_template.py [new file with mode: 0644]
devscripts/make_issue_template.py [new file with mode: 0644]
devscripts/make_lazy_extractors.py [new file with mode: 0644]
devscripts/make_supportedsites.py
devscripts/prepare_manpage.py
devscripts/release.sh
devscripts/zsh-completion.py
docs/supportedsites.md
setup.cfg
setup.py
test/helper.py
test/test_InfoExtractor.py
test/test_YoutubeDL.py
test/test_all_urls.py
test/test_compat.py
test/test_download.py
test/test_http.py
test/test_iqiyi_sdk_interpreter.py [new file with mode: 0644]
test/test_jsinterp.py
test/test_subtitles.py
test/test_update.py [new file with mode: 0644]
test/test_utils.py
test/test_write_annotations.py
test/test_youtube_lists.py
test/versions.json [new file with mode: 0644]
tox.ini
youtube_dl/YoutubeDL.py
youtube_dl/__init__.py
youtube_dl/__main__.py
youtube_dl/aes.py
youtube_dl/compat.py
youtube_dl/downloader/__init__.py
youtube_dl/downloader/common.py
youtube_dl/downloader/dash.py
youtube_dl/downloader/external.py
youtube_dl/downloader/f4m.py
youtube_dl/downloader/fragment.py [new file with mode: 0644]
youtube_dl/downloader/hls.py
youtube_dl/downloader/http.py
youtube_dl/downloader/rtmp.py
youtube_dl/downloader/rtsp.py
youtube_dl/extractor/__init__.py
youtube_dl/extractor/abc.py
youtube_dl/extractor/academicearth.py
youtube_dl/extractor/acast.py [new file with mode: 0644]
youtube_dl/extractor/addanime.py
youtube_dl/extractor/adobetv.py
youtube_dl/extractor/adultswim.py
youtube_dl/extractor/aenetworks.py [new file with mode: 0644]
youtube_dl/extractor/aftenposten.py [deleted file]
youtube_dl/extractor/aftonbladet.py
youtube_dl/extractor/airmozilla.py
youtube_dl/extractor/aljazeera.py
youtube_dl/extractor/allocine.py
youtube_dl/extractor/amp.py [new file with mode: 0644]
youtube_dl/extractor/animeondemand.py [new file with mode: 0644]
youtube_dl/extractor/anitube.py
youtube_dl/extractor/aol.py
youtube_dl/extractor/appletrailers.py
youtube_dl/extractor/ard.py
youtube_dl/extractor/arte.py
youtube_dl/extractor/atresplayer.py
youtube_dl/extractor/audimedia.py [new file with mode: 0644]
youtube_dl/extractor/audioboom.py [new file with mode: 0644]
youtube_dl/extractor/audiomack.py
youtube_dl/extractor/azubu.py
youtube_dl/extractor/baidu.py
youtube_dl/extractor/bambuser.py
youtube_dl/extractor/bandcamp.py
youtube_dl/extractor/bbc.py
youtube_dl/extractor/beeg.py
youtube_dl/extractor/behindkink.py
youtube_dl/extractor/bet.py
youtube_dl/extractor/bigflix.py [new file with mode: 0644]
youtube_dl/extractor/bild.py
youtube_dl/extractor/bilibili.py
youtube_dl/extractor/biobiochiletv.py [new file with mode: 0644]
youtube_dl/extractor/bleacherreport.py [new file with mode: 0644]
youtube_dl/extractor/bliptv.py [deleted file]
youtube_dl/extractor/bloomberg.py
youtube_dl/extractor/bokecc.py [new file with mode: 0644]
youtube_dl/extractor/bpb.py
youtube_dl/extractor/br.py
youtube_dl/extractor/bravotv.py [new file with mode: 0644]
youtube_dl/extractor/breakcom.py
youtube_dl/extractor/brightcove.py
youtube_dl/extractor/byutv.py
youtube_dl/extractor/c56.py
youtube_dl/extractor/camdemy.py
youtube_dl/extractor/camwithher.py [new file with mode: 0644]
youtube_dl/extractor/canal13cl.py [deleted file]
youtube_dl/extractor/canalc2.py
youtube_dl/extractor/canalplus.py
youtube_dl/extractor/canvas.py [new file with mode: 0644]
youtube_dl/extractor/cbc.py [new file with mode: 0644]
youtube_dl/extractor/cbs.py
youtube_dl/extractor/cbsinteractive.py [new file with mode: 0644]
youtube_dl/extractor/cbsnews.py
youtube_dl/extractor/cbssports.py
youtube_dl/extractor/ccc.py
youtube_dl/extractor/cda.py [new file with mode: 0755]
youtube_dl/extractor/ceskatelevize.py
youtube_dl/extractor/channel9.py
youtube_dl/extractor/chaturbate.py [new file with mode: 0644]
youtube_dl/extractor/cinemassacre.py
youtube_dl/extractor/clipfish.py
youtube_dl/extractor/cliphunter.py
youtube_dl/extractor/cliprs.py [new file with mode: 0644]
youtube_dl/extractor/clipsyndicate.py
youtube_dl/extractor/cloudy.py
youtube_dl/extractor/clubic.py
youtube_dl/extractor/clyp.py [new file with mode: 0644]
youtube_dl/extractor/cmt.py
youtube_dl/extractor/cnbc.py [new file with mode: 0644]
youtube_dl/extractor/cnet.py [deleted file]
youtube_dl/extractor/cnn.py
youtube_dl/extractor/collegerama.py
youtube_dl/extractor/comcarcoff.py
youtube_dl/extractor/comedycentral.py
youtube_dl/extractor/common.py
youtube_dl/extractor/commonprotocols.py [new file with mode: 0644]
youtube_dl/extractor/condenast.py
youtube_dl/extractor/crackle.py [new file with mode: 0644]
youtube_dl/extractor/criterion.py
youtube_dl/extractor/crunchyroll.py
youtube_dl/extractor/cspan.py
youtube_dl/extractor/ctsnews.py
youtube_dl/extractor/cultureunplugged.py [new file with mode: 0644]
youtube_dl/extractor/cwtv.py [new file with mode: 0644]
youtube_dl/extractor/dailymotion.py
youtube_dl/extractor/daum.py
youtube_dl/extractor/dbtv.py
youtube_dl/extractor/dcn.py [new file with mode: 0644]
youtube_dl/extractor/dctp.py
youtube_dl/extractor/deezer.py
youtube_dl/extractor/defense.py
youtube_dl/extractor/democracynow.py [new file with mode: 0644]
youtube_dl/extractor/dfb.py
youtube_dl/extractor/dhm.py
youtube_dl/extractor/digiteka.py [new file with mode: 0644]
youtube_dl/extractor/discovery.py
youtube_dl/extractor/dispeak.py [new file with mode: 0644]
youtube_dl/extractor/divxstage.py [deleted file]
youtube_dl/extractor/douyutv.py
youtube_dl/extractor/dplay.py [new file with mode: 0644]
youtube_dl/extractor/dramafever.py
youtube_dl/extractor/drbonanza.py
youtube_dl/extractor/dreisat.py
youtube_dl/extractor/drtv.py
youtube_dl/extractor/dump.py [deleted file]
youtube_dl/extractor/dumpert.py
youtube_dl/extractor/dvtv.py
youtube_dl/extractor/dw.py [new file with mode: 0644]
youtube_dl/extractor/eagleplatform.py
youtube_dl/extractor/ebaumsworld.py
youtube_dl/extractor/echomsk.py
youtube_dl/extractor/eighttracks.py
youtube_dl/extractor/einthusan.py
youtube_dl/extractor/eitb.py
youtube_dl/extractor/ellentv.py
youtube_dl/extractor/elpais.py
youtube_dl/extractor/engadget.py
youtube_dl/extractor/eroprofile.py
youtube_dl/extractor/escapist.py
youtube_dl/extractor/espn.py
youtube_dl/extractor/esri.py [new file with mode: 0644]
youtube_dl/extractor/europa.py [new file with mode: 0644]
youtube_dl/extractor/everyonesmixtape.py
youtube_dl/extractor/exfm.py
youtube_dl/extractor/expotv.py
youtube_dl/extractor/extractors.py [new file with mode: 0644]
youtube_dl/extractor/extremetube.py
youtube_dl/extractor/facebook.py
youtube_dl/extractor/faz.py
youtube_dl/extractor/fc2.py
youtube_dl/extractor/fczenit.py [new file with mode: 0644]
youtube_dl/extractor/firstpost.py
youtube_dl/extractor/firsttv.py
youtube_dl/extractor/fivemin.py
youtube_dl/extractor/fktv.py
youtube_dl/extractor/flickr.py
youtube_dl/extractor/folketinget.py
youtube_dl/extractor/footyroom.py
youtube_dl/extractor/fourtube.py
youtube_dl/extractor/fox.py [new file with mode: 0644]
youtube_dl/extractor/foxgay.py
youtube_dl/extractor/foxnews.py
youtube_dl/extractor/franceculture.py
youtube_dl/extractor/franceinter.py
youtube_dl/extractor/francetv.py
youtube_dl/extractor/freespeech.py
youtube_dl/extractor/freevideo.py
youtube_dl/extractor/funimation.py [new file with mode: 0644]
youtube_dl/extractor/funnyordie.py
youtube_dl/extractor/gameinformer.py [new file with mode: 0644]
youtube_dl/extractor/gamekings.py
youtube_dl/extractor/gamespot.py
youtube_dl/extractor/gamestar.py
youtube_dl/extractor/gametrailers.py
youtube_dl/extractor/gazeta.py
youtube_dl/extractor/gdcvault.py
youtube_dl/extractor/generic.py
youtube_dl/extractor/glide.py
youtube_dl/extractor/globo.py
youtube_dl/extractor/googledrive.py [new file with mode: 0644]
youtube_dl/extractor/googleplus.py
youtube_dl/extractor/gorillavid.py [deleted file]
youtube_dl/extractor/goshgay.py
youtube_dl/extractor/gputechconf.py [new file with mode: 0644]
youtube_dl/extractor/groupon.py
youtube_dl/extractor/hbo.py [new file with mode: 0644]
youtube_dl/extractor/hearthisat.py
youtube_dl/extractor/hentaistigma.py
youtube_dl/extractor/history.py [deleted file]
youtube_dl/extractor/hitbox.py
youtube_dl/extractor/hostingbulk.py [deleted file]
youtube_dl/extractor/hotnewhiphop.py
youtube_dl/extractor/hotstar.py [new file with mode: 0644]
youtube_dl/extractor/howcast.py
youtube_dl/extractor/howstuffworks.py
youtube_dl/extractor/huffpost.py
youtube_dl/extractor/hypem.py
youtube_dl/extractor/iconosquare.py
youtube_dl/extractor/ign.py
youtube_dl/extractor/imdb.py
youtube_dl/extractor/imgur.py
youtube_dl/extractor/indavideo.py [new file with mode: 0644]
youtube_dl/extractor/infoq.py
youtube_dl/extractor/instagram.py
youtube_dl/extractor/internetvideoarchive.py
youtube_dl/extractor/iprima.py
youtube_dl/extractor/iqiyi.py
youtube_dl/extractor/ivi.py
youtube_dl/extractor/ivideon.py [new file with mode: 0644]
youtube_dl/extractor/izlesene.py
youtube_dl/extractor/jadorecettepub.py [deleted file]
youtube_dl/extractor/jeuxvideo.py
youtube_dl/extractor/jukebox.py [deleted file]
youtube_dl/extractor/jwplatform.py [new file with mode: 0644]
youtube_dl/extractor/kaltura.py
youtube_dl/extractor/kanalplay.py
youtube_dl/extractor/kankan.py
youtube_dl/extractor/karaoketv.py
youtube_dl/extractor/karrierevideos.py
youtube_dl/extractor/keek.py
youtube_dl/extractor/keezmovies.py
youtube_dl/extractor/khanacademy.py
youtube_dl/extractor/kickstarter.py
youtube_dl/extractor/konserthusetplay.py [new file with mode: 0644]
youtube_dl/extractor/kontrtube.py
youtube_dl/extractor/krasview.py
youtube_dl/extractor/ku6.py
youtube_dl/extractor/kusi.py [new file with mode: 0644]
youtube_dl/extractor/kuwo.py
youtube_dl/extractor/laola1tv.py
youtube_dl/extractor/lecture2go.py
youtube_dl/extractor/leeco.py [new file with mode: 0644]
youtube_dl/extractor/lemonde.py [new file with mode: 0644]
youtube_dl/extractor/letv.py [deleted file]
youtube_dl/extractor/libsyn.py
youtube_dl/extractor/lifenews.py
youtube_dl/extractor/limelight.py [new file with mode: 0644]
youtube_dl/extractor/liveleak.py
youtube_dl/extractor/livestream.py
youtube_dl/extractor/lovehomeporn.py [new file with mode: 0644]
youtube_dl/extractor/lrt.py
youtube_dl/extractor/lynda.py
youtube_dl/extractor/m6.py
youtube_dl/extractor/mailru.py
youtube_dl/extractor/makerschannel.py [new file with mode: 0644]
youtube_dl/extractor/makertv.py [new file with mode: 0644]
youtube_dl/extractor/matchtv.py [new file with mode: 0644]
youtube_dl/extractor/mdr.py
youtube_dl/extractor/megavideoz.py [deleted file]
youtube_dl/extractor/metacafe.py
youtube_dl/extractor/metacritic.py
youtube_dl/extractor/mgtv.py [new file with mode: 0644]
youtube_dl/extractor/minhateca.py
youtube_dl/extractor/ministrygrid.py
youtube_dl/extractor/minoto.py [new file with mode: 0644]
youtube_dl/extractor/miomio.py
youtube_dl/extractor/mit.py
youtube_dl/extractor/mitele.py
youtube_dl/extractor/mixcloud.py
youtube_dl/extractor/mnet.py [new file with mode: 0644]
youtube_dl/extractor/moevideo.py
youtube_dl/extractor/mofosex.py
youtube_dl/extractor/moniker.py
youtube_dl/extractor/mooshare.py [deleted file]
youtube_dl/extractor/motherless.py
youtube_dl/extractor/motorsport.py
youtube_dl/extractor/movieclips.py
youtube_dl/extractor/movshare.py [deleted file]
youtube_dl/extractor/mtv.py
youtube_dl/extractor/musicplayon.py
youtube_dl/extractor/musicvault.py [deleted file]
youtube_dl/extractor/muzu.py
youtube_dl/extractor/mwave.py [new file with mode: 0644]
youtube_dl/extractor/myspace.py
youtube_dl/extractor/myspass.py
youtube_dl/extractor/myvideo.py
youtube_dl/extractor/myvidster.py
youtube_dl/extractor/nationalgeographic.py
youtube_dl/extractor/naver.py
youtube_dl/extractor/nba.py
youtube_dl/extractor/nbc.py
youtube_dl/extractor/ndr.py
youtube_dl/extractor/nerdcubed.py
youtube_dl/extractor/nerdist.py [deleted file]
youtube_dl/extractor/neteasemusic.py
youtube_dl/extractor/newgrounds.py
youtube_dl/extractor/newstube.py
youtube_dl/extractor/nextmedia.py
youtube_dl/extractor/nextmovie.py [new file with mode: 0644]
youtube_dl/extractor/nfb.py
youtube_dl/extractor/nfl.py
youtube_dl/extractor/nhl.py
youtube_dl/extractor/nick.py [new file with mode: 0644]
youtube_dl/extractor/niconico.py
youtube_dl/extractor/ninegag.py
youtube_dl/extractor/noco.py
youtube_dl/extractor/normalboots.py
youtube_dl/extractor/nosvideo.py
youtube_dl/extractor/nova.py
youtube_dl/extractor/novamov.py
youtube_dl/extractor/nowness.py
youtube_dl/extractor/nowtv.py
youtube_dl/extractor/nowvideo.py [deleted file]
youtube_dl/extractor/noz.py [new file with mode: 0644]
youtube_dl/extractor/npo.py
youtube_dl/extractor/npr.py [new file with mode: 0644]
youtube_dl/extractor/nrk.py
youtube_dl/extractor/ntvde.py
youtube_dl/extractor/ntvru.py
youtube_dl/extractor/nuevo.py [new file with mode: 0644]
youtube_dl/extractor/nuvid.py
youtube_dl/extractor/nytimes.py
youtube_dl/extractor/odnoklassniki.py
youtube_dl/extractor/once.py [new file with mode: 0644]
youtube_dl/extractor/onionstudios.py
youtube_dl/extractor/ooyala.py
youtube_dl/extractor/openfilm.py [deleted file]
youtube_dl/extractor/openload.py [new file with mode: 0644]
youtube_dl/extractor/ora.py [new file with mode: 0644]
youtube_dl/extractor/orf.py
youtube_dl/extractor/pandoratv.py [new file with mode: 0644]
youtube_dl/extractor/patreon.py
youtube_dl/extractor/pbs.py
youtube_dl/extractor/people.py [new file with mode: 0644]
youtube_dl/extractor/periscope.py [new file with mode: 0644]
youtube_dl/extractor/philharmoniedeparis.py
youtube_dl/extractor/phoenix.py
youtube_dl/extractor/photobucket.py
youtube_dl/extractor/pladform.py
youtube_dl/extractor/planetaplay.py [deleted file]
youtube_dl/extractor/played.py
youtube_dl/extractor/plays.py [new file with mode: 0644]
youtube_dl/extractor/playtvak.py [new file with mode: 0644]
youtube_dl/extractor/playwire.py
youtube_dl/extractor/pluralsight.py [new file with mode: 0644]
youtube_dl/extractor/porn91.py
youtube_dl/extractor/pornhd.py
youtube_dl/extractor/pornhub.py
youtube_dl/extractor/pornotube.py
youtube_dl/extractor/pornovoisines.py
youtube_dl/extractor/presstv.py [new file with mode: 0644]
youtube_dl/extractor/primesharetv.py
youtube_dl/extractor/promptfile.py
youtube_dl/extractor/prosiebensat1.py
youtube_dl/extractor/puls4.py
youtube_dl/extractor/pyvideo.py
youtube_dl/extractor/qqmusic.py
youtube_dl/extractor/quickvid.py [deleted file]
youtube_dl/extractor/radiobremen.py
youtube_dl/extractor/radiofrance.py
youtube_dl/extractor/rai.py
youtube_dl/extractor/rbmaradio.py
youtube_dl/extractor/redtube.py
youtube_dl/extractor/regiotv.py [new file with mode: 0644]
youtube_dl/extractor/restudy.py
youtube_dl/extractor/reverbnation.py
youtube_dl/extractor/revision3.py [new file with mode: 0644]
youtube_dl/extractor/rice.py [new file with mode: 0644]
youtube_dl/extractor/ringtv.py
youtube_dl/extractor/rottentomatoes.py
youtube_dl/extractor/rtbf.py
youtube_dl/extractor/rte.py
youtube_dl/extractor/rtl2.py
youtube_dl/extractor/rtp.py
youtube_dl/extractor/rts.py
youtube_dl/extractor/rtve.py
youtube_dl/extractor/rtvnh.py [new file with mode: 0644]
youtube_dl/extractor/ruhd.py
youtube_dl/extractor/ruleporn.py [new file with mode: 0644]
youtube_dl/extractor/rutube.py
youtube_dl/extractor/rutv.py
youtube_dl/extractor/ruutu.py
youtube_dl/extractor/safari.py
youtube_dl/extractor/sandia.py
youtube_dl/extractor/sbs.py
youtube_dl/extractor/screencast.py
youtube_dl/extractor/screencastomatic.py
youtube_dl/extractor/screenjunkies.py [new file with mode: 0644]
youtube_dl/extractor/screenwavemedia.py
youtube_dl/extractor/senateisvp.py
youtube_dl/extractor/sexu.py
youtube_dl/extractor/sexykarma.py
youtube_dl/extractor/shahid.py [new file with mode: 0644]
youtube_dl/extractor/shared.py
youtube_dl/extractor/sharesix.py
youtube_dl/extractor/sina.py
youtube_dl/extractor/skynewsarabia.py [new file with mode: 0644]
youtube_dl/extractor/slutload.py
youtube_dl/extractor/smotri.py
youtube_dl/extractor/snotr.py
youtube_dl/extractor/sohu.py
youtube_dl/extractor/soompi.py [deleted file]
youtube_dl/extractor/soundcloud.py
youtube_dl/extractor/southpark.py
youtube_dl/extractor/space.py [deleted file]
youtube_dl/extractor/spankbang.py
youtube_dl/extractor/spankwire.py
youtube_dl/extractor/spiegel.py
youtube_dl/extractor/spiegeltv.py
youtube_dl/extractor/sport5.py
youtube_dl/extractor/sportbox.py
youtube_dl/extractor/sportdeutschland.py
youtube_dl/extractor/srf.py [deleted file]
youtube_dl/extractor/srgssr.py [new file with mode: 0644]
youtube_dl/extractor/srmediathek.py
youtube_dl/extractor/ssa.py
youtube_dl/extractor/steam.py
youtube_dl/extractor/stitcher.py [new file with mode: 0644]
youtube_dl/extractor/streamcloud.py
youtube_dl/extractor/streamcz.py
youtube_dl/extractor/streetvoice.py
youtube_dl/extractor/svt.py
youtube_dl/extractor/sztvhu.py
youtube_dl/extractor/tapely.py
youtube_dl/extractor/tdslifeway.py [new file with mode: 0644]
youtube_dl/extractor/teachingchannel.py
youtube_dl/extractor/teamcoco.py
youtube_dl/extractor/ted.py
youtube_dl/extractor/tele13.py [new file with mode: 0644]
youtube_dl/extractor/telebruxelles.py
youtube_dl/extractor/telecinco.py
youtube_dl/extractor/telegraaf.py [new file with mode: 0644]
youtube_dl/extractor/tenplay.py [deleted file]
youtube_dl/extractor/testtube.py [deleted file]
youtube_dl/extractor/testurl.py
youtube_dl/extractor/tf1.py
youtube_dl/extractor/theintercept.py [new file with mode: 0644]
youtube_dl/extractor/theonion.py [deleted file]
youtube_dl/extractor/theplatform.py
youtube_dl/extractor/thescene.py [new file with mode: 0644]
youtube_dl/extractor/thesixtyone.py
youtube_dl/extractor/thestar.py [new file with mode: 0644]
youtube_dl/extractor/thvideo.py
youtube_dl/extractor/tinypic.py
youtube_dl/extractor/tlc.py
youtube_dl/extractor/tnaflix.py
youtube_dl/extractor/toggle.py [new file with mode: 0644]
youtube_dl/extractor/toypics.py
youtube_dl/extractor/traileraddict.py
youtube_dl/extractor/trilulilu.py
youtube_dl/extractor/trollvids.py [new file with mode: 0644]
youtube_dl/extractor/trutube.py
youtube_dl/extractor/tube8.py
youtube_dl/extractor/tubitv.py
youtube_dl/extractor/tudou.py
youtube_dl/extractor/tumblr.py
youtube_dl/extractor/tunein.py
youtube_dl/extractor/tutv.py
youtube_dl/extractor/tv2.py
youtube_dl/extractor/tv3.py [new file with mode: 0644]
youtube_dl/extractor/tv4.py
youtube_dl/extractor/tvc.py
youtube_dl/extractor/tvigle.py
youtube_dl/extractor/tvland.py [new file with mode: 0644]
youtube_dl/extractor/tvplay.py
youtube_dl/extractor/tweakers.py
youtube_dl/extractor/twentyfourvideo.py
youtube_dl/extractor/twentymin.py [new file with mode: 0644]
youtube_dl/extractor/twitch.py
youtube_dl/extractor/twitter.py
youtube_dl/extractor/ubu.py [deleted file]
youtube_dl/extractor/udemy.py
youtube_dl/extractor/udn.py
youtube_dl/extractor/ultimedia.py [deleted file]
youtube_dl/extractor/unistra.py
youtube_dl/extractor/usatoday.py [new file with mode: 0644]
youtube_dl/extractor/ustream.py
youtube_dl/extractor/ustudio.py [new file with mode: 0644]
youtube_dl/extractor/varzesh3.py
youtube_dl/extractor/vbox7.py
youtube_dl/extractor/veoh.py
youtube_dl/extractor/vessel.py
youtube_dl/extractor/vesti.py
youtube_dl/extractor/vevo.py
youtube_dl/extractor/vgtv.py
youtube_dl/extractor/vice.py
youtube_dl/extractor/viddler.py
youtube_dl/extractor/videobam.py [deleted file]
youtube_dl/extractor/videodetective.py
youtube_dl/extractor/videofyme.py
youtube_dl/extractor/videolecturesnet.py [deleted file]
youtube_dl/extractor/videomega.py
youtube_dl/extractor/videomore.py [new file with mode: 0644]
youtube_dl/extractor/videopremium.py
youtube_dl/extractor/videott.py
youtube_dl/extractor/videoweed.py [deleted file]
youtube_dl/extractor/vidme.py
youtube_dl/extractor/vidzi.py
youtube_dl/extractor/vier.py
youtube_dl/extractor/viewster.py
youtube_dl/extractor/viidea.py [new file with mode: 0644]
youtube_dl/extractor/viki.py
youtube_dl/extractor/vimeo.py
youtube_dl/extractor/vine.py
youtube_dl/extractor/vk.py
youtube_dl/extractor/vlive.py [new file with mode: 0644]
youtube_dl/extractor/vodlocker.py
youtube_dl/extractor/voicerepublic.py
youtube_dl/extractor/voxmedia.py [new file with mode: 0644]
youtube_dl/extractor/vrt.py
youtube_dl/extractor/vube.py
youtube_dl/extractor/vuclip.py
youtube_dl/extractor/walla.py
youtube_dl/extractor/washingtonpost.py
youtube_dl/extractor/wat.py
youtube_dl/extractor/wayofthemaster.py [deleted file]
youtube_dl/extractor/wdr.py
youtube_dl/extractor/webofstories.py
youtube_dl/extractor/weiqitv.py [new file with mode: 0644]
youtube_dl/extractor/wimp.py
youtube_dl/extractor/wistia.py
youtube_dl/extractor/worldstarhiphop.py
youtube_dl/extractor/wsj.py
youtube_dl/extractor/xbef.py
youtube_dl/extractor/xboxclips.py
youtube_dl/extractor/xfileshare.py [new file with mode: 0644]
youtube_dl/extractor/xhamster.py
youtube_dl/extractor/xminus.py
youtube_dl/extractor/xstream.py
youtube_dl/extractor/xtube.py
youtube_dl/extractor/xuite.py
youtube_dl/extractor/xvideos.py
youtube_dl/extractor/yahoo.py
youtube_dl/extractor/yam.py
youtube_dl/extractor/yandexmusic.py
youtube_dl/extractor/ynet.py
youtube_dl/extractor/youjizz.py
youtube_dl/extractor/youku.py
youtube_dl/extractor/youporn.py
youtube_dl/extractor/youtube.py
youtube_dl/extractor/zdf.py
youtube_dl/extractor/zingmp3.py
youtube_dl/extractor/zippcast.py [new file with mode: 0644]
youtube_dl/jsinterp.py
youtube_dl/options.py
youtube_dl/postprocessor/__init__.py
youtube_dl/postprocessor/common.py
youtube_dl/postprocessor/embedthumbnail.py
youtube_dl/postprocessor/execafterdownload.py
youtube_dl/postprocessor/ffmpeg.py
youtube_dl/postprocessor/metadatafromtitle.py
youtube_dl/postprocessor/xattrpp.py
youtube_dl/swfinterp.py
youtube_dl/update.py
youtube_dl/utils.py
youtube_dl/version.py

diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md
new file mode 100644 (file)
index 0000000..c208eb6
--- /dev/null
@@ -0,0 +1,58 @@
+## Please follow the guide below
+
+- You will be asked some questions and requested to provide some information; please read them **carefully** and answer honestly
+- Put an `x` into all the boxes [ ] relevant to your *issue* (like this: [x])
+- Use the *Preview* tab to see how your issue will actually look
+
+---
+
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.04.24*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with an outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.04.24**
+
+### Before submitting an *issue* make sure you have:
+- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
+- [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
+
+### What is the purpose of your *issue*?
+- [ ] Bug report (encountered problems with youtube-dl)
+- [ ] Site support request (request for adding support for a new site)
+- [ ] Feature request (request for a new functionality)
+- [ ] Question
+- [ ] Other
+
+---
+
+### The following sections apply to particular issue purposes; you can erase any section (the contents between triple ---) that is not applicable to your *issue*
+
+---
+
+### If the purpose of this *issue* is a *bug report*, *site support request*, or you are not completely sure, provide the full verbose output as follows:
+
+Add the `-v` flag to the **command line** you run youtube-dl with, copy the **whole** output and insert it here. It should look similar to the one below (replace it with **your** log inserted between triple ```):
+```
+$ youtube-dl -v <your command line>
+[debug] System config: []
+[debug] User config: []
+[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
+[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
+[debug] youtube-dl version 2016.04.24
+[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
+[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
+[debug] Proxy map: {}
+...
+<end of log>
+```
+
+---
+
+### If the purpose of this *issue* is a *site support request*, please provide all kinds of example URLs for which support should be included (replace the following example URLs with **yours**):
+- Single video: https://www.youtube.com/watch?v=BaW_jenozKc
+- Single video: https://youtu.be/BaW_jenozKc
+- Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc
+
+---
+
+### Description of your *issue*, suggested solution and other information
+
+Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
+If work on your *issue* requires account credentials, please provide them or explain how one can obtain them.
diff --git a/.github/ISSUE_TEMPLATE_tmpl.md b/.github/ISSUE_TEMPLATE_tmpl.md
new file mode 100644 (file)
index 0000000..a5e6a42
--- /dev/null
@@ -0,0 +1,58 @@
+## Please follow the guide below
+
+- You will be asked some questions and requested to provide some information; please read them **carefully** and answer honestly
+- Put an `x` into all the boxes [ ] relevant to your *issue* (like this: [x])
+- Use the *Preview* tab to see how your issue will actually look
+
+---
+
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *%(version)s*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with an outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **%(version)s**
+
+### Before submitting an *issue* make sure you have:
+- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
+- [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
+
+### What is the purpose of your *issue*?
+- [ ] Bug report (encountered problems with youtube-dl)
+- [ ] Site support request (request for adding support for a new site)
+- [ ] Feature request (request for a new functionality)
+- [ ] Question
+- [ ] Other
+
+---
+
+### The following sections apply to particular issue purposes; you can erase any section (the contents between triple ---) that is not applicable to your *issue*
+
+---
+
+### If the purpose of this *issue* is a *bug report*, *site support request*, or you are not completely sure, provide the full verbose output as follows:
+
+Add the `-v` flag to the **command line** you run youtube-dl with, copy the **whole** output and insert it here. It should look similar to the one below (replace it with **your** log inserted between triple ```):
+```
+$ youtube-dl -v <your command line>
+[debug] System config: []
+[debug] User config: []
+[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
+[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
+[debug] youtube-dl version %(version)s
+[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
+[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
+[debug] Proxy map: {}
+...
+<end of log>
+```
+
+---
+
+### If the purpose of this *issue* is a *site support request*, please provide all kinds of example URLs for which support should be included (replace the following example URLs with **yours**):
+- Single video: https://www.youtube.com/watch?v=BaW_jenozKc
+- Single video: https://youtu.be/BaW_jenozKc
+- Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc
+
+---
+
+### Description of your *issue*, suggested solution and other information
+
+Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
+If work on your *issue* requires account credentials, please provide them or explain how one can obtain them.
diff --git a/.gitignore b/.gitignore
index 0422adf4456ec35166f5d2bce6b832c67601c2dc..72c10425d675f7c1952061be0057db0c2e5e232d 100644 (file)
--- a/.gitignore
+++ b/.gitignore
@@ -1,5 +1,6 @@
 *.pyc
 *.pyo
+*.class
 *~
 *.DS_Store
 wine-py2exe/
@@ -12,6 +13,7 @@ README.txt
 youtube-dl.1
 youtube-dl.bash-completion
 youtube-dl.fish
+youtube_dl/extractor/lazy_extractors.py
 youtube-dl
 youtube-dl.exe
 youtube-dl.tar.gz
@@ -32,4 +34,4 @@ test/testdata
 .tox
 youtube-dl.zsh
 .idea
-.idea/*
\ No newline at end of file
+.idea/*
diff --git a/.travis.yml b/.travis.yml
index 511bee64cdb8398640a6aa1f4159f5d1f5ce0d3e..cc21fae8f41ca567a2367d3515f981d8ec0af759 100644 (file)
--- a/.travis.yml
+++ b/.travis.yml
@@ -5,9 +5,8 @@ python:
   - "3.2"
   - "3.3"
   - "3.4"
-before_install:
-  - sudo apt-get update -qq
-  - sudo apt-get install -yqq rtmpdump
+  - "3.5"
+sudo: false
 script: nosetests test --verbose
 notifications:
   email:
diff --git a/AUTHORS b/AUTHORS
index aa6b88cc0bc625aab169db6f587ca4cf9275377c..07cade723be12afdbbf60485d9dbc2890d6c0f32 100644 (file)
--- a/AUTHORS
+++ b/AUTHORS
@@ -136,3 +136,35 @@ sceext
 Zach Bruggeman
 Tjark Saul
 slangangular
+Behrouz Abbasi
+ngld
+nyuszika7h
+Shaun Walbridge
+Lee Jenkins
+Anssi Hannula
+Lukáš Lalinský
+Qijiang Fan
+Rémy Léone
+Marco Ferragina
+reiv
+Muratcan Simsek
+Evan Lu
+flatgreen
+Brian Foley
+Vignesh Venkat
+Tom Gijselinck
+Founder Fang
+Andrew Alexeyew
+Saso Bezlaj
+Erwin de Haan
+Jens Wille
+Robin Houtevelts
+Patrick Griffis
+Aidan Rowe
+mutantmonkey
+Ben Congdon
+Kacper Michajłow
+José Joaquín Atria
+Viťas Strádal
+Kagami Hiiragi
+Philip Huppert
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 588b15bde7a3ba367c17d1fb3819a2070aeea9f0..c83b8655a595d9d040ef09c2c11c07d51f3f7d29 100644 (file)
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,6 +1,20 @@
-**Please include the full output of youtube-dl when run with `-v`**.
-
-The output (including the first lines) contain important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
+**Please include the full output of youtube-dl when run with `-v`**, i.e. **add** the `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
+```
+$ youtube-dl -v <your command line>
+[debug] System config: []
+[debug] User config: []
+[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
+[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
+[debug] youtube-dl version 2015.12.06
+[debug] Git HEAD: 135392e
+[debug] Python version 2.6.6 - Windows-2003Server-5.2.3790-SP2
+[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
+[debug] Proxy map: {}
+...
+```
+**Do not post screenshots of the verbose log; only plain text is acceptable.**
+
+The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
 
 Please re-read your issue once again to avoid a couple of common mistakes (you can and should use this as a checklist):
 
@@ -14,21 +28,21 @@ So please elaborate on what feature you are requesting, or what bug you want to
 - How it could be fixed
 - What your proposed solution would look like
 
-If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. As a commiter myself, I often get frustrated by these issues, since the only possible way for me to move forward on them is to ask for clarification over and over.
+If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. As a committer myself, I often get frustrated by these issues, since the only possible way for me to move forward on them is to ask for clarification over and over.
 
-For bug reports, this means that your report should contain the *complete* output of youtube-dl when called with the -v flag. The error message you get for (most) bugs even says so, but you would not believe how many of our bug reports do not contain this information.
+For bug reports, this means that your report should contain the *complete* output of youtube-dl when called with the `-v` flag. The error message you get for (most) bugs even says so, but you would not believe how many of our bug reports do not contain this information.
 
-If your server has multiple IPs or you suspect censorship, adding --call-home may be a good idea to get more diagnostics. If the error is `ERROR: Unable to extract ...` and you cannot reproduce it from multiple countries, add `--dump-pages` (warning: this will yield a rather large output, redirect it to the file `log.txt` by adding `>log.txt 2>&1` to your command-line) or upload the `.dump` files you get when you add `--write-pages` [somewhere](https://gist.github.com/).
+If your server has multiple IPs or you suspect censorship, adding `--call-home` may be a good idea to get more diagnostics. If the error is `ERROR: Unable to extract ...` and you cannot reproduce it from multiple countries, add `--dump-pages` (warning: this will yield a rather large output, redirect it to the file `log.txt` by adding `>log.txt 2>&1` to your command-line) or upload the `.dump` files you get when you add `--write-pages` [somewhere](https://gist.github.com/).
 
-**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like http://www.youtube.com/watch?v=BaW_jenozKc . There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. http://www.youtube.com/ ) is *not* an example URL.
+**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `http://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `http://www.youtube.com/`) is *not* an example URL.
 
 ###  Are you using the latest version?
 
-Before reporting any issue, type youtube-dl -U. This should report that you're up-to-date. About 20% of the reports we receive are already fixed, but people are using outdated versions. This goes for feature requests as well.
+Before reporting any issue, type `youtube-dl -U`. This should report that you're up-to-date. About 20% of the reports we receive are already fixed, but people are using outdated versions. This goes for feature requests as well.
 
 ###  Is the issue already documented?
 
-Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or at https://github.com/rg3/youtube-dl/search?type=Issues . If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
+Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or browse the [GitHub Issues](https://github.com/rg3/youtube-dl/search?type=Issues) of this repository. If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
 
 ###  Why are existing options not enough?
 
@@ -71,14 +85,16 @@ To run the test, simply invoke your favorite test runner, or execute a test file
 If you want to create a build of youtube-dl yourself, you'll need
 
 * python
-* make
+* make (both GNU make and BSD make are supported)
 * pandoc
 * zip
 * nosetests
 
 ### Adding support for a new site
 
-If you want to add support for a new site, you can follow this quick list (assuming your service is called `yourextractor`):
+If you want to add support for a new site, first of all **make sure** this site is **not dedicated to [copyright infringement](#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free)**. youtube-dl does **not support** such sites; pull requests adding support for them **will be rejected**.
+
+After you have ensured this site is distributing its content legally, you can follow this quick list (assuming your service is called `yourextractor`):
 
 1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
 2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git`
@@ -114,27 +130,29 @@ If you want to add support for a new site, you can follow this quick list (assum
             webpage = self._download_webpage(url, video_id)
 
             # TODO more code goes here, for example ...
-            title = self._html_search_regex(r'<h1>(.*?)</h1>', webpage, 'title')
+            title = self._html_search_regex(r'<h1>(.+?)</h1>', webpage, 'title')
 
             return {
                 'id': video_id,
                 'title': title,
                 'description': self._og_search_description(webpage),
+                'uploader': self._search_regex(r'<div[^>]+id="uploader"[^>]*>([^<]+)<', webpage, 'uploader', fatal=False),
                 # TODO more properties (see youtube_dl/extractor/common.py)
             }
     ```
-5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py).
-6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will be then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
-7. Have a look at [`youtube_dl/common/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L38). Add tests and code for as many as you want.
-8. If you can, check the code with [flake8](https://pypi.python.org/pypi/flake8).
-9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
-
-        $ git add youtube_dl/extractor/__init__.py
+5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
+6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
+7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
+8. Keep in mind that the only mandatory fields in the info dict for a successful extraction are `id`, `title` and either `url` or `formats`, i.e. these are the critical data without which extraction does not make sense. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from the aforementioned mandatory ones should be treated **as optional**, and extraction should be **tolerant** of situations where the sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof**, so as not to break the extraction of the general-purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into the resulting info dict as `description`, you should be ready for this key to be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex`/`_html_search_regex` (see the sketch after this list).
+9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
+10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
+
+        $ git add youtube_dl/extractor/extractors.py
         $ git add youtube_dl/extractor/yourextractor.py
         $ git commit -m '[yourextractor] Add new extractor'
         $ git push origin yourextractor
 
-10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
+11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
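
To make step 8 concrete, here is a minimal sketch of tolerant field extraction inside `_real_extract`. The API endpoint, the `meta` keys and the `uploader` regex are illustrative assumptions, not part of any real site; only the mandatory/optional split mirrors the checklist above:

```python
def _real_extract(self, url):
    video_id = self._match_id(url)
    webpage = self._download_webpage(url, video_id)

    # Hypothetical intermediate metadata dict; fatal=False plus the `or {}`
    # fallback keeps extraction alive even if this endpoint disappears.
    meta = self._download_json(
        'https://example.com/api/videos/%s' % video_id, video_id,
        fatal=False) or {}

    return {
        'id': video_id,
        # Mandatory: a missing title should fail loudly (fatal by default).
        'title': self._html_search_regex(
            r'<h1>(.+?)</h1>', webpage, 'title'),
        # Mandatory: try the hypothetical API first, fall back to Open Graph.
        'url': meta.get('video_url') or self._og_search_video_url(webpage),
        # Optional: meta.get('summary'), never meta['summary'].
        'description': meta.get('summary'),
        # Optional: fatal=False so a page redesign cannot break extraction.
        'uploader': self._search_regex(
            r'class="uploader">([^<]+)<', webpage, 'uploader', fatal=False),
    }
```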
 
 In any case, thank you very much for your contributions!
 
index fdb1abb60cacfe49295a7438e3d0f4f51c248359..06cffcb710c6fd8fa6962007bd07d4753d5d5af6 100644 (file)
--- a/Makefile
+++ b/Makefile
@@ -1,8 +1,9 @@
 all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
 
 clean:
-       rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp youtube-dl youtube-dl.exe
+       rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
        find . -name "*.pyc" -delete
+       find . -name "*.class" -delete
 
 PREFIX ?= /usr/local
 BINDIR ?= $(PREFIX)/bin
@@ -11,15 +12,7 @@ SHAREDIR ?= $(PREFIX)/share
 PYTHON ?= /usr/bin/env python
 
 # set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
-ifeq ($(PREFIX),/usr)
-       SYSCONFDIR=/etc
-else
-       ifeq ($(PREFIX),/usr/local)
-               SYSCONFDIR=/etc
-       else
-               SYSCONFDIR=$(PREFIX)/etc
-       endif
-endif
+SYSCONFDIR != if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi
 
 install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
        install -d $(DESTDIR)$(BINDIR)
@@ -44,7 +37,7 @@ test:
 ot: offlinetest
 
 offlinetest: codetest
-       nosetests --verbose test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py
+       $(PYTHON) -m nose --verbose test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
 
 tar: youtube-dl.tar.gz
 
@@ -61,37 +54,46 @@ youtube-dl: youtube_dl/*.py youtube_dl/*/*.py
        chmod a+x youtube-dl
 
 README.md: youtube_dl/*.py youtube_dl/*/*.py
-       COLUMNS=80 python youtube_dl/__main__.py --help | python devscripts/make_readme.py
+       COLUMNS=80 $(PYTHON) youtube_dl/__main__.py --help | $(PYTHON) devscripts/make_readme.py
 
 CONTRIBUTING.md: README.md
-       python devscripts/make_contributing.py README.md CONTRIBUTING.md
+       $(PYTHON) devscripts/make_contributing.py README.md CONTRIBUTING.md
+
+.github/ISSUE_TEMPLATE.md: devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl.md  youtube_dl/version.py
+       $(PYTHON) devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl.md .github/ISSUE_TEMPLATE.md
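
The rule above regenerates `.github/ISSUE_TEMPLATE.md` from the `_tmpl` file. The generator script itself is not part of this excerpt; the sketch below only assumes Python %-style substitution of the `%(version)s` placeholders visible in the template diff above, with the version taken from `youtube_dl/version.py` (a declared prerequisite of the rule):

```python
#!/usr/bin/env python
from __future__ import unicode_literals

import io
import sys


def main():
    # Invoked from the Makefile as:
    #   make_issue_template.py .github/ISSUE_TEMPLATE_tmpl.md .github/ISSUE_TEMPLATE.md
    if len(sys.argv) != 3:
        sys.exit('Usage: %s TEMPLATE OUTFILE' % sys.argv[0])
    infile, outfile = sys.argv[1:]

    with io.open(infile, encoding='utf-8') as f:
        template = f.read()

    from youtube_dl.version import __version__

    with io.open(outfile, 'w', encoding='utf-8') as f:
        # Fills every %(version)s placeholder in the template.
        f.write(template % {'version': __version__})


if __name__ == '__main__':
    main()
```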
 
 supportedsites:
-       python devscripts/make_supportedsites.py docs/supportedsites.md
+       $(PYTHON) devscripts/make_supportedsites.py docs/supportedsites.md
 
 README.txt: README.md
        pandoc -f markdown -t plain README.md -o README.txt
 
 youtube-dl.1: README.md
-       python devscripts/prepare_manpage.py >youtube-dl.1.temp.md
+       $(PYTHON) devscripts/prepare_manpage.py >youtube-dl.1.temp.md
        pandoc -s -f markdown -t man youtube-dl.1.temp.md -o youtube-dl.1
        rm -f youtube-dl.1.temp.md
 
 youtube-dl.bash-completion: youtube_dl/*.py youtube_dl/*/*.py devscripts/bash-completion.in
-       python devscripts/bash-completion.py
+       $(PYTHON) devscripts/bash-completion.py
 
 bash-completion: youtube-dl.bash-completion
 
 youtube-dl.zsh: youtube_dl/*.py youtube_dl/*/*.py devscripts/zsh-completion.in
-       python devscripts/zsh-completion.py
+       $(PYTHON) devscripts/zsh-completion.py
 
 zsh-completion: youtube-dl.zsh
 
 youtube-dl.fish: youtube_dl/*.py youtube_dl/*/*.py devscripts/fish-completion.in
-       python devscripts/fish-completion.py
+       $(PYTHON) devscripts/fish-completion.py
 
 fish-completion: youtube-dl.fish
 
+lazy-extractors: youtube_dl/extractor/lazy_extractors.py
+
+_EXTRACTOR_FILES != find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py'
+youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
+       $(PYTHON) devscripts/make_lazy_extractors.py $@
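
The new `lazy-extractors` target generates `youtube_dl/extractor/lazy_extractors.py` so that matching a URL against `_VALID_URL` patterns no longer requires importing every extractor module up front. The generated file is not shown in this commit; the snippet below is only a sketch of the general lazy-loading pattern such a template could expand to (class and attribute names are illustrative):

```python
import importlib
import re


class LazyLoadExtractor(object):
    """Illustrative stand-in that defers importing the real extractor
    module until an instance is actually constructed."""
    _module = None  # e.g. 'youtube_dl.extractor.youtube'
    _VALID_URL = None

    @classmethod
    def suitable(cls, url):
        # URL matching needs only the regex, not the full module import.
        return re.match(cls._VALID_URL, url) is not None

    def __new__(cls, *args, **kwargs):
        # Import the heavy module lazily and construct the real class.
        real_cls = getattr(importlib.import_module(cls._module), cls.__name__)
        instance = real_cls.__new__(real_cls)
        instance.__init__(*args, **kwargs)
        return instance
```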
+
 youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
        @tar -czf youtube-dl.tar.gz --transform "s|^|youtube-dl/|" --owner 0 --group 0 \
                --exclude '*.DS_Store' \
index ac54d7b67b8c36d370495759153678f711ac614e..e062444b3e718b8d3ad8cf7517a4a4ffe721bd6a 100644 (file)
--- a/README.md
+++ b/README.md
@@ -9,6 +9,7 @@ youtube-dl - download videos from youtube.com or other video platforms
 - [VIDEO SELECTION](#video-selection)
 - [FAQ](#faq)
 - [DEVELOPER INSTRUCTIONS](#developer-instructions)
+- [EMBEDDING YOUTUBE-DL](#embedding-youtube-dl)
 - [BUGS](#bugs)
 - [COPYRIGHT](#copyright)
 
@@ -34,7 +35,7 @@ You can also use pip:
 
     sudo pip install youtube-dl
 
-Alternatively, refer to the developer instructions below for how to check out and work with the git repository. For further options, including PGP signatures, see https://rg3.github.io/youtube-dl/download.html .
+Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html).
 
 # DESCRIPTION
 **youtube-dl** is a small command-line program to download videos from
@@ -48,110 +49,230 @@ which means you can modify it, redistribute it or use it however you like.
 # OPTIONS
     -h, --help                       Print this help text and exit
     --version                        Print program version and exit
-    -U, --update                     Update this program to latest version. Make sure that you have sufficient permissions (run with sudo if needed)
-    -i, --ignore-errors              Continue on download errors, for example to skip unavailable videos in a playlist
-    --abort-on-error                 Abort downloading of further videos (in the playlist or the command line) if an error occurs
+    -U, --update                     Update this program to latest version. Make
+                                     sure that you have sufficient permissions
+                                     (run with sudo if needed)
+    -i, --ignore-errors              Continue on download errors, for example to
+                                     skip unavailable videos in a playlist
+    --abort-on-error                 Abort downloading of further videos (in the
+                                     playlist or the command line) if an error
+                                     occurs
     --dump-user-agent                Display the current browser identification
     --list-extractors                List all supported extractors
-    --extractor-descriptions         Output descriptions of all supported extractors
-    --force-generic-extractor        Force extraction to use the generic extractor
-    --default-search PREFIX          Use this prefix for unqualified URLs. For example "gvsearch2:" downloads two videos from google videos for youtube-dl "large apple".
-                                     Use the value "auto" to let youtube-dl guess ("auto_warning" to emit a warning when guessing). "error" just throws an error. The
-                                     default value "fixup_error" repairs broken URLs, but emits an error if this is not possible instead of searching.
-    --ignore-config                  Do not read configuration files. When given in the global configuration file /etc/youtube-dl.conf: Do not read the user configuration
-                                     in ~/.config/youtube-dl/config (%APPDATA%/youtube-dl/config.txt on Windows)
-    --flat-playlist                  Do not extract the videos of a playlist, only list them.
+    --extractor-descriptions         Output descriptions of all supported
+                                     extractors
+    --force-generic-extractor        Force extraction to use the generic
+                                     extractor
+    --default-search PREFIX          Use this prefix for unqualified URLs. For
+                                     example "gvsearch2:" downloads two videos
+                                     from google videos for youtube-dl "large
+                                     apple". Use the value "auto" to let
+                                     youtube-dl guess ("auto_warning" to emit a
+                                     warning when guessing). "error" just throws
+                                     an error. The default value "fixup_error"
+                                     repairs broken URLs, but emits an error if
+                                     this is not possible instead of searching.
+    --ignore-config                  Do not read configuration files. When given
+                                     in the global configuration file /etc
+                                     /youtube-dl.conf: Do not read the user
+                                     configuration in ~/.config/youtube-
+                                     dl/config (%APPDATA%/youtube-dl/config.txt
+                                     on Windows)
+    --flat-playlist                  Do not extract the videos of a playlist,
+                                     only list them.
+    --mark-watched                   Mark videos watched (YouTube only)
+    --no-mark-watched                Do not mark videos watched (YouTube only)
     --no-color                       Do not emit color codes in output
 
 ## Network Options:
-    --proxy URL                      Use the specified HTTP/HTTPS proxy. Pass in an empty string (--proxy "") for direct connection
+    --proxy URL                      Use the specified HTTP/HTTPS proxy. Pass in
+                                     an empty string (--proxy "") for direct
+                                     connection
     --socket-timeout SECONDS         Time to wait before giving up, in seconds
-    --source-address IP              Client-side IP address to bind to (experimental)
-    -4, --force-ipv4                 Make all connections via IPv4 (experimental)
-    -6, --force-ipv6                 Make all connections via IPv6 (experimental)
-    --cn-verification-proxy URL      Use this proxy to verify the IP address for some Chinese sites. The default proxy specified by --proxy (or none, if the options is
-                                     not present) is used for the actual downloading. (experimental)
+    --source-address IP              Client-side IP address to bind to
+                                     (experimental)
+    -4, --force-ipv4                 Make all connections via IPv4
+                                     (experimental)
+    -6, --force-ipv6                 Make all connections via IPv6
+                                     (experimental)
+    --cn-verification-proxy URL      Use this proxy to verify the IP address for
+                                     some Chinese sites. The default proxy
+                                     specified by --proxy (or none, if the
+                                     options is not present) is used for the
+                                     actual downloading. (experimental)
 
 ## Video Selection:
     --playlist-start NUMBER          Playlist video to start at (default is 1)
     --playlist-end NUMBER            Playlist video to end at (default is last)
-    --playlist-items ITEM_SPEC       Playlist video items to download. Specify indices of the videos in the playlist separated by commas like: "--playlist-items 1,2,5,8"
-                                     if you want to download videos indexed 1, 2, 5, 8 in the playlist. You can specify range: "--playlist-items 1-3,7,10-13", it will
-                                     download the videos at index 1, 2, 3, 7, 10, 11, 12 and 13.
-    --match-title REGEX              Download only matching titles (regex or caseless sub-string)
-    --reject-title REGEX             Skip download for matching titles (regex or caseless sub-string)
+    --playlist-items ITEM_SPEC       Playlist video items to download. Specify
+                                     indices of the videos in the playlist
+                                     separated by commas like: "--playlist-items
+                                     1,2,5,8" if you want to download videos
+                                     indexed 1, 2, 5, 8 in the playlist. You can
+                                     specify range: "--playlist-items
+                                     1-3,7,10-13", it will download the videos
+                                     at index 1, 2, 3, 7, 10, 11, 12 and 13.
+    --match-title REGEX              Download only matching titles (regex or
+                                     caseless sub-string)
+    --reject-title REGEX             Skip download for matching titles (regex or
+                                     caseless sub-string)
     --max-downloads NUMBER           Abort after downloading NUMBER files
-    --min-filesize SIZE              Do not download any videos smaller than SIZE (e.g. 50k or 44.6m)
-    --max-filesize SIZE              Do not download any videos larger than SIZE (e.g. 50k or 44.6m)
+    --min-filesize SIZE              Do not download any videos smaller than
+                                     SIZE (e.g. 50k or 44.6m)
+    --max-filesize SIZE              Do not download any videos larger than SIZE
+                                     (e.g. 50k or 44.6m)
     --date DATE                      Download only videos uploaded in this date
-    --datebefore DATE                Download only videos uploaded on or before this date (i.e. inclusive)
-    --dateafter DATE                 Download only videos uploaded on or after this date (i.e. inclusive)
-    --min-views COUNT                Do not download any videos with less than COUNT views
-    --max-views COUNT                Do not download any videos with more than COUNT views
-    --match-filter FILTER            Generic video filter (experimental). Specify any key (see help for -o for a list of available keys) to match if the key is present,
-                                     !key to check if the key is not present,key > NUMBER (like "comment_count > 12", also works with >=, <, <=, !=, =) to compare against
-                                     a number, and & to require multiple matches. Values which are not known are excluded unless you put a question mark (?) after the
-                                     operator.For example, to only match videos that have been liked more than 100 times and disliked less than 50 times (or the dislike
-                                     functionality is not available at the given service), but who also have a description, use  --match-filter "like_count > 100 &
+    --datebefore DATE                Download only videos uploaded on or before
+                                     this date (i.e. inclusive)
+    --dateafter DATE                 Download only videos uploaded on or after
+                                     this date (i.e. inclusive)
+    --min-views COUNT                Do not download any videos with less than
+                                     COUNT views
+    --max-views COUNT                Do not download any videos with more than
+                                     COUNT views
+    --match-filter FILTER            Generic video filter (experimental).
+                                     Specify any key (see help for -o for a list
+                                     of available keys) to match if the key is
+                                     present, !key to check if the key is not
+                                     present, key > NUMBER (like "comment_count >
+                                     12", also works with >=, <, <=, !=, =) to
+                                     compare against a number, and & to require
+                                     multiple matches. Values which are not
+                                     known are excluded unless you put a
+                                     question mark (?) after the operator. For
+                                     example, to only match videos that have
+                                     been liked more than 100 times and disliked
+                                     less than 50 times (or the dislike
+                                     functionality is not available at the given
+                                     service), but which also have a description,
+                                     use --match-filter "like_count > 100 &
                                      dislike_count <? 50 & description" .
-    --no-playlist                    Download only the video, if the URL refers to a video and a playlist.
-    --yes-playlist                   Download the playlist, if the URL refers to a video and a playlist.
-    --age-limit YEARS                Download only videos suitable for the given age
-    --download-archive FILE          Download only videos not listed in the archive file. Record the IDs of all downloaded videos in it.
-    --include-ads                    Download advertisements as well (experimental)
+    --no-playlist                    Download only the video, if the URL refers
+                                     to a video and a playlist.
+    --yes-playlist                   Download the playlist, if the URL refers to
+                                     a video and a playlist.
+    --age-limit YEARS                Download only videos suitable for the given
+                                     age
+    --download-archive FILE          Download only videos not listed in the
+                                     archive file. Record the IDs of all
+                                     downloaded videos in it.
+    --include-ads                    Download advertisements as well
+                                     (experimental)
 
 ## Download Options:
-    -r, --rate-limit LIMIT           Maximum download rate in bytes per second (e.g. 50K or 4.2M)
-    -R, --retries RETRIES            Number of retries (default is 10), or "infinite".
-    --buffer-size SIZE               Size of download buffer (e.g. 1024 or 16K) (default is 1024)
-    --no-resize-buffer               Do not automatically adjust the buffer size. By default, the buffer size is automatically resized from an initial value of SIZE.
+    -r, --rate-limit LIMIT           Maximum download rate in bytes per second
+                                     (e.g. 50K or 4.2M)
+    -R, --retries RETRIES            Number of retries (default is 10), or
+                                     "infinite".
+    --fragment-retries RETRIES       Number of retries for a fragment (default
+                                     is 10), or "infinite" (DASH only)
+    --buffer-size SIZE               Size of download buffer (e.g. 1024 or 16K)
+                                     (default is 1024)
+    --no-resize-buffer               Do not automatically adjust the buffer
+                                     size. By default, the buffer size is
+                                     automatically resized from an initial value
+                                     of SIZE.
     --playlist-reverse               Download playlist videos in reverse order
-    --xattr-set-filesize             Set file xattribute ytdl.filesize with expected filesize (experimental)
-    --hls-prefer-native              Use the native HLS downloader instead of ffmpeg (experimental)
-    --external-downloader COMMAND    Use the specified external downloader. Currently supports aria2c,curl,httpie,wget
-    --external-downloader-args ARGS  Give these arguments to the external downloader
+    --xattr-set-filesize             Set file xattribute ytdl.filesize with
+                                     expected filesize (experimental)
+    --hls-prefer-native              Use the native HLS downloader instead of
+                                     ffmpeg
+    --hls-prefer-ffmpeg              Use ffmpeg instead of the native HLS
+                                     downloader
+    --hls-use-mpegts                 Use the mpegts container for HLS videos,
+                                     allowing you to play the video while it
+                                     is downloading (some players may not be
+                                     able to play it)
+    --external-downloader COMMAND    Use the specified external downloader.
+                                     Currently supports
+                                     aria2c,avconv,axel,curl,ffmpeg,httpie,wget
+    --external-downloader-args ARGS  Give these arguments to the external
+                                     downloader
 
 ## Filesystem Options:
-    -a, --batch-file FILE            File containing URLs to download ('-' for stdin)
+    -a, --batch-file FILE            File containing URLs to download ('-' for
+                                     stdin)
     --id                             Use only video ID in file name
-    -o, --output TEMPLATE            Output filename template. Use %(title)s to get the title, %(uploader)s for the uploader name, %(uploader_id)s for the uploader
-                                     nickname if different, %(autonumber)s to get an automatically incremented number, %(ext)s for the filename extension, %(format)s for
-                                     the format description (like "22 - 1280x720" or "HD"), %(format_id)s for the unique id of the format (like YouTube's itags: "137"),
-                                     %(upload_date)s for the upload date (YYYYMMDD), %(extractor)s for the provider (youtube, metacafe, etc), %(id)s for the video id,
-                                     %(playlist_title)s, %(playlist_id)s, or %(playlist)s (=title if present, ID otherwise) for the playlist the video is in,
-                                     %(playlist_index)s for the position in the playlist. %(height)s and %(width)s for the width and height of the video format.
-                                     %(resolution)s for a textual description of the resolution of the video format. %% for a literal percent. Use - to output to stdout.
-                                     Can also be used to download to a different directory, for example with -o '/my/downloads/%(uploader)s/%(title)s-%(id)s.%(ext)s' .
-    --autonumber-size NUMBER         Specify the number of digits in %(autonumber)s when it is present in output filename template or --auto-number option is given
-    --restrict-filenames             Restrict filenames to only ASCII characters, and avoid "&" and spaces in filenames
-    -A, --auto-number                [deprecated; use  -o "%(autonumber)s-%(title)s.%(ext)s" ] Number downloaded files starting from 00000
-    -t, --title                      [deprecated] Use title in file name (default)
+    -o, --output TEMPLATE            Output filename template. Use %(title)s to
+                                     get the title, %(uploader)s for the
+                                     uploader name, %(uploader_id)s for the
+                                     uploader nickname if different,
+                                     %(autonumber)s to get an automatically
+                                     incremented number, %(ext)s for the
+                                     filename extension, %(format)s for the
+                                     format description (like "22 - 1280x720" or
+                                     "HD"), %(format_id)s for the unique id of
+                                     the format (like YouTube's itags: "137"),
+                                     %(upload_date)s for the upload date
+                                     (YYYYMMDD), %(extractor)s for the provider
+                                     (youtube, metacafe, etc), %(id)s for the
+                                     video id, %(playlist_title)s,
+                                     %(playlist_id)s, or %(playlist)s (=title if
+                                     present, ID otherwise) for the playlist the
+                                     video is in, %(playlist_index)s for the
+                                     position in the playlist. %(height)s and
+                                     %(width)s for the width and height of the
+                                     video format. %(resolution)s for a textual
+                                     description of the resolution of the video
+                                     format. %% for a literal percent. Use - to
+                                     output to stdout. Can also be used to
+                                     download to a different directory, for
+                                     example with -o '/my/downloads/%(uploader)s
+                                     /%(title)s-%(id)s.%(ext)s' .
+    --autonumber-size NUMBER         Specify the number of digits in
+                                     %(autonumber)s when it is present in output
+                                     filename template or --auto-number option
+                                     is given
+    --restrict-filenames             Restrict filenames to only ASCII
+                                     characters, and avoid "&" and spaces in
+                                     filenames
+    -A, --auto-number                [deprecated; use -o
+                                     "%(autonumber)s-%(title)s.%(ext)s" ] Number
+                                     downloaded files starting from 00000
+    -t, --title                      [deprecated] Use title in file name
+                                     (default)
     -l, --literal                    [deprecated] Alias of --title
     -w, --no-overwrites              Do not overwrite files
-    -c, --continue                   Force resume of partially downloaded files. By default, youtube-dl will resume downloads if possible.
-    --no-continue                    Do not resume partially downloaded files (restart from beginning)
-    --no-part                        Do not use .part files - write directly into output file
-    --no-mtime                       Do not use the Last-modified header to set the file modification time
-    --write-description              Write video description to a .description file
+    -c, --continue                   Force resume of partially downloaded files.
+                                     By default, youtube-dl will resume
+                                     downloads if possible.
+    --no-continue                    Do not resume partially downloaded files
+                                     (restart from beginning)
+    --no-part                        Do not use .part files - write directly
+                                     into output file
+    --no-mtime                       Do not use the Last-modified header to set
+                                     the file modification time
+    --write-description              Write video description to a .description
+                                     file
     --write-info-json                Write video metadata to a .info.json file
-    --write-annotations              Write video annotations to a .annotations.xml file
-    --load-info FILE                 JSON file containing the video information (created with the "--write-info-json" option)
-    --cookies FILE                   File to read cookies from and dump cookie jar in
-    --cache-dir DIR                  Location in the filesystem where youtube-dl can store some downloaded information permanently. By default $XDG_CACHE_HOME/youtube-dl
-                                     or ~/.cache/youtube-dl . At the moment, only YouTube player files (for videos with obfuscated signatures) are cached, but that may
-                                     change.
+    --write-annotations              Write video annotations to a
+                                     .annotations.xml file
+    --load-info FILE                 JSON file containing the video information
+                                     (created with the "--write-info-json"
+                                     option)
+    --cookies FILE                   File to read cookies from and dump cookie
+                                     jar in
+    --cache-dir DIR                  Location in the filesystem where youtube-dl
+                                     can store some downloaded information
+                                     permanently. By default $XDG_CACHE_HOME
+                                     /youtube-dl or ~/.cache/youtube-dl . At the
+                                     moment, only YouTube player files (for
+                                     videos with obfuscated signatures) are
+                                     cached, but that may change.
     --no-cache-dir                   Disable filesystem caching
     --rm-cache-dir                   Delete all filesystem cache files
 
 ## Thumbnail images:
     --write-thumbnail                Write thumbnail image to disk
     --write-all-thumbnails           Write all thumbnail image formats to disk
-    --list-thumbnails                Simulate and list all available thumbnail formats
+    --list-thumbnails                Simulate and list all available thumbnail
+                                     formats
 
 ## Verbosity / Simulation Options:
     -q, --quiet                      Activate quiet mode
     --no-warnings                    Ignore warnings
-    -s, --simulate                   Do not download the video and do not write anything to disk
+    -s, --simulate                   Do not download the video and do not write
+                                     anything to disk
     --skip-download                  Do not download the video
     -g, --get-url                    Simulate, quiet but print URL
     -e, --get-title                  Simulate, quiet but print title
@@ -161,86 +282,156 @@ which means you can modify it, redistribute it or use it however you like.
     --get-duration                   Simulate, quiet but print video length
     --get-filename                   Simulate, quiet but print output filename
     --get-format                     Simulate, quiet but print output format
-    -j, --dump-json                  Simulate, quiet but print JSON information. See --output for a description of available keys.
-    -J, --dump-single-json           Simulate, quiet but print JSON information for each command-line argument. If the URL refers to a playlist, dump the whole playlist
-                                     information in a single line.
-    --print-json                     Be quiet and print the video information as JSON (video is still being downloaded).
+    -j, --dump-json                  Simulate, quiet but print JSON information.
+                                     See --output for a description of available
+                                     keys.
+    -J, --dump-single-json           Simulate, quiet but print JSON information
+                                     for each command-line argument. If the URL
+                                     refers to a playlist, dump the whole
+                                     playlist information in a single line.
+    --print-json                     Be quiet and print the video information as
+                                     JSON (video is still being downloaded).
     --newline                        Output progress bar as new lines
     --no-progress                    Do not print progress bar
     --console-title                  Display progress in console titlebar
     -v, --verbose                    Print various debugging information
-    --dump-pages                     Print downloaded pages encoded using base64 to debug problems (very verbose)
-    --write-pages                    Write downloaded intermediary pages to files in the current directory to debug problems
+    --dump-pages                     Print downloaded pages encoded using base64
+                                     to debug problems (very verbose)
+    --write-pages                    Write downloaded intermediary pages to
+                                     files in the current directory to debug
+                                     problems
     --print-traffic                  Display sent and read HTTP traffic
     -C, --call-home                  Contact the youtube-dl server for debugging
-    --no-call-home                   Do NOT contact the youtube-dl server for debugging
+    --no-call-home                   Do NOT contact the youtube-dl server for
+                                     debugging
 
 ## Workarounds:
     --encoding ENCODING              Force the specified encoding (experimental)
     --no-check-certificate           Suppress HTTPS certificate validation
-    --prefer-insecure                Use an unencrypted connection to retrieve information about the video. (Currently supported only for YouTube)
+    --prefer-insecure                Use an unencrypted connection to retrieve
+                                     information about the video. (Currently
+                                     supported only for YouTube)
     --user-agent UA                  Specify a custom user agent
-    --referer URL                    Specify a custom referer, use if the video access is restricted to one domain
-    --add-header FIELD:VALUE         Specify a custom HTTP header and its value, separated by a colon ':'. You can use this option multiple times
-    --bidi-workaround                Work around terminals that lack bidirectional text support. Requires bidiv or fribidi executable in PATH
-    --sleep-interval SECONDS         Number of seconds to sleep before each download.
+    --referer URL                    Specify a custom referer, use if the video
+                                     access is restricted to one domain
+    --add-header FIELD:VALUE         Specify a custom HTTP header and its value,
+                                     separated by a colon ':'. You can use this
+                                     option multiple times
+    --bidi-workaround                Work around terminals that lack
+                                     bidirectional text support. Requires bidiv
+                                     or fribidi executable in PATH
+    --sleep-interval SECONDS         Number of seconds to sleep before each
+                                     download.
 
 ## Video Format Options:
-    -f, --format FORMAT              Video format code, see the "FORMAT SELECTION" for all the info
+    -f, --format FORMAT              Video format code, see the "FORMAT
+                                     SELECTION" for all the info
     --all-formats                    Download all available video formats
-    --prefer-free-formats            Prefer free video formats unless a specific one is requested
-    -F, --list-formats               List all available formats
-    --youtube-skip-dash-manifest     Do not download the DASH manifests and related data on YouTube videos
-    --merge-output-format FORMAT     If a merge is required (e.g. bestvideo+bestaudio), output to given container format. One of mkv, mp4, ogg, webm, flv. Ignored if no
-                                     merge is required
+    --prefer-free-formats            Prefer free video formats unless a specific
+                                     one is requested
+    -F, --list-formats               List all available formats of requested
+                                     videos
+    --youtube-skip-dash-manifest     Do not download the DASH manifests and
+                                     related data on YouTube videos
+    --merge-output-format FORMAT     If a merge is required (e.g.
+                                     bestvideo+bestaudio), output to given
+                                     container format. One of mkv, mp4, ogg,
+                                     webm, flv. Ignored if no merge is required
 
 ## Subtitle Options:
     --write-sub                      Write subtitle file
-    --write-auto-sub                 Write automatic subtitle file (YouTube only)
-    --all-subs                       Download all the available subtitles of the video
+    --write-auto-sub                 Write automatically generated subtitle file
+                                     (YouTube only)
+    --all-subs                       Download all the available subtitles of the
+                                     video
     --list-subs                      List all available subtitles for the video
-    --sub-format FORMAT              Subtitle format, accepts formats preference, for example: "srt" or "ass/srt/best"
-    --sub-lang LANGS                 Languages of the subtitles to download (optional) separated by commas, use IETF language tags like 'en,pt'
+    --sub-format FORMAT              Subtitle format, accepts formats
+                                     preference, for example: "srt" or
+                                     "ass/srt/best"
+    --sub-lang LANGS                 Languages of the subtitles to download
+                                     (optional) separated by commas, use --list-
+                                     subs for available language tags
 
 ## Authentication Options:
     -u, --username USERNAME          Login with this account ID
-    -p, --password PASSWORD          Account password. If this option is left out, youtube-dl will ask interactively.
+    -p, --password PASSWORD          Account password. If this option is left
+                                     out, youtube-dl will ask interactively.
     -2, --twofactor TWOFACTOR        Two-factor auth code
     -n, --netrc                      Use .netrc authentication data
-    --video-password PASSWORD        Video password (vimeo, smotri)
+    --video-password PASSWORD        Video password (vimeo, smotri, youku)
 
 ## Post-processing Options:
-    -x, --extract-audio              Convert video files to audio-only files (requires ffmpeg or avconv and ffprobe or avprobe)
-    --audio-format FORMAT            Specify audio format: "best", "aac", "vorbis", "mp3", "m4a", "opus", or "wav"; "best" by default
-    --audio-quality QUALITY          Specify ffmpeg/avconv audio quality, insert a value between 0 (better) and 9 (worse) for VBR or a specific bitrate like 128K (default
-                                     5)
-    --recode-video FORMAT            Encode the video to another format if necessary (currently supported: mp4|flv|ogg|webm|mkv|avi)
+    -x, --extract-audio              Convert video files to audio-only files
+                                     (requires ffmpeg or avconv and ffprobe or
+                                     avprobe)
+    --audio-format FORMAT            Specify audio format: "best", "aac",
+                                     "vorbis", "mp3", "m4a", "opus", or "wav";
+                                     "best" by default
+    --audio-quality QUALITY          Specify ffmpeg/avconv audio quality, insert
+                                     a value between 0 (better) and 9 (worse)
+                                     for VBR or a specific bitrate like 128K
+                                     (default 5)
+    --recode-video FORMAT            Encode the video to another format if
+                                     necessary (currently supported:
+                                     mp4|flv|ogg|webm|mkv|avi)
     --postprocessor-args ARGS        Give these arguments to the postprocessor
-    -k, --keep-video                 Keep the video file on disk after the post-processing; the video is erased by default
-    --no-post-overwrites             Do not overwrite post-processed files; the post-processed files are overwritten by default
-    --embed-subs                     Embed subtitles in the video (only for mkv and mp4 videos)
+    -k, --keep-video                 Keep the video file on disk after the post-
+                                     processing; the video is erased by default
+    --no-post-overwrites             Do not overwrite post-processed files; the
+                                     post-processed files are overwritten by
+                                     default
+    --embed-subs                     Embed subtitles in the video (only for mp4,
+                                     webm and mkv videos)
     --embed-thumbnail                Embed thumbnail in the audio as cover art
     --add-metadata                   Write metadata to the video file
-    --metadata-from-title FORMAT     Parse additional metadata like song title / artist from the video title. The format syntax is the same as --output, the parsed
-                                     parameters replace existing values. Additional templates: %(album)s, %(artist)s. Example: --metadata-from-title "%(artist)s -
-                                     %(title)s" matches a title like "Coldplay - Paradise"
-    --xattrs                         Write metadata to the video file's xattrs (using dublin core and xdg standards)
-    --fixup POLICY                   Automatically correct known faults of the file. One of never (do nothing), warn (only emit a warning), detect_or_warn (the default;
-                                     fix file if we can, warn otherwise)
-    --prefer-avconv                  Prefer avconv over ffmpeg for running the postprocessors (default)
-    --prefer-ffmpeg                  Prefer ffmpeg over avconv for running the postprocessors
-    --ffmpeg-location PATH           Location of the ffmpeg/avconv binary; either the path to the binary or its containing directory.
-    --exec CMD                       Execute a command on the file after downloading, similar to find's -exec syntax. Example: --exec 'adb push {} /sdcard/Music/ && rm
-                                     {}'
-    --convert-subtitles FORMAT       Convert the subtitles to other format (currently supported: srt|ass|vtt)
+    --metadata-from-title FORMAT     Parse additional metadata like song title /
+                                     artist from the video title. The format
+                                     syntax is the same as --output, the parsed
+                                     parameters replace existing values.
+                                     Additional templates: %(album)s,
+                                     %(artist)s. Example: --metadata-from-title
+                                     "%(artist)s - %(title)s" matches a title
+                                     like "Coldplay - Paradise"
+    --xattrs                         Write metadata to the video file's xattrs
+                                     (using dublin core and xdg standards)
+    --fixup POLICY                   Automatically correct known faults of the
+                                     file. One of never (do nothing), warn (only
+                                     emit a warning), detect_or_warn (the
+                                     default; fix file if we can, warn
+                                     otherwise)
+    --prefer-avconv                  Prefer avconv over ffmpeg for running the
+                                     postprocessors (default)
+    --prefer-ffmpeg                  Prefer ffmpeg over avconv for running the
+                                     postprocessors
+    --ffmpeg-location PATH           Location of the ffmpeg/avconv binary;
+                                     either the path to the binary or its
+                                     containing directory.
+    --exec CMD                       Execute a command on the file after
+                                     downloading, similar to find's -exec
+                                     syntax. Example: --exec 'adb push {}
+                                     /sdcard/Music/ && rm {}'
+    --convert-subs FORMAT            Convert the subtitles to another format
+                                     (currently supported: srt|ass|vtt)
 
 # CONFIGURATION
 
-You can configure youtube-dl by placing default arguments (such as `--extract-audio --no-mtime` to always extract the audio and not copy the mtime) into `/etc/youtube-dl.conf` and/or `~/.config/youtube-dl/config`. On Windows, the configuration file locations are `%APPDATA%\youtube-dl\config.txt` and `C:\Users\<user name>\youtube-dl.conf`.
+You can configure youtube-dl by placing any supported command line option in a configuration file. On Linux, the system-wide configuration file is located at `/etc/youtube-dl.conf` and the user-wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user-wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`.
 
-### Authentication with `.netrc` file ###
+For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under the `Movies` directory in your home directory:
+```
+-x
+--no-mtime
+--proxy 127.0.0.1:3128
+-o ~/Movies/%(title)s.%(ext)s
+```
+
+Note that options in the configuration file are just the same options (aka switches) used in regular command line calls; thus there **must be no whitespace** after `-` or `--`, e.g. `-o` or `--proxy` but not `- o` or `-- proxy`.
 
-You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in shell command history. You can achieve this using [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on per extractor basis. For that you will need to create `.netrc` file in your `$HOME` and restrict permissions to read/write by you only:
+You can use `--ignore-config` if you want to disable the configuration file for a particular youtube-dl run.
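+
+For example, a one-off run that bypasses the configuration file entirely (a minimal sketch; the URL is a placeholder):
+```bash
+$ youtube-dl --ignore-config 'https://some/video'
+```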
+
+### Authentication with `.netrc` file
+
+You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on a per-extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by you only:
 ```
 touch $HOME/.netrc
 chmod a-rwx,u+rw $HOME/.netrc
@@ -254,50 +445,211 @@ For example:
 machine youtube login myaccount@gmail.com password my_youtube_password
 machine twitch login my_twitch_account_name password my_twitch_password
 ```
-To activate authentication with `.netrc` file you should pass `--netrc` to youtube-dl or to place it in [configuration file](#configuration).
+To activate authentication with the `.netrc` file you should pass `--netrc` to youtube-dl or place it in the [configuration file](#configuration).
 
-On Windows you may also need to setup `%HOME%` environment variable manually.
+On Windows you may also need to setup the `%HOME%` environment variable manually.
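+
+For example, once the `machine` entries above are in place, a login-protected extraction can run without credentials on the command line (a minimal sketch; the URL is a placeholder):
+```bash
+$ youtube-dl --netrc 'https://www.twitch.tv/videos/12345'
+```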
 
 # OUTPUT TEMPLATE
 
-The `-o` option allows users to indicate a template for the output file names. The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "http://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences have the format `%(NAME)s`. To clarify, that is a percent symbol followed by a name in parenthesis, followed by a lowercase S. Allowed names are:
-
- - `id`: The sequence will be replaced by the video identifier.
- - `url`: The sequence will be replaced by the video URL.
- - `uploader`: The sequence will be replaced by the nickname of the person who uploaded the video.
- - `upload_date`: The sequence will be replaced by the upload date in YYYYMMDD format.
- - `title`: The sequence will be replaced by the video title.
- - `ext`: The sequence will be replaced by the appropriate extension (like flv or mp4).
- - `epoch`: The sequence will be replaced by the Unix epoch when creating the file.
- - `autonumber`: The sequence will be replaced by a five-digit number that will be increased with each download, starting at zero.
- - `playlist`: The name or the id of the playlist that contains the video.
- - `playlist_index`: The index of the video in the playlist, a five-digit number.
+The `-o` option allows users to indicate a template for the output file names.
+
+**tl;dr:** [navigate me to examples](#output-template-examples).
+
+The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "http://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences have the format `%(NAME)s`. To clarify, that is a percent symbol followed by a name in parentheses, followed by a lowercase S. Allowed names are:
+
+ - `id`: Video identifier
+ - `title`: Video title
+ - `url`: Video URL
+ - `ext`: Video filename extension
+ - `alt_title`: A secondary title of the video
+ - `display_id`: An alternative identifier for the video
+ - `uploader`: Full name of the video uploader
+ - `license`: License name the video is licensed under
+ - `creator`: The main artist who created the video
+ - `release_date`: The date (YYYYMMDD) when the video was released
+ - `timestamp`: UNIX timestamp of the moment the video became available
+ - `upload_date`: Video upload date (YYYYMMDD)
+ - `uploader_id`: Nickname or id of the video uploader
+ - `location`: Physical location where the video was filmed
+ - `duration`: Length of the video in seconds
+ - `view_count`: How many users have watched the video on the platform
+ - `like_count`: Number of positive ratings of the video
+ - `dislike_count`: Number of negative ratings of the video
+ - `repost_count`: Number of reposts of the video
+ - `average_rating`: Average rating given by users; the scale used depends on the webpage
+ - `comment_count`: Number of comments on the video
+ - `age_limit`: Age restriction for the video (years)
+ - `format`: A human-readable description of the format 
+ - `format_id`: Format code specified by `--format`
+ - `format_note`: Additional info about the format
+ - `width`: Width of the video
+ - `height`: Height of the video
+ - `resolution`: Textual description of width and height
+ - `tbr`: Average bitrate of audio and video in KBit/s
+ - `abr`: Average audio bitrate in KBit/s
+ - `acodec`: Name of the audio codec in use
+ - `asr`: Audio sampling rate in Hertz
+ - `vbr`: Average video bitrate in KBit/s
+ - `fps`: Frame rate
+ - `vcodec`: Name of the video codec in use
+ - `container`: Name of the container format
+ - `filesize`: The number of bytes, if known in advance
+ - `filesize_approx`: An estimate for the number of bytes
+ - `protocol`: The protocol that will be used for the actual download
+ - `extractor`: Name of the extractor
+ - `extractor_key`: Key name of the extractor
+ - `epoch`: Unix epoch when creating the file
+ - `autonumber`: Five-digit number that will be increased with each download, starting at zero
+ - `playlist`: Name or id of the playlist that contains the video
+ - `playlist_index`: Index of the video in the playlist padded with leading zeros according to the total length of the playlist
+
+Available for the video that belongs to some logical chapter or section:
+ - `chapter`: Name or title of the chapter the video belongs to
+ - `chapter_number`: Number of the chapter the video belongs to
+ - `chapter_id`: Id of the chapter the video belongs to
+
+Available for the video that is an episode of some series or programme:
+ - `series`: Title of the series or programme the video episode belongs to
+ - `season`: Title of the season the video episode belongs to
+ - `season_number`: Number of the season the video episode belongs to
+ - `season_id`: Id of the season the video episode belongs to
+ - `episode`: Title of the video episode
+ - `episode_number`: Number of the video episode within a season
+ - `episode_id`: Id of the video episode
+
+Available for the media that is a track or a part of a music album:
+ - `track`: Title of the track
+ - `track_number`: Number of the track within an album or a disc
+ - `track_id`: Id of the track
+ - `artist`: Artist(s) of the track
+ - `genre`: Genre(s) of the track
+ - `album`: Title of the album the track belongs to
+ - `album_type`: Type of the album
+ - `album_artist`: List of all artists who appeared on the album
+ - `disc_number`: Number of the disc or other physical medium the track belongs to
+ - `release_year`: Year (YYYY) when the album was released
+
+Each of the aforementioned sequences, when referenced in an output template, will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by the particular extractor; such sequences will be replaced with `NA`.
+
+For example for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `youtube-dl test video` and id `BaW_jenozKc`, this will result in a `youtube-dl test video-BaW_jenozKc.mp4` file created in the current directory.
+
+The output template can also contain an arbitrary hierarchical path, e.g. `-o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s'`, which will result in downloading each video into a directory corresponding to this path template. Any missing directory will be automatically created for you.
+
+To specify a percent literal in the output template use `%%`. To output to stdout use `-o -`.
 
 The current default template is `%(title)s-%(id)s.%(ext)s`.
 
 In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title:
 
+#### Output template examples
+
+Note that on Windows you may need to use double quotes instead of single quotes.
+
 ```bash
-$ youtube-dl --get-filename -o "%(title)s.%(ext)s" BaW_jenozKc
+$ youtube-dl --get-filename -o '%(title)s.%(ext)s' BaW_jenozKc
 youtube-dl test video ''_ä↭𝕐.mp4    # All kinds of weird characters
-$ youtube-dl --get-filename -o "%(title)s.%(ext)s" BaW_jenozKc --restrict-filenames
+
+$ youtube-dl --get-filename -o '%(title)s.%(ext)s' BaW_jenozKc --restrict-filenames
 youtube-dl_test_video_.mp4          # A simple file name
+
+# Download YouTube playlist videos into a separate directory indexed by the video order in the playlist
+$ youtube-dl -o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s' https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re
+
+# Download all playlists of a YouTube channel/user, keeping each playlist in a separate directory:
+$ youtube-dl -o '%(uploader)s/%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s' https://www.youtube.com/user/TheLinuxFoundation/playlists
+
+# Download a Udemy course, keeping each chapter in a separate directory under the MyVideos directory in your home
+$ youtube-dl -u user -p password -o '~/MyVideos/%(playlist)s/%(chapter_number)s - %(chapter)s/%(title)s.%(ext)s' https://www.udemy.com/java-tutorial/
+
+# Download an entire series season, keeping each series and each season in a separate directory under C:/MyVideos
+$ youtube-dl -o "C:/MyVideos/%(series)s/%(season_number)s - %(season)s/%(episode_number)s - %(episode)s.%(ext)s" http://videomore.ru/kino_v_detalayah/5_sezon/367617
+
+# Stream the video being downloaded to stdout
+$ youtube-dl -o - BaW_jenozKc
 ```
 
 # FORMAT SELECTION
 
-By default youtube-dl tries to download the best quality, but sometimes you may want to download other format.
-The simplest case is requesting a specific format, for example `-f 22`. You can get the list of available formats using `--list-formats`, you can also use a file extension (currently it supports aac, m4a, mp3, mp4, ogg, wav, webm) or the special names `best`, `bestvideo`, `bestaudio` and `worst`.
+By default youtube-dl tries to download the best available quality, i.e. if you want the best quality you **don't need** to pass any special options; youtube-dl will guess it for you by **default**.
+
+But sometimes you may want to download in a different format, for example when you are on a slow or intermittent connection. The key mechanism for achieving this is the so-called *format selection*, with which you can explicitly specify the desired format, select formats based on some criterion or criteria, set up precedence and much more.
+
+The general syntax for format selection is `--format FORMAT` or, shorter, `-f FORMAT`, where `FORMAT` is a *selector expression*, i.e. an expression that describes the format or formats you would like to download.
+
+**tl;dr:** [navigate me to examples](#format-selection-examples).
+
+The simplest case is requesting a specific format, for example with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for a particular video using `--list-formats` or `-F`. Note that these format codes are extractor specific.
+
+You can also use a file extension (currently `3gp`, `aac`, `flv`, `m4a`, `mp3`, `mp4`, `ogg`, `wav`, `webm` are supported) to download the best quality format of a particular file extension served as a single file, e.g. `-f webm` will download the best quality format with the `webm` extension served as a single file.
+
+You can also use special names to select particular edge case formats:
+ - `best`: Select the best quality format represented by a single file with video and audio
+ - `worst`: Select the worst quality format represented by a single file with video and audio
+ - `bestvideo`: Select the best quality video-only format (e.g. DASH video), may not be available
+ - `worstvideo`: Select the worst quality video-only format, may not be available
+ - `bestaudio`: Select the best quality audio-only format, may not be available
+ - `worstaudio`: Select the worst quality audio-only format, may not be available
+
+For example, to download the worst quality video-only format you can use `-f worstvideo`.
+
+If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that the slash is left-associative, i.e. formats on the left-hand side are preferred; for example, `-f 22/17/18` will download format 22 if it's available, otherwise it will download format 17 if it's available, otherwise it will download format 18 if it's available, otherwise it will complain that no suitable formats are available for download.
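+
+As a block, mirroring the examples further down (a sketch; format codes are extractor specific, and the URL is omitted as in the examples below):
+```bash
+# Prefer format 22; fall back to 17, then 18, then report failure
+$ youtube-dl -f 22/17/18
+```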
+
+If you want to download several formats of the same video use a comma as a separator, e.g. `-f 22,17,18` will download all three of these formats if they are available. A more sophisticated example combines this with the precedence feature: `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`.
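+
+A sketch of downloading several formats of one video while keeping the filenames distinct, using the `%(format_id)s` output template sequence documented above (the URL is a placeholder):
+```bash
+$ youtube-dl -f 22,17,18 -o '%(title)s.%(format_id)s.%(ext)s' 'https://some/video'
+```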
+
+You can also filter the video formats by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`).
 
-If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes, as in `-f 22/17/18`. You can also filter the video results by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`).  This works for filesize, height, width, tbr, abr, vbr, asr, and fps and the comparisons <, <=, >, >=, =, != and for ext, acodec, vcodec, container, and protocol and the comparisons =, != . Formats for which the value is not known are excluded unless you put a question mark (?) after the operator. You can combine format filters, so  `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. Use commas to download multiple formats, such as `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`. You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv), for example `-f bestvideo+bestaudio`.
+The following numeric meta fields can be used with comparisons `<`, `<=`, `>`, `>=`, `=` (equals), `!=` (not equals):
+ - `filesize`: The number of bytes, if known in advance
+ - `width`: Width of the video, if known
+ - `height`: Height of the video, if known
+ - `tbr`: Average bitrate of audio and video in KBit/s
+ - `abr`: Average audio bitrate in KBit/s
+ - `vbr`: Average video bitrate in KBit/s
+ - `asr`: Audio sampling rate in Hertz
+ - `fps`: Frame rate
 
-Since the end of April 2015 and version 2015.04.26 youtube-dl uses `-f bestvideo+bestaudio/best` as default format selection (see #5447, #5456). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some dash formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed.
+Filtering also works with the comparisons `=` (equals), `!=` (not equals), `^=` (begins with), `$=` (ends with), `*=` (contains) and the following string meta fields:
+ - `ext`: File extension
+ - `acodec`: Name of the audio codec in use
+ - `vcodec`: Name of the video codec in use
+ - `container`: Name of the container format
+ - `protocol`: The protocol that will be used for the actual download, lower-case: `http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `m3u8`, or `m3u8_native`
+ - `format_id`: A short description of the format
+
+Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by the particular extractor, i.e. the metadata offered by the video hoster.
+
+Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s.
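+
+For instance, combining a numeric and a string filter from the tables above (a sketch; the `?` keeps formats whose height is unknown):
+```bash
+$ youtube-dl -f 'best[height<=?720][ext=mp4]'
+```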
+
+You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv to be installed), for example `-f bestvideo+bestaudio` will download the best video-only format and the best audio-only format and mux them together with ffmpeg/avconv.
+
+Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use `-f '(mp4,webm)[height<480]'`.
+
+Since the end of April 2015 and version 2015.04.26 youtube-dl uses `-f bestvideo+bestaudio/best` as default format selection (see [#5447](https://github.com/rg3/youtube-dl/issues/5447), [#5456](https://github.com/rg3/youtube-dl/issues/5456)). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading the best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some DASH formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed.
+
+If you want to preserve the old format selection behavior (prior to youtube-dl 2015.04.26), i.e. you want to download the best available quality media served as a single file, you should explicitly specify your choice with `-f best`. You may want to add it to the [configuration file](#configuration) in order not to type it every time you run youtube-dl.
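+
+A minimal sketch: putting this single line into the user configuration file (`~/.config/youtube-dl/config`, see [above](#configuration)) makes `-f best` the default for every run:
+```
+-f best
+```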
+
+#### Format selection examples
+
+Note that on Windows you may need to use double quotes instead of single quotes.
+
+```bash
+# Download the best mp4 format available, or any other best format if no mp4 is available
+$ youtube-dl -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best'
+
+# Download the best format available but not better than 480p
+$ youtube-dl -f 'bestvideo[height<=480]+bestaudio/best[height<=480]'
+
+# Download the best format available but no bigger than 50 MB
+$ youtube-dl -f 'best[filesize<50M]'
+
+# Download best format available via direct link over HTTP/HTTPS protocol
+$ youtube-dl -f '(bestvideo+bestaudio/best)[protocol^=http]'
+```
 
-If you want to preserve the old format selection behavior (prior to youtube-dl 2015.04.26), i.e. you want to download best available quality media served as a single file, you should explicitly specify your choice with `-f best`. You may want to add it to the [configuration file](#configuration) in order not to type it every time you run youtube-dl.
 
 # VIDEO SELECTION
 
-Videos can be filtered by their upload date using the options `--date`, `--datebefore` or `--dateafter`, they accept dates in two formats:
+Videos can be filtered by their upload date using the options `--date`, `--datebefore` or `--dateafter`. They accept dates in two formats:
 
  - Absolute dates: Dates in the format `YYYYMMDD`.
  - Relative dates: Dates in the format `(now|today)[+-][0-9](day|week|month|year)(s)?`
@@ -311,7 +663,7 @@ $ youtube-dl --dateafter now-6months
 # Download only the videos uploaded on January 1, 1970
 $ youtube-dl --date 19700101
 
-$ # will only download the videos uploaded in the 200x decade
+$ # Download only the videos uploaded in the 200x decade
 $ youtube-dl --dateafter 20000101 --datebefore 20091231
 ```
 
@@ -323,7 +675,7 @@ If you've followed [our manual installation instructions](http://rg3.github.io/y
 
 If you have used pip, a simple `sudo pip install -U youtube-dl` is sufficient to update.
 
-If you have installed youtube-dl using a package manager like *apt-get* or *yum*, use the standard system update mechanism to update. Note that distribution packages are often outdated. As a rule of thumb, youtube-dl releases at least once a month, and often weekly or even daily. Simply go to http://yt-dl.org/ to find out the current version. Unfortunately, there is nothing we youtube-dl developers can do if your distributions serves a really outdated version. You can (and should) complain to your distribution in their bugtracker or support forum.
+If you have installed youtube-dl using a package manager like *apt-get* or *yum*, use the standard system update mechanism to update. Note that distribution packages are often outdated. As a rule of thumb, youtube-dl releases at least once a month, and often weekly or even daily. Simply go to http://yt-dl.org/ to find out the current version. Unfortunately, there is nothing we youtube-dl developers can do if your distribution serves a really outdated version. You can (and should) complain to your distribution in their bugtracker or support forum.
 
 As a last resort, you can also uninstall the version installed by your package manager and follow our manual installation instructions. For that, remove the distribution's package, with a line like
 
@@ -349,7 +701,7 @@ If you have installed youtube-dl with a package manager, pip, setup.py or a tarb
 
 By default, youtube-dl intends to have the best options (incidentally, if you have a convincing case that these should be different, [please file an issue where you explain that](https://yt-dl.org/bug)). Therefore, it is unnecessary and sometimes harmful to copy long option strings from webpages. In particular, the only option out of `-citw` that is regularly useful is `-i`.
 
-### Can you please put the -b option back?
+### Can you please put the `-b` option back?
 
 Most people asking this question are not aware that youtube-dl now defaults to downloading the highest available quality as reported by YouTube, which will be 1080p or 720p in some cases, so you no longer need the `-b` option. For some specific videos, maybe YouTube does not report them to be available in a specific high quality format you're interested in. In that case, simply request it with the `-f` option and youtube-dl will try to download it.
 
@@ -357,17 +709,23 @@ Most people asking this question are not aware that youtube-dl now defaults to d
 
 Apparently YouTube requires you to pass a CAPTCHA test if you download too much. We're [considering providing a way to let you solve the CAPTCHA](https://github.com/rg3/youtube-dl/issues/154), but at the moment, your best course of action is pointing a webbrowser to the youtube URL, solving the CAPTCHA, and restarting youtube-dl.
 
+### Do I need any other programs?
+
+youtube-dl works fine on its own on most sites. However, if you want to convert video/audio, you'll need [avconv](https://libav.org/) or [ffmpeg](https://www.ffmpeg.org/). On some sites - most notably YouTube - videos can be retrieved in a higher quality format without sound. youtube-dl will detect whether avconv/ffmpeg is present and automatically pick the best option.
+
+Videos or video formats streamed via RTMP protocol can only be downloaded when [rtmpdump](https://rtmpdump.mplayerhq.hu/) is installed. Downloading MMS and RTSP videos requires either [mplayer](http://mplayerhq.hu/) or [mpv](https://mpv.io/) to be installed.
+
 ### I have downloaded a video but how can I play it?
 
 Once the video is fully downloaded, use any video player, such as [vlc](http://www.videolan.org) or [mplayer](http://www.mplayerhq.hu/).
 
-### I extracted a video URL with -g, but it does not play on another machine / in my webbrowser.
+### I extracted a video URL with `-g`, but it does not play on another machine / in my webbrowser.
 
 It depends a lot on the service. In many cases, requests for the video (to download/play it) must come from the same IP address and with the same cookies. Use the `--cookies` option to write the required cookies into a file, and advise your downloader to read cookies from that file. Some sites also require a common user agent to be used; use `--dump-user-agent` to see the one in use by youtube-dl.
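+
+For example, a minimal sketch of that workflow (the cookies file name is arbitrary):
+
+    youtube-dl --cookies cookies.txt -g "http://www.youtube.com/watch?v=BaW_jenozKc"
+    youtube-dl --dump-user-agent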
 
 It may be beneficial to use IPv6; in some cases, the restrictions are only applied to IPv4. Some services (sometimes only for a subset of videos) do not restrict the video URL by IP address, cookie, or user-agent, but these are the exception rather than the rule.
 
-Please bear in mind that some URL protocols are **not** supported by browsers out of the box, including RTMP. If you are using -g, your own downloader must support these as well.
+Please bear in mind that some URL protocols are **not** supported by browsers out of the box, including RTMP. If you are using `-g`, your own downloader must support these as well.
 
 If you want to play the video on a machine that is not running youtube-dl, you can relay the video content from the machine that runs youtube-dl. You can use `-o -` to let youtube-dl stream a video to stdout, or simply allow the player to download the files written by youtube-dl in turn.
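+
+For example, a minimal sketch of the streaming variant, assuming a player that can read from stdin (mplayer here, via `-`):
+
+    youtube-dl -o - "http://www.youtube.com/watch?v=BaW_jenozKc" | mplayer -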
 
@@ -375,13 +733,13 @@ If you want to play the video on a machine that is not running youtube-dl, you c
 
 YouTube switched to a new video info format in July 2011, which is not supported by old versions of youtube-dl. See [above](#how-do-i-update-youtube-dl) for how to update youtube-dl.
 
-### ERROR: unable to download video ###
+### ERROR: unable to download video
 
 YouTube has required an additional signature since September 2012, which is not supported by old versions of youtube-dl. See [above](#how-do-i-update-youtube-dl) for how to update youtube-dl.
 
-### Video URL contains an ampersand and I'm getting some strange output `[1] 2839` or `'v' is not recognized as an internal or external command` ###
+### Video URL contains an ampersand and I'm getting some strange output `[1] 2839` or `'v' is not recognized as an internal or external command`
 
-That's actually the output from your shell. Since ampersand is one of the special shell characters it's interpreted by shell preventing you from passing the whole URL to youtube-dl. To disable your shell from interpreting the ampersands (or any other special characters) you have to either put the whole URL in quotes or escape them with a backslash (which approach will work depends on your shell).
+That's actually the output from your shell. Since the ampersand is one of the special shell characters, it's interpreted by the shell, preventing you from passing the whole URL to youtube-dl. To keep your shell from interpreting the ampersands (or any other special characters), you have to either put the whole URL in quotes or escape them with a backslash (which approach will work depends on your shell).
 
 For example, if your URL is https://www.youtube.com/watch?t=4&v=BaW_jenozKc you should end up with the following command:
 
@@ -403,7 +761,7 @@ In February 2015, the new YouTube player contained a character sequence in a str
 
 These two error codes indicate that the service is blocking your IP address because of overuse. Contact the service and ask them to unblock your IP address, or - if you have acquired a whitelisted IP address already - use the [`--proxy` or `--source-address` options](#network-options) to select another IP address.
 
-### SyntaxError: Non-ASCII character ###
+### SyntaxError: Non-ASCII character
 
 The error
 
@@ -414,7 +772,7 @@ means you're using an outdated version of Python. Please update to Python 2.6 or
 
 ### What is this binary file? Where has the code gone?
 
-Since June 2012 (#342) youtube-dl is packed as an executable zipfile, simply unzip it (might need renaming to `youtube-dl.zip` first on some systems) or clone the git repository, as laid out above. If you modify the code, you can run it by executing the `__main__.py` file. To recompile the executable, run `make youtube-dl`.
+Since June 2012 ([#342](https://github.com/rg3/youtube-dl/issues/342)) youtube-dl has been packed as an executable zipfile; simply unzip it (you might need to rename it to `youtube-dl.zip` first on some systems) or clone the git repository, as laid out above. If you modify the code, you can run it by executing the `__main__.py` file. To recompile the executable, run `make youtube-dl`.
 
 ### The exe throws a *Runtime error from Visual C++*
 
@@ -432,13 +790,19 @@ From then on, after restarting your shell, you will be able to access both youtu
 
 Use the `-o` option to specify an [output template](#output-template), for example `-o "/home/user/videos/%(title)s-%(id)s.%(ext)s"`. If you want this for all of your downloads, put the option into your [configuration file](#configuration).
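+
+For example, putting the following single line into `~/.config/youtube-dl/config` (the user-wide location on Linux; see the [configuration](#configuration) section for other systems) applies it to every download:
+
+    -o /home/user/videos/%(title)s-%(id)s.%(ext)s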
 
-### How do I download a video starting with a `-` ?
+### How do I download a video starting with a `-`?
 
 Either prepend `http://www.youtube.com/watch?v=` or separate the ID from the options with `--`:
 
     youtube-dl -- -wNyEUrxzFU
     youtube-dl "http://www.youtube.com/watch?v=-wNyEUrxzFU"
 
+### How do I pass cookies to youtube-dl?
+
+Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`. Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have the correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows, `LF` (`\n`) for Linux and `CR` (`\r`) for Mac OS. `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of an invalid newline format.
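+
+A minimal sketch of that format (fields are tab-separated; the domain, expiry timestamp and cookie shown are made up):
+
+    # Netscape HTTP Cookie File
+    .example.com	TRUE	/	FALSE	1445412480	cookiename	cookievalue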
+
+Passing cookies to youtube-dl is a good way to work around login when a particular extractor does not implement it explicitly. Another use case is working around the [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) that some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare).
+
 ### Can you add support for this anime video site, or site which shows current movies for free?
 
 As a matter of policy (as well as legality), youtube-dl does not include support for services that specialize in infringing copyright. As a rule of thumb, if you cannot easily find a video that the service is quite obviously allowed to distribute (i.e. that has been uploaded by the creator, the creator's distributor, or is published under a free license), the service is probably unfit for inclusion in youtube-dl.
@@ -484,14 +848,16 @@ To run the test, simply invoke your favorite test runner, or execute a test file
 If you want to create a build of youtube-dl yourself, you'll need
 
 * python
-* make
+* make (both GNU make and BSD make are supported)
 * pandoc
 * zip
 * nosetests
 
 ### Adding support for a new site
 
-If you want to add support for a new site, you can follow this quick list (assuming your service is called `yourextractor`):
+If you want to add support for a new site, first of all **make sure** this site is **not dedicated to [copyright infringement](#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free)**. youtube-dl does **not support** such sites, and thus pull requests adding support for them **will be rejected**.
+
+After you have ensured this site is distributing its content legally, you can follow this quick list (assuming your service is called `yourextractor`):
 
 1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
 2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git`
@@ -527,27 +893,29 @@ If you want to add support for a new site, you can follow this quick list (assum
             webpage = self._download_webpage(url, video_id)
 
             # TODO more code goes here, for example ...
-            title = self._html_search_regex(r'<h1>(.*?)</h1>', webpage, 'title')
+            title = self._html_search_regex(r'<h1>(.+?)</h1>', webpage, 'title')
 
             return {
                 'id': video_id,
                 'title': title,
                 'description': self._og_search_description(webpage),
+                'uploader': self._search_regex(r'<div[^>]+id="uploader"[^>]*>([^<]+)<', webpage, 'uploader', fatal=False),
                 # TODO more properties (see youtube_dl/extractor/common.py)
             }
     ```
-5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py).
-6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will be then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
-7. Have a look at [`youtube_dl/common/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L38). Add tests and code for as many as you want.
-8. If you can, check the code with [flake8](https://pypi.python.org/pypi/flake8).
-9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
-
-        $ git add youtube_dl/extractor/__init__.py
+5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
+6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
+7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
+8. Keep in mind that the only mandatory fields in the info dict for a successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data without which the extraction does not make any sense. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from the aforementioned mandatory ones should be treated **as optional**, and extraction should be **tolerant** of situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of the general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into the resulting info dict as `description`, you should be prepared for this key to be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex`/`_html_search_regex`.
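+
+    A minimal sketch of such tolerant extraction inside `_real_extract` from the example above (the `meta` dict, its `summary` key and the `uploader` regex are hypothetical):
+
+    ```python
+    # optional field: a missing key yields None instead of a KeyError
+    description = meta.get('summary')
+    # optional field: fatal=False returns None instead of aborting extraction
+    uploader = self._search_regex(
+        r'<div[^>]+id="uploader"[^>]*>([^<]+)<',
+        webpage, 'uploader', fatal=False)
+    ```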
+9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
+10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
+
+        $ git add youtube_dl/extractor/extractors.py
         $ git add youtube_dl/extractor/yourextractor.py
         $ git commit -m '[yourextractor] Add new extractor'
         $ git push origin yourextractor
 
-10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
+11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
 
 In any case, thank you very much for your contributions!
 
@@ -566,7 +934,7 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
     ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
 ```
 
-Most likely, you'll want to use various options. For a list of what can be done, have a look at [youtube_dl/YoutubeDL.py](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L69). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
+Most likely, you'll want to use various options. For a list of what can be done, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L121-L269). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
 
 Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file:
 
@@ -607,11 +975,25 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
 
 # BUGS
 
-Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues> . Unless you were prompted so or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the irc channel #youtube-dl on freenode.
+Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted to do so or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](http://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
 
-**Please include the full output of youtube-dl when run with `-v`**.
+**Please include the full output of youtube-dl when run with `-v`**, i.e. **add** the `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
+```
+$ youtube-dl -v <your command line>
+[debug] System config: []
+[debug] User config: []
+[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
+[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
+[debug] youtube-dl version 2015.12.06
+[debug] Git HEAD: 135392e
+[debug] Python version 2.6.6 - Windows-2003Server-5.2.3790-SP2
+[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
+[debug] Proxy map: {}
+...
+```
+**Do not post screenshots of the verbose log; only plain text is acceptable.**
 
-The output (including the first lines) contain important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
+The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
 
 Please re-read your issue to avoid a couple of common mistakes (you can and should use this as a checklist):
 
@@ -625,21 +1007,21 @@ So please elaborate on what feature you are requesting, or what bug you want to
 - How it could be fixed
 - What your proposed solution would look like
 
-If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. As a commiter myself, I often get frustrated by these issues, since the only possible way for me to move forward on them is to ask for clarification over and over.
+If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. As a committer myself, I often get frustrated by these issues, since the only possible way for me to move forward on them is to ask for clarification over and over.
 
-For bug reports, this means that your report should contain the *complete* output of youtube-dl when called with the -v flag. The error message you get for (most) bugs even says so, but you would not believe how many of our bug reports do not contain this information.
+For bug reports, this means that your report should contain the *complete* output of youtube-dl when called with the `-v` flag. The error message you get for (most) bugs even says so, but you would not believe how many of our bug reports do not contain this information.
 
-If your server has multiple IPs or you suspect censorship, adding --call-home may be a good idea to get more diagnostics. If the error is `ERROR: Unable to extract ...` and you cannot reproduce it from multiple countries, add `--dump-pages` (warning: this will yield a rather large output, redirect it to the file `log.txt` by adding `>log.txt 2>&1` to your command-line) or upload the `.dump` files you get when you add `--write-pages` [somewhere](https://gist.github.com/).
+If your server has multiple IPs or you suspect censorship, adding `--call-home` may be a good idea to get more diagnostics. If the error is `ERROR: Unable to extract ...` and you cannot reproduce it from multiple countries, add `--dump-pages` (warning: this will yield a rather large output, redirect it to the file `log.txt` by adding `>log.txt 2>&1` to your command-line) or upload the `.dump` files you get when you add `--write-pages` [somewhere](https://gist.github.com/).
 
-**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like http://www.youtube.com/watch?v=BaW_jenozKc . There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. http://www.youtube.com/ ) is *not* an example URL.
+**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `http://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `http://www.youtube.com/`) is *not* an example URL.
 
 ###  Are you using the latest version?
 
-Before reporting any issue, type youtube-dl -U. This should report that you're up-to-date. About 20% of the reports we receive are already fixed, but people are using outdated versions. This goes for feature requests as well.
+Before reporting any issue, type `youtube-dl -U`. This should report that you're up-to-date. About 20% of the reports we receive are already fixed, but people are using outdated versions. This goes for feature requests as well.
 
 ###  Is the issue already documented?
 
-Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or at https://github.com/rg3/youtube-dl/search?type=Issues . If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
+Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or browse the [GitHub Issues](https://github.com/rg3/youtube-dl/search?type=Issues) of this repository. If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
 
 ###  Why are existing options not enough?
 
@@ -669,4 +1051,4 @@ It may sound strange, but some bug reports we receive are completely unrelated t
 
 youtube-dl is released into the public domain by the copyright holders.
 
-This README file was originally written by Daniel Bolton (<https://github.com/dbbolton>) and is likewise released into the public domain.
+This README file was originally written by [Daniel Bolton](https://github.com/dbbolton) and is likewise released into the public domain.
index cd26cc0895d033af03541f48815e8dad23f5161d..ce68f26f9ca39bd298f5d4149346af686257e042 100755 (executable)
@@ -5,7 +5,7 @@ import os
 from os.path import dirname as dirn
 import sys
 
-sys.path.append(dirn(dirn((os.path.abspath(__file__)))))
+sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
 import youtube_dl
 
 BASH_COMPLETION_FILE = "youtube-dl.bash-completion"
index c2f2387980d0867adc3cea4c3e14047fdaf1aa28..41629d87d006fbaf4ba90cbb87bf60388fb7f7e5 100755 (executable)
@@ -6,7 +6,7 @@ import os
 from os.path import dirname as dirn
 import sys
 
-sys.path.append(dirn(dirn((os.path.abspath(__file__)))))
+sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
 import youtube_dl
 from youtube_dl.utils import shell_quote
 
index 3663c8afef278f132518ed9c0286bdfe34d028a7..e6c3abc8d8c716db6adbb62598a9d9179fcaa2da 100755 (executable)
@@ -5,7 +5,7 @@ from __future__ import with_statement, unicode_literals
 
 import datetime
 import glob
-import io  # For Python 2 compatibilty
+import io  # For Python 2 compatibility
 import os
 import re
 
index d3ef5f0b50daa56513118f55d5b636e5f46552a0..503c1372fd3589f45a207d043999a5286f6c5e1e 100755 (executable)
@@ -6,7 +6,7 @@ import os
 import textwrap
 
 # We must be able to import youtube_dl
-sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
 
 import youtube_dl
 
diff --git a/devscripts/lazy_load_template.py b/devscripts/lazy_load_template.py
new file mode 100644 (file)
index 0000000..2e6e664
--- /dev/null
@@ -0,0 +1,19 @@
+# encoding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+
+class LazyLoadExtractor(object):
+    _module = None
+
+    @classmethod
+    def ie_key(cls):
+        return cls.__name__[:-2]
+
+    def __new__(cls, *args, **kwargs):
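+        # Only now import the real extractor module, then instantiate the
+        # real class in place of this lazy stub.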
+        mod = __import__(cls._module, fromlist=(cls.__name__,))
+        real_cls = getattr(mod, cls.__name__)
+        instance = real_cls.__new__(real_cls)
+        instance.__init__(*args, **kwargs)
+        return instance
diff --git a/devscripts/make_issue_template.py b/devscripts/make_issue_template.py
new file mode 100644 (file)
index 0000000..b7ad23d
--- /dev/null
@@ -0,0 +1,29 @@
+#!/usr/bin/env python
+from __future__ import unicode_literals
+
+import io
+import optparse
+
+
+def main():
+    parser = optparse.OptionParser(usage='%prog INFILE OUTFILE')
+    options, args = parser.parse_args()
+    if len(args) != 2:
+        parser.error('Expected an input and an output filename')
+
+    infile, outfile = args
+
+    with io.open(infile, encoding='utf-8') as inf:
+        issue_template_tmpl = inf.read()
+
+    # Get the version from youtube_dl/version.py without importing the package
+    exec(compile(open('youtube_dl/version.py').read(),
+                 'youtube_dl/version.py', 'exec'))
+
+    out = issue_template_tmpl % {'version': locals()['__version__']}
+
+    with io.open(outfile, 'w', encoding='utf-8') as outf:
+        outf.write(out)
+
+if __name__ == '__main__':
+    main()
diff --git a/devscripts/make_lazy_extractors.py b/devscripts/make_lazy_extractors.py
new file mode 100644 (file)
index 0000000..b5a8b91
--- /dev/null
@@ -0,0 +1,63 @@
+from __future__ import unicode_literals, print_function
+
+from inspect import getsource
+import os
+from os.path import dirname as dirn
+import sys
+
+print('WARNING: Lazy loading extractors is an experimental feature that may not always work', file=sys.stderr)
+
+sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
+
+lazy_extractors_filename = sys.argv[1]
+if os.path.exists(lazy_extractors_filename):
+    os.remove(lazy_extractors_filename)
+
+from youtube_dl.extractor import _ALL_CLASSES
+from youtube_dl.extractor.common import InfoExtractor
+
+with open('devscripts/lazy_load_template.py', 'rt') as f:
+    module_template = f.read()
+
+module_contents = [module_template + '\n' + getsource(InfoExtractor.suitable)]
+
+ie_template = '''
+class {name}(LazyLoadExtractor):
+    _VALID_URL = {valid_url!r}
+    _module = '{module}'
+'''
+
+make_valid_template = '''
+    @classmethod
+    def _make_valid_url(cls):
+        return {valid_url!r}
+'''
+
+
+def build_lazy_ie(ie, name):
+    valid_url = getattr(ie, '_VALID_URL', None)
+    s = ie_template.format(
+        name=name,
+        valid_url=valid_url,
+        module=ie.__module__)
+    if ie.suitable.__func__ is not InfoExtractor.suitable.__func__:
+        s += '\n' + getsource(ie.suitable)
+    if hasattr(ie, '_make_valid_url'):
+        # search extractors
+        s += make_valid_template.format(valid_url=ie._make_valid_url())
+    return s
+
+names = []
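+# Sort extractors alphabetically by key, but keep the final class (the
+# generic catch-all extractor) at the end so it is still tried last.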
+for ie in list(sorted(_ALL_CLASSES[:-1], key=lambda cls: cls.ie_key())) + _ALL_CLASSES[-1:]:
+    name = ie.ie_key() + 'IE'
+    src = build_lazy_ie(ie, name)
+    module_contents.append(src)
+    names.append(name)
+
+module_contents.append(
+    '_ALL_CLASSES = [{0}]'.format(', '.join(names)))
+
+module_src = '\n'.join(module_contents) + '\n'
+
+with open(lazy_extractors_filename, 'wt') as f:
+    f.write(module_src)
index 3df4385a6b09a520791f67f8f3958c0b635df1bf..8cb4a46380253643e6df2370058c433094cf159b 100644 (file)
@@ -9,7 +9,7 @@ import sys
 
 # Import youtube_dl
 ROOT_DIR = os.path.join(os.path.dirname(__file__), '..')
-sys.path.append(ROOT_DIR)
+sys.path.insert(0, ROOT_DIR)
 import youtube_dl
 
 
index 7ece37754d1003ba4cbe63fed109eef00711adcd..776e6556e5b2bd683acbcf79d7bc07431be6548a 100644 (file)
@@ -8,6 +8,35 @@ import re
 ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
 README_FILE = os.path.join(ROOT_DIR, 'README.md')
 
+
+def filter_options(readme):
+    ret = ''
+    in_options = False
+    for line in readme.split('\n'):
+        if line.startswith('# '):
+            if line[2:].startswith('OPTIONS'):
+                in_options = True
+            else:
+                in_options = False
+
+        if in_options:
+            if line.lstrip().startswith('-'):
+                option, description = re.split(r'\s{2,}', line.lstrip())
+                split_option = option.split(' ')
+
+                if not split_option[-1].startswith('-'):  # metavar
+                    option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]])
+
+                # Pandoc's definition_lists. See http://pandoc.org/README.html
+                # for more information.
+                ret += '\n%s\n:   %s\n' % (option, description)
+            else:
+                ret += line.lstrip() + '\n'
+        else:
+            ret += line + '\n'
+
+    return ret
+
 with io.open(README_FILE, encoding='utf-8') as f:
     readme = f.read()
 
@@ -26,6 +55,8 @@ readme = re.sub(r'(?s)^.*?(?=# DESCRIPTION)', '', readme)
 readme = re.sub(r'\s+youtube-dl \[OPTIONS\] URL \[URL\.\.\.\]', '', readme)
 readme = PREFIX + readme
 
+readme = filter_options(readme)
+
 if sys.version_info < (3, 0):
     print(readme.encode('utf-8'))
 else:
index 61806961c63798dfab081ceae5d76f9afc0ff773..8dea55dbbc6a4b577c18d9255a9ed010d95c6f2e 100755 (executable)
@@ -45,9 +45,9 @@ fi
 /bin/echo -e "\n### Changing version in version.py..."
 sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
 
-/bin/echo -e "\n### Committing documentation and youtube_dl/version.py..."
-make README.md CONTRIBUTING.md supportedsites
-git add README.md CONTRIBUTING.md docs/supportedsites.md youtube_dl/version.py
+/bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..."
+make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites
+git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py
 git commit -m "release $version"
 
 /bin/echo -e "\n### Now tagging, signing and pushing..."
index f200f2c80aef4da29999ddba75bc1b17e3048a54..04728e8e2ce763ca886853061875c59e4f645921 100755 (executable)
@@ -5,7 +5,7 @@ import os
 from os.path import dirname as dirn
 import sys
 
-sys.path.append(dirn(dirn((os.path.abspath(__file__)))))
+sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
 import youtube_dl
 
 ZSH_COMPLETION_FILE = "youtube-dl.zsh"
index 657935dc6597b00dab2c45e37380a4c52968f139..03875b8dbe9f0fe4c3d7bbb6ac6ddc88d5cafab5 100644 (file)
@@ -1,6 +1,7 @@
 # Supported sites
  - **1tv**: Первый канал
  - **1up.com**
+ - **20min**
  - **220.ro**
  - **22tracks:genre**
  - **22tracks:track**
  - **abc.net.au**
  - **Abc7News**
  - **AcademicEarth:Course**
+ - **acast**
+ - **acast:channel**
  - **AddAnime**
  - **AdobeTV**
+ - **AdobeTVChannel**
+ - **AdobeTVShow**
  - **AdobeTVVideo**
  - **AdultSwim**
- - **Aftenposten**
+ - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
  - **Aftonbladet**
  - **AirMozilla**
  - **AlJazeera**
  - **Allocine**
  - **AlphaPorno**
+ - **AnimeOnDemand**
  - **anitube.se**
  - **AnySex**
  - **Aparat**
  - **AppleConnect**
  - **AppleDaily**: 臺灣蘋果日報
- - **AppleTrailers**
+ - **appletrailers**
+ - **appletrailers:section**
  - **archive.org**: archive.org videos
  - **ARD**
+ - **ARD:mediathek**: Saarländischer Rundfunk
  - **ARD:mediathek**
  - **arte.tv**
  - **arte.tv:+7**
+ - **arte.tv:cinema**
  - **arte.tv:concert**
  - **arte.tv:creative**
  - **arte.tv:ddc**
  - **arte.tv:embed**
  - **arte.tv:future**
+ - **arte.tv:info**
+ - **arte.tv:magazine**
  - **AtresPlayer**
  - **ATTTechChannel**
+ - **AudiMedia**
+ - **AudioBoom**
  - **audiomack**
  - **audiomack:album**
+ - **auroravid**: AuroraVid
  - **Azubu**
+ - **AzubuLive**
  - **BaiduVideo**: 百度视频
  - **bambuser**
  - **bambuser:channel**
  - **Bandcamp:album**
  - **bbc**: BBC
  - **bbc.co.uk**: BBC iPlayer
+ - **bbc.co.uk:article**: BBC articles
  - **BeatportPro**
  - **Beeg**
  - **BehindKink**
  - **Bet**
+ - **Bigflix**
  - **Bild**: Bild.de
  - **BiliBili**
+ - **BioBioChileTV**
+ - **BleacherReport**
+ - **BleacherReportCMS**
  - **blinkx**
- - **blip.tv:user**
- - **BlipTV**
  - **Bloomberg**
+ - **BokeCC**
  - **Bpb**: Bundeszentrale für politische Bildung
  - **BR**: Bayerischer Rundfunk Mediathek
+ - **BravoTV**
  - **Break**
- - **Brightcove**
+ - **brightcove:legacy**
+ - **brightcove:new**
  - **bt:article**: Bergens Tidende Articles
  - **bt:vestlendingen**: Bergens Tidende - Vestlendingen
  - **BuzzFeed**
  - **BYUtv**
  - **Camdemy**
  - **CamdemyFolder**
- - **Canal13cl**
+ - **CamWithHer**
  - **canalc2.tv**
  - **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
+ - **Canvas**
+ - **CBC**
+ - **CBCPlayer**
  - **CBS**
+ - **CBSInteractive**
  - **CBSNews**: CBS News
+ - **CBSNewsLiveVideo**: CBS News Live Videos
  - **CBSSports**
+ - **CDA**
  - **CeskaTelevize**
  - **channel9**: Channel 9
+ - **Chaturbate**
  - **Chilloutzone**
  - **chirbit**
  - **chirbit:profile**
  - **Cinchcast**
  - **Cinemassacre**
- - **clipfish**
+ - **Clipfish**
  - **cliphunter**
+ - **ClipRs**
  - **Clipsyndicate**
+ - **cloudtime**: CloudTime
  - **Cloudy**
  - **Clubic**
+ - **Clyp**
  - **cmt.com**
- - **CNET**
+ - **CNBC**
  - **CNN**
  - **CNNArticle**
  - **CNNBlogs**
  - **ComCarCoff**
  - **ComedyCentral**
  - **ComedyCentralShows**: The Daily Show / The Colbert Report
- - **CondeNast**: Condé Nast media group: Condé Nast, GQ, Glamour, Vanity Fair, Vogue, W Magazine, WIRED
+ - **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
  - **Cracked**
+ - **Crackle**
  - **Criterion**
  - **CrooksAndLiars**
  - **Crunchyroll**
  - **crunchyroll:playlist**
+ - **CSNNE**
  - **CSpan**: C-SPAN
  - **CtsNews**: 華視新聞
  - **culturebox.francetvinfo.fr**
+ - **CultureUnplugged**
+ - **CWTV**
  - **dailymotion**
  - **dailymotion:playlist**
  - **dailymotion:user**
  - **DailymotionCloud**
  - **daum.net**
+ - **daum.net:clip**
+ - **daum.net:playlist**
+ - **daum.net:user**
  - **DBTV**
+ - **DCN**
+ - **dcn:live**
+ - **dcn:season**
+ - **dcn:video**
  - **DctpTv**
  - **DeezerPlaylist**
  - **defense.gouv.fr**
+ - **democracynow**
  - **DHM**: Filmarchiv - Deutsches Historisches Museum
+ - **DigitallySpeaking**
+ - **Digiteka**
  - **Discovery**
- - **divxstage**: DivxStage
  - **Dotsub**
  - **DouyuTV**: 斗鱼
+ - **DPlay**
  - **dramafever**
  - **dramafever:series**
  - **DRBonanza**
  - **Dropbox**
  - **DrTuber**
  - **DRTV**
- - **Dump**
  - **Dumpert**
  - **dvtv**: http://video.aktualne.cz/
+ - **dw**
+ - **dw:article**
  - **EaglePlatform**
  - **EbaumsWorld**
  - **EchoMsk**
  - **Eporner**
  - **EroProfile**
  - **Escapist**
- - **ESPN** (Currently broken)
+ - **ESPN**
+ - **EsriVideo**
+ - **Europa**
  - **EveryonesMixtape**
  - **exfm**: ex.fm
  - **ExpoTV**
  - **facebook**
  - **faz.net**
  - **fc2**
+ - **Fczenit**
+ - **features.aol.com**
  - **fernsehkritik.tv**
- - **fernsehkritik.tv:postecke**
  - **Firstpost**
  - **FiveTV**
  - **Flickr**
  - **Folketinget**: Folketinget (ft.dk; Danish parliament)
  - **FootyRoom**
+ - **FOX**
  - **Foxgay**
- - **FoxNews**
+ - **FoxNews**: Fox News and Fox Business Video
  - **FoxSports**
  - **france2.fr:generation-quoi**
  - **FranceCulture**
+ - **FranceCultureEmission**
  - **FranceInter**
  - **francetv**: France 2, 3, 4, 5 and Ô
  - **francetvinfo.fr**
  - **Freesound**
  - **freespeech.org**
  - **FreeVideo**
+ - **Funimation**
  - **FunnyOrDie**
+ - **GameInformer**
  - **Gamekings**
  - **GameOne**
  - **gameone:playlist**
  - **Giga**
  - **Glide**: Glide mobile video messages (glide.me)
  - **Globo**
+ - **GloboArticle**
  - **GodTube**
  - **GoldenMoustache**
  - **Golem**
- - **GorillaVid**: GorillaVid.in, daclips.in, movpod.in, fastvideo.in and realvid.net
+ - **GoogleDrive**
  - **Goshgay**
+ - **GPUTechConf**
  - **Groupon**
  - **Hark**
+ - **HBO**
  - **HearThisAt**
  - **Heise**
  - **HellPorno**
  - **Helsinki**: helsinki.fi
  - **HentaiStigma**
  - **HistoricFilms**
- - **History**
  - **hitbox**
  - **hitbox:live**
  - **HornBunny**
- - **HostingBulk**
  - **HotNewHipHop**
+ - **HotStar**
  - **Howcast**
  - **HowStuffWorks**
  - **HuffPost**: Huffington Post
  - **imdb**: Internet Movie Database trailers
  - **imdb:list**: Internet Movie Database lists
  - **Imgur**
+ - **ImgurAlbum**
  - **Ina**
+ - **Indavideo**
+ - **IndavideoEmbed**
  - **InfoQ**
  - **Instagram**
  - **instagram:user**: Instagram user profile
  - **Ir90Tv**
  - **ivi**: ivi.ru
  - **ivi:compilation**: ivi.ru compilations
+ - **ivideon**: Ivideon TV
  - **Izlesene**
- - **JadoreCettePub**
  - **JeuxVideo**
  - **Jove**
  - **jpopsuki.tv**
- - **Jukebox**
+ - **JWPlatform**
  - **Kaltura**
  - **KanalPlay**: Kanal 5/9/11 Play
  - **Kankan**
  - **KeezMovies**
  - **KhanAcademy**
  - **KickStarter**
+ - **KonserthusetPlay**
  - **kontrtube**: KontrTube.ru - Труба зовёт
  - **KrasView**: Красвью
  - **Ku6**
+ - **KUSI**
  - **kuwo:album**: 酷我音乐 - 专辑
  - **kuwo:category**: 酷我音乐 - 分类
  - **kuwo:chart**: 酷我音乐 - 排行榜
  - **kuwo:song**: 酷我音乐
  - **la7.tv**
  - **Laola1Tv**
+ - **Le**: 乐视网
  - **Lecture2Go**
- - **Letv**: 乐视网
- - **LetvPlaylist**
- - **LetvTv**
+ - **Lemonde**
+ - **LePlaylist**
+ - **LetvCloud**: 乐视云
  - **Libsyn**
  - **life:embed**
  - **lifenews**: LIFE | NEWS
+ - **limelight**
+ - **limelight:channel**
+ - **limelight:channel_list**
  - **LiveLeak**
  - **livestream**
  - **livestream:original**
  - **LnkGo**
+ - **LoveHomePorn**
  - **lrt.lt**
  - **lynda**: lynda.com videos
  - **lynda:course**: lynda.com online courses
  - **m6**
  - **macgamestore**: MacGameStore trailers
  - **mailru**: Видео@Mail.Ru
+ - **MakersChannel**
+ - **MakerTV**
  - **Malemotion**
- - **MDR**
+ - **MatchTV**
+ - **MDR**: MDR.DE and KiKA
  - **media.ccc.de**
- - **MegaVideoz**
  - **metacafe**
  - **Metacritic**
  - **Mgoon**
+ - **MGTV**: 芒果TV
  - **Minhateca**
  - **MinistryGrid**
+ - **Minoto**
  - **miomio.tv**
- - **mitele.es**
+ - **MiTele**: mitele.es
  - **mixcloud**
+ - **mixcloud:playlist**
+ - **mixcloud:stream**
+ - **mixcloud:user**
  - **MLB**
+ - **Mnet**
  - **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
  - **Mofosex**
  - **Mojvideo**
  - **Moniker**: allmyvideos.net and vidspot.net
- - **mooshare**: Mooshare.biz
  - **Morningstar**: morningstar.com
  - **Motherless**
  - **Motorsport**: motorsport.com
  - **MovieClips**
  - **MovieFap**
  - **Moviezine**
- - **movshare**: MovShare
  - **MPORA**
+ - **MSNBC**
  - **MTV**
+ - **mtv.de**
  - **mtviggy.com**
  - **mtvservices:embedded**
  - **MuenchenTV**: münchen.tv
  - **MusicPlayOn**
- - **MusicVault**
  - **muzu.tv**
+ - **Mwave**
  - **MySpace**
  - **MySpace:album**
  - **MySpass**
  - **Myvi**
- - **myvideo**
+ - **myvideo** (Currently broken)
  - **MyVidster**
- - **N-JOY**
  - **n-tv.de**
- - **NationalGeographic**
+ - **natgeo**
+ - **natgeo:channel**
  - **Naver**
  - **NBA**
  - **NBC**
  - **NBCNews**
  - **NBCSports**
  - **NBCSportsVPlayer**
- - **ndr**: NDR.de - Mediathek
+ - **ndr**: NDR.de - Norddeutscher Rundfunk
+ - **ndr:embed**
+ - **ndr:embed:base**
  - **NDTV**
  - **NerdCubedFeed**
- - **Nerdist**
  - **netease:album**: 网易云音乐 - 专辑
  - **netease:djradio**: 网易云音乐 - 电台
  - **netease:mv**: 网易云音乐 - MV
  - **Newstube**
  - **NextMedia**: 蘋果日報
  - **NextMediaActionNews**: 蘋果日報 - 動新聞
+ - **nextmovie.com**
  - **nfb**: National Film Board of Canada
  - **nfl.com**
  - **nhl.com**
  - **nhl.com:news**: NHL news
- - **nhl.com:videocenter**: NHL videocenter category
+ - **nhl.com:videocenter**
+ - **nhl.com:videocenter:category**: NHL videocenter category
+ - **nick.com**
  - **niconico**: ニコニコ動画
  - **NiconicoPlaylist**
+ - **njoy**: N-JOY
+ - **njoy:embed**
  - **Noco**
  - **Normalboots**
  - **NosVideo**
  - **Nova**: TN.cz, Prásk.tv, Nova.cz, Novaplus.cz, FANDA.tv, Krásná.cz and Doma.cz
- - **novamov**: NovaMov
- - **Nowness**
- - **NowTV**
+ - **nowness**
+ - **nowness:playlist**
+ - **nowness:series**
+ - **NowTV** (Currently broken)
+ - **NowTVList**
  - **nowvideo**: NowVideo
- - **npo**: npo.nl and ntr.nl
+ - **Noz**
  - **npo**: npo.nl and ntr.nl
  - **npo.nl:live**
  - **npo.nl:radio**
  - **npo.nl:radio:fragment**
+ - **Npr**
  - **NRK**
  - **NRKPlaylist**
+ - **NRKSkole**: NRK Skole
  - **NRKTV**: NRK TV and NRK Radio
  - **ntv.ru**
  - **Nuvid**
  - **OnionStudios**
  - **Ooyala**
  - **OoyalaExternal**
- - **OpenFilm**
+ - **Openload**
+ - **OraTV**
  - **orf:fm4**: radio FM4
  - **orf:iptv**: iptv.ORF.at
  - **orf:oe1**: Radio Österreich 1
  - **orf:tvthek**: ORF TVthek
+ - **pandora.tv**: 판도라TV
  - **parliamentlive.tv**: UK parliament videos
  - **Patreon**
- - **PBS**
+ - **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! (WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET  (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
+ - **pcmag**
+ - **People**
+ - **Periscope**: Periscope
  - **PhilharmonieDeParis**: Philharmonie de Paris
- - **Phoenix**
+ - **phoenix.de**
  - **Photobucket**
  - **Pinkbike**
  - **Pladform**
- - **PlanetaPlay**
  - **play.fm**
  - **played.to**
+ - **PlaysTV**
+ - **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
  - **Playvid**
  - **Playwire**
+ - **pluralsight**
+ - **pluralsight:course**
  - **plus.google**: Google Plus
  - **pluzz.francetv.fr**
  - **podomatic**
  - **PornHd**
  - **PornHub**
  - **PornHubPlaylist**
+ - **PornHubUserVideos**
  - **Pornotube**
  - **PornoVoisines**
  - **PornoXO**
+ - **PressTV**
  - **PrimeShareTV**
  - **PromptFile**
  - **prosiebensat1**: ProSiebenSat.1 Digital
  - **qqmusic:playlist**: QQ音乐 - 歌单
  - **qqmusic:singer**: QQ音乐 - 歌手
  - **qqmusic:toplist**: QQ音乐 - 排行榜
- - **QuickVid**
  - **R7**
  - **radio.de**
  - **radiobremen**
  - **radiofrance**
  - **RadioJavan**
  - **Rai**
+ - **RaiTV**
  - **RBMARadio**
  - **RDS**: RDS.ca
  - **RedTube**
+ - **RegioTV**
  - **Restudy**
  - **ReverbNation**
+ - **Revision3**
+ - **RICE**
  - **RingTV**
  - **RottenTomatoes**
  - **Roxwel**
  - **RTBF**
- - **Rte**
+ - **rte**: Raidió Teilifís Éireann TV
+ - **rte:radio**: Raidió Teilifís Éireann radio
  - **rtl.nl**: rtl.nl and rtlxl.nl
  - **RTL2**
  - **RTP**
  - **rtve.es:alacarta**: RTVE a la carta
  - **rtve.es:infantil**: RTVE infantil
  - **rtve.es:live**: RTVE.es live streams
+ - **RTVNH**
  - **RUHD**
+ - **RulePorn**
  - **rutube**: Rutube videos
  - **rutube:channel**: Rutube channels
  - **rutube:embed**: Rutube embedded videos
  - **RUTV**: RUTV.RU
  - **Ruutu**
  - **safari**: safaribooksonline.com online video
+ - **safari:api**
  - **safari:course**: safaribooksonline.com online courses
  - **Sandia**: Sandia National Laboratories
  - **Sapo**: SAPO Vídeos
  - **savefrom.net**
  - **SBS**: sbs.com.au
+ - **schooltv**
  - **SciVee**
  - **screen.yahoo:search**: Yahoo screen search
  - **Screencast**
  - **ScreencastOMatic**
+ - **ScreenJunkies**
  - **ScreenwaveMedia**
  - **SenateISVP**
  - **ServingSys**
  - **Sexu**
  - **SexyKarma**: Sexy Karma and Watch Indian Porn
- - **Shared**
+ - **Shahid**
+ - **Shared**: shared.sx and vivo.sx
  - **ShareSix**
  - **Sina**
+ - **skynewsarabia:video**
  - **Slideshare**
  - **Slutload**
  - **smotri**: Smotri.com
  - **SnagFilmsEmbed**
  - **Snotr**
  - **Sohu**
- - **soompi**
- - **soompi:show**
  - **soundcloud**
  - **soundcloud:playlist**
+ - **soundcloud:search**: Soundcloud search
  - **soundcloud:set**
  - **soundcloud:user**
  - **soundgasm**
  - **southpark.de**
  - **southpark.nl**
  - **southparkstudios.dk**
- - **Space**
  - **SpankBang**
  - **Spankwire**
  - **Spiegel**
  - **SportBoxEmbed**
  - **SportDeutschland**
  - **Sportschau**
- - **Srf**
- - **SRMediathek**: Saarländischer Rundfunk
+ - **SRGSSR**
+ - **SRGSSRPlay**: srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites
  - **SSA**
  - **stanfordoc**: Stanford Open ClassRoom
  - **Steam**
+ - **Stitcher**
  - **streamcloud.eu**
  - **StreamCZ**
  - **StreetVoice**
  - **Tagesschau**
  - **Tapely**
  - **Tass**
+ - **TDSLifeway**
  - **teachertube**: teachertube.com videos
  - **teachertube:user:collection**: teachertube.com user and collection videos
  - **TeachingChannel**
  - **TechTalks**
  - **techtv.mit.edu**
  - **ted**
+ - **Tele13**
  - **TeleBruxelles**
- - **telecinco.es**
+ - **Telecinco**: telecinco.es, cuatro.com and mediaset.es
+ - **Telegraaf**
  - **TeleMB**
  - **TeleTask**
- - **TenPlay**
- - **TestTube**
  - **TF1**
- - **TheOnion**
+ - **TheIntercept**
  - **ThePlatform**
+ - **ThePlatformFeed**
+ - **TheScene**
  - **TheSixtyOne**
+ - **TheStar**
  - **ThisAmericanLife**
  - **ThisAV**
  - **THVideo**
  - **THVideoPlaylist**
  - **tinypic**: tinypic.com videos
- - **tlc.com**
  - **tlc.de**
  - **TMZ**
  - **TMZArticle**
  - **TNAFlix**
+ - **TNAFlixNetworkEmbed**
+ - **toggle**
  - **tou.tv**
  - **Toypics**: Toypics user profile
  - **ToypicsUser**: Toypics user profile
  - **TrailerAddict** (Currently broken)
  - **Trilulilu**
+ - **trollvids**
  - **TruTube**
  - **Tube8**
  - **TubiTv**
- - **Tudou**
+ - **tudou**
+ - **tudou:album**
+ - **tudou:playlist**
  - **Tumblr**
- - **TuneIn**
+ - **tunein:clip**
+ - **tunein:program**
+ - **tunein:station**
+ - **tunein:topic**
  - **Turbo**
  - **Tutv**
  - **tv.dfb.de**
  - **TV2**
  - **TV2Article**
+ - **TV3**
  - **TV4**: tv4.se and tv4play.se
  - **TVC**
  - **TVCArticle**
  - **tvigle**: Интернет-телевидение Tvigle.ru
+ - **tvland.com**
  - **tvp.pl**
  - **tvp.pl:Series**
  - **TVPlay**: TV3Play and related services
  - **twitch:stream**
  - **twitch:video**
  - **twitch:vod**
- - **TwitterCard**
- - **Ubu**
+ - **twitter**
+ - **twitter:amplify**
+ - **twitter:card**
  - **udemy**
  - **udemy:course**
  - **UDNEmbed**: 聯合影音
- - **Ultimedia**
  - **Unistra**
  - **Urort**: NRK P3 Urørt
+ - **USAToday**
  - **ustream**
  - **ustream:channel**
+ - **Ustudio**
  - **Varzesh3**
  - **Vbox7**
  - **VeeHD**
  - **Vessel**
  - **Vesti**: Вести.Ru
  - **Vevo**
- - **VGTV**: VGTV and BTTV
+ - **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet
  - **vh1.com**
  - **Vice**
+ - **ViceShow**
  - **Viddler**
  - **video.google:search**: Google Video search
  - **video.mit.edu**
- - **VideoBam**
  - **VideoDetective**
  - **videofy.me**
- - **videolectures.net**
  - **VideoMega**
+ - **videomore**
+ - **videomore:season**
+ - **videomore:video**
  - **VideoPremium**
- - **VideoTt**: video.tt - Your True Tube
+ - **VideoTt**: video.tt - Your True Tube (Currently broken)
  - **videoweed**: VideoWeed
- - **Vidme**
+ - **vidme**
+ - **vidme:user**
+ - **vidme:user:likes**
  - **Vidzi**
  - **vier**
  - **vier:videos**
  - **Viewster**
+ - **Viidea**
  - **viki**
  - **viki:channel**
  - **vimeo**
  - **vimeo:channel**
  - **vimeo:group**
  - **vimeo:likes**: Vimeo user likes
+ - **vimeo:ondemand**
  - **vimeo:review**: Review pages on vimeo
  - **vimeo:user**
  - **vimeo:watchlater**: Vimeo watch later list, "vimeowatchlater" keyword (requires authentication)
  - **vine:user**
  - **vk**: VK
  - **vk:uservideos**: VK - User's Videos
+ - **vlive**
  - **Vodlocker**
  - **VoiceRepublic**
+ - **VoxMedia**
  - **Vporn**
+ - **vpro**: npo.nl and ntr.nl
  - **VRT**
  - **vube**: Vube.com
  - **VuClip**
  - **Walla**
  - **WashingtonPost**
  - **wat.tv**
- - **WayOfTheMaster**
  - **WDR**
  - **wdr:mobile**
  - **WDRMaus**: Sendung mit der Maus
  - **WebOfStories**
  - **WebOfStoriesPlaylist**
  - **Weibo**
+ - **WeiqiTV**: WQTV
+ - **wholecloud**: WholeCloud
  - **Wimp**
  - **Wistia**
  - **WNL**
  - **WSJ**: Wall Street Journal
  - **XBef**
  - **XboxClips**
+ - **XFileShare**: XFileShare based sites: GorillaVid.in, daclips.in, movpod.in, fastvideo.in, realvid.net, filehoot.com and vidto.me
  - **XHamster**
  - **XHamsterEmbed**
  - **XMinus**
  - **youtube:channel**: YouTube.com channels
  - **youtube:favorites**: YouTube.com favourite videos, ":ytfav" for short (requires authentication)
  - **youtube:history**: Youtube watch history, ":ythistory" for short (requires authentication)
+ - **youtube:live**: YouTube.com live streams
  - **youtube:playlist**: YouTube.com playlists
+ - **youtube:playlists**: YouTube.com user/channel playlists
  - **youtube:recommended**: YouTube.com recommended videos, ":ytrec" for short (requires authentication)
  - **youtube:search**: YouTube.com searches
  - **youtube:search:date**: YouTube.com searches, newest videos first
  - **ZDFChannel**
  - **zingmp3:album**: mp3.zing.vn albums
  - **zingmp3:song**: mp3.zing.vn songs
+ - **ZippCast**
index 26857750c7c1aa5d9a15b09578740d41d063f189..2dc06ffe413f76f4d776fe44780f327a170d7801 100644 (file)
--- a/setup.cfg
+++ b/setup.cfg
@@ -2,5 +2,5 @@
 universal = True
 
 [flake8]
-exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,setup.py,build,.git
+exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,devscripts/lazy_load_template.py,devscripts/make_issue_template.py,setup.py,build,.git
 ignore = E402,E501,E731
index 4686260e0bbd25adf39e683bdae1b475e267b0b8..9444d403d542a0d3066d9d633532e4daf11726e9 100644 (file)
--- a/setup.py
+++ b/setup.py
@@ -8,11 +8,12 @@ import warnings
 import sys
 
 try:
-    from setuptools import setup
+    from setuptools import setup, Command
     setuptools_available = True
 except ImportError:
-    from distutils.core import setup
+    from distutils.core import setup, Command
     setuptools_available = False
+from distutils.spawn import spawn
 
 try:
     # This will create an exe that needs Microsoft Visual C++ 2008
@@ -28,7 +29,7 @@ py2exe_options = {
     "compressed": 1,
     "optimize": 2,
     "dist_dir": '.',
-    "dll_excludes": ['w9xpopen.exe'],
+    "dll_excludes": ['w9xpopen.exe', 'crypt32.dll'],
 }
 
 py2exe_console = [{
@@ -70,6 +71,22 @@ else:
     else:
         params['scripts'] = ['bin/youtube-dl']
 
+class build_lazy_extractors(Command):
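+    # Registered via cmdclass below; run as `python setup.py build_lazy_extractors`.
+    # It shells out to devscripts/make_lazy_extractors.py (see run() below).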
+    description = "Build the extractor lazy loading module"
+    user_options = []
+
+    def initialize_options(self):
+        pass
+
+    def finalize_options(self):
+        pass
+
+    def run(self):
+        spawn(
+            [sys.executable, 'devscripts/make_lazy_extractors.py', 'youtube_dl/extractor/lazy_extractors.py'],
+            dry_run=self.dry_run,
+        )
+
 # Get the version from youtube_dl/version.py without importing the package
 exec(compile(open('youtube_dl/version.py').read(),
              'youtube_dl/version.py', 'exec'))
@@ -107,5 +124,6 @@ setup(
         "Programming Language :: Python :: 3.4",
     ],
 
+    cmdclass={'build_lazy_extractors': build_lazy_extractors},
     **params
 )
index e1129e58f44c9f5118b16a52dacdd869d3dd0123..b8e22c5cb42f2e14465e812ed624aaa5e102ff5c 100644 (file)
@@ -11,8 +11,11 @@ import sys
 
 import youtube_dl.extractor
 from youtube_dl import YoutubeDL
-from youtube_dl.utils import (
+from youtube_dl.compat import (
+    compat_os_name,
     compat_str,
+)
+from youtube_dl.utils import (
     preferredencoding,
     write_string,
 )
@@ -42,7 +45,7 @@ def report_warning(message):
     Print the message to stderr, it will be prefixed with 'WARNING:'
     If stderr is a tty file the 'WARNING:' will be colored
     '''
-    if sys.stderr.isatty() and os.name != 'nt':
+    if sys.stderr.isatty() and compat_os_name != 'nt':
         _msg_header = '\033[0;33mWARNING:\033[0m'
     else:
         _msg_header = 'WARNING:'
@@ -89,66 +92,84 @@ def gettestcases(include_onlymatching=False):
 md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
 
 
-def expect_info_dict(self, got_dict, expected_dict):
-    for info_field, expected in expected_dict.items():
-        if isinstance(expected, compat_str) and expected.startswith('re:'):
-            got = got_dict.get(info_field)
-            match_str = expected[len('re:'):]
-            match_rex = re.compile(match_str)
-
-            self.assertTrue(
-                isinstance(got, compat_str),
-                'Expected a %s object, but got %s for field %s' % (
-                    compat_str.__name__, type(got).__name__, info_field))
-            self.assertTrue(
-                match_rex.match(got),
-                'field %s (value: %r) should match %r' % (info_field, got, match_str))
-        elif isinstance(expected, compat_str) and expected.startswith('startswith:'):
-            got = got_dict.get(info_field)
-            start_str = expected[len('startswith:'):]
-            self.assertTrue(
-                isinstance(got, compat_str),
-                'Expected a %s object, but got %s for field %s' % (
-                    compat_str.__name__, type(got).__name__, info_field))
-            self.assertTrue(
-                got.startswith(start_str),
-                'field %s (value: %r) should start with %r' % (info_field, got, start_str))
-        elif isinstance(expected, compat_str) and expected.startswith('contains:'):
-            got = got_dict.get(info_field)
-            contains_str = expected[len('contains:'):]
+def expect_value(self, got, expected, field):
+    if isinstance(expected, compat_str) and expected.startswith('re:'):
+        match_str = expected[len('re:'):]
+        match_rex = re.compile(match_str)
+
+        self.assertTrue(
+            isinstance(got, compat_str),
+            'Expected a %s object, but got %s for field %s' % (
+                compat_str.__name__, type(got).__name__, field))
+        self.assertTrue(
+            match_rex.match(got),
+            'field %s (value: %r) should match %r' % (field, got, match_str))
+    elif isinstance(expected, compat_str) and expected.startswith('startswith:'):
+        start_str = expected[len('startswith:'):]
+        self.assertTrue(
+            isinstance(got, compat_str),
+            'Expected a %s object, but got %s for field %s' % (
+                compat_str.__name__, type(got).__name__, field))
+        self.assertTrue(
+            got.startswith(start_str),
+            'field %s (value: %r) should start with %r' % (field, got, start_str))
+    elif isinstance(expected, compat_str) and expected.startswith('contains:'):
+        contains_str = expected[len('contains:'):]
+        self.assertTrue(
+            isinstance(got, compat_str),
+            'Expected a %s object, but got %s for field %s' % (
+                compat_str.__name__, type(got).__name__, field))
+        self.assertTrue(
+            contains_str in got,
+            'field %s (value: %r) should contain %r' % (field, got, contains_str))
+    elif isinstance(expected, type):
+        self.assertTrue(
+            isinstance(got, expected),
+            'Expected type %r for field %s, but got value %r of type %r' % (expected, field, got, type(got)))
+    elif isinstance(expected, dict) and isinstance(got, dict):
+        expect_dict(self, got, expected)
+    elif isinstance(expected, list) and isinstance(got, list):
+        self.assertEqual(
+            len(expected), len(got),
+            'Expected a list of length %d, but got a list of length %d for field %s' % (
+                len(expected), len(got), field))
+        for index, (item_got, item_expected) in enumerate(zip(got, expected)):
+            type_got = type(item_got)
+            type_expected = type(item_expected)
+            self.assertEqual(
+                type_expected, type_got,
+                'Type mismatch for list item at index %d for field %s, expected %r, got %r' % (
+                    index, field, type_expected, type_got))
+            expect_value(self, item_got, item_expected, field)
+    else:
+        if isinstance(expected, compat_str) and expected.startswith('md5:'):
             self.assertTrue(
                 isinstance(got, compat_str),
-                'Expected a %s object, but got %s for field %s' % (
-                    compat_str.__name__, type(got).__name__, info_field))
+                'Expected field %s to be a unicode object, but got value %r of type %r' % (field, got, type(got)))
+            got = 'md5:' + md5(got)
+        elif isinstance(expected, compat_str) and expected.startswith('mincount:'):
             self.assertTrue(
-                contains_str in got,
-                'field %s (value: %r) should contain %r' % (info_field, got, contains_str))
-        elif isinstance(expected, type):
-            got = got_dict.get(info_field)
-            self.assertTrue(isinstance(got, expected),
-                            'Expected type %r for field %s, but got value %r of type %r' % (expected, info_field, got, type(got)))
-        else:
-            if isinstance(expected, compat_str) and expected.startswith('md5:'):
-                got = 'md5:' + md5(got_dict.get(info_field))
-            elif isinstance(expected, compat_str) and expected.startswith('mincount:'):
-                got = got_dict.get(info_field)
-                self.assertTrue(
-                    isinstance(got, list),
-                    'Expected field %s to be a list, but it is of type %s' % (
-                        info_field, type(got).__name__))
-                expected_num = int(expected.partition(':')[2])
-                assertGreaterEqual(
-                    self, len(got), expected_num,
-                    'Expected %d items in field %s, but only got %d' % (
-                        expected_num, info_field, len(got)
-                    )
-                )
-                continue
-            else:
-                got = got_dict.get(info_field)
-            self.assertEqual(expected, got,
-                             'invalid value for field %s, expected %r, got %r' % (info_field, expected, got))
+                isinstance(got, (list, dict)),
+                'Expected field %s to be a list or a dict, but it is of type %s' % (
+                    field, type(got).__name__))
+            expected_num = int(expected.partition(':')[2])
+            assertGreaterEqual(
+                self, len(got), expected_num,
+                'Expected %d items in field %s, but only got %d' % (expected_num, field, len(got)))
+            return
+        self.assertEqual(
+            expected, got,
+            'Invalid value for field %s, expected %r, got %r' % (field, expected, got))
+
+
+def expect_dict(self, got_dict, expected_dict):
+    for info_field, expected in expected_dict.items():
+        got = got_dict.get(info_field)
+        expect_value(self, got, expected, info_field)
 
+
+def expect_info_dict(self, got_dict, expected_dict):
+    expect_dict(self, got_dict, expected_dict)
     # Check for the presence of mandatory fields
     if got_dict.get('_type') not in ('playlist', 'multi_video'):
         for key in ('id', 'url', 'title', 'ext'):
@@ -160,7 +181,7 @@ def expect_info_dict(self, got_dict, expected_dict):
     # Are checkable fields missing from the test case definition?
     test_info_dict = dict((key, value if not isinstance(value, compat_str) or len(value) < 250 else 'md5:' + md5(value))
                           for key, value in got_dict.items()
-                          if value and key in ('id', 'title', 'description', 'uploader', 'upload_date', 'timestamp', 'uploader_id', 'location'))
+                          if value and key in ('id', 'title', 'description', 'uploader', 'upload_date', 'timestamp', 'uploader_id', 'location', 'age_limit'))
     missing_keys = set(test_info_dict.keys()) - set(expected_dict.keys())
     if missing_keys:
         def _repr(v):
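
The prefix conventions exercised by expect_value above let test cases hedge exact values instead of pinning them down. A small, purely hypothetical expected_dict illustrating them:

    expected = {
        'id': 'abc123',                      # exact match
        'title': 're:^Episode [0-9]+$',      # regular expression match
        'uploader': 'startswith:National',   # prefix match
        'description': 'md5:e4d909c290d0fb1ca068ffaddf22cbd0',  # md5 of a long value
        'tags': 'mincount:3',                # list (or dict) with at least 3 items
    }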
index be8d12997a1a5aba2cb62270068363f339a5eac6..6404ac89f55df282e9525f6ae1a8e62f7344dd40 100644 (file)
@@ -11,6 +11,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 from test.helper import FakeYDL
 from youtube_dl.extractor.common import InfoExtractor
 from youtube_dl.extractor import YoutubeIE, get_info_extractor
+from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError
 
 
 class TestIE(InfoExtractor):
@@ -35,10 +36,18 @@ class TestInfoExtractor(unittest.TestCase):
             <meta name="og:title" content='Foo'/>
             <meta content="Some video's description " name="og:description"/>
             <meta property='og:image' content='http://domain.com/pic.jpg?key1=val1&amp;key2=val2'/>
+            <meta content='application/x-shockwave-flash' property='og:video:type'>
+            <meta content='Foo' property=og:foobar>
+            <meta name="og:test1" content='foo > < bar'/>
+            <meta name="og:test2" content="foo >//< bar"/>
             '''
         self.assertEqual(ie._og_search_title(html), 'Foo')
         self.assertEqual(ie._og_search_description(html), 'Some video\'s description ')
         self.assertEqual(ie._og_search_thumbnail(html), 'http://domain.com/pic.jpg?key1=val1&key2=val2')
+        self.assertEqual(ie._og_search_video_url(html, default=None), None)
+        self.assertEqual(ie._og_search_property('foobar', html), 'Foo')
+        self.assertEqual(ie._og_search_property('test1', html), 'foo > < bar')
+        self.assertEqual(ie._og_search_property('test2', html), 'foo >//< bar')
 
     def test_html_search_meta(self):
         ie = self.ie
@@ -58,5 +67,14 @@ class TestInfoExtractor(unittest.TestCase):
         self.assertEqual(ie._html_search_meta('e', html), '5')
         self.assertEqual(ie._html_search_meta('f', html), '6')
 
+    def test_download_json(self):
+        uri = encode_data_uri(b'{"foo": "blah"}', 'application/json')
+        self.assertEqual(self.ie._download_json(uri, None), {'foo': 'blah'})
+        uri = encode_data_uri(b'callback({"foo": "blah"})', 'application/javascript')
+        self.assertEqual(self.ie._download_json(uri, None, transform_source=strip_jsonp), {'foo': 'blah'})
+        uri = encode_data_uri(b'{"foo": invalid}', 'application/json')
+        self.assertRaises(ExtractorError, self.ie._download_json, uri, None)
+        self.assertEqual(self.ie._download_json(uri, None, fatal=False), None)
+
 if __name__ == '__main__':
     unittest.main()
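
test_download_json above feeds _download_json data: URIs built by encode_data_uri. For reference, a minimal sketch of what such a helper is assumed to do (an RFC 2397 base64 data URI; not necessarily youtube_dl's exact implementation):

    import base64

    def encode_data_uri(data, mime_type):
        # Wrap a bytes payload in a base64-encoded data: URI
        return 'data:%s;base64,%s' % (mime_type, base64.b64encode(data).decode('ascii'))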
index a13c09ef40c7d8c101c86b471cae142ada9ccd7c..ca25025e23a1eb1fd82eed39a534072bffe293ac 100644 (file)
@@ -12,10 +12,11 @@ import copy
 
 from test.helper import FakeYDL, assertRegexpMatches
 from youtube_dl import YoutubeDL
-from youtube_dl.compat import compat_str
+from youtube_dl.compat import compat_str, compat_urllib_error
 from youtube_dl.extractor import YoutubeIE
+from youtube_dl.extractor.common import InfoExtractor
 from youtube_dl.postprocessor.common import PostProcessor
-from youtube_dl.utils import match_filter_func
+from youtube_dl.utils import ExtractorError, match_filter_func
 
 TEST_URL = 'http://localhost/sample.mp4'
 
@@ -105,6 +106,7 @@ class TestFormatSelection(unittest.TestCase):
     def test_format_selection(self):
         formats = [
             {'format_id': '35', 'ext': 'mp4', 'preference': 1, 'url': TEST_URL},
+            {'format_id': 'example-with-dashes', 'ext': 'webm', 'preference': 1, 'url': TEST_URL},
             {'format_id': '45', 'ext': 'webm', 'preference': 2, 'url': TEST_URL},
             {'format_id': '47', 'ext': 'webm', 'preference': 3, 'url': TEST_URL},
             {'format_id': '2', 'ext': 'flv', 'preference': 4, 'url': TEST_URL},
@@ -136,6 +138,11 @@ class TestFormatSelection(unittest.TestCase):
         downloaded = ydl.downloaded_info_dicts[0]
         self.assertEqual(downloaded['format_id'], '35')
 
+        ydl = YDL({'format': 'example-with-dashes'})
+        ydl.process_ie_result(info_dict.copy())
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'example-with-dashes')
+
     def test_format_selection_audio(self):
         formats = [
             {'format_id': 'audio-low', 'ext': 'webm', 'preference': 1, 'vcodec': 'none', 'url': TEST_URL},
@@ -215,9 +222,24 @@ class TestFormatSelection(unittest.TestCase):
         downloaded = ydl.downloaded_info_dicts[0]
         self.assertEqual(downloaded['format_id'], 'dash-video-low')
 
+        ydl = YDL({'format': 'bestvideo[format_id^=dash][format_id$=low]'})
+        ydl.process_ie_result(info_dict.copy())
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'dash-video-low')
+
+        formats = [
+            {'format_id': 'vid-vcodec-dot', 'ext': 'mp4', 'preference': 1, 'vcodec': 'avc1.123456', 'acodec': 'none', 'url': TEST_URL},
+        ]
+        info_dict = _make_result(formats)
+
+        ydl = YDL({'format': 'bestvideo[vcodec=avc1.123456]'})
+        ydl.process_ie_result(info_dict.copy())
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'vid-vcodec-dot')
+
     def test_youtube_format_selection(self):
         order = [
-            '38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '36', '17', '13',
+            '38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '17', '36', '13',
             # Apple HTTP Live Streaming
             '96', '95', '94', '93', '92', '132', '151',
             # 3D
@@ -229,21 +251,81 @@ class TestFormatSelection(unittest.TestCase):
             '141', '172', '140', '171', '139',
         ]
 
-        for f1id, f2id in zip(order, order[1:]):
-            f1 = YoutubeIE._formats[f1id].copy()
-            f1['format_id'] = f1id
-            f1['url'] = 'url:' + f1id
-            f2 = YoutubeIE._formats[f2id].copy()
-            f2['format_id'] = f2id
-            f2['url'] = 'url:' + f2id
+        def format_info(f_id):
+            info = YoutubeIE._formats[f_id].copy()
+
+            # XXX: In real cases InfoExtractor._parse_mpd_formats() fills in
+            # 'acodec' and 'vcodec', but in tests that information has been
+            # incomplete since commit a6c2c24479e5f4827ceb06f64d855329c0a6f593;
+            # without this fix, test_YoutubeDL.test_youtube_format_selection
+            # is broken
+            if 'acodec' in info and 'vcodec' not in info:
+                info['vcodec'] = 'none'
+            elif 'vcodec' in info and 'acodec' not in info:
+                info['acodec'] = 'none'
+
+            info['format_id'] = f_id
+            info['url'] = 'url:' + f_id
+            return info
+        formats_order = [format_info(f_id) for f_id in order]
+
+        info_dict = _make_result(list(formats_order), extractor='youtube')
+        ydl = YDL({'format': 'bestvideo+bestaudio'})
+        yie = YoutubeIE(ydl)
+        yie._sort_formats(info_dict['formats'])
+        ydl.process_ie_result(info_dict)
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], '137+141')
+        self.assertEqual(downloaded['ext'], 'mp4')
+
+        info_dict = _make_result(list(formats_order), extractor='youtube')
+        ydl = YDL({'format': 'bestvideo[height>=999999]+bestaudio/best'})
+        yie = YoutubeIE(ydl)
+        yie._sort_formats(info_dict['formats'])
+        ydl.process_ie_result(info_dict)
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], '38')
 
+        info_dict = _make_result(list(formats_order), extractor='youtube')
+        ydl = YDL({'format': 'bestvideo/best,bestaudio'})
+        yie = YoutubeIE(ydl)
+        yie._sort_formats(info_dict['formats'])
+        ydl.process_ie_result(info_dict)
+        downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
+        self.assertEqual(downloaded_ids, ['137', '141'])
+
+        info_dict = _make_result(list(formats_order), extractor='youtube')
+        ydl = YDL({'format': '(bestvideo[ext=mp4],bestvideo[ext=webm])+bestaudio'})
+        yie = YoutubeIE(ydl)
+        yie._sort_formats(info_dict['formats'])
+        ydl.process_ie_result(info_dict)
+        downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
+        self.assertEqual(downloaded_ids, ['137+141', '248+141'])
+
+        info_dict = _make_result(list(formats_order), extractor='youtube')
+        ydl = YDL({'format': '(bestvideo[ext=mp4],bestvideo[ext=webm])[height<=720]+bestaudio'})
+        yie = YoutubeIE(ydl)
+        yie._sort_formats(info_dict['formats'])
+        ydl.process_ie_result(info_dict)
+        downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
+        self.assertEqual(downloaded_ids, ['136+141', '247+141'])
+
+        info_dict = _make_result(list(formats_order), extractor='youtube')
+        ydl = YDL({'format': '(bestvideo[ext=none]/bestvideo[ext=webm])+bestaudio'})
+        yie = YoutubeIE(ydl)
+        yie._sort_formats(info_dict['formats'])
+        ydl.process_ie_result(info_dict)
+        downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
+        self.assertEqual(downloaded_ids, ['248+141'])
+
+        for f1, f2 in zip(formats_order, formats_order[1:]):
             info_dict = _make_result([f1, f2], extractor='youtube')
             ydl = YDL({'format': 'best/bestvideo'})
             yie = YoutubeIE(ydl)
             yie._sort_formats(info_dict['formats'])
             ydl.process_ie_result(info_dict)
             downloaded = ydl.downloaded_info_dicts[0]
-            self.assertEqual(downloaded['format_id'], f1id)
+            self.assertEqual(downloaded['format_id'], f1['format_id'])
 
             info_dict = _make_result([f2, f1], extractor='youtube')
             ydl = YDL({'format': 'best/bestvideo'})
@@ -251,7 +333,18 @@ class TestFormatSelection(unittest.TestCase):
             yie._sort_formats(info_dict['formats'])
             ydl.process_ie_result(info_dict)
             downloaded = ydl.downloaded_info_dicts[0]
-            self.assertEqual(downloaded['format_id'], f1id)
+            self.assertEqual(downloaded['format_id'], f1['format_id'])
+
+    def test_invalid_format_specs(self):
+        def assert_syntax_error(format_spec):
+            ydl = YDL({'format': format_spec})
+            info_dict = _make_result([{'format_id': 'foo', 'url': TEST_URL}])
+            self.assertRaises(SyntaxError, ydl.process_ie_result, info_dict)
+
+        assert_syntax_error('bestvideo,,best')
+        assert_syntax_error('+bestaudio')
+        assert_syntax_error('bestvideo+')
+        assert_syntax_error('/')
 
     def test_format_filtering(self):
         formats = [
@@ -308,6 +401,18 @@ class TestFormatSelection(unittest.TestCase):
         downloaded = ydl.downloaded_info_dicts[0]
         self.assertEqual(downloaded['format_id'], 'G')
 
+        ydl = YDL({'format': 'all[width>=400][width<=600]'})
+        ydl.process_ie_result(info_dict)
+        downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
+        self.assertEqual(downloaded_ids, ['B', 'C', 'D'])
+
+        ydl = YDL({'format': 'best[height<40]'})
+        try:
+            ydl.process_ie_result(info_dict)
+        except ExtractorError:
+            pass
+        self.assertEqual(ydl.downloaded_info_dicts, [])
+
 
 class TestYoutubeDL(unittest.TestCase):
     def test_subtitles(self):
@@ -402,6 +507,9 @@ class TestYoutubeDL(unittest.TestCase):
         assertRegexpMatches(self, ydl._format_note({
             'vbr': 10,
         }), '^\s*10k$')
+        assertRegexpMatches(self, ydl._format_note({
+            'fps': 30,
+        }), '^30fps$')
 
     def test_postprocessors(self):
         filename = 'post-processor-testfile.mp4'
@@ -553,6 +661,47 @@ class TestYoutubeDL(unittest.TestCase):
         result = get_ids({'playlist_items': '10'})
         self.assertEqual(result, [])
 
+    def test_urlopen_no_file_protocol(self):
+        # see https://github.com/rg3/youtube-dl/issues/8227
+        ydl = YDL()
+        self.assertRaises(compat_urllib_error.URLError, ydl.urlopen, 'file:///etc/passwd')
+
+    def test_do_not_override_ie_key_in_url_transparent(self):
+        ydl = YDL()
+
+        class Foo1IE(InfoExtractor):
+            _VALID_URL = r'foo1:'
+
+            def _real_extract(self, url):
+                return {
+                    '_type': 'url_transparent',
+                    'url': 'foo2:',
+                    'ie_key': 'Foo2',
+                }
+
+        class Foo2IE(InfoExtractor):
+            _VALID_URL = r'foo2:'
+
+            def _real_extract(self, url):
+                return {
+                    '_type': 'url',
+                    'url': 'foo3:',
+                    'ie_key': 'Foo3',
+                }
+
+        class Foo3IE(InfoExtractor):
+            _VALID_URL = r'foo3:'
+
+            def _real_extract(self, url):
+                return _make_result([{'url': TEST_URL}])
+
+        ydl.add_info_extractor(Foo1IE(ydl))
+        ydl.add_info_extractor(Foo2IE(ydl))
+        ydl.add_info_extractor(Foo3IE(ydl))
+        ydl.extract_info('foo1:')
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['url'], TEST_URL)
+
 
 if __name__ == '__main__':
     unittest.main()
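
The grouped and comma-separated format specs exercised above are plain 'format' values, so they can be used directly; a minimal usage sketch (the URL is a placeholder):

    from youtube_dl import YoutubeDL

    # Fetch the best mp4 video and the best webm video, each merged with the
    # best audio, as in test_youtube_format_selection above
    opts = {'format': '(bestvideo[ext=mp4],bestvideo[ext=webm])+bestaudio'}
    with YoutubeDL(opts) as ydl:
        ydl.download(['https://www.example.com/some-video'])  # placeholder URL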
index a9db42b300864180c10dca730f772f7f5a26aad8..f5af184e6e0a79ccc11a9a66c2a9f19434087108 100644 (file)
@@ -56,7 +56,7 @@ class TestAllURLsMatching(unittest.TestCase):
         assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM/videos')
 
     def test_youtube_user_matching(self):
-        self.assertMatch('www.youtube.com/NASAgovVideo/videos', ['youtube:user'])
+        self.assertMatch('http://www.youtube.com/NASAgovVideo/videos', ['youtube:user'])
 
     def test_youtube_feeds(self):
         self.assertMatch('https://www.youtube.com/feed/watch_later', ['youtube:watchlater'])
@@ -121,8 +121,8 @@ class TestAllURLsMatching(unittest.TestCase):
 
     def test_pbs(self):
         # https://github.com/rg3/youtube-dl/issues/2350
-        self.assertMatch('http://video.pbs.org/viralplayer/2365173446/', ['PBS'])
-        self.assertMatch('http://video.pbs.org/widget/partnerplayer/980042464/', ['PBS'])
+        self.assertMatch('http://video.pbs.org/viralplayer/2365173446/', ['pbs'])
+        self.assertMatch('http://video.pbs.org/widget/partnerplayer/980042464/', ['pbs'])
 
     def test_yahoo_https(self):
         # https://github.com/rg3/youtube-dl/issues/2701
index c3ba8ad2e3aa1f5cd33dd5a61a184d52cc0c07a9..618668210f62191da7f899a2a586c699b512c129 100644 (file)
@@ -13,9 +13,13 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 from youtube_dl.utils import get_filesystem_encoding
 from youtube_dl.compat import (
     compat_getenv,
+    compat_etree_fromstring,
     compat_expanduser,
+    compat_shlex_split,
+    compat_str,
     compat_urllib_parse_unquote,
     compat_urllib_parse_unquote_plus,
+    compat_urllib_parse_urlencode,
 )
 
 
@@ -67,5 +71,33 @@ class TestCompat(unittest.TestCase):
         self.assertEqual(compat_urllib_parse_unquote_plus('abc%20def'), 'abc def')
         self.assertEqual(compat_urllib_parse_unquote_plus('%7e/abc+def'), '~/abc def')
 
+    def test_compat_urllib_parse_urlencode(self):
+        self.assertEqual(compat_urllib_parse_urlencode({'abc': 'def'}), 'abc=def')
+        self.assertEqual(compat_urllib_parse_urlencode({'abc': b'def'}), 'abc=def')
+        self.assertEqual(compat_urllib_parse_urlencode({b'abc': 'def'}), 'abc=def')
+        self.assertEqual(compat_urllib_parse_urlencode({b'abc': b'def'}), 'abc=def')
+        self.assertEqual(compat_urllib_parse_urlencode([('abc', 'def')]), 'abc=def')
+        self.assertEqual(compat_urllib_parse_urlencode([('abc', b'def')]), 'abc=def')
+        self.assertEqual(compat_urllib_parse_urlencode([(b'abc', 'def')]), 'abc=def')
+        self.assertEqual(compat_urllib_parse_urlencode([(b'abc', b'def')]), 'abc=def')
+
+    def test_compat_shlex_split(self):
+        self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two'])
+
+    def test_compat_etree_fromstring(self):
+        xml = '''
+            <root foo="bar" spam="中文">
+                <normal>foo</normal>
+                <chinese>中文</chinese>
+                <foo><bar>spam</bar></foo>
+            </root>
+        '''
+        doc = compat_etree_fromstring(xml.encode('utf-8'))
+        self.assertTrue(isinstance(doc.attrib['foo'], compat_str))
+        self.assertTrue(isinstance(doc.attrib['spam'], compat_str))
+        self.assertTrue(isinstance(doc.find('normal').text, compat_str))
+        self.assertTrue(isinstance(doc.find('chinese').text, compat_str))
+        self.assertTrue(isinstance(doc.find('foo/bar').text, compat_str))
+
 if __name__ == '__main__':
     unittest.main()
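
The compat_etree_fromstring cases above pin down unicode behavior: on Python 2, xml.etree.ElementTree.fromstring can return bytestrings for ASCII-only text and attributes, so the compat wrapper guarantees compat_str everywhere. A quick demonstration of the guarantee being tested:

    from youtube_dl.compat import compat_etree_fromstring, compat_str

    doc = compat_etree_fromstring(b'<a b="c">d</a>')
    assert isinstance(doc.attrib['b'], compat_str)  # unicode, even on Python 2
    assert isinstance(doc.text, compat_str)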
index 1110357a7e8097eb38479d2a15837516af32a726..a3f1c0644f32b180a2b177e76dbea44854b0983e 100644 (file)
@@ -102,7 +102,7 @@ def generator(test_case):
 
         params = get_params(test_case.get('params', {}))
         if is_playlist and 'playlist' not in test_case:
-            params.setdefault('extract_flat', True)
+            params.setdefault('extract_flat', 'in_playlist')
             params.setdefault('skip_download', True)
 
         ydl = YoutubeDL(params, auto_init=False)
@@ -136,7 +136,9 @@ def generator(test_case):
                     # We're not using .download here since that is just a shim
                     # for outside error handling, and returns the exit code
                     # instead of the result dict.
-                    res_dict = ydl.extract_info(test_case['url'])
+                    res_dict = ydl.extract_info(
+                        test_case['url'],
+                        force_generic_extractor=params.get('force_generic_extractor', False))
                 except (DownloadError, ExtractorError) as err:
                     # Check if the exception is not a network related one
                     if not err.exc_info[0] in (compat_urllib_error.URLError, socket.timeout, UnavailableVideoError, compat_http_client.BadStatusLine) or (err.exc_info[0] == compat_HTTPError and err.exc_info[1].code == 503):
index f2e305b6fed3ce2f0574a7c20e89ffb977934f28..15e0ad369d57966bef222bf35c422ad9bdb4e755 100644 (file)
@@ -1,4 +1,5 @@
 #!/usr/bin/env python
+# coding: utf-8
 from __future__ import unicode_literals
 
 # Allow direct execution
@@ -52,7 +53,12 @@ class TestHTTP(unittest.TestCase):
             ('localhost', 0), HTTPTestRequestHandler)
         self.httpd.socket = ssl.wrap_socket(
             self.httpd.socket, certfile=certfn, server_side=True)
-        self.port = self.httpd.socket.getsockname()[1]
+        if os.name == 'java':
+            # In Jython, SSLSocket is not a subclass of socket.socket
+            sock = self.httpd.socket.sock
+        else:
+            sock = self.httpd.socket
+        self.port = sock.getsockname()[1]
         self.server_thread = threading.Thread(target=self.httpd.serve_forever)
         self.server_thread.daemon = True
         self.server_thread.start()
@@ -115,5 +121,14 @@ class TestProxy(unittest.TestCase):
         response = ydl.urlopen(req).read().decode('utf-8')
         self.assertEqual(response, 'cn: {0}'.format(url))
 
+    def test_proxy_with_idn(self):
+        ydl = YoutubeDL({
+            'proxy': 'localhost:{0}'.format(self.port),
+        })
+        url = 'http://中文.tw/'
+        response = ydl.urlopen(url).read().decode('utf-8')
+        # b'xn--fiq228c' is '中文'.encode('idna')
+        self.assertEqual(response, 'normal: http://xn--fiq228c.tw/')
+
 if __name__ == '__main__':
     unittest.main()
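
test_proxy_with_idn relies on Python's idna codec for the punycode translation noted in the comment; the claim is easy to check directly:

    # '中文' punycode-encodes to the ASCII label asserted above
    print('中文'.encode('idna'))  # b'xn--fiq228c'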
diff --git a/test/test_iqiyi_sdk_interpreter.py b/test/test_iqiyi_sdk_interpreter.py
new file mode 100644 (file)
index 0000000..9d95cb6
--- /dev/null
@@ -0,0 +1,47 @@
+#!/usr/bin/env python
+
+from __future__ import unicode_literals
+
+# Allow direct execution
+import os
+import sys
+import unittest
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+from test.helper import FakeYDL
+from youtube_dl.extractor import IqiyiIE
+
+
+class IqiyiIEWithCredentials(IqiyiIE):
+    def _get_login_info(self):
+        return 'foo', 'bar'
+
+
+class WarningLogger(object):
+    def __init__(self):
+        self.messages = []
+
+    def warning(self, msg):
+        self.messages.append(msg)
+
+    def debug(self, msg):
+        pass
+
+    def error(self, msg):
+        pass
+
+
+class TestIqiyiSDKInterpreter(unittest.TestCase):
+    def test_iqiyi_sdk_interpreter(self):
+        '''
+        Test the functionality of IqiyiSDKInterpreter by trying to log in
+
+        If `sign` is incorrect, the /validate call throws an HTTP 556 error
+        '''
+        logger = WarningLogger()
+        ie = IqiyiIEWithCredentials(FakeYDL({'logger': logger}))
+        ie._login()
+        self.assertTrue('unable to log in:' in logger.messages[0])
+
+if __name__ == '__main__':
+    unittest.main()
index fc73e5dc29a5c8faab88f4604f99df4ee9de6b2e..63c350b8fa986fc63d70af43a6a0fdcaf5958eed 100644 (file)
@@ -19,6 +19,9 @@ class TestJSInterpreter(unittest.TestCase):
         jsi = JSInterpreter('function x3(){return 42;}')
         self.assertEqual(jsi.call_function('x3'), 42)
 
+        jsi = JSInterpreter('var x5 = function(){return 42;}')
+        self.assertEqual(jsi.call_function('x5'), 42)
+
     def test_calc(self):
         jsi = JSInterpreter('function x4(a){return 2*a+1;}')
         self.assertEqual(jsi.call_function('x4', 3), 7)
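
The new case covers function expressions assigned with var, not just function declarations; usage is the same either way (a hypothetical snippet mirroring the tests):

    from youtube_dl.jsinterp import JSInterpreter

    jsi = JSInterpreter('var double = function(n){return 2*n;}')
    print(jsi.call_function('double', 21))  # 42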
index c4e3adb67b7d1034b36cdd3c45969fe321351c64..27e763edd0ec13c19fb97baebd18c6d1020a913c 100644 (file)
@@ -11,7 +11,6 @@ from test.helper import FakeYDL, md5
 
 
 from youtube_dl.extractor import (
-    BlipTVIE,
     YoutubeIE,
     DailymotionIE,
     TEDIE,
@@ -22,11 +21,13 @@ from youtube_dl.extractor import (
     NPOIE,
     ComedyCentralIE,
     NRKTVIE,
-    RaiIE,
+    RaiTVIE,
     VikiIE,
     ThePlatformIE,
+    ThePlatformFeedIE,
     RTVEALaCartaIE,
     FunnyOrDieIE,
+    DemocracynowIE,
 )
 
 
@@ -64,16 +65,16 @@ class TestYoutubeSubtitles(BaseTestSubtitles):
         self.DL.params['allsubtitles'] = True
         subtitles = self.getSubtitles()
         self.assertEqual(len(subtitles.keys()), 13)
-        self.assertEqual(md5(subtitles['en']), '4cd9278a35ba2305f47354ee13472260')
-        self.assertEqual(md5(subtitles['it']), '164a51f16f260476a05b50fe4c2f161d')
-        for lang in ['it', 'fr', 'de']:
+        self.assertEqual(md5(subtitles['en']), '3cb210999d3e021bd6c7f0ea751eab06')
+        self.assertEqual(md5(subtitles['it']), '6d752b98c31f1cf8d597050c7a2cb4b5')
+        for lang in ['fr', 'de']:
             self.assertTrue(subtitles.get(lang) is not None, 'Subtitles for \'%s\' not extracted' % lang)
 
-    def test_youtube_subtitles_sbv_format(self):
+    def test_youtube_subtitles_ttml_format(self):
         self.DL.params['writesubtitles'] = True
-        self.DL.params['subtitlesformat'] = 'sbv'
+        self.DL.params['subtitlesformat'] = 'ttml'
         subtitles = self.getSubtitles()
-        self.assertEqual(md5(subtitles['en']), '13aeaa0c245a8bed9a451cb643e3ad8b')
+        self.assertEqual(md5(subtitles['en']), 'e306f8c42842f723447d9f63ad65df54')
 
     def test_youtube_subtitles_vtt_format(self):
         self.DL.params['writesubtitles'] = True
@@ -143,18 +144,6 @@ class TestTedSubtitles(BaseTestSubtitles):
             self.assertTrue(subtitles.get(lang) is not None, 'Subtitles for \'%s\' not extracted' % lang)
 
 
-class TestBlipTVSubtitles(BaseTestSubtitles):
-    url = 'http://blip.tv/a/a-6603250'
-    IE = BlipTVIE
-
-    def test_allsubtitles(self):
-        self.DL.params['writesubtitles'] = True
-        self.DL.params['allsubtitles'] = True
-        subtitles = self.getSubtitles()
-        self.assertEqual(set(subtitles.keys()), set(['en']))
-        self.assertEqual(md5(subtitles['en']), '5b75c300af65fe4476dff79478bb93e4')
-
-
 class TestVimeoSubtitles(BaseTestSubtitles):
     url = 'http://vimeo.com/76979871'
     IE = VimeoIE
@@ -271,7 +260,7 @@ class TestNRKSubtitles(BaseTestSubtitles):
 
 class TestRaiSubtitles(BaseTestSubtitles):
     url = 'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-cb27157f-9dd0-4aee-b788-b1f67643a391.html'
-    IE = RaiIE
+    IE = RaiTVIE
 
     def test_allsubtitles(self):
         self.DL.params['writesubtitles'] = True
@@ -307,6 +296,18 @@ class TestThePlatformSubtitles(BaseTestSubtitles):
         self.assertEqual(md5(subtitles['en']), '97e7670cbae3c4d26ae8bcc7fdd78d4b')
 
 
+class TestThePlatformFeedSubtitles(BaseTestSubtitles):
+    url = 'http://feed.theplatform.com/f/7wvmTC/msnbc_video-p-test?form=json&pretty=true&range=-40&byGuid=n_hardball_5biden_140207'
+    IE = ThePlatformFeedIE
+
+    def test_allsubtitles(self):
+        self.DL.params['writesubtitles'] = True
+        self.DL.params['allsubtitles'] = True
+        subtitles = self.getSubtitles()
+        self.assertEqual(set(subtitles.keys()), set(['en']))
+        self.assertEqual(md5(subtitles['en']), '48649a22e82b2da21c9a67a395eedade')
+
+
 class TestRtveSubtitles(BaseTestSubtitles):
     url = 'http://www.rtve.es/alacarta/videos/los-misterios-de-laura/misterios-laura-capitulo-32-misterio-del-numero-17-2-parte/2428621/'
     IE = RTVEALaCartaIE
@@ -333,5 +334,25 @@ class TestFunnyOrDieSubtitles(BaseTestSubtitles):
         self.assertEqual(md5(subtitles['en']), 'c5593c193eacd353596c11c2d4f9ecc4')
 
 
+class TestDemocracynowSubtitles(BaseTestSubtitles):
+    url = 'http://www.democracynow.org/shows/2015/7/3'
+    IE = DemocracynowIE
+
+    def test_allsubtitles(self):
+        self.DL.params['writesubtitles'] = True
+        self.DL.params['allsubtitles'] = True
+        subtitles = self.getSubtitles()
+        self.assertEqual(set(subtitles.keys()), set(['en']))
+        self.assertEqual(md5(subtitles['en']), 'acaca989e24a9e45a6719c9b3d60815c')
+
+    def test_subtitles_in_page(self):
+        self.url = 'http://www.democracynow.org/2015/7/3/this_flag_comes_down_today_bree'
+        self.DL.params['writesubtitles'] = True
+        self.DL.params['allsubtitles'] = True
+        subtitles = self.getSubtitles()
+        self.assertEqual(set(subtitles.keys()), set(['en']))
+        self.assertEqual(md5(subtitles['en']), 'acaca989e24a9e45a6719c9b3d60815c')
+
+
 if __name__ == '__main__':
     unittest.main()
diff --git a/test/test_update.py b/test/test_update.py
new file mode 100644 (file)
index 0000000..d9c7151
--- /dev/null
@@ -0,0 +1,30 @@
+#!/usr/bin/env python
+
+from __future__ import unicode_literals
+
+# Allow direct execution
+import os
+import sys
+import unittest
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+
+import json
+from youtube_dl.update import rsa_verify
+
+
+class TestUpdate(unittest.TestCase):
+    def test_rsa_verify(self):
+        UPDATES_RSA_KEY = (0x9d60ee4d8f805312fdb15a62f87b95bd66177b91df176765d13514a0f1754bcd2057295c5b6f1d35daa6742c3ffc9a82d3e118861c207995a8031e151d863c9927e304576bc80692bc8e094896fcf11b66f3e29e04e3a71e9a11558558acea1840aec37fc396fb6b65dc81a1c4144e03bd1c011de62e3f1357b327d08426fe93, 65537)
+        with open(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'versions.json'), 'rb') as f:
+            versions_info = f.read().decode()
+        versions_info = json.loads(versions_info)
+        signature = versions_info['signature']
+        del versions_info['signature']
+        self.assertTrue(rsa_verify(
+            json.dumps(versions_info, sort_keys=True).encode('utf-8'),
+            signature, UPDATES_RSA_KEY))
+
+
+if __name__ == '__main__':
+    unittest.main()
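
test_rsa_verify checks the updater's signature scheme: versions.json minus its signature field is serialized deterministically (sort_keys=True) and verified against the hard-coded RSA public key (n, e). Conceptually, verification raises the hex signature to the public exponent modulo n and compares the recovered block against a hash of the payload; a simplified sketch of that idea, not youtube_dl's exact code:

    import hashlib

    def rsa_verify_sketch(content, signature_hex, key):
        n, e = key  # RSA public modulus and exponent
        # Textbook RSA: recover the signed block as a hex string
        recovered = '%x' % pow(int(signature_hex, 16), e, n)
        # Accept if the block ends with the payload's SHA-256 digest
        return recovered.endswith(hashlib.sha256(content).hexdigest())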
index 65692a9fbb2bdc0dd6dbc5203439f1ca12bf5c46..e16a6761b7e9a70589c6da7b48c9f54e2c03e734 100644 (file)
@@ -18,12 +18,18 @@ import xml.etree.ElementTree
 from youtube_dl.utils import (
     age_restricted,
     args_to_str,
+    encode_base_n,
     clean_html,
+    date_from_str,
     DateRange,
     detect_exe_version,
+    determine_ext,
+    dict_get,
+    encode_compat_str,
     encodeFilename,
     escape_rfc3986,
     escape_url,
+    extract_attributes,
     ExtractorError,
     find_xpath_attr,
     fix_xml_ampersands,
@@ -32,16 +38,19 @@ from youtube_dl.utils import (
     is_html,
     js_to_json,
     limit_length,
+    ohdave_rsa_encrypt,
     OnDemandPagedList,
     orderedSet,
     parse_duration,
     parse_filesize,
+    parse_count,
     parse_iso8601,
     read_batch_urls,
     sanitize_filename,
     sanitize_path,
     prepend_extension,
     replace_extension,
+    remove_quotes,
     shell_quote,
     smuggle_url,
     str_to_int,
@@ -55,13 +64,25 @@ from youtube_dl.utils import (
     lowercase_escape,
     url_basename,
     urlencode_postdata,
+    update_url_query,
     version_tuple,
     xpath_with_ns,
+    xpath_element,
     xpath_text,
+    xpath_attr,
     render_table,
     match_str,
     parse_dfxp_time_expr,
     dfxp2srt,
+    cli_option,
+    cli_valueless_option,
+    cli_bool_option,
+)
+from youtube_dl.compat import (
+    compat_chr,
+    compat_etree_fromstring,
+    compat_urlparse,
+    compat_parse_qs,
 )
 
 
@@ -191,6 +212,15 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(replace_extension('.abc', 'temp'), '.abc.temp')
         self.assertEqual(replace_extension('.abc.ext', 'temp'), '.abc.temp')
 
+    def test_remove_quotes(self):
+        self.assertEqual(remove_quotes(None), None)
+        self.assertEqual(remove_quotes('"'), '"')
+        self.assertEqual(remove_quotes("'"), "'")
+        self.assertEqual(remove_quotes(';'), ';')
+        self.assertEqual(remove_quotes('";'), '";')
+        self.assertEqual(remove_quotes('""'), '')
+        self.assertEqual(remove_quotes('";"'), ';')
+
     def test_ordered_set(self):
         self.assertEqual(orderedSet([1, 1, 2, 3, 4, 4, 5, 6, 7, 3, 5]), [1, 2, 3, 4, 5, 6, 7])
         self.assertEqual(orderedSet([]), [])
@@ -202,8 +232,15 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(unescapeHTML('%20;'), '%20;')
         self.assertEqual(unescapeHTML('&#x2F;'), '/')
         self.assertEqual(unescapeHTML('&#47;'), '/')
-        self.assertEqual(
-            unescapeHTML('&eacute;'), 'é')
+        self.assertEqual(unescapeHTML('&eacute;'), 'é')
+        self.assertEqual(unescapeHTML('&#2013266066;'), '&#2013266066;')
+
+    def test_date_from_str(self):
+        self.assertEqual(date_from_str('yesterday'), date_from_str('now-1day'))
+        self.assertEqual(date_from_str('now+7day'), date_from_str('now+1week'))
+        self.assertEqual(date_from_str('now+14day'), date_from_str('now+2week'))
+        self.assertEqual(date_from_str('now+365day'), date_from_str('now+1year'))
+        self.assertEqual(date_from_str('now+30day'), date_from_str('now+1month'))
 
     def test_daterange(self):
         _20century = DateRange("19000101", "20000101")
@@ -227,7 +264,16 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(
             unified_strdate('2/2/2015 6:47:40 PM', day_first=False),
             '20150202')
+        self.assertEqual(unified_strdate('Feb 14th 2016 5:45PM'), '20160214')
         self.assertEqual(unified_strdate('25-09-2014'), '20140925')
+        self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None)
+
+    def test_determine_ext(self):
+        self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
+        self.assertEqual(determine_ext('http://example.com/foo/bar/?download', None), None)
+        self.assertEqual(determine_ext('http://example.com/foo/bar.nonext/?download', None), None)
+        self.assertEqual(determine_ext('http://example.com/foo/bar/mp4?download', None), None)
+        self.assertEqual(determine_ext('http://example.com/foo/bar.m3u8//?download'), 'm3u8')
 
     def test_find_xpath_attr(self):
         testxml = '''<root>
@@ -235,12 +281,21 @@ class TestUtil(unittest.TestCase):
             <node x="a"/>
             <node x="a" y="c" />
             <node x="b" y="d" />
+            <node x="" />
         </root>'''
-        doc = xml.etree.ElementTree.fromstring(testxml)
+        doc = compat_etree_fromstring(testxml)
 
+        self.assertEqual(find_xpath_attr(doc, './/fourohfour', 'n'), None)
         self.assertEqual(find_xpath_attr(doc, './/fourohfour', 'n', 'v'), None)
+        self.assertEqual(find_xpath_attr(doc, './/node', 'n'), None)
+        self.assertEqual(find_xpath_attr(doc, './/node', 'n', 'v'), None)
+        self.assertEqual(find_xpath_attr(doc, './/node', 'x'), doc[1])
         self.assertEqual(find_xpath_attr(doc, './/node', 'x', 'a'), doc[1])
+        self.assertEqual(find_xpath_attr(doc, './/node', 'x', 'b'), doc[3])
+        self.assertEqual(find_xpath_attr(doc, './/node', 'y'), doc[2])
         self.assertEqual(find_xpath_attr(doc, './/node', 'y', 'c'), doc[2])
+        self.assertEqual(find_xpath_attr(doc, './/node', 'y', 'd'), doc[3])
+        self.assertEqual(find_xpath_attr(doc, './/node', 'x', ''), doc[4])
 
     def test_xpath_with_ns(self):
         testxml = '''<root xmlns:media="http://example.com/">
@@ -249,23 +304,56 @@ class TestUtil(unittest.TestCase):
                 <url>http://server.com/download.mp3</url>
             </media:song>
         </root>'''
-        doc = xml.etree.ElementTree.fromstring(testxml)
+        doc = compat_etree_fromstring(testxml)
         find = lambda p: doc.find(xpath_with_ns(p, {'media': 'http://example.com/'}))
         self.assertTrue(find('media:song') is not None)
         self.assertEqual(find('media:song/media:author').text, 'The Author')
         self.assertEqual(find('media:song/url').text, 'http://server.com/download.mp3')
 
+    def test_xpath_element(self):
+        doc = xml.etree.ElementTree.Element('root')
+        div = xml.etree.ElementTree.SubElement(doc, 'div')
+        p = xml.etree.ElementTree.SubElement(div, 'p')
+        p.text = 'Foo'
+        self.assertEqual(xpath_element(doc, 'div/p'), p)
+        self.assertEqual(xpath_element(doc, ['div/p']), p)
+        self.assertEqual(xpath_element(doc, ['div/bar', 'div/p']), p)
+        self.assertEqual(xpath_element(doc, 'div/bar', default='default'), 'default')
+        self.assertEqual(xpath_element(doc, ['div/bar'], default='default'), 'default')
+        self.assertTrue(xpath_element(doc, 'div/bar') is None)
+        self.assertTrue(xpath_element(doc, ['div/bar']) is None)
+        self.assertTrue(xpath_element(doc, ['div/bar'], 'div/baz') is None)
+        self.assertRaises(ExtractorError, xpath_element, doc, 'div/bar', fatal=True)
+        self.assertRaises(ExtractorError, xpath_element, doc, ['div/bar'], fatal=True)
+        self.assertRaises(ExtractorError, xpath_element, doc, ['div/bar', 'div/baz'], fatal=True)
+
     def test_xpath_text(self):
         testxml = '''<root>
             <div>
                 <p>Foo</p>
             </div>
         </root>'''
-        doc = xml.etree.ElementTree.fromstring(testxml)
+        doc = compat_etree_fromstring(testxml)
         self.assertEqual(xpath_text(doc, 'div/p'), 'Foo')
+        self.assertEqual(xpath_text(doc, 'div/bar', default='default'), 'default')
         self.assertTrue(xpath_text(doc, 'div/bar') is None)
         self.assertRaises(ExtractorError, xpath_text, doc, 'div/bar', fatal=True)
 
+    def test_xpath_attr(self):
+        testxml = '''<root>
+            <div>
+                <p x="a">Foo</p>
+            </div>
+        </root>'''
+        doc = compat_etree_fromstring(testxml)
+        self.assertEqual(xpath_attr(doc, 'div/p', 'x'), 'a')
+        self.assertEqual(xpath_attr(doc, 'div/bar', 'x'), None)
+        self.assertEqual(xpath_attr(doc, 'div/p', 'y'), None)
+        self.assertEqual(xpath_attr(doc, 'div/bar', 'x', default='default'), 'default')
+        self.assertEqual(xpath_attr(doc, 'div/p', 'y', default='default'), 'default')
+        self.assertRaises(ExtractorError, xpath_attr, doc, 'div/bar', 'x', fatal=True)
+        self.assertRaises(ExtractorError, xpath_attr, doc, 'div/p', 'y', fatal=True)
+
     def test_smuggle_url(self):
         data = {"ö": "ö", "abc": [3]}
         url = 'https://foo.bar/baz?x=y#a'
@@ -325,6 +413,7 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(parse_duration('01:02:03:04'), 93784)
         self.assertEqual(parse_duration('1 hour 3 minutes'), 3780)
         self.assertEqual(parse_duration('87 Min.'), 5220)
+        self.assertEqual(parse_duration('PT1H0.040S'), 3600.04)
 
     def test_fix_xml_ampersands(self):
         self.assertEqual(
@@ -380,11 +469,73 @@ class TestUtil(unittest.TestCase):
         data = urlencode_postdata({'username': 'foo@bar.com', 'password': '1234'})
         self.assertTrue(isinstance(data, bytes))
 
+    def test_update_url_query(self):
+        def query_dict(url):
+            return compat_parse_qs(compat_urlparse.urlparse(url).query)
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path', {'quality': ['HD'], 'format': ['mp4']})),
+            query_dict('http://example.com/path?quality=HD&format=mp4'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path', {'system': ['LINUX', 'WINDOWS']})),
+            query_dict('http://example.com/path?system=LINUX&system=WINDOWS'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path', {'fields': 'id,formats,subtitles'})),
+            query_dict('http://example.com/path?fields=id,formats,subtitles'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path', {'fields': ('id,formats,subtitles', 'thumbnails')})),
+            query_dict('http://example.com/path?fields=id,formats,subtitles&fields=thumbnails'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path?manifest=f4m', {'manifest': []})),
+            query_dict('http://example.com/path'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path?system=LINUX&system=WINDOWS', {'system': 'LINUX'})),
+            query_dict('http://example.com/path?system=LINUX'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path', {'fields': b'id,formats,subtitles'})),
+            query_dict('http://example.com/path?fields=id,formats,subtitles'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path', {'width': 1080, 'height': 720})),
+            query_dict('http://example.com/path?width=1080&height=720'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path', {'bitrate': 5020.43})),
+            query_dict('http://example.com/path?bitrate=5020.43'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path', {'test': '第二行тест'})),
+            query_dict('http://example.com/path?test=%E7%AC%AC%E4%BA%8C%E8%A1%8C%D1%82%D0%B5%D1%81%D1%82'))
+
+    def test_dict_get(self):
+        FALSE_VALUES = {
+            'none': None,
+            'false': False,
+            'zero': 0,
+            'empty_string': '',
+            'empty_list': [],
+        }
+        d = FALSE_VALUES.copy()
+        d['a'] = 42
+        self.assertEqual(dict_get(d, 'a'), 42)
+        self.assertEqual(dict_get(d, 'b'), None)
+        self.assertEqual(dict_get(d, 'b', 42), 42)
+        self.assertEqual(dict_get(d, ('a', )), 42)
+        self.assertEqual(dict_get(d, ('b', 'a', )), 42)
+        self.assertEqual(dict_get(d, ('b', 'c', 'a', 'd', )), 42)
+        self.assertEqual(dict_get(d, ('b', 'c', )), None)
+        self.assertEqual(dict_get(d, ('b', 'c', ), 42), 42)
+        for key, false_value in FALSE_VALUES.items():
+            self.assertEqual(dict_get(d, ('b', 'c', key, )), None)
+            self.assertEqual(dict_get(d, ('b', 'c', key, ), skip_false_values=False), false_value)
+
+    def test_encode_compat_str(self):
+        self.assertEqual(encode_compat_str(b'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82', 'utf-8'), 'тест')
+        self.assertEqual(encode_compat_str('тест', 'utf-8'), 'тест')
+
     def test_parse_iso8601(self):
         self.assertEqual(parse_iso8601('2014-03-23T23:04:26+0100'), 1395612266)
         self.assertEqual(parse_iso8601('2014-03-23T22:04:26+0000'), 1395612266)
         self.assertEqual(parse_iso8601('2014-03-23T22:04:26Z'), 1395612266)
         self.assertEqual(parse_iso8601('2014-03-23T22:04:26.1234Z'), 1395612266)
+        self.assertEqual(parse_iso8601('2015-09-29T08:27:31.727'), 1443515251)
+        self.assertEqual(parse_iso8601('2015-09-29T08-27-31.727'), None)
 
     def test_strip_jsonp(self):
         stripped = strip_jsonp('cb ([ {"id":"532cb",\n\n\n"x":\n3}\n]\n);')
@@ -395,6 +546,10 @@ class TestUtil(unittest.TestCase):
         d = json.loads(stripped)
         self.assertEqual(d, {'STATUS': 'OK'})
 
+        stripped = strip_jsonp('ps.embedHandler({"status": "success"});')
+        d = json.loads(stripped)
+        self.assertEqual(d, {'status': 'success'})
+
     def test_uppercase_escape(self):
         self.assertEqual(uppercase_escape('aä'), 'aä')
         self.assertEqual(uppercase_escape('\\U0001d550'), '𝕐')
@@ -431,11 +586,11 @@ class TestUtil(unittest.TestCase):
         )
         self.assertEqual(
             escape_url('http://тест.рф/фрагмент'),
-            'http://тест.рф/%D1%84%D1%80%D0%B0%D0%B3%D0%BC%D0%B5%D0%BD%D1%82'
+            'http://xn--e1aybc.xn--p1ai/%D1%84%D1%80%D0%B0%D0%B3%D0%BC%D0%B5%D0%BD%D1%82'
         )
         self.assertEqual(
             escape_url('http://тест.рф/абв?абв=абв#абв'),
-            'http://тест.рф/%D0%B0%D0%B1%D0%B2?%D0%B0%D0%B1%D0%B2=%D0%B0%D0%B1%D0%B2#%D0%B0%D0%B1%D0%B2'
+            'http://xn--e1aybc.xn--p1ai/%D0%B0%D0%B1%D0%B2?%D0%B0%D0%B1%D0%B2=%D0%B0%D0%B1%D0%B2#%D0%B0%D0%B1%D0%B2'
         )
         self.assertEqual(escape_url('http://vimeo.com/56015672#at=0'), 'http://vimeo.com/56015672#at=0')
 
@@ -455,6 +610,9 @@ class TestUtil(unittest.TestCase):
             "playlist":[{"controls":{"all":null}}]
         }''')
 
+        inp = '''"The CW\\'s \\'Crazy Ex-Girlfriend\\'"'''
+        self.assertEqual(js_to_json(inp), '''"The CW's 'Crazy Ex-Girlfriend'"''')
+
         inp = '"SAND Number: SAND 2013-7800P\\nPresenter: Tom Russo\\nHabanero Software Training - Xyce Software\\nXyce, Sandia\\u0027s"'
         json_code = js_to_json(inp)
         self.assertEqual(json.loads(json_code), json.loads(inp))
@@ -482,6 +640,44 @@ class TestUtil(unittest.TestCase):
         on = js_to_json('{"abc": "def",}')
         self.assertEqual(json.loads(on), {'abc': 'def'})
 
+    def test_extract_attributes(self):
+        self.assertEqual(extract_attributes('<e x="y">'), {'x': 'y'})
+        self.assertEqual(extract_attributes("<e x='y'>"), {'x': 'y'})
+        self.assertEqual(extract_attributes('<e x=y>'), {'x': 'y'})
+        self.assertEqual(extract_attributes('<e x="a \'b\' c">'), {'x': "a 'b' c"})
+        self.assertEqual(extract_attributes('<e x=\'a "b" c\'>'), {'x': 'a "b" c'})
+        self.assertEqual(extract_attributes('<e x="&#121;">'), {'x': 'y'})
+        self.assertEqual(extract_attributes('<e x="&#x79;">'), {'x': 'y'})
+        self.assertEqual(extract_attributes('<e x="&amp;">'), {'x': '&'})  # XML
+        self.assertEqual(extract_attributes('<e x="&quot;">'), {'x': '"'})
+        self.assertEqual(extract_attributes('<e x="&pound;">'), {'x': '£'})  # HTML 3.2
+        self.assertEqual(extract_attributes('<e x="&lambda;">'), {'x': 'λ'})  # HTML 4.0
+        self.assertEqual(extract_attributes('<e x="&foo">'), {'x': '&foo'})
+        self.assertEqual(extract_attributes('<e x="\'">'), {'x': "'"})
+        self.assertEqual(extract_attributes('<e x=\'"\'>'), {'x': '"'})
+        self.assertEqual(extract_attributes('<e x >'), {'x': None})
+        self.assertEqual(extract_attributes('<e x=y a>'), {'x': 'y', 'a': None})
+        self.assertEqual(extract_attributes('<e x= y>'), {'x': 'y'})
+        self.assertEqual(extract_attributes('<e x=1 y=2 x=3>'), {'y': '2', 'x': '3'})
+        self.assertEqual(extract_attributes('<e \nx=\ny\n>'), {'x': 'y'})
+        self.assertEqual(extract_attributes('<e \nx=\n"y"\n>'), {'x': 'y'})
+        self.assertEqual(extract_attributes("<e \nx=\n'y'\n>"), {'x': 'y'})
+        self.assertEqual(extract_attributes('<e \nx="\ny\n">'), {'x': '\ny\n'})
+        self.assertEqual(extract_attributes('<e CAPS=x>'), {'caps': 'x'})  # Names lowercased
+        self.assertEqual(extract_attributes('<e x=1 X=2>'), {'x': '2'})
+        self.assertEqual(extract_attributes('<e X=1 x=2>'), {'x': '2'})
+        self.assertEqual(extract_attributes('<e _:funny-name1=1>'), {'_:funny-name1': '1'})
+        self.assertEqual(extract_attributes('<e x="Fáilte 世界 \U0001f600">'), {'x': 'Fáilte 世界 \U0001f600'})
+        self.assertEqual(extract_attributes('<e x="décompose&#769;">'), {'x': 'décompose\u0301'})
+        # "Narrow" Python builds don't support unicode code points outside BMP.
+        try:
+            compat_chr(0x10000)
+            supports_outside_bmp = True
+        except ValueError:
+            supports_outside_bmp = False
+        if supports_outside_bmp:
+            self.assertEqual(extract_attributes('<e x="Smile &#128512;!">'), {'x': 'Smile \U0001f600!'})
+
     def test_clean_html(self):
         self.assertEqual(clean_html('a:\nb'), 'a: b')
         self.assertEqual(clean_html('a:\n   "b"'), 'a:    "b"')
@@ -507,6 +703,17 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(parse_filesize('1.2Tb'), 1200000000000)
         self.assertEqual(parse_filesize('1,24 KB'), 1240)
 
+    def test_parse_count(self):
+        self.assertEqual(parse_count(None), None)
+        self.assertEqual(parse_count(''), None)
+        self.assertEqual(parse_count('0'), 0)
+        self.assertEqual(parse_count('1000'), 1000)
+        self.assertEqual(parse_count('1.000'), 1000)
+        self.assertEqual(parse_count('1.1k'), 1100)
+        self.assertEqual(parse_count('1.1kk'), 1100000)
+        self.assertEqual(parse_count('1.1kk '), 1100000)
+        self.assertEqual(parse_count('1.1kk views'), 1100000)
+
     def test_version_tuple(self):
         self.assertEqual(version_tuple('1'), (1,))
         self.assertEqual(version_tuple('10.23.344'), (10, 23, 344))
@@ -587,12 +794,13 @@ ffmpeg version 2.4.4 Copyright (c) 2000-2014 the FFmpeg ...'''), '2.4.4')
             {'like_count': 190, 'dislike_count': 10}))
 
     def test_parse_dfxp_time_expr(self):
-        self.assertEqual(parse_dfxp_time_expr(None), 0.0)
-        self.assertEqual(parse_dfxp_time_expr(''), 0.0)
+        self.assertEqual(parse_dfxp_time_expr(None), None)
+        self.assertEqual(parse_dfxp_time_expr(''), None)
         self.assertEqual(parse_dfxp_time_expr('0.1'), 0.1)
         self.assertEqual(parse_dfxp_time_expr('0.1s'), 0.1)
         self.assertEqual(parse_dfxp_time_expr('00:00:01'), 1.0)
         self.assertEqual(parse_dfxp_time_expr('00:00:01.100'), 1.1)
+        self.assertEqual(parse_dfxp_time_expr('00:00:01:100'), 1.1)
 
     def test_dfxp2srt(self):
         dfxp_data = '''<?xml version="1.0" encoding="UTF-8"?>
@@ -602,6 +810,9 @@ ffmpeg version 2.4.4 Copyright (c) 2000-2014 the FFmpeg ...'''), '2.4.4')
                     <p begin="0" end="1">The following line contains Chinese characters and special symbols</p>
                     <p begin="1" end="2">第二行<br/>♪♪</p>
                     <p begin="2" dur="1"><span>Third<br/>Line</span></p>
+                    <p begin="3" end="-1">Lines with invalid timestamps are ignored</p>
+                    <p begin="-1" end="-1">Ignore, two</p>
+                    <p begin="3" dur="-1">Ignored, three</p>
                 </div>
             </body>
             </tt>'''
@@ -637,6 +848,69 @@ The first line
 '''
         self.assertEqual(dfxp2srt(dfxp_data_no_default_namespace), srt_data)
 
+    def test_cli_option(self):
+        self.assertEqual(cli_option({'proxy': '127.0.0.1:3128'}, '--proxy', 'proxy'), ['--proxy', '127.0.0.1:3128'])
+        self.assertEqual(cli_option({'proxy': None}, '--proxy', 'proxy'), [])
+        self.assertEqual(cli_option({}, '--proxy', 'proxy'), [])
+
+    def test_cli_valueless_option(self):
+        self.assertEqual(cli_valueless_option(
+            {'downloader': 'external'}, '--external-downloader', 'downloader', 'external'), ['--external-downloader'])
+        self.assertEqual(cli_valueless_option(
+            {'downloader': 'internal'}, '--external-downloader', 'downloader', 'external'), [])
+        self.assertEqual(cli_valueless_option(
+            {'nocheckcertificate': True}, '--no-check-certificate', 'nocheckcertificate'), ['--no-check-certificate'])
+        self.assertEqual(cli_valueless_option(
+            {'nocheckcertificate': False}, '--no-check-certificate', 'nocheckcertificate'), [])
+        self.assertEqual(cli_valueless_option(
+            {'checkcertificate': True}, '--no-check-certificate', 'checkcertificate', False), [])
+        self.assertEqual(cli_valueless_option(
+            {'checkcertificate': False}, '--no-check-certificate', 'checkcertificate', False), ['--no-check-certificate'])
+
+    def test_cli_bool_option(self):
+        self.assertEqual(
+            cli_bool_option(
+                {'nocheckcertificate': True}, '--no-check-certificate', 'nocheckcertificate'),
+            ['--no-check-certificate', 'true'])
+        self.assertEqual(
+            cli_bool_option(
+                {'nocheckcertificate': True}, '--no-check-certificate', 'nocheckcertificate', separator='='),
+            ['--no-check-certificate=true'])
+        self.assertEqual(
+            cli_bool_option(
+                {'nocheckcertificate': True}, '--check-certificate', 'nocheckcertificate', 'false', 'true'),
+            ['--check-certificate', 'false'])
+        self.assertEqual(
+            cli_bool_option(
+                {'nocheckcertificate': True}, '--check-certificate', 'nocheckcertificate', 'false', 'true', '='),
+            ['--check-certificate=false'])
+        self.assertEqual(
+            cli_bool_option(
+                {'nocheckcertificate': False}, '--check-certificate', 'nocheckcertificate', 'false', 'true'),
+            ['--check-certificate', 'true'])
+        self.assertEqual(
+            cli_bool_option(
+                {'nocheckcertificate': False}, '--check-certificate', 'nocheckcertificate', 'false', 'true', '='),
+            ['--check-certificate=true'])
+
+    def test_ohdave_rsa_encrypt(self):
+        N = 0xab86b6371b5318aaa1d3c9e612a9f1264f372323c8c0f19875b5fc3b3fd3afcc1e5bec527aa94bfa85bffc157e4245aebda05389a5357b75115ac94f074aefcd
+        e = 65537
+
+        self.assertEqual(
+            ohdave_rsa_encrypt(b'aa111222', e, N),
+            '726664bd9a23fd0c70f9f1b84aab5e3905ce1e45a584e9cbcf9bcc7510338fc1986d6c599ff990d923aa43c51c0d9013cd572e13bc58f4ae48f2ed8c0b0ba881')
+
+    def test_encode_base_n(self):
+        self.assertEqual(encode_base_n(0, 30), '0')
+        self.assertEqual(encode_base_n(80, 30), '2k')
+
+        custom_table = '9876543210ZYXWVUTSRQPONMLKJIHGFEDCBA'
+        self.assertEqual(encode_base_n(0, 30, custom_table), '9')
+        self.assertEqual(encode_base_n(80, 30, custom_table), '7P')
+
+        self.assertRaises(ValueError, encode_base_n, 0, 70)
+        self.assertRaises(ValueError, encode_base_n, 0, 60, custom_table)
 
 if __name__ == '__main__':
     unittest.main()
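
Among the utilities covered above, update_url_query has the richest semantics: list values become repeated parameters, a key mapped to an empty list is removed, and existing parameters are overridden. A hedged sketch of an implementation with that behavior, assuming the compat query helpers shown elsewhere in these diffs:

    from youtube_dl.compat import (
        compat_parse_qs,
        compat_urllib_parse_urlencode,
        compat_urlparse,
    )

    def update_url_query_sketch(url, query):
        parsed = compat_urlparse.urlparse(url)
        qs = compat_parse_qs(parsed.query)
        qs.update(query)  # keys mapped to [] drop out when re-encoded below
        return compat_urlparse.urlunparse(parsed._replace(
            query=compat_urllib_parse_urlencode(qs, True)))  # True = doseq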
index 780636c7730d396c381fd45fd6d4e8126d1c9fe2..8de08f2d6d3974bd2d28265c323e7ff76d1317a3 100644 (file)
@@ -33,7 +33,7 @@ params = get_params({
 
 
 TEST_ID = 'gr51aVj-mLg'
-ANNOTATIONS_FILE = TEST_ID + '.flv.annotations.xml'
+ANNOTATIONS_FILE = TEST_ID + '.annotations.xml'
 EXPECTED_ANNOTATIONS = ['Speech bubble', 'Note', 'Title', 'Spotlight', 'Label']
 
 
@@ -66,7 +66,7 @@ class TestAnnotations(unittest.TestCase):
                 textTag = a.find('TEXT')
                 text = textTag.text
                 self.assertTrue(text in expected)  # assertIn only added in python 2.7
-                # remove the first occurance, there could be more than one annotation with the same text
+                # remove the first occurrence, there could be more than one annotation with the same text
                 expected.remove(text)
         # We should have seen (and removed) all the expected annotation texts.
         self.assertEqual(len(expected), 0, 'Not all expected annotations were found.')
index c889b6f15c40f5ea91a4dd4ea5a86bb8c62c830d..af1c454217d0bec66a27a1bdc89c02195bb6274f 100644 (file)
@@ -34,7 +34,7 @@ class TestYoutubeLists(unittest.TestCase):
         ie = YoutubePlaylistIE(dl)
         # TODO find a > 100 (paginating?) videos course
         result = ie.extract('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
-        entries = result['entries']
+        entries = list(result['entries'])
         self.assertEqual(YoutubeIE().extract_id(entries[0]['url']), 'j9WZyLZCBzs')
         self.assertEqual(len(entries), 25)
         self.assertEqual(YoutubeIE().extract_id(entries[-1]['url']), 'rYefUsYuEp0')
@@ -44,7 +44,7 @@ class TestYoutubeLists(unittest.TestCase):
         ie = YoutubePlaylistIE(dl)
         result = ie.extract('https://www.youtube.com/watch?v=W01L70IGBgE&index=2&list=RDOQpdSVF_k_w')
         entries = result['entries']
-        self.assertTrue(len(entries) >= 20)
+        self.assertTrue(len(entries) >= 50)
         original_video = entries[0]
         self.assertEqual(original_video['id'], 'OQpdSVF_k_w')
 
@@ -57,5 +57,14 @@ class TestYoutubeLists(unittest.TestCase):
         entries = result['entries']
         self.assertEqual(len(entries), 100)
 
+    def test_youtube_flat_playlist_titles(self):
+        dl = FakeYDL()
+        dl.params['extract_flat'] = True
+        ie = YoutubePlaylistIE(dl)
+        result = ie.extract('https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')
+        self.assertIsPlaylist(result)
+        for entry in result['entries']:
+            self.assertTrue(entry.get('title'))
+
 if __name__ == '__main__':
     unittest.main()
diff --git a/test/versions.json b/test/versions.json
new file mode 100644 (file)
index 0000000..6cccc22
--- /dev/null
@@ -0,0 +1,34 @@
+{
+    "latest": "2013.01.06", 
+    "signature": "72158cdba391628569ffdbea259afbcf279bbe3d8aeb7492690735dc1cfa6afa754f55c61196f3871d429599ab22f2667f1fec98865527b32632e7f4b3675a7ef0f0fbe084d359256ae4bba68f0d33854e531a70754712f244be71d4b92e664302aa99653ee4df19800d955b6c4149cd2b3f24288d6e4b40b16126e01f4c8ce6", 
+    "versions": {
+        "2013.01.02": {
+            "bin": [
+                "http://youtube-dl.org/downloads/2013.01.02/youtube-dl", 
+                "f5b502f8aaa77675c4884938b1e4871ebca2611813a0c0e74f60c0fbd6dcca6b"
+            ], 
+            "exe": [
+                "http://youtube-dl.org/downloads/2013.01.02/youtube-dl.exe", 
+                "75fa89d2ce297d102ff27675aa9d92545bbc91013f52ec52868c069f4f9f0422"
+            ], 
+            "tar": [
+                "http://youtube-dl.org/downloads/2013.01.02/youtube-dl-2013.01.02.tar.gz", 
+                "6a66d022ac8e1c13da284036288a133ec8dba003b7bd3a5179d0c0daca8c8196"
+            ]
+        }, 
+        "2013.01.06": {
+            "bin": [
+                "http://youtube-dl.org/downloads/2013.01.06/youtube-dl", 
+                "64b6ed8865735c6302e836d4d832577321b4519aa02640dc508580c1ee824049"
+            ], 
+            "exe": [
+                "http://youtube-dl.org/downloads/2013.01.06/youtube-dl.exe", 
+                "58609baf91e4389d36e3ba586e21dab882daaaee537e4448b1265392ae86ff84"
+            ], 
+            "tar": [
+                "http://youtube-dl.org/downloads/2013.01.06/youtube-dl-2013.01.06.tar.gz", 
+                "fe77ab20a95d980ed17a659aa67e371fdd4d656d19c4c7950e7b720b0c2f1a86"
+            ]
+        }
+    }
+}
\ No newline at end of file
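
The fixture mirrors the manifest the self-updater consumes: a latest pointer, an RSA signature over the versions blob, and per-version (URL, SHA-256) pairs keyed by distribution type (bin/exe/tar). A rough sketch of how a client might pick and check a download from it — the file path and the 'bin' key are illustrative assumptions:

    import hashlib
    import json

    with open('test/versions.json') as f:
        manifest = json.load(f)
    url, expected_sha256 = manifest['versions'][manifest['latest']]['bin']

    def hash_matches(downloaded_bytes):
        # Compare the SHA-256 of the downloaded payload against the manifest.
        return hashlib.sha256(downloaded_bytes).hexdigest() == expected_sha256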
diff --git a/tox.ini b/tox.ini
index cd805fe8ac27481937a1000a5a37412ff4f0d923..2d71340050bf8f8a971acb3931621f62ded02176 100644 (file)
--- a/tox.ini
+++ b/tox.ini
@@ -1,5 +1,5 @@
 [tox]
-envlist = py26,py27,py33,py34
+envlist = py26,py27,py33,py34,py35
 [testenv]
 deps =
    nose
@@ -8,6 +8,6 @@ deps =
 passenv = HOME
 defaultargs = test --exclude test_download.py --exclude test_age_restriction.py
     --exclude test_subtitles.py --exclude test_write_annotations.py
-    --exclude test_youtube_lists.py
+    --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
 commands = nosetests --verbose {posargs:{[testenv]defaultargs}}  # --with-coverage --cover-package=youtube_dl --cover-html
                                                # test.test_download:TestDownload.test_NowVideo
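
With py35 added to the envlist, a single interpreter can be exercised in isolation, and posargs can override defaultargs to run one module despite the default exclusions, e.g.:

    tox -e py35
    tox -e py27 -- test.test_utils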
index 702a6ad50b6c6bf2d3f3bfbd8c873cb3a64c8e7b..0554333629b829a2a6cb807546a643713cbd0ad5 100755 (executable)
@@ -21,11 +21,9 @@ import subprocess
 import socket
 import sys
 import time
+import tokenize
 import traceback
 
-if os.name == 'nt':
-    import ctypes
-
 from .compat import (
     compat_basestring,
     compat_cookiejar,
@@ -33,36 +31,46 @@ from .compat import (
     compat_get_terminal_size,
     compat_http_client,
     compat_kwargs,
+    compat_os_name,
     compat_str,
+    compat_tokenize_tokenize,
     compat_urllib_error,
     compat_urllib_request,
+    compat_urllib_request_DataHandler,
 )
 from .utils import (
-    escape_url,
+    age_restricted,
+    args_to_str,
     ContentTooShortError,
     date_from_str,
     DateRange,
     DEFAULT_OUTTMPL,
     determine_ext,
+    determine_protocol,
     DownloadError,
+    encode_compat_str,
     encodeFilename,
+    error_to_compat_str,
     ExtractorError,
     format_bytes,
     formatSeconds,
-    HEADRequest,
     locked_file,
     make_HTTPS_handler,
     MaxDownloadsReached,
     PagedList,
     parse_filesize,
     PerRequestProxyHandler,
-    PostProcessingError,
     platform_name,
+    PostProcessingError,
     preferredencoding,
+    prepend_extension,
     render_table,
+    replace_extension,
     SameFileError,
     sanitize_filename,
     sanitize_path,
+    sanitize_url,
+    sanitized_Request,
     std_headers,
     subtitles_filename,
     UnavailableVideoError,
@@ -70,17 +78,15 @@ from .utils import (
     version_tuple,
     write_json_file,
     write_string,
+    YoutubeDLCookieProcessor,
     YoutubeDLHandler,
-    prepend_extension,
-    replace_extension,
-    args_to_str,
-    age_restricted,
 )
 from .cache import Cache
-from .extractor import get_info_extractor, gen_extractors
+from .extractor import get_info_extractor, gen_extractor_classes, _LAZY_LOADER
 from .downloader import get_suitable_downloader
 from .downloader.rtmp import rtmpdump_version
 from .postprocessor import (
+    FFmpegFixupM3u8PP,
     FFmpegFixupM4aPP,
     FFmpegFixupStretchedPP,
     FFmpegMergerPP,
@@ -89,6 +95,9 @@ from .postprocessor import (
 )
 from .version import __version__
 
+if compat_os_name == 'nt':
+    import ctypes
+
 
 class YoutubeDL(object):
     """YoutubeDL class.
@@ -155,7 +164,7 @@ class YoutubeDL(object):
     writethumbnail:    Write the thumbnail image to a file
     write_all_thumbnails:  Write all thumbnail formats to files
     writesubtitles:    Write the video subtitles to a file
-    writeautomaticsub: Write the automatic subtitles to a file
+    writeautomaticsub: Write the automatically generated subtitles to a file
     allsubtitles:      Downloads all the subtitles of the video
                        (requires writesubtitles or writeautomaticsub)
     listsubtitles:     Lists all available subtitles for the video
@@ -251,13 +260,15 @@ class YoutubeDL(object):
     The following options determine which downloader is picked:
     external_downloader: Executable of the external downloader to call.
                        None or unset for standard (built-in) downloader.
-    hls_prefer_native: Use the native HLS downloader instead of ffmpeg/avconv.
+    hls_prefer_native: Use the native HLS downloader instead of ffmpeg/avconv
+                       if True; use ffmpeg/avconv if False; if None, use the
+                       downloader suggested by the extractor.
 
     The following parameters are not used by YoutubeDL itself, they are used by
     the downloader (see youtube_dl/downloader/common.py):
     nopart, updatetime, buffersize, ratelimit, min_filesize, max_filesize, test,
     noresizebuffer, retries, continuedl, noprogress, consoletitle,
-    xattr_set_filesize, external_downloader_args.
+    xattr_set_filesize, external_downloader_args, hls_use_mpegts.
 
     The following options are used by the post processors:
     prefer_ffmpeg:     If True, use ffmpeg instead of avconv if both are available,
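
As a usage sketch, the tri-state option reads naturally at the call site (the parameter value here is illustrative):

    from youtube_dl import YoutubeDL

    # True forces the native HLS downloader; False prefers ffmpeg/avconv;
    # None (the default) defers to whatever the extractor suggests.
    ydl = YoutubeDL({'hls_prefer_native': True})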
@@ -285,7 +296,11 @@ class YoutubeDL(object):
         self._num_downloads = 0
         self._screen_file = [sys.stdout, sys.stderr][params.get('logtostderr', False)]
         self._err_file = sys.stderr
-        self.params = params
+        self.params = {
+            # Default parameters
+            'nocheckcertificate': False,
+        }
+        self.params.update(params)
         self.cache = Cache(self)
 
         if params.get('bidi_workaround', False):
@@ -365,8 +380,9 @@ class YoutubeDL(object):
     def add_info_extractor(self, ie):
         """Add an InfoExtractor object to the end of the list."""
         self._ies.append(ie)
-        self._ies_instances[ie.ie_key()] = ie
-        ie.set_downloader(self)
+        if not isinstance(ie, type):
+            self._ies_instances[ie.ie_key()] = ie
+            ie.set_downloader(self)
 
     def get_info_extractor(self, ie_key):
         """
@@ -384,7 +400,7 @@ class YoutubeDL(object):
         """
         Add the InfoExtractors returned by gen_extractors to the end of the list
         """
-        for ie in gen_extractors():
+        for ie in gen_extractor_classes():
             self.add_info_extractor(ie)
 
     def add_post_processor(self, pp):
@@ -440,7 +456,7 @@ class YoutubeDL(object):
     def to_console_title(self, message):
         if not self.params.get('consoletitle', False):
             return
-        if os.name == 'nt' and ctypes.windll.kernel32.GetConsoleWindow():
+        if compat_os_name == 'nt' and ctypes.windll.kernel32.GetConsoleWindow():
             # c_wchar_p() might not be necessary if `message` is
             # already of type unicode()
             ctypes.windll.kernel32.SetConsoleTitleW(ctypes.c_wchar_p(message))
@@ -488,7 +504,7 @@ class YoutubeDL(object):
                     tb = ''
                     if hasattr(sys.exc_info()[1], 'exc_info') and sys.exc_info()[1].exc_info[0]:
                         tb += ''.join(traceback.format_exception(*sys.exc_info()[1].exc_info))
-                    tb += compat_str(traceback.format_exc())
+                    tb += encode_compat_str(traceback.format_exc())
                 else:
                     tb_data = traceback.format_list(traceback.extract_stack())
                     tb = ''.join(tb_data)
@@ -511,7 +527,7 @@ class YoutubeDL(object):
         else:
             if self.params.get('no_warnings'):
                 return
-            if not self.params.get('no_color') and self._err_file.isatty() and os.name != 'nt':
+            if not self.params.get('no_color') and self._err_file.isatty() and compat_os_name != 'nt':
                 _msg_header = '\033[0;33mWARNING:\033[0m'
             else:
                 _msg_header = 'WARNING:'
@@ -523,7 +539,7 @@ class YoutubeDL(object):
         Do the same as trouble, but prefixes the message with 'ERROR:', colored
         in red if stderr is a tty file.
         '''
-        if not self.params.get('no_color') and self._err_file.isatty() and os.name != 'nt':
+        if not self.params.get('no_color') and self._err_file.isatty() and compat_os_name != 'nt':
             _msg_header = '\033[0;31mERROR:\033[0m'
         else:
             _msg_header = 'ERROR:'
@@ -556,7 +572,7 @@ class YoutubeDL(object):
                 elif template_dict.get('height'):
                     template_dict['resolution'] = '%sp' % template_dict['height']
                 elif template_dict.get('width'):
-                    template_dict['resolution'] = '?x%d' % template_dict['width']
+                    template_dict['resolution'] = '%dx?' % template_dict['width']
 
             sanitize = lambda k, v: sanitize_filename(
                 compat_str(v),
@@ -567,7 +583,7 @@ class YoutubeDL(object):
                                  if v is not None)
             template_dict = collections.defaultdict(lambda: 'NA', template_dict)
 
-            outtmpl = sanitize_path(self.params.get('outtmpl', DEFAULT_OUTTMPL))
+            outtmpl = self.params.get('outtmpl', DEFAULT_OUTTMPL)
             tmpl = compat_expanduser(outtmpl)
             filename = tmpl % template_dict
             # Temporary fix for #4787
@@ -575,7 +591,7 @@ class YoutubeDL(object):
             # to workaround encoding issues with subprocess on python2 @ Windows
             if sys.version_info < (3, 0) and sys.platform == 'win32':
                 filename = encodeFilename(filename, True).decode(preferredencoding())
-            return filename
+            return sanitize_path(filename)
         except ValueError as err:
             self.report_error('Error in output template: ' + str(err) + ' (encoding: ' + repr(preferredencoding()) + ')')
             return None
@@ -595,12 +611,12 @@ class YoutubeDL(object):
             if rejecttitle:
                 if re.search(rejecttitle, title, re.IGNORECASE):
                     return '"' + title + '" title matched reject pattern "' + rejecttitle + '"'
-        date = info_dict.get('upload_date', None)
+        date = info_dict.get('upload_date')
         if date is not None:
             dateRange = self.params.get('daterange', DateRange())
             if date not in dateRange:
                 return '%s upload date is not in range %s' % (date_from_str(date).isoformat(), dateRange)
-        view_count = info_dict.get('view_count', None)
+        view_count = info_dict.get('view_count')
         if view_count is not None:
             min_views = self.params.get('min_views')
             if min_views is not None and view_count < min_views:
@@ -648,6 +664,7 @@ class YoutubeDL(object):
             if not ie.suitable(url):
                 continue
 
+            ie = self.get_info_extractor(ie.ie_key())
             if not ie.working():
                 self.report_warning('The program functionality for this site has been marked as broken, '
                                     'and will probably not work.')
@@ -667,14 +684,14 @@ class YoutubeDL(object):
                     return self.process_ie_result(ie_result, download, extra_info)
                 else:
                     return ie_result
-            except ExtractorError as de:  # An error we somewhat expected
-                self.report_error(compat_str(de), de.format_traceback())
+            except ExtractorError as e:  # An error we somewhat expected
+                self.report_error(compat_str(e), e.format_traceback())
                 break
             except MaxDownloadsReached:
                 raise
             except Exception as e:
                 if self.params.get('ignoreerrors', False):
-                    self.report_error(compat_str(e), tb=compat_str(traceback.format_exc()))
+                    self.report_error(error_to_compat_str(e), tb=encode_compat_str(traceback.format_exc()))
                     break
                 else:
                     raise
@@ -697,7 +714,6 @@ class YoutubeDL(object):
         It will also download the videos if 'download'.
         Returns the resolved ie_result.
         """
-
         result_type = ie_result.get('_type', 'video')
 
         if result_type in ('url', 'url_transparent'):
@@ -726,7 +742,7 @@ class YoutubeDL(object):
 
             force_properties = dict(
                 (k, v) for k, v in ie_result.items() if v is not None)
-            for f in ('_type', 'url'):
+            for f in ('_type', 'url', 'ie_key'):
                 if f in force_properties:
                     del force_properties[f]
             new_result = info.copy()
@@ -738,18 +754,18 @@ class YoutubeDL(object):
                 new_result, download=download, extra_info=extra_info)
         elif result_type == 'playlist' or result_type == 'multi_video':
             # We process each entry in the playlist
-            playlist = ie_result.get('title', None) or ie_result.get('id', None)
+            playlist = ie_result.get('title') or ie_result.get('id')
             self.to_screen('[download] Downloading playlist: %s' % playlist)
 
             playlist_results = []
 
             playliststart = self.params.get('playliststart', 1) - 1
-            playlistend = self.params.get('playlistend', None)
+            playlistend = self.params.get('playlistend')
             # For backwards compatibility, interpret -1 as whole list
             if playlistend == -1:
                 playlistend = None
 
-            playlistitems_str = self.params.get('playlist_items', None)
+            playlistitems_str = self.params.get('playlist_items')
             playlistitems = None
             if playlistitems_str is not None:
                 def iter_playlistitems(format):
@@ -773,7 +789,7 @@ class YoutubeDL(object):
                     entries = ie_entries[playliststart:playlistend]
                 n_entries = len(entries)
                 self.to_screen(
-                    "[%s] playlist %s: Collected %d video ids (downloading %d of them)" %
+                    '[%s] playlist %s: Collected %d video ids (downloading %d of them)' %
                     (ie_result['extractor'], playlist, n_all_entries, n_entries))
             elif isinstance(ie_entries, PagedList):
                 if playlistitems:
@@ -787,7 +803,7 @@ class YoutubeDL(object):
                         playliststart, playlistend)
                 n_entries = len(entries)
                 self.to_screen(
-                    "[%s] playlist %s: Downloading %d videos" %
+                    '[%s] playlist %s: Downloading %d videos' %
                     (ie_result['extractor'], playlist, n_entries))
             else:  # iterable
                 if playlistitems:
@@ -798,7 +814,7 @@ class YoutubeDL(object):
                         ie_entries, playliststart, playlistend))
                 n_entries = len(entries)
                 self.to_screen(
-                    "[%s] playlist %s: Downloading %d videos" %
+                    '[%s] playlist %s: Downloading %d videos' %
                     (ie_result['extractor'], playlist, n_entries))
 
             if self.params.get('playlistreverse', False):
@@ -828,6 +844,7 @@ class YoutubeDL(object):
                                                       extra_info=extra)
                 playlist_results.append(entry_result)
             ie_result['entries'] = playlist_results
+            self.to_screen('[download] Finished downloading playlist: %s' % playlist)
             return ie_result
         elif result_type == 'compat_list':
             self.report_warning(
@@ -853,8 +870,8 @@ class YoutubeDL(object):
         else:
             raise Exception('Invalid result type: %s' % result_type)
 
-    def _apply_format_filter(self, format_spec, available_formats):
-        " Returns a tuple of the remaining format_spec and filtered formats "
+    def _build_format_filter(self, filter_spec):
+        " Returns a function to filter the formats according to the filter_spec "
 
         OPERATORS = {
             '<': operator.lt,
@@ -864,13 +881,13 @@ class YoutubeDL(object):
             '=': operator.eq,
             '!=': operator.ne,
         }
-        operator_rex = re.compile(r'''(?x)\s*\[
+        operator_rex = re.compile(r'''(?x)\s*
             (?P<key>width|height|tbr|abr|vbr|asr|filesize|fps)
             \s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
             (?P<value>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)
-            \]$
+            $
             ''' % '|'.join(map(re.escape, OPERATORS.keys())))
-        m = operator_rex.search(format_spec)
+        m = operator_rex.search(filter_spec)
         if m:
             try:
                 comparison_value = int(m.group('value'))
@@ -881,93 +898,300 @@ class YoutubeDL(object):
                 if comparison_value is None:
                     raise ValueError(
                         'Invalid value %r in format specification %r' % (
-                            m.group('value'), format_spec))
+                            m.group('value'), filter_spec))
             op = OPERATORS[m.group('op')]
 
         if not m:
             STR_OPERATORS = {
                 '=': operator.eq,
                 '!=': operator.ne,
+                '^=': lambda attr, value: attr.startswith(value),
+                '$=': lambda attr, value: attr.endswith(value),
+                '*=': lambda attr, value: value in attr,
             }
-            str_operator_rex = re.compile(r'''(?x)\s*\[
-                \s*(?P<key>ext|acodec|vcodec|container|protocol)
+            str_operator_rex = re.compile(r'''(?x)
+                \s*(?P<key>ext|acodec|vcodec|container|protocol|format_id)
                 \s*(?P<op>%s)(?P<none_inclusive>\s*\?)?
-                \s*(?P<value>[a-zA-Z0-9_-]+)
-                \s*\]$
+                \s*(?P<value>[a-zA-Z0-9._-]+)
+                \s*$
                 ''' % '|'.join(map(re.escape, STR_OPERATORS.keys())))
-            m = str_operator_rex.search(format_spec)
+            m = str_operator_rex.search(filter_spec)
             if m:
                 comparison_value = m.group('value')
                 op = STR_OPERATORS[m.group('op')]
 
         if not m:
-            raise ValueError('Invalid format specification %r' % format_spec)
+            raise ValueError('Invalid filter specification %r' % filter_spec)
 
         def _filter(f):
             actual_value = f.get(m.group('key'))
             if actual_value is None:
                 return m.group('none_inclusive')
             return op(actual_value, comparison_value)
-        new_formats = [f for f in available_formats if _filter(f)]
+        return _filter
+
+    def build_format_selector(self, format_spec):
+        def syntax_error(note, start):
+            message = (
+                'Invalid format specification: '
+                '{0}\n\t{1}\n\t{2}^'.format(note, format_spec, ' ' * start[1]))
+            return SyntaxError(message)
+
+        PICKFIRST = 'PICKFIRST'
+        MERGE = 'MERGE'
+        SINGLE = 'SINGLE'
+        GROUP = 'GROUP'
+        FormatSelector = collections.namedtuple('FormatSelector', ['type', 'selector', 'filters'])
+
+        def _parse_filter(tokens):
+            filter_parts = []
+            for type, string, start, _, _ in tokens:
+                if type == tokenize.OP and string == ']':
+                    return ''.join(filter_parts)
+                else:
+                    filter_parts.append(string)
+
+        def _remove_unused_ops(tokens):
+            # Remove operators that we don't use and join them with the surrounding strings
+            # for example: 'mp4' '-' 'baseline' '-' '16x9' is converted to 'mp4-baseline-16x9'
+            ALLOWED_OPS = ('/', '+', ',', '(', ')')
+            last_string, last_start, last_end, last_line = None, None, None, None
+            for type, string, start, end, line in tokens:
+                if type == tokenize.OP and string == '[':
+                    if last_string:
+                        yield tokenize.NAME, last_string, last_start, last_end, last_line
+                        last_string = None
+                    yield type, string, start, end, line
+                    # everything inside brackets will be handled by _parse_filter
+                    for type, string, start, end, line in tokens:
+                        yield type, string, start, end, line
+                        if type == tokenize.OP and string == ']':
+                            break
+                elif type == tokenize.OP and string in ALLOWED_OPS:
+                    if last_string:
+                        yield tokenize.NAME, last_string, last_start, last_end, last_line
+                        last_string = None
+                    yield type, string, start, end, line
+                elif type in [tokenize.NAME, tokenize.NUMBER, tokenize.OP]:
+                    if not last_string:
+                        last_string = string
+                        last_start = start
+                        last_end = end
+                    else:
+                        last_string += string
+            if last_string:
+                yield tokenize.NAME, last_string, last_start, last_end, last_line
+
+        def _parse_format_selection(tokens, inside_merge=False, inside_choice=False, inside_group=False):
+            selectors = []
+            current_selector = None
+            for type, string, start, _, _ in tokens:
+                # ENCODING is only defined in python 3.x
+                if type == getattr(tokenize, 'ENCODING', None):
+                    continue
+                elif type in [tokenize.NAME, tokenize.NUMBER]:
+                    current_selector = FormatSelector(SINGLE, string, [])
+                elif type == tokenize.OP:
+                    if string == ')':
+                        if not inside_group:
+                            # ')' will be handled by the parentheses group
+                            tokens.restore_last_token()
+                        break
+                    elif inside_merge and string in ['/', ',']:
+                        tokens.restore_last_token()
+                        break
+                    elif inside_choice and string == ',':
+                        tokens.restore_last_token()
+                        break
+                    elif string == ',':
+                        if not current_selector:
+                            raise syntax_error('"," must follow a format selector', start)
+                        selectors.append(current_selector)
+                        current_selector = None
+                    elif string == '/':
+                        if not current_selector:
+                            raise syntax_error('"/" must follow a format selector', start)
+                        first_choice = current_selector
+                        second_choice = _parse_format_selection(tokens, inside_choice=True)
+                        current_selector = FormatSelector(PICKFIRST, (first_choice, second_choice), [])
+                    elif string == '[':
+                        if not current_selector:
+                            current_selector = FormatSelector(SINGLE, 'best', [])
+                        format_filter = _parse_filter(tokens)
+                        current_selector.filters.append(format_filter)
+                    elif string == '(':
+                        if current_selector:
+                            raise syntax_error('Unexpected "("', start)
+                        group = _parse_format_selection(tokens, inside_group=True)
+                        current_selector = FormatSelector(GROUP, group, [])
+                    elif string == '+':
+                        video_selector = current_selector
+                        audio_selector = _parse_format_selection(tokens, inside_merge=True)
+                        if not video_selector or not audio_selector:
+                            raise syntax_error('"+" must be between two format selectors', start)
+                        current_selector = FormatSelector(MERGE, (video_selector, audio_selector), [])
+                    else:
+                        raise syntax_error('Operator not recognized: "{0}"'.format(string), start)
+                elif type == tokenize.ENDMARKER:
+                    break
+            if current_selector:
+                selectors.append(current_selector)
+            return selectors
+
+        def _build_selector_function(selector):
+            if isinstance(selector, list):
+                fs = [_build_selector_function(s) for s in selector]
+
+                def selector_function(formats):
+                    for f in fs:
+                        for format in f(formats):
+                            yield format
+                return selector_function
+            elif selector.type == GROUP:
+                selector_function = _build_selector_function(selector.selector)
+            elif selector.type == PICKFIRST:
+                fs = [_build_selector_function(s) for s in selector.selector]
+
+                def selector_function(formats):
+                    for f in fs:
+                        picked_formats = list(f(formats))
+                        if picked_formats:
+                            return picked_formats
+                    return []
+            elif selector.type == SINGLE:
+                format_spec = selector.selector
+
+                def selector_function(formats):
+                    formats = list(formats)
+                    if not formats:
+                        return
+                    if format_spec == 'all':
+                        for f in formats:
+                            yield f
+                    elif format_spec in ['best', 'worst', None]:
+                        format_idx = 0 if format_spec == 'worst' else -1
+                        audiovideo_formats = [
+                            f for f in formats
+                            if f.get('vcodec') != 'none' and f.get('acodec') != 'none']
+                        if audiovideo_formats:
+                            yield audiovideo_formats[format_idx]
+                        # for audio only (soundcloud) or video only (imgur) URLs, select the best/worst format
+                        elif (all(f.get('acodec') != 'none' for f in formats) or
+                              all(f.get('vcodec') != 'none' for f in formats)):
+                            yield formats[format_idx]
+                    elif format_spec == 'bestaudio':
+                        audio_formats = [
+                            f for f in formats
+                            if f.get('vcodec') == 'none']
+                        if audio_formats:
+                            yield audio_formats[-1]
+                    elif format_spec == 'worstaudio':
+                        audio_formats = [
+                            f for f in formats
+                            if f.get('vcodec') == 'none']
+                        if audio_formats:
+                            yield audio_formats[0]
+                    elif format_spec == 'bestvideo':
+                        video_formats = [
+                            f for f in formats
+                            if f.get('acodec') == 'none']
+                        if video_formats:
+                            yield video_formats[-1]
+                    elif format_spec == 'worstvideo':
+                        video_formats = [
+                            f for f in formats
+                            if f.get('acodec') == 'none']
+                        if video_formats:
+                            yield video_formats[0]
+                    else:
+                        extensions = ['mp4', 'flv', 'webm', '3gp', 'm4a', 'mp3', 'ogg', 'aac', 'wav']
+                        if format_spec in extensions:
+                            filter_f = lambda f: f['ext'] == format_spec
+                        else:
+                            filter_f = lambda f: f['format_id'] == format_spec
+                        matches = list(filter(filter_f, formats))
+                        if matches:
+                            yield matches[-1]
+            elif selector.type == MERGE:
+                def _merge(formats_info):
+                    format_1, format_2 = [f['format_id'] for f in formats_info]
+                    # The first format must contain the video and the
+                    # second the audio
+                    if formats_info[0].get('vcodec') == 'none':
+                        self.report_error('The first format must '
+                                          'contain the video, try using '
+                                          '"-f %s+%s"' % (format_2, format_1))
+                        return
+                    # Formats must be opposite (video+audio)
+                    if formats_info[0].get('acodec') == 'none' and formats_info[1].get('acodec') == 'none':
+                        self.report_error(
+                            'Both formats %s and %s are video-only, you must specify "-f video+audio"'
+                            % (format_1, format_2))
+                        return
+                    output_ext = (
+                        formats_info[0]['ext']
+                        if self.params.get('merge_output_format') is None
+                        else self.params['merge_output_format'])
+                    return {
+                        'requested_formats': formats_info,
+                        'format': '%s+%s' % (formats_info[0].get('format'),
+                                             formats_info[1].get('format')),
+                        'format_id': '%s+%s' % (formats_info[0].get('format_id'),
+                                                formats_info[1].get('format_id')),
+                        'width': formats_info[0].get('width'),
+                        'height': formats_info[0].get('height'),
+                        'resolution': formats_info[0].get('resolution'),
+                        'fps': formats_info[0].get('fps'),
+                        'vcodec': formats_info[0].get('vcodec'),
+                        'vbr': formats_info[0].get('vbr'),
+                        'stretched_ratio': formats_info[0].get('stretched_ratio'),
+                        'acodec': formats_info[1].get('acodec'),
+                        'abr': formats_info[1].get('abr'),
+                        'ext': output_ext,
+                    }
+                video_selector, audio_selector = map(_build_selector_function, selector.selector)
 
-        new_format_spec = format_spec[:-len(m.group(0))]
-        if not new_format_spec:
-            new_format_spec = 'best'
+                def selector_function(formats):
+                    formats = list(formats)
+                    for pair in itertools.product(video_selector(formats), audio_selector(formats)):
+                        yield _merge(pair)
 
-        return (new_format_spec, new_formats)
+            filters = [self._build_format_filter(f) for f in selector.filters]
 
-    def select_format(self, format_spec, available_formats):
-        while format_spec.endswith(']'):
-            format_spec, available_formats = self._apply_format_filter(
-                format_spec, available_formats)
-        if not available_formats:
-            return None
+            def final_selector(formats):
+                for _filter in filters:
+                    formats = list(filter(_filter, formats))
+                return selector_function(formats)
+            return final_selector
 
-        if format_spec in ['best', 'worst', None]:
-            format_idx = 0 if format_spec == 'worst' else -1
-            audiovideo_formats = [
-                f for f in available_formats
-                if f.get('vcodec') != 'none' and f.get('acodec') != 'none']
-            if audiovideo_formats:
-                return audiovideo_formats[format_idx]
-            # for audio only (soundcloud) or video only (imgur) urls, select the best/worst audio format
-            elif (all(f.get('acodec') != 'none' for f in available_formats) or
-                  all(f.get('vcodec') != 'none' for f in available_formats)):
-                return available_formats[format_idx]
-        elif format_spec == 'bestaudio':
-            audio_formats = [
-                f for f in available_formats
-                if f.get('vcodec') == 'none']
-            if audio_formats:
-                return audio_formats[-1]
-        elif format_spec == 'worstaudio':
-            audio_formats = [
-                f for f in available_formats
-                if f.get('vcodec') == 'none']
-            if audio_formats:
-                return audio_formats[0]
-        elif format_spec == 'bestvideo':
-            video_formats = [
-                f for f in available_formats
-                if f.get('acodec') == 'none']
-            if video_formats:
-                return video_formats[-1]
-        elif format_spec == 'worstvideo':
-            video_formats = [
-                f for f in available_formats
-                if f.get('acodec') == 'none']
-            if video_formats:
-                return video_formats[0]
-        else:
-            extensions = ['mp4', 'flv', 'webm', '3gp', 'm4a', 'mp3', 'ogg', 'aac', 'wav']
-            if format_spec in extensions:
-                filter_f = lambda f: f['ext'] == format_spec
-            else:
-                filter_f = lambda f: f['format_id'] == format_spec
-            matches = list(filter(filter_f, available_formats))
-            if matches:
-                return matches[-1]
-        return None
+        stream = io.BytesIO(format_spec.encode('utf-8'))
+        try:
+            tokens = list(_remove_unused_ops(compat_tokenize_tokenize(stream.readline)))
+        except tokenize.TokenError:
+            raise syntax_error('Missing closing/opening brackets or parenthesis', (0, len(format_spec)))
+
+        class TokenIterator(object):
+            def __init__(self, tokens):
+                self.tokens = tokens
+                self.counter = 0
+
+            def __iter__(self):
+                return self
+
+            def __next__(self):
+                if self.counter >= len(self.tokens):
+                    raise StopIteration()
+                value = self.tokens[self.counter]
+                self.counter += 1
+                return value
+
+            next = __next__
+
+            def restore_last_token(self):
+                self.counter -= 1
+
+        parsed_selector = _parse_format_selection(iter(TokenIterator(tokens)))
+        return _build_selector_function(parsed_selector)
 
     def _calc_headers(self, info_dict):
         res = std_headers.copy()
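
The tokenize-based parser above composes ordered fallbacks ('/'), multi-selection (','), merges ('+'), groups ('(...)') and bracketed filters in a single pass. A toy walk-through of the resulting selector function (the format dicts are illustrative stand-ins):

    ydl = YoutubeDL({})
    selector = ydl.build_format_selector('bestvideo[height<=720]+bestaudio/best')
    formats = [
        {'format_id': '136', 'ext': 'mp4', 'height': 720,
         'vcodec': 'avc1', 'acodec': 'none', 'url': 'http://example.com/v'},
        {'format_id': '140', 'ext': 'm4a',
         'vcodec': 'none', 'acodec': 'mp4a', 'url': 'http://example.com/a'},
    ]
    print([f['format_id'] for f in selector(formats)])  # ['136+140']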
@@ -983,7 +1207,7 @@ class YoutubeDL(object):
         return res
 
     def _calc_cookies(self, info_dict):
-        pr = compat_urllib_request.Request(info_dict['url'])
+        pr = sanitized_Request(info_dict['url'])
         self.cookiejar.add_cookie_header(pr)
         return pr.get_header('Cookie')
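
sanitized_Request is a thin wrapper that runs the URL through sanitize_url before constructing the request; at this point the sanitization is essentially just giving scheme-relative URLs an explicit scheme — a sketch of the utils pair:

    def sanitize_url(url):
        # Prefix '//host/path' URLs with an explicit scheme.
        return 'http:%s' % url if url.startswith('//') else url

    def sanitized_Request(url, *args, **kwargs):
        return compat_urllib_request.Request(sanitize_url(url), *args, **kwargs)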
 
@@ -1010,12 +1234,20 @@ class YoutubeDL(object):
                 t.get('preference'), t.get('width'), t.get('height'),
                 t.get('id'), t.get('url')))
             for i, t in enumerate(thumbnails):
+                t['url'] = sanitize_url(t['url'])
                 if t.get('width') and t.get('height'):
                     t['resolution'] = '%dx%d' % (t['width'], t['height'])
                 if t.get('id') is None:
                     t['id'] = '%d' % i
 
-        if thumbnails and 'thumbnail' not in info_dict:
+        if self.params.get('list_thumbnails'):
+            self.list_thumbnails(info_dict)
+            return
+
+        thumbnail = info_dict.get('thumbnail')
+        if thumbnail:
+            info_dict['thumbnail'] = sanitize_url(thumbnail)
+        elif thumbnails:
             info_dict['thumbnail'] = thumbnails[-1]['url']
 
         if 'display_id' not in info_dict and 'id' in info_dict:
@@ -1030,13 +1262,28 @@ class YoutubeDL(object):
             except (ValueError, OverflowError, OSError):
                 pass
 
+        # Auto generate title fields corresponding to the *_number fields when missing
+        # in order to always have clean titles. This is very common for TV series.
+        for field in ('chapter', 'season', 'episode'):
+            if info_dict.get('%s_number' % field) is not None and not info_dict.get(field):
+                info_dict[field] = '%s %d' % (field.capitalize(), info_dict['%s_number' % field])
+
+        subtitles = info_dict.get('subtitles')
+        if subtitles:
+            for _, subtitle in subtitles.items():
+                for subtitle_format in subtitle:
+                    if subtitle_format.get('url'):
+                        subtitle_format['url'] = sanitize_url(subtitle_format['url'])
+                    if 'ext' not in subtitle_format:
+                        subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
+
         if self.params.get('listsubtitles', False):
             if 'automatic_captions' in info_dict:
                 self.list_subtitles(info_dict['id'], info_dict.get('automatic_captions'), 'automatic captions')
-            self.list_subtitles(info_dict['id'], info_dict.get('subtitles'), 'subtitles')
+            self.list_subtitles(info_dict['id'], subtitles, 'subtitles')
             return
         info_dict['requested_subtitles'] = self.process_subtitles(
-            info_dict['id'], info_dict.get('subtitles'),
+            info_dict['id'], subtitles,
             info_dict.get('automatic_captions'))
 
         # We now pick which formats have to be downloaded
@@ -1056,8 +1303,13 @@ class YoutubeDL(object):
             if 'url' not in format:
                 raise ExtractorError('Missing "url" key in result (index %d)' % i)
 
+            format['url'] = sanitize_url(format['url'])
+
             if format.get('format_id') is None:
                 format['format_id'] = compat_str(i)
+            else:
+                # Sanitize format_id: replace characters that have meaning in format selector expressions
+                format['format_id'] = re.sub(r'[\s,/+\[\]()]', '_', format['format_id'])
             format_id = format['format_id']
             if format_id not in formats_dict:
                 formats_dict[format_id] = []
@@ -1079,6 +1331,10 @@ class YoutubeDL(object):
             # Automatically determine file extension if missing
             if 'ext' not in format:
                 format['ext'] = determine_ext(format['url']).lower()
+            # Automatically determine protocol if missing (useful for format
+            # selection purposes)
+            if 'protocol' not in format:
+                format['protocol'] = determine_protocol(format)
             # Add HTTP headers, so that external programs can use them from the
             # json output
             full_format_info = info_dict.copy()
@@ -1091,76 +1347,24 @@ class YoutubeDL(object):
             # only set the 'formats' fields if the original info_dict list them
             # otherwise we end up with a circular reference, the first (and unique)
             # element in the 'formats' field in info_dict is info_dict itself,
-            # wich can't be exported to json
+            # which can't be exported to json
             info_dict['formats'] = formats
         if self.params.get('listformats'):
             self.list_formats(info_dict)
             return
-        if self.params.get('list_thumbnails'):
-            self.list_thumbnails(info_dict)
-            return
 
         req_format = self.params.get('format')
         if req_format is None:
             req_format_list = []
             if (self.params.get('outtmpl', DEFAULT_OUTTMPL) != '-' and
-                    info_dict['extractor'] in ['youtube', 'ted'] and
                     not info_dict.get('is_live')):
                 merger = FFmpegMergerPP(self)
                 if merger.available and merger.can_merge():
                     req_format_list.append('bestvideo+bestaudio')
             req_format_list.append('best')
             req_format = '/'.join(req_format_list)
-        formats_to_download = []
-        if req_format == 'all':
-            formats_to_download = formats
-        else:
-            for rfstr in req_format.split(','):
-                # We can accept formats requested in the format: 34/5/best, we pick
-                # the first that is available, starting from left
-                req_formats = rfstr.split('/')
-                for rf in req_formats:
-                    if re.match(r'.+?\+.+?', rf) is not None:
-                        # Two formats have been requested like '137+139'
-                        format_1, format_2 = rf.split('+')
-                        formats_info = (self.select_format(format_1, formats),
-                                        self.select_format(format_2, formats))
-                        if all(formats_info):
-                            # The first format must contain the video and the
-                            # second the audio
-                            if formats_info[0].get('vcodec') == 'none':
-                                self.report_error('The first format must '
-                                                  'contain the video, try using '
-                                                  '"-f %s+%s"' % (format_2, format_1))
-                                return
-                            output_ext = (
-                                formats_info[0]['ext']
-                                if self.params.get('merge_output_format') is None
-                                else self.params['merge_output_format'])
-                            selected_format = {
-                                'requested_formats': formats_info,
-                                'format': '%s+%s' % (formats_info[0].get('format'),
-                                                     formats_info[1].get('format')),
-                                'format_id': '%s+%s' % (formats_info[0].get('format_id'),
-                                                        formats_info[1].get('format_id')),
-                                'width': formats_info[0].get('width'),
-                                'height': formats_info[0].get('height'),
-                                'resolution': formats_info[0].get('resolution'),
-                                'fps': formats_info[0].get('fps'),
-                                'vcodec': formats_info[0].get('vcodec'),
-                                'vbr': formats_info[0].get('vbr'),
-                                'stretched_ratio': formats_info[0].get('stretched_ratio'),
-                                'acodec': formats_info[1].get('acodec'),
-                                'abr': formats_info[1].get('abr'),
-                                'ext': output_ext,
-                            }
-                        else:
-                            selected_format = None
-                    else:
-                        selected_format = self.select_format(rf, formats)
-                    if selected_format is not None:
-                        formats_to_download.append(selected_format)
-                        break
+        format_selector = self.build_format_selector(req_format)
+        formats_to_download = list(format_selector(formats))
         if not formats_to_download:
             raise ExtractorError('requested format not available',
                                  expected=True)
@@ -1288,7 +1492,7 @@ class YoutubeDL(object):
             if dn and not os.path.exists(dn):
                 os.makedirs(dn)
         except (OSError, IOError) as err:
-            self.report_error('unable to create directory ' + compat_str(err))
+            self.report_error('unable to create directory ' + error_to_compat_str(err))
             return
 
         if self.params.get('writedescription', False):
@@ -1339,7 +1543,7 @@ class YoutubeDL(object):
                             sub_info['url'], info_dict['id'], note=False)
                     except ExtractorError as err:
                         self.report_warning('Unable to download subtitle for "%s": %s' %
-                                            (sub_lang, compat_str(err.cause)))
+                                            (sub_lang, error_to_compat_str(err.cause)))
                         continue
                 try:
                     sub_filename = subtitles_filename(filename, sub_lang, sub_format)
@@ -1443,12 +1647,14 @@ class YoutubeDL(object):
                 self.report_error('content too short (expected %s bytes and served %s)' % (err.expected, err.downloaded))
                 return
 
-            if success:
+            if success and filename != '-':
                 # Fixup content
                 fixup_policy = self.params.get('fixup')
                 if fixup_policy is None:
                     fixup_policy = 'detect_or_warn'
 
+                INSTALL_FFMPEG_MESSAGE = 'Install ffmpeg or avconv to fix this automatically.'
+
                 stretched_ratio = info_dict.get('stretched_ratio')
                 if stretched_ratio is not None and stretched_ratio != 1:
                     if fixup_policy == 'warn':
@@ -1461,15 +1667,18 @@ class YoutubeDL(object):
                             info_dict['__postprocessors'].append(stretched_pp)
                         else:
                             self.report_warning(
-                                '%s: Non-uniform pixel ratio (%s). Install ffmpeg or avconv to fix this automatically.' % (
-                                    info_dict['id'], stretched_ratio))
+                                '%s: Non-uniform pixel ratio (%s). %s'
+                                % (info_dict['id'], stretched_ratio, INSTALL_FFMPEG_MESSAGE))
                     else:
                         assert fixup_policy in ('ignore', 'never')
 
-                if info_dict.get('requested_formats') is None and info_dict.get('container') == 'm4a_dash':
+                if (info_dict.get('requested_formats') is None and
+                        info_dict.get('container') == 'm4a_dash'):
                     if fixup_policy == 'warn':
-                        self.report_warning('%s: writing DASH m4a. Only some players support this container.' % (
-                            info_dict['id']))
+                        self.report_warning(
+                            '%s: writing DASH m4a. '
+                            'Only some players support this container.'
+                            % info_dict['id'])
                     elif fixup_policy == 'detect_or_warn':
                         fixup_pp = FFmpegFixupM4aPP(self)
                         if fixup_pp.available:
@@ -1477,8 +1686,27 @@ class YoutubeDL(object):
                             info_dict['__postprocessors'].append(fixup_pp)
                         else:
                             self.report_warning(
-                                '%s: writing DASH m4a. Only some players support this container. Install ffmpeg or avconv to fix this automatically.' % (
-                                    info_dict['id']))
+                                '%s: writing DASH m4a. '
+                                'Only some players support this container. %s'
+                                % (info_dict['id'], INSTALL_FFMPEG_MESSAGE))
+                    else:
+                        assert fixup_policy in ('ignore', 'never')
+
+                if (info_dict.get('protocol') == 'm3u8_native' or
+                        (info_dict.get('protocol') == 'm3u8' and
+                         self.params.get('hls_prefer_native'))):
+                    if fixup_policy == 'warn':
+                        self.report_warning('%s: malformed AAC bitstream.' % (
+                            info_dict['id']))
+                    elif fixup_policy == 'detect_or_warn':
+                        fixup_pp = FFmpegFixupM3u8PP(self)
+                        if fixup_pp.available:
+                            info_dict.setdefault('__postprocessors', [])
+                            info_dict['__postprocessors'].append(fixup_pp)
+                        else:
+                            self.report_warning(
+                                '%s: malformed AAC bitstream. %s'
+                                % (info_dict['id'], INSTALL_FFMPEG_MESSAGE))
                     else:
                         assert fixup_policy in ('ignore', 'never')
 
@@ -1609,7 +1837,7 @@ class YoutubeDL(object):
             else:
                 res = '%sp' % format['height']
         elif format.get('width') is not None:
-            res = '?x%d' % format['width']
+            res = '%dx?' % format['width']
         else:
             res = default
         return res
@@ -1618,6 +1846,10 @@ class YoutubeDL(object):
         res = ''
         if fdict.get('ext') in ['f4f', 'f4m']:
             res += '(unsupported) '
+        if fdict.get('language'):
+            if res:
+                res += ' '
+            res += '[%s] ' % fdict['language']
         if fdict.get('format_note') is not None:
             res += fdict['format_note'] + ' '
         if fdict.get('tbr') is not None:
@@ -1638,7 +1870,9 @@ class YoutubeDL(object):
         if fdict.get('vbr') is not None:
             res += '%4dk' % fdict['vbr']
         if fdict.get('fps') is not None:
-            res += ', %sfps' % fdict['fps']
+            if res:
+                res += ', '
+            res += '%sfps' % fdict['fps']
         if fdict.get('acodec') is not None:
             if res:
                 res += ', '
@@ -1681,13 +1915,8 @@ class YoutubeDL(object):
     def list_thumbnails(self, info_dict):
         thumbnails = info_dict.get('thumbnails')
         if not thumbnails:
-            tn_url = info_dict.get('thumbnail')
-            if tn_url:
-                thumbnails = [{'id': '0', 'url': tn_url}]
-            else:
-                self.to_screen(
-                    '[info] No thumbnails present for %s' % info_dict['id'])
-                return
+            self.to_screen('[info] No thumbnails present for %s' % info_dict['id'])
+            return
 
         self.to_screen(
             '[info] Thumbnails for %s:' % info_dict['id'])
@@ -1708,27 +1937,8 @@ class YoutubeDL(object):
 
     def urlopen(self, req):
         """ Start an HTTP download """
-
-        # According to RFC 3986, URLs can not contain non-ASCII characters, however this is not
-        # always respected by websites, some tend to give out URLs with non percent-encoded
-        # non-ASCII characters (see telemb.py, ard.py [#3412])
-        # urllib chokes on URLs with non-ASCII characters (see http://bugs.python.org/issue3991)
-        # To work around aforementioned issue we will replace request's original URL with
-        # percent-encoded one
-        req_is_string = isinstance(req, compat_basestring)
-        url = req if req_is_string else req.get_full_url()
-        url_escaped = escape_url(url)
-
-        # Substitute URL if any change after escaping
-        if url != url_escaped:
-            if req_is_string:
-                req = url_escaped
-            else:
-                req_type = HEADRequest if req.get_method() == 'HEAD' else compat_urllib_request.Request
-                req = req_type(
-                    url_escaped, data=req.data, headers=req.headers,
-                    origin_req_host=req.origin_req_host, unverifiable=req.unverifiable)
-
+        if isinstance(req, compat_basestring):
+            req = sanitized_Request(req)
         return self._opener.open(req, timeout=self._socket_timeout)
 
     def print_debug_header(self):
@@ -1751,6 +1961,8 @@ class YoutubeDL(object):
         write_string(encoding_str, encoding=None)
 
         self._write_string('[debug] youtube-dl version ' + __version__ + '\n')
+        if _LAZY_LOADER:
+            self._write_string('[debug] Lazy loading extractors enabled' + '\n')
         try:
             sp = subprocess.Popen(
                 ['git', 'rev-parse', '--short', 'HEAD'],
@@ -1811,8 +2023,7 @@ class YoutubeDL(object):
             if os.access(opts_cookiefile, os.R_OK):
                 self.cookiejar.load()
 
-        cookie_processor = compat_urllib_request.HTTPCookieProcessor(
-            self.cookiejar)
+        cookie_processor = YoutubeDLCookieProcessor(self.cookiejar)
         if opts_proxy is not None:
             if opts_proxy == '':
                 proxies = {}
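
YoutubeDLCookieProcessor exists so that cookie handling behaves identically for http and https, even on old interpreters whose HTTPCookieProcessor lacks the https hooks; its shape in utils is roughly:

    class YoutubeDLCookieProcessor(compat_urllib_request.HTTPCookieProcessor):
        def __init__(self, cookiejar=None):
            compat_urllib_request.HTTPCookieProcessor.__init__(self, cookiejar)

        def http_response(self, request, response):
            return compat_urllib_request.HTTPCookieProcessor.http_response(
                self, request, response)

        # Reuse the http handlers for https so both schemes share the jar.
        https_request = compat_urllib_request.HTTPCookieProcessor.http_request
        https_response = http_response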
@@ -1828,8 +2039,20 @@ class YoutubeDL(object):
         debuglevel = 1 if self.params.get('debug_printtraffic') else 0
         https_handler = make_HTTPS_handler(self.params, debuglevel=debuglevel)
         ydlh = YoutubeDLHandler(self.params, debuglevel=debuglevel)
+        data_handler = compat_urllib_request_DataHandler()
+
+        # When we pass our own FileHandler instance, build_opener won't add
+        # the default FileHandler, which lets us disable the file protocol;
+        # it can be used for malicious purposes (see
+        # https://github.com/rg3/youtube-dl/issues/8227)
+        file_handler = compat_urllib_request.FileHandler()
+
+        def file_open(*args, **kwargs):
+            raise compat_urllib_error.URLError('file:// scheme is explicitly disabled in youtube-dl for security reasons')
+        file_handler.file_open = file_open
+
         opener = compat_urllib_request.build_opener(
-            proxy_handler, https_handler, cookie_processor, ydlh)
+            proxy_handler, https_handler, cookie_processor, ydlh, data_handler, file_handler)
 
         # Delete the default user-agent header, which would otherwise apply in
         # cases where our custom HTTP handler doesn't come into play
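
A self-contained illustration of the FileHandler trick above, using the stock Python 3 urllib names rather than the compat aliases — a sketch of the pattern, not the exact youtube-dl code:

    import urllib.error
    import urllib.request

    file_handler = urllib.request.FileHandler()

    def _file_open(*args, **kwargs):
        # With this stub installed, any file:// URL fails instead of
        # reading the local filesystem.
        raise urllib.error.URLError('file:// scheme is disabled')

    file_handler.file_open = _file_open
    # build_opener sees a FileHandler and skips adding the default one.
    opener = urllib.request.build_opener(file_handler)
    # opener.open('file:///etc/passwd') now raises URLError.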
@@ -1881,10 +2104,10 @@ class YoutubeDL(object):
                                (info_dict['extractor'], info_dict['id'], thumb_display_id))
                 try:
                     uf = self.urlopen(t['url'])
-                    with open(thumb_filename, 'wb') as thumbf:
+                    with open(encodeFilename(thumb_filename), 'wb') as thumbf:
                         shutil.copyfileobj(uf, thumbf)
                     self.to_screen('[%s] %s: Writing thumbnail %sto: %s' %
                                    (info_dict['extractor'], info_dict['id'], thumb_display_id, thumb_filename))
                 except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
                     self.report_warning('Unable to download thumbnail "%s": %s' %
-                                        (t['url'], compat_str(err)))
+                                        (t['url'], error_to_compat_str(err)))
index 55b22c889f97c73d28e732466f850bcfb1615c83..737f6545d4136401dd3d8ddd691ad52e86894bb0 100644 (file)
@@ -9,7 +9,6 @@ import codecs
 import io
 import os
 import random
-import shlex
 import sys
 
 
@@ -20,6 +19,7 @@ from .compat import (
     compat_expanduser,
     compat_getpass,
     compat_print,
+    compat_shlex_split,
     workaround_optparse_bug9161,
 )
 from .utils import (
@@ -144,14 +144,20 @@ def _real_main(argv=None):
         if numeric_limit is None:
             parser.error('invalid max_filesize specified')
         opts.max_filesize = numeric_limit
-    if opts.retries is not None:
-        if opts.retries in ('inf', 'infinite'):
-            opts_retries = float('inf')
+
+    def parse_retries(retries):
+        if retries in ('inf', 'infinite'):
+            parsed_retries = float('inf')
         else:
             try:
-                opts_retries = int(opts.retries)
+                parsed_retries = int(retries)
             except (TypeError, ValueError):
                 parser.error('invalid retry count specified')
+        return parsed_retries
+    if opts.retries is not None:
+        opts.retries = parse_retries(opts.retries)
+    if opts.fragment_retries is not None:
+        opts.fragment_retries = parse_retries(opts.fragment_retries)
     if opts.buffersize is not None:
         numeric_buffersize = FileDownloader.parse_bytes(opts.buffersize)
         if numeric_buffersize is None:
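
For reference, a standalone sketch of the parse_retries rule above, raising ValueError where the real helper calls parser.error:

    def parse_retries_sketch(retries):
        # 'inf'/'infinite' mean retry forever; anything else must be an int.
        if retries in ('inf', 'infinite'):
            return float('inf')
        try:
            return int(retries)
        except (TypeError, ValueError):
            raise ValueError('invalid retry count specified')

    assert parse_retries_sketch('3') == 3
    assert parse_retries_sketch('infinite') == float('inf')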
@@ -262,10 +268,10 @@ def _real_main(argv=None):
             parser.error('setting filesize xattr requested but python-xattr is not available')
     external_downloader_args = None
     if opts.external_downloader_args:
-        external_downloader_args = shlex.split(opts.external_downloader_args)
+        external_downloader_args = compat_shlex_split(opts.external_downloader_args)
     postprocessor_args = None
     if opts.postprocessor_args:
-        postprocessor_args = shlex.split(opts.postprocessor_args)
+        postprocessor_args = compat_shlex_split(opts.postprocessor_args)
     match_filter = (
         None if opts.match_filter is None
         else match_filter_func(opts.match_filter))
@@ -299,7 +305,8 @@ def _real_main(argv=None):
         'force_generic_extractor': opts.force_generic_extractor,
         'ratelimit': opts.ratelimit,
         'nooverwrites': opts.nooverwrites,
-        'retries': opts_retries,
+        'retries': opts.retries,
+        'fragment_retries': opts.fragment_retries,
         'buffersize': opts.buffersize,
         'noresizebuffer': opts.noresizebuffer,
         'continuedl': opts.continue_dl,
@@ -355,6 +362,7 @@ def _real_main(argv=None):
         'youtube_include_dash_manifest': opts.youtube_include_dash_manifest,
         'encoding': opts.encoding,
         'extract_flat': opts.extract_flat,
+        'mark_watched': opts.mark_watched,
         'merge_output_format': opts.merge_output_format,
         'postprocessors': postprocessors,
         'fixup': opts.fixup,
@@ -369,6 +377,7 @@ def _real_main(argv=None):
         'no_color': opts.no_color,
         'ffmpeg_location': opts.ffmpeg_location,
         'hls_prefer_native': opts.hls_prefer_native,
+        'hls_use_mpegts': opts.hls_use_mpegts,
         'external_downloader_args': external_downloader_args,
         'postprocessor_args': postprocessor_args,
         'cn_verification_proxy': opts.cn_verification_proxy,
@@ -377,7 +386,7 @@ def _real_main(argv=None):
     with YoutubeDL(ydl_opts) as ydl:
         # Update version
         if opts.update_self:
-            update_self(ydl.to_screen, opts.verbose)
+            update_self(ydl.to_screen, opts.verbose, ydl._opener)
 
         # Remove cache dir
         if opts.rm_cachedir:
index 65a0f891c5998cd49c7e1a98dbca75b6c926ccb8..138f5fbec39f1e0e84051b092ff0cae6974212cf 100755 (executable)
@@ -7,11 +7,11 @@ from __future__ import unicode_literals
 
 import sys
 
-if __package__ is None and not hasattr(sys, "frozen"):
+if __package__ is None and not hasattr(sys, 'frozen'):
     # direct call of __main__.py
     import os.path
     path = os.path.realpath(os.path.abspath(__file__))
-    sys.path.append(os.path.dirname(os.path.dirname(path)))
+    sys.path.insert(0, os.path.dirname(os.path.dirname(path)))
 
 import youtube_dl
 
index 7817adcfdd546f70cfb76e0634b8df8ddbcaf8e0..a01c367de4f6cf5e6f9ce4d9b86de4991fa859dc 100644 (file)
@@ -161,7 +161,7 @@ def aes_decrypt_text(data, password, key_size_bytes):
     nonce = data[:NONCE_LENGTH_BYTES]
     cipher = data[NONCE_LENGTH_BYTES:]
 
-    class Counter:
+    class Counter(object):
         __value = nonce + [0] * (BLOCK_SIZE_BYTES - NONCE_LENGTH_BYTES)
 
         def next_value(self):
index e4b9286c06e12d967f60b5fbcf2c691802684163..0b6c5ca7a8ba5eb6cb064916d56a5ca8eae32003 100644 (file)
@@ -1,15 +1,20 @@
 from __future__ import unicode_literals
 
+import binascii
 import collections
+import email
 import getpass
+import io
 import optparse
 import os
 import re
+import shlex
 import shutil
 import socket
 import subprocess
 import sys
 import itertools
+import xml.etree.ElementTree
 
 
 try:
@@ -37,6 +42,11 @@ try:
 except ImportError:  # Python 2
     import urlparse as compat_urlparse
 
+try:
+    import urllib.response as compat_urllib_response
+except ImportError:  # Python 2
+    import urllib as compat_urllib_response
+
 try:
     import http.cookiejar as compat_cookiejar
 except ImportError:  # Python 2
@@ -67,6 +77,11 @@ try:
 except ImportError:  # Python 2
     from urllib import urlretrieve as compat_urlretrieve
 
+try:
+    from html.parser import HTMLParser as compat_HTMLParser
+except ImportError:  # Python 2
+    from HTMLParser import HTMLParser as compat_HTMLParser
+
 
 try:
     from subprocess import DEVNULL
@@ -79,6 +94,11 @@ try:
 except ImportError:
     import BaseHTTPServer as compat_http_server
 
+try:
+    compat_str = unicode  # Python 2
+except NameError:
+    compat_str = str
+
 try:
     from urllib.parse import unquote_to_bytes as compat_urllib_parse_unquote_to_bytes
     from urllib.parse import unquote as compat_urllib_parse_unquote
@@ -99,7 +119,7 @@ except ImportError:  # Python 2
             # Is it a string-like object?
             string.split
             return b''
-        if isinstance(string, unicode):
+        if isinstance(string, compat_str):
             string = string.encode('utf-8')
         bits = string.split(b'%')
         if len(bits) == 1:
@@ -150,9 +170,64 @@ except ImportError:  # Python 2
         return compat_urllib_parse_unquote(string, encoding, errors)
 
 try:
-    compat_str = unicode  # Python 2
-except NameError:
-    compat_str = str
+    from urllib.parse import urlencode as compat_urllib_parse_urlencode
+except ImportError:  # Python 2
+    # Python 2 will choke on urlencode with a mixture of byte and unicode
+    # strings. Possible solutions are to either port it from Python 3 with all
+    # its dependencies or manually ensure the input query contains only byte
+    # strings. We will stick with the latter, thus recursively encoding the
+    # whole query.
+    def compat_urllib_parse_urlencode(query, doseq=0, encoding='utf-8'):
+        def encode_elem(e):
+            if isinstance(e, dict):
+                e = encode_dict(e)
+            elif isinstance(e, (list, tuple,)):
+                list_e = encode_list(e)
+                e = tuple(list_e) if isinstance(e, tuple) else list_e
+            elif isinstance(e, compat_str):
+                e = e.encode(encoding)
+            return e
+
+        def encode_dict(d):
+            return dict((encode_elem(k), encode_elem(v)) for k, v in d.items())
+
+        def encode_list(l):
+            return [encode_elem(e) for e in l]
+
+        return compat_urllib_parse.urlencode(encode_elem(query), doseq=doseq)
+
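
A quick illustration of why the recursive encoding matters (comment-only sketch; the query values are made up):

    # On Python 2, urllib.urlencode({'q': u'caf\xe9'}) raises UnicodeEncodeError
    # for non-ASCII unicode values. compat_urllib_parse_urlencode first encodes
    # every key and value to UTF-8 bytes, so the same call works on both Pythons:
    #
    #   compat_urllib_parse_urlencode({'q': u'caf\xe9', 'lang': 'fr'})
    #   -> 'q=caf%C3%A9&lang=fr'   (key order may differ on Python 2)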
+try:
+    from urllib.request import DataHandler as compat_urllib_request_DataHandler
+except ImportError:  # Python < 3.4
+    # Ported from CPython 98774:1733b3bd46db, Lib/urllib/request.py
+    class compat_urllib_request_DataHandler(compat_urllib_request.BaseHandler):
+        def data_open(self, req):
+            # data URLs as specified in RFC 2397.
+            #
+            # ignores POSTed data
+            #
+            # syntax:
+            # dataurl   := "data:" [ mediatype ] [ ";base64" ] "," data
+            # mediatype := [ type "/" subtype ] *( ";" parameter )
+            # data      := *urlchar
+            # parameter := attribute "=" value
+            url = req.get_full_url()
+
+            scheme, data = url.split(':', 1)
+            mediatype, data = data.split(',', 1)
+
+            # even base64-encoded data URLs might be quoted, so unquote in any case:
+            data = compat_urllib_parse_unquote_to_bytes(data)
+            if mediatype.endswith(';base64'):
+                data = binascii.a2b_base64(data)
+                mediatype = mediatype[:-7]
+
+            if not mediatype:
+                mediatype = 'text/plain;charset=US-ASCII'
+
+            headers = email.message_from_string(
+                'Content-type: %s\nContent-length: %d\n' % (mediatype, len(data)))
+
+            return compat_urllib_response.addinfourl(io.BytesIO(data), headers, url)
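
Once this handler is added to the opener (see the build_opener change in YoutubeDL.py above), data URLs resolve like any other request. A small usage sketch:

    opener = compat_urllib_request.build_opener(compat_urllib_request_DataHandler())
    resp = opener.open('data:text/plain;base64,aGVsbG8=')
    assert resp.read() == b'hello'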
 
 try:
     compat_basestring = basestring  # Python 2
@@ -169,6 +244,53 @@ try:
 except ImportError:  # Python 2.6
     from xml.parsers.expat import ExpatError as compat_xml_parse_error
 
+if sys.version_info[0] >= 3:
+    compat_etree_fromstring = xml.etree.ElementTree.fromstring
+else:
+    # Python 2.x tries to encode unicode strings with ASCII (see the
+    # XMLParser._fixtext method)
+    etree = xml.etree.ElementTree
+
+    try:
+        _etree_iter = etree.Element.iter
+    except AttributeError:  # Python <=2.6
+        def _etree_iter(root):
+            for el in root.findall('*'):
+                yield el
+                for sub in _etree_iter(el):
+                    yield sub
+
+    # On 2.6, XML() doesn't have a parser argument; function copied from
+    # CPython 2.7 source
+    def _XML(text, parser=None):
+        if not parser:
+            parser = etree.XMLParser(target=etree.TreeBuilder())
+        parser.feed(text)
+        return parser.close()
+
+    def _element_factory(*args, **kwargs):
+        el = etree.Element(*args, **kwargs)
+        for k, v in el.items():
+            if isinstance(v, bytes):
+                el.set(k, v.decode('utf-8'))
+        return el
+
+    def compat_etree_fromstring(text):
+        doc = _XML(text, parser=etree.XMLParser(target=etree.TreeBuilder(element_factory=_element_factory)))
+        for el in _etree_iter(doc):
+            if el.text is not None and isinstance(el.text, bytes):
+                el.text = el.text.decode('utf-8')
+        return doc
+
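
The observable effect of the shim: text and attribute values come back as unicode on both Python versions. A small check, assuming UTF-8 input:

    doc = compat_etree_fromstring(b'<root a="b">caf\xc3\xa9</root>')
    assert doc.text == u'caf\xe9'
    assert doc.get('a') == u'b'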
+if sys.version_info < (2, 7):
+    # Here comes the crazy part: in 2.6, if the xpath is a unicode string,
+    # .//node does not match when a node is a direct child of . !
+    def compat_xpath(xpath):
+        if isinstance(xpath, compat_str):
+            xpath = xpath.encode('ascii')
+        return xpath
+else:
+    compat_xpath = lambda xpath: xpath
 
 try:
     from urllib.parse import parse_qs as compat_parse_qs
@@ -187,7 +309,7 @@ except ImportError:  # Python 2
             nv = name_value.split('=', 1)
             if len(nv) != 2:
                 if strict_parsing:
-                    raise ValueError("bad query field: %r" % (name_value,))
+                    raise ValueError('bad query field: %r' % (name_value,))
                 # Handle case of a control-name with no equal sign
                 if keep_blank_values:
                     nv.append('')
@@ -227,6 +349,17 @@ except ImportError:  # Python < 3.3
             return "'" + s.replace("'", "'\"'\"'") + "'"
 
 
+if sys.version_info >= (2, 7, 3):
+    compat_shlex_split = shlex.split
+else:
+    # Working around a shlex issue with unicode strings on some Python 2
+    # versions (see http://bugs.python.org/issue1548891)
+    def compat_shlex_split(s, comments=False, posix=True):
+        if isinstance(s, compat_str):
+            s = s.encode('utf-8')
+        return shlex.split(s, comments, posix)
+
+
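
In short, compat_shlex_split behaves like shlex.split but also accepts unicode input on the affected Python 2 releases:

    assert compat_shlex_split(u'--proxy "http://host:3128" -q') == [
        '--proxy', 'http://host:3128', '-q']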
 def compat_ord(c):
     if type(c) is int:
         return c
@@ -234,6 +367,9 @@ def compat_ord(c):
         return ord(c)
 
 
+compat_os_name = os._name if os.name == 'java' else os.name
+
+
 if sys.version_info >= (3, 0):
     compat_getenv = os.getenv
     compat_expanduser = os.path.expanduser
@@ -254,7 +390,7 @@ else:
     # The following are os.path.expanduser implementations from cpython 2.7.8 stdlib
     # for different platforms with correct environment variables decoding.
 
-    if os.name == 'posix':
+    if compat_os_name == 'posix':
         def compat_expanduser(path):
             """Expand ~ and ~user constructions.  If user or $HOME is unknown,
             do nothing."""
@@ -278,7 +414,7 @@ else:
                 userhome = pwent.pw_dir
             userhome = userhome.rstrip('/')
             return (userhome + path[i:]) or '/'
-    elif os.name == 'nt' or os.name == 'ce':
+    elif compat_os_name == 'nt' or compat_os_name == 'ce':
         def compat_expanduser(path):
             """Expand ~ and ~user constructs.
 
@@ -341,7 +477,7 @@ if sys.version_info < (3, 0) and sys.platform == 'win32':
 else:
     compat_getpass = getpass.getpass
 
-# Old 2.6 and 2.7 releases require kwargs to be bytes
+# Python < 2.6.5 requires kwargs to be bytes
 try:
     def _testfunc(x):
         pass
@@ -374,7 +510,7 @@ if sys.version_info < (2, 7):
         if err is not None:
             raise err
         else:
-            raise socket.error("getaddrinfo returns an empty list")
+            raise socket.error('getaddrinfo returns an empty list')
 else:
     compat_socket_create_connection = socket.create_connection
 
@@ -404,26 +540,32 @@ if hasattr(shutil, 'get_terminal_size'):  # Python >= 3.3
 else:
     _terminal_size = collections.namedtuple('terminal_size', ['columns', 'lines'])
 
-    def compat_get_terminal_size():
-        columns = compat_getenv('COLUMNS', None)
+    def compat_get_terminal_size(fallback=(80, 24)):
+        columns = compat_getenv('COLUMNS')
         if columns:
             columns = int(columns)
         else:
             columns = None
-        lines = compat_getenv('LINES', None)
+        lines = compat_getenv('LINES')
         if lines:
             lines = int(lines)
         else:
             lines = None
 
-        try:
-            sp = subprocess.Popen(
-                ['stty', 'size'],
-                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
-            out, err = sp.communicate()
-            lines, columns = map(int, out.split())
-        except Exception:
-            pass
+        if columns is None or lines is None or columns <= 0 or lines <= 0:
+            try:
+                sp = subprocess.Popen(
+                    ['stty', 'size'],
+                    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+                out, err = sp.communicate()
+                _lines, _columns = map(int, out.split())
+            except Exception:
+                _columns, _lines = _terminal_size(*fallback)
+
+            if columns is None or columns <= 0:
+                columns = _columns
+            if lines is None or lines <= 0:
+                lines = _lines
         return _terminal_size(columns, lines)
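
The resolution order of the rewritten fallback, summarized (values illustrative):

    # COLUMNS=120 LINES=40 in the environment -> terminal_size(columns=120, lines=40)
    # env vars unset, `stty size` prints '40 120' -> terminal_size(columns=120, lines=40)
    # neither available -> the fallback argument, terminal_size(80, 24)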
 
 try:
@@ -436,12 +578,19 @@ except TypeError:  # Python 2.6
             yield n
             n += step
 
+if sys.version_info >= (3, 0):
+    from tokenize import tokenize as compat_tokenize_tokenize
+else:
+    from tokenize import generate_tokens as compat_tokenize_tokenize
+
 __all__ = [
+    'compat_HTMLParser',
     'compat_HTTPError',
     'compat_basestring',
     'compat_chr',
     'compat_cookiejar',
     'compat_cookies',
+    'compat_etree_fromstring',
     'compat_expanduser',
     'compat_get_terminal_size',
     'compat_getenv',
@@ -452,21 +601,28 @@ __all__ = [
     'compat_itertools_count',
     'compat_kwargs',
     'compat_ord',
+    'compat_os_name',
     'compat_parse_qs',
     'compat_print',
+    'compat_shlex_split',
     'compat_socket_create_connection',
     'compat_str',
     'compat_subprocess_get_DEVNULL',
+    'compat_tokenize_tokenize',
     'compat_urllib_error',
     'compat_urllib_parse',
     'compat_urllib_parse_unquote',
     'compat_urllib_parse_unquote_plus',
     'compat_urllib_parse_unquote_to_bytes',
+    'compat_urllib_parse_urlencode',
     'compat_urllib_parse_urlparse',
     'compat_urllib_request',
+    'compat_urllib_request_DataHandler',
+    'compat_urllib_response',
     'compat_urlparse',
     'compat_urlretrieve',
     'compat_xml_parse_error',
+    'compat_xpath',
     'shlex_quote',
     'subprocess_check_output',
     'workaround_optparse_bug9161',
index dccc59212d3028bb9a96f0eb9ffff4acb0be681e..817591d97e88606b966b7055026f691faab840dc 100644 (file)
@@ -1,14 +1,16 @@
 from __future__ import unicode_literals
 
 from .common import FileDownloader
-from .external import get_external_downloader
 from .f4m import F4mFD
 from .hls import HlsFD
-from .hls import NativeHlsFD
 from .http import HttpFD
-from .rtsp import RtspFD
 from .rtmp import RtmpFD
 from .dash import DashSegmentsFD
+from .rtsp import RtspFD
+from .external import (
+    get_external_downloader,
+    FFmpegFD,
+)
 
 from ..utils import (
     determine_protocol,
@@ -16,8 +18,8 @@ from ..utils import (
 
 PROTOCOL_MAP = {
     'rtmp': RtmpFD,
-    'm3u8_native': NativeHlsFD,
-    'm3u8': HlsFD,
+    'm3u8_native': HlsFD,
+    'm3u8': FFmpegFD,
     'mms': RtspFD,
     'rtsp': RtspFD,
     'f4m': F4mFD,
@@ -30,14 +32,20 @@ def get_suitable_downloader(info_dict, params={}):
     protocol = determine_protocol(info_dict)
     info_dict['protocol'] = protocol
 
+    # if (info_dict.get('start_time') or info_dict.get('end_time')) and not info_dict.get('requested_formats') and FFmpegFD.can_download(info_dict):
+    #     return FFmpegFD
+
     external_downloader = params.get('external_downloader')
     if external_downloader is not None:
         ed = get_external_downloader(external_downloader)
-        if ed.supports(info_dict):
+        if ed.can_download(info_dict):
             return ed
 
-    if protocol == 'm3u8' and params.get('hls_prefer_native'):
-        return NativeHlsFD
+    if protocol == 'm3u8' and params.get('hls_prefer_native') is True:
+        return HlsFD
+
+    if protocol == 'm3u8_native' and params.get('hls_prefer_native') is False:
+        return FFmpegFD
 
     return PROTOCOL_MAP.get(protocol, HttpFD)
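
Net effect of the remapping above on HLS selection, assuming no external downloader was requested (HlsFD is now the native downloader, FFmpegFD the ffmpeg-based one):

    # protocol 'm3u8',        hls_prefer_native=True  -> HlsFD
    # protocol 'm3u8',        otherwise               -> FFmpegFD (via PROTOCOL_MAP)
    # protocol 'm3u8_native', hls_prefer_native=False -> FFmpegFD
    # protocol 'm3u8_native', otherwise               -> HlsFD (via PROTOCOL_MAP)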
 
index 97e755d4baa56972a9a4e5223a6871edd8bf0565..1dba9f49a8b9b586b8428c6c7d65ca641de02c58 100644 (file)
@@ -5,9 +5,10 @@ import re
 import sys
 import time
 
-from ..compat import compat_str
+from ..compat import compat_os_name
 from ..utils import (
     encodeFilename,
+    error_to_compat_str,
     decodeArgument,
     format_bytes,
     timeconvert,
@@ -42,9 +43,10 @@ class FileDownloader(object):
     min_filesize:       Skip files smaller than this size
     max_filesize:       Skip files larger than this size
     xattr_set_filesize: Set ytdl.filesize user xattribute with expected size.
-                        (experimenatal)
+                        (experimental)
     external_downloader_args:  A list of additional command-line arguments for the
                         external downloader.
+    hls_use_mpegts:     Use the mpegts container for HLS videos.
 
     Subclasses of this one must re-define the real_download method.
     """
@@ -113,6 +115,10 @@ class FileDownloader(object):
             return '%10s' % '---b/s'
         return '%10s' % ('%s/s' % format_bytes(speed))
 
+    @staticmethod
+    def format_retries(retries):
+        return 'inf' if retries == float('inf') else '%.0f' % retries
+
     @staticmethod
     def best_block_size(elapsed_time, bytes):
         new_min = max(bytes / 2.0, 1.0)
@@ -156,7 +162,7 @@ class FileDownloader(object):
 
     def slow_down(self, start_time, now, byte_counter):
         """Sleep if the download speed is over the rate limit."""
-        rate_limit = self.params.get('ratelimit', None)
+        rate_limit = self.params.get('ratelimit')
         if rate_limit is None or byte_counter == 0:
             return
         if now is None:
@@ -186,7 +192,7 @@ class FileDownloader(object):
                 return
             os.rename(encodeFilename(old_filename), encodeFilename(new_filename))
         except (IOError, OSError) as err:
-            self.report_error('unable to rename file: %s' % compat_str(err))
+            self.report_error('unable to rename file: %s' % error_to_compat_str(err))
 
     def try_utime(self, filename, last_modified_hdr):
         """Try to set the last-modified time of the given file."""
@@ -218,7 +224,7 @@ class FileDownloader(object):
         if self.params.get('progress_with_newline', False):
             self.to_screen(fullmsg)
         else:
-            if os.name == 'nt':
+            if compat_os_name == 'nt':
                 prev_len = getattr(self, '_report_progress_prev_line_length',
                                    0)
                 if prev_len > len(fullmsg):
@@ -295,7 +301,9 @@ class FileDownloader(object):
 
     def report_retry(self, count, retries):
         """Report retry in case of HTTP error 5xx"""
-        self.to_screen('[download] Got server HTTP error. Retrying (attempt %d of %d)...' % (count, retries))
+        self.to_screen(
+            '[download] Got server HTTP error. Retrying (attempt %d of %s)...'
+            % (count, self.format_retries(retries)))
 
     def report_file_already_downloaded(self, file_name):
         """Report file has already been fully downloaded."""
@@ -325,7 +333,7 @@ class FileDownloader(object):
         )
 
         # Check file already present
-        if filename != '-' and nooverwrites_and_exists or continuedl_and_exists:
+        if filename != '-' and (nooverwrites_and_exists or continuedl_and_exists):
             self.report_file_already_downloaded(filename)
             self._hook_progress({
                 'filename': filename,
index 8b6fa2753adbafcb1a0ab26788dcec2be5903638..8bbab9dbc596c659db622fe9910d0ae90018a598 100644 (file)
@@ -1,66 +1,81 @@
 from __future__ import unicode_literals
 
+import os
 import re
 
-from .common import FileDownloader
-from ..compat import compat_urllib_request
+from .fragment import FragmentFD
+from ..compat import compat_urllib_error
+from ..utils import (
+    sanitize_open,
+    encodeFilename,
+)
 
 
-class DashSegmentsFD(FileDownloader):
+class DashSegmentsFD(FragmentFD):
     """
     Download segments in a DASH manifest
     """
-    def real_download(self, filename, info_dict):
-        self.report_destination(filename)
-        tmpfilename = self.temp_name(filename)
-        base_url = info_dict['url']
-        segment_urls = info_dict['segment_urls']
-
-        is_test = self.params.get('test', False)
-        remaining_bytes = self._TEST_FILE_SIZE if is_test else None
-        byte_counter = 0
 
-        def append_url_to_file(outf, target_url, target_name, remaining_bytes=None):
-            self.to_screen('[DashSegments] %s: Downloading %s' % (info_dict['id'], target_name))
-            req = compat_urllib_request.Request(target_url)
-            if remaining_bytes is not None:
-                req.add_header('Range', 'bytes=0-%d' % (remaining_bytes - 1))
+    FD_NAME = 'dashsegments'
 
-            data = self.ydl.urlopen(req).read()
+    def real_download(self, filename, info_dict):
+        base_url = info_dict['url']
+        segment_urls = [info_dict['segment_urls'][0]] if self.params.get('test', False) else info_dict['segment_urls']
+        initialization_url = info_dict.get('initialization_url')
 
-            if remaining_bytes is not None:
-                data = data[:remaining_bytes]
+        ctx = {
+            'filename': filename,
+            'total_frags': len(segment_urls) + (1 if initialization_url else 0),
+        }
 
-            outf.write(data)
-            return len(data)
+        self._prepare_and_start_frag_download(ctx)
 
         def combine_url(base_url, target_url):
             if re.match(r'^https?://', target_url):
                 return target_url
             return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
 
-        with open(tmpfilename, 'wb') as outf:
-            append_url_to_file(
-                outf, combine_url(base_url, info_dict['initialization_url']),
-                'initialization segment')
-            for i, segment_url in enumerate(segment_urls):
-                segment_len = append_url_to_file(
-                    outf, combine_url(base_url, segment_url),
-                    'segment %d / %d' % (i + 1, len(segment_urls)),
-                    remaining_bytes)
-                byte_counter += segment_len
-                if remaining_bytes is not None:
-                    remaining_bytes -= segment_len
-                    if remaining_bytes <= 0:
-                        break
-
-        self.try_rename(tmpfilename, filename)
-
-        self._hook_progress({
-            'downloaded_bytes': byte_counter,
-            'total_bytes': byte_counter,
-            'filename': filename,
-            'status': 'finished',
-        })
+        segments_filenames = []
+
+        fragment_retries = self.params.get('fragment_retries', 0)
+
+        def append_url_to_file(target_url, tmp_filename, segment_name):
+            target_filename = '%s-%s' % (tmp_filename, segment_name)
+            count = 0
+            while count <= fragment_retries:
+                try:
+                    success = ctx['dl'].download(target_filename, {'url': combine_url(base_url, target_url)})
+                    if not success:
+                        return False
+                    down, target_sanitized = sanitize_open(target_filename, 'rb')
+                    ctx['dest_stream'].write(down.read())
+                    down.close()
+                    segments_filenames.append(target_sanitized)
+                    break
+                except (compat_urllib_error.HTTPError, ) as err:
+                    # YouTube may often return a 404 HTTP error for a fragment, causing
+                    # the whole download to fail. However, if the same fragment is
+                    # immediately retried with the same request data, this usually
+                    # succeeds (1-2 attempts are usually enough), thus allowing the whole
+                    # file to be downloaded successfully. So, for now, we will retry all
+                    # fragments that fail with a 404 HTTP error.
+                    if err.code != 404:
+                        raise
+                    # Retry fragment
+                    count += 1
+                    if count <= fragment_retries:
+                        self.report_retry_fragment(segment_name, count, fragment_retries)
+            if count > fragment_retries:
+                self.report_error('giving up after %s fragment retries' % fragment_retries)
+                return False
+
+        if initialization_url:
+            append_url_to_file(initialization_url, ctx['tmpfilename'], 'Init')
+        for i, segment_url in enumerate(segment_urls):
+            append_url_to_file(segment_url, ctx['tmpfilename'], 'Seg%d' % i)
+
+        self._finish_frag_download(ctx)
+
+        for segment_file in segments_filenames:
+            os.remove(encodeFilename(segment_file))
 
         return True
index 1d5cc99043d02f658064e688c268c37171c37325..8d642fc3e60594f10a057847cf5702f715941326 100644 (file)
@@ -2,11 +2,20 @@ from __future__ import unicode_literals
 
 import os.path
 import subprocess
+import sys
+import re
 
 from .common import FileDownloader
+from ..postprocessor.ffmpeg import FFmpegPostProcessor, EXT_TO_OUT_FORMATS
 from ..utils import (
+    cli_option,
+    cli_valueless_option,
+    cli_bool_option,
+    cli_configuration_args,
     encodeFilename,
     encodeArgument,
+    handle_youtubedl_headers,
+    check_executable,
 )
 
 
@@ -41,22 +50,29 @@ class ExternalFD(FileDownloader):
     def exe(self):
         return self.params.get('external_downloader')
 
+    @classmethod
+    def available(cls):
+        return check_executable(cls.get_basename(), [cls.AVAILABLE_OPT])
+
     @classmethod
     def supports(cls, info_dict):
         return info_dict['protocol'] in ('http', 'https', 'ftp', 'ftps')
 
-    def _source_address(self, command_option):
-        source_address = self.params.get('source_address')
-        if source_address is None:
-            return []
-        return [command_option, source_address]
+    @classmethod
+    def can_download(cls, info_dict):
+        return cls.available() and cls.supports(info_dict)
+
+    def _option(self, command_option, param):
+        return cli_option(self.params, command_option, param)
+
+    def _bool_option(self, command_option, param, true_value='true', false_value='false', separator=None):
+        return cli_bool_option(self.params, command_option, param, true_value, false_value, separator)
+
+    def _valueless_option(self, command_option, param, expected_value=True):
+        return cli_valueless_option(self.params, command_option, param, expected_value)
 
     def _configuration_args(self, default=[]):
-        ex_args = self.params.get('external_downloader_args')
-        if ex_args is None:
-            return default
-        assert isinstance(ex_args, list)
-        return ex_args
+        return cli_configuration_args(self.params, 'external_downloader_args', default)
 
     def _call_downloader(self, tmpfilename, info_dict):
         """ Either overwrite this or implement _make_cmd """
@@ -73,28 +89,50 @@ class ExternalFD(FileDownloader):
 
 
 class CurlFD(ExternalFD):
+    AVAILABLE_OPT = '-V'
+
     def _make_cmd(self, tmpfilename, info_dict):
         cmd = [self.exe, '--location', '-o', tmpfilename]
         for key, val in info_dict['http_headers'].items():
             cmd += ['--header', '%s: %s' % (key, val)]
-        cmd += self._source_address('--interface')
+        cmd += self._option('--interface', 'source_address')
+        cmd += self._option('--proxy', 'proxy')
+        cmd += self._valueless_option('--insecure', 'nocheckcertificate')
+        cmd += self._configuration_args()
+        cmd += ['--', info_dict['url']]
+        return cmd
+
+
+class AxelFD(ExternalFD):
+    AVAILABLE_OPT = '-V'
+
+    def _make_cmd(self, tmpfilename, info_dict):
+        cmd = [self.exe, '-o', tmpfilename]
+        for key, val in info_dict['http_headers'].items():
+            cmd += ['-H', '%s: %s' % (key, val)]
         cmd += self._configuration_args()
         cmd += ['--', info_dict['url']]
         return cmd
 
 
 class WgetFD(ExternalFD):
+    AVAILABLE_OPT = '--version'
+
     def _make_cmd(self, tmpfilename, info_dict):
         cmd = [self.exe, '-O', tmpfilename, '-nv', '--no-cookies']
         for key, val in info_dict['http_headers'].items():
             cmd += ['--header', '%s: %s' % (key, val)]
-        cmd += self._source_address('--bind-address')
+        cmd += self._option('--bind-address', 'source_address')
+        cmd += self._option('--proxy', 'proxy')
+        cmd += self._valueless_option('--no-check-certificate', 'nocheckcertificate')
         cmd += self._configuration_args()
         cmd += ['--', info_dict['url']]
         return cmd
 
 
 class Aria2cFD(ExternalFD):
+    AVAILABLE_OPT = '-v'
+
     def _make_cmd(self, tmpfilename, info_dict):
         cmd = [self.exe, '-c']
         cmd += self._configuration_args([
@@ -105,18 +143,120 @@ class Aria2cFD(ExternalFD):
         cmd += ['--out', os.path.basename(tmpfilename)]
         for key, val in info_dict['http_headers'].items():
             cmd += ['--header', '%s: %s' % (key, val)]
-        cmd += self._source_address('--interface')
+        cmd += self._option('--interface', 'source_address')
+        cmd += self._option('--all-proxy', 'proxy')
+        cmd += self._bool_option('--check-certificate', 'nocheckcertificate', 'false', 'true', '=')
         cmd += ['--', info_dict['url']]
         return cmd
 
 
 class HttpieFD(ExternalFD):
+    @classmethod
+    def available(cls):
+        return check_executable('http', ['--version'])
+
     def _make_cmd(self, tmpfilename, info_dict):
         cmd = ['http', '--download', '--output', tmpfilename, info_dict['url']]
         for key, val in info_dict['http_headers'].items():
             cmd += ['%s:%s' % (key, val)]
         return cmd
 
+
+class FFmpegFD(ExternalFD):
+    @classmethod
+    def supports(cls, info_dict):
+        return info_dict['protocol'] in ('http', 'https', 'ftp', 'ftps', 'm3u8', 'rtsp', 'rtmp', 'mms')
+
+    @classmethod
+    def available(cls):
+        return FFmpegPostProcessor().available
+
+    def _call_downloader(self, tmpfilename, info_dict):
+        url = info_dict['url']
+        ffpp = FFmpegPostProcessor(downloader=self)
+        if not ffpp.available:
+            self.report_error('m3u8 download detected but ffmpeg or avconv could not be found. Please install one.')
+            return False
+        ffpp.check_version()
+
+        args = [ffpp.executable, '-y']
+
+        args += self._configuration_args()
+
+        # start_time = info_dict.get('start_time') or 0
+        # if start_time:
+        #     args += ['-ss', compat_str(start_time)]
+        # end_time = info_dict.get('end_time')
+        # if end_time:
+        #     args += ['-t', compat_str(end_time - start_time)]
+
+        if info_dict['http_headers'] and re.match(r'^https?://', url):
+            # Trailing \r\n after each HTTP header is important to prevent a warning from ffmpeg/avconv:
+            # [http @ 00000000003d2fa0] No trailing CRLF found in HTTP header.
+            headers = handle_youtubedl_headers(info_dict['http_headers'])
+            args += [
+                '-headers',
+                ''.join('%s: %s\r\n' % (key, val) for key, val in headers.items())]
+
+        protocol = info_dict.get('protocol')
+
+        if protocol == 'rtmp':
+            player_url = info_dict.get('player_url')
+            page_url = info_dict.get('page_url')
+            app = info_dict.get('app')
+            play_path = info_dict.get('play_path')
+            tc_url = info_dict.get('tc_url')
+            flash_version = info_dict.get('flash_version')
+            live = info_dict.get('rtmp_live', False)
+            if player_url is not None:
+                args += ['-rtmp_swfverify', player_url]
+            if page_url is not None:
+                args += ['-rtmp_pageurl', page_url]
+            if app is not None:
+                args += ['-rtmp_app', app]
+            if play_path is not None:
+                args += ['-rtmp_playpath', play_path]
+            if tc_url is not None:
+                args += ['-rtmp_tcurl', tc_url]
+            if flash_version is not None:
+                args += ['-rtmp_flashver', flash_version]
+            if live:
+                args += ['-rtmp_live', 'live']
+
+        args += ['-i', url, '-c', 'copy']
+        if protocol == 'm3u8':
+            if self.params.get('hls_use_mpegts', False) or tmpfilename == '-':
+                args += ['-f', 'mpegts']
+            else:
+                args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
+        elif protocol == 'rtmp':
+            args += ['-f', 'flv']
+        else:
+            args += ['-f', EXT_TO_OUT_FORMATS.get(info_dict['ext'], info_dict['ext'])]
+
+        args = [encodeArgument(opt) for opt in args]
+        args.append(encodeFilename(ffpp._ffmpeg_filename_argument(tmpfilename), True))
+
+        self._debug_cmd(args)
+
+        proc = subprocess.Popen(args, stdin=subprocess.PIPE)
+        try:
+            retval = proc.wait()
+        except KeyboardInterrupt:
+            # subprocess.run would send the SIGKILL signal to ffmpeg and the
+            # mp4 file couldn't be played, but if we ask ffmpeg to quit it
+            # produces a file that is playable (this is mostly useful for live
+            # streams). Note that Windows is not affected and produces playable
+            # files (see https://github.com/rg3/youtube-dl/issues/8300).
+            if sys.platform != 'win32':
+                proc.communicate(b'q')
+            raise
+        return retval
+
+
+class AVconvFD(FFmpegFD):
+    pass
+
 _BY_NAME = dict(
     (klass.get_basename(), klass)
     for name, klass in globals().items()
index b1a858c452617ed452bc0dcae8d612d22fd224d3..664d87543d07f7c357b803e0a0058034b71276a6 100644 (file)
@@ -5,19 +5,20 @@ import io
 import itertools
 import os
 import time
-import xml.etree.ElementTree as etree
 
-from .common import FileDownloader
-from .http import HttpFD
+from .fragment import FragmentFD
 from ..compat import (
+    compat_etree_fromstring,
     compat_urlparse,
     compat_urllib_error,
+    compat_urllib_parse_urlparse,
 )
 from ..utils import (
-    struct_pack,
-    struct_unpack,
     encodeFilename,
+    fix_xml_ampersands,
     sanitize_open,
+    struct_pack,
+    struct_unpack,
     xpath_text,
 )
 
@@ -222,20 +223,23 @@ def write_metadata_tag(stream, metadata):
         write_unsigned_int(stream, FLV_TAG_HEADER_LEN + len(metadata))
 
 
-def _add_ns(prop):
-    return '{http://ns.adobe.com/f4m/1.0}%s' % prop
+def remove_encrypted_media(media):
+    return list(filter(lambda e: 'drmAdditionalHeaderId' not in e.attrib and
+                                 'drmAdditionalHeaderSetId' not in e.attrib,
+                       media))
 
 
-class HttpQuietDownloader(HttpFD):
-    def to_screen(self, *args, **kargs):
-        pass
+def _add_ns(prop):
+    return '{http://ns.adobe.com/f4m/1.0}%s' % prop
 
 
-class F4mFD(FileDownloader):
+class F4mFD(FragmentFD):
     """
     A downloader for f4m manifests or AdobeHDS.
     """
 
+    FD_NAME = 'f4m'
+
     def _get_unencrypted_media(self, doc):
         media = doc.findall(_add_ns('media'))
         if not media:
@@ -246,9 +250,7 @@ class F4mFD(FileDownloader):
             # without drmAdditionalHeaderId or drmAdditionalHeaderSetId attribute
             if 'id' not in e.attrib:
                 self.report_error('Missing ID in f4m DRM')
-        media = list(filter(lambda e: 'drmAdditionalHeaderId' not in e.attrib and
-                                      'drmAdditionalHeaderSetId' not in e.attrib,
-                            media))
+        media = remove_encrypted_media(media)
         if not media:
             self.report_error('Unsupported DRM')
         return media
@@ -275,23 +277,34 @@ class F4mFD(FileDownloader):
         return fragments_list
 
     def _parse_bootstrap_node(self, node, base_url):
-        if node.text is None:
+        # Sometimes non-empty inline bootstrap info can be specified along
+        # with the bootstrap url attribute (e.g. the dummy inline bootstrap
+        # info contains whitespace characters in [1]). We will prefer the
+        # bootstrap url over inline bootstrap info when present.
+        # 1. http://live-1-1.rutube.ru/stream/1024/HDS/SD/C2NKsS85HQNckgn5HdEmOQ/1454167650/S-s604419906/move/four/dirs/upper/1024-576p.f4m
+        bootstrap_url = node.get('url')
+        if bootstrap_url:
             bootstrap_url = compat_urlparse.urljoin(
-                base_url, node.attrib['url'])
+                base_url, bootstrap_url)
             boot_info = self._get_bootstrap_from_url(bootstrap_url)
         else:
             bootstrap_url = None
             bootstrap = base64.b64decode(node.text.encode('ascii'))
             boot_info = read_bootstrap_info(bootstrap)
-        return (boot_info, bootstrap_url)
+        return boot_info, bootstrap_url
 
     def real_download(self, filename, info_dict):
         man_url = info_dict['url']
         requested_bitrate = info_dict.get('tbr')
-        self.to_screen('[download] Downloading f4m manifest')
-        manifest = self.ydl.urlopen(man_url).read()
-
-        doc = etree.fromstring(manifest)
+        self.to_screen('[%s] Downloading f4m manifest' % self.FD_NAME)
+        urlh = self.ydl.urlopen(man_url)
+        man_url = urlh.geturl()
+        # Some manifests may be malformed, e.g. prosiebensat1 generated manifests
+        # (see https://github.com/rg3/youtube-dl/issues/6215#issuecomment-121704244
+        # and https://github.com/rg3/youtube-dl/issues/7823)
+        manifest = fix_xml_ampersands(urlh.read().decode('utf-8', 'ignore')).strip()
+
+        doc = compat_etree_fromstring(manifest)
         formats = [(int(f.attrib.get('bitrate', -1)), f)
                    for f in self._get_unencrypted_media(doc)]
         if requested_bitrate is None:
@@ -313,101 +326,62 @@ class F4mFD(FileDownloader):
             metadata = None
 
         fragments_list = build_fragments_list(boot_info)
-        if self.params.get('test', False):
+        test = self.params.get('test', False)
+        if test:
             # We only download the first fragment
             fragments_list = fragments_list[:1]
         total_frags = len(fragments_list)
         # For some akamai manifests we'll need to add a query to the fragment url
         akamai_pv = xpath_text(doc, _add_ns('pv-2.0'))
 
-        self.report_destination(filename)
-        http_dl = HttpQuietDownloader(
-            self.ydl,
-            {
-                'continuedl': True,
-                'quiet': True,
-                'noprogress': True,
-                'ratelimit': self.params.get('ratelimit', None),
-                'test': self.params.get('test', False),
-            }
-        )
-        tmpfilename = self.temp_name(filename)
-        (dest_stream, tmpfilename) = sanitize_open(tmpfilename, 'wb')
+        ctx = {
+            'filename': filename,
+            'total_frags': total_frags,
+            'live': live,
+        }
+
+        self._prepare_frag_download(ctx)
+
+        dest_stream = ctx['dest_stream']
 
         write_flv_header(dest_stream)
         if not live:
             write_metadata_tag(dest_stream, metadata)
 
-        # This dict stores the download progress, it's updated by the progress
-        # hook
-        state = {
-            'status': 'downloading',
-            'downloaded_bytes': 0,
-            'frag_index': 0,
-            'frag_count': total_frags,
-            'filename': filename,
-            'tmpfilename': tmpfilename,
-        }
-        start = time.time()
-
-        def frag_progress_hook(s):
-            if s['status'] not in ('downloading', 'finished'):
-                return
-
-            frag_total_bytes = s.get('total_bytes', 0)
-            if s['status'] == 'finished':
-                state['downloaded_bytes'] += frag_total_bytes
-                state['frag_index'] += 1
-
-            estimated_size = (
-                (state['downloaded_bytes'] + frag_total_bytes) /
-                (state['frag_index'] + 1) * total_frags)
-            time_now = time.time()
-            state['total_bytes_estimate'] = estimated_size
-            state['elapsed'] = time_now - start
-
-            if s['status'] == 'finished':
-                progress = self.calc_percent(state['frag_index'], total_frags)
-            else:
-                frag_downloaded_bytes = s['downloaded_bytes']
-                frag_progress = self.calc_percent(frag_downloaded_bytes,
-                                                  frag_total_bytes)
-                progress = self.calc_percent(state['frag_index'], total_frags)
-                progress += frag_progress / float(total_frags)
-
-                state['eta'] = self.calc_eta(
-                    start, time_now, estimated_size, state['downloaded_bytes'] + frag_downloaded_bytes)
-                state['speed'] = s.get('speed')
-            self._hook_progress(state)
+        base_url_parsed = compat_urllib_parse_urlparse(base_url)
 
-        http_dl.add_progress_hook(frag_progress_hook)
+        self._start_frag_download(ctx)
 
         frags_filenames = []
         while fragments_list:
             seg_i, frag_i = fragments_list.pop(0)
             name = 'Seg%d-Frag%d' % (seg_i, frag_i)
-            url = base_url + name
+            query = []
+            if base_url_parsed.query:
+                query.append(base_url_parsed.query)
             if akamai_pv:
-                url += '?' + akamai_pv.strip(';')
+                query.append(akamai_pv.strip(';'))
             if info_dict.get('extra_param_to_segment_url'):
-                url += info_dict.get('extra_param_to_segment_url')
-            frag_filename = '%s-%s' % (tmpfilename, name)
+                query.append(info_dict['extra_param_to_segment_url'])
+            url_parsed = base_url_parsed._replace(path=base_url_parsed.path + name, query='&'.join(query))
+            frag_filename = '%s-%s' % (ctx['tmpfilename'], name)
             try:
-                success = http_dl.download(frag_filename, {'url': url})
+                success = ctx['dl'].download(frag_filename, {'url': url_parsed.geturl()})
                 if not success:
                     return False
-                with open(frag_filename, 'rb') as down:
-                    down_data = down.read()
-                    reader = FlvReader(down_data)
-                    while True:
-                        _, box_type, box_data = reader.read_box_info()
-                        if box_type == b'mdat':
-                            dest_stream.write(box_data)
-                            break
+                (down, frag_sanitized) = sanitize_open(frag_filename, 'rb')
+                down_data = down.read()
+                down.close()
+                reader = FlvReader(down_data)
+                while True:
+                    _, box_type, box_data = reader.read_box_info()
+                    if box_type == b'mdat':
+                        dest_stream.write(box_data)
+                        break
                 if live:
-                    os.remove(frag_filename)
+                    os.remove(encodeFilename(frag_sanitized))
                 else:
-                    frags_filenames.append(frag_filename)
+                    frags_filenames.append(frag_sanitized)
             except (compat_urllib_error.HTTPError, ) as err:
                 if live and (err.code == 404 or err.code == 410):
                     # We didn't keep up with the live window. Continue
@@ -418,27 +392,16 @@ class F4mFD(FileDownloader):
                 else:
                     raise
 
-            if not fragments_list and live and bootstrap_url:
+            if not fragments_list and not test and live and bootstrap_url:
                 fragments_list = self._update_live_fragments(bootstrap_url, frag_i)
                 total_frags += len(fragments_list)
                 if fragments_list and (fragments_list[0][1] > frag_i + 1):
                     msg = 'Missed %d fragments' % (fragments_list[0][1] - (frag_i + 1))
                     self.report_warning(msg)
 
-        dest_stream.close()
+        self._finish_frag_download(ctx)
 
-        elapsed = time.time() - start
-        self.try_rename(tmpfilename, filename)
         for frag_file in frags_filenames:
-            os.remove(frag_file)
-
-        fsize = os.path.getsize(encodeFilename(filename))
-        self._hook_progress({
-            'downloaded_bytes': fsize,
-            'total_bytes': fsize,
-            'filename': filename,
-            'status': 'finished',
-            'elapsed': elapsed,
-        })
+            os.remove(encodeFilename(frag_file))
 
         return True
diff --git a/youtube_dl/downloader/fragment.py b/youtube_dl/downloader/fragment.py
new file mode 100644 (file)
index 0000000..ba903ae
--- /dev/null
@@ -0,0 +1,132 @@
+from __future__ import division, unicode_literals
+
+import os
+import time
+
+from .common import FileDownloader
+from .http import HttpFD
+from ..utils import (
+    encodeFilename,
+    sanitize_open,
+)
+
+
+class HttpQuietDownloader(HttpFD):
+    def to_screen(self, *args, **kargs):
+        pass
+
+
+class FragmentFD(FileDownloader):
+    """
+    A base file downloader class for fragmented media (e.g. f4m/m3u8 manifests).
+
+    Available options:
+
+    fragment_retries:   Number of times to retry a fragment for HTTP error (DASH only)
+    """
+
+    def report_retry_fragment(self, fragment_name, count, retries):
+        self.to_screen(
+            '[download] Got server HTTP error. Retrying fragment %s (attempt %d of %s)...'
+            % (fragment_name, count, self.format_retries(retries)))
+
+    def _prepare_and_start_frag_download(self, ctx):
+        self._prepare_frag_download(ctx)
+        self._start_frag_download(ctx)
+
+    def _prepare_frag_download(self, ctx):
+        if 'live' not in ctx:
+            ctx['live'] = False
+        self.to_screen(
+            '[%s] Total fragments: %s'
+            % (self.FD_NAME, ctx['total_frags'] if not ctx['live'] else 'unknown (live)'))
+        self.report_destination(ctx['filename'])
+        dl = HttpQuietDownloader(
+            self.ydl,
+            {
+                'continuedl': True,
+                'quiet': True,
+                'noprogress': True,
+                'ratelimit': self.params.get('ratelimit'),
+                'retries': self.params.get('retries', 0),
+                'test': self.params.get('test', False),
+            }
+        )
+        tmpfilename = self.temp_name(ctx['filename'])
+        dest_stream, tmpfilename = sanitize_open(tmpfilename, 'wb')
+        ctx.update({
+            'dl': dl,
+            'dest_stream': dest_stream,
+            'tmpfilename': tmpfilename,
+        })
+
+    def _start_frag_download(self, ctx):
+        total_frags = ctx['total_frags']
+        # This dict stores the download progress; it's updated by the progress
+        # hook
+        state = {
+            'status': 'downloading',
+            'downloaded_bytes': 0,
+            'frag_index': 0,
+            'frag_count': total_frags,
+            'filename': ctx['filename'],
+            'tmpfilename': ctx['tmpfilename'],
+        }
+
+        start = time.time()
+        ctx.update({
+            'started': start,
+            # Total complete fragments downloaded so far in bytes
+            'complete_frags_downloaded_bytes': 0,
+            # Number of bytes of the current fragment downloaded by the time
+            # of the previous frag progress hook invocation
+            'prev_frag_downloaded_bytes': 0,
+        })
+
+        def frag_progress_hook(s):
+            if s['status'] not in ('downloading', 'finished'):
+                return
+
+            time_now = time.time()
+            state['elapsed'] = time_now - start
+            frag_total_bytes = s.get('total_bytes') or 0
+            if not ctx['live']:
+                estimated_size = (
+                    (ctx['complete_frags_downloaded_bytes'] + frag_total_bytes) /
+                    (state['frag_index'] + 1) * total_frags)
+                state['total_bytes_estimate'] = estimated_size
+
+            if s['status'] == 'finished':
+                state['frag_index'] += 1
+                state['downloaded_bytes'] += frag_total_bytes - ctx['prev_frag_downloaded_bytes']
+                ctx['complete_frags_downloaded_bytes'] = state['downloaded_bytes']
+                ctx['prev_frag_downloaded_bytes'] = 0
+            else:
+                frag_downloaded_bytes = s['downloaded_bytes']
+                state['downloaded_bytes'] += frag_downloaded_bytes - ctx['prev_frag_downloaded_bytes']
+                if not ctx['live']:
+                    state['eta'] = self.calc_eta(
+                        start, time_now, estimated_size,
+                        state['downloaded_bytes'])
+                state['speed'] = s.get('speed') or ctx.get('speed')
+                ctx['speed'] = state['speed']
+                ctx['prev_frag_downloaded_bytes'] = frag_downloaded_bytes
+            self._hook_progress(state)
+
+        ctx['dl'].add_progress_hook(frag_progress_hook)
+
+        return start
+
+    def _finish_frag_download(self, ctx):
+        ctx['dest_stream'].close()
+        elapsed = time.time() - ctx['started']
+        self.try_rename(ctx['tmpfilename'], ctx['filename'])
+        fsize = os.path.getsize(encodeFilename(ctx['filename']))
+
+        self._hook_progress({
+            'downloaded_bytes': fsize,
+            'total_bytes': fsize,
+            'filename': ctx['filename'],
+            'status': 'finished',
+            'elapsed': elapsed,
+        })
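
The size estimate in frag_progress_hook extrapolates linearly from the fragments seen so far; a worked example with assumed numbers:

    # estimated_size = (complete_frags_downloaded_bytes + frag_total_bytes)
    #                  / (frag_index + 1) * total_frags
    # With 3 fragments finished at 1 MiB each and a 4th reporting 1 MiB total,
    # out of 10 fragments overall:
    #   (3 * 1048576 + 1048576) / (3 + 1) * 10 = 10485760 bytes (~10 MiB)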
index 8be4f424907e55adfac91af5eb587b62b54b8487..a01dac031aa3b0c012a4262d210d16ef2b10a47a 100644 (file)
 from __future__ import unicode_literals
 
-import os
+import os.path
 import re
-import subprocess
 
-from ..postprocessor.ffmpeg import FFmpegPostProcessor
-from .common import FileDownloader
-from ..compat import (
-    compat_urlparse,
-    compat_urllib_request,
-)
+from .fragment import FragmentFD
+
+from ..compat import compat_urlparse
 from ..utils import (
-    encodeArgument,
     encodeFilename,
+    sanitize_open,
 )
 
 
-class HlsFD(FileDownloader):
-    def real_download(self, filename, info_dict):
-        url = info_dict['url']
-        self.report_destination(filename)
-        tmpfilename = self.temp_name(filename)
-
-        ffpp = FFmpegPostProcessor(downloader=self)
-        if not ffpp.available:
-            self.report_error('m3u8 download detected but ffmpeg or avconv could not be found. Please install one.')
-            return False
-        ffpp.check_version()
-
-        args = [
-            encodeArgument(opt)
-            for opt in (ffpp.executable, '-y', '-i', url, '-f', 'mp4', '-c', 'copy', '-bsf:a', 'aac_adtstoasc')]
-        args.append(encodeFilename(tmpfilename, True))
-
-        retval = subprocess.call(args)
-        if retval == 0:
-            fsize = os.path.getsize(encodeFilename(tmpfilename))
-            self.to_screen('\r[%s] %s bytes' % (args[0], fsize))
-            self.try_rename(tmpfilename, filename)
-            self._hook_progress({
-                'downloaded_bytes': fsize,
-                'total_bytes': fsize,
-                'filename': filename,
-                'status': 'finished',
-            })
-            return True
-        else:
-            self.to_stderr('\n')
-            self.report_error('%s exited with code %d' % (ffpp.basename, retval))
-            return False
-
+class HlsFD(FragmentFD):
+    """ A limited implementation that does not require ffmpeg """
 
-class NativeHlsFD(FileDownloader):
-    """ A more limited implementation that does not require ffmpeg """
+    FD_NAME = 'hlsnative'
 
     def real_download(self, filename, info_dict):
-        url = info_dict['url']
-        self.report_destination(filename)
-        tmpfilename = self.temp_name(filename)
+        man_url = info_dict['url']
+        self.to_screen('[%s] Downloading m3u8 manifest' % self.FD_NAME)
+        manifest = self.ydl.urlopen(man_url).read()
 
-        self.to_screen(
-            '[hlsnative] %s: Downloading m3u8 manifest' % info_dict['id'])
-        data = self.ydl.urlopen(url).read()
-        s = data.decode('utf-8', 'ignore')
-        segment_urls = []
+        s = manifest.decode('utf-8', 'ignore')
+        fragment_urls = []
         for line in s.splitlines():
             line = line.strip()
             if line and not line.startswith('#'):
                 segment_url = (
                     line
                     if re.match(r'^https?://', line)
-                    else compat_urlparse.urljoin(url, line))
-                segment_urls.append(segment_url)
-
-        is_test = self.params.get('test', False)
-        remaining_bytes = self._TEST_FILE_SIZE if is_test else None
-        byte_counter = 0
-        with open(tmpfilename, 'wb') as outf:
-            for i, segurl in enumerate(segment_urls):
-                self.to_screen(
-                    '[hlsnative] %s: Downloading segment %d / %d' %
-                    (info_dict['id'], i + 1, len(segment_urls)))
-                seg_req = compat_urllib_request.Request(segurl)
-                if remaining_bytes is not None:
-                    seg_req.add_header('Range', 'bytes=0-%d' % (remaining_bytes - 1))
-
-                segment = self.ydl.urlopen(seg_req).read()
-                if remaining_bytes is not None:
-                    segment = segment[:remaining_bytes]
-                    remaining_bytes -= len(segment)
-                outf.write(segment)
-                byte_counter += len(segment)
-                if remaining_bytes is not None and remaining_bytes <= 0:
+                    else compat_urlparse.urljoin(man_url, line))
+                fragment_urls.append(segment_url)
+                # We only download the first fragment during the test
+                if self.params.get('test', False):
                     break
 
-        self._hook_progress({
-            'downloaded_bytes': byte_counter,
-            'total_bytes': byte_counter,
+        ctx = {
             'filename': filename,
-            'status': 'finished',
-        })
-        self.try_rename(tmpfilename, filename)
+            'total_frags': len(fragment_urls),
+        }
+
+        self._prepare_and_start_frag_download(ctx)
+
+        frags_filenames = []
+        for i, frag_url in enumerate(fragment_urls):
+            frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i)
+            success = ctx['dl'].download(frag_filename, {'url': frag_url})
+            if not success:
+                return False
+            down, frag_sanitized = sanitize_open(frag_filename, 'rb')
+            ctx['dest_stream'].write(down.read())
+            down.close()
+            frags_filenames.append(frag_sanitized)
+
+        self._finish_frag_download(ctx)
+
+        for frag_file in frags_filenames:
+            os.remove(encodeFilename(frag_file))
+
         return True
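
The rewritten HlsFD delegates all plumbing to FragmentFD and keeps only the HLS-specific part: pulling fragment URIs out of the m3u8 text and concatenating the downloaded pieces. A rough standalone illustration of that parsing step (the sample manifest and helper name below are made up for the example):

    import re
    try:
        from urllib.parse import urljoin  # Python 3
    except ImportError:
        from urlparse import urljoin      # Python 2

    def parse_m3u8_fragments(manifest_text, manifest_url):
        # Non-comment, non-blank lines in a media playlist are fragment
        # URIs; relative ones are resolved against the manifest URL.
        fragment_urls = []
        for line in manifest_text.splitlines():
            line = line.strip()
            if line and not line.startswith('#'):
                if re.match(r'^https?://', line):
                    fragment_urls.append(line)
                else:
                    fragment_urls.append(urljoin(manifest_url, line))
        return fragment_urls

    sample = '#EXTM3U\n#EXTINF:4.0,\nseg0.ts\n#EXTINF:4.0,\nhttp://cdn.example/seg1.ts\n'
    print(parse_m3u8_fragments(sample, 'http://cdn.example/stream/index.m3u8'))
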
index b7f144af9ea33a102246632e04e71707be3d98ad..f8b69d186ac5ee93c8402f85bc66e7ed59570118 100644 (file)
@@ -4,16 +4,15 @@ import errno
 import os
 import socket
 import time
+import re
 
 from .common import FileDownloader
-from ..compat import (
-    compat_urllib_request,
-    compat_urllib_error,
-)
+from ..compat import compat_urllib_error
 from ..utils import (
     ContentTooShortError,
     encodeFilename,
     sanitize_open,
+    sanitized_Request,
 )
 
 
@@ -28,8 +27,8 @@ class HttpFD(FileDownloader):
         add_headers = info_dict.get('http_headers')
         if add_headers:
             headers.update(add_headers)
-        basic_request = compat_urllib_request.Request(url, None, headers)
-        request = compat_urllib_request.Request(url, None, headers)
+        basic_request = sanitized_Request(url, None, headers)
+        request = sanitized_Request(url, None, headers)
 
         is_test = self.params.get('test', False)
 
@@ -57,6 +56,24 @@ class HttpFD(FileDownloader):
             # Establish connection
             try:
                 data = self.ydl.urlopen(request)
+                # When trying to resume, the Content-Range HTTP header of the response has
+                # to be checked against the value of the requested Range HTTP header.
+                # Some webservers don't support resuming and serve the whole file with no
+                # Content-Range set in the response despite the requested Range (see
+                # https://github.com/rg3/youtube-dl/issues/6057#issuecomment-126129799)
+                if resume_len > 0:
+                    content_range = data.headers.get('Content-Range')
+                    if content_range:
+                        content_range_m = re.search(r'bytes (\d+)-', content_range)
+                        # Content-Range is present and matches requested Range, resume is possible
+                        if content_range_m and resume_len == int(content_range_m.group(1)):
+                            break
+                    # Content-Range is either not present or invalid. Assuming the remote
+                    # webserver is sending the whole file, resuming is not possible, so wipe
+                    # the local file and redownload from scratch
+                    self.report_unable_to_resume()
+                    resume_len = 0
+                    open_mode = 'wb'
                 break
             except (compat_urllib_error.HTTPError, ) as err:
                 if (err.code < 500 or err.code >= 600) and err.code != 416:
@@ -123,8 +140,8 @@ class HttpFD(FileDownloader):
 
         if data_len is not None:
             data_len = int(data_len) + resume_len
-            min_data_len = self.params.get("min_filesize", None)
-            max_data_len = self.params.get("max_filesize", None)
+            min_data_len = self.params.get('min_filesize')
+            max_data_len = self.params.get('max_filesize')
             if min_data_len is not None and data_len < min_data_len:
                 self.to_screen('\r[download] File is smaller than min-filesize (%s bytes < %s bytes). Aborting.' % (data_len, min_data_len))
                 return False
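
The new resume check above guards against servers that ignore the Range request header and send the whole file anyway: the response's Content-Range must start at the byte offset we already have, otherwise the partial file is discarded. The check itself is easy to isolate (a sketch, not the downloader's API):

    import re

    def resume_offset_confirmed(content_range, resume_len):
        # True only if Content-Range is present and its start matches the
        # number of bytes already on disk.
        if content_range:
            m = re.search(r'bytes (\d+)-', content_range)
            if m and int(m.group(1)) == resume_len:
                return True
        return False

    print(resume_offset_confirmed('bytes 1000-9999/10000', 1000))  # True -> append
    print(resume_offset_confirmed(None, 1000))                     # False -> redownload
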
index 7d19bb808a820da77aeb21070ebbdec4355f6739..9de6e70bbc66b3e7db591c5713599fee82182663 100644 (file)
@@ -94,18 +94,18 @@ class RtmpFD(FileDownloader):
             return proc.returncode
 
         url = info_dict['url']
-        player_url = info_dict.get('player_url', None)
-        page_url = info_dict.get('page_url', None)
-        app = info_dict.get('app', None)
-        play_path = info_dict.get('play_path', None)
-        tc_url = info_dict.get('tc_url', None)
-        flash_version = info_dict.get('flash_version', None)
+        player_url = info_dict.get('player_url')
+        page_url = info_dict.get('page_url')
+        app = info_dict.get('app')
+        play_path = info_dict.get('play_path')
+        tc_url = info_dict.get('tc_url')
+        flash_version = info_dict.get('flash_version')
         live = info_dict.get('rtmp_live', False)
-        conn = info_dict.get('rtmp_conn', None)
-        protocol = info_dict.get('rtmp_protocol', None)
+        conn = info_dict.get('rtmp_conn')
+        protocol = info_dict.get('rtmp_protocol')
         real_time = info_dict.get('rtmp_real_time', False)
         no_resume = info_dict.get('no_resume', False)
-        continue_dl = info_dict.get('continuedl', True)
+        continue_dl = self.params.get('continuedl', True)
 
         self.report_destination(filename)
         tmpfilename = self.temp_name(filename)
@@ -117,7 +117,7 @@ class RtmpFD(FileDownloader):
             return False
 
         # Download using rtmpdump. rtmpdump returns exit code 2 when
-        # the connection was interrumpted and resuming appears to be
+        # the connection was interrupted and resuming appears to be
         # possible. This is part of rtmpdump's normal usage, AFAIK.
         basic_args = [
             'rtmpdump', '--verbose', '-r', url,
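
Per the corrected comment above, rtmpdump signals a resumable interruption with exit code 2, so the downloader can loop, re-invoking rtmpdump with its resume flag until it exits cleanly. A bare-bones sketch of that retry shape (assumes rtmpdump is installed; the retry limit is illustrative, and --resume is rtmpdump's documented resume option):

    import subprocess

    def rtmpdump_with_resume(url, outfile, max_retries=5):
        args = ['rtmpdump', '-r', url, '-o', outfile]
        retval = subprocess.call(args)
        retries = 0
        # Exit code 2 means "interrupted but resumable"; retry with --resume.
        while retval == 2 and retries < max_retries:
            retval = subprocess.call(args + ['--resume'])
            retries += 1
        return retval == 0
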
index 3eb29526cbc90cb3351c75876698a1b238c07ef8..939358b2a2f00edaca5283d311d89ab220d26966 100644 (file)
@@ -27,6 +27,8 @@ class RtspFD(FileDownloader):
             self.report_error('MMS or RTSP download detected but neither "mplayer" nor "mpv" could be run. Please install any.')
             return False
 
+        self._debug_cmd(args)
+
         retval = subprocess.call(args)
         if retval == 0:
             fsize = os.path.getsize(encodeFilename(tmpfilename))
index 7e5c908292c87ef1ddaa29d5d64532bd8b34f3bf..18d8dbcd6672f82776a9bd9f6f4cc63cac91129d 100644 (file)
 from __future__ import unicode_literals
 
-from .abc import ABCIE
-from .abc7news import Abc7NewsIE
-from .academicearth import AcademicEarthCourseIE
-from .addanime import AddAnimeIE
-from .adobetv import (
-    AdobeTVIE,
-    AdobeTVVideoIE,
-)
-from .adultswim import AdultSwimIE
-from .aftenposten import AftenpostenIE
-from .aftonbladet import AftonbladetIE
-from .airmozilla import AirMozillaIE
-from .aljazeera import AlJazeeraIE
-from .alphaporno import AlphaPornoIE
-from .anitube import AnitubeIE
-from .anysex import AnySexIE
-from .aol import AolIE
-from .allocine import AllocineIE
-from .aparat import AparatIE
-from .appleconnect import AppleConnectIE
-from .appletrailers import AppleTrailersIE
-from .archiveorg import ArchiveOrgIE
-from .ard import (
-    ARDIE,
-    ARDMediathekIE,
-    SportschauIE,
-)
-from .arte import (
-    ArteTvIE,
-    ArteTVPlus7IE,
-    ArteTVCreativeIE,
-    ArteTVConcertIE,
-    ArteTVFutureIE,
-    ArteTVDDCIE,
-    ArteTVEmbedIE,
-)
-from .atresplayer import AtresPlayerIE
-from .atttechchannel import ATTTechChannelIE
-from .audiomack import AudiomackIE, AudiomackAlbumIE
-from .azubu import AzubuIE
-from .baidu import BaiduVideoIE
-from .bambuser import BambuserIE, BambuserChannelIE
-from .bandcamp import BandcampIE, BandcampAlbumIE
-from .bbc import (
-    BBCCoUkIE,
-    BBCIE,
-)
-from .beeg import BeegIE
-from .behindkink import BehindKinkIE
-from .beatportpro import BeatportProIE
-from .bet import BetIE
-from .bild import BildIE
-from .bilibili import BiliBiliIE
-from .blinkx import BlinkxIE
-from .bliptv import BlipTVIE, BlipTVUserIE
-from .bloomberg import BloombergIE
-from .bpb import BpbIE
-from .br import BRIE
-from .breakcom import BreakIE
-from .brightcove import BrightcoveIE
-from .buzzfeed import BuzzFeedIE
-from .byutv import BYUtvIE
-from .c56 import C56IE
-from .camdemy import (
-    CamdemyIE,
-    CamdemyFolderIE
-)
-from .canal13cl import Canal13clIE
-from .canalplus import CanalplusIE
-from .canalc2 import Canalc2IE
-from .cbs import CBSIE
-from .cbsnews import CBSNewsIE
-from .cbssports import CBSSportsIE
-from .ccc import CCCIE
-from .ceskatelevize import CeskaTelevizeIE
-from .channel9 import Channel9IE
-from .chilloutzone import ChilloutzoneIE
-from .chirbit import (
-    ChirbitIE,
-    ChirbitProfileIE,
-)
-from .cinchcast import CinchcastIE
-from .cinemassacre import CinemassacreIE
-from .clipfish import ClipfishIE
-from .cliphunter import CliphunterIE
-from .clipsyndicate import ClipsyndicateIE
-from .cloudy import CloudyIE
-from .clubic import ClubicIE
-from .cmt import CMTIE
-from .cnet import CNETIE
-from .cnn import (
-    CNNIE,
-    CNNBlogsIE,
-    CNNArticleIE,
-)
-from .collegehumor import CollegeHumorIE
-from .collegerama import CollegeRamaIE
-from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
-from .comcarcoff import ComCarCoffIE
-from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
-from .condenast import CondeNastIE
-from .cracked import CrackedIE
-from .criterion import CriterionIE
-from .crooksandliars import CrooksAndLiarsIE
-from .crunchyroll import (
-    CrunchyrollIE,
-    CrunchyrollShowPlaylistIE
-)
-from .cspan import CSpanIE
-from .ctsnews import CtsNewsIE
-from .dailymotion import (
-    DailymotionIE,
-    DailymotionPlaylistIE,
-    DailymotionUserIE,
-    DailymotionCloudIE,
-)
-from .daum import DaumIE
-from .dbtv import DBTVIE
-from .dctp import DctpTvIE
-from .deezer import DeezerPlaylistIE
-from .dfb import DFBIE
-from .dhm import DHMIE
-from .dotsub import DotsubIE
-from .douyutv import DouyuTVIE
-from .dramafever import (
-    DramaFeverIE,
-    DramaFeverSeriesIE,
-)
-from .dreisat import DreiSatIE
-from .drbonanza import DRBonanzaIE
-from .drtuber import DrTuberIE
-from .drtv import DRTVIE
-from .dvtv import DVTVIE
-from .dump import DumpIE
-from .dumpert import DumpertIE
-from .defense import DefenseGouvFrIE
-from .discovery import DiscoveryIE
-from .divxstage import DivxStageIE
-from .dropbox import DropboxIE
-from .eagleplatform import EaglePlatformIE
-from .ebaumsworld import EbaumsWorldIE
-from .echomsk import EchoMskIE
-from .ehow import EHowIE
-from .eighttracks import EightTracksIE
-from .einthusan import EinthusanIE
-from .eitb import EitbIE
-from .ellentv import (
-    EllenTVIE,
-    EllenTVClipsIE,
-)
-from .elpais import ElPaisIE
-from .embedly import EmbedlyIE
-from .engadget import EngadgetIE
-from .eporner import EpornerIE
-from .eroprofile import EroProfileIE
-from .escapist import EscapistIE
-from .espn import ESPNIE
-from .everyonesmixtape import EveryonesMixtapeIE
-from .exfm import ExfmIE
-from .expotv import ExpoTVIE
-from .extremetube import ExtremeTubeIE
-from .facebook import FacebookIE
-from .faz import FazIE
-from .fc2 import FC2IE
-from .firstpost import FirstpostIE
-from .firsttv import FirstTVIE
-from .fivemin import FiveMinIE
-from .fivetv import FiveTVIE
-from .fktv import (
-    FKTVIE,
-    FKTVPosteckeIE,
-)
-from .flickr import FlickrIE
-from .folketinget import FolketingetIE
-from .footyroom import FootyRoomIE
-from .fourtube import FourTubeIE
-from .foxgay import FoxgayIE
-from .foxnews import FoxNewsIE
-from .foxsports import FoxSportsIE
-from .franceculture import FranceCultureIE
-from .franceinter import FranceInterIE
-from .francetv import (
-    PluzzIE,
-    FranceTvInfoIE,
-    FranceTVIE,
-    GenerationQuoiIE,
-    CultureboxIE,
-)
-from .freesound import FreesoundIE
-from .freespeech import FreespeechIE
-from .freevideo import FreeVideoIE
-from .funnyordie import FunnyOrDieIE
-from .gamekings import GamekingsIE
-from .gameone import (
-    GameOneIE,
-    GameOnePlaylistIE,
-)
-from .gamersyde import GamersydeIE
-from .gamespot import GameSpotIE
-from .gamestar import GameStarIE
-from .gametrailers import GametrailersIE
-from .gazeta import GazetaIE
-from .gdcvault import GDCVaultIE
-from .generic import GenericIE
-from .gfycat import GfycatIE
-from .giantbomb import GiantBombIE
-from .giga import GigaIE
-from .glide import GlideIE
-from .globo import GloboIE
-from .godtube import GodTubeIE
-from .goldenmoustache import GoldenMoustacheIE
-from .golem import GolemIE
-from .googleplus import GooglePlusIE
-from .googlesearch import GoogleSearchIE
-from .gorillavid import GorillaVidIE
-from .goshgay import GoshgayIE
-from .groupon import GrouponIE
-from .hark import HarkIE
-from .hearthisat import HearThisAtIE
-from .heise import HeiseIE
-from .hellporno import HellPornoIE
-from .helsinki import HelsinkiIE
-from .hentaistigma import HentaiStigmaIE
-from .historicfilms import HistoricFilmsIE
-from .history import HistoryIE
-from .hitbox import HitboxIE, HitboxLiveIE
-from .hornbunny import HornBunnyIE
-from .hostingbulk import HostingBulkIE
-from .hotnewhiphop import HotNewHipHopIE
-from .howcast import HowcastIE
-from .howstuffworks import HowStuffWorksIE
-from .huffpost import HuffPostIE
-from .hypem import HypemIE
-from .iconosquare import IconosquareIE
-from .ign import IGNIE, OneUPIE
-from .imdb import (
-    ImdbIE,
-    ImdbListIE
-)
-from .imgur import ImgurIE
-from .ina import InaIE
-from .infoq import InfoQIE
-from .instagram import InstagramIE, InstagramUserIE
-from .internetvideoarchive import InternetVideoArchiveIE
-from .iprima import IPrimaIE
-from .iqiyi import IqiyiIE
-from .ir90tv import Ir90TvIE
-from .ivi import (
-    IviIE,
-    IviCompilationIE
-)
-from .izlesene import IzleseneIE
-from .jadorecettepub import JadoreCettePubIE
-from .jeuxvideo import JeuxVideoIE
-from .jove import JoveIE
-from .jukebox import JukeboxIE
-from .jpopsukitv import JpopsukiIE
-from .kaltura import KalturaIE
-from .kanalplay import KanalPlayIE
-from .kankan import KankanIE
-from .karaoketv import KaraoketvIE
-from .karrierevideos import KarriereVideosIE
-from .keezmovies import KeezMoviesIE
-from .khanacademy import KhanAcademyIE
-from .kickstarter import KickStarterIE
-from .keek import KeekIE
-from .kontrtube import KontrTubeIE
-from .krasview import KrasViewIE
-from .ku6 import Ku6IE
-from .kuwo import (
-    KuwoIE,
-    KuwoAlbumIE,
-    KuwoChartIE,
-    KuwoSingerIE,
-    KuwoCategoryIE,
-    KuwoMvIE,
-)
-from .la7 import LA7IE
-from .laola1tv import Laola1TvIE
-from .lecture2go import Lecture2GoIE
-from .letv import (
-    LetvIE,
-    LetvTvIE,
-    LetvPlaylistIE
-)
-from .libsyn import LibsynIE
-from .lifenews import (
-    LifeNewsIE,
-    LifeEmbedIE,
-)
-from .liveleak import LiveLeakIE
-from .livestream import (
-    LivestreamIE,
-    LivestreamOriginalIE,
-    LivestreamShortenerIE,
-)
-from .lnkgo import LnkGoIE
-from .lrt import LRTIE
-from .lynda import (
-    LyndaIE,
-    LyndaCourseIE
-)
-from .m6 import M6IE
-from .macgamestore import MacGameStoreIE
-from .mailru import MailRuIE
-from .malemotion import MalemotionIE
-from .mdr import MDRIE
-from .megavideoz import MegaVideozIE
-from .metacafe import MetacafeIE
-from .metacritic import MetacriticIE
-from .mgoon import MgoonIE
-from .minhateca import MinhatecaIE
-from .ministrygrid import MinistryGridIE
-from .miomio import MioMioIE
-from .mit import TechTVMITIE, MITIE, OCWMITIE
-from .mitele import MiTeleIE
-from .mixcloud import MixcloudIE
-from .mlb import MLBIE
-from .mpora import MporaIE
-from .moevideo import MoeVideoIE
-from .mofosex import MofosexIE
-from .mojvideo import MojvideoIE
-from .moniker import MonikerIE
-from .mooshare import MooshareIE
-from .morningstar import MorningstarIE
-from .motherless import MotherlessIE
-from .motorsport import MotorsportIE
-from .movieclips import MovieClipsIE
-from .moviezine import MoviezineIE
-from .movshare import MovShareIE
-from .mtv import (
-    MTVIE,
-    MTVServicesEmbeddedIE,
-    MTVIggyIE,
-)
-from .muenchentv import MuenchenTVIE
-from .musicplayon import MusicPlayOnIE
-from .musicvault import MusicVaultIE
-from .muzu import MuzuTVIE
-from .myspace import MySpaceIE, MySpaceAlbumIE
-from .myspass import MySpassIE
-from .myvi import MyviIE
-from .myvideo import MyVideoIE
-from .myvidster import MyVidsterIE
-from .nationalgeographic import NationalGeographicIE
-from .naver import NaverIE
-from .nba import NBAIE
-from .nbc import (
-    NBCIE,
-    NBCNewsIE,
-    NBCSportsIE,
-    NBCSportsVPlayerIE,
-)
-from .ndr import (
-    NDRIE,
-    NJoyIE,
-)
-from .ndtv import NDTVIE
-from .netzkino import NetzkinoIE
-from .nerdcubed import NerdCubedFeedIE
-from .nerdist import NerdistIE
-from .neteasemusic import (
-    NetEaseMusicIE,
-    NetEaseMusicAlbumIE,
-    NetEaseMusicSingerIE,
-    NetEaseMusicListIE,
-    NetEaseMusicMvIE,
-    NetEaseMusicProgramIE,
-    NetEaseMusicDjRadioIE,
-)
-from .newgrounds import NewgroundsIE
-from .newstube import NewstubeIE
-from .nextmedia import (
-    NextMediaIE,
-    NextMediaActionNewsIE,
-    AppleDailyIE,
-)
-from .nfb import NFBIE
-from .nfl import NFLIE
-from .nhl import (
-    NHLIE,
-    NHLNewsIE,
-    NHLVideocenterIE,
-)
-from .niconico import NiconicoIE, NiconicoPlaylistIE
-from .ninegag import NineGagIE
-from .noco import NocoIE
-from .normalboots import NormalbootsIE
-from .nosvideo import NosVideoIE
-from .nova import NovaIE
-from .novamov import NovaMovIE
-from .nowness import NownessIE
-from .nowtv import NowTVIE
-from .nowvideo import NowVideoIE
-from .npo import (
-    NPOIE,
-    NPOLiveIE,
-    NPORadioIE,
-    NPORadioFragmentIE,
-    VPROIE,
-    WNLIE
-)
-from .nrk import (
-    NRKIE,
-    NRKPlaylistIE,
-    NRKTVIE,
-)
-from .ntvde import NTVDeIE
-from .ntvru import NTVRuIE
-from .nytimes import (
-    NYTimesIE,
-    NYTimesArticleIE,
-)
-from .nuvid import NuvidIE
-from .odnoklassniki import OdnoklassnikiIE
-from .oktoberfesttv import OktoberfestTVIE
-from .onionstudios import OnionStudiosIE
-from .ooyala import (
-    OoyalaIE,
-    OoyalaExternalIE,
-)
-from .openfilm import OpenFilmIE
-from .orf import (
-    ORFTVthekIE,
-    ORFOE1IE,
-    ORFFM4IE,
-    ORFIPTVIE,
-)
-from .parliamentliveuk import ParliamentLiveUKIE
-from .patreon import PatreonIE
-from .pbs import PBSIE
-from .philharmoniedeparis import PhilharmonieDeParisIE
-from .phoenix import PhoenixIE
-from .photobucket import PhotobucketIE
-from .pinkbike import PinkbikeIE
-from .planetaplay import PlanetaPlayIE
-from .pladform import PladformIE
-from .played import PlayedIE
-from .playfm import PlayFMIE
-from .playvid import PlayvidIE
-from .playwire import PlaywireIE
-from .podomatic import PodomaticIE
-from .porn91 import Porn91IE
-from .pornhd import PornHdIE
-from .pornhub import (
-    PornHubIE,
-    PornHubPlaylistIE,
-)
-from .pornotube import PornotubeIE
-from .pornovoisines import PornoVoisinesIE
-from .pornoxo import PornoXOIE
-from .primesharetv import PrimeShareTVIE
-from .promptfile import PromptFileIE
-from .prosiebensat1 import ProSiebenSat1IE
-from .puls4 import Puls4IE
-from .pyvideo import PyvideoIE
-from .qqmusic import (
-    QQMusicIE,
-    QQMusicSingerIE,
-    QQMusicAlbumIE,
-    QQMusicToplistIE,
-    QQMusicPlaylistIE,
-)
-from .quickvid import QuickVidIE
-from .r7 import R7IE
-from .radiode import RadioDeIE
-from .radiojavan import RadioJavanIE
-from .radiobremen import RadioBremenIE
-from .radiofrance import RadioFranceIE
-from .rai import RaiIE
-from .rbmaradio import RBMARadioIE
-from .rds import RDSIE
-from .redtube import RedTubeIE
-from .restudy import RestudyIE
-from .reverbnation import ReverbNationIE
-from .ringtv import RingTVIE
-from .ro220 import Ro220IE
-from .rottentomatoes import RottenTomatoesIE
-from .roxwel import RoxwelIE
-from .rtbf import RTBFIE
-from .rte import RteIE
-from .rtlnl import RtlNlIE
-from .rtl2 import RTL2IE
-from .rtp import RTPIE
-from .rts import RTSIE
-from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE
-from .ruhd import RUHDIE
-from .rutube import (
-    RutubeIE,
-    RutubeChannelIE,
-    RutubeEmbedIE,
-    RutubeMovieIE,
-    RutubePersonIE,
-)
-from .rutv import RUTVIE
-from .ruutu import RuutuIE
-from .sandia import SandiaIE
-from .safari import (
-    SafariIE,
-    SafariCourseIE,
-)
-from .sapo import SapoIE
-from .savefrom import SaveFromIE
-from .sbs import SBSIE
-from .scivee import SciVeeIE
-from .screencast import ScreencastIE
-from .screencastomatic import ScreencastOMaticIE
-from .screenwavemedia import ScreenwaveMediaIE, TeamFourIE
-from .senateisvp import SenateISVPIE
-from .servingsys import ServingSysIE
-from .sexu import SexuIE
-from .sexykarma import SexyKarmaIE
-from .shared import SharedIE
-from .sharesix import ShareSixIE
-from .sina import SinaIE
-from .slideshare import SlideshareIE
-from .slutload import SlutloadIE
-from .smotri import (
-    SmotriIE,
-    SmotriCommunityIE,
-    SmotriUserIE,
-    SmotriBroadcastIE,
-)
-from .snagfilms import (
-    SnagFilmsIE,
-    SnagFilmsEmbedIE,
-)
-from .snotr import SnotrIE
-from .sohu import SohuIE
-from .soompi import (
-    SoompiIE,
-    SoompiShowIE,
-)
-from .soundcloud import (
-    SoundcloudIE,
-    SoundcloudSetIE,
-    SoundcloudUserIE,
-    SoundcloudPlaylistIE
-)
-from .soundgasm import (
-    SoundgasmIE,
-    SoundgasmProfileIE
-)
-from .southpark import (
-    SouthParkIE,
-    SouthParkDeIE,
-    SouthParkDkIE,
-    SouthParkEsIE,
-    SouthParkNlIE
-)
-from .space import SpaceIE
-from .spankbang import SpankBangIE
-from .spankwire import SpankwireIE
-from .spiegel import SpiegelIE, SpiegelArticleIE
-from .spiegeltv import SpiegeltvIE
-from .spike import SpikeIE
-from .sport5 import Sport5IE
-from .sportbox import (
-    SportBoxIE,
-    SportBoxEmbedIE,
-)
-from .sportdeutschland import SportDeutschlandIE
-from .srf import SrfIE
-from .srmediathek import SRMediathekIE
-from .ssa import SSAIE
-from .stanfordoc import StanfordOpenClassroomIE
-from .steam import SteamIE
-from .streamcloud import StreamcloudIE
-from .streamcz import StreamCZIE
-from .streetvoice import StreetVoiceIE
-from .sunporno import SunPornoIE
-from .svt import (
-    SVTIE,
-    SVTPlayIE,
-)
-from .swrmediathek import SWRMediathekIE
-from .syfy import SyfyIE
-from .sztvhu import SztvHuIE
-from .tagesschau import TagesschauIE
-from .tapely import TapelyIE
-from .tass import TassIE
-from .teachertube import (
-    TeacherTubeIE,
-    TeacherTubeUserIE,
-)
-from .teachingchannel import TeachingChannelIE
-from .teamcoco import TeamcocoIE
-from .techtalks import TechTalksIE
-from .ted import TEDIE
-from .telebruxelles import TeleBruxellesIE
-from .telecinco import TelecincoIE
-from .telemb import TeleMBIE
-from .teletask import TeleTaskIE
-from .tenplay import TenPlayIE
-from .testurl import TestURLIE
-from .testtube import TestTubeIE
-from .tf1 import TF1IE
-from .theonion import TheOnionIE
-from .theplatform import ThePlatformIE
-from .thesixtyone import TheSixtyOneIE
-from .thisamericanlife import ThisAmericanLifeIE
-from .thisav import ThisAVIE
-from .tinypic import TinyPicIE
-from .tlc import TlcIE, TlcDeIE
-from .tmz import (
-    TMZIE,
-    TMZArticleIE,
-)
-from .tnaflix import (
-    TNAFlixIE,
-    EMPFlixIE,
-    MovieFapIE,
-)
-from .thvideo import (
-    THVideoIE,
-    THVideoPlaylistIE
-)
-from .toutv import TouTvIE
-from .toypics import ToypicsUserIE, ToypicsIE
-from .traileraddict import TrailerAddictIE
-from .trilulilu import TriluliluIE
-from .trutube import TruTubeIE
-from .tube8 import Tube8IE
-from .tubitv import TubiTvIE
-from .tudou import TudouIE
-from .tumblr import TumblrIE
-from .tunein import TuneInIE
-from .turbo import TurboIE
-from .tutv import TutvIE
-from .tv2 import (
-    TV2IE,
-    TV2ArticleIE,
-)
-from .tv4 import TV4IE
-from .tvc import (
-    TVCIE,
-    TVCArticleIE,
-)
-from .tvigle import TvigleIE
-from .tvp import TvpIE, TvpSeriesIE
-from .tvplay import TVPlayIE
-from .tweakers import TweakersIE
-from .twentyfourvideo import TwentyFourVideoIE
-from .twentytwotracks import (
-    TwentyTwoTracksIE,
-    TwentyTwoTracksGenreIE
-)
-from .twitch import (
-    TwitchVideoIE,
-    TwitchChapterIE,
-    TwitchVodIE,
-    TwitchProfileIE,
-    TwitchPastBroadcastsIE,
-    TwitchBookmarksIE,
-    TwitchStreamIE,
-)
-from .twitter import TwitterCardIE
-from .ubu import UbuIE
-from .udemy import (
-    UdemyIE,
-    UdemyCourseIE
-)
-from .udn import UDNEmbedIE
-from .ultimedia import UltimediaIE
-from .unistra import UnistraIE
-from .urort import UrortIE
-from .ustream import UstreamIE, UstreamChannelIE
-from .varzesh3 import Varzesh3IE
-from .vbox7 import Vbox7IE
-from .veehd import VeeHDIE
-from .veoh import VeohIE
-from .vessel import VesselIE
-from .vesti import VestiIE
-from .vevo import VevoIE
-from .vgtv import (
-    BTArticleIE,
-    BTVestlendingenIE,
-    VGTVIE,
-)
-from .vh1 import VH1IE
-from .vice import ViceIE
-from .viddler import ViddlerIE
-from .videobam import VideoBamIE
-from .videodetective import VideoDetectiveIE
-from .videolecturesnet import VideoLecturesNetIE
-from .videofyme import VideofyMeIE
-from .videomega import VideoMegaIE
-from .videopremium import VideoPremiumIE
-from .videott import VideoTtIE
-from .videoweed import VideoWeedIE
-from .vidme import VidmeIE
-from .vidzi import VidziIE
-from .vier import VierIE, VierVideosIE
-from .viewster import ViewsterIE
-from .vimeo import (
-    VimeoIE,
-    VimeoAlbumIE,
-    VimeoChannelIE,
-    VimeoGroupsIE,
-    VimeoLikesIE,
-    VimeoReviewIE,
-    VimeoUserIE,
-    VimeoWatchLaterIE,
-)
-from .vimple import VimpleIE
-from .vine import (
-    VineIE,
-    VineUserIE,
-)
-from .viki import (
-    VikiIE,
-    VikiChannelIE,
-)
-from .vk import (
-    VKIE,
-    VKUserVideosIE,
-)
-from .vodlocker import VodlockerIE
-from .voicerepublic import VoiceRepublicIE
-from .vporn import VpornIE
-from .vrt import VRTIE
-from .vube import VubeIE
-from .vuclip import VuClipIE
-from .vulture import VultureIE
-from .walla import WallaIE
-from .washingtonpost import WashingtonPostIE
-from .wat import WatIE
-from .wayofthemaster import WayOfTheMasterIE
-from .wdr import (
-    WDRIE,
-    WDRMobileIE,
-    WDRMausIE,
-)
-from .webofstories import (
-    WebOfStoriesIE,
-    WebOfStoriesPlaylistIE,
-)
-from .weibo import WeiboIE
-from .wimp import WimpIE
-from .wistia import WistiaIE
-from .worldstarhiphop import WorldStarHipHopIE
-from .wrzuta import WrzutaIE
-from .wsj import WSJIE
-from .xbef import XBefIE
-from .xboxclips import XboxClipsIE
-from .xhamster import (
-    XHamsterIE,
-    XHamsterEmbedIE,
-)
-from .xminus import XMinusIE
-from .xnxx import XNXXIE
-from .xstream import XstreamIE
-from .xtube import XTubeUserIE, XTubeIE
-from .xuite import XuiteIE
-from .xvideos import XVideosIE
-from .xxxymovies import XXXYMoviesIE
-from .yahoo import (
-    YahooIE,
-    YahooSearchIE,
-)
-from .yam import YamIE
-from .yandexmusic import (
-    YandexMusicTrackIE,
-    YandexMusicAlbumIE,
-    YandexMusicPlaylistIE,
-)
-from .yesjapan import YesJapanIE
-from .yinyuetai import YinYueTaiIE
-from .ynet import YnetIE
-from .youjizz import YouJizzIE
-from .youku import YoukuIE
-from .youporn import YouPornIE
-from .yourupload import YourUploadIE
-from .youtube import (
-    YoutubeIE,
-    YoutubeChannelIE,
-    YoutubeFavouritesIE,
-    YoutubeHistoryIE,
-    YoutubePlaylistIE,
-    YoutubeRecommendedIE,
-    YoutubeSearchDateIE,
-    YoutubeSearchIE,
-    YoutubeSearchURLIE,
-    YoutubeShowIE,
-    YoutubeSubscriptionsIE,
-    YoutubeTruncatedIDIE,
-    YoutubeTruncatedURLIE,
-    YoutubeUserIE,
-    YoutubeWatchLaterIE,
-)
-from .zapiks import ZapiksIE
-from .zdf import ZDFIE, ZDFChannelIE
-from .zingmp3 import (
-    ZingMp3SongIE,
-    ZingMp3AlbumIE,
-)
-
-_ALL_CLASSES = [
-    klass
-    for name, klass in globals().items()
-    if name.endswith('IE') and name != 'GenericIE'
-]
-_ALL_CLASSES.append(GenericIE)
+try:
+    from .lazy_extractors import *
+    from .lazy_extractors import _ALL_CLASSES
+    _LAZY_LOADER = True
+except ImportError:
+    _LAZY_LOADER = False
+    from .extractors import *
+
+    _ALL_CLASSES = [
+        klass
+        for name, klass in globals().items()
+        if name.endswith('IE') and name != 'GenericIE'
+    ]
+    _ALL_CLASSES.append(GenericIE)
+
+
+def gen_extractor_classes():
+    """ Return a list of supported extractors.
+    The order does matter; the first extractor matched is the one handling the URL.
+    """
+    return _ALL_CLASSES
 
 
 def gen_extractors():
     """ Return a list of an instance of every supported extractor.
     The order does matter; the first extractor matched is the one handling the URL.
     """
-    return [klass() for klass in _ALL_CLASSES]
+    return [klass() for klass in gen_extractor_classes()]
 
 
 def list_extractors(age_limit):
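
The try/except import above is the whole lazy-loading mechanism: a pregenerated lazy_extractors module is preferred, and the eager import of .extractors is only the fallback. One way such a generated stub can defer the real import until a class is actually instantiated (purely illustrative; this is not the template youtube-dl ships):

    import importlib

    class LazyStub(object):
        # The code generator would emit one subclass per extractor with
        # these two attributes filled in.
        _module = None
        _real_name = None

        def __new__(cls, *args, **kwargs):
            # First instantiation imports the real module and constructs
            # the real class instead of the stub.
            real = getattr(importlib.import_module(cls._module), cls._real_name)
            return real(*args, **kwargs)

    class JSONDecoderStub(LazyStub):
        _module = 'json'
        _real_name = 'JSONDecoder'

    print(type(JSONDecoderStub()))  # the real json.JSONDecoder, loaded on demand
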
index dc0fb85d6048962505d1d207ae590940d69f52e6..b584277be92b5a86fb9e0ac5d95870444d441174 100644 (file)
@@ -1,16 +1,20 @@
 from __future__ import unicode_literals
 
 import re
-import json
 
 from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    js_to_json,
+    int_or_none,
+)
 
 
 class ABCIE(InfoExtractor):
     IE_NAME = 'abc.net.au'
-    _VALID_URL = r'http://www\.abc\.net\.au/news/[^/]+/[^/]+/(?P<id>\d+)'
+    _VALID_URL = r'https?://www\.abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://www.abc.net.au/news/2014-11-05/australia-to-staff-ebola-treatment-centre-in-sierra-leone/5868334',
         'md5': 'cb3dd03b18455a661071ee1e28344d9f',
         'info_dict': {
@@ -19,23 +23,67 @@ class ABCIE(InfoExtractor):
             'title': 'Australia to help staff Ebola treatment centre in Sierra Leone',
             'description': 'md5:809ad29c67a05f54eb41f2a105693a67',
         },
-    }
+        'skip': 'this video has expired',
+    }, {
+        'url': 'http://www.abc.net.au/news/2015-08-17/warren-entsch-introduces-same-sex-marriage-bill/6702326',
+        'md5': 'db2a5369238b51f9811ad815b69dc086',
+        'info_dict': {
+            'id': 'NvqvPeNZsHU',
+            'ext': 'mp4',
+            'upload_date': '20150816',
+            'uploader': 'ABC News (Australia)',
+            'description': 'Government backbencher Warren Entsch introduces a cross-party sponsored bill to legalise same-sex marriage, saying the bill is designed to promote "an inclusive Australia, not a divided one.". Read more here: http://ab.co/1Mwc6ef',
+            'uploader_id': 'NewsOnABC',
+            'title': 'Marriage Equality: Warren Entsch introduces same sex marriage bill',
+        },
+        'add_ie': ['Youtube'],
+        'skip': 'Not accessible from Travis CI server',
+    }, {
+        'url': 'http://www.abc.net.au/news/2015-10-23/nab-lifts-interest-rates-following-westpac-and-cba/6880080',
+        'md5': 'b96eee7c9edf4fc5a358a0252881cc1f',
+        'info_dict': {
+            'id': '6880080',
+            'ext': 'mp3',
+            'title': 'NAB lifts interest rates, following Westpac and CBA',
+            'description': 'md5:f13d8edc81e462fce4a0437c7dc04728',
+        },
+    }, {
+        'url': 'http://www.abc.net.au/news/2015-10-19/6866214',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
 
-        urls_info_json = self._search_regex(
-            r'inlineVideoData\.push\((.*?)\);', webpage, 'video urls',
-            flags=re.DOTALL)
-        urls_info = json.loads(urls_info_json.replace('\'', '"'))
+        mobj = re.search(
+            r'inline(?P<type>Video|Audio|YouTube)Data\.push\((?P<json_data>[^)]+)\);',
+            webpage)
+        if mobj is None:
+            expired = self._html_search_regex(r'(?s)class="expired-(?:video|audio)".+?<span>(.+?)</span>', webpage, 'expired', None)
+            if expired:
+                raise ExtractorError('%s said: %s' % (self.IE_NAME, expired), expected=True)
+            raise ExtractorError('Unable to extract video urls')
+
+        urls_info = self._parse_json(
+            mobj.group('json_data'), video_id, transform_source=js_to_json)
+
+        if not isinstance(urls_info, list):
+            urls_info = [urls_info]
+
+        if mobj.group('type') == 'YouTube':
+            return self.playlist_result([
+                self.url_result(url_info['url']) for url_info in urls_info])
+
         formats = [{
             'url': url_info['url'],
-            'width': int(url_info['width']),
-            'height': int(url_info['height']),
-            'tbr': int(url_info['bitrate']),
-            'filesize': int(url_info['filesize']),
+            'vcodec': url_info.get('codec') if mobj.group('type') == 'Video' else 'none',
+            'width': int_or_none(url_info.get('width')),
+            'height': int_or_none(url_info.get('height')),
+            'tbr': int_or_none(url_info.get('bitrate')),
+            'filesize': int_or_none(url_info.get('filesize')),
         } for url_info in urls_info]
+
         self._sort_formats(formats)
 
         return {
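
The old ABCIE code naively swapped single quotes for double quotes before json.loads; the new version hands the raw JavaScript object literal to _parse_json with js_to_json as transform_source, which copes with unquoted keys and single-quoted strings properly. The difference is easy to demonstrate with a sketch of that transform (a toy converter, far less complete than youtube-dl's js_to_json):

    import json
    import re

    def toy_js_to_json(code):
        # Convert single-quoted strings, then quote bare identifiers used
        # as object keys; enough for simple literals like the one below.
        code = re.sub(r"'([^']*)'", r'"\1"', code)
        code = re.sub(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)\s*:', r'\1"\2":', code)
        return code

    src = "{url: 'http://example.com/a.mp4', bitrate: 1024}"
    print(json.loads(toy_js_to_json(src)))
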
index 47313fba8735902f964c0cd39992f9962e0f47fb..34095501cfa342e7ccb4afdb77e14b9514ca389c 100644 (file)
@@ -15,7 +15,7 @@ class AcademicEarthCourseIE(InfoExtractor):
             'title': 'Laws of Nature',
             'description': 'Introduce yourself to the laws of nature with these free online college lectures from Yale, Harvard, and MIT.',
         },
-        'playlist_count': 4,
+        'playlist_count': 3,
     }
 
     def _real_extract(self, url):
diff --git a/youtube_dl/extractor/acast.py b/youtube_dl/extractor/acast.py
new file mode 100644 (file)
index 0000000..94ce88c
--- /dev/null
@@ -0,0 +1,82 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+import functools
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    int_or_none,
+    OnDemandPagedList,
+)
+
+
+class ACastIE(InfoExtractor):
+    IE_NAME = 'acast'
+    _VALID_URL = r'https?://(?:www\.)?acast\.com/(?P<channel>[^/]+)/(?P<id>[^/#?]+)'
+    _TEST = {
+        'url': 'https://www.acast.com/condenasttraveler/-where-are-you-taipei-101-taiwan',
+        'md5': 'ada3de5a1e3a2a381327d749854788bb',
+        'info_dict': {
+            'id': '57de3baa-4bb0-487e-9418-2692c1277a34',
+            'ext': 'mp3',
+            'title': '"Where Are You?": Taipei 101, Taiwan',
+            'timestamp': 1196172000000,
+            'description': 'md5:a0b4ef3634e63866b542e5b1199a1a0e',
+            'duration': 211,
+        }
+    }
+
+    def _real_extract(self, url):
+        channel, display_id = re.match(self._VALID_URL, url).groups()
+        cast_data = self._download_json(
+            'https://embed.acast.com/api/acasts/%s/%s' % (channel, display_id), display_id)
+        return {
+            'id': compat_str(cast_data['id']),
+            'display_id': display_id,
+            'url': cast_data['blings'][0]['audio'],
+            'title': cast_data['name'],
+            'description': cast_data.get('description'),
+            'thumbnail': cast_data.get('image'),
+            'timestamp': int_or_none(cast_data.get('publishingDate')),
+            'duration': int_or_none(cast_data.get('duration')),
+        }
+
+
+class ACastChannelIE(InfoExtractor):
+    IE_NAME = 'acast:channel'
+    _VALID_URL = r'https?://(?:www\.)?acast\.com/(?P<id>[^/#?]+)'
+    _TEST = {
+        'url': 'https://www.acast.com/condenasttraveler',
+        'info_dict': {
+            'id': '50544219-29bb-499e-a083-6087f4cb7797',
+            'title': 'Condé Nast Traveler Podcast',
+            'description': 'md5:98646dee22a5b386626ae31866638fbd',
+        },
+        'playlist_mincount': 20,
+    }
+    _API_BASE_URL = 'https://www.acast.com/api/'
+    _PAGE_SIZE = 10
+
+    @classmethod
+    def suitable(cls, url):
+        return False if ACastIE.suitable(url) else super(ACastChannelIE, cls).suitable(url)
+
+    def _fetch_page(self, channel_slug, page):
+        casts = self._download_json(
+            self._API_BASE_URL + 'channels/%s/acasts?page=%s' % (channel_slug, page),
+            channel_slug, note='Download page %d of channel data' % page)
+        for cast in casts:
+            yield self.url_result(
+                'https://www.acast.com/%s/%s' % (channel_slug, cast['url']),
+                'ACast', cast['id'])
+
+    def _real_extract(self, url):
+        channel_slug = self._match_id(url)
+        channel_data = self._download_json(
+            self._API_BASE_URL + 'channels/%s' % channel_slug, channel_slug)
+        entries = OnDemandPagedList(functools.partial(
+            self._fetch_page, channel_slug), self._PAGE_SIZE)
+        return self.playlist_result(entries, compat_str(
+            channel_data['id']), channel_data['name'], channel_data.get('description'))
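
ACastChannelIE wires a page-fetching callback into OnDemandPagedList, so channel pages are requested lazily as the playlist is consumed rather than all at once. A minimal stand-in showing the same idea (this generator-backed class is a simplification, not youtube-dl's actual OnDemandPagedList):

    import functools

    class TinyPagedList(object):
        # Calls pagefunc(page_num) lazily and flattens the results.
        def __init__(self, pagefunc, page_size, total_pages):
            self._pagefunc = pagefunc
            self._page_size = page_size
            self._total_pages = total_pages

        def __iter__(self):
            for page in range(self._total_pages):
                for entry in self._pagefunc(page):
                    yield entry

    def fetch_page(prefix, page):
        # Stand-in for a JSON API request returning one page of items.
        return ['%s-item-%d' % (prefix, page * 2 + i) for i in range(2)]

    entries = TinyPagedList(functools.partial(fetch_page, 'cast'), 2, 3)
    print(list(entries))  # pages are fetched only as iteration reaches them
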
index e3e6d21137994593d593fbc51313bf38032ce7f8..55a9322a753829e90715a76bc91e06828c460531 100644 (file)
@@ -6,7 +6,7 @@ from .common import InfoExtractor
 from ..compat import (
     compat_HTTPError,
     compat_str,
-    compat_urllib_parse,
+    compat_urllib_parse_urlencode,
     compat_urllib_parse_urlparse,
 )
 from ..utils import (
@@ -16,7 +16,7 @@ from ..utils import (
 
 
 class AddAnimeIE(InfoExtractor):
-    _VALID_URL = r'http://(?:\w+\.)?add-anime\.net/(?:watch_video\.php\?(?:.*?)v=|video/)(?P<id>[\w_]+)'
+    _VALID_URL = r'https?://(?:\w+\.)?add-anime\.net/(?:watch_video\.php\?(?:.*?)v=|video/)(?P<id>[\w_]+)'
     _TESTS = [{
         'url': 'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9',
         'md5': '72954ea10bc979ab5e2eb288b21425a0',
@@ -60,7 +60,7 @@ class AddAnimeIE(InfoExtractor):
             confirm_url = (
                 parsed_url.scheme + '://' + parsed_url.netloc +
                 action + '?' +
-                compat_urllib_parse.urlencode({
+                compat_urllib_parse_urlencode({
                     'jschl_vc': vc, 'jschl_answer': compat_str(av_val)}))
             self._download_webpage(
                 confirm_url, video_id,
index 5e43adc51f98c2f22e728c49150b84ae64f704e3..8753ee2cf2b5fdaa5810fc8d564f388734a84324 100644 (file)
@@ -1,23 +1,32 @@
 from __future__ import unicode_literals
 
+import re
+
 from .common import InfoExtractor
+from ..compat import compat_str
 from ..utils import (
     parse_duration,
     unified_strdate,
     str_to_int,
+    int_or_none,
     float_or_none,
     ISO639Utils,
+    determine_ext,
 )
 
 
-class AdobeTVIE(InfoExtractor):
-    _VALID_URL = r'https?://tv\.adobe\.com/watch/[^/]+/(?P<id>[^/]+)'
+class AdobeTVBaseIE(InfoExtractor):
+    _API_BASE_URL = 'http://tv.adobe.com/api/v4/'
+
+
+class AdobeTVIE(AdobeTVBaseIE):
+    _VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?watch/(?P<show_urlname>[^/]+)/(?P<id>[^/]+)'
 
     _TEST = {
         'url': 'http://tv.adobe.com/watch/the-complete-picture-with-julieanne-kost/quick-tip-how-to-draw-a-circle-around-an-object-in-photoshop/',
         'md5': '9bc5727bcdd55251f35ad311ca74fa1e',
         'info_dict': {
-            'id': 'quick-tip-how-to-draw-a-circle-around-an-object-in-photoshop',
+            'id': '10981',
             'ext': 'mp4',
             'title': 'Quick Tip - How to Draw a Circle Around an Object in Photoshop',
             'description': 'md5:99ec318dc909d7ba2a1f2b038f7d2311',
@@ -29,50 +38,106 @@ class AdobeTVIE(InfoExtractor):
     }
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        player = self._parse_json(
-            self._search_regex(r'html5player:\s*({.+?})\s*\n', webpage, 'player'),
-            video_id)
+        language, show_urlname, urlname = re.match(self._VALID_URL, url).groups()
+        if not language:
+            language = 'en'
 
-        title = player.get('title') or self._search_regex(
-            r'data-title="([^"]+)"', webpage, 'title')
-        description = self._og_search_description(webpage)
-        thumbnail = self._og_search_thumbnail(webpage)
-
-        upload_date = unified_strdate(
-            self._html_search_meta('datepublished', webpage, 'upload date'))
-
-        duration = parse_duration(
-            self._html_search_meta('duration', webpage, 'duration') or
-            self._search_regex(
-                r'Runtime:\s*(\d{2}:\d{2}:\d{2})',
-                webpage, 'duration', fatal=False))
-
-        view_count = str_to_int(self._search_regex(
-            r'<div class="views">\s*Views?:\s*([\d,.]+)\s*</div>',
-            webpage, 'view count'))
+        video_data = self._download_json(
+            self._API_BASE_URL + 'episode/get/?language=%s&show_urlname=%s&urlname=%s&disclosure=standard' % (language, show_urlname, urlname),
+            urlname)['data'][0]
 
         formats = [{
-            'url': source['src'],
-            'format_id': source.get('quality') or source['src'].split('-')[-1].split('.')[0] or None,
-            'tbr': source.get('bitrate'),
-        } for source in player['sources']]
+            'url': source['url'],
+            'format_id': source.get('quality_level') or source['url'].split('-')[-1].split('.')[0] or None,
+            'width': int_or_none(source.get('width')),
+            'height': int_or_none(source.get('height')),
+            'tbr': int_or_none(source.get('video_data_rate')),
+        } for source in video_data['videos']]
         self._sort_formats(formats)
 
         return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'upload_date': upload_date,
-            'duration': duration,
-            'view_count': view_count,
+            'id': compat_str(video_data['id']),
+            'title': video_data['title'],
+            'description': video_data.get('description'),
+            'thumbnail': video_data.get('thumbnail'),
+            'upload_date': unified_strdate(video_data.get('start_date')),
+            'duration': parse_duration(video_data.get('duration')),
+            'view_count': str_to_int(video_data.get('playcount')),
             'formats': formats,
         }
 
 
+class AdobeTVPlaylistBaseIE(AdobeTVBaseIE):
+    def _parse_page_data(self, page_data):
+        return [self.url_result(self._get_element_url(element_data)) for element_data in page_data]
+
+    def _extract_playlist_entries(self, url, display_id):
+        page = self._download_json(url, display_id)
+        entries = self._parse_page_data(page['data'])
+        for page_num in range(2, page['paging']['pages'] + 1):
+            entries.extend(self._parse_page_data(
+                self._download_json(url + '&page=%d' % page_num, display_id)['data']))
+        return entries
+
+
+class AdobeTVShowIE(AdobeTVPlaylistBaseIE):
+    _VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?show/(?P<id>[^/]+)'
+
+    _TEST = {
+        'url': 'http://tv.adobe.com/show/the-complete-picture-with-julieanne-kost',
+        'info_dict': {
+            'id': '36',
+            'title': 'The Complete Picture with Julieanne Kost',
+            'description': 'md5:fa50867102dcd1aa0ddf2ab039311b27',
+        },
+        'playlist_mincount': 136,
+    }
+
+    def _get_element_url(self, element_data):
+        return element_data['urls'][0]
+
+    def _real_extract(self, url):
+        language, show_urlname = re.match(self._VALID_URL, url).groups()
+        if not language:
+            language = 'en'
+        query = 'language=%s&show_urlname=%s' % (language, show_urlname)
+
+        show_data = self._download_json(self._API_BASE_URL + 'show/get/?%s' % query, show_urlname)['data'][0]
+
+        return self.playlist_result(
+            self._extract_playlist_entries(self._API_BASE_URL + 'episode/?%s' % query, show_urlname),
+            compat_str(show_data['id']),
+            show_data['show_name'],
+            show_data['show_description'])
+
+
+class AdobeTVChannelIE(AdobeTVPlaylistBaseIE):
+    _VALID_URL = r'https?://tv\.adobe\.com/(?:(?P<language>fr|de|es|jp)/)?channel/(?P<id>[^/]+)(?:/(?P<category_urlname>[^/]+))?'
+
+    _TEST = {
+        'url': 'http://tv.adobe.com/channel/development',
+        'info_dict': {
+            'id': 'development',
+        },
+        'playlist_mincount': 96,
+    }
+
+    def _get_element_url(self, element_data):
+        return element_data['url']
+
+    def _real_extract(self, url):
+        language, channel_urlname, category_urlname = re.match(self._VALID_URL, url).groups()
+        if not language:
+            language = 'en'
+        query = 'language=%s&channel_urlname=%s' % (language, channel_urlname)
+        if category_urlname:
+            query += '&category_urlname=%s' % category_urlname
+
+        return self.playlist_result(
+            self._extract_playlist_entries(self._API_BASE_URL + 'show/?%s' % query, channel_urlname),
+            channel_urlname)
+
+
 class AdobeTVVideoIE(InfoExtractor):
     _VALID_URL = r'https?://video\.tv\.adobe\.com/v/(?P<id>\d+)'
 
@@ -91,28 +156,25 @@ class AdobeTVVideoIE(InfoExtractor):
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        player_params = self._parse_json(self._search_regex(
-            r'var\s+bridge\s*=\s*([^;]+);', webpage, 'player parameters'),
-            video_id)
+        video_data = self._download_json(url + '?format=json', video_id)
 
         formats = [{
+            'format_id': '%s-%s' % (determine_ext(source['src']), source.get('height')),
             'url': source['src'],
-            'width': source.get('width'),
-            'height': source.get('height'),
-            'tbr': source.get('bitrate'),
-        } for source in player_params['sources']]
+            'width': int_or_none(source.get('width')),
+            'height': int_or_none(source.get('height')),
+            'tbr': int_or_none(source.get('bitrate')),
+        } for source in video_data['sources']]
+        self._sort_formats(formats)
 
         # For both metadata and downloaded files the duration varies among
         # formats. I just pick the max one
         duration = max(filter(None, [
             float_or_none(source.get('duration'), scale=1000)
-            for source in player_params['sources']]))
+            for source in video_data['sources']]))
 
         subtitles = {}
-        for translation in player_params.get('translations', []):
+        for translation in video_data.get('translations', []):
             lang_id = translation.get('language_w3c') or ISO639Utils.long2short(translation['language_medium'])
             if lang_id not in subtitles:
                 subtitles[lang_id] = []
@@ -124,8 +186,9 @@ class AdobeTVVideoIE(InfoExtractor):
         return {
             'id': video_id,
             'formats': formats,
-            'title': player_params['title'],
-            'description': self._og_search_description(webpage),
+            'title': video_data['title'],
+            'description': video_data.get('description'),
+            'thumbnail': video_data['video'].get('poster'),
             'duration': duration,
             'subtitles': subtitles,
         }
index 39335b8272295dbf2b640881cd29a6f5b99acaba..8157da2cb63af8a7079fda8c388be3108281a7ad 100644 (file)
@@ -5,6 +5,7 @@ import re
 
 from .common import InfoExtractor
 from ..utils import (
+    determine_ext,
     ExtractorError,
     float_or_none,
     xpath_text,
@@ -40,7 +41,8 @@ class AdultSwimIE(InfoExtractor):
             'id': 'rQxZvXQ4ROaSOqq-or2Mow',
             'title': 'Rick and Morty - Pilot',
             'description': "Rick moves in with his daughter's family and establishes himself as a bad influence on his grandson, Morty. "
-        }
+        },
+        'skip': 'This video is only available for registered users',
     }, {
         'url': 'http://www.adultswim.com/videos/playlists/american-parenting/putting-francine-out-of-business/',
         'playlist': [
@@ -66,7 +68,7 @@ class AdultSwimIE(InfoExtractor):
                 'md5': '3e346a2ab0087d687a05e1e7f3b3e529',
                 'info_dict': {
                     'id': 'sY3cMUR_TbuE4YmdjzbIcQ-0',
-                    'ext': 'flv',
+                    'ext': 'mp4',
                     'title': 'Tim and Eric Awesome Show Great Job! - Dr. Steve Brule, For Your Wine',
                     'description': 'Dr. Brule reports live from Wine Country with a special report on wines.  \r\nWatch Tim and Eric Awesome Show Great Job! episode #20, "Embarrassed" on Adult Swim.\r\n\r\n',
                 },
@@ -77,6 +79,10 @@ class AdultSwimIE(InfoExtractor):
             'title': 'Tim and Eric Awesome Show Great Job! - Dr. Steve Brule, For Your Wine',
             'description': 'Dr. Brule reports live from Wine Country with a special report on wines.  \r\nWatch Tim and Eric Awesome Show Great Job! episode #20, "Embarrassed" on Adult Swim.\r\n\r\n',
         },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
     }]
 
     @staticmethod
@@ -123,7 +129,6 @@ class AdultSwimIE(InfoExtractor):
         else:
             collections = bootstrapped_data['show']['collections']
             collection, video_info = self.find_collection_containing_video(collections, episode_path)
-
             # Video wasn't found in the collections, let's try `slugged_video`.
             if video_info is None:
                 if bootstrapped_data.get('slugged_video', {}).get('slug') == episode_path:
@@ -133,7 +138,15 @@ class AdultSwimIE(InfoExtractor):
 
             show = bootstrapped_data['show']
             show_title = show['title']
-            segment_ids = [clip['videoPlaybackID'] for clip in video_info['clips']]
+            stream = video_info.get('stream')
+            clips = [stream] if stream else video_info.get('clips')
+            if not clips:
+                raise ExtractorError(
+                    'This video is only available via cable service provider subscription that'
+                    ' is not currently supported. You may want to use --cookies.'
+                    if video_info.get('auth') is True else 'Unable to find stream or clips',
+                    expected=True)
+            segment_ids = [clip['videoPlaybackID'] for clip in clips]
 
         episode_id = video_info['id']
         episode_title = video_info['title']
@@ -142,7 +155,7 @@ class AdultSwimIE(InfoExtractor):
 
         entries = []
         for part_num, segment_id in enumerate(segment_ids):
-            segment_url = 'http://www.adultswim.com/videos/api/v0/assets?id=%s&platform=mobile' % segment_id
+            segment_url = 'http://www.adultswim.com/videos/api/v0/assets?id=%s&platform=desktop' % segment_id
 
             segment_title = '%s - %s' % (show_title, episode_title)
             if len(segment_ids) > 1:
@@ -156,19 +169,33 @@ class AdultSwimIE(InfoExtractor):
                 xpath_text(idoc, './/trt', 'segment duration').strip())
 
             formats = []
-            file_els = idoc.findall('.//files/file')
+            file_els = idoc.findall('.//files/file') or idoc.findall('./files/file')
 
+            unique_urls = []
+            unique_file_els = []
             for file_el in file_els:
+                media_url = file_el.text
+                if not media_url or determine_ext(media_url) == 'f4m':
+                    continue
+                if file_el.text not in unique_urls:
+                    unique_urls.append(file_el.text)
+                    unique_file_els.append(file_el)
+
+            for file_el in unique_file_els:
                 bitrate = file_el.attrib.get('bitrate')
                 ftype = file_el.attrib.get('type')
-
-                formats.append({
-                    'format_id': '%s_%s' % (bitrate, ftype),
-                    'url': file_el.text.strip(),
-                    # The bitrate may not be a number (for example: 'iphone')
-                    'tbr': int(bitrate) if bitrate.isdigit() else None,
-                    'quality': 1 if ftype == 'hd' else -1
-                })
+                media_url = file_el.text
+                if determine_ext(media_url) == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        media_url, segment_title, 'mp4', preference=0,
+                        m3u8_id='hls', fatal=False))
+                else:
+                    formats.append({
+                        'format_id': '%s_%s' % (bitrate, ftype),
+                        'url': file_el.text.strip(),
+                        # The bitrate may not be a number (for example: 'iphone')
+                        'tbr': int(bitrate) if bitrate.isdigit() else None,
+                    })
 
             self._sort_formats(formats)
 
diff --git a/youtube_dl/extractor/aenetworks.py b/youtube_dl/extractor/aenetworks.py
new file mode 100644 (file)
index 0000000..1bbfe26
--- /dev/null
@@ -0,0 +1,87 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    smuggle_url,
+    update_url_query,
+    unescapeHTML,
+)
+
+
+class AENetworksIE(InfoExtractor):
+    IE_NAME = 'aenetworks'
+    IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
+    _VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?P<type>[^/]+)/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
+
+    _TESTS = [{
+        'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',
+        'info_dict': {
+            'id': 'g12m5Gyt3fdR',
+            'ext': 'mp4',
+            'title': "Bet You Didn't Know: Valentine's Day",
+            'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
+            'timestamp': 1375819729,
+            'upload_date': '20130806',
+            'uploader': 'AENE-NEW',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'add_ie': ['ThePlatform'],
+        'expected_warnings': ['JSON-LD'],
+    }, {
+        'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
+        'md5': '8ff93eb073449f151d6b90c0ae1ef0c7',
+        'info_dict': {
+            'id': 'eg47EERs_JsZ',
+            'ext': 'mp4',
+            'title': 'Winter Is Coming',
+            'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
+            'timestamp': 1338306241,
+            'upload_date': '20120529',
+            'uploader': 'AENE-NEW',
+        },
+        'add_ie': ['ThePlatform'],
+    }, {
+        'url': 'http://www.aetv.com/shows/duck-dynasty/video/inlawful-entry',
+        'only_matching': True
+    }, {
+        'url': 'http://www.fyi.tv/shows/tiny-house-nation/videos/207-sq-ft-minnesota-prairie-cottage',
+        'only_matching': True
+    }, {
+        'url': 'http://www.mylifetime.com/shows/project-runway-junior/video/season-1/episode-6/superstar-clients',
+        'only_matching': True
+    }]
+
+    def _real_extract(self, url):
+        page_type, video_id = re.match(self._VALID_URL, url).groups()
+
+        webpage = self._download_webpage(url, video_id)
+
+        video_url_re = [
+            r'data-href="[^"]*/%s"[^>]+data-release-url="([^"]+)"' % video_id,
+            r"media_url\s*=\s*'([^']+)'"
+        ]
+        video_url = unescapeHTML(self._search_regex(video_url_re, webpage, 'video url'))
+        query = {'mbr': 'true'}
+        if page_type == 'shows':
+            query['assetTypes'] = 'medium_video_s3'
+        if 'switch=hds' in video_url:
+            query['switch'] = 'hls'
+
+        info = self._search_json_ld(webpage, video_id, fatal=False)
+        info.update({
+            '_type': 'url_transparent',
+            'url': smuggle_url(
+                update_url_query(video_url, query),
+                {
+                    'sig': {
+                        'key': 'crazyjava',
+                        'secret': 's3cr3t'},
+                    'force_smil_url': True
+                }),
+        })
+        return info
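The extractor above composes the ThePlatform URL in two steps: update_url_query merges the extra parameters (mbr, assetTypes, switch) into whatever query string the page already supplied, and smuggle_url then piggybacks the signing data onto the URL for the ThePlatform extractor to unpack. A rough stdlib approximation of the first helper; the real one lives in youtube_dl.utils and handles more edge cases:

    try:
        from urllib.parse import urlparse, urlunparse, parse_qs, urlencode  # Python 3
    except ImportError:  # Python 2
        from urlparse import urlparse, urlunparse, parse_qs
        from urllib import urlencode

    def update_url_query(url, query):
        parts = urlparse(url)
        qs = parse_qs(parts.query)
        qs.update((k, [v]) for k, v in query.items())  # new values win
        return urlunparse(parts._replace(query=urlencode(qs, doseq=True)))

    print(update_url_query(
        'http://link.theplatform.com/s/some/path?switch=hds',
        {'mbr': 'true', 'switch': 'hls'}))
    # e.g. http://link.theplatform.com/s/some/path?switch=hls&mbr=true
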
diff --git a/youtube_dl/extractor/aftenposten.py b/youtube_dl/extractor/aftenposten.py
deleted file mode 100644 (file)
index 0c00acf..0000000
--- a/youtube_dl/extractor/aftenposten.py
+++ /dev/null
@@ -1,23 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-
-
-class AftenpostenIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?aftenposten\.no/webtv/(?:#!/)?video/(?P<id>\d+)'
-    _TEST = {
-        'url': 'http://www.aftenposten.no/webtv/#!/video/21039/trailer-sweatshop-i-can-t-take-any-more',
-        'md5': 'fd828cd29774a729bf4d4425fe192972',
-        'info_dict': {
-            'id': '21039',
-            'ext': 'mov',
-            'title': 'TRAILER: "Sweatshop" - I can´t take any more',
-            'description': 'md5:21891f2b0dd7ec2f78d84a50e54f8238',
-            'timestamp': 1416927969,
-            'upload_date': '20141125',
-        }
-    }
-
-    def _real_extract(self, url):
-        return self.url_result('xstream:ap:%s' % self._match_id(url), 'Xstream')
diff --git a/youtube_dl/extractor/aftonbladet.py b/youtube_dl/extractor/aftonbladet.py
index e0518cf261fbffc4dd23bc4a3800d04eae324139..d548592fe8acbbf2db432db3ed699b80b78e0aa0 100644 (file)
--- a/youtube_dl/extractor/aftonbladet.py
+++ b/youtube_dl/extractor/aftonbladet.py
@@ -6,7 +6,7 @@ from ..utils import int_or_none
 
 
 class AftonbladetIE(InfoExtractor):
-    _VALID_URL = r'http://tv\.aftonbladet\.se/abtv/articles/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://tv\.aftonbladet\.se/abtv/articles/(?P<id>[0-9]+)'
     _TEST = {
         'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
         'info_dict': {
diff --git a/youtube_dl/extractor/airmozilla.py b/youtube_dl/extractor/airmozilla.py
index 611ad1e9d4a2c5892621802007b26c17e5802673..f8e70f4e580746093d97e3d2d596d008ed3e6c15 100644 (file)
--- a/youtube_dl/extractor/airmozilla.py
+++ b/youtube_dl/extractor/airmozilla.py
@@ -20,14 +20,14 @@ class AirMozillaIE(InfoExtractor):
             'id': '6x4q2w',
             'ext': 'mp4',
             'title': 'Privacy Lab - a meetup for privacy minded people in San Francisco',
-            'thumbnail': 're:https://\w+\.cloudfront\.net/6x4q2w/poster\.jpg\?t=\d+',
+            'thumbnail': 're:https?://vid\.ly/(?P<id>[0-9a-z-]+)/poster',
             'description': 'Brings together privacy professionals and others interested in privacy at for-profits, non-profits, and NGOs in an effort to contribute to the state of the ecosystem...',
             'timestamp': 1422487800,
             'upload_date': '20150128',
             'location': 'SFO Commons',
             'duration': 3780,
             'view_count': int,
-            'categories': ['Main'],
+            'categories': ['Main', 'Privacy'],
         }
     }
 
diff --git a/youtube_dl/extractor/aljazeera.py b/youtube_dl/extractor/aljazeera.py
index 612708e25730c7407d4bf6e76d8b674e39d62bad..b081695d8400c0e24d36e84bd8445efa084ed8b3 100644 (file)
--- a/youtube_dl/extractor/aljazeera.py
+++ b/youtube_dl/extractor/aljazeera.py
@@ -4,7 +4,7 @@ from .common import InfoExtractor
 
 
 class AlJazeeraIE(InfoExtractor):
-    _VALID_URL = r'http://www\.aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
+    _VALID_URL = r'https?://www\.aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
 
     _TEST = {
         'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html',
@@ -13,23 +13,18 @@ class AlJazeeraIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'The Slum - Episode 1: Deliverance',
             'description': 'As a birth attendant advocating for family planning, Remy is on the frontline of Tondo\'s battle with overcrowding.',
-            'uploader': 'Al Jazeera English',
+            'uploader_id': '665003303001',
+            'timestamp': 1411116829,
+            'upload_date': '20140919',
         },
-        'add_ie': ['Brightcove'],
+        'add_ie': ['BrightcoveNew'],
+        'skip': 'Not accessible from Travis CI server',
     }
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/665003303001/default_default/index.html?videoId=%s'
 
     def _real_extract(self, url):
         program_name = self._match_id(url)
         webpage = self._download_webpage(url, program_name)
         brightcove_id = self._search_regex(
             r'RenderPagesVideo\(\'(.+?)\'', webpage, 'brightcove id')
-
-        return {
-            '_type': 'url',
-            'url': (
-                'brightcove:'
-                'playerKey=AQ~~%2CAAAAmtVJIFk~%2CTVGOQ5ZTwJbeMWnq5d_H4MOM57xfzApc'
-                '&%40videoPlayer={0}'.format(brightcove_id)
-            ),
-            'ie_key': 'Brightcove',
-        }
+        return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
diff --git a/youtube_dl/extractor/allocine.py b/youtube_dl/extractor/allocine.py
index 7d65b81931fb2d9b3acb5dd4ab2961a2aec52bea..190bc2cc8730853a23b9025f1849bf234a32e001 100644 (file)
--- a/youtube_dl/extractor/allocine.py
+++ b/youtube_dl/extractor/allocine.py
@@ -8,6 +8,8 @@ from .common import InfoExtractor
 from ..compat import compat_str
 from ..utils import (
     qualities,
+    unescapeHTML,
+    xpath_element,
 )
 
 
@@ -31,7 +33,7 @@ class AllocineIE(InfoExtractor):
             'id': '19540403',
             'ext': 'mp4',
             'title': 'Planes 2 Bande-annonce VF',
-            'description': 'md5:eeaffe7c2d634525e21159b93acf3b1e',
+            'description': 'Regardez la bande annonce du film Planes 2 (Planes 2 Bande-annonce VF). Planes 2, un film de Roberts Gannaway',
             'thumbnail': 're:http://.*\.jpg',
         },
     }, {
@@ -41,7 +43,7 @@ class AllocineIE(InfoExtractor):
             'id': '19544709',
             'ext': 'mp4',
             'title': 'Dragons 2 - Bande annonce finale VF',
-            'description': 'md5:71742e3a74b0d692c7fce0dd2017a4ac',
+            'description': 'md5:601d15393ac40f249648ef000720e7e3',
             'thumbnail': 're:http://.*\.jpg',
         },
     }, {
@@ -59,14 +61,18 @@ class AllocineIE(InfoExtractor):
         if typ == 'film':
             video_id = self._search_regex(r'href="/video/player_gen_cmedia=([0-9]+).+"', webpage, 'video id')
         else:
-            player = self._search_regex(r'data-player=\'([^\']+)\'>', webpage, 'data player')
-
-            player_data = json.loads(player)
-            video_id = compat_str(player_data['refMedia'])
+            player = self._search_regex(r'data-player=\'([^\']+)\'>', webpage, 'data player', default=None)
+            if player:
+                player_data = json.loads(player)
+                video_id = compat_str(player_data['refMedia'])
+            else:
+                model = self._search_regex(r'data-model="([^"]+)">', webpage, 'data model')
+                model_data = self._parse_json(unescapeHTML(model), display_id)
+                video_id = compat_str(model_data['id'])
 
         xml = self._download_xml('http://www.allocine.fr/ws/AcVisiondataV4.ashx?media=%s' % video_id, display_id)
 
-        video = xml.find('.//AcVisionVideo').attrib
+        video = xpath_element(xml, './/AcVisionVideo').attrib
         quality = qualities(['ld', 'md', 'hd'])
 
         formats = []
diff --git a/youtube_dl/extractor/amp.py b/youtube_dl/extractor/amp.py
new file mode 100644 (file)
index 0000000..138fa08
--- /dev/null
+++ b/youtube_dl/extractor/amp.py
@@ -0,0 +1,83 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_iso8601,
+)
+
+
+class AMPIE(InfoExtractor):
+    # parse Akamai Adaptive Media Player feed
+    def _extract_feed_info(self, url):
+        item = self._download_json(
+            url, None, 'Downloading Akamai AMP feed',
+            'Unable to download Akamai AMP feed')['channel']['item']
+
+        video_id = item['guid']
+
+        def get_media_node(name, default=None):
+            media_name = 'media-%s' % name
+            media_group = item.get('media-group') or item
+            return media_group.get(media_name) or item.get(media_name) or item.get(name, default)
+
+        thumbnails = []
+        media_thumbnail = get_media_node('thumbnail')
+        if media_thumbnail:
+            if isinstance(media_thumbnail, dict):
+                media_thumbnail = [media_thumbnail]
+            for thumbnail_data in media_thumbnail:
+                thumbnail = thumbnail_data['@attributes']
+                thumbnails.append({
+                    'url': self._proto_relative_url(thumbnail['url'], 'http:'),
+                    'width': int_or_none(thumbnail.get('width')),
+                    'height': int_or_none(thumbnail.get('height')),
+                })
+
+        subtitles = {}
+        media_subtitle = get_media_node('subTitle')
+        if media_subtitle:
+            if isinstance(media_subtitle, dict):
+                media_subtitle = [media_subtitle]
+            for subtitle_data in media_subtitle:
+                subtitle = subtitle_data['@attributes']
+                lang = subtitle.get('lang') or 'en'
+                subtitles[lang] = [{'url': subtitle['href']}]
+
+        formats = []
+        media_content = get_media_node('content')
+        if isinstance(media_content, dict):
+            media_content = [media_content]
+        for media_data in media_content:
+            media = media_data['@attributes']
+            media_type = media['type']
+            if media_type == 'video/f4m':
+                formats.extend(self._extract_f4m_formats(
+                    media['url'] + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124',
+                    video_id, f4m_id='hds', fatal=False))
+            elif media_type == 'application/x-mpegURL':
+                formats.extend(self._extract_m3u8_formats(
+                    media['url'], video_id, 'mp4', m3u8_id='hls', fatal=False))
+            else:
+                formats.append({
+                    'format_id': media_data['media-category']['@attributes']['label'],
+                    'url': media['url'],
+                    'tbr': int_or_none(media.get('bitrate')),
+                    'filesize': int_or_none(media.get('fileSize')),
+                })
+
+        self._sort_formats(formats)
+
+        timestamp = parse_iso8601(item.get('pubDate'), ' ') or parse_iso8601(item.get('dc-date'))
+
+        return {
+            'id': video_id,
+            'title': get_media_node('title'),
+            'description': get_media_node('description'),
+            'thumbnails': thumbnails,
+            'timestamp': timestamp,
+            'duration': int_or_none(media_content[0].get('@attributes', {}).get('duration')),
+            'subtitles': subtitles,
+            'formats': formats,
+        }
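get_media_node above can return either a single dict (one feed item) or a list of dicts, so the thumbnail, subtitle and content loops each re-apply the same normalization before iterating. The pattern in isolation; as_list is a hypothetical helper name:

    def as_list(node):
        # a lone dict becomes a one-element list; None becomes an empty list
        if node is None:
            return []
        return [node] if isinstance(node, dict) else list(node)

    media_thumbnail = {'@attributes': {'url': '//img.example.com/poster.jpg'}}
    for thumbnail_data in as_list(media_thumbnail):
        print(thumbnail_data['@attributes']['url'])
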
diff --git a/youtube_dl/extractor/animeondemand.py b/youtube_dl/extractor/animeondemand.py
new file mode 100644 (file)
index 0000000..9b01e38
--- /dev/null
+++ b/youtube_dl/extractor/animeondemand.py
@@ -0,0 +1,242 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_urlparse,
+    compat_str,
+)
+from ..utils import (
+    determine_ext,
+    extract_attributes,
+    ExtractorError,
+    sanitized_Request,
+    urlencode_postdata,
+)
+
+
+class AnimeOnDemandIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?anime-on-demand\.de/anime/(?P<id>\d+)'
+    _LOGIN_URL = 'https://www.anime-on-demand.de/users/sign_in'
+    _APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply'
+    _NETRC_MACHINE = 'animeondemand'
+    _TESTS = [{
+        'url': 'https://www.anime-on-demand.de/anime/161',
+        'info_dict': {
+            'id': '161',
+            'title': 'Grimgar, Ashes and Illusions (OmU)',
+            'description': 'md5:6681ce3c07c7189d255ac6ab23812d31',
+        },
+        'playlist_mincount': 4,
+    }, {
+        # Film wording is used instead of Episode
+        'url': 'https://www.anime-on-demand.de/anime/39',
+        'only_matching': True,
+    }, {
+        # Episodes without titles
+        'url': 'https://www.anime-on-demand.de/anime/162',
+        'only_matching': True,
+    }, {
+        # ger/jap, Dub/OmU, account required
+        'url': 'https://www.anime-on-demand.de/anime/169',
+        'only_matching': True,
+    }]
+
+    def _login(self):
+        (username, password) = self._get_login_info()
+        if username is None:
+            return
+
+        login_page = self._download_webpage(
+            self._LOGIN_URL, None, 'Downloading login page')
+
+        if '>Our licensing terms allow the distribution of animes only to German-speaking countries of Europe' in login_page:
+            self.raise_geo_restricted(
+                '%s is only available in German-speaking countries of Europe' % self.IE_NAME)
+
+        login_form = self._form_hidden_inputs('new_user', login_page)
+
+        login_form.update({
+            'user[login]': username,
+            'user[password]': password,
+        })
+
+        post_url = self._search_regex(
+            r'<form[^>]+action=(["\'])(?P<url>.+?)\1', login_page,
+            'post url', default=self._LOGIN_URL, group='url')
+
+        if not post_url.startswith('http'):
+            post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
+
+        request = sanitized_Request(
+            post_url, urlencode_postdata(login_form))
+        request.add_header('Referer', self._LOGIN_URL)
+
+        response = self._download_webpage(
+            request, None, 'Logging in as %s' % username)
+
+        if all(p not in response for p in ('>Logout<', 'href="/users/sign_out"')):
+            error = self._search_regex(
+                r'<p class="alert alert-danger">(.+?)</p>',
+                response, 'error', default=None)
+            if error:
+                raise ExtractorError('Unable to login: %s' % error, expected=True)
+            raise ExtractorError('Unable to log in')
+
+    def _real_initialize(self):
+        self._login()
+
+    def _real_extract(self, url):
+        anime_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, anime_id)
+
+        if 'data-playlist=' not in webpage:
+            self._download_webpage(
+                self._APPLY_HTML5_URL, anime_id,
+                'Activating HTML5 beta', 'Unable to apply HTML5 beta')
+            webpage = self._download_webpage(url, anime_id)
+
+        csrf_token = self._html_search_meta(
+            'csrf-token', webpage, 'csrf token', fatal=True)
+
+        anime_title = self._html_search_regex(
+            r'(?s)<h1[^>]+itemprop="name"[^>]*>(.+?)</h1>',
+            webpage, 'anime name')
+        anime_description = self._html_search_regex(
+            r'(?s)<div[^>]+itemprop="description"[^>]*>(.+?)</div>',
+            webpage, 'anime description', default=None)
+
+        entries = []
+
+        for num, episode_html in enumerate(re.findall(
+                r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', webpage), 1):
+            episodebox_title = self._search_regex(
+                (r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
+                 r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
+                episode_html, 'episodebox title', default=None, group='title')
+            if not episodebox_title:
+                continue
+
+            episode_number = int(self._search_regex(
+                r'(?:Episode|Film)\s*(\d+)',
+                episodebox_title, 'episode number', default=num))
+            episode_title = self._search_regex(
+                r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
+                episodebox_title, 'episode title', default=None)
+
+            video_id = 'episode-%d' % episode_number
+
+            common_info = {
+                'id': video_id,
+                'series': anime_title,
+                'episode': episode_title,
+                'episode_number': episode_number,
+            }
+
+            formats = []
+
+            for input_ in re.findall(
+                    r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', episode_html):
+                attributes = extract_attributes(input_)
+                playlist_urls = []
+                for playlist_key in ('data-playlist', 'data-otherplaylist'):
+                    playlist_url = attributes.get(playlist_key)
+                    if isinstance(playlist_url, compat_str) and re.match(
+                            r'/?[\da-zA-Z]+', playlist_url):
+                        playlist_urls.append(attributes[playlist_key])
+                if not playlist_urls:
+                    continue
+
+                lang = attributes.get('data-lang')
+                lang_note = attributes.get('value')
+
+                for playlist_url in playlist_urls:
+                    kind = self._search_regex(
+                        r'videomaterialurl/\d+/([^/]+)/',
+                        playlist_url, 'media kind', default=None)
+                    format_id_list = []
+                    if lang:
+                        format_id_list.append(lang)
+                    if kind:
+                        format_id_list.append(kind)
+                    if not format_id_list:
+                        format_id_list.append(compat_str(num))
+                    format_id = '-'.join(format_id_list)
+                    format_note = ', '.join(filter(None, (kind, lang_note)))
+                    request = sanitized_Request(
+                        compat_urlparse.urljoin(url, playlist_url),
+                        headers={
+                            'X-Requested-With': 'XMLHttpRequest',
+                            'X-CSRF-Token': csrf_token,
+                            'Referer': url,
+                            'Accept': 'application/json, text/javascript, */*; q=0.01',
+                        })
+                    playlist = self._download_json(
+                        request, video_id, 'Downloading %s playlist JSON' % format_id,
+                        fatal=False)
+                    if not playlist:
+                        continue
+                    start_video = playlist.get('startvideo', 0)
+                    playlist = playlist.get('playlist')
+                    if not playlist or not isinstance(playlist, list):
+                        continue
+                    playlist = playlist[start_video]
+                    title = playlist.get('title')
+                    if not title:
+                        continue
+                    description = playlist.get('description')
+                    for source in playlist.get('sources', []):
+                        file_ = source.get('file')
+                        if not file_:
+                            continue
+                        ext = determine_ext(file_)
+                        format_id_list = [lang, kind]
+                        if ext == 'm3u8':
+                            format_id_list.append('hls')
+                        elif source.get('type') == 'video/dash' or ext == 'mpd':
+                            format_id_list.append('dash')
+                        format_id = '-'.join(filter(None, format_id_list))
+                        if ext == 'm3u8':
+                            file_formats = self._extract_m3u8_formats(
+                                file_, video_id, 'mp4',
+                                entry_protocol='m3u8_native', m3u8_id=format_id, fatal=False)
+                        elif source.get('type') == 'video/dash' or ext == 'mpd':
+                            continue  # skip DASH: the MPD extraction below is currently left unreachable
+                            file_formats = self._extract_mpd_formats(
+                                file_, video_id, mpd_id=format_id, fatal=False)
+                        else:
+                            continue
+                        for f in file_formats:
+                            f.update({
+                                'language': lang,
+                                'format_note': format_note,
+                            })
+                        formats.extend(file_formats)
+
+            if formats:
+                self._sort_formats(formats)
+                f = common_info.copy()
+                f.update({
+                    'title': title,
+                    'description': description,
+                    'formats': formats,
+                })
+                entries.append(f)
+
+            # Extract teaser only when full episode is not available
+            if not formats:
+                m = re.search(
+                    r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>Teaser<',
+                    episode_html)
+                if m:
+                    f = common_info.copy()
+                    f.update({
+                        'id': '%s-teaser' % f['id'],
+                        'title': m.group('title'),
+                        'url': compat_urlparse.urljoin(url, m.group('href')),
+                    })
+                    entries.append(f)
+
+        return self.playlist_result(entries, anime_id, anime_title, anime_description)
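Format IDs in the extractor above are pieced together from whichever of language, media kind and protocol happen to be known, with empty parts filtered out before joining. The same join reduced to a sketch; build_format_id is a hypothetical name:

    def build_format_id(*parts):
        # drop None/empty parts, join the rest with '-'
        return '-'.join(filter(None, parts))

    print(build_format_id('ger', 'Dub', 'hls'))  # ger-Dub-hls
    print(build_format_id(None, 'OmU'))          # OmU
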
diff --git a/youtube_dl/extractor/anitube.py b/youtube_dl/extractor/anitube.py
index 31f0d417ce420de69f9e06ac0498242dc913bf73..2fd912da452dce26fb2e4ed2f939a22fa431f3dc 100644 (file)
--- a/youtube_dl/extractor/anitube.py
+++ b/youtube_dl/extractor/anitube.py
@@ -1,11 +1,9 @@
 from __future__ import unicode_literals
 
-import re
+from .nuevo import NuevoBaseIE
 
-from .common import InfoExtractor
 
-
-class AnitubeIE(InfoExtractor):
+class AnitubeIE(NuevoBaseIE):
     IE_NAME = 'anitube.se'
     _VALID_URL = r'https?://(?:www\.)?anitube\.se/video/(?P<id>\d+)'
 
@@ -22,38 +20,11 @@ class AnitubeIE(InfoExtractor):
     }
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
 
         webpage = self._download_webpage(url, video_id)
-        key = self._html_search_regex(
-            r'http://www\.anitube\.se/embed/([A-Za-z0-9_-]*)', webpage, 'key')
-
-        config_xml = self._download_xml(
-            'http://www.anitube.se/nuevo/econfig.php?key=%s' % key, key)
-
-        video_title = config_xml.find('title').text
-        thumbnail = config_xml.find('image').text
-        duration = float(config_xml.find('duration').text)
-
-        formats = []
-        video_url = config_xml.find('file')
-        if video_url is not None:
-            formats.append({
-                'format_id': 'sd',
-                'url': video_url.text,
-            })
-        video_url = config_xml.find('filehd')
-        if video_url is not None:
-            formats.append({
-                'format_id': 'hd',
-                'url': video_url.text,
-            })
+        key = self._search_regex(
+            r'src=["\']https?://[^/]+/embed/([A-Za-z0-9_-]+)', webpage, 'key')
 
-        return {
-            'id': video_id,
-            'title': video_title,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'formats': formats
-        }
+        return self._extract_nuevo(
+            'http://www.anitube.se/nuevo/econfig.php?key=%s' % key, video_id)
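The rewrite above delegates the econfig XML handling to the shared Nuevo base class, so all that is left here is the embed-key lookup. The regex on its own; the sample HTML is made up:

    import re

    html = '<iframe src="http://www.anitube.se/embed/abc123xy"></iframe>'
    key = re.search(
        r'src=["\']https?://[^/]+/embed/([A-Za-z0-9_-]+)', html).group(1)
    print(key)  # abc123xy
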
diff --git a/youtube_dl/extractor/aol.py b/youtube_dl/extractor/aol.py
index b51eafc45928f8e6ff4ce571763593f71b715583..24df8fe9305e7df0487965ed03756305feca3dea 100644 (file)
--- a/youtube_dl/extractor/aol.py
+++ b/youtube_dl/extractor/aol.py
+# coding: utf-8
 from __future__ import unicode_literals
 
 import re
 
 from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+)
 
 
 class AolIE(InfoExtractor):
     IE_NAME = 'on.aol.com'
-    _VALID_URL = r'''(?x)
-        (?:
-            aol-video:|
-            http://on\.aol\.com/
-            (?:
-                video/.*-|
-                playlist/(?P<playlist_display_id>[^/?#]+?)-(?P<playlist_id>[0-9]+)[?#].*_videoid=
-            )
-        )
-        (?P<id>[0-9]+)
-        (?:$|\?)
-    '''
+    _VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/.*-)(?P<id>[^/?-]+)'
 
     _TESTS = [{
+        # video with 5min ID
         'url': 'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793?icid=OnHomepageC2Wide_MustSee_Img',
         'md5': '18ef68f48740e86ae94b98da815eec42',
         'info_dict': {
             'id': '518167793',
             'ext': 'mp4',
             'title': 'U.S. Official Warns Of \'Largest Ever\' IRS Phone Scam',
+            'description': 'A major phone scam has cost thousands of taxpayers more than $1 million, with less than a month until income tax returns are due to the IRS.',
+            'timestamp': 1395405060,
+            'upload_date': '20140321',
+            'uploader': 'Newsy Studio',
         },
-        'add_ie': ['FiveMin'],
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
     }, {
-        'url': 'http://on.aol.com/playlist/brace-yourself---todays-weirdest-news-152147?icid=OnHomepageC4_Omg_Img#_videoid=518184316',
+        # video with vidible ID
+        'url': 'http://on.aol.com/video/netflix-is-raising-rates-5707d6b8e4b090497b04f706?context=PC:homepage:PL1944:1460189336183',
         'info_dict': {
-            'id': '152147',
-            'title': 'Brace Yourself - Today\'s Weirdest News',
+            'id': '5707d6b8e4b090497b04f706',
+            'ext': 'mp4',
+            'title': 'Netflix is Raising Rates',
+            'description': 'Netflix is rewarding millions of it’s long-standing members with an increase in cost. Veuer’s Carly Figueroa has more.',
+            'upload_date': '20160408',
+            'timestamp': 1460123280,
+            'uploader': 'Veuer',
         },
-        'playlist_mincount': 10,
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }, {
+        'url': 'http://on.aol.com/partners/abc-551438d309eab105804dbfe8/sneak-peek-was-haley-really-framed-570eaebee4b0448640a5c944',
+        'only_matching': True,
+    }, {
+        'url': 'http://on.aol.com/shows/park-bench-shw518173474-559a1b9be4b0c3bfad3357a7?context=SH:SHW518173474:PL4327:1460619712763',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        playlist_id = mobj.group('playlist_id')
-        if not playlist_id or self._downloader.params.get('noplaylist'):
-            return self.url_result('5min:%s' % video_id)
+        video_id = self._match_id(url)
 
-        self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id))
+        response = self._download_json(
+            'https://feedapi.b2c.on.aol.com/v1.0/app/videos/aolon/%s/details' % video_id,
+            video_id)['response']
+        if response['statusText'] != 'Ok':
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, response['statusText']), expected=True)
 
-        webpage = self._download_webpage(url, playlist_id)
-        title = self._html_search_regex(
-            r'<h1 class="video-title[^"]*">(.+?)</h1>', webpage, 'title')
-        playlist_html = self._search_regex(
-            r"(?s)<ul\s+class='video-related[^']*'>(.*?)</ul>", webpage,
-            'playlist HTML')
-        entries = [{
-            '_type': 'url',
-            'url': 'aol-video:%s' % m.group('id'),
-            'ie_key': 'Aol',
-        } for m in re.finditer(
-            r"<a\s+href='.*videoid=(?P<id>[0-9]+)'\s+class='video-thumb'>",
-            playlist_html)]
+        video_data = response['data']
+        formats = []
+        m3u8_url = video_data.get('videoMasterPlaylist')
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+        for rendition in video_data.get('renditions', []):
+            video_url = rendition.get('url')
+            if not video_url:
+                continue
+            ext = rendition.get('format')
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    video_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+            else:
+                f = {
+                    'url': video_url,
+                    'format_id': rendition.get('quality'),
+                }
+                mobj = re.search(r'(\d+)x(\d+)', video_url)
+                if mobj:
+                    f.update({
+                        'width': int(mobj.group(1)),
+                        'height': int(mobj.group(2)),
+                    })
+                formats.append(f)
+        self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))
 
         return {
-            '_type': 'playlist',
-            'id': playlist_id,
-            'display_id': mobj.group('playlist_display_id'),
-            'title': title,
-            'entries': entries,
+            'id': video_id,
+            'title': video_data['title'],
+            'duration': int_or_none(video_data.get('duration')),
+            'timestamp': int_or_none(video_data.get('publishDate')),
+            'view_count': int_or_none(video_data.get('views')),
+            'description': video_data.get('description'),
+            'uploader': video_data.get('videoOwner'),
+            'formats': formats,
         }
+
+
+class AolFeaturesIE(InfoExtractor):
+    IE_NAME = 'features.aol.com'
+    _VALID_URL = r'https?://features\.aol\.com/video/(?P<id>[^/?#]+)'
+
+    _TESTS = [{
+        'url': 'http://features.aol.com/video/behind-secret-second-careers-late-night-talk-show-hosts',
+        'md5': '7db483bb0c09c85e241f84a34238cc75',
+        'info_dict': {
+            'id': '519507715',
+            'ext': 'mp4',
+            'title': 'What To Watch - February 17, 2016',
+        },
+        'add_ie': ['FiveMin'],
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        return self.url_result(self._search_regex(
+            r'<script type="text/javascript" src="(https?://[^/]*?5min\.com/Scripts/PlayerSeed\.js[^"]+)"',
+            webpage, '5min embed url'), 'FiveMin')
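When the feed API omits dimensions, the rendition loop above guesses width and height from a WxH token embedded in the media URL. The trick in isolation; the sample URL is made up:

    import re

    def dimensions_from_url(video_url):
        mobj = re.search(r'(\d+)x(\d+)', video_url)
        if not mobj:
            return None
        return int(mobj.group(1)), int(mobj.group(2))

    print(dimensions_from_url('http://cdn.example.com/clip_1280x720_3500k.mp4'))
    # (1280, 720)
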
diff --git a/youtube_dl/extractor/appletrailers.py b/youtube_dl/extractor/appletrailers.py
index 576f03b5b71115771555e1d8d46f4a108eb9de93..be40f85b487057b4cb319dba102cec76519880a5 100644 (file)
--- a/youtube_dl/extractor/appletrailers.py
+++ b/youtube_dl/extractor/appletrailers.py
@@ -11,61 +11,71 @@ from ..utils import (
 
 
 class AppleTrailersIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?trailers\.apple\.com/(?:trailers|ca)/(?P<company>[^/]+)/(?P<movie>[^/]+)'
+    IE_NAME = 'appletrailers'
+    _VALID_URL = r'https?://(?:www\.|movie)?trailers\.apple\.com/(?:trailers|ca)/(?P<company>[^/]+)/(?P<movie>[^/]+)'
     _TESTS = [{
-        "url": "http://trailers.apple.com/trailers/wb/manofsteel/",
+        'url': 'http://trailers.apple.com/trailers/wb/manofsteel/',
         'info_dict': {
             'id': 'manofsteel',
         },
-        "playlist": [
+        'playlist': [
             {
-                "md5": "d97a8e575432dbcb81b7c3acb741f8a8",
-                "info_dict": {
-                    "id": "manofsteel-trailer4",
-                    "ext": "mov",
-                    "duration": 111,
-                    "title": "Trailer 4",
-                    "upload_date": "20130523",
-                    "uploader_id": "wb",
+                'md5': 'd97a8e575432dbcb81b7c3acb741f8a8',
+                'info_dict': {
+                    'id': 'manofsteel-trailer4',
+                    'ext': 'mov',
+                    'duration': 111,
+                    'title': 'Trailer 4',
+                    'upload_date': '20130523',
+                    'uploader_id': 'wb',
                 },
             },
             {
-                "md5": "b8017b7131b721fb4e8d6f49e1df908c",
-                "info_dict": {
-                    "id": "manofsteel-trailer3",
-                    "ext": "mov",
-                    "duration": 182,
-                    "title": "Trailer 3",
-                    "upload_date": "20130417",
-                    "uploader_id": "wb",
+                'md5': 'b8017b7131b721fb4e8d6f49e1df908c',
+                'info_dict': {
+                    'id': 'manofsteel-trailer3',
+                    'ext': 'mov',
+                    'duration': 182,
+                    'title': 'Trailer 3',
+                    'upload_date': '20130417',
+                    'uploader_id': 'wb',
                 },
             },
             {
-                "md5": "d0f1e1150989b9924679b441f3404d48",
-                "info_dict": {
-                    "id": "manofsteel-trailer",
-                    "ext": "mov",
-                    "duration": 148,
-                    "title": "Trailer",
-                    "upload_date": "20121212",
-                    "uploader_id": "wb",
+                'md5': 'd0f1e1150989b9924679b441f3404d48',
+                'info_dict': {
+                    'id': 'manofsteel-trailer',
+                    'ext': 'mov',
+                    'duration': 148,
+                    'title': 'Trailer',
+                    'upload_date': '20121212',
+                    'uploader_id': 'wb',
                 },
             },
             {
-                "md5": "5fe08795b943eb2e757fa95cb6def1cb",
-                "info_dict": {
-                    "id": "manofsteel-teaser",
-                    "ext": "mov",
-                    "duration": 93,
-                    "title": "Teaser",
-                    "upload_date": "20120721",
-                    "uploader_id": "wb",
+                'md5': '5fe08795b943eb2e757fa95cb6def1cb',
+                'info_dict': {
+                    'id': 'manofsteel-teaser',
+                    'ext': 'mov',
+                    'duration': 93,
+                    'title': 'Teaser',
+                    'upload_date': '20120721',
+                    'uploader_id': 'wb',
                 },
             },
         ]
+    }, {
+        'url': 'http://trailers.apple.com/trailers/magnolia/blackthorn/',
+        'info_dict': {
+            'id': 'blackthorn',
+        },
+        'playlist_mincount': 2,
     }, {
         'url': 'http://trailers.apple.com/ca/metropole/autrui/',
         'only_matching': True,
+    }, {
+        'url': 'http://movietrailers.apple.com/trailers/focus_features/kuboandthetwostrings/',
+        'only_matching': True,
     }]
 
     _JSON_RE = r'iTunes.playURL\((.*?)\);'
@@ -79,7 +89,7 @@ class AppleTrailersIE(InfoExtractor):
 
         def fix_html(s):
             s = re.sub(r'(?s)<script[^<]*?>.*?</script>', '', s)
-            s = re.sub(r'<img ([^<]*?)>', r'<img \1/>', s)
+            s = re.sub(r'<img ([^<]*?)/?>', r'<img \1/>', s)
             # The ' characters in the onClick attributes are not escaped, so pages
             # like http://trailers.apple.com/trailers/wb/gravity/ couldn't be parsed otherwise.
 
@@ -96,6 +106,9 @@ class AppleTrailersIE(InfoExtractor):
             trailer_info_json = self._search_regex(self._JSON_RE,
                                                    on_click, 'trailer info')
             trailer_info = json.loads(trailer_info_json)
+            first_url = trailer_info.get('url')
+            if not first_url:
+                continue
             title = trailer_info['title']
             video_id = movie + '-' + re.sub(r'[^a-zA-Z0-9]', '', title).lower()
             thumbnail = li.find('.//img').attrib['src']
@@ -107,7 +120,6 @@ class AppleTrailersIE(InfoExtractor):
             if m:
                 duration = 60 * int(m.group('minutes')) + int(m.group('seconds'))
 
-            first_url = trailer_info['url']
             trailer_id = first_url.split('/')[-1].rpartition('_')[0].lower()
             settings_json_url = compat_urlparse.urljoin(url, 'includes/settings/%s.json' % trailer_id)
             settings = self._download_json(settings_json_url, trailer_id, 'Downloading settings json')
@@ -144,3 +156,76 @@ class AppleTrailersIE(InfoExtractor):
             'id': movie,
             'entries': playlist,
         }
+
+
+class AppleTrailersSectionIE(InfoExtractor):
+    IE_NAME = 'appletrailers:section'
+    _SECTIONS = {
+        'justadded': {
+            'feed_path': 'just_added',
+            'title': 'Just Added',
+        },
+        'exclusive': {
+            'feed_path': 'exclusive',
+            'title': 'Exclusive',
+        },
+        'justhd': {
+            'feed_path': 'just_hd',
+            'title': 'Just HD',
+        },
+        'mostpopular': {
+            'feed_path': 'most_pop',
+            'title': 'Most Popular',
+        },
+        'moviestudios': {
+            'feed_path': 'studios',
+            'title': 'Movie Studios',
+        },
+    }
+    _VALID_URL = r'https?://(?:www\.)?trailers\.apple\.com/#section=(?P<id>%s)' % '|'.join(_SECTIONS)
+    _TESTS = [{
+        'url': 'http://trailers.apple.com/#section=justadded',
+        'info_dict': {
+            'title': 'Just Added',
+            'id': 'justadded',
+        },
+        'playlist_mincount': 80,
+    }, {
+        'url': 'http://trailers.apple.com/#section=exclusive',
+        'info_dict': {
+            'title': 'Exclusive',
+            'id': 'exclusive',
+        },
+        'playlist_mincount': 80,
+    }, {
+        'url': 'http://trailers.apple.com/#section=justhd',
+        'info_dict': {
+            'title': 'Just HD',
+            'id': 'justhd',
+        },
+        'playlist_mincount': 80,
+    }, {
+        'url': 'http://trailers.apple.com/#section=mostpopular',
+        'info_dict': {
+            'title': 'Most Popular',
+            'id': 'mostpopular',
+        },
+        'playlist_mincount': 80,
+    }, {
+        'url': 'http://trailers.apple.com/#section=moviestudios',
+        'info_dict': {
+            'title': 'Movie Studios',
+            'id': 'moviestudios',
+        },
+        'playlist_mincount': 80,
+    }]
+
+    def _real_extract(self, url):
+        section = self._match_id(url)
+        section_data = self._download_json(
+            'http://trailers.apple.com/trailers/home/feeds/%s.json' % self._SECTIONS[section]['feed_path'],
+            section)
+        entries = [
+            self.url_result('http://trailers.apple.com' + e['location'])
+            for e in section_data]
+        return self.playlist_result(entries, section, self._SECTIONS[section]['title'])
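The one-character fix_html change above makes the <img> rewrite idempotent: the old pattern r'<img ([^<]*?)>' captured a trailing slash into the group, so an already self-closed tag came back as '<img ...//>'. With the slash matched optionally outside the group, both forms normalize to the same output:

    import re

    def close_img_tags(s):
        # normalize <img ...> and <img .../> alike to self-closing form
        return re.sub(r'<img ([^<]*?)/?>', r'<img \1/>', s)

    print(close_img_tags('<img src="a.jpg"><img src="b.jpg"/>'))
    # <img src="a.jpg"/><img src="b.jpg"/>
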
diff --git a/youtube_dl/extractor/ard.py b/youtube_dl/extractor/ard.py
index 6f465789b497a6625776c383ff699a64b0b5c346..26446c2fe1e4ecb0b15b6ec87a927a2b6151a1da 100644 (file)
--- a/youtube_dl/extractor/ard.py
+++ b/youtube_dl/extractor/ard.py
@@ -14,8 +14,8 @@ from ..utils import (
     parse_duration,
     unified_strdate,
     xpath_text,
-    parse_xml,
 )
+from ..compat import compat_etree_fromstring
 
 
 class ARDMediathekIE(InfoExtractor):
@@ -83,7 +83,7 @@ class ARDMediathekIE(InfoExtractor):
         subtitle_url = media_info.get('_subtitleUrl')
         if subtitle_url:
             subtitles['de'] = [{
-                'ext': 'srt',
+                'ext': 'ttml',
                 'url': subtitle_url,
             }]
 
@@ -110,13 +110,15 @@ class ARDMediathekIE(InfoExtractor):
                 server = stream.get('_server')
                 for stream_url in stream_urls:
                     ext = determine_ext(stream_url)
+                    if quality != 'auto' and ext in ('f4m', 'm3u8'):
+                        continue
                     if ext == 'f4m':
                         formats.extend(self._extract_f4m_formats(
                             stream_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124',
-                            video_id, preference=-1, f4m_id='hds'))
+                            video_id, preference=-1, f4m_id='hds', fatal=False))
                     elif ext == 'm3u8':
                         formats.extend(self._extract_m3u8_formats(
-                            stream_url, video_id, 'mp4', preference=1, m3u8_id='hls'))
+                            stream_url, video_id, 'mp4', preference=1, m3u8_id='hls', fatal=False))
                     else:
                         if server and server.startswith('rtmp'):
                             f = {
@@ -161,7 +163,7 @@ class ARDMediathekIE(InfoExtractor):
             raise ExtractorError('This program is only suitable for those aged 12 and older. Video %s is therefore only available between 20 pm and 6 am.' % video_id, expected=True)
 
         if re.search(r'[\?&]rss($|[=&])', url):
-            doc = parse_xml(webpage)
+            doc = compat_etree_fromstring(webpage.encode('utf-8'))
             if doc.tag == 'rss':
                 return GenericIE()._extract_rss(url, video_id, doc)
 
diff --git a/youtube_dl/extractor/arte.py b/youtube_dl/extractor/arte.py
index 76de244774369dd53c510961ecf6b6a7641c7027..a9e3266dcb138794774e30ad2c0af0dea645463f 100644 (file)
--- a/youtube_dl/extractor/arte.py
+++ b/youtube_dl/extractor/arte.py
@@ -4,11 +4,16 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
+from ..compat import (
+    compat_parse_qs,
+    compat_urllib_parse_urlparse,
+)
 from ..utils import (
     find_xpath_attr,
     unified_strdate,
     get_element_by_attribute,
     int_or_none,
+    NO_DEFAULT,
     qualities,
 )
 
@@ -18,7 +23,7 @@ from ..utils import (
 
 
 class ArteTvIE(InfoExtractor):
-    _VALID_URL = r'http://videos\.arte\.tv/(?P<lang>fr|de)/.*-(?P<id>.*?)\.html'
+    _VALID_URL = r'https?://videos\.arte\.tv/(?P<lang>fr|de|en|es)/.*-(?P<id>.*?)\.html'
     IE_NAME = 'arte.tv'
 
     def _real_extract(self, url):
@@ -58,15 +63,19 @@ class ArteTvIE(InfoExtractor):
 
 class ArteTVPlus7IE(InfoExtractor):
     IE_NAME = 'arte.tv:+7'
-    _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
+    _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/(?:(?:sendungen|emissions|embed)/)?(?P<id>[^/]+)/(?P<name>[^/?#&+]+)'
 
     @classmethod
     def _extract_url_info(cls, url):
         mobj = re.match(cls._VALID_URL, url)
         lang = mobj.group('lang')
-        # This is not a real id, it can be for example AJT for the news
-        # http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
-        video_id = mobj.group('id')
+        query = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
+        if 'vid' in query:
+            video_id = query['vid'][0]
+        else:
+            # This is not a real id, it can be for example AJT for the news
+            # http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
+            video_id = mobj.group('id')
         return video_id, lang
 
     def _real_extract(self, url):
@@ -75,20 +84,63 @@ class ArteTVPlus7IE(InfoExtractor):
         return self._extract_from_webpage(webpage, video_id, lang)
 
     def _extract_from_webpage(self, webpage, video_id, lang):
+        patterns_templates = (r'arte_vp_url=["\'](.*?%s.*?)["\']', r'data-url=["\']([^"]+%s[^"]+)["\']')
+        ids = (video_id, '')
+        # some pages contain multiple videos (like
+        # http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D),
+        # so we first try to look for json URLs that contain the video id from
+        # the 'vid' parameter.
+        patterns = [t % re.escape(_id) for _id in ids for t in patterns_templates]
         json_url = self._html_search_regex(
-            [r'arte_vp_url=["\'](.*?)["\']', r'data-url=["\']([^"]+)["\']'],
-            webpage, 'json vp url')
-        return self._extract_from_json_url(json_url, video_id, lang)
-
-    def _extract_from_json_url(self, json_url, video_id, lang):
+            patterns, webpage, 'json vp url', default=None)
+        if not json_url:
+            def find_iframe_url(webpage, default=NO_DEFAULT):
+                return self._html_search_regex(
+                    r'<iframe[^>]+src=(["\'])(?P<url>.+\bjson_url=.+?)\1',
+                    webpage, 'iframe url', group='url', default=default)
+
+            iframe_url = find_iframe_url(webpage, None)
+            if not iframe_url:
+                embed_url = self._html_search_regex(
+                    r'arte_vp_url_oembed=\'([^\']+?)\'', webpage, 'embed url', default=None)
+                if embed_url:
+                    player = self._download_json(
+                        embed_url, video_id, 'Downloading player page')
+                    iframe_url = find_iframe_url(player['html'])
+            # en and es URLs produce react-based pages with different layout (e.g.
+            # http://www.arte.tv/guide/en/053330-002-A/carnival-italy?zone=world)
+            if not iframe_url:
+                program = self._search_regex(
+                    r'program\s*:\s*({.+?["\']embed_html["\'].+?}),?\s*\n',
+                    webpage, 'program', default=None)
+                if program:
+                    embed_html = self._parse_json(program, video_id)
+                    if embed_html:
+                        iframe_url = find_iframe_url(embed_html['embed_html'])
+            if iframe_url:
+                json_url = compat_parse_qs(
+                    compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
+        if json_url:
+            title = self._search_regex(
+                r'<h3[^>]+title=(["\'])(?P<title>.+?)\1',
+                webpage, 'title', default=None, group='title')
+            return self._extract_from_json_url(json_url, video_id, lang, title=title)
+        # Different kind of embed URL (e.g.
+        # http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
+        embed_url = self._search_regex(
+            r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1',
+            webpage, 'embed url', group='url')
+        return self.url_result(embed_url)
+
+    def _extract_from_json_url(self, json_url, video_id, lang, title=None):
         info = self._download_json(json_url, video_id)
         player_info = info['videoJsonPlayer']
 
         upload_date_str = player_info.get('shootingDate')
         if not upload_date_str:
-            upload_date_str = player_info.get('VDA', '').split(' ')[0]
+            upload_date_str = (player_info.get('VRA') or player_info.get('VDA') or '').split(' ')[0]
 
-        title = player_info['VTI'].strip()
+        title = (player_info.get('VTI') or title or player_info['VID']).strip()
         subtitle = player_info.get('VSU', '').strip()
         if subtitle:
             title += ' - %s' % subtitle
@@ -102,27 +154,30 @@ class ArteTVPlus7IE(InfoExtractor):
         }
         qfunc = qualities(['HQ', 'MQ', 'EQ', 'SQ'])
 
+        LANGS = {
+            'fr': 'F',
+            'de': 'A',
+            'en': 'E[ANG]',
+            'es': 'E[ESP]',
+        }
+
         formats = []
         for format_id, format_dict in player_info['VSR'].items():
             f = dict(format_dict)
             versionCode = f.get('versionCode')
-
-            langcode = {
-                'fr': 'F',
-                'de': 'A',
-            }.get(lang, lang)
-            lang_rexs = [r'VO?%s' % langcode, r'VO?.-ST%s' % langcode]
-            lang_pref = (
-                None if versionCode is None else (
-                    10 if any(re.match(r, versionCode) for r in lang_rexs)
-                    else -10))
+            langcode = LANGS.get(lang, lang)
+            lang_rexs = [r'VO?%s-' % re.escape(langcode), r'VO?.-ST%s$' % re.escape(langcode)]
+            lang_pref = None
+            if versionCode:
+                matched_lang_rexs = [r for r in lang_rexs if re.match(r, versionCode)]
+                lang_pref = -10 if not matched_lang_rexs else 10 * len(matched_lang_rexs)
             source_pref = 0
             if versionCode is not None:
                 # The original version with subtitles has lower relevance
-                if re.match(r'VO-ST(F|A)', versionCode):
+                if re.match(r'VO-ST(F|A|E)', versionCode):
                     source_pref -= 10
                 # The version with subtitles for the deaf/hard of hearing (sourds/malentendants) also has lower relevance
-                elif re.match(r'VO?(F|A)-STM\1', versionCode):
+                elif re.match(r'VO?(F|A|E)-STM\1', versionCode):
                     source_pref -= 9
             format = {
                 'format_id': format_id,
@@ -155,7 +210,7 @@ class ArteTVPlus7IE(InfoExtractor):
 # It also uses the arte_vp_url url from the webpage to extract the information
 class ArteTVCreativeIE(ArteTVPlus7IE):
     IE_NAME = 'arte.tv:creative'
-    _VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de)/(?:magazine?/)?(?P<id>[^?#]+)'
+    _VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
 
     _TESTS = [{
         'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
@@ -174,35 +229,48 @@ class ArteTVCreativeIE(ArteTVPlus7IE):
             'description': 'Événement ! Quarante-cinq ans après leurs premiers succès, les légendaires Monty Python remontent sur scène.\n',
             'upload_date': '20140805',
         }
+    }, {
+        'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde',
+        'only_matching': True,
     }]
 
 
-class ArteTVFutureIE(ArteTVPlus7IE):
-    IE_NAME = 'arte.tv:future'
-    _VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de)/(thema|sujet)/.*?#article-anchor-(?P<id>\d+)'
+class ArteTVInfoIE(ArteTVPlus7IE):
+    IE_NAME = 'arte.tv:info'
+    _VALID_URL = r'https?://info\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
 
     _TEST = {
-        'url': 'http://future.arte.tv/fr/sujet/info-sciences#article-anchor-7081',
+        'url': 'http://info.arte.tv/fr/service-civique-un-cache-misere',
         'info_dict': {
-            'id': '5201',
+            'id': '067528-000-A',
             'ext': 'mp4',
-            'title': 'Les champignons au secours de la planète',
-            'upload_date': '20131101',
+            'title': 'Service civique, un cache misère ?',
+            'upload_date': '20160403',
         },
     }
 
-    def _real_extract(self, url):
-        anchor_id, lang = self._extract_url_info(url)
-        webpage = self._download_webpage(url, anchor_id)
-        row = self._search_regex(
-            r'(?s)id="%s"[^>]*>.+?(<div[^>]*arte_vp_url[^>]*>)' % anchor_id,
-            webpage, 'row')
-        return self._extract_from_webpage(row, anchor_id, lang)
+
+class ArteTVFutureIE(ArteTVPlus7IE):
+    IE_NAME = 'arte.tv:future'
+    _VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
+
+    _TESTS = [{
+        'url': 'http://future.arte.tv/fr/info-sciences/les-ecrevisses-aussi-sont-anxieuses',
+        'info_dict': {
+            'id': '050940-028-A',
+            'ext': 'mp4',
+            'title': 'Les écrevisses aussi peuvent être anxieuses',
+            'upload_date': '20140902',
+        },
+    }, {
+        'url': 'http://future.arte.tv/fr/la-science-est-elle-responsable',
+        'only_matching': True,
+    }]
 
 
 class ArteTVDDCIE(ArteTVPlus7IE):
     IE_NAME = 'arte.tv:ddc'
-    _VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>.+)'
+    _VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>[^/?#&]+)'
 
     def _real_extract(self, url):
         video_id, lang = self._extract_url_info(url)
@@ -220,7 +288,7 @@ class ArteTVDDCIE(ArteTVPlus7IE):
 
 class ArteTVConcertIE(ArteTVPlus7IE):
     IE_NAME = 'arte.tv:concert'
-    _VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>de|fr)/(?P<id>.+)'
+    _VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
 
     _TEST = {
         'url': 'http://concert.arte.tv/de/notwist-im-pariser-konzertclub-divan-du-monde',
@@ -235,11 +303,59 @@ class ArteTVConcertIE(ArteTVPlus7IE):
     }
 
 
+class ArteTVCinemaIE(ArteTVPlus7IE):
+    IE_NAME = 'arte.tv:cinema'
+    _VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>.+)'
+
+    _TEST = {
+        'url': 'http://cinema.arte.tv/de/node/38291',
+        'md5': '6b275511a5107c60bacbeeda368c3aa1',
+        'info_dict': {
+            'id': '055876-000_PWA12025-D',
+            'ext': 'mp4',
+            'title': 'Tod auf dem Nil',
+            'upload_date': '20160122',
+            'description': 'md5:7f749bbb77d800ef2be11d54529b96bc',
+        },
+    }
+
+
+class ArteTVMagazineIE(ArteTVPlus7IE):
+    IE_NAME = 'arte.tv:magazine'
+    _VALID_URL = r'https?://(?:www\.)?arte\.tv/magazine/[^/]+/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
+
+    _TESTS = [{
+        # Embedded via <iframe src="http://www.arte.tv/arte_vp/index.php?json_url=..."
+        'url': 'http://www.arte.tv/magazine/trepalium/fr/entretien-avec-le-realisateur-vincent-lannoo-trepalium',
+        'md5': '2a9369bcccf847d1c741e51416299f25',
+        'info_dict': {
+            'id': '065965-000-A',
+            'ext': 'mp4',
+            'title': 'Trepalium - Extrait Ep.01',
+            'upload_date': '20160121',
+        },
+    }, {
+        # Embedded via <iframe src="http://www.arte.tv/guide/fr/embed/054813-004-A/medium"
+        'url': 'http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium',
+        'md5': 'fedc64fc7a946110fe311634e79782ca',
+        'info_dict': {
+            'id': '054813-004_PLUS7-F',
+            'ext': 'mp4',
+            'title': 'Trepalium (4/6)',
+            'description': 'md5:10057003c34d54e95350be4f9b05cb40',
+            'upload_date': '20160218',
+        },
+    }, {
+        'url': 'http://www.arte.tv/magazine/metropolis/de/frank-woeste-german-paris-metropolis',
+        'only_matching': True,
+    }]
+
+
 class ArteTVEmbedIE(ArteTVPlus7IE):
     IE_NAME = 'arte.tv:embed'
     _VALID_URL = r'''(?x)
         http://www\.arte\.tv
-        /playerv2/embed\.php\?json_url=
+        /(?:playerv2/embed|arte_vp/index)\.php\?json_url=
         (?P<json_url>
             http://arte\.tv/papi/tvguide/videos/stream/player/
             (?P<lang>[^/]+)/(?P<id>[^/]+)[^&]*
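The reworked language scoring in ArteTVPlus7IE replaces the old all-or-nothing ±10 with a count of matching version-code patterns: +10 per pattern matched, -10 when none match. Standalone, with the same LANGS table and regexes as the diff:

    import re

    LANGS = {'fr': 'F', 'de': 'A', 'en': 'E[ANG]', 'es': 'E[ESP]'}

    def lang_preference(version_code, lang):
        langcode = LANGS.get(lang, lang)
        lang_rexs = [r'VO?%s-' % re.escape(langcode),
                     r'VO?.-ST%s$' % re.escape(langcode)]
        matched = [r for r in lang_rexs if re.match(r, version_code)]
        return -10 if not matched else 10 * len(matched)

    print(lang_preference('VF-STF', 'fr'))  # both patterns match -> 20
    print(lang_preference('VA-STA', 'fr'))  # no match -> -10
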
diff --git a/youtube_dl/extractor/atresplayer.py b/youtube_dl/extractor/atresplayer.py
index 29f8795d3dfe2bdae9993f9b1fd3d278cb8c3a9c..d2f3889645f9b9324deb0eda00d4f6b67ab32dc1 100644 (file)
--- a/youtube_dl/extractor/atresplayer.py
+++ b/youtube_dl/extractor/atresplayer.py
@@ -2,18 +2,18 @@ from __future__ import unicode_literals
 
 import time
 import hmac
+import hashlib
+import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_str,
-    compat_urllib_parse,
-    compat_urllib_request,
-)
+from ..compat import compat_str
 from ..utils import (
-    int_or_none,
+    ExtractorError,
     float_or_none,
+    int_or_none,
+    sanitized_Request,
+    urlencode_postdata,
     xpath_text,
-    ExtractorError,
 )
 
 
@@ -32,6 +32,19 @@ class AtresPlayerIE(InfoExtractor):
                 'duration': 5527.6,
                 'thumbnail': 're:^https?://.*\.jpg$',
             },
+            'skip': 'This video is only available for registered users'
+        },
+        {
+            'url': 'http://www.atresplayer.com/television/especial/videoencuentros/temporada-1/capitulo-112-david-bustamante_2014121600375.html',
+            'md5': '0d0e918533bbd4b263f2de4d197d4aac',
+            'info_dict': {
+                'id': 'capitulo-112-david-bustamante',
+                'ext': 'flv',
+                'title': 'David Bustamante',
+                'description': 'md5:f33f1c0a05be57f6708d4dd83a3b81c6',
+                'duration': 1439.0,
+                'thumbnail': 're:^https?://.*\.jpg$',
+            },
         },
         {
             'url': 'http://www.atresplayer.com/television/series/el-secreto-de-puente-viejo/el-chico-de-los-tres-lunares/capitulo-977-29-12-14_2014122400174.html',
@@ -50,6 +63,13 @@ class AtresPlayerIE(InfoExtractor):
 
     _LOGIN_URL = 'https://servicios.atresplayer.com/j_spring_security_check'
 
+    _ERRORS = {
+        'UNPUBLISHED': 'We\'re sorry, but this video is not yet available.',
+        'DELETED': 'This video has expired and is no longer available for online streaming.',
+        'GEOUNPUBLISHED': 'We\'re sorry, but this video is not available in your region due to right restrictions.',
+        # 'PREMIUM': 'PREMIUM',
+    }
+
     def _real_initialize(self):
         self._login()
 
@@ -63,8 +83,8 @@ class AtresPlayerIE(InfoExtractor):
             'j_password': password,
         }
 
-        request = compat_urllib_request.Request(
-            self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
+        request = sanitized_Request(
+            self._LOGIN_URL, urlencode_postdata(login_form))
         request.add_header('Content-Type', 'application/x-www-form-urlencoded')
         response = self._download_webpage(
             request, None, 'Logging in as %s' % username)
@@ -83,58 +103,72 @@ class AtresPlayerIE(InfoExtractor):
         episode_id = self._search_regex(
             r'episode="([^"]+)"', webpage, 'episode id')
 
+        request = sanitized_Request(
+            self._PLAYER_URL_TEMPLATE % episode_id,
+            headers={'User-Agent': self._USER_AGENT})
+        player = self._download_json(request, episode_id, 'Downloading player JSON')
+
+        episode_type = player.get('typeOfEpisode')
+        error_message = self._ERRORS.get(episode_type)
+        if error_message:
+            raise ExtractorError(
+                '%s returned error: %s' % (self.IE_NAME, error_message), expected=True)
+
+        formats = []
+        video_url = player.get('urlVideo')
+        if video_url:
+            format_info = {
+                'url': video_url,
+                'format_id': 'http',
+            }
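+            # the direct HTTP URL appears to encode quality as <bitrate>K_<width>x<height>,
+            # which the regex below picks apart for the format metadata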
+            mobj = re.search(r'(?P<bitrate>\d+)K_(?P<width>\d+)x(?P<height>\d+)', video_url)
+            if mobj:
+                format_info.update({
+                    'width': int_or_none(mobj.group('width')),
+                    'height': int_or_none(mobj.group('height')),
+                    'tbr': int_or_none(mobj.group('bitrate')),
+                })
+            formats.append(format_info)
+
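+        # the video URL endpoint is signed: an HMAC-MD5 of the episode id plus a
+        # shifted server timestamp, keyed with the extractor's magic constant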
         timestamp = int_or_none(self._download_webpage(
             self._TIME_API_URL,
             video_id, 'Downloading timestamp', fatal=False), 1000, time.time())
         timestamp_shifted = compat_str(timestamp + self._TIMESTAMP_SHIFT)
         token = hmac.new(
             self._MAGIC.encode('ascii'),
-            (episode_id + timestamp_shifted).encode('utf-8')
+            (episode_id + timestamp_shifted).encode('utf-8'), hashlib.md5
         ).hexdigest()
 
-        formats = []
-        for fmt in ['windows', 'android_tablet']:
-            request = compat_urllib_request.Request(
-                self._URL_VIDEO_TEMPLATE.format(fmt, episode_id, timestamp_shifted, token))
-            request.add_header('User-Agent', self._USER_AGENT)
-
-            fmt_json = self._download_json(
-                request, video_id, 'Downloading %s video JSON' % fmt)
-
-            result = fmt_json.get('resultDes')
-            if result.lower() != 'ok':
-                raise ExtractorError(
-                    '%s returned error: %s' % (self.IE_NAME, result), expected=True)
-
-            for format_id, video_url in fmt_json['resultObject'].items():
-                if format_id == 'token' or not video_url.startswith('http'):
-                    continue
-                if video_url.endswith('/Manifest'):
-                    if 'geodeswowsmpra3player' in video_url:
-                        f4m_path = video_url.split('smil:', 1)[-1].split('free_', 1)[0]
-                        f4m_url = 'http://drg.antena3.com/{0}hds/es/sd.f4m'.format(f4m_path)
-                        # this videos are protected by DRM, the f4m downloader doesn't support them
-                        continue
-                    else:
-                        f4m_url = video_url[:-9] + '/manifest.f4m'
-                    formats.extend(self._extract_f4m_formats(f4m_url, video_id))
-                else:
-                    formats.append({
-                        'url': video_url,
-                        'format_id': 'android-%s' % format_id,
-                        'preference': 1,
-                    })
-        self._sort_formats(formats)
+        request = sanitized_Request(
+            self._URL_VIDEO_TEMPLATE.format('windows', episode_id, timestamp_shifted, token),
+            headers={'User-Agent': self._USER_AGENT})
 
-        player = self._download_json(
-            self._PLAYER_URL_TEMPLATE % episode_id,
-            episode_id)
+        fmt_json = self._download_json(
+            request, video_id, 'Downloading windows video JSON')
+
+        result = fmt_json.get('resultDes')
+        if result.lower() != 'ok':
+            raise ExtractorError(
+                '%s returned error: %s' % (self.IE_NAME, result), expected=True)
+
+        for format_id, video_url in fmt_json['resultObject'].items():
+            if format_id == 'token' or not video_url.startswith('http'):
+                continue
+            if 'geodeswowsmpra3player' in video_url:
+                f4m_path = video_url.split('smil:', 1)[-1].split('free_', 1)[0]
+                f4m_url = 'http://drg.antena3.com/{0}hds/es/sd.f4m'.format(f4m_path)
+                # these videos are protected by DRM; the f4m downloader doesn't support them
+                continue
+            else:
+                f4m_url = video_url[:-9] + '/manifest.f4m'
+            formats.extend(self._extract_f4m_formats(f4m_url, video_id, f4m_id='hds', fatal=False))
+        self._sort_formats(formats)
 
         path_data = player.get('pathData')
 
         episode = self._download_xml(
-            self._EPISODE_URL_TEMPLATE % path_data,
-            video_id, 'Downloading episode XML')
+            self._EPISODE_URL_TEMPLATE % path_data, video_id,
+            'Downloading episode XML')
 
         duration = float_or_none(xpath_text(
             episode, './media/asset/info/technical/contentDuration', 'duration'))
diff --git a/youtube_dl/extractor/audimedia.py b/youtube_dl/extractor/audimedia.py
new file mode 100644 (file)
index 0000000..aa69256
--- /dev/null
@@ -0,0 +1,89 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_iso8601,
+    sanitized_Request,
+)
+
+
+class AudiMediaIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?audi-mediacenter\.com/(?:en|de)/audimediatv/(?P<id>[^/?#]+)'
+    _TEST = {
+        'url': 'https://www.audi-mediacenter.com/en/audimediatv/60-seconds-of-audi-sport-104-2015-wec-bahrain-rookie-test-1467',
+        'md5': '79a8b71c46d49042609795ab59779b66',
+        'info_dict': {
+            'id': '1565',
+            'ext': 'mp4',
+            'title': '60 Seconds of Audi Sport 104/2015 - WEC Bahrain, Rookie Test',
+            'description': 'md5:60e5d30a78ced725f7b8d34370762941',
+            'upload_date': '20151124',
+            'timestamp': 1448354940,
+            'duration': 74022,
+            'view_count': int,
+        }
+    }
+    # extracted from https://audimedia.tv/assets/embed/embedded-player.js (dataSourceAuthToken)
+    _AUTH_TOKEN = 'e25b42847dba18c6c8816d5d8ce94c326e06823ebf0859ed164b3ba169be97f2'
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+
+        raw_payload = self._search_regex([
+            r'class="amtv-embed"[^>]+id="([^"]+)"',
+            r'class=\\"amtv-embed\\"[^>]+id=\\"([^"]+)\\"',
+        ], webpage, 'raw payload')
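+        # the embed element id packs several dash-separated fields; only the stage
+        # mode, video id and language are of interest here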
+        _, stage_mode, video_id, lang = raw_payload.split('-')
+
+        # TODO: handle s and e stage_mode (live streams and ended live streams)
+        if stage_mode not in ('s', 'e'):
+            request = sanitized_Request(
+                'https://audimedia.tv/api/video/v1/videos/%s?embed[]=video_versions&embed[]=thumbnail_image&where[content_language_iso]=%s' % (video_id, lang),
+                headers={'X-Auth-Token': self._AUTH_TOKEN})
+            json_data = self._download_json(request, video_id)['results']
+            formats = []
+
+            stream_url_hls = json_data.get('stream_url_hls')
+            if stream_url_hls:
+                formats.extend(self._extract_m3u8_formats(
+                    stream_url_hls, video_id, 'mp4',
+                    entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
+
+            stream_url_hds = json_data.get('stream_url_hds')
+            if stream_url_hds:
+                formats.extend(self._extract_f4m_formats(
+                    stream_url_hds + '?hdcore=3.4.0',
+                    video_id, f4m_id='hds', fatal=False))
+
+            for video_version in json_data.get('video_versions', []):
+                video_version_url = video_version.get('download_url') or video_version.get('stream_url')
+                if not video_version_url:
+                    continue
+                f = {
+                    'url': video_version_url,
+                    'width': int_or_none(video_version.get('width')),
+                    'height': int_or_none(video_version.get('height')),
+                    'abr': int_or_none(video_version.get('audio_bitrate')),
+                    'vbr': int_or_none(video_version.get('video_bitrate')),
+                }
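+                # progressive download URLs seem to carry their bitrate as '<n>k'
+                # somewhere in the path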
+                bitrate = self._search_regex(r'(\d+)k', video_version_url, 'bitrate', default=None)
+                if bitrate:
+                    f.update({
+                        'format_id': 'http-%s' % bitrate,
+                    })
+                formats.append(f)
+            self._sort_formats(formats)
+
+            return {
+                'id': video_id,
+                'title': json_data['title'],
+                'description': json_data.get('subtitle'),
+                'thumbnail': json_data.get('thumbnail_image', {}).get('file'),
+                'timestamp': parse_iso8601(json_data.get('publication_date')),
+                'duration': int_or_none(json_data.get('duration')),
+                'view_count': int_or_none(json_data.get('view_count')),
+                'formats': formats,
+            }
diff --git a/youtube_dl/extractor/audioboom.py b/youtube_dl/extractor/audioboom.py
new file mode 100644 (file)
index 0000000..2ec2d70
--- /dev/null
@@ -0,0 +1,66 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import float_or_none
+
+
+class AudioBoomIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?audioboom\.com/boos/(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'https://audioboom.com/boos/4279833-3-09-2016-czaban-hour-3?t=0',
+        'md5': '63a8d73a055c6ed0f1e51921a10a5a76',
+        'info_dict': {
+            'id': '4279833',
+            'ext': 'mp3',
+            'title': '3/09/2016 Czaban Hour 3',
+            'description': 'Guest:   Nate Davis - NFL free agency,   Guest:   Stan Gans',
+            'duration': 2245.72,
+            'uploader': 'Steve Czaban',
+            'uploader_url': 're:https?://(?:www\.)?audioboom\.com/channel/steveczabanyahoosportsradio',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        clip = None
+
+        clip_store = self._parse_json(
+            self._search_regex(
+                r'data-new-clip-store=(["\'])(?P<json>{.*?"clipId"\s*:\s*%s.*?})\1' % video_id,
+                webpage, 'clip store', default='{}', group='json'),
+            video_id, fatal=False)
+        if clip_store:
+            clips = clip_store.get('clips')
+            if clips and isinstance(clips, list) and isinstance(clips[0], dict):
+                clip = clips[0]
+
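+        # prefer metadata from the embedded clip store and fall back to
+        # Open Graph/meta tags when a field is missing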
+        def from_clip(field):
+            if clip:
+                return clip.get(field)
+
+        audio_url = from_clip('clipURLPriorToLoading') or self._og_search_property(
+            'audio', webpage, 'audio url')
+        title = from_clip('title') or self._og_search_title(webpage)
+        description = from_clip('description') or self._og_search_description(webpage)
+
+        duration = float_or_none(from_clip('duration') or self._html_search_meta(
+            'weibo:audio:duration', webpage))
+
+        uploader = from_clip('author') or self._og_search_property(
+            'audio:artist', webpage, 'uploader', fatal=False)
+        uploader_url = from_clip('author_url') or self._html_search_meta(
+            'audioboo:channel', webpage, 'uploader url')
+
+        return {
+            'id': video_id,
+            'url': audio_url,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'uploader': uploader,
+            'uploader_url': uploader_url,
+        }
index 693ba22c6dde0dd1760531108353ab48b1ed0fa1..a52d26cecd1e98f8d4a902ed4e8051a42e21e200 100644 (file)
@@ -30,14 +30,14 @@ class AudiomackIE(InfoExtractor):
         # audiomack wrapper around soundcloud song
         {
             'add_ie': ['Soundcloud'],
-            'url': 'http://www.audiomack.com/song/xclusiveszone/take-kare',
+            'url': 'http://www.audiomack.com/song/hip-hop-daily/black-mamba-freestyle',
             'info_dict': {
-                'id': '172419696',
+                'id': '258901379',
                 'ext': 'mp3',
-                'description': 'md5:1fc3272ed7a635cce5be1568c2822997',
-                'title': 'Young Thug ft Lil Wayne - Take Kare',
-                'uploader': 'Young Thug World',
-                'upload_date': '20141016',
+                'description': 'mamba day freestyle for the legend Kobe Bryant ',
+                'title': 'Black Mamba Freestyle [Prod. By Danny Wolf]',
+                'uploader': 'ILOVEMAKONNEN',
+                'upload_date': '20160414',
             }
         },
     ]
@@ -56,7 +56,7 @@ class AudiomackIE(InfoExtractor):
 
         # API is inconsistent with errors
         if 'url' not in api_response or not api_response['url'] or 'error' in api_response:
-            raise ExtractorError('Invalid url %s', url)
+            raise ExtractorError('Invalid url %s' % url)
 
         # Audiomack wraps a lot of soundcloud tracks in their branded wrapper
         # if so, pass the work off to the soundcloud extractor
index 0961d339fd09b15cc867377db6650b08064a0f25..efa624de1cbfddb741a7f8114059165d4e099095 100644 (file)
@@ -3,7 +3,11 @@ from __future__ import unicode_literals
 import json
 
 from .common import InfoExtractor
-from ..utils import float_or_none
+from ..utils import (
+    ExtractorError,
+    float_or_none,
+    sanitized_Request,
+)
 
 
 class AzubuIE(InfoExtractor):
@@ -91,3 +95,38 @@ class AzubuIE(InfoExtractor):
             'view_count': view_count,
             'formats': formats,
         }
+
+
+class AzubuLiveIE(InfoExtractor):
+    _VALID_URL = r'https?://www\.azubu\.tv/(?P<id>[^/]+)$'
+
+    _TEST = {
+        'url': 'http://www.azubu.tv/MarsTVMDLen',
+        'only_matching': True,
+    }
+
+    def _real_extract(self, url):
+        user = self._match_id(url)
+
+        info = self._download_json(
+            'http://api.azubu.tv/public/modules/last-video/{0}/info'.format(user),
+            user)['data']
+        if info['type'] != 'STREAM':
+            raise ExtractorError('{0} is not streaming live'.format(user), expected=True)
+
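+        # live streams are served through Brightcove; resolve the playback info
+        # with the policy key embedded in Azubu's player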
+        req = sanitized_Request(
+            'https://edge-elb.api.brightcove.com/playback/v1/accounts/3361910549001/videos/ref:' + info['reference_id'])
+        req.add_header('Accept', 'application/json;pk=BCpkADawqM1gvI0oGWg8dxQHlgT8HkdE2LnAlWAZkOlznO39bSZX726u4JqnDsK3MDXcO01JxXK2tZtJbgQChxgaFzEVdHRjaDoxaOu8hHOO8NYhwdxw9BzvgkvLUlpbDNUuDoc4E4wxDToV')
+        bc_info = self._download_json(req, user)
+        m3u8_url = next(source['src'] for source in bc_info['sources'] if source['container'] == 'M2TS')
+        formats = self._extract_m3u8_formats(m3u8_url, user, ext='mp4')
+        self._sort_formats(formats)
+
+        return {
+            'id': info['id'],
+            'title': self._live_title(info['title']),
+            'uploader_id': user,
+            'formats': formats,
+            'is_live': True,
+            'thumbnail': bc_info['poster'],
+        }
index e37ee44403a34afe03c02660a71765e2af89cdba..234a661d34623b0b2da3028b20bcc23fc11e2991 100644 (file)
@@ -4,18 +4,18 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_urlparse
+from ..utils import unescapeHTML
 
 
 class BaiduVideoIE(InfoExtractor):
     IE_DESC = '百度视频'
-    _VALID_URL = r'http://v\.baidu\.com/(?P<type>[a-z]+)/(?P<id>\d+)\.htm'
+    _VALID_URL = r'https?://v\.baidu\.com/(?P<type>[a-z]+)/(?P<id>\d+)\.htm'
     _TESTS = [{
         'url': 'http://v.baidu.com/comic/1069.htm?frp=bdbrand&q=%E4%B8%AD%E5%8D%8E%E5%B0%8F%E5%BD%93%E5%AE%B6',
         'info_dict': {
             'id': '1069',
-            'title': '中华小当家 TV版 (全52集)',
-            'description': 'md5:395a419e41215e531c857bb037bbaf80',
+            'title': '中华小当家 TV版国语',
+            'description': 'md5:51be07afe461cf99fa61231421b5397c',
         },
         'playlist_count': 52,
     }, {
@@ -25,45 +25,32 @@ class BaiduVideoIE(InfoExtractor):
             'title': 're:^奔跑吧兄弟',
             'description': 'md5:1bf88bad6d850930f542d51547c089b8',
         },
-        'playlist_mincount': 3,
+        'playlist_mincount': 12,
     }]
 
+    def _call_api(self, path, category, playlist_id, note):
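+        # the API multiplexes categories through worktype=adnative<category>
+        # (e.g. adnativetvshow)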
+        return self._download_json('http://app.video.baidu.com/%s/?worktype=adnative%s&id=%s' % (
+            path, category, playlist_id), playlist_id, note)
+
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        playlist_id = mobj.group('id')
-        category = category2 = mobj.group('type')
+        category, playlist_id = re.match(self._VALID_URL, url).groups()
         if category == 'show':
-            category2 = 'tvshow'
-
-        webpage = self._download_webpage(url, playlist_id)
-
-        playlist_title = self._html_search_regex(
-            r'title\s*:\s*(["\'])(?P<title>[^\']+)\1', webpage,
-            'playlist title', group='title')
-        playlist_description = self._html_search_regex(
-            r'<input[^>]+class="j-data-intro"[^>]+value="([^"]+)"/>', webpage,
-            playlist_id, 'playlist description')
+            category = 'tvshow'
+        if category == 'tv':
+            category = 'tvplay'
 
-        site = self._html_search_regex(
-            r'filterSite\s*:\s*["\']([^"]*)["\']', webpage,
-            'primary provider site')
-        api_result = self._download_json(
-            'http://v.baidu.com/%s_intro/?dtype=%sPlayUrl&id=%s&site=%s' % (
-                category, category2, playlist_id, site),
-            playlist_id, 'Get playlist links')
+        playlist_detail = self._call_api(
+            'xqinfo', category, playlist_id, 'Downloading playlist JSON metadata')
 
-        entries = []
-        for episode in api_result[0]['episodes']:
-            episode_id = '%s_%s' % (playlist_id, episode['episode'])
+        playlist_title = playlist_detail['title']
+        playlist_description = unescapeHTML(playlist_detail.get('intro'))
 
-            redirect_page = self._download_webpage(
-                compat_urlparse.urljoin(url, episode['url']), episode_id,
-                note='Download Baidu redirect page')
-            real_url = self._html_search_regex(
-                r'location\.replace\("([^"]+)"\)', redirect_page, 'real URL')
+        episodes_detail = self._call_api(
+            'xqsingle', category, playlist_id, 'Downloading episodes JSON metadata')
 
-            entries.append(self.url_result(
-                real_url, video_title=episode['single_title']))
+        entries = [self.url_result(
+            episode['url'], video_title=episode['title']
+        ) for episode in episodes_detail['videos']]
 
         return self.playlist_result(
             entries, playlist_id, playlist_title, playlist_description)
index 8dff1d6e377c0c246cfc958821b1d18cae4b2b64..0eb1930c2d24bb01acd57292d4fe7e1b9a330bff 100644 (file)
@@ -4,15 +4,13 @@ import re
 import itertools
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
-    compat_str,
-)
+from ..compat import compat_str
 from ..utils import (
     ExtractorError,
-    int_or_none,
     float_or_none,
+    int_or_none,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
@@ -57,8 +55,8 @@ class BambuserIE(InfoExtractor):
             'pass': password,
         }
 
-        request = compat_urllib_request.Request(
-            self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
+        request = sanitized_Request(
+            self._LOGIN_URL, urlencode_postdata(login_form))
         request.add_header('Referer', self._LOGIN_URL)
         response = self._download_webpage(
             request, None, 'Logging in as %s' % username)
@@ -126,7 +124,7 @@ class BambuserChannelIE(InfoExtractor):
                 '&sort=created&access_mode=0%2C1%2C2&limit={count}'
                 '&method=broadcast&format=json&vid_older_than={last}'
             ).format(user=user, count=self._STEP, last=last_id)
-            req = compat_urllib_request.Request(req_url)
+            req = sanitized_Request(req_url)
             # Without setting this header, we wouldn't get any result
             req.add_header('Referer', 'http://bambuser.com/channel/%s' % user)
             data = self._download_json(
index 505877b773d45b36be31d8dea8a6a1766d72d4ca..c1ef8051d3074a6551941bf140f88eee4ed8a124 100644 (file)
@@ -10,6 +10,8 @@ from ..compat import (
 )
 from ..utils import (
     ExtractorError,
+    float_or_none,
+    int_or_none,
 )
 
 
@@ -52,11 +54,11 @@ class BandcampIE(InfoExtractor):
                     ext, abr_str = format_id.split('-', 1)
                     formats.append({
                         'format_id': format_id,
-                        'url': format_url,
+                        'url': self._proto_relative_url(format_url, 'http:'),
                         'ext': ext,
                         'vcodec': 'none',
                         'acodec': ext,
-                        'abr': int(abr_str),
+                        'abr': int_or_none(abr_str),
                     })
 
                 self._sort_formats(formats)
@@ -65,7 +67,7 @@ class BandcampIE(InfoExtractor):
                     'id': compat_str(data['id']),
                     'title': data['title'],
                     'formats': formats,
-                    'duration': float(data['duration']),
+                    'duration': float_or_none(data.get('duration')),
                 }
             else:
                 raise ExtractorError('No free songs found')
@@ -93,8 +95,8 @@ class BandcampIE(InfoExtractor):
         final_url_webpage = self._download_webpage(request_url, video_id, 'Requesting download url')
         # If we could correctly generate the .rand field the url would be
         # in the "download_url" key
-        final_url = self._search_regex(
-            r'"retry_url":"(.*?)"', final_url_webpage, 'final video URL')
+        final_url = self._proto_relative_url(self._search_regex(
+            r'"retry_url":"(.+?)"', final_url_webpage, 'final video URL'), 'http:')
 
         return {
             'id': video_id,
index 9a1b6e3dce7dd3247b0076b36280e7e4e0550c90..74c4510f9b4522b0a914cdf1621bff832ac94638 100644 (file)
@@ -2,7 +2,6 @@
 from __future__ import unicode_literals
 
 import re
-import xml.etree.ElementTree
 
 from .common import InfoExtractor
 from ..utils import (
@@ -11,28 +10,54 @@ from ..utils import (
     int_or_none,
     parse_duration,
     parse_iso8601,
+    unescapeHTML,
+)
+from ..compat import (
+    compat_etree_fromstring,
+    compat_HTTPError,
 )
-from ..compat import compat_HTTPError
 
 
 class BBCCoUkIE(InfoExtractor):
     IE_NAME = 'bbc.co.uk'
     IE_DESC = 'BBC iPlayer'
-    _VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:(?:(?:programmes|iplayer(?:/[^/]+)?/(?:episode|playlist))/)|music/clips[/#])(?P<id>[\da-z]{8})'
+    _ID_REGEX = r'[pb][\da-z]{7}'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?bbc\.co\.uk/
+                        (?:
+                            programmes/(?!articles/)|
+                            iplayer(?:/[^/]+)?/(?:episode/|playlist/)|
+                            music/clips[/#]|
+                            radio/player/
+                        )
+                        (?P<id>%s)
+                    ''' % _ID_REGEX
 
     _MEDIASELECTOR_URLS = [
+        # Provides HQ HLS streams with even better quality than the pc mediaset but fails
+        # with geolocation in some cases even when it's not geo-restricted at all (e.g.
+        # http://www.bbc.co.uk/programmes/b06bp7lf). Also may fail with selectionunavailable.
+        'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/%s',
         'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/pc/vpid/%s',
     ]
 
+    _MEDIASELECTION_NS = 'http://bbc.co.uk/2008/mp/mediaselection'
+    _EMP_PLAYLIST_NS = 'http://bbc.co.uk/2008/emp/playlist'
+
+    _NAMESPACES = (
+        _MEDIASELECTION_NS,
+        _EMP_PLAYLIST_NS,
+    )
+
     _TESTS = [
         {
             'url': 'http://www.bbc.co.uk/programmes/b039g8p7',
             'info_dict': {
                 'id': 'b039d07m',
                 'ext': 'flv',
-                'title': 'Kaleidoscope, Leonard Cohen',
+                'title': 'Leonard Cohen, Kaleidoscope - BBC Radio 4',
                 'description': 'The Canadian poet and songwriter reflects on his musical career.',
-                'duration': 1740,
             },
             'params': {
                 # rtmp download
@@ -60,7 +85,7 @@ class BBCCoUkIE(InfoExtractor):
                 'id': 'b00yng1d',
                 'ext': 'flv',
                 'title': 'The Voice UK: Series 3: Blind Auditions 5',
-                'description': "Emma Willis and Marvin Humes present the fifth set of blind auditions in the singing competition, as the coaches continue to build their teams based on voice alone.",
+                'description': 'Emma Willis and Marvin Humes present the fifth set of blind auditions in the singing competition, as the coaches continue to build their teams based on voice alone.',
                 'duration': 5100,
             },
             'params': {
@@ -95,16 +120,17 @@ class BBCCoUkIE(InfoExtractor):
             'params': {
                 # rtmp download
                 'skip_download': True,
-            }
+            },
+            'skip': 'Episode is no longer available on BBC iPlayer Radio',
         }, {
-            'url': 'http://www.bbc.co.uk/music/clips/p02frcc3',
+            'url': 'http://www.bbc.co.uk/music/clips/p022h44b',
             'note': 'Audio',
             'info_dict': {
-                'id': 'p02frcch',
+                'id': 'p022h44j',
                 'ext': 'flv',
-                'title': 'Pete Tong, Past, Present and Future Special, Madeon - After Hours mix',
-                'description': 'French house superstar Madeon takes us out of the club and onto the after party.',
-                'duration': 3507,
+                'title': 'BBC Proms Music Guides, Rachmaninov: Symphonic Dances',
+                'description': "In this Proms Music Guide, Andrew McGregor looks at Rachmaninov's Symphonic Dances.",
+                'duration': 227,
             },
             'params': {
                 # rtmp download
@@ -152,6 +178,33 @@ class BBCCoUkIE(InfoExtractor):
                 'skip_download': True,
             },
             'skip': 'geolocation',
+        }, {
+            # iptv-all mediaset fails with geolocation however there is no geo restriction
+            # for this programme at all
+            'url': 'http://www.bbc.co.uk/programmes/b06rkn85',
+            'info_dict': {
+                'id': 'b06rkms3',
+                'ext': 'flv',
+                'title': "Best of the Mini-Mixes 2015: Part 3, Annie Mac's Friday Night - BBC Radio 1",
+                'description': "Annie has part three in the Best of the Mini-Mixes 2015, plus the year's Most Played!",
+            },
+            'params': {
+                # rtmp download
+                'skip_download': True,
+            },
+        }, {
+            # compact player (https://github.com/rg3/youtube-dl/issues/8147)
+            'url': 'http://www.bbc.co.uk/programmes/p028bfkf/player',
+            'info_dict': {
+                'id': 'p028bfkj',
+                'ext': 'flv',
+                'title': 'Extract from BBC documentary Look Stranger - Giant Leeks and Magic Brews',
+                'description': 'Extract from BBC documentary Look Stranger - Giant Leeks and Magic Brews',
+            },
+            'params': {
+                # rtmp download
+                'skip_download': True,
+            },
         }, {
             'url': 'http://www.bbc.co.uk/iplayer/playlist/p01dvks4',
             'only_matching': True,
@@ -161,6 +214,9 @@ class BBCCoUkIE(InfoExtractor):
         }, {
             'url': 'http://www.bbc.co.uk/iplayer/cbeebies/episode/b0480276/bing-14-atchoo',
             'only_matching': True,
+        }, {
+            'url': 'http://www.bbc.co.uk/radio/player/p03cchwf',
+            'only_matching': True,
         }
     ]
 
@@ -174,6 +230,7 @@ class BBCCoUkIE(InfoExtractor):
 
     def _extract_connection(self, connection, programme_id):
         formats = []
+        kind = connection.get('kind')
         protocol = connection.get('protocol')
         supplier = connection.get('supplier')
         if protocol == 'http':
@@ -189,11 +246,15 @@ class BBCCoUkIE(InfoExtractor):
             # Skip DASH until supported
             elif transfer_format == 'dash':
                 pass
+            elif transfer_format == 'hls':
+                formats.extend(self._extract_m3u8_formats(
+                    href, programme_id, ext='mp4', entry_protocol='m3u8_native',
+                    m3u8_id=supplier, fatal=False))
             # Direct link
             else:
                 formats.append({
                     'url': href,
-                    'format_id': supplier,
+                    'format_id': supplier or kind or protocol,
                 })
         elif protocol == 'rtmp':
             application = connection.get('application', 'ondemand')
@@ -213,16 +274,24 @@ class BBCCoUkIE(InfoExtractor):
         return formats
 
     def _extract_items(self, playlist):
-        return playlist.findall('./{http://bbc.co.uk/2008/emp/playlist}item')
+        return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS)
+
+    def _findall_ns(self, element, xpath):
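+        # collect matches under both the mediaselection and emp playlist namespaces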
+        elements = []
+        for ns in self._NAMESPACES:
+            elements.extend(element.findall(xpath % ns))
+        return elements
 
     def _extract_medias(self, media_selection):
-        error = media_selection.find('./{http://bbc.co.uk/2008/mp/mediaselection}error')
+        error = media_selection.find('./{%s}error' % self._MEDIASELECTION_NS)
+        if error is None:
+            error = media_selection.find('./{%s}error' % self._EMP_PLAYLIST_NS)
         if error is not None:
             raise BBCCoUkIE.MediaSelectionError(error.get('id'))
-        return media_selection.findall('./{http://bbc.co.uk/2008/mp/mediaselection}media')
+        return self._findall_ns(media_selection, './{%s}media')
 
     def _extract_connections(self, media):
-        return media.findall('./{http://bbc.co.uk/2008/mp/mediaselection}connection')
+        return self._findall_ns(media, './{%s}connection')
 
     def _extract_video(self, media, programme_id):
         formats = []
@@ -236,13 +305,14 @@ class BBCCoUkIE(InfoExtractor):
             conn_formats = self._extract_connection(connection, programme_id)
             for format in conn_formats:
                 format.update({
-                    'format_id': '%s_%s' % (service, format['format_id']),
                     'width': width,
                     'height': height,
                     'vbr': vbr,
                     'vcodec': vcodec,
                     'filesize': file_size,
                 })
+                if service:
+                    format['format_id'] = '%s_%s' % (service, format['format_id'])
             formats.extend(conn_formats)
         return formats
 
@@ -258,6 +328,7 @@ class BBCCoUkIE(InfoExtractor):
                     'format_id': '%s_%s' % (service, format['format_id']),
                     'abr': abr,
                     'acodec': acodec,
+                    'vcodec': 'none',
                 })
             formats.extend(conn_formats)
         return formats
@@ -287,7 +358,7 @@ class BBCCoUkIE(InfoExtractor):
                 return self._download_media_selector_url(
                     mediaselector_url % programme_id, programme_id)
             except BBCCoUkIE.MediaSelectionError as e:
-                if e.id == 'notukerror':
+                if e.id in ('notukerror', 'geolocation', 'selectionunavailable'):
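+                    # recoverable per-mediaset failures: remember and try the next URL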
                     last_exception = e
                     continue
                 self._raise_extractor_error(e)
@@ -298,8 +369,8 @@ class BBCCoUkIE(InfoExtractor):
             media_selection = self._download_xml(
                 url, programme_id, 'Downloading media selection XML')
         except ExtractorError as ee:
-            if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
-                media_selection = xml.etree.ElementTree.fromstring(ee.cause.read().decode('utf-8'))
+            if isinstance(ee.cause, compat_HTTPError) and ee.cause.code in (403, 404):
+                media_selection = compat_etree_fromstring(ee.cause.read().decode('utf-8'))
             else:
                 raise
         return self._process_media_selector(media_selection, programme_id)
@@ -357,7 +428,7 @@ class BBCCoUkIE(InfoExtractor):
             url, playlist_id, 'Downloading legacy playlist XML')
 
     def _extract_from_legacy_playlist(self, playlist, playlist_id):
-        no_items = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}noItems')
+        no_items = playlist.find('./{%s}noItems' % self._EMP_PLAYLIST_NS)
         if no_items is not None:
             reason = no_items.get('reason')
             if reason == 'preAvailability':
@@ -374,8 +445,9 @@ class BBCCoUkIE(InfoExtractor):
             kind = item.get('kind')
             if kind != 'programme' and kind != 'radioProgramme':
                 continue
-            title = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}title').text
-            description = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}summary').text
+            title = playlist.find('./{%s}title' % self._EMP_PLAYLIST_NS).text
+            description_el = playlist.find('./{%s}summary' % self._EMP_PLAYLIST_NS)
+            description = description_el.text if description_el is not None else None
 
             def get_programme_id(item):
                 def get_from_attributes(item):
@@ -384,16 +456,18 @@ class BBCCoUkIE(InfoExtractor):
                         if value and re.match(r'^[pb][\da-z]{7}$', value):
                             return value
                 get_from_attributes(item)
-                mediator = item.find('./{http://bbc.co.uk/2008/emp/playlist}mediator')
+                mediator = item.find('./{%s}mediator' % self._EMP_PLAYLIST_NS)
                 if mediator is not None:
                     return get_from_attributes(mediator)
 
             programme_id = get_programme_id(item)
             duration = int_or_none(item.get('duration'))
-            # TODO: programme_id can be None and media items can be incorporated right inside
-            # playlist's item (e.g. http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu)
-            # as f4m and m3u8
-            formats, subtitles = self._download_media_selector(programme_id)
+
+            if programme_id:
+                formats, subtitles = self._download_media_selector(programme_id)
+            else:
+                formats, subtitles = self._process_media_selector(item, playlist_id)
+                programme_id = playlist_id
 
         return programme_id, title, description, duration, formats, subtitles
 
@@ -403,6 +477,7 @@ class BBCCoUkIE(InfoExtractor):
         webpage = self._download_webpage(url, group_id, 'Downloading video page')
 
         programme_id = None
+        duration = None
 
         tviplayer = self._search_regex(
             r'mediator\.bind\(({.+?})\s*,\s*document\.getElementById',
@@ -415,14 +490,19 @@ class BBCCoUkIE(InfoExtractor):
 
         if not programme_id:
             programme_id = self._search_regex(
-                r'"vpid"\s*:\s*"([\da-z]{8})"', webpage, 'vpid', fatal=False, default=None)
+                r'"vpid"\s*:\s*"(%s)"' % self._ID_REGEX, webpage, 'vpid', fatal=False, default=None)
 
         if programme_id:
             formats, subtitles = self._download_media_selector(programme_id)
-            title = self._og_search_title(webpage)
+            title = self._og_search_title(webpage, default=None) or self._html_search_regex(
+                (r'<h2[^>]+id="parent-title"[^>]*>(.+?)</h2>',
+                 r'<div[^>]+class="info"[^>]*>\s*<h1>(.+?)</h1>'), webpage, 'title')
             description = self._search_regex(
-                r'<p class="[^"]*medium-description[^"]*">([^<]+)</p>',
-                webpage, 'description', fatal=False)
+                (r'<p class="[^"]*medium-description[^"]*">([^<]+)</p>',
+                 r'<div[^>]+class="info_+synopsis"[^>]*>([^<]+)</div>'),
+                webpage, 'description', default=None)
+            if not description:
+                description = self._html_search_meta('description', webpage)
         else:
             programme_id, title, description, duration, formats, subtitles = self._download_playlist(group_id)
 
@@ -445,6 +525,9 @@ class BBCIE(BBCCoUkIE):
     _VALID_URL = r'https?://(?:www\.)?bbc\.(?:com|co\.uk)/(?:[^/]+/)+(?P<id>[^/#?]+)'
 
     _MEDIASELECTOR_URLS = [
+        # Provides HQ HLS streams but fails with geolocation in some cases
+        # even when it's not geo-restricted at all
+        'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/%s',
         # Provides more formats, namely direct mp4 links, but fails on some videos with
         # notukerror for non UK (?) users (e.g.
         # http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
@@ -454,8 +537,7 @@ class BBCIE(BBCCoUkIE):
     ]
 
     _TESTS = [{
-        # article with multiple videos embedded with data-media-meta containing
-        # playlist.sxml, externalId and no direct video links
+        # article with multiple videos embedded with data-playable containing vpids
         'url': 'http://www.bbc.com/news/world-europe-32668511',
         'info_dict': {
             'id': 'world-europe-32668511',
@@ -464,7 +546,7 @@ class BBCIE(BBCCoUkIE):
         },
         'playlist_count': 2,
     }, {
-        # article with multiple videos embedded with data-media-meta (more videos)
+        # article with multiple videos embedded with data-playable (more videos)
         'url': 'http://www.bbc.com/news/business-28299555',
         'info_dict': {
             'id': 'business-28299555',
@@ -475,19 +557,21 @@ class BBCIE(BBCCoUkIE):
         'skip': 'Save time',
     }, {
         # article with multiple videos embedded with `new SMP()`
+        # broken
         'url': 'http://www.bbc.co.uk/blogs/adamcurtis/entries/3662a707-0af9-3149-963f-47bea720b460',
         'info_dict': {
             'id': '3662a707-0af9-3149-963f-47bea720b460',
-            'title': 'BBC Blogs - Adam Curtis - BUGGER',
+            'title': 'BUGGER',
         },
         'playlist_count': 18,
     }, {
-        # single video embedded with mediaAssetPage.init()
+        # single video embedded with data-playable containing vpid
         'url': 'http://www.bbc.com/news/world-europe-32041533',
         'info_dict': {
             'id': 'p02mprgb',
             'ext': 'mp4',
             'title': 'Aerial footage showed the site of the crash in the Alps - courtesy BFM TV',
+            'description': 'md5:2868290467291b37feda7863f7a83f54',
             'duration': 47,
             'timestamp': 1427219242,
             'upload_date': '20150324',
@@ -497,15 +581,14 @@ class BBCIE(BBCCoUkIE):
             'skip_download': True,
         }
     }, {
-        # article with single video embedded with data-media-meta containing
-        # direct video links (for now these are extracted) and playlist.xml (with
-        # media items as f4m and m3u8 - currently unsupported)
+        # article with single video embedded with data-playable containing XML playlist
+        # with direct video links as progressiveDownloadUrl (for now these are extracted)
+        # and playlist with f4m and m3u8 as streamingUrl
         'url': 'http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu',
         'info_dict': {
             'id': '150615_telabyad_kentin_cogu',
             'ext': 'mp4',
             'title': "YPG: Tel Abyad'ın tamamı kontrolümüzde",
-            'duration': 47,
             'timestamp': 1434397334,
             'upload_date': '20150615',
         },
@@ -513,19 +596,31 @@ class BBCIE(BBCCoUkIE):
             'skip_download': True,
         }
     }, {
-        # single video embedded with mediaAssetPage.init() (regional section)
+        # single video embedded with data-playable containing XML playlists (regional section)
         'url': 'http://www.bbc.com/mundo/video_fotos/2015/06/150619_video_honduras_militares_hospitales_corrupcion_aw',
         'info_dict': {
             'id': '150619_video_honduras_militares_hospitales_corrupcion_aw',
             'ext': 'mp4',
             'title': 'Honduras militariza sus hospitales por nuevo escándalo de corrupción',
-            'duration': 87,
             'timestamp': 1434713142,
             'upload_date': '20150619',
         },
         'params': {
             'skip_download': True,
         }
+    }, {
+        # single video from video playlist embedded with vxp-playlist-data JSON
+        'url': 'http://www.bbc.com/news/video_and_audio/must_see/33376376',
+        'info_dict': {
+            'id': 'p02w6qjc',
+            'ext': 'mp4',
+            'title': '''Judge Mindy Glazer: "I'm sorry to see you here... I always wondered what happened to you"''',
+            'duration': 56,
+            'description': '''Judge Mindy Glazer: "I'm sorry to see you here... I always wondered what happened to you"''',
+        },
+        'params': {
+            'skip_download': True,
+        }
     }, {
         # single video story with digitalData
         'url': 'http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret',
@@ -549,27 +644,44 @@ class BBCIE(BBCCoUkIE):
             'ext': 'mp4',
             'title': 'Hyundai Santa Fe Sport: Rock star',
             'description': 'md5:b042a26142c4154a6e472933cf20793d',
-            'timestamp': 1368473503,
-            'upload_date': '20130513',
+            'timestamp': 1415867444,
+            'upload_date': '20141113',
         },
         'params': {
             # rtmp download
             'skip_download': True,
         }
     }, {
-        # single video with playlist.sxml URL
+        # single video with playlist.sxml URL in playlist param
         'url': 'http://www.bbc.com/sport/0/football/33653409',
         'info_dict': {
             'id': 'p02xycnp',
             'ext': 'mp4',
             'title': 'Transfers: Cristiano Ronaldo to Man Utd, Arsenal to spend?',
-            'description': 'md5:398fca0e2e701c609d726e034fa1fc89',
+            'description': 'BBC Sport\'s David Ornstein has the latest transfer gossip, including rumours of a Manchester United return for Cristiano Ronaldo.',
             'duration': 140,
         },
         'params': {
             # rtmp download
             'skip_download': True,
         }
+    }, {
+        # article with multiple videos embedded with playlist.sxml in playlist param
+        'url': 'http://www.bbc.com/sport/0/football/34475836',
+        'info_dict': {
+            'id': '34475836',
+            'title': 'Jurgen Klopp: Furious football from a witty and winning coach',
+            'description': 'Fast-paced football, wit, wisdom and a ready smile - why Liverpool fans should come to love new boss Jurgen Klopp.',
+        },
+        'playlist_count': 3,
+    }, {
+        # school report article with single video
+        'url': 'http://www.bbc.co.uk/schoolreport/35744779',
+        'info_dict': {
+            'id': '35744779',
+            'title': 'School which breaks down barriers in Jerusalem',
+        },
+        'playlist_count': 1,
     }, {
         # single video with playlist URL from weather section
         'url': 'http://www.bbc.com/weather/features/33601775',
@@ -578,11 +690,15 @@ class BBCIE(BBCCoUkIE):
         # custom redirection to www.bbc.com
         'url': 'http://www.bbc.co.uk/news/science-environment-33661876',
         'only_matching': True,
+    }, {
+        # single video article embedded with data-media-vpid
+        'url': 'http://www.bbc.co.uk/sport/rowing/35908187',
+        'only_matching': True,
     }]
 
     @classmethod
     def suitable(cls, url):
-        return False if BBCCoUkIE.suitable(url) else super(BBCIE, cls).suitable(url)
+        return False if BBCCoUkIE.suitable(url) or BBCCoUkArticleIE.suitable(url) else super(BBCIE, cls).suitable(url)
 
     def _extract_from_media_meta(self, media_meta, video_id):
         # Direct links to media in media metadata (e.g.
@@ -611,40 +727,107 @@ class BBCIE(BBCCoUkIE):
 
         return [], []
 
+    def _extract_from_playlist_sxml(self, url, playlist_id, timestamp):
+        programme_id, title, description, duration, formats, subtitles = \
+            self._process_legacy_playlist_url(url, playlist_id)
+        self._sort_formats(formats)
+        return {
+            'id': programme_id,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'timestamp': timestamp,
+            'formats': formats,
+            'subtitles': subtitles,
+        }
+
     def _real_extract(self, url):
         playlist_id = self._match_id(url)
 
         webpage = self._download_webpage(url, playlist_id)
 
-        timestamp = parse_iso8601(self._search_regex(
-            [r'"datePublished":\s*"([^"]+)',
-             r'<meta[^>]+property="article:published_time"[^>]+content="([^"]+)"',
-             r'itemprop="datePublished"[^>]+datetime="([^"]+)"'],
-            webpage, 'date', default=None))
-
-        # single video with playlist.sxml URL (e.g. http://www.bbc.com/sport/0/football/3365340ng)
-        playlist = self._search_regex(
-            r'<param[^>]+name="playlist"[^>]+value="([^"]+)"',
-            webpage, 'playlist', default=None)
-        if playlist:
-            programme_id, title, description, duration, formats, subtitles = \
-                self._process_legacy_playlist_url(playlist, playlist_id)
-            self._sort_formats(formats)
-            return {
-                'id': programme_id,
-                'title': title,
-                'description': description,
-                'duration': duration,
-                'timestamp': timestamp,
-                'formats': formats,
-                'subtitles': subtitles,
-            }
+        json_ld_info = self._search_json_ld(webpage, playlist_id, default=None)
+        timestamp = json_ld_info.get('timestamp')
+
+        playlist_title = json_ld_info.get('title')
+        if not playlist_title:
+            playlist_title = self._og_search_title(
+                webpage, default=None) or self._html_search_regex(
+                r'<title>(.+?)</title>', webpage, 'playlist title', default=None)
+            if playlist_title:
+                playlist_title = re.sub(r'(.+)\s*-\s*BBC.*?$', r'\1', playlist_title).strip()
+
+        playlist_description = json_ld_info.get(
+            'description') or self._og_search_description(webpage, default=None)
+
+        if not timestamp:
+            timestamp = parse_iso8601(self._search_regex(
+                [r'<meta[^>]+property="article:published_time"[^>]+content="([^"]+)"',
+                 r'itemprop="datePublished"[^>]+datetime="([^"]+)"',
+                 r'"datePublished":\s*"([^"]+)'],
+                webpage, 'date', default=None))
+
+        entries = []
+
+        # article with multiple videos embedded with playlist.sxml (e.g.
+        # http://www.bbc.com/sport/0/football/34475836)
+        playlists = re.findall(r'<param[^>]+name="playlist"[^>]+value="([^"]+)"', webpage)
+        playlists.extend(re.findall(r'data-media-id="([^"]+/playlist\.sxml)"', webpage))
+        if playlists:
+            entries = [
+                self._extract_from_playlist_sxml(playlist_url, playlist_id, timestamp)
+                for playlist_url in playlists]
+
+        # news article with multiple videos embedded with data-playable
+        data_playables = re.findall(r'data-playable=(["\'])({.+?})\1', webpage)
+        if data_playables:
+            for _, data_playable_json in data_playables:
+                data_playable = self._parse_json(
+                    unescapeHTML(data_playable_json), playlist_id, fatal=False)
+                if not data_playable:
+                    continue
+                settings = data_playable.get('settings', {})
+                if settings:
+                    # data-playable with video vpid in settings.playlistObject.items (e.g.
+                    # http://www.bbc.com/news/world-us-canada-34473351)
+                    playlist_object = settings.get('playlistObject', {})
+                    if playlist_object:
+                        items = playlist_object.get('items')
+                        if items and isinstance(items, list):
+                            title = playlist_object['title']
+                            description = playlist_object.get('summary')
+                            duration = int_or_none(items[0].get('duration'))
+                            programme_id = items[0].get('vpid')
+                            formats, subtitles = self._download_media_selector(programme_id)
+                            self._sort_formats(formats)
+                            entries.append({
+                                'id': programme_id,
+                                'title': title,
+                                'description': description,
+                                'timestamp': timestamp,
+                                'duration': duration,
+                                'formats': formats,
+                                'subtitles': subtitles,
+                            })
+                    else:
+                        # data-playable without vpid but with a playlist.sxml URLs
+                        # in otherSettings.playlist (e.g.
+                        # http://www.bbc.com/turkce/multimedya/2015/10/151010_vid_ankara_patlama_ani)
+                        playlist = data_playable.get('otherSettings', {}).get('playlist', {})
+                        if playlist:
+                            entries.append(self._extract_from_playlist_sxml(
+                                playlist.get('progressiveDownloadUrl'), playlist_id, timestamp))
+
+        if entries:
+            return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
 
         # single video story (e.g. http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
         programme_id = self._search_regex(
-            [r'data-video-player-vpid="([\da-z]{8})"',
-             r'<param[^>]+name="externalIdentifier"[^>]+value="([\da-z]{8})"'],
+            [r'data-(?:video-player|media)-vpid="(%s)"' % self._ID_REGEX,
+             r'<param[^>]+name="externalIdentifier"[^>]+value="(%s)"' % self._ID_REGEX,
+             r'videoId\s*:\s*["\'](%s)["\']' % self._ID_REGEX],
             webpage, 'vpid', default=None)
+
         if programme_id:
             formats, subtitles = self._download_media_selector(programme_id)
             self._sort_formats(formats)
@@ -666,10 +849,6 @@ class BBCIE(BBCCoUkIE):
                 'subtitles': subtitles,
             }
 
-        playlist_title = self._html_search_regex(
-            r'<title>(.*?)(?:\s*-\s*BBC [^ ]+)?</title>', webpage, 'playlist title')
-        playlist_description = self._og_search_description(webpage, default=None)
-
         def extract_all(pattern):
             return list(filter(None, map(
                 lambda s: self._parse_json(s, playlist_id, fatal=False),
@@ -677,7 +856,7 @@ class BBCIE(BBCCoUkIE):
 
         # Multiple video article (e.g.
         # http://www.bbc.co.uk/blogs/adamcurtis/entries/3662a707-0af9-3149-963f-47bea720b460)
-        EMBED_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:[^/]+/)+[\da-z]{8}(?:\b[^"]+)?'
+        EMBED_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:[^/]+/)+%s(?:\b[^"]+)?' % self._ID_REGEX
         entries = []
         for match in extract_all(r'new\s+SMP\(({.+?})\)'):
             embed_url = match.get('playerSettings', {}).get('externalEmbedUrl')
@@ -695,13 +874,36 @@ class BBCIE(BBCCoUkIE):
 
         if not medias:
             # Single video article (e.g. http://www.bbc.com/news/video_and_audio/international)
-            media_asset_page = self._parse_json(
+            media_asset = self._search_regex(
+                r'mediaAssetPage\.init\(\s*({.+?}), "/',
+                webpage, 'media asset', default=None)
+            if media_asset:
+                media_asset_page = self._parse_json(media_asset, playlist_id, fatal=False)
+                medias = []
+                for video in media_asset_page.get('videos', {}).values():
+                    medias.extend(video.values())
+
+        if not medias:
+            # Multiple video playlist with single `now playing` entry (e.g.
+            # http://www.bbc.com/news/video_and_audio/must_see/33767813)
+            vxp_playlist = self._parse_json(
                 self._search_regex(
-                    r'mediaAssetPage\.init\(\s*({.+?}), "/', webpage, 'media asset'),
+                    r'<script[^>]+class="vxp-playlist-data"[^>]+type="application/json"[^>]*>([^<]+)</script>',
+                    webpage, 'playlist data'),
                 playlist_id)
-            medias = []
-            for video in media_asset_page.get('videos', {}).values():
-                medias.extend(video.values())
+            playlist_medias = []
+            for item in vxp_playlist:
+                media = item.get('media')
+                if not media:
+                    continue
+                playlist_medias.append(media)
+                # Download single video if found media with asset id matching the video id from URL
+                if item.get('advert', {}).get('assetId') == playlist_id:
+                    medias = [media]
+                    break
+            # Fallback to the whole playlist
+            if not medias:
+                medias = playlist_medias
 
         entries = []
         for num, media_meta in enumerate(medias, start=1):
@@ -743,3 +945,33 @@ class BBCIE(BBCCoUkIE):
             })
 
         return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
+
+
+class BBCCoUkArticleIE(InfoExtractor):
+    _VALID_URL = r'https?://www\.bbc\.co\.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)'
+    IE_NAME = 'bbc.co.uk:article'
+    IE_DESC = 'BBC articles'
+
+    _TEST = {
+        'url': 'http://www.bbc.co.uk/programmes/articles/3jNQLTMrPlYGTBn0WV6M2MS/not-your-typical-role-model-ada-lovelace-the-19th-century-programmer',
+        'info_dict': {
+            'id': '3jNQLTMrPlYGTBn0WV6M2MS',
+            'title': 'Calculating Ada: The Countess of Computing - Not your typical role model: Ada Lovelace the 19th century programmer - BBC Four',
+            'description': 'Hannah Fry reveals some of her surprising discoveries about Ada Lovelace during filming.',
+        },
+        'playlist_count': 4,
+        'add_ie': ['BBCCoUk'],
+    }
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, playlist_id)
+
+        title = self._og_search_title(webpage)
+        description = self._og_search_description(webpage).strip()
+
+        entries = [self.url_result(programme_url) for programme_url in re.findall(
+            r'<div[^>]+typeof="Clip"[^>]+resource="([^"]+)"', webpage)]
+
+        return self.playlist_result(entries, playlist_id, title, description)
index b38057f2f500f520829ff9c5d7324e66558eb356..956c7680e2ecc46a1df493947ed0be7b973d81b8 100644 (file)
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
+from ..compat import (
+    compat_chr,
+    compat_ord,
+    compat_urllib_parse_unquote,
+)
+from ..utils import (
+    int_or_none,
+    parse_iso8601,
+)
 
 
 class BeegIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?beeg\.com/(?P<id>\d+)'
     _TEST = {
         'url': 'http://beeg.com/5416503',
-        'md5': '1bff67111adb785c51d1b42959ec10e5',
+        'md5': '46c384def73b33dbc581262e5ee67cef',
         'info_dict': {
             'id': '5416503',
             'ext': 'mp4',
             'title': 'Sultry Striptease',
-            'description': 'md5:6db3c6177972822aaba18652ff59c773',
-            'categories': list,  # NSFW
-            'thumbnail': 're:https?://.*\.jpg$',
+            'description': 'md5:d22219c09da287c14bed3d6c37ce4bc2',
+            'timestamp': 1391813355,
+            'upload_date': '20140207',
+            'duration': 383,
+            'tags': list,
             'age_limit': 18,
         }
     }
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
 
         webpage = self._download_webpage(url, video_id)
 
-        quality_arr = self._search_regex(
-            r'(?s)var\s+qualityArr\s*=\s*{\s*(.+?)\s*}', webpage, 'quality formats')
+        cpl_url = self._search_regex(
+            r'<script[^>]+src=(["\'])(?P<url>(?:https?:)?//static\.beeg\.com/cpl/\d+\.js.*?)\1',
+            webpage, 'cpl', default=None, group='url')
+
+        beeg_version, beeg_salt = [None] * 2
+
+        if cpl_url:
+            cpl = self._download_webpage(
+                self._proto_relative_url(cpl_url), video_id,
+                'Downloading cpl JS', fatal=False)
+            if cpl:
+                beeg_version = self._search_regex(
+                    r'beeg_version\s*=\s*(\d+)', cpl,
+                    'beeg version', default=None) or self._search_regex(
+                    r'/(\d+)\.js', cpl_url, 'beeg version', default=None)
+                beeg_salt = self._search_regex(
+                    r'beeg_salt\s*=\s*(["\'])(?P<beeg_salt>.+?)\1', cpl, 'beeg salt',
+                    default=None, group='beeg_salt')
+
+        beeg_version = beeg_version or '1750'
+        beeg_salt = beeg_salt or 'MIDtGaw96f0N1kMMAM1DE46EC9pmFr'
 
-        formats = [{
-            'url': fmt[1],
-            'format_id': fmt[0],
-            'height': int(fmt[0][:-1]),
-        } for fmt in re.findall(r"'([^']+)'\s*:\s*'([^']+)'", quality_arr)]
+        video = self._download_json(
+            'http://api.beeg.com/api/v6/%s/video/%s' % (beeg_version, video_id),
+            video_id)
 
+        def split(o, e):
+            def cut(s, x):
+                n.append(s[:x])
+                return s[x:]
+            n = []
+            r = len(o) % e
+            if r > 0:
+                o = cut(o, r)
+            while len(o) > e:
+                o = cut(o, e)
+            n.append(o)
+            return n
+
+        def decrypt_key(key):
+            # Reverse engineered from http://static.beeg.com/cpl/1738.js
+            a = beeg_salt
+            e = compat_urllib_parse_unquote(key)
+            o = ''.join([
+                compat_chr(compat_ord(e[n]) - compat_ord(a[n % len(a)]) % 21)
+                for n in range(len(e))])
+            return ''.join(split(o, 3)[::-1])
+
+        def decrypt_url(encrypted_url):
+            encrypted_url = self._proto_relative_url(
+                encrypted_url.replace('{DATA_MARKERS}', ''), 'https:')
+            key = self._search_regex(
+                r'/key=(.*?)%2Cend=', encrypted_url, 'key', default=None)
+            if not key:
+                return encrypted_url
+            return encrypted_url.replace(key, decrypt_key(key))
+
+        formats = []
+        for format_id, video_url in video.items():
+            if not video_url:
+                continue
+            height = self._search_regex(
+                r'^(\d+)[pP]$', format_id, 'height', default=None)
+            if not height:
+                continue
+            formats.append({
+                'url': decrypt_url(video_url),
+                'format_id': format_id,
+                'height': int(height),
+            })
         self._sort_formats(formats)
 
-        title = self._html_search_regex(
-            r'<title>([^<]+)\s*-\s*beeg\.?</title>', webpage, 'title')
+        title = video['title']
+        video_id = video.get('id') or video_id
+        display_id = video.get('code')
+        description = video.get('desc')
 
-        description = self._html_search_regex(
-            r'<meta name="description" content="([^"]*)"',
-            webpage, 'description', fatal=False)
-        thumbnail = self._html_search_regex(
-            r'\'previewer.url\'\s*:\s*"([^"]*)"',
-            webpage, 'thumbnail', fatal=False)
+        timestamp = parse_iso8601(video.get('date'), ' ')
+        duration = int_or_none(video.get('duration'))
 
-        categories_str = self._html_search_regex(
-            r'<meta name="keywords" content="([^"]+)"', webpage, 'categories', fatal=False)
-        categories = (
-            None if categories_str is None
-            else categories_str.split(','))
+        tags = [tag.strip() for tag in video['tags'].split(',')] if video.get('tags') else None
 
         return {
             'id': video_id,
+            'display_id': display_id,
             'title': title,
             'description': description,
-            'thumbnail': thumbnail,
-            'categories': categories,
+            'timestamp': timestamp,
+            'duration': duration,
+            'tags': tags,
             'formats': formats,
-            'age_limit': 18,
+            'age_limit': self._rta_search(webpage),
         }
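
The `split`/`decrypt_key` pair above is easier to follow outside the extractor. This is a minimal re-statement using plain `chr`/`ord` instead of the compat shims; the key and salt below are dummy values chosen for a readable result, not real beeg material:

try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote  # Python 2


def split(o, e):
    # chop o into chunks of length e, any remainder chunk first
    n = []
    r = len(o) % e
    if r > 0:
        n.append(o[:r])
        o = o[r:]
    while len(o) > e:
        n.append(o[:e])
        o = o[e:]
    n.append(o)
    return n


def decrypt_key(key, salt):
    e = unquote(key)
    # note the precedence: the salt byte is reduced modulo 21 *before*
    # being subtracted from the key byte
    o = ''.join(
        chr(ord(e[n]) - ord(salt[n % len(salt)]) % 21)
        for n in range(len(e)))
    # reassemble from three-character chunks in reverse order
    return ''.join(split(o, 3)[::-1])


print(decrypt_key('nopqrstuvw', 'aaaaaaaaaa'))  # hijefgbcda
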
index 1bdc25812b6afb4cf133007f2d12b89fd56b353f..9bca853b32979a4e2700f5d121c24a08fd875224 100644 (file)
@@ -8,7 +8,7 @@ from ..utils import url_basename
 
 
 class BehindKinkIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?behindkink\.com/(?P<year>[0-9]{4})/(?P<month>[0-9]{2})/(?P<day>[0-9]{2})/(?P<id>[^/#?_]+)'
+    _VALID_URL = r'https?://(?:www\.)?behindkink\.com/(?P<year>[0-9]{4})/(?P<month>[0-9]{2})/(?P<day>[0-9]{2})/(?P<id>[^/#?_]+)'
     _TEST = {
         'url': 'http://www.behindkink.com/2014/12/05/what-are-you-passionate-about-marley-blaze/',
         'md5': '507b57d8fdcd75a41a9a7bdb7989c762',
index 03dad4636afdf0443735fde8f1d643aea553ba10..986245bf0568e8aaaaab8b8a32eeedca866b21cc 100644 (file)
@@ -94,6 +94,7 @@ class BetIE(InfoExtractor):
             xpath_with_ns('./media:thumbnail', NS_MAP)).get('url')
 
         formats = self._extract_smil_formats(smil_url, display_id)
+        self._sort_formats(formats)
 
         return {
             'id': video_id,
diff --git a/youtube_dl/extractor/bigflix.py b/youtube_dl/extractor/bigflix.py
new file mode 100644 (file)
index 0000000..33762ad
--- /dev/null
@@ -0,0 +1,85 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import base64
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_urllib_parse_unquote
+
+
+class BigflixIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?bigflix\.com/.+/(?P<id>[0-9]+)'
+    _TESTS = [{
+        'url': 'http://www.bigflix.com/Hindi-movies/Action-movies/Singham-Returns/16537',
+        'md5': 'ec76aa9b1129e2e5b301a474e54fab74',
+        'info_dict': {
+            'id': '16537',
+            'ext': 'mp4',
+            'title': 'Singham Returns',
+            'description': 'md5:3d2ba5815f14911d5cc6a501ae0cf65d',
+        }
+    }, {
+        # 2 formats
+        'url': 'http://www.bigflix.com/Tamil-movies/Drama-movies/Madarasapatinam/16070',
+        'info_dict': {
+            'id': '16070',
+            'ext': 'mp4',
+            'title': 'Madarasapatinam',
+            'description': 'md5:63b9b8ed79189c6f0418c26d9a3452ca',
+            'formats': 'mincount:2',
+        },
+        'params': {
+            'skip_download': True,
+        }
+    }, {
+        # multiple formats
+        'url': 'http://www.bigflix.com/Malayalam-movies/Drama-movies/Indian-Rupee/15967',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        title = self._html_search_regex(
+            r'<div[^>]+class=["\']pagetitle["\'][^>]*>(.+?)</div>',
+            webpage, 'title')
+
+        def decode_url(quoted_b64_url):
+            return base64.b64decode(compat_urllib_parse_unquote(
+                quoted_b64_url).encode('ascii')).decode('utf-8')
+
+        formats = []
+        for height, encoded_url in re.findall(
+                r'ContentURL_(\d{3,4})[pP][^=]+=([^&]+)', webpage):
+            video_url = decode_url(encoded_url)
+            f = {
+                'url': video_url,
+                'format_id': '%sp' % height,
+                'height': int(height),
+            }
+            if video_url.startswith('rtmp'):
+                f['ext'] = 'flv'
+            formats.append(f)
+
+        file_url = self._search_regex(
+            r'file=([^&]+)', webpage, 'video url', default=None)
+        if file_url:
+            video_url = decode_url(file_url)
+            if all(f['url'] != video_url for f in formats):
+                formats.append({
+                    'url': video_url,
+                })
+
+        self._sort_formats(formats)
+
+        description = self._html_search_meta('description', webpage)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'formats': formats
+        }
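
`decode_url` above is just URL-unquoting followed by base64 decoding. A round trip with a placeholder URL (not a real Bigflix stream) shows the shape of the data:

import base64

try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote  # Python 2

quoted_b64_url = 'aHR0cDovL2V4YW1wbGUuY29tL3ZpZGVvLm1wNA%3D%3D'
decoded = base64.b64decode(
    unquote(quoted_b64_url).encode('ascii')).decode('utf-8')
print(decoded)  # http://example.com/video.mp4
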
index 4d8cce1ef252fde0ac02dc166d3fb4fff528d1a8..1a0184861d20d7674badc042bfc44fbda6c9718b 100644 (file)
@@ -4,7 +4,7 @@ from __future__ import unicode_literals
 from .common import InfoExtractor
 from ..utils import (
     int_or_none,
-    fix_xml_ampersands,
+    unescapeHTML,
 )
 
 
@@ -17,26 +17,24 @@ class BildIE(InfoExtractor):
         'info_dict': {
             'id': '38184146',
             'ext': 'mp4',
-            'title': 'BILD hat sie getestet',
+            'title': 'Das können die  neuen iPads',
+            'description': 'md5:a4058c4fa2a804ab59c00d7244bbf62f',
             'thumbnail': 're:^https?://.*\.jpg$',
             'duration': 196,
-            'description': 'Mit dem iPad Air 2 und dem iPad Mini 3 hat Apple zwei neue Tablet-Modelle präsentiert. BILD-Reporter Sven Stein durfte die Geräte bereits testen. ',
         }
     }
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        xml_url = url.split(".bild.html")[0] + ",view=xml.bild.xml"
-        doc = self._download_xml(xml_url, video_id, transform_source=fix_xml_ampersands)
-
-        duration = int_or_none(doc.attrib.get('duration'), scale=1000)
+        video_data = self._download_json(
+            url.split('.bild.html')[0] + ',view=json.bild.html', video_id)
 
         return {
             'id': video_id,
-            'title': doc.attrib['ueberschrift'],
-            'description': doc.attrib.get('text'),
-            'url': doc.attrib['src'],
-            'thumbnail': doc.attrib.get('img'),
-            'duration': duration,
+            'title': unescapeHTML(video_data['title']).strip(),
+            'description': unescapeHTML(video_data.get('description')),
+            'url': video_data['clipList'][0]['srces'][0]['src'],
+            'thumbnail': video_data.get('poster'),
+            'duration': int_or_none(video_data.get('durationSec')),
         }
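
The switch from the XML to the JSON endpoint above boils down to one string rewrite on the article URL. The path here is made up; only the `.bild.html` to `,view=json.bild.html` suffix swap matters:

url = 'http://www.bild.de/video/clip/some-story-38184146.bild.html'  # hypothetical
json_url = url.split('.bild.html')[0] + ',view=json.bild.html'
print(json_url)  # ...some-story-38184146,view=json.bild.html
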
index ecc17ebebca9e1819fc804f37d48dcceb80c44c5..8baff2041bb380d0204895cbbc6c64b16be94993 100644 (file)
 from __future__ import unicode_literals
 
 import re
-import itertools
-import json
-import xml.etree.ElementTree as ET
 
 from .common import InfoExtractor
+from ..compat import compat_str
 from ..utils import (
     int_or_none,
-    unified_strdate,
+    unescapeHTML,
     ExtractorError,
+    xpath_text,
 )
 
 
 class BiliBiliIE(InfoExtractor):
-    _VALID_URL = r'http://www\.bilibili\.(?:tv|com)/video/av(?P<id>[0-9]+)/'
+    _VALID_URL = r'https?://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)(?:/index_(?P<page_num>\d+)\.html)?'
 
     _TESTS = [{
         'url': 'http://www.bilibili.tv/video/av1074402/',
         'md5': '2c301e4dab317596e837c3e7633e7d86',
         'info_dict': {
-            'id': '1074402_part1',
+            'id': '1554319',
             'ext': 'flv',
             'title': '【金坷垃】金泡沫',
-            'duration': 308,
+            'duration': 308313,
             'upload_date': '20140420',
             'thumbnail': 're:^https?://.+\.jpg',
+            'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
+            'timestamp': 1397983878,
+            'uploader': '菊子桑',
         },
     }, {
         'url': 'http://www.bilibili.com/video/av1041170/',
         'info_dict': {
             'id': '1041170',
             'title': '【BD1080P】刀语【诸神&异域】',
+            'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~',
+            'uploader': '枫叶逝去',
+            'timestamp': 1396501299,
         },
         'playlist_count': 9,
     }]
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        if '(此视频不存在或被删除)' in webpage:
-            raise ExtractorError(
-                'The video does not exist or was deleted', expected=True)
-
-        if '>你没有权限浏览! 由于版权相关问题 我们不对您所在的地区提供服务<' in webpage:
-            raise ExtractorError(
-                'The video is not available in your region due to copyright reasons',
-                expected=True)
-
-        video_code = self._search_regex(
-            r'(?s)<div itemprop="video".*?>(.*?)</div>', webpage, 'video code')
-
-        title = self._html_search_meta(
-            'media:title', video_code, 'title', fatal=True)
-        duration_str = self._html_search_meta(
-            'duration', video_code, 'duration')
-        if duration_str is None:
-            duration = None
-        else:
-            duration_mobj = re.match(
-                r'^T(?:(?P<hours>[0-9]+)H)?(?P<minutes>[0-9]+)M(?P<seconds>[0-9]+)S$',
-                duration_str)
-            duration = (
-                int_or_none(duration_mobj.group('hours'), default=0) * 3600 +
-                int(duration_mobj.group('minutes')) * 60 +
-                int(duration_mobj.group('seconds')))
-        upload_date = unified_strdate(self._html_search_meta(
-            'uploadDate', video_code, fatal=False))
-        thumbnail = self._html_search_meta(
-            'thumbnailUrl', video_code, 'thumbnail', fatal=False)
-
-        cid = self._search_regex(r'cid=(\d+)', webpage, 'cid')
-
-        entries = []
-
-        lq_page = self._download_webpage(
-            'http://interface.bilibili.com/v_cdn_play?appkey=1&cid=%s' % cid,
-            video_id,
-            note='Downloading LQ video info'
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        page_num = mobj.group('page_num') or '1'
+
+        view_data = self._download_json(
+            'http://api.bilibili.com/view?type=json&appkey=8e9fc618fbd41e28&id=%s&page=%s' % (video_id, page_num),
+            video_id)
+        if 'error' in view_data:
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, view_data['error']), expected=True)
+
+        cid = view_data['cid']
+        title = unescapeHTML(view_data['title'])
+
+        doc = self._download_xml(
+            'http://interface.bilibili.com/v_cdn_play?appkey=8e9fc618fbd41e28&cid=%s' % cid,
+            cid,
+            'Downloading page %s/%s' % (page_num, view_data['pages'])
         )
-        try:
-            err_info = json.loads(lq_page)
-            raise ExtractorError(
-                'BiliBili said: ' + err_info['error_text'], expected=True)
-        except ValueError:
-            pass
 
-        lq_doc = ET.fromstring(lq_page)
-        lq_durls = lq_doc.findall('./durl')
+        if xpath_text(doc, './result') == 'error':
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, xpath_text(doc, './message')), expected=True)
 
-        hq_doc = self._download_xml(
-            'http://interface.bilibili.com/playurl?appkey=1&cid=%s' % cid,
-            video_id,
-            note='Downloading HQ video info',
-            fatal=False,
-        )
-        if hq_doc is not False:
-            hq_durls = hq_doc.findall('./durl')
-            assert len(lq_durls) == len(hq_durls)
-        else:
-            hq_durls = itertools.repeat(None)
+        entries = []
 
-        i = 1
-        for lq_durl, hq_durl in zip(lq_durls, hq_durls):
+        for durl in doc.findall('./durl'):
+            size = xpath_text(durl, ['./filesize', './size'])
             formats = [{
-                'format_id': 'lq',
-                'quality': 1,
-                'url': lq_durl.find('./url').text,
-                'filesize': int_or_none(
-                    lq_durl.find('./size'), get_attr='text'),
+                'url': durl.find('./url').text,
+                'filesize': int_or_none(size),
+                'ext': 'flv',
             }]
-            if hq_durl is not None:
-                formats.append({
-                    'format_id': 'hq',
-                    'quality': 2,
-                    'ext': 'flv',
-                    'url': hq_durl.find('./url').text,
-                    'filesize': int_or_none(
-                        hq_durl.find('./size'), get_attr='text'),
-                })
-            self._sort_formats(formats)
+            backup_urls = durl.find('./backup_url')
+            if backup_urls is not None:
+                for backup_url in backup_urls.findall('./url'):
+                    formats.append({'url': backup_url.text})
+            formats.reverse()
 
             entries.append({
-                'id': '%s_part%d' % (video_id, i),
+                'id': '%s_part%s' % (cid, xpath_text(durl, './order')),
                 'title': title,
+                'duration': int_or_none(xpath_text(durl, './length'), 1000),
                 'formats': formats,
-                'duration': duration,
-                'upload_date': upload_date,
-                'thumbnail': thumbnail,
             })
 
-            i += 1
-
-        return {
-            '_type': 'multi_video',
-            'entries': entries,
-            'id': video_id,
-            'title': title
+        info = {
+            'id': compat_str(cid),
+            'title': title,
+            'description': view_data.get('description'),
+            'thumbnail': view_data.get('pic'),
+            'uploader': view_data.get('author'),
+            'timestamp': int_or_none(view_data.get('created')),
+            'view_count': int_or_none(view_data.get('play')),
+            'duration': int_or_none(xpath_text(doc, './timelength')),
         }
+
+        if len(entries) == 1:
+            entries[0].update(info)
+            return entries[0]
+        else:
+            info.update({
+                '_type': 'multi_video',
+                'id': video_id,
+                'entries': entries,
+            })
+            return info
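
To see what the `<durl>` loop above consumes, here is a trimmed, hypothetical playurl response; the real interface.bilibili.com payload carries more fields:

import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<video>'
    '<timelength>308313</timelength>'
    '<durl>'
    '<order>1</order>'
    '<length>308313</length>'
    '<size>12345678</size>'
    '<url>http://example.com/main.flv</url>'
    '<backup_url><url>http://example.com/backup.flv</url></backup_url>'
    '</durl>'
    '</video>')

for durl in doc.findall('./durl'):
    formats = [{'url': durl.find('./url').text, 'ext': 'flv'}]
    backup_urls = durl.find('./backup_url')
    if backup_urls is not None:
        for backup_url in backup_urls.findall('./url'):
            formats.append({'url': backup_url.text})
    formats.reverse()  # primary URL ends up last, i.e. preferred
    print([f['url'] for f in formats])
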
diff --git a/youtube_dl/extractor/biobiochiletv.py b/youtube_dl/extractor/biobiochiletv.py
new file mode 100644 (file)
index 0000000..1332281
--- /dev/null
@@ -0,0 +1,86 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import remove_end
+
+
+class BioBioChileTVIE(InfoExtractor):
+    _VALID_URL = r'https?://tv\.biobiochile\.cl/notas/(?:[^/]+/)+(?P<id>[^/]+)\.shtml'
+
+    _TESTS = [{
+        'url': 'http://tv.biobiochile.cl/notas/2015/10/21/sobre-camaras-y-camarillas-parlamentarias.shtml',
+        'md5': '26f51f03cf580265defefb4518faec09',
+        'info_dict': {
+            'id': 'sobre-camaras-y-camarillas-parlamentarias',
+            'ext': 'mp4',
+            'title': 'Sobre Cámaras y camarillas parlamentarias',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'uploader': 'Fernando Atria',
+        },
+    }, {
+        # different uploader layout
+        'url': 'http://tv.biobiochile.cl/notas/2016/03/18/natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades.shtml',
+        'md5': 'edc2e6b58974c46d5b047dea3c539ff3',
+        'info_dict': {
+            'id': 'natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades',
+            'ext': 'mp4',
+            'title': 'Natalia Valdebenito repasa a diputado Hasbún: Pasó a la categoría de hablar brutalidades',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'uploader': 'Piangella Obrador',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml',
+        'only_matching': True,
+    }, {
+        'url': 'http://tv.biobiochile.cl/notas/2015/10/21/exclusivo-hector-pinto-formador-de-chupete-revela-version-del-ex-delantero-albo.shtml',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        title = remove_end(self._og_search_title(webpage), ' - BioBioChile TV')
+
+        file_url = self._search_regex(
+            r'loadFWPlayerVideo\([^,]+,\s*(["\'])(?P<url>.+?)\1',
+            webpage, 'file url', group='url')
+
+        base_url = self._search_regex(
+            r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*fileURL', webpage,
+            'base url', default='http://unlimited2-cl.digitalproserver.com/bbtv/',
+            group='url')
+
+        formats = self._extract_m3u8_formats(
+            '%s%s/playlist.m3u8' % (base_url, file_url), video_id, 'mp4',
+            entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)
+        f = {
+            'url': '%s%s' % (base_url, file_url),
+            'format_id': 'http',
+            'protocol': 'http',
+            'preference': 1,
+        }
+        if formats:
+            f_copy = formats[-1].copy()
+            f_copy.update(f)
+            f = f_copy
+        formats.append(f)
+        self._sort_formats(formats)
+
+        thumbnail = self._og_search_thumbnail(webpage)
+        uploader = self._html_search_regex(
+            r'<a[^>]+href=["\']https?://busca\.biobiochile\.cl/author[^>]+>(.+?)</a>',
+            webpage, 'uploader', fatal=False)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': thumbnail,
+            'uploader': uploader,
+            'formats': formats,
+        }
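
The copy/update dance above exists so the direct-HTTP entry inherits the metadata (height, codecs and so on) of the best HLS variant while keeping its own URL and id. A minimal sketch with dummy formats:

formats = [
    {'format_id': 'hls-0', 'height': 360},
    {'format_id': 'hls-1', 'height': 720},
]
f = {
    'url': 'http://example.com/video.mp4',  # placeholder
    'format_id': 'http',
    'protocol': 'http',
    'preference': 1,
}
if formats:
    f_copy = formats[-1].copy()  # best HLS variant carries the metadata
    f_copy.update(f)             # url/id/protocol come from the HTTP entry
    f = f_copy
formats.append(f)
print(f['height'], f['format_id'])  # 720 http
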
diff --git a/youtube_dl/extractor/bleacherreport.py b/youtube_dl/extractor/bleacherreport.py
new file mode 100644 (file)
index 0000000..7a8e1f6
--- /dev/null
@@ -0,0 +1,110 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from .amp import AMPIE
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    parse_iso8601,
+)
+
+
+class BleacherReportIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/articles/(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://bleacherreport.com/articles/2496438-fsu-stat-projections-is-jalen-ramsey-best-defensive-player-in-college-football',
+        'md5': 'a3ffc3dc73afdbc2010f02d98f990f20',
+        'info_dict': {
+            'id': '2496438',
+            'ext': 'mp4',
+            'title': 'FSU Stat Projections: Is Jalen Ramsey Best Defensive Player in College Football?',
+            'uploader_id': 3992341,
+            'description': 'CFB, ACC, Florida State',
+            'timestamp': 1434380212,
+            'upload_date': '20150615',
+            'uploader': 'Team Stream Now ',
+        },
+        'add_ie': ['Ooyala'],
+    }, {
+        'url': 'http://bleacherreport.com/articles/2586817-aussie-golfers-get-fright-of-their-lives-after-being-chased-by-angry-kangaroo',
+        'md5': '6a5cd403418c7b01719248ca97fb0692',
+        'info_dict': {
+            'id': '2586817',
+            'ext': 'webm',
+            'title': 'Aussie Golfers Get Fright of Their Lives After Being Chased by Angry Kangaroo',
+            'timestamp': 1446839961,
+            'uploader': 'Sean Fay',
+            'description': 'md5:825e94e0f3521df52fa83b2ed198fa20',
+            'uploader_id': 6466954,
+            'upload_date': '20151011',
+        },
+        'add_ie': ['Youtube'],
+    }]
+
+    def _real_extract(self, url):
+        article_id = self._match_id(url)
+
+        article_data = self._download_json('http://api.bleacherreport.com/api/v1/articles/%s' % article_id, article_id)['article']
+
+        thumbnails = []
+        primary_photo = article_data.get('primaryPhoto')
+        if primary_photo:
+            thumbnails = [{
+                'url': primary_photo['url'],
+                'width': primary_photo.get('width'),
+                'height': primary_photo.get('height'),
+            }]
+
+        info = {
+            '_type': 'url_transparent',
+            'id': article_id,
+            'title': article_data['title'],
+            'uploader': article_data.get('author', {}).get('name'),
+            'uploader_id': article_data.get('authorId'),
+            'timestamp': parse_iso8601(article_data.get('createdAt')),
+            'thumbnails': thumbnails,
+            'comment_count': int_or_none(article_data.get('commentsCount')),
+            'view_count': int_or_none(article_data.get('hitCount')),
+        }
+
+        video = article_data.get('video')
+        if video:
+            video_type = video['type']
+            if video_type == 'cms.bleacherreport.com':
+                info['url'] = 'http://bleacherreport.com/video_embed?id=%s' % video['id']
+            elif video_type == 'ooyala.com':
+                info['url'] = 'ooyala:%s' % video['id']
+            elif video_type == 'youtube.com':
+                info['url'] = video['id']
+            elif video_type == 'vine.co':
+                info['url'] = 'https://vine.co/v/%s' % video['id']
+            else:
+                info['url'] = video_type + video['id']
+            return info
+        else:
+            raise ExtractorError('no video in the article', expected=True)
+
+
+class BleacherReportCMSIE(AMPIE):
+    _VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36})'
+    _TESTS = [{
+        'url': 'http://bleacherreport.com/video_embed?id=8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
+        'md5': '8c2c12e3af7805152675446c905d159b',
+        'info_dict': {
+            'id': '8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
+            'ext': 'mp4',
+            'title': 'Cena vs. Rollins Would Expose the Heavyweight Division',
+            'description': 'md5:984afb4ade2f9c0db35f3267ed88b36e',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        info = self._extract_feed_info('http://cms.bleacherreport.com/media/items/%s/akamai.json' % video_id)
+        info['id'] = video_id
+        return info
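
The `video['type']` dispatch above, pulled out into a small function for illustration; the payload is a dummy, not real API output:

def embed_url(video):
    video_type = video['type']
    if video_type == 'cms.bleacherreport.com':
        return 'http://bleacherreport.com/video_embed?id=%s' % video['id']
    elif video_type == 'ooyala.com':
        return 'ooyala:%s' % video['id']
    elif video_type == 'youtube.com':
        return video['id']  # a bare id, which the YouTube extractor accepts
    elif video_type == 'vine.co':
        return 'https://vine.co/v/%s' % video['id']
    return video_type + video['id']  # last-resort concatenation

print(embed_url({'type': 'ooyala.com', 'id': 'abc123'}))  # ooyala:abc123
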
diff --git a/youtube_dl/extractor/bliptv.py b/youtube_dl/extractor/bliptv.py
deleted file mode 100644 (file)
index c329628..0000000
+++ /dev/null
@@ -1,292 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-
-from ..compat import (
-    compat_urllib_request,
-    compat_urlparse,
-)
-from ..utils import (
-    clean_html,
-    int_or_none,
-    parse_iso8601,
-    unescapeHTML,
-    xpath_text,
-    xpath_with_ns,
-)
-
-
-class BlipTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:\w+\.)?blip\.tv/(?:(?:.+-|rss/flash/)(?P<id>\d+)|((?:play/|api\.swf#)(?P<lookup_id>[\da-zA-Z+_]+)))'
-
-    _TESTS = [
-        {
-            'url': 'http://blip.tv/cbr/cbr-exclusive-gotham-city-imposters-bats-vs-jokerz-short-3-5796352',
-            'md5': '80baf1ec5c3d2019037c1c707d676b9f',
-            'info_dict': {
-                'id': '5779306',
-                'ext': 'm4v',
-                'title': 'CBR EXCLUSIVE: "Gotham City Imposters" Bats VS Jokerz Short 3',
-                'description': 'md5:9bc31f227219cde65e47eeec8d2dc596',
-                'timestamp': 1323138843,
-                'upload_date': '20111206',
-                'uploader': 'cbr',
-                'uploader_id': '679425',
-                'duration': 81,
-            }
-        },
-        {
-            # https://github.com/rg3/youtube-dl/pull/2274
-            'note': 'Video with subtitles',
-            'url': 'http://blip.tv/play/h6Uag5OEVgI.html',
-            'md5': '309f9d25b820b086ca163ffac8031806',
-            'info_dict': {
-                'id': '6586561',
-                'ext': 'mp4',
-                'title': 'Red vs. Blue Season 11 Episode 1',
-                'description': 'One-Zero-One',
-                'timestamp': 1371261608,
-                'upload_date': '20130615',
-                'uploader': 'redvsblue',
-                'uploader_id': '792887',
-                'duration': 279,
-            }
-        },
-        {
-            # https://bugzilla.redhat.com/show_bug.cgi?id=967465
-            'url': 'http://a.blip.tv/api.swf#h6Uag5KbVwI',
-            'md5': '314e87b1ebe7a48fcbfdd51b791ce5a6',
-            'info_dict': {
-                'id': '6573122',
-                'ext': 'mov',
-                'upload_date': '20130520',
-                'description': 'Two hapless space marines argue over what to do when they realize they have an astronomically huge problem on their hands.',
-                'title': 'Red vs. Blue Season 11 Trailer',
-                'timestamp': 1369029609,
-                'uploader': 'redvsblue',
-                'uploader_id': '792887',
-            }
-        },
-        {
-            'url': 'http://blip.tv/play/gbk766dkj4Yn',
-            'md5': 'fe0a33f022d49399a241e84a8ea8b8e3',
-            'info_dict': {
-                'id': '1749452',
-                'ext': 'mp4',
-                'upload_date': '20090208',
-                'description': 'Witness the first appearance of the Nostalgia Critic character, as Doug reviews the movie Transformers.',
-                'title': 'Nostalgia Critic: Transformers',
-                'timestamp': 1234068723,
-                'uploader': 'NostalgiaCritic',
-                'uploader_id': '246467',
-            }
-        },
-        {
-            # https://github.com/rg3/youtube-dl/pull/4404
-            'note': 'Audio only',
-            'url': 'http://blip.tv/hilarios-productions/weekly-manga-recap-kingdom-7119982',
-            'md5': '76c0a56f24e769ceaab21fbb6416a351',
-            'info_dict': {
-                'id': '7103299',
-                'ext': 'flv',
-                'title': 'Weekly Manga Recap: Kingdom',
-                'description': 'And then Shin breaks the enemy line, and he&apos;s all like HWAH! And then he slices a guy and it&apos;s all like FWASHING! And... it&apos;s really hard to describe the best parts of this series without breaking down into sound effects, okay?',
-                'timestamp': 1417660321,
-                'upload_date': '20141204',
-                'uploader': 'The Rollo T',
-                'uploader_id': '407429',
-                'duration': 7251,
-                'vcodec': 'none',
-            }
-        },
-        {
-            # missing duration
-            'url': 'http://blip.tv/rss/flash/6700880',
-            'info_dict': {
-                'id': '6684191',
-                'ext': 'm4v',
-                'title': 'Cowboy Bebop: Gateway Shuffle Review',
-                'description': 'md5:3acc480c0f9ae157f5fe88547ecaf3f8',
-                'timestamp': 1386639757,
-                'upload_date': '20131210',
-                'uploader': 'sfdebris',
-                'uploader_id': '706520',
-            }
-        }
-    ]
-
-    @staticmethod
-    def _extract_url(webpage):
-        mobj = re.search(r'<meta\s[^>]*https?://api\.blip\.tv/\w+/redirect/\w+/(\d+)', webpage)
-        if mobj:
-            return 'http://blip.tv/a/a-' + mobj.group(1)
-        mobj = re.search(r'<(?:iframe|embed|object)\s[^>]*(https?://(?:\w+\.)?blip\.tv/(?:play/|api\.swf#)[a-zA-Z0-9_]+)', webpage)
-        if mobj:
-            return mobj.group(1)
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        lookup_id = mobj.group('lookup_id')
-
-        # See https://github.com/rg3/youtube-dl/issues/857 and
-        # https://github.com/rg3/youtube-dl/issues/4197
-        if lookup_id:
-            urlh = self._request_webpage(
-                'http://blip.tv/play/%s' % lookup_id, lookup_id, 'Resolving lookup id')
-            url = compat_urlparse.urlparse(urlh.geturl())
-            qs = compat_urlparse.parse_qs(url.query)
-            mobj = re.match(self._VALID_URL, qs['file'][0])
-
-        video_id = mobj.group('id')
-
-        rss = self._download_xml('http://blip.tv/rss/flash/%s' % video_id, video_id, 'Downloading video RSS')
-
-        def _x(p):
-            return xpath_with_ns(p, {
-                'blip': 'http://blip.tv/dtd/blip/1.0',
-                'media': 'http://search.yahoo.com/mrss/',
-                'itunes': 'http://www.itunes.com/dtds/podcast-1.0.dtd',
-            })
-
-        item = rss.find('channel/item')
-
-        video_id = xpath_text(item, _x('blip:item_id'), 'video id') or lookup_id
-        title = xpath_text(item, 'title', 'title', fatal=True)
-        description = clean_html(xpath_text(item, _x('blip:puredescription'), 'description'))
-        timestamp = parse_iso8601(xpath_text(item, _x('blip:datestamp'), 'timestamp'))
-        uploader = xpath_text(item, _x('blip:user'), 'uploader')
-        uploader_id = xpath_text(item, _x('blip:userid'), 'uploader id')
-        duration = int_or_none(xpath_text(item, _x('blip:runtime'), 'duration'))
-        media_thumbnail = item.find(_x('media:thumbnail'))
-        thumbnail = (media_thumbnail.get('url') if media_thumbnail is not None
-                     else xpath_text(item, 'image', 'thumbnail'))
-        categories = [category.text for category in item.findall('category') if category is not None]
-
-        formats = []
-        subtitles_urls = {}
-
-        media_group = item.find(_x('media:group'))
-        for media_content in media_group.findall(_x('media:content')):
-            url = media_content.get('url')
-            role = media_content.get(_x('blip:role'))
-            msg = self._download_webpage(
-                url + '?showplayer=20140425131715&referrer=http://blip.tv&mask=7&skin=flashvars&view=url',
-                video_id, 'Resolving URL for %s' % role)
-            real_url = compat_urlparse.parse_qs(msg.strip())['message'][0]
-
-            media_type = media_content.get('type')
-            if media_type == 'text/srt' or url.endswith('.srt'):
-                LANGS = {
-                    'english': 'en',
-                }
-                lang = role.rpartition('-')[-1].strip().lower()
-                langcode = LANGS.get(lang, lang)
-                subtitles_urls[langcode] = url
-            elif media_type.startswith('video/'):
-                formats.append({
-                    'url': real_url,
-                    'format_id': role,
-                    'format_note': media_type,
-                    'vcodec': media_content.get(_x('blip:vcodec')) or 'none',
-                    'acodec': media_content.get(_x('blip:acodec')),
-                    'filesize': media_content.get('filesize'),
-                    'width': int_or_none(media_content.get('width')),
-                    'height': int_or_none(media_content.get('height')),
-                })
-        self._check_formats(formats, video_id)
-        self._sort_formats(formats)
-
-        subtitles = self.extract_subtitles(video_id, subtitles_urls)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'timestamp': timestamp,
-            'uploader': uploader,
-            'uploader_id': uploader_id,
-            'duration': duration,
-            'thumbnail': thumbnail,
-            'categories': categories,
-            'formats': formats,
-            'subtitles': subtitles,
-        }
-
-    def _get_subtitles(self, video_id, subtitles_urls):
-        subtitles = {}
-        for lang, url in subtitles_urls.items():
-            # For some weird reason, blip.tv serves a video instead of subtitles
-            # when we request with a common UA
-            req = compat_urllib_request.Request(url)
-            req.add_header('User-Agent', 'youtube-dl')
-            subtitles[lang] = [{
-                # The extension is 'srt' but it's actually an 'ass' file
-                'ext': 'ass',
-                'data': self._download_webpage(req, None, note=False),
-            }]
-        return subtitles
-
-
-class BlipTVUserIE(InfoExtractor):
-    _VALID_URL = r'(?:(?:https?://(?:\w+\.)?blip\.tv/)|bliptvuser:)(?!api\.swf)([^/]+)/*$'
-    _PAGE_SIZE = 12
-    IE_NAME = 'blip.tv:user'
-    _TEST = {
-        'url': 'http://blip.tv/actone',
-        'info_dict': {
-            'id': 'actone',
-            'title': 'Act One: The Series',
-        },
-        'playlist_count': 5,
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        username = mobj.group(1)
-
-        page_base = 'http://m.blip.tv/pr/show_get_full_episode_list?users_id=%s&lite=0&esi=1'
-
-        page = self._download_webpage(url, username, 'Downloading user page')
-        mobj = re.search(r'data-users-id="([^"]+)"', page)
-        page_base = page_base % mobj.group(1)
-        title = self._og_search_title(page)
-
-        # Download video ids using BlipTV Ajax calls. Result size per
-        # query is limited (currently to 12 videos) so we need to query
-        # page by page until there are no video ids - it means we got
-        # all of them.
-
-        video_ids = []
-        pagenum = 1
-
-        while True:
-            url = page_base + "&page=" + str(pagenum)
-            page = self._download_webpage(
-                url, username, 'Downloading video ids from page %d' % pagenum)
-
-            # Extract video identifiers
-            ids_in_page = []
-
-            for mobj in re.finditer(r'href="/([^"]+)"', page):
-                if mobj.group(1) not in ids_in_page:
-                    ids_in_page.append(unescapeHTML(mobj.group(1)))
-
-            video_ids.extend(ids_in_page)
-
-            # A little optimization - if current page is not
-            # "full", ie. does not contain PAGE_SIZE video ids then
-            # we can assume that this page is the last one - there
-            # are no more ids on further pages - no need to query
-            # again.
-
-            if len(ids_in_page) < self._PAGE_SIZE:
-                break
-
-            pagenum += 1
-
-        urls = ['http://blip.tv/%s' % video_id for video_id in video_ids]
-        url_entries = [self.url_result(vurl, 'BlipTV') for vurl in urls]
-        return self.playlist_result(
-            url_entries, playlist_title=title, playlist_id=username)
index 0dca29b712c79a27fb621f094a6f64ab503ba3df..13343bc258532b37bf912f0648e317103b5f428d 100644 (file)
@@ -6,9 +6,9 @@ from .common import InfoExtractor
 
 
 class BloombergIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.bloomberg\.com/news/videos/[^/]+/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?bloomberg\.com/(?:[^/]+/)*(?P<id>[^/?#]+)'
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://www.bloomberg.com/news/videos/b/aaeae121-5949-481e-a1ce-4562db6f5df2',
         # The md5 checksum changes
         'info_dict': {
@@ -17,22 +17,35 @@ class BloombergIE(InfoExtractor):
             'title': 'Shah\'s Presentation on Foreign-Exchange Strategies',
             'description': 'md5:a8ba0302912d03d246979735c17d2761',
         },
-    }
+    }, {
+        'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.bloomberg.com/politics/videos/2015-11-25/karl-rove-on-jeb-bush-s-struggles-stopping-trump',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         name = self._match_id(url)
         webpage = self._download_webpage(url, name)
-        video_id = self._search_regex(r'"bmmrId":"(.+?)"', webpage, 'id')
+        video_id = self._search_regex(
+            r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>.+?)\1',
+            webpage, 'id', group='url')
         title = re.sub(': Video$', '', self._og_search_title(webpage))
 
         embed_info = self._download_json(
             'http://www.bloomberg.com/api/embed?id=%s' % video_id, video_id)
         formats = []
         for stream in embed_info['streams']:
-            if stream["muxing_format"] == "TS":
-                formats.extend(self._extract_m3u8_formats(stream['url'], video_id))
+            stream_url = stream.get('url')
+            if not stream_url:
+                continue
+            if stream['muxing_format'] == 'TS':
+                formats.extend(self._extract_m3u8_formats(
+                    stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
             else:
-                formats.extend(self._extract_f4m_formats(stream['url'], video_id))
+                formats.extend(self._extract_f4m_formats(
+                    stream_url, video_id, f4m_id='hds', fatal=False))
         self._sort_formats(formats)
 
         return {
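
The hardened loop above skips empty URLs and fans out by muxing format. Here is the control flow with a fabricated embed payload:

embed_info = {'streams': [
    {'muxing_format': 'TS', 'url': 'http://example.com/master.m3u8'},
    {'muxing_format': 'F4M', 'url': 'http://example.com/manifest.f4m'},
    {'muxing_format': 'TS', 'url': None},
]}

for stream in embed_info['streams']:
    stream_url = stream.get('url')
    if not stream_url:
        continue  # the third entry is dropped here
    if stream['muxing_format'] == 'TS':
        print('HLS (m3u8):', stream_url)
    else:
        print('HDS (f4m):', stream_url)
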
diff --git a/youtube_dl/extractor/bokecc.py b/youtube_dl/extractor/bokecc.py
new file mode 100644 (file)
index 0000000..86a7f4d
--- /dev/null
@@ -0,0 +1,60 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_parse_qs
+from ..utils import ExtractorError
+
+
+class BokeCCBaseIE(InfoExtractor):
+    def _extract_bokecc_formats(self, webpage, video_id, format_id=None):
+        player_params_str = self._html_search_regex(
+            r'<(?:script|embed)[^>]+src="http://p\.bokecc\.com/player\?([^"]+)',
+            webpage, 'player params')
+
+        player_params = compat_parse_qs(player_params_str)
+
+        info_xml = self._download_xml(
+            'http://p.bokecc.com/servlet/playinfo?uid=%s&vid=%s&m=1' % (
+                player_params['siteid'][0], player_params['vid'][0]), video_id)
+
+        formats = [{
+            'format_id': format_id,
+            'url': quality.find('./copy').attrib['playurl'],
+            'preference': int(quality.attrib['value']),
+        } for quality in info_xml.findall('./video/quality')]
+
+        self._sort_formats(formats)
+
+        return formats
+
+
+class BokeCCIE(BokeCCBaseIE):
+    IE_DESC = 'CC视频'
+    _VALID_URL = r'https?://union\.bokecc\.com/playvideo\.bo\?(?P<query>.*)'
+
+    _TESTS = [{
+        'url': 'http://union.bokecc.com/playvideo.bo?vid=E44D40C15E65EA30&uid=CD0C5D3C8614B28B',
+        'info_dict': {
+            'id': 'CD0C5D3C8614B28B_E44D40C15E65EA30',
+            'ext': 'flv',
+            'title': 'BokeCC Video',
+        },
+    }]
+
+    def _real_extract(self, url):
+        qs = compat_parse_qs(re.match(self._VALID_URL, url).group('query'))
+        if not qs.get('vid') or not qs.get('uid'):
+            raise ExtractorError('Invalid URL', expected=True)
+
+        video_id = '%s_%s' % (qs['uid'][0], qs['vid'][0])
+
+        webpage = self._download_webpage(url, video_id)
+
+        return {
+            'id': video_id,
+            'title': 'BokeCC Video',  # no title provided in the webpage
+            'formats': self._extract_bokecc_formats(webpage, video_id),
+        }
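
How the embed query above yields the composite video id; the values are taken from the test URL in this diff:

try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    from urlparse import parse_qs  # Python 2

query = 'vid=E44D40C15E65EA30&uid=CD0C5D3C8614B28B'
qs = parse_qs(query)
if not qs.get('vid') or not qs.get('uid'):
    raise ValueError('Invalid URL')
video_id = '%s_%s' % (qs['uid'][0], qs['vid'][0])
print(video_id)  # CD0C5D3C8614B28B_E44D40C15E65EA30
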
index 510813f7663c6be8ea3dbf3407105fb494ae27aa..6ad45a1e6a30bac2450743de3f0d12a2c9f2b89d 100644 (file)
@@ -1,16 +1,23 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import re
+
 from .common import InfoExtractor
+from ..utils import (
+    js_to_json,
+    determine_ext,
+)
 
 
 class BpbIE(InfoExtractor):
     IE_DESC = 'Bundeszentrale für politische Bildung'
-    _VALID_URL = r'http://www\.bpb\.de/mediathek/(?P<id>[0-9]+)/'
+    _VALID_URL = r'https?://www\.bpb\.de/mediathek/(?P<id>[0-9]+)/'
 
     _TEST = {
         'url': 'http://www.bpb.de/mediathek/297/joachim-gauck-zu-1989-und-die-erinnerung-an-die-ddr',
-        'md5': '0792086e8e2bfbac9cdf27835d5f2093',
+        # md5 fails in Python 2.6 due to a buggy server response and urllib2's handling of it
+        'md5': 'c4f84c8a8044ca9ff68bb8441d300b3f',
         'info_dict': {
             'id': '297',
             'ext': 'mp4',
@@ -25,13 +32,26 @@ class BpbIE(InfoExtractor):
 
         title = self._html_search_regex(
             r'<h2 class="white">(.*?)</h2>', webpage, 'title')
-        video_url = self._html_search_regex(
-            r'(http://film\.bpb\.de/player/dokument_[0-9]+\.mp4)',
-            webpage, 'video URL')
+        video_info_dicts = re.findall(
+            r"({\s*src:\s*'http://film\.bpb\.de/[^}]+})", webpage)
+
+        formats = []
+        for video_info in video_info_dicts:
+            video_info = self._parse_json(video_info, video_id, transform_source=js_to_json)
+            quality = video_info['quality']
+            video_url = video_info['src']
+            formats.append({
+                'url': video_url,
+                'preference': 10 if quality == 'high' else 0,
+                'format_note': quality,
+                'format_id': '%s-%s' % (quality, determine_ext(video_url)),
+            })
+
+        self._sort_formats(formats)
 
         return {
             'id': video_id,
-            'url': video_url,
+            'formats': formats,
             'title': title,
             'description': self._og_search_description(webpage),
         }
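
What the new format loop above actually parses: small JavaScript object literals scraped off the page. The fragment below is invented, and a naive two-step quote fix stands in for `youtube_dl.utils.js_to_json`:

import json
import re

page = ("videos = [{ src: 'http://film.bpb.de/player/dokument_297.mp4',"
        " quality: 'high' }];")
for blob in re.findall(r"({\s*src:\s*'http://film\.bpb\.de/[^}]+})", page):
    # quote the keys (only after '{' or ','), then switch quote style
    as_json = re.sub(r"([{,]\s*)(\w+):", r'\1"\2":', blob).replace("'", '"')
    info = json.loads(as_json)
    print(info['quality'], info['src'])
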
index 66e394e1093105b936191da798734128d4ea1afe..11cf498515ba572f8ef8c7f20d5620bf50289827 100644 (file)
@@ -1,18 +1,21 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import re
+
 from .common import InfoExtractor
 from ..utils import (
     ExtractorError,
     int_or_none,
     parse_duration,
+    xpath_element,
+    xpath_text,
 )
 
 
 class BRIE(InfoExtractor):
     IE_DESC = 'Bayerischer Rundfunk Mediathek'
-    _VALID_URL = r'https?://(?:www\.)?br\.de/(?:[a-z0-9\-_]+/)+(?P<id>[a-z0-9\-_]+)\.html'
-    _BASE_URL = 'http://www.br.de'
+    _VALID_URL = r'(?P<base_url>https?://(?:www\.)?br(?:-klassik)?\.de)/(?:[a-z0-9\-_]+/)+(?P<id>[a-z0-9\-_]+)\.html'
 
     _TESTS = [
         {
@@ -22,7 +25,7 @@ class BRIE(InfoExtractor):
                 'id': '48f656ef-287e-486f-be86-459122db22cc',
                 'ext': 'mp4',
                 'title': 'Die böse Überraschung',
-                'description': 'Betriebliche Altersvorsorge: Die böse Überraschung',
+                'description': 'md5:ce9ac81b466ce775b8018f6801b48ac9',
                 'duration': 180,
                 'uploader': 'Reinhard Weber',
                 'upload_date': '20150422',
@@ -30,23 +33,23 @@ class BRIE(InfoExtractor):
         },
         {
             'url': 'http://www.br.de/nachrichten/oberbayern/inhalt/muenchner-polizeipraesident-schreiber-gestorben-100.html',
-            'md5': 'a44396d73ab6a68a69a568fae10705bb',
+            'md5': 'af3a3a4aa43ff0ce6a89504c67f427ef',
             'info_dict': {
                 'id': 'a4b83e34-123d-4b81-9f4e-c0d3121a4e05',
-                'ext': 'mp4',
+                'ext': 'flv',
                 'title': 'Manfred Schreiber ist tot',
-                'description': 'Abendschau kompakt: Manfred Schreiber ist tot',
+                'description': 'md5:b454d867f2a9fc524ebe88c3f5092d97',
                 'duration': 26,
             }
         },
         {
-            'url': 'http://www.br.de/radio/br-klassik/sendungen/allegro/premiere-urauffuehrung-the-land-2015-dance-festival-muenchen-100.html',
+            'url': 'https://www.br-klassik.de/audio/peeping-tom-premierenkritik-dance-festival-muenchen-100.html',
             'md5': '8b5b27c0b090f3b35eac4ab3f7a73d3d',
             'info_dict': {
                 'id': '74c603c9-26d3-48bb-b85b-079aeed66e0b',
                 'ext': 'aac',
                 'title': 'Kurzweilig und sehr bewegend',
-                'description': '"The Land" von Peeping Tom: Kurzweilig und sehr bewegend',
+                'description': 'md5:0351996e3283d64adeb38ede91fac54e',
                 'duration': 296,
             }
         },
@@ -57,7 +60,7 @@ class BRIE(InfoExtractor):
                 'id': '6ba73750-d405-45d3-861d-1ce8c524e059',
                 'ext': 'mp4',
                 'title': 'Umweltbewusster Häuslebauer',
-                'description': 'Uwe Erdelt: Umweltbewusster Häuslebauer',
+                'description': 'md5:d52dae9792d00226348c1dbb13c9bae2',
                 'duration': 116,
             }
         },
@@ -68,7 +71,7 @@ class BRIE(InfoExtractor):
                 'id': 'd982c9ce-8648-4753-b358-98abb8aec43d',
                 'ext': 'mp4',
                 'title': 'Folge 1 - Metaphysik',
-                'description': 'Kant für Anfänger: Folge 1 - Metaphysik',
+                'description': 'md5:bb659990e9e59905c3d41e369db1fbe3',
                 'duration': 893,
                 'uploader': 'Eva Maria Steimle',
                 'upload_date': '20140117',
@@ -77,28 +80,31 @@ class BRIE(InfoExtractor):
     ]
 
     def _real_extract(self, url):
-        display_id = self._match_id(url)
+        base_url, display_id = re.search(self._VALID_URL, url).groups()
         page = self._download_webpage(url, display_id)
         xml_url = self._search_regex(
             r"return BRavFramework\.register\(BRavFramework\('avPlayer_(?:[a-f0-9-]{36})'\)\.setup\({dataURL:'(/(?:[a-z0-9\-]+/)+[a-z0-9/~_.-]+)'}\)\);", page, 'XMLURL')
-        xml = self._download_xml(self._BASE_URL + xml_url, None)
+        xml = self._download_xml(base_url + xml_url, display_id)
 
         medias = []
 
         for xml_media in xml.findall('video') + xml.findall('audio'):
+            media_id = xml_media.get('externalId')
             media = {
-                'id': xml_media.get('externalId'),
-                'title': xml_media.find('title').text,
-                'duration': parse_duration(xml_media.find('duration').text),
-                'formats': self._extract_formats(xml_media.find('assets')),
-                'thumbnails': self._extract_thumbnails(xml_media.find('teaserImage/variants')),
-                'description': ' '.join(xml_media.find('shareTitle').text.splitlines()),
-                'webpage_url': xml_media.find('permalink').text
+                'id': media_id,
+                'title': xpath_text(xml_media, 'title', 'title', True),
+                'duration': parse_duration(xpath_text(xml_media, 'duration')),
+                'formats': self._extract_formats(xpath_element(
+                    xml_media, 'assets'), media_id),
+                'thumbnails': self._extract_thumbnails(xpath_element(
+                    xml_media, 'teaserImage/variants'), base_url),
+                'description': xpath_text(xml_media, 'desc'),
+                'webpage_url': xpath_text(xml_media, 'permalink'),
+                'uploader': xpath_text(xml_media, 'author'),
             }
-            if xml_media.find('author').text:
-                media['uploader'] = xml_media.find('author').text
-            if xml_media.find('broadcastDate').text:
-                media['upload_date'] = ''.join(reversed(xml_media.find('broadcastDate').text.split('.')))
+            broadcast_date = xpath_text(xml_media, 'broadcastDate')
+            if broadcast_date:
+                media['upload_date'] = ''.join(reversed(broadcast_date.split('.')))
             medias.append(media)
 
         if len(medias) > 1:
@@ -109,35 +115,54 @@ class BRIE(InfoExtractor):
             raise ExtractorError('No media entries found')
         return medias[0]
 
-    def _extract_formats(self, assets):
-
-        def text_or_none(asset, tag):
-            elem = asset.find(tag)
-            return None if elem is None else elem.text
-
-        formats = [{
-            'url': text_or_none(asset, 'downloadUrl'),
-            'ext': text_or_none(asset, 'mediaType'),
-            'format_id': asset.get('type'),
-            'width': int_or_none(text_or_none(asset, 'frameWidth')),
-            'height': int_or_none(text_or_none(asset, 'frameHeight')),
-            'tbr': int_or_none(text_or_none(asset, 'bitrateVideo')),
-            'abr': int_or_none(text_or_none(asset, 'bitrateAudio')),
-            'vcodec': text_or_none(asset, 'codecVideo'),
-            'acodec': text_or_none(asset, 'codecAudio'),
-            'container': text_or_none(asset, 'mediaType'),
-            'filesize': int_or_none(text_or_none(asset, 'size')),
-        } for asset in assets.findall('asset')
-            if asset.find('downloadUrl') is not None]
-
+    def _extract_formats(self, assets, media_id):
+        formats = []
+        for asset in assets.findall('asset'):
+            format_url = xpath_text(asset, ['downloadUrl', 'url'])
+            asset_type = asset.get('type')
+            if asset_type == 'HDS':
+                formats.extend(self._extract_f4m_formats(
+                    format_url + '?hdcore=3.2.0', media_id, f4m_id='hds', fatal=False))
+            elif asset_type == 'HLS':
+                formats.extend(self._extract_m3u8_formats(
+                    format_url, media_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+            else:
+                format_info = {
+                    'ext': xpath_text(asset, 'mediaType'),
+                    'width': int_or_none(xpath_text(asset, 'frameWidth')),
+                    'height': int_or_none(xpath_text(asset, 'frameHeight')),
+                    'tbr': int_or_none(xpath_text(asset, 'bitrateVideo')),
+                    'abr': int_or_none(xpath_text(asset, 'bitrateAudio')),
+                    'vcodec': xpath_text(asset, 'codecVideo'),
+                    'acodec': xpath_text(asset, 'codecAudio'),
+                    'container': xpath_text(asset, 'mediaType'),
+                    'filesize': int_or_none(xpath_text(asset, 'size')),
+                }
+                format_url = self._proto_relative_url(format_url)
+                if format_url:
+                    http_format_info = format_info.copy()
+                    http_format_info.update({
+                        'url': format_url,
+                        'format_id': 'http-%s' % asset_type,
+                    })
+                    formats.append(http_format_info)
+                server_prefix = xpath_text(asset, 'serverPrefix')
+                if server_prefix:
+                    rtmp_format_info = format_info.copy()
+                    rtmp_format_info.update({
+                        'url': server_prefix,
+                        'play_path': xpath_text(asset, 'fileName'),
+                        'format_id': 'rtmp-%s' % asset_type,
+                    })
+                    formats.append(rtmp_format_info)
         self._sort_formats(formats)
         return formats
 
-    def _extract_thumbnails(self, variants):
+    def _extract_thumbnails(self, variants, base_url):
         thumbnails = [{
-            'url': self._BASE_URL + variant.find('url').text,
-            'width': int_or_none(variant.find('width').text),
-            'height': int_or_none(variant.find('height').text),
-        } for variant in variants.findall('variant')]
+            'url': base_url + xpath_text(variant, 'url'),
+            'width': int_or_none(xpath_text(variant, 'width')),
+            'height': int_or_none(xpath_text(variant, 'height')),
+        } for variant in variants.findall('variant') if xpath_text(variant, 'url')]
         thumbnails.sort(key=lambda x: x['width'] * x['height'], reverse=True)
         return thumbnails
diff --git a/youtube_dl/extractor/bravotv.py b/youtube_dl/extractor/bravotv.py
new file mode 100644 (file)
index 0000000..541c769
--- /dev/null
@@ -0,0 +1,31 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import smuggle_url
+
+
+class BravoTVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+videos/(?P<id>[^/?]+)'
+    _TEST = {
+        'url': 'http://www.bravotv.com/last-chance-kitchen/season-5/videos/lck-ep-12-fishy-finale',
+        'md5': 'd60cdf68904e854fac669bd26cccf801',
+        'info_dict': {
+            'id': 'LitrBdX64qLn',
+            'ext': 'mp4',
+            'title': 'Last Chance Kitchen Returns',
+            'description': 'S13: Last Chance Kitchen Returns for Top Chef Season 13',
+            'timestamp': 1448926740,
+            'upload_date': '20151130',
+            'uploader': 'NBCU-BRAV',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        account_pid = self._search_regex(r'"account_pid"\s*:\s*"([^"]+)"', webpage, 'account pid')
+        release_pid = self._search_regex(r'"release_pid"\s*:\s*"([^"]+)"', webpage, 'release pid')
+        return self.url_result(smuggle_url(
+            'http://link.theplatform.com/s/%s/%s?mbr=true&switch=progressive' % (account_pid, release_pid),
+            {'force_smil_url': True}), 'ThePlatform', release_pid)
index 809287d144ca7d629bf42bad7ac4e213a323e6dd..725859b4d2d554df91ff4793a2b3d245f02c8996 100644 (file)
@@ -11,13 +11,14 @@ from ..utils import (
 
 
 class BreakIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?break\.com/video/(?:[^/]+/)*.+-(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?break\.com/video/(?:[^/]+/)*.+-(?P<id>\d+)'
     _TESTS = [{
         'url': 'http://www.break.com/video/when-girls-act-like-guys-2468056',
         'info_dict': {
             'id': '2468056',
             'ext': 'mp4',
             'title': 'When Girls Act Like D-Bags',
+            'age_limit': 13,
         }
     }, {
         'url': 'http://www.break.com/video/ugc/baby-flex-2773063',
index 4721c22930f15cb51d0daaac294eeeca3a329092..f0781fc273a18ec30c1ffa97546232d991ad8574 100644 (file)
@@ -3,31 +3,36 @@ from __future__ import unicode_literals
 
 import re
 import json
-import xml.etree.ElementTree
 
 from .common import InfoExtractor
 from ..compat import (
+    compat_etree_fromstring,
     compat_parse_qs,
     compat_str,
-    compat_urllib_parse,
     compat_urllib_parse_urlparse,
-    compat_urllib_request,
     compat_urlparse,
     compat_xml_parse_error,
+    compat_HTTPError,
 )
 from ..utils import (
     determine_ext,
     ExtractorError,
     find_xpath_attr,
     fix_xml_ampersands,
+    float_or_none,
+    js_to_json,
+    int_or_none,
+    parse_iso8601,
     unescapeHTML,
     unsmuggle_url,
+    update_url_query,
 )
 
 
-class BrightcoveIE(InfoExtractor):
+class BrightcoveLegacyIE(InfoExtractor):
+    IE_NAME = 'brightcove:legacy'
     _VALID_URL = r'(?:https?://.*brightcove\.com/(services|viewer).*?\?|brightcove:)(?P<query>.*)'
-    _FEDERATED_URL_TEMPLATE = 'http://c.brightcove.com/services/viewer/htmlFederated?%s'
+    _FEDERATED_URL = 'http://c.brightcove.com/services/viewer/htmlFederated'
 
     _TESTS = [
         {
@@ -41,6 +46,9 @@ class BrightcoveIE(InfoExtractor):
                 'title': 'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
                 'uploader': '8TV',
                 'description': 'md5:a950cc4285c43e44d763d036710cd9cd',
+                'timestamp': 1368213670,
+                'upload_date': '20130510',
+                'uploader_id': '1589608506001',
             }
         },
         {
@@ -52,6 +60,9 @@ class BrightcoveIE(InfoExtractor):
                 'title': 'JVMLS 2012: Arrays 2.0 - Opportunities and Challenges',
                 'description': 'John Rose speaks at the JVM Language Summit, August 1, 2012.',
                 'uploader': 'Oracle',
+                'timestamp': 1344975024,
+                'upload_date': '20120814',
+                'uploader_id': '1460825906',
             },
         },
         {
@@ -63,6 +74,9 @@ class BrightcoveIE(InfoExtractor):
                 'title': 'This Bracelet Acts as a Personal Thermostat',
                 'description': 'md5:547b78c64f4112766ccf4e151c20b6a0',
                 'uploader': 'Mashable',
+                'timestamp': 1382041798,
+                'upload_date': '20131017',
+                'uploader_id': '1130468786001',
             },
         },
         {
@@ -80,14 +94,17 @@ class BrightcoveIE(InfoExtractor):
         {
             # test flv videos served by akamaihd.net
             # From http://www.redbull.com/en/bike/stories/1331655643987/replay-uci-dh-world-cup-2014-from-fort-william
-            'url': 'http://c.brightcove.com/services/viewer/htmlFederated?%40videoPlayer=ref%3ABC2996102916001&linkBaseURL=http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fvideos%2F1331655630249%2Freplay-uci-fort-william-2014-dh&playerKey=AQ%7E%7E%2CAAAApYJ7UqE%7E%2Cxqr_zXk0I-zzNndy8NlHogrCb5QdyZRf&playerID=1398061561001#__youtubedl_smuggle=%7B%22Referer%22%3A+%22http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fstories%2F1331655643987%2Freplay-uci-dh-world-cup-2014-from-fort-william%22%7D',
+            'url': 'http://c.brightcove.com/services/viewer/htmlFederated?%40videoPlayer=ref%3Aevent-stream-356&linkBaseURL=http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fvideos%2F1331655630249%2Freplay-uci-fort-william-2014-dh&playerKey=AQ%7E%7E%2CAAAApYJ7UqE%7E%2Cxqr_zXk0I-zzNndy8NlHogrCb5QdyZRf&playerID=1398061561001#__youtubedl_smuggle=%7B%22Referer%22%3A+%22http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fstories%2F1331655643987%2Freplay-uci-dh-world-cup-2014-from-fort-william%22%7D',
             # The md5 checksum changes on each download
             'info_dict': {
-                'id': '2996102916001',
+                'id': '3750436379001',
                 'ext': 'flv',
                 'title': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals',
-                'uploader': 'Red Bull TV',
+                'uploader': 'RBTV Old (do not use)',
                 'description': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals',
+                'timestamp': 1409122195,
+                'upload_date': '20140827',
+                'uploader_id': '710858724001',
             },
         },
         {
@@ -101,6 +118,12 @@ class BrightcoveIE(InfoExtractor):
             'playlist_mincount': 7,
         },
     ]
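+    # Maps numeric FLVFullCodec values returned by the API to codec names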
+    FLV_VCODECS = {
+        1: 'SORENSON',
+        2: 'ON2',
+        3: 'H264',
+        4: 'VP8',
+    }
 
     @classmethod
     def _build_brighcove_url(cls, object_str):
@@ -119,7 +142,7 @@ class BrightcoveIE(InfoExtractor):
         object_str = fix_xml_ampersands(object_str)
 
         try:
-            object_doc = xml.etree.ElementTree.fromstring(object_str.encode('utf-8'))
+            object_doc = compat_etree_fromstring(object_str.encode('utf-8'))
         except compat_xml_parse_error:
             return
 
@@ -131,13 +154,16 @@ class BrightcoveIE(InfoExtractor):
         else:
             flashvars = {}
 
+        data_url = object_doc.attrib.get('data', '')
+        data_url_params = compat_parse_qs(compat_urllib_parse_urlparse(data_url).query)
+
         def find_param(name):
             if name in flashvars:
                 return flashvars[name]
             node = find_xpath_attr(object_doc, './param', 'name', name)
             if node is not None:
                 return node.attrib['value']
-            return None
+            return data_url_params.get(name)
 
         params = {}
 
@@ -150,8 +176,8 @@ class BrightcoveIE(InfoExtractor):
         # Not all pages define this value
         if playerKey is not None:
             params['playerKey'] = playerKey
-        # The three fields hold the id of the video
-        videoPlayer = find_param('@videoPlayer') or find_param('videoId') or find_param('videoID')
+        # These fields hold the id of the video
+        videoPlayer = find_param('@videoPlayer') or find_param('videoId') or find_param('videoID') or find_param('@videoList')
         if videoPlayer is not None:
             params['@videoPlayer'] = videoPlayer
         linkBase = find_param('linkBaseURL')
@@ -179,8 +205,7 @@ class BrightcoveIE(InfoExtractor):
 
     @classmethod
     def _make_brightcove_url(cls, params):
-        data = compat_urllib_parse.urlencode(params)
-        return cls._FEDERATED_URL_TEMPLATE % data
+        return update_url_query(cls._FEDERATED_URL, params)
 
     @classmethod
     def _extract_brightcove_url(cls, webpage):
@@ -234,7 +259,7 @@ class BrightcoveIE(InfoExtractor):
             # We set the original url as the default 'Referer' header
             referer = smuggled_data.get('Referer', url)
             return self._get_video_info(
-                videoPlayer[0], query_str, query, referer=referer)
+                videoPlayer[0], query, referer=referer)
         elif 'playerKey' in query:
             player_key = query['playerKey']
             return self._get_playlist_info(player_key[0])
@@ -243,15 +268,14 @@ class BrightcoveIE(InfoExtractor):
                 'Cannot find playerKey= variable. Did you forget quotes in a shell invocation?',
                 expected=True)
 
-    def _get_video_info(self, video_id, query_str, query, referer=None):
-        request_url = self._FEDERATED_URL_TEMPLATE % query_str
-        req = compat_urllib_request.Request(request_url)
+    def _get_video_info(self, video_id, query, referer=None):
+        headers = {}
         linkBase = query.get('linkBaseURL')
         if linkBase is not None:
             referer = linkBase[0]
         if referer is not None:
-            req.add_header('Referer', referer)
-        webpage = self._download_webpage(req, video_id)
+            headers['Referer'] = referer
+        webpage = self._download_webpage(self._FEDERATED_URL, video_id, headers=headers, query=query)
 
         error_msg = self._html_search_regex(
             r"<h1>We're sorry.</h1>([\s\n]*<p>.*?</p>)+", webpage,
@@ -283,15 +307,19 @@ class BrightcoveIE(InfoExtractor):
                                     playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
 
     def _extract_video_info(self, video_info):
+        publisher_id = video_info.get('publisherId')
         info = {
             'id': compat_str(video_info['id']),
             'title': video_info['displayName'].strip(),
             'description': video_info.get('shortDescription'),
             'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
             'uploader': video_info.get('publisherName'),
+            'uploader_id': compat_str(publisher_id) if publisher_id else None,
+            'duration': float_or_none(video_info.get('length'), 1000),
+            'timestamp': int_or_none(video_info.get('creationDate'), 1000),
         }
 
-        renditions = video_info.get('renditions')
+        renditions = video_info.get('renditions', []) + video_info.get('IOSRenditions', [])
         if renditions:
             formats = []
             for rend in renditions:
@@ -312,19 +340,42 @@ class BrightcoveIE(InfoExtractor):
                         ext = 'flv'
                 if ext is None:
                     ext = determine_ext(url)
-                size = rend.get('size')
-                formats.append({
+                tbr = int_or_none(rend.get('encodingRate'), 1000)
+                a_format = {
+                    'format_id': 'http%s' % ('-%s' % tbr if tbr else ''),
                     'url': url,
                     'ext': ext,
-                    'height': rend.get('frameHeight'),
-                    'width': rend.get('frameWidth'),
-                    'filesize': size if size != 0 else None,
-                })
+                    'filesize': int_or_none(rend.get('size')) or None,
+                    'tbr': tbr,
+                }
+                if rend.get('audioOnly'):
+                    a_format.update({
+                        'vcodec': 'none',
+                    })
+                else:
+                    a_format.update({
+                        'height': int_or_none(rend.get('frameHeight')),
+                        'width': int_or_none(rend.get('frameWidth')),
+                        'vcodec': rend.get('videoCodec'),
+                    })
+
+                # m3u8 manifests with remote == false are media playlists
+                # Not calling _extract_m3u8_formats here to save network traffic
+                if ext == 'm3u8':
+                    a_format.update({
+                        'format_id': 'hls%s' % ('-%s' % tbr if tbr else ''),
+                        'ext': 'mp4',
+                        'protocol': 'm3u8',
+                    })
+
+                formats.append(a_format)
             self._sort_formats(formats)
             info['formats'] = formats
         elif video_info.get('FLVFullLengthURL') is not None:
             info.update({
                 'url': video_info['FLVFullLengthURL'],
+                'vcodec': self.FLV_VCODECS.get(video_info.get('FLVFullCodec')),
+                'filesize': int_or_none(video_info.get('FLVFullSize')),
             })
 
         if self._downloader.params.get('include_ads', False):
@@ -346,3 +397,205 @@ class BrightcoveIE(InfoExtractor):
         if 'url' not in info and not info.get('formats'):
             raise ExtractorError('Unable to extract video url for %s' % info['id'])
         return info
+
+
+class BrightcoveNewIE(InfoExtractor):
+    IE_NAME = 'brightcove:new'
+    _VALID_URL = r'https?://players\.brightcove\.net/(?P<account_id>\d+)/(?P<player_id>[^/]+)_(?P<embed>[^/]+)/index\.html\?.*videoId=(?P<video_id>\d+|ref:[^&]+)'
+    _TESTS = [{
+        'url': 'http://players.brightcove.net/929656772001/e41d32dc-ec74-459e-a845-6c69f7b724ea_default/index.html?videoId=4463358922001',
+        'md5': 'c8100925723840d4b0d243f7025703be',
+        'info_dict': {
+            'id': '4463358922001',
+            'ext': 'mp4',
+            'title': 'Meet the man behind Popcorn Time',
+            'description': 'md5:eac376a4fe366edc70279bfb681aea16',
+            'duration': 165.768,
+            'timestamp': 1441391203,
+            'upload_date': '20150904',
+            'uploader_id': '929656772001',
+            'formats': 'mincount:22',
+        },
+    }, {
+        # with rtmp streams
+        'url': 'http://players.brightcove.net/4036320279001/5d112ed9-283f-485f-a7f9-33f42e8bc042_default/index.html?videoId=4279049078001',
+        'info_dict': {
+            'id': '4279049078001',
+            'ext': 'mp4',
+            'title': 'Titansgrave: Chapter 0',
+            'description': 'Titansgrave: Chapter 0',
+            'duration': 1242.058,
+            'timestamp': 1433556729,
+            'upload_date': '20150606',
+            'uploader_id': '4036320279001',
+            'formats': 'mincount:41',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }, {
+        # ref: prefixed video id
+        'url': 'http://players.brightcove.net/3910869709001/21519b5c-4b3b-4363-accb-bdc8f358f823_default/index.html?videoId=ref:7069442',
+        'only_matching': True,
+    }, {
+        # non numeric ref: prefixed video id
+        'url': 'http://players.brightcove.net/710858724001/default_default/index.html?videoId=ref:event-stream-356',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_url(webpage):
+        urls = BrightcoveNewIE._extract_urls(webpage)
+        return urls[0] if urls else None
+
+    @staticmethod
+    def _extract_urls(webpage):
+        # Reference:
+        # 1. http://docs.brightcove.com/en/video-cloud/brightcove-player/guides/publish-video.html#setvideoiniframe
+        # 2. http://docs.brightcove.com/en/video-cloud/brightcove-player/guides/publish-video.html#setvideousingjavascript
+        # 3. http://docs.brightcove.com/en/video-cloud/brightcove-player/guides/embed-in-page.html
+        # 4. https://support.brightcove.com/en/video-cloud/docs/dynamically-assigning-videos-player
+
+        entries = []
+
+        # Look for iframe embeds [1]
+        for _, url in re.findall(
+                r'<iframe[^>]+src=(["\'])((?:https?:)?//players\.brightcove\.net/\d+/[^/]+/index\.html.+?)\1', webpage):
+            entries.append(url if url.startswith('http') else 'http:' + url)
+
+        # Look for embed_in_page embeds [2]
+        for video_id, account_id, player_id, embed in re.findall(
+                # According to examples from [3], it's unclear whether the video
+                # id may be optional and what to do when it is missing
+                # According to [4] data-video-id may be prefixed with ref:
+                r'''(?sx)
+                    <video[^>]+
+                        data-video-id=["\'](\d+|ref:[^"\']+)["\'][^>]*>.*?
+                    </video>.*?
+                    <script[^>]+
+                        src=["\'](?:https?:)?//players\.brightcove\.net/
+                        (\d+)/([^/]+)_([^/]+)/index(?:\.min)?\.js
+                ''', webpage):
+            entries.append(
+                'http://players.brightcove.net/%s/%s_%s/index.html?videoId=%s'
+                % (account_id, player_id, embed, video_id))
+
+        return entries
+
+    def _real_extract(self, url):
+        account_id, player_id, embed, video_id = re.match(self._VALID_URL, url).groups()
+
+        webpage = self._download_webpage(
+            'http://players.brightcove.net/%s/%s_%s/index.min.js'
+            % (account_id, player_id, embed), video_id)
+
+        policy_key = None
+
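+        # Prefer the policy key embedded in the player's catalog({...}) call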
+        catalog = self._search_regex(
+            r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
+        if catalog:
+            catalog = self._parse_json(
+                js_to_json(catalog), video_id, fatal=False)
+            if catalog:
+                policy_key = catalog.get('policyKey')
+
+        if not policy_key:
+            policy_key = self._search_regex(
+                r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
+                webpage, 'policy key', group='pk')
+
+        api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/videos/%s' % (account_id, video_id)
+        try:
+            json_data = self._download_json(api_url, video_id, headers={
+                'Accept': 'application/json;pk=%s' % policy_key
+            })
+        except ExtractorError as e:
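+            # On HTTP 403 the Playback API returns a JSON body; surface its error message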
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+                json_data = self._parse_json(e.cause.read().decode(), video_id)
+                raise ExtractorError(json_data[0]['message'], expected=True)
+            raise
+
+        title = json_data['name'].strip()
+
+        formats = []
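+        # sources may mix HLS, DASH, progressive HTTP and RTMP delivery methods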
+        for source in json_data.get('sources', []):
+            container = source.get('container')
+            source_type = source.get('type')
+            src = source.get('src')
+            if source_type == 'application/x-mpegURL' or container == 'M2TS':
+                if not src:
+                    continue
+                formats.extend(self._extract_m3u8_formats(
+                    src, video_id, 'mp4', m3u8_id='hls', fatal=False))
+            elif source_type == 'application/dash+xml':
+                if not src:
+                    continue
+                formats.extend(self._extract_mpd_formats(src, video_id, 'dash', fatal=False))
+            else:
+                streaming_src = source.get('streaming_src')
+                stream_name, app_name = source.get('stream_name'), source.get('app_name')
+                if not src and not streaming_src and (not stream_name or not app_name):
+                    continue
+                tbr = float_or_none(source.get('avg_bitrate'), 1000)
+                height = int_or_none(source.get('height'))
+                width = int_or_none(source.get('width'))
+                f = {
+                    'tbr': tbr,
+                    'filesize': int_or_none(source.get('size')),
+                    'container': container,
+                    'ext': container.lower() if container else None,
+                }
+                if width == 0 and height == 0:
+                    f.update({
+                        'vcodec': 'none',
+                    })
+                else:
+                    f.update({
+                        'width': width,
+                        'height': height,
+                        'vcodec': source.get('codec'),
+                    })
+
+                def build_format_id(kind):
+                    format_id = kind
+                    if tbr:
+                        format_id += '-%dk' % int(tbr)
+                    if height:
+                        format_id += '-%dp' % height
+                    return format_id
+
+                if src or streaming_src:
+                    f.update({
+                        'url': src or streaming_src,
+                        'format_id': build_format_id('http' if src else 'http-streaming'),
+                        'source_preference': 0 if src else -1,
+                    })
+                else:
+                    f.update({
+                        'url': app_name,
+                        'play_path': stream_name,
+                        'format_id': build_format_id('rtmp'),
+                    })
+                formats.append(f)
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for text_track in json_data.get('text_tracks', []):
+            if text_track.get('src'):
+                subtitles.setdefault(text_track.get('srclang'), []).append({
+                    'url': text_track['src'],
+                })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': json_data.get('description'),
+            'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
+            'duration': float_or_none(json_data.get('duration'), 1000),
+            'timestamp': parse_iso8601(json_data.get('published_at')),
+            'uploader_id': account_id,
+            'formats': formats,
+            'subtitles': subtitles,
+            'tags': json_data.get('tags', []),
+        }
index 3b2de517e53da39e06912ce1a97c4aafe7fa250e..dda98059e9041c651de5a211fccb2c106b11bb75 100644 (file)
@@ -14,9 +14,10 @@ class BYUtvIE(InfoExtractor):
         'info_dict': {
             'id': 'studio-c-season-5-episode-5',
             'ext': 'mp4',
-            'description': 'md5:5438d33774b6bdc662f9485a340401cc',
+            'description': 'md5:e07269172baff037f8e8bf9956bc9747',
             'title': 'Season 5 Episode 5',
-            'thumbnail': 're:^https?://.*\.jpg$'
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'duration': 1486.486,
         },
         'params': {
             'skip_download': True,
index cb96c3876b7cbf02220d06ad86a44414d69c9fa8..cac8fdcba4a9967b9aa028c29bac1d539af458ca 100644 (file)
@@ -4,12 +4,13 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
+from ..utils import js_to_json
 
 
 class C56IE(InfoExtractor):
     _VALID_URL = r'https?://(?:(?:www|player)\.)?56\.com/(?:.+?/)?(?:v_|(?:play_album.+-))(?P<textid>.+?)\.(?:html|swf)'
     IE_NAME = '56.com'
-    _TEST = {
+    _TESTS = [{
         'url': 'http://www.56.com/u39/v_OTM0NDA3MTY.html',
         'md5': 'e59995ac63d0457783ea05f93f12a866',
         'info_dict': {
@@ -18,12 +19,29 @@ class C56IE(InfoExtractor):
             'title': '网事知多少 第32期:车怒',
             'duration': 283.813,
         },
-    }
+    }, {
+        'url': 'http://www.56.com/u47/v_MTM5NjQ5ODc2.html',
+        'md5': '',
+        'info_dict': {
+            'id': '82247482',
+            'title': '爱的诅咒之杜鹃花开',
+        },
+        'playlist_count': 7,
+        'add_ie': ['Sohu'],
+    }]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url, flags=re.VERBOSE)
         text_id = mobj.group('textid')
 
+        webpage = self._download_webpage(url, text_id)
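+        # Some 56.com pages merely embed a Sohu video; delegate those to the Sohu extractor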
+        sohu_video_info_str = self._search_regex(
+            r'var\s+sohuVideoInfo\s*=\s*({[^}]+});', webpage, 'Sohu video info', default=None)
+        if sohu_video_info_str:
+            sohu_video_info = self._parse_json(
+                sohu_video_info_str, text_id, transform_source=js_to_json)
+            return self.url_result(sohu_video_info['url'], 'Sohu')
+
         page = self._download_json(
             'http://vxml.56.com/json/%s/' % text_id, text_id, 'Downloading video info')
 
index 897f3a104ce2d31aeac99e98197557ef502faf18..6ffbeabd371fd6f80a9ead1d23762f760a13ba2f 100644 (file)
@@ -6,7 +6,7 @@ import re
 
 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse,
+    compat_urllib_parse_urlencode,
     compat_urlparse,
 )
 from ..utils import (
@@ -16,7 +16,7 @@ from ..utils import (
 
 
 class CamdemyIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?camdemy\.com/media/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?camdemy\.com/media/(?P<id>\d+)'
     _TESTS = [{
         # single file
         'url': 'http://www.camdemy.com/media/5181/',
@@ -104,7 +104,7 @@ class CamdemyIE(InfoExtractor):
 
 
 class CamdemyFolderIE(InfoExtractor):
-    _VALID_URL = r'http://www.camdemy.com/folder/(?P<id>\d+)'
+    _VALID_URL = r'https?://www.camdemy.com/folder/(?P<id>\d+)'
     _TESTS = [{
         # links with trailing slash
         'url': 'http://www.camdemy.com/folder/450',
@@ -139,7 +139,7 @@ class CamdemyFolderIE(InfoExtractor):
         parsed_url = list(compat_urlparse.urlparse(url))
         query = dict(compat_urlparse.parse_qsl(parsed_url[4]))
         query.update({'displayMode': 'list'})
-        parsed_url[4] = compat_urllib_parse.urlencode(query)
+        parsed_url[4] = compat_urllib_parse_urlencode(query)
         final_url = compat_urlparse.urlunparse(parsed_url)
 
         page = self._download_webpage(final_url, folder_id)
diff --git a/youtube_dl/extractor/camwithher.py b/youtube_dl/extractor/camwithher.py
new file mode 100644 (file)
index 0000000..afbc5ea
--- /dev/null
@@ -0,0 +1,87 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_duration,
+    unified_strdate,
+)
+
+
+class CamWithHerIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?camwithher\.tv/view_video\.php\?.*\bviewkey=(?P<id>\w+)'
+
+    _TESTS = [{
+        'url': 'http://camwithher.tv/view_video.php?viewkey=6e9a24e2c0e842e1f177&page=&viewtype=&category=',
+        'info_dict': {
+            'id': '5644',
+            'ext': 'flv',
+            'title': 'Periscope Tease',
+            'description': 'In the clouds teasing on periscope to my favorite song',
+            'duration': 240,
+            'view_count': int,
+            'comment_count': int,
+            'uploader': 'MileenaK',
+            'upload_date': '20160322',
+        },
+        'params': {
+            'skip_download': True,
+        }
+    }, {
+        'url': 'http://camwithher.tv/view_video.php?viewkey=6dfd8b7c97531a459937',
+        'only_matching': True,
+    }, {
+        'url': 'http://camwithher.tv/view_video.php?page=&viewkey=6e9a24e2c0e842e1f177&viewtype=&category=',
+        'only_matching': True,
+    }, {
+        'url': 'http://camwithher.tv/view_video.php?viewkey=b6c3b5bea9515d1a1fc4&page=&viewtype=&category=mv',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        flv_id = self._html_search_regex(
+            r'<a[^>]+href=["\']/download/\?v=(\d+)', webpage, 'video id')
+
+        # Video URL construction algorithm is reverse-engineered from cwhplayer.swf
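+        # IDs above 2010 refer to MP4 files and need the mp4: play path prefix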
+        rtmp_url = 'rtmp://camwithher.tv/clipshare/%s' % (
+            ('mp4:%s.mp4' % flv_id) if int(flv_id) > 2010 else flv_id)
+
+        title = self._html_search_regex(
+            r'<div[^>]+style="float:left"[^>]*>\s*<h2>(.+?)</h2>', webpage, 'title')
+        description = self._html_search_regex(
+            r'>Description:</span>(.+?)</div>', webpage, 'description', default=None)
+
+        runtime = self._search_regex(
+            r'Runtime\s*:\s*(.+?) \|', webpage, 'duration', default=None)
+        if runtime:
+            runtime = re.sub(r'[\s-]', '', runtime)
+        duration = parse_duration(runtime)
+        view_count = int_or_none(self._search_regex(
+            r'Views\s*:\s*(\d+)', webpage, 'view count', default=None))
+        comment_count = int_or_none(self._search_regex(
+            r'Comments\s*:\s*(\d+)', webpage, 'comment count', default=None))
+
+        uploader = self._search_regex(
+            r'Added by\s*:\s*<a[^>]+>([^<]+)</a>', webpage, 'uploader', default=None)
+        upload_date = unified_strdate(self._search_regex(
+            r'Added on\s*:\s*([\d-]+)', webpage, 'upload date', default=None))
+
+        return {
+            'id': flv_id,
+            'url': rtmp_url,
+            'ext': 'flv',
+            'no_resume': True,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'view_count': view_count,
+            'comment_count': comment_count,
+            'uploader': uploader,
+            'upload_date': upload_date,
+        }
diff --git a/youtube_dl/extractor/canal13cl.py b/youtube_dl/extractor/canal13cl.py
deleted file mode 100644 (file)
index 93241fe..0000000
+++ /dev/null
@@ -1,48 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-
-
-class Canal13clIE(InfoExtractor):
-    _VALID_URL = r'^http://(?:www\.)?13\.cl/(?:[^/?#]+/)*(?P<id>[^/?#]+)'
-    _TEST = {
-        'url': 'http://www.13.cl/t13/nacional/el-circulo-de-hierro-de-michelle-bachelet-en-su-regreso-a-la-moneda',
-        'md5': '4cb1fa38adcad8fea88487a078831755',
-        'info_dict': {
-            'id': '1403022125',
-            'display_id': 'el-circulo-de-hierro-de-michelle-bachelet-en-su-regreso-a-la-moneda',
-            'ext': 'mp4',
-            'title': 'El "círculo de hierro" de Michelle Bachelet en su regreso a La Moneda',
-            'description': '(Foto: Agencia Uno) En nueve días más, Michelle Bachelet va a asumir por segunda vez como presidenta de la República. Entre aquellos que la acompañarán hay caras que se repiten y otras que se consolidan en su entorno de colaboradores más cercanos.',
-        }
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        display_id = mobj.group('id')
-
-        webpage = self._download_webpage(url, display_id)
-
-        title = self._html_search_meta(
-            'twitter:title', webpage, 'title', fatal=True)
-        description = self._html_search_meta(
-            'twitter:description', webpage, 'description')
-        url = self._html_search_regex(
-            r'articuloVideo = \"(.*?)\"', webpage, 'url')
-        real_id = self._search_regex(
-            r'[^0-9]([0-9]{7,})[^0-9]', url, 'id', default=display_id)
-        thumbnail = self._html_search_regex(
-            r'articuloImagen = \"(.*?)\"', webpage, 'thumbnail')
-
-        return {
-            'id': real_id,
-            'display_id': display_id,
-            'url': url,
-            'title': title,
-            'description': description,
-            'ext': 'mp4',
-            'thumbnail': thumbnail,
-        }
index c4fefefe43b250c13c3a711cf397e1a3caa046a7..f1f128c45ae3df37f7fa57a533af3aa3bade76ba 100644 (file)
@@ -4,38 +4,65 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
+from ..utils import parse_duration
 
 
 class Canalc2IE(InfoExtractor):
     IE_NAME = 'canalc2.tv'
-    _VALID_URL = r'http://.*?\.canalc2\.tv/video\.asp\?.*?idVideo=(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:(?:www\.)?canalc2\.tv/video/|archives-canalc2\.u-strasbg\.fr/video\.asp\?.*\bidVideo=)(?P<id>\d+)'
 
-    _TEST = {
-        'url': 'http://www.canalc2.tv/video.asp?idVideo=12163&voir=oui',
+    _TESTS = [{
+        'url': 'http://www.canalc2.tv/video/12163',
         'md5': '060158428b650f896c542dfbb3d6487f',
         'info_dict': {
             'id': '12163',
-            'ext': 'mp4',
-            'title': 'Terrasses du Numérique'
+            'ext': 'flv',
+            'title': 'Terrasses du Numérique',
+            'duration': 122,
+        },
+        'params': {
+            'skip_download': True,  # Requires rtmpdump
         }
-    }
+    }, {
+        'url': 'http://archives-canalc2.u-strasbg.fr/video.asp?idVideo=11427&voir=oui',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
-        video_id = re.match(self._VALID_URL, url).group('id')
-        # We need to set the voir field for getting the file name
-        url = 'http://www.canalc2.tv/video.asp?idVideo=%s&voir=oui' % video_id
-        webpage = self._download_webpage(url, video_id)
-        file_name = self._search_regex(
-            r"so\.addVariable\('file','(.*?)'\);",
-            webpage, 'file name')
-        video_url = 'http://vod-flash.u-strasbg.fr:8080/' + file_name
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(
+            'http://www.canalc2.tv/video/%s' % video_id, video_id)
+
+        formats = []
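+        # The player page may define both RTMP and plain HTTP sources via file = '...' assignments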
+        for _, video_url in re.findall(r'file\s*=\s*(["\'])(.+?)\1', webpage):
+            if video_url.startswith('rtmp://'):
+                rtmp = re.search(
+                    r'^(?P<url>rtmp://[^/]+/(?P<app>.+/))(?P<play_path>mp4:.+)$', video_url)
+                formats.append({
+                    'url': rtmp.group('url'),
+                    'format_id': 'rtmp',
+                    'ext': 'flv',
+                    'app': rtmp.group('app'),
+                    'play_path': rtmp.group('play_path'),
+                    'page_url': url,
+                })
+            else:
+                formats.append({
+                    'url': video_url,
+                    'format_id': 'http',
+                })
+        self._sort_formats(formats)
 
         title = self._html_search_regex(
-            r'class="evenement8">(.*?)</a>', webpage, 'title')
+            r'(?s)class="[^"]*col_description[^"]*">.*?<h3>(.*?)</h3>', webpage, 'title')
+        duration = parse_duration(self._search_regex(
+            r'id=["\']video_duree["\'][^>]*>([^<]+)',
+            webpage, 'duration', fatal=False))
 
         return {
             'id': video_id,
-            'ext': 'mp4',
-            'url': video_url,
             'title': title,
+            'duration': duration,
+            'formats': formats,
         }
index 57e0cda2c4aafdf65cd070c24f7503af9f85adab..25b2d4efe5d54e1c3264f906a3105ad05dd2ca3f 100644 (file)
@@ -10,13 +10,14 @@ from ..utils import (
     unified_strdate,
     url_basename,
     qualities,
+    int_or_none,
 )
 
 
 class CanalplusIE(InfoExtractor):
     IE_DESC = 'canalplus.fr, piwiplus.fr and d8.tv'
     _VALID_URL = r'https?://(?:www\.(?P<site>canalplus\.fr|piwiplus\.fr|d8\.tv|itele\.fr)/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>[0-9]+))'
-    _VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s'
+    _VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s?format=json'
     _SITE_ID_MAP = {
         'canalplus.fr': 'cplus',
         'piwiplus.fr': 'teletoon',
@@ -26,10 +27,10 @@ class CanalplusIE(InfoExtractor):
 
     _TESTS = [{
         'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1263092',
-        'md5': 'b3481d7ca972f61e37420798d0a9d934',
+        'md5': '12164a6f14ff6df8bd628e8ba9b10b78',
         'info_dict': {
             'id': '1263092',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Le Zapping - 13/05/15',
             'description': 'md5:09738c0d06be4b5d06a0940edb0da73f',
             'upload_date': '20150513',
@@ -56,10 +57,10 @@ class CanalplusIE(InfoExtractor):
         'skip': 'videos get deleted after a while',
     }, {
         'url': 'http://www.itele.fr/france/video/aubervilliers-un-lycee-en-colere-111559',
-        'md5': 'f3a46edcdf28006598ffaf5b30e6a2d4',
+        'md5': '38b8f7934def74f0d6f3ba6c036a5f82',
         'info_dict': {
             'id': '1213714',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Aubervilliers : un lycée en colère - Le 11/02/2015 à 06h45',
             'description': 'md5:8216206ec53426ea6321321f3b3c16db',
             'upload_date': '20150211',
@@ -78,18 +79,20 @@ class CanalplusIE(InfoExtractor):
         if video_id is None:
             webpage = self._download_webpage(url, display_id)
             video_id = self._search_regex(
-                r'<canal:player[^>]+?videoId="(\d+)"', webpage, 'video id')
+                [r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)', r'id=["\']canal_video_player(?P<id>\d+)'],
+                webpage, 'video id', group='id')
 
         info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id)
-        doc = self._download_xml(info_url, video_id, 'Downloading video XML')
+        video_data = self._download_json(info_url, video_id, 'Downloading video JSON')
 
-        video_info = [video for video in doc if video.find('ID').text == video_id][0]
-        media = video_info.find('MEDIA')
-        infos = video_info.find('INFOS')
+        if isinstance(video_data, list):
+            video_data = [video for video in video_data if video.get('ID') == video_id][0]
+        media = video_data['MEDIA']
+        infos = video_data['INFOS']
 
-        preference = qualities(['MOBILE', 'BAS_DEBIT', 'HAUT_DEBIT', 'HD', 'HLS', 'HDS'])
+        preference = qualities(['MOBILE', 'BAS_DEBIT', 'HAUT_DEBIT', 'HD'])
 
-        fmt_url = next(iter(media.find('VIDEOS'))).text
+        fmt_url = next(iter(media.get('VIDEOS')))
         if '/geo' in fmt_url.lower():
             response = self._request_webpage(
                 HEADRequest(fmt_url), video_id,
@@ -100,35 +103,42 @@ class CanalplusIE(InfoExtractor):
                     expected=True)
 
         formats = []
-        for fmt in media.find('VIDEOS'):
-            format_url = fmt.text
+        for format_id, format_url in media['VIDEOS'].items():
             if not format_url:
                 continue
-            format_id = fmt.tag
             if format_id == 'HLS':
                 formats.extend(self._extract_m3u8_formats(
-                    format_url, video_id, 'mp4', preference=preference(format_id)))
+                    format_url, video_id, 'mp4', 'm3u8_native', m3u8_id=format_id, fatal=False))
             elif format_id == 'HDS':
                 formats.extend(self._extract_f4m_formats(
-                    format_url + '?hdcore=2.11.3', video_id, preference=preference(format_id)))
+                    format_url + '?hdcore=2.11.3', video_id, f4m_id=format_id, fatal=False))
             else:
                 formats.append({
-                    'url': format_url,
+                    # the secret extracted from the ya function in http://player.canalplus.fr/common/js/canalPlayer.js
+                    'url': format_url + '?secret=pqzerjlsmdkjfoiuerhsdlfknaes',
                     'format_id': format_id,
                     'preference': preference(format_id),
                 })
         self._sort_formats(formats)
 
+        thumbnails = [{
+            'id': image_id,
+            'url': image_url,
+        } for image_id, image_url in media.get('images', {}).items()]
+
+        titrage = infos['TITRAGE']
+
         return {
             'id': video_id,
             'display_id': display_id,
-            'title': '%s - %s' % (infos.find('TITRAGE/TITRE').text,
-                                  infos.find('TITRAGE/SOUS_TITRE').text),
-            'upload_date': unified_strdate(infos.find('PUBLICATION/DATE').text),
-            'thumbnail': media.find('IMAGES/GRAND').text,
-            'description': infos.find('DESCRIPTION').text,
-            'view_count': int(infos.find('NB_VUES').text),
-            'like_count': int(infos.find('NB_LIKES').text),
-            'comment_count': int(infos.find('NB_COMMENTS').text),
+            'title': '%s - %s' % (titrage['TITRE'],
+                                  titrage['SOUS_TITRE']),
+            'upload_date': unified_strdate(infos.get('PUBLICATION', {}).get('DATE')),
+            'thumbnails': thumbnails,
+            'description': infos.get('DESCRIPTION'),
+            'duration': int_or_none(infos.get('DURATION')),
+            'view_count': int_or_none(infos.get('NB_VUES')),
+            'like_count': int_or_none(infos.get('NB_LIKES')),
+            'comment_count': int_or_none(infos.get('NB_COMMENTS')),
             'formats': formats,
         }
diff --git a/youtube_dl/extractor/canvas.py b/youtube_dl/extractor/canvas.py
new file mode 100644 (file)
index 0000000..ec6d24d
--- /dev/null
@@ -0,0 +1,94 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import float_or_none
+
+
+class CanvasIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?canvas\.be/video/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week',
+        'md5': 'ea838375a547ac787d4064d8c7860a6c',
+        'info_dict': {
+            'id': 'mz-ast-5e5f90b6-2d72-4c40-82c2-e134f884e93e',
+            'display_id': 'de-afspraak-veilt-voor-de-warmste-week',
+            'ext': 'mp4',
+            'title': 'De afspraak veilt voor de Warmste Week',
+            'description': 'md5:24cb860c320dc2be7358e0e5aa317ba6',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'duration': 49.02,
+        }
+    }, {
+        # with subtitles
+        'url': 'http://www.canvas.be/video/panorama/2016/pieter-0167',
+        'info_dict': {
+            'id': 'mz-ast-5240ff21-2d30-4101-bba6-92b5ec67c625',
+            'display_id': 'pieter-0167',
+            'ext': 'mp4',
+            'title': 'Pieter 0167',
+            'description': 'md5:943cd30f48a5d29ba02c3a104dc4ec4e',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'duration': 2553.08,
+            'subtitles': {
+                'nl': [{
+                    'ext': 'vtt',
+                }],
+            },
+        },
+        'params': {
+            'skip_download': True,
+        }
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        title = self._search_regex(
+            r'<h1[^>]+class="video__body__header__title"[^>]*>(.+?)</h1>',
+            webpage, 'title', default=None) or self._og_search_title(webpage)
+
+        video_id = self._html_search_regex(
+            r'data-video=(["\'])(?P<id>.+?)\1', webpage, 'video id', group='id')
+
+        data = self._download_json(
+            'https://mediazone.vrt.be/api/v1/canvas/assets/%s' % video_id, display_id)
+
+        formats = []
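+        # Each targetUrl entry describes one delivery method (HLS, HDS or progressive HTTP)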
+        for target in data['targetUrls']:
+            format_url, format_type = target.get('url'), target.get('type')
+            if not format_url or not format_type:
+                continue
+            if format_type == 'HLS':
+                formats.extend(self._extract_m3u8_formats(
+                    format_url, display_id, entry_protocol='m3u8_native',
+                    ext='mp4', preference=0, fatal=False, m3u8_id=format_type))
+            elif format_type == 'HDS':
+                formats.extend(self._extract_f4m_formats(
+                    format_url, display_id, f4m_id=format_type, fatal=False))
+            else:
+                formats.append({
+                    'format_id': format_type,
+                    'url': format_url,
+                })
+        self._sort_formats(formats)
+
+        subtitles = {}
+        subtitle_urls = data.get('subtitleUrls')
+        if isinstance(subtitle_urls, list):
+            for subtitle in subtitle_urls:
+                subtitle_url = subtitle.get('url')
+                if subtitle_url and subtitle.get('type') == 'CLOSED':
+                    subtitles.setdefault('nl', []).append({'url': subtitle_url})
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'description': self._og_search_description(webpage),
+            'formats': formats,
+            'duration': float_or_none(data.get('duration'), 1000),
+            'thumbnail': data.get('posterImageUrl'),
+            'subtitles': subtitles,
+        }
diff --git a/youtube_dl/extractor/cbc.py b/youtube_dl/extractor/cbc.py
new file mode 100644 (file)
index 0000000..68a0633
--- /dev/null
@@ -0,0 +1,114 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import js_to_json
+
+
+class CBCIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?cbc\.ca/(?:[^/]+/)+(?P<id>[^/?#]+)'
+    _TESTS = [{
+        # with mediaId
+        'url': 'http://www.cbc.ca/22minutes/videos/clips-season-23/don-cherry-play-offs',
+        'info_dict': {
+            'id': '2682904050',
+            'ext': 'flv',
+            'title': 'Don Cherry – All-Stars',
+            'description': 'Don Cherry has a bee in his bonnet about AHL player John Scott because that guy’s got heart.',
+            'timestamp': 1454475540,
+            'upload_date': '20160203',
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }, {
+        # with clipId
+        'url': 'http://www.cbc.ca/archives/entry/1978-robin-williams-freestyles-on-90-minutes-live',
+        'info_dict': {
+            'id': '2487345465',
+            'ext': 'flv',
+            'title': 'Robin Williams freestyles on 90 Minutes Live',
+            'description': 'Wacky American comedian Robin Williams shows off his infamous "freestyle" comedic talents while being interviewed on CBC\'s 90 Minutes Live.',
+            'upload_date': '19700101',
+            'uploader': 'CBCC-NEW',
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }, {
+        # multiple iframes
+        'url': 'http://www.cbc.ca/natureofthings/blog/birds-eye-view-from-vancouvers-burrard-street-bridge-how-we-got-the-shot',
+        'playlist': [{
+            'info_dict': {
+                'id': '2680832926',
+                'ext': 'flv',
+                'title': 'An Eagle\'s-Eye View Off Burrard Bridge',
+                'description': 'Hercules the eagle flies from Vancouver\'s Burrard Bridge down to a nearby park with a mini-camera strapped to his back.',
+                'upload_date': '19700101',
+            },
+        }, {
+            'info_dict': {
+                'id': '2658915080',
+                'ext': 'flv',
+                'title': 'Fly like an eagle!',
+                'description': 'Eagle equipped with a mini camera flies from the world\'s tallest tower',
+                'upload_date': '19700101',
+            },
+        }],
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }]
+
+    @classmethod
+    def suitable(cls, url):
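+        # Defer to CBCPlayerIE for direct player URLs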
+        return False if CBCPlayerIE.suitable(url) else super(CBCIE, cls).suitable(url)
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        player_init = self._search_regex(
+            r'CBC\.APP\.Caffeine\.initInstance\(({.+?})\);', webpage, 'player init',
+            default=None)
+        if player_init:
+            player_info = self._parse_json(player_init, display_id, js_to_json)
+            media_id = player_info.get('mediaId')
+            if not media_id:
+                clip_id = player_info['clipId']
+                media_id = self._download_json(
+                    'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
+                    clip_id)['entries'][0]['id'].split('/')[-1]
+            return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
+        else:
+            entries = [self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)]
+            return self.playlist_result(entries)
+
+
+class CBCPlayerIE(InfoExtractor):
+    _VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/|i/caffeine/syndicate/\?mediaId=))(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://www.cbc.ca/player/play/2683190193',
+        'info_dict': {
+            'id': '2683190193',
+            'ext': 'flv',
+            'title': 'Gerry Runs a Sweat Shop',
+            'description': 'md5:b457e1c01e8ff408d9d801c1c2cd29b0',
+            'timestamp': 1455067800,
+            'upload_date': '20160210',
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        return self.url_result(
+            'http://feed.theplatform.com/f/ExhSPC/vms_5akSXx4Ng_Zn?byGuid=%s' % video_id,
+            'ThePlatformFeed', video_id)
index 75fffb1563ae9f95bf862ad156111b6962a8429e..051d783a23cc7c0b5858af0c24f63187181cd276 100644 (file)
@@ -1,20 +1,40 @@
 from __future__ import unicode_literals
 
-from .common import InfoExtractor
+from .theplatform import ThePlatformIE
+from ..utils import (
+    xpath_text,
+    xpath_element,
+    int_or_none,
+    find_xpath_attr,
+)
 
 
-class CBSIE(InfoExtractor):
+class CBSBaseIE(ThePlatformIE):
+    def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
+        closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
+        return {
+            'en': [{
+                'ext': 'ttml',
+                'url': closed_caption_e.attrib['value'],
+            }]
+        } if closed_caption_e is not None and closed_caption_e.attrib.get('value') else {}
+
+
+class CBSIE(CBSBaseIE):
     _VALID_URL = r'https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/(?:video|artist)|colbertlateshow\.com/(?:video|podcasts))/[^/]+/(?P<id>[^/]+)'
 
     _TESTS = [{
         'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
         'info_dict': {
-            'id': '4JUVEwq3wUT7',
+            'id': '_u7W953k6la293J7EPTd9oHkSPs6Xn6_',
             'display_id': 'connect-chat-feat-garth-brooks',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Connect Chat feat. Garth Brooks',
             'description': 'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!',
             'duration': 1495,
+            'timestamp': 1385585425,
+            'upload_date': '20131127',
+            'uploader': 'CBSI-NEW',
         },
         'params': {
             # rtmp download
@@ -43,16 +63,46 @@ class CBSIE(InfoExtractor):
         'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
         'only_matching': True,
     }]
+    TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'
 
     def _real_extract(self, url):
         display_id = self._match_id(url)
         webpage = self._download_webpage(url, display_id)
-        real_id = self._search_regex(
-            [r"video\.settings\.pid\s*=\s*'([^']+)';", r"cbsplayer\.pid\s*=\s*'([^']+)';"],
-            webpage, 'real video ID')
-        return {
-            '_type': 'url_transparent',
-            'ie_key': 'ThePlatform',
-            'url': 'theplatform:%s' % real_id,
+        content_id = self._search_regex(
+            [r"video\.settings\.content_id\s*=\s*'([^']+)';", r"cbsplayer\.contentId\s*=\s*'([^']+)';"],
+            webpage, 'content id')
+        items_data = self._download_xml(
+            'http://can.cbs.com/thunder/player/videoPlayerService.php',
+            content_id, query={'partner': 'cbs', 'contentId': content_id})
+        video_data = xpath_element(items_data, './/item')
+        title = xpath_text(video_data, 'videoTitle', 'title', True)
+
+        subtitles = {}
+        formats = []
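+        # Each <item> carries a ThePlatform pid; merge the formats and subtitles from every SMIL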
+        for item in items_data.findall('.//item'):
+            pid = xpath_text(item, 'pid')
+            if not pid:
+                continue
+            tp_release_url = self.TP_RELEASE_URL_TEMPLATE % pid
+            if '.m3u8' in xpath_text(item, 'contentUrl', default=''):
+                tp_release_url += '&manifest=m3u'
+            tp_formats, tp_subtitles = self._extract_theplatform_smil(
+                tp_release_url, content_id, 'Downloading %s SMIL data' % pid)
+            formats.extend(tp_formats)
+            subtitles = self._merge_subtitles(subtitles, tp_subtitles)
+        self._sort_formats(formats)
+
+        info = self.get_metadata('dJ5BDC/media/guid/2198311517/%s' % content_id, content_id)
+        info.update({
+            'id': content_id,
             'display_id': display_id,
-        }
+            'title': title,
+            'series': xpath_text(video_data, 'seriesTitle'),
+            'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
+            'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
+            'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
+            'thumbnail': xpath_text(video_data, 'previewImageURL'),
+            'formats': formats,
+            'subtitles': subtitles,
+        })
+        return info
diff --git a/youtube_dl/extractor/cbsinteractive.py b/youtube_dl/extractor/cbsinteractive.py
new file mode 100644 (file)
index 0000000..0011c30
--- /dev/null
@@ -0,0 +1,108 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .theplatform import ThePlatformIE
+from ..utils import int_or_none
+
+
+class CBSInteractiveIE(ThePlatformIE):
+    _VALID_URL = r'https?://(?:www\.)?(?P<site>cnet|zdnet)\.com/(?:videos|video/share)/(?P<id>[^/?]+)'
+    _TESTS = [{
+        'url': 'http://www.cnet.com/videos/hands-on-with-microsofts-windows-8-1-update/',
+        'info_dict': {
+            'id': '56f4ea68-bd21-4852-b08c-4de5b8354c60',
+            'ext': 'flv',
+            'title': 'Hands-on with Microsoft Windows 8.1 Update',
+            'description': 'The new update to the Windows 8 OS brings improved performance for mouse and keyboard users.',
+            'uploader_id': '6085384d-619e-11e3-b231-14feb5ca9861',
+            'uploader': 'Sarah Mitroff',
+            'duration': 70,
+            'timestamp': 1396479627,
+            'upload_date': '20140402',
+        },
+    }, {
+        'url': 'http://www.cnet.com/videos/whiny-pothole-tweets-at-local-government-when-hit-by-cars-tomorrow-daily-187/',
+        'info_dict': {
+            'id': '56527b93-d25d-44e3-b738-f989ce2e49ba',
+            'ext': 'flv',
+            'title': 'Whiny potholes tweet at local government when hit by cars (Tomorrow Daily 187)',
+            'description': 'Khail and Ashley wonder what other civic woes can be solved by self-tweeting objects, investigate a new kind of VR camera and watch an origami robot self-assemble, walk, climb, dig and dissolve. #TDPothole',
+            'uploader_id': 'b163284d-6b73-44fc-b3e6-3da66c392d40',
+            'uploader': 'Ashley Esqueda',
+            'duration': 1482,
+            'timestamp': 1433289889,
+            'upload_date': '20150603',
+        },
+    }, {
+        'url': 'http://www.zdnet.com/video/share/video-keeping-android-smartphones-and-tablets-secure/',
+        'info_dict': {
+            'id': 'bc1af9f0-a2b5-4e54-880d-0d95525781c0',
+            'ext': 'mp4',
+            'title': 'Video: Keeping Android smartphones and tablets secure',
+            'description': 'Here\'s the best way to keep Android devices secure, and what you do when they\'ve come to the end of their lives.',
+            'uploader_id': 'f2d97ea2-8175-11e2-9d12-0018fe8a00b0',
+            'uploader': 'Adrian Kingsley-Hughes',
+            'timestamp': 1448961720,
+            'upload_date': '20151201',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }]
+    TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/kYEXFC/%s?mbr=true'
+    MPX_ACCOUNTS = {
+        'cnet': 2288573011,
+        'zdnet': 2387448114,
+    }
+
+    def _real_extract(self, url):
+        site, display_id = re.match(self._VALID_URL, url).groups()
+        webpage = self._download_webpage(url, display_id)
+
+        data_json = self._html_search_regex(
+            r"data-(?:cnet|zdnet)-video(?:-uvp)?-options='([^']+)'",
+            webpage, 'data json')
+        data = self._parse_json(data_json, display_id)
+        vdata = data.get('video') or data['videos'][0]
+
+        video_id = vdata['id']
+        title = vdata['title']
+        author = vdata.get('author')
+        if author:
+            uploader = '%s %s' % (author['firstName'], author['lastName'])
+            uploader_id = author.get('id')
+        else:
+            uploader = None
+            uploader_id = None
+
+        media_guid_path = 'media/guid/%d/%s' % (self.MPX_ACCOUNTS[site], vdata['mpxRefId'])
+        formats, subtitles = [], {}
+        if site == 'cnet':
+            formats, subtitles = self._extract_theplatform_smil(
+                self.TP_RELEASE_URL_TEMPLATE % media_guid_path, video_id)
+        for (fkey, vid) in vdata['files'].items():
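+            # Skip the phone HLS rendition when a tablet one is also available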
+            if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']:
+                continue
+            release_url = self.TP_RELEASE_URL_TEMPLATE % vid
+            if fkey == 'hds':
+                release_url += '&manifest=f4m'
+            tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % fkey)
+            formats.extend(tp_formats)
+            subtitles = self._merge_subtitles(subtitles, tp_subtitles)
+        self._sort_formats(formats)
+
+        info = self.get_metadata('kYEXFC/%s' % media_guid_path, video_id)
+        info.update({
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'duration': int_or_none(vdata.get('duration')),
+            'uploader': uploader,
+            'uploader_id': uploader_id,
+            'subtitles': subtitles,
+            'formats': formats,
+        })
+        return info
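
A quick reference for how the release URL above is assembled: the template and
MPX account ids are the ones defined in the class; the ref id in the example is
made up.

    TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/kYEXFC/%s?mbr=true'
    MPX_ACCOUNTS = {'cnet': 2288573011, 'zdnet': 2387448114}

    def release_url(site, mpx_ref_id):
        # media GUID path: media/guid/<MPX account id>/<mpxRefId>
        media_guid_path = 'media/guid/%d/%s' % (MPX_ACCOUNTS[site], mpx_ref_id)
        return TP_RELEASE_URL_TEMPLATE % media_guid_path

    print(release_url('zdnet', 'abc123'))
    # http://link.theplatform.com/s/kYEXFC/media/guid/2387448114/abc123?mbr=true
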
index 52e61d85b3a20bc939771cee2b94188d32f16d17..79ddc20a09ca067922e25d315bd0fbdb03b0abf9 100644 (file)
@@ -1,15 +1,16 @@
 # encoding: utf-8
 from __future__ import unicode_literals
 
-import re
-import json
-
 from .common import InfoExtractor
+from .cbs import CBSBaseIE
+from ..utils import (
+    parse_duration,
+)
 
 
-class CBSNewsIE(InfoExtractor):
+class CBSNewsIE(CBSBaseIE):
     IE_DESC = 'CBS News'
-    _VALID_URL = r'http://(?:www\.)?cbsnews\.com/(?:[^/]+/)+(?P<id>[\da-z_-]+)'
+    _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
 
     _TESTS = [
         {
@@ -30,53 +31,48 @@ class CBSNewsIE(InfoExtractor):
             'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
             'info_dict': {
                 'id': 'fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack',
-                'ext': 'flv',
+                'ext': 'mp4',
                 'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
                 'thumbnail': 're:^https?://.*\.jpg$',
                 'duration': 205,
+                'subtitles': {
+                    'en': [{
+                        'ext': 'ttml',
+                    }],
+                },
             },
             'params': {
-                # rtmp download
+                # m3u8 download
                 'skip_download': True,
             },
         },
     ]
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
 
         webpage = self._download_webpage(url, video_id)
 
-        video_info = json.loads(self._html_search_regex(
+        video_info = self._parse_json(self._html_search_regex(
             r'(?:<ul class="media-list items" id="media-related-items"><li data-video-info|<div id="cbsNewsVideoPlayer" data-video-player-options)=\'({.+?})\'',
-            webpage, 'video JSON info'))
+            webpage, 'video JSON info'), video_id)
 
         item = video_info['item'] if 'item' in video_info else video_info
         title = item.get('articleTitle') or item.get('hed')
         duration = item.get('duration')
         thumbnail = item.get('mediaImage') or item.get('thumbnail')
 
+        subtitles = {}
         formats = []
         for format_id in ['RtmpMobileLow', 'RtmpMobileHigh', 'Hls', 'RtmpDesktop']:
-            uri = item.get('media' + format_id + 'URI')
-            if not uri:
+            pid = item.get('media' + format_id)
+            if not pid:
                 continue
-            fmt = {
-                'url': uri,
-                'format_id': format_id,
-            }
-            if uri.startswith('rtmp'):
-                fmt.update({
-                    'app': 'ondemand?auth=cbs',
-                    'play_path': 'mp4:' + uri.split('<break>')[-1],
-                    'player_url': 'http://www.cbsnews.com/[[IMPORT]]/vidtech.cbsinteractive.com/player/3_3_0/CBSI_PLAYER_HD.swf',
-                    'page_url': 'http://www.cbsnews.com',
-                    'ext': 'flv',
-                })
-            elif uri.endswith('.m3u8'):
-                fmt['ext'] = 'mp4'
-            formats.append(fmt)
+            release_url = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true' % pid
+            tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % pid)
+            formats.extend(tp_formats)
+            subtitles = self._merge_subtitles(subtitles, tp_subtitles)
+        self._sort_formats(formats)
 
         return {
             'id': video_id,
@@ -84,4 +80,44 @@ class CBSNewsIE(InfoExtractor):
             'thumbnail': thumbnail,
             'duration': duration,
             'formats': formats,
+            'subtitles': subtitles,
+        }
+
+
+class CBSNewsLiveVideoIE(InfoExtractor):
+    IE_DESC = 'CBS News Live Videos'
+    _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
+
+    _TEST = {
+        'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
+        'info_dict': {
+            'id': 'clinton-sanders-prepare-to-face-off-in-nh',
+            'ext': 'flv',
+            'title': 'Clinton, Sanders Prepare To Face Off In NH',
+            'duration': 334,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        video_info = self._parse_json(self._html_search_regex(
+            r'data-story-obj=\'({.+?})\'', webpage, 'video JSON info'), video_id)['story']
+
+        hdcore_sign = 'hdcore=3.3.1'
+        f4m_formats = self._extract_f4m_formats(video_info['url'] + '&' + hdcore_sign, video_id)
+        if f4m_formats:
+            for entry in f4m_formats:
+                # URLs without the extra param induce a 404 error
+                entry.update({'extra_param_to_segment_url': hdcore_sign})
+        self._sort_formats(f4m_formats)
+
+        return {
+            'id': video_id,
+            'title': video_info['headline'],
+            'thumbnail': video_info.get('thumbnail_url_hd') or video_info.get('thumbnail_url_sd'),
+            'duration': parse_duration(video_info.get('segmentDur')),
+            'formats': f4m_formats,
         }
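
The extra_param_to_segment_url entry set above makes the f4m downloader carry
the hdcore query over to every fragment request; a minimal sketch of the idea
(the helper name is hypothetical, only the parameter value comes from the code):

    def add_segment_param(segment_url, extra_param='hdcore=3.3.1'):
        # fragment URLs without the hdcore parameter come back as HTTP 404
        separator = '&' if '?' in segment_url else '?'
        return segment_url + separator + extra_param

    print(add_segment_param('http://example.com/frag/seg1.f4f'))
    # http://example.com/frag/seg1.f4f?hdcore=3.3.1
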
index ae47e74ccf583ac9d821dd588f07f33ff57673db..549ae32f36c8ebd258896d4189ba90ae501c40d0 100644 (file)
@@ -6,7 +6,7 @@ from .common import InfoExtractor
 
 
 class CBSSportsIE(InfoExtractor):
-    _VALID_URL = r'http://www\.cbssports\.com/video/player/(?P<section>[^/]+)/(?P<id>[^/]+)'
+    _VALID_URL = r'https?://www\.cbssports\.com/video/player/(?P<section>[^/]+)/(?P<id>[^/]+)'
 
     _TEST = {
         'url': 'http://www.cbssports.com/video/player/tennis/318462531970/0/us-open-flashbacks-1990s',
index 6924eac704cd5cf02d266bd60125dad9cf13e765..dda2c0959882c3cd3c5de56b817ccd7815ef0068 100644 (file)
@@ -5,6 +5,7 @@ import re
 from .common import InfoExtractor
 from ..utils import (
     int_or_none,
+    parse_duration,
     qualities,
     unified_strdate,
 )
@@ -12,21 +13,25 @@ from ..utils import (
 
 class CCCIE(InfoExtractor):
     IE_NAME = 'media.ccc.de'
-    _VALID_URL = r'https?://(?:www\.)?media\.ccc\.de/[^?#]+/[^?#/]*?_(?P<id>[0-9]{8,})._[^?#/]*\.html'
+    _VALID_URL = r'https?://(?:www\.)?media\.ccc\.de/v/(?P<id>[^/?#&]+)'
 
-    _TEST = {
-        'url': 'http://media.ccc.de/browse/congress/2013/30C3_-_5443_-_en_-_saal_g_-_201312281830_-_introduction_to_processor_design_-_byterazor.html#video',
+    _TESTS = [{
+        'url': 'https://media.ccc.de/v/30C3_-_5443_-_en_-_saal_g_-_201312281830_-_introduction_to_processor_design_-_byterazor#video',
         'md5': '3a1eda8f3a29515d27f5adb967d7e740',
         'info_dict': {
-            'id': '20131228183',
+            'id': '30C3_-_5443_-_en_-_saal_g_-_201312281830_-_introduction_to_processor_design_-_byterazor',
             'ext': 'mp4',
             'title': 'Introduction to Processor Design',
-            'description': 'md5:5ddbf8c734800267f2cee4eab187bc1b',
+            'description': 'md5:80be298773966f66d56cb11260b879af',
             'thumbnail': 're:^https?://.*\.jpg$',
             'view_count': int,
-            'upload_date': '20131229',
+            'upload_date': '20131228',
+            'duration': 3660,
         }
-    }
+    }, {
+        'url': 'https://media.ccc.de/v/32c3-7368-shopshifting#download',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
@@ -40,21 +45,25 @@ class CCCIE(InfoExtractor):
         title = self._html_search_regex(
             r'(?s)<h1>(.*?)</h1>', webpage, 'title')
         description = self._html_search_regex(
-            r"(?s)<p class='description'>(.*?)</p>",
+            r'(?s)<h3>About</h3>(.+?)<h3>',
             webpage, 'description', fatal=False)
         upload_date = unified_strdate(self._html_search_regex(
-            r"(?s)<span class='[^']*fa-calendar-o'></span>(.*?)</li>",
+            r"(?s)<span[^>]+class='[^']*fa-calendar-o'[^>]*>(.+?)</span>",
             webpage, 'upload date', fatal=False))
         view_count = int_or_none(self._html_search_regex(
             r"(?s)<span class='[^']*fa-eye'></span>(.*?)</li>",
             webpage, 'view count', fatal=False))
+        duration = parse_duration(self._html_search_regex(
+            r'(?s)<span[^>]+class=(["\']).*?fa-clock-o.*?\1[^>]*></span>(?P<duration>.+?)</li',
+            webpage, 'duration', fatal=False, group='duration'))
 
         matches = re.finditer(r'''(?xs)
-            <(?:span|div)\s+class='label\s+filetype'>(?P<format>.*?)</(?:span|div)>\s*
+            <(?:span|div)\s+class='label\s+filetype'>(?P<format>[^<]*)</(?:span|div)>\s*
+            <(?:span|div)\s+class='label\s+filetype'>(?P<lang>[^<]*)</(?:span|div)>\s*
             <a\s+download\s+href='(?P<http_url>[^']+)'>\s*
             (?:
                 .*?
-                <a\s+href='(?P<torrent_url>[^']+\.torrent)'
+                <a\s+(?:download\s+)?href='(?P<torrent_url>[^']+\.torrent)'
             )?''', webpage)
         formats = []
         for m in matches:
@@ -62,12 +71,15 @@ class CCCIE(InfoExtractor):
             format_id = self._search_regex(
                 r'.*/([a-z0-9_-]+)/[^/]*$',
                 m.group('http_url'), 'format id', default=None)
+            if format_id:
+                format_id = m.group('lang') + '-' + format_id
             vcodec = 'h264' if 'h264' in format_id else (
                 'none' if format_id in ('mp3', 'opus') else None
             )
             formats.append({
                 'format_id': format_id,
                 'format': format,
+                'language': m.group('lang'),
                 'url': m.group('http_url'),
                 'vcodec': vcodec,
                 'preference': preference(format_id),
@@ -95,5 +107,6 @@ class CCCIE(InfoExtractor):
             'thumbnail': thumbnail,
             'view_count': view_count,
             'upload_date': upload_date,
+            'duration': duration,
             'formats': formats,
         }
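
The language-prefixed format ids introduced above combine the (?P<lang>...)
capture with the download URL's parent directory; a self-contained sketch using
the same pattern and a made-up URL:

    import re

    def build_format_id(lang, http_url):
        # the bare format id is the last directory component of the URL
        m = re.search(r'.*/([a-z0-9_-]+)/[^/]*$', http_url)
        return '%s-%s' % (lang, m.group(1)) if m else None

    print(build_format_id('eng', 'http://cdn.media.ccc.de/congress/2013/h264-hd/talk.mp4'))
    # eng-h264-hd
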
diff --git a/youtube_dl/extractor/cda.py b/youtube_dl/extractor/cda.py
new file mode 100755 (executable)
index 0000000..498d2c0
--- /dev/null
@@ -0,0 +1,96 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    decode_packed_codes,
+    ExtractorError,
+    parse_duration
+)
+
+
+class CDAIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:(?:www\.)?cda\.pl/video|ebd\.cda\.pl/[0-9]+x[0-9]+)/(?P<id>[0-9a-z]+)'
+    _TESTS = [{
+        'url': 'http://www.cda.pl/video/5749950c',
+        'md5': '6f844bf51b15f31fae165365707ae970',
+        'info_dict': {
+            'id': '5749950c',
+            'ext': 'mp4',
+            'height': 720,
+            'title': 'Oto dlaczego przed zakrętem należy zwolnić.',
+            'duration': 39
+        }
+    }, {
+        'url': 'http://www.cda.pl/video/57413289',
+        'md5': 'a88828770a8310fc00be6c95faf7f4d5',
+        'info_dict': {
+            'id': '57413289',
+            'ext': 'mp4',
+            'title': 'Lądowanie na lotnisku na Maderze',
+            'duration': 137
+        }
+    }, {
+        'url': 'http://ebd.cda.pl/0x0/5749950c',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage('http://ebd.cda.pl/0x0/' + video_id, video_id)
+
+        if 'Ten film jest dostępny dla użytkowników premium' in webpage:
+            raise ExtractorError('This video is only available for premium users.', expected=True)
+
+        title = self._html_search_regex(r'<title>(.+?)</title>', webpage, 'title')
+
+        formats = []
+
+        info_dict = {
+            'id': video_id,
+            'title': title,
+            'formats': formats,
+            'duration': None,
+        }
+
+        def extract_format(page, version):
+            unpacked = decode_packed_codes(page)
+            format_url = self._search_regex(
+                r"url:\\'(.+?)\\'", unpacked, '%s url' % version, fatal=False)
+            if not format_url:
+                return
+            f = {
+                'url': format_url,
+            }
+            m = re.search(
+                r'<a[^>]+data-quality="(?P<format_id>[^"]+)"[^>]+href="[^"]+"[^>]+class="[^"]*quality-btn-active[^"]*">(?P<height>[0-9]+)p',
+                page)
+            if m:
+                f.update({
+                    'format_id': m.group('format_id'),
+                    'height': int(m.group('height')),
+                })
+            info_dict['formats'].append(f)
+            if not info_dict['duration']:
+                info_dict['duration'] = parse_duration(self._search_regex(
+                    r"duration:\\'(.+?)\\'", unpacked, 'duration', fatal=False))
+
+        extract_format(webpage, 'default')
+
+        for href, resolution in re.findall(
+                r'<a[^>]+data-quality="[^"]+"[^>]+href="([^"]+)"[^>]+class="quality-btn"[^>]*>([0-9]+p)',
+                webpage):
+            webpage = self._download_webpage(
+                href, video_id, 'Downloading %s version information' % resolution, fatal=False)
+            if not webpage:
+                # Manually report a warning because an empty page is returned
+                # when an invalid version is requested.
+                self.report_warning('Unable to download %s version information' % resolution)
+                continue
+            extract_format(webpage, resolution)
+
+        self._sort_formats(formats)
+
+        return info_dict
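
The doubled backslashes in the url/duration regexes target the escaped quotes
that remain after decode_packed_codes(); a standalone sketch against a made-up
sample of unpacked player source:

    import re

    unpacked = r"player.config = { url:\'http://example.com/video.mp4\', duration:\'0:39\' };"

    def grab(field, source):
        # \\' in the pattern matches a literal backslash-quote pair
        return re.search(r"%s:\\'(.+?)\\'" % field, source).group(1)

    print(grab('url', unpacked))       # http://example.com/video.mp4
    print(grab('duration', unpacked))  # 0:39
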
index dda583680a03ba3cb420beb74a99af2ec60cbc83..6652c8e42a279f45bdbbc1af3d36ad2500a454eb 100644 (file)
@@ -5,67 +5,93 @@ import re
 
 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_request,
-    compat_urllib_parse,
     compat_urllib_parse_unquote,
     compat_urllib_parse_urlparse,
 )
 from ..utils import (
     ExtractorError,
     float_or_none,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
 class CeskaTelevizeIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.ceskatelevize\.cz/(porady|ivysilani)/(.+/)?(?P<id>[^?#]+)'
-
-    _TESTS = [
-        {
-            'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220',
+    _VALID_URL = r'https?://www\.ceskatelevize\.cz/(porady|ivysilani)/(?:[^/]+/)*(?P<id>[^/#?]+)/*(?:[#?].*)?$'
+    _TESTS = [{
+        'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220',
+        'info_dict': {
+            'id': '61924494876951776',
+            'ext': 'mp4',
+            'title': 'Hyde Park Civilizace',
+            'description': 'md5:fe93f6eda372d150759d11644ebbfb4a',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 3350,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.ceskatelevize.cz/ivysilani/10532695142-prvni-republika/bonus/14716-zpevacka-z-duparny-bobina',
+        'info_dict': {
+            'id': '61924494876844374',
+            'ext': 'mp4',
+            'title': 'První republika: Zpěvačka z Dupárny Bobina',
+            'description': 'Sága mapující atmosféru první republiky od r. 1918 do r. 1945.',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 88.4,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }, {
+        # video with 18+ caution trailer
+        'url': 'http://www.ceskatelevize.cz/porady/10520528904-queer/215562210900007-bogotart/',
+        'info_dict': {
+            'id': '215562210900007-bogotart',
+            'title': 'Queer: Bogotart',
+            'description': 'Alternativní průvodce současným queer světem',
+        },
+        'playlist': [{
             'info_dict': {
-                'id': '214411058091220',
+                'id': '61924494876844842',
                 'ext': 'mp4',
-                'title': 'Hyde Park Civilizace',
-                'description': 'Věda a současná civilizace. Interaktivní pořad - prostor pro vaše otázky a komentáře',
-                'thumbnail': 're:^https?://.*\.jpg',
-                'duration': 3350,
-            },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
+                'title': 'Queer: Bogotart (Varování 18+)',
+                'duration': 10.2,
             },
-        },
-        {
-            'url': 'http://www.ceskatelevize.cz/ivysilani/10532695142-prvni-republika/bonus/14716-zpevacka-z-duparny-bobina',
+        }, {
             'info_dict': {
-                'id': '14716',
+                'id': '61924494877068022',
                 'ext': 'mp4',
-                'title': 'První republika: Zpěvačka z Dupárny Bobina',
-                'description': 'Sága mapující atmosféru první republiky od r. 1918 do r. 1945.',
+                'title': 'Queer: Bogotart (Queer)',
                 'thumbnail': 're:^https?://.*\.jpg',
-                'duration': 88.4,
-            },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
+                'duration': 1558.3,
             },
+        }],
+        'params': {
+            # m3u8 download
+            'skip_download': True,
         },
-    ]
+    }]
 
     def _real_extract(self, url):
         url = url.replace('/porady/', '/ivysilani/').replace('/video/', '')
 
         mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        playlist_id = mobj.group('id')
 
-        webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage(url, playlist_id)
 
         NOT_AVAILABLE_STRING = 'This content is not available at your territory due to limited copyright.'
         if '%s</p>' % NOT_AVAILABLE_STRING in webpage:
             raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
 
-        typ = self._html_search_regex(r'getPlaylistUrl\(\[\{"type":"(.+?)","id":".+?"\}\],', webpage, 'type')
-        episode_id = self._html_search_regex(r'getPlaylistUrl\(\[\{"type":".+?","id":"(.+?)"\}\],', webpage, 'episode_id')
+        typ = self._html_search_regex(
+            r'getPlaylistUrl\(\[\{"type":"(.+?)","id":".+?"\}\],', webpage, 'type')
+        episode_id = self._html_search_regex(
+            r'getPlaylistUrl\(\[\{"type":".+?","id":"(.+?)"\}\],', webpage, 'episode_id')
 
         data = {
             'playlist[0][type]': typ,
@@ -74,51 +100,62 @@ class CeskaTelevizeIE(InfoExtractor):
             'requestSource': 'iVysilani',
         }
 
-        req = compat_urllib_request.Request(
+        req = sanitized_Request(
             'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
-            data=compat_urllib_parse.urlencode(data))
+            data=urlencode_postdata(data))
 
         req.add_header('Content-type', 'application/x-www-form-urlencoded')
         req.add_header('x-addr', '127.0.0.1')
         req.add_header('X-Requested-With', 'XMLHttpRequest')
         req.add_header('Referer', url)
 
-        playlistpage = self._download_json(req, video_id)
+        playlistpage = self._download_json(req, playlist_id)
 
         playlist_url = playlistpage['url']
         if playlist_url == 'error_region':
             raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
 
-        req = compat_urllib_request.Request(compat_urllib_parse_unquote(playlist_url))
+        req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
         req.add_header('Referer', url)
 
-        playlist = self._download_json(req, video_id)
-
-        item = playlist['playlist'][0]
-        formats = []
-        for format_id, stream_url in item['streamUrls'].items():
-            formats.extend(self._extract_m3u8_formats(stream_url, video_id, 'mp4'))
-        self._sort_formats(formats)
-
-        title = self._og_search_title(webpage)
-        description = self._og_search_description(webpage)
-        duration = float_or_none(item.get('duration'))
-        thumbnail = item.get('previewImageUrl')
-
-        subtitles = {}
-        subs = item.get('subtitles')
-        if subs:
-            subtitles = self.extract_subtitles(episode_id, subs)
-
-        return {
-            'id': episode_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'formats': formats,
-            'subtitles': subtitles,
-        }
+        playlist_title = self._og_search_title(webpage)
+        playlist_description = self._og_search_description(webpage)
+
+        playlist = self._download_json(req, playlist_id)['playlist']
+        playlist_len = len(playlist)
+
+        entries = []
+        for item in playlist:
+            formats = []
+            for format_id, stream_url in item['streamUrls'].items():
+                formats.extend(self._extract_m3u8_formats(
+                    stream_url, playlist_id, 'mp4',
+                    entry_protocol='m3u8_native', fatal=False))
+            self._sort_formats(formats)
+
+            item_id = item.get('id') or item['assetId']
+            title = item['title']
+
+            duration = float_or_none(item.get('duration'))
+            thumbnail = item.get('previewImageUrl')
+
+            subtitles = {}
+            if item.get('type') == 'VOD':
+                subs = item.get('subtitles')
+                if subs:
+                    subtitles = self.extract_subtitles(episode_id, subs)
+
+            entries.append({
+                'id': item_id,
+                'title': playlist_title if playlist_len == 1 else '%s (%s)' % (playlist_title, title),
+                'description': playlist_description if playlist_len == 1 else None,
+                'thumbnail': thumbnail,
+                'duration': duration,
+                'formats': formats,
+                'subtitles': subtitles,
+            })
+
+        return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
 
     def _get_subtitles(self, episode_id, subs):
         original_subtitles = self._download_webpage(
@@ -141,16 +178,16 @@ class CeskaTelevizeIE(InfoExtractor):
             for divider in [1000, 60, 60, 100]:
                 components.append(msec % divider)
                 msec //= divider
-            return "{3:02}:{2:02}:{1:02},{0:03}".format(*components)
+            return '{3:02}:{2:02}:{1:02},{0:03}'.format(*components)
 
         def _fix_subtitle(subtitle):
             for line in subtitle.splitlines():
-                m = re.match(r"^\s*([0-9]+);\s*([0-9]+)\s+([0-9]+)\s*$", line)
+                m = re.match(r'^\s*([0-9]+);\s*([0-9]+)\s+([0-9]+)\s*$', line)
                 if m:
                     yield m.group(1)
                     start, stop = (_msectotimecode(int(t)) for t in m.groups()[1:])
-                    yield "{0} --> {1}".format(start, stop)
+                    yield '{0} --> {1}'.format(start, stop)
                 else:
                     yield line
 
-        return "\r\n".join(_fix_subtitle(subtitles))
+        return '\r\n'.join(_fix_subtitle(subtitles))
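
The _msectotimecode helper peels off each time unit with successive
modulo/floor-divide passes, then formats the components in reverse order; a
standalone copy with a worked example:

    def msec_to_timecode(msec):
        # 3661123 ms -> components [123, 1, 1, 1] (ms, s, min, h)
        components = []
        for divider in (1000, 60, 60, 100):
            components.append(msec % divider)
            msec //= divider
        return '{3:02}:{2:02}:{1:02},{0:03}'.format(*components)

    print(msec_to_timecode(3661123))  # 01:01:01,123
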
index 3dfc24f5ba447ea92858e89868ad3684caf3a6d2..c74553dcfa7c689b7fc8d69147625b1169e1e178 100644 (file)
@@ -3,7 +3,11 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..utils import ExtractorError
+from ..utils import (
+    ExtractorError,
+    parse_filesize,
+    qualities,
+)
 
 
 class Channel9IE(InfoExtractor):
@@ -28,7 +32,7 @@ class Channel9IE(InfoExtractor):
                 'title': 'Developer Kick-Off Session: Stuff We Love',
                 'description': 'md5:c08d72240b7c87fcecafe2692f80e35f',
                 'duration': 4576,
-                'thumbnail': 'http://video.ch9.ms/ch9/9d51/03902f2d-fc97-4d3c-b195-0bfe15a19d51/KOS002_220.jpg',
+                'thumbnail': 're:http://.*\.jpg',
                 'session_code': 'KOS002',
                 'session_day': 'Day 1',
                 'session_room': 'Arena 1A',
@@ -44,31 +48,29 @@ class Channel9IE(InfoExtractor):
                 'title': 'Self-service BI with Power BI - nuclear testing',
                 'description': 'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
                 'duration': 1540,
-                'thumbnail': 'http://video.ch9.ms/ch9/87e1/0300391f-a455-4c72-bec3-4422f19287e1/selfservicenuk_512.jpg',
+                'thumbnail': 're:http://.*\.jpg',
                 'authors': ['Mike Wilmot'],
             },
+        },
+        {
+            # low quality mp4 is best
+            'url': 'https://channel9.msdn.com/Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
+            'info_dict': {
+                'id': 'Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
+                'ext': 'mp4',
+                'title': 'Ranges for the Standard Library',
+                'description': 'md5:2e6b4917677af3728c5f6d63784c4c5d',
+                'duration': 5646,
+                'thumbnail': 're:http://.*\.jpg',
+            },
+            'params': {
+                'skip_download': True,
+            },
         }
     ]
 
     _RSS_URL = 'http://channel9.msdn.com/%s/RSS'
 
-    # Sorted by quality
-    _known_formats = ['MP3', 'MP4', 'Mid Quality WMV', 'Mid Quality MP4', 'High Quality WMV', 'High Quality MP4']
-
-    def _restore_bytes(self, formatted_size):
-        if not formatted_size:
-            return 0
-        m = re.match(r'^(?P<size>\d+(?:\.\d+)?)\s+(?P<units>[a-zA-Z]+)', formatted_size)
-        if not m:
-            return 0
-        units = m.group('units')
-        try:
-            exponent = ['B', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'].index(units.upper())
-        except ValueError:
-            return 0
-        size = float(m.group('size'))
-        return int(size * (1024 ** exponent))
-
     def _formats_from_html(self, html):
         FORMAT_REGEX = r'''
             (?x)
@@ -78,16 +80,20 @@ class Channel9IE(InfoExtractor):
             <h3>File\s+size</h3>\s*(?P<filesize>.*?)\s*
             </div>)?                                                # File size part may be missing
         '''
-        # Extract known formats
+        quality = qualities((
+            'MP3', 'MP4',
+            'Low Quality WMV', 'Low Quality MP4',
+            'Mid Quality WMV', 'Mid Quality MP4',
+            'High Quality WMV', 'High Quality MP4'))
         formats = [{
             'url': x.group('url'),
             'format_id': x.group('quality'),
             'format_note': x.group('note'),
             'format': '%s (%s)' % (x.group('quality'), x.group('note')),
-            'filesize': self._restore_bytes(x.group('filesize')),  # File size is approximate
-            'preference': self._known_formats.index(x.group('quality')),
+            'filesize_approx': parse_filesize(x.group('filesize')),
+            'quality': quality(x.group('quality')),
             'vcodec': 'none' if x.group('note') == 'Audio only' else None,
-        } for x in list(re.finditer(FORMAT_REGEX, html)) if x.group('quality') in self._known_formats]
+        } for x in list(re.finditer(FORMAT_REGEX, html))]
 
         self._sort_formats(formats)
 
@@ -158,7 +164,7 @@ class Channel9IE(InfoExtractor):
 
     def _extract_session_day(self, html):
         m = re.search(r'<li class="day">\s*<a href="/Events/[^"]+">(?P<day>[^<]+)</a>\s*</li>', html)
-        return m.group('day') if m is not None else None
+        return m.group('day').strip() if m is not None else None
 
     def _extract_session_room(self, html):
         m = re.search(r'<li class="room">\s*(?P<room>.+?)\s*</li>', html)
@@ -224,12 +230,12 @@ class Channel9IE(InfoExtractor):
         if contents is None:
             return contents
 
-        authors = self._extract_authors(html)
+        if len(contents) > 1:
+            raise ExtractorError('Got more than one entry')
+        result = contents[0]
+        result['authors'] = self._extract_authors(html)
 
-        for content in contents:
-            content['authors'] = authors
-
-        return contents
+        return result
 
     def _extract_session(self, html, content_path):
         contents = self._extract_content(html, content_path)
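
The qualities() helper behind the new 'quality' key turns an ordered
worst-to-best sequence into a scoring function; a minimal re-implementation of
its behavior:

    def qualities(quality_ids):
        # the index in the sequence becomes the score; unknown ids sort last
        def q(qid):
            try:
                return quality_ids.index(qid)
            except ValueError:
                return -1
        return q

    quality = qualities(('MP3', 'MP4', 'High Quality MP4'))
    print(quality('MP4'))      # 1
    print(quality('Unknown'))  # -1
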
diff --git a/youtube_dl/extractor/chaturbate.py b/youtube_dl/extractor/chaturbate.py
new file mode 100644 (file)
index 0000000..b223454
--- /dev/null
@@ -0,0 +1,60 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import ExtractorError
+
+
+class ChaturbateIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:[^/]+\.)?chaturbate\.com/(?P<id>[^/?#]+)'
+    _TESTS = [{
+        'url': 'https://www.chaturbate.com/siswet19/',
+        'info_dict': {
+            'id': 'siswet19',
+            'ext': 'mp4',
+            'title': 're:^siswet19 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+            'age_limit': 18,
+            'is_live': True,
+        },
+        'params': {
+            'skip_download': True,
+        }
+    }, {
+        'url': 'https://en.chaturbate.com/siswet19/',
+        'only_matching': True,
+    }]
+
+    _ROOM_OFFLINE = 'Room is currently offline'
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        m3u8_url = self._search_regex(
+            r'src=(["\'])(?P<url>http.+?\.m3u8.*?)\1', webpage,
+            'playlist', default=None, group='url')
+
+        if not m3u8_url:
+            error = self._search_regex(
+                [r'<span[^>]+class=(["\'])desc_span\1[^>]*>(?P<error>[^<]+)</span>',
+                 r'<div[^>]+id=(["\'])defchat\1[^>]*>\s*<p><strong>(?P<error>[^<]+)<'],
+                webpage, 'error', group='error', default=None)
+            if not error:
+                if any(p not in webpage for p in (
+                        self._ROOM_OFFLINE, 'offline_tipping', 'tip_offline')):
+                    error = self._ROOM_OFFLINE
+            if error:
+                raise ExtractorError(error, expected=True)
+            raise ExtractorError('Unable to find stream URL')
+
+        formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': self._live_title(video_id),
+            'thumbnail': 'https://cdn-s.highwebmedia.com/uHK3McUtGCG3SMFcd4ZJsRv8/roomimage/%s.jpg' % video_id,
+            'age_limit': self._rta_search(webpage),
+            'is_live': True,
+            'formats': formats,
+        }
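
The (["\']) ... \1 idiom in the playlist regex accepts either quote style
around the m3u8 URL; a quick demonstration:

    import re

    M3U8_RE = r'src=(["\'])(?P<url>http.+?\.m3u8.*?)\1'

    for snippet in ('src="http://host/playlist.m3u8"', "src='http://host/playlist.m3u8'"):
        print(re.search(M3U8_RE, snippet).group('url'))
    # http://host/playlist.m3u8 (printed for both quote styles)
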
index c949a481477c187d9433f39e6e851b87c97edf79..042c4f2f13757ab3f0c942932ada8e7a01160055 100644 (file)
@@ -5,7 +5,7 @@ import re
 
 from .common import InfoExtractor
 from ..utils import ExtractorError
-from .bliptv import BlipTVIE
+from .screenwavemedia import ScreenwaveMediaIE
 
 
 class CinemassacreIE(InfoExtractor):
@@ -21,6 +21,10 @@ class CinemassacreIE(InfoExtractor):
                 'title': '“Angry Video Game Nerd: The Movie” – Trailer',
                 'description': 'md5:fb87405fcb42a331742a0dce2708560b',
             },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            },
         },
         {
             'url': 'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940',
@@ -31,31 +35,34 @@ class CinemassacreIE(InfoExtractor):
                 'upload_date': '20131002',
                 'title': 'The Mummy’s Hand (1940)',
             },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            },
         },
         {
-            # blip.tv embedded video
+            # Youtube embedded video
             'url': 'http://cinemassacre.com/2006/12/07/chronologically-confused-about-bad-movie-and-video-game-sequel-titles/',
-            'md5': 'ca9b3c8dd5a66f9375daeb5135f5a3de',
+            'md5': 'ec9838a5520ef5409b3e4e42fcb0a3b9',
             'info_dict': {
-                'id': '4065369',
-                'ext': 'flv',
+                'id': 'OEVzPCY2T-g',
+                'ext': 'webm',
                 'title': 'AVGN: Chronologically Confused about Bad Movie and Video Game Sequel Titles',
                 'upload_date': '20061207',
-                'uploader': 'cinemassacre',
-                'uploader_id': '250778',
-                'timestamp': 1283233867,
-                'description': 'md5:0a108c78d130676b207d0f6d029ecffd',
+                'uploader': 'Cinemassacre',
+                'uploader_id': 'JamesNintendoNerd',
+                'description': 'md5:784734696c2b8b7f4b8625cc799e07f6',
             }
         },
         {
             # Youtube embedded video
             'url': 'http://cinemassacre.com/2006/09/01/mckids/',
-            'md5': '6eb30961fa795fedc750eac4881ad2e1',
+            'md5': '7393c4e0f54602ad110c793eb7a6513a',
             'info_dict': {
                 'id': 'FnxsNhuikpo',
-                'ext': 'mp4',
+                'ext': 'webm',
                 'upload_date': '20060901',
-                'uploader': 'Cinemassacre Extras',
+                'uploader': 'Cinemassacre Extra',
                 'description': 'md5:de9b751efa9e45fbaafd9c8a1123ed53',
                 'uploader_id': 'Cinemassacre',
                 'title': 'AVGN: McKids',
@@ -70,7 +77,11 @@ class CinemassacreIE(InfoExtractor):
                 'description': 'Let’s Play Mario Kart 64 !! Mario Kart 64 is a classic go-kart racing game released for the Nintendo 64 (N64). Today James & Mike do 4 player Battle Mode with Kyle and Bootsy!',
                 'title': 'Mario Kart 64 (Nintendo 64) James & Mike Mondays',
                 'upload_date': '20150525',
-            }
+            },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            },
         }
     ]
 
@@ -83,12 +94,10 @@ class CinemassacreIE(InfoExtractor):
 
         playerdata_url = self._search_regex(
             [
-                r'src="(http://(?:player2\.screenwavemedia\.com|player\.screenwavemedia\.com/play)/[a-zA-Z]+\.php\?[^"]*\bid=.+?)"',
-                r'<iframe[^>]+src="((?:https?:)?//(?:[^.]+\.)?youtube\.com/.+?)"',
+                ScreenwaveMediaIE.EMBED_PATTERN,
+                r'<iframe[^>]+src="(?P<url>(?:https?:)?//(?:[^.]+\.)?youtube\.com/.+?)"',
             ],
-            webpage, 'player data URL', default=None)
-        if not playerdata_url:
-            playerdata_url = BlipTVIE._extract_url(webpage)
+            webpage, 'player data URL', default=None, group='url')
         if not playerdata_url:
             raise ExtractorError('Unable to find player data')
 
index a5c3cb7c6253776062fb5834d0fe4c121dfb9c99..3a47f6fa4e1cdf734670ff64abb9aa4c02c94a6e 100644 (file)
@@ -1,53 +1,62 @@
 from __future__ import unicode_literals
 
-import re
-import time
-import xml.etree.ElementTree
-
 from .common import InfoExtractor
 from ..utils import (
-    ExtractorError,
-    parse_duration,
+    int_or_none,
+    unified_strdate,
 )
 
 
 class ClipfishIE(InfoExtractor):
-    IE_NAME = 'clipfish'
-
-    _VALID_URL = r'^https?://(?:www\.)?clipfish\.de/.*?/video/(?P<id>[0-9]+)/'
+    _VALID_URL = r'https?://(?:www\.)?clipfish\.de/(?:[^/]+/)+video/(?P<id>[0-9]+)'
     _TEST = {
         'url': 'http://www.clipfish.de/special/game-trailer/video/3966754/fifa-14-e3-2013-trailer/',
-        'md5': '2521cd644e862936cf2e698206e47385',
+        'md5': '79bc922f3e8a9097b3d68a93780fd475',
         'info_dict': {
             'id': '3966754',
             'ext': 'mp4',
             'title': 'FIFA 14 - E3 2013 Trailer',
+            'description': 'Video zu FIFA 14: E3 2013 Trailer',
+            'upload_date': '20130611',
             'duration': 82,
-        },
-        'skip': 'Blocked in the US'
+            'view_count': int,
+        }
     }
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group(1)
-
-        info_url = ('http://www.clipfish.de/devxml/videoinfo/%s?ts=%d' %
-                    (video_id, int(time.time())))
-        doc = self._download_xml(
-            info_url, video_id, note='Downloading info page')
-        title = doc.find('title').text
-        video_url = doc.find('filename').text
-        if video_url is None:
-            xml_bytes = xml.etree.ElementTree.tostring(doc)
-            raise ExtractorError('Cannot find video URL in document %r' %
-                                 xml_bytes)
-        thumbnail = doc.find('imageurl').text
-        duration = parse_duration(doc.find('duration').text)
+        video_id = self._match_id(url)
+
+        video_info = self._download_json(
+            'http://www.clipfish.de/devapi/id/%s?format=json&apikey=hbbtv' % video_id,
+            video_id)['items'][0]
+
+        formats = []
+
+        m3u8_url = video_info.get('media_videourl_hls')
+        if m3u8_url:
+            formats.append({
+                'url': m3u8_url.replace('de.hls.fra.clipfish.de', 'hls.fra.clipfish.de'),
+                'ext': 'mp4',
+                'format_id': 'hls',
+            })
+
+        mp4_url = video_info.get('media_videourl')
+        if mp4_url:
+            formats.append({
+                'url': mp4_url,
+                'format_id': 'mp4',
+                'width': int_or_none(video_info.get('width')),
+                'height': int_or_none(video_info.get('height')),
+                'tbr': int_or_none(video_info.get('bitrate')),
+            })
 
         return {
             'id': video_id,
-            'title': title,
-            'url': video_url,
-            'thumbnail': thumbnail,
-            'duration': duration,
+            'title': video_info['title'],
+            'description': video_info.get('descr'),
+            'formats': formats,
+            'thumbnail': video_info.get('media_content_thumbnail_large') or video_info.get('media_thumbnail'),
+            'duration': int_or_none(video_info.get('media_length')),
+            'upload_date': unified_strdate(video_info.get('pubDate')),
+            'view_count': int_or_none(video_info.get('media_views'))
         }
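
A condensed dry run of the new format assembly, using a made-up API item
limited to the fields the extractor reads (the diff does not say why the 'de.'
manifest host is rewritten):

    item = {
        'media_videourl_hls': 'http://de.hls.fra.clipfish.de/x/master.m3u8',
        'media_videourl': 'http://video.clipfish.de/x.mp4',
    }

    formats = []
    m3u8_url = item.get('media_videourl_hls')
    if m3u8_url:
        formats.append({
            'url': m3u8_url.replace('de.hls.fra.clipfish.de', 'hls.fra.clipfish.de'),
            'ext': 'mp4',
            'format_id': 'hls',
        })
    mp4_url = item.get('media_videourl')
    if mp4_url:
        formats.append({'url': mp4_url, 'format_id': 'mp4'})
    print([f['format_id'] for f in formats])  # ['hls', 'mp4']
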
index d46592cc5c8c71d30fda96c1b25c6f4a9c55ad75..19f8b397e44a679ea936ad638048ccb488dc4b93 100644 (file)
@@ -1,7 +1,7 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..utils import determine_ext
+from ..utils import int_or_none
 
 
 _translation_table = {
@@ -19,7 +19,7 @@ def _decode(s):
 class CliphunterIE(InfoExtractor):
     IE_NAME = 'cliphunter'
 
-    _VALID_URL = r'''(?x)http://(?:www\.)?cliphunter\.com/w/
+    _VALID_URL = r'''(?x)https?://(?:www\.)?cliphunter\.com/w/
         (?P<id>[0-9]+)/
         (?P<seo>.+?)(?:$|[#\?])
     '''
@@ -42,31 +42,26 @@ class CliphunterIE(InfoExtractor):
         video_title = self._search_regex(
             r'mediaTitle = "([^"]+)"', webpage, 'title')
 
-        fmts = {}
-        for fmt in ('mp4', 'flv'):
-            fmt_list = self._parse_json(self._search_regex(
-                r'var %sjson\s*=\s*(\[.*?\]);' % fmt, webpage, '%s formats' % fmt), video_id)
-            for f in fmt_list:
-                fmts[f['fname']] = _decode(f['sUrl'])
-
-        qualities = self._parse_json(self._search_regex(
-            r'var player_btns\s*=\s*(.*?);\n', webpage, 'quality info'), video_id)
+        gexo_files = self._parse_json(
+            self._search_regex(
+                r'var\s+gexoFiles\s*=\s*({.+?});', webpage, 'gexo files'),
+            video_id)
 
         formats = []
-        for fname, url in fmts.items():
-            f = {
-                'url': url,
-            }
-            if fname in qualities:
-                qual = qualities[fname]
-                f.update({
-                    'format_id': '%s_%sp' % (determine_ext(url), qual['h']),
-                    'width': qual['w'],
-                    'height': qual['h'],
-                    'tbr': qual['br'],
-                })
-            formats.append(f)
-
+        for format_id, f in gexo_files.items():
+            video_url = f.get('url')
+            if not video_url:
+                continue
+            fmt = f.get('fmt')
+            height = f.get('h')
+            format_id = '%s_%sp' % (fmt, height) if fmt and height else format_id
+            formats.append({
+                'url': _decode(video_url),
+                'format_id': format_id,
+                'width': int_or_none(f.get('w')),
+                'height': int_or_none(height),
+                'tbr': int_or_none(f.get('br')),
+            })
         self._sort_formats(formats)
 
         thumbnail = self._search_regex(
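
_decode() maps each character of the obfuscated gexo URLs through
_translation_table, which is defined near the top of cliphunter.py outside this
hunk; a generic sketch of the mechanism with illustrative table entries:

    SAMPLE_TABLE = {'a': 'h', 'q': 't', '$': ':'}  # illustrative entries only

    def decode(s, table):
        # substitute via the table; unmapped characters pass through unchanged
        return ''.join(table.get(c, c) for c in s)

    print(decode('aq$', SAMPLE_TABLE))  # ht:
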
diff --git a/youtube_dl/extractor/cliprs.py b/youtube_dl/extractor/cliprs.py
new file mode 100644 (file)
index 0000000..4f9320e
--- /dev/null
@@ -0,0 +1,90 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    float_or_none,
+    int_or_none,
+    parse_iso8601,
+)
+
+
+class ClipRsIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?clip\.rs/(?P<id>[^/]+)/\d+'
+    _TEST = {
+        'url': 'http://www.clip.rs/premijera-frajle-predstavljaju-novi-spot-za-pesmu-moli-me-moli/3732',
+        'md5': 'c412d57815ba07b56f9edc7b5d6a14e5',
+        'info_dict': {
+            'id': '1488842.1399140381',
+            'ext': 'mp4',
+            'title': 'PREMIJERA Frajle predstavljaju novi spot za pesmu Moli me, moli',
+            'description': 'md5:56ce2c3b4ab31c5a2e0b17cb9a453026',
+            'duration': 229,
+            'timestamp': 1459850243,
+            'upload_date': '20160405',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        video_id = self._search_regex(
+            r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
+
+        response = self._download_json(
+            'http://qi.ckm.onetapi.pl/', video_id,
+            query={
+                'body[id]': video_id,
+                'body[jsonrpc]': '2.0',
+                'body[method]': 'get_asset_detail',
+                'body[params][ID_Publikacji]': video_id,
+                'body[params][Service]': 'www.onet.pl',
+                'content-type': 'application/jsonp',
+                'x-onet-app': 'player.front.onetapi.pl',
+            })
+
+        error = response.get('error')
+        if error:
+            raise ExtractorError(
+                '%s said: %s' % (self.IE_NAME, error['message']), expected=True)
+
+        video = response['result'].get('0')
+
+        formats = []
+        for _, formats_dict in video['formats'].items():
+            if not isinstance(formats_dict, dict):
+                continue
+            for format_id, format_list in formats_dict.items():
+                if not isinstance(format_list, list):
+                    continue
+                for f in format_list:
+                    if not f.get('url'):
+                        continue
+                    formats.append({
+                        'url': f['url'],
+                        'format_id': format_id,
+                        'height': int_or_none(f.get('vertical_resolution')),
+                        'width': int_or_none(f.get('horizontal_resolution')),
+                        'abr': float_or_none(f.get('audio_bitrate')),
+                        'vbr': float_or_none(f.get('video_bitrate')),
+                    })
+        self._sort_formats(formats)
+
+        meta = video.get('meta', {})
+
+        title = self._og_search_title(webpage, default=None) or meta['title']
+        description = self._og_search_description(webpage, default=None) or meta.get('description')
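+        # the metadata key is sometimes misspelled 'lenght', hence the fallback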
+        duration = meta.get('length') or meta.get('lenght')
+        timestamp = parse_iso8601(meta.get('addDate'), ' ')
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'timestamp': timestamp,
+            'formats': formats,
+        }
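
parse_iso8601's second positional argument is the date/time delimiter, so
passing ' ' accepts the API's space-separated addDate values; for example
(sample timestamp made up):

    from youtube_dl.utils import parse_iso8601

    # the default delimiter is 'T'; ' ' matches 'YYYY-MM-DD hh:mm:ss'
    print(parse_iso8601('2016-04-05 10:37:23', ' '))  # 1459852643
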
index 8306d6fb7d0d4414cff36f7b381ca9c877820f58..0b6ad895fd7841e70b7dc0dd136052ff0459dd3c 100644 (file)
@@ -8,7 +8,7 @@ from ..utils import (
 
 
 class ClipsyndicateIE(InfoExtractor):
-    _VALID_URL = r'http://(?:chic|www)\.clipsyndicate\.com/video/play(list/\d+)?/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:chic|www)\.clipsyndicate\.com/video/play(list/\d+)?/(?P<id>\d+)'
 
     _TESTS = [{
         'url': 'http://www.clipsyndicate.com/video/play/4629301/brick_briscoe',
index 0fa720ee8745cfc728b4413b41888e17787fb5db..9e267e6c0260e0391ff04b61c613a2fb6d916313 100644 (file)
@@ -6,7 +6,7 @@ import re
 from .common import InfoExtractor
 from ..compat import (
     compat_parse_qs,
-    compat_urllib_parse,
+    compat_urllib_parse_urlencode,
     compat_HTTPError,
 )
 from ..utils import (
@@ -64,7 +64,7 @@ class CloudyIE(InfoExtractor):
                 'errorUrl': error_url,
             })
 
-        data_url = self._API_URL % (video_host, compat_urllib_parse.urlencode(form))
+        data_url = self._API_URL % (video_host, compat_urllib_parse_urlencode(form))
         player_data = self._download_webpage(
             data_url, video_id, 'Downloading player data')
         data = compat_parse_qs(player_data)
index 14f215c5c27e3001414e512e7a1bf06f725cd8b0..2fba93543474cd7ebd53848aca62848c32bf7164 100644 (file)
@@ -12,9 +12,9 @@ from ..utils import (
 
 
 class ClubicIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?clubic\.com/video/[^/]+/video.*-(?P<id>[0-9]+)\.html'
+    _VALID_URL = r'https?://(?:www\.)?clubic\.com/video/(?:[^/]+/)*video.*-(?P<id>[0-9]+)\.html'
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://www.clubic.com/video/clubic-week/video-clubic-week-2-0-le-fbi-se-lance-dans-la-photo-d-identite-448474.html',
         'md5': '1592b694ba586036efac1776b0b43cd3',
         'info_dict': {
@@ -24,7 +24,10 @@ class ClubicIE(InfoExtractor):
             'description': 're:Gueule de bois chez Nokia. Le constructeur a indiqué cette.*',
             'thumbnail': 're:^http://img\.clubic\.com/.*\.jpg$',
         }
-    }
+    }, {
+        'url': 'http://www.clubic.com/video/video-clubic-week-2-0-apple-iphone-6s-et-plus-mais-surtout-le-pencil-469792.html',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
diff --git a/youtube_dl/extractor/clyp.py b/youtube_dl/extractor/clyp.py
new file mode 100644 (file)
index 0000000..57e6437
--- /dev/null
@@ -0,0 +1,57 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    float_or_none,
+    parse_iso8601,
+)
+
+
+class ClypIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?clyp\.it/(?P<id>[a-z0-9]+)'
+    _TEST = {
+        'url': 'https://clyp.it/ojz2wfah',
+        'md5': '1d4961036c41247ecfdcc439c0cddcbb',
+        'info_dict': {
+            'id': 'ojz2wfah',
+            'ext': 'mp3',
+            'title': 'Krisson80 - bits wip wip',
+            'description': '#Krisson80BitsWipWip #chiptune\n#wip',
+            'duration': 263.21,
+            'timestamp': 1443515251,
+            'upload_date': '20150929',
+        },
+    }
+
+    def _real_extract(self, url):
+        audio_id = self._match_id(url)
+
+        metadata = self._download_json(
+            'https://api.clyp.it/%s' % audio_id, audio_id)
+
+        formats = []
+        for secure in ('', 'Secure'):
+            for ext in ('Ogg', 'Mp3'):
+                format_id = '%s%s' % (secure, ext)
+                format_url = metadata.get('%sUrl' % format_id)
+                if format_url:
+                    formats.append({
+                        'url': format_url,
+                        'format_id': format_id,
+                        'vcodec': 'none',
+                    })
+        self._sort_formats(formats)
+
+        title = metadata['Title']
+        description = metadata.get('Description')
+        duration = float_or_none(metadata.get('Duration'))
+        timestamp = parse_iso8601(metadata.get('DateCreated'))
+
+        return {
+            'id': audio_id,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'timestamp': timestamp,
+            'formats': formats,
+        }
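
The secure/ext loop above probes the four possible URL fields of the Clyp API
response (OggUrl, Mp3Url, SecureOggUrl, SecureMp3Url); a dry run against a
trimmed, made-up sample:

    metadata = {
        'Mp3Url': 'http://a.example/x.mp3',
        'SecureMp3Url': 'https://a.example/x.mp3',
    }

    for secure in ('', 'Secure'):
        for ext in ('Ogg', 'Mp3'):
            format_id = '%s%s' % (secure, ext)
            format_url = metadata.get('%sUrl' % format_id)
            if format_url:
                print(format_id, '->', format_url)
    # Mp3 -> http://a.example/x.mp3
    # SecureMp3 -> https://a.example/x.mp3
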
index e96c59f718a5dc412a2ce7eaa962d6bdca98e187..f1311b14f8f1c572c9647bcd36de3f19de676373 100644 (file)
@@ -4,7 +4,7 @@ from .mtv import MTVIE
 
 class CMTIE(MTVIE):
     IE_NAME = 'cmt.com'
-    _VALID_URL = r'https?://www\.cmt\.com/videos/.+?/(?P<videoid>[^/]+)\.jhtml'
+    _VALID_URL = r'https?://www\.cmt\.com/(?:videos|shows)/(?:[^/]+/)*(?P<videoid>\d+)'
     _FEED_URL = 'http://www.cmt.com/sitewide/apps/player/embed/rss/'
 
     _TESTS = [{
@@ -16,4 +16,7 @@ class CMTIE(MTVIE):
             'title': 'Garth Brooks - "The Call (featuring Trisha Yearwood)"',
             'description': 'Blame It All On My Roots',
         },
+    }, {
+        'url': 'http://www.cmt.com/shows/party-down-south/party-down-south-ep-407-gone-girl/1738172/playlist/#id=1738172',
+        'only_matching': True,
     }]
diff --git a/youtube_dl/extractor/cnbc.py b/youtube_dl/extractor/cnbc.py
new file mode 100644 (file)
index 0000000..d354d9f
--- /dev/null
@@ -0,0 +1,36 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import smuggle_url
+
+
+class CNBCIE(InfoExtractor):
+    _VALID_URL = r'https?://video\.cnbc\.com/gallery/\?video=(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://video.cnbc.com/gallery/?video=3000503714',
+        'info_dict': {
+            'id': '3000503714',
+            'ext': 'mp4',
+            'title': 'Fighting zombies is big business',
+            'description': 'md5:0c100d8e1a7947bd2feec9a5550e519e',
+            'timestamp': 1459332000,
+            'upload_date': '20160330',
+            'uploader': 'NBCU-CNBC',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'ThePlatform',
+            'url': smuggle_url(
+                'http://link.theplatform.com/s/gZWlPC/media/guid/2408950221/%s?mbr=true&manifest=m3u' % video_id,
+                {'force_smil_url': True}),
+            'id': video_id,
+        }
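
smuggle_url() piggybacks extra flags on the URL fragment so that the delegated
ThePlatform extractor can unsmuggle them later; a rough stand-in mirroring what
youtube_dl.utils.smuggle_url does:

    import json
    try:
        from urllib.parse import urlencode  # Python 3
    except ImportError:
        from urllib import urlencode  # Python 2

    def smuggle(url, data):
        # stash a JSON payload in the fragment under a reserved key
        return url + '#' + urlencode({'__youtubedl_smuggle': json.dumps(data)})

    print(smuggle('http://link.theplatform.com/s/gZWlPC/media/guid/2408950221/3000503714?mbr=true&manifest=m3u',
                  {'force_smil_url': True}))
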
diff --git a/youtube_dl/extractor/cnet.py b/youtube_dl/extractor/cnet.py
deleted file mode 100644 (file)
index 5dd69bf..0000000
+++ /dev/null
@@ -1,85 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import json
-
-from .common import InfoExtractor
-from ..utils import (
-    ExtractorError,
-)
-
-
-class CNETIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?cnet\.com/videos/(?P<id>[^/]+)/'
-    _TESTS = [{
-        'url': 'http://www.cnet.com/videos/hands-on-with-microsofts-windows-8-1-update/',
-        'info_dict': {
-            'id': '56f4ea68-bd21-4852-b08c-4de5b8354c60',
-            'ext': 'flv',
-            'title': 'Hands-on with Microsoft Windows 8.1 Update',
-            'description': 'The new update to the Windows 8 OS brings improved performance for mouse and keyboard users.',
-            'thumbnail': 're:^http://.*/flmswindows8.jpg$',
-            'uploader_id': '6085384d-619e-11e3-b231-14feb5ca9861',
-            'uploader': 'Sarah Mitroff',
-        },
-        'params': {
-            'skip_download': 'requires rtmpdump',
-        }
-    }, {
-        'url': 'http://www.cnet.com/videos/whiny-pothole-tweets-at-local-government-when-hit-by-cars-tomorrow-daily-187/',
-        'info_dict': {
-            'id': '56527b93-d25d-44e3-b738-f989ce2e49ba',
-            'ext': 'flv',
-            'description': 'Khail and Ashley wonder what other civic woes can be solved by self-tweeting objects, investigate a new kind of VR camera and watch an origami robot self-assemble, walk, climb, dig and dissolve. #TDPothole',
-            'uploader_id': 'b163284d-6b73-44fc-b3e6-3da66c392d40',
-            'uploader': 'Ashley Esqueda',
-            'title': 'Whiny potholes tweet at local government when hit by cars (Tomorrow Daily 187)',
-        },
-        'params': {
-            'skip_download': True,  # requires rtmpdump
-        },
-    }]
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-
-        data_json = self._html_search_regex(
-            r"<div class=\"cnetVideoPlayer\"\s+.*?data-cnet-video-options='([^']+)'",
-            webpage, 'data json')
-        data = json.loads(data_json)
-        vdata = data['video']
-        if not vdata:
-            vdata = data['videos'][0]
-        if not vdata:
-            raise ExtractorError('Cannot find video data')
-
-        mpx_account = data['config']['players']['default']['mpx_account']
-        vid = vdata['files'].get('rtmp', vdata['files']['hds'])
-        tp_link = 'http://link.theplatform.com/s/%s/%s' % (mpx_account, vid)
-
-        video_id = vdata['id']
-        title = vdata.get('headline')
-        if title is None:
-            title = vdata.get('title')
-        if title is None:
-            raise ExtractorError('Cannot find title!')
-        thumbnail = vdata.get('image', {}).get('path')
-        author = vdata.get('author')
-        if author:
-            uploader = '%s %s' % (author['firstName'], author['lastName'])
-            uploader_id = author.get('id')
-        else:
-            uploader = None
-            uploader_id = None
-
-        return {
-            '_type': 'url_transparent',
-            'url': tp_link,
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
-            'uploader': uploader,
-            'uploader_id': uploader_id,
-            'thumbnail': thumbnail,
-        }
index 3b1bd4033fd1c01986c83ab44cc1cebaa1b19e5b..53489a14e38399680c8338f4f22a521f7fa6ad45 100644 (file)
@@ -26,14 +26,14 @@ class CNNIE(InfoExtractor):
             'upload_date': '20130609',
         },
     }, {
-        "url": "http://edition.cnn.com/video/?/video/us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+rss%2Fcnn_topstories+%28RSS%3A+Top+Stories%29",
-        "md5": "b5cc60c60a3477d185af8f19a2a26f4e",
-        "info_dict": {
+        'url': 'http://edition.cnn.com/video/?/video/us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+rss%2Fcnn_topstories+%28RSS%3A+Top+Stories%29',
+        'md5': 'b5cc60c60a3477d185af8f19a2a26f4e',
+        'info_dict': {
             'id': 'us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology',
             'ext': 'mp4',
-            "title": "Student's epic speech stuns new freshmen",
-            "description": "A Georgia Tech student welcomes the incoming freshmen with an epic speech backed by music from \"2001: A Space Odyssey.\"",
-            "upload_date": "20130821",
+            'title': "Student's epic speech stuns new freshmen",
+            'description': "A Georgia Tech student welcomes the incoming freshmen with an epic speech backed by music from \"2001: A Space Odyssey.\"",
+            'upload_date': '20130821',
         }
     }, {
         'url': 'http://www.cnn.com/video/data/2.0/video/living/2014/12/22/growing-america-nashville-salemtown-board-episode-1.hln.html',
index fedd48490c4ef1169d9620c7b3a7436c6a9f6d1c..f9e84193d95a8ebd2e49331a34c91b04ad95c649 100644 (file)
@@ -3,10 +3,10 @@ from __future__ import unicode_literals
 import json
 
 from .common import InfoExtractor
-from ..compat import compat_urllib_request
 from ..utils import (
     float_or_none,
     int_or_none,
+    sanitized_Request,
 )
 
 
@@ -46,13 +46,13 @@ class CollegeRamaIE(InfoExtractor):
         video_id = self._match_id(url)
 
         player_options_request = {
-            "getPlayerOptionsRequest": {
-                "ResourceId": video_id,
-                "QueryString": "",
+            'getPlayerOptionsRequest': {
+                'ResourceId': video_id,
+                'QueryString': '',
             }
         }
 
-        request = compat_urllib_request.Request(
+        request = sanitized_Request(
             'http://collegerama.tudelft.nl/Mediasite/PlayerService/PlayerService.svc/json/GetPlayerOptions',
             json.dumps(player_options_request))
         request.add_header('Content-Type', 'application/json')
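The only behavioural change in the collegerama hunk above is that the JSON POST is now built through sanitized_Request, which sanitizes the URL before constructing the underlying Request object. A minimal standalone sketch of the resulting call, with a placeholder ResourceId rather than a real Mediasite id:

    import json
    from youtube_dl.utils import sanitized_Request

    payload = {
        'getPlayerOptionsRequest': {
            'ResourceId': 'some-video-id',  # placeholder, not a real id
            'QueryString': '',
        }
    }
    request = sanitized_Request(
        'http://collegerama.tudelft.nl/Mediasite/PlayerService/PlayerService.svc/json/GetPlayerOptions',
        json.dumps(payload))
    request.add_header('Content-Type', 'application/json')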
index 81f3d7697b843d3d9abb23fbc047b8230a69b351..747c245c844171958637213b37daec3dd03f3a7e 100644 (file)
@@ -1,24 +1,27 @@
 # encoding: utf-8
 from __future__ import unicode_literals
 
-import json
-
 from .common import InfoExtractor
-from ..utils import parse_iso8601
+from ..compat import compat_str
+from ..utils import (
+    int_or_none,
+    parse_duration,
+    parse_iso8601,
+)
 
 
 class ComCarCoffIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?comediansincarsgettingcoffee\.com/(?P<id>[a-z0-9\-]*)'
+    _VALID_URL = r'https?://(?:www\.)?comediansincarsgettingcoffee\.com/(?P<id>[a-z0-9\-]*)'
     _TESTS = [{
         'url': 'http://comediansincarsgettingcoffee.com/miranda-sings-happy-thanksgiving-miranda/',
         'info_dict': {
-            'id': 'miranda-sings-happy-thanksgiving-miranda',
+            'id': '2494164',
             'ext': 'mp4',
             'upload_date': '20141127',
             'timestamp': 1417107600,
+            'duration': 1232,
             'title': 'Happy Thanksgiving Miranda',
             'description': 'Jerry Seinfeld and his special guest Miranda Sings cruise around town in search of coffee, complaining and apologizing along the way.',
-            'thumbnail': 'http://ccc.crackle.com/images/s5e4_thumb.jpg',
         },
         'params': {
             'skip_download': 'requires ffmpeg',
@@ -31,27 +34,41 @@ class ComCarCoffIE(InfoExtractor):
             display_id = 'comediansincarsgettingcoffee.com'
         webpage = self._download_webpage(url, display_id)
 
-        full_data = json.loads(self._search_regex(
-            r'<script type="application/json" id="videoData">(?P<json>.+?)</script>',
-            webpage, 'full data json'))
+        full_data = self._parse_json(
+            self._search_regex(
+                r'window\.app\s*=\s*({.+?});\n', webpage, 'full data json'),
+            display_id)['videoData']
+
+        display_id = full_data['activeVideo']['video']
+        video_data = full_data.get('videos', {}).get(display_id) or full_data['singleshots'][display_id]
+
+        video_id = compat_str(video_data['mediaId'])
+        title = video_data['title']
+        formats = self._extract_m3u8_formats(
+            video_data['mediaUrl'], video_id, 'mp4')
+        self._sort_formats(formats)
 
-        video_id = full_data['activeVideo']['video']
-        video_data = full_data.get('videos', {}).get(video_id) or full_data['singleshots'][video_id]
         thumbnails = [{
             'url': video_data['images']['thumb'],
         }, {
             'url': video_data['images']['poster'],
         }]
-        formats = self._extract_m3u8_formats(
-            video_data['mediaUrl'], video_id, ext='mp4')
+
+        timestamp = int_or_none(video_data.get('pubDateTime')) or parse_iso8601(
+            video_data.get('pubDate'))
+        duration = int_or_none(video_data.get('durationSeconds')) or parse_duration(
+            video_data.get('duration'))
 
         return {
             'id': video_id,
             'display_id': display_id,
-            'title': video_data['title'],
+            'title': title,
             'description': video_data.get('description'),
-            'timestamp': parse_iso8601(video_data.get('pubDate')),
+            'timestamp': timestamp,
+            'duration': duration,
             'thumbnails': thumbnails,
             'formats': formats,
+            'season_number': int_or_none(video_data.get('season')),
+            'episode_number': int_or_none(video_data.get('episode')),
             'webpage_url': 'http://comediansincarsgettingcoffee.com/%s' % (video_data.get('urlSlug', video_data.get('slug'))),
         }
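The comcarcoff rewrite above stops parsing the old #videoData script tag and instead reads the window.app bootstrap object, resolving the active video through its videoData key. A self-contained sketch of that step, using an invented page snippet instead of real site markup:

    import json
    import re

    webpage = '<script>window.app = {"videoData": {"activeVideo": {"video": "some-slug"}}};\n</script>'
    full_data = json.loads(
        re.search(r'window\.app\s*=\s*({.+?});\n', webpage).group(1))['videoData']
    display_id = full_data['activeVideo']['video']  # 'some-slug'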
index 91ebb0ce57136dc0076927acdca4e250774746e1..0c59102e072594857cc0f1c53e15c183b1885a93 100644 (file)
@@ -5,7 +5,7 @@ import re
 from .mtv import MTVServicesInfoExtractor
 from ..compat import (
     compat_str,
-    compat_urllib_parse,
+    compat_urllib_parse_urlencode,
 )
 from ..utils import (
     ExtractorError,
@@ -16,11 +16,11 @@ from ..utils import (
 
 class ComedyCentralIE(MTVServicesInfoExtractor):
     _VALID_URL = r'''(?x)https?://(?:www\.)?cc\.com/
-        (video-clips|episodes|cc-studios|video-collections|full-episodes)
+        (video-clips|episodes|cc-studios|video-collections|full-episodes|shows)
         /(?P<title>.*)'''
     _FEED_URL = 'http://comedycentral.com/feeds/mrss/'
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://www.cc.com/video-clips/kllhuv/stand-up-greg-fitzsimmons--uncensored---too-good-of-a-mother',
         'md5': 'c4f48e9eda1b16dd10add0744344b6d8',
         'info_dict': {
@@ -29,7 +29,10 @@ class ComedyCentralIE(MTVServicesInfoExtractor):
             'title': 'CC:Stand-Up|Greg Fitzsimmons: Life on Stage|Uncensored - Too Good of a Mother',
             'description': 'After a certain point, breastfeeding becomes c**kblocking.',
         },
-    }
+    }, {
+        'url': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/interviews/6yx39d/exclusive-rand-paul-extended-interview',
+        'only_matching': True,
+    }]
 
 
 class ComedyCentralShowsIE(MTVServicesInfoExtractor):
@@ -151,12 +154,7 @@ class ComedyCentralShowsIE(MTVServicesInfoExtractor):
         mobj = re.match(self._VALID_URL, url)
 
         if mobj.group('shortname'):
-            if mobj.group('shortname') in ('tds', 'thedailyshow'):
-                url = 'http://thedailyshow.cc.com/full-episodes/'
-            else:
-                url = 'http://thecolbertreport.cc.com/full-episodes/'
-            mobj = re.match(self._VALID_URL, url, re.VERBOSE)
-            assert mobj is not None
+            return self.url_result('http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes')
 
         if mobj.group('clip'):
             if mobj.group('videotitle'):
@@ -197,13 +195,13 @@ class ComedyCentralShowsIE(MTVServicesInfoExtractor):
             if len(altMovieParams) == 0:
                 raise ExtractorError('unable to find Flash URL in webpage ' + url)
             else:
-                mMovieParams = [("http://media.mtvnservices.com/" + altMovieParams[0], altMovieParams[0])]
+                mMovieParams = [('http://media.mtvnservices.com/' + altMovieParams[0], altMovieParams[0])]
 
         uri = mMovieParams[0][1]
         # Correct cc.com in uri
         uri = re.sub(r'(episode:[^.]+)(\.cc)?\.com', r'\1.com', uri)
 
-        index_url = 'http://%s.cc.com/feeds/mrss?%s' % (show_name, compat_urllib_parse.urlencode({'uri': uri}))
+        index_url = 'http://%s.cc.com/feeds/mrss?%s' % (show_name, compat_urllib_parse_urlencode({'uri': uri}))
         idoc = self._download_xml(
             index_url, epTitle,
             'Downloading show index', 'Unable to download episode index')
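Note that compat_urllib_parse.urlencode becomes compat_urllib_parse_urlencode in this file; behaviour is unchanged, only the compat shim moves. A quick sketch of the feed URL construction, with an illustrative show name and uri (not taken from the site):

    from youtube_dl.compat import compat_urllib_parse_urlencode

    show_name = 'thedailyshow'  # illustrative
    uri = 'mgid:arc:episode:comedycentral.com:0123abcd'  # illustrative
    index_url = 'http://%s.cc.com/feeds/mrss?%s' % (
        show_name, compat_urllib_parse_urlencode({'uri': uri}))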
index e3d1dd076364261dcd2e8860f281a48dfc2b39cc..a285ee7d898e0777005ff625b5fcc44f1013d683 100644 (file)
@@ -10,19 +10,22 @@ import re
 import socket
 import sys
 import time
-import xml.etree.ElementTree
+import math
 
 from ..compat import (
     compat_cookiejar,
     compat_cookies,
-    compat_HTTPError,
+    compat_etree_fromstring,
+    compat_getpass,
     compat_http_client,
+    compat_os_name,
+    compat_str,
     compat_urllib_error,
-    compat_urllib_parse_urlparse,
+    compat_urllib_parse_urlencode,
     compat_urllib_request,
     compat_urlparse,
-    compat_str,
 )
+from ..downloader.f4m import remove_encrypted_media
 from ..utils import (
     NO_DEFAULT,
     age_restricted,
@@ -30,13 +33,25 @@ from ..utils import (
     clean_html,
     compiled_regex_type,
     determine_ext,
+    error_to_compat_str,
     ExtractorError,
     fix_xml_ampersands,
     float_or_none,
     int_or_none,
+    parse_iso8601,
     RegexNotFoundError,
     sanitize_filename,
+    sanitized_Request,
     unescapeHTML,
+    unified_strdate,
+    url_basename,
+    xpath_text,
+    xpath_with_ns,
+    determine_protocol,
+    parse_duration,
+    mimetype2ext,
+    update_Request,
+    update_url_query,
 )
 
 
@@ -94,7 +109,7 @@ class InfoExtractor(object):
                     * protocol   The protocol that will be used for the actual
                                  download, lower-case.
                                  "http", "https", "rtsp", "rtmp", "rtmpe",
-                                 "m3u8", or "m3u8_native".
+                                 "m3u8", "m3u8_native" or "http_dash_segments".
                     * preference Order number of this format. If this field is
                                  present and not None, the formats get sorted
                                  by this field, regardless of all other values.
@@ -102,8 +117,9 @@ class InfoExtractor(object):
                                  -2 or smaller for less than default.
                                  < -1000 to hide the format (if there is
                                     another one which is strictly better)
-                    * language_preference  Is this in the correct requested
-                                 language?
+                    * language   Language code, e.g. "de" or "en-US".
+                    * language_preference  Is this in the language mentioned in
+                                 the URL?
                                  10 if it's what the URL is about,
                                  -1 for default (don't know),
                                  -10 otherwise, other values reserved for now.
@@ -146,11 +162,14 @@ class InfoExtractor(object):
     thumbnail:      Full URL to a video thumbnail image.
     description:    Full video description.
     uploader:       Full name of the video uploader.
+    license:        License name the video is licensed under.
     creator:        The main artist who created the video.
+    release_date:   The date (YYYYMMDD) when the video was released.
     timestamp:      UNIX timestamp of the moment the video became available.
     upload_date:    Video upload date (YYYYMMDD).
                     If not explicitly set, calculated from timestamp.
     uploader_id:    Nickname or id of the video uploader.
+    uploader_url:   Full URL to a personal webpage of the video uploader.
     location:       Physical location where the video was filmed.
     subtitles:      The available subtitles as a dictionary in the format
                     {language: subformats}. "subformats" is a list sorted from
@@ -158,12 +177,14 @@ class InfoExtractor(object):
                     with the "ext" entry and one of:
                         * "data": The subtitles file contents
                         * "url": A URL pointing to the subtitles file
+                    "ext" will be calculated from URL if missing
     automatic_captions: Like 'subtitles', used by the YoutubeIE for
                     automatically generated captions
-    duration:       Length of the video in seconds, as an integer.
+    duration:       Length of the video in seconds, as an integer or float.
     view_count:     How many users have watched the video on the platform.
     like_count:     Number of positive ratings of the video
     dislike_count:  Number of negative ratings of the video
+    repost_count:   Number of reposts of the video
     average_rating: Average rating given by users, the scale used depends on the webpage
     comment_count:  Number of comments on the video
     comments:       A list of comments, each with one or more of the following
@@ -191,6 +212,44 @@ class InfoExtractor(object):
     end_time:       Time in seconds where the reproduction should end, as
                     specified in the URL.
 
+    The following fields should only be used when the video belongs to some logical
+    chapter or section:
+
+    chapter:        Name or title of the chapter the video belongs to.
+    chapter_number: Number of the chapter the video belongs to, as an integer.
+    chapter_id:     Id of the chapter the video belongs to, as a unicode string.
+
+    The following fields should only be used when the video is an episode of some
+    series or programme:
+
+    series:         Title of the series or programme the video episode belongs to.
+    season:         Title of the season the video episode belongs to.
+    season_number:  Number of the season the video episode belongs to, as an integer.
+    season_id:      Id of the season the video episode belongs to, as a unicode string.
+    episode:        Title of the video episode. Unlike the mandatory video title field,
+                    this field should denote the exact title of the video episode
+                    without any kind of decoration.
+    episode_number: Number of the video episode within a season, as an integer.
+    episode_id:     Id of the video episode, as a unicode string.
+
+    The following fields should only be used when the media is a track or a part of
+    a music album:
+
+    track:          Title of the track.
+    track_number:   Number of the track within an album or a disc, as an integer.
+    track_id:       Id of the track (useful in case of custom indexing, e.g. 6.iii),
+                    as a unicode string.
+    artist:         Artist(s) of the track.
+    genre:          Genre(s) of the track.
+    album:          Title of the album the track belongs to.
+    album_type:     Type of the album (e.g. "Demo", "Full-length", "Split", "Compilation", etc.).
+    album_artist:   List of all artists that appeared on the album (e.g.
+                    "Ash Borer / Fell Voices" or "Various Artists", useful for splits
+                    and compilations).
+    disc_number:    Number of the disc or other physical medium the track belongs to,
+                    as an integer.
+    release_year:   Year (YYYY) when the album was released.
+
     Unless mentioned otherwise, the fields should be Unicode strings.
 
     Unless mentioned otherwise, None is equivalent to absence of information.
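To make the new episode and series fields concrete, here is a hedged sketch of an info dict an extractor could now return; the id, title and duration echo the comcarcoff test above, while the series title and numbering are invented for illustration:

    info = {
        'id': '2494164',
        'title': 'Happy Thanksgiving Miranda',
        'series': 'Comedians in Cars Getting Coffee',  # assumed series title
        'season_number': 5,       # illustrative
        'episode': 'Happy Thanksgiving Miranda',
        'episode_number': 4,      # illustrative
        'duration': 1232,
    }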
@@ -200,8 +259,8 @@ class InfoExtractor(object):
     There must be a key "entries", which is a list, an iterable, or a PagedList
     object, each element of which is a valid dictionary by this specification.
 
-    Additionally, playlists can have "title" and "id" attributes with the same
-    semantics as videos (see above).
+    Additionally, playlists can have "title", "description" and "id" attributes
+    with the same semantics as videos (see above).
 
 
     _type "multi_video" indicates that there are multiple videos that
@@ -283,9 +342,9 @@ class InfoExtractor(object):
         except ExtractorError:
             raise
         except compat_http_client.IncompleteRead as e:
-            raise ExtractorError('A network error has occured.', cause=e, expected=True)
+            raise ExtractorError('A network error has occurred.', cause=e, expected=True)
         except (KeyError, StopIteration) as e:
-            raise ExtractorError('An extractor error has occured.', cause=e)
+            raise ExtractorError('An extractor error has occurred.', cause=e)
 
     def set_downloader(self, downloader):
         """Sets the downloader for this IE."""
@@ -302,13 +361,13 @@ class InfoExtractor(object):
     @classmethod
     def ie_key(cls):
         """A string for getting the InfoExtractor with get_info_extractor"""
-        return cls.__name__[:-2]
+        return compat_str(cls.__name__[:-2])
 
     @property
     def IE_NAME(self):
-        return type(self).__name__[:-2]
+        return compat_str(type(self).__name__[:-2])
 
-    def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True):
+    def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None, headers={}, query={}):
         """ Returns the response handle """
         if note is None:
             self.report_download_webpage(video_id)
@@ -317,6 +376,14 @@ class InfoExtractor(object):
                 self.to_screen('%s' % (note,))
             else:
                 self.to_screen('%s: %s' % (video_id, note))
+        if isinstance(url_or_request, compat_urllib_request.Request):
+            url_or_request = update_Request(
+                url_or_request, data=data, headers=headers, query=query)
+        else:
+            if query:
+                url_or_request = update_url_query(url_or_request, query)
+            if data is not None or headers:
+                url_or_request = sanitized_Request(url_or_request, data, headers)
         try:
             return self._downloader.urlopen(url_or_request)
         except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
@@ -324,20 +391,21 @@ class InfoExtractor(object):
                 return False
             if errnote is None:
                 errnote = 'Unable to download webpage'
-            errmsg = '%s: %s' % (errnote, compat_str(err))
+
+            errmsg = '%s: %s' % (errnote, error_to_compat_str(err))
             if fatal:
                 raise ExtractorError(errmsg, sys.exc_info()[2], cause=err)
             else:
                 self._downloader.report_warning(errmsg)
                 return False
 
-    def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None):
+    def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None, data=None, headers={}, query={}):
         """ Returns a tuple (page content as string, URL handle) """
         # Strip hashes from the URL (#1038)
         if isinstance(url_or_request, (compat_str, str)):
             url_or_request = url_or_request.partition('#')[0]
 
-        urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal)
+        urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query)
         if urlh is False:
             assert not fatal
             return False
@@ -390,7 +458,7 @@ class InfoExtractor(object):
             self.to_screen('Saving request to ' + filename)
             # Working around MAX_PATH limitation on Windows (see
             # http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx)
-            if os.name == 'nt':
+            if compat_os_name == 'nt':
                 absfilepath = os.path.abspath(filename)
                 if len(absfilepath) > 259:
                     filename = '\\\\?\\' + absfilepath
@@ -424,13 +492,13 @@ class InfoExtractor(object):
 
         return content
 
-    def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None):
+    def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None, data=None, headers={}, query={}):
         """ Returns the data of the page as a string """
         success = False
         try_count = 0
         while success is False:
             try:
-                res = self._download_webpage_handle(url_or_request, video_id, note, errnote, fatal, encoding=encoding)
+                res = self._download_webpage_handle(url_or_request, video_id, note, errnote, fatal, encoding=encoding, data=data, headers=headers, query=query)
                 success = True
             except compat_http_client.IncompleteRead as e:
                 try_count += 1
@@ -445,24 +513,24 @@ class InfoExtractor(object):
 
     def _download_xml(self, url_or_request, video_id,
                       note='Downloading XML', errnote='Unable to download XML',
-                      transform_source=None, fatal=True, encoding=None):
+                      transform_source=None, fatal=True, encoding=None, data=None, headers={}, query={}):
         """Return the xml as an xml.etree.ElementTree.Element"""
         xml_string = self._download_webpage(
-            url_or_request, video_id, note, errnote, fatal=fatal, encoding=encoding)
+            url_or_request, video_id, note, errnote, fatal=fatal, encoding=encoding, data=data, headers=headers, query=query)
         if xml_string is False:
             return xml_string
         if transform_source:
             xml_string = transform_source(xml_string)
-        return xml.etree.ElementTree.fromstring(xml_string.encode('utf-8'))
+        return compat_etree_fromstring(xml_string.encode('utf-8'))
 
     def _download_json(self, url_or_request, video_id,
                        note='Downloading JSON metadata',
                        errnote='Unable to download JSON metadata',
                        transform_source=None,
-                       fatal=True, encoding=None):
+                       fatal=True, encoding=None, data=None, headers={}, query={}):
         json_string = self._download_webpage(
             url_or_request, video_id, note, errnote, fatal=fatal,
-            encoding=encoding)
+            encoding=encoding, data=data, headers=headers, query=query)
         if (not fatal) and json_string is False:
             return None
         return self._parse_json(
@@ -505,6 +573,18 @@ class InfoExtractor(object):
         """Report attempt to log in."""
         self.to_screen('Logging in')
 
+    @staticmethod
+    def raise_login_required(msg='This video is only available for registered users'):
+        raise ExtractorError(
+            '%s. Use --username and --password or --netrc to provide account credentials.' % msg,
+            expected=True)
+
+    @staticmethod
+    def raise_geo_restricted(msg='This video is not available from your location due to geo restriction'):
+        raise ExtractorError(
+            '%s. You might want to use --proxy to work around this.' % msg,
+            expected=True)
+
     # Methods for following #608
     @staticmethod
     def url_result(url, ie=None, video_id=None, video_title=None):
@@ -547,7 +627,7 @@ class InfoExtractor(object):
                 if mobj:
                     break
 
-        if not self._downloader.params.get('no_color') and os.name != 'nt' and sys.stderr.isatty():
+        if not self._downloader.params.get('no_color') and compat_os_name != 'nt' and sys.stderr.isatty():
             _name = '\033[0;34m%s\033[0m' % name
         else:
             _name = name
@@ -590,7 +670,7 @@ class InfoExtractor(object):
         downloader_params = self._downloader.params
 
         # Attempt to use provided username and password or .netrc data
-        if downloader_params.get('username', None) is not None:
+        if downloader_params.get('username') is not None:
             username = downloader_params['username']
             password = downloader_params['password']
         elif downloader_params.get('usenetrc', False):
@@ -602,11 +682,11 @@ class InfoExtractor(object):
                 else:
                     raise netrc.NetrcParseError('No authenticators for %s' % self._NETRC_MACHINE)
             except (IOError, netrc.NetrcParseError) as err:
-                self._downloader.report_warning('parsing .netrc: %s' % compat_str(err))
+                self._downloader.report_warning('parsing .netrc: %s' % error_to_compat_str(err))
 
         return (username, password)
 
-    def _get_tfa_info(self):
+    def _get_tfa_info(self, note='two-factor verification code'):
         """
         Get the two-factor authentication info
         TODO - asking the user will be required for sms/phone verify
@@ -617,16 +697,17 @@ class InfoExtractor(object):
             return None
         downloader_params = self._downloader.params
 
-        if downloader_params.get('twofactor', None) is not None:
+        if downloader_params.get('twofactor') is not None:
             return downloader_params['twofactor']
 
-        return None
+        return compat_getpass('Type %s and press [Return]: ' % note)
 
     # Helper functions for extracting OpenGraph info
     @staticmethod
     def _og_regexes(prop):
-        content_re = r'content=(?:"([^>]+?)"|\'([^>]+?)\')'
-        property_re = r'(?:name|property)=[\'"]og:%s[\'"]' % re.escape(prop)
+        content_re = r'content=(?:"([^"]+?)"|\'([^\']+?)\'|\s*([^\s"\'=<>`]+?))'
+        property_re = (r'(?:name|property)=(?:\'og:%(prop)s\'|"og:%(prop)s"|\s*og:%(prop)s\b)'
+                       % {'prop': re.escape(prop)})
         template = r'<meta[^>]+?%s[^>]+?%s'
         return [
             template % (property_re, content_re),
@@ -636,7 +717,7 @@ class InfoExtractor(object):
     @staticmethod
     def _meta_regex(prop):
         return r'''(?isx)<meta
-                    (?=[^>]+(?:itemprop|name|property)=(["\']?)%s\1)
+                    (?=[^>]+(?:itemprop|name|property|id|http-equiv)=(["\']?)%s\1)
                     [^>]+?content=(["\'])(?P<content>.*?)\2''' % re.escape(prop)
 
     def _og_search_property(self, prop, html, name=None, **kargs):
@@ -697,7 +778,7 @@ class InfoExtractor(object):
             'mature': 17,
             'restricted': 19,
         }
-        return RATING_TABLE.get(rating.lower(), None)
+        return RATING_TABLE.get(rating.lower())
 
     def _family_friendly_search(self, html):
         # See http://schema.org/VideoObject
@@ -712,28 +793,67 @@ class InfoExtractor(object):
             '0': 18,
             'false': 18,
         }
-        return RATING_TABLE.get(family_friendly.lower(), None)
+        return RATING_TABLE.get(family_friendly.lower())
 
     def _twitter_search_player(self, html):
         return self._html_search_meta('twitter:player', html,
                                       'twitter card player')
 
+    def _search_json_ld(self, html, video_id, **kwargs):
+        json_ld = self._search_regex(
+            r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>',
+            html, 'JSON-LD', group='json_ld', **kwargs)
+        if not json_ld:
+            return {}
+        return self._json_ld(json_ld, video_id, fatal=kwargs.get('fatal', True))
+
+    def _json_ld(self, json_ld, video_id, fatal=True):
+        if isinstance(json_ld, compat_str):
+            json_ld = self._parse_json(json_ld, video_id, fatal=fatal)
+        if not json_ld:
+            return {}
+        info = {}
+        if json_ld.get('@context') == 'http://schema.org':
+            item_type = json_ld.get('@type')
+            if item_type == 'TVEpisode':
+                info.update({
+                    'episode': unescapeHTML(json_ld.get('name')),
+                    'episode_number': int_or_none(json_ld.get('episodeNumber')),
+                    'description': unescapeHTML(json_ld.get('description')),
+                })
+                part_of_season = json_ld.get('partOfSeason')
+                if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason':
+                    info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
+                part_of_series = json_ld.get('partOfSeries')
+                if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
+                    info['series'] = unescapeHTML(part_of_series.get('name'))
+            elif item_type == 'Article':
+                info.update({
+                    'timestamp': parse_iso8601(json_ld.get('datePublished')),
+                    'title': unescapeHTML(json_ld.get('headline')),
+                    'description': unescapeHTML(json_ld.get('articleBody')),
+                })
+        return dict((k, v) for k, v in info.items() if v is not None)
+
     @staticmethod
     def _hidden_inputs(html):
-        return dict([
-            (input.group('name'), input.group('value')) for input in re.finditer(
-                r'''(?x)
-                    <input\s+
-                        type=(?P<q_hidden>["\'])hidden(?P=q_hidden)\s+
-                        name=(?P<q_name>["\'])(?P<name>.+?)(?P=q_name)\s+
-                        (?:id=(?P<q_id>["\']).+?(?P=q_id)\s+)?
-                        value=(?P<q_value>["\'])(?P<value>.*?)(?P=q_value)
-                ''', html)
-        ])
+        html = re.sub(r'<!--(?:(?!<!--).)*-->', '', html)
+        hidden_inputs = {}
+        for input in re.findall(r'(?i)<input([^>]+)>', html):
+            if not re.search(r'type=(["\'])(?:hidden|submit)\1', input):
+                continue
+            name = re.search(r'(?:name|id)=(["\'])(?P<value>.+?)\1', input)
+            if not name:
+                continue
+            value = re.search(r'value=(["\'])(?P<value>.*?)\1', input)
+            if not value:
+                continue
+            hidden_inputs[name.group('value')] = value.group('value')
+        return hidden_inputs
 
     def _form_hidden_inputs(self, form_id, html):
         form = self._search_regex(
-            r'(?s)<form[^>]+?id=(["\'])%s\1[^>]*>(?P<form>.+?)</form>' % form_id,
+            r'(?is)<form[^>]+?id=(["\'])%s\1[^>]*>(?P<form>.+?)</form>' % form_id,
             html, '%s form' % form_id, group='form')
         return self._hidden_inputs(form)
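The reworked _hidden_inputs above first strips commented-out markup, then keeps only inputs whose type is hidden or submit and whose name (or id) and value can both be located. A standalone illustration reusing the same regexes as the hunk:

    import re

    html = ('<form id="login">'
            '<!-- <input type="hidden" name="stale" value="x"> -->'
            '<input type="hidden" name="token" value="abc">'
            '<input type="submit" value="Go">'
            '</form>')

    html = re.sub(r'<!--(?:(?!<!--).)*-->', '', html)  # commented-out inputs vanish
    hidden = {}
    for inp in re.findall(r'(?i)<input([^>]+)>', html):
        if not re.search(r'type=(["\'])(?:hidden|submit)\1', inp):
            continue
        name = re.search(r'(?:name|id)=(["\'])(?P<value>.+?)\1', inp)
        value = re.search(r'value=(["\'])(?P<value>.*?)\1', inp)
        if name and value:
            hidden[name.group('value')] = value.group('value')

    print(hidden)  # {'token': 'abc'}; the submit button has no name and is skipped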
 
@@ -741,6 +861,12 @@ class InfoExtractor(object):
         if not formats:
             raise ExtractorError('No video formats found')
 
+        for f in formats:
+            # Automatically determine tbr when missing based on abr and vbr (improves
+            # formats sorting in some cases)
+            if 'tbr' not in f and f.get('abr') is not None and f.get('vbr') is not None:
+                f['tbr'] = f['abr'] + f['vbr']
+
         def _formats_key(f):
             # TODO remove the following workaround
             from ..utils import determine_ext
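The tbr fallback added above is plain addition of the advertised audio and video bitrates, e.g. abr=128 plus vbr=1872 gives tbr=2000 kbit/s. A one-line sketch with invented numbers:

    f = {'abr': 128, 'vbr': 1872}  # kbit/s, illustrative
    if 'tbr' not in f and f.get('abr') is not None and f.get('vbr') is not None:
        f['tbr'] = f['abr'] + f['vbr']  # 2000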
@@ -752,15 +878,14 @@ class InfoExtractor(object):
 
             preference = f.get('preference')
             if preference is None:
-                proto = f.get('protocol')
-                if proto is None:
-                    proto = compat_urllib_parse_urlparse(f.get('url', '')).scheme
-
-                preference = 0 if proto in ['http', 'https'] else -0.1
+                preference = 0
                 if f.get('ext') in ['f4f', 'f4m']:  # Not yet supported
                     preference -= 0.5
 
+            proto_preference = 0 if determine_protocol(f) in ['http', 'https'] else -0.1
+
             if f.get('vcodec') == 'none':  # audio only
+                preference -= 50
                 if self._downloader.params.get('prefer_free_formats'):
                     ORDER = ['aac', 'mp3', 'm4a', 'webm', 'ogg', 'opus']
                 else:
@@ -771,6 +896,8 @@ class InfoExtractor(object):
                 except ValueError:
                     audio_ext_preference = -1
             else:
+                if f.get('acodec') == 'none':  # video only
+                    preference -= 40
                 if self._downloader.params.get('prefer_free_formats'):
                     ORDER = ['flv', 'mp4', 'webm']
                 else:
@@ -790,6 +917,7 @@ class InfoExtractor(object):
                 f.get('vbr') if f.get('vbr') is not None else -1,
                 f.get('height') if f.get('height') is not None else -1,
                 f.get('width') if f.get('width') is not None else -1,
+                proto_preference,
                 ext_preference,
                 f.get('abr') if f.get('abr') is not None else -1,
                 audio_ext_preference,
@@ -808,6 +936,16 @@ class InfoExtractor(object):
                     item='%s video format' % f.get('format_id') if f.get('format_id') else 'video'),
                 formats)
 
+    @staticmethod
+    def _remove_duplicate_formats(formats):
+        format_urls = set()
+        unique_formats = []
+        for f in formats:
+            if f['url'] not in format_urls:
+                format_urls.add(f['url'])
+                unique_formats.append(f)
+        formats[:] = unique_formats
+
     def _is_valid_url(self, url, video_id, item='video'):
         url = self._proto_relative_url(url, scheme='http:')
         # For now assume non HTTP(S) URLs always valid
@@ -817,7 +955,7 @@ class InfoExtractor(object):
             self._request_webpage(url, video_id, 'Checking %s URL' % item)
             return True
         except ExtractorError as e:
-            if isinstance(e.cause, compat_HTTPError):
+            if isinstance(e.cause, compat_urllib_error.URLError):
                 self.to_screen(
                     '%s: %s URL is invalid, skipping' % (video_id, item))
                 return False
@@ -848,14 +986,26 @@ class InfoExtractor(object):
         time.sleep(timeout)
 
     def _extract_f4m_formats(self, manifest_url, video_id, preference=None, f4m_id=None,
-                             transform_source=lambda s: fix_xml_ampersands(s).strip()):
+                             transform_source=lambda s: fix_xml_ampersands(s).strip(),
+                             fatal=True):
         manifest = self._download_xml(
             manifest_url, video_id, 'Downloading f4m manifest',
             'Unable to download f4m manifest',
             # Some manifests may be malformed, e.g. prosiebensat1 generated manifests
             # (see https://github.com/rg3/youtube-dl/issues/6215#issuecomment-121704244)
-            transform_source=transform_source)
+            transform_source=transform_source,
+            fatal=fatal)
+
+        if manifest is False:
+            return []
 
+        return self._parse_f4m_formats(
+            manifest, manifest_url, video_id, preference=preference, f4m_id=f4m_id,
+            transform_source=transform_source, fatal=fatal)
+
+    def _parse_f4m_formats(self, manifest, manifest_url, video_id, preference=None, f4m_id=None,
+                           transform_source=lambda s: fix_xml_ampersands(s).strip(),
+                           fatal=True):
         # currently youtube-dl cannot decode the playerVerificationChallenge as Akamai uses Adobe Alchemy
         akamai_pv = manifest.find('{http://ns.adobe.com/f4m/1.0}pv-2.0')
         if akamai_pv is not None and ';' in akamai_pv.text:
@@ -869,6 +1019,16 @@ class InfoExtractor(object):
         if not media_nodes:
             manifest_version = '2.0'
             media_nodes = manifest.findall('{http://ns.adobe.com/f4m/2.0}media')
+        # Remove unsupported DRM-protected media renditions from the final
+        # formats (see https://github.com/rg3/youtube-dl/issues/8573).
+        media_nodes = remove_encrypted_media(media_nodes)
+        if not media_nodes:
+            return formats
+        base_url = xpath_text(
+            manifest, ['{http://ns.adobe.com/f4m/1.0}baseURL', '{http://ns.adobe.com/f4m/2.0}baseURL'],
+            'base URL', default=None)
+        if base_url:
+            base_url = base_url.strip()
         for i, media_el in enumerate(media_nodes):
             if manifest_version == '2.0':
                 media_url = media_el.attrib.get('href') or media_el.attrib.get('url')
@@ -876,13 +1036,15 @@ class InfoExtractor(object):
                     continue
                 manifest_url = (
                     media_url if media_url.startswith('http://') or media_url.startswith('https://')
-                    else ('/'.join(manifest_url.split('/')[:-1]) + '/' + media_url))
+                    else ((base_url or '/'.join(manifest_url.split('/')[:-1])) + '/' + media_url))
                 # If media_url is itself a f4m manifest do the recursive extraction
                 # since bitrates in parent manifest (this one) and media_url manifest
                 # may differ leading to inability to resolve the format by requested
                 # bitrate in f4m downloader
                 if determine_ext(manifest_url) == 'f4m':
-                    formats.extend(self._extract_f4m_formats(manifest_url, video_id, preference, f4m_id))
+                    formats.extend(self._extract_f4m_formats(
+                        manifest_url, video_id, preference=preference, f4m_id=f4m_id,
+                        transform_source=transform_source, fatal=fatal))
                     continue
             tbr = int_or_none(media_el.attrib.get('bitrate'))
             formats.append({
@@ -894,8 +1056,6 @@ class InfoExtractor(object):
                 'height': int_or_none(media_el.attrib.get('height')),
                 'preference': preference,
             })
-        self._sort_formats(formats)
-
         return formats
 
     def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
@@ -918,13 +1078,37 @@ class InfoExtractor(object):
             if re.match(r'^https?://', u)
             else compat_urlparse.urljoin(m3u8_url, u))
 
-        m3u8_doc = self._download_webpage(
+        res = self._download_webpage_handle(
             m3u8_url, video_id,
             note=note or 'Downloading m3u8 information',
             errnote=errnote or 'Failed to download m3u8 information',
             fatal=fatal)
-        if m3u8_doc is False:
-            return m3u8_doc
+        if res is False:
+            return []
+        m3u8_doc, urlh = res
+        m3u8_url = urlh.geturl()
+
+        # We should try extracting formats only from master playlists [1], i.e.
+        # playlists that describe available qualities. On the other hand, media
+        # playlists [2] should be returned as is since they contain just the media
+        # without quality renditions.
+        # Fortunately, a master playlist can easily be distinguished from a media
+        # playlist based on the availability of particular tags. Per [1, 2],
+        # master playlist tags MUST NOT appear in a media playlist and vice versa.
+        # Per [3], the #EXT-X-TARGETDURATION tag is REQUIRED for every media
+        # playlist and MUST NOT appear in a master playlist, thus we can reliably
+        # detect a media playlist with this criterion.
+        # 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.4
+        # 2. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3
+        # 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.1
+        if '#EXT-X-TARGETDURATION' in m3u8_doc:  # media playlist, return as is
+            return [{
+                'url': m3u8_url,
+                'format_id': m3u8_id,
+                'ext': ext,
+                'protocol': entry_protocol,
+                'preference': preference,
+            }]
         last_info = None
         last_media = None
         kv_rex = re.compile(
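The master/media distinction documented in the comment above reduces to a single tag test; a minimal sketch of the criterion applied to two toy playlists:

    media_playlist = '#EXTM3U\n#EXT-X-TARGETDURATION:10\n#EXTINF:10,\nseg0.ts\n'
    master_playlist = '#EXTM3U\n#EXT-X-STREAM-INF:BANDWIDTH=1280000\nlow/index.m3u8\n'

    def is_media_playlist(m3u8_doc):
        # EXT-X-TARGETDURATION is REQUIRED in media playlists and MUST NOT
        # appear in master playlists, so its presence settles the question.
        return '#EXT-X-TARGETDURATION' in m3u8_doc

    assert is_media_playlist(media_playlist)
    assert not is_media_playlist(master_playlist)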
@@ -964,95 +1148,437 @@ class InfoExtractor(object):
                     'protocol': entry_protocol,
                     'preference': preference,
                 }
-                codecs = last_info.get('CODECS')
-                if codecs:
-                    # TODO: looks like video codec is not always necessarily goes first
-                    va_codecs = codecs.split(',')
-                    if va_codecs[0]:
-                        f['vcodec'] = va_codecs[0].partition('.')[0]
-                    if len(va_codecs) > 1 and va_codecs[1]:
-                        f['acodec'] = va_codecs[1].partition('.')[0]
                 resolution = last_info.get('RESOLUTION')
                 if resolution:
                     width_str, height_str = resolution.split('x')
                     f['width'] = int(width_str)
                     f['height'] = int(height_str)
+                codecs = last_info.get('CODECS')
+                if codecs:
+                    vcodec, acodec = [None] * 2
+                    va_codecs = codecs.split(',')
+                    if len(va_codecs) == 1:
+                        # Audio only entries usually come with single codec and
+                        # no resolution. For more robustness we also check it to
+                        # be mp4 audio.
+                        if not resolution and va_codecs[0].startswith('mp4a'):
+                            vcodec, acodec = 'none', va_codecs[0]
+                        else:
+                            vcodec = va_codecs[0]
+                    else:
+                        vcodec, acodec = va_codecs[:2]
+                    f.update({
+                        'acodec': acodec,
+                        'vcodec': vcodec,
+                    })
                 if last_media is not None:
                     f['m3u8_media'] = last_media
                     last_media = None
                 formats.append(f)
                 last_info = {}
-        self._sort_formats(formats)
         return formats
 
-    # TODO: improve extraction
-    def _extract_smil_formats(self, smil_url, video_id, fatal=True):
-        smil = self._download_xml(
-            smil_url, video_id, 'Downloading SMIL file',
-            'Unable to download SMIL file', fatal=fatal)
+    @staticmethod
+    def _xpath_ns(path, namespace=None):
+        if not namespace:
+            return path
+        out = []
+        for c in path.split('/'):
+            if not c or c == '.':
+                out.append(c)
+            else:
+                out.append('{%s}%s' % (namespace, c))
+        return '/'.join(out)
+
+    def _extract_smil_formats(self, smil_url, video_id, fatal=True, f4m_params=None, transform_source=None):
+        smil = self._download_smil(smil_url, video_id, fatal=fatal, transform_source=transform_source)
+
         if smil is False:
             assert not fatal
             return []
 
-        base = smil.find('./head/meta').get('base')
+        namespace = self._parse_smil_namespace(smil)
+
+        return self._parse_smil_formats(
+            smil, smil_url, video_id, namespace=namespace, f4m_params=f4m_params)
+
+    def _extract_smil_info(self, smil_url, video_id, fatal=True, f4m_params=None):
+        smil = self._download_smil(smil_url, video_id, fatal=fatal)
+        if smil is False:
+            return {}
+        return self._parse_smil(smil, smil_url, video_id, f4m_params=f4m_params)
+
+    def _download_smil(self, smil_url, video_id, fatal=True, transform_source=None):
+        return self._download_xml(
+            smil_url, video_id, 'Downloading SMIL file',
+            'Unable to download SMIL file', fatal=fatal, transform_source=transform_source)
+
+    def _parse_smil(self, smil, smil_url, video_id, f4m_params=None):
+        namespace = self._parse_smil_namespace(smil)
+
+        formats = self._parse_smil_formats(
+            smil, smil_url, video_id, namespace=namespace, f4m_params=f4m_params)
+        subtitles = self._parse_smil_subtitles(smil, namespace=namespace)
+
+        video_id = os.path.splitext(url_basename(smil_url))[0]
+        title = None
+        description = None
+        upload_date = None
+        for meta in smil.findall(self._xpath_ns('./head/meta', namespace)):
+            name = meta.attrib.get('name')
+            content = meta.attrib.get('content')
+            if not name or not content:
+                continue
+            if not title and name == 'title':
+                title = content
+            elif not description and name in ('description', 'abstract'):
+                description = content
+            elif not upload_date and name == 'date':
+                upload_date = unified_strdate(content)
+
+        thumbnails = [{
+            'id': image.get('type'),
+            'url': image.get('src'),
+            'width': int_or_none(image.get('width')),
+            'height': int_or_none(image.get('height')),
+        } for image in smil.findall(self._xpath_ns('.//image', namespace)) if image.get('src')]
+
+        return {
+            'id': video_id,
+            'title': title or video_id,
+            'description': description,
+            'upload_date': upload_date,
+            'thumbnails': thumbnails,
+            'formats': formats,
+            'subtitles': subtitles,
+        }
+
+    def _parse_smil_namespace(self, smil):
+        return self._search_regex(
+            r'(?i)^{([^}]+)?}smil$', smil.tag, 'namespace', default=None)
+
+    def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
+        base = smil_url
+        for meta in smil.findall(self._xpath_ns('./head/meta', namespace)):
+            b = meta.get('base') or meta.get('httpBase')
+            if b:
+                base = b
+                break
 
         formats = []
         rtmp_count = 0
-        if smil.findall('./body/seq/video'):
-            video = smil.findall('./body/seq/video')[0]
-            fmts, rtmp_count = self._parse_smil_video(video, video_id, base, rtmp_count)
-            formats.extend(fmts)
-        else:
-            for video in smil.findall('./body/switch/video'):
-                fmts, rtmp_count = self._parse_smil_video(video, video_id, base, rtmp_count)
-                formats.extend(fmts)
+        http_count = 0
+        m3u8_count = 0
+
+        srcs = []
+        videos = smil.findall(self._xpath_ns('.//video', namespace))
+        for video in videos:
+            src = video.get('src')
+            if not src or src in srcs:
+                continue
+            srcs.append(src)
+
+            bitrate = float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
+            filesize = int_or_none(video.get('size') or video.get('fileSize'))
+            width = int_or_none(video.get('width'))
+            height = int_or_none(video.get('height'))
+            proto = video.get('proto')
+            ext = video.get('ext')
+            src_ext = determine_ext(src)
+            streamer = video.get('streamer') or base
+
+            if proto == 'rtmp' or streamer.startswith('rtmp'):
+                rtmp_count += 1
+                formats.append({
+                    'url': streamer,
+                    'play_path': src,
+                    'ext': 'flv',
+                    'format_id': 'rtmp-%d' % (rtmp_count if bitrate is None else bitrate),
+                    'tbr': bitrate,
+                    'filesize': filesize,
+                    'width': width,
+                    'height': height,
+                })
+                if transform_rtmp_url:
+                    streamer, src = transform_rtmp_url(streamer, src)
+                    formats[-1].update({
+                        'url': streamer,
+                        'play_path': src,
+                    })
+                continue
+
+            src_url = src if src.startswith('http') else compat_urlparse.urljoin(base, src)
+            src_url = src_url.strip()
+
+            if proto == 'm3u8' or src_ext == 'm3u8':
+                m3u8_formats = self._extract_m3u8_formats(
+                    src_url, video_id, ext or 'mp4', m3u8_id='hls', fatal=False)
+                if len(m3u8_formats) == 1:
+                    m3u8_count += 1
+                    m3u8_formats[0].update({
+                        'format_id': 'hls-%d' % (m3u8_count if bitrate is None else bitrate),
+                        'tbr': bitrate,
+                        'width': width,
+                        'height': height,
+                    })
+                formats.extend(m3u8_formats)
+                continue
+
+            if src_ext == 'f4m':
+                f4m_url = src_url
+                if not f4m_params:
+                    f4m_params = {
+                        'hdcore': '3.2.0',
+                        'plugin': 'flowplayer-3.2.0.1',
+                    }
+                f4m_url += '&' if '?' in f4m_url else '?'
+                f4m_url += compat_urllib_parse_urlencode(f4m_params)
+                formats.extend(self._extract_f4m_formats(f4m_url, video_id, f4m_id='hds', fatal=False))
+                continue
 
-        self._sort_formats(formats)
+            if src_url.startswith('http') and self._is_valid_url(src, video_id):
+                http_count += 1
+                formats.append({
+                    'url': src_url,
+                    'ext': ext or src_ext or 'flv',
+                    'format_id': 'http-%d' % (bitrate or http_count),
+                    'tbr': bitrate,
+                    'filesize': filesize,
+                    'width': width,
+                    'height': height,
+                })
+                continue
 
         return formats
 
-    def _parse_smil_video(self, video, video_id, base, rtmp_count):
-        src = video.get('src')
-        if not src:
-            return [], rtmp_count
-        bitrate = int_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
-        width = int_or_none(video.get('width'))
-        height = int_or_none(video.get('height'))
-        proto = video.get('proto')
-        if not proto:
-            if base:
-                if base.startswith('rtmp'):
-                    proto = 'rtmp'
-                elif base.startswith('http'):
-                    proto = 'http'
-        ext = video.get('ext')
-        if proto == 'm3u8':
-            return self._extract_m3u8_formats(src, video_id, ext), rtmp_count
-        elif proto == 'rtmp':
-            rtmp_count += 1
-            streamer = video.get('streamer') or base
-            return ([{
-                'url': streamer,
-                'play_path': src,
-                'ext': 'flv',
-                'format_id': 'rtmp-%d' % (rtmp_count if bitrate is None else bitrate),
-                'tbr': bitrate,
-                'width': width,
-                'height': height,
-            }], rtmp_count)
-        elif proto.startswith('http'):
-            return ([{
-                'url': base + src,
-                'ext': ext or 'flv',
-                'tbr': bitrate,
-                'width': width,
-                'height': height,
-            }], rtmp_count)
+    def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
+        urls = []
+        subtitles = {}
+        for num, textstream in enumerate(smil.findall(self._xpath_ns('.//textstream', namespace))):
+            src = textstream.get('src')
+            if not src or src in urls:
+                continue
+            urls.append(src)
+            ext = textstream.get('ext') or mimetype2ext(textstream.get('type')) or determine_ext(src)
+            lang = textstream.get('systemLanguage') or textstream.get('systemLanguageName') or textstream.get('lang') or subtitles_lang
+            subtitles.setdefault(lang, []).append({
+                'url': src,
+                'ext': ext,
+            })
+        return subtitles
+
+    def _extract_xspf_playlist(self, playlist_url, playlist_id, fatal=True):
+        xspf = self._download_xml(
+            playlist_url, playlist_id, 'Downloading xspf playlist',
+            'Unable to download xspf manifest', fatal=fatal)
+        if xspf is False:
+            return []
+        return self._parse_xspf(xspf, playlist_id)
+
+    def _parse_xspf(self, playlist, playlist_id):
+        NS_MAP = {
+            'xspf': 'http://xspf.org/ns/0/',
+            's1': 'http://static.streamone.nl/player/ns/0',
+        }
+
+        entries = []
+        for track in playlist.findall(xpath_with_ns('./xspf:trackList/xspf:track', NS_MAP)):
+            title = xpath_text(
+                track, xpath_with_ns('./xspf:title', NS_MAP), 'title', default=playlist_id)
+            description = xpath_text(
+                track, xpath_with_ns('./xspf:annotation', NS_MAP), 'description')
+            thumbnail = xpath_text(
+                track, xpath_with_ns('./xspf:image', NS_MAP), 'thumbnail')
+            duration = float_or_none(
+                xpath_text(track, xpath_with_ns('./xspf:duration', NS_MAP), 'duration'), 1000)
+
+            formats = [{
+                'url': location.text,
+                'format_id': location.get(xpath_with_ns('s1:label', NS_MAP)),
+                'width': int_or_none(location.get(xpath_with_ns('s1:width', NS_MAP))),
+                'height': int_or_none(location.get(xpath_with_ns('s1:height', NS_MAP))),
+            } for location in track.findall(xpath_with_ns('./xspf:location', NS_MAP))]
+            self._sort_formats(formats)
+
+            entries.append({
+                'id': playlist_id,
+                'title': title,
+                'description': description,
+                'thumbnail': thumbnail,
+                'duration': duration,
+                'formats': formats,
+            })
+        return entries
+
+    def _extract_mpd_formats(self, mpd_url, video_id, mpd_id=None, note=None, errnote=None, fatal=True, formats_dict={}):
+        res = self._download_webpage_handle(
+            mpd_url, video_id,
+            note=note or 'Downloading MPD manifest',
+            errnote=errnote or 'Failed to download MPD manifest',
+            fatal=fatal)
+        if res is False:
+            return []
+        mpd, urlh = res
+        mpd_base_url = re.match(r'https?://.+/', urlh.geturl()).group()
+
+        return self._parse_mpd_formats(
+            compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url, formats_dict=formats_dict)
+
+    def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}):
+        if mpd_doc.get('type') == 'dynamic':
+            return []
+
+        namespace = self._search_regex(r'(?i)^{([^}]+)?}MPD$', mpd_doc.tag, 'namespace', default=None)
+
+        def _add_ns(path):
+            return self._xpath_ns(path, namespace)
+
+        def is_drm_protected(element):
+            return element.find(_add_ns('ContentProtection')) is not None
+
+        def extract_multisegment_info(element, ms_parent_info):
+            ms_info = ms_parent_info.copy()
+            segment_list = element.find(_add_ns('SegmentList'))
+            if segment_list is not None:
+                segment_urls_e = segment_list.findall(_add_ns('SegmentURL'))
+                if segment_urls_e:
+                    ms_info['segment_urls'] = [segment.attrib['media'] for segment in segment_urls_e]
+                initialization = segment_list.find(_add_ns('Initialization'))
+                if initialization is not None:
+                    ms_info['initialization_url'] = initialization.attrib['sourceURL']
+            else:
+                segment_template = element.find(_add_ns('SegmentTemplate'))
+                if segment_template is not None:
+                    start_number = segment_template.get('startNumber')
+                    if start_number:
+                        ms_info['start_number'] = int(start_number)
+                    segment_timeline = segment_template.find(_add_ns('SegmentTimeline'))
+                    if segment_timeline is not None:
+                        s_e = segment_timeline.findall(_add_ns('S'))
+                        if s_e:
+                            ms_info['total_number'] = 0
+                            for s in s_e:
+                                ms_info['total_number'] += 1 + int(s.get('r', '0'))
+                    else:
+                        timescale = segment_template.get('timescale')
+                        if timescale:
+                            ms_info['timescale'] = int(timescale)
+                        segment_duration = segment_template.get('duration')
+                        if segment_duration:
+                            ms_info['segment_duration'] = int(segment_duration)
+                    media_template = segment_template.get('media')
+                    if media_template:
+                        ms_info['media_template'] = media_template
+                    initialization = segment_template.get('initialization')
+                    if initialization:
+                        ms_info['initialization_url'] = initialization
+                    else:
+                        initialization = segment_template.find(_add_ns('Initialization'))
+                        if initialization is not None:
+                            ms_info['initialization_url'] = initialization.attrib['sourceURL']
+            return ms_info
+
+        mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration'))
+        formats = []
+        for period in mpd_doc.findall(_add_ns('Period')):
+            period_duration = parse_duration(period.get('duration')) or mpd_duration
+            period_ms_info = extract_multisegment_info(period, {
+                'start_number': 1,
+                'timescale': 1,
+            })
+            for adaptation_set in period.findall(_add_ns('AdaptationSet')):
+                if is_drm_protected(adaptation_set):
+                    continue
+                adaption_set_ms_info = extract_multisegment_info(adaptation_set, period_ms_info)
+                for representation in adaptation_set.findall(_add_ns('Representation')):
+                    if is_drm_protected(representation):
+                        continue
+                    representation_attrib = adaptation_set.attrib.copy()
+                    representation_attrib.update(representation.attrib)
+                    # According to page 41 of ISO/IEC 23009-1:2014, @mimeType is mandatory
+                    mime_type = representation_attrib['mimeType']
+                    content_type = mime_type.split('/')[0]
+                    if content_type == 'text':
+                        # TODO implement WebVTT downloading
+                        pass
+                    elif content_type == 'video' or content_type == 'audio':
+                        base_url = ''
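+                        # BaseURL elements may nest; resolve from the innermost
+                        # (Representation) outwards, prepending each level until
+                        # an absolute URL is formed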
+                        for element in (representation, adaptation_set, period, mpd_doc):
+                            base_url_e = element.find(_add_ns('BaseURL'))
+                            if base_url_e is not None:
+                                base_url = base_url_e.text + base_url
+                                if re.match(r'^https?://', base_url):
+                                    break
+                        if mpd_base_url and not re.match(r'^https?://', base_url):
+                            if not mpd_base_url.endswith('/') and not base_url.startswith('/'):
+                                mpd_base_url += '/'
+                            base_url = mpd_base_url + base_url
+                        representation_id = representation_attrib.get('id')
+                        lang = representation_attrib.get('lang')
+                        url_el = representation.find(_add_ns('BaseURL'))
+                        filesize = int_or_none(url_el.attrib.get('{http://youtube.com/yt/2012/10/10}contentLength') if url_el is not None else None)
+                        f = {
+                            'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id,
+                            'url': base_url,
+                            'ext': mimetype2ext(mime_type),
+                            'width': int_or_none(representation_attrib.get('width')),
+                            'height': int_or_none(representation_attrib.get('height')),
+                            'tbr': int_or_none(representation_attrib.get('bandwidth'), 1000),
+                            'asr': int_or_none(representation_attrib.get('audioSamplingRate')),
+                            'fps': int_or_none(representation_attrib.get('frameRate')),
+                            'vcodec': 'none' if content_type == 'audio' else representation_attrib.get('codecs'),
+                            'acodec': 'none' if content_type == 'video' else representation_attrib.get('codecs'),
+                            'language': lang if lang not in ('mul', 'und', 'zxx', 'mis') else None,
+                            'format_note': 'DASH %s' % content_type,
+                            'filesize': filesize,
+                        }
+                        representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
+                        if 'segment_urls' not in representation_ms_info and 'media_template' in representation_ms_info:
+                            if 'total_number' not in representation_ms_info and 'segment_duration' in representation_ms_info:
+                                segment_duration = float(representation_ms_info['segment_duration']) / float(representation_ms_info['timescale'])
+                                representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
+                            media_template = representation_ms_info['media_template']
+                            media_template = media_template.replace('$RepresentationID$', representation_id)
+                            media_template = re.sub(r'\$(Number|Bandwidth)\$', r'%(\1)d', media_template)
+                            media_template = re.sub(r'\$(Number|Bandwidth)%([^$]+)\$', r'%(\1)\2', media_template)
+                            media_template = media_template.replace('$$', '$')
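+                            # e.g. a (hypothetical) template 'seg-$RepresentationID$-$Number%05d$.m4s'
+                            # becomes 'seg-video1-%(Number)05d.m4s' for representation id
+                            # 'video1', which the substitution below renders as
+                            # 'seg-video1-00001.m4s', 'seg-video1-00002.m4s', ...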
+                            representation_ms_info['segment_urls'] = [
+                                media_template % {
+                                    'Number': segment_number,
+                                    'Bandwidth': representation_attrib.get('bandwidth')}
+                                for segment_number in range(
+                                    representation_ms_info['start_number'],
+                                    representation_ms_info['total_number'] + representation_ms_info['start_number'])]
+                        if 'segment_urls' in representation_ms_info:
+                            f.update({
+                                'segment_urls': representation_ms_info['segment_urls'],
+                                'protocol': 'http_dash_segments',
+                            })
+                            if 'initialization_url' in representation_ms_info:
+                                initialization_url = representation_ms_info['initialization_url'].replace('$RepresentationID$', representation_id)
+                                f.update({
+                                    'initialization_url': initialization_url,
+                                })
+                                if not f.get('url'):
+                                    f['url'] = initialization_url
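+                        # a Representation id may recur (e.g. across Periods);
+                        # merge into any existing entry, seeding new entries from
+                        # the caller-supplied formats_dict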
+                        try:
+                            existing_format = next(
+                                fo for fo in formats
+                                if fo['format_id'] == representation_id)
+                        except StopIteration:
+                            full_info = formats_dict.get(representation_id, {}).copy()
+                            full_info.update(f)
+                            formats.append(full_info)
+                        else:
+                            existing_format.update(f)
+                    else:
+                        self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
+        return formats
 
     def _live_title(self, name):
         """ Generate the title for a live video """
         now = datetime.datetime.now()
-        now_str = now.strftime("%Y-%m-%d %H:%M")
+        now_str = now.strftime('%Y-%m-%d %H:%M')
         return name + ' ' + now_str
 
     def _int(self, v, name, fatal=False, **kwargs):
@@ -1085,7 +1611,7 @@ class InfoExtractor(object):
 
     def _get_cookies(self, url):
         """ Return a compat_cookies.SimpleCookie with the cookies for the url """
-        req = compat_urllib_request.Request(url)
+        req = sanitized_Request(url)
         self._downloader.cookiejar.add_cookie_header(req)
         return compat_cookies.SimpleCookie(req.get_header('Cookie'))
 
@@ -1125,7 +1651,24 @@ class InfoExtractor(object):
         return {}
 
     def _get_subtitles(self, *args, **kwargs):
-        raise NotImplementedError("This method must be implemented by subclasses")
+        raise NotImplementedError('This method must be implemented by subclasses')
+
+    @staticmethod
+    def _merge_subtitle_items(subtitle_list1, subtitle_list2):
+        """ Merge subtitle items for one language. Items with duplicated URLs
+        will be dropped. """
+        list1_urls = set([item['url'] for item in subtitle_list1])
+        ret = list(subtitle_list1)
+        ret.extend([item for item in subtitle_list2 if item['url'] not in list1_urls])
+        return ret
+
+    @classmethod
+    def _merge_subtitles(cls, subtitle_dict1, subtitle_dict2):
+        """ Merge two subtitle dictionaries, language by language. """
+        ret = dict(subtitle_dict1)
+        for lang in subtitle_dict2:
+            ret[lang] = cls._merge_subtitle_items(subtitle_dict1.get(lang, []), subtitle_dict2[lang])
+        return ret
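+    # Illustrative (hypothetical data):
+    #   _merge_subtitles({'en': [{'url': 'a'}]},
+    #                    {'en': [{'url': 'b'}], 'de': [{'url': 'c'}]})
+    #   == {'en': [{'url': 'a'}, {'url': 'b'}], 'de': [{'url': 'c'}]}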
 
     def extract_automatic_captions(self, *args, **kwargs):
         if (self._downloader.params.get('writeautomaticsub', False) or
@@ -1134,7 +1677,16 @@ class InfoExtractor(object):
         return {}
 
     def _get_automatic_captions(self, *args, **kwargs):
-        raise NotImplementedError("This method must be implemented by subclasses")
+        raise NotImplementedError('This method must be implemented by subclasses')
+
+    def mark_watched(self, *args, **kwargs):
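+        # marking a video watched only makes sense when authenticated, either
+        # via login credentials or a cookie file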
+        if (self._downloader.params.get('mark_watched', False) and
+                (self._get_login_info()[0] is not None or
+                    self._downloader.params.get('cookiefile') is not None)):
+            self._mark_watched(*args, **kwargs)
+
+    def _mark_watched(self, *args, **kwargs):
+        raise NotImplementedError('This method must be implemented by subclasses')
 
 
 class SearchInfoExtractor(InfoExtractor):
@@ -1174,7 +1726,7 @@ class SearchInfoExtractor(InfoExtractor):
 
     def _get_n_results(self, query, n):
         """Get a specified number of results for a query"""
-        raise NotImplementedError("This method must be implemented by subclasses")
+        raise NotImplementedError('This method must be implemented by subclasses')
 
     @property
     def SEARCH_KEY(self):
diff --git a/youtube_dl/extractor/commonprotocols.py b/youtube_dl/extractor/commonprotocols.py
new file mode 100644 (file)
index 0000000..5d130a1
--- /dev/null
@@ -0,0 +1,36 @@
+from __future__ import unicode_literals
+
+import os
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_parse_unquote,
+    compat_urlparse,
+)
+from ..utils import url_basename
+
+
+class RtmpIE(InfoExtractor):
+    IE_DESC = False  # Do not list
+    _VALID_URL = r'(?i)rtmp[est]?://.+'
+
+    _TESTS = [{
+        'url': 'rtmp://cp44293.edgefcs.net/ondemand?auth=daEcTdydfdqcsb8cZcDbAaCbhamacbbawaS-bw7dBb-bWG-GqpGFqCpNCnGoyL&aifp=v001&slist=public/unsecure/audio/2c97899446428e4301471a8cb72b4b97--audio--pmg-20110908-0900a_flv_aac_med_int.mp4',
+        'only_matching': True,
+    }, {
+        'url': 'rtmp://edge.live.hitbox.tv/live/dimak',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
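+        # bare rtmp URLs carry no metadata; derive both the id and the title
+        # from the percent-decoded URL basename with the extension dropped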
+        video_id = compat_urllib_parse_unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0])
+        title = compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0])
+        return {
+            'id': video_id,
+            'title': title,
+            'formats': [{
+                'url': url,
+                'ext': 'flv',
+                'format_id': compat_urlparse.urlparse(url).scheme,
+            }],
+        }
index 3db4db4e4db816ae532060bc2386cd91a9c71a92..e8f2b5a07591410c16fe6fe096678a12006abe48 100644 (file)
@@ -2,16 +2,16 @@
 from __future__ import unicode_literals
 
 import re
-import json
 
 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse,
+    compat_urllib_parse_urlencode,
     compat_urllib_parse_urlparse,
     compat_urlparse,
 )
 from ..utils import (
     orderedSet,
+    remove_end,
 )
 
 
@@ -24,21 +24,33 @@ class CondeNastIE(InfoExtractor):
     # The keys are the supported sites and the values are the name to be shown
     # to the user and in the extractor description.
     _SITES = {
-        'wired': 'WIRED',
+        'allure': 'Allure',
+        'architecturaldigest': 'Architectural Digest',
+        'arstechnica': 'Ars Technica',
+        'bonappetit': 'Bon Appétit',
+        'brides': 'Brides',
+        'cnevids': 'Condé Nast',
+        'cntraveler': 'Condé Nast Traveler',
+        'details': 'Details',
+        'epicurious': 'Epicurious',
+        'glamour': 'Glamour',
+        'golfdigest': 'Golf Digest',
         'gq': 'GQ',
+        'newyorker': 'The New Yorker',
+        'self': 'SELF',
+        'teenvogue': 'Teen Vogue',
+        'vanityfair': 'Vanity Fair',
         'vogue': 'Vogue',
-        'glamour': 'Glamour',
+        'wired': 'WIRED',
         'wmagazine': 'W Magazine',
-        'vanityfair': 'Vanity Fair',
-        'cnevids': 'Condé Nast',
     }
 
-    _VALID_URL = r'http://(video|www|player)\.(?P<site>%s)\.com/(?P<type>watch|series|video|embed)/(?P<id>[^/?#]+)' % '|'.join(_SITES.keys())
+    _VALID_URL = r'https?://(?:video|www|player)\.(?P<site>%s)\.com/(?P<type>watch|series|video|embed(?:js)?)/(?P<id>[^/?#]+)' % '|'.join(_SITES.keys())
     IE_DESC = 'Condé Nast media group: %s' % ', '.join(sorted(_SITES.values()))
 
-    EMBED_URL = r'(?:https?:)?//player\.(?P<site>%s)\.com/(?P<type>embed)/.+?' % '|'.join(_SITES.keys())
+    EMBED_URL = r'(?:https?:)?//player\.(?P<site>%s)\.com/(?P<type>embed(?:js)?)/.+?' % '|'.join(_SITES.keys())
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://video.wired.com/watch/3d-printed-speakers-lit-with-led',
         'md5': '1921f713ed48aabd715691f774c451f7',
         'info_dict': {
@@ -47,7 +59,16 @@ class CondeNastIE(InfoExtractor):
             'title': '3D Printed Speakers Lit With LED',
             'description': 'Check out these beautiful 3D printed LED speakers.  You can\'t actually buy them, but LumiGeek is working on a board that will let you make you\'re own.',
         }
-    }
+    }, {
+        # JS embed
+        'url': 'http://player.cnevids.com/embedjs/55f9cf8b61646d1acf00000c/5511d76261646d5566020000.js',
+        'md5': 'f1a6f9cafb7083bab74a710f65d08999',
+        'info_dict': {
+            'id': '55f9cf8b61646d1acf00000c',
+            'ext': 'mp4',
+            'title': '3D printed TSA Travel Sentry keys really do open TSA locks',
+        }
+    }]
 
     def _extract_series(self, url, webpage):
         title = self._html_search_regex(r'<div class="cne-series-info">.*?<h1>(.+?)</h1>',
@@ -76,7 +97,7 @@ class CondeNastIE(InfoExtractor):
         video_id = self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, 'video id')
         player_id = self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, 'player id')
         target = self._search_regex(r'target: [\'"](.+?)[\'"]', params, 'target')
-        data = compat_urllib_parse.urlencode({'videoId': video_id,
+        data = compat_urllib_parse_urlencode({'videoId': video_id,
                                               'playerId': player_id,
                                               'target': target,
                                               })
@@ -86,8 +107,8 @@ class CondeNastIE(InfoExtractor):
         info_url = base_info_url + data
         info_page = self._download_webpage(info_url, video_id,
                                            'Downloading video info')
-        video_info = self._search_regex(r'var video = ({.+?});', info_page, 'video info')
-        video_info = json.loads(video_info)
+        video_info = self._search_regex(r'var\s+video\s*=\s*({.+?});', info_page, 'video info')
+        video_info = self._parse_json(video_info, video_id)
 
         formats = [{
             'format_id': '%s-%s' % (fdata['type'].split('/')[-1], fdata['quality']),
@@ -111,6 +132,13 @@ class CondeNastIE(InfoExtractor):
         url_type = mobj.group('type')
         item_id = mobj.group('id')
 
+        # Convert JS embed to regular embed
+        if url_type == 'embedjs':
+            parsed_url = compat_urlparse.urlparse(url)
+            url = compat_urlparse.urlunparse(parsed_url._replace(
+                path=remove_end(parsed_url.path, '.js').replace('/embedjs/', '/embed/')))
+            url_type = 'embed'
+
         self.to_screen('Extracting from %s with the Condé Nast extractor' % self._SITES[site])
         webpage = self._download_webpage(url, item_id)
 
diff --git a/youtube_dl/extractor/crackle.py b/youtube_dl/extractor/crackle.py
new file mode 100644 (file)
index 0000000..79238cc
--- /dev/null
@@ -0,0 +1,95 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class CrackleIE(InfoExtractor):
+    _VALID_URL = r'(?:crackle:|https?://(?:www\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://www.crackle.com/the-art-of-more/2496419',
+        'info_dict': {
+            'id': '2496419',
+            'ext': 'mp4',
+            'title': 'Heavy Lies the Head',
+            'description': 'md5:bb56aa0708fe7b9a4861535f15c3abca',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }
+
+    # extracted from http://legacyweb-us.crackle.com/flash/QueryReferrer.ashx
+    _SUBTITLE_SERVER = 'http://web-us-az.crackle.com'
+    _UPLYNK_OWNER_ID = 'e8773f7770a44dbd886eee4fca16a66b'
+    _THUMBNAIL_TEMPLATE = 'http://images-us-am.crackle.com/%stnl_1920x1080.jpg?ts=20140107233116?c=635333335057637614'
+
+    # extracted from http://legacyweb-us.crackle.com/flash/ReferrerRedirect.ashx
+    _MEDIA_FILE_SLOTS = {
+        'c544.flv': {
+            'width': 544,
+            'height': 306,
+        },
+        '360p.mp4': {
+            'width': 640,
+            'height': 360,
+        },
+        '480p.mp4': {
+            'width': 852,
+            'height': 478,
+        },
+        '480p_1mbps.mp4': {
+            'width': 852,
+            'height': 478,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        item = self._download_xml(
+            'http://legacyweb-us.crackle.com/app/revamp/vidwallcache.aspx?flags=-1&fm=%s' % video_id,
+            video_id).find('i')
+        title = item.attrib['t']
+
+        thumbnail = None
+        subtitles = {}
+        formats = self._extract_m3u8_formats(
+            'http://content.uplynk.com/ext/%s/%s.m3u8' % (self._UPLYNK_OWNER_ID, video_id),
+            video_id, 'mp4', m3u8_id='hls', fatal=False)
+        path = item.attrib.get('p')
+        if path:
+            thumbnail = self._THUMBNAIL_TEMPLATE % path
+            http_base_url = 'http://ahttp.crackle.com/' + path
+            for mfs_path, mfs_info in self._MEDIA_FILE_SLOTS.items():
+                formats.append({
+                    'url': http_base_url + mfs_path,
+                    'format_id': 'http-' + mfs_path.split('.')[0],
+                    'width': mfs_info['width'],
+                    'height': mfs_info['height'],
+                })
+            for cc in item.findall('cc'):
+                locale = cc.attrib.get('l')
+                v = cc.attrib.get('v')
+                if locale and v:
+                    subtitles.setdefault(locale, []).append({
+                        'url': '%s/%s%s_%s.xml' % (self._SUBTITLE_SERVER, path, locale, v),
+                        'ext': 'ttml',
+                    })
+        self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': item.attrib.get('d'),
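+            # the 'r' attribute appears to carry the duration as a hexadecimal string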
+            'duration': int(item.attrib.get('r'), 16) if item.attrib.get('r') else None,
+            'series': item.attrib.get('sn'),
+            'season_number': int_or_none(item.attrib.get('se')),
+            'episode_number': int_or_none(item.attrib.get('ep')),
+            'thumbnail': thumbnail,
+            'subtitles': subtitles,
+            'formats': formats,
+        }
index 4fb1781659b3266b4c475b566911385ec2f7c5b7..dedb810a092618a090641dfaf582939efabf3fc0 100644 (file)
@@ -27,9 +27,7 @@ class CriterionIE(InfoExtractor):
         final_url = self._search_regex(
             r'so.addVariable\("videoURL", "(.+?)"\)\;', webpage, 'video url')
         title = self._og_search_title(webpage)
-        description = self._html_search_regex(
-            r'<meta name="description" content="(.+?)" />',
-            webpage, 'video description')
+        description = self._html_search_meta('description', webpage)
         thumbnail = self._search_regex(
             r'so.addVariable\("thumbnailURL", "(.+?)"\)\;',
             webpage, 'thumbnail url')
index d1b6d7366e847015af4581160c918d4a1ee6e11f..8ae3f28903deeb94cd723c7f431a3c86a45538ea 100644 (file)
@@ -5,31 +5,84 @@ import re
 import json
 import base64
 import zlib
-import xml.etree.ElementTree
 
 from hashlib import sha1
 from math import pow, sqrt, floor
 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse,
+    compat_etree_fromstring,
     compat_urllib_parse_unquote,
+    compat_urllib_parse_urlencode,
     compat_urllib_request,
+    compat_urlparse,
 )
 from ..utils import (
     ExtractorError,
     bytes_to_intlist,
     intlist_to_bytes,
+    int_or_none,
+    lowercase_escape,
+    remove_end,
+    sanitized_Request,
     unified_strdate,
     urlencode_postdata,
+    xpath_text,
 )
 from ..aes import (
     aes_cbc_decrypt,
 )
 
 
-class CrunchyrollIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|[^/]*/[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)'
+class CrunchyrollBaseIE(InfoExtractor):
     _NETRC_MACHINE = 'crunchyroll'
+
+    def _login(self):
+        (username, password) = self._get_login_info()
+        if username is None:
+            return
+        self.report_login()
+        login_url = 'https://www.crunchyroll.com/?a=formhandler'
+        data = urlencode_postdata({
+            'formname': 'RpcApiUser_Login',
+            'name': username,
+            'password': password,
+        })
+        login_request = sanitized_Request(login_url, data)
+        login_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
+        self._download_webpage(login_request, None, False, 'Wrong login info')
+
+    def _real_initialize(self):
+        self._login()
+
+    def _download_webpage(self, url_or_request, *args, **kwargs):
+        request = (url_or_request if isinstance(url_or_request, compat_urllib_request.Request)
+                   else sanitized_Request(url_or_request))
+        # Accept-Language must be set explicitly to accept any language to avoid issues
+        # similar to https://github.com/rg3/youtube-dl/issues/6797.
+        # Along with the IP address, Crunchyroll uses Accept-Language to guess whether
+        # georestriction should be imposed (from what I can see it just takes the first
+        # language, ignoring the priority, and requires it to correspond to the IP).
+        # This causes Crunchyroll to fail in georestriction cases in browsers that
+        # don't place the locale language first in the header; allowing any language
+        # seems to work around the issue.
+        request.add_header('Accept-Language', '*')
+        return super(CrunchyrollBaseIE, self)._download_webpage(request, *args, **kwargs)
+
+    @staticmethod
+    def _add_skip_wall(url):
+        parsed_url = compat_urlparse.urlparse(url)
+        qs = compat_urlparse.parse_qs(parsed_url.query)
+        # Always force skip_wall to bypass maturity wall, namely 18+ confirmation message:
+        # > This content may be inappropriate for some people.
+        # > Are you sure you want to continue?
+        # since it's not disabled by default in Crunchyroll account settings.
+        # See https://github.com/rg3/youtube-dl/issues/7202.
+        qs['skip_wall'] = ['1']
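+        # e.g. (illustrative) http://www.crunchyroll.com/ladies-versus-butlers
+        # becomes http://www.crunchyroll.com/ladies-versus-butlers?skip_wall=1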
+        return compat_urlparse.urlunparse(
+            parsed_url._replace(query=compat_urllib_parse_urlencode(qs, True)))
+
+
+class CrunchyrollIE(CrunchyrollBaseIE):
+    _VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|[^/]*/[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)'
     _TESTS = [{
         'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
         'info_dict': {
@@ -52,7 +105,7 @@ class CrunchyrollIE(InfoExtractor):
             'id': '589804',
             'ext': 'flv',
             'title': 'Culture Japan Episode 1 – Rebuilding Japan after the 3.11',
-            'description': 'md5:fe2743efedb49d279552926d0bd0cd9e',
+            'description': 'md5:2fbc01f90b87e8e9137296f37b461c12',
             'thumbnail': 're:^https?://.*\.jpg$',
             'uploader': 'Danny Choo Network',
             'upload_date': '20120213',
@@ -61,10 +114,13 @@ class CrunchyrollIE(InfoExtractor):
             # rtmp
             'skip_download': True,
         },
-
     }, {
         'url': 'http://www.crunchyroll.fr/girl-friend-beta/episode-11-goodbye-la-mode-661697',
         'only_matching': True,
+    }, {
+        # geo-restricted (US), 18+ maturity wall, non-premium available
+        'url': 'http://www.crunchyroll.com/cosplay-complex-ova/episode-1-the-birth-of-the-cosplay-club-565617',
+        'only_matching': True,
     }]
 
     _FORMAT_IDS = {
@@ -74,24 +130,6 @@ class CrunchyrollIE(InfoExtractor):
         '1080': ('80', '108'),
     }
 
-    def _login(self):
-        (username, password) = self._get_login_info()
-        if username is None:
-            return
-        self.report_login()
-        login_url = 'https://www.crunchyroll.com/?a=formhandler'
-        data = urlencode_postdata({
-            'formname': 'RpcApiUser_Login',
-            'name': username,
-            'password': password,
-        })
-        login_request = compat_urllib_request.Request(login_url, data)
-        login_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        self._download_webpage(login_request, None, False, 'Wrong login info')
-
-    def _real_initialize(self):
-        self._login()
-
     def _decrypt_subtitles(self, data, iv, id):
         data = bytes_to_intlist(base64.b64decode(data.encode('utf-8')))
         iv = bytes_to_intlist(base64.b64decode(iv.encode('utf-8')))
@@ -141,40 +179,40 @@ class CrunchyrollIE(InfoExtractor):
             return assvalue
 
         output = '[Script Info]\n'
-        output += 'Title: %s\n' % sub_root.attrib["title"]
+        output += 'Title: %s\n' % sub_root.attrib['title']
         output += 'ScriptType: v4.00+\n'
-        output += 'WrapStyle: %s\n' % sub_root.attrib["wrap_style"]
-        output += 'PlayResX: %s\n' % sub_root.attrib["play_res_x"]
-        output += 'PlayResY: %s\n' % sub_root.attrib["play_res_y"]
+        output += 'WrapStyle: %s\n' % sub_root.attrib['wrap_style']
+        output += 'PlayResX: %s\n' % sub_root.attrib['play_res_x']
+        output += 'PlayResY: %s\n' % sub_root.attrib['play_res_y']
         output += """ScaledBorderAndShadow: yes
 
 [V4+ Styles]
 Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
 """
         for style in sub_root.findall('./styles/style'):
-            output += 'Style: ' + style.attrib["name"]
-            output += ',' + style.attrib["font_name"]
-            output += ',' + style.attrib["font_size"]
-            output += ',' + style.attrib["primary_colour"]
-            output += ',' + style.attrib["secondary_colour"]
-            output += ',' + style.attrib["outline_colour"]
-            output += ',' + style.attrib["back_colour"]
-            output += ',' + ass_bool(style.attrib["bold"])
-            output += ',' + ass_bool(style.attrib["italic"])
-            output += ',' + ass_bool(style.attrib["underline"])
-            output += ',' + ass_bool(style.attrib["strikeout"])
-            output += ',' + style.attrib["scale_x"]
-            output += ',' + style.attrib["scale_y"]
-            output += ',' + style.attrib["spacing"]
-            output += ',' + style.attrib["angle"]
-            output += ',' + style.attrib["border_style"]
-            output += ',' + style.attrib["outline"]
-            output += ',' + style.attrib["shadow"]
-            output += ',' + style.attrib["alignment"]
-            output += ',' + style.attrib["margin_l"]
-            output += ',' + style.attrib["margin_r"]
-            output += ',' + style.attrib["margin_v"]
-            output += ',' + style.attrib["encoding"]
+            output += 'Style: ' + style.attrib['name']
+            output += ',' + style.attrib['font_name']
+            output += ',' + style.attrib['font_size']
+            output += ',' + style.attrib['primary_colour']
+            output += ',' + style.attrib['secondary_colour']
+            output += ',' + style.attrib['outline_colour']
+            output += ',' + style.attrib['back_colour']
+            output += ',' + ass_bool(style.attrib['bold'])
+            output += ',' + ass_bool(style.attrib['italic'])
+            output += ',' + ass_bool(style.attrib['underline'])
+            output += ',' + ass_bool(style.attrib['strikeout'])
+            output += ',' + style.attrib['scale_x']
+            output += ',' + style.attrib['scale_y']
+            output += ',' + style.attrib['spacing']
+            output += ',' + style.attrib['angle']
+            output += ',' + style.attrib['border_style']
+            output += ',' + style.attrib['outline']
+            output += ',' + style.attrib['shadow']
+            output += ',' + style.attrib['alignment']
+            output += ',' + style.attrib['margin_l']
+            output += ',' + style.attrib['margin_r']
+            output += ',' + style.attrib['margin_v']
+            output += ',' + style.attrib['encoding']
             output += '\n'
 
         output += """
@@ -183,21 +221,21 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
 """
         for event in sub_root.findall('./events/event'):
             output += 'Dialogue: 0'
-            output += ',' + event.attrib["start"]
-            output += ',' + event.attrib["end"]
-            output += ',' + event.attrib["style"]
-            output += ',' + event.attrib["name"]
-            output += ',' + event.attrib["margin_l"]
-            output += ',' + event.attrib["margin_r"]
-            output += ',' + event.attrib["margin_v"]
-            output += ',' + event.attrib["effect"]
-            output += ',' + event.attrib["text"]
+            output += ',' + event.attrib['start']
+            output += ',' + event.attrib['end']
+            output += ',' + event.attrib['style']
+            output += ',' + event.attrib['name']
+            output += ',' + event.attrib['margin_l']
+            output += ',' + event.attrib['margin_r']
+            output += ',' + event.attrib['margin_v']
+            output += ',' + event.attrib['effect']
+            output += ',' + event.attrib['text']
             output += '\n'
 
         return output
 
     def _extract_subtitles(self, subtitle):
-        sub_root = xml.etree.ElementTree.fromstring(subtitle)
+        sub_root = compat_etree_fromstring(subtitle)
         return [{
             'ext': 'srt',
             'data': self._convert_subtitles_to_srt(sub_root),
@@ -208,7 +246,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
 
     def _get_subtitles(self, video_id, webpage):
         subtitles = {}
-        for sub_id, sub_name in re.findall(r'\?ssid=([0-9]+)" title="([^"]+)', webpage):
+        for sub_id, sub_name in re.findall(r'\bssid=([0-9]+)"[^>]+?\btitle="([^"]+)', webpage):
             sub_page = self._download_webpage(
                 'http://www.crunchyroll.com/xml/?req=RpcApiSubtitle_GetXml&subtitle_script_id=' + sub_id,
                 video_id, note='Downloading subtitles for ' + sub_name)
@@ -234,8 +272,10 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
         else:
             webpage_url = 'http://www.' + mobj.group('url')
 
-        webpage = self._download_webpage(webpage_url, video_id, 'Downloading webpage')
-        note_m = self._html_search_regex(r'<div class="showmedia-trailer-notice">(.+?)</div>', webpage, 'trailer-notice', default='')
+        webpage = self._download_webpage(self._add_skip_wall(webpage_url), video_id, 'Downloading webpage')
+        note_m = self._html_search_regex(
+            r'<div class="showmedia-trailer-notice">(.+?)</div>',
+            webpage, 'trailer-notice', default='')
         if note_m:
             raise ExtractorError(note_m)
 
@@ -245,19 +285,30 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
             if msg.get('type') == 'error':
                 raise ExtractorError('crunchyroll returned error: %s' % msg['message_body'], expected=True)
 
-        video_title = self._html_search_regex(r'<h1[^>]*>(.+?)</h1>', webpage, 'video_title', flags=re.DOTALL)
+        if 'To view this, please log in to verify you are 18 or older.' in webpage:
+            self.raise_login_required()
+
+        video_title = self._html_search_regex(
+            r'(?s)<h1[^>]*>((?:(?!<h1).)*?<span[^>]+itemprop=["\']title["\'][^>]*>(?:(?!<h1).)+?)</h1>',
+            webpage, 'video_title')
         video_title = re.sub(r' {2,}', ' ', video_title)
-        video_description = self._html_search_regex(r'"description":"([^"]+)', webpage, 'video_description', default='')
-        if not video_description:
-            video_description = None
-        video_upload_date = self._html_search_regex(r'<div>Availability for free users:(.+?)</div>', webpage, 'video_upload_date', fatal=False, flags=re.DOTALL)
+        video_description = self._html_search_regex(
+            r'<script[^>]*>\s*.+?\[media_id=%s\].+?"description"\s*:\s*"([^"]+)' % video_id,
+            webpage, 'description', default=None)
+        if video_description:
+            video_description = lowercase_escape(video_description.replace(r'\r\n', '\n'))
+        video_upload_date = self._html_search_regex(
+            [r'<div>Availability for free users:(.+?)</div>', r'<div>[^<>]+<span>\s*(.+?\d{4})\s*</span></div>'],
+            webpage, 'video_upload_date', fatal=False, flags=re.DOTALL)
         if video_upload_date:
             video_upload_date = unified_strdate(video_upload_date)
-        video_uploader = self._html_search_regex(r'<div>\s*Publisher:(.+?)</div>', webpage, 'video_uploader', fatal=False, flags=re.DOTALL)
+        video_uploader = self._html_search_regex(
+            r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', webpage,
+            'video_uploader', fatal=False)
 
         playerdata_url = compat_urllib_parse_unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, 'playerdata_url'))
-        playerdata_req = compat_urllib_request.Request(playerdata_url)
-        playerdata_req.data = compat_urllib_parse.urlencode({'current_page': webpage_url})
+        playerdata_req = sanitized_Request(playerdata_url)
+        playerdata_req.data = urlencode_postdata({'current_page': webpage_url})
         playerdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
         playerdata = self._download_webpage(playerdata_req, video_id, note='Downloading media info')
 
@@ -268,24 +319,46 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
         for fmt in re.findall(r'showmedia\.([0-9]{3,4})p', webpage):
             stream_quality, stream_format = self._FORMAT_IDS[fmt]
             video_format = fmt + 'p'
-            streamdata_req = compat_urllib_request.Request(
+            streamdata_req = sanitized_Request(
                 'http://www.crunchyroll.com/xml/?req=RpcApiVideoPlayer_GetStandardConfig&media_id=%s&video_format=%s&video_quality=%s'
                 % (stream_id, stream_format, stream_quality),
-                compat_urllib_parse.urlencode({'current_page': url}).encode('utf-8'))
+                compat_urllib_parse_urlencode({'current_page': url}).encode('utf-8'))
             streamdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
             streamdata = self._download_xml(
                 streamdata_req, video_id,
                 note='Downloading media info for %s' % video_format)
             stream_info = streamdata.find('./{default}preload/stream_info')
-            video_url = stream_info.find('./host').text
-            video_play_path = stream_info.find('./file').text
-            formats.append({
+            video_url = xpath_text(stream_info, './host')
+            video_play_path = xpath_text(stream_info, './file')
+            if not video_url or not video_play_path:
+                continue
+            metadata = stream_info.find('./metadata')
+            format_info = {
+                'format': video_format,
+                'format_id': video_format,
+                'height': int_or_none(xpath_text(metadata, './height')),
+                'width': int_or_none(xpath_text(metadata, './width')),
+            }
+
+            if '.fplive.net/' in video_url:
+                video_url = re.sub(r'^rtmpe?://', 'http://', video_url.strip())
+                parsed_video_url = compat_urlparse.urlparse(video_url)
+                direct_video_url = compat_urlparse.urlunparse(parsed_video_url._replace(
+                    netloc='v.lvlt.crcdn.net',
+                    path='%s/%s' % (remove_end(parsed_video_url.path, '/'), video_play_path.split(':')[-1])))
+                if self._is_valid_url(direct_video_url, video_id, video_format):
+                    format_info.update({
+                        'url': direct_video_url,
+                    })
+                    formats.append(format_info)
+                    continue
+
+            format_info.update({
                 'url': video_url,
                 'play_path': video_play_path,
                 'ext': 'flv',
-                'format': video_format,
-                'format_id': video_format,
             })
+            formats.append(format_info)
 
         subtitles = self.extract_subtitles(video_id, webpage)
 
@@ -301,9 +374,9 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
         }
 
 
-class CrunchyrollShowPlaylistIE(InfoExtractor):
-    IE_NAME = "crunchyroll:playlist"
-    _VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.com/(?!(?:news|anime-news|library|forum|launchcalendar|lineup|store|comics|freetrial|login))(?P<id>[\w\-]+))/?$'
+class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):
+    IE_NAME = 'crunchyroll:playlist'
+    _VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.com/(?!(?:news|anime-news|library|forum|launchcalendar|lineup|store|comics|freetrial|login))(?P<id>[\w\-]+))/?(?:\?|$)'
 
     _TESTS = [{
         'url': 'http://www.crunchyroll.com/a-bridge-to-the-starry-skies-hoshizora-e-kakaru-hashi',
@@ -312,12 +385,25 @@ class CrunchyrollShowPlaylistIE(InfoExtractor):
             'title': 'A Bridge to the Starry Skies - Hoshizora e Kakaru Hashi'
         },
         'playlist_count': 13,
+    }, {
+        # geo-restricted (US), 18+ maturity wall, non-premium available
+        'url': 'http://www.crunchyroll.com/cosplay-complex-ova',
+        'info_dict': {
+            'id': 'cosplay-complex-ova',
+            'title': 'Cosplay Complex OVA'
+        },
+        'playlist_count': 3,
+        'skip': 'Georestricted',
+    }, {
+        # geo-restricted (US), 18+ maturity wall, non-premium becomes available on 2015-11-14
+        'url': 'http://www.crunchyroll.com/ladies-versus-butlers?skip_wall=1',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
         show_id = self._match_id(url)
 
-        webpage = self._download_webpage(url, show_id)
+        webpage = self._download_webpage(self._add_skip_wall(url), show_id)
         title = self._html_search_regex(
             r'(?s)<h1[^>]*>\s*<span itemprop="name">(.*?)</span>',
             webpage, 'title')
index fbefd37d09a98bb19c82b4c09b7b08c99d147d35..84b36f44cfac7bd45a8a7d28adb6767093a7d19b 100644 (file)
@@ -9,42 +9,42 @@ from ..utils import (
     find_xpath_attr,
     smuggle_url,
     determine_ext,
+    ExtractorError,
 )
 from .senateisvp import SenateISVPIE
 
 
 class CSpanIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?c-span\.org/video/\?(?P<id>[0-9a-f]+)'
+    _VALID_URL = r'https?://(?:www\.)?c-span\.org/video/\?(?P<id>[0-9a-f]+)'
     IE_DESC = 'C-SPAN'
     _TESTS = [{
         'url': 'http://www.c-span.org/video/?313572-1/HolderonV',
-        'md5': '8e44ce11f0f725527daccc453f553eb0',
+        'md5': '94b29a4f131ff03d23471dd6f60b6a1d',
         'info_dict': {
             'id': '315139',
             'ext': 'mp4',
             'title': 'Attorney General Eric Holder on Voting Rights Act Decision',
-            'description': 'Attorney General Eric Holder spoke to reporters following the Supreme Court decision in Shelby County v. Holder in which the court ruled that the preclearance provisions of the Voting Rights Act could not be enforced until Congress established new guidelines for review.',
+            'description': 'Attorney General Eric Holder speaks to reporters following the Supreme Court decision in [Shelby County v. Holder], in which the court ruled that the preclearance provisions of the Voting Rights Act could not be enforced.',
         },
         'skip': 'Regularly fails on travis, for unknown reasons',
     }, {
         'url': 'http://www.c-span.org/video/?c4486943/cspan-international-health-care-models',
-        # For whatever reason, the served video alternates between
-        # two different ones
+        'md5': '8e5fbfabe6ad0f89f3012a7943c1287b',
         'info_dict': {
-            'id': '340723',
+            'id': 'c4486943',
             'ext': 'mp4',
-            'title': 'International Health Care Models',
+            'title': 'CSPAN - International Health Care Models',
             'description': 'md5:7a985a2d595dba00af3d9c9f0783c967',
         }
     }, {
         'url': 'http://www.c-span.org/video/?318608-1/gm-ignition-switch-recall',
-        'md5': '446562a736c6bf97118e389433ed88d4',
+        'md5': '2ae5051559169baadba13fc35345ae74',
         'info_dict': {
             'id': '342759',
             'ext': 'mp4',
             'title': 'General Motors Ignition Switch Recall',
             'duration': 14848,
-            'description': 'md5:70c7c3b8fa63fa60d42772440596034c'
+            'description': 'md5:118081aedd24bf1d3b68b3803344e7f3'
         },
     }, {
         # Video from senate.gov
@@ -57,67 +57,94 @@ class CSpanIE(InfoExtractor):
     }]
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        page_id = mobj.group('id')
-        webpage = self._download_webpage(url, page_id)
-        video_id = self._search_regex(r'progid=\'?([0-9]+)\'?>', webpage, 'video id')
+        video_id = self._match_id(url)
+        video_type = None
+        webpage = self._download_webpage(url, video_id)
+        # We first look for clipid, because clipprog always appears before
+        patterns = [r'id=\'clip(%s)\'\s*value=\'([0-9]+)\'' % t for t in ('id', 'prog')]
+        results = list(filter(None, (re.search(p, webpage) for p in patterns)))
+        if results:
+            matches = results[0]
+            video_type, video_id = matches.groups()
+            video_type = 'clip' if video_type == 'id' else 'program'
+        else:
+            m = re.search(r'data-(?P<type>clip|prog)id=["\'](?P<id>\d+)', webpage)
+            if m:
+                video_id = m.group('id')
+                video_type = 'program' if m.group('type') == 'prog' else 'clip'
+            else:
+                senate_isvp_url = SenateISVPIE._search_iframe_url(webpage)
+                if senate_isvp_url:
+                    title = self._og_search_title(webpage)
+                    surl = smuggle_url(senate_isvp_url, {'force_title': title})
+                    return self.url_result(surl, 'SenateISVP', video_id, title)
+        if video_type is None or video_id is None:
+            raise ExtractorError('unable to find video id and type')
 
-        description = self._html_search_regex(
-            [
-                # The full description
-                r'<div class=\'expandable\'>(.*?)<a href=\'#\'',
-                # If the description is small enough the other div is not
-                # present, otherwise this is a stripped version
-                r'<p class=\'initial\'>(.*?)</p>'
-            ],
-            webpage, 'description', flags=re.DOTALL, default=None)
+        def get_text_attr(d, attr):
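+            # the ajax-player JSON wraps scalars as {'#text': value}, e.g.
+            # get_text_attr({'length': {'#text': '14848'}}, 'length') == '14848'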
+            return d.get(attr, {}).get('#text')
 
-        info_url = 'http://c-spanvideo.org/videoLibrary/assets/player/ajax-player.php?os=android&html5=program&id=' + video_id
-        data = self._download_json(info_url, video_id)
+        data = self._download_json(
+            'http://www.c-span.org/assets/player/ajax-player.php?os=android&html5=%s&id=%s' % (video_type, video_id),
+            video_id)['video']
+        if data['@status'] != 'Success':
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, get_text_attr(data, 'error')), expected=True)
 
         doc = self._download_xml(
-            'http://www.c-span.org/common/services/flashXml.php?programid=' + video_id,
+            'http://www.c-span.org/common/services/flashXml.php?%sid=%s' % (video_type, video_id),
             video_id)
 
+        description = self._html_search_meta('description', webpage)
+
         title = find_xpath_attr(doc, './/string', 'name', 'title').text
         thumbnail = find_xpath_attr(doc, './/string', 'name', 'poster').text
 
-        senate_isvp_url = SenateISVPIE._search_iframe_url(webpage)
-        if senate_isvp_url:
-            surl = smuggle_url(senate_isvp_url, {'force_title': title})
-            return self.url_result(surl, 'SenateISVP', video_id, title)
-
-        files = data['video']['files']
-        try:
-            capfile = data['video']['capfile']['#text']
-        except KeyError:
-            capfile = None
+        files = data['files']
+        capfile = get_text_attr(data, 'capfile')
 
-        entries = [{
-            'id': '%s_%d' % (video_id, partnum + 1),
-            'title': (
-                title if len(files) == 1 else
-                '%s part %d' % (title, partnum + 1)),
-            'url': unescapeHTML(f['path']['#text']),
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': int_or_none(f.get('length', {}).get('#text')),
-            'subtitles': {
-                'en': [{
-                    'url': capfile,
-                    'ext': determine_ext(capfile, 'dfxp')
-                }],
-            } if capfile else None,
-        } for partnum, f in enumerate(files)]
+        entries = []
+        for partnum, f in enumerate(files):
+            formats = []
+            for quality in f['qualities']:
+                formats.append({
+                    'format_id': '%s-%sp' % (get_text_attr(quality, 'bitrate'), get_text_attr(quality, 'height')),
+                    'url': unescapeHTML(get_text_attr(quality, 'file')),
+                    'height': int_or_none(get_text_attr(quality, 'height')),
+                    'tbr': int_or_none(get_text_attr(quality, 'bitrate')),
+                })
+            if not formats:
+                path = unescapeHTML(get_text_attr(f, 'path'))
+                if not path:
+                    continue
+                formats = self._extract_m3u8_formats(
+                    path, video_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls') if determine_ext(path) == 'm3u8' else [{'url': path, }]
+            self._sort_formats(formats)
+            entries.append({
+                'id': '%s_%d' % (video_id, partnum + 1),
+                'title': (
+                    title if len(files) == 1 else
+                    '%s part %d' % (title, partnum + 1)),
+                'formats': formats,
+                'description': description,
+                'thumbnail': thumbnail,
+                'duration': int_or_none(get_text_attr(f, 'length')),
+                'subtitles': {
+                    'en': [{
+                        'url': capfile,
+                        'ext': determine_ext(capfile, 'dfxp')
+                    }],
+                } if capfile else None,
+            })
 
         if len(entries) == 1:
             entry = dict(entries[0])
-            entry['id'] = video_id
+            entry['id'] = 'c' + video_id if video_type == 'clip' else video_id
             return entry
         else:
             return {
                 '_type': 'playlist',
                 'entries': entries,
                 'title': title,
-                'id': video_id,
+                'id': 'c' + video_id if video_type == 'clip' else video_id,
             }
index 45049bf371370da6e4b64952441e76d86814fd6a..1622fc844a1b8d4794fc12694f03f37c00076f15 100644 (file)
@@ -8,7 +8,7 @@ from ..utils import parse_iso8601, ExtractorError
 class CtsNewsIE(InfoExtractor):
     IE_DESC = '華視新聞'
     # https connection failed (Connection reset)
-    _VALID_URL = r'http://news\.cts\.com\.tw/[a-z]+/[a-z]+/\d+/(?P<id>\d+)\.html'
+    _VALID_URL = r'https?://news\.cts\.com\.tw/[a-z]+/[a-z]+/\d+/(?P<id>\d+)\.html'
     _TESTS = [{
         'url': 'http://news.cts.com.tw/cts/international/201501/201501291578109.html',
         'md5': 'a9875cb790252b08431186d741beaabe',
diff --git a/youtube_dl/extractor/cultureunplugged.py b/youtube_dl/extractor/cultureunplugged.py
new file mode 100644 (file)
index 0000000..9c764fe
--- /dev/null
@@ -0,0 +1,63 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class CultureUnpluggedIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?cultureunplugged\.com/documentary/watch-online/play/(?P<id>\d+)(?:/(?P<display_id>[^/]+))?'
+    _TESTS = [{
+        'url': 'http://www.cultureunplugged.com/documentary/watch-online/play/53662/The-Next--Best-West',
+        'md5': 'ac6c093b089f7d05e79934dcb3d228fc',
+        'info_dict': {
+            'id': '53662',
+            'display_id': 'The-Next--Best-West',
+            'ext': 'mp4',
+            'title': 'The Next, Best West',
+            'description': 'md5:0423cd00833dea1519cf014e9d0903b1',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'creator': 'Coldstream Creative',
+            'duration': 2203,
+            'view_count': int,
+        }
+    }, {
+        'url': 'http://www.cultureunplugged.com/documentary/watch-online/play/53662',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        display_id = mobj.group('display_id') or video_id
+
+        movie_data = self._download_json(
+            'http://www.cultureunplugged.com/movie-data/cu-%s.json' % video_id, display_id)
+
+        video_url = movie_data['url']
+        title = movie_data['title']
+
+        description = movie_data.get('synopsis')
+        creator = movie_data.get('producer')
+        duration = int_or_none(movie_data.get('duration'))
+        view_count = int_or_none(movie_data.get('views'))
+
+        thumbnails = [{
+            'url': movie_data['%s_thumb' % size],
+            'id': size,
+            'preference': preference,
+        } for preference, size in enumerate((
+            'small', 'large')) if movie_data.get('%s_thumb' % size)]
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'url': video_url,
+            'title': title,
+            'description': description,
+            'creator': creator,
+            'duration': duration,
+            'view_count': view_count,
+            'thumbnails': thumbnails,
+        }
diff --git a/youtube_dl/extractor/cwtv.py b/youtube_dl/extractor/cwtv.py
new file mode 100644 (file)
index 0000000..f5cefd9
--- /dev/null
@@ -0,0 +1,89 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_iso8601,
+)
+
+
+class CWTVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?cw(?:tv|seed)\.com/shows/(?:[^/]+/){2}\?play=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
+    _TESTS = [{
+        'url': 'http://cwtv.com/shows/arrow/legends-of-yesterday/?play=6b15e985-9345-4f60-baf8-56e96be57c63',
+        'info_dict': {
+            'id': '6b15e985-9345-4f60-baf8-56e96be57c63',
+            'ext': 'mp4',
+            'title': 'Legends of Yesterday',
+            'description': 'Oliver and Barry Allen take Kendra Saunders and Carter Hall to a remote location to keep them hidden from Vandal Savage while they figure out how to defeat him.',
+            'duration': 2665,
+            'series': 'Arrow',
+            'season_number': 4,
+            'season': '4',
+            'episode_number': 8,
+            'upload_date': '20151203',
+            'timestamp': 1449122100,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }, {
+        'url': 'http://www.cwseed.com/shows/whose-line-is-it-anyway/jeff-davis-4/?play=24282b12-ead2-42f2-95ad-26770c2c6088',
+        'info_dict': {
+            'id': '24282b12-ead2-42f2-95ad-26770c2c6088',
+            'ext': 'mp4',
+            'title': 'Jeff Davis 4',
+            'description': 'Jeff Davis is back to make you laugh.',
+            'duration': 1263,
+            'series': 'Whose Line Is It Anyway?',
+            'season_number': 11,
+            'season': '11',
+            'episode_number': 20,
+            'upload_date': '20151006',
+            'timestamp': 1444107300,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        video_data = self._download_json(
+            'http://metaframe.digitalsmiths.tv/v2/CWtv/assets/%s/partner/132?format=json' % video_id, video_id)
+
+        formats = self._extract_m3u8_formats(
+            video_data['videos']['variantplaylist']['uri'], video_id, 'mp4')
+        self._sort_formats(formats)
+
+        thumbnails = [{
+            'url': image['uri'],
+            'width': image.get('width'),
+            'height': image.get('height'),
+        } for image_id, image in video_data['images'].items() if image.get('uri')] if video_data.get('images') else None
+
+        video_metadata = video_data['assetFields']
+
+        subtitles = {
+            'en': [{
+                'url': video_metadata['UnicornCcUrl'],
+            }],
+        } if video_metadata.get('UnicornCcUrl') else None
+
+        return {
+            'id': video_id,
+            'title': video_metadata['title'],
+            'description': video_metadata.get('description'),
+            'duration': int_or_none(video_metadata.get('duration')),
+            'series': video_metadata.get('seriesName'),
+            'season_number': int_or_none(video_metadata.get('seasonNumber')),
+            'season': video_metadata.get('seasonName'),
+            'episode_number': int_or_none(video_metadata.get('episodeNumber')),
+            'timestamp': parse_iso8601(video_data.get('startTime')),
+            'thumbnails': thumbnails,
+            'formats': formats,
+            'subtitles': subtitles,
+        }
index 85d945509d07ca686490c3471d0b109991524ed1..2e6226ea0774af2e636cbc4b4a4ca9f1ecb763a3 100644 (file)
@@ -7,16 +7,13 @@ import itertools
 
 from .common import InfoExtractor
 
-from ..compat import (
-    compat_str,
-    compat_urllib_request,
-)
 from ..utils import (
-    ExtractorError,
     determine_ext,
+    error_to_compat_str,
+    ExtractorError,
     int_or_none,
-    orderedSet,
     parse_iso8601,
+    sanitized_Request,
     str_to_int,
     unescapeHTML,
 )
@@ -26,7 +23,7 @@ class DailymotionBaseInfoExtractor(InfoExtractor):
     @staticmethod
     def _build_request(url):
         """Build a request with the family filter disabled"""
-        request = compat_urllib_request.Request(url)
+        request = sanitized_Request(url)
         request.add_header('Cookie', 'family_filter=off; ff=off')
         return request
 
@@ -40,7 +37,7 @@ class DailymotionBaseInfoExtractor(InfoExtractor):
 
 
 class DailymotionIE(DailymotionBaseInfoExtractor):
-    _VALID_URL = r'(?i)(?:https?://)?(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(embed|#)/)?video/(?P<id>[^/?_]+)'
+    _VALID_URL = r'(?i)(?:https?://)?(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:embed|swf|#)/)?video/(?P<id>[^/?_]+)'
     IE_NAME = 'dailymotion'
 
     _FORMATS = [
@@ -97,6 +94,20 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
                 'uploader': 'HotWaves1012',
                 'age_limit': 18,
             }
+        },
+        # geo-restricted, player v5
+        {
+            'url': 'http://www.dailymotion.com/video/xhza0o',
+            'only_matching': True,
+        },
+        # with subtitles
+        {
+            'url': 'http://www.dailymotion.com/video/x20su5f_the-power-of-nightmares-1-the-rise-of-the-politics-of-fear-bbc-2004_news',
+            'only_matching': True,
+        },
+        {
+            'url': 'http://www.dailymotion.com/swf/video/x3n92nf',
+            'only_matching': True,
         }
     ]
 
@@ -111,20 +122,28 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
         description = self._og_search_description(webpage) or self._html_search_meta(
             'description', webpage, 'description')
 
-        view_count = str_to_int(self._search_regex(
-            [r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:(\d+)"',
-             r'video_views_count[^>]+>\s+([\d\.,]+)'],
-            webpage, 'view count', fatal=False))
+        view_count_str = self._search_regex(
+            (r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([\s\d,.]+)"',
+             r'video_views_count[^>]+>\s+([\s\d\,.]+)'),
+            webpage, 'view count', fatal=False)
+        if view_count_str:
+            view_count_str = re.sub(r'\s', '', view_count_str)
+        view_count = str_to_int(view_count_str)
         comment_count = int_or_none(self._search_regex(
             r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserComments:(\d+)"',
             webpage, 'comment count', fatal=False))
 
         player_v5 = self._search_regex(
-            r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
+            [r'buildPlayer\(({.+?})\);\n',  # See https://github.com/rg3/youtube-dl/issues/7826
+             r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
+             r'buildPlayer\(({.+?})\);'],
             webpage, 'player v5', default=None)
         if player_v5:
             player = self._parse_json(player_v5, video_id)
             metadata = player['metadata']
+
+            self._check_error(metadata)
+
             formats = []
             for quality, media_list in metadata['qualities'].items():
                 for media in media_list:
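
The view-count regexes above were widened to accept spaces, commas and dots
inside the number, with the whitespace stripped before the string reaches
str_to_int (which in turn drops the separator characters). A rough
stdlib-only equivalent of that normalization:

    import re

    def parse_view_count(view_count_str):
        # Drop whitespace first, then the thousands separators,
        # mirroring the re.sub + str_to_int combination above.
        if not view_count_str:
            return None
        cleaned = re.sub(r'\s', '', view_count_str)
        return int(re.sub(r'[,.]', '', cleaned))

    print(parse_view_count('1 234 567'))  # -> 1234567
    print(parse_view_count('1,234,567'))  # -> 1234567
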
@@ -134,13 +153,18 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
                     type_ = media.get('type')
                     if type_ == 'application/vnd.lumberjack.manifest':
                         continue
-                    if type_ == 'application/x-mpegURL' or determine_ext(media_url) == 'm3u8':
+                    ext = determine_ext(media_url)
+                    if type_ == 'application/x-mpegURL' or ext == 'm3u8':
                         formats.extend(self._extract_m3u8_formats(
-                            media_url, video_id, 'mp4', m3u8_id='hls'))
+                            media_url, video_id, 'mp4', preference=-1,
+                            m3u8_id='hls', fatal=False))
+                    elif type_ == 'application/f4m' or ext == 'f4m':
+                        formats.extend(self._extract_f4m_formats(
+                            media_url, video_id, preference=-1, f4m_id='hds', fatal=False))
                     else:
                         f = {
                             'url': media_url,
-                            'format_id': quality,
+                            'format_id': 'http-%s' % quality,
                         }
                         m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
                         if m:
@@ -159,11 +183,13 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
             uploader_id = metadata.get('owner', {}).get('id')
 
             subtitles = {}
-            for subtitle_lang, subtitle in metadata.get('subtitles', {}).get('data', {}).items():
-                subtitles[subtitle_lang] = [{
-                    'ext': determine_ext(subtitle_url),
-                    'url': subtitle_url,
-                } for subtitle_url in subtitle.get('urls', [])]
+            subtitles_data = metadata.get('subtitles', {}).get('data', {})
+            if subtitles_data and isinstance(subtitles_data, dict):
+                for subtitle_lang, subtitle in subtitles_data.items():
+                    subtitles[subtitle_lang] = [{
+                        'ext': determine_ext(subtitle_url),
+                        'url': subtitle_url,
+                    } for subtitle_url in subtitle.get('urls', [])]
 
             return {
                 'id': video_id,
@@ -202,9 +228,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
                 'video info', flags=re.MULTILINE),
             video_id)
 
-        if info.get('error') is not None:
-            msg = 'Couldn\'t get video, Dailymotion says: %s' % info['error']['title']
-            raise ExtractorError(msg, expected=True)
+        self._check_error(info)
 
         formats = []
         for (key, format_id) in self._FORMATS:
@@ -247,13 +271,18 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
             'duration': info['duration']
         }
 
+    def _check_error(self, info):
+        if info.get('error') is not None:
+            raise ExtractorError(
+                '%s said: %s' % (self.IE_NAME, info['error']['title']), expected=True)
+
     def _get_subtitles(self, video_id, webpage):
         try:
             sub_list = self._download_webpage(
                 'https://api.dailymotion.com/video/%s/subtitles?fields=id,language,url' % video_id,
                 video_id, note=False)
         except ExtractorError as err:
-            self._downloader.report_warning('unable to download video subtitles: %s' % compat_str(err))
+            self._downloader.report_warning('unable to download video subtitles: %s' % error_to_compat_str(err))
             return {}
         info = json.loads(sub_list)
         if (info['total'] > 0):
@@ -278,7 +307,7 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
     }]
 
     def _extract_entries(self, id):
-        video_ids = []
+        video_ids = set()
         processed_urls = set()
         for pagenum in itertools.count(1):
             page_url = self._PAGE_TEMPLATE % (id, pagenum)
@@ -291,12 +320,13 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
 
             processed_urls.add(urlh.geturl())
 
-            video_ids.extend(re.findall(r'data-xid="(.+?)"', webpage))
+            for video_id in re.findall(r'data-xid="(.+?)"', webpage):
+                if video_id not in video_ids:
+                    yield self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion')
+                    video_ids.add(video_id)
 
             if re.search(self._MORE_PAGES_INDICATOR, webpage) is None:
                 break
-        return [self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion')
-                for video_id in orderedSet(video_ids)]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
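
The _extract_entries rewrite above replaces collect-then-orderedSet with a
generator: ids are yielded as each page is scraped, and a set preserves the
old first-seen de-duplication. The idea in isolation, with pages standing in
for the per-page regex scrapes:

    def iter_unique(pages):
        # Yield each video id at most once, in first-seen order,
        # without waiting for every page to download.
        seen = set()
        for page_video_ids in pages:
            for video_id in page_video_ids:
                if video_id not in seen:
                    seen.add(video_id)
                    yield video_id

    print(list(iter_unique([['a', 'b'], ['b', 'c']])))  # -> ['a', 'b', 'c']
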
@@ -313,7 +343,7 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
 
 class DailymotionUserIE(DailymotionPlaylistIE):
     IE_NAME = 'dailymotion:user'
-    _VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|swf|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)'
     _PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
     _TESTS = [{
         'url': 'https://www.dailymotion.com/user/nqtv',
@@ -369,13 +399,13 @@ class DailymotionCloudIE(DailymotionBaseInfoExtractor):
     }]
 
     @classmethod
-    def _extract_dmcloud_url(self, webpage):
-        mobj = re.search(r'<iframe[^>]+src=[\'"](%s)[\'"]' % self._VALID_EMBED_URL, webpage)
+    def _extract_dmcloud_url(cls, webpage):
+        mobj = re.search(r'<iframe[^>]+src=[\'"](%s)[\'"]' % cls._VALID_EMBED_URL, webpage)
         if mobj:
             return mobj.group(1)
 
         mobj = re.search(
-            r'<input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=[\'"](%s)[\'"]' % self._VALID_EMBED_URL,
+            r'<input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=[\'"](%s)[\'"]' % cls._VALID_EMBED_URL,
             webpage)
         if mobj:
             return mobj.group(1)
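
The _extract_dmcloud_url change is purely a naming fix: under @classmethod
the first parameter is bound to the class whatever it is called, but naming
it cls (and reading cls._VALID_EMBED_URL) makes explicit that no instance is
involved. Reduced to essentials with a throwaway class:

    class Embed(object):
        _VALID_EMBED_URL = r'https?://example\.com/embed/\d+'

        @classmethod
        def describe(cls):
            # cls is the class object itself, not an instance.
            return cls._VALID_EMBED_URL

    print(Embed.describe())
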
index 934da765ee700712721281a85dd955c28405001e..86024a745661dda2da9d3fb883ccf4db017a722c 100644 (file)
@@ -3,56 +3,91 @@
 from __future__ import unicode_literals
 
 import re
+import itertools
 
 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse,
+    compat_parse_qs,
+    compat_urllib_parse_unquote,
+    compat_urllib_parse_urlencode,
+    compat_urlparse,
+)
+from ..utils import (
+    int_or_none,
+    str_to_int,
+    xpath_text,
+    unescapeHTML,
 )
 
 
 class DaumIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/(?:v/|.*?clipid=)(?P<id>[^?#&]+)'
+    _VALID_URL = r'https?://(?:(?:m\.)?tvpot\.daum\.net/v/|videofarm\.daum\.net/controller/player/VodPlayer\.swf\?vid=)(?P<id>[^?#&]+)'
     IE_NAME = 'daum.net'
 
     _TESTS = [{
-        'url': 'http://tvpot.daum.net/clip/ClipView.do?clipid=52554690',
+        'url': 'http://tvpot.daum.net/v/vab4dyeDBysyBssyukBUjBz',
         'info_dict': {
-            'id': '52554690',
+            'id': 'vab4dyeDBysyBssyukBUjBz',
             'ext': 'mp4',
-            'title': 'DOTA 2GETHER 시즌2 6회 - 2부',
-            'description': 'DOTA 2GETHER 시즌2 6회 - 2부',
-            'upload_date': '20130831',
-            'duration': 3868,
+            'title': '마크 헌트 vs 안토니오 실바',
+            'description': 'Mark Hunt vs Antonio Silva',
+            'upload_date': '20131217',
+            'thumbnail': 're:^https?://.*\.(?:jpg|png)',
+            'duration': 2117,
+            'view_count': int,
+            'comment_count': int,
         },
     }, {
-        'url': 'http://tvpot.daum.net/v/vab4dyeDBysyBssyukBUjBz',
-        'only_matching': True,
+        'url': 'http://m.tvpot.daum.net/v/65139429',
+        'info_dict': {
+            'id': '65139429',
+            'ext': 'mp4',
+            'title': '1297회, \'아빠 아들로 태어나길 잘 했어\' 민수, 감동의 눈물[아빠 어디가] 20150118',
+            'description': 'md5:79794514261164ff27e36a21ad229fc5',
+            'upload_date': '20150604',
+            'thumbnail': 're:^https?://.*\.(?:jpg|png)',
+            'duration': 154,
+            'view_count': int,
+            'comment_count': int,
+        },
     }, {
         'url': 'http://tvpot.daum.net/v/07dXWRka62Y%24',
         'only_matching': True,
+    }, {
+        'url': 'http://videofarm.daum.net/controller/player/VodPlayer.swf?vid=vwIpVpCQsT8%24&ref=',
+        'info_dict': {
+            'id': 'vwIpVpCQsT8$',
+            'ext': 'flv',
+            'title': '01-Korean War ( Trouble on the horizon )',
+            'description': '\nKorean War 01\nTrouble on the horizon\n전쟁의 먹구름',
+            'upload_date': '20080223',
+            'thumbnail': 're:^https?://.*\.(?:jpg|png)',
+            'duration': 249,
+            'view_count': int,
+            'comment_count': int,
+        },
     }]
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        canonical_url = 'http://tvpot.daum.net/v/%s' % video_id
-        webpage = self._download_webpage(canonical_url, video_id)
-        full_id = self._search_regex(
-            r'src=["\']http://videofarm\.daum\.net/controller/video/viewer/Video\.html\?.*?vid=(.+?)[&"\']',
-            webpage, 'full id')
-        query = compat_urllib_parse.urlencode({'vid': full_id})
+        video_id = compat_urllib_parse_unquote(self._match_id(url))
+        query = compat_urllib_parse_urlencode({'vid': video_id})
+        movie_data = self._download_json(
+            'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json?' + query,
+            video_id, 'Downloading video formats info')
+
+        # For URLs like http://m.tvpot.daum.net/v/65139429, where the video_id is really a clipid
+        if not movie_data.get('output_list', {}).get('output_list') and re.match(r'^\d+$', video_id):
+            return self.url_result('http://tvpot.daum.net/clip/ClipView.do?clipid=%s' % video_id)
+
         info = self._download_xml(
             'http://tvpot.daum.net/clip/ClipInfoXml.do?' + query, video_id,
             'Downloading video info')
-        urls = self._download_xml(
-            'http://videofarm.daum.net/controller/api/open/v1_2/MovieData.apixml?' + query,
-            video_id, 'Downloading video formats info')
 
         formats = []
-        for format_el in urls.findall('result/output_list/output_list'):
-            profile = format_el.attrib['profile']
-            format_query = compat_urllib_parse.urlencode({
-                'vid': full_id,
+        for format_el in movie_data['output_list']['output_list']:
+            profile = format_el['profile']
+            format_query = compat_urllib_parse_urlencode({
+                'vid': video_id,
                 'profile': profile,
             })
             url_doc = self._download_xml(
@@ -62,14 +97,202 @@ class DaumIE(InfoExtractor):
             formats.append({
                 'url': format_url,
                 'format_id': profile,
+                'width': int_or_none(format_el.get('width')),
+                'height': int_or_none(format_el.get('height')),
+                'filesize': int_or_none(format_el.get('filesize')),
             })
+        self._sort_formats(formats)
 
         return {
             'id': video_id,
             'title': info.find('TITLE').text,
             'formats': formats,
-            'thumbnail': self._og_search_thumbnail(webpage),
-            'description': info.find('CONTENTS').text,
-            'duration': int(info.find('DURATION').text),
+            'thumbnail': xpath_text(info, 'THUMB_URL'),
+            'description': xpath_text(info, 'CONTENTS'),
+            'duration': int_or_none(xpath_text(info, 'DURATION')),
             'upload_date': info.find('REGDTTM').text[:8],
+            'view_count': str_to_int(xpath_text(info, 'PLAY_CNT')),
+            'comment_count': str_to_int(xpath_text(info, 'COMMENT_CNT')),
         }
+
+
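
DaumIE now percent-decodes the matched id, since ids lifted from
VodPlayer.swf URLs arrive encoded (%24 for the trailing $ seen in the test
above). compat_urllib_parse_unquote maps onto the standard library:

    try:
        from urllib.parse import unquote  # Python 3
    except ImportError:
        from urllib import unquote  # Python 2

    print(unquote('vwIpVpCQsT8%24'))  # -> vwIpVpCQsT8$
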
+class DaumClipIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/(?:clip/ClipView.(?:do|tv)|mypot/View.do)\?.*?clipid=(?P<id>\d+)'
+    IE_NAME = 'daum.net:clip'
+    _URL_TEMPLATE = 'http://tvpot.daum.net/clip/ClipView.do?clipid=%s'
+
+    _TESTS = [{
+        'url': 'http://tvpot.daum.net/clip/ClipView.do?clipid=52554690',
+        'info_dict': {
+            'id': '52554690',
+            'ext': 'mp4',
+            'title': 'DOTA 2GETHER 시즌2 6회 - 2부',
+            'description': 'DOTA 2GETHER 시즌2 6회 - 2부',
+            'upload_date': '20130831',
+            'thumbnail': 're:^https?://.*\.(?:jpg|png)',
+            'duration': 3868,
+            'view_count': int,
+        },
+    }, {
+        'url': 'http://m.tvpot.daum.net/clip/ClipView.tv?clipid=54999425',
+        'only_matching': True,
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if DaumPlaylistIE.suitable(url) or DaumUserIE.suitable(url) else super(DaumClipIE, cls).suitable(url)
+
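
Since a clip URL can also satisfy the looser sibling patterns, suitable() is
overridden so the clip extractor steps aside whenever the playlist or user
extractor claims the URL first. A stripped-down sketch of that precedence
trick (the patterns here are placeholders):

    import re

    class PlaylistIE(object):
        @staticmethod
        def suitable(url):
            return re.search(r'playlistid=\d+', url) is not None

    class ClipIE(object):
        @staticmethod
        def suitable(url):
            # Step aside when the playlist extractor also matches.
            if PlaylistIE.suitable(url):
                return False
            return re.search(r'clipid=\d+', url) is not None

    print(ClipIE.suitable('http://x/?playlistid=1&clipid=2'))  # -> False
    print(ClipIE.suitable('http://x/?clipid=2'))               # -> True
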
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        clip_info = self._download_json(
+            'http://tvpot.daum.net/mypot/json/GetClipInfo.do?clipid=%s' % video_id,
+            video_id, 'Downloading clip info')['clip_bean']
+
+        return {
+            '_type': 'url_transparent',
+            'id': video_id,
+            'url': 'http://tvpot.daum.net/v/%s' % clip_info['vid'],
+            'title': unescapeHTML(clip_info['title']),
+            'thumbnail': clip_info.get('thumb_url'),
+            'description': clip_info.get('contents'),
+            'duration': int_or_none(clip_info.get('duration')),
+            'upload_date': clip_info.get('up_date')[:8],
+            'view_count': int_or_none(clip_info.get('play_count')),
+            'ie_key': 'Daum',
+        }
+
+
+class DaumListIE(InfoExtractor):
+    def _get_entries(self, list_id, list_id_type):
+        name = None
+        entries = []
+        for pagenum in itertools.count(1):
+            list_info = self._download_json(
+                'http://tvpot.daum.net/mypot/json/GetClipInfo.do?size=48&init=true&order=date&page=%d&%s=%s' % (
+                    pagenum, list_id_type, list_id), list_id, 'Downloading list info - %s' % pagenum)
+
+            entries.extend([
+                self.url_result(
+                    'http://tvpot.daum.net/v/%s' % clip['vid'])
+                for clip in list_info['clip_list']
+            ])
+
+            if not name:
+                name = list_info.get('playlist_bean', {}).get('name') or \
+                    list_info.get('potInfo', {}).get('name')
+
+            if not list_info.get('has_more'):
+                break
+
+        return name, entries
+
+    def _check_clip(self, url, list_id):
+        query_dict = compat_parse_qs(compat_urlparse.urlparse(url).query)
+        if 'clipid' in query_dict:
+            clip_id = query_dict['clipid'][0]
+            if self._downloader.params.get('noplaylist'):
+                self.to_screen('Downloading just video %s because of --no-playlist' % clip_id)
+                return self.url_result(DaumClipIE._URL_TEMPLATE % clip_id, 'DaumClip')
+            else:
+                self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % list_id)
+
+
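
_check_clip implements the usual --no-playlist contract: if the URL carries
a clipid alongside the list id, the flag decides which one wins, and
returning None lets the caller fall through to full playlist extraction.
The query handling itself is plain urlparse; roughly:

    try:
        from urllib.parse import parse_qs, urlparse  # Python 3
    except ImportError:
        from urlparse import parse_qs, urlparse  # Python 2

    def clip_id_from(url):
        # parse_qs maps each key to a list of values; take the first.
        query = parse_qs(urlparse(url).query)
        return query['clipid'][0] if 'clipid' in query else None

    print(clip_id_from('http://tvpot.daum.net/mypot/View.do?playlistid=6213966&clipid=73806844'))
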
+class DaumPlaylistIE(DaumListIE):
+    _VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/mypot/(?:View\.do|Top\.tv)\?.*?playlistid=(?P<id>[0-9]+)'
+    IE_NAME = 'daum.net:playlist'
+    _URL_TEMPLATE = 'http://tvpot.daum.net/mypot/View.do?playlistid=%s'
+
+    _TESTS = [{
+        'note': 'Playlist url with clipid',
+        'url': 'http://tvpot.daum.net/mypot/View.do?playlistid=6213966&clipid=73806844',
+        'info_dict': {
+            'id': '6213966',
+            'title': 'Woorissica Official',
+        },
+        'playlist_mincount': 181
+    }, {
+        'note': 'Playlist url with clipid - noplaylist',
+        'url': 'http://tvpot.daum.net/mypot/View.do?playlistid=6213966&clipid=73806844',
+        'info_dict': {
+            'id': '73806844',
+            'ext': 'mp4',
+            'title': '151017 Airport',
+            'upload_date': '20160117',
+        },
+        'params': {
+            'noplaylist': True,
+            'skip_download': True,
+        }
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if DaumUserIE.suitable(url) else super(DaumPlaylistIE, cls).suitable(url)
+
+    def _real_extract(self, url):
+        list_id = self._match_id(url)
+
+        clip_result = self._check_clip(url, list_id)
+        if clip_result:
+            return clip_result
+
+        name, entries = self._get_entries(list_id, 'playlistid')
+
+        return self.playlist_result(entries, list_id, name)
+
+
+class DaumUserIE(DaumListIE):
+    _VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/mypot/(?:View|Top)\.(?:do|tv)\?.*?ownerid=(?P<id>[0-9a-zA-Z]+)'
+    IE_NAME = 'daum.net:user'
+
+    _TESTS = [{
+        'url': 'http://tvpot.daum.net/mypot/View.do?ownerid=o2scDLIVbHc0',
+        'info_dict': {
+            'id': 'o2scDLIVbHc0',
+            'title': '마이 리틀 텔레비전',
+        },
+        'playlist_mincount': 213
+    }, {
+        'url': 'http://tvpot.daum.net/mypot/View.do?ownerid=o2scDLIVbHc0&clipid=73801156',
+        'info_dict': {
+            'id': '73801156',
+            'ext': 'mp4',
+            'title': '[미공개] 김구라, 오만석이 부릅니다 \'오케피\' - 마이 리틀 텔레비전 20160116',
+            'upload_date': '20160117',
+            'description': 'md5:5e91d2d6747f53575badd24bd62b9f36'
+        },
+        'params': {
+            'noplaylist': True,
+            'skip_download': True,
+        }
+    }, {
+        'note': 'Playlist url has ownerid and playlistid, playlistid takes precedence',
+        'url': 'http://tvpot.daum.net/mypot/View.do?ownerid=o2scDLIVbHc0&playlistid=6196631',
+        'info_dict': {
+            'id': '6196631',
+            'title': '마이 리틀 텔레비전 - 20160109',
+        },
+        'playlist_count': 11
+    }, {
+        'url': 'http://tvpot.daum.net/mypot/Top.do?ownerid=o2scDLIVbHc0',
+        'only_matching': True,
+    }, {
+        'url': 'http://m.tvpot.daum.net/mypot/Top.tv?ownerid=45x1okb1If50&playlistid=3569733',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        list_id = self._match_id(url)
+
+        clip_result = self._check_clip(url, list_id)
+        if clip_result:
+            return clip_result
+
+        query_dict = compat_parse_qs(compat_urlparse.urlparse(url).query)
+        if 'playlistid' in query_dict:
+            playlist_id = query_dict['playlistid'][0]
+            return self.url_result(DaumPlaylistIE._URL_TEMPLATE % playlist_id, 'DaumPlaylist')
+
+        name, entries = self._get_entries(list_id, 'ownerid')
+
+        return self.playlist_result(entries, list_id, name)
index 2122176254eeacce3241a9e517a70daf90b0b187..133cdc50b8c8379021d6d7fffb7c7c28dd2ba30d 100644 (file)
@@ -13,8 +13,8 @@ from ..utils import (
 
 
 class DBTVIE(InfoExtractor):
-    _VALID_URL = r'http://dbtv\.no/(?P<id>[0-9]+)#(?P<display_id>.+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:(?:lazyplayer|player)/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?'
+    _TESTS = [{
         'url': 'http://dbtv.no/3649835190001#Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
         'md5': 'b89953ed25dacb6edb3ef6c6f430f8bc',
         'info_dict': {
@@ -30,12 +30,18 @@ class DBTVIE(InfoExtractor):
             'view_count': int,
             'categories': list,
         }
-    }
+    }, {
+        'url': 'http://dbtv.no/3649835190001',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.dbtv.no/lazyplayer/4631135248001',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
+        display_id = mobj.group('display_id') or video_id
 
         data = self._download_json(
             'http://api.dbtv.no/discovery/%s' % video_id, display_id)
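
With the display id now optional, the named group comes back as None for
bare URLs and the numeric id doubles as the display id. In miniature, with
the lazyplayer/player prefix trimmed from the pattern:

    import re

    pattern = r'https?://(?:www\.)?dbtv\.no/(?P<id>[0-9]+)(?:#(?P<display_id>.+))?'

    for url in ('http://dbtv.no/3649835190001#Skulle_teste_ut',
                'http://dbtv.no/3649835190001'):
        mobj = re.match(pattern, url)
        print(mobj.group('display_id') or mobj.group('id'))
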
diff --git a/youtube_dl/extractor/dcn.py b/youtube_dl/extractor/dcn.py
new file mode 100644 (file)
index 0000000..5deff5f
--- /dev/null
@@ -0,0 +1,197 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+import base64
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_parse_urlencode,
+    compat_str,
+)
+from ..utils import (
+    int_or_none,
+    parse_iso8601,
+    sanitized_Request,
+    smuggle_url,
+    unsmuggle_url,
+    urlencode_postdata,
+)
+
+
+class DCNIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?show/(?P<show_id>\d+)/[^/]+(?:/(?P<video_id>\d+)/(?P<season_id>\d+))?'
+
+    def _real_extract(self, url):
+        show_id, video_id, season_id = re.match(self._VALID_URL, url).groups()
+        if video_id and int(video_id) > 0:
+            return self.url_result(
+                'http://www.dcndigital.ae/media/%s' % video_id, 'DCNVideo')
+        elif season_id and int(season_id) > 0:
+            return self.url_result(smuggle_url(
+                'http://www.dcndigital.ae/program/season/%s' % season_id,
+                {'show_id': show_id}), 'DCNSeason')
+        else:
+            return self.url_result(
+                'http://www.dcndigital.ae/program/%s' % show_id, 'DCNSeason')
+
+
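
The DCN dispatcher smuggles the show id into the season URL so DCNSeasonIE
can recover it later with unsmuggle_url instead of re-fetching season info.
youtube_dl.utils implements this by packing JSON into the URL fragment; a
toy version of the same idea:

    import json

    def smuggle(url, data):
        # Piggyback extra data on the fragment, as smuggle_url does.
        return url + '#data=' + json.dumps(data)

    def unsmuggle(url, default=None):
        if '#data=' not in url:
            return url, default
        url, _, payload = url.partition('#data=')
        return url, json.loads(payload)

    u = smuggle('http://www.dcndigital.ae/program/season/7910', {'show_id': '205024'})
    print(unsmuggle(u, {}))
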
+class DCNBaseIE(InfoExtractor):
+    def _extract_video_info(self, video_data, video_id, is_live):
+        title = video_data.get('title_en') or video_data['title_ar']
+        img = video_data.get('img')
+        thumbnail = 'http://admin.mangomolo.com/analytics/%s' % img if img else None
+        duration = int_or_none(video_data.get('duration'))
+        description = video_data.get('description_en') or video_data.get('description_ar')
+        timestamp = parse_iso8601(video_data.get('create_time'), ' ')
+
+        return {
+            'id': video_id,
+            'title': self._live_title(title) if is_live else title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'timestamp': timestamp,
+            'is_live': is_live,
+        }
+
+    def _extract_video_formats(self, webpage, video_id, entry_protocol):
+        formats = []
+        m3u8_url = self._html_search_regex(
+            r'file\s*:\s*"([^"]+)', webpage, 'm3u8 url', fatal=False)
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', entry_protocol, m3u8_id='hls', fatal=False))
+
+        rtsp_url = self._search_regex(
+            r'<a[^>]+href="(rtsp://[^"]+)"', webpage, 'rtsp url', fatal=False)
+        if rtsp_url:
+            formats.append({
+                'url': rtsp_url,
+                'format_id': 'rtsp',
+            })
+
+        self._sort_formats(formats)
+        return formats
+
+
+class DCNVideoIE(DCNBaseIE):
+    IE_NAME = 'dcn:video'
+    _VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?(?:video/[^/]+|media|catchup/[^/]+/[^/]+)/(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://www.dcndigital.ae/#/video/%D8%B1%D8%AD%D9%84%D8%A9-%D8%A7%D9%84%D8%B9%D9%85%D8%B1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/17375',
+        'info_dict':
+        {
+            'id': '17375',
+            'ext': 'mp4',
+            'title': 'رحلة العمر : الحلقة 1',
+            'description': 'md5:0156e935d870acb8ef0a66d24070c6d6',
+            'duration': 2041,
+            'timestamp': 1227504126,
+            'upload_date': '20081124',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        request = sanitized_Request(
+            'http://admin.mangomolo.com/analytics/index.php/plus/video?id=%s' % video_id,
+            headers={'Origin': 'http://www.dcndigital.ae'})
+        video_data = self._download_json(request, video_id)
+        info = self._extract_video_info(video_data, video_id, False)
+
+        webpage = self._download_webpage(
+            'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' +
+            compat_urllib_parse_urlencode({
+                'id': video_data['id'],
+                'user_id': video_data['user_id'],
+                'signature': video_data['signature'],
+                'countries': 'Q0M=',
+                'filter': 'DENY',
+            }), video_id)
+        info['formats'] = self._extract_video_formats(webpage, video_id, 'm3u8_native')
+        return info
+
+
+class DCNLiveIE(DCNBaseIE):
+    IE_NAME = 'dcn:live'
+    _VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?live/(?P<id>\d+)'
+
+    def _real_extract(self, url):
+        channel_id = self._match_id(url)
+
+        request = sanitized_Request(
+            'http://admin.mangomolo.com/analytics/index.php/plus/getchanneldetails?channel_id=%s' % channel_id,
+            headers={'Origin': 'http://www.dcndigital.ae'})
+
+        channel_data = self._download_json(request, channel_id)
+        info = self._extract_video_info(channel_data, channel_id, True)
+
+        webpage = self._download_webpage(
+            'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' +
+            compat_urllib_parse_urlencode({
+                'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
+                'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
+                'signature': channel_data['signature'],
+                'countries': 'Q0M=',
+                'filter': 'DENY',
+            }), channel_id)
+        info['formats'] = self._extract_video_formats(webpage, channel_id, 'm3u8')
+        return info
+
+
+class DCNSeasonIE(InfoExtractor):
+    IE_NAME = 'dcn:season'
+    _VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?program/(?:(?P<show_id>\d+)|season/(?P<season_id>\d+))'
+    _TEST = {
+        'url': 'http://dcndigital.ae/#/program/205024/%D9%85%D8%AD%D8%A7%D8%B6%D8%B1%D8%A7%D8%AA-%D8%A7%D9%84%D8%B4%D9%8A%D8%AE-%D8%A7%D9%84%D8%B4%D8%B9%D8%B1%D8%A7%D9%88%D9%8A',
+        'info_dict':
+        {
+            'id': '7910',
+            'title': 'محاضرات الشيخ الشعراوي',
+        },
+        'playlist_mincount': 27,
+    }
+
+    def _real_extract(self, url):
+        url, smuggled_data = unsmuggle_url(url, {})
+        show_id, season_id = re.match(self._VALID_URL, url).groups()
+
+        data = {}
+        if season_id:
+            data['season'] = season_id
+            show_id = smuggled_data.get('show_id')
+            if show_id is None:
+                request = sanitized_Request(
+                    'http://admin.mangomolo.com/analytics/index.php/plus/season_info?id=%s' % season_id,
+                    headers={'Origin': 'http://www.dcndigital.ae'})
+                season = self._download_json(request, season_id)
+                show_id = season['id']
+        data['show_id'] = show_id
+        request = sanitized_Request(
+            'http://admin.mangomolo.com/analytics/index.php/plus/show',
+            urlencode_postdata(data),
+            {
+                'Origin': 'http://www.dcndigital.ae',
+                'Content-Type': 'application/x-www-form-urlencoded'
+            })
+
+        show = self._download_json(request, show_id)
+        if not season_id:
+            season_id = show['default_season']
+        for season in show['seasons']:
+            if season['id'] == season_id:
+                title = season.get('title_en') or season['title_ar']
+
+                entries = []
+                for video in show['videos']:
+                    video_id = compat_str(video['id'])
+                    entries.append(self.url_result(
+                        'http://www.dcndigital.ae/media/%s' % video_id, 'DCNVideo', video_id))
+
+                return self.playlist_result(entries, season_id, title)
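
Note the asymmetry between the VOD and live embeds above: the live endpoint
wants user id and channel id base64-encoded, and the constant countries
value 'Q0M=' is itself base64 (it decodes to 'CC'). The round-trip, bytes in
and text out:

    import base64

    def embed_params(user_id, channel_id):
        # b64encode works on bytes; decode() turns the result back into str.
        return {
            'id': base64.b64encode(user_id.encode()).decode(),
            'channelid': base64.b64encode(channel_id.encode()).decode(),
        }

    print(embed_params('1234', '5678'))
    print(base64.b64decode('Q0M='))  # -> b'CC'
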
index aa2c09eb686f9da5a7bedfdfe57566e9d29a0700..9099f5046a14ad7c769a6da50d813076f8b9231e 100644 (file)
@@ -6,7 +6,7 @@ from ..compat import compat_str
 
 
 class DctpTvIE(InfoExtractor):
-    _VALID_URL = r'http://www.dctp.tv/(#/)?filme/(?P<id>.+?)/$'
+    _VALID_URL = r'https?://www.dctp.tv/(#/)?filme/(?P<id>.+?)/$'
     _TEST = {
         'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
         'info_dict': {
index c3205ff5fc243c069494e9d50f3522299ea84d34..7a07f3267db874649e5bcc5228a1c7881ebe19d3 100644 (file)
@@ -41,7 +41,9 @@ class DeezerPlaylistIE(InfoExtractor):
                 'Deezer said: %s' % geoblocking_msg, expected=True)
 
         data_json = self._search_regex(
-            r'naboo\.display\(\'[^\']+\',\s*(.*?)\);\n', webpage, 'data JSON')
+            (r'__DZR_APP_STATE__\s*=\s*({.+?})\s*</script>',
+             r'naboo\.display\(\'[^\']+\',\s*(.*?)\);\n'),
+            webpage, 'data JSON')
         data = json.loads(data_json)
 
         playlist_title = data.get('DATA', {}).get('TITLE')
index 98e3aedfd08ada1300cbf3114a41022949062402..9fe144e1431941051f3f2b7134fd9eb888522e96 100644 (file)
@@ -5,7 +5,7 @@ from .common import InfoExtractor
 
 class DefenseGouvFrIE(InfoExtractor):
     IE_NAME = 'defense.gouv.fr'
-    _VALID_URL = r'http://.*?\.defense\.gouv\.fr/layout/set/ligthboxvideo/base-de-medias/webtv/(?P<id>[^/?#]*)'
+    _VALID_URL = r'https?://.*?\.defense\.gouv\.fr/layout/set/ligthboxvideo/base-de-medias/webtv/(?P<id>[^/?#]*)'
 
     _TEST = {
         'url': 'http://www.defense.gouv.fr/layout/set/ligthboxvideo/base-de-medias/webtv/attaque-chimique-syrienne-du-21-aout-2013-1',
diff --git a/youtube_dl/extractor/democracynow.py b/youtube_dl/extractor/democracynow.py
new file mode 100644 (file)
index 0000000..65a98d7
--- /dev/null
@@ -0,0 +1,95 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+import os.path
+
+from .common import InfoExtractor
+from ..compat import compat_urlparse
+from ..utils import (
+    url_basename,
+    remove_start,
+)
+
+
+class DemocracynowIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?democracynow.org/(?P<id>[^\?]*)'
+    IE_NAME = 'democracynow'
+    _TESTS = [{
+        'url': 'http://www.democracynow.org/shows/2015/7/3',
+        'md5': '3757c182d3d84da68f5c8f506c18c196',
+        'info_dict': {
+            'id': '2015-0703-001',
+            'ext': 'mp4',
+            'title': 'Daily Show',
+        },
+    }, {
+        'url': 'http://www.democracynow.org/2015/7/3/this_flag_comes_down_today_bree',
+        'info_dict': {
+            'id': '2015-0703-001',
+            'ext': 'mp4',
+            'title': '"This Flag Comes Down Today": Bree Newsome Scales SC Capitol Flagpole, Takes Down Confederate Flag',
+            'description': 'md5:4d2bc4f0d29f5553c2210a4bc7761a21',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        json_data = self._parse_json(self._search_regex(
+            r'<script[^>]+type="text/json"[^>]*>\s*({[^>]+})', webpage, 'json'),
+            display_id)
+
+        title = json_data['title']
+        formats = []
+
+        video_id = None
+
+        for key in ('file', 'audio', 'video', 'high_res_video'):
+            media_url = json_data.get(key, '')
+            if not media_url:
+                continue
+            media_url = re.sub(r'\?.*', '', compat_urlparse.urljoin(url, media_url))
+            video_id = video_id or remove_start(os.path.splitext(url_basename(media_url))[0], 'dn')
+            formats.append({
+                'url': media_url,
+                'vcodec': 'none' if key == 'audio' else None,
+            })
+
+        self._sort_formats(formats)
+
+        default_lang = 'en'
+        subtitles = {}
+
+        def add_subtitle_item(lang, info_dict):
+            if lang not in subtitles:
+                subtitles[lang] = []
+            subtitles[lang].append(info_dict)
+
+        # chapter_file entries are not subtitles
+        if 'caption_file' in json_data:
+            add_subtitle_item(default_lang, {
+                'url': compat_urlparse.urljoin(url, json_data['caption_file']),
+            })
+
+        for subtitle_item in json_data.get('captions', []):
+            lang = subtitle_item.get('language', '').lower() or default_lang
+            add_subtitle_item(lang, {
+                'url': compat_urlparse.urljoin(url, subtitle_item['url']),
+            })
+
+        description = self._og_search_description(webpage, default=None)
+
+        return {
+            'id': video_id or display_id,
+            'title': title,
+            'description': description,
+            'thumbnail': json_data.get('image'),
+            'subtitles': subtitles,
+            'formats': formats,
+        }
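
The democracynow id never appears in the page markup; it is derived from a
media file name after resolving the URL and cutting the query string, with
remove_start() dropping the 'dn' prefix. The same derivation, assuming a
file name of that shape:

    import os.path
    import re
    try:
        from urllib.parse import urljoin  # Python 3
    except ImportError:
        from urlparse import urljoin  # Python 2

    def derive_id(page_url, media_url):
        absolute = re.sub(r'\?.*', '', urljoin(page_url, media_url))
        basename = os.path.splitext(os.path.basename(absolute))[0]
        return basename[2:] if basename.startswith('dn') else basename

    print(derive_id('http://www.democracynow.org/shows/2015/7/3',
                    '/video/dn2015-0703.mp4?start=0'))  # -> 2015-0703
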
index 263532cc6e66a94c79670caa5e1600444ce909da..cdfeccacb447591f4dcc776a9c1a374a794fa5ba 100644 (file)
@@ -38,6 +38,7 @@ class DFBIE(InfoExtractor):
         token_el = f4m_info.find('token')
         manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth'] + '&hdcore=3.2.0'
         formats = self._extract_f4m_formats(manifest_url, display_id)
+        self._sort_formats(formats)
 
         return {
             'id': video_id,
index 3ed1f1663d9130de0fd621cc33fedd8079ec34dc..44e0c5d4d7094cf965555431e39387a78bdb6f83 100644 (file)
@@ -1,10 +1,7 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..utils import (
-    xpath_text,
-    parse_duration,
-)
+from ..utils import parse_duration
 
 
 class DHMIE(InfoExtractor):
@@ -34,24 +31,14 @@ class DHMIE(InfoExtractor):
     }]
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
+        playlist_id = self._match_id(url)
 
-        webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage(url, playlist_id)
 
         playlist_url = self._search_regex(
             r"file\s*:\s*'([^']+)'", webpage, 'playlist url')
 
-        playlist = self._download_xml(playlist_url, video_id)
-
-        track = playlist.find(
-            './{http://xspf.org/ns/0/}trackList/{http://xspf.org/ns/0/}track')
-
-        video_url = xpath_text(
-            track, './{http://xspf.org/ns/0/}location',
-            'video url', fatal=True)
-        thumbnail = xpath_text(
-            track, './{http://xspf.org/ns/0/}image',
-            'thumbnail')
+        entries = self._extract_xspf_playlist(playlist_url, playlist_id)
 
         title = self._search_regex(
             [r'dc:title="([^"]+)"', r'<title> &raquo;([^<]+)</title>'],
@@ -63,11 +50,10 @@ class DHMIE(InfoExtractor):
             r'<em>Length\s*</em>\s*:\s*</strong>([^<]+)',
             webpage, 'duration', default=None))
 
-        return {
-            'id': video_id,
-            'url': video_url,
+        entries[0].update({
             'title': title,
             'description': description,
             'duration': duration,
-            'thumbnail': thumbnail,
-        }
+        })
+
+        return self.playlist_result(entries, playlist_id)
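
The hand-rolled XSPF parsing gives way to the shared _extract_xspf_playlist
helper, and the page-level fields are merged into the playlist's only entry
via dict.update. That merge step on its own (values are placeholders):

    entries = [{'id': 'the-marshall-plan', 'url': 'http://example.com/video.mp4'}]

    # Page-scraped fields override whatever the playlist carried.
    entries[0].update({
        'title': 'Marshall Plan at Work',
        'duration': 660,
    })

    print(entries[0]['title'])
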
diff --git a/youtube_dl/extractor/digiteka.py b/youtube_dl/extractor/digiteka.py
new file mode 100644 (file)
index 0000000..7bb79ff
--- /dev/null
@@ -0,0 +1,112 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class DigitekaIE(InfoExtractor):
+    _VALID_URL = r'''(?x)
+        https?://(?:www\.)?(?:digiteka\.net|ultimedia\.com)/
+        (?:
+            deliver/
+            (?P<embed_type>
+                generic|
+                musique
+            )
+            (?:/[^/]+)*/
+            (?:
+                src|
+                article
+            )|
+            default/index/video
+            (?P<site_type>
+                generic|
+                music
+            )
+            /id
+        )/(?P<id>[\d+a-z]+)'''
+    _TESTS = [{
+        # news
+        'url': 'https://www.ultimedia.com/default/index/videogeneric/id/s8uk0r',
+        'md5': '276a0e49de58c7e85d32b057837952a2',
+        'info_dict': {
+            'id': 's8uk0r',
+            'ext': 'mp4',
+            'title': 'Loi sur la fin de vie: le texte prévoit un renforcement des directives anticipées',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 74,
+            'upload_date': '20150317',
+            'timestamp': 1426604939,
+            'uploader_id': '3fszv',
+        },
+    }, {
+        # music
+        'url': 'https://www.ultimedia.com/default/index/videomusic/id/xvpfp8',
+        'md5': '2ea3513813cf230605c7e2ffe7eca61c',
+        'info_dict': {
+            'id': 'xvpfp8',
+            'ext': 'mp4',
+            'title': 'Two - C\'est La Vie (clip)',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 233,
+            'upload_date': '20150224',
+            'timestamp': 1424760500,
+            'uploader_id': '3rfzk',
+        },
+    }, {
+        'url': 'https://www.digiteka.net/deliver/generic/iframe/mdtk/01637594/src/lqm3kl/zone/1/showtitle/1/autoplay/yes',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<(?:iframe|script)[^>]+src=["\'](?P<url>(?:https?:)?//(?:www\.)?ultimedia\.com/deliver/(?:generic|musique)(?:/[^/]+)*/(?:src|article)/[\d+a-z]+)',
+            webpage)
+        if mobj:
+            return mobj.group('url')
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        video_type = mobj.group('embed_type') or mobj.group('site_type')
+        if video_type == 'music':
+            video_type = 'musique'
+
+        deliver_info = self._download_json(
+            'http://www.ultimedia.com/deliver/video?video=%s&topic=%s' % (video_id, video_type),
+            video_id)
+
+        yt_id = deliver_info.get('yt_id')
+        if yt_id:
+            return self.url_result(yt_id, 'Youtube')
+
+        jwconf = deliver_info['jwconf']
+
+        formats = []
+        for source in jwconf['playlist'][0]['sources']:
+            formats.append({
+                'url': source['file'],
+                'format_id': source.get('label'),
+            })
+
+        self._sort_formats(formats)
+
+        title = deliver_info['title']
+        thumbnail = jwconf.get('image')
+        duration = int_or_none(deliver_info.get('duration'))
+        timestamp = int_or_none(deliver_info.get('release_time'))
+        uploader_id = deliver_info.get('owner_id')
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'timestamp': timestamp,
+            'uploader_id': uploader_id,
+            'formats': formats,
+        }
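
_extract_url is the hook the generic extractor can call to spot Digiteka
embeds in third-party pages: one regex over the raw HTML, returning the
iframe or script src when present. The hand-off in miniature, with the
pattern simplified:

    import re

    EMBED_RE = r'<(?:iframe|script)[^>]+src=["\'](?P<url>(?:https?:)?//(?:www\.)?ultimedia\.com/deliver/[^"\']+)'

    def extract_url(webpage):
        mobj = re.search(EMBED_RE, webpage)
        if mobj:
            return mobj.group('url')

    html = '<iframe src="//www.ultimedia.com/deliver/generic/iframe/src/s8uk0r"></iframe>'
    print(extract_url(html))
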
index d6723ecf26ea67356b288df6e5f3bf612141b91a..5f1275b39a1e40048dfdbe865f38bb5d72d89426 100644 (file)
@@ -9,7 +9,17 @@ from ..compat import compat_str
 
 
 class DiscoveryIE(InfoExtractor):
-    _VALID_URL = r'http://www\.discovery\.com\/[a-zA-Z0-9\-]*/[a-zA-Z0-9\-]*/videos/(?P<id>[a-zA-Z0-9_\-]*)(?:\.htm)?'
+    _VALID_URL = r'''(?x)https?://(?:www\.)?(?:
+            discovery|
+            investigationdiscovery|
+            discoverylife|
+            animalplanet|
+            ahctv|
+            destinationamerica|
+            sciencechannel|
+            tlc|
+            velocity
+        )\.com/(?:[^/]+/)*(?P<id>[^./?#]+)'''
     _TESTS = [{
         'url': 'http://www.discovery.com/tv-shows/mythbusters/videos/mission-impossible-outtakes.htm',
         'info_dict': {
@@ -21,8 +31,8 @@ class DiscoveryIE(InfoExtractor):
                             'don\'t miss Adam moon-walking as Jamie ... behind Jamie\'s'
                             ' back.'),
             'duration': 156,
-            'timestamp': 1303099200,
-            'upload_date': '20110418',
+            'timestamp': 1302032462,
+            'upload_date': '20110405',
         },
         'params': {
             'skip_download': True,  # requires ffmpeg
@@ -33,27 +43,43 @@ class DiscoveryIE(InfoExtractor):
             'id': 'mythbusters-the-simpsons',
             'title': 'MythBusters: The Simpsons',
         },
-        'playlist_count': 9,
+        'playlist_mincount': 10,
+    }, {
+        'url': 'http://www.animalplanet.com/longfin-eels-maneaters/',
+        'info_dict': {
+            'id': '78326',
+            'ext': 'mp4',
+            'title': 'Longfin Eels: Maneaters?',
+            'description': 'Jeremy Wade tests whether or not New Zealand\'s longfin eels are man-eaters by covering himself in fish guts and getting in the water with them.',
+            'upload_date': '20140725',
+            'timestamp': 1406246400,
+            'duration': 116,
+        },
     }]
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
-        info = self._download_json(url + '?flat=1', video_id)
+        display_id = self._match_id(url)
+        info = self._download_json(url + '?flat=1', display_id)
 
         video_title = info.get('playlist_title') or info.get('video_title')
 
-        entries = [{
-            'id': compat_str(video_info['id']),
-            'formats': self._extract_m3u8_formats(
-                video_info['src'], video_id, ext='mp4',
-                note='Download m3u8 information for video %d' % (idx + 1)),
-            'title': video_info['title'],
-            'description': video_info.get('description'),
-            'duration': parse_duration(video_info.get('video_length')),
-            'webpage_url': video_info.get('href'),
-            'thumbnail': video_info.get('thumbnailURL'),
-            'alt_title': video_info.get('secondary_title'),
-            'timestamp': parse_iso8601(video_info.get('publishedDate')),
-        } for idx, video_info in enumerate(info['playlist'])]
-
-        return self.playlist_result(entries, video_id, video_title)
+        entries = []
+
+        for idx, video_info in enumerate(info['playlist']):
+            formats = self._extract_m3u8_formats(
+                video_info['src'], display_id, 'mp4', 'm3u8_native', m3u8_id='hls',
+                note='Download m3u8 information for video %d' % (idx + 1))
+            self._sort_formats(formats)
+            entries.append({
+                'id': compat_str(video_info['id']),
+                'formats': formats,
+                'title': video_info['title'],
+                'description': video_info.get('description'),
+                'duration': parse_duration(video_info.get('video_length')),
+                'webpage_url': video_info.get('href') or video_info.get('url'),
+                'thumbnail': video_info.get('thumbnailURL'),
+                'alt_title': video_info.get('secondary_title'),
+                'timestamp': parse_iso8601(video_info.get('publishedDate')),
+            })
+
+        return self.playlist_result(entries, display_id, video_title)
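
The Discovery rewrite keeps the same '?flat=1' JSON contract but builds each
entry's formats eagerly (native HLS, then sorted) and falls back from href
to url for the webpage link. The playlist mapping, condensed, with formats
omitted and a sample payload echoing the test above:

    def build_entries(playlist):
        # Mirror the id/title/webpage_url mapping above.
        return [{
            'id': str(video_info['id']),
            'title': video_info['title'],
            'webpage_url': video_info.get('href') or video_info.get('url'),
        } for video_info in playlist]

    print(build_entries([{'id': 78326, 'title': 'Longfin Eels: Maneaters?',
                          'url': 'http://www.animalplanet.com/longfin-eels-maneaters/'}]))
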
diff --git a/youtube_dl/extractor/dispeak.py b/youtube_dl/extractor/dispeak.py
new file mode 100644 (file)
index 0000000..a78cb8a
--- /dev/null
@@ -0,0 +1,114 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_duration,
+    remove_end,
+    xpath_element,
+    xpath_text,
+)
+
+
+class DigitallySpeakingIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:evt\.dispeak|events\.digitallyspeaking)\.com/(?:[^/]+/)+xml/(?P<id>[^.]+)\.xml'
+
+    _TESTS = [{
+        # From http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface
+        'url': 'http://evt.dispeak.com/ubm/gdc/sf16/xml/840376_BQRC.xml',
+        'md5': 'a8efb6c31ed06ca8739294960b2dbabd',
+        'info_dict': {
+            'id': '840376_BQRC',
+            'ext': 'mp4',
+            'title': 'Tenacious Design and The Interface of \'Destiny\'',
+        },
+    }, {
+        # From http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC
+        'url': 'http://events.digitallyspeaking.com/gdc/sf11/xml/12396_1299111843500GMPX.xml',
+        'only_matching': True,
+    }]
+
+    def _parse_mp4(self, metadata):
+        video_formats = []
+        video_root = None
+
+        mp4_video = xpath_text(metadata, './mp4video', default=None)
+        if mp4_video is not None:
+            mobj = re.match(r'(?P<root>https?://.*?/).*', mp4_video)
+            video_root = mobj.group('root')
+        if video_root is None:
+            http_host = xpath_text(metadata, 'httpHost', default=None)
+            if http_host:
+                video_root = 'http://%s/' % http_host
+        if video_root is None:
+            # Hard-coded in http://evt.dispeak.com/ubm/gdc/sf16/custom/player2.js
+            # Works for GPUTechConf, too
+            video_root = 'http://s3-2u.digitallyspeaking.com/'
+
+        formats = metadata.findall('./MBRVideos/MBRVideo')
+        if not formats:
+            return None
+        for a_format in formats:
+            stream_name = xpath_text(a_format, 'streamName', fatal=True)
+            video_path = re.match(r'mp4\:(?P<path>.*)', stream_name).group('path')
+            url = video_root + video_path
+            vbr = xpath_text(a_format, 'bitrate')
+            video_formats.append({
+                'url': url,
+                'vbr': int_or_none(vbr),
+            })
+        return video_formats
+
+    def _parse_flv(self, metadata):
+        formats = []
+        akamai_url = xpath_text(metadata, './akamaiHost', fatal=True)
+        audios = metadata.findall('./audios/audio')
+        for audio in audios:
+            formats.append({
+                'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
+                'play_path': remove_end(audio.get('url'), '.flv'),
+                'ext': 'flv',
+                'vcodec': 'none',
+                'format_id': audio.get('code'),
+            })
+        slide_video_path = xpath_text(metadata, './slideVideo', fatal=True)
+        formats.append({
+            'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
+            'play_path': remove_end(slide_video_path, '.flv'),
+            'ext': 'flv',
+            'format_note': 'slide deck video',
+            'quality': -2,
+            'preference': -2,
+            'format_id': 'slides',
+        })
+        speaker_video_path = xpath_text(metadata, './speakerVideo', fatal=True)
+        formats.append({
+            'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
+            'play_path': remove_end(speaker_video_path, '.flv'),
+            'ext': 'flv',
+            'format_note': 'speaker video',
+            'quality': -1,
+            'preference': -1,
+            'format_id': 'speaker',
+        })
+        return formats
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        xml_description = self._download_xml(url, video_id)
+        metadata = xpath_element(xml_description, 'metadata')
+
+        video_formats = self._parse_mp4(metadata)
+        if video_formats is None:
+            video_formats = self._parse_flv(metadata)
+
+        return {
+            'id': video_id,
+            'formats': video_formats,
+            'title': xpath_text(metadata, 'title', fatal=True),
+            'duration': parse_duration(xpath_text(metadata, 'endTime')),
+            'creator': xpath_text(metadata, 'speaker'),
+        }
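
_parse_mp4 deliberately returns None (not an empty list) when no MBRVideo
nodes exist, so _real_extract can fall back to the RTMP/FLV metadata, whose
slide and speaker streams are ranked below the main one through negative
quality/preference values. The fallback reduces to:

    def extract_formats(metadata, parse_mp4, parse_flv):
        # Prefer MP4 renditions; fall back to RTMP/FLV when none exist.
        formats = parse_mp4(metadata)
        if formats is None:
            formats = parse_flv(metadata)
        return formats

    print(extract_formats({}, lambda m: None, lambda m: [{'format_id': 'speaker'}]))
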
diff --git a/youtube_dl/extractor/divxstage.py b/youtube_dl/extractor/divxstage.py
deleted file mode 100644 (file)
index b88379e..0000000
+++ /dev/null
@@ -1,27 +0,0 @@
-from __future__ import unicode_literals
-
-from .novamov import NovaMovIE
-
-
-class DivxStageIE(NovaMovIE):
-    IE_NAME = 'divxstage'
-    IE_DESC = 'DivxStage'
-
-    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'divxstage\.(?:eu|net|ch|co|at|ag|to)'}
-
-    _HOST = 'www.divxstage.eu'
-
-    _FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
-    _TITLE_REGEX = r'<div class="video_det">\s*<strong>([^<]+)</strong>'
-    _DESCRIPTION_REGEX = r'<div class="video_det">\s*<strong>[^<]+</strong>\s*<p>([^<]+)</p>'
-
-    _TEST = {
-        'url': 'http://www.divxstage.eu/video/57f238e2e5e01',
-        'md5': '63969f6eb26533a1968c4d325be63e72',
-        'info_dict': {
-            'id': '57f238e2e5e01',
-            'ext': 'flv',
-            'title': 'youtubedl test video',
-            'description': 'This is a test video for youtubedl.',
-        }
-    }
index 373b3b4b4735d8544128c48a10037eed3c570e5d..ce6962755831a8c6f853271ad49112fee4fcbc63 100644 (file)
@@ -10,7 +10,7 @@ from ..compat import (compat_str, compat_basestring)
 
 class DouyuTVIE(InfoExtractor):
     IE_DESC = '斗鱼'
-    _VALID_URL = r'http://(?:www\.)?douyutv\.com/(?P<id>[A-Za-z0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?douyu(?:tv)?\.com/(?P<id>[A-Za-z0-9]+)'
     _TESTS = [{
         'url': 'http://www.douyutv.com/iseven',
         'info_dict': {
@@ -18,7 +18,7 @@ class DouyuTVIE(InfoExtractor):
             'display_id': 'iseven',
             'ext': 'flv',
             'title': 're:^清晨醒脑!T-ara根本停不下来! [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
-            'description': 'md5:c93d6692dde6fe33809a46edcbecca44',
+            'description': 're:.*m7show@163\.com.*',
             'thumbnail': 're:^https?://.*\.jpg$',
             'uploader': '7师傅',
             'uploader_id': '431925',
@@ -26,7 +26,7 @@ class DouyuTVIE(InfoExtractor):
         },
         'params': {
             'skip_download': True,
-        }
+        },
     }, {
         'url': 'http://www.douyutv.com/85982',
         'info_dict': {
@@ -42,7 +42,27 @@ class DouyuTVIE(InfoExtractor):
         },
         'params': {
             'skip_download': True,
-        }
+        },
+        'skip': 'Room not found',
+    }, {
+        'url': 'http://www.douyutv.com/17732',
+        'info_dict': {
+            'id': '17732',
+            'display_id': '17732',
+            'ext': 'flv',
+            'title': 're:^清晨醒脑!T-ara根本停不下来! [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+            'description': 're:.*m7show@163\.com.*',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'uploader': '7师傅',
+            'uploader_id': '431925',
+            'is_live': True,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.douyu.com/xiaocang',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
@@ -55,13 +75,28 @@ class DouyuTVIE(InfoExtractor):
             room_id = self._html_search_regex(
                 r'"room_id"\s*:\s*(\d+),', page, 'room id')
 
-        prefix = 'room/%s?aid=android&client_sys=android&time=%d' % (
-            room_id, int(time.time()))
-
-        auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest()
-        config = self._download_json(
-            'http://www.douyutv.com/api/v1/%s&auth=%s' % (prefix, auth),
-            video_id)
+        config = None
+        # Douyu API sometimes returns error "Unable to load the requested class: eticket_redis_cache"
+        # Retry with different parameters - same parameters cause same errors
+        for i in range(5):
+            prefix = 'room/%s?aid=android&client_sys=android&time=%d' % (
+                room_id, int(time.time()))
+            auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest()
+
+            config_page = self._download_webpage(
+                'http://www.douyutv.com/api/v1/%s&auth=%s' % (prefix, auth),
+                video_id)
+            try:
+                config = self._parse_json(config_page, video_id)
+            except ExtractorError:
+                # Wait some time before retrying to get a different time() value
+                self._sleep(1, video_id, msg_template='%(video_id)s: Error occurred. '
+                                                      'Waiting for %(timeout)s seconds before retrying')
+                continue
+            else:
+                break
+        if config is None:
+            raise ExtractorError('Unable to fetch API result')
 
         data = config['data']
 
diff --git a/youtube_dl/extractor/dplay.py b/youtube_dl/extractor/dplay.py
new file mode 100644 (file)
index 0000000..5790553
--- /dev/null
@@ -0,0 +1,163 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import json
+import re
+import time
+
+from .common import InfoExtractor
+from ..compat import compat_urlparse
+from ..utils import (
+    int_or_none,
+    update_url_query,
+)
+
+
+class DPlayIE(InfoExtractor):
+    _VALID_URL = r'https?://(?P<domain>it\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
+
+    _TESTS = [{
+        # geo restricted, via direct unsigned hls URL
+        'url': 'http://it.dplay.com/take-me-out/stagione-1-episodio-25/',
+        'info_dict': {
+            'id': '1255600',
+            'display_id': 'stagione-1-episodio-25',
+            'ext': 'mp4',
+            'title': 'Episodio 25',
+            'description': 'md5:cae5f40ad988811b197d2d27a53227eb',
+            'duration': 2761,
+            'timestamp': 1454701800,
+            'upload_date': '20160205',
+            'creator': 'RTIT',
+            'series': 'Take me out',
+            'season_number': 1,
+            'episode_number': 25,
+            'age_limit': 0,
+        },
+        'expected_warnings': ['Unable to download f4m manifest'],
+    }, {
+        # non geo restricted, via secure api, unsigned download hls URL
+        'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
+        'info_dict': {
+            'id': '3172',
+            'display_id': 'season-1-svensken-lar-sig-njuta-av-livet',
+            'ext': 'mp4',
+            'title': 'Svensken lär sig njuta av livet',
+            'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8',
+            'duration': 2650,
+            'timestamp': 1365454320,
+            'upload_date': '20130408',
+            'creator': 'Kanal 5 (Home)',
+            'series': 'Nugammalt - 77 händelser som format Sverige',
+            'season_number': 1,
+            'episode_number': 1,
+            'age_limit': 0,
+        },
+    }, {
+        # geo restricted, via secure api, unsigned download hls URL
+        'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/',
+        'info_dict': {
+            'id': '70816',
+            'display_id': 'season-6-episode-12',
+            'ext': 'mp4',
+            'title': 'Episode 12',
+            'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90',
+            'duration': 2563,
+            'timestamp': 1429696800,
+            'upload_date': '20150422',
+            'creator': 'Kanal 4 (Home)',
+            'series': 'Mig og min mor',
+            'season_number': 6,
+            'episode_number': 12,
+            'age_limit': 0,
+        },
+    }, {
+        # geo restricted, via direct unsigned hls URL
+        'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        display_id = mobj.group('id')
+        domain = mobj.group('domain')
+
+        webpage = self._download_webpage(url, display_id)
+
+        video_id = self._search_regex(
+            r'data-video-id=["\'](\d+)', webpage, 'video id')
+
+        info = self._download_json(
+            'http://%s/api/v2/ajax/videos?video_id=%s' % (domain, video_id),
+            video_id)['data'][0]
+
+        title = info['title']
+
+        PROTOCOLS = ('hls', 'hds')
+        formats = []
+
+        def extract_formats(protocol, manifest_url):
+            if protocol == 'hls':
+                m3u8_formats = self._extract_m3u8_formats(
+                    manifest_url, video_id, ext='mp4',
+                    entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False)
+                # Sometimes final URLs inside m3u8 are unsigned, let's fix this
+                # ourselves
+                query = compat_urlparse.parse_qs(compat_urlparse.urlparse(manifest_url).query)
+                for m3u8_format in m3u8_formats:
+                    m3u8_format['url'] = update_url_query(m3u8_format['url'], query)
+                formats.extend(m3u8_formats)
+            elif protocol == 'hds':
+                formats.extend(self._extract_f4m_formats(
+                    manifest_url + '&hdcore=3.8.0&plugin=flowplayer-3.8.0.0',
+                    video_id, f4m_id=protocol, fatal=False))
+
+        domain_tld = domain.split('.')[-1]
+        if domain_tld in ('se', 'dk', 'no'):
+            for protocol in PROTOCOLS:
+                # Providing dsc-geo allows bypassing geo restriction in some cases
+                self._set_cookie(
+                    'secure.dplay.%s' % domain_tld, 'dsc-geo',
+                    json.dumps({
+                        'countryCode': domain_tld.upper(),
+                        'expiry': (time.time() + 20 * 60) * 1000,
+                    }))
+                stream = self._download_json(
+                    'https://secure.dplay.%s/secure/api/v2/user/authorization/stream/%s?stream_type=%s'
+                    % (domain_tld, video_id, protocol), video_id,
+                    'Downloading %s stream JSON' % protocol, fatal=False)
+                if stream and stream.get(protocol):
+                    extract_formats(protocol, stream[protocol])
+
+        # As a last resort, try the direct unsigned hls/hds URLs from the info
+        # dictionary. Sometimes this works even when the secure API with dsc-geo
+        # has failed (e.g. http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/).
+        if not formats:
+            for protocol in PROTOCOLS:
+                if info.get(protocol):
+                    extract_formats(protocol, info[protocol])
+
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for lang in ('se', 'sv', 'da', 'nl', 'no'):
+            for format_id in ('web_vtt', 'vtt', 'srt'):
+                subtitle_url = info.get('subtitles_%s_%s' % (lang, format_id))
+                if subtitle_url:
+                    subtitles.setdefault(lang, []).append({'url': subtitle_url})
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'description': info.get('video_metadata_longDescription'),
+            'duration': int_or_none(info.get('video_metadata_length'), scale=1000),
+            'timestamp': int_or_none(info.get('video_publish_date')),
+            'creator': info.get('video_metadata_homeChannel'),
+            'series': info.get('video_metadata_show'),
+            'season_number': int_or_none(info.get('season')),
+            'episode_number': int_or_none(info.get('episode')),
+            'age_limit': int_or_none(info.get('minimum_age')),
+            'formats': formats,
+            'subtitles': subtitles,
+        }
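
Two details of extract_formats() above are worth spelling out: the dsc-geo cookie asserts a country code to the secure API, and the query-copying step re-signs variant URLs that come back unsigned from inside the m3u8 by re-attaching the manifest URL's signature parameters. A standalone sketch of that re-signing step, assuming the same semantics as update_url_query from youtube_dl.utils (resign is a hypothetical helper, not part of the extractor):

try:  # Python 3
    from urllib.parse import urlparse, urlunparse, parse_qs, urlencode
except ImportError:  # Python 2
    from urlparse import urlparse, urlunparse, parse_qs
    from urllib import urlencode

def resign(variant_url, manifest_url):
    # Copy the signature parameters of the signed manifest URL onto an
    # unsigned variant URL taken from inside the m3u8 playlist
    sig = parse_qs(urlparse(manifest_url).query)
    parsed = urlparse(variant_url)
    query = parse_qs(parsed.query)
    query.update(sig)
    return urlunparse(parsed._replace(query=urlencode(query, doseq=True)))
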
diff --git a/youtube_dl/extractor/dramafever.py b/youtube_dl/extractor/dramafever.py
index 38e6597c80f203b30a90a13c92027a4a5a305bd7..3b6529f4b108052e3019c8e400bbc2cd0eb5a9a1 100644 (file)
@@ -3,23 +3,21 @@ from __future__ import unicode_literals
 
 import itertools
 
-from .common import InfoExtractor
+from .amp import AMPIE
 from ..compat import (
     compat_HTTPError,
-    compat_urllib_parse,
-    compat_urllib_request,
     compat_urlparse,
 )
 from ..utils import (
     ExtractorError,
     clean_html,
-    determine_ext,
     int_or_none,
-    parse_iso8601,
+    sanitized_Request,
+    urlencode_postdata
 )
 
 
-class DramaFeverBaseIE(InfoExtractor):
+class DramaFeverBaseIE(AMPIE):
     _LOGIN_URL = 'https://www.dramafever.com/accounts/login/'
     _NETRC_MACHINE = 'dramafever'
 
@@ -51,8 +49,8 @@ class DramaFeverBaseIE(InfoExtractor):
             'password': password,
         }
 
-        request = compat_urllib_request.Request(
-            self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
+        request = sanitized_Request(
+            self._LOGIN_URL, urlencode_postdata(login_form))
         response = self._download_webpage(
             request, None, 'Logging in as %s' % username)
 
@@ -69,71 +67,56 @@ class DramaFeverBaseIE(InfoExtractor):
 class DramaFeverIE(DramaFeverBaseIE):
     IE_NAME = 'dramafever'
     _VALID_URL = r'https?://(?:www\.)?dramafever\.com/drama/(?P<id>[0-9]+/[0-9]+)(?:/|$)'
-    _TEST = {
+    _TESTS = [{
         'url': 'http://www.dramafever.com/drama/4512/1/Cooking_with_Shin/',
         'info_dict': {
             'id': '4512.1',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Cooking with Shin 4512.1',
             'description': 'md5:a8eec7942e1664a6896fcd5e1287bfd0',
+            'episode': 'Episode 1',
+            'episode_number': 1,
             'thumbnail': 're:^https?://.*\.jpg',
             'timestamp': 1404336058,
             'upload_date': '20140702',
             'duration': 343,
-        }
-    }
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.dramafever.com/drama/4826/4/Mnet_Asian_Music_Awards_2015/?ap=1',
+        'info_dict': {
+            'id': '4826.4',
+            'ext': 'mp4',
+            'title': 'Mnet Asian Music Awards 2015 4826.4',
+            'description': 'md5:3ff2ee8fedaef86e076791c909cf2e91',
+            'episode': 'Mnet Asian Music Awards 2015 - Part 3',
+            'episode_number': 4,
+            'thumbnail': 're:^https?://.*\.jpg',
+            'timestamp': 1450213200,
+            'upload_date': '20151215',
+            'duration': 5602,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url).replace('/', '.')
 
         try:
-            feed = self._download_json(
-                'http://www.dramafever.com/amp/episode/feed.json?guid=%s' % video_id,
-                video_id, 'Downloading episode JSON')['channel']['item']
+            info = self._extract_feed_info(
+                'http://www.dramafever.com/amp/episode/feed.json?guid=%s' % video_id)
         except ExtractorError as e:
             if isinstance(e.cause, compat_HTTPError):
                 raise ExtractorError(
                     'Currently unavailable in your country.', expected=True)
             raise
 
-        media_group = feed.get('media-group', {})
-
-        formats = []
-        for media_content in media_group['media-content']:
-            src = media_content.get('@attributes', {}).get('url')
-            if not src:
-                continue
-            ext = determine_ext(src)
-            if ext == 'f4m':
-                formats.extend(self._extract_f4m_formats(
-                    src, video_id, f4m_id='hds'))
-            elif ext == 'm3u8':
-                formats.extend(self._extract_m3u8_formats(
-                    src, video_id, 'mp4', m3u8_id='hls'))
-            else:
-                formats.append({
-                    'url': src,
-                })
-        self._sort_formats(formats)
-
-        title = media_group.get('media-title')
-        description = media_group.get('media-description')
-        duration = int_or_none(media_group['media-content'][0].get('@attributes', {}).get('duration'))
-        thumbnail = self._proto_relative_url(
-            media_group.get('media-thumbnail', {}).get('@attributes', {}).get('url'))
-        timestamp = parse_iso8601(feed.get('pubDate'), ' ')
-
-        subtitles = {}
-        for media_subtitle in media_group.get('media-subTitle', []):
-            lang = media_subtitle.get('@attributes', {}).get('lang')
-            href = media_subtitle.get('@attributes', {}).get('href')
-            if not lang or not href:
-                continue
-            subtitles[lang] = [{
-                'ext': 'ttml',
-                'url': href,
-            }]
-
         series_id, episode_number = video_id.split('.')
         episode_info = self._download_json(
             # We only need a single episode info, so restricting page size to one episode
@@ -143,24 +126,24 @@ class DramaFeverIE(DramaFeverBaseIE):
             video_id, 'Downloading episode info JSON', fatal=False)
         if episode_info:
             value = episode_info.get('value')
-            if value:
-                subfile = value[0].get('subfile') or value[0].get('new_subfile')
-                if subfile and subfile != 'http://www.dramafever.com/st/':
-                    subtitles.setdefault('English', []).append({
-                        'ext': 'srt',
-                        'url': subfile,
-                    })
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'timestamp': timestamp,
-            'duration': duration,
-            'formats': formats,
-            'subtitles': subtitles,
-        }
+            if isinstance(value, list):
+                for v in value:
+                    if v.get('type') == 'Episode':
+                        subfile = v.get('subfile') or v.get('new_subfile')
+                        if subfile and subfile != 'http://www.dramafever.com/st/':
+                            info.setdefault('subtitles', {}).setdefault('English', []).append({
+                                'ext': 'srt',
+                                'url': subfile,
+                            })
+                        episode_number = int_or_none(v.get('number'))
+                        episode_fallback = 'Episode'
+                        if episode_number:
+                            episode_fallback += ' %d' % episode_number
+                        info['episode'] = v.get('title') or episode_fallback
+                        info['episode_number'] = episode_number
+                        break
+
+        return info
 
 
 class DramaFeverSeriesIE(DramaFeverBaseIE):
diff --git a/youtube_dl/extractor/drbonanza.py b/youtube_dl/extractor/drbonanza.py
index 8b98b013adeee32c67c769acfb88d76edff9a1f7..01271f8f06ff91b22680314d644485fe94434391 100644 (file)
@@ -87,7 +87,7 @@ class DRBonanzaIE(InfoExtractor):
 
         formats = []
         for file in info['Files']:
-            if info['Type'] == "Video":
+            if info['Type'] == 'Video':
                 if file['Type'] in video_types:
                     format = parse_filename_info(file['Location'])
                     format.update({
@@ -101,10 +101,10 @@ class DRBonanzaIE(InfoExtractor):
                         if '/bonanza/' in rtmp_url:
                             format['play_path'] = rtmp_url.split('/bonanza/')[1]
                     formats.append(format)
-                elif file['Type'] == "Thumb":
+                elif file['Type'] == 'Thumb':
                     thumbnail = file['Location']
-            elif info['Type'] == "Audio":
-                if file['Type'] == "Audio":
+            elif info['Type'] == 'Audio':
+                if file['Type'] == 'Audio':
                     format = parse_filename_info(file['Location'])
                     format.update({
                         'url': file['Location'],
@@ -112,7 +112,7 @@ class DRBonanzaIE(InfoExtractor):
                         'vcodec': 'none',
                     })
                     formats.append(format)
-                elif file['Type'] == "Thumb":
+                elif file['Type'] == 'Thumb':
                     thumbnail = file['Location']
 
         description = '%s\n%s\n%s\n' % (
diff --git a/youtube_dl/extractor/dreisat.py b/youtube_dl/extractor/dreisat.py
index 8ac8587be6af564af3674c8ff7e7754364bc311e..0040e70d4929828ebf2dc7dc74199ed639dcfebd 100644 (file)
@@ -2,16 +2,12 @@ from __future__ import unicode_literals
 
 import re
 
-from .common import InfoExtractor
-from ..utils import (
-    ExtractorError,
-    unified_strdate,
-)
+from .zdf import ZDFIE
 
 
-class DreiSatIE(InfoExtractor):
+class DreiSatIE(ZDFIE):
     IE_NAME = '3sat'
-    _VALID_URL = r'(?:http://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php|mediathek\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
+    _VALID_URL = r'(?:https?://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php|mediathek\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
     _TESTS = [
         {
             'url': 'http://www.3sat.de/mediathek/index.php?mode=play&obj=45918',
@@ -35,53 +31,4 @@ class DreiSatIE(InfoExtractor):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
         details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?ak=web&id=%s' % video_id
-        details_doc = self._download_xml(details_url, video_id, 'Downloading video details')
-
-        status_code = details_doc.find('./status/statuscode')
-        if status_code is not None and status_code.text != 'ok':
-            code = status_code.text
-            if code == 'notVisibleAnymore':
-                message = 'Video %s is not available' % video_id
-            else:
-                message = '%s returned error: %s' % (self.IE_NAME, code)
-            raise ExtractorError(message, expected=True)
-
-        thumbnail_els = details_doc.findall('.//teaserimage')
-        thumbnails = [{
-            'width': int(te.attrib['key'].partition('x')[0]),
-            'height': int(te.attrib['key'].partition('x')[2]),
-            'url': te.text,
-        } for te in thumbnail_els]
-
-        information_el = details_doc.find('.//information')
-        video_title = information_el.find('./title').text
-        video_description = information_el.find('./detail').text
-
-        details_el = details_doc.find('.//details')
-        video_uploader = details_el.find('./channel').text
-        upload_date = unified_strdate(details_el.find('./airtime').text)
-
-        format_els = details_doc.findall('.//formitaet')
-        formats = [{
-            'format_id': fe.attrib['basetype'],
-            'width': int(fe.find('./width').text),
-            'height': int(fe.find('./height').text),
-            'url': fe.find('./url').text,
-            'filesize': int(fe.find('./filesize').text),
-            'video_bitrate': int(fe.find('./videoBitrate').text),
-        } for fe in format_els
-            if not fe.find('./url').text.startswith('http://www.metafilegenerator.de/')]
-
-        self._sort_formats(formats)
-
-        return {
-            '_type': 'video',
-            'id': video_id,
-            'title': video_title,
-            'formats': formats,
-            'description': video_description,
-            'thumbnails': thumbnails,
-            'thumbnail': thumbnails[-1]['url'],
-            'uploader': video_uploader,
-            'upload_date': upload_date,
-        }
+        return self.extract_from_xml_url(video_id, details_url)
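
The hand-rolled XML parsing is dropped because 3sat's beitragsDetails service is the same backend the ZDF mediathek uses: subclassing ZDFIE inherits its format, thumbnail and metadata extraction, and only the URL pattern and service endpoint remain extractor-specific. The pattern in outline (SisterSiteIE and its URLs are illustrative, not a real extractor):

from youtube_dl.extractor.zdf import ZDFIE

class SisterSiteIE(ZDFIE):  # hypothetical: a sister site on the same backend
    _VALID_URL = r'https?://(?:www\.)?example\.invalid/mediathek/\?obj=(?P<id>[0-9]+)'

    def _real_extract(self, url):
        video_id = self._match_id(url)
        details_url = ('http://example.invalid/mediathek/xmlservice/web/'
                       'beitragsDetails?ak=web&id=%s' % video_id)
        # Format, thumbnail and metadata extraction is inherited from ZDFIE
        return self.extract_from_xml_url(video_id, details_url)
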
diff --git a/youtube_dl/extractor/drtv.py b/youtube_dl/extractor/drtv.py
index baa24c6d13abe016cceb83bb927db15d7d300509..2d74ff855f1670e0dcb46e35d1875e8e9c9fd144 100644 (file)
@@ -91,7 +91,7 @@ class DRTVIE(InfoExtractor):
                 subtitles_list = asset.get('SubtitlesList')
                 if isinstance(subtitles_list, list):
                     LANGS = {
-                        'Danish': 'dk',
+                        'Danish': 'da',
                     }
                     for subs in subtitles_list:
                         lang = subs['Language']
diff --git a/youtube_dl/extractor/dump.py b/youtube_dl/extractor/dump.py
deleted file mode 100644 (file)
index ff78d4f..0000000
+++ /dev/null
@@ -1,39 +0,0 @@
-# encoding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-
-
-class DumpIE(InfoExtractor):
-    _VALID_URL = r'^https?://(?:www\.)?dump\.com/(?P<id>[a-zA-Z0-9]+)/'
-
-    _TEST = {
-        'url': 'http://www.dump.com/oneus/',
-        'md5': 'ad71704d1e67dfd9e81e3e8b42d69d99',
-        'info_dict': {
-            'id': 'oneus',
-            'ext': 'flv',
-            'title': "He's one of us.",
-            'thumbnail': 're:^https?://.*\.jpg$',
-        },
-    }
-
-    def _real_extract(self, url):
-        m = re.match(self._VALID_URL, url)
-        video_id = m.group('id')
-
-        webpage = self._download_webpage(url, video_id)
-        video_url = self._search_regex(
-            r's1.addVariable\("file",\s*"([^"]+)"', webpage, 'video URL')
-
-        title = self._og_search_title(webpage)
-        thumbnail = self._og_search_thumbnail(webpage)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'url': video_url,
-            'thumbnail': thumbnail,
-        }
diff --git a/youtube_dl/extractor/dumpert.py b/youtube_dl/extractor/dumpert.py
index 999fb5620df2976073122fb95fbad1bb133f357a..e5aadcd25ccccb6f9838d0bd1417edc2fbe3bd0f 100644 (file)
@@ -2,15 +2,18 @@
 from __future__ import unicode_literals
 
 import base64
+import re
 
 from .common import InfoExtractor
-from ..compat import compat_urllib_request
-from ..utils import qualities
+from ..utils import (
+    qualities,
+    sanitized_Request,
+)
 
 
 class DumpertIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?dumpert\.nl/mediabase/(?P<id>[0-9]+/[0-9a-zA-Z]+)'
-    _TEST = {
+    _VALID_URL = r'(?P<protocol>https?)://(?:www\.)?dumpert\.nl/(?:mediabase|embed)/(?P<id>[0-9]+/[0-9a-zA-Z]+)'
+    _TESTS = [{
         'url': 'http://www.dumpert.nl/mediabase/6646981/951bc60f/',
         'md5': '1b9318d7d5054e7dcb9dc7654f21d643',
         'info_dict': {
@@ -20,12 +23,18 @@ class DumpertIE(InfoExtractor):
             'description': 'Niet schrikken hoor',
             'thumbnail': 're:^https?://.*\.jpg$',
         }
-    }
+    }, {
+        'url': 'http://www.dumpert.nl/embed/6675421/dc440fe7/',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        protocol = mobj.group('protocol')
 
-        req = compat_urllib_request.Request(url)
+        url = '%s://www.dumpert.nl/mediabase/%s' % (protocol, video_id)
+        req = sanitized_Request(url)
         req.add_header('Cookie', 'nsfw=1; cpc=10')
         webpage = self._download_webpage(req, video_id)
 
diff --git a/youtube_dl/extractor/dvtv.py b/youtube_dl/extractor/dvtv.py
index c1a4bc757f78770179f332d651227f47cceb8a99..974c69dbc75fcb29bd57e30432fa466182b68743 100644 (file)
@@ -15,7 +15,7 @@ class DVTVIE(InfoExtractor):
     IE_NAME = 'dvtv'
     IE_DESC = 'http://video.aktualne.cz/'
 
-    _VALID_URL = r'http://video\.aktualne\.cz/(?:[^/]+/)+r~(?P<id>[0-9a-f]{32})'
+    _VALID_URL = r'https?://video\.aktualne\.cz/(?:[^/]+/)+r~(?P<id>[0-9a-f]{32})'
 
     _TESTS = [{
         'url': 'http://video.aktualne.cz/dvtv/vondra-o-ceskem-stoleti-pri-pohledu-na-havla-mi-bylo-trapne/r~e5efe9ca855511e4833a0025900fea04/',
diff --git a/youtube_dl/extractor/dw.py b/youtube_dl/extractor/dw.py
new file mode 100644 (file)
index 0000000..ae7c571
--- /dev/null
@@ -0,0 +1,85 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+from ..compat import compat_urlparse
+
+
+class DWIE(InfoExtractor):
+    IE_NAME = 'dw'
+    _VALID_URL = r'https?://(?:www\.)?dw\.com/(?:[^/]+/)+av-(?P<id>\d+)'
+    _TESTS = [{
+        # video
+        'url': 'http://www.dw.com/en/intelligent-light/av-19112290',
+        'md5': '7372046e1815c5a534b43f3c3c36e6e9',
+        'info_dict': {
+            'id': '19112290',
+            'ext': 'mp4',
+            'title': 'Intelligent light',
+            'description': 'md5:90e00d5881719f2a6a5827cb74985af1',
+            'upload_date': '20160311',
+        }
+    }, {
+        # audio
+        'url': 'http://www.dw.com/en/worldlink-my-business/av-19111941',
+        'md5': '2814c9a1321c3a51f8a7aeb067a360dd',
+        'info_dict': {
+            'id': '19111941',
+            'ext': 'mp3',
+            'title': 'WorldLink: My business',
+            'description': 'md5:bc9ca6e4e063361e21c920c53af12405',
+            'upload_date': '20160311',
+        }
+    }]
+
+    def _real_extract(self, url):
+        media_id = self._match_id(url)
+        webpage = self._download_webpage(url, media_id)
+        hidden_inputs = self._hidden_inputs(webpage)
+        title = hidden_inputs['media_title']
+
+        if hidden_inputs.get('player_type') == 'video' and hidden_inputs.get('stream_file') == '1':
+            formats = self._extract_smil_formats(
+                'http://www.dw.com/smil/v-%s' % media_id, media_id,
+                transform_source=lambda s: s.replace(
+                    'rtmp://tv-od.dw.de/flash/',
+                    'http://tv-download.dw.de/dwtv_video/flv/'))
+            self._sort_formats(formats)
+        else:
+            formats = [{'url': hidden_inputs['file_name']}]
+
+        return {
+            'id': media_id,
+            'title': title,
+            'description': self._og_search_description(webpage),
+            'thumbnail': hidden_inputs.get('preview_image'),
+            'duration': int_or_none(hidden_inputs.get('file_duration')),
+            'upload_date': hidden_inputs.get('display_date'),
+            'formats': formats,
+        }
+
+
+class DWArticleIE(InfoExtractor):
+    IE_NAME = 'dw:article'
+    _VALID_URL = r'https?://(?:www\.)?dw\.com/(?:[^/]+/)+a-(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://www.dw.com/en/no-hope-limited-options-for-refugees-in-idomeni/a-19111009',
+        'md5': '8ca657f9d068bbef74d6fc38b97fc869',
+        'info_dict': {
+            'id': '19105868',
+            'ext': 'mp4',
+            'title': 'The harsh life of refugees in Idomeni',
+            'description': 'md5:196015cc7e48ebf474db9399420043c7',
+            'upload_date': '20160310',
+        }
+    }
+
+    def _real_extract(self, url):
+        article_id = self._match_id(url)
+        webpage = self._download_webpage(url, article_id)
+        hidden_inputs = self._hidden_inputs(webpage)
+        media_id = hidden_inputs['media_id']
+        media_path = self._search_regex(r'href="([^"]+av-%s)"\s+class="overlayLink"' % media_id, webpage, 'media url')
+        media_url = compat_urlparse.urljoin(url, media_path)
+        return self.url_result(media_url, 'DW', media_id)
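
_extract_smil_formats accepts a transform_source hook that is applied to the raw SMIL text before it is parsed; DWIE uses it to swap the RTMP streaming prefix for DW's HTTP download mirror, so the resulting formats are plain HTTP downloads. The same hook as a named function instead of the inline lambda:

def rtmp_to_http(smil_xml):
    # Rewrite the RTMP stream base to the HTTP download mirror before
    # the SMIL document is parsed (same replacement as the lambda above)
    return smil_xml.replace(
        'rtmp://tv-od.dw.de/flash/',
        'http://tv-download.dw.de/dwtv_video/flv/')
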
diff --git a/youtube_dl/extractor/eagleplatform.py b/youtube_dl/extractor/eagleplatform.py
index 688dfc2f7f34d15e712481351a6073987325add5..0f8c73fd7d330da33afd3c2a3d2cb732e4d3ff33 100644 (file)
@@ -4,9 +4,11 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
+from ..compat import compat_HTTPError
 from ..utils import (
     ExtractorError,
     int_or_none,
+    url_basename,
 )
 
 
@@ -21,7 +23,7 @@ class EaglePlatformIE(InfoExtractor):
     _TESTS = [{
         # http://lenta.ru/news/2015/03/06/navalny/
         'url': 'http://lentaru.media.eagleplatform.com/index/player?player=new&record_id=227304&player_template_id=5201',
-        'md5': '0b7994faa2bd5c0f69a3db6db28d078d',
+        'md5': '881ee8460e1b7735a8be938e2ffb362b',
         'info_dict': {
             'id': '227304',
             'ext': 'mp4',
@@ -36,7 +38,7 @@ class EaglePlatformIE(InfoExtractor):
         # http://muz-tv.ru/play/7129/
         # http://media.clipyou.ru/index/player?record_id=12820&width=730&height=415&autoplay=true
         'url': 'eagleplatform:media.clipyou.ru:12820',
-        'md5': '6c2ebeab03b739597ce8d86339d5a905',
+        'md5': '358597369cf8ba56675c1df15e7af624',
         'info_dict': {
             'id': '12820',
             'ext': 'mp4',
@@ -48,16 +50,25 @@ class EaglePlatformIE(InfoExtractor):
         'skip': 'Georestricted',
     }]
 
-    def _handle_error(self, response):
+    @staticmethod
+    def _handle_error(response):
         status = int_or_none(response.get('status', 200))
         if status != 200:
             raise ExtractorError(' '.join(response['errors']), expected=True)
 
     def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata'):
-        response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note)
-        self._handle_error(response)
+        try:
+            response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note)
+        except ExtractorError as ee:
+            if isinstance(ee.cause, compat_HTTPError):
+                response = self._parse_json(ee.cause.read().decode('utf-8'), video_id)
+                self._handle_error(response)
+            raise
         return response
 
+    def _get_video_url(self, url_or_request, video_id, note='Downloading JSON metadata'):
+        return self._download_json(url_or_request, video_id, note)['data'][0]
+
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         host, video_id = mobj.group('custom_host') or mobj.group('host'), mobj.group('id')
@@ -69,7 +80,7 @@ class EaglePlatformIE(InfoExtractor):
 
         title = media['title']
         description = media.get('description')
-        thumbnail = media.get('snapshot')
+        thumbnail = self._proto_relative_url(media.get('snapshot'), 'http:')
         duration = int_or_none(media.get('duration'))
         view_count = int_or_none(media.get('views'))
 
@@ -78,13 +89,33 @@ class EaglePlatformIE(InfoExtractor):
         if age_restriction:
             age_limit = 0 if age_restriction == 'allow_all' else 18
 
-        m3u8_data = self._download_json(
-            media['sources']['secure_m3u8']['auto'],
-            video_id, 'Downloading m3u8 JSON')
+        secure_m3u8 = self._proto_relative_url(media['sources']['secure_m3u8']['auto'], 'http:')
+
+        formats = []
+
+        m3u8_url = self._get_video_url(secure_m3u8, video_id, 'Downloading m3u8 JSON')
+        m3u8_formats = self._extract_m3u8_formats(
+            m3u8_url, video_id,
+            'mp4', entry_protocol='m3u8_native', m3u8_id='hls')
+        formats.extend(m3u8_formats)
+
+        mp4_url = self._get_video_url(
+            # Secure mp4 URL is constructed according to Player.prototype.mp4 from
+            # http://lentaru.media.eagleplatform.com/player/player.js
+            re.sub(r'm3u8|hlsvod|hls|f4m', 'mp4', secure_m3u8),
+            video_id, 'Downloading mp4 JSON')
+        mp4_url_basename = url_basename(mp4_url)
+        for m3u8_format in m3u8_formats:
+            mobj = re.search(r'/([^/]+)/index\.m3u8', m3u8_format['url'])
+            if mobj:
+                http_format = m3u8_format.copy()
+                http_format.update({
+                    'url': mp4_url.replace(mp4_url_basename, mobj.group(1)),
+                    'format_id': m3u8_format['format_id'].replace('hls', 'http'),
+                    'protocol': 'http',
+                })
+                formats.append(http_format)
 
-        formats = self._extract_m3u8_formats(
-            m3u8_data['data'][0], video_id,
-            'mp4', entry_protocol='m3u8_native')
         self._sort_formats(formats)
 
         return {
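
The HTTP formats here are derived rather than advertised: each HLS variant URL ends in .../<quality>/index.m3u8, and substituting that quality component into the basename of the signed mp4 URL yields a progressive download of the same rendition, mirroring Player.prototype.mp4 from the player.js referenced above. The substitution in isolation (URLs are illustrative):

import re
try:  # Python 3
    from urllib.parse import urlparse
except ImportError:  # Python 2
    from urlparse import urlparse

m3u8_format_url = 'http://cdn.example/video/720p/index.m3u8?md5=abc'
mp4_url = 'http://cdn.example/video/auto.mp4?md5=def'

mobj = re.search(r'/([^/]+)/index\.m3u8', m3u8_format_url)
if mobj:
    # url_basename() returns the last path component, query excluded
    mp4_url_basename = urlparse(mp4_url).path.rpartition('/')[2]  # 'auto.mp4'
    http_url = mp4_url.replace(mp4_url_basename, mobj.group(1))
    # http_url == 'http://cdn.example/video/720p?md5=def'
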
diff --git a/youtube_dl/extractor/ebaumsworld.py b/youtube_dl/extractor/ebaumsworld.py
index b6bfd2b2dedc5388ef383a3cd8853bbb0c541f68..c97682cd367edebfd9fc6a476ad073cb03240054 100644 (file)
@@ -4,10 +4,10 @@ from .common import InfoExtractor
 
 
 class EbaumsWorldIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.ebaumsworld\.com/video/watch/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?ebaumsworld\.com/videos/[^/]+/(?P<id>\d+)'
 
     _TEST = {
-        'url': 'http://www.ebaumsworld.com/video/watch/83367677/',
+        'url': 'http://www.ebaumsworld.com/videos/a-giant-python-opens-the-door/83367677/',
         'info_dict': {
             'id': '83367677',
             'ext': 'mp4',
diff --git a/youtube_dl/extractor/echomsk.py b/youtube_dl/extractor/echomsk.py
index d2d94049d368e74413d93ca40628e3d5174a7675..6b7cc652fe43c60cb8d8326f1cf6bd0c51fbd59f 100644 (file)
@@ -7,7 +7,7 @@ from .common import InfoExtractor
 
 
 class EchoMskIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?echo\.msk\.ru/sounds/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?echo\.msk\.ru/sounds/(?P<id>\d+)'
     _TEST = {
         'url': 'http://www.echo.msk.ru/sounds/1464134.html',
         'md5': '2e44b3b78daff5b458e4dbc37f191f7c',
diff --git a/youtube_dl/extractor/eighttracks.py b/youtube_dl/extractor/eighttracks.py
index 0b61ea0ba60218043156d4f90680ff0348e827c7..9a44f89f3fe801047ef283e9b1fa410c8b88cb37 100644 (file)
@@ -17,85 +17,85 @@ class EightTracksIE(InfoExtractor):
     IE_NAME = '8tracks'
     _VALID_URL = r'https?://8tracks\.com/(?P<user>[^/]+)/(?P<id>[^/#]+)(?:#.*)?$'
     _TEST = {
-        "name": "EightTracks",
-        "url": "http://8tracks.com/ytdl/youtube-dl-test-tracks-a",
-        "info_dict": {
+        'name': 'EightTracks',
+        'url': 'http://8tracks.com/ytdl/youtube-dl-test-tracks-a',
+        'info_dict': {
             'id': '1336550',
             'display_id': 'youtube-dl-test-tracks-a',
-            "description": "test chars:  \"'/\\ä↭",
-            "title": "youtube-dl test tracks \"'/\\ä↭<>",
+            'description': "test chars:  \"'/\\ä↭",
+            'title': "youtube-dl test tracks \"'/\\ä↭<>",
         },
-        "playlist": [
+        'playlist': [
             {
-                "md5": "96ce57f24389fc8734ce47f4c1abcc55",
-                "info_dict": {
-                    "id": "11885610",
-                    "ext": "m4a",
-                    "title": "youtue-dl project<>\"' - youtube-dl test track 1 \"'/\\\u00e4\u21ad",
-                    "uploader_id": "ytdl"
+                'md5': '96ce57f24389fc8734ce47f4c1abcc55',
+                'info_dict': {
+                    'id': '11885610',
+                    'ext': 'm4a',
+                    'title': "youtue-dl project<>\"' - youtube-dl test track 1 \"'/\\\u00e4\u21ad",
+                    'uploader_id': 'ytdl'
                 }
             },
             {
-                "md5": "4ab26f05c1f7291ea460a3920be8021f",
-                "info_dict": {
-                    "id": "11885608",
-                    "ext": "m4a",
-                    "title": "youtube-dl project - youtube-dl test track 2 \"'/\\\u00e4\u21ad",
-                    "uploader_id": "ytdl"
+                'md5': '4ab26f05c1f7291ea460a3920be8021f',
+                'info_dict': {
+                    'id': '11885608',
+                    'ext': 'm4a',
+                    'title': "youtube-dl project - youtube-dl test track 2 \"'/\\\u00e4\u21ad",
+                    'uploader_id': 'ytdl'
                 }
             },
             {
-                "md5": "d30b5b5f74217410f4689605c35d1fd7",
-                "info_dict": {
-                    "id": "11885679",
-                    "ext": "m4a",
-                    "title": "youtube-dl project as well - youtube-dl test track 3 \"'/\\\u00e4\u21ad",
-                    "uploader_id": "ytdl"
+                'md5': 'd30b5b5f74217410f4689605c35d1fd7',
+                'info_dict': {
+                    'id': '11885679',
+                    'ext': 'm4a',
+                    'title': "youtube-dl project as well - youtube-dl test track 3 \"'/\\\u00e4\u21ad",
+                    'uploader_id': 'ytdl'
                 }
             },
             {
-                "md5": "4eb0a669317cd725f6bbd336a29f923a",
-                "info_dict": {
-                    "id": "11885680",
-                    "ext": "m4a",
-                    "title": "youtube-dl project as well - youtube-dl test track 4 \"'/\\\u00e4\u21ad",
-                    "uploader_id": "ytdl"
+                'md5': '4eb0a669317cd725f6bbd336a29f923a',
+                'info_dict': {
+                    'id': '11885680',
+                    'ext': 'm4a',
+                    'title': "youtube-dl project as well - youtube-dl test track 4 \"'/\\\u00e4\u21ad",
+                    'uploader_id': 'ytdl'
                 }
             },
             {
-                "md5": "1893e872e263a2705558d1d319ad19e8",
-                "info_dict": {
-                    "id": "11885682",
-                    "ext": "m4a",
-                    "title": "PH - youtube-dl test track 5 \"'/\\\u00e4\u21ad",
-                    "uploader_id": "ytdl"
+                'md5': '1893e872e263a2705558d1d319ad19e8',
+                'info_dict': {
+                    'id': '11885682',
+                    'ext': 'm4a',
+                    'title': "PH - youtube-dl test track 5 \"'/\\\u00e4\u21ad",
+                    'uploader_id': 'ytdl'
                 }
             },
             {
-                "md5": "b673c46f47a216ab1741ae8836af5899",
-                "info_dict": {
-                    "id": "11885683",
-                    "ext": "m4a",
-                    "title": "PH - youtube-dl test track 6 \"'/\\\u00e4\u21ad",
-                    "uploader_id": "ytdl"
+                'md5': 'b673c46f47a216ab1741ae8836af5899',
+                'info_dict': {
+                    'id': '11885683',
+                    'ext': 'm4a',
+                    'title': "PH - youtube-dl test track 6 \"'/\\\u00e4\u21ad",
+                    'uploader_id': 'ytdl'
                 }
             },
             {
-                "md5": "1d74534e95df54986da7f5abf7d842b7",
-                "info_dict": {
-                    "id": "11885684",
-                    "ext": "m4a",
-                    "title": "phihag - youtube-dl test track 7 \"'/\\\u00e4\u21ad",
-                    "uploader_id": "ytdl"
+                'md5': '1d74534e95df54986da7f5abf7d842b7',
+                'info_dict': {
+                    'id': '11885684',
+                    'ext': 'm4a',
+                    'title': "phihag - youtube-dl test track 7 \"'/\\\u00e4\u21ad",
+                    'uploader_id': 'ytdl'
                 }
             },
             {
-                "md5": "f081f47af8f6ae782ed131d38b9cd1c0",
-                "info_dict": {
-                    "id": "11885685",
-                    "ext": "m4a",
-                    "title": "phihag - youtube-dl test track 8 \"'/\\\u00e4\u21ad",
-                    "uploader_id": "ytdl"
+                'md5': 'f081f47af8f6ae782ed131d38b9cd1c0',
+                'info_dict': {
+                    'id': '11885685',
+                    'ext': 'm4a',
+                    'title': "phihag - youtube-dl test track 8 \"'/\\\u00e4\u21ad",
+                    'uploader_id': 'ytdl'
                 }
             }
         ]
diff --git a/youtube_dl/extractor/einthusan.py b/youtube_dl/extractor/einthusan.py
index 5dfea0d39c4a45b9e4eedede4e4737700f50bae8..f7339702cad3ed2804fe276b9d1fc6857c368206 100644 (file)
@@ -1,9 +1,12 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
+from ..compat import compat_urlparse
+from ..utils import (
+    remove_start,
+    sanitized_Request,
+)
 
 
 class EinthusanIE(InfoExtractor):
@@ -34,27 +37,33 @@ class EinthusanIE(InfoExtractor):
     ]
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        webpage = self._download_webpage(url, video_id)
+        video_id = self._match_id(url)
+
+        request = sanitized_Request(url)
+        request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0')
+        webpage = self._download_webpage(request, video_id)
+
+        title = self._html_search_regex(
+            r'<h1><a[^>]+class=["\']movie-title["\'][^>]*>(.+?)</a></h1>',
+            webpage, 'title')
 
-        video_title = self._html_search_regex(
-            r'<h1><a class="movie-title".*?>(.*?)</a></h1>', webpage, 'title')
+        video_id = self._search_regex(
+            r'data-movieid=["\'](\d+)', webpage, 'video id', default=video_id)
 
-        video_url = self._html_search_regex(
-            r'''(?s)jwplayer\("mediaplayer"\)\.setup\({.*?'file': '([^']+)'.*?}\);''',
-            webpage, 'video url')
+        video_url = self._download_webpage(
+            'http://cdn.einthusan.com/geturl/%s/hd/London,Washington,Toronto,Dallas,San,Sydney/'
+            % video_id, video_id)
 
         description = self._html_search_meta('description', webpage)
         thumbnail = self._html_search_regex(
             r'''<a class="movie-cover-wrapper".*?><img src=["'](.*?)["'].*?/></a>''',
             webpage, "thumbnail url", fatal=False)
         if thumbnail is not None:
-            thumbnail = thumbnail.replace('..', 'http://www.einthusan.com')
+            thumbnail = compat_urlparse.urljoin(url, remove_start(thumbnail, '..'))
 
         return {
             'id': video_id,
-            'title': video_title,
+            'title': title,
             'url': video_url,
             'thumbnail': thumbnail,
             'description': description,
diff --git a/youtube_dl/extractor/eitb.py b/youtube_dl/extractor/eitb.py
index 2cba825325ad46caf931c5382c54dab263b7c15a..713cb7b329208d3c761b12858cc265b401c16dd0 100644 (file)
@@ -1,39 +1,88 @@
 # encoding: utf-8
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
-from .brightcove import BrightcoveIE
-from ..utils import ExtractorError
+from ..utils import (
+    float_or_none,
+    int_or_none,
+    parse_iso8601,
+    sanitized_Request,
+)
 
 
 class EitbIE(InfoExtractor):
     IE_NAME = 'eitb.tv'
-    _VALID_URL = r'https?://www\.eitb\.tv/(eu/bideoa|es/video)/[^/]+/(?P<playlist_id>\d+)/(?P<chapter_id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?eitb\.tv/(?:eu/bideoa|es/video)/[^/]+/\d+/(?P<id>\d+)'
 
     _TEST = {
-        'add_ie': ['Brightcove'],
-        'url': 'http://www.eitb.tv/es/video/60-minutos-60-minutos-2013-2014/2677100210001/2743577154001/lasa-y-zabala-30-anos/',
+        'url': 'http://www.eitb.tv/es/video/60-minutos-60-minutos-2013-2014/4104995148001/4090227752001/lasa-y-zabala-30-anos/',
         'md5': 'edf4436247185adee3ea18ce64c47998',
         'info_dict': {
-            'id': '2743577154001',
+            'id': '4090227752001',
             'ext': 'mp4',
             'title': '60 minutos (Lasa y Zabala, 30 años)',
-            # All videos from eitb has this description in the brightcove info
-            'description': '.',
-            'uploader': 'Euskal Telebista',
+            'description': 'Programa de reportajes de actualidad.',
+            'duration': 3996.76,
+            'timestamp': 1381789200,
+            'upload_date': '20131014',
+            'tags': list,
         },
     }
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        chapter_id = mobj.group('chapter_id')
-        webpage = self._download_webpage(url, chapter_id)
-        bc_url = BrightcoveIE._extract_brightcove_url(webpage)
-        if bc_url is None:
-            raise ExtractorError('Could not extract the Brightcove url')
-        # The BrightcoveExperience object doesn't contain the video id, we set
-        # it manually
-        bc_url += '&%40videoPlayer={0}'.format(chapter_id)
-        return self.url_result(bc_url, BrightcoveIE.ie_key())
+        video_id = self._match_id(url)
+
+        video = self._download_json(
+            'http://mam.eitb.eus/mam/REST/ServiceMultiweb/Video/MULTIWEBTV/%s/' % video_id,
+            video_id, 'Downloading video JSON')
+
+        media = video['web_media'][0]
+
+        formats = []
+        for rendition in media['RENDITIONS']:
+            video_url = rendition.get('PMD_URL')
+            if not video_url:
+                continue
+            tbr = float_or_none(rendition.get('ENCODING_RATE'), 1000)
+            format_id = 'http'
+            if tbr:
+                format_id += '-%d' % int(tbr)
+            formats.append({
+                'url': rendition['PMD_URL'],
+                'format_id': format_id,
+                'width': int_or_none(rendition.get('FRAME_WIDTH')),
+                'height': int_or_none(rendition.get('FRAME_HEIGHT')),
+                'tbr': tbr,
+            })
+
+        hls_url = media.get('HLS_SURL')
+        if hls_url:
+            request = sanitized_Request(
+                'http://mam.eitb.eus/mam/REST/ServiceMultiweb/DomainRestrictedSecurity/TokenAuth/',
+                headers={'Referer': url})
+            token_data = self._download_json(
+                request, video_id, 'Downloading auth token', fatal=False)
+            if token_data:
+                token = token_data.get('token')
+                if token:
+                    formats.extend(self._extract_m3u8_formats(
+                        '%s?hdnts=%s' % (hls_url, token), video_id, m3u8_id='hls', fatal=False))
+
+        hds_url = media.get('HDS_SURL')
+        if hds_url:
+            formats.extend(self._extract_f4m_formats(
+                '%s?hdcore=3.7.0' % hds_url.replace('euskalsvod', 'euskalvod'),
+                video_id, f4m_id='hds', fatal=False))
+
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': media.get('NAME_ES') or media.get('name') or media['NAME_EU'],
+            'description': media.get('SHORT_DESC_ES') or video.get('desc_group') or media.get('SHORT_DESC_EU'),
+            'thumbnail': media.get('STILL_URL') or media.get('THUMBNAIL_URL'),
+            'duration': float_or_none(media.get('LENGTH'), 1000),
+            'timestamp': parse_iso8601(media.get('BROADCST_DATE'), ' '),
+            'tags': media.get('TAGS'),
+            'formats': formats,
+        }
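
The HLS branch above is token-protected: an authorization endpoint is called with the watch page as Referer, and the returned token is appended to the stream URL as the hdnts query parameter (an Akamai-style access token) before the m3u8 is fetched. Reduced to the essential step (both values are illustrative):

hls_url = 'http://example.invalid/master.m3u8'  # media['HLS_SURL'], illustrative
token = 'exp=1461600000~acl=/*~hmac=deadbeef'   # token_data['token'], illustrative
signed_hls_url = '%s?hdnts=%s' % (hls_url, token)
# signed_hls_url is then passed to _extract_m3u8_formats as above
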
diff --git a/youtube_dl/extractor/ellentv.py b/youtube_dl/extractor/ellentv.py
index 02c6a4615c4436fecda86fb152a131f084640612..4c8190d68d712bf702b5e015c95bbdda4643cefd 100644 (file)
@@ -13,12 +13,12 @@ class EllenTVIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?(?:ellentv|ellentube)\.com/videos/(?P<id>[a-z0-9_-]+)'
     _TEST = {
         'url': 'http://www.ellentv.com/videos/0-ipq1gsai/',
-        'md5': '8e3c576bf2e9bfff4d76565f56f94c9c',
+        'md5': '4294cf98bc165f218aaa0b89e0fd8042',
         'info_dict': {
             'id': '0_ipq1gsai',
-            'ext': 'mp4',
+            'ext': 'mov',
             'title': 'Fast Fingers of Fate',
-            'description': 'md5:587e79fbbd0d73b148bc596d99ce48e6',
+            'description': 'md5:3539013ddcbfa64b2a6d1b38d910868a',
             'timestamp': 1428035648,
             'upload_date': '20150403',
             'uploader_id': 'batchUser',
@@ -72,7 +72,7 @@ class EllenTVClipsIE(InfoExtractor):
     def _extract_playlist(self, webpage):
         json_string = self._search_regex(r'playerView.addClips\(\[\{(.*?)\}\]\);', webpage, 'json')
         try:
-            return json.loads("[{" + json_string + "}]")
+            return json.loads('[{' + json_string + '}]')
         except ValueError as ve:
             raise ExtractorError('Failed to download JSON', cause=ve)
 
diff --git a/youtube_dl/extractor/elpais.py b/youtube_dl/extractor/elpais.py
index 00a69e6312aede6069e062c6abff29137939daa9..8c725a4e631860584781b116e72b02dd05813fc2 100644 (file)
@@ -9,7 +9,7 @@ class ElPaisIE(InfoExtractor):
     _VALID_URL = r'https?://(?:[^.]+\.)?elpais\.com/.*/(?P<id>[^/#?]+)\.html(?:$|[?#])'
     IE_DESC = 'El País'
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://blogs.elpais.com/la-voz-de-inaki/2014/02/tiempo-nuevo-recetas-viejas.html',
         'md5': '98406f301f19562170ec071b83433d55',
         'info_dict': {
@@ -19,30 +19,41 @@ class ElPaisIE(InfoExtractor):
             'description': 'De lunes a viernes, a partir de las ocho de la mañana, Iñaki Gabilondo nos cuenta su visión de la actualidad nacional e internacional.',
             'upload_date': '20140206',
         }
-    }
+    }, {
+        'url': 'http://elcomidista.elpais.com/elcomidista/2016/02/24/articulo/1456340311_668921.html#?id_externo_nwl=newsletter_diaria20160303t',
+        'md5': '3bd5b09509f3519d7d9e763179b013de',
+        'info_dict': {
+            'id': '1456340311_668921',
+            'ext': 'mp4',
+            'title': 'Cómo hacer el mejor café con cafetera italiana',
+            'description': 'Que sí, que las cápsulas son cómodas. Pero si le pides algo más a la vida, quizá deberías aprender a usar bien la cafetera italiana. No tienes más que ver este vídeo y seguir sus siete normas básicas.',
+            'upload_date': '20160303',
+        }
+    }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
 
         prefix = self._html_search_regex(
-            r'var url_cache = "([^"]+)";', webpage, 'URL prefix')
+            r'var\s+url_cache\s*=\s*"([^"]+)";', webpage, 'URL prefix')
         video_suffix = self._search_regex(
-            r"URLMediaFile = url_cache \+ '([^']+)'", webpage, 'video URL')
+            r"(?:URLMediaFile|urlVideo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'", webpage, 'video URL')
         video_url = prefix + video_suffix
         thumbnail_suffix = self._search_regex(
-            r"URLMediaStill = url_cache \+ '([^']+)'", webpage, 'thumbnail URL',
-            fatal=False)
+            r"(?:URLMediaStill|urlFotogramaFijo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'",
+            webpage, 'thumbnail URL', fatal=False)
         thumbnail = (
             None if thumbnail_suffix is None
             else prefix + thumbnail_suffix)
         title = self._html_search_regex(
-            '<h2 class="entry-header entry-title.*?>(.*?)</h2>',
+            (r"tituloVideo\s*=\s*'([^']+)'", webpage, 'title',
+             r'<h2 class="entry-header entry-title.*?>(.*?)</h2>'),
             webpage, 'title')
-        date_str = self._search_regex(
+        upload_date = unified_strdate(self._search_regex(
             r'<p class="date-header date-int updated"\s+title="([^"]+)">',
-            webpage, 'upload date', fatal=False)
-        upload_date = (None if date_str is None else unified_strdate(date_str))
+            webpage, 'upload date', default=None) or self._html_search_meta(
+            'datePublished', webpage, 'timestamp'))
 
         return {
             'id': video_id,
diff --git a/youtube_dl/extractor/engadget.py b/youtube_dl/extractor/engadget.py
index 4ea37ebd9f2072ea7610cfc4a8630e120fcfa81b..e5e57d48518d3dd3999dad650d0c32406079ce33 100644 (file)
@@ -1,21 +1,13 @@
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
-from ..utils import (
-    url_basename,
-)
 
 
 class EngadgetIE(InfoExtractor):
-    _VALID_URL = r'''(?x)https?://www.engadget.com/
-        (?:video/5min/(?P<id>\d+)|
-            [\d/]+/.*?)
-        '''
+    _VALID_URL = r'https?://www\.engadget\.com/video/(?P<id>\d+)'
 
     _TEST = {
-        'url': 'http://www.engadget.com/video/5min/518153925/',
+        'url': 'http://www.engadget.com/video/518153925/',
         'md5': 'c6820d4828a5064447a4d9fc73f312c9',
         'info_dict': {
             'id': '518153925',
@@ -27,15 +19,4 @@ class EngadgetIE(InfoExtractor):
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
-
-        if video_id is not None:
-            return self.url_result('5min:%s' % video_id)
-        else:
-            title = url_basename(url)
-            webpage = self._download_webpage(url, title)
-            ids = re.findall(r'<iframe[^>]+?playList=(\d+)', webpage)
-            return {
-                '_type': 'playlist',
-                'title': title,
-                'entries': [self.url_result('5min:%s' % vid) for vid in ids]
-            }
+        return self.url_result('5min:%s' % video_id)
diff --git a/youtube_dl/extractor/eroprofile.py b/youtube_dl/extractor/eroprofile.py
index 316033cf18b42cefead780ceca15b361ebbddac7..297f8a6f5fa4371415554bfe6c44d0745c262491 100644 (file)
@@ -3,7 +3,7 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_urllib_parse
+from ..compat import compat_urllib_parse_urlencode
 from ..utils import (
     ExtractorError,
     unescapeHTML
@@ -43,7 +43,7 @@ class EroProfileIE(InfoExtractor):
         if username is None:
             return
 
-        query = compat_urllib_parse.urlencode({
+        query = compat_urllib_parse_urlencode({
             'username': username,
             'password': password,
             'url': 'http://www.eroprofile.com/',
@@ -71,8 +71,7 @@ class EroProfileIE(InfoExtractor):
 
         m = re.search(r'You must be logged in to view this video\.', webpage)
         if m:
-            raise ExtractorError(
-                'This video requires login. Please specify a username and password and try again.', expected=True)
+            self.raise_login_required('This video requires login')
 
         video_id = self._search_regex(
             [r"glbUpdViews\s*\('\d*','(\d+)'", r'p/report/video/(\d+)'],
diff --git a/youtube_dl/extractor/escapist.py b/youtube_dl/extractor/escapist.py
index c85b4c458d95882f56675fa135aab1f3492b6194..a3d7bbbcb3f45a4c098397d0622fc59324412fcc 100644 (file)
@@ -3,13 +3,12 @@ from __future__ import unicode_literals
 import json
 
 from .common import InfoExtractor
-from ..compat import compat_urllib_request
-
 from ..utils import (
     determine_ext,
     clean_html,
     int_or_none,
     float_or_none,
+    sanitized_Request,
 )
 
 
@@ -75,7 +74,7 @@ class EscapistIE(InfoExtractor):
         video_id = ims_video['videoID']
         key = ims_video['hash']
 
-        config_req = compat_urllib_request.Request(
+        config_req = sanitized_Request(
             'http://www.escapistmagazine.com/videos/'
             'vidconfig.php?videoID=%s&hash=%s' % (video_id, key))
         config_req.add_header('Referer', url)
diff --git a/youtube_dl/extractor/espn.py b/youtube_dl/extractor/espn.py
index e6f8f0337fa562335b577db3d1cee2d75684fe31..db4b263bcbf40a9cb133d2a9729e4fe07292bae3 100644 (file)
@@ -1,18 +1,30 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
+from ..utils import remove_end
 
 
 class ESPNIE(InfoExtractor):
     _VALID_URL = r'https?://espn\.go\.com/(?:[^/]+/)*(?P<id>[^/]+)'
-    _WORKING = False
     _TESTS = [{
         'url': 'http://espn.go.com/video/clip?id=10365079',
         'info_dict': {
             'id': 'FkYWtmazr6Ed8xmvILvKLWjd4QvYZpzG',
             'ext': 'mp4',
-            'title': 'dm_140128_30for30Shorts___JudgingJewellv2',
-            'description': '',
+            'title': '30 for 30 Shorts: Judging Jewell',
+            'description': None,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }, {
+        # intl video, from http://www.espnfc.us/video/mls-highlights/150/video/2743663/must-see-moments-best-of-the-mls-season
+        'url': 'http://espn.go.com/video/clip?id=2743663',
+        'info_dict': {
+            'id': '50NDFkeTqRHB0nXBOK-RGdSG5YQPuxHg',
+            'ext': 'mp4',
+            'title': 'Must-See Moments: Best of the MLS season',
         },
         'params': {
             # m3u8 download
@@ -41,15 +53,26 @@ class ESPNIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
 
         video_id = self._search_regex(
-            r'class="video-play-button"[^>]+data-id="(\d+)',
-            webpage, 'video id')
+            r'class=(["\']).*?video-play-button.*?\1[^>]+data-id=["\'](?P<id>\d+)',
+            webpage, 'video id', group='id')
 
+        cms = 'espn'
+        if 'data-source="intl"' in webpage:
+            cms = 'intl'
+        player_url = 'https://espn.go.com/video/iframe/twitter/?id=%s&cms=%s' % (video_id, cms)
         player = self._download_webpage(
-            'https://espn.go.com/video/iframe/twitter/?id=%s' % video_id, video_id)
+            player_url, video_id)
 
         pcode = self._search_regex(
             r'["\']pcode=([^"\']+)["\']', player, 'pcode')
 
-        return self.url_result(
-            'ooyalaexternal:espn:%s:%s' % (video_id, pcode),
-            'OoyalaExternal')
+        title = remove_end(
+            self._og_search_title(webpage),
+            '- ESPN Video').strip()
+
+        return {
+            '_type': 'url_transparent',
+            'url': 'ooyalaexternal:%s:%s:%s' % (cms, video_id, pcode),
+            'ie_key': 'OoyalaExternal',
+            'title': title,
+        }
diff --git a/youtube_dl/extractor/esri.py b/youtube_dl/extractor/esri.py
new file mode 100644 (file)
index 0000000..d4205d7
--- /dev/null
@@ -0,0 +1,74 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_urlparse
+from ..utils import (
+    int_or_none,
+    parse_filesize,
+    unified_strdate,
+)
+
+
+class EsriVideoIE(InfoExtractor):
+    _VALID_URL = r'https?://video\.esri\.com/watch/(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'https://video.esri.com/watch/1124/arcgis-online-_dash_-developing-applications',
+        'md5': 'd4aaf1408b221f1b38227a9bbaeb95bc',
+        'info_dict': {
+            'id': '1124',
+            'ext': 'mp4',
+            'title': 'ArcGIS Online - Developing Applications',
+            'description': 'Jeremy Bartley demonstrates how to develop applications with ArcGIS Online.',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'duration': 185,
+            'upload_date': '20120419',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        formats = []
+        for width, height, content in re.findall(
+                r'(?s)<li><strong>(\d+)x(\d+):</strong>(.+?)</li>', webpage):
+            for video_url, ext, filesize in re.findall(
+                    r'<a[^>]+href="([^"]+)">([^<]+)&nbsp;\(([^<]+)\)</a>', content):
+                formats.append({
+                    'url': compat_urlparse.urljoin(url, video_url),
+                    'ext': ext.lower(),
+                    'format_id': '%s-%s' % (ext.lower(), height),
+                    'width': int(width),
+                    'height': int(height),
+                    'filesize_approx': parse_filesize(filesize),
+                })
+        self._sort_formats(formats)
+
+        title = self._html_search_meta('title', webpage, 'title')
+        description = self._html_search_meta(
+            'description', webpage, 'description', fatal=False)
+
+        thumbnail = self._html_search_meta('thumbnail', webpage, 'thumbnail', fatal=False)
+        if thumbnail:
+            thumbnail = re.sub(r'_[st]\.jpg$', '_x.jpg', thumbnail)
+
+        duration = int_or_none(self._search_regex(
+            [r'var\s+videoSeconds\s*=\s*(\d+)', r"'duration'\s*:\s*(\d+)"],
+            webpage, 'duration', fatal=False))
+
+        upload_date = unified_strdate(self._html_search_meta(
+            'last-modified', webpage, 'upload date', fatal=False))
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'upload_date': upload_date,
+            'formats': formats
+        }
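
The two nested findall passes above scrape the download list in stages: the outer pass isolates each resolution's <li> block, the inner pass pulls the individual format links out of it. On a reduced sample (the HTML is illustrative):

import re

html = '''<li><strong>960x540:</strong>
  <a href="/media/540.mp4">MP4&nbsp;(15.1 MB)</a>
  <a href="/media/540.wmv">WMV&nbsp;(16.2 MB)</a></li>'''

for width, height, content in re.findall(
        r'(?s)<li><strong>(\d+)x(\d+):</strong>(.+?)</li>', html):
    for video_url, ext, filesize in re.findall(
            r'<a[^>]+href="([^"]+)">([^<]+)&nbsp;\(([^<]+)\)</a>', content):
        print(video_url, ext.lower(), filesize)
        # -> /media/540.mp4 mp4 15.1 MB
        # -> /media/540.wmv wmv 16.2 MB
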
diff --git a/youtube_dl/extractor/europa.py b/youtube_dl/extractor/europa.py
new file mode 100644 (file)
index 0000000..adc4391
--- /dev/null
@@ -0,0 +1,93 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_urlparse
+from ..utils import (
+    int_or_none,
+    orderedSet,
+    parse_duration,
+    qualities,
+    unified_strdate,
+    xpath_text
+)
+
+
+class EuropaIE(InfoExtractor):
+    _VALID_URL = r'https?://ec\.europa\.eu/avservices/(?:video/player|audio/audioDetails)\.cfm\?.*?\bref=(?P<id>[A-Za-z0-9-]+)'
+    _TESTS = [{
+        'url': 'http://ec.europa.eu/avservices/video/player.cfm?ref=I107758',
+        'md5': '574f080699ddd1e19a675b0ddf010371',
+        'info_dict': {
+            'id': 'I107758',
+            'ext': 'mp4',
+            'title': 'TRADE - Wikileaks on TTIP',
+            'description': 'NEW  LIVE EC Midday press briefing of 11/08/2015',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'upload_date': '20150811',
+            'duration': 34,
+            'view_count': int,
+            'formats': 'mincount:3',
+        }
+    }, {
+        'url': 'http://ec.europa.eu/avservices/video/player.cfm?sitelang=en&ref=I107786',
+        'only_matching': True,
+    }, {
+        'url': 'http://ec.europa.eu/avservices/audio/audioDetails.cfm?ref=I-109295&sitelang=en',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        playlist = self._download_xml(
+            'http://ec.europa.eu/avservices/video/player/playlist.cfm?ID=%s' % video_id, video_id)
+
+        def get_item(type_, preference):
+            items = {}
+            for item in playlist.findall('./info/%s/item' % type_):
+                lang, label = xpath_text(item, 'lg', default=None), xpath_text(item, 'label', default=None)
+                if lang and label:
+                    items[lang] = label.strip()
+            for p in preference:
+                if items.get(p):
+                    return items[p]
+
+        query = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+        preferred_lang = query.get('sitelang', ('en', ))[0]
+
+        preferred_langs = orderedSet((preferred_lang, 'en', 'int'))
+
+        title = get_item('title', preferred_langs) or video_id
+        description = get_item('description', preferred_langs)
+        thumbnail = xpath_text(playlist, './info/thumburl', 'thumbnail')
+        upload_date = unified_strdate(xpath_text(playlist, './info/date', 'upload date'))
+        duration = parse_duration(xpath_text(playlist, './info/duration', 'duration'))
+        view_count = int_or_none(xpath_text(playlist, './info/views', 'views'))
+
+        language_preference = qualities(preferred_langs[::-1])
+
+        formats = []
+        for file_ in playlist.findall('./files/file'):
+            video_url = xpath_text(file_, './url')
+            if not video_url:
+                continue
+            lang = xpath_text(file_, './lg')
+            formats.append({
+                'url': video_url,
+                'format_id': lang,
+                'format_note': xpath_text(file_, './lglabel'),
+                'language_preference': language_preference(lang)
+            })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'upload_date': upload_date,
+            'duration': duration,
+            'view_count': view_count,
+            'formats': formats
+        }
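
The language handling above hinges on qualities(): it returns a ranking function that scores an item by its index in the supplied list, so reversing preferred_langs makes the user's sitelang the highest language_preference. A sketch of the idea (the helper body is a paraphrase of youtube_dl.utils, not guaranteed verbatim):

    def qualities(quality_ids):
        # Later entries in quality_ids rank higher; unknown ids rank lowest.
        def q(qid):
            try:
                return quality_ids.index(qid)
            except ValueError:
                return -1
        return q

    preferred_langs = ['fr', 'en', 'int']  # e.g. a page opened with sitelang=fr
    language_preference = qualities(preferred_langs[::-1])
    print(language_preference('fr'), language_preference('int'),
          language_preference('de'))  # -> 2 0 -1
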
diff --git a/youtube_dl/extractor/everyonesmixtape.py b/youtube_dl/extractor/everyonesmixtape.py
index d872d828fcc8e10fea4770e1e56ab21cda027336..84a9b750e56254950198d16ed61e3e0af2756f91 100644 (file)
@@ -3,11 +3,9 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-)
 from ..utils import (
     ExtractorError,
+    sanitized_Request,
 )
 
 
@@ -16,14 +14,14 @@ class EveryonesMixtapeIE(InfoExtractor):
 
     _TESTS = [{
         'url': 'http://everyonesmixtape.com/#/mix/m7m0jJAbMQi/5',
-        "info_dict": {
+        'info_dict': {
             'id': '5bfseWNmlds',
             'ext': 'mp4',
-            "title": "Passion Pit - \"Sleepyhead\" (Official Music Video)",
-            "uploader": "FKR.TV",
-            "uploader_id": "frenchkissrecords",
-            "description": "Music video for \"Sleepyhead\" from Passion Pit's debut EP Chunk Of Change.\nBuy on iTunes: https://itunes.apple.com/us/album/chunk-of-change-ep/id300087641\n\nDirected by The Wilderness.\n\nhttp://www.passionpitmusic.com\nhttp://www.frenchkissrecords.com",
-            "upload_date": "20081015"
+            'title': "Passion Pit - \"Sleepyhead\" (Official Music Video)",
+            'uploader': 'FKR.TV',
+            'uploader_id': 'frenchkissrecords',
+            'description': "Music video for \"Sleepyhead\" from Passion Pit's debut EP Chunk Of Change.\nBuy on iTunes: https://itunes.apple.com/us/album/chunk-of-change-ep/id300087641\n\nDirected by The Wilderness.\n\nhttp://www.passionpitmusic.com\nhttp://www.frenchkissrecords.com",
+            'upload_date': '20081015'
         },
         'params': {
             'skip_download': True,  # This is simply YouTube
@@ -42,7 +40,7 @@ class EveryonesMixtapeIE(InfoExtractor):
         playlist_id = mobj.group('id')
 
         pllist_url = 'http://everyonesmixtape.com/mixtape.php?a=getMixes&u=-1&linked=%s&explore=' % playlist_id
-        pllist_req = compat_urllib_request.Request(pllist_url)
+        pllist_req = sanitized_Request(pllist_url)
         pllist_req.add_header('X-Requested-With', 'XMLHttpRequest')
 
         playlist_list = self._download_json(
@@ -55,7 +53,7 @@ class EveryonesMixtapeIE(InfoExtractor):
             raise ExtractorError('Playlist id not found')
 
         pl_url = 'http://everyonesmixtape.com/mixtape.php?a=getMix&id=%s&userId=null&code=' % playlist_no
-        pl_req = compat_urllib_request.Request(pl_url)
+        pl_req = sanitized_Request(pl_url)
         pl_req.add_header('X-Requested-With', 'XMLHttpRequest')
         playlist = self._download_json(
             pl_req, playlist_id, note='Downloading playlist info')
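
This commit repeatedly swaps compat_urllib_request.Request for sanitized_Request, here and in the extractors below. The utils helper builds the same request object but passes the URL through a sanitizer first, so stray characters cannot break the HTTP request line. A rough sketch of the wrapper, with a deliberately simplified sanitizer (the real one escapes more than spaces):

    try:
        import urllib.request as compat_urllib_request  # Python 3
    except ImportError:
        import urllib2 as compat_urllib_request  # Python 2

    def sanitize_url(url):
        # Simplified stand-in for the utils sanitizer.
        return url.replace(' ', '%20')

    def sanitized_Request(url, *args, **kwargs):
        return compat_urllib_request.Request(sanitize_url(url), *args, **kwargs)

    req = sanitized_Request('http://example.com/a b')
    print(req.get_full_url())  # -> http://example.com/a%20b
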
diff --git a/youtube_dl/extractor/exfm.py b/youtube_dl/extractor/exfm.py
index 4de02aee9b2ef46d54e43964560410b82ac3aff9..09ed4f2b5644c5c8d55ea944d98e1684acacc125 100644 (file)
@@ -8,7 +8,7 @@ from .common import InfoExtractor
 class ExfmIE(InfoExtractor):
     IE_NAME = 'exfm'
     IE_DESC = 'ex.fm'
-    _VALID_URL = r'http://(?:www\.)?ex\.fm/song/(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?ex\.fm/song/(?P<id>[^/]+)'
     _SOUNDCLOUD_URL = r'http://(?:www\.)?api\.soundcloud\.com/tracks/([^/]+)/stream'
     _TESTS = [
         {
@@ -41,7 +41,7 @@ class ExfmIE(InfoExtractor):
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         song_id = mobj.group('id')
-        info_url = "http://ex.fm/api/v3/song/%s" % song_id
+        info_url = 'http://ex.fm/api/v3/song/%s' % song_id
         info = self._download_json(info_url, song_id)['song']
         song_url = info['url']
         if re.match(self._SOUNDCLOUD_URL, song_url) is not None:
diff --git a/youtube_dl/extractor/expotv.py b/youtube_dl/extractor/expotv.py
index a38b773e868205b55740c26103f6665560f9f4c3..1585a03bb9235a520c63e84c306294f6958dfe13 100644 (file)
@@ -33,20 +33,27 @@ class ExpoTVIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
         player_key = self._search_regex(
             r'<param name="playerKey" value="([^"]+)"', webpage, 'player key')
-        config_url = 'http://client.expotv.com/video/config/%s/%s' % (
-            video_id, player_key)
         config = self._download_json(
-            config_url, video_id,
-            note='Downloading video configuration')
+            'http://client.expotv.com/video/config/%s/%s' % (video_id, player_key),
+            video_id, 'Downloading video configuration')
 
-        formats = [{
-            'url': fcfg['file'],
-            'height': int_or_none(fcfg.get('height')),
-            'format_note': fcfg.get('label'),
-            'ext': self._search_regex(
-                r'filename=.*\.([a-z0-9_A-Z]+)&', fcfg['file'],
-                'file extension', default=None),
-        } for fcfg in config['sources']]
+        formats = []
+        for fcfg in config['sources']:
+            media_url = fcfg.get('file')
+            if not media_url:
+                continue
+            if fcfg.get('type') == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    media_url, video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls'))
+            else:
+                formats.append({
+                    'url': media_url,
+                    'height': int_or_none(fcfg.get('height')),
+                    'format_id': fcfg.get('label'),
+                    'ext': self._search_regex(
+                        r'filename=.*\.([a-z0-9_A-Z]+)&', media_url,
+                        'file extension', default=None) or fcfg.get('type'),
+                })
         self._sort_formats(formats)
 
         title = self._og_search_title(webpage)
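
The rewritten loop keeps the old trick of guessing the container format from a filename= query parameter, now falling back to the source's type field when the regex finds nothing. The extension regex in isolation, against a hypothetical media URL:

    import re

    media_url = 'http://cdn.example.com/play?filename=clip.mp4&token=abc'  # hypothetical
    ext = re.search(r'filename=.*\.([a-z0-9_A-Z]+)&', media_url).group(1)
    print(ext)  # -> mp4
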
diff --git a/youtube_dl/extractor/extractors.py b/youtube_dl/extractor/extractors.py
new file mode 100644 (file)
index 0000000..6de3438
--- /dev/null
@@ -0,0 +1,995 @@
+# flake8: noqa
+from __future__ import unicode_literals
+
+from .abc import ABCIE
+from .abc7news import Abc7NewsIE
+from .academicearth import AcademicEarthCourseIE
+from .acast import (
+    ACastIE,
+    ACastChannelIE,
+)
+from .addanime import AddAnimeIE
+from .adobetv import (
+    AdobeTVIE,
+    AdobeTVShowIE,
+    AdobeTVChannelIE,
+    AdobeTVVideoIE,
+)
+from .adultswim import AdultSwimIE
+from .aenetworks import AENetworksIE
+from .aftonbladet import AftonbladetIE
+from .airmozilla import AirMozillaIE
+from .aljazeera import AlJazeeraIE
+from .alphaporno import AlphaPornoIE
+from .animeondemand import AnimeOnDemandIE
+from .anitube import AnitubeIE
+from .anysex import AnySexIE
+from .aol import (
+    AolIE,
+    AolFeaturesIE,
+)
+from .allocine import AllocineIE
+from .aparat import AparatIE
+from .appleconnect import AppleConnectIE
+from .appletrailers import (
+    AppleTrailersIE,
+    AppleTrailersSectionIE,
+)
+from .archiveorg import ArchiveOrgIE
+from .ard import (
+    ARDIE,
+    ARDMediathekIE,
+    SportschauIE,
+)
+from .arte import (
+    ArteTvIE,
+    ArteTVPlus7IE,
+    ArteTVCreativeIE,
+    ArteTVConcertIE,
+    ArteTVInfoIE,
+    ArteTVFutureIE,
+    ArteTVCinemaIE,
+    ArteTVDDCIE,
+    ArteTVMagazineIE,
+    ArteTVEmbedIE,
+)
+from .atresplayer import AtresPlayerIE
+from .atttechchannel import ATTTechChannelIE
+from .audimedia import AudiMediaIE
+from .audioboom import AudioBoomIE
+from .audiomack import AudiomackIE, AudiomackAlbumIE
+from .azubu import AzubuIE, AzubuLiveIE
+from .baidu import BaiduVideoIE
+from .bambuser import BambuserIE, BambuserChannelIE
+from .bandcamp import BandcampIE, BandcampAlbumIE
+from .bbc import (
+    BBCCoUkIE,
+    BBCCoUkArticleIE,
+    BBCIE,
+)
+from .beeg import BeegIE
+from .behindkink import BehindKinkIE
+from .beatportpro import BeatportProIE
+from .bet import BetIE
+from .bigflix import BigflixIE
+from .bild import BildIE
+from .bilibili import BiliBiliIE
+from .biobiochiletv import BioBioChileTVIE
+from .bleacherreport import (
+    BleacherReportIE,
+    BleacherReportCMSIE,
+)
+from .blinkx import BlinkxIE
+from .bloomberg import BloombergIE
+from .bokecc import BokeCCIE
+from .bpb import BpbIE
+from .br import BRIE
+from .bravotv import BravoTVIE
+from .breakcom import BreakIE
+from .brightcove import (
+    BrightcoveLegacyIE,
+    BrightcoveNewIE,
+)
+from .buzzfeed import BuzzFeedIE
+from .byutv import BYUtvIE
+from .c56 import C56IE
+from .camdemy import (
+    CamdemyIE,
+    CamdemyFolderIE
+)
+from .camwithher import CamWithHerIE
+from .canalplus import CanalplusIE
+from .canalc2 import Canalc2IE
+from .canvas import CanvasIE
+from .cbc import (
+    CBCIE,
+    CBCPlayerIE,
+)
+from .cbs import CBSIE
+from .cbsinteractive import CBSInteractiveIE
+from .cbsnews import (
+    CBSNewsIE,
+    CBSNewsLiveVideoIE,
+)
+from .cbssports import CBSSportsIE
+from .ccc import CCCIE
+from .cda import CDAIE
+from .ceskatelevize import CeskaTelevizeIE
+from .channel9 import Channel9IE
+from .chaturbate import ChaturbateIE
+from .chilloutzone import ChilloutzoneIE
+from .chirbit import (
+    ChirbitIE,
+    ChirbitProfileIE,
+)
+from .cinchcast import CinchcastIE
+from .cinemassacre import CinemassacreIE
+from .cliprs import ClipRsIE
+from .clipfish import ClipfishIE
+from .cliphunter import CliphunterIE
+from .clipsyndicate import ClipsyndicateIE
+from .cloudy import CloudyIE
+from .clubic import ClubicIE
+from .clyp import ClypIE
+from .cmt import CMTIE
+from .cnbc import CNBCIE
+from .cnn import (
+    CNNIE,
+    CNNBlogsIE,
+    CNNArticleIE,
+)
+from .collegehumor import CollegeHumorIE
+from .collegerama import CollegeRamaIE
+from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
+from .comcarcoff import ComCarCoffIE
+from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
+from .commonprotocols import RtmpIE
+from .condenast import CondeNastIE
+from .cracked import CrackedIE
+from .crackle import CrackleIE
+from .criterion import CriterionIE
+from .crooksandliars import CrooksAndLiarsIE
+from .crunchyroll import (
+    CrunchyrollIE,
+    CrunchyrollShowPlaylistIE
+)
+from .cspan import CSpanIE
+from .ctsnews import CtsNewsIE
+from .cultureunplugged import CultureUnpluggedIE
+from .cwtv import CWTVIE
+from .dailymotion import (
+    DailymotionIE,
+    DailymotionPlaylistIE,
+    DailymotionUserIE,
+    DailymotionCloudIE,
+)
+from .daum import (
+    DaumIE,
+    DaumClipIE,
+    DaumPlaylistIE,
+    DaumUserIE,
+)
+from .dbtv import DBTVIE
+from .dcn import (
+    DCNIE,
+    DCNVideoIE,
+    DCNLiveIE,
+    DCNSeasonIE,
+)
+from .dctp import DctpTvIE
+from .deezer import DeezerPlaylistIE
+from .democracynow import DemocracynowIE
+from .dfb import DFBIE
+from .dhm import DHMIE
+from .dotsub import DotsubIE
+from .douyutv import DouyuTVIE
+from .dplay import DPlayIE
+from .dramafever import (
+    DramaFeverIE,
+    DramaFeverSeriesIE,
+)
+from .dreisat import DreiSatIE
+from .drbonanza import DRBonanzaIE
+from .drtuber import DrTuberIE
+from .drtv import DRTVIE
+from .dvtv import DVTVIE
+from .dumpert import DumpertIE
+from .defense import DefenseGouvFrIE
+from .discovery import DiscoveryIE
+from .dispeak import DigitallySpeakingIE
+from .dropbox import DropboxIE
+from .dw import (
+    DWIE,
+    DWArticleIE,
+)
+from .eagleplatform import EaglePlatformIE
+from .ebaumsworld import EbaumsWorldIE
+from .echomsk import EchoMskIE
+from .ehow import EHowIE
+from .eighttracks import EightTracksIE
+from .einthusan import EinthusanIE
+from .eitb import EitbIE
+from .ellentv import (
+    EllenTVIE,
+    EllenTVClipsIE,
+)
+from .elpais import ElPaisIE
+from .embedly import EmbedlyIE
+from .engadget import EngadgetIE
+from .eporner import EpornerIE
+from .eroprofile import EroProfileIE
+from .escapist import EscapistIE
+from .espn import ESPNIE
+from .esri import EsriVideoIE
+from .europa import EuropaIE
+from .everyonesmixtape import EveryonesMixtapeIE
+from .exfm import ExfmIE
+from .expotv import ExpoTVIE
+from .extremetube import ExtremeTubeIE
+from .facebook import FacebookIE
+from .faz import FazIE
+from .fc2 import FC2IE
+from .fczenit import FczenitIE
+from .firstpost import FirstpostIE
+from .firsttv import FirstTVIE
+from .fivemin import FiveMinIE
+from .fivetv import FiveTVIE
+from .fktv import FKTVIE
+from .flickr import FlickrIE
+from .folketinget import FolketingetIE
+from .footyroom import FootyRoomIE
+from .fourtube import FourTubeIE
+from .fox import FOXIE
+from .foxgay import FoxgayIE
+from .foxnews import FoxNewsIE
+from .foxsports import FoxSportsIE
+from .franceculture import (
+    FranceCultureIE,
+    FranceCultureEmissionIE,
+)
+from .franceinter import FranceInterIE
+from .francetv import (
+    PluzzIE,
+    FranceTvInfoIE,
+    FranceTVIE,
+    GenerationQuoiIE,
+    CultureboxIE,
+)
+from .freesound import FreesoundIE
+from .freespeech import FreespeechIE
+from .freevideo import FreeVideoIE
+from .funimation import FunimationIE
+from .funnyordie import FunnyOrDieIE
+from .gameinformer import GameInformerIE
+from .gamekings import GamekingsIE
+from .gameone import (
+    GameOneIE,
+    GameOnePlaylistIE,
+)
+from .gamersyde import GamersydeIE
+from .gamespot import GameSpotIE
+from .gamestar import GameStarIE
+from .gametrailers import GametrailersIE
+from .gazeta import GazetaIE
+from .gdcvault import GDCVaultIE
+from .generic import GenericIE
+from .gfycat import GfycatIE
+from .giantbomb import GiantBombIE
+from .giga import GigaIE
+from .glide import GlideIE
+from .globo import (
+    GloboIE,
+    GloboArticleIE,
+)
+from .godtube import GodTubeIE
+from .goldenmoustache import GoldenMoustacheIE
+from .golem import GolemIE
+from .googledrive import GoogleDriveIE
+from .googleplus import GooglePlusIE
+from .googlesearch import GoogleSearchIE
+from .goshgay import GoshgayIE
+from .gputechconf import GPUTechConfIE
+from .groupon import GrouponIE
+from .hark import HarkIE
+from .hbo import HBOIE
+from .hearthisat import HearThisAtIE
+from .heise import HeiseIE
+from .hellporno import HellPornoIE
+from .helsinki import HelsinkiIE
+from .hentaistigma import HentaiStigmaIE
+from .historicfilms import HistoricFilmsIE
+from .hitbox import HitboxIE, HitboxLiveIE
+from .hornbunny import HornBunnyIE
+from .hotnewhiphop import HotNewHipHopIE
+from .hotstar import HotStarIE
+from .howcast import HowcastIE
+from .howstuffworks import HowStuffWorksIE
+from .huffpost import HuffPostIE
+from .hypem import HypemIE
+from .iconosquare import IconosquareIE
+from .ign import (
+    IGNIE,
+    OneUPIE,
+    PCMagIE,
+)
+from .imdb import (
+    ImdbIE,
+    ImdbListIE
+)
+from .imgur import (
+    ImgurIE,
+    ImgurAlbumIE,
+)
+from .ina import InaIE
+from .indavideo import (
+    IndavideoIE,
+    IndavideoEmbedIE,
+)
+from .infoq import InfoQIE
+from .instagram import InstagramIE, InstagramUserIE
+from .internetvideoarchive import InternetVideoArchiveIE
+from .iprima import IPrimaIE
+from .iqiyi import IqiyiIE
+from .ir90tv import Ir90TvIE
+from .ivi import (
+    IviIE,
+    IviCompilationIE
+)
+from .ivideon import IvideonIE
+from .izlesene import IzleseneIE
+from .jeuxvideo import JeuxVideoIE
+from .jove import JoveIE
+from .jwplatform import JWPlatformIE
+from .jpopsukitv import JpopsukiIE
+from .kaltura import KalturaIE
+from .kanalplay import KanalPlayIE
+from .kankan import KankanIE
+from .karaoketv import KaraoketvIE
+from .karrierevideos import KarriereVideosIE
+from .keezmovies import KeezMoviesIE
+from .khanacademy import KhanAcademyIE
+from .kickstarter import KickStarterIE
+from .keek import KeekIE
+from .konserthusetplay import KonserthusetPlayIE
+from .kontrtube import KontrTubeIE
+from .krasview import KrasViewIE
+from .ku6 import Ku6IE
+from .kusi import KUSIIE
+from .kuwo import (
+    KuwoIE,
+    KuwoAlbumIE,
+    KuwoChartIE,
+    KuwoSingerIE,
+    KuwoCategoryIE,
+    KuwoMvIE,
+)
+from .la7 import LA7IE
+from .laola1tv import Laola1TvIE
+from .lecture2go import Lecture2GoIE
+from .lemonde import LemondeIE
+from .leeco import (
+    LeIE,
+    LePlaylistIE,
+    LetvCloudIE,
+)
+from .libsyn import LibsynIE
+from .lifenews import (
+    LifeNewsIE,
+    LifeEmbedIE,
+)
+from .limelight import (
+    LimelightMediaIE,
+    LimelightChannelIE,
+    LimelightChannelListIE,
+)
+from .liveleak import LiveLeakIE
+from .livestream import (
+    LivestreamIE,
+    LivestreamOriginalIE,
+    LivestreamShortenerIE,
+)
+from .lnkgo import LnkGoIE
+from .lovehomeporn import LoveHomePornIE
+from .lrt import LRTIE
+from .lynda import (
+    LyndaIE,
+    LyndaCourseIE
+)
+from .m6 import M6IE
+from .macgamestore import MacGameStoreIE
+from .mailru import MailRuIE
+from .makerschannel import MakersChannelIE
+from .makertv import MakerTVIE
+from .malemotion import MalemotionIE
+from .matchtv import MatchTVIE
+from .mdr import MDRIE
+from .metacafe import MetacafeIE
+from .metacritic import MetacriticIE
+from .mgoon import MgoonIE
+from .mgtv import MGTVIE
+from .minhateca import MinhatecaIE
+from .ministrygrid import MinistryGridIE
+from .minoto import MinotoIE
+from .miomio import MioMioIE
+from .mit import TechTVMITIE, MITIE, OCWMITIE
+from .mitele import MiTeleIE
+from .mixcloud import (
+    MixcloudIE,
+    MixcloudUserIE,
+    MixcloudPlaylistIE,
+    MixcloudStreamIE,
+)
+from .mlb import MLBIE
+from .mnet import MnetIE
+from .mpora import MporaIE
+from .moevideo import MoeVideoIE
+from .mofosex import MofosexIE
+from .mojvideo import MojvideoIE
+from .moniker import MonikerIE
+from .morningstar import MorningstarIE
+from .motherless import MotherlessIE
+from .motorsport import MotorsportIE
+from .movieclips import MovieClipsIE
+from .moviezine import MoviezineIE
+from .mtv import (
+    MTVIE,
+    MTVServicesEmbeddedIE,
+    MTVIggyIE,
+    MTVDEIE,
+)
+from .muenchentv import MuenchenTVIE
+from .musicplayon import MusicPlayOnIE
+from .muzu import MuzuTVIE
+from .mwave import MwaveIE
+from .myspace import MySpaceIE, MySpaceAlbumIE
+from .myspass import MySpassIE
+from .myvi import MyviIE
+from .myvideo import MyVideoIE
+from .myvidster import MyVidsterIE
+from .nationalgeographic import (
+    NationalGeographicIE,
+    NationalGeographicChannelIE,
+)
+from .naver import NaverIE
+from .nba import NBAIE
+from .nbc import (
+    CSNNEIE,
+    NBCIE,
+    NBCNewsIE,
+    NBCSportsIE,
+    NBCSportsVPlayerIE,
+    MSNBCIE,
+)
+from .ndr import (
+    NDRIE,
+    NJoyIE,
+    NDREmbedBaseIE,
+    NDREmbedIE,
+    NJoyEmbedIE,
+)
+from .ndtv import NDTVIE
+from .netzkino import NetzkinoIE
+from .nerdcubed import NerdCubedFeedIE
+from .neteasemusic import (
+    NetEaseMusicIE,
+    NetEaseMusicAlbumIE,
+    NetEaseMusicSingerIE,
+    NetEaseMusicListIE,
+    NetEaseMusicMvIE,
+    NetEaseMusicProgramIE,
+    NetEaseMusicDjRadioIE,
+)
+from .newgrounds import NewgroundsIE
+from .newstube import NewstubeIE
+from .nextmedia import (
+    NextMediaIE,
+    NextMediaActionNewsIE,
+    AppleDailyIE,
+)
+from .nextmovie import NextMovieIE
+from .nfb import NFBIE
+from .nfl import NFLIE
+from .nhl import (
+    NHLVideocenterIE,
+    NHLNewsIE,
+    NHLVideocenterCategoryIE,
+    NHLIE,
+)
+from .nick import NickIE
+from .niconico import NiconicoIE, NiconicoPlaylistIE
+from .ninegag import NineGagIE
+from .noco import NocoIE
+from .normalboots import NormalbootsIE
+from .nosvideo import NosVideoIE
+from .nova import NovaIE
+from .novamov import (
+    AuroraVidIE,
+    CloudTimeIE,
+    NowVideoIE,
+    VideoWeedIE,
+    WholeCloudIE,
+)
+from .nowness import (
+    NownessIE,
+    NownessPlaylistIE,
+    NownessSeriesIE,
+)
+from .nowtv import (
+    NowTVIE,
+    NowTVListIE,
+)
+from .noz import NozIE
+from .npo import (
+    NPOIE,
+    NPOLiveIE,
+    NPORadioIE,
+    NPORadioFragmentIE,
+    SchoolTVIE,
+    VPROIE,
+    WNLIE
+)
+from .npr import NprIE
+from .nrk import (
+    NRKIE,
+    NRKPlaylistIE,
+    NRKSkoleIE,
+    NRKTVIE,
+)
+from .ntvde import NTVDeIE
+from .ntvru import NTVRuIE
+from .nytimes import (
+    NYTimesIE,
+    NYTimesArticleIE,
+)
+from .nuvid import NuvidIE
+from .odnoklassniki import OdnoklassnikiIE
+from .oktoberfesttv import OktoberfestTVIE
+from .onionstudios import OnionStudiosIE
+from .ooyala import (
+    OoyalaIE,
+    OoyalaExternalIE,
+)
+from .openload import OpenloadIE
+from .ora import OraTVIE
+from .orf import (
+    ORFTVthekIE,
+    ORFOE1IE,
+    ORFFM4IE,
+    ORFIPTVIE,
+)
+from .pandoratv import PandoraTVIE
+from .parliamentliveuk import ParliamentLiveUKIE
+from .patreon import PatreonIE
+from .pbs import PBSIE
+from .people import PeopleIE
+from .periscope import PeriscopeIE
+from .philharmoniedeparis import PhilharmonieDeParisIE
+from .phoenix import PhoenixIE
+from .photobucket import PhotobucketIE
+from .pinkbike import PinkbikeIE
+from .pladform import PladformIE
+from .played import PlayedIE
+from .playfm import PlayFMIE
+from .plays import PlaysTVIE
+from .playtvak import PlaytvakIE
+from .playvid import PlayvidIE
+from .playwire import PlaywireIE
+from .pluralsight import (
+    PluralsightIE,
+    PluralsightCourseIE,
+)
+from .podomatic import PodomaticIE
+from .porn91 import Porn91IE
+from .pornhd import PornHdIE
+from .pornhub import (
+    PornHubIE,
+    PornHubPlaylistIE,
+    PornHubUserVideosIE,
+)
+from .pornotube import PornotubeIE
+from .pornovoisines import PornoVoisinesIE
+from .pornoxo import PornoXOIE
+from .presstv import PressTVIE
+from .primesharetv import PrimeShareTVIE
+from .promptfile import PromptFileIE
+from .prosiebensat1 import ProSiebenSat1IE
+from .puls4 import Puls4IE
+from .pyvideo import PyvideoIE
+from .qqmusic import (
+    QQMusicIE,
+    QQMusicSingerIE,
+    QQMusicAlbumIE,
+    QQMusicToplistIE,
+    QQMusicPlaylistIE,
+)
+from .r7 import R7IE
+from .radiode import RadioDeIE
+from .radiojavan import RadioJavanIE
+from .radiobremen import RadioBremenIE
+from .radiofrance import RadioFranceIE
+from .rai import (
+    RaiTVIE,
+    RaiIE,
+)
+from .rbmaradio import RBMARadioIE
+from .rds import RDSIE
+from .redtube import RedTubeIE
+from .regiotv import RegioTVIE
+from .restudy import RestudyIE
+from .reverbnation import ReverbNationIE
+from .revision3 import Revision3IE
+from .rice import RICEIE
+from .ringtv import RingTVIE
+from .ro220 import Ro220IE
+from .rottentomatoes import RottenTomatoesIE
+from .roxwel import RoxwelIE
+from .rtbf import RTBFIE
+from .rte import RteIE, RteRadioIE
+from .rtlnl import RtlNlIE
+from .rtl2 import RTL2IE
+from .rtp import RTPIE
+from .rts import RTSIE
+from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE
+from .rtvnh import RTVNHIE
+from .ruhd import RUHDIE
+from .ruleporn import RulePornIE
+from .rutube import (
+    RutubeIE,
+    RutubeChannelIE,
+    RutubeEmbedIE,
+    RutubeMovieIE,
+    RutubePersonIE,
+)
+from .rutv import RUTVIE
+from .ruutu import RuutuIE
+from .sandia import SandiaIE
+from .safari import (
+    SafariIE,
+    SafariApiIE,
+    SafariCourseIE,
+)
+from .sapo import SapoIE
+from .savefrom import SaveFromIE
+from .sbs import SBSIE
+from .scivee import SciVeeIE
+from .screencast import ScreencastIE
+from .screencastomatic import ScreencastOMaticIE
+from .screenjunkies import ScreenJunkiesIE
+from .screenwavemedia import ScreenwaveMediaIE, TeamFourIE
+from .senateisvp import SenateISVPIE
+from .servingsys import ServingSysIE
+from .sexu import SexuIE
+from .sexykarma import SexyKarmaIE
+from .shahid import ShahidIE
+from .shared import SharedIE
+from .sharesix import ShareSixIE
+from .sina import SinaIE
+from .skynewsarabia import (
+    SkyNewsArabiaIE,
+    SkyNewsArabiaArticleIE,
+)
+from .slideshare import SlideshareIE
+from .slutload import SlutloadIE
+from .smotri import (
+    SmotriIE,
+    SmotriCommunityIE,
+    SmotriUserIE,
+    SmotriBroadcastIE,
+)
+from .snagfilms import (
+    SnagFilmsIE,
+    SnagFilmsEmbedIE,
+)
+from .snotr import SnotrIE
+from .sohu import SohuIE
+from .soundcloud import (
+    SoundcloudIE,
+    SoundcloudSetIE,
+    SoundcloudUserIE,
+    SoundcloudPlaylistIE,
+    SoundcloudSearchIE
+)
+from .soundgasm import (
+    SoundgasmIE,
+    SoundgasmProfileIE
+)
+from .southpark import (
+    SouthParkIE,
+    SouthParkDeIE,
+    SouthParkDkIE,
+    SouthParkEsIE,
+    SouthParkNlIE
+)
+from .spankbang import SpankBangIE
+from .spankwire import SpankwireIE
+from .spiegel import SpiegelIE, SpiegelArticleIE
+from .spiegeltv import SpiegeltvIE
+from .spike import SpikeIE
+from .stitcher import StitcherIE
+from .sport5 import Sport5IE
+from .sportbox import (
+    SportBoxIE,
+    SportBoxEmbedIE,
+)
+from .sportdeutschland import SportDeutschlandIE
+from .srgssr import (
+    SRGSSRIE,
+    SRGSSRPlayIE,
+)
+from .srmediathek import SRMediathekIE
+from .ssa import SSAIE
+from .stanfordoc import StanfordOpenClassroomIE
+from .steam import SteamIE
+from .streamcloud import StreamcloudIE
+from .streamcz import StreamCZIE
+from .streetvoice import StreetVoiceIE
+from .sunporno import SunPornoIE
+from .svt import (
+    SVTIE,
+    SVTPlayIE,
+)
+from .swrmediathek import SWRMediathekIE
+from .syfy import SyfyIE
+from .sztvhu import SztvHuIE
+from .tagesschau import TagesschauIE
+from .tapely import TapelyIE
+from .tass import TassIE
+from .tdslifeway import TDSLifewayIE
+from .teachertube import (
+    TeacherTubeIE,
+    TeacherTubeUserIE,
+)
+from .teachingchannel import TeachingChannelIE
+from .teamcoco import TeamcocoIE
+from .techtalks import TechTalksIE
+from .ted import TEDIE
+from .tele13 import Tele13IE
+from .telebruxelles import TeleBruxellesIE
+from .telecinco import TelecincoIE
+from .telegraaf import TelegraafIE
+from .telemb import TeleMBIE
+from .teletask import TeleTaskIE
+from .testurl import TestURLIE
+from .tf1 import TF1IE
+from .theintercept import TheInterceptIE
+from .theplatform import (
+    ThePlatformIE,
+    ThePlatformFeedIE,
+)
+from .thescene import TheSceneIE
+from .thesixtyone import TheSixtyOneIE
+from .thestar import TheStarIE
+from .thisamericanlife import ThisAmericanLifeIE
+from .thisav import ThisAVIE
+from .tinypic import TinyPicIE
+from .tlc import TlcDeIE
+from .tmz import (
+    TMZIE,
+    TMZArticleIE,
+)
+from .tnaflix import (
+    TNAFlixNetworkEmbedIE,
+    TNAFlixIE,
+    EMPFlixIE,
+    MovieFapIE,
+)
+from .toggle import ToggleIE
+from .thvideo import (
+    THVideoIE,
+    THVideoPlaylistIE
+)
+from .toutv import TouTvIE
+from .toypics import ToypicsUserIE, ToypicsIE
+from .traileraddict import TrailerAddictIE
+from .trilulilu import TriluliluIE
+from .trollvids import TrollvidsIE
+from .trutube import TruTubeIE
+from .tube8 import Tube8IE
+from .tubitv import TubiTvIE
+from .tudou import (
+    TudouIE,
+    TudouPlaylistIE,
+    TudouAlbumIE,
+)
+from .tumblr import TumblrIE
+from .tunein import (
+    TuneInClipIE,
+    TuneInStationIE,
+    TuneInProgramIE,
+    TuneInTopicIE,
+    TuneInShortenerIE,
+)
+from .turbo import TurboIE
+from .tutv import TutvIE
+from .tv2 import (
+    TV2IE,
+    TV2ArticleIE,
+)
+from .tv3 import TV3IE
+from .tv4 import TV4IE
+from .tvc import (
+    TVCIE,
+    TVCArticleIE,
+)
+from .tvigle import TvigleIE
+from .tvland import TVLandIE
+from .tvp import TvpIE, TvpSeriesIE
+from .tvplay import TVPlayIE
+from .tweakers import TweakersIE
+from .twentyfourvideo import TwentyFourVideoIE
+from .twentymin import TwentyMinutenIE
+from .twentytwotracks import (
+    TwentyTwoTracksIE,
+    TwentyTwoTracksGenreIE
+)
+from .twitch import (
+    TwitchVideoIE,
+    TwitchChapterIE,
+    TwitchVodIE,
+    TwitchProfileIE,
+    TwitchPastBroadcastsIE,
+    TwitchBookmarksIE,
+    TwitchStreamIE,
+)
+from .twitter import (
+    TwitterCardIE,
+    TwitterIE,
+    TwitterAmplifyIE,
+)
+from .udemy import (
+    UdemyIE,
+    UdemyCourseIE
+)
+from .udn import UDNEmbedIE
+from .digiteka import DigitekaIE
+from .unistra import UnistraIE
+from .urort import UrortIE
+from .usatoday import USATodayIE
+from .ustream import UstreamIE, UstreamChannelIE
+from .ustudio import UstudioIE
+from .varzesh3 import Varzesh3IE
+from .vbox7 import Vbox7IE
+from .veehd import VeeHDIE
+from .veoh import VeohIE
+from .vessel import VesselIE
+from .vesti import VestiIE
+from .vevo import VevoIE
+from .vgtv import (
+    BTArticleIE,
+    BTVestlendingenIE,
+    VGTVIE,
+)
+from .vh1 import VH1IE
+from .vice import (
+    ViceIE,
+    ViceShowIE,
+)
+from .viddler import ViddlerIE
+from .videodetective import VideoDetectiveIE
+from .videofyme import VideofyMeIE
+from .videomega import VideoMegaIE
+from .videomore import (
+    VideomoreIE,
+    VideomoreVideoIE,
+    VideomoreSeasonIE,
+)
+from .videopremium import VideoPremiumIE
+from .videott import VideoTtIE
+from .vidme import (
+    VidmeIE,
+    VidmeUserIE,
+    VidmeUserLikesIE,
+)
+from .vidzi import VidziIE
+from .vier import VierIE, VierVideosIE
+from .viewster import ViewsterIE
+from .viidea import ViideaIE
+from .vimeo import (
+    VimeoIE,
+    VimeoAlbumIE,
+    VimeoChannelIE,
+    VimeoGroupsIE,
+    VimeoLikesIE,
+    VimeoOndemandIE,
+    VimeoReviewIE,
+    VimeoUserIE,
+    VimeoWatchLaterIE,
+)
+from .vimple import VimpleIE
+from .vine import (
+    VineIE,
+    VineUserIE,
+)
+from .viki import (
+    VikiIE,
+    VikiChannelIE,
+)
+from .vk import (
+    VKIE,
+    VKUserVideosIE,
+)
+from .vlive import VLiveIE
+from .vodlocker import VodlockerIE
+from .voicerepublic import VoiceRepublicIE
+from .voxmedia import VoxMediaIE
+from .vporn import VpornIE
+from .vrt import VRTIE
+from .vube import VubeIE
+from .vuclip import VuClipIE
+from .vulture import VultureIE
+from .walla import WallaIE
+from .washingtonpost import WashingtonPostIE
+from .wat import WatIE
+from .wdr import (
+    WDRIE,
+    WDRMobileIE,
+    WDRMausIE,
+)
+from .webofstories import (
+    WebOfStoriesIE,
+    WebOfStoriesPlaylistIE,
+)
+from .weibo import WeiboIE
+from .weiqitv import WeiqiTVIE
+from .wimp import WimpIE
+from .wistia import WistiaIE
+from .worldstarhiphop import WorldStarHipHopIE
+from .wrzuta import WrzutaIE
+from .wsj import WSJIE
+from .xbef import XBefIE
+from .xboxclips import XboxClipsIE
+from .xfileshare import XFileShareIE
+from .xhamster import (
+    XHamsterIE,
+    XHamsterEmbedIE,
+)
+from .xminus import XMinusIE
+from .xnxx import XNXXIE
+from .xstream import XstreamIE
+from .xtube import XTubeUserIE, XTubeIE
+from .xuite import XuiteIE
+from .xvideos import XVideosIE
+from .xxxymovies import XXXYMoviesIE
+from .yahoo import (
+    YahooIE,
+    YahooSearchIE,
+)
+from .yam import YamIE
+from .yandexmusic import (
+    YandexMusicTrackIE,
+    YandexMusicAlbumIE,
+    YandexMusicPlaylistIE,
+)
+from .yesjapan import YesJapanIE
+from .yinyuetai import YinYueTaiIE
+from .ynet import YnetIE
+from .youjizz import YouJizzIE
+from .youku import YoukuIE
+from .youporn import YouPornIE
+from .yourupload import YourUploadIE
+from .youtube import (
+    YoutubeIE,
+    YoutubeChannelIE,
+    YoutubeFavouritesIE,
+    YoutubeHistoryIE,
+    YoutubeLiveIE,
+    YoutubePlaylistIE,
+    YoutubePlaylistsIE,
+    YoutubeRecommendedIE,
+    YoutubeSearchDateIE,
+    YoutubeSearchIE,
+    YoutubeSearchURLIE,
+    YoutubeShowIE,
+    YoutubeSubscriptionsIE,
+    YoutubeTruncatedIDIE,
+    YoutubeTruncatedURLIE,
+    YoutubeUserIE,
+    YoutubeWatchLaterIE,
+)
+from .zapiks import ZapiksIE
+from .zdf import ZDFIE, ZDFChannelIE
+from .zingmp3 import (
+    ZingMp3SongIE,
+    ZingMp3AlbumIE,
+)
+from .zippcast import ZippCastIE
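
extractors.py is a flat, import-order-sensitive registry: every public name ends in IE, and downstream code can derive the extractor list purely from that naming convention. A sketch of the likely consumption pattern (illustrative, not the verbatim package code):

    # Collect extractor classes from the flat registry module.
    import youtube_dl.extractor.extractors as extractors

    classes = [
        klass for name, klass in vars(extractors).items()
        if name.endswith('IE') and name != 'GenericIE'
    ]
    # GenericIE is the catch-all; keeping it last lets specific sites match first.
    classes.append(extractors.GenericIE)
    print(len(classes))
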
diff --git a/youtube_dl/extractor/extremetube.py b/youtube_dl/extractor/extremetube.py
index c826a5404a4f7da298927460f1f8e41dd013d3a7..3403581fddf08a0928a8e4c5b22e740117646bd2 100644 (file)
@@ -3,23 +3,20 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_parse_qs,
-    compat_urllib_request,
-)
 from ..utils import (
-    qualities,
+    int_or_none,
+    sanitized_Request,
     str_to_int,
 )
 
 
 class ExtremeTubeIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>extremetube\.com/.*?video/.+?(?P<id>[0-9]+))(?:[/?&]|$)'
+    _VALID_URL = r'https?://(?:www\.)?extremetube\.com/(?:[^/]+/)?video/(?P<id>[^/#?&]+)'
     _TESTS = [{
         'url': 'http://www.extremetube.com/video/music-video-14-british-euro-brit-european-cumshots-swallow-652431',
         'md5': '344d0c6d50e2f16b06e49ca011d8ac69',
         'info_dict': {
-            'id': '652431',
+            'id': 'music-video-14-british-euro-brit-european-cumshots-swallow-652431',
             'ext': 'mp4',
             'title': 'Music Video 14 british euro brit european cumshots swallow',
             'uploader': 'unknown',
@@ -29,14 +26,18 @@ class ExtremeTubeIE(InfoExtractor):
     }, {
         'url': 'http://www.extremetube.com/gay/video/abcde-1234',
         'only_matching': True,
+    }, {
+        'url': 'http://www.extremetube.com/video/latina-slut-fucked-by-fat-black-dick',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.extremetube.com/video/652431',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        url = 'http://www.' + mobj.group('url')
+        video_id = self._match_id(url)
 
-        req = compat_urllib_request.Request(url)
+        req = sanitized_Request(url)
         req.add_header('Cookie', 'age_verified=1')
         webpage = self._download_webpage(req, video_id)
 
@@ -49,20 +50,36 @@ class ExtremeTubeIE(InfoExtractor):
             r'Views:\s*</strong>\s*<span>([\d,\.]+)</span>',
             webpage, 'view count', fatal=False))
 
-        flash_vars = compat_parse_qs(self._search_regex(
-            r'<param[^>]+?name="flashvars"[^>]+?value="([^"]+)"', webpage, 'flash vars'))
+        flash_vars = self._parse_json(
+            self._search_regex(
+                r'var\s+flashvars\s*=\s*({.+?});', webpage, 'flash vars'),
+            video_id)
 
         formats = []
-        quality = qualities(['180p', '240p', '360p', '480p', '720p', '1080p'])
-        for k, vals in flash_vars.items():
-            m = re.match(r'quality_(?P<quality>[0-9]+p)$', k)
-            if m is not None:
-                formats.append({
-                    'format_id': m.group('quality'),
-                    'quality': quality(m.group('quality')),
-                    'url': vals[0],
+        for quality_key, video_url in flash_vars.items():
+            height = int_or_none(self._search_regex(
+                r'quality_(\d+)[pP]$', quality_key, 'height', default=None))
+            if not height:
+                continue
+            f = {
+                'url': video_url,
+            }
+            mobj = re.search(
+                r'/(?P<height>\d{3,4})[pP]_(?P<bitrate>\d+)[kK]_\d+', video_url)
+            if mobj:
+                height = int(mobj.group('height'))
+                bitrate = int(mobj.group('bitrate'))
+                f.update({
+                    'format_id': '%dp-%dk' % (height, bitrate),
+                    'height': height,
+                    'tbr': bitrate,
                 })
-
+            else:
+                f.update({
+                    'format_id': '%dp' % height,
+                    'height': height,
+                })
+            formats.append(f)
         self._sort_formats(formats)
 
         return {
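
The new format loop reads the height from the flashvars key (e.g. quality_720p) and then tries to refine both height and bitrate from a pattern embedded in the URL path. That URL regex on its own, with a hypothetical CDN path:

    import re

    video_url = 'http://cdn.example.com/videos/720P_1500K_652431.mp4'  # hypothetical
    mobj = re.search(
        r'/(?P<height>\d{3,4})[pP]_(?P<bitrate>\d+)[kK]_\d+', video_url)
    print(mobj.group('height'), mobj.group('bitrate'))  # -> 720 1500
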
diff --git a/youtube_dl/extractor/facebook.py b/youtube_dl/extractor/facebook.py
index e17bb9aeac51e2e10e2b68b4391d3022af35bcd5..f5bbd39d2d0e90996c118e3fae325034fc2bbb6d 100644 (file)
@@ -6,35 +6,54 @@ import socket
 
 from .common import InfoExtractor
 from ..compat import (
+    compat_etree_fromstring,
     compat_http_client,
-    compat_str,
     compat_urllib_error,
     compat_urllib_parse_unquote,
-    compat_urllib_request,
+    compat_urllib_parse_unquote_plus,
 )
 from ..utils import (
+    error_to_compat_str,
     ExtractorError,
-    int_or_none,
     limit_length,
+    sanitized_Request,
     urlencode_postdata,
+    get_element_by_id,
+    clean_html,
 )
 
 
 class FacebookIE(InfoExtractor):
     _VALID_URL = r'''(?x)
-        https?://(?:\w+\.)?facebook\.com/
-        (?:[^#]*?\#!/)?
-        (?:
-            (?:video/video\.php|photo\.php|video\.php|video/embed)\?(?:.*?)
-            (?:v|video_id)=|
-            [^/]+/videos/(?:[^/]+/)?
-        )
-        (?P<id>[0-9]+)
-        (?:.*)'''
+                (?:
+                    https?://
+                        (?:\w+\.)?facebook\.com/
+                        (?:[^#]*?\#!/)?
+                        (?:
+                            (?:
+                                video/video\.php|
+                                photo\.php|
+                                video\.php|
+                                video/embed|
+                                story\.php
+                            )\?(?:.*?)(?:v|video_id|story_fbid)=|
+                            [^/]+/videos/(?:[^/]+/)?|
+                            [^/]+/posts/|
+                            groups/[^/]+/permalink/
+                        )|
+                    facebook:
+                )
+                (?P<id>[0-9]+)
+                '''
     _LOGIN_URL = 'https://www.facebook.com/login.php?next=http%3A%2F%2Ffacebook.com%2Fhome.php&login_attempt=1'
     _CHECKPOINT_URL = 'https://www.facebook.com/checkpoint/?next=http%3A%2F%2Ffacebook.com%2Fhome.php&_fb_noscript=1'
     _NETRC_MACHINE = 'facebook'
     IE_NAME = 'facebook'
+
+    _CHROME_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36'
+
+    _VIDEO_PAGE_TEMPLATE = 'https://www.facebook.com/video/video.php?v=%s'
+
     _TESTS = [{
         'url': 'https://www.facebook.com/video.php?v=637842556329505&fref=nf',
         'md5': '6a40d33c0eccbb1af76cf0485a052659',
@@ -42,6 +61,7 @@ class FacebookIE(InfoExtractor):
             'id': '637842556329505',
             'ext': 'mp4',
             'title': 're:Did you know Kei Nishikori is the first Asian man to ever reach a Grand Slam',
+            'uploader': 'Tennis on Facebook',
         }
     }, {
         'note': 'Video without discernible title',
@@ -50,10 +70,48 @@ class FacebookIE(InfoExtractor):
             'id': '274175099429670',
             'ext': 'mp4',
             'title': 'Facebook video #274175099429670',
+            'uploader': 'Asif Nawab Butt',
         },
         'expected_warnings': [
             'title'
         ]
+    }, {
+        'note': 'Video with DASH manifest',
+        'url': 'https://www.facebook.com/video.php?v=957955867617029',
+        'md5': '54706e4db4f5ad58fbad82dde1f1213f',
+        'info_dict': {
+            'id': '957955867617029',
+            'ext': 'mp4',
+            'title': 'When you post epic content on instagram.com/433 8 million followers, this is ...',
+            'uploader': 'Demy de Zeeuw',
+        },
+    }, {
+        'url': 'https://www.facebook.com/maxlayn/posts/10153807558977570',
+        'md5': '037b1fa7f3c2d02b7a0d7bc16031ecc6',
+        'info_dict': {
+            'id': '544765982287235',
+            'ext': 'mp4',
+            'title': '"What are you doing running in the snow?"',
+            'uploader': 'FailArmy',
+        }
+    }, {
+        'url': 'https://m.facebook.com/story.php?story_fbid=1035862816472149&id=116132035111903',
+        'md5': '1deb90b6ac27f7efcf6d747c8a27f5e3',
+        'info_dict': {
+            'id': '1035862816472149',
+            'ext': 'mp4',
+            'title': 'What the Flock Is Going On In New Zealand  Credit: ViralHog',
+            'uploader': 'S. Saint',
+        },
+    }, {
+        'note': 'swf params escaped',
+        'url': 'https://www.facebook.com/barackobama/posts/10153664894881749',
+        'md5': '97ba073838964d12c70566e0085c2b91',
+        'info_dict': {
+            'id': '10153664894881749',
+            'ext': 'mp4',
+            'title': 'Facebook video #10153664894881749',
+        },
     }, {
         'url': 'https://www.facebook.com/video.php?v=10204634152394104',
         'only_matching': True,
@@ -63,6 +121,12 @@ class FacebookIE(InfoExtractor):
     }, {
         'url': 'https://www.facebook.com/ChristyClarkForBC/videos/vb.22819070941/10153870694020942/?type=2&theater',
         'only_matching': True,
+    }, {
+        'url': 'facebook:544765982287235',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.facebook.com/groups/164828000315060/permalink/764967300301124/',
+        'only_matching': True,
     }]
 
     def _login(self):
@@ -70,8 +134,8 @@ class FacebookIE(InfoExtractor):
         if useremail is None:
             return
 
-        login_page_req = compat_urllib_request.Request(self._LOGIN_URL)
-        login_page_req.add_header('Cookie', 'locale=en_US')
+        login_page_req = sanitized_Request(self._LOGIN_URL)
+        self._set_cookie('facebook.com', 'locale', 'en_US')
         login_page = self._download_webpage(login_page_req, None,
                                             note='Downloading login page',
                                             errnote='Unable to download login page')
@@ -91,43 +155,80 @@ class FacebookIE(InfoExtractor):
             'timezone': '-60',
             'trynum': '1',
         }
-        request = compat_urllib_request.Request(self._LOGIN_URL, urlencode_postdata(login_form))
+        request = sanitized_Request(self._LOGIN_URL, urlencode_postdata(login_form))
         request.add_header('Content-Type', 'application/x-www-form-urlencoded')
         try:
             login_results = self._download_webpage(request, None,
                                                    note='Logging in', errnote='unable to fetch login page')
             if re.search(r'<form(.*)name="login"(.*)</form>', login_results) is not None:
-                self._downloader.report_warning('unable to log in: bad username/password, or exceded login rate limit (~3/min). Check credentials or wait.')
+                error = self._html_search_regex(
+                    r'(?s)<div[^>]+class=(["\']).*?login_error_box.*?\1[^>]*><div[^>]*>.*?</div><div[^>]*>(?P<error>.+?)</div>',
+                    login_results, 'login error', default=None, group='error')
+                if error:
+                    raise ExtractorError('Unable to login: %s' % error, expected=True)
+                self._downloader.report_warning('unable to log in: bad username/password, or exceeded login rate limit (~3/min). Check credentials or wait.')
+                return
+
+            fb_dtsg = self._search_regex(
+                r'name="fb_dtsg" value="(.+?)"', login_results, 'fb_dtsg', default=None)
+            h = self._search_regex(
+                r'name="h"\s+(?:\w+="[^"]+"\s+)*?value="([^"]+)"', login_results, 'h', default=None)
+
+            if not fb_dtsg or not h:
                 return
 
             check_form = {
-                'fb_dtsg': self._search_regex(r'name="fb_dtsg" value="(.+?)"', login_results, 'fb_dtsg'),
-                'h': self._search_regex(
-                    r'name="h"\s+(?:\w+="[^"]+"\s+)*?value="([^"]+)"', login_results, 'h'),
+                'fb_dtsg': fb_dtsg,
+                'h': h,
                 'name_action_selected': 'dont_save',
             }
-            check_req = compat_urllib_request.Request(self._CHECKPOINT_URL, urlencode_postdata(check_form))
+            check_req = sanitized_Request(self._CHECKPOINT_URL, urlencode_postdata(check_form))
             check_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
             check_response = self._download_webpage(check_req, None,
                                                     note='Confirming login')
             if re.search(r'id="checkpointSubmitButton"', check_response) is not None:
-                self._downloader.report_warning('Unable to confirm login, you have to login in your brower and authorize the login.')
+                self._downloader.report_warning('Unable to confirm login, you have to login in your browser and authorize the login.')
         except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
-            self._downloader.report_warning('unable to log in: %s' % compat_str(err))
+            self._downloader.report_warning('unable to log in: %s' % error_to_compat_str(err))
             return
 
     def _real_initialize(self):
         self._login()
 
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        url = 'https://www.facebook.com/video/video.php?v=%s' % video_id
-        webpage = self._download_webpage(url, video_id)
+    def _extract_from_url(self, url, video_id, fatal_if_no_video=True):
+        req = sanitized_Request(url)
+        req.add_header('User-Agent', self._CHROME_USER_AGENT)
+        webpage = self._download_webpage(req, video_id)
+
+        video_data = None
 
-        BEFORE = '{swf.addParam(param[0], param[1]);});\n'
+        BEFORE = '{swf.addParam(param[0], param[1]);});'
         AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});'
-        m = re.search(re.escape(BEFORE) + '(.*?)' + re.escape(AFTER), webpage)
-        if not m:
+        m = re.search(re.escape(BEFORE) + '(?:\n|\\\\n)(.*?)' + re.escape(AFTER), webpage)
+        if m:
+            swf_params = m.group(1).replace('\\\\', '\\').replace('\\"', '"')
+            data = dict(json.loads(swf_params))
+            params_raw = compat_urllib_parse_unquote(data['params'])
+            video_data = json.loads(params_raw)['video_data']
+
+        def video_data_list2dict(video_data):
+            ret = {}
+            for item in video_data:
+                format_id = item['stream_type']
+                ret.setdefault(format_id, []).append(item)
+            return ret
+
+        if not video_data:
+            server_js_data = self._parse_json(self._search_regex(
+                r'handleServerJS\(({.+})\);', webpage, 'server js data', default='{}'), video_id)
+            for item in server_js_data.get('instances', []):
+                if item[1][0] == 'VideoConfig':
+                    video_data = video_data_list2dict(item[2][0]['videoData'])
+                    break
+
+        if not video_data:
+            if not fatal_if_no_video:
+                return webpage, False
             m_msg = re.search(r'class="[^"]*uiInterstitialContent[^"]*"><div>(.*?)</div>', webpage)
             if m_msg is not None:
                 raise ExtractorError(
@@ -135,37 +236,74 @@ class FacebookIE(InfoExtractor):
                     expected=True)
             else:
                 raise ExtractorError('Cannot parse data')
-        data = dict(json.loads(m.group(1)))
-        params_raw = compat_urllib_parse_unquote(data['params'])
-        params = json.loads(params_raw)
-        video_data = params['video_data'][0]
 
         formats = []
-        for quality in ['sd', 'hd']:
-            src = video_data.get('%s_src' % quality)
-            if src is not None:
-                formats.append({
-                    'format_id': quality,
-                    'url': src,
-                })
+        for format_id, f in video_data.items():
+            if not f or not isinstance(f, list):
+                continue
+            for quality in ('sd', 'hd'):
+                for src_type in ('src', 'src_no_ratelimit'):
+                    src = f[0].get('%s_%s' % (quality, src_type))
+                    if src:
+                        preference = -10 if format_id == 'progressive' else 0
+                        if quality == 'hd':
+                            preference += 5
+                        formats.append({
+                            'format_id': '%s_%s_%s' % (format_id, quality, src_type),
+                            'url': src,
+                            'preference': preference,
+                        })
+            dash_manifest = f[0].get('dash_manifest')
+            if dash_manifest:
+                formats.extend(self._parse_mpd_formats(
+                    compat_etree_fromstring(compat_urllib_parse_unquote_plus(dash_manifest))))
         if not formats:
             raise ExtractorError('Cannot find video formats')
 
+        self._sort_formats(formats)
+
         video_title = self._html_search_regex(
             r'<h2\s+[^>]*class="uiHeaderTitle"[^>]*>([^<]*)</h2>', webpage, 'title',
             default=None)
         if not video_title:
             video_title = self._html_search_regex(
                 r'(?s)<span class="fbPhotosPhotoCaption".*?id="fbPhotoPageCaption"><span class="hasCaption">(.*?)</span>',
-                webpage, 'alternative title', fatal=False)
+                webpage, 'alternative title', default=None)
             video_title = limit_length(video_title, 80)
         if not video_title:
             video_title = 'Facebook video #%s' % video_id
+        uploader = clean_html(get_element_by_id('fbPhotoPageAuthorName', webpage))
 
-        return {
+        info_dict = {
             'id': video_id,
             'title': video_title,
             'formats': formats,
-            'duration': int_or_none(video_data.get('video_duration')),
-            'thumbnail': video_data.get('thumbnail_src'),
+            'uploader': uploader,
         }
+
+        return webpage, info_dict
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        real_url = self._VIDEO_PAGE_TEMPLATE % video_id if url.startswith('facebook:') else url
+        webpage, info_dict = self._extract_from_url(real_url, video_id, fatal_if_no_video=False)
+
+        if info_dict:
+            return info_dict
+
+        if '/posts/' in url:
+            entries = [
+                self.url_result('facebook:%s' % vid, FacebookIE.ie_key())
+                for vid in self._parse_json(
+                    self._search_regex(
+                        r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])',
+                        webpage, 'video ids', group='ids'),
+                    video_id)]
+
+            return self.playlist_result(entries, video_id)
+        else:
+            _, info_dict = self._extract_from_url(
+                self._VIDEO_PAGE_TEMPLATE % video_id,
+                video_id, fatal_if_no_video=True)
+            return info_dict
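
When the swf-params path fails, the extractor now falls back to the handleServerJS payload, whose videoData arrives as a flat list; video_data_list2dict regroups it by stream_type so both code paths feed the same format loop. The grouping step in isolation, on hypothetical entries shaped like Facebook's videoData:

    def video_data_list2dict(video_data):
        ret = {}
        for item in video_data:
            ret.setdefault(item['stream_type'], []).append(item)
        return ret

    items = [  # hypothetical entries
        {'stream_type': 'progressive', 'sd_src': 'http://example.com/sd.mp4'},
        {'stream_type': 'progressive', 'hd_src': 'http://example.com/hd.mp4'},
    ]
    grouped = video_data_list2dict(items)
    print(len(grouped['progressive']))  # -> 2
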
diff --git a/youtube_dl/extractor/faz.py b/youtube_dl/extractor/faz.py
index cebdd0193a82eaccc673dffe9d001f766e9e31d1..fd535457dc56a589eaf9e062dc40fe5374735020 100644 (file)
@@ -2,6 +2,11 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
+from ..utils import (
+    xpath_element,
+    xpath_text,
+    int_or_none,
+)
 
 
 class FazIE(InfoExtractor):
@@ -37,31 +42,32 @@ class FazIE(InfoExtractor):
         video_id = self._match_id(url)
 
         webpage = self._download_webpage(url, video_id)
+        description = self._og_search_description(webpage)
         config_xml_url = self._search_regex(
-            r'writeFLV\(\'(.+?)\',', webpage, 'config xml url')
+            r'videoXMLURL\s*=\s*"([^"]+)', webpage, 'config xml url')
         config = self._download_xml(
             config_xml_url, video_id, 'Downloading config xml')
 
-        encodings = config.find('ENCODINGS')
+        encodings = xpath_element(config, 'ENCODINGS', 'encodings', True)
         formats = []
         for pref, code in enumerate(['LOW', 'HIGH', 'HQ']):
-            encoding = encodings.find(code)
-            if encoding is None:
-                continue
-            encoding_url = encoding.find('FILENAME').text
-            formats.append({
-                'url': encoding_url,
-                'format_id': code.lower(),
-                'quality': pref,
-            })
+            encoding = xpath_element(encodings, code)
+            if encoding is not None:
+                encoding_url = xpath_text(encoding, 'FILENAME')
+                if encoding_url:
+                    formats.append({
+                        'url': encoding_url,
+                        'format_id': code.lower(),
+                        'quality': pref,
+                        'tbr': int_or_none(xpath_text(encoding, 'AVERAGEBITRATE')),
+                    })
         self._sort_formats(formats)
 
-        descr = self._html_search_regex(
-            r'<p class="Content Copy">(.*?)</p>', webpage, 'description', fatal=False)
         return {
             'id': video_id,
             'title': self._og_search_title(webpage),
             'formats': formats,
-            'description': descr,
-            'thumbnail': config.find('STILL/STILL_BIG').text,
+            'description': description.strip() if description else None,
+            'thumbnail': xpath_text(config, 'STILL/STILL_BIG'),
+            'duration': int_or_none(xpath_text(config, 'DURATION')),
         }
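
The faz.py rewrite trades bare Element.find(...).text chains, which raise AttributeError on a missing node, for the utils xpath_element/xpath_text helpers, which return None (or raise a labelled error when fatal). Their plain-ElementTree equivalent on a hypothetical config document:

    import xml.etree.ElementTree as etree

    config = etree.fromstring(  # hypothetical XML shaped like faz.py's config
        '<VIDEO><ENCODINGS><HIGH><FILENAME>clip_high.mp4</FILENAME>'
        '<AVERAGEBITRATE>1200</AVERAGEBITRATE></HIGH></ENCODINGS>'
        '<DURATION>185</DURATION></VIDEO>')

    encodings = config.find('ENCODINGS')  # xpath_element(config, 'ENCODINGS', ..., True)
    encoding = encodings.find('HIGH')
    print(encoding.findtext('FILENAME'))  # xpath_text -> clip_high.mp4
    print(config.findtext('DURATION'))  # -> 185
    print(config.findtext('STILL/STILL_BIG'))  # missing node -> None, not a crash
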
diff --git a/youtube_dl/extractor/fc2.py b/youtube_dl/extractor/fc2.py
index 1ccc1a9642bb09ed84bdd2747c665520ab3c98c4..c7d69ff1f980de46bd4ecce96e4ac301b1f1be59 100644 (file)
@@ -5,17 +5,18 @@ import hashlib
 
 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse,
     compat_urllib_request,
     compat_urlparse,
 )
 from ..utils import (
     ExtractorError,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
 class FC2IE(InfoExtractor):
-    _VALID_URL = r'^http://video\.fc2\.com/(?:[^/]+/)?content/(?P<id>[^/]+)'
+    _VALID_URL = r'^https?://video\.fc2\.com/(?:[^/]+/)*content/(?P<id>[^/]+)'
     IE_NAME = 'fc2'
     _NETRC_MACHINE = 'fc2'
     _TESTS = [{
@@ -35,8 +36,11 @@ class FC2IE(InfoExtractor):
         'params': {
             'username': 'ytdl@yt-dl.org',
             'password': '(snip)',
-            'skip': 'requires actual password'
-        }
+        },
+        'skip': 'requires actual password',
+    }, {
+        'url': 'http://video.fc2.com/en/a/content/20130926eZpARwsF',
+        'only_matching': True,
     }]
 
     def _login(self):
@@ -52,11 +56,8 @@ class FC2IE(InfoExtractor):
             'Submit': ' Login ',
         }
 
-        # Convert to UTF-8 *before* urlencode because Python 2.x's urlencode
-        # chokes on unicode
-        login_form = dict((k.encode('utf-8'), v.encode('utf-8')) for k, v in login_form_strs.items())
-        login_data = compat_urllib_parse.urlencode(login_form).encode('utf-8')
-        request = compat_urllib_request.Request(
+        login_data = urlencode_postdata(login_form_strs)
+        request = sanitized_Request(
             'https://secure.id.fc2.com/index.php?mode=login&switch_language=en', login_data)
 
         login_results = self._download_webpage(request, None, note='Logging in', errnote='Unable to log in')
@@ -65,7 +66,7 @@ class FC2IE(InfoExtractor):
             return False
 
         # this is also needed
-        login_redir = compat_urllib_request.Request('http://id.fc2.com/?mode=redirect&login=done')
+        login_redir = sanitized_Request('http://id.fc2.com/?mode=redirect&login=done')
         self._download_webpage(
             login_redir, None, note='Login redirect', errnote='Login redirect failed')
 
@@ -80,13 +81,13 @@ class FC2IE(InfoExtractor):
 
         title = self._og_search_title(webpage)
         thumbnail = self._og_search_thumbnail(webpage)
-        refer = url.replace('/content/', '/a/content/')
+        refer = url.replace('/content/', '/a/content/') if '/a/content/' not in url else url
 
         mimi = hashlib.md5((video_id + '_gGddgPfeaf_gzyr').encode('utf-8')).hexdigest()
 
         info_url = (
-            "http://video.fc2.com/ginfo.php?mimi={1:s}&href={2:s}&v={0:s}&fversion=WIN%2011%2C6%2C602%2C180&from=2&otag=0&upid={0:s}&tk=null&".
-            format(video_id, mimi, compat_urllib_request.quote(refer, safe='').replace('.', '%2E')))
+            'http://video.fc2.com/ginfo.php?mimi={1:s}&href={2:s}&v={0:s}&fversion=WIN%2011%2C6%2C602%2C180&from=2&otag=0&upid={0:s}&tk=null&'.
+            format(video_id, mimi, compat_urllib_request.quote(refer, safe=b'').replace('.', '%2E')))
 
         info_webpage = self._download_webpage(
             info_url, video_id, note='Downloading info page')
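The ginfo.php request above authenticates itself with an MD5 token derived from the video id plus a static salt, both visible in the hunk. A worked example using the id from the only_matching test:

import hashlib

video_id = '20130926eZpARwsF'  # id from the only_matching test above
mimi = hashlib.md5((video_id + '_gGddgPfeaf_gzyr').encode('utf-8')).hexdigest()
print(mimi)  # 32-char hex digest passed as the mimi= query parameter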
diff --git a/youtube_dl/extractor/fczenit.py b/youtube_dl/extractor/fczenit.py
new file mode 100644 (file)
index 0000000..f1f150e
--- /dev/null
@@ -0,0 +1,41 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+
+
+class FczenitIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?fc-zenit\.ru/video/gl(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://fc-zenit.ru/video/gl6785/',
+        'md5': '458bacc24549173fe5a5aa29174a5606',
+        'info_dict': {
+            'id': '6785',
+            'ext': 'mp4',
+            'title': '«Зенит-ТВ»: как Олег Шатов играл против «Урала»',
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        video_title = self._html_search_regex(r'<div class=\"photoalbum__title\">([^<]+)', webpage, 'title')
+
+        bitrates_raw = self._html_search_regex(r'bitrates:.*\n(.*)\]', webpage, 'video URL')
+        bitrates = re.findall(r'url:.?\'(.+?)\'.*?bitrate:.?([0-9]{3}?)', bitrates_raw)
+
+        formats = [{
+            'url': furl,
+            'tbr': int(tbr),
+        } for furl, tbr in bitrates]
+
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': video_title,
+            'formats': formats,
+        }
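The two regexes in this new extractor cooperate: the first isolates the line carrying the JavaScript bitrates array, the second pulls (url, bitrate) pairs out of it; the captured bitrates come back as strings, hence the int() cast above. A small demonstration against an invented page fragment shaped like the markup the extractor expects:

import re

# Invented page fragment, not captured from the site.
webpage = """
    bitrates:
        [{url: 'http://example.com/video_400.mp4', bitrate: 400}, {url: 'http://example.com/video_800.mp4', bitrate: 800}]
"""
bitrates_raw = re.search(r'bitrates:.*\n(.*)\]', webpage).group(1)
print(re.findall(r'url:.?\'(.+?)\'.*?bitrate:.?([0-9]{3}?)', bitrates_raw))
# [('http://example.com/video_400.mp4', '400'), ('http://example.com/video_800.mp4', '800')]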
index 298227d5793770c82d8868256d655fa7ea3dc31c..e8936cb2468f78bd2c1b59008a13e9411204380e 100644 (file)
@@ -4,7 +4,7 @@ from .common import InfoExtractor
 
 
 class FirstpostIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?firstpost\.com/[^/]+/.*-(?P<id>[0-9]+)\.html'
+    _VALID_URL = r'https?://(?:www\.)?firstpost\.com/[^/]+/.*-(?P<id>[0-9]+)\.html'
 
     _TEST = {
         'url': 'http://www.firstpost.com/india/india-to-launch-indigenous-aircraft-carrier-monday-1025403.html',
index 510d4b108944d1f220c45ddc2fbe85cdad6114ca..88bca100763337011a444369543a1479296e097e 100644 (file)
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..utils import int_or_none
+from ..compat import compat_xpath
+from ..utils import (
+    int_or_none,
+    qualities,
+    unified_strdate,
+    xpath_attr,
+    xpath_element,
+    xpath_text,
+    xpath_with_ns,
+)
 
 
 class FirstTVIE(InfoExtractor):
     IE_NAME = '1tv'
     IE_DESC = 'Первый канал'
-    _VALID_URL = r'http://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>.+)'
+    _VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+p?(?P<id>\d+)'
 
     _TESTS = [{
-        'url': 'http://www.1tv.ru/videoarchive/73390',
-        'md5': '777f525feeec4806130f4f764bc18a4f',
+        # single format via video_materials.json API
+        'url': 'http://www.1tv.ru/prj/inprivate/vypusk/35930',
+        'md5': '82a2777648acae812d58b3f5bd42882b',
         'info_dict': {
-            'id': '73390',
+            'id': '35930',
             'ext': 'mp4',
-            'title': 'Олимпийские канатные дороги',
-            'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
+            'title': 'Гость Людмила Сенчина. Наедине со всеми. Выпуск от 12.02.2015',
+            'description': 'md5:357933adeede13b202c7c21f91b871b2',
             'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
-            'duration': 149,
-            'like_count': int,
-            'dislike_count': int,
+            'upload_date': '20150212',
+            'duration': 2694,
         },
-        'skip': 'Only works from Russia',
     }, {
-        'url': 'http://www.1tv.ru/prj/inprivate/vypusk/35930',
-        'md5': 'a1b6b60d530ebcf8daacf4565762bbaf',
+        # multiple formats via video_materials.json API
+        'url': 'http://www.1tv.ru/video_archive/projects/dobroeutro/p113641',
         'info_dict': {
-            'id': '35930',
+            'id': '113641',
             'ext': 'mp4',
-            'title': 'Наедине со всеми. Людмила Сенчина',
-            'description': 'md5:89553aed1d641416001fe8d450f06cb9',
+            'title': 'Весенняя аллергия. Доброе утро. Фрагмент выпуска от 07.04.2016',
+            'description': 'md5:8dcebb3dded0ff20fade39087fd1fee2',
             'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
-            'duration': 2694,
+            'upload_date': '20160407',
+            'duration': 179,
+            'formats': 'mincount:3',
+        },
+        'params': {
+            'skip_download': True,
         },
-        'skip': 'Only works from Russia',
+    }, {
+        # single format only available via ONE_ONLINE_VIDEOS.archive_single_xml API
+        'url': 'http://www.1tv.ru/video_archive/series/f7552/p47038',
+        'md5': '519d306c5b5669761fd8906c39dbee23',
+        'info_dict': {
+            'id': '47038',
+            'ext': 'mp4',
+            'title': '"Побег". Второй сезон. 3 серия',
+            'description': 'md5:3abf8f6b9bce88201c33e9a3d794a00b',
+            'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
+            'upload_date': '20120516',
+            'duration': 3080,
+        },
+    }, {
+        'url': 'http://www.1tv.ru/videoarchive/9967',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        webpage = self._download_webpage(url, video_id, 'Downloading page')
+        # Videos with multiple formats only available via this API
+        video = self._download_json(
+            'http://www.1tv.ru/video_materials.json?legacy_id=%s' % video_id,
+            video_id, fatal=False)
 
-        video_url = self._html_search_regex(
-            r'''(?s)(?:jwplayer\('flashvideoportal_1'\)\.setup\({|var\s+playlistObj\s*=).*?'file'\s*:\s*'([^']+)'.*?}\);''',
-            webpage, 'video URL')
+        description, thumbnail, upload_date, duration = [None] * 4
 
-        title = self._html_search_regex(
-            [r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
-             r"'title'\s*:\s*'([^']+)'"], webpage, 'title')
-        description = self._html_search_regex(
-            r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>',
-            webpage, 'description', default=None) or self._html_search_meta(
-                'description', webpage, 'description')
+        if video:
+            item = video[0]
+            title = item['title']
+            quality = qualities(('ld', 'sd', 'hd', ))
+            formats = [{
+                'url': f['src'],
+                'format_id': f.get('name'),
+                'quality': quality(f.get('name')),
+            } for f in item['mbr'] if f.get('src')]
+            thumbnail = item.get('poster')
+        else:
+            # Some videos are not available via video_materials.json
+            video = self._download_xml(
+                'http://www.1tv.ru/owa/win/ONE_ONLINE_VIDEOS.archive_single_xml?pid=%s' % video_id,
+                video_id)
+
+            NS_MAP = {
+                'media': 'http://search.yahoo.com/mrss/',
+            }
 
-        thumbnail = self._og_search_thumbnail(webpage)
-        duration = self._og_search_property(
-            'video:duration', webpage,
-            'video duration', fatal=False)
+            item = xpath_element(video, './channel/item', fatal=True)
+            title = xpath_text(item, './title', fatal=True)
+            formats = [{
+                'url': content.attrib['url'],
+            } for content in item.findall(
+                compat_xpath(xpath_with_ns('./media:content', NS_MAP))) if content.attrib.get('url')]
+            thumbnail = xpath_attr(
+                item, xpath_with_ns('./media:thumbnail', NS_MAP), 'url')
 
-        like_count = self._html_search_regex(
-            r'title="Понравилось".*?/></label> \[(\d+)\]',
-            webpage, 'like count', default=None)
-        dislike_count = self._html_search_regex(
-            r'title="Не понравилось".*?/></label> \[(\d+)\]',
-            webpage, 'dislike count', default=None)
+        self._sort_formats(formats)
+
+        webpage = self._download_webpage(url, video_id, 'Downloading page', fatal=False)
+        if webpage:
+            title = self._html_search_regex(
+                (r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
+                 r"'title'\s*:\s*'([^']+)'"),
+                webpage, 'title', default=None) or title
+            description = self._html_search_regex(
+                r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>',
+                webpage, 'description', default=None) or self._html_search_meta(
+                'description', webpage, 'description')
+            thumbnail = thumbnail or self._og_search_thumbnail(webpage)
+            duration = int_or_none(self._html_search_meta(
+                'video:duration', webpage, 'video duration', fatal=False))
+            upload_date = unified_strdate(self._html_search_meta(
+                'ya:ovs:upload_date', webpage, 'upload date', fatal=False))
 
         return {
             'id': video_id,
-            'url': video_url,
             'thumbnail': thumbnail,
             'title': title,
             'description': description,
+            'upload_date': upload_date,
             'duration': int_or_none(duration),
-            'like_count': int_or_none(like_count),
-            'dislike_count': int_or_none(dislike_count),
+            'formats': formats
         }
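The rewrite above tries the video_materials.json API first, falls back to the legacy ONE_ONLINE_VIDEOS XML API when the JSON endpoint yields nothing, and finally overlays title, description, duration and upload date scraped from the page when it loads. Format ranking in the JSON branch leans on the qualities() helper; a sketch of its assumed behaviour (it should mirror youtube_dl.utils.qualities, mapping a known quality id to its index in the preference tuple and anything else to -1):

def qualities(quality_ids):
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q

quality = qualities(('ld', 'sd', 'hd'))
print(quality('hd'), quality('sd'), quality('4k'))  # 2 1 -1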
index 157094e8c99a598a66a98e51ee70e70502494057..6b834541636533d808ce396ae456f980f989c731 100644 (file)
@@ -1,23 +1,24 @@
 from __future__ import unicode_literals
 
+import re
+
 from .common import InfoExtractor
 from ..compat import (
-    compat_str,
-    compat_urllib_parse,
+    compat_parse_qs,
+    compat_urllib_parse_urlencode,
+    compat_urllib_parse_urlparse,
+    compat_urlparse,
 )
 from ..utils import (
     ExtractorError,
+    parse_duration,
+    replace_extension,
 )
 
 
 class FiveMinIE(InfoExtractor):
     IE_NAME = '5min'
-    _VALID_URL = r'''(?x)
-        (?:https?://[^/]*?5min\.com/Scripts/PlayerSeed\.js\?(?:.*?&)?playList=|
-            https?://(?:(?:massively|www)\.)?joystiq\.com/video/|
-            5min:)
-        (?P<id>\d+)
-        '''
+    _VALID_URL = r'(?:5min:(?P<id>\d+)(?::(?P<sid>\d+))?|https?://[^/]*?5min\.com/Scripts/PlayerSeed\.js\?(?P<query>.*))'
 
     _TESTS = [
         {
@@ -28,6 +29,7 @@ class FiveMinIE(InfoExtractor):
                 'id': '518013791',
                 'ext': 'mp4',
                 'title': 'iPad Mini with Retina Display Review',
+                'duration': 177,
             },
         },
         {
@@ -38,47 +40,112 @@ class FiveMinIE(InfoExtractor):
                 'id': '518086247',
                 'ext': 'mp4',
                 'title': 'How to Make a Next-Level Fruit Salad',
+                'duration': 184,
             },
+            'skip': 'no longer available',
         },
     ]
+    _ERRORS = {
+        'ErrorVideoNotExist': 'We\'re sorry, but the video you are trying to watch does not exist.',
+        'ErrorVideoNoLongerAvailable': 'We\'re sorry, but the video you are trying to watch is no longer available.',
+        'ErrorVideoRejected': 'We\'re sorry, but the video you are trying to watch has been removed.',
+        'ErrorVideoUserNotGeo': 'We\'re sorry, but the video you are trying to watch cannot be viewed from your current location.',
+        'ErrorVideoLibraryRestriction': 'We\'re sorry, but the video you are trying to watch is currently unavailable for viewing at this domain.',
+        'ErrorExposurePermission': 'We\'re sorry, but the video you are trying to watch is currently unavailable for viewing at this domain.',
+    }
+    _QUALITIES = {
+        1: {
+            'width': 640,
+            'height': 360,
+        },
+        2: {
+            'width': 854,
+            'height': 480,
+        },
+        4: {
+            'width': 1280,
+            'height': 720,
+        },
+        8: {
+            'width': 1920,
+            'height': 1080,
+        },
+        16: {
+            'width': 640,
+            'height': 360,
+        },
+        32: {
+            'width': 854,
+            'height': 480,
+        },
+        64: {
+            'width': 1280,
+            'height': 720,
+        },
+        128: {
+            'width': 640,
+            'height': 360,
+        },
+    }
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        sid = mobj.group('sid')
+
+        if mobj.group('query'):
+            qs = compat_parse_qs(mobj.group('query'))
+            if not qs.get('playList'):
+                raise ExtractorError('Invalid URL', expected=True)
+            video_id = qs['playList'][0]
+            if qs.get('sid'):
+                sid = qs['sid'][0]
+
         embed_url = 'https://embed.5min.com/playerseed/?playList=%s' % video_id
-        embed_page = self._download_webpage(embed_url, video_id,
-                                            'Downloading embed page')
-        sid = self._search_regex(r'sid=(\d+)', embed_page, 'sid')
-        query = compat_urllib_parse.urlencode({
-            'func': 'GetResults',
-            'playlist': video_id,
-            'sid': sid,
-            'isPlayerSeed': 'true',
-            'url': embed_url,
-        })
+        if not sid:
+            embed_page = self._download_webpage(embed_url, video_id,
+                                                'Downloading embed page')
+            sid = self._search_regex(r'sid=(\d+)', embed_page, 'sid')
+
         response = self._download_json(
-            'https://syn.5min.com/handlers/SenseHandler.ashx?' + query,
+            'https://syn.5min.com/handlers/SenseHandler.ashx?' +
+            compat_urllib_parse_urlencode({
+                'func': 'GetResults',
+                'playlist': video_id,
+                'sid': sid,
+                'isPlayerSeed': 'true',
+                'url': embed_url,
+            }),
             video_id)
         if not response['success']:
-            err_msg = response['errorMessage']
-            if err_msg == 'ErrorVideoUserNotGeo':
-                msg = 'Video not available from your location'
-            else:
-                msg = 'Aol said: %s' % err_msg
-            raise ExtractorError(msg, expected=True, video_id=video_id)
+            raise ExtractorError(
+                '%s said: %s' % (
+                    self.IE_NAME,
+                    self._ERRORS.get(response['errorMessage'], response['errorMessage'])),
+                expected=True)
         info = response['binding'][0]
 
-        second_id = compat_str(int(video_id[:-2]) + 1)
         formats = []
-        for quality, height in [(1, 320), (2, 480), (4, 720), (8, 1080)]:
-            if any(r['ID'] == quality for r in info['Renditions']):
+        parsed_video_url = compat_urllib_parse_urlparse(compat_parse_qs(
+            compat_urllib_parse_urlparse(info['EmbededURL']).query)['videoUrl'][0])
+        for rendition in info['Renditions']:
+            if rendition['RenditionType'] == 'aac' or rendition['RenditionType'] == 'm3u8':
+                continue
+            else:
+                rendition_url = compat_urlparse.urlunparse(parsed_video_url._replace(path=replace_extension(parsed_video_url.path.replace('//', '/%s/' % rendition['ID']), rendition['RenditionType'])))
+                quality = self._QUALITIES.get(rendition['ID'], {})
                 formats.append({
-                    'format_id': compat_str(quality),
-                    'url': 'http://avideos.5min.com/%s/%s/%s_%s.mp4' % (second_id[-3:], second_id, video_id, quality),
-                    'height': height,
+                    'format_id': '%s-%d' % (rendition['RenditionType'], rendition['ID']),
+                    'url': rendition_url,
+                    'width': quality.get('width'),
+                    'height': quality.get('height'),
                 })
+        self._sort_formats(formats)
 
         return {
             'id': video_id,
             'title': info['Title'],
+            'thumbnail': info.get('ThumbURL'),
+            'duration': parse_duration(info.get('Duration')),
             'formats': formats,
         }
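Instead of guessing rendition URLs from the video id as the deleted code did, each URL is now derived from the videoUrl embedded in EmbededURL: '//' in the path becomes '/<rendition id>/' and the extension is swapped for the rendition type. A worked example with an invented input URL and a simplified stand-in for youtube_dl.utils.replace_extension:

from urllib.parse import urlparse, urlunparse
import posixpath

def replace_extension(path, ext):
    # simplified stand-in for youtube_dl.utils.replace_extension
    return '%s.%s' % (posixpath.splitext(path)[0], ext)

# Invented videoUrl, shaped like the double-slash URLs the code rewrites.
parsed = urlparse('http://avideos.5min.com/13/5180137//518013791.mp4')
rendition_id, rendition_type = 4, 'mp4'
new_path = replace_extension(
    parsed.path.replace('//', '/%d/' % rendition_id), rendition_type)
print(urlunparse(parsed._replace(path=new_path)))
# http://avideos.5min.com/13/5180137/4/518013791.mp4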
index 190d9f9adc292bfc33d2085eb9bd057ec4c95502..a3a2915998dc1cc2fca8f5ccdf6cec6cac0d528b 100644 (file)
@@ -1,82 +1,51 @@
 from __future__ import unicode_literals
 
-import re
-import random
-import json
-
 from .common import InfoExtractor
 from ..utils import (
-    get_element_by_id,
     clean_html,
+    determine_ext,
+    js_to_json,
 )
 
 
 class FKTVIE(InfoExtractor):
     IE_NAME = 'fernsehkritik.tv'
-    _VALID_URL = r'http://(?:www\.)?fernsehkritik\.tv/folge-(?P<id>[0-9]+)(?:/.*)?'
+    _VALID_URL = r'https?://(?:www\.)?fernsehkritik\.tv/folge-(?P<id>[0-9]+)(?:/.*)?'
 
     _TEST = {
         'url': 'http://fernsehkritik.tv/folge-1',
+        'md5': '21f0b0c99bce7d5b524eb1b17b1c6d79',
         'info_dict': {
-            'id': '00011',
-            'ext': 'flv',
+            'id': '1',
+            'ext': 'mp4',
             'title': 'Folge 1 vom 10. April 2007',
-            'description': 'md5:fb4818139c7cfe6907d4b83412a6864f',
+            'thumbnail': 're:^https?://.*\.jpg$',
         },
     }
 
     def _real_extract(self, url):
-        episode = int(self._match_id(url))
-
-        video_thumbnail = 'http://fernsehkritik.tv/images/magazin/folge%s.jpg' % episode
-        start_webpage = self._download_webpage('http://fernsehkritik.tv/folge-%s/Start' % episode,
-                                               episode)
-        playlist = self._search_regex(r'playlist = (\[.*?\]);', start_webpage,
-                                      'playlist', flags=re.DOTALL)
-        files = json.loads(re.sub('{[^{}]*?}', '{}', playlist))
-
-        videos = []
-        for i, _ in enumerate(files, 1):
-            video_id = '%04d%d' % (episode, i)
-            video_url = 'http://fernsehkritik.tv/js/directme.php?file=%s%s.flv' % (episode, '' if i == 1 else '-%d' % i)
-            videos.append({
-                'ext': 'flv',
-                'id': video_id,
-                'url': video_url,
-                'title': clean_html(get_element_by_id('eptitle', start_webpage)),
-                'description': clean_html(get_element_by_id('contentlist', start_webpage)),
-                'thumbnail': video_thumbnail
-            })
-        return {
-            '_type': 'multi_video',
-            'entries': videos,
-            'id': 'folge-%s' % episode,
-        }
-
-
-class FKTVPosteckeIE(InfoExtractor):
-    IE_NAME = 'fernsehkritik.tv:postecke'
-    _VALID_URL = r'http://(?:www\.)?fernsehkritik\.tv/inline-video/postecke\.php\?(.*&)?ep=(?P<ep>[0-9]+)(&|$)'
-    _TEST = {
-        'url': 'http://fernsehkritik.tv/inline-video/postecke.php?iframe=true&width=625&height=440&ep=120',
-        'md5': '262f0adbac80317412f7e57b4808e5c4',
-        'info_dict': {
-            'id': '0120',
-            'ext': 'flv',
-            'title': 'Postecke 120',
-        }
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        episode = int(mobj.group('ep'))
+        episode = self._match_id(url)
+
+        webpage = self._download_webpage(
+            'http://fernsehkritik.tv/folge-%s/play' % episode, episode)
+        title = clean_html(self._html_search_regex(
+            '<h3>([^<]+)</h3>', webpage, 'title'))
+        thumbnail = self._search_regex(r'POSTER\s*=\s*"([^"]+)', webpage, 'thumbnail', fatal=False)
+        sources = self._parse_json(self._search_regex(r'(?s)MEDIA\s*=\s*(\[.+?\]);', webpage, 'media'), episode, js_to_json)
+
+        formats = []
+        for source in sources:
+            furl = source.get('src')
+            if furl:
+                formats.append({
+                    'url': furl,
+                    'format_id': determine_ext(furl),
+                })
+        self._sort_formats(formats)
 
-        server = random.randint(2, 4)
-        video_id = '%04d' % episode
-        video_url = 'http://dl%d.fernsehkritik.tv/postecke/postecke%d.flv' % (server, episode)
-        video_title = 'Postecke %d' % episode
         return {
-            'id': video_id,
-            'url': video_url,
-            'title': video_title,
+            'id': episode,
+            'title': title,
+            'formats': formats,
+            'thumbnail': thumbnail,
         }
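The MEDIA assignment on the play page is JavaScript, not JSON, hence the js_to_json pass before parsing. A toy converter covering only the two quoting issues in this invented snippet (the real youtube_dl.utils.js_to_json handles many more cases):

import json
import re

def js_to_json_toy(code):
    # quote bare object keys, then swap single quotes for double quotes
    code = re.sub(r'([{,]\s*)([A-Za-z_]\w*)(\s*:)', r'\1"\2"\3', code)
    return code.replace("'", '"')

media = "[{src: 'http://example.com/folge-1.mp4', type: 'video/mp4'}]"
print(json.loads(js_to_json_toy(media)))
# [{'src': 'http://example.com/folge-1.mp4', 'type': 'video/mp4'}]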
index 2fe76d661bb432580cd2bd3f48c85035a4b6d7d9..0a3de14988dc06e92a7a27e52c4c7838caf69b2b 100644 (file)
@@ -1,67 +1,93 @@
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
-from ..compat import compat_urllib_request
+from ..compat import compat_urllib_parse_urlencode
 from ..utils import (
     ExtractorError,
-    find_xpath_attr,
+    int_or_none,
+    qualities,
 )
 
 
 class FlickrIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.|secure\.)?flickr\.com/photos/(?P<uploader_id>[\w\-_@]+)/(?P<id>\d+).*'
+    _VALID_URL = r'https?://(?:www\.|secure\.)?flickr\.com/photos/[\w\-_@]+/(?P<id>\d+)'
     _TEST = {
         'url': 'http://www.flickr.com/photos/forestwander-nature-pictures/5645318632/in/photostream/',
-        'md5': '6fdc01adbc89d72fc9c4f15b4a4ba87b',
+        'md5': '164fe3fa6c22e18d448d4d5af2330f31',
         'info_dict': {
             'id': '5645318632',
-            'ext': 'mp4',
-            "description": "Waterfalls in the Springtime at Dark Hollow Waterfalls. These are located just off of Skyline Drive in Virginia. They are only about 6/10 of a mile hike but it is a pretty steep hill and a good climb back up.",
-            "uploader_id": "forestwander-nature-pictures",
-            "title": "Dark Hollow Waterfalls"
+            'ext': 'mpg',
+            'description': 'Waterfalls in the Springtime at Dark Hollow Waterfalls. These are located just off of Skyline Drive in Virginia. They are only about 6/10 of a mile hike but it is a pretty steep hill and a good climb back up.',
+            'title': 'Dark Hollow Waterfalls',
+            'duration': 19,
+            'timestamp': 1303528740,
+            'upload_date': '20110423',
+            'uploader_id': '10922353@N03',
+            'uploader': 'Forest Wander',
+            'comment_count': int,
+            'view_count': int,
+            'tags': list,
         }
     }
 
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
+    _API_BASE_URL = 'https://api.flickr.com/services/rest?'
 
-        video_id = mobj.group('id')
-        video_uploader_id = mobj.group('uploader_id')
-        webpage_url = 'http://www.flickr.com/photos/' + video_uploader_id + '/' + video_id
-        req = compat_urllib_request.Request(webpage_url)
-        req.add_header(
-            'User-Agent',
-            # it needs a more recent version
-            'Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20150101 Firefox/38.0 (Chrome)')
-        webpage = self._download_webpage(req, video_id)
+    def _call_api(self, method, video_id, api_key, note, secret=None):
+        query = {
+            'photo_id': video_id,
+            'method': 'flickr.%s' % method,
+            'api_key': api_key,
+            'format': 'json',
+            'nojsoncallback': 1,
+        }
+        if secret:
+            query['secret'] = secret
+        data = self._download_json(self._API_BASE_URL + compat_urllib_parse_urlencode(query), video_id, note)
+        if data['stat'] != 'ok':
+            raise ExtractorError(data['message'])
+        return data
 
-        secret = self._search_regex(r'secret"\s*:\s*"(\w+)"', webpage, 'secret')
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
 
-        first_url = 'https://secure.flickr.com/apps/video/video_mtl_xml.gne?v=x&photo_id=' + video_id + '&secret=' + secret + '&bitrate=700&target=_self'
-        first_xml = self._download_xml(first_url, video_id, 'Downloading first data webpage')
+        api_key = self._download_json(
+            'https://www.flickr.com/hermes_error_beacon.gne', video_id,
+            'Downloading api key')['site_key']
 
-        node_id = find_xpath_attr(
-            first_xml, './/{http://video.yahoo.com/YEP/1.0/}Item', 'id',
-            'id').text
+        video_info = self._call_api(
+            'photos.getInfo', video_id, api_key, 'Downloading video info')['photo']
+        if video_info['media'] == 'video':
+            streams = self._call_api(
+                'video.getStreamInfo', video_id, api_key,
+                'Downloading streams info', video_info['secret'])['streams']
 
-        second_url = 'https://secure.flickr.com/video_playlist.gne?node_id=' + node_id + '&tech=flash&mode=playlist&bitrate=700&secret=' + secret + '&rd=video.yahoo.com&noad=1'
-        second_xml = self._download_xml(second_url, video_id, 'Downloading second data webpage')
+            preference = qualities(
+                ['288p', 'iphone_wifi', '100', '300', '700', '360p', 'appletv', '720p', '1080p', 'orig'])
 
-        self.report_extraction(video_id)
+            formats = []
+            for stream in streams['stream']:
+                stream_type = str(stream.get('type'))
+                formats.append({
+                    'format_id': stream_type,
+                    'url': stream['_content'],
+                    'preference': preference(stream_type),
+                })
+            self._sort_formats(formats)
 
-        stream = second_xml.find('.//STREAM')
-        if stream is None:
-            raise ExtractorError('Unable to extract video url')
-        video_url = stream.attrib['APP'] + stream.attrib['FULLPATH']
+            owner = video_info.get('owner', {})
 
-        return {
-            'id': video_id,
-            'url': video_url,
-            'ext': 'mp4',
-            'title': self._og_search_title(webpage),
-            'description': self._og_search_description(webpage),
-            'thumbnail': self._og_search_thumbnail(webpage),
-            'uploader_id': video_uploader_id,
-        }
+            return {
+                'id': video_id,
+                'title': video_info['title']['_content'],
+                'description': video_info.get('description', {}).get('_content'),
+                'formats': formats,
+                'timestamp': int_or_none(video_info.get('dateuploaded')),
+                'duration': int_or_none(video_info.get('video', {}).get('duration')),
+                'uploader_id': owner.get('nsid'),
+                'uploader': owner.get('realname'),
+                'comment_count': int_or_none(video_info.get('comments', {}).get('_content')),
+                'view_count': int_or_none(video_info.get('views')),
+                'tags': [tag.get('_content') for tag in video_info.get('tags', {}).get('tag', [])]
+            }
+        else:
+            raise ExtractorError('not a video', expected=True)
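The screen-scraping flow is gone: everything now comes from Flickr's REST API, with the site key fetched from an error-beacon endpoint at runtime. A standalone sketch of the call shape, using plain urllib in place of the extractor's _download_json (the api_key argument is a placeholder supplied by the caller):

import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = 'https://api.flickr.com/services/rest?'

def call_api(method, video_id, api_key, secret=None):
    query = {
        'photo_id': video_id,
        'method': 'flickr.%s' % method,
        'api_key': api_key,
        'format': 'json',
        'nojsoncallback': 1,  # plain JSON instead of a JSONP wrapper
    }
    if secret:
        query['secret'] = secret
    data = json.load(urlopen(API_BASE + urlencode(query)))
    if data['stat'] != 'ok':
        raise RuntimeError(data['message'])
    return data

# info = call_api('photos.getInfo', '5645318632', api_key)  # key scraped at runtime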
index 0fb29de75228f0133c0b8d54a015fbd5d90954c1..75399fa7d2a3164c67f2d72c24628a861ed77806 100644 (file)
@@ -30,6 +30,10 @@ class FolketingetIE(InfoExtractor):
             'upload_date': '20141120',
             'duration': 3960,
         },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
     }
 
     def _real_extract(self, url):
index 4c7dbca4023944fddaf5ff65748e338108774ec8..d2503ae2eff3d2e46497bbcba356af11db665452 100644 (file)
@@ -5,7 +5,7 @@ from .common import InfoExtractor
 
 
 class FootyRoomIE(InfoExtractor):
-    _VALID_URL = r'http://footyroom\.com/(?P<id>[^/]+)'
+    _VALID_URL = r'https?://footyroom\.com/(?P<id>[^/]+)'
     _TESTS = [{
         'url': 'http://footyroom.com/schalke-04-0-2-real-madrid-2015-02/',
         'info_dict': {
@@ -13,6 +13,7 @@ class FootyRoomIE(InfoExtractor):
             'title': 'Schalke 04 0 – 2 Real Madrid',
         },
         'playlist_count': 3,
+        'skip': 'Video for this match is not available',
     }, {
         'url': 'http://footyroom.com/georgia-0-2-germany-2015-03/',
         'info_dict': {
index b2284ab01cad03fa3152fbc0a2edb70df2ab020f..fc4a5a0fbf01801d598e20a9addd29ebef4a298e 100644 (file)
@@ -3,12 +3,10 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-)
 from ..utils import (
     parse_duration,
     parse_iso8601,
+    sanitized_Request,
     str_to_int,
 )
 
@@ -32,6 +30,7 @@ class FourTubeIE(InfoExtractor):
             'view_count': int,
             'like_count': int,
             'categories': list,
+            'age_limit': 18,
         }
     }
 
@@ -45,10 +44,10 @@ class FourTubeIE(InfoExtractor):
         thumbnail = self._html_search_meta('thumbnailUrl', webpage)
         uploader_id = self._html_search_regex(
             r'<a class="img-avatar" href="[^"]+/channels/([^/"]+)" title="Go to [^"]+ page">',
-            webpage, 'uploader id')
+            webpage, 'uploader id', fatal=False)
         uploader = self._html_search_regex(
             r'<a class="img-avatar" href="[^"]+/channels/[^/"]+" title="Go to ([^"]+) page">',
-            webpage, 'uploader')
+            webpage, 'uploader', fatal=False)
 
         categories_html = self._search_regex(
             r'(?s)><i class="icon icon-tag"></i>\s*Categories / Tags\s*.*?<ul class="list">(.*?)</ul>',
@@ -67,13 +66,24 @@ class FourTubeIE(InfoExtractor):
             webpage, 'like count', fatal=False))
         duration = parse_duration(self._html_search_meta('duration', webpage))
 
-        params_js = self._search_regex(
-            r'\$\.ajax\(url,\ opts\);\s*\}\s*\}\)\(([0-9,\[\] ]+)\)',
-            webpage, 'initialization parameters'
-        )
-        params = self._parse_json('[%s]' % params_js, video_id)
-        media_id = params[0]
-        sources = ['%s' % p for p in params[2]]
+        media_id = self._search_regex(
+            r'<button[^>]+data-id=(["\'])(?P<id>\d+)\1[^>]+data-quality=', webpage,
+            'media id', default=None, group='id')
+        sources = [
+            quality
+            for _, quality in re.findall(r'<button[^>]+data-quality=(["\'])(.+?)\1', webpage)]
+        if not (media_id and sources):
+            player_js = self._download_webpage(
+                self._search_regex(
+                    r'<script[^>]id=(["\'])playerembed\1[^>]+src=(["\'])(?P<url>.+?)\2',
+                    webpage, 'player JS', group='url'),
+                video_id, 'Downloading player JS')
+            params_js = self._search_regex(
+                r'\$\.ajax\(url,\ opts\);\s*\}\s*\}\)\(([0-9,\[\] ]+)\)',
+                player_js, 'initialization parameters')
+            params = self._parse_json('[%s]' % params_js, video_id)
+            media_id = params[0]
+            sources = ['%s' % p for p in params[2]]
 
         token_url = 'http://tkn.4tube.com/{0}/desktop/{1}'.format(
             media_id, '+'.join(sources))
@@ -81,7 +91,7 @@ class FourTubeIE(InfoExtractor):
             b'Content-Type': b'application/x-www-form-urlencoded',
             b'Origin': b'http://www.4tube.com',
         }
-        token_req = compat_urllib_request.Request(token_url, b'{}', headers)
+        token_req = sanitized_Request(token_url, b'{}', headers)
         tokens = self._download_json(token_req, video_id)
         formats = [{
             'url': tokens[format]['token'],
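Once a media id and quality list are known, a single POST to tkn.4tube.com naming every advertised quality returns a JSON object with one tokenized URL per quality. A sketch of that round-trip with an invented media id and quality list (network calls left commented out):

import json
from urllib.request import Request, urlopen

media_id = '209733'              # invented
sources = ['720', '480', '360']  # scraped from data-quality attributes
token_url = 'http://tkn.4tube.com/{0}/desktop/{1}'.format(
    media_id, '+'.join(sources))
req = Request(token_url, b'{}', {
    'Content-Type': 'application/x-www-form-urlencoded',
    'Origin': 'http://www.4tube.com',
})
# tokens = json.load(urlopen(req))
# formats = [{'url': tokens[q]['token'], 'format_id': q + 'p'} for q in sources]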
diff --git a/youtube_dl/extractor/fox.py b/youtube_dl/extractor/fox.py
new file mode 100644 (file)
index 0000000..95c1abf
--- /dev/null
@@ -0,0 +1,39 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import smuggle_url
+
+
+class FOXIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?fox\.com/watch/(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.fox.com/watch/255180355939/7684182528',
+        'md5': 'ebd296fcc41dd4b19f8115d8461a3165',
+        'info_dict': {
+            'id': '255180355939',
+            'ext': 'mp4',
+            'title': 'Official Trailer: Gotham',
+            'description': 'Tracing the rise of the great DC Comics Super-Villains and vigilantes, Gotham reveals an entirely new chapter that has never been told.',
+            'duration': 129,
+            'timestamp': 1400020798,
+            'upload_date': '20140513',
+            'uploader': 'NEWA-FNG-FOXCOM',
+        },
+        'add_ie': ['ThePlatform'],
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        release_url = self._parse_json(self._search_regex(
+            r'"fox_pdk_player"\s*:\s*({[^}]+?})', webpage, 'fox_pdk_player'),
+            video_id)['release_url'] + '&switch=http'
+
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'ThePlatform',
+            'url': smuggle_url(release_url, {'force_smil_url': True}),
+            'id': video_id,
+        }
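The new FOX extractor extracts no media itself: it returns a url_transparent result so the framework resolves the smuggled ThePlatform release URL while keeping this extractor's video id. A sketch of the returned structure, assuming youtube_dl is importable and using an illustrative release URL:

from youtube_dl.utils import smuggle_url

release_url = 'http://link.theplatform.com/s/fox/p1234?mbr=true'  # illustrative
result = {
    '_type': 'url_transparent',
    'ie_key': 'ThePlatform',
    'url': smuggle_url(release_url + '&switch=http', {'force_smil_url': True}),
    'id': '255180355939',
}
print(result['url'])  # release URL with the smuggle data appended in the fragment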
index 08b8ea36235993f78ce7a4ba05ac05f255ee4ea7..70c1a815d3121bf048da9510a00abf10dc516126 100644 (file)
@@ -4,7 +4,7 @@ from .common import InfoExtractor
 
 
 class FoxgayIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?foxgay\.com/videos/(?:\S+-)?(?P<id>\d+)\.shtml'
+    _VALID_URL = r'https?://(?:www\.)?foxgay\.com/videos/(?:\S+-)?(?P<id>\d+)\.shtml'
     _TEST = {
         'url': 'http://foxgay.com/videos/fuck-turkish-style-2582.shtml',
         'md5': '80d72beab5d04e1655a56ad37afe6841',
index 917f76b1effb3a2fff9d4f4c17c1cca348280132..b04da2415246974c4959c6baa7745e550c0c9fa4 100644 (file)
@@ -1,14 +1,13 @@
 from __future__ import unicode_literals
 
-from .common import InfoExtractor
-from ..utils import (
-    parse_iso8601,
-    int_or_none,
-)
+import re
 
+from .amp import AMPIE
 
-class FoxNewsIE(InfoExtractor):
-    _VALID_URL = r'https?://video\.foxnews\.com/v/(?:video-embed\.html\?video_id=)?(?P<id>\d+)'
+
+class FoxNewsIE(AMPIE):
+    IE_DESC = 'Fox News and Fox Business Video'
+    _VALID_URL = r'https?://(?P<host>video\.fox(?:news|business)\.com)/v/(?:video-embed\.html\?video_id=)?(?P<id>\d+)'
     _TESTS = [
         {
             'url': 'http://video.foxnews.com/v/3937480/frozen-in-time/#sp=show-clips',
@@ -17,7 +16,7 @@ class FoxNewsIE(InfoExtractor):
                 'id': '3937480',
                 'ext': 'flv',
                 'title': 'Frozen in Time',
-                'description': 'Doctors baffled by 16-year-old girl that is the size of a toddler',
+                'description': '16-year-old girl is size of toddler',
                 'duration': 265,
                 'timestamp': 1304411491,
                 'upload_date': '20110503',
@@ -31,64 +30,31 @@ class FoxNewsIE(InfoExtractor):
                 'id': '3922535568001',
                 'ext': 'mp4',
                 'title': "Rep. Luis Gutierrez on if Obama's immigration plan is legal",
-                'description': "Congressman discusses the president's executive action",
+                'description': "Congressman discusses president's plan",
                 'duration': 292,
                 'timestamp': 1417662047,
                 'upload_date': '20141204',
                 'thumbnail': 're:^https?://.*\.jpg$',
             },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            },
         },
         {
             'url': 'http://video.foxnews.com/v/video-embed.html?video_id=3937480&d=video.foxnews.com',
             'only_matching': True,
         },
+        {
+            'url': 'http://video.foxbusiness.com/v/4442309889001',
+            'only_matching': True,
+        },
     ]
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        video = self._download_json(
-            'http://video.foxnews.com/v/feed/video/%s.js?template=fox' % video_id, video_id)
-
-        item = video['channel']['item']
-        title = item['title']
-        description = item['description']
-        timestamp = parse_iso8601(item['dc-date'])
-
-        media_group = item['media-group']
-        duration = None
-        formats = []
-        for media in media_group['media-content']:
-            attributes = media['@attributes']
-            video_url = attributes['url']
-            if video_url.endswith('.f4m'):
-                formats.extend(self._extract_f4m_formats(video_url + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124', video_id))
-            elif video_url.endswith('.m3u8'):
-                formats.extend(self._extract_m3u8_formats(video_url, video_id, 'flv'))
-            elif not video_url.endswith('.smil'):
-                duration = int_or_none(attributes.get('duration'))
-                formats.append({
-                    'url': video_url,
-                    'format_id': media['media-category']['@attributes']['label'],
-                    'preference': 1,
-                    'vbr': int_or_none(attributes.get('bitrate')),
-                    'filesize': int_or_none(attributes.get('fileSize'))
-                })
-        self._sort_formats(formats)
-
-        media_thumbnail = media_group['media-thumbnail']['@attributes']
-        thumbnails = [{
-            'url': media_thumbnail['url'],
-            'width': int_or_none(media_thumbnail.get('width')),
-            'height': int_or_none(media_thumbnail.get('height')),
-        }] if media_thumbnail else []
+        host, video_id = re.match(self._VALID_URL, url).groups()
 
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'duration': duration,
-            'timestamp': timestamp,
-            'formats': formats,
-            'thumbnails': thumbnails,
-        }
+        info = self._extract_feed_info(
+            'http://%s/v/feed/video/%s.js?template=fox' % (host, video_id))
+        info['id'] = video_id
+        return info
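With the per-feed parsing folded into the shared AMP base class, all that remains here is building the feed URL from the matched host. A worked example of that construction using the foxbusiness test URL above:

import re

_VALID_URL = r'https?://(?P<host>video\.fox(?:news|business)\.com)/v/(?:video-embed\.html\?video_id=)?(?P<id>\d+)'
host, video_id = re.match(
    _VALID_URL, 'http://video.foxbusiness.com/v/4442309889001').groups()
print('http://%s/v/feed/video/%s.js?template=fox' % (host, video_id))
# http://video.foxbusiness.com/v/feed/video/4442309889001.js?template=fox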
index 1e83a4e7e1eadfb322e7e691848fae462711596c..e2ca962838932f682f0ac833bd64169ccaba8fc5 100644 (file)
@@ -8,6 +8,7 @@ from ..compat import (
 from ..utils import (
     determine_ext,
     int_or_none,
+    ExtractorError,
 )
 
 
@@ -22,14 +23,13 @@ class FranceCultureIE(InfoExtractor):
             'alt_title': 'Carnet nomade | 13-14',
             'vcodec': 'none',
             'upload_date': '20140301',
-            'thumbnail': r're:^http://www\.franceculture\.fr/.*/images/player/Carnet-nomade\.jpg$',
-            'description': 'startswith:Avec :Jean-Baptiste Péretié pour son documentaire sur Arte "La revanche des « geeks », une enquête menée aux Etats',
+            'thumbnail': r're:^http://static\.franceculture\.fr/.*/images/player/Carnet-nomade\.jpg$',
+            'description': 'startswith:Avec :Jean-Baptiste Péretié pour son documentaire sur Arte "La revanche',
             'timestamp': 1393700400,
         }
     }
 
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
+    def _extract_from_player(self, url, video_id):
         webpage = self._download_webpage(url, video_id)
 
         video_path = self._search_regex(
@@ -42,6 +42,9 @@ class FranceCultureIE(InfoExtractor):
             r'<a id="player".*?>\s+<img src="([^"]+)"',
             webpage, 'thumbnail', fatal=False)
 
+        display_id = self._search_regex(
+            r'<span class="path-diffusion">emission-(.*?)</span>', webpage, 'display_id')
+
         title = self._html_search_regex(
             r'<span class="title-diffusion">(.*?)</span>', webpage, 'title')
         alt_title = self._html_search_regex(
@@ -66,4 +69,37 @@ class FranceCultureIE(InfoExtractor):
             'alt_title': alt_title,
             'thumbnail': thumbnail,
             'description': description,
+            'display_id': display_id,
         }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        return self._extract_from_player(url, video_id)
+
+
+class FranceCultureEmissionIE(FranceCultureIE):
+    _VALID_URL = r'https?://(?:www\.)?franceculture\.fr/emission-(?P<id>[^?#]+)'
+    _TEST = {
+        'url': 'http://www.franceculture.fr/emission-les-carnets-de-la-creation-jean-gabriel-periot-cineaste-2015-10-13',
+        'info_dict': {
+            'title': 'Jean-Gabriel Périot, cinéaste',
+            'alt_title': 'Les Carnets de la création',
+            'id': '5093239',
+            'display_id': 'les-carnets-de-la-creation-jean-gabriel-periot-cineaste-2015-10-13',
+            'ext': 'mp3',
+            'timestamp': 1444762500,
+            'upload_date': '20151013',
+            'description': 'startswith:Aujourd\'hui dans "Les carnets de la création", le cinéaste',
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        video_path = self._html_search_regex(
+            r'<a class="rf-player-open".*?href="([^"]+)"', webpage, 'video path', 'no_path_player')
+        if video_path == 'no_path_player':
+            raise ExtractorError('No player: no sound on this page', expected=True)
+        new_id = self._search_regex('play=(?P<id>[0-9]+)', video_path, 'new_id', group='id')
+        video_url = compat_urlparse.urljoin(url, video_path)
+        return self._extract_from_player(video_url, new_id)
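An emission page may carry no player at all, so the lookup above uses a sentinel default rather than fatal=False and raises a friendly error when the sentinel comes back. A stripped-down version of that control flow (regex and sentinel copied from the hunk):

import re

def find_video_path(webpage):
    m = re.search(r'<a class="rf-player-open".*?href="([^"]+)"', webpage)
    video_path = m.group(1) if m else 'no_path_player'  # sentinel default
    if video_path == 'no_path_player':
        raise ValueError('No player: no sound on this page')
    return video_path

print(find_video_path('<a class="rf-player-open" href="/player/reecouter?play=5093239">'))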
index 6613ee17acee4a3fade5470d17196864f4ccae29..2369f868da4a39b1cf84c7cee6a5830859484082 100644 (file)
@@ -1,18 +1,16 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
 from ..utils import int_or_none
 
 
 class FranceInterIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?franceinter\.fr/player/reecouter\?play=(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?franceinter\.fr/player/reecouter\?play=(?P<id>[0-9]+)'
     _TEST = {
         'url': 'http://www.franceinter.fr/player/reecouter?play=793962',
         'md5': '4764932e466e6f6c79c317d2e74f6884',
-        "info_dict": {
+        'info_dict': {
             'id': '793962',
             'ext': 'mp3',
             'title': 'L’Histoire dans les jeux vidéo',
@@ -23,8 +21,7 @@ class FranceInterIE(InfoExtractor):
     }
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
 
         webpage = self._download_webpage(url, video_id)
 
@@ -33,7 +30,7 @@ class FranceInterIE(InfoExtractor):
         video_url = 'http://www.franceinter.fr/' + path
 
         title = self._html_search_regex(
-            r'<span class="title">(.+?)</span>', webpage, 'title')
+            r'<span class="title-diffusion">(.+?)</span>', webpage, 'title')
         description = self._html_search_regex(
             r'<span class="description">(.*?)</span>',
             webpage, 'description', fatal=False)
index 75723c00dc9e96c018e3b6771e634ff93c293ba1..ad94e31f346cc97cd71ad1be9f6983a16b6df209 100644 (file)
@@ -60,61 +60,91 @@ class FranceTVBaseInfoExtractor(InfoExtractor):
                     video_id, 'Downloading f4m manifest token', fatal=False)
                 if f4m_url:
                     formats.extend(self._extract_f4m_formats(
-                        f4m_url + '&hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id, 1, format_id))
+                        f4m_url + '&hdcore=3.7.0&plugin=aasp-3.7.0.39.44',
+                        video_id, f4m_id=format_id, fatal=False))
             elif ext == 'm3u8':
-                formats.extend(self._extract_m3u8_formats(video_url, video_id, 'mp4', m3u8_id=format_id))
+                formats.extend(self._extract_m3u8_formats(
+                    video_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id=format_id, fatal=False))
             elif video_url.startswith('rtmp'):
                 formats.append({
                     'url': video_url,
                     'format_id': 'rtmp-%s' % format_id,
                     'ext': 'flv',
-                    'preference': 1,
                 })
             else:
-                formats.append({
-                    'url': video_url,
-                    'format_id': format_id,
-                    'preference': -1,
-                })
+                if self._is_valid_url(video_url, video_id, format_id):
+                    formats.append({
+                        'url': video_url,
+                        'format_id': format_id,
+                    })
         self._sort_formats(formats)
 
+        title = info['titre']
+        subtitle = info.get('sous_titre')
+        if subtitle:
+            title += ' - %s' % subtitle
+        title = title.strip()
+
+        subtitles = {}
+        subtitles_list = [{
+            'url': subformat['url'],
+            'ext': subformat.get('format'),
+        } for subformat in info.get('subtitles', []) if subformat.get('url')]
+        if subtitles_list:
+            subtitles['fr'] = subtitles_list
+
         return {
             'id': video_id,
-            'title': info['titre'],
+            'title': title,
             'description': clean_html(info['synopsis']),
             'thumbnail': compat_urlparse.urljoin('http://pluzz.francetv.fr', info['image']),
             'duration': int_or_none(info.get('real_duration')) or parse_duration(info['duree']),
             'timestamp': int_or_none(info['diffusion']['timestamp']),
             'formats': formats,
+            'subtitles': subtitles,
         }
 
 
 class PluzzIE(FranceTVBaseInfoExtractor):
     IE_NAME = 'pluzz.francetv.fr'
-    _VALID_URL = r'https?://pluzz\.francetv\.fr/videos/(.*?)\.html'
+    _VALID_URL = r'https?://(?:m\.)?pluzz\.francetv\.fr/videos/(?P<id>.+?)\.html'
 
     # Can't use tests, videos expire in 7 days
 
     def _real_extract(self, url):
-        title = re.match(self._VALID_URL, url).group(1)
-        webpage = self._download_webpage(url, title)
-        video_id = self._search_regex(
-            r'data-diffusion="(\d+)"', webpage, 'ID')
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        video_id = self._html_search_meta(
+            'id_video', webpage, 'video id', default=None)
+        if not video_id:
+            video_id = self._search_regex(
+                r'data-diffusion=["\'](\d+)', webpage, 'video id')
+
         return self._extract_video(video_id, 'Pluzz')
 
 
 class FranceTvInfoIE(FranceTVBaseInfoExtractor):
     IE_NAME = 'francetvinfo.fr'
-    _VALID_URL = r'https?://(?:www|mobile)\.francetvinfo\.fr/.*/(?P<title>.+)\.html'
+    _VALID_URL = r'https?://(?:www|mobile|france3-regions)\.francetvinfo\.fr/.*/(?P<title>.+)\.html'
 
     _TESTS = [{
         'url': 'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
         'info_dict': {
             'id': '84981923',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Soir 3',
             'upload_date': '20130826',
             'timestamp': 1377548400,
+            'subtitles': {
+                'fr': 'mincount:2',
+            },
+        },
+        'params': {
+            # m3u8 downloads
+            'skip_download': True,
         },
     }, {
         'url': 'http://www.francetvinfo.fr/elections/europeennes/direct-europeennes-regardez-le-debat-entre-les-candidats-a-la-presidence-de-la-commission_600639.html',
@@ -132,11 +162,32 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
         'url': 'http://www.francetvinfo.fr/economie/entreprises/les-entreprises-familiales-le-secret-de-la-reussite_933271.html',
         'md5': 'f485bda6e185e7d15dbc69b72bae993e',
         'info_dict': {
-            'id': '556e03339473995ee145930c',
+            'id': 'NI_173343',
             'ext': 'mp4',
             'title': 'Les entreprises familiales : le secret de la réussite',
             'thumbnail': 're:^https?://.*\.jpe?g$',
-        }
+            'timestamp': 1433273139,
+            'upload_date': '20150602',
+        },
+        'params': {
+            # m3u8 downloads
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://france3-regions.francetvinfo.fr/bretagne/cotes-d-armor/thalassa-echappee-breizh-ce-venredi-dans-les-cotes-d-armor-954961.html',
+        'md5': 'f485bda6e185e7d15dbc69b72bae993e',
+        'info_dict': {
+            'id': 'NI_657393',
+            'ext': 'mp4',
+            'title': 'Olivier Monthus, réalisateur de "Bretagne, le choix de l’Armor"',
+            'description': 'md5:a3264114c9d29aeca11ced113c37b16c',
+            'thumbnail': 're:^https?://.*\.jpe?g$',
+            'timestamp': 1458300695,
+            'upload_date': '20160318',
+        },
+        'params': {
+            'skip_download': True,
+        },
     }]
 
     def _real_extract(self, url):
@@ -149,7 +200,9 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
             return self.url_result(dmcloud_url, 'DailymotionCloud')
 
         video_id, catalogue = self._search_regex(
-            r'id-video=([^@]+@[^"]+)', webpage, 'video id').split('@')
+            (r'id-video=([^@]+@[^"]+)',
+             r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"'),
+            webpage, 'video id').split('@')
         return self._extract_video(video_id, catalogue)
 
 
@@ -214,15 +267,15 @@ class FranceTVIE(FranceTVBaseInfoExtractor):
         },
         # france5
         {
-            'url': 'http://www.france5.fr/emissions/c-a-dire/videos/92837968',
-            'md5': '78f0f4064f9074438e660785bbf2c5d9',
+            'url': 'http://www.france5.fr/emissions/c-a-dire/videos/quels_sont_les_enjeux_de_cette_rentree_politique__31-08-2015_908948?onglet=tous&page=1',
+            'md5': 'f6c577df3806e26471b3d21631241fd0',
             'info_dict': {
-                'id': '108961659',
+                'id': '123327454',
                 'ext': 'flv',
-                'title': 'C à dire ?!',
-                'description': 'md5:1a4aeab476eb657bf57c4ff122129f81',
-                'upload_date': '20140915',
-                'timestamp': 1410795000,
+                'title': 'C à dire ?! - Quels sont les enjeux de cette rentrée politique ?',
+                'description': 'md5:4a0d5cb5dce89d353522a84462bae5a4',
+                'upload_date': '20150831',
+                'timestamp': 1441035120,
             },
         },
         # franceo
@@ -266,7 +319,7 @@ class FranceTVIE(FranceTVBaseInfoExtractor):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
         video_id, catalogue = self._html_search_regex(
-            r'href="http://videos?\.francetv\.fr/video/([^@]+@[^"]+)"',
+            r'(?:href=|player\.setVideo\(\s*)"http://videos?\.francetv\.fr/video/([^@]+@[^"]+)"',
             webpage, 'video ID').split('@')
         return self._extract_video(video_id, catalogue)
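_extract_video now emits subtitles in the standard youtube-dl shape: a language code mapped to a list of {url, ext} dicts, keeping only entries that actually carry a URL. A small demonstration of that transformation with invented API data:

info = {'subtitles': [
    {'url': 'http://example.com/subs.vtt', 'format': 'vtt'},
    {'url': 'http://example.com/subs.ttml', 'format': 'ttml'},
    {'format': 'srt'},  # no url, so it is dropped
]}

subtitles = {}
subtitles_list = [{
    'url': subformat['url'],
    'ext': subformat.get('format'),
} for subformat in info.get('subtitles', []) if subformat.get('url')]
if subtitles_list:
    subtitles['fr'] = subtitles_list
print(subtitles)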
 
index c210177f7297e174d38988a2e62f379a9a478305..1477708bbec14c38bf0db7801d09d68a22ff1546 100644 (file)
@@ -14,7 +14,7 @@ class FreespeechIE(InfoExtractor):
         'url': 'https://www.freespeech.org/video/obama-romney-campaign-colorado-ahead-debate-0',
         'info_dict': {
             'id': 'poKsVCZ64uU',
-            'ext': 'mp4',
+            'ext': 'webm',
             'title': 'Obama, Romney Campaign in Colorado Ahead of Debate',
             'description': 'Obama, Romney Campaign in Colorado Ahead of Debate',
             'uploader': 'freespeechtv',
index f755e3c4a7b127d0947abd5f6319b0b0d2256bb7..cd8423a6faff4431c310e1258396bbb58dc92a14 100644 (file)
@@ -5,15 +5,15 @@ from ..utils import ExtractorError
 
 
 class FreeVideoIE(InfoExtractor):
-    _VALID_URL = r'^http://www.freevideo.cz/vase-videa/(?P<id>[^.]+)\.html(?:$|[?#])'
+    _VALID_URL = r'^https?://www.freevideo.cz/vase-videa/(?P<id>[^.]+)\.html(?:$|[?#])'
 
     _TEST = {
         'url': 'http://www.freevideo.cz/vase-videa/vysukany-zadecek-22033.html',
         'info_dict': {
             'id': 'vysukany-zadecek-22033',
             'ext': 'mp4',
-            "title": "vysukany-zadecek-22033",
-            "age_limit": 18,
+            'title': 'vysukany-zadecek-22033',
+            'age_limit': 18,
         },
         'skip': 'Blocked outside .cz',
     }
diff --git a/youtube_dl/extractor/funimation.py b/youtube_dl/extractor/funimation.py
new file mode 100644 (file)
index 0000000..1eb528f
--- /dev/null
@@ -0,0 +1,190 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    clean_html,
+    determine_ext,
+    int_or_none,
+    sanitized_Request,
+    ExtractorError,
+    urlencode_postdata
+)
+
+
+class FunimationIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?funimation\.com/shows/[^/]+/videos/(?:official|promotional)/(?P<id>[^/?#&]+)'
+
+    _NETRC_MACHINE = 'funimation'
+
+    _TESTS = [{
+        'url': 'http://www.funimation.com/shows/air/videos/official/breeze',
+        'info_dict': {
+            'id': '658',
+            'display_id': 'breeze',
+            'ext': 'mp4',
+            'title': 'Air - 1 - Breeze',
+            'description': 'md5:1769f43cd5fc130ace8fd87232207892',
+            'thumbnail': 're:https?://.*\.jpg',
+        },
+    }, {
+        'url': 'http://www.funimation.com/shows/hacksign/videos/official/role-play',
+        'info_dict': {
+            'id': '31128',
+            'display_id': 'role-play',
+            'ext': 'mp4',
+            'title': '.hack//SIGN - 1 - Role Play',
+            'description': 'md5:b602bdc15eef4c9bbb201bb6e6a4a2dd',
+            'thumbnail': 're:https?://.*\.jpg',
+        },
+    }, {
+        'url': 'http://www.funimation.com/shows/attack-on-titan-junior-high/videos/promotional/broadcast-dub-preview',
+        'info_dict': {
+            'id': '9635',
+            'display_id': 'broadcast-dub-preview',
+            'ext': 'mp4',
+            'title': 'Attack on Titan: Junior High - Broadcast Dub Preview',
+            'description': 'md5:f8ec49c0aff702a7832cd81b8a44f803',
+            'thumbnail': 're:https?://.*\.(?:jpg|png)',
+        },
+    }]
+
+    def _login(self):
+        (username, password) = self._get_login_info()
+        if username is None:
+            return
+        data = urlencode_postdata({
+            'email_field': username,
+            'password_field': password,
+        })
+        login_request = sanitized_Request('http://www.funimation.com/login', data, headers={
+            'User-Agent': 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0',
+            'Content-Type': 'application/x-www-form-urlencoded'
+        })
+        login_page = self._download_webpage(
+            login_request, None, 'Logging in as %s' % username)
+        if any(p in login_page for p in ('funimation.com/logout', '>Log Out<')):
+            return
+        error = self._html_search_regex(
+            r'(?s)<div[^>]+id=["\']errorMessages["\'][^>]*>(.+?)</div>',
+            login_page, 'error messages', default=None)
+        if error:
+            raise ExtractorError('Unable to login: %s' % error, expected=True)
+        raise ExtractorError('Unable to log in')
+
+    def _real_initialize(self):
+        self._login()
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        errors = []
+        formats = []
+
+        ERRORS_MAP = {
+            'ERROR_MATURE_CONTENT_LOGGED_IN': 'matureContentLoggedIn',
+            'ERROR_MATURE_CONTENT_LOGGED_OUT': 'matureContentLoggedOut',
+            'ERROR_SUBSCRIPTION_LOGGED_OUT': 'subscriptionLoggedOut',
+            'ERROR_VIDEO_EXPIRED': 'videoExpired',
+            'ERROR_TERRITORY_UNAVAILABLE': 'territoryUnavailable',
+            'SVODBASIC_SUBSCRIPTION_IN_PLAYER': 'basicSubscription',
+            'SVODNON_SUBSCRIPTION_IN_PLAYER': 'nonSubscription',
+            'ERROR_PLAYER_NOT_RESPONDING': 'playerNotResponding',
+            'ERROR_UNABLE_TO_CONNECT_TO_CDN': 'unableToConnectToCDN',
+            'ERROR_STREAM_NOT_FOUND': 'streamNotFound',
+        }
+
+        USER_AGENTS = (
+            # PC UA is served an m3u8 that provides some bonus lower-quality formats
+            ('pc', 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0'),
+            # Mobile UA allows extracting direct links and also does not fail when
+            # the PC UA fails with a hulu error (e.g.
+            # http://www.funimation.com/shows/hacksign/videos/official/role-play)
+            ('mobile', 'Mozilla/5.0 (Linux; Android 4.4.2; Nexus 4 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.114 Mobile Safari/537.36'),
+        )
+
+        for kind, user_agent in USER_AGENTS:
+            request = sanitized_Request(url)
+            request.add_header('User-Agent', user_agent)
+            webpage = self._download_webpage(
+                request, display_id, 'Downloading %s webpage' % kind)
+
+            playlist = self._parse_json(
+                self._search_regex(
+                    r'var\s+playersData\s*=\s*(\[.+?\]);\n',
+                    webpage, 'players data'),
+                display_id)[0]['playlist']
+
+            items = next(item['items'] for item in playlist if item.get('items'))
+            item = next(item for item in items if item.get('itemAK') == display_id)
+
+            error_messages = {}
+            video_error_messages = self._search_regex(
+                r'var\s+videoErrorMessages\s*=\s*({.+?});\n',
+                webpage, 'error messages', default=None)
+            if video_error_messages:
+                error_messages_json = self._parse_json(video_error_messages, display_id, fatal=False)
+                if error_messages_json:
+                    for _, error in error_messages_json.items():
+                        type_ = error.get('type')
+                        description = error.get('description')
+                        content = error.get('content')
+                        if type_ == 'text' and description and content:
+                            error_message = ERRORS_MAP.get(description)
+                            if error_message:
+                                error_messages[error_message] = content
+
+            for video in item.get('videoSet', []):
+                auth_token = video.get('authToken')
+                if not auth_token:
+                    continue
+                funimation_id = video.get('FUNImationID') or video.get('videoId')
+                preference = 1 if video.get('languageMode') == 'dub' else 0
+                if not auth_token.startswith('?'):
+                    auth_token = '?%s' % auth_token
+                for quality, height in (('sd', 480), ('hd', 720), ('hd1080', 1080)):
+                    format_url = video.get('%sUrl' % quality)
+                    if not format_url:
+                        continue
+                    if not format_url.startswith(('http', '//')):
+                        errors.append(format_url)
+                        continue
+                    if determine_ext(format_url) == 'm3u8':
+                        formats.extend(self._extract_m3u8_formats(
+                            format_url + auth_token, display_id, 'mp4', entry_protocol='m3u8_native',
+                            preference=preference, m3u8_id='%s-hls' % funimation_id, fatal=False))
+                    else:
+                        tbr = int_or_none(self._search_regex(
+                            r'-(\d+)[Kk]', format_url, 'tbr', default=None))
+                        formats.append({
+                            'url': format_url + auth_token,
+                            'format_id': '%s-http-%dp' % (funimation_id, height),
+                            'height': height,
+                            'tbr': tbr,
+                            'preference': preference,
+                        })
+
+        if not formats and errors:
+            raise ExtractorError(
+                '%s returned error: %s'
+                % (self.IE_NAME, clean_html(error_messages.get(errors[0], errors[0]))),
+                expected=True)
+
+        self._sort_formats(formats)
+
+        title = item['title']
+        artist = item.get('artist')
+        if artist:
+            title = '%s - %s' % (artist, title)
+        description = self._og_search_description(webpage) or item.get('description')
+        thumbnail = self._og_search_thumbnail(webpage) or item.get('posterUrl')
+        video_id = item.get('itemId') or display_id
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'formats': formats,
+        }
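
To make the videoSet loop above concrete, here is a minimal standalone sketch of how one video entry turns into HTTP format dicts (field names follow the code above; the sample values are hypothetical):

    video = {  # hypothetical videoSet entry
        'authToken': 'hmac=abc123',   # note: no leading '?'
        'videoId': 'FN-12345',
        'languageMode': 'dub',
        'sdUrl': 'http://example.com/video-750K.mp4',
        'hdUrl': 'http://example.com/video-1500K.mp4',
    }

    auth_token = video['authToken']
    if not auth_token.startswith('?'):
        auth_token = '?%s' % auth_token   # '?hmac=abc123'

    formats = []
    for quality, height in (('sd', 480), ('hd', 720), ('hd1080', 1080)):
        format_url = video.get('%sUrl' % quality)
        if format_url:
            formats.append({'url': format_url + auth_token, 'height': height})
    # -> two mp4 formats (480p and 720p), each with the auth token appended
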
diff --git a/youtube_dl/extractor/funnyordie.py b/youtube_dl/extractor/funnyordie.py
index dd87257c465983dcda30a6faf5dbd7bc0950560c..8c5ffc9e84cec305e9fc813a6366b360b7e36230 100644 (file)
--- a/youtube_dl/extractor/funnyordie.py
+++ b/youtube_dl/extractor/funnyordie.py
@@ -45,15 +45,22 @@ class FunnyOrDieIE(InfoExtractor):
 
         links.sort(key=lambda link: 1 if link[1] == 'mp4' else 0)
 
-        bitrates = self._html_search_regex(r'<source src="[^"]+/v,((?:\d+,)+)\.mp4\.csmil', webpage, 'video bitrates')
-        bitrates = [int(b) for b in bitrates.rstrip(',').split(',')]
-        bitrates.sort()
+        m3u8_url = self._search_regex(
+            r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8[^"\']*)\1',
+            webpage, 'm3u8 url', group='url')
 
         formats = []
+
+        formats.extend(self._extract_m3u8_formats(
+            m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+
+        bitrates = [int(bitrate) for bitrate in re.findall(r'[,/]v(\d+)[,/]', m3u8_url)]
+        bitrates.sort()
+
         for bitrate in bitrates:
             for link in links:
                 formats.append({
-                    'url': '%s%d.%s' % (link[0], bitrate, link[1]),
+                    'url': self._proto_relative_url('%s%d.%s' % (link[0], bitrate, link[1])),
                     'format_id': '%s-%d' % (link[1], bitrate),
                     'vbr': bitrate,
                 })
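
The new _proto_relative_url() call above guards against protocol-relative //... links; a minimal re-implementation of the idea (youtube-dl provides this as a helper on InfoExtractor):

    def proto_relative_url(url, scheme='http:'):
        # prepend a scheme to protocol-relative URLs, leave others untouched
        if url.startswith('//'):
            return scheme + url
        return url

    assert proto_relative_url('//example.com/v600.mp4') == 'http://example.com/v600.mp4'
    assert proto_relative_url('http://example.com/a.mp4') == 'http://example.com/a.mp4'
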
diff --git a/youtube_dl/extractor/gameinformer.py b/youtube_dl/extractor/gameinformer.py
new file mode 100644 (file)
index 0000000..a66e309
--- /dev/null
+++ b/youtube_dl/extractor/gameinformer.py
@@ -0,0 +1,28 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class GameInformerIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?gameinformer\.com/(?:[^/]+/)*(?P<id>.+)\.aspx'
+    _TEST = {
+        'url': 'http://www.gameinformer.com/b/features/archive/2015/09/26/replay-animal-crossing.aspx',
+        'md5': '292f26da1ab4beb4c9099f1304d2b071',
+        'info_dict': {
+            'id': '4515472681001',
+            'ext': 'mp4',
+            'title': 'Replay - Animal Crossing',
+            'description': 'md5:2e211891b215c85d061adc7a4dd2d930',
+            'timestamp': 1443457610,
+            'upload_date': '20150928',
+            'uploader_id': '694940074001',
+        },
+    }
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/694940074001/default_default/index.html?videoId=%s'
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        brightcove_id = self._search_regex(r"getVideo\('[^']+video_id=(\d+)", webpage, 'brightcove id')
+        return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
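
A standalone illustration of the Brightcove-id extraction above, run against a hypothetical page snippet:

    import re

    webpage = "getVideo('http://c.brightcove.com/services/viewer/federated_f9?&width=640&video_id=4515472681001')"  # hypothetical markup
    brightcove_id = re.search(r"getVideo\('[^']+video_id=(\d+)", webpage).group(1)
    print('http://players.brightcove.net/694940074001/default_default/index.html?videoId=%s' % brightcove_id)
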
diff --git a/youtube_dl/extractor/gamekings.py b/youtube_dl/extractor/gamekings.py
index 027f55eb2b7338dd2459c8128d587c0ac858434e..cbcddcb7cd116a05a8b294c06988586aec955c51 100644 (file)
--- a/youtube_dl/extractor/gamekings.py
+++ b/youtube_dl/extractor/gamekings.py
@@ -6,24 +6,29 @@ from ..utils import (
     xpath_text,
     xpath_with_ns,
 )
+from .youtube import YoutubeIE
 
 
 class GamekingsIE(InfoExtractor):
-    _VALID_URL = r'http://www\.gamekings\.tv/(?:videos|nieuws)/(?P<id>[^/]+)'
+    _VALID_URL = r'https?://www\.gamekings\.nl/(?:videos|nieuws)/(?P<id>[^/]+)'
     _TESTS = [{
-        'url': 'http://www.gamekings.tv/videos/phoenix-wright-ace-attorney-dual-destinies-review/',
-        # MD5 is flaky, seems to change regularly
-        # 'md5': '2f32b1f7b80fdc5cb616efb4f387f8a3',
+        # YouTube embed video
+        'url': 'http://www.gamekings.nl/videos/phoenix-wright-ace-attorney-dual-destinies-review/',
+        'md5': '5208d3a17adeaef829a7861887cb9029',
         'info_dict': {
-            'id': 'phoenix-wright-ace-attorney-dual-destinies-review',
+            'id': 'HkSQKetlGOU',
             'ext': 'mp4',
-            'title': 'Phoenix Wright: Ace Attorney \u2013 Dual Destinies Review',
-            'description': 'md5:36fd701e57e8c15ac8682a2374c99731',
+            'title': 'Phoenix Wright: Ace Attorney - Dual Destinies Review',
+            'description': 'md5:db88c0e7f47e9ea50df3271b9dc72e1d',
             'thumbnail': 're:^https?://.*\.jpg$',
+            'uploader_id': 'UCJugRGo4STYMeFr5RoOShtQ',
+            'uploader': 'Gamekings Vault',
+            'upload_date': '20151123',
         },
+        'add_ie': ['Youtube'],
     }, {
         # vimeo video
-        'url': 'http://www.gamekings.tv/videos/the-legend-of-zelda-majoras-mask/',
+        'url': 'http://www.gamekings.nl/videos/the-legend-of-zelda-majoras-mask/',
         'md5': '12bf04dfd238e70058046937657ea68d',
         'info_dict': {
             'id': 'the-legend-of-zelda-majoras-mask',
@@ -33,7 +38,7 @@ class GamekingsIE(InfoExtractor):
             'thumbnail': 're:^https?://.*\.jpg$',
         },
     }, {
-        'url': 'http://www.gamekings.tv/nieuws/gamekings-extra-shelly-en-david-bereiden-zich-voor-op-de-livestream/',
+        'url': 'http://www.gamekings.nl/nieuws/gamekings-extra-shelly-en-david-bereiden-zich-voor-op-de-livestream/',
         'only_matching': True,
     }]
 
@@ -43,7 +48,11 @@ class GamekingsIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
 
         playlist_id = self._search_regex(
-            r'gogoVideo\(\s*\d+\s*,\s*"([^"]+)', webpage, 'playlist id')
+            r'gogoVideo\([^,]+,\s*"([^"]+)', webpage, 'playlist id')
+
+        # Check if a YouTube embed is used
+        if YoutubeIE.suitable(playlist_id):
+            return self.url_result(playlist_id, ie='Youtube')
 
         playlist = self._download_xml(
             'http://www.gamekings.tv/wp-content/themes/gk2010/rss_playlist.php?id=%s' % playlist_id,
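
The YoutubeIE.suitable() check above works because suitable() essentially matches the candidate string against the extractor's _VALID_URL; when the gogoVideo(...) argument turns out to be a full YouTube URL rather than a local playlist id, extraction is delegated:

    from youtube_dl.extractor.youtube import YoutubeIE

    print(YoutubeIE.suitable('https://www.youtube.com/watch?v=HkSQKetlGOU'))  # True -> YouTube embed
    print(YoutubeIE.suitable('12345'))                                        # False -> local playlist id
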
diff --git a/youtube_dl/extractor/gamespot.py b/youtube_dl/extractor/gamespot.py
index b3f1bafcc37ee98f1c5b89a644909f3ee0a32049..4ffdd75157486957810f718cb1019cdc5dd80f4f 100644 (file)
--- a/youtube_dl/extractor/gamespot.py
+++ b/youtube_dl/extractor/gamespot.py
@@ -14,7 +14,7 @@ from ..utils import (
 
 
 class GameSpotIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?gamespot\.com/.*-(?P<id>\d+)/?'
+    _VALID_URL = r'https?://(?:www\.)?gamespot\.com/.*-(?P<id>\d+)/?'
     _TESTS = [{
         'url': 'http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/',
         'md5': 'b2a30deaa8654fcccd43713a6b6a4825',
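
The http -> https? relaxation here (and in GameStar and other extractors below) simply lets the same pattern accept both schemes:

    import re

    VALID_URL = r'https?://(?:www\.)?gamespot\.com/.*-(?P<id>\d+)/?'
    for url in ('http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/',
                'https://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/'):
        assert re.match(VALID_URL, url).group('id') == '6410818'
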
diff --git a/youtube_dl/extractor/gamestar.py b/youtube_dl/extractor/gamestar.py
index 590ccf5266d61e67772a1276a83bfdb6919abc63..69058a5835f2bac0d1e56ce0917909df0fb9a92b 100644 (file)
--- a/youtube_dl/extractor/gamestar.py
+++ b/youtube_dl/extractor/gamestar.py
@@ -13,7 +13,7 @@ from ..utils import (
 
 
 class GameStarIE(InfoExtractor):
-    _VALID_URL = r'http://www\.gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
+    _VALID_URL = r'https?://www\.gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
     _TEST = {
         'url': 'http://www.gamestar.de/videos/trailer,3/hobbit-3-die-schlacht-der-fuenf-heere,76110.html',
         'md5': '96974ecbb7fd8d0d20fca5a00810cea7',
diff --git a/youtube_dl/extractor/gametrailers.py b/youtube_dl/extractor/gametrailers.py
index a6ab795aef1bab4a56b2655515983aed35886a77..1e7948ab816f5b08ee6dbeb39de1d5f50fbdf314 100644 (file)
--- a/youtube_dl/extractor/gametrailers.py
+++ b/youtube_dl/extractor/gametrailers.py
@@ -1,19 +1,62 @@
 from __future__ import unicode_literals
 
-from .mtv import MTVServicesInfoExtractor
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_age_limit,
+    url_basename,
+)
 
 
-class GametrailersIE(MTVServicesInfoExtractor):
-    _VALID_URL = r'http://www\.gametrailers\.com/(?P<type>videos|reviews|full-episodes)/(?P<id>.*?)/(?P<title>.*)'
+class GametrailersIE(InfoExtractor):
+    _VALID_URL = r'https?://www\.gametrailers\.com/videos/view/[^/]+/(?P<id>.+)'
+
     _TEST = {
-        'url': 'http://www.gametrailers.com/videos/zbvr8i/mirror-s-edge-2-e3-2013--debut-trailer',
-        'md5': '4c8e67681a0ea7ec241e8c09b3ea8cf7',
+        'url': 'http://www.gametrailers.com/videos/view/gametrailers-com/116437-Just-Cause-3-Review',
+        'md5': 'f28c4efa0bdfaf9b760f6507955b6a6a',
         'info_dict': {
-            'id': '70e9a5d7-cf25-4a10-9104-6f3e7342ae0d',
+            'id': '2983958',
             'ext': 'mp4',
-            'title': 'E3 2013: Debut Trailer',
-            'description': 'Faith is back!  Check out the World Premiere trailer for Mirror\'s Edge 2 straight from the EA Press Conference at E3 2013!',
+            'display_id': '116437-Just-Cause-3-Review',
+            'title': 'Just Cause 3 - Review',
+            'description': 'It\'s a lot of fun to shoot at things and then watch them explode in Just Cause 3, but should there be more to the experience than that?',
         },
     }
 
-    _FEED_URL = 'http://www.gametrailers.com/feeds/mrss'
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        title = self._html_search_regex(
+            r'<title>(.+?)\|', webpage, 'title').strip()
+        embed_url = self._proto_relative_url(
+            self._search_regex(
+                r'src=\'(//embed\.gametrailers\.com/embed/[^\']+)\'', webpage,
+                'embed url'),
+            scheme='http:')
+        video_id = url_basename(embed_url)
+        embed_page = self._download_webpage(embed_url, video_id)
+        embed_vars_json = self._search_regex(
+            r'(?s)var embedVars = (\{.*?\})\s*</script>', embed_page,
+            'embed vars')
+        info = self._parse_json(embed_vars_json, video_id)
+
+        formats = []
+        for media in info['media']:
+            if media['mediaPurpose'] == 'play':
+                formats.append({
+                    'url': media['uri'],
+                    'height': media['height'],
+                    'width': media['width'],
+                })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'formats': formats,
+            'thumbnail': info.get('thumbUri'),
+            'description': self._og_search_description(webpage),
+            'duration': int_or_none(info.get('videoLengthInSeconds')),
+            'age_limit': parse_age_limit(info.get('audienceRating')),
+        }
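
url_basename() used above returns the last path component, which is how the numeric id is pulled out of the embed URL:

    from youtube_dl.utils import url_basename

    embed_url = 'http://embed.gametrailers.com/embed/2983958'  # shape inferred from the regex above
    print(url_basename(embed_url))  # '2983958'
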
diff --git a/youtube_dl/extractor/gazeta.py b/youtube_dl/extractor/gazeta.py
index ea32b621c390c390e22ebf8a6010304466700a4a..18ef5c252a9adc0ac2a1e6ae6806d2ea9b5b2546 100644 (file)
--- a/youtube_dl/extractor/gazeta.py
+++ b/youtube_dl/extractor/gazeta.py
@@ -7,7 +7,7 @@ from .common import InfoExtractor
 
 
 class GazetaIE(InfoExtractor):
-    _VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:(?:main|\d{4}/\d{2}/\d{2})/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'
+    _VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:main/)*(?:\d{4}/\d{2}/\d{2}/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'
     _TESTS = [{
         'url': 'http://www.gazeta.ru/video/main/zadaite_vopros_vladislavu_yurevichu.shtml',
         'md5': 'd49c9bdc6e5a7888f27475dc215ee789',
@@ -18,9 +18,19 @@ class GazetaIE(InfoExtractor):
             'description': 'md5:38617526050bd17b234728e7f9620a71',
             'thumbnail': 're:^https?://.*\.jpg',
         },
+        'skip': 'video not found',
     }, {
         'url': 'http://www.gazeta.ru/lifestyle/video/2015/03/08/master-klass_krasivoi_byt._delaem_vesennii_makiyazh.shtml',
         'only_matching': True,
+    }, {
+        'url': 'http://www.gazeta.ru/video/main/main/2015/06/22/platit_ili_ne_platit_po_isku_yukosa.shtml',
+        'md5': '37f19f78355eb2f4256ee1688359f24c',
+        'info_dict': {
+            'id': '252048',
+            'ext': 'mp4',
+            'title': '"Если по иску ЮКОСа придется платить, это будет большой удар по бюджету"',
+        },
+        'add_ie': ['EaglePlatform'],
     }]
 
     def _real_extract(self, url):
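
The reworked pattern accepts repeated main/ segments and an optional date path, which the new test exercises; standalone:

    import re

    VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:main/)*(?:\d{4}/\d{2}/\d{2}/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'
    m = re.match(VALID_URL, 'http://www.gazeta.ru/video/main/main/2015/06/22/platit_ili_ne_platit_po_isku_yukosa.shtml')
    assert m.group('id') == 'platit_ili_ne_platit_po_isku_yukosa'
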
diff --git a/youtube_dl/extractor/gdcvault.py b/youtube_dl/extractor/gdcvault.py
index 43f916412d9b97f3ca93cea830e5390bdcc70db0..3136427db39a2f1739fa0a791bd2cc85f1eedd02 100644 (file)
--- a/youtube_dl/extractor/gdcvault.py
+++ b/youtube_dl/extractor/gdcvault.py
@@ -3,11 +3,11 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
+from ..utils import (
+    HEADRequest,
+    sanitized_Request,
+    urlencode_postdata,
 )
-from ..utils import remove_end
 
 
 class GDCVaultIE(InfoExtractor):
@@ -50,53 +50,33 @@ class GDCVaultIE(InfoExtractor):
         {
             'url': 'http://gdcvault.com/play/1020791/',
             'only_matching': True,
-        }
+        },
+        {
+            # Hard-coded hostname
+            'url': 'http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface',
+            'md5': 'a8efb6c31ed06ca8739294960b2dbabd',
+            'info_dict': {
+                'id': '1023460',
+                'ext': 'mp4',
+                'display_id': 'Tenacious-Design-and-The-Interface',
+                'title': 'Tenacious Design and The Interface of \'Destiny\'',
+            },
+        },
+        {
+            # Multiple audio tracks
+            'url': 'http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC',
+            'info_dict': {
+                'id': '1014631',
+                'ext': 'flv',
+                'title': 'How to Create a Good Game - From My Experience of Designing Pac-Man',
+            },
+            'params': {
+                'skip_download': True,  # Requires rtmpdump
+                'format': 'jp',  # The Japanese audio
+            }
+        },
     ]
 
-    def _parse_mp4(self, xml_description):
-        video_formats = []
-        mp4_video = xml_description.find('./metadata/mp4video')
-        if mp4_video is None:
-            return None
-
-        mobj = re.match(r'(?P<root>https?://.*?/).*', mp4_video.text)
-        video_root = mobj.group('root')
-        formats = xml_description.findall('./metadata/MBRVideos/MBRVideo')
-        for format in formats:
-            mobj = re.match(r'mp4\:(?P<path>.*)', format.find('streamName').text)
-            url = video_root + mobj.group('path')
-            vbr = format.find('bitrate').text
-            video_formats.append({
-                'url': url,
-                'vbr': int(vbr),
-            })
-        return video_formats
-
-    def _parse_flv(self, xml_description):
-        video_formats = []
-        akamai_url = xml_description.find('./metadata/akamaiHost').text
-        slide_video_path = xml_description.find('./metadata/slideVideo').text
-        video_formats.append({
-            'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
-            'play_path': remove_end(slide_video_path, '.flv'),
-            'ext': 'flv',
-            'format_note': 'slide deck video',
-            'quality': -2,
-            'preference': -2,
-            'format_id': 'slides',
-        })
-        speaker_video_path = xml_description.find('./metadata/speakerVideo').text
-        video_formats.append({
-            'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
-            'play_path': remove_end(speaker_video_path, '.flv'),
-            'ext': 'flv',
-            'format_note': 'speaker video',
-            'quality': -1,
-            'preference': -1,
-            'format_id': 'speaker',
-        })
-        return video_formats
-
     def _login(self, webpage_url, display_id):
         (username, password) = self._get_login_info()
         if username is None or password is None:
@@ -112,7 +92,7 @@ class GDCVaultIE(InfoExtractor):
             'password': password,
         }
 
-        request = compat_urllib_request.Request(login_url, compat_urllib_parse.urlencode(login_form))
+        request = sanitized_Request(login_url, urlencode_postdata(login_form))
         request.add_header('Content-Type', 'application/x-www-form-urlencoded')
         self._download_webpage(request, display_id, 'Logging in')
         start_page = self._download_webpage(webpage_url, display_id, 'Getting authenticated video page')
@@ -133,22 +113,25 @@ class GDCVaultIE(InfoExtractor):
             r's1\.addVariable\("file",\s*encodeURIComponent\("(/[^"]+)"\)\);',
             start_page, 'url', default=None)
         if direct_url:
-            video_url = 'http://www.gdcvault.com/' + direct_url
             title = self._html_search_regex(
                 r'<td><strong>Session Name</strong></td>\s*<td>(.*?)</td>',
                 start_page, 'title')
+            video_url = 'http://www.gdcvault.com' + direct_url
+            # resolve the url so that we can detect the correct extension
+            head = self._request_webpage(HEADRequest(video_url), video_id)
+            video_url = head.geturl()
 
             return {
                 'id': video_id,
                 'display_id': display_id,
                 'url': video_url,
-                'ext': 'flv',
                 'title': title,
             }
 
+        PLAYER_REGEX = r'<iframe src="(?P<xml_root>.+?)/player.*?\.html.*?".*?</iframe>'
+
         xml_root = self._html_search_regex(
-            r'<iframe src="(?P<xml_root>.*?)player.html.*?".*?</iframe>',
-            start_page, 'xml root', default=None)
+            PLAYER_REGEX, start_page, 'xml root', default=None)
         if xml_root is None:
             # Probably need to authenticate
             login_res = self._login(webpage_url, display_id)
@@ -158,27 +141,21 @@ class GDCVaultIE(InfoExtractor):
                 start_page = login_res
                 # Grab the url from the authenticated page
                 xml_root = self._html_search_regex(
-                    r'<iframe src="(.*?)player.html.*?".*?</iframe>',
-                    start_page, 'xml root')
+                    PLAYER_REGEX, start_page, 'xml root')
 
         xml_name = self._html_search_regex(
             r'<iframe src=".*?\?xml=(.+?\.xml).*?".*?</iframe>',
             start_page, 'xml filename', default=None)
         if xml_name is None:
             # Fallback to the older format
-            xml_name = self._html_search_regex(r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>', start_page, 'xml filename')
-
-        xml_decription_url = xml_root + 'xml/' + xml_name
-        xml_description = self._download_xml(xml_decription_url, display_id)
-
-        video_title = xml_description.find('./metadata/title').text
-        video_formats = self._parse_mp4(xml_description)
-        if video_formats is None:
-            video_formats = self._parse_flv(xml_description)
+            xml_name = self._html_search_regex(
+                r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>',
+                start_page, 'xml filename')
 
         return {
+            '_type': 'url_transparent',
             'id': video_id,
             'display_id': display_id,
-            'title': video_title,
-            'formats': video_formats,
+            'url': '%s/xml/%s' % (xml_root, xml_name),
+            'ie_key': 'DigitallySpeaking',
         }
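
The HEAD-request trick above resolves redirects so that determine_ext() can see the real filename; HEADRequest in youtube-dl is just a urllib request whose method is forced to HEAD. A standalone sketch (the URL is hypothetical):

    try:
        from urllib.request import Request, urlopen  # Python 3
    except ImportError:
        from urllib2 import Request, urlopen  # Python 2

    class HEADRequest(Request):
        def get_method(self):
            return 'HEAD'

    response = urlopen(HEADRequest('http://www.gdcvault.com/play/1023460/direct'))  # hypothetical URL
    final_url = response.geturl()  # URL after redirects; its extension is the one that matters
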
diff --git a/youtube_dl/extractor/generic.py b/youtube_dl/extractor/generic.py
index 8cef61c3c9a235a6d3f3230b8517222334dc0fbc..c63bdbd08df14e3944e57ede59f22e1b8717d5f6 100644 (file)
--- a/youtube_dl/extractor/generic.py
+++ b/youtube_dl/extractor/generic.py
@@ -4,12 +4,13 @@ from __future__ import unicode_literals
 
 import os
 import re
+import sys
 
 from .common import InfoExtractor
 from .youtube import YoutubeIE
 from ..compat import (
+    compat_etree_fromstring,
     compat_urllib_parse_unquote,
-    compat_urllib_request,
     compat_urlparse,
     compat_xml_parse_error,
 )
@@ -20,7 +21,7 @@ from ..utils import (
     HEADRequest,
     is_html,
     orderedSet,
-    parse_xml,
+    sanitized_Request,
     smuggle_url,
     unescapeHTML,
     unified_strdate,
@@ -29,7 +30,10 @@ from ..utils import (
     url_basename,
     xpath_text,
 )
-from .brightcove import BrightcoveIE
+from .brightcove import (
+    BrightcoveLegacyIE,
+    BrightcoveNewIE,
+)
 from .nbc import NBCSportsVPlayerIE
 from .ooyala import OoyalaIE
 from .rutv import RUTVIE
@@ -40,14 +44,23 @@ from .myvi import MyviIE
 from .condenast import CondeNastIE
 from .udn import UDNEmbedIE
 from .senateisvp import SenateISVPIE
-from .bliptv import BlipTVIE
 from .svt import SVTIE
 from .pornhub import PornHubIE
 from .xhamster import XHamsterEmbedIE
+from .tnaflix import TNAFlixNetworkEmbedIE
 from .vimeo import VimeoIE
 from .dailymotion import DailymotionCloudIE
 from .onionstudios import OnionStudiosIE
 from .snagfilms import SnagFilmsEmbedIE
+from .screenwavemedia import ScreenwaveMediaIE
+from .mtv import MTVServicesEmbeddedIE
+from .pladform import PladformIE
+from .videomore import VideomoreIE
+from .googledrive import GoogleDriveIE
+from .jwplatform import JWPlatformIE
+from .digiteka import DigitekaIE
+from .instagram import InstagramIE
+from .liveleak import LiveLeakIE
 
 
 class GenericIE(InfoExtractor):
@@ -92,7 +105,8 @@ class GenericIE(InfoExtractor):
                 'skip_download': True,  # infinite live stream
             },
             'expected_warnings': [
-                r'501.*Not Implemented'
+                r'501.*Not Implemented',
+                r'400.*Bad Request',
             ],
         },
         # Direct link with incorrect MIME type
@@ -130,6 +144,134 @@ class GenericIE(InfoExtractor):
                 'title': 'pdv_maddow_netcast_m4v-02-27-2015-201624',
             }
         },
+        # SMIL from http://videolectures.net/promogram_igor_mekjavic_eng
+        {
+            'url': 'http://videolectures.net/promogram_igor_mekjavic_eng/video/1/smil.xml',
+            'info_dict': {
+                'id': 'smil',
+                'ext': 'mp4',
+                'title': 'Automatics, robotics and biocybernetics',
+                'description': 'md5:815fc1deb6b3a2bff99de2d5325be482',
+                'upload_date': '20130627',
+                'formats': 'mincount:16',
+                'subtitles': 'mincount:1',
+            },
+            'params': {
+                'force_generic_extractor': True,
+                'skip_download': True,
+            },
+        },
+        # SMIL from http://www1.wdr.de/mediathek/video/livestream/index.html
+        {
+            'url': 'http://metafilegenerator.de/WDR/WDR_FS/hds/hds.smil',
+            'info_dict': {
+                'id': 'hds',
+                'ext': 'flv',
+                'title': 'hds',
+                'formats': 'mincount:1',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
+        # SMIL from https://www.restudy.dk/video/play/id/1637
+        {
+            'url': 'https://www.restudy.dk/awsmedia/SmilDirectory/video_1637.xml',
+            'info_dict': {
+                'id': 'video_1637',
+                'ext': 'flv',
+                'title': 'video_1637',
+                'formats': 'mincount:3',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
+        # SMIL from http://adventure.howstuffworks.com/5266-cool-jobs-iditarod-musher-video.htm
+        {
+            'url': 'http://services.media.howstuffworks.com/videos/450221/smil-service.smil',
+            'info_dict': {
+                'id': 'smil-service',
+                'ext': 'flv',
+                'title': 'smil-service',
+                'formats': 'mincount:1',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
+        # SMIL from http://new.livestream.com/CoheedandCambria/WebsterHall/videos/4719370
+        {
+            'url': 'http://api.new.livestream.com/accounts/1570303/events/1585861/videos/4719370.smil',
+            'info_dict': {
+                'id': '4719370',
+                'ext': 'mp4',
+                'title': '571de1fd-47bc-48db-abf9-238872a58d1f',
+                'formats': 'mincount:3',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
+        # XSPF playlist from http://www.telegraaf.nl/tv/nieuws/binnenland/24353229/__Tikibad_ontruimd_wegens_brand__.html
+        {
+            'url': 'http://www.telegraaf.nl/xml/playlist/2015/8/7/mZlp2ctYIUEB.xspf',
+            'info_dict': {
+                'id': 'mZlp2ctYIUEB',
+                'ext': 'mp4',
+                'title': 'Tikibad ontruimd wegens brand',
+                'description': 'md5:05ca046ff47b931f9b04855015e163a4',
+                'thumbnail': 're:^https?://.*\.jpg$',
+                'duration': 33,
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
+        # MPD from http://dash-mse-test.appspot.com/media.html
+        {
+            'url': 'http://yt-dash-mse-test.commondatastorage.googleapis.com/media/car-20120827-manifest.mpd',
+            'md5': '4b57baab2e30d6eb3a6a09f0ba57ef53',
+            'info_dict': {
+                'id': 'car-20120827-manifest',
+                'ext': 'mp4',
+                'title': 'car-20120827-manifest',
+                'formats': 'mincount:9',
+                'upload_date': '20130904',
+            },
+            'params': {
+                'format': 'bestvideo',
+            },
+        },
+        # m3u8 served with Content-Type: audio/x-mpegURL; charset=utf-8
+        {
+            'url': 'http://once.unicornmedia.com/now/master/playlist/bb0b18ba-64f5-4b1b-a29f-0ac252f06b68/77a785f3-5188-4806-b788-0893a61634ed/93677179-2d99-4ef4-9e17-fe70d49abfbf/content.m3u8',
+            'info_dict': {
+                'id': 'content',
+                'ext': 'mp4',
+                'title': 'content',
+                'formats': 'mincount:8',
+            },
+            'params': {
+                # m3u8 downloads
+                'skip_download': True,
+            }
+        },
+        # m3u8 served with Content-Type: text/plain
+        {
+            'url': 'http://www.nacentapps.com/m3u8/index.m3u8',
+            'info_dict': {
+                'id': 'index',
+                'ext': 'mp4',
+                'title': 'index',
+                'upload_date': '20140720',
+                'formats': 'mincount:11',
+            },
+            'params': {
+                # m3u8 downloads
+                'skip_download': True,
+            }
+        },
         # google redirect
         {
             'url': 'http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCUQtwIwAA&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DcmQHVoWB5FY&ei=F-sNU-LLCaXk4QT52ICQBQ&usg=AFQjCNEw4hL29zgOohLXvpJ-Bdh2bils1Q&bvm=bv.61965928,d.bGE',
@@ -146,6 +288,22 @@ class GenericIE(InfoExtractor):
                 'skip_download': False,
             }
         },
+        {
+            # redirect in Refresh HTTP header
+            'url': 'https://www.facebook.com/l.php?u=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DpO8h3EaFRdo&h=TAQHsoToz&enc=AZN16h-b6o4Zq9pZkCCdOLNKMN96BbGMNtcFwHSaazus4JHT_MFYkAA-WARTX2kvsCIdlAIyHZjl6d33ILIJU7Jzwk_K3mcenAXoAzBNoZDI_Q7EXGDJnIhrGkLXo_LJ_pAa2Jzbx17UHMd3jAs--6j2zaeto5w9RTn8T_1kKg3fdC5WPX9Dbb18vzH7YFX0eSJmoa6SP114rvlkw6pkS1-T&s=1',
+            'info_dict': {
+                'id': 'pO8h3EaFRdo',
+                'ext': 'mp4',
+                'title': 'Tripeo Boiler Room x Dekmantel Festival DJ Set',
+                'description': 'md5:6294cc1af09c4049e0652b51a2df10d5',
+                'upload_date': '20150917',
+                'uploader_id': 'brtvofficial',
+                'uploader': 'Boiler Room',
+            },
+            'params': {
+                'skip_download': False,
+            },
+        },
         {
             'url': 'http://www.hodiho.fr/2013/02/regis-plante-sa-jeep.html',
             'md5': '85b90ccc9d73b4acd9138d3af4c27f89',
@@ -172,7 +330,7 @@ class GenericIE(InfoExtractor):
         # it also tests brightcove videos that need to set the 'Referer' in the
         # http requests
         {
-            'add_ie': ['Brightcove'],
+            'add_ie': ['BrightcoveLegacy'],
             'url': 'http://www.bfmtv.com/video/bfmbusiness/cours-bourse/cours-bourse-l-analyse-technique-154522/',
             'info_dict': {
                 'id': '2765128793001',
@@ -196,7 +354,7 @@ class GenericIE(InfoExtractor):
                 'uploader': 'thestar.com',
                 'description': 'Mississauga resident David Farmer is still out of power as a result of the ice storm a month ago. To keep the house warm, Farmer cuts wood from his property for a wood burning stove downstairs.',
             },
-            'add_ie': ['Brightcove'],
+            'add_ie': ['BrightcoveLegacy'],
         },
         {
             'url': 'http://www.championat.com/video/football/v/87/87499.html',
@@ -211,7 +369,7 @@ class GenericIE(InfoExtractor):
         },
         {
             # https://github.com/rg3/youtube-dl/issues/3541
-            'add_ie': ['Brightcove'],
+            'add_ie': ['BrightcoveLegacy'],
             'url': 'http://www.kijk.nl/sbs6/leermijvrouwenkennen/videos/jqMiXKAYan2S/aflevering-1',
             'info_dict': {
                 'id': '3866516442001',
@@ -233,21 +391,23 @@ class GenericIE(InfoExtractor):
                 'id': 'BwY2RxaTrTkslxOfcan0UCf0YqyvWysJ',
                 'ext': 'mp4',
                 'title': '2cc213299525360.mov',  # that's what we get
+                'duration': 238.231,
             },
             'add_ie': ['Ooyala'],
         },
-        # multiple ooyala embeds on SBN network websites
         {
-            'url': 'http://www.sbnation.com/college-football-recruiting/2015/2/3/7970291/national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
+            # ooyala video embedded with http://player.ooyala.com/iframe.js
+            'url': 'http://www.macrumors.com/2015/07/24/steve-jobs-the-man-in-the-machine-first-trailer/',
             'info_dict': {
-                'id': 'national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
-                'title': '25 lies you will tell yourself on National Signing Day - SBNation.com',
+                'id': 'p0MGJndjoG5SOKqO_hZJuZFPB-Tr5VgB',
+                'ext': 'mp4',
+                'title': '"Steve Jobs: Man in the Machine" trailer',
+                'description': 'The first trailer for the Alex Gibney documentary "Steve Jobs: Man in the Machine."',
+                'duration': 135.427,
             },
-            'playlist_mincount': 3,
             'params': {
                 'skip_download': True,
             },
-            'add_ie': ['Ooyala'],
         },
         # embed.ly video
         {
@@ -362,7 +522,7 @@ class GenericIE(InfoExtractor):
                 'description': 'md5:8145d19d320ff3e52f28401f4c4283b9',
             }
         },
-        # Embeded Ustream video
+        # Embedded Ustream video
         {
             'url': 'http://www.american.edu/spa/pti/nsa-privacy-janus-2014.cfm',
             'md5': '27b99cdb639c9b12a79bca876a073417',
@@ -437,7 +597,11 @@ class GenericIE(InfoExtractor):
                 'id': 'k2mm4bCdJ6CQ2i7c8o2',
                 'ext': 'mp4',
                 'title': 'Le Zap de Spi0n n°216 - Zapping du Web',
+                'description': 'md5:faf028e48a461b8b7fad38f1e104b119',
                 'uploader': 'Spi0n',
+                'uploader_id': 'xgditw',
+                'upload_date': '20140425',
+                'timestamp': 1398441542,
             },
             'add_ie': ['Dailymotion'],
         },
@@ -570,8 +734,11 @@ class GenericIE(InfoExtractor):
                 'id': 'uxjb0lwrcz',
                 'ext': 'mp4',
                 'title': 'Conversation about Hexagonal Rails Part 1 - ThoughtWorks',
+                'description': 'a Martin Fowler video from ThoughtWorks',
                 'duration': 1715.0,
                 'uploader': 'thoughtworks.wistia.com',
+                'upload_date': '20140603',
+                'timestamp': 1401832161,
             },
         },
         # Soundcloud embed
@@ -704,6 +871,19 @@ class GenericIE(InfoExtractor):
                 'title': 'Os Guinness // Is It Fools Talk? // Unbelievable? Conference 2014',
             },
         },
+        # Kaltura embed protected with referrer
+        {
+            'url': 'http://www.disney.nl/disney-channel/filmpjes/achter-de-schermen#/videoId/violetta-achter-de-schermen-ruggero',
+            'info_dict': {
+                'id': '1_g4fbemnq',
+                'ext': 'mp4',
+                'title': 'Violetta - Achter De Schermen - Ruggero',
+                'description': 'Achter de schermen met Ruggero',
+                'timestamp': 1435133761,
+                'upload_date': '20150624',
+                'uploader_id': 'echojecka',
+            },
+        },
         # Eagle.Platform embed (generic URL)
         {
             'url': 'http://lenta.ru/news/2015/03/06/navalny/',
@@ -809,6 +989,9 @@ class GenericIE(InfoExtractor):
                 'ext': 'flv',
                 'title': "PFT Live: New leader in the 'new-look' defense",
                 'description': 'md5:65a19b4bbfb3b0c0c5768bed1dfad74e',
+                'uploader': 'NBCU-SPORTS',
+                'upload_date': '20140107',
+                'timestamp': 1389118457,
             },
         },
         # UDN embed
@@ -828,8 +1011,9 @@ class GenericIE(InfoExtractor):
             'info_dict': {
                 'id': '50YnY4czr4ms1vJ7yz3xzq0excz_pUMs',
                 'ext': 'mp4',
-                'description': 'VIDEO: Index/Match versus VLOOKUP.',
+                'description': 'VIDEO: INDEX/MATCH versus VLOOKUP.',
                 'title': 'This is what separates the Excel masters from the wannabes',
+                'duration': 191.933,
             },
             'params': {
                 # m3u8 downloads
@@ -860,6 +1044,9 @@ class GenericIE(InfoExtractor):
                 'title': 'SN Presents: Russell Martin, World Citizen',
                 'description': 'To understand why he was the Toronto Blue Jays’ top off-season priority is to appreciate his background and upbringing in Montreal, where he first developed his baseball skills. Written and narrated by Stephen Brunt.',
                 'uploader': 'Rogers Sportsnet',
+                'uploader_id': '1704050871',
+                'upload_date': '20150525',
+                'timestamp': 1432570283,
             },
         },
         # Dailymotion Cloud video
@@ -905,7 +1092,85 @@ class GenericIE(InfoExtractor):
                 'description': 'New experience with Acrobat DC',
                 'duration': 248.667,
             },
-        }
+        },
+        # ScreenwaveMedia embed
+        {
+            'url': 'http://www.thecinemasnob.com/the-cinema-snob/a-nightmare-on-elm-street-2-freddys-revenge1',
+            'md5': '24ace5baba0d35d55c6810b51f34e9e0',
+            'info_dict': {
+                'id': 'cinemasnob-55d26273809dd',
+                'ext': 'mp4',
+                'title': 'cinemasnob',
+            },
+        },
+        # BrightcoveInPageEmbed embed
+        {
+            'url': 'http://www.geekandsundry.com/tabletop-bonus-wils-final-thoughts-on-dread/',
+            'info_dict': {
+                'id': '4238694884001',
+                'ext': 'flv',
+                'title': 'Tabletop: Dread, Last Thoughts',
+                'description': 'Tabletop: Dread, Last Thoughts',
+                'duration': 51690,
+            },
+        },
+        # JWPlayer with M3U8
+        {
+            'url': 'http://ren.tv/novosti/2015-09-25/sluchaynyy-prohozhiy-poymal-avtougonshchika-v-murmanske-video',
+            'info_dict': {
+                'id': 'playlist',
+                'ext': 'mp4',
+                'title': 'Случайный прохожий поймал автоугонщика в Мурманске. ВИДЕО | РЕН ТВ',
+                'uploader': 'ren.tv',
+            },
+            'params': {
+                # m3u8 downloads
+                'skip_download': True,
+            }
+        },
+        # Brightcove embed, with no valid 'renditions' but valid 'IOSRenditions'
+        # This video can't be played in browsers if Flash is disabled and the UA is set to iPhone, which is actually a false alarm
+        {
+            'url': 'https://dl.dropboxusercontent.com/u/29092637/interview.html',
+            'info_dict': {
+                'id': '4785848093001',
+                'ext': 'mp4',
+                'title': 'The Cardinal Pell Interview',
+                'description': 'Sky News Contributor Andrew Bolt interviews George Pell in Rome, following the Cardinal\'s evidence before the Royal Commission into Child Abuse. ',
+                'uploader': 'GlobeCast Australia - GlobeStream',
+                'uploader_id': '2733773828001',
+                'upload_date': '20160304',
+                'timestamp': 1457083087,
+            },
+            'params': {
+                # m3u8 downloads
+                'skip_download': True,
+            },
+        },
+        # Another form of arte.tv embed
+        {
+            'url': 'http://www.tv-replay.fr/redirection/09-04-16/arte-reportage-arte-11508975.html',
+            'md5': '850bfe45417ddf221288c88a0cffe2e2',
+            'info_dict': {
+                'id': '030273-562_PLUS7-F',
+                'ext': 'mp4',
+                'title': 'ARTE Reportage - Nulle part, en France',
+                'description': 'md5:e3a0e8868ed7303ed509b9e3af2b870d',
+                'upload_date': '20160409',
+            },
+        },
+        # LiveLeak embed
+        {
+            'url': 'http://www.wykop.pl/link/3088787/',
+            'md5': 'ace83b9ed19b21f68e1b50e844fdf95d',
+            'info_dict': {
+                'id': '874_1459135191',
+                'ext': 'mp4',
+                'title': 'Man shows poor quality of new apartment building',
+                'description': 'The wall is like a sand pile.',
+                'uploader': 'Lake8737',
+            }
+        },
     ]
 
     def report_following_redirect(self, new_url):
@@ -1048,28 +1313,36 @@ class GenericIE(InfoExtractor):
 
         full_response = None
         if head_response is False:
-            request = compat_urllib_request.Request(url)
+            request = sanitized_Request(url)
             request.add_header('Accept-Encoding', '*')
             full_response = self._request_webpage(request, video_id)
             head_response = full_response
 
+        info_dict = {
+            'id': video_id,
+            'title': compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]),
+            'upload_date': unified_strdate(head_response.headers.get('Last-Modified'))
+        }
+
         # Check for direct link to a video
-        content_type = head_response.headers.get('Content-Type', '')
-        m = re.match(r'^(?P<type>audio|video|application(?=/ogg$))/(?P<format_id>.+)$', content_type)
+        content_type = head_response.headers.get('Content-Type', '').lower()
+        m = re.match(r'^(?P<type>audio|video|application(?=/(?:ogg$|(?:vnd\.apple\.|x-)?mpegurl)))/(?P<format_id>[^;\s]+)', content_type)
         if m:
-            upload_date = unified_strdate(
-                head_response.headers.get('Last-Modified'))
-            return {
-                'id': video_id,
-                'title': compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]),
-                'direct': True,
-                'formats': [{
+            format_id = m.group('format_id')
+            if format_id.endswith('mpegurl'):
+                formats = self._extract_m3u8_formats(url, video_id, 'mp4')
+            elif format_id == 'f4m':
+                formats = self._extract_f4m_formats(url, video_id)
+            else:
+                formats = [{
                     'format_id': m.group('format_id'),
                     'url': url,
                     'vcodec': 'none' if m.group('type') == 'audio' else None
-                }],
-                'upload_date': upload_date,
-            }
+                }]
+                info_dict['direct'] = True
+            self._sort_formats(formats)
+            info_dict['formats'] = formats
+            return info_dict
 
         if not self._downloader.params.get('test', False) and not is_intentional:
             force = self._downloader.params.get('force_generic_extractor', False)
@@ -1077,7 +1350,7 @@ class GenericIE(InfoExtractor):
                 '%s on generic information extractor.' % ('Forcing' if force else 'Falling back'))
 
         if not full_response:
-            request = compat_urllib_request.Request(url)
+            request = sanitized_Request(url)
             # Some webservers may serve compressed content of a rather big size (e.g. gzipped flac)
             # making it impossible to download only a chunk of the file (yet we need only 512kB to
             # test whether it's HTML or not). According to youtube-dl's default Accept-Encoding
@@ -1089,32 +1362,50 @@ class GenericIE(InfoExtractor):
             request.add_header('Accept-Encoding', '*')
             full_response = self._request_webpage(request, video_id)
 
+        first_bytes = full_response.read(512)
+
+        # Is it an M3U playlist?
+        if first_bytes.startswith(b'#EXTM3U'):
+            info_dict['formats'] = self._extract_m3u8_formats(url, video_id, 'mp4')
+            self._sort_formats(info_dict['formats'])
+            return info_dict
+
         # Maybe it's a direct link to a video?
         # Be careful not to download the whole thing!
-        first_bytes = full_response.read(512)
         if not is_html(first_bytes):
             self._downloader.report_warning(
                 'URL could be a direct video link, returning it as such.')
-            upload_date = unified_strdate(
-                head_response.headers.get('Last-Modified'))
-            return {
-                'id': video_id,
-                'title': compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]),
+            info_dict.update({
                 'direct': True,
                 'url': url,
-                'upload_date': upload_date,
-            }
+            })
+            return info_dict
 
         webpage = self._webpage_read_content(
             full_response, url, video_id, prefix=first_bytes)
 
         self.report_extraction(video_id)
 
-        # Is it an RSS feed?
+        # Is it an RSS feed, a SMIL file, an XSPF playlist or a MPD manifest?
         try:
-            doc = parse_xml(webpage)
+            doc = compat_etree_fromstring(webpage.encode('utf-8'))
             if doc.tag == 'rss':
                 return self._extract_rss(url, video_id, doc)
+            elif re.match(r'^(?:{[^}]+})?smil$', doc.tag):
+                smil = self._parse_smil(doc, url, video_id)
+                self._sort_formats(smil['formats'])
+                return smil
+            elif doc.tag == '{http://xspf.org/ns/0/}playlist':
+                return self.playlist_result(self._parse_xspf(doc, video_id), video_id)
+            elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
+                info_dict['formats'] = self._parse_mpd_formats(
+                    doc, video_id, mpd_base_url=url.rpartition('/')[0])
+                self._sort_formats(info_dict['formats'])
+                return info_dict
+            elif re.match(r'^{http://ns\.adobe\.com/f4m/[12]\.0}manifest$', doc.tag):
+                info_dict['formats'] = self._parse_f4m_formats(doc, url, video_id)
+                self._sort_formats(info_dict['formats'])
+                return info_dict
         except compat_xml_parse_error:
             pass
 
@@ -1160,14 +1451,14 @@ class GenericIE(InfoExtractor):
             return self.playlist_result(
                 urlrs, playlist_id=video_id, playlist_title=video_title)
 
-        # Look for BrightCove:
-        bc_urls = BrightcoveIE._extract_brightcove_urls(webpage)
+        # Look for Brightcove Legacy Studio embeds
+        bc_urls = BrightcoveLegacyIE._extract_brightcove_urls(webpage)
         if bc_urls:
             self.to_screen('Brightcove video detected.')
             entries = [{
                 '_type': 'url',
                 'url': smuggle_url(bc_url, {'Referer': url}),
-                'ie_key': 'Brightcove'
+                'ie_key': 'BrightcoveLegacy'
             } for bc_url in bc_urls]
 
             return {
@@ -1177,6 +1468,11 @@ class GenericIE(InfoExtractor):
                 'entries': entries,
             }
 
+        # Look for Brightcove New Studio embeds
+        bc_urls = BrightcoveNewIE._extract_urls(webpage)
+        if bc_urls:
+            return _playlist_from_matches(bc_urls, ie='BrightcoveNew')
+
         # Look for embedded rtl.nl player
         matches = re.findall(
             r'<iframe[^>]+?src="((?:https?:)?//(?:www\.)?rtl\.nl/system/videoplayer/[^"]+(?:video_)?embed[^"]+)"',
@@ -1219,7 +1515,7 @@ class GenericIE(InfoExtractor):
 
         # Look for embedded Dailymotion player
         matches = re.findall(
-            r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/embed/video/.+?)\1', webpage)
+            r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
         if matches:
             return _playlist_from_matches(
                 matches, lambda m: unescapeHTML(m[1]))
@@ -1260,11 +1556,6 @@ class GenericIE(InfoExtractor):
                 'id': match.group('id')
             }
 
-        # Look for embedded blip.tv player
-        bliptv_url = BlipTVIE._extract_url(webpage)
-        if bliptv_url:
-            return self.url_result(bliptv_url, 'BlipTV')
-
         # Look for SVT player
         svt_url = SVTIE._extract_url(webpage)
         if svt_url:
@@ -1320,12 +1611,12 @@ class GenericIE(InfoExtractor):
             return self.url_result(mobj.group('url'))
 
         # Look for Ooyala videos
-        mobj = (re.search(r'player\.ooyala\.com/[^"?]+\?[^"]*?(?:embedCode|ec)=(?P<ec>[^"&]+)', webpage) or
+        mobj = (re.search(r'player\.ooyala\.com/[^"?]+[?#][^"]*?(?:embedCode|ec)=(?P<ec>[^"&]+)', webpage) or
                 re.search(r'OO\.Player\.create\([\'"].*?[\'"],\s*[\'"](?P<ec>.{32})[\'"]', webpage) or
                 re.search(r'SBN\.VideoLinkset\.ooyala\([\'"](?P<ec>.{32})[\'"]\)', webpage) or
                 re.search(r'data-ooyala-video-id\s*=\s*[\'"](?P<ec>.{32})[\'"]', webpage))
         if mobj is not None:
-            return OoyalaIE._build_url_result(mobj.group('ec'))
+            return OoyalaIE._build_url_result(smuggle_url(mobj.group('ec'), {'domain': url}))
 
         # Look for multiple Ooyala embeds on SBN network websites
         mobj = re.search(r'SBN\.VideoLinkset\.entryGroup\((\[.*?\])', webpage)
@@ -1333,7 +1624,7 @@ class GenericIE(InfoExtractor):
             embeds = self._parse_json(mobj.group(1), video_id, fatal=False)
             if embeds:
                 return _playlist_from_matches(
-                    embeds, getter=lambda v: OoyalaIE._url_for_embed_code(v['provider_video_id']), ie='Ooyala')
+                    embeds, getter=lambda v: OoyalaIE._url_for_embed_code(smuggle_url(v['provider_video_id'], {'domain': url})), ie='Ooyala')
 
         # Look for Aparat videos
         mobj = re.search(r'<iframe .*?src="(http://www\.aparat\.com/video/[^"]+)"', webpage)
@@ -1369,6 +1660,11 @@ class GenericIE(InfoExtractor):
         if mobj is not None:
             return self.url_result(mobj.group('url'), 'VK')
 
+        # Look for embedded Odnoklassniki player
+        mobj = re.search(r'<iframe[^>]+?src=(["\'])(?P<url>https?://(?:odnoklassniki|ok)\.ru/videoembed/.+?)\1', webpage)
+        if mobj is not None:
+            return self.url_result(mobj.group('url'), 'Odnoklassniki')
+
         # Look for embedded ivi player
         mobj = re.search(r'<embed[^>]+?src=(["\'])(?P<url>https?://(?:www\.)?ivi\.ru/video/player.+?)\1', webpage)
         if mobj is not None:
@@ -1424,6 +1720,11 @@ class GenericIE(InfoExtractor):
         if xhamster_urls:
             return _playlist_from_matches(xhamster_urls, ie='XHamsterEmbed')
 
+        # Look for embedded TNAFlixNetwork player
+        tnaflix_urls = TNAFlixNetworkEmbedIE._extract_urls(webpage)
+        if tnaflix_urls:
+            return _playlist_from_matches(tnaflix_urls, ie=TNAFlixNetworkEmbedIE.ie_key())
+
         # Look for embedded Tvigle player
         mobj = re.search(
             r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//cloud\.tvigle\.ru/video/.+?)\1', webpage)
@@ -1444,7 +1745,7 @@ class GenericIE(InfoExtractor):
 
         # Look for embedded arte.tv player
         mobj = re.search(
-            r'<script [^>]*?src="(?P<url>http://www\.arte\.tv/playerv2/embed[^"]+)"',
+            r'<(?:script|iframe) [^>]*?src="(?P<url>http://www\.arte\.tv/(?:playerv2/embed|arte_vp/index)[^"]+)"',
             webpage)
         if mobj is not None:
             return self.url_result(mobj.group('url'), 'ArteTVEmbed')
@@ -1466,7 +1767,7 @@ class GenericIE(InfoExtractor):
         if myvi_url:
             return self.url_result(myvi_url)
 
-        # Look for embeded soundcloud player
+        # Look for embedded soundcloud player
         mobj = re.search(
             r'<iframe\s+(?:[a-zA-Z0-9_-]+="[^"]+"\s+)*src="(?P<url>https?://(?:w\.)?soundcloud\.com/player[^"]+)"',
             webpage)
@@ -1483,12 +1784,9 @@ class GenericIE(InfoExtractor):
             return self.url_result(url, ie='Vulture')
 
         # Look for embedded mtvservices player
-        mobj = re.search(
-            r'<iframe src="(?P<url>https?://media\.mtvnservices\.com/embed/[^"]+)"',
-            webpage)
-        if mobj is not None:
-            url = unescapeHTML(mobj.group('url'))
-            return self.url_result(url, ie='MTVServicesEmbedded')
+        mtvservices_url = MTVServicesEmbeddedIE._extract_url(webpage)
+        if mtvservices_url:
+            return self.url_result(mtvservices_url, ie='MTVServicesEmbedded')
 
         # Look for embedded yahoo player
         mobj = re.search(
@@ -1527,7 +1825,7 @@ class GenericIE(InfoExtractor):
             return self.url_result(mobj.group('url'), 'MLB')
 
         mobj = re.search(
-            r'<iframe[^>]+?src=(["\'])(?P<url>%s)\1' % CondeNastIE.EMBED_URL,
+            r'<(?:iframe|script)[^>]+?src=(["\'])(?P<url>%s)\1' % CondeNastIE.EMBED_URL,
             webpage)
         if mobj is not None:
             return self.url_result(self._proto_relative_url(mobj.group('url'), scheme='http:'), 'CondeNast')
@@ -1545,10 +1843,12 @@ class GenericIE(InfoExtractor):
             return self.url_result(mobj.group('url'), 'Zapiks')
 
         # Look for Kaltura embeds
-        mobj = (re.search(r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?'wid'\s*:\s*'_?(?P<partner_id>[^']+)',.*?'entry_id'\s*:\s*'(?P<id>[^']+)',", webpage) or
-                re.search(r'(?s)(["\'])(?:https?:)?//cdnapisec\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?\1.*?entry_id\s*:\s*(["\'])(?P<id>[^\2]+?)\2', webpage))
+        mobj = (re.search(r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?'wid'\s*:\s*'_?(?P<partner_id>[^']+)',.*?'entry_?[Ii]d'\s*:\s*'(?P<id>[^']+)',", webpage) or
+                re.search(r'(?s)(?P<q1>["\'])(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?(?P=q1).*?entry_?[Ii]d\s*:\s*(?P<q2>["\'])(?P<id>.+?)(?P=q2)', webpage))
         if mobj is not None:
-            return self.url_result('kaltura:%(partner_id)s:%(id)s' % mobj.groupdict(), 'Kaltura')
+            return self.url_result(smuggle_url(
+                'kaltura:%(partner_id)s:%(id)s' % mobj.groupdict(),
+                {'source_url': url}), 'Kaltura')
 
         # Look for Eagle.Platform embeds
         mobj = re.search(
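
smuggle_url() from youtube_dl.utils piggybacks extra data onto the URL (in a fragment) so the Kaltura extractor can recover the referring page; a round trip looks like this (the ids are hypothetical):

    from youtube_dl.utils import smuggle_url, unsmuggle_url

    url = smuggle_url('kaltura:1234:1_g4fbemnq', {'source_url': 'http://example.com/page'})
    print(unsmuggle_url(url))  # ('kaltura:1234:1_g4fbemnq', {'source_url': 'http://example.com/page'})
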
@@ -1563,10 +1863,14 @@ class GenericIE(InfoExtractor):
             return self.url_result('eagleplatform:%(host)s:%(id)s' % mobj.groupdict(), 'EaglePlatform')
 
         # Look for Pladform embeds
-        mobj = re.search(
-            r'<iframe[^>]+src="(?P<url>https?://out\.pladform\.ru/player\?.+?)"', webpage)
-        if mobj is not None:
-            return self.url_result(mobj.group('url'), 'Pladform')
+        pladform_url = PladformIE._extract_url(webpage)
+        if pladform_url:
+            return self.url_result(pladform_url)
+
+        # Look for Videomore embeds
+        videomore_url = VideomoreIE._extract_url(webpage)
+        if videomore_url:
+            return self.url_result(videomore_url)
 
         # Look for Playwire embeds
         mobj = re.search(
@@ -1591,9 +1895,14 @@ class GenericIE(InfoExtractor):
         if nbc_sports_url:
             return self.url_result(nbc_sports_url, 'NBCSportsVPlayer')
 
+        # Look for Google Drive embeds
+        google_drive_url = GoogleDriveIE._extract_url(webpage)
+        if google_drive_url:
+            return self.url_result(google_drive_url, 'GoogleDrive')
+
         # Look for UDN embeds
         mobj = re.search(
-            r'<iframe[^>]+src="(?P<url>%s)"' % UDNEmbedIE._VALID_URL, webpage)
+            r'<iframe[^>]+src="(?P<url>%s)"' % UDNEmbedIE._PROTOCOL_RELATIVE_VALID_URL, webpage)
         if mobj is not None:
             return self.url_result(
                 compat_urlparse.urljoin(url, mobj.group('url')), 'UDNEmbed')
@@ -1618,6 +1927,32 @@ class GenericIE(InfoExtractor):
         if snagfilms_url:
             return self.url_result(snagfilms_url)
 
+        # Look for JWPlatform embeds
+        jwplatform_url = JWPlatformIE._extract_url(webpage)
+        if jwplatform_url:
+            return self.url_result(jwplatform_url, 'JWPlatform')
+
+        # Look for ScreenwaveMedia embeds
+        mobj = re.search(ScreenwaveMediaIE.EMBED_PATTERN, webpage)
+        if mobj is not None:
+            return self.url_result(unescapeHTML(mobj.group('url')), 'ScreenwaveMedia')
+
+        # Look for Digiteka embeds
+        digiteka_url = DigitekaIE._extract_url(webpage)
+        if digiteka_url:
+            return self.url_result(self._proto_relative_url(digiteka_url), DigitekaIE.ie_key())
+
+        # Look for Limelight embeds
+        mobj = re.search(r'LimelightPlayer\.doLoad(Media|Channel|ChannelList)\(["\'](?P<id>[a-z0-9]{32})', webpage)
+        if mobj:
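+            # Map the doLoad* method suffix onto the scheme component of limelight: URLs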
+            lm = {
+                'Media': 'media',
+                'Channel': 'channel',
+                'ChannelList': 'channel_list',
+            }
+            return self.url_result('limelight:%s:%s' % (
+                lm[mobj.group(1)], mobj.group(2)), 'Limelight%s' % mobj.group(1), mobj.group(2))
+
         # Look for AdobeTVVideo embeds
         mobj = re.search(
             r'<iframe[^>]+src=[\'"]((?:https?:)?//video\.tv\.adobe\.com/v/\d+[^"]+)[\'"]',
@@ -1627,6 +1962,25 @@ class GenericIE(InfoExtractor):
                 self._proto_relative_url(unescapeHTML(mobj.group(1))),
                 'AdobeTVVideo')
 
+        # Look for Vine embeds
+        mobj = re.search(
+            r'<iframe[^>]+src=[\'"]((?:https?:)?//(?:www\.)?vine\.co/v/[^/]+/embed/(?:simple|postcard))',
+            webpage)
+        if mobj is not None:
+            return self.url_result(
+                self._proto_relative_url(unescapeHTML(mobj.group(1))), 'Vine')
+
+        # Look for Instagram embeds
+        instagram_embed_url = InstagramIE._extract_embed_url(webpage)
+        if instagram_embed_url is not None:
+            return self.url_result(
+                self._proto_relative_url(instagram_embed_url), InstagramIE.ie_key())
+
+        # Look for LiveLeak embeds
+        liveleak_url = LiveLeakIE._extract_url(webpage)
+        if liveleak_url:
+            return self.url_result(liveleak_url, 'LiveLeak')
+
         def check_video(vurl):
             if YoutubeIE.suitable(vurl):
                 return True
@@ -1655,7 +2009,7 @@ class GenericIE(InfoExtractor):
         if not found:
             # Broaden the findall a little bit: JWPlayer JS loader
             found = filter_video(re.findall(
-                r'[^A-Za-z0-9]?file["\']?:\s*["\'](http(?![^\'"]+\.[0-9]+[\'"])[^\'"]+)["\']', webpage))
+                r'[^A-Za-z0-9]?(?:file|video_url)["\']?:\s*["\'](http(?![^\'"]+\.[0-9]+[\'"])[^\'"]+)["\']', webpage))
         if not found:
             # Flow player
             found = filter_video(re.findall(r'''(?xs)
@@ -1681,7 +2035,7 @@ class GenericIE(InfoExtractor):
                 found = filter_video(re.findall(r'<meta.*?property="og:video".*?content="(.*?)"', webpage))
         if not found:
             # HTML5 video
-            found = re.findall(r'(?s)<video[^<]*(?:>.*?<source[^>]*)?\s+src=["\'](.*?)["\']', webpage)
+            found = re.findall(r'(?s)<(?:video|audio)[^<]*(?:>.*?<source[^>]*)?\s+src=["\'](.*?)["\']', webpage)
         if not found:
             REDIRECT_REGEX = r'[0-9]{,2};\s*(?:URL|url)=\'?([^\'"]+)'
             found = re.search(
@@ -1692,6 +2046,9 @@ class GenericIE(InfoExtractor):
                 # Look also in Refresh HTTP header
                 refresh_header = head_response.headers.get('Refresh')
                 if refresh_header:
+                    # In Python 2, response HTTP headers are bytestrings
+                    if sys.version_info < (3, 0) and isinstance(refresh_header, str):
+                        refresh_header = refresh_header.decode('iso-8859-1')
                     found = re.search(REDIRECT_REGEX, refresh_header)
             if found:
                 new_url = compat_urlparse.urljoin(url, unescapeHTML(found.group(1)))
@@ -1705,6 +2062,8 @@ class GenericIE(InfoExtractor):
 
         entries = []
         for video_url in found:
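+            # URLs scraped from markup/JS may be HTML-escaped or carry JSON-escaped slashes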
+            video_url = unescapeHTML(video_url)
+            video_url = video_url.replace('\\/', '/')
             video_url = compat_urlparse.urljoin(url, video_url)
             video_id = compat_urllib_parse_unquote(os.path.basename(video_url))
 
@@ -1716,22 +2075,31 @@ class GenericIE(InfoExtractor):
             # here's a fun little line of code for you:
             video_id = os.path.splitext(video_id)[0]
 
-            if determine_ext(video_url) == 'smil':
-                entries.append({
-                    'id': video_id,
-                    'formats': self._extract_smil_formats(video_url, video_id),
-                    'uploader': video_uploader,
-                    'title': video_title,
-                    'age_limit': age_limit,
-                })
+            entry_info_dict = {
+                'id': video_id,
+                'uploader': video_uploader,
+                'title': video_title,
+                'age_limit': age_limit,
+            }
+
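+            # Dispatch manifest URLs (smil/xspf/m3u8/mpd/f4m) to their format
+            # extractors; anything else is treated as a direct media URL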
+            ext = determine_ext(video_url)
+            if ext == 'smil':
+                entry_info_dict['formats'] = self._extract_smil_formats(video_url, video_id)
+            elif ext == 'xspf':
+                return self.playlist_result(self._extract_xspf_playlist(video_url, video_id), video_id)
+            elif ext == 'm3u8':
+                entry_info_dict['formats'] = self._extract_m3u8_formats(video_url, video_id, ext='mp4')
+            elif ext == 'mpd':
+                entry_info_dict['formats'] = self._extract_mpd_formats(video_url, video_id)
+            elif ext == 'f4m':
+                entry_info_dict['formats'] = self._extract_f4m_formats(video_url, video_id)
             else:
-                entries.append({
-                    'id': video_id,
-                    'url': video_url,
-                    'uploader': video_uploader,
-                    'title': video_title,
-                    'age_limit': age_limit,
-                })
+                entry_info_dict['url'] = video_url
+
+            if entry_info_dict.get('formats'):
+                self._sort_formats(entry_info_dict['formats'])
+
+            entries.append(entry_info_dict)
 
         if len(entries) == 1:
             return entries[0]
index 9561ed5fbaa25404654303956a676b000da2af67..62ff84835c87b28d18ace1afa5eee19f894d198d 100644 (file)
@@ -2,6 +2,7 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
+from ..utils import unified_strdate
 
 
 class GlideIE(InfoExtractor):
@@ -15,26 +16,38 @@ class GlideIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'Damon Timm\'s Glide message',
             'thumbnail': 're:^https?://.*?\.cloudfront\.net/.*\.jpg$',
+            'uploader': 'Damon Timm',
+            'upload_date': '20140919',
         }
     }
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
+
         webpage = self._download_webpage(url, video_id)
+
         title = self._html_search_regex(
-            r'<title>(.*?)</title>', webpage, 'title')
-        video_url = self.http_scheme() + self._search_regex(
-            r'<source src="(.*?)" type="video/mp4">', webpage, 'video URL')
-        thumbnail_url = self._search_regex(
-            r'<img id="video-thumbnail" src="(.*?)"',
-            webpage, 'thumbnail url', fatal=False)
-        thumbnail = (
-            thumbnail_url if thumbnail_url is None
-            else self.http_scheme() + thumbnail_url)
+            r'<title>(.+?)</title>', webpage, 'title')
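+        # Prefer explicit <source>/<img> tags, falling back to Open Graph metadata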
+        video_url = self._proto_relative_url(self._search_regex(
+            r'<source[^>]+src=(["\'])(?P<url>.+?)\1',
+            webpage, 'video URL', default=None,
+            group='url')) or self._og_search_video_url(webpage)
+        thumbnail = self._proto_relative_url(self._search_regex(
+            r'<img[^>]+id=["\']video-thumbnail["\'][^>]+src=(["\'])(?P<url>.+?)\1',
+            webpage, 'thumbnail url', default=None,
+            group='url')) or self._og_search_thumbnail(webpage)
+        uploader = self._search_regex(
+            r'<div[^>]+class=["\']info-name["\'][^>]*>([^<]+)',
+            webpage, 'uploader', fatal=False)
+        upload_date = unified_strdate(self._search_regex(
+            r'<div[^>]+class="info-date"[^>]*>([^<]+)',
+            webpage, 'upload date', fatal=False))
 
         return {
             'id': video_id,
             'title': title,
             'url': video_url,
             'thumbnail': thumbnail,
+            'uploader': uploader,
+            'upload_date': upload_date,
         }
index 8a95793cae07734e67340bf49db088cdb043d1cb..3de8356f68ef67e0913fd958995ad1d3e48ac62f 100644 (file)
@@ -13,79 +13,59 @@ from ..compat import (
 from ..utils import (
     ExtractorError,
     float_or_none,
+    int_or_none,
+    str_or_none,
 )
 
 
 class GloboIE(InfoExtractor):
-    _VALID_URL = 'https?://.+?\.globo\.com/(?P<id>.+)'
+    _VALID_URL = '(?:globo:|https?://.+?\.globo\.com/(?:[^/]+/)*(?:v/(?:[^/]+/)?|videos/))(?P<id>\d{7,})'
 
     _API_URL_TEMPLATE = 'http://api.globovideos.com/videos/%s/playlist'
     _SECURITY_URL_TEMPLATE = 'http://security.video.globo.com/videos/%s/hash?player=flash&version=17.0.0.132&resource_id=%s'
 
-    _VIDEOID_REGEXES = [
-        r'\bdata-video-id="(\d+)"',
-        r'\bdata-player-videosids="(\d+)"',
-        r'<div[^>]+\bid="(\d+)"',
-    ]
-
     _RESIGN_EXPIRATION = 86400
 
-    _TESTS = [
-        {
-            'url': 'http://globotv.globo.com/sportv/futebol-nacional/v/os-gols-de-atletico-mg-3-x-2-santos-pela-24a-rodada-do-brasileirao/3654973/',
-            'md5': '03ebf41cb7ade43581608b7d9b71fab0',
-            'info_dict': {
-                'id': '3654973',
-                'ext': 'mp4',
-                'title': 'Os gols de Atlético-MG 3 x 2 Santos pela 24ª rodada do Brasileirão',
-                'duration': 251.585,
-                'uploader': 'SporTV',
-                'uploader_id': 698,
-                'like_count': int,
-            }
+    _TESTS = [{
+        'url': 'http://g1.globo.com/carros/autoesporte/videos/t/exclusivos-do-g1/v/mercedes-benz-gla-passa-por-teste-de-colisao-na-europa/3607726/',
+        'md5': 'b3ccc801f75cd04a914d51dadb83a78d',
+        'info_dict': {
+            'id': '3607726',
+            'ext': 'mp4',
+            'title': 'Mercedes-Benz GLA passa por teste de colisão na Europa',
+            'duration': 103.204,
+            'uploader': 'Globo.com',
+            'uploader_id': '265',
         },
-        {
-            'url': 'http://g1.globo.com/carros/autoesporte/videos/t/exclusivos-do-g1/v/mercedes-benz-gla-passa-por-teste-de-colisao-na-europa/3607726/',
-            'md5': 'b3ccc801f75cd04a914d51dadb83a78d',
-            'info_dict': {
-                'id': '3607726',
-                'ext': 'mp4',
-                'title': 'Mercedes-Benz GLA passa por teste de colisão na Europa',
-                'duration': 103.204,
-                'uploader': 'Globo.com',
-                'uploader_id': 265,
-                'like_count': int,
-            }
+    }, {
+        'url': 'http://globoplay.globo.com/v/4581987/',
+        'md5': 'f36a1ecd6a50da1577eee6dd17f67eff',
+        'info_dict': {
+            'id': '4581987',
+            'ext': 'mp4',
+            'title': 'Acidentes de trânsito estão entre as maiores causas de queda de energia em SP',
+            'duration': 137.973,
+            'uploader': 'Rede Globo',
+            'uploader_id': '196',
         },
-        {
-            'url': 'http://g1.globo.com/jornal-nacional/noticia/2014/09/novidade-na-fiscalizacao-de-bagagem-pela-receita-provoca-discussoes.html',
-            'md5': '307fdeae4390ccfe6ba1aa198cf6e72b',
-            'info_dict': {
-                'id': '3652183',
-                'ext': 'mp4',
-                'title': 'Receita Federal explica como vai fiscalizar bagagens de quem retorna ao Brasil de avião',
-                'duration': 110.711,
-                'uploader': 'Rede Globo',
-                'uploader_id': 196,
-                'like_count': int,
-            }
-        },
-        {
-            'url': 'http://globotv.globo.com/canal-brasil/sangue-latino/t/todos-os-videos/v/ator-e-diretor-argentino-ricado-darin-fala-sobre-utopias-e-suas-perdas/3928201/',
-            'md5': 'c1defca721ce25b2354e927d3e4b3dec',
-            'info_dict': {
-                'id': '3928201',
-                'ext': 'mp4',
-                'title': 'Ator e diretor argentino, Ricado Darín fala sobre utopias e suas perdas',
-                'duration': 1472.906,
-                'uploader': 'Canal Brasil',
-                'uploader_id': 705,
-                'like_count': int,
-            }
-        },
-    ]
-
-    class MD5():
+    }, {
+        'url': 'http://canalbrasil.globo.com/programas/sangue-latino/videos/3928201.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://globosatplay.globo.com/globonews/v/4472924/',
+        'only_matching': True,
+    }, {
+        'url': 'http://globotv.globo.com/t/programa/v/clipe-sexo-e-as-negas-adeus/3836166/',
+        'only_matching': True,
+    }, {
+        'url': 'http://globotv.globo.com/canal-brasil/sangue-latino/t/todos-os-videos/v/ator-e-diretor-argentino-ricado-darin-fala-sobre-utopias-e-suas-perdas/3928201/',
+        'only_matching': True,
+    }, {
+        'url': 'http://canaloff.globo.com/programas/desejar-profundo/videos/4518560.html',
+        'only_matching': True,
+    }]
+
+    class MD5(object):
         HEX_FORMAT_LOWERCASE = 0
         HEX_FORMAT_UPPERCASE = 1
         BASE64_PAD_CHARACTER_DEFAULT_COMPLIANCE = ''
@@ -352,23 +332,15 @@ class GloboIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        webpage = self._download_webpage(url, video_id)
-        video_id = self._search_regex(self._VIDEOID_REGEXES, webpage, 'video id')
-
         video = self._download_json(
             self._API_URL_TEMPLATE % video_id, video_id)['videos'][0]
 
         title = video['title']
-        duration = float_or_none(video['duration'], 1000)
-        like_count = video['likes']
-        uploader = video['channel']
-        uploader_id = video['channel_id']
 
         formats = []
-
         for resource in video['resources']:
             resource_id = resource.get('_id')
-            if not resource_id:
+            if not resource_id or resource_id.endswith('manifest'):
                 continue
 
             security = self._download_json(
@@ -397,22 +369,68 @@ class GloboIE(InfoExtractor):
             resource_url = resource['url']
             signed_url = '%s?h=%s&k=%s' % (resource_url, signed_hash, 'flash')
             if resource_id.endswith('m3u8') or resource_url.endswith('.m3u8'):
-                formats.extend(self._extract_m3u8_formats(signed_url, resource_id, 'mp4'))
+                formats.extend(self._extract_m3u8_formats(
+                    signed_url, resource_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls', fatal=False))
             else:
                 formats.append({
                     'url': signed_url,
-                    'format_id': resource_id,
-                    'height': resource.get('height'),
+                    'format_id': 'http-%s' % resource_id,
+                    'height': int_or_none(resource.get('height')),
                 })
 
         self._sort_formats(formats)
 
+        duration = float_or_none(video.get('duration'), 1000)
+        uploader = video.get('channel')
+        uploader_id = str_or_none(video.get('channel_id'))
+
         return {
             'id': video_id,
             'title': title,
             'duration': duration,
             'uploader': uploader,
             'uploader_id': uploader_id,
-            'like_count': like_count,
             'formats': formats
         }
+
+
+class GloboArticleIE(InfoExtractor):
+    _VALID_URL = 'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/]+)\.html'
+
+    _VIDEOID_REGEXES = [
+        r'\bdata-video-id=["\'](\d{7,})',
+        r'\bdata-player-videosids=["\'](\d{7,})',
+        r'\bvideosIDs\s*:\s*["\'](\d{7,})',
+        r'\bdata-id=["\'](\d{7,})',
+        r'<div[^>]+\bid=["\'](\d{7,})',
+    ]
+
+    _TESTS = [{
+        'url': 'http://g1.globo.com/jornal-nacional/noticia/2014/09/novidade-na-fiscalizacao-de-bagagem-pela-receita-provoca-discussoes.html',
+        'md5': '307fdeae4390ccfe6ba1aa198cf6e72b',
+        'info_dict': {
+            'id': '3652183',
+            'ext': 'mp4',
+            'title': 'Receita Federal explica como vai fiscalizar bagagens de quem retorna ao Brasil de avião',
+            'duration': 110.711,
+            'uploader': 'Rede Globo',
+            'uploader_id': '196',
+        }
+    }, {
+        'url': 'http://gq.globo.com/Prazeres/Poder/noticia/2015/10/all-o-desafio-assista-ao-segundo-capitulo-da-serie.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://gshow.globo.com/programas/tv-xuxa/O-Programa/noticia/2014/01/xuxa-e-junno-namoram-muuuito-em-luau-de-zeze-di-camargo-e-luciano.html',
+        'only_matching': True,
+    }]
+
+    @classmethod
+    def suitable(cls, url):
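+        # Defer to GloboIE for URLs it matches directly; handle only article pages here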
+        return False if GloboIE.suitable(url) else super(GloboArticleIE, cls).suitable(url)
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        video_id = self._search_regex(self._VIDEOID_REGEXES, webpage, 'video id')
+        return self.url_result('globo:%s' % video_id, 'Globo')
diff --git a/youtube_dl/extractor/googledrive.py b/youtube_dl/extractor/googledrive.py
new file mode 100644 (file)
index 0000000..766fc26
--- /dev/null
@@ -0,0 +1,92 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+)
+
+
+class GoogleDriveIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:(?:docs|drive)\.google\.com/(?:uc\?.*?id=|file/d/)|video\.google\.com/get_player\?.*?docid=)(?P<id>[a-zA-Z0-9_-]{28,})'
+    _TESTS = [{
+        'url': 'https://drive.google.com/file/d/0ByeS4oOUV-49Zzh4R1J6R09zazQ/edit?pli=1',
+        'md5': '881f7700aec4f538571fa1e0eed4a7b6',
+        'info_dict': {
+            'id': '0ByeS4oOUV-49Zzh4R1J6R09zazQ',
+            'ext': 'mp4',
+            'title': 'Big Buck Bunny.mp4',
+            'duration': 46,
+        }
+    }, {
+        # video id is longer than 28 characters
+        'url': 'https://drive.google.com/file/d/1ENcQ_jeCuj7y19s66_Ou9dRP4GKGsodiDQ/edit',
+        'only_matching': True,
+    }]
+    _FORMATS_EXT = {
+        '5': 'flv',
+        '6': 'flv',
+        '13': '3gp',
+        '17': '3gp',
+        '18': 'mp4',
+        '22': 'mp4',
+        '34': 'flv',
+        '35': 'flv',
+        '36': '3gp',
+        '37': 'mp4',
+        '38': 'mp4',
+        '43': 'webm',
+        '44': 'webm',
+        '45': 'webm',
+        '46': 'webm',
+        '59': 'mp4',
+    }
+
+    @staticmethod
+    def _extract_url(webpage):
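+        # Used by the generic extractor to detect Google Drive iframe embeds in pages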
+        mobj = re.search(
+            r'<iframe[^>]+src="https?://(?:video\.google\.com/get_player\?.*?docid=|(?:docs|drive)\.google\.com/file/d/)(?P<id>[a-zA-Z0-9_-]{28,})',
+            webpage)
+        if mobj:
+            return 'https://drive.google.com/file/d/%s' % mobj.group('id')
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
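+        # The player page stores metadata in escaped JS string literals, so the
+        # page is decoded with unicode_escape before running the regexes below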
+        webpage = self._download_webpage(
+            'http://docs.google.com/file/d/%s' % video_id, video_id, encoding='unicode_escape')
+
+        reason = self._search_regex(r'"reason"\s*,\s*"([^"]+)', webpage, 'reason', default=None)
+        if reason:
+            raise ExtractorError(reason)
+
+        title = self._search_regex(r'"title"\s*,\s*"([^"]+)', webpage, 'title')
+        duration = int_or_none(self._search_regex(
+            r'"length_seconds"\s*,\s*"([^"]+)', webpage, 'length seconds', default=None))
+        fmt_stream_map = self._search_regex(
+            r'"fmt_stream_map"\s*,\s*"([^"]+)', webpage, 'fmt stream map').split(',')
+        fmt_list = self._search_regex(r'"fmt_list"\s*,\s*"([^"]+)', webpage, 'fmt_list').split(',')
+
+        formats = []
+        for fmt, fmt_stream in zip(fmt_list, fmt_stream_map):
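+            # Each fmt entry looks like 'itag/WIDTHxHEIGHT[/...]'; each stream entry like 'itag|url'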
+            fmt_id, fmt_url = fmt_stream.split('|')
+            resolution = fmt.split('/')[1]
+            width, height = resolution.split('x')
+            formats.append({
+                'url': fmt_url,
+                'format_id': fmt_id,
+                'resolution': resolution,
+                'width': int_or_none(width),
+                'height': int_or_none(height),
+                'ext': self._FORMATS_EXT[fmt_id],
+            })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': self._og_search_thumbnail(webpage, default=None),
+            'duration': duration,
+            'formats': formats,
+        }
index fcefe54cd1207f1a57000c04b7fb460590f2024e..731bacd673bd57fe82411268c5920a3e9c7447ac 100644 (file)
@@ -61,7 +61,7 @@ class GooglePlusIE(InfoExtractor):
             'width': int(width),
             'height': int(height),
         } for width, height, video_url in re.findall(
-            r'\d+,(\d+),(\d+),"(https?://redirector\.googlevideo\.com.*?)"', webpage)]
+            r'\d+,(\d+),(\d+),"(https?://[^.]+\.googleusercontent.com.*?)"', webpage)]
         self._sort_formats(formats)
 
         return {
diff --git a/youtube_dl/extractor/gorillavid.py b/youtube_dl/extractor/gorillavid.py
deleted file mode 100644 (file)
index f006f0c..0000000
+++ /dev/null
@@ -1,116 +0,0 @@
-# -*- coding: utf-8 -*-
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
-)
-from ..utils import (
-    ExtractorError,
-    int_or_none,
-)
-
-
-class GorillaVidIE(InfoExtractor):
-    IE_DESC = 'GorillaVid.in, daclips.in, movpod.in, fastvideo.in and realvid.net'
-    _VALID_URL = r'''(?x)
-        https?://(?P<host>(?:www\.)?
-            (?:daclips\.in|gorillavid\.in|movpod\.in|fastvideo\.in|realvid\.net))/
-        (?:embed-)?(?P<id>[0-9a-zA-Z]+)(?:-[0-9]+x[0-9]+\.html)?
-    '''
-
-    _FILE_NOT_FOUND_REGEX = r'>(?:404 - )?File Not Found<'
-
-    _TESTS = [{
-        'url': 'http://gorillavid.in/06y9juieqpmi',
-        'md5': '5ae4a3580620380619678ee4875893ba',
-        'info_dict': {
-            'id': '06y9juieqpmi',
-            'ext': 'flv',
-            'title': 'Rebecca Black My Moment Official Music Video Reaction-6GK87Rc8bzQ',
-            'thumbnail': 're:http://.*\.jpg',
-        },
-    }, {
-        'url': 'http://gorillavid.in/embed-z08zf8le23c6-960x480.html',
-        'only_matching': True,
-    }, {
-        'url': 'http://daclips.in/3rso4kdn6f9m',
-        'md5': '1ad8fd39bb976eeb66004d3a4895f106',
-        'info_dict': {
-            'id': '3rso4kdn6f9m',
-            'ext': 'mp4',
-            'title': 'Micro Pig piglets ready on 16th July 2009-bG0PdrCdxUc',
-            'thumbnail': 're:http://.*\.jpg',
-        }
-    }, {
-        # video with countdown timeout
-        'url': 'http://fastvideo.in/1qmdn1lmsmbw',
-        'md5': '8b87ec3f6564a3108a0e8e66594842ba',
-        'info_dict': {
-            'id': '1qmdn1lmsmbw',
-            'ext': 'mp4',
-            'title': 'Man of Steel - Trailer',
-            'thumbnail': 're:http://.*\.jpg',
-        },
-    }, {
-        'url': 'http://realvid.net/ctn2y6p2eviw',
-        'md5': 'b2166d2cf192efd6b6d764c18fd3710e',
-        'info_dict': {
-            'id': 'ctn2y6p2eviw',
-            'ext': 'flv',
-            'title': 'rdx 1955',
-            'thumbnail': 're:http://.*\.jpg',
-        },
-    }, {
-        'url': 'http://movpod.in/0wguyyxi1yca',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        webpage = self._download_webpage('http://%s/%s' % (mobj.group('host'), video_id), video_id)
-
-        if re.search(self._FILE_NOT_FOUND_REGEX, webpage) is not None:
-            raise ExtractorError('Video %s does not exist' % video_id, expected=True)
-
-        fields = self._hidden_inputs(webpage)
-
-        if fields['op'] == 'download1':
-            countdown = int_or_none(self._search_regex(
-                r'<span id="countdown_str">(?:[Ww]ait)?\s*<span id="cxc">(\d+)</span>\s*(?:seconds?)?</span>',
-                webpage, 'countdown', default=None))
-            if countdown:
-                self._sleep(countdown, video_id)
-
-            post = compat_urllib_parse.urlencode(fields)
-
-            req = compat_urllib_request.Request(url, post)
-            req.add_header('Content-type', 'application/x-www-form-urlencoded')
-
-            webpage = self._download_webpage(req, video_id, 'Downloading video page')
-
-        title = self._search_regex(
-            [r'style="z-index: [0-9]+;">([^<]+)</span>', r'>Watch (.+) '],
-            webpage, 'title', default=None) or self._og_search_title(webpage)
-        video_url = self._search_regex(
-            r'file\s*:\s*["\'](http[^"\']+)["\'],', webpage, 'file url')
-        thumbnail = self._search_regex(
-            r'image\s*:\s*["\'](http[^"\']+)["\'],', webpage, 'thumbnail', fatal=False)
-
-        formats = [{
-            'format_id': 'sd',
-            'url': video_url,
-            'quality': 1,
-        }]
-
-        return {
-            'id': video_id,
-            'title': title,
-            'thumbnail': thumbnail,
-            'formats': formats,
-        }
index 1d9166455aae935f1eb51777d170e0f6259ffd4e..0c015141fa322465b1476e035f87da223555b211 100644 (file)
@@ -14,13 +14,13 @@ class GoshgayIE(InfoExtractor):
     _VALID_URL = r'https?://www\.goshgay\.com/video(?P<id>\d+?)($|/)'
     _TEST = {
         'url': 'http://www.goshgay.com/video299069/diesel_sfw_xxx_video',
-        'md5': '027fcc54459dff0feb0bc06a7aeda680',
+        'md5': '4b6db9a0a333142eb9f15913142b0ed1',
         'info_dict': {
             'id': '299069',
             'ext': 'flv',
             'title': 'DIESEL SFW XXX Video',
             'thumbnail': 're:^http://.*\.jpg$',
-            'duration': 79,
+            'duration': 80,
             'age_limit': 18,
         }
     }
@@ -47,5 +47,5 @@ class GoshgayIE(InfoExtractor):
             'title': title,
             'thumbnail': thumbnail,
             'duration': duration,
-            'age_limit': self._family_friendly_search(webpage),
+            'age_limit': 18,
         }
diff --git a/youtube_dl/extractor/gputechconf.py b/youtube_dl/extractor/gputechconf.py
new file mode 100644 (file)
index 0000000..73dc62c
--- /dev/null
@@ -0,0 +1,35 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class GPUTechConfIE(InfoExtractor):
+    _VALID_URL = r'https?://on-demand\.gputechconf\.com/gtc/2015/video/S(?P<id>\d+)\.html'
+    _TEST = {
+        'url': 'http://on-demand.gputechconf.com/gtc/2015/video/S5156.html',
+        'md5': 'a8862a00a0fd65b8b43acc5b8e33f798',
+        'info_dict': {
+            'id': '5156',
+            'ext': 'mp4',
+            'title': 'Coordinating More Than 3 Million CUDA Threads for Social Network Analysis',
+            'duration': 1219,
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        root_path = self._search_regex(
+            r'var\s+rootPath\s*=\s*"([^"]+)', webpage, 'root path',
+            default='http://evt.dispeak.com/nvidia/events/gtc15/')
+        xml_file_id = self._search_regex(
+            r'var\s+xmlFileId\s*=\s*"([^"]+)', webpage, 'xml file id')
+
+        return {
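+        # Delegate actual extraction to the DigitallySpeaking extractor via url_transparent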
+            '_type': 'url_transparent',
+            'id': video_id,
+            'url': '%sxml/%s.xml' % (root_path, xml_file_id),
+            'ie_key': 'DigitallySpeaking',
+        }
index 8b9e0e2f8ee6d8e9ce16e846a20a78a62ca97247..f6b69662baf547aa48a9bdf460671f072bd59884 100644 (file)
@@ -16,12 +16,14 @@ class GrouponIE(InfoExtractor):
         'playlist': [{
             'info_dict': {
                 'id': 'tubGNycTo_9Uxg82uESj4i61EYX8nyuf',
-                'ext': 'mp4',
+                'ext': 'flv',
                 'title': 'Bikram Yoga Huntington Beach | Orange County',
+                'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
+                'duration': 44.961,
             },
         }],
         'params': {
-            'skip_download': 'HLS',
+            'skip_download': 'HDS',
         }
     }
 
@@ -30,7 +32,7 @@ class GrouponIE(InfoExtractor):
         webpage = self._download_webpage(url, playlist_id)
 
         payload = self._parse_json(self._search_regex(
-            r'var\s+payload\s*=\s*(.*?);\n', webpage, 'payload'), playlist_id)
+            r'(?:var\s+|window\.)payload\s*=\s*(.*?);\n', webpage, 'payload'), playlist_id)
         videos = payload['carousel'].get('dealVideos', [])
         entries = []
         for v in videos:
diff --git a/youtube_dl/extractor/hbo.py b/youtube_dl/extractor/hbo.py
new file mode 100644 (file)
index 0000000..dad0f39
--- /dev/null
@@ -0,0 +1,122 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    xpath_text,
+    xpath_element,
+    int_or_none,
+    parse_duration,
+)
+
+
+class HBOIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?hbo\.com/video/video\.html\?.*vid=(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.hbo.com/video/video.html?autoplay=true&g=u&vid=1437839',
+        'md5': '1c33253f0c7782142c993c0ba62a8753',
+        'info_dict': {
+            'id': '1437839',
+            'ext': 'mp4',
+            'title': 'Ep. 64 Clip: Encryption',
+        }
+    }
+    _FORMATS_INFO = {
+        '1920': {
+            'width': 1280,
+            'height': 720,
+        },
+        '640': {
+            'width': 768,
+            'height': 432,
+        },
+        'highwifi': {
+            'width': 640,
+            'height': 360,
+        },
+        'high3g': {
+            'width': 640,
+            'height': 360,
+        },
+        'medwifi': {
+            'width': 400,
+            'height': 224,
+        },
+        'med3g': {
+            'width': 400,
+            'height': 224,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        video_data = self._download_xml(
+            'http://render.lv3.hbo.com/data/content/global/videos/data/%s.xml' % video_id, video_id)
+        title = xpath_text(video_data, 'title', 'title', True)
+
+        formats = []
+        for source in xpath_element(video_data, 'videos', 'sources', True):
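+            # <size> children carry progressive/RTMP paths; 'tarball' and
+            # named-quality tags carry plain URLs in their text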
+            if source.tag == 'size':
+                path = xpath_text(source, './/path')
+                if not path:
+                    continue
+                width = source.attrib.get('width')
+                format_info = self._FORMATS_INFO.get(width, {})
+                height = format_info.get('height')
+                fmt = {
+                    'url': path,
+                    'format_id': 'http%s' % ('-%dp' % height if height else ''),
+                    'width': format_info.get('width'),
+                    'height': height,
+                }
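+                # rtmp(e) paths bundle app and playpath in one URL; split them for the RTMP downloader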
+                rtmp = re.search(r'^(?P<url>rtmpe?://[^/]+/(?P<app>.+))/(?P<playpath>mp4:.+)$', path)
+                if rtmp:
+                    fmt.update({
+                        'url': rtmp.group('url'),
+                        'play_path': rtmp.group('playpath'),
+                        'app': rtmp.group('app'),
+                        'ext': 'flv',
+                        'format_id': fmt['format_id'].replace('http', 'rtmp'),
+                    })
+                formats.append(fmt)
+            else:
+                video_url = source.text
+                if not video_url:
+                    continue
+                if source.tag == 'tarball':
+                    formats.extend(self._extract_m3u8_formats(
+                        video_url.replace('.tar', '/base_index_w8.m3u8'),
+                        video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+                else:
+                    format_info = self._FORMATS_INFO.get(source.tag, {})
+                    formats.append({
+                        'format_id': 'http-%s' % source.tag,
+                        'url': video_url,
+                        'width': format_info.get('width'),
+                        'height': format_info.get('height'),
+                    })
+        self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))
+
+        thumbnails = []
+        card_sizes = xpath_element(video_data, 'titleCardSizes')
+        if card_sizes is not None:
+            for size in card_sizes:
+                path = xpath_text(size, 'path')
+                if not path:
+                    continue
+                width = int_or_none(size.get('width'))
+                thumbnails.append({
+                    'id': width,
+                    'url': path,
+                    'width': width,
+                })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'duration': parse_duration(xpath_text(video_data, 'duration/tv14')),
+            'formats': formats,
+            'thumbnails': thumbnails,
+        }
index a19b31ac0605392b044ca14fad4ffa4e4752b281..7d8698655666f8de4e8850ac2684a16dd28810af 100644 (file)
@@ -4,12 +4,10 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-    compat_urlparse,
-)
+from ..compat import compat_urlparse
 from ..utils import (
     HEADRequest,
+    sanitized_Request,
     str_to_int,
     urlencode_postdata,
     urlhandle_detect_ext,
@@ -47,7 +45,7 @@ class HearThisAtIE(InfoExtractor):
             r'intTrackId\s*=\s*(\d+)', webpage, 'track ID')
 
         payload = urlencode_postdata({'tracks[]': track_id})
-        req = compat_urllib_request.Request(self._PLAYLIST_URL, payload)
+        req = sanitized_Request(self._PLAYLIST_URL, payload)
         req.add_header('Content-type', 'application/x-www-form-urlencoded')
 
         track = self._download_json(req, track_id, 'Downloading playlist')[0]
index f5aa73d18b47ff225b7e7e332051b97583ca8237..86a93de4d62cd906d5efb03faa9e568792d82392 100644 (file)
@@ -11,8 +11,8 @@ class HentaiStigmaIE(InfoExtractor):
         'info_dict': {
             'id': 'inyouchuu-etsu-bonus',
             'ext': 'mp4',
-            "title": "Inyouchuu Etsu Bonus",
-            "age_limit": 18,
+            'title': 'Inyouchuu Etsu Bonus',
+            'age_limit': 18,
         }
     }
 
diff --git a/youtube_dl/extractor/history.py b/youtube_dl/extractor/history.py
deleted file mode 100644 (file)
index f86164a..0000000
+++ /dev/null
@@ -1,31 +0,0 @@
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import smuggle_url
-
-
-class HistoryIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?history\.com/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
-
-    _TESTS = [{
-        'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',
-        'md5': '6fe632d033c92aa10b8d4a9be047a7c5',
-        'info_dict': {
-            'id': 'bLx5Dv5Aka1G',
-            'ext': 'mp4',
-            'title': "Bet You Didn't Know: Valentine's Day",
-            'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
-        },
-        'add_ie': ['ThePlatform'],
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        video_url = self._search_regex(
-            r'data-href="[^"]*/%s"[^>]+data-release-url="([^"]+)"' % video_id,
-            webpage, 'video url')
-
-        return self.url_result(smuggle_url(video_url, {'sig': {'key': 'crazyjava', 'secret': 's3cr3t'}}))
index 421f55bbeaed2c1249833e5136ff479557c1bccc..ff797438dec12303aab55af0e29aac8bd35229c5 100644 (file)
@@ -159,6 +159,9 @@ class HitboxLiveIE(HitboxIE):
         cdns = player_config.get('cdns')
         servers = []
         for cdn in cdns:
+            # Subscribe URLs are not playable
+            if cdn.get('rtmpSubscribe') is True:
+                continue
             base_url = cdn.get('netConnectionUrl')
             host = re.search('.+\.([^\.]+\.[^\./]+)/.+', base_url).group(1)
             if base_url not in servers:
diff --git a/youtube_dl/extractor/hostingbulk.py b/youtube_dl/extractor/hostingbulk.py
deleted file mode 100644 (file)
index a3154cf..0000000
+++ /dev/null
@@ -1,80 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-)
-from ..utils import (
-    ExtractorError,
-    int_or_none,
-    urlencode_postdata,
-)
-
-
-class HostingBulkIE(InfoExtractor):
-    _VALID_URL = r'''(?x)
-        https?://(?:www\.)?hostingbulk\.com/
-        (?:embed-)?(?P<id>[A-Za-z0-9]{12})(?:-\d+x\d+)?\.html'''
-    _FILE_DELETED_REGEX = r'<b>File Not Found</b>'
-    _TEST = {
-        'url': 'http://hostingbulk.com/n0ulw1hv20fm.html',
-        'md5': '6c8653c8ecf7ebfa83b76e24b7b2fe3f',
-        'info_dict': {
-            'id': 'n0ulw1hv20fm',
-            'ext': 'mp4',
-            'title': 'md5:5afeba33f48ec87219c269e054afd622',
-            'filesize': 6816081,
-            'thumbnail': 're:^http://.*\.jpg$',
-        }
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        url = 'http://hostingbulk.com/{0:}.html'.format(video_id)
-
-        # Custom request with cookie to set language to English, so our file
-        # deleted regex would work.
-        request = compat_urllib_request.Request(
-            url, headers={'Cookie': 'lang=english'})
-        webpage = self._download_webpage(request, video_id)
-
-        if re.search(self._FILE_DELETED_REGEX, webpage) is not None:
-            raise ExtractorError('Video %s does not exist' % video_id,
-                                 expected=True)
-
-        title = self._html_search_regex(r'<h3>(.*?)</h3>', webpage, 'title')
-        filesize = int_or_none(
-            self._search_regex(
-                r'<small>\((\d+)\sbytes?\)</small>',
-                webpage,
-                'filesize',
-                fatal=False
-            )
-        )
-        thumbnail = self._search_regex(
-            r'<img src="([^"]+)".+?class="pic"',
-            webpage, 'thumbnail', fatal=False)
-
-        fields = self._hidden_inputs(webpage)
-
-        request = compat_urllib_request.Request(url, urlencode_postdata(fields))
-        request.add_header('Content-type', 'application/x-www-form-urlencoded')
-        response = self._request_webpage(request, video_id,
-                                         'Submiting download request')
-        video_url = response.geturl()
-
-        formats = [{
-            'format_id': 'sd',
-            'filesize': filesize,
-            'url': video_url,
-        }]
-
-        return {
-            'id': video_id,
-            'title': title,
-            'thumbnail': thumbnail,
-            'formats': formats,
-        }
index 651784b73940032fd65c9675043f045b0bcf4ff2..9db5652096acc5ead0cb926791d731d0f6f35565 100644 (file)
@@ -3,18 +3,16 @@ from __future__ import unicode_literals
 import base64
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
-)
 from ..utils import (
     ExtractorError,
     HEADRequest,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
 class HotNewHipHopIE(InfoExtractor):
-    _VALID_URL = r'http://www\.hotnewhiphop\.com/.*\.(?P<id>.*)\.html'
+    _VALID_URL = r'https?://www\.hotnewhiphop\.com/.*\.(?P<id>.*)\.html'
     _TEST = {
         'url': 'http://www.hotnewhiphop.com/freddie-gibbs-lay-it-down-song.1435540.html',
         'md5': '2c2cd2f76ef11a9b3b581e8b232f3d96',
@@ -37,11 +35,11 @@ class HotNewHipHopIE(InfoExtractor):
                 r'"contentUrl" content="(.*?)"', webpage, 'content URL')
             return self.url_result(video_url, ie='Youtube')
 
-        reqdata = compat_urllib_parse.urlencode([
+        reqdata = urlencode_postdata([
             ('mediaType', 's'),
             ('mediaId', video_id),
         ])
-        r = compat_urllib_request.Request(
+        r = sanitized_Request(
             'http://www.hotnewhiphop.com/ajax/media/getActions/', data=reqdata)
         r.add_header('Content-Type', 'application/x-www-form-urlencoded')
         mkd = self._download_json(
diff --git a/youtube_dl/extractor/hotstar.py b/youtube_dl/extractor/hotstar.py
new file mode 100644 (file)
index 0000000..f05d765
--- /dev/null
@@ -0,0 +1,83 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    determine_ext,
+    int_or_none,
+)
+
+
+class HotStarIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?hotstar\.com/(?:.+?[/-])?(?P<id>\d{10})'
+    _TESTS = [{
+        'url': 'http://www.hotstar.com/on-air-with-aib--english-1000076273',
+        'info_dict': {
+            'id': '1000076273',
+            'ext': 'mp4',
+            'title': 'On Air With AIB - English',
+            'description': 'md5:c957d8868e9bc793ccb813691cc4c434',
+            'timestamp': 1447227000,
+            'upload_date': '20151111',
+            'duration': 381,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }, {
+        'url': 'http://www.hotstar.com/sports/cricket/rajitha-sizzles-on-debut-with-329/2001477583',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.hotstar.com/1000000515',
+        'only_matching': True,
+    }]
+
+    _GET_CONTENT_TEMPLATE = 'http://account.hotstar.com/AVS/besc?action=GetAggregatedContentDetails&channel=PCTV&contentId=%s'
+    _GET_CDN_TEMPLATE = 'http://getcdn.hotstar.com/AVS/besc?action=GetCDN&asJson=Y&channel=%s&id=%s&type=%s'
+
+    def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata', fatal=True):
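+        # The AVS API wraps every payload; fail on a non-OK resultCode and unwrap resultObj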
+        json_data = super(HotStarIE, self)._download_json(url_or_request, video_id, note, fatal=fatal)
+        if json_data['resultCode'] != 'OK':
+            if fatal:
+                raise ExtractorError(json_data['errorDescription'])
+            return None
+        return json_data['resultObj']
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        video_data = self._download_json(
+            self._GET_CONTENT_TEMPLATE % video_id,
+            video_id)['contentInfo'][0]
+
+        formats = []
+        # Only 'TABLET' is queried here; the 'PCTV' channel returns an f4m manifest instead
+        for f in ('TABLET',):
+            format_data = self._download_json(
+                self._GET_CDN_TEMPLATE % (f, video_id, 'VOD'),
+                video_id, 'Downloading %s JSON metadata' % f, fatal=False)
+            if format_data:
+                format_url = format_data['src']
+                ext = determine_ext(format_url)
+                if ext == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(format_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+                elif ext == 'f4m':
+                    # f4m downloads produce broken files, so skip them
+                    continue
+                else:
+                    formats.append({
+                        'url': format_url,
+                        'width': int_or_none(format_data.get('width')),
+                        'height': int_or_none(format_data.get('height')),
+                    })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': video_data['episodeTitle'],
+            'description': video_data.get('description'),
+            'duration': int_or_none(video_data.get('duration')),
+            'timestamp': int_or_none(video_data.get('broadcastDate')),
+            'formats': formats,
+        }
index 16677f179ecd77040e8fdc42bfb9e4095aa1774a..e8f51e545bfd2b89a251e1a4fbbeefe80aa371f9 100644 (file)
@@ -16,6 +16,7 @@ class HowcastIE(InfoExtractor):
             'description': 'md5:dbe792e5f6f1489027027bf2eba188a3',
             'timestamp': 1276081287,
             'upload_date': '20100609',
+            'duration': 56.823,
         },
         'params': {
             # m3u8 download
index 663e6632a194d8ee271a0c031a921d7eed139005..65ba2a48b069bd67d2b3382f2d87bc1160145612 100644 (file)
@@ -6,6 +6,7 @@ from ..utils import (
     int_or_none,
     js_to_json,
     unescapeHTML,
+    determine_ext,
 )
 
 
@@ -23,6 +24,7 @@ class HowStuffWorksIE(InfoExtractor):
                 'thumbnail': 're:^https?://.*\.jpg$',
                 'duration': 161,
             },
+            'skip': 'Video broken',
         },
         {
             'url': 'http://adventure.howstuffworks.com/7199-survival-zone-food-and-water-in-the-savanna-video.htm',
@@ -39,7 +41,7 @@ class HowStuffWorksIE(InfoExtractor):
             'url': 'http://entertainment.howstuffworks.com/arts/2706-sword-swallowing-1-by-dan-meyer-video.htm',
             'info_dict': {
                 'id': '440011',
-                'ext': 'flv',
+                'ext': 'mp4',
                 'title': 'Sword Swallowing #1 by Dan Meyer',
                 'description': 'Video footage (1 of 3) used by permission of the owner Dan Meyer through Sword Swallowers Association International <www.swordswallow.org>',
                 'display_id': 'sword-swallowing-1-by-dan-meyer',
@@ -63,13 +65,19 @@ class HowStuffWorksIE(InfoExtractor):
         video_id = clip_info['content_id']
         formats = []
         m3u8_url = clip_info.get('m3u8')
-        if m3u8_url:
-            formats += self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
+        if m3u8_url and determine_ext(m3u8_url) == 'm3u8':
+            formats.extend(self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=True))
+        flv_url = clip_info.get('flv_url')
+        if flv_url:
+            formats.append({
+                'url': flv_url,
+                'format_id': 'flv',
+            })
         for video in clip_info.get('mp4', []):
             formats.append({
                 'url': video['src'],
-                'format_id': video['bitrate'],
-                'vbr': int(video['bitrate'].rstrip('k')),
+                'format_id': 'mp4-%s' % video['bitrate'],
+                'vbr': int_or_none(video['bitrate'].rstrip('k')),
             })
 
         if not formats:
@@ -102,6 +110,6 @@ class HowStuffWorksIE(InfoExtractor):
             'title': unescapeHTML(clip_info['clip_title']),
             'description': unescapeHTML(clip_info.get('caption')),
             'thumbnail': clip_info.get('video_still_url'),
-            'duration': clip_info.get('duration'),
+            'duration': int_or_none(clip_info.get('duration')),
             'formats': formats,
         }
index a38eae421a9199b578b3a724d205b13e6367c67a..059073749e67605464b6159b9391f71eb5a6052d 100644 (file)
@@ -4,6 +4,7 @@ import re
 
 from .common import InfoExtractor
 from ..utils import (
+    determine_ext,
     parse_duration,
     unified_strdate,
 )
@@ -29,7 +30,12 @@ class HuffPostIE(InfoExtractor):
             'description': 'This week on Legalese It, Mike talks to David Bosco about his new book on the ICC, "Rough Justice," he also discusses the Virginia AG\'s historic stance on gay marriage, the execution of Edgar Tamayo, the ICC\'s delay of Kenya\'s President and more.  ',
             'duration': 1549,
             'upload_date': '20140124',
-        }
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'expected_warnings': ['HTTP Error 404: Not Found'],
     }
 
     def _real_extract(self, url):
@@ -45,7 +51,7 @@ class HuffPostIE(InfoExtractor):
         description = data.get('description')
 
         thumbnails = []
-        for url in data['images'].values():
+        for url in filter(None, data['images'].values()):
             m = re.match('.*-([0-9]+x[0-9]+)\.', url)
             if not m:
                 continue
@@ -54,13 +60,25 @@ class HuffPostIE(InfoExtractor):
                 'resolution': m.group(1),
             })
 
-        formats = [{
-            'format': key,
-            'format_id': key.replace('/', '.'),
-            'ext': 'mp4',
-            'url': url,
-            'vcodec': 'none' if key.startswith('audio/') else None,
-        } for key, url in data.get('sources', {}).get('live', {}).items()]
+        formats = []
+        sources = data.get('sources', {})
+        live_sources = list(sources.get('live', {}).items()) + list(sources.get('live_again', {}).items())
+        for key, url in live_sources:
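+            # Manifest URLs (m3u8/f4m) expand into several formats; other URLs map one-to-one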
+            ext = determine_ext(url)
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    url, video_id, ext='mp4', m3u8_id='hls', fatal=False))
+            elif ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    url + '?hdcore=2.9.5', video_id, f4m_id='hds', fatal=False))
+            else:
+                formats.append({
+                    'format': key,
+                    'format_id': key.replace('/', '.'),
+                    'ext': 'mp4',
+                    'url': url,
+                    'vcodec': 'none' if key.startswith('audio/') else None,
+                })
 
         if not formats and data.get('fivemin_id'):
             return self.url_result('5min:%s' % data['fivemin_id'])
index aa0724a02353840e5f5533a1eedbc7005aa63008..f7c9130540e51a75a83052704d61403b488b25f6 100644 (file)
@@ -4,17 +4,15 @@ import json
 import time
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
-)
+from ..compat import compat_urllib_parse_urlencode
 from ..utils import (
     ExtractorError,
+    sanitized_Request,
 )
 
 
 class HypemIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?hypem\.com/track/(?P<id>[^/]+)/'
+    _VALID_URL = r'https?://(?:www\.)?hypem\.com/track/(?P<id>[^/]+)/'
     _TEST = {
         'url': 'http://hypem.com/track/1v6ga/BODYWORK+-+TAME',
         'md5': 'b9cc91b5af8995e9f0c1cee04c575828',
@@ -30,15 +28,12 @@ class HypemIE(InfoExtractor):
         track_id = self._match_id(url)
 
         data = {'ax': 1, 'ts': time.time()}
-        data_encoded = compat_urllib_parse.urlencode(data)
-        complete_url = url + "?" + data_encoded
-        request = compat_urllib_request.Request(complete_url)
+        request = sanitized_Request(url + '?' + compat_urllib_parse_urlencode(data))
         response, urlh = self._download_webpage_handle(
             request, track_id, 'Downloading webpage with the url')
-        cookie = urlh.headers.get('Set-Cookie', '')
 
         html_tracks = self._html_search_regex(
-            r'(?ms)<script type="application/json" id="displayList-data">\s*(.*?)\s*</script>',
+            r'(?ms)<script type="application/json" id="displayList-data">(.+?)</script>',
             response, 'tracks')
         try:
             track_list = json.loads(html_tracks)
@@ -48,15 +43,14 @@ class HypemIE(InfoExtractor):
 
         key = track['key']
         track_id = track['id']
-        artist = track['artist']
         title = track['song']
 
-        serve_url = "http://hypem.com/serve/source/%s/%s" % (track_id, key)
-        request = compat_urllib_request.Request(
-            serve_url, '', {'Content-Type': 'application/json'})
-        request.add_header('cookie', cookie)
+        request = sanitized_Request(
+            'http://hypem.com/serve/source/%s/%s' % (track_id, key),
+            '', {'Content-Type': 'application/json'})
         song_data = self._download_json(request, track_id, 'Downloading metadata')
-        final_url = song_data["url"]
+        final_url = song_data['url']
+        artist = track.get('artist')
 
         return {
             'id': track_id,
index 70e4c0d4173816e990749759cf2d36fe902904ee..a39f422e985bf4c97ac63b28418317d7288c85ca 100644 (file)
@@ -1,7 +1,11 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..utils import int_or_none
+from ..utils import (
+    int_or_none,
+    get_element_by_id,
+    remove_end,
+)
 
 
 class IconosquareIE(InfoExtractor):
@@ -12,7 +16,7 @@ class IconosquareIE(InfoExtractor):
         'info_dict': {
             'id': '522207370455279102_24101272',
             'ext': 'mp4',
-            'title': 'Instagram media by @aguynamedpatrick (Patrick Janelle)',
+            'title': 'Instagram photo by @aguynamedpatrick (Patrick Janelle)',
             'description': 'md5:644406a9ec27457ed7aa7a9ebcd4ce3d',
             'timestamp': 1376471991,
             'upload_date': '20130814',
@@ -29,8 +33,7 @@ class IconosquareIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
 
         media = self._parse_json(
-            self._search_regex(
-                r'window\.media\s*=\s*({.+?});\n', webpage, 'media'),
+            get_element_by_id('mediaJson', webpage),
             video_id)
 
         formats = [{
@@ -41,9 +44,7 @@ class IconosquareIE(InfoExtractor):
         } for format_id, f in media['videos'].items()]
         self._sort_formats(formats)
 
-        title = self._html_search_regex(
-            r'<title>(.+?)(?: *\(Videos?\))? \| (?:Iconosquare|Statigram)</title>',
-            webpage, 'title')
+        title = remove_end(self._og_search_title(webpage), ' - via Iconosquare')
 
         timestamp = int_or_none(media.get('created_time') or media.get('caption', {}).get('created_time'))
         description = media.get('caption', {}).get('text')
@@ -61,6 +62,14 @@ class IconosquareIE(InfoExtractor):
             'height': int_or_none(t.get('height'))
         } for thumbnail_id, t in media.get('images', {}).items()]
 
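+        # Collect top-level comments, skipping entries without any text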
+        comments = [{
+            'id': comment.get('id'),
+            'text': comment['text'],
+            'timestamp': int_or_none(comment.get('created_time')),
+            'author': comment.get('from', {}).get('full_name'),
+            'author_id': comment.get('from', {}).get('username'),
+        } for comment in media.get('comments', {}).get('data', []) if 'text' in comment]
+
         return {
             'id': video_id,
             'title': title,
@@ -72,4 +81,5 @@ class IconosquareIE(InfoExtractor):
             'comment_count': comment_count,
             'like_count': like_count,
             'formats': formats,
+            'comments': comments,
         }
index bf2d2041b91a261d81e891bffee42966e0e53146..c45c68c1d6ff523ddcbb5144e260bc931fb09873 100644 (file)
@@ -3,6 +3,10 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_iso8601,
+)
 
 
 class IGNIE(InfoExtractor):
@@ -11,25 +15,24 @@ class IGNIE(InfoExtractor):
     Some videos of it.ign.com are also supported
     """
 
-    _VALID_URL = r'https?://.+?\.ign\.com/(?P<type>videos|show_videos|articles|(?:[^/]*/feature))(/.+)?/(?P<name_or_id>.+)'
+    _VALID_URL = r'https?://.+?\.ign\.com/(?:[^/]+/)?(?P<type>videos|show_videos|articles|feature|(?:[^/]+/\d+/video))(/.+)?/(?P<name_or_id>.+)'
     IE_NAME = 'ign.com'
 
-    _CONFIG_URL_TEMPLATE = 'http://www.ign.com/videos/configs/id/%s.config'
-    _DESCRIPTION_RE = [
-        r'<span class="page-object-description">(.+?)</span>',
-        r'id="my_show_video">.*?<p>(.*?)</p>',
-        r'<meta name="description" content="(.*?)"',
-    ]
+    _API_URL_TEMPLATE = 'http://apis.ign.com/video/v3/videos/%s'
+    _EMBED_RE = r'<iframe[^>]+?["\']((?:https?:)?//.+?\.ign\.com.+?/embed.+?)["\']'
 
     _TESTS = [
         {
             'url': 'http://www.ign.com/videos/2013/06/05/the-last-of-us-review',
-            'md5': 'eac8bdc1890980122c3b66f14bdd02e9',
+            'md5': 'febda82c4bafecd2d44b6e1a18a595f8',
             'info_dict': {
                 'id': '8f862beef863986b2785559b9e1aa599',
                 'ext': 'mp4',
                 'title': 'The Last of Us Review',
                 'description': 'md5:c8946d4260a4d43a00d5ae8ed998870c',
+                'timestamp': 1370440800,
+                'upload_date': '20130605',
+                'uploader_id': 'cberidon@ign.com',
             }
         },
         {
@@ -44,6 +47,9 @@ class IGNIE(InfoExtractor):
                         'ext': 'mp4',
                         'title': 'GTA 5 Video Review',
                         'description': 'Rockstar drops the mic on this generation of games. Watch our review of the masterly Grand Theft Auto V.',
+                        'timestamp': 1379339880,
+                        'upload_date': '20130916',
+                        'uploader_id': 'danieljkrupa@gmail.com',
                     },
                 },
                 {
@@ -52,6 +58,9 @@ class IGNIE(InfoExtractor):
                         'ext': 'mp4',
                         'title': '26 Twisted Moments from GTA 5 in Slow Motion',
                         'description': 'The twisted beauty of GTA 5 in stunning slow motion.',
+                        'timestamp': 1386878820,
+                        'upload_date': '20131212',
+                        'uploader_id': 'togilvie@ign.com',
                     },
                 },
             ],
@@ -66,12 +75,20 @@ class IGNIE(InfoExtractor):
                 'id': '078fdd005f6d3c02f63d795faa1b984f',
                 'ext': 'mp4',
                 'title': 'Rewind Theater - Wild Trailer Gamescom 2014',
-                'description': (
-                    'Giant skeletons, bloody hunts, and captivating'
-                    ' natural beauty take our breath away.'
-                ),
+                'description': 'Brian and Jared explore Michel Ancel\'s captivating new preview.',
+                'timestamp': 1408047180,
+                'upload_date': '20140814',
+                'uploader_id': 'jamesduggan1990@gmail.com',
             },
         },
+        {
+            'url': 'http://me.ign.com/en/videos/112203/video/how-hitman-aims-to-be-different-than-every-other-s',
+            'only_matching': True,
+        },
+        {
+            'url': 'http://me.ign.com/ar/angry-birds-2/106533/video/lrd-ldyy-lwl-lfylm-angry-birds',
+            'only_matching': True,
+        },
     ]
 
     def _find_video_id(self, webpage):
@@ -82,7 +99,7 @@ class IGNIE(InfoExtractor):
             r'<object id="vid_(.+?)"',
             r'<meta name="og:image" content=".*/(.+?)-(.+?)/.+.jpg"',
         ]
-        return self._search_regex(res_id, webpage, 'video id')
+        return self._search_regex(res_id, webpage, 'video id', default=None)
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
@@ -91,7 +108,7 @@ class IGNIE(InfoExtractor):
         webpage = self._download_webpage(url, name_or_id)
         if page_type != 'video':
             multiple_urls = re.findall(
-                '<param name="flashvars"[^>]*value="[^"]*?url=(https?://www\.ign\.com/videos/.*?)["&]',
+                r'<param name="flashvars"[^>]*value="[^"]*?url=(https?://www\.ign\.com/videos/.*?)["&]',
                 webpage)
             if multiple_urls:
                 entries = [self.url_result(u, ie='IGN') for u in multiple_urls]
@@ -102,22 +119,51 @@ class IGNIE(InfoExtractor):
                 }
 
         video_id = self._find_video_id(webpage)
-        result = self._get_video_info(video_id)
-        description = self._html_search_regex(self._DESCRIPTION_RE,
-                                              webpage, 'video description', flags=re.DOTALL)
-        result['description'] = description
-        return result
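+        # pages without an inline video id embed the player in an iframe,
+        # which _EMBED_RE picks up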
+        if not video_id:
+            return self.url_result(self._search_regex(
+                self._EMBED_RE, webpage, 'embed url'))
+        return self._get_video_info(video_id)
 
     def _get_video_info(self, video_id):
-        config_url = self._CONFIG_URL_TEMPLATE % video_id
-        config = self._download_json(config_url, video_id)
-        media = config['playlist']['media']
+        api_data = self._download_json(
+            self._API_URL_TEMPLATE % video_id, video_id)
+
+        formats = []
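+        # the v3 API lists adaptive manifests under 'refs' (m3uUrl/f4mUrl)
+        # and direct file URLs under 'assets'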
+        m3u8_url = api_data['refs'].get('m3uUrl')
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', 'm3u8_native',
+                m3u8_id='hls', fatal=False))
+        f4m_url = api_data['refs'].get('f4mUrl')
+        if f4m_url:
+            formats.extend(self._extract_f4m_formats(
+                f4m_url, video_id, f4m_id='hds', fatal=False))
+        for asset in api_data['assets']:
+            formats.append({
+                'url': asset['url'],
+                'tbr': asset.get('actual_bitrate_kbps'),
+                'fps': asset.get('frame_rate'),
+                'height': int_or_none(asset.get('height')),
+                'width': int_or_none(asset.get('width')),
+            })
+        self._sort_formats(formats)
+
+        thumbnails = [{
+            'url': thumbnail['url']
+        } for thumbnail in api_data.get('thumbnails', [])]
+
+        metadata = api_data['metadata']
 
         return {
-            'id': media['metadata']['videoId'],
-            'url': media['url'],
-            'title': media['metadata']['title'],
-            'thumbnail': media['poster'][0]['url'].replace('{size}', 'grande'),
+            'id': api_data.get('videoId') or video_id,
+            'title': metadata.get('longTitle') or metadata.get('name') or metadata.get('title'),
+            'description': metadata.get('description'),
+            'timestamp': parse_iso8601(metadata.get('publishDate')),
+            'duration': int_or_none(metadata.get('duration')),
+            'display_id': metadata.get('slug') or video_id,
+            'uploader_id': metadata.get('creator'),
+            'thumbnails': thumbnails,
+            'formats': formats,
         }
 
 
@@ -125,16 +171,17 @@ class OneUPIE(IGNIE):
     _VALID_URL = r'https?://gamevideos\.1up\.com/(?P<type>video)/id/(?P<name_or_id>.+)\.html'
     IE_NAME = '1up.com'
 
-    _DESCRIPTION_RE = r'<div id="vid_summary">(.+?)</div>'
-
     _TESTS = [{
         'url': 'http://gamevideos.1up.com/video/id/34976.html',
-        'md5': '68a54ce4ebc772e4b71e3123d413163d',
+        'md5': 'c9cc69e07acb675c31a16719f909e347',
         'info_dict': {
             'id': '34976',
             'ext': 'mp4',
             'title': 'Sniper Elite V2 - Trailer',
-            'description': 'md5:5d289b722f5a6d940ca3136e9dae89cf',
+            'description': 'md5:bf0516c5ee32a3217aa703e9b1bc7826',
+            'timestamp': 1313099220,
+            'upload_date': '20110811',
+            'uploader_id': 'IGN',
         }
     }]
 
@@ -143,3 +190,36 @@ class OneUPIE(IGNIE):
         result = super(OneUPIE, self)._real_extract(url)
         result['id'] = mobj.group('name_or_id')
         return result
+
+
+class PCMagIE(IGNIE):
+    _VALID_URL = r'https?://(?:www\.)?pcmag\.com/(?P<type>videos|article2)(/.+)?/(?P<name_or_id>.+)'
+    IE_NAME = 'pcmag'
+
+    _EMBED_RE = r'iframe\.setAttribute\("src",\s*__util\.objToUrlString\("http://widgets\.ign\.com/video/embed/content\.html\?[^"]*url=([^"]+)["&]'
+
+    _TESTS = [{
+        'url': 'http://www.pcmag.com/videos/2015/01/06/010615-whats-new-now-is-gogo-snooping-on-your-data',
+        'md5': '212d6154fd0361a2781075f1febbe9ad',
+        'info_dict': {
+            'id': 'ee10d774b508c9b8ec07e763b9125b91',
+            'ext': 'mp4',
+            'title': '010615_What\'s New Now: Is GoGo Snooping on Your Data?',
+            'description': 'md5:a7071ae64d2f68cc821c729d4ded6bb3',
+            'timestamp': 1420571160,
+            'upload_date': '20150106',
+            'uploader_id': 'cozzipix@gmail.com',
+        }
+    }, {
+        'url': 'http://www.pcmag.com/article2/0,2817,2470156,00.asp',
+        'md5': '94130c1ca07ba0adb6088350681f16c1',
+        'info_dict': {
+            'id': '042e560ba94823d43afcb12ddf7142ca',
+            'ext': 'mp4',
+            'title': 'HTC\'s Weird New Re Camera - What\'s New Now',
+            'description': 'md5:53433c45df96d2ea5d0fda18be2ca908',
+            'timestamp': 1412953920,
+            'upload_date': '20141010',
+            'uploader_id': 'chris_snyder@pcmag.com',
+        }
+    }]
index 4bb574cf37df2421721b088ded37a3fc66c8c2ea..8bed8ccd06e2eeb64eba69f3407c9271c0643731 100644 (file)
@@ -4,15 +4,15 @@ import re
 import json
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urlparse,
+from ..utils import (
+    qualities,
 )
 
 
 class ImdbIE(InfoExtractor):
     IE_NAME = 'imdb'
     IE_DESC = 'Internet Movie Database trailers'
-    _VALID_URL = r'http://(?:www|m)\.imdb\.com/video/imdb/vi(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/video/imdb/vi(?P<id>\d+)'
 
     _TEST = {
         'url': 'http://www.imdb.com/video/imdb/vi2524815897',
@@ -30,24 +30,33 @@ class ImdbIE(InfoExtractor):
         descr = self._html_search_regex(
             r'(?s)<span itemprop="description">(.*?)</span>',
             webpage, 'description', fatal=False)
-        available_formats = re.findall(
-            r'case \'(?P<f_id>.*?)\' :$\s+url = \'(?P<path>.*?)\'', webpage,
-            flags=re.MULTILINE)
+        player_url = 'http://www.imdb.com/video/imdb/vi%s/imdb/single' % video_id
+        player_page = self._download_webpage(
+            player_url, video_id, 'Downloading player page')
+        # the player page contains the info for the default format; we have
+        # to fetch other pages for the rest of the formats
+        extra_formats = re.findall(r'href="(?P<url>%s.*?)".*?>(?P<name>.*?)<' % re.escape(player_url), player_page)
+        format_pages = [
+            self._download_webpage(
+                f_url, video_id, 'Downloading info for %s format' % f_name)
+            for f_url, f_name in extra_formats]
+        format_pages.append(player_page)
+
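+        # known format ids, worst to best; quality() ranks each format id by
+        # its position in this tuple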
+        quality = qualities(('SD', '480p', '720p', '1080p'))
         formats = []
-        for f_id, f_path in available_formats:
-            f_path = f_path.strip()
-            format_page = self._download_webpage(
-                compat_urlparse.urljoin(url, f_path),
-                'Downloading info for %s format' % f_id)
+        for format_page in format_pages:
             json_data = self._search_regex(
                 r'<script[^>]+class="imdb-player-data"[^>]*?>(.*?)</script>',
                 format_page, 'json data', flags=re.DOTALL)
             info = json.loads(json_data)
             format_info = info['videoPlayerObject']['video']
+            f_id = format_info['ffname']
             formats.append({
                 'format_id': f_id,
                 'url': format_info['videoInfoList'][0]['videoUrl'],
+                'quality': quality(f_id),
             })
+        self._sort_formats(formats)
 
         return {
             'id': video_id,
@@ -61,7 +70,7 @@ class ImdbIE(InfoExtractor):
 class ImdbListIE(InfoExtractor):
     IE_NAME = 'imdb:list'
     IE_DESC = 'Internet Movie Database lists'
-    _VALID_URL = r'http://www\.imdb\.com/list/(?P<id>[\da-zA-Z_-]{11})'
+    _VALID_URL = r'https?://www\.imdb\.com/list/(?P<id>[\da-zA-Z_-]{11})'
     _TEST = {
         'url': 'http://www.imdb.com/list/JFs9NWw6XI0',
         'info_dict': {
index d692ea79ab493174038c9649445e6a592a86687c..85e9344aab18e22be204134ca846318367b54329 100644 (file)
@@ -13,7 +13,7 @@ from ..utils import (
 
 
 class ImgurIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:i\.)?imgur\.com/(?P<id>[a-zA-Z0-9]+)'
+    _VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:(?:gallery|topic/[^/]+)/)?(?P<id>[a-zA-Z0-9]{6,})(?:[/?#&]+|\.[a-z]+)?$'
 
     _TESTS = [{
         'url': 'https://i.imgur.com/A61SaA1.gifv',
@@ -21,7 +21,7 @@ class ImgurIE(InfoExtractor):
             'id': 'A61SaA1',
             'ext': 'mp4',
             'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$',
-            'description': 're:The origin of the Internet\'s most viral images$|The Internet\'s visual storytelling community\. Explore, share, and discuss the best visual stories the Internet has to offer\.$',
+            'description': 'Imgur: The most awesome images on the Internet.',
         },
     }, {
         'url': 'https://imgur.com/A61SaA1',
@@ -29,8 +29,20 @@ class ImgurIE(InfoExtractor):
             'id': 'A61SaA1',
             'ext': 'mp4',
             'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$',
-            'description': 're:The origin of the Internet\'s most viral images$|The Internet\'s visual storytelling community\. Explore, share, and discuss the best visual stories the Internet has to offer\.$',
+            'description': 'Imgur: The most awesome images on the Internet.',
         },
+    }, {
+        'url': 'https://imgur.com/gallery/YcAQlkx',
+        'info_dict': {
+            'id': 'YcAQlkx',
+            'ext': 'mp4',
+            'title': 'Classic Steve Carell gif...cracks me up everytime....damn the repost downvotes....',
+            'description': 'Imgur: The most awesome images on the Internet.',
+        },
+    }, {
+        'url': 'http://imgur.com/topic/Funny/N8rOudd',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
@@ -97,3 +109,41 @@ class ImgurIE(InfoExtractor):
             'description': self._og_search_description(webpage),
             'title': self._og_search_title(webpage),
         }
+
+
+class ImgurAlbumIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:(?:a|gallery|topic/[^/]+)/)?(?P<id>[a-zA-Z0-9]{5})(?:[/?#&]+)?$'
+
+    _TESTS = [{
+        'url': 'http://imgur.com/gallery/Q95ko',
+        'info_dict': {
+            'id': 'Q95ko',
+        },
+        'playlist_count': 25,
+    }, {
+        'url': 'http://imgur.com/a/j6Orj',
+        'only_matching': True,
+    }, {
+        'url': 'http://imgur.com/topic/Aww/ll5Vk',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        album_id = self._match_id(url)
+
+        album_images = self._download_json(
+            'http://imgur.com/gallery/%s/album_images/hit.json?all=true' % album_id,
+            album_id, fatal=False)
+
+        if album_images:
+            data = album_images.get('data')
+            if data and isinstance(data, dict):
+                images = data.get('images')
+                if images and isinstance(images, list):
+                    entries = [
+                        self.url_result('http://imgur.com/%s' % image['hash'])
+                        for image in images if image.get('hash')]
+                    return self.playlist_result(entries, album_id)
+
+        # Fallback to single video
+        return self.url_result('http://imgur.com/%s' % album_id, ImgurIE.ie_key())
diff --git a/youtube_dl/extractor/indavideo.py b/youtube_dl/extractor/indavideo.py
new file mode 100644 (file)
index 0000000..9622f19
--- /dev/null
@@ -0,0 +1,142 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_age_limit,
+    parse_iso8601,
+)
+
+
+class IndavideoEmbedIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:(?:embed\.)?indavideo\.hu/player/video/|assets\.indavideo\.hu/swf/player\.swf\?.*\b(?:v(?:ID|id))=)(?P<id>[\da-f]+)'
+    _TESTS = [{
+        'url': 'http://indavideo.hu/player/video/1bdc3c6d80/',
+        'md5': 'f79b009c66194acacd40712a6778acfa',
+        'info_dict': {
+            'id': '1837039',
+            'ext': 'mp4',
+            'title': 'Cicatánc',
+            'description': '',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'uploader': 'cukiajanlo',
+            'uploader_id': '83729',
+            'timestamp': 1439193826,
+            'upload_date': '20150810',
+            'duration': 72,
+            'age_limit': 0,
+            'tags': ['tánc', 'cica', 'cuki', 'cukiajanlo', 'newsroom'],
+        },
+    }, {
+        'url': 'http://embed.indavideo.hu/player/video/1bdc3c6d80?autostart=1&hide=1',
+        'only_matching': True,
+    }, {
+        'url': 'http://assets.indavideo.hu/swf/player.swf?v=fe25e500&vID=1bdc3c6d80&autostart=1&hide=1&i=1',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        video = self._download_json(
+            'http://amfphp.indavideo.hu/SYm0json.php/player.playerHandler.getVideoData/%s' % video_id,
+            video_id)['data']
+
+        title = video['title']
+
+        video_urls = video.get('video_files', [])
+        video_file = video.get('video_file')
+        if video_file:
+            video_urls.append(video_file)
+        video_urls = list(set(video_urls))
+
+        video_prefix = video_urls[0].rsplit('/', 1)[0]
+
+        for flv_file in video.get('flv_files', []):
+            flv_url = '%s/%s' % (video_prefix, flv_file)
+            if flv_url not in video_urls:
+                video_urls.append(flv_url)
+
+        formats = [{
+            'url': video_url,
+            'height': int_or_none(self._search_regex(
+                r'\.(\d{3,4})\.mp4$', video_url, 'height', default=None)),
+        } for video_url in video_urls]
+        self._sort_formats(formats)
+
+        timestamp = video.get('date')
+        if timestamp:
+            # upload date is in CEST
+            timestamp = parse_iso8601(timestamp + ' +0200', ' ')
+
+        thumbnails = [{
+            'url': self._proto_relative_url(thumbnail)
+        } for thumbnail in video.get('thumbnails', [])]
+
+        tags = [tag['title'] for tag in video.get('tags') or []]
+
+        return {
+            'id': video.get('id') or video_id,
+            'title': title,
+            'description': video.get('description'),
+            'thumbnails': thumbnails,
+            'uploader': video.get('user_name'),
+            'uploader_id': video.get('user_id'),
+            'timestamp': timestamp,
+            'duration': int_or_none(video.get('length')),
+            'age_limit': parse_age_limit(video.get('age_limit')),
+            'tags': tags,
+            'formats': formats,
+        }
+
+
+class IndavideoIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:.+?\.)?indavideo\.hu/video/(?P<id>[^/#?]+)'
+    _TESTS = [{
+        'url': 'http://indavideo.hu/video/Vicces_cica_1',
+        'md5': '8c82244ba85d2a2310275b318eb51eac',
+        'info_dict': {
+            'id': '1335611',
+            'display_id': 'Vicces_cica_1',
+            'ext': 'mp4',
+            'title': 'Vicces cica',
+            'description': 'Játszik a tablettel. :D',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'uploader': 'Jet_Pack',
+            'uploader_id': '491217',
+            'timestamp': 1390821212,
+            'upload_date': '20140127',
+            'duration': 7,
+            'age_limit': 0,
+            'tags': ['vicces', 'macska', 'cica', 'ügyes', 'nevetés', 'játszik', 'Cukiság', 'Jet_Pack'],
+        },
+    }, {
+        'url': 'http://index.indavideo.hu/video/2015_0728_beregszasz',
+        'only_matching': True,
+    }, {
+        'url': 'http://auto.indavideo.hu/video/Sajat_utanfutoban_a_kis_tacsko',
+        'only_matching': True,
+    }, {
+        'url': 'http://erotika.indavideo.hu/video/Amator_tini_punci',
+        'only_matching': True,
+    }, {
+        'url': 'http://film.indavideo.hu/video/f_hrom_nagymamm_volt',
+        'only_matching': True,
+    }, {
+        'url': 'http://palyazat.indavideo.hu/video/Embertelen_dal_Dodgem_egyuttes',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+        embed_url = self._search_regex(
+            r'<link[^>]+rel="video_src"[^>]+href="(.+?)"', webpage, 'embed url')
+
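+        # defer extraction to IndavideoEmbedIE while keeping the
+        # human-readable display_id from the original URL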
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'IndavideoEmbed',
+            'url': embed_url,
+            'display_id': display_id,
+        }
index 71cfd12c56549d0be540c9daee6a2732959039de..cca0b8a9323c0d2412c65610a3acb3ef2943ba6f 100644 (file)
@@ -1,22 +1,22 @@
+# coding: utf-8
+
 from __future__ import unicode_literals
 
 import base64
 
-from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse_unquote,
-    compat_urlparse,
-)
+from ..compat import compat_urllib_parse_unquote
+from ..utils import determine_ext
+from .bokecc import BokeCCBaseIE
 
 
-class InfoQIE(InfoExtractor):
+class InfoQIE(BokeCCBaseIE):
     _VALID_URL = r'https?://(?:www\.)?infoq\.com/(?:[^/]+/)+(?P<id>[^/]+)'
 
     _TESTS = [{
         'url': 'http://www.infoq.com/presentations/A-Few-of-My-Favorite-Python-Things',
         'md5': 'b5ca0e0a8c1fed93b0e65e48e462f9a2',
         'info_dict': {
-            'id': '12-jan-pythonthings',
+            'id': 'A-Few-of-My-Favorite-Python-Things',
             'ext': 'mp4',
             'description': 'Mike Pirnat presents some tips and tricks, standard libraries and third party packages that make programming in Python a richer experience.',
             'title': 'A Few of My Favorite [Python] Things',
@@ -24,40 +24,64 @@ class InfoQIE(InfoExtractor):
     }, {
         'url': 'http://www.infoq.com/fr/presentations/changez-avis-sur-javascript',
         'only_matching': True,
+    }, {
+        'url': 'http://www.infoq.com/cn/presentations/openstack-continued-delivery',
+        'md5': '4918d0cca1497f2244572caf626687ef',
+        'info_dict': {
+            'id': 'openstack-continued-delivery',
+            'title': 'OpenStack持续交付之路',
+            'ext': 'flv',
+            'description': 'md5:308d981fb28fa42f49f9568322c683ff',
+        },
     }]
 
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        video_title = self._html_search_regex(r'<title>(.*?)</title>', webpage, 'title')
-        video_description = self._html_search_meta('description', webpage, 'description')
-
+    def _extract_rtmp_videos(self, webpage):
         # The server URL is hardcoded
         video_url = 'rtmpe://video.infoq.com/cfx/st/'
 
         # Extract video URL
         encoded_id = self._search_regex(
-            r"jsclassref\s*=\s*'([^']*)'", webpage, 'encoded id')
+            r"jsclassref\s*=\s*'([^']*)'", webpage, 'encoded id', default=None)
+        if not encoded_id:
+            return []
+
         real_id = compat_urllib_parse_unquote(base64.b64decode(encoded_id.encode('ascii')).decode('utf-8'))
         playpath = 'mp4:' + real_id
 
-        video_filename = playpath.split('/')[-1]
-        video_id, extension = video_filename.split('.')
-
-        http_base = self._search_regex(
-            r'EXPRESSINSTALL_SWF\s*=\s*[^"]*"((?:https?:)?//[^/"]+/)', webpage,
-            'HTTP base URL')
-
-        formats = [{
+        return [{
             'format_id': 'rtmp',
             'url': video_url,
-            'ext': extension,
+            'ext': determine_ext(playpath),
             'play_path': playpath,
-        }, {
+        }]
+
+    def _extract_http_videos(self, webpage):
+        http_video_url = self._search_regex(r'P\.s\s*=\s*\'([^\']+)\'', webpage, 'video URL')
+
+        policy = self._search_regex(r'InfoQConstants\.scp\s*=\s*\'([^\']+)\'', webpage, 'policy')
+        signature = self._search_regex(r'InfoQConstants\.scs\s*=\s*\'([^\']+)\'', webpage, 'signature')
+        key_pair_id = self._search_regex(r'InfoQConstants\.sck\s*=\s*\'([^\']+)\'', webpage, 'key-pair-id')
+
+        return [{
             'format_id': 'http',
-            'url': compat_urlparse.urljoin(url, http_base) + real_id,
+            'url': http_video_url,
+            'http_headers': {
+                'Cookie': 'CloudFront-Policy=%s; CloudFront-Signature=%s; CloudFront-Key-Pair-Id=%s' % (
+                    policy, signature, key_pair_id),
+            },
         }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        video_title = self._html_search_regex(r'<title>(.*?)</title>', webpage, 'title')
+        video_description = self._html_search_meta('description', webpage, 'description')
+
+        if '/cn/' in url:
+            # for China videos, HTTP video URL exists but always fails with 403
+            formats = self._extract_bokecc_formats(webpage, video_id)
+        else:
+            formats = self._extract_rtmp_videos(webpage) + self._extract_http_videos(webpage)
+
         self._sort_formats(formats)
 
         return {
index 3d78f78c46d1ad004339bc33ebcb09d1286e5092..3cbe77ad80f2fc9a03c738745524d5dac98c9d37 100644 (file)
@@ -4,14 +4,16 @@ import re
 
 from .common import InfoExtractor
 from ..utils import (
+    get_element_by_attribute,
     int_or_none,
     limit_length,
+    lowercase_escape,
 )
 
 
 class InstagramIE(InfoExtractor):
-    _VALID_URL = r'https://instagram\.com/p/(?P<id>[\da-zA-Z]+)'
-    _TEST = {
+    _VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+))'
+    _TESTS = [{
         'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
         'md5': '0d2da106a9d2631273e192b372806516',
         'info_dict': {
@@ -21,16 +23,56 @@ class InstagramIE(InfoExtractor):
             'title': 'Video by naomipq',
             'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
         }
-    }
+    }, {
+        # missing description
+        'url': 'https://www.instagram.com/p/BA-pQFBG8HZ/?taken-by=britneyspears',
+        'info_dict': {
+            'id': 'BA-pQFBG8HZ',
+            'ext': 'mp4',
+            'uploader_id': 'britneyspears',
+            'title': 'Video by britneyspears',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://instagram.com/p/-Cmh1cukG2/',
+        'only_matching': True,
+    }, {
+        'url': 'http://instagram.com/p/9o6LshA7zy/embed/',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_embed_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?instagram\.com/p/[^/]+/embed.*?)\1',
+            webpage)
+        if mobj:
+            return mobj.group('url')
+
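+        # fall back to the <blockquote class="instagram-media"> markup
+        # produced by Instagram's embed.js widget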
+        blockquote_el = get_element_by_attribute(
+            'class', 'instagram-media', webpage)
+        if blockquote_el is None:
+            return
+
+        mobj = re.search(
+            r'<a[^>]+href=([\'"])(?P<link>[^\'"]+)\1', blockquote_el)
+        if mobj:
+            return mobj.group('link')
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        url = mobj.group('url')
 
         webpage = self._download_webpage(url, video_id)
         uploader_id = self._search_regex(r'"owner":{"username":"(.+?)"',
                                          webpage, 'uploader id', fatal=False)
-        desc = self._search_regex(r'"caption":"(.*?)"', webpage, 'description',
-                                  fatal=False)
+        desc = self._search_regex(
+            r'"caption":"(.+?)"', webpage, 'description', default=None)
+        if desc is not None:
+            desc = lowercase_escape(desc)
 
         return {
             'id': video_id,
@@ -44,7 +86,7 @@ class InstagramIE(InfoExtractor):
 
 
 class InstagramUserIE(InfoExtractor):
-    _VALID_URL = r'https://instagram\.com/(?P<username>[^/]{2,})/?(?:$|[?#])'
+    _VALID_URL = r'https?://(?:www\.)?instagram\.com/(?P<username>[^/]{2,})/?(?:$|[?#])'
     IE_DESC = 'Instagram user profile'
     IE_NAME = 'instagram:user'
     _TEST = {
@@ -121,7 +163,7 @@ class InstagramUserIE(InfoExtractor):
 
             if not page['items']:
                 break
-            max_id = page['items'][-1]['id']
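+            # media ids have the form '<media id>_<owner id>'; the pagination
+            # endpoint expects only the leading media id part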
+            max_id = page['items'][-1]['id'].split('_')[0]
             media_url = (
                 'http://instagram.com/%s/media?max_id=%s' % (
                     uploader_id, max_id))
index 483cc6f9e62da3bc272ba66efc540b95c17116e7..45add007fd99c8bd80f16c1becfb42cf403d45d5 100644 (file)
@@ -1,93 +1,91 @@
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
 from ..compat import (
+    compat_parse_qs,
     compat_urlparse,
-    compat_urllib_parse,
 )
 from ..utils import (
-    xpath_with_ns,
+    determine_ext,
+    int_or_none,
+    xpath_text,
 )
 
 
 class InternetVideoArchiveIE(InfoExtractor):
-    _VALID_URL = r'https?://video\.internetvideoarchive\.net/flash/players/.*?\?.*?publishedid.*?'
+    _VALID_URL = r'https?://video\.internetvideoarchive\.net/(?:player|flash/players)/.*?\?.*?publishedid.*?'
 
     _TEST = {
-        'url': 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?customerid=69249&publishedid=452693&playerid=247',
+        'url': 'http://video.internetvideoarchive.net/player/6/configuration.ashx?customerid=69249&publishedid=194487&reporttag=vdbetatitle&playerid=641&autolist=0&domain=www.videodetective.com&maxrate=high&minrate=low&socialplayer=false',
         'info_dict': {
-            'id': '452693',
+            'id': '194487',
             'ext': 'mp4',
-            'title': 'SKYFALL',
-            'description': 'In SKYFALL, Bond\'s loyalty to M is tested as her past comes back to haunt her. As MI6 comes under attack, 007 must track down and destroy the threat, no matter how personal the cost.',
-            'duration': 152,
+            'title': 'KICK-ASS 2',
+            'description': 'md5:c189d5b7280400630a1d3dd17eaa8d8a',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
         },
     }
 
     @staticmethod
-    def _build_url(query):
-        return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
+    def _build_json_url(query):
+        return 'http://video.internetvideoarchive.net/player/6/configuration.ashx?' + query
 
     @staticmethod
-    def _clean_query(query):
-        NEEDED_ARGS = ['publishedid', 'customerid']
-        query_dic = compat_urlparse.parse_qs(query)
-        cleaned_dic = dict((k, v[0]) for (k, v) in query_dic.items() if k in NEEDED_ARGS)
-        # Other player ids return m3u8 urls
-        cleaned_dic['playerid'] = '247'
-        cleaned_dic['videokbrate'] = '100000'
-        return compat_urllib_parse.urlencode(cleaned_dic)
+    def _build_xml_url(query):
+        return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
 
     def _real_extract(self, url):
         query = compat_urlparse.urlparse(url).query
-        query_dic = compat_urlparse.parse_qs(query)
+        query_dic = compat_parse_qs(query)
         video_id = query_dic['publishedid'][0]
-        url = self._build_url(query)
 
-        flashconfiguration = self._download_xml(url, video_id,
-                                                'Downloading flash configuration')
-        file_url = flashconfiguration.find('file').text
-        file_url = file_url.replace('/playlist.aspx', '/mrssplaylist.aspx')
-        # Replace some of the parameters in the query to get the best quality
-        # and http links (no m3u8 manifests)
-        file_url = re.sub(r'(?<=\?)(.+)$',
-                          lambda m: self._clean_query(m.group()),
-                          file_url)
-        info = self._download_xml(file_url, video_id,
-                                  'Downloading video info')
-        item = info.find('channel/item')
+        if '/player/' in url:
+            configuration = self._download_json(url, video_id)
+
+            # There are multiple videos in the playlist while only the first one
+            # matches the video played in browsers
+            video_info = configuration['playlist'][0]
+
+            formats = []
+            for source in video_info['sources']:
+                file_url = source['file']
+                if determine_ext(file_url) == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        file_url, video_id, ext='mp4', m3u8_id='hls'))
+                else:
+                    a_format = {
+                        'url': file_url,
+                    }
+
+                    if source.get('label') and source['label'][-4:] == ' kbs':
+                        tbr = int_or_none(source['label'][:-4])
+                        a_format.update({
+                            'tbr': tbr,
+                            'format_id': 'http-%d' % tbr,
+                        })
+                    formats.append(a_format)
 
-        def _bp(p):
-            return xpath_with_ns(
-                p,
-                {
-                    'media': 'http://search.yahoo.com/mrss/',
-                    'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats',
-                }
-            )
-        formats = []
-        for content in item.findall(_bp('media:group/media:content')):
-            attr = content.attrib
-            f_url = attr['url']
-            width = int(attr['width'])
-            bitrate = int(attr['bitrate'])
-            format_id = '%d-%dk' % (width, bitrate)
-            formats.append({
-                'format_id': format_id,
-                'url': f_url,
-                'width': width,
-                'tbr': bitrate,
-            })
+            self._sort_formats(formats)
 
-        self._sort_formats(formats)
+            title = video_info['title']
+            description = video_info.get('description')
+            thumbnail = video_info.get('image')
+        else:
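+            # legacy flashconfiguration XML only exposes a single file URL
+            # and a thumbnail; no real title is available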
+            configuration = self._download_xml(url, video_id)
+            formats = [{
+                'url': xpath_text(configuration, './file', 'file URL', fatal=True),
+            }]
+            thumbnail = xpath_text(configuration, './image', 'thumbnail')
+            title = 'InternetVideoArchive video %s' % video_id
+            description = None
 
         return {
             'id': video_id,
-            'title': item.find('title').text,
+            'title': title,
             'formats': formats,
-            'thumbnail': item.find(_bp('media:thumbnail')).attrib['url'],
-            'description': item.find('description').text,
-            'duration': int(attr['duration']),
+            'thumbnail': thumbnail,
+            'description': description,
         }
index 821c8ec109236b787b9afa2985e450ff8a647595..788bbe0d5c44177b5a943da9f9c3c3adf46a77b1 100644 (file)
-# -*- coding: utf-8 -*-
+# coding: utf-8
 from __future__ import unicode_literals
 
 import re
-from random import random
-from math import floor
+import time
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-)
 from ..utils import (
-    ExtractorError,
-    remove_end,
+    determine_ext,
+    js_to_json,
+    sanitized_Request,
 )
 
 
 class IPrimaIE(InfoExtractor):
-    _VALID_URL = r'https?://play\.iprima\.cz/(?:[^/]+/)*(?P<id>[^?#]+)'
+    _VALID_URL = r'https?://play\.iprima\.cz/(?:.+/)?(?P<id>[^?#]+)'
 
     _TESTS = [{
-        'url': 'http://play.iprima.cz/particka/particka-92',
-        'info_dict': {
-            'id': '39152',
-            'ext': 'flv',
-            'title': 'Partička (92)',
-            'description': 'md5:74e9617e51bca67c3ecfb2c6f9766f45',
-            'thumbnail': 'http://play.iprima.cz/sites/default/files/image_crops/image_620x349/3/491483_particka-92_image_620x349.jpg',
-        },
-        'params': {
-            'skip_download': True,  # requires rtmpdump
-        },
-    }, {
-        'url': 'http://play.iprima.cz/particka/tchibo-particka-jarni-moda',
+        'url': 'http://play.iprima.cz/gondici-s-r-o-33',
         'info_dict': {
-            'id': '9718337',
-            'ext': 'flv',
-            'title': 'Tchibo Partička - Jarní móda',
-            'thumbnail': 're:^http:.*\.jpg$',
+            'id': 'p136534',
+            'ext': 'mp4',
+            'title': 'Gondíci s. r. o. (34)',
+            'description': 'md5:16577c629d006aa91f59ca8d8e7f99bd',
         },
         'params': {
-            'skip_download': True,  # requires rtmpdump
+            'skip_download': True,  # m3u8 download
         },
     }, {
-        'url': 'http://play.iprima.cz/zpravy-ftv-prima-2752015',
+        'url': 'http://play.iprima.cz/particka/particka-92',
         'only_matching': True,
     }]
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
 
         webpage = self._download_webpage(url, video_id)
 
-        if re.search(r'Nemáte oprávnění přistupovat na tuto stránku\.\s*</div>', webpage):
-            raise ExtractorError(
-                '%s said: You do not have permission to access this page' % self.IE_NAME, expected=True)
-
-        player_url = (
-            'http://embed.livebox.cz/iprimaplay/player-embed-v2.js?__tok%s__=%s' %
-            (floor(random() * 1073741824), floor(random() * 1073741824))
-        )
+        video_id = self._search_regex(r'data-product="([^"]+)">', webpage, 'real id')
 
-        req = compat_urllib_request.Request(player_url)
+        req = sanitized_Request(
+            'http://play.iprima.cz/prehravac/init?_infuse=1'
+            '&_ts=%s&productId=%s' % (round(time.time()), video_id))
         req.add_header('Referer', url)
-        playerpage = self._download_webpage(req, video_id)
-
-        base_url = ''.join(re.findall(r"embed\['stream'\] = '(.+?)'.+'(\?auth=)'.+'(.+?)';", playerpage)[1])
-
-        zoneGEO = self._html_search_regex(r'"zoneGEO":(.+?),', webpage, 'zoneGEO')
-        if zoneGEO != '0':
-            base_url = base_url.replace('token', 'token_' + zoneGEO)
+        playerpage = self._download_webpage(req, video_id, note='Downloading player')
 
         formats = []
-        for format_id in ['lq', 'hq', 'hd']:
-            filename = self._html_search_regex(
-                r'"%s_id":(.+?),' % format_id, webpage, 'filename')
-
-            if filename == 'null':
-                continue
-
-            real_id = self._search_regex(
-                r'Prima-(?:[0-9]{10}|WEB)-([0-9]+)[-_]',
-                filename, 'real video id')
-
-            if format_id == 'lq':
-                quality = 0
-            elif format_id == 'hq':
-                quality = 1
-            elif format_id == 'hd':
-                quality = 2
-                filename = 'hq/' + filename
 
-            formats.append({
-                'format_id': format_id,
-                'url': base_url,
-                'quality': quality,
-                'play_path': 'mp4:' + filename.replace('"', '')[:-4],
-                'rtmp_live': True,
-                'ext': 'flv',
-            })
+        def extract_formats(format_url, format_key=None, lang=None):
+            ext = determine_ext(format_url)
+            new_formats = []
+            if format_key == 'hls' or ext == 'm3u8':
+                new_formats = self._extract_m3u8_formats(
+                    format_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls', fatal=False)
+            elif format_key == 'dash' or ext == 'mpd':
+                # this early return skips DASH extraction and leaves the MPD
+                # code below unreachable; remove it to re-enable those formats
+                return
+                new_formats = self._extract_mpd_formats(
+                    format_url, video_id, mpd_id='dash', fatal=False)
+            if lang:
+                for f in new_formats:
+                    if not f.get('language'):
+                        f['language'] = lang
+            formats.extend(new_formats)
+
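+        # the player init page embeds a playerOptions object whose 'tracks'
+        # dict maps protocol names (HLS/DASH) to lists of sources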
+        options = self._parse_json(
+            self._search_regex(
+                r'(?s)var\s+playerOptions\s*=\s*({.+?});',
+                playerpage, 'player options', default='{}'),
+            video_id, transform_source=js_to_json, fatal=False)
+        if options:
+            for key, tracks in options.get('tracks', {}).items():
+                if not isinstance(tracks, list):
+                    continue
+                for track in tracks:
+                    src = track.get('src')
+                    if src:
+                        extract_formats(src, key.lower(), track.get('lang'))
+
+        if not formats:
+            for _, src in re.findall(r'src["\']\s*:\s*(["\'])(.+?)\1', playerpage):
+                extract_formats(src)
 
         self._sort_formats(formats)
 
         return {
-            'id': real_id,
-            'title': remove_end(self._og_search_title(webpage), ' | Prima PLAY'),
+            'id': video_id,
+            'title': self._og_search_title(webpage),
             'thumbnail': self._og_search_thumbnail(webpage),
             'formats': formats,
-            'description': self._search_regex(
-                r'<p[^>]+itemprop="description"[^>]*>([^<]+)',
-                webpage, 'description', default=None),
+            'description': self._og_search_description(webpage),
         }
index afb7f4e6153ac84795503dabf049de5ed5ecf5bb..ffb8008ce29c81363c58e8b7af135b4d096835e8 100644 (file)
 from __future__ import unicode_literals
 
 import hashlib
+import itertools
 import math
+import os
 import random
+import re
 import time
 import uuid
 
 from .common import InfoExtractor
-from ..compat import compat_urllib_parse
-from ..utils import ExtractorError
+from ..compat import (
+    compat_parse_qs,
+    compat_str,
+    compat_urllib_parse_urlencode,
+    compat_urllib_parse_urlparse,
+)
+from ..utils import (
+    decode_packed_codes,
+    ExtractorError,
+    ohdave_rsa_encrypt,
+    remove_start,
+    sanitized_Request,
+    urlencode_postdata,
+    url_basename,
+)
+
+
+def md5_text(text):
+    return hashlib.md5(text.encode('utf-8')).hexdigest()
+
+
+class IqiyiSDK(object):
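+    # pure-Python port of the obfuscated login SDK: each step rewrites
+    # self.target from its previous value, the client IP and the timestamp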
+    def __init__(self, target, ip, timestamp):
+        self.target = target
+        self.ip = ip
+        self.timestamp = timestamp
+
+    @staticmethod
+    def split_sum(data):
+        return compat_str(sum(map(lambda p: int(p, 16), list(data))))
+
+    @staticmethod
+    def digit_sum(num):
+        if isinstance(num, int):
+            num = compat_str(num)
+        return compat_str(sum(map(int, num)))
+
+    def even_odd(self):
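+        # digit sums over the even- and odd-indexed digits of the timestamp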
+        even = self.digit_sum(compat_str(self.timestamp)[::2])
+        odd = self.digit_sum(compat_str(self.timestamp)[1::2])
+        return even, odd
+
+    def preprocess(self, chunksize):
+        self.target = md5_text(self.target)
+        chunks = []
+        for i in range(32 // chunksize):
+            chunks.append(self.target[chunksize * i:chunksize * (i + 1)])
+        if 32 % chunksize:
+            chunks.append(self.target[32 - 32 % chunksize:])
+        return chunks, list(map(int, self.ip.split('.')))
+
+    def mod(self, modulus):
+        chunks, ip = self.preprocess(32)
+        self.target = chunks[0] + ''.join(map(lambda p: compat_str(p % modulus), ip))
+
+    def split(self, chunksize):
+        modulus_map = {
+            4: 256,
+            5: 10,
+            8: 100,
+        }
+
+        chunks, ip = self.preprocess(chunksize)
+        ret = ''
+        for i in range(len(chunks)):
+            ip_part = compat_str(ip[i] % modulus_map[chunksize]) if i < 4 else ''
+            if chunksize == 8:
+                ret += ip_part + chunks[i]
+            else:
+                ret += chunks[i] + ip_part
+        self.target = ret
+
+    def handle_input16(self):
+        self.target = md5_text(self.target)
+        self.target = self.split_sum(self.target[:16]) + self.target + self.split_sum(self.target[16:])
+
+    def handle_input8(self):
+        self.target = md5_text(self.target)
+        ret = ''
+        for i in range(4):
+            part = self.target[8 * i:8 * (i + 1)]
+            ret += self.split_sum(part) + part
+        self.target = ret
+
+    def handleSum(self):
+        self.target = md5_text(self.target)
+        self.target = self.split_sum(self.target) + self.target
+
+    def date(self, scheme):
+        self.target = md5_text(self.target)
+        d = time.localtime(self.timestamp)
+        strings = {
+            'y': compat_str(d.tm_year),
+            'm': '%02d' % d.tm_mon,
+            'd': '%02d' % d.tm_mday,
+        }
+        self.target += ''.join(map(lambda c: strings[c], list(scheme)))
+
+    def split_time_even_odd(self):
+        even, odd = self.even_odd()
+        self.target = odd + md5_text(self.target) + even
+
+    def split_time_odd_even(self):
+        even, odd = self.even_odd()
+        self.target = even + md5_text(self.target) + odd
+
+    def split_ip_time_sum(self):
+        chunks, ip = self.preprocess(32)
+        self.target = compat_str(sum(ip)) + chunks[0] + self.digit_sum(self.timestamp)
+
+    def split_time_ip_sum(self):
+        chunks, ip = self.preprocess(32)
+        self.target = self.digit_sum(self.timestamp) + chunks[0] + compat_str(sum(ip))
+
+
+class IqiyiSDKInterpreter(object):
+    def __init__(self, sdk_code):
+        self.sdk_code = sdk_code
+
+    def run(self, target, ip, timestamp):
+        self.sdk_code = decode_packed_codes(self.sdk_code)
+
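+        # the unpacked code applies a chain of input=<name>(input, ...)
+        # calls; collect the function names in call order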
+        functions = re.findall(r'input=([a-zA-Z0-9]+)\(input', self.sdk_code)
+
+        sdk = IqiyiSDK(target, ip, timestamp)
+
+        other_functions = {
+            'handleSum': sdk.handleSum,
+            'handleInput8': sdk.handle_input8,
+            'handleInput16': sdk.handle_input16,
+            'splitTimeEvenOdd': sdk.split_time_even_odd,
+            'splitTimeOddEven': sdk.split_time_odd_even,
+            'splitIpTimeSum': sdk.split_ip_time_sum,
+            'splitTimeIpSum': sdk.split_time_ip_sum,
+        }
+        for function in functions:
+            if re.match(r'mod\d+', function):
+                sdk.mod(int(function[3:]))
+            elif re.match(r'date[ymd]{3}', function):
+                sdk.date(function[4:])
+            elif re.match(r'split\d+', function):
+                sdk.split(int(function[5:]))
+            elif function in other_functions:
+                other_functions[function]()
+            else:
+                raise ExtractorError('Unknown function %s' % function)
+
+        return sdk.target
 
 
 class IqiyiIE(InfoExtractor):
     IE_NAME = 'iqiyi'
     IE_DESC = '爱奇艺'
 
-    _VALID_URL = r'http://(?:www\.)iqiyi.com/v_.+?\.html'
+    _VALID_URL = r'https?://(?:(?:[^.]+\.)?iqiyi\.com|www\.pps\.tv)/.+\.html'
+
+    _NETRC_MACHINE = 'iqiyi'
 
     _TESTS = [{
         'url': 'http://www.iqiyi.com/v_19rrojlavg.html',
@@ -84,6 +235,47 @@ class IqiyiIE(InfoExtractor):
         'params': {
             'skip_download': True,
         },
+    }, {
+        'url': 'http://www.iqiyi.com/w_19rt6o8t9p.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.iqiyi.com/a_19rrhbc6kt.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://yule.iqiyi.com/pcb.html',
+        'only_matching': True,
+    }, {
+        # VIP-only video. The first 2 parts (6 minutes) are available without login
+        # MD5 sums omitted as values are different on Travis CI and my machine
+        'url': 'http://www.iqiyi.com/v_19rrny4w8w.html',
+        'info_dict': {
+            'id': 'f3cf468b39dddb30d676f89a91200dc1',
+            'title': '泰坦尼克号',
+        },
+        'playlist': [{
+            'info_dict': {
+                'id': 'f3cf468b39dddb30d676f89a91200dc1_part1',
+                'ext': 'f4v',
+                'title': '泰坦尼克号',
+            },
+        }, {
+            'info_dict': {
+                'id': 'f3cf468b39dddb30d676f89a91200dc1_part2',
+                'ext': 'f4v',
+                'title': '泰坦尼克号',
+            },
+        }],
+        'expected_warnings': ['Needs a VIP account for full video'],
+    }, {
+        'url': 'http://www.iqiyi.com/a_19rrhb8ce1.html',
+        'info_dict': {
+            'id': '202918101',
+            'title': '灌篮高手 国语版',
+        },
+        'playlist_count': 101,
+    }, {
+        'url': 'http://www.pps.tv/w_19rrbav0ph.html',
+        'only_matching': True,
     }]
 
     _FORMATS_MAP = [
@@ -95,7 +287,112 @@ class IqiyiIE(InfoExtractor):
         ('10', 'h1'),
     ]
 
-    def construct_video_urls(self, data, video_id, _uuid):
+    AUTH_API_ERRORS = {
+        # No preview available (不允许试看鉴权失败)
+        'Q00505': 'This video requires a VIP account',
+        # End of preview time (试看结束鉴权失败)
+        'Q00506': 'Needs a VIP account for full video',
+    }
+
+    def _real_initialize(self):
+        self._login()
+
+    @staticmethod
+    def _rsa_fun(data):
+        # public key extracted from http://static.iqiyi.com/js/qiyiV2/20160129180840/jobs/i18n/i18nIndex.js
+        N = 0xab86b6371b5318aaa1d3c9e612a9f1264f372323c8c0f19875b5fc3b3fd3afcc1e5bec527aa94bfa85bffc157e4245aebda05389a5357b75115ac94f074aefcd
+        e = 65537
+
+        return ohdave_rsa_encrypt(data, e, N)
+
+    def _login(self):
+        (username, password) = self._get_login_info()
+
+        # No authentication to be performed
+        if not username:
+            return True
+
+        data = self._download_json(
+            'http://kylin.iqiyi.com/get_token', None,
+            note='Get token for logging in', errnote='Unable to get token for logging in')
+        sdk = data['sdk']
+        timestamp = int(time.time())
+        target = '/apis/reglogin/login.action?lang=zh_TW&area_code=null&email=%s&passwd=%s&agenttype=1&from=undefined&keeplogin=0&piccode=&fromurl=&_pos=1' % (
+            username, self._rsa_fun(password.encode('utf-8')))
+
+        interp = IqiyiSDKInterpreter(sdk)
+        sign = interp.run(target, data['ip'], timestamp)
+
+        validation_params = {
+            'target': target,
+            'server': 'BEA3AA1908656AABCCFF76582C4C6660',
+            'token': data['token'],
+            'bird_src': 'f8d91d57af224da7893dd397d52d811a',
+            'sign': sign,
+            'bird_t': timestamp,
+        }
+        validation_result = self._download_json(
+            'http://kylin.iqiyi.com/validate?' + compat_urllib_parse_urlencode(validation_params), None,
+            note='Validate credentials', errnote='Unable to validate credentials')
+
+        MSG_MAP = {
+            'P00107': 'please login via the web interface and enter the CAPTCHA code',
+            'P00117': 'bad username or password',
+        }
+
+        code = validation_result['code']
+        if code != 'A00000':
+            msg = MSG_MAP.get(code)
+            if not msg:
+                msg = 'error %s' % code
+                if validation_result.get('msg'):
+                    msg += ': ' + validation_result['msg']
+            self._downloader.report_warning('unable to log in: ' + msg)
+            return False
+
+        return True
+
+    def _authenticate_vip_video(self, api_video_url, video_id, tvid, _uuid, do_report_warning):
+        auth_params = {
+            # version and platform hard-coded in com/qiyi/player/core/model/remote/AuthenticationRemote.as
+            'version': '2.0',
+            'platform': 'b6c13e26323c537d',
+            'aid': tvid,
+            'tvid': tvid,
+            'uid': '',
+            'deviceId': _uuid,
+            'playType': 'main',  # XXX: always main?
+            'filename': os.path.splitext(url_basename(api_video_url))[0],
+        }
+
+        qd_items = compat_parse_qs(compat_urllib_parse_urlparse(api_video_url).query)
+        for key, val in qd_items.items():
+            auth_params[key] = val[0]
+
+        auth_req = sanitized_Request(
+            'http://api.vip.iqiyi.com/services/ckn.action',
+            urlencode_postdata(auth_params))
+        # iQiyi server throws HTTP 405 error without the following header
+        auth_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
+        auth_result = self._download_json(
+            auth_req, video_id,
+            note='Downloading video authentication JSON',
+            errnote='Unable to download video authentication JSON')
+
+        code = auth_result.get('code')
+        msg = self.AUTH_API_ERRORS.get(code) or auth_result.get('msg') or code
+        if code == 'Q00506':
+            if do_report_warning:
+                self.report_warning(msg)
+            return False
+        if 'data' not in auth_result:
+            if msg is not None:
+                raise ExtractorError('%s said: %s' % (self.IE_NAME, msg), expected=True)
+            raise ExtractorError('Unexpected error from Iqiyi auth API')
+
+        return auth_result['data']
+
+    def construct_video_urls(self, data, video_id, _uuid, tvid):
         def do_xor(x, y):
             a = y % 3
             if a == 1:
@@ -121,9 +418,10 @@ class IqiyiIE(InfoExtractor):
                 note='Download path key of segment %d for format %s' % (segment_index + 1, format_id)
             )['t']
             t = str(int(math.floor(int(tm) / (600.0))))
-            return hashlib.md5((t + mg + x).encode('utf8')).hexdigest()
+            return md5_text(t + mg + x)
 
         video_urls_dict = {}
+        need_vip_warning_report = True
         for format_item in data['vp']['tkl'][0]['vs']:
             if 0 < int(format_item['bid']) <= 10:
                 format_id = self.get_format(format_item['bid'])
@@ -142,11 +440,13 @@ class IqiyiIE(InfoExtractor):
                 vl = segment['l']
                 if not vl.startswith('/'):
                     vl = get_encode_code(vl)
-                key = get_path_key(
-                    vl.split('/')[-1].split('.')[0], format_id, segment_index)
+                is_vip_video = '/vip/' in vl
                 filesize = segment['b']
                 base_url = data['vp']['du'].split('/')
-                base_url.insert(-1, key)
+                if not is_vip_video:
+                    key = get_path_key(
+                        vl.split('/')[-1].split('.')[0], format_id, segment_index)
+                    base_url.insert(-1, key)
                 base_url = '/'.join(base_url)
                 param = {
                     'su': _uuid,
@@ -157,8 +457,23 @@ class IqiyiIE(InfoExtractor):
                     'ct': '',
                     'tn': str(int(time.time()))
                 }
-                api_video_url = base_url + vl + '?' + \
-                    compat_urllib_parse.urlencode(param)
+                api_video_url = base_url + vl
+                if is_vip_video:
+                    api_video_url = api_video_url.replace('.f4v', '.hml')
+                    auth_result = self._authenticate_vip_video(
+                        api_video_url, video_id, tvid, _uuid, need_vip_warning_report)
+                    if auth_result is False:
+                        need_vip_warning_report = False
+                        break
+                    param.update({
+                        't': auth_result['t'],
+                        # cid is hard-coded in com/qiyi/player/core/player/RuntimeData.as
+                        'cid': 'afbe8fd3d73448c9',
+                        'vid': video_id,
+                        'QY00001': auth_result['u'],
+                    })
+                api_video_url += '?' if '?' not in api_video_url else '&'
+                api_video_url += compat_urllib_parse_urlencode(param)
                 js = self._download_json(
                     api_video_url, video_id,
                     note='Download video info of segment %d for format %s' % (segment_index + 1, format_id))
@@ -179,59 +494,96 @@ class IqiyiIE(InfoExtractor):
 
     def get_raw_data(self, tvid, video_id, enc_key, _uuid):
         tm = str(int(time.time()))
+        tail = tm + tvid
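+        # tail (tm + tvid) is the salt hashed into both 'enc' and 'authkey'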
         param = {
             'key': 'fvip',
-            'src': hashlib.md5(b'youtube-dl').hexdigest(),
+            'src': md5_text('youtube-dl'),
             'tvId': tvid,
             'vid': video_id,
             'vinfo': 1,
             'tm': tm,
-            'enc': hashlib.md5(
-                (enc_key + tm + tvid).encode('utf8')).hexdigest(),
+            'enc': md5_text(enc_key + tail),
             'qyid': _uuid,
             'tn': random.random(),
             'um': 0,
-            'authkey': hashlib.md5(
-                (tm + tvid).encode('utf8')).hexdigest()
+            'authkey': md5_text(md5_text('') + tail),
+            'k_tag': 1,
         }
 
         api_url = 'http://cache.video.qiyi.com/vms' + '?' + \
-            compat_urllib_parse.urlencode(param)
+            compat_urllib_parse_urlencode(param)
         raw_data = self._download_json(api_url, video_id)
         return raw_data
 
-    def get_enc_key(self, swf_url, video_id):
-        enc_key = '8e29ab5666d041c3a1ea76e06dabdffb'
+    def get_enc_key(self, video_id):
+        # TODO: automatic key extraction
+        # last update at 2016-01-22 for Zombie::bite
+        enc_key = '4a1caba4b4465345366f28da7c117d20'
         return enc_key
 
+    def _extract_playlist(self, webpage):
+        PAGE_SIZE = 50
+
+        links = re.findall(
+            r'<a[^>]+class="site-piclist_pic_link"[^>]+href="(http://www\.iqiyi\.com/.+\.html)"',
+            webpage)
+        if not links:
+            return
+
+        album_id = self._search_regex(
+            r'albumId\s*:\s*(\d+),', webpage, 'album ID')
+        album_title = self._search_regex(
+            r'data-share-title="([^"]+)"', webpage, 'album title', fatal=False)
+
+        entries = list(map(self.url_result, links))
+
+        # Start from 2 because the links on the first page are already in webpage
+        for page_num in itertools.count(2):
+            pagelist_page = self._download_webpage(
+                'http://cache.video.qiyi.com/jp/avlist/%s/%d/%d/' % (album_id, page_num, PAGE_SIZE),
+                album_id,
+                note='Download playlist page %d' % page_num,
+                errnote='Failed to download playlist page %d' % page_num)
+            pagelist = self._parse_json(
+                remove_start(pagelist_page, 'var tvInfoJs='), album_id)
+            vlist = pagelist['data']['vlist']
+            for item in vlist:
+                entries.append(self.url_result(item['vurl']))
+            if len(vlist) < PAGE_SIZE:
+                break
+
+        return self.playlist_result(entries, album_id, album_title)
+
     def _real_extract(self, url):
         webpage = self._download_webpage(
             url, 'temp_id', note='download video page')
+
+        # There's no simple way to determine whether a URL is a playlist or not,
+        # so detect it
+        playlist_result = self._extract_playlist(webpage)
+        if playlist_result:
+            return playlist_result
+
         tvid = self._search_regex(
             r'data-player-tvid\s*=\s*[\'"](\d+)', webpage, 'tvid')
         video_id = self._search_regex(
             r'data-player-videoid\s*=\s*[\'"]([a-f\d]+)', webpage, 'video_id')
-        swf_url = self._search_regex(
-            r'(http://[^\'"]+MainPlayer[^.]+\.swf)', webpage, 'swf player URL')
         _uuid = uuid.uuid4().hex
 
-        enc_key = self.get_enc_key(swf_url, video_id)
+        enc_key = self.get_enc_key(video_id)
 
         raw_data = self.get_raw_data(tvid, video_id, enc_key, _uuid)
 
         if raw_data['code'] != 'A000000':
             raise ExtractorError('Unable to load data. Error code: ' + raw_data['code'])
 
-        if not raw_data['data']['vp']['tkl']:
-            raise ExtractorError('No support iQiqy VIP video')
-
         data = raw_data['data']
 
         title = data['vi']['vn']
 
         # generate video_urls_dict
         video_urls_dict = self.construct_video_urls(
-            data, video_id, _uuid)
+            data, video_id, _uuid, tvid)
 
         # construct info
         entries = []
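
For reference, the request signing in get_raw_data above reduces to two MD5 digests over the timestamp and tvid. A minimal standalone sketch, with md5_text inlined so it runs without the project and an example tvid that is not from the patch:

```python
import hashlib
import time


def md5_text(text):
    # youtube-dl's md5_text helper, inlined for self-containment
    return hashlib.md5(text.encode('utf-8')).hexdigest()


enc_key = '4a1caba4b4465345366f28da7c117d20'  # hard-coded key from the patch
tvid = '159818'  # example value, not from the patch
tm = str(int(time.time()))
tail = tm + tvid

enc = md5_text(enc_key + tail)           # the 'enc' parameter of the /vms call
authkey = md5_text(md5_text('') + tail)  # 'authkey' chains an MD5 of ''
print(enc)
print(authkey)
```
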
index e825944443392153d8a0d456f5ad5dc80b2c9f77..472d72b4c34fa3305b6b2808be1e45c6da25a60e 100644 (file)
@@ -5,11 +5,10 @@ import re
 import json
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-)
 from ..utils import (
     ExtractorError,
+    int_or_none,
+    sanitized_Request,
 )
 
 
@@ -29,44 +28,36 @@ class IviIE(InfoExtractor):
                 'title': 'Иван Васильевич меняет профессию',
                 'description': 'md5:b924063ea1677c8fe343d8a72ac2195f',
                 'duration': 5498,
-                'thumbnail': 'http://thumbs.ivi.ru/f20.vcp.digitalaccess.ru/contents/d/1/c3c885163a082c29bceeb7b5a267a6.jpg',
+                'thumbnail': 're:^https?://.*\.jpg$',
             },
             'skip': 'Only works from Russia',
         },
-        # Serial's serie
+        # Serial's series
         {
             'url': 'http://www.ivi.ru/watch/dvoe_iz_lartsa/9549',
             'md5': '221f56b35e3ed815fde2df71032f4b3e',
             'info_dict': {
                 'id': '9549',
                 'ext': 'mp4',
-                'title': 'Двое из ларца - Серия 1',
+                'title': 'Двое из ларца - Дело Гольдберга (1 часть)',
+                'series': 'Двое из ларца',
+                'season': 'Сезон 1',
+                'season_number': 1,
+                'episode': 'Дело Гольдберга (1 часть)',
+                'episode_number': 1,
                 'duration': 2655,
-                'thumbnail': 'http://thumbs.ivi.ru/f15.vcp.digitalaccess.ru/contents/8/4/0068dc0677041f3336b7c2baad8fc0.jpg',
+                'thumbnail': 're:^https?://.*\.jpg$',
             },
             'skip': 'Only works from Russia',
         }
     ]
 
     # Sorted by quality
-    _known_formats = ['MP4-low-mobile', 'MP4-mobile', 'FLV-lo', 'MP4-lo', 'FLV-hi', 'MP4-hi', 'MP4-SHQ']
-
-    # Sorted by size
-    _known_thumbnails = ['Thumb-120x90', 'Thumb-160', 'Thumb-640x480']
-
-    def _extract_description(self, html):
-        m = re.search(r'<meta name="description" content="(?P<description>[^"]+)"/>', html)
-        return m.group('description') if m is not None else None
-
-    def _extract_comment_count(self, html):
-        m = re.search('(?s)<a href="#" id="view-comments" class="action-button dim gradient">\s*Комментарии:\s*(?P<commentcount>\d+)\s*</a>', html)
-        return int(m.group('commentcount')) if m is not None else 0
+    _KNOWN_FORMATS = ['MP4-low-mobile', 'MP4-mobile', 'FLV-lo', 'MP4-lo', 'FLV-hi', 'MP4-hi', 'MP4-SHQ']
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        api_url = 'http://api.digitalaccess.ru/api/json/'
-
         data = {
             'method': 'da.content.get',
             'params': [
@@ -78,11 +69,10 @@ class IviIE(InfoExtractor):
             ]
         }
 
-        request = compat_urllib_request.Request(api_url, json.dumps(data))
-
-        video_json_page = self._download_webpage(
+        request = sanitized_Request(
+            'http://api.digitalaccess.ru/api/json/', json.dumps(data))
+        video_json = self._download_json(
             request, video_id, 'Downloading video JSON')
-        video_json = json.loads(video_json_page)
 
         if 'error' in video_json:
             error = video_json['error']
@@ -97,35 +87,51 @@ class IviIE(InfoExtractor):
         formats = [{
             'url': x['url'],
             'format_id': x['content_format'],
-            'preference': self._known_formats.index(x['content_format']),
-        } for x in result['files'] if x['content_format'] in self._known_formats]
+            'preference': self._KNOWN_FORMATS.index(x['content_format']),
+        } for x in result['files'] if x['content_format'] in self._KNOWN_FORMATS]
 
         self._sort_formats(formats)
 
-        if not formats:
-            raise ExtractorError('No media links available for %s' % video_id)
-
-        duration = result['duration']
-        compilation = result['compilation']
         title = result['title']
 
+        duration = int_or_none(result.get('duration'))
+        compilation = result.get('compilation')
+        episode = title if compilation else None
+
         title = '%s - %s' % (compilation, title) if compilation is not None else title
 
-        previews = result['preview']
-        previews.sort(key=lambda fmt: self._known_thumbnails.index(fmt['content_format']))
-        thumbnail = previews[-1]['url'] if len(previews) > 0 else None
+        thumbnails = [{
+            'url': preview['url'],
+            'id': preview.get('content_format'),
+        } for preview in result.get('preview', []) if preview.get('url')]
+
+        webpage = self._download_webpage(url, video_id)
+
+        season = self._search_regex(
+            r'<li[^>]+class="season active"[^>]*><a[^>]+>([^<]+)',
+            webpage, 'season', default=None)
+        season_number = int_or_none(self._search_regex(
+            r'<li[^>]+class="season active"[^>]*><a[^>]+data-season(?:-index)?="(\d+)"',
+            webpage, 'season number', default=None))
+
+        episode_number = int_or_none(self._search_regex(
+            r'<meta[^>]+itemprop="episode"[^>]*>\s*<meta[^>]+itemprop="episodeNumber"[^>]+content="(\d+)',
+            webpage, 'episode number', default=None))
 
-        video_page = self._download_webpage(url, video_id, 'Downloading video page')
-        description = self._extract_description(video_page)
-        comment_count = self._extract_comment_count(video_page)
+        description = self._og_search_description(webpage, default=None) or self._html_search_meta(
+            'description', webpage, 'description', default=None)
 
         return {
             'id': video_id,
             'title': title,
-            'thumbnail': thumbnail,
+            'series': compilation,
+            'season': season,
+            'season_number': season_number,
+            'episode': episode,
+            'episode_number': episode_number,
+            'thumbnails': thumbnails,
             'description': description,
             'duration': duration,
-            'comment_count': comment_count,
             'formats': formats,
         }
 
@@ -151,8 +157,11 @@ class IviCompilationIE(InfoExtractor):
     }]
 
     def _extract_entries(self, html, compilation_id):
-        return [self.url_result('http://www.ivi.ru/watch/%s/%s' % (compilation_id, serie), 'Ivi')
-                for serie in re.findall(r'<strong><a href="/watch/%s/(\d+)">(?:[^<]+)</a></strong>' % compilation_id, html)]
+        return [
+            self.url_result(
+                'http://www.ivi.ru/watch/%s/%s' % (compilation_id, serie), IviIE.ie_key())
+            for serie in re.findall(
+                r'<a href="/watch/%s/(\d+)"[^>]+data-id="\1"' % compilation_id, html)]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
@@ -160,7 +169,8 @@ class IviCompilationIE(InfoExtractor):
         season_id = mobj.group('seasonid')
 
         if season_id is not None:  # Season link
-            season_page = self._download_webpage(url, compilation_id, 'Downloading season %s web page' % season_id)
+            season_page = self._download_webpage(
+                url, compilation_id, 'Downloading season %s web page' % season_id)
             playlist_id = '%s/season%s' % (compilation_id, season_id)
             playlist_title = self._html_search_meta('title', season_page, 'title')
             entries = self._extract_entries(season_page, compilation_id)
@@ -168,8 +178,9 @@ class IviCompilationIE(InfoExtractor):
             compilation_page = self._download_webpage(url, compilation_id, 'Downloading compilation web page')
             playlist_id = compilation_id
             playlist_title = self._html_search_meta('title', compilation_page, 'title')
-            seasons = re.findall(r'<a href="/watch/%s/season(\d+)">[^<]+</a>' % compilation_id, compilation_page)
-            if len(seasons) == 0:  # No seasons in this compilation
+            seasons = re.findall(
+                r'<a href="/watch/%s/season(\d+)' % compilation_id, compilation_page)
+            if not seasons:  # No seasons in this compilation
                 entries = self._extract_entries(compilation_page, compilation_id)
             else:
                 entries = []
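
The Ivi request above is a JSON-RPC style POST; the hunk elides the inner params, so the dict below only illustrates the request shape, with placeholder fields:

```python
import json

try:
    from urllib.request import Request  # Python 3
except ImportError:
    from urllib2 import Request  # Python 2

video_id = '9549'  # example id from the test above
data = {
    'method': 'da.content.get',
    'params': [
        video_id,
        {'contentid': video_id},  # placeholder: the real params are elided in the hunk
    ],
}
request = Request(
    'http://api.digitalaccess.ru/api/json/',
    json.dumps(data).encode('utf-8'))
# The extractor hands this request to _download_json, which parses the
# JSON body and surfaces the 'error' key checked above.
```
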
diff --git a/youtube_dl/extractor/ivideon.py b/youtube_dl/extractor/ivideon.py
new file mode 100644 (file)
index 0000000..3ca824f
--- /dev/null
@@ -0,0 +1,83 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_parse_urlencode,
+    compat_urlparse,
+)
+from ..utils import qualities
+
+
+class IvideonIE(InfoExtractor):
+    IE_NAME = 'ivideon'
+    IE_DESC = 'Ivideon TV'
+    _VALID_URL = r'https?://(?:www\.)?ivideon\.com/tv/(?:[^/]+/)*camera/(?P<id>\d+-[\da-f]+)/(?P<camera_id>\d+)'
+    _TESTS = [{
+        'url': 'https://www.ivideon.com/tv/camera/100-916ca13b5c4ad9f564266424a026386d/0/',
+        'info_dict': {
+            'id': '100-916ca13b5c4ad9f564266424a026386d',
+            'ext': 'flv',
+            'title': 're:^Касса [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+            'description': 'Основное предназначение - запись действий кассиров. Плюс общий вид.',
+            'is_live': True,
+        },
+        'params': {
+            'skip_download': True,
+        }
+    }, {
+        'url': 'https://www.ivideon.com/tv/camera/100-c4ee4cb9ede885cf62dfbe93d7b53783/589824/?lang=ru',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.ivideon.com/tv/map/22.917923/-31.816406/16/camera/100-e7bc16c7d4b5bbd633fd5350b66dfa9a/0',
+        'only_matching': True,
+    }]
+
+    _QUALITIES = ('low', 'mid', 'hi')
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        server_id, camera_id = mobj.group('id'), mobj.group('camera_id')
+        camera_name, description = None, None
+        camera_url = compat_urlparse.urljoin(
+            url, '/tv/camera/%s/%s/' % (server_id, camera_id))
+
+        webpage = self._download_webpage(camera_url, server_id, fatal=False)
+        if webpage:
+            config_string = self._search_regex(
+                r'var\s+config\s*=\s*({.+?});', webpage, 'config', default=None)
+            if config_string:
+                config = self._parse_json(config_string, server_id, fatal=False)
+                camera_info = config.get('ivTvAppOptions', {}).get('currentCameraInfo')
+                if camera_info:
+                    camera_name = camera_info.get('camera_name')
+                    description = camera_info.get('misc', {}).get('description')
+            if not camera_name:
+                camera_name = self._html_search_meta(
+                    'name', webpage, 'camera name', default=None) or self._search_regex(
+                    r'<h1[^>]+class="b-video-title"[^>]*>([^<]+)', webpage, 'camera name', default=None)
+
+        quality = qualities(self._QUALITIES)
+
+        formats = [{
+            'url': 'https://streaming.ivideon.com/flv/live?%s' % compat_urllib_parse_urlencode({
+                'server': server_id,
+                'camera': camera_id,
+                'sessionId': 'demo',
+                'q': quality(format_id),
+            }),
+            'format_id': format_id,
+            'ext': 'flv',
+            'quality': quality(format_id),
+        } for format_id in self._QUALITIES]
+        self._sort_formats(formats)
+
+        return {
+            'id': server_id,
+            'title': self._live_title(camera_name or server_id),
+            'description': description,
+            'is_live': True,
+            'formats': formats,
+        }
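
The qualities() helper imported above simply ranks format ids by their position in the preference tuple; roughly:

```python
def qualities(quality_ids):
    # Higher position in the tuple means higher quality; unknown ids sort last.
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q


quality = qualities(('low', 'mid', 'hi'))
assert quality('hi') > quality('mid') > quality('low') > quality('unknown')
```
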
index bc226fa67c064b991674a510b1eba54d40dc67e0..aa0728abc0155fa6abbe8e2a88de18dd89d85138 100644 (file)
@@ -29,7 +29,7 @@ class IzleseneIE(InfoExtractor):
                 'ext': 'mp4',
                 'title': 'Sevinçten Çıldırtan Doğum Günü Hediyesi',
                 'description': 'md5:253753e2655dde93f59f74b572454f6d',
-                'thumbnail': 're:^http://.*\.jpg',
+                'thumbnail': 're:^https?://.*\.jpg',
                 'uploader_id': 'pelikzzle',
                 'timestamp': int,
                 'upload_date': '20140702',
@@ -44,8 +44,7 @@ class IzleseneIE(InfoExtractor):
                 'id': '17997',
                 'ext': 'mp4',
                 'title': 'Tarkan Dortmund 2006 Konseri',
-                'description': 'Tarkan Dortmund 2006 Konseri',
-                'thumbnail': 're:^http://.*\.jpg',
+                'thumbnail': 're:^https?://.*\.jpg',
                 'uploader_id': 'parlayankiz',
                 'timestamp': int,
                 'upload_date': '20061112',
@@ -62,7 +61,7 @@ class IzleseneIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
 
         title = self._og_search_title(webpage)
-        description = self._og_search_description(webpage)
+        description = self._og_search_description(webpage, default=None)
         thumbnail = self._proto_relative_url(
             self._og_search_thumbnail(webpage), scheme='http:')
 
diff --git a/youtube_dl/extractor/jadorecettepub.py b/youtube_dl/extractor/jadorecettepub.py
deleted file mode 100644 (file)
index 063e86d..0000000
+++ /dev/null
@@ -1,47 +0,0 @@
-# coding: utf-8
-
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from .youtube import YoutubeIE
-
-
-class JadoreCettePubIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?jadorecettepub\.com/[0-9]{4}/[0-9]{2}/(?P<id>.*?)\.html'
-
-    _TEST = {
-        'url': 'http://www.jadorecettepub.com/2010/12/star-wars-massacre-par-les-japonais.html',
-        'md5': '401286a06067c70b44076044b66515de',
-        'info_dict': {
-            'id': 'jLMja3tr7a4',
-            'ext': 'mp4',
-            'title': 'La pire utilisation de Star Wars',
-            'description': "Jadorecettepub.com vous a gratifié de plusieurs pubs géniales utilisant Star Wars et Dark Vador plus particulièrement... Mais l'heure est venue de vous proposer une version totalement massacrée, venue du Japon.  Quand les Japonais détruisent l'image de Star Wars pour vendre du thon en boite, ça promet...",
-        },
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        display_id = mobj.group('id')
-
-        webpage = self._download_webpage(url, display_id)
-
-        title = self._html_search_regex(
-            r'<span style="font-size: x-large;"><b>(.*?)</b></span>',
-            webpage, 'title')
-        description = self._html_search_regex(
-            r'(?s)<div id="fb-root">(.*?)<script>', webpage, 'description',
-            fatal=False)
-        real_url = self._search_regex(
-            r'\[/postlink\](.*)endofvid', webpage, 'video URL')
-        video_id = YoutubeIE.extract_id(real_url)
-
-        return {
-            '_type': 'url_transparent',
-            'url': real_url,
-            'id': video_id,
-            'title': title,
-            'description': description,
-        }
index 1df084d87ae4c712d9bcfa1aac6d6367641287b5..1a4227f6b4b0ef7370b0f09613ef9d4b8916b435 100644 (file)
@@ -8,7 +8,7 @@ from .common import InfoExtractor
 
 
 class JeuxVideoIE(InfoExtractor):
-    _VALID_URL = r'http://.*?\.jeuxvideo\.com/.*/(.*?)\.htm'
+    _VALID_URL = r'https?://.*?\.jeuxvideo\.com/.*/(.*?)\.htm'
 
     _TESTS = [{
         'url': 'http://www.jeuxvideo.com/reportages-videos-jeux/0004/00046170/tearaway-playstation-vita-gc-2013-tearaway-nous-presente-ses-papiers-d-identite-00115182.htm',
@@ -28,9 +28,9 @@ class JeuxVideoIE(InfoExtractor):
         mobj = re.match(self._VALID_URL, url)
         title = mobj.group(1)
         webpage = self._download_webpage(url, title)
-        title = self._html_search_meta('name', webpage)
+        title = self._html_search_meta('name', webpage) or self._og_search_title(webpage)
         config_url = self._html_search_regex(
-            r'data-src="(/contenu/medias/video.php.*?)"',
+            r'data-src(?:set-video)?="(/contenu/medias/video.php.*?)"',
             webpage, 'config URL')
         config_url = 'http://www.jeuxvideo.com' + config_url
 
diff --git a/youtube_dl/extractor/jukebox.py b/youtube_dl/extractor/jukebox.py
deleted file mode 100644 (file)
index da8068e..0000000
+++ /dev/null
@@ -1,59 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    ExtractorError,
-    RegexNotFoundError,
-    unescapeHTML,
-)
-
-
-class JukeboxIE(InfoExtractor):
-    _VALID_URL = r'^http://www\.jukebox?\..+?\/.+[,](?P<id>[a-z0-9\-]+)\.html'
-    _TEST = {
-        'url': 'http://www.jukebox.es/kosheen/videoclip,pride,r303r.html',
-        'info_dict': {
-            'id': 'r303r',
-            'ext': 'flv',
-            'title': 'Kosheen-En Vivo Pride',
-            'uploader': 'Kosheen',
-        },
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        html = self._download_webpage(url, video_id)
-        iframe_url = unescapeHTML(self._search_regex(r'<iframe .*src="([^"]*)"', html, 'iframe url'))
-
-        iframe_html = self._download_webpage(iframe_url, video_id, 'Downloading iframe')
-        if re.search(r'class="jkb_waiting"', iframe_html) is not None:
-            raise ExtractorError('Video is not available(in your country?)!')
-
-        self.report_extraction(video_id)
-
-        try:
-            video_url = self._search_regex(r'"config":{"file":"(?P<video_url>http:[^"]+\?mdtk=[0-9]+)"',
-                                           iframe_html, 'video url')
-            video_url = unescapeHTML(video_url).replace('\/', '/')
-        except RegexNotFoundError:
-            youtube_url = self._search_regex(
-                r'config":{"file":"(http:\\/\\/www\.youtube\.com\\/watch\?v=[^"]+)"',
-                iframe_html, 'youtube url')
-            youtube_url = unescapeHTML(youtube_url).replace('\/', '/')
-            self.to_screen('Youtube video detected')
-            return self.url_result(youtube_url, ie='Youtube')
-
-        title = self._html_search_regex(r'<h1 class="inline">([^<]+)</h1>',
-                                        html, 'title')
-        artist = self._html_search_regex(r'<span id="infos_article_artist">([^<]+)</span>',
-                                         html, 'artist')
-
-        return {
-            'id': video_id,
-            'url': video_url,
-            'title': artist + '-' + title,
-            'uploader': artist,
-        }
diff --git a/youtube_dl/extractor/jwplatform.py b/youtube_dl/extractor/jwplatform.py
new file mode 100644 (file)
index 0000000..8a5e562
--- /dev/null
@@ -0,0 +1,84 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    float_or_none,
+    int_or_none,
+)
+
+
+class JWPlatformBaseIE(InfoExtractor):
+    def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True):
+        video_data = jwplayer_data['playlist'][0]
+
+        formats = []
+        for source in video_data['sources']:
+            source_url = self._proto_relative_url(source['file'])
+            source_type = source.get('type') or ''
+            if source_type in ('application/vnd.apple.mpegurl', 'hls'):
+                formats.extend(self._extract_m3u8_formats(
+                    source_url, video_id, 'mp4', 'm3u8_native', fatal=False))
+            elif source_type.startswith('audio'):
+                formats.append({
+                    'url': source_url,
+                    'vcodec': 'none',
+                })
+            else:
+                formats.append({
+                    'url': source_url,
+                    'width': int_or_none(source.get('width')),
+                    'height': int_or_none(source.get('height')),
+                })
+        self._sort_formats(formats)
+
+        subtitles = {}
+        tracks = video_data.get('tracks')
+        if tracks and isinstance(tracks, list):
+            for track in tracks:
+                if track.get('file') and track.get('kind') == 'captions':
+                    subtitles.setdefault(track.get('label') or 'en', []).append({
+                        'url': self._proto_relative_url(track['file'])
+                    })
+
+        return {
+            'id': video_id,
+            'title': video_data['title'] if require_title else video_data.get('title'),
+            'description': video_data.get('description'),
+            'thumbnail': self._proto_relative_url(video_data.get('image')),
+            'timestamp': int_or_none(video_data.get('pubdate')),
+            'duration': float_or_none(jwplayer_data.get('duration')),
+            'subtitles': subtitles,
+            'formats': formats,
+        }
+
+
+class JWPlatformIE(JWPlatformBaseIE):
+    _VALID_URL = r'(?:https?://content\.jwplatform\.com/(?:feeds|players|jw6)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
+    _TEST = {
+        'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js',
+        'md5': 'fa8899fa601eb7c83a64e9d568bdf325',
+        'info_dict': {
+            'id': 'nPripu9l',
+            'ext': 'mov',
+            'title': 'Big Buck Bunny Trailer',
+            'description': 'Big Buck Bunny is a short animated film by the Blender Institute. It is made using free and open source software.',
+            'upload_date': '20081127',
+            'timestamp': 1227796140,
+        }
+    }
+
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<script[^>]+?src=["\'](?P<url>(?:https?:)?//content.jwplatform.com/players/[a-zA-Z0-9]{8})',
+            webpage)
+        if mobj:
+            return mobj.group('url')
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        json_data = self._download_json('http://content.jwplatform.com/feeds/%s.json' % video_id, video_id)
+        return self._parse_jwplayer_data(json_data, video_id)
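
For orientation, _parse_jwplayer_data above consumes a player config shaped roughly like the following (all values invented):

```python
jwplayer_data = {
    'playlist': [{
        'title': 'Example clip',
        'image': '//assets.example.com/thumb.jpg',
        'pubdate': 1227796140,
        'sources': [
            # HLS entry: expanded via _extract_m3u8_formats
            {'file': '//example.com/master.m3u8', 'type': 'hls'},
            # progressive entry: becomes a single format dict
            {'file': '//example.com/video-720.mp4', 'width': 1280, 'height': 720},
        ],
        'tracks': [
            {'file': '//example.com/captions.vtt', 'kind': 'captions', 'label': 'en'},
        ],
    }],
    'duration': 33.0,
}
```
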
index d2873049202813be7587067b629503dc2da0f877..a65697ff558864f36cc5e8b8f82f959b19ea16fc 100644 (file)
@@ -2,23 +2,38 @@
 from __future__ import unicode_literals
 
 import re
+import base64
 
 from .common import InfoExtractor
-from ..compat import compat_urllib_parse
+from ..compat import (
+    compat_urllib_parse_urlencode,
+    compat_urlparse,
+    compat_parse_qs,
+)
 from ..utils import (
+    clean_html,
     ExtractorError,
     int_or_none,
+    unsmuggle_url,
 )
 
 
 class KalturaIE(InfoExtractor):
     _VALID_URL = r'''(?x)
-    (?:kaltura:|
-       https?://(:?(?:www|cdnapisec)\.)?kaltura\.com/index\.php/kwidget/(?:[^/]+/)*?wid/_
-    )(?P<partner_id>\d+)
-    (?::|
-       /(?:[^/]+/)*?entry_id/
-    )(?P<id>[0-9a-z_]+)'''
+                (?:
+                    kaltura:(?P<partner_id>\d+):(?P<id>[0-9a-z_]+)|
+                    https?://
+                        (?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/
+                        (?:
+                            (?:
+                                # flash player
+                                index\.php/kwidget|
+                                # html5 player
+                                html5/html5lib/[^/]+/mwEmbedFrame\.php
+                            )
+                        )(?:/(?P<path>[^?]+))?(?:\?(?P<query>.*))?
+                )
+                '''
     _API_BASE = 'http://cdnapi.kaltura.com/api_v3/index.php?'
     _TESTS = [
         {
@@ -27,7 +42,7 @@ class KalturaIE(InfoExtractor):
             'info_dict': {
                 'id': '1_1jc2y3e4',
                 'ext': 'mp4',
-                'title': 'Track 4',
+                'title': 'Straight from the Heart',
                 'upload_date': '20131219',
                 'uploader_id': 'mlundberg@wolfgangsvault.com',
                 'description': 'The Allman Brothers Band, 12/16/1981',
@@ -43,6 +58,10 @@ class KalturaIE(InfoExtractor):
             'url': 'https://cdnapisec.kaltura.com/index.php/kwidget/wid/_557781/uiconf_id/22845202/entry_id/1_plr1syf3',
             'only_matching': True,
         },
+        {
+            'url': 'https://cdnapisec.kaltura.com/html5/html5lib/v2.30.2/mwEmbedFrame.php/p/1337/uiconf_id/20540612/entry_id/1_sf5ovm7u?wid=_243342',
+            'only_matching': True,
+        }
     ]
 
     def _kaltura_api_call(self, video_id, actions, *args, **kwargs):
@@ -52,7 +71,7 @@ class KalturaIE(InfoExtractor):
                 for k, v in a.items():
                     params['%d:%s' % (i, k)] = v
 
-        query = compat_urllib_parse.urlencode(params)
+        query = compat_urllib_parse_urlencode(params)
         url = self._API_BASE + query
         data = self._download_json(url, video_id, *args, **kwargs)
 
@@ -93,43 +112,99 @@ class KalturaIE(InfoExtractor):
                 'version': '-1',
             },
             {
-                'action': 'getContextData',
-                'contextDataParams:objectType': 'KalturaEntryContextDataParams',
-                'contextDataParams:referrer': 'http://www.kaltura.com/',
-                'contextDataParams:streamerType': 'http',
+                'action': 'getbyentryid',
                 'entryId': video_id,
-                'service': 'baseentry',
+                'service': 'flavorAsset',
             },
         ]
         return self._kaltura_api_call(
             video_id, actions, note='Downloading video info JSON')
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
+        url, smuggled_data = unsmuggle_url(url, {})
+
         mobj = re.match(self._VALID_URL, url)
-        partner_id, entry_id = mobj.group('partner_id'), mobj.group('id')
-
-        info, source_data = self._get_video_info(entry_id, partner_id)
-
-        formats = [{
-            'format_id': '%(fileExt)s-%(bitrate)s' % f,
-            'ext': f['fileExt'],
-            'tbr': f['bitrate'],
-            'fps': f.get('frameRate'),
-            'filesize_approx': int_or_none(f.get('size'), invscale=1024),
-            'container': f.get('containerFormat'),
-            'vcodec': f.get('videoCodecId'),
-            'height': f.get('height'),
-            'width': f.get('width'),
-            'url': '%s/flavorId/%s' % (info['dataUrl'], f['id']),
-        } for f in source_data['flavorAssets']]
+        partner_id, entry_id = mobj.group('partner_id', 'id')
+        ks = None
+        if partner_id and entry_id:
+            info, flavor_assets = self._get_video_info(entry_id, partner_id)
+        else:
+            path, query = mobj.group('path', 'query')
+            if not path and not query:
+                raise ExtractorError('Invalid URL', expected=True)
+            params = {}
+            if query:
+                params = compat_parse_qs(query)
+            if path:
+                path_parts = path.split('/')
+                params.update(dict(zip(path_parts[::2], [[v] for v in path_parts[1::2]])))
+            if 'wid' in params:
+                partner_id = params['wid'][0][1:]
+            elif 'p' in params:
+                partner_id = params['p'][0]
+            else:
+                raise ExtractorError('Invalid URL', expected=True)
+            if 'entry_id' in params:
+                entry_id = params['entry_id'][0]
+                info, flavor_assets = self._get_video_info(entry_id, partner_id)
+            elif 'uiconf_id' in params and 'flashvars[referenceId]' in params:
+                reference_id = params['flashvars[referenceId]'][0]
+                webpage = self._download_webpage(url, reference_id)
+                entry_data = self._parse_json(self._search_regex(
+                    r'window\.kalturaIframePackageData\s*=\s*({.*});',
+                    webpage, 'kalturaIframePackageData'),
+                    reference_id)['entryResult']
+                info, flavor_assets = entry_data['meta'], entry_data['contextData']['flavorAssets']
+                entry_id = info['id']
+            else:
+                raise ExtractorError('Invalid URL', expected=True)
+            ks = params.get('flashvars[ks]', [None])[0]
+
+        source_url = smuggled_data.get('source_url')
+        if source_url:
+            referrer = base64.b64encode(
+                '://'.join(compat_urlparse.urlparse(source_url)[:2])
+                .encode('utf-8')).decode('utf-8')
+        else:
+            referrer = None
+
+        def sign_url(unsigned_url):
+            if ks:
+                unsigned_url += '/ks/%s' % ks
+            if referrer:
+                unsigned_url += '?referrer=%s' % referrer
+            return unsigned_url
+
+        formats = []
+        for f in flavor_assets:
+            # Continue if asset is not ready
+            if f['status'] != 2:
+                continue
+            video_url = sign_url('%s/flavorId/%s' % (info['dataUrl'], f['id']))
+            formats.append({
+                'format_id': '%(fileExt)s-%(bitrate)s' % f,
+                'ext': f.get('fileExt'),
+                'tbr': int_or_none(f['bitrate']),
+                'fps': int_or_none(f.get('frameRate')),
+                'filesize_approx': int_or_none(f.get('size'), invscale=1024),
+                'container': f.get('containerFormat'),
+                'vcodec': f.get('videoCodecId'),
+                'height': int_or_none(f.get('height')),
+                'width': int_or_none(f.get('width')),
+                'url': video_url,
+            })
+        m3u8_url = sign_url(info['dataUrl'].replace('format/url', 'format/applehttp'))
+        formats.extend(self._extract_m3u8_formats(
+            m3u8_url, entry_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+
+        self._check_formats(formats, entry_id)
         self._sort_formats(formats)
 
         return {
-            'id': video_id,
+            'id': entry_id,
             'title': info['name'],
             'formats': formats,
-            'description': info.get('description'),
+            'description': clean_html(info.get('description')),
             'thumbnail': info.get('thumbnailUrl'),
             'duration': info.get('duration'),
             'timestamp': info.get('createdAt'),
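
The path handling above folds the alternating segments of an mwEmbedFrame.php URL into compat_parse_qs-style params, wrapping each value in a list; standalone:

```python
path = 'p/1337/uiconf_id/20540612/entry_id/1_sf5ovm7u'  # from the test URL above
parts = path.split('/')
params = dict(zip(parts[::2], [[v] for v in parts[1::2]]))
print(params)
# {'p': ['1337'], 'uiconf_id': ['20540612'], 'entry_id': ['1_sf5ovm7u']}
```
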
index 4597d1b961a0fcae8137d3ec919fc2ce6ac31777..6c3498c6722e2312631cb6a327be4a90ba6d102f 100644 (file)
@@ -49,7 +49,7 @@ class KanalPlayIE(InfoExtractor):
         subs = self._download_json(
             'http://www.kanal%splay.se/api/subtitles/%s' % (channel_id, video_id),
             video_id, 'Downloading subtitles JSON', fatal=False)
-        return {'se': [{'ext': 'srt', 'data': self._fix_subtitles(subs)}]} if subs else {}
+        return {'sv': [{'ext': 'srt', 'data': self._fix_subtitles(subs)}]} if subs else {}
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
index 364dc878ee23b98413a7f2c6735124d50d4f487b..a677ff44712794ef54f53a1afe9c55fbacad91e2 100644 (file)
@@ -28,7 +28,7 @@ class KankanIE(InfoExtractor):
 
         title = self._search_regex(r'(?:G_TITLE=|G_MOVIE_TITLE = )[\'"](.+?)[\'"]', webpage, 'video title')
         surls = re.search(r'surls:\[\'.+?\'\]|lurl:\'.+?\.flv\'', webpage).group(0)
-        gcids = re.findall(r"http://.+?/.+?/(.+?)/", surls)
+        gcids = re.findall(r'http://.+?/.+?/(.+?)/', surls)
         gcid = gcids[-1]
 
         info_url = 'http://p2s.cl.kankan.com/getCdnresource_flv?gcid=%s' % gcid
index 06daf5a89ce3ffde4d71d7dc8ceee9441840b72b..a6050c4de3e1695ac26bd1a21bab981a52755c21 100644 (file)
@@ -2,39 +2,63 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote_plus
-from ..utils import (
-    js_to_json,
-)
 
 
 class KaraoketvIE(InfoExtractor):
-    _VALID_URL = r'http://karaoketv\.co\.il/\?container=songs&id=(?P<id>[0-9]+)'
+    _VALID_URL = r'http://www\.karaoketv\.co\.il/[^/]+/(?P<id>\d+)'
     _TEST = {
-        'url': 'http://karaoketv.co.il/?container=songs&id=171568',
+        'url': 'http://www.karaoketv.co.il/%D7%A9%D7%99%D7%A8%D7%99_%D7%A7%D7%A8%D7%99%D7%95%D7%A7%D7%99/58356/%D7%90%D7%99%D7%96%D7%95%D7%9F',
         'info_dict': {
-            'id': '171568',
-            'ext': 'mp4',
-            'title': 'אל העולם שלך - רותם כהן - שרים קריוקי',
+            'id': '58356',
+            'ext': 'flv',
+            'title': 'קריוקי של איזון',
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
         }
     }
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
+
         webpage = self._download_webpage(url, video_id)
+        api_page_url = self._search_regex(
+            r'<iframe[^>]+src=(["\'])(?P<url>https?://www\.karaoke\.co\.il/api_play\.php\?.+?)\1',
+            webpage, 'API play URL', group='url')
+
+        api_page = self._download_webpage(api_page_url, video_id)
+        video_cdn_url = self._search_regex(
+            r'<iframe[^>]+src=(["\'])(?P<url>https?://www\.video-cdn\.com/embed/iframe/.+?)\1',
+            api_page, 'video cdn URL', group='url')
+
+        video_cdn = self._download_webpage(video_cdn_url, video_id)
+        play_path = self._parse_json(
+            self._search_regex(
+                r'var\s+options\s*=\s*({.+?});', video_cdn, 'options'),
+            video_id)['clip']['url']
 
-        page_video_url = self._og_search_video_url(webpage, video_id)
-        config_json = compat_urllib_parse_unquote_plus(self._search_regex(
-            r'config=(.*)', page_video_url, 'configuration'))
+        settings = self._parse_json(
+            self._search_regex(
+                r'var\s+settings\s*=\s*({.+?});', video_cdn, 'servers', default='{}'),
+            video_id, fatal=False) or {}
 
-        urls_info_json = self._download_json(
-            config_json, video_id, 'Downloading configuration',
-            transform_source=js_to_json)
+        servers = settings.get('servers')
+        if not servers or not isinstance(servers, list):
+            servers = ('wowzail.video-cdn.com:80/vodcdn', )
 
-        url = urls_info_json['playlist'][0]['url']
+        formats = [{
+            'url': 'rtmp://%s' % server if not server.startswith('rtmp') else server,
+            'play_path': play_path,
+            'app': 'vodcdn',
+            'page_url': video_cdn_url,
+            'player_url': 'http://www.video-cdn.com/assets/flowplayer/flowplayer.commercial-3.2.18.swf',
+            'rtmp_real_time': True,
+            'ext': 'flv',
+        } for server in servers]
 
         return {
             'id': video_id,
             'title': self._og_search_title(webpage),
-            'url': url,
+            'formats': formats,
         }
index bed94bc9338d158c77087d4e74ef341aa236f94f..c05263e6165159320376939c252af7dea7aeadb2 100644 (file)
@@ -12,7 +12,7 @@ from ..utils import (
 
 
 class KarriereVideosIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?karrierevideos\.at(?:/[^/]+)+/(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?karrierevideos\.at(?:/[^/]+)+/(?P<id>[^/]+)'
     _TESTS = [{
         'url': 'http://www.karrierevideos.at/berufsvideos/mittlere-hoehere-schulen/altenpflegerin',
         'info_dict': {
@@ -52,9 +52,12 @@ class KarriereVideosIE(InfoExtractor):
 
         video_id = self._search_regex(
             r'/config/video/(.+?)\.xml', webpage, 'video id')
+        # Server returns malformed headers
+        # Force Accept-Encoding: * to prevent gzipped results
         playlist = self._download_xml(
             'http://www.karrierevideos.at/player-playlist.xml.php?p=%s' % video_id,
-            video_id, transform_source=fix_xml_ampersands)
+            video_id, transform_source=fix_xml_ampersands,
+            headers={'Accept-Encoding': '*'})
 
         NS_MAP = {
             'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats'
index c0956ba0902be3b8fd9a9188872eb90ab9acdefa..94a03d277a227733480b8a73f5535f9f3410be15 100644 (file)
@@ -1,46 +1,39 @@
+# coding: utf-8
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
 
 
 class KeekIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?keek\.com/(?:!|\w+/keeks/)(?P<id>\w+)'
+    _VALID_URL = r'https?://(?:www\.)?keek\.com/keek/(?P<id>\w+)'
     IE_NAME = 'keek'
     _TEST = {
-        'url': 'https://www.keek.com/ytdl/keeks/NODfbab',
-        'md5': '09c5c109067536c1cec8bac8c21fea05',
+        'url': 'https://www.keek.com/keek/NODfbab',
+        'md5': '9b0636f8c0f7614afa4ea5e4c6e57e83',
         'info_dict': {
             'id': 'NODfbab',
             'ext': 'mp4',
-            'uploader': 'youtube-dl project',
-            'uploader_id': 'ytdl',
-            'title': 'test chars: "\'/\\\u00e4<>This is a test video for youtube-dl.For more information, contact phihag@phihag.de .',
+            'title': 'md5:35d42050a3ece241d5ddd7fdcc6fd896',
+            'uploader': 'ytdl',
+            'uploader_id': 'eGT5bab',
         },
     }
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        video_url = 'http://cdn.keek.com/keek/video/%s' % video_id
-        thumbnail = 'http://cdn.keek.com/keek/thumbnail/%s/w100/h75' % video_id
         webpage = self._download_webpage(url, video_id)
 
-        raw_desc = self._html_search_meta('description', webpage)
-        if raw_desc:
-            uploader = self._html_search_regex(
-                r'Watch (.*?)\s+\(', raw_desc, 'uploader', fatal=False)
-            uploader_id = self._html_search_regex(
-                r'Watch .*?\(@(.+?)\)', raw_desc, 'uploader_id', fatal=False)
-        else:
-            uploader = None
-            uploader_id = None
-
         return {
             'id': video_id,
-            'url': video_url,
+            'url': self._og_search_video_url(webpage),
             'ext': 'mp4',
-            'title': self._og_search_title(webpage),
-            'thumbnail': thumbnail,
-            'uploader': uploader,
-            'uploader_id': uploader_id,
+            'title': self._og_search_description(webpage).strip(),
+            'thumbnail': self._og_search_thumbnail(webpage),
+            'uploader': self._search_regex(
+                r'data-username=(["\'])(?P<uploader>.+?)\1', webpage,
+                'uploader', fatal=False, group='uploader'),
+            'uploader_id': self._search_regex(
+                r'data-user-id=(["\'])(?P<uploader_id>.+?)\1', webpage,
+                'uploader id', fatal=False, group='uploader_id'),
         }
index 82eddec511850ade9b4786636027597baf75dd29..126ca13df1b8c30e9d94b204beb54eab03644fea 100644 (file)
@@ -1,12 +1,11 @@
 from __future__ import unicode_literals
 
-import os
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse_urlparse,
-    compat_urllib_request,
+from ..utils import (
+    sanitized_Request,
+    url_basename,
 )
 
 
@@ -14,19 +13,20 @@ class KeezMoviesIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?keezmovies\.com/video/.+?(?P<id>[0-9]+)(?:[/?&]|$)'
     _TEST = {
         'url': 'http://www.keezmovies.com/video/petite-asian-lady-mai-playing-in-bathtub-1214711',
-        'md5': '6e297b7e789329923fcf83abb67c9289',
+        'md5': '1c1e75d22ffa53320f45eeb07bc4cdc0',
         'info_dict': {
             'id': '1214711',
             'ext': 'mp4',
             'title': 'Petite Asian Lady Mai Playing In Bathtub',
             'age_limit': 18,
+            'thumbnail': 're:^https?://.*\.jpg$',
         }
     }
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        req = compat_urllib_request.Request(url)
+        req = sanitized_Request(url)
         req.add_header('Cookie', 'age_verified=1')
         webpage = self._download_webpage(req, video_id)
 
@@ -38,21 +38,29 @@ class KeezMoviesIE(InfoExtractor):
 
         video_title = self._html_search_regex(
             r'<h1 [^>]*>([^<]+)', webpage, 'title')
-        video_url = self._html_search_regex(
-            r'(?s)html5VideoPlayer = .*?src="([^"]+)"', webpage, 'video URL')
-        path = compat_urllib_parse_urlparse(video_url).path
-        extension = os.path.splitext(path)[1][1:]
-        format = path.split('/')[4].split('_')[:2]
-        format = "-".join(format)
+        flashvars = self._parse_json(self._search_regex(
+            r'var\s+flashvars\s*=\s*([^;]+);', webpage, 'flashvars'), video_id)
+
+        formats = []
+        for height in (180, 240, 480):
+            if flashvars.get('quality_%dp' % height):
+                video_url = flashvars['quality_%dp' % height]
+                a_format = {
+                    'url': video_url,
+                    'height': height,
+                    'format_id': '%dp' % height,
+                }
+                filename_parts = url_basename(video_url).split('_')
+                if len(filename_parts) >= 2 and re.match(r'\d+[Kk]', filename_parts[1]):
+                    a_format['tbr'] = int(filename_parts[1][:-1])
+                formats.append(a_format)
 
         age_limit = self._rta_search(webpage)
 
         return {
             'id': video_id,
             'title': video_title,
-            'url': video_url,
-            'ext': extension,
-            'format': format,
-            'format_id': format,
+            'formats': formats,
             'age_limit': age_limit,
+            'thumbnail': flashvars.get('image_url'),
         }
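
The tbr sniffing above reads the bitrate from the second underscore-separated chunk of the file name when it matches a pattern like '600K'; a standalone equivalent with an invented URL:

```python
import posixpath
import re

video_url = 'http://cdn.example.com/480P_600K_1214711.mp4'  # invented URL
parts = posixpath.basename(video_url).split('_')
tbr = int(parts[1][:-1]) if len(parts) >= 2 and re.match(r'\d+[Kk]', parts[1]) else None
print(tbr)  # 600
```
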
index 08a671fa86a007d3327ef03c257f1b943bd425db..61739efa7a4c3b84892083eab10237c23eb69e3d 100644 (file)
@@ -14,10 +14,10 @@ class KhanAcademyIE(InfoExtractor):
 
     _TESTS = [{
         'url': 'http://www.khanacademy.org/video/one-time-pad',
-        'md5': '7021db7f2d47d4fff89b13177cb1e8f4',
+        'md5': '7b391cce85e758fb94f763ddc1bbb979',
         'info_dict': {
             'id': 'one-time-pad',
-            'ext': 'mp4',
+            'ext': 'webm',
             'title': 'The one-time pad',
             'description': 'The perfect cipher',
             'duration': 176,
index 1d391e69ff7e0aba1b78ae5e32792b2dca839943..9f1ade2e46e8e2905adaa65eeaf2de22bfed8d2c 100644 (file)
@@ -2,12 +2,13 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
+from ..utils import smuggle_url
 
 
 class KickStarterIE(InfoExtractor):
     _VALID_URL = r'https?://www\.kickstarter\.com/projects/(?P<id>[^/]*)/.*'
     _TESTS = [{
-        'url': 'https://www.kickstarter.com/projects/1404461844/intersection-the-story-of-josh-grant?ref=home_location',
+        'url': 'https://www.kickstarter.com/projects/1404461844/intersection-the-story-of-josh-grant/description',
         'md5': 'c81addca81327ffa66c642b5d8b08cab',
         'info_dict': {
             'id': '1404461844',
@@ -27,7 +28,8 @@ class KickStarterIE(InfoExtractor):
             'uploader_id': 'pebble',
             'uploader': 'Pebble Technology',
             'title': 'Pebble iOS Notifications',
-        }
+        },
+        'add_ie': ['Vimeo'],
     }, {
         'url': 'https://www.kickstarter.com/projects/1420158244/power-drive-2000/widget/video.html',
         'info_dict': {
@@ -43,7 +45,7 @@ class KickStarterIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
 
         title = self._html_search_regex(
-            r'<title>\s*(.*?)(?:\s*&mdash; Kickstarter)?\s*</title>',
+            r'<title>\s*(.*?)(?:\s*&mdash;\s*Kickstarter)?\s*</title>',
             webpage, 'title')
         video_url = self._search_regex(
             r'data-video-url="(.*?)"',
@@ -52,7 +54,7 @@ class KickStarterIE(InfoExtractor):
             return {
                 '_type': 'url_transparent',
                 'ie_key': 'Generic',
-                'url': url,
+                'url': smuggle_url(url, {'to_generic': True}),
                 'title': title,
             }
 
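
smuggle_url (with its unsmuggle_url counterpart used by the Kaltura changes earlier) piggybacks JSON onto the URL fragment; a simplified re-implementation of the idea — the real helper in youtube_dl.utils urlencodes the payload slightly differently:

```python
import json

try:
    from urllib.parse import quote, unquote  # Python 3
except ImportError:
    from urllib import quote, unquote  # Python 2


def smuggle_url(url, data):
    return url + '#__youtubedl_smuggle=' + quote(json.dumps(data))


def unsmuggle_url(smug_url, default=None):
    if '#__youtubedl_smuggle=' not in smug_url:
        return smug_url, default
    url, _, payload = smug_url.partition('#__youtubedl_smuggle=')
    return url, json.loads(unquote(payload))


print(unsmuggle_url(smuggle_url('http://example.com/v', {'to_generic': True})))
```
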
diff --git a/youtube_dl/extractor/konserthusetplay.py b/youtube_dl/extractor/konserthusetplay.py
new file mode 100644 (file)
index 0000000..55291c6
--- /dev/null
@@ -0,0 +1,107 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    float_or_none,
+    int_or_none,
+)
+
+
+class KonserthusetPlayIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?konserthusetplay\.se/\?.*\bm=(?P<id>[^&]+)'
+    _TEST = {
+        'url': 'http://www.konserthusetplay.se/?m=CKDDnlCY-dhWAAqiMERd-A',
+        'info_dict': {
+            'id': 'CKDDnlCY-dhWAAqiMERd-A',
+            'ext': 'flv',
+            'title': 'Orkesterns instrument: Valthornen',
+            'description': 'md5:f10e1f0030202020396a4d712d2fa827',
+            'thumbnail': 're:^https?://.*$',
+            'duration': 398.8,
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        e = self._search_regex(
+            r'https?://csp\.picsearch\.com/rest\?.*\be=(.+?)[&"\']', webpage, 'e')
+
+        rest = self._download_json(
+            'http://csp.picsearch.com/rest?e=%s&containerId=mediaplayer&i=object' % e,
+            video_id, transform_source=lambda s: s[s.index('{'):s.rindex('}') + 1])
+
+        media = rest['media']
+        player_config = media['playerconfig']
+        playlist = player_config['playlist']
+
+        source = next(f for f in playlist if f.get('bitrates'))
+
+        FORMAT_ID_REGEX = r'_([^_]+)_h264m\.mp4'
+
+        formats = []
+
+        fallback_url = source.get('fallbackUrl')
+        fallback_format_id = None
+        if fallback_url:
+            fallback_format_id = self._search_regex(
+                FORMAT_ID_REGEX, fallback_url, 'format id', default=None)
+
+        connection_url = (player_config.get('rtmp', {}).get(
+            'netConnectionUrl') or player_config.get(
+            'plugins', {}).get('bwcheck', {}).get('netConnectionUrl'))
+        if connection_url:
+            for f in source['bitrates']:
+                video_url = f.get('url')
+                if not video_url:
+                    continue
+                format_id = self._search_regex(
+                    FORMAT_ID_REGEX, video_url, 'format id', default=None)
+                f_common = {
+                    'vbr': int_or_none(f.get('bitrate')),
+                    'width': int_or_none(f.get('width')),
+                    'height': int_or_none(f.get('height')),
+                }
+                f = f_common.copy()
+                f.update({
+                    'url': connection_url,
+                    'play_path': video_url,
+                    'format_id': 'rtmp-%s' % format_id if format_id else 'rtmp',
+                    'ext': 'flv',
+                })
+                formats.append(f)
+                if format_id and format_id == fallback_format_id:
+                    f = f_common.copy()
+                    f.update({
+                        'url': fallback_url,
+                        'format_id': 'http-%s' % format_id if format_id else 'http',
+                    })
+                    formats.append(f)
+
+        if not formats and fallback_url:
+            formats.append({
+                'url': fallback_url,
+            })
+
+        self._sort_formats(formats)
+
+        title = player_config.get('title') or media['title']
+        description = player_config.get('mediaInfo', {}).get('description')
+        thumbnail = media.get('image')
+        duration = float_or_none(media.get('duration'), 1000)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'formats': formats,
+        }
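
The transform_source lambda above trims the REST response to the bare JSON object between the first '{' and the last '}'; standalone, with an invented wrapper:

```python
import json

raw = 'mediaplayer({"media": {"title": "Orkesterns instrument"}});'  # invented response
json_text = raw[raw.index('{'):raw.rindex('}') + 1]
print(json.loads(json_text))
```
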
index 720bc939bfd4c3a30c9a3709968c6008e6472067..704bd7b34554af60dfec9b811251f5270cbd1f55 100644 (file)
@@ -4,13 +4,16 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..utils import int_or_none
+from ..utils import (
+    int_or_none,
+    parse_duration,
+)
 
 
 class KontrTubeIE(InfoExtractor):
     IE_NAME = 'kontrtube'
     IE_DESC = 'KontrTube.ru - Труба зовёт'
-    _VALID_URL = r'http://(?:www\.)?kontrtube\.ru/videos/(?P<id>\d+)/(?P<display_id>[^/]+)/'
+    _VALID_URL = r'https?://(?:www\.)?kontrtube\.ru/videos/(?P<id>\d+)/(?P<display_id>[^/]+)/'
 
     _TEST = {
         'url': 'http://www.kontrtube.ru/videos/2678/nad-olimpiyskoy-derevney-v-sochi-podnyat-rossiyskiy-flag/',
@@ -34,33 +37,28 @@ class KontrTubeIE(InfoExtractor):
         webpage = self._download_webpage(
             url, display_id, 'Downloading page')
 
-        video_url = self._html_search_regex(
+        video_url = self._search_regex(
             r"video_url\s*:\s*'(.+?)/?',", webpage, 'video URL')
-        thumbnail = self._html_search_regex(
-            r"preview_url\s*:\s*'(.+?)/?',", webpage, 'video thumbnail', fatal=False)
+        thumbnail = self._search_regex(
+            r"preview_url\s*:\s*'(.+?)/?',", webpage, 'thumbnail', fatal=False)
         title = self._html_search_regex(
-            r'<title>(.+?)</title>', webpage, 'video title')
+            r'(?s)<h2>(.+?)</h2>', webpage, 'title')
         description = self._html_search_meta(
-            'description', webpage, 'video description')
+            'description', webpage, 'description')
 
-        mobj = re.search(
-            r'<div class="col_2">Длительность: <span>(?P<minutes>\d+)м:(?P<seconds>\d+)с</span></div>',
-            webpage)
-        duration = int(mobj.group('minutes')) * 60 + int(mobj.group('seconds')) if mobj else None
+        duration = self._search_regex(
+            r'Длительность: <em>([^<]+)</em>', webpage, 'duration', fatal=False)
+        if duration:
+            duration = parse_duration(duration.replace('мин', 'min').replace('сек', 'sec'))
 
-        view_count = self._html_search_regex(
-            r'<div class="col_2">Просмотров: <span>(\d+)</span></div>',
+        view_count = self._search_regex(
+            r'Просмотров: <em>([^<]+)</em>',
             webpage, 'view count', fatal=False)
+        if view_count:
+            view_count = int_or_none(view_count.replace(' ', ''))
 
-        comment_count = None
-        comment_str = self._html_search_regex(
-            r'Комментарии: <span>([^<]+)</span>', webpage, 'comment count', fatal=False)
-        if comment_str.startswith('комментариев нет'):
-            comment_count = 0
-        else:
-            mobj = re.search(r'\d+ из (?P<total>\d+) комментариев', comment_str)
-            if mobj:
-                comment_count = mobj.group('total')
+        comment_count = int_or_none(self._search_regex(
+            r'Комментарии \((\d+)\)<', webpage, 'comment count', fatal=False))
 
         return {
             'id': video_id,
index 96f95979a22429d2a19af3575ad1ca25c463b13e..0ae8ebd687034343c364dbc968d90d84f5bc37df 100644 (file)
@@ -25,6 +25,9 @@ class KrasViewIE(InfoExtractor):
             'duration': 27,
             'thumbnail': 're:^https?://.*\.jpg',
         },
+        'params': {
+            'skip_download': 'Not accessible from Travis CI server',
+        },
     }
 
     def _real_extract(self, url):
index a602980a141f3f8ccce026eaddc8b383e7894352..a574408e55b6a5ee251d94ca1d0346b9e34ac0b8 100644 (file)
@@ -4,7 +4,7 @@ from .common import InfoExtractor
 
 
 class Ku6IE(InfoExtractor):
-    _VALID_URL = r'http://v\.ku6\.com/show/(?P<id>[a-zA-Z0-9\-\_]+)(?:\.)*html'
+    _VALID_URL = r'https?://v\.ku6\.com/show/(?P<id>[a-zA-Z0-9\-\_]+)(?:\.)*html'
     _TEST = {
         'url': 'http://v.ku6.com/show/JG-8yS14xzBr4bCn1pu0xw...html',
         'md5': '01203549b9efbb45f4b87d55bdea1ed1',
diff --git a/youtube_dl/extractor/kusi.py b/youtube_dl/extractor/kusi.py
new file mode 100644 (file)
index 0000000..12cc56e
--- /dev/null
@@ -0,0 +1,99 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import random
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_urllib_parse_unquote_plus
+from ..utils import (
+    int_or_none,
+    float_or_none,
+    timeconvert,
+    update_url_query,
+    xpath_text,
+)
+
+
+class KUSIIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?kusi\.com/(?P<path>story/.+|video\?clipId=(?P<clipId>\d+))'
+    _TESTS = [{
+        'url': 'http://www.kusi.com/story/31183873/turko-files-case-closed-put-on-hold',
+        'md5': 'f926e7684294cf8cb7bdf8858e1b3988',
+        'info_dict': {
+            'id': '12203019',
+            'ext': 'mp4',
+            'title': 'Turko Files: Case Closed! & Put On Hold!',
+            'duration': 231.0,
+            'upload_date': '20160210',
+            'timestamp': 1455087571,
+            'thumbnail': 're:^https?://.*\.jpg$'
+        },
+    }, {
+        'url': 'http://kusi.com/video?clipId=12203019',
+        'info_dict': {
+            'id': '12203019',
+            'ext': 'mp4',
+            'title': 'Turko Files: Case Closed! & Put On Hold!',
+            'duration': 231.0,
+            'upload_date': '20160210',
+            'timestamp': 1455087571,
+            'thumbnail': 're:^https?://.*\.jpg$'
+        },
+        'params': {
+            'skip_download': True,  # Same as previous one
+        },
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        clip_id = mobj.group('clipId')
+        video_id = clip_id or mobj.group('path')
+
+        webpage = self._download_webpage(url, video_id)
+
+        if clip_id is None:
+            video_id = clip_id = self._html_search_regex(
+                r'"clipId"\s*,\s*"(\d+)"', webpage, 'clip id')
+
+        affiliate_id = self._search_regex(
+            r'affiliateId\s*:\s*\'([^\']+)\'', webpage, 'affiliate id')
+
+        # See __Packages/worldnow/model/GalleryModel.as of WNGallery.swf
+        xml_url = update_url_query('http://www.kusi.com/build.asp', {
+            'buildtype': 'buildfeaturexmlrequest',
+            'featureType': 'Clip',
+            'featureid': clip_id,
+            'affiliateno': affiliate_id,
+            'clientgroupid': '1',
+            'rnd': int(round(random.random() * 1000000)),
+        })
+
+        doc = self._download_xml(xml_url, video_id)
+
+        video_title = xpath_text(doc, 'HEADLINE', fatal=True)
+        duration = float_or_none(xpath_text(doc, 'DURATION'), scale=1000)
+        description = xpath_text(doc, 'ABSTRACT')
+        thumbnail = xpath_text(doc, './THUMBNAILIMAGE/FILENAME')
+        creation_time = timeconvert(xpath_text(doc, 'rfc822creationdate'))
+
+        quality_options = doc.find('{http://search.yahoo.com/mrss/}group').findall('{http://search.yahoo.com/mrss/}content')
+        formats = []
+        for quality in quality_options:
+            formats.append({
+                'url': compat_urllib_parse_unquote_plus(quality.attrib['url']),
+                'height': int_or_none(quality.attrib.get('height')),
+                'width': int_or_none(quality.attrib.get('width')),
+                'vbr': float_or_none(quality.attrib.get('bitratebits'), scale=1000),
+            })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': video_title,
+            'description': description,
+            'duration': duration,
+            'formats': formats,
+            'thumbnail': thumbnail,
+            'timestamp': creation_time,
+        }
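
A hedged sketch (not from the patch) of the build.asp request assembled above; update_url_query is approximated with the standard library, and the affiliate id is hypothetical.

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

params = {
    'buildtype': 'buildfeaturexmlrequest',
    'featureType': 'Clip',
    'featureid': '12203019',  # clip id from the test case above
    'affiliateno': '1234',    # hypothetical affiliate id
    'clientgroupid': '1',
    'rnd': 481516,            # arbitrary cache-buster
}
# Sorted only to make the demo output deterministic
print('http://www.kusi.com/build.asp?' + urlencode(sorted(params.items())))
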
index 1077846f296150c8503dbe091a0a096df4620719..3740869c74cdc9ae878f5012c398a9bde7e7356b 100644 (file)
@@ -2,13 +2,13 @@
 from __future__ import unicode_literals
 
 import re
-import itertools
 
 from .common import InfoExtractor
 from ..utils import (
     get_element_by_id,
     clean_html,
     ExtractorError,
+    InAdvancePagedList,
     remove_start,
 )
 
@@ -23,14 +23,31 @@ class KuwoBaseIE(InfoExtractor):
         {'format': 'aac', 'ext': 'aac', 'abr': 48, 'preference': 10}
     ]
 
-    def _get_formats(self, song_id):
+    def _get_formats(self, song_id, tolerate_ip_deny=False):
         formats = []
         for file_format in self._FORMATS:
+            headers = {}
+            cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
+            if cn_verification_proxy:
+                headers['Ytdl-request-proxy'] = cn_verification_proxy
+
+            query = {
+                'format': file_format['ext'],
+                'br': file_format.get('br', ''),
+                'rid': 'MUSIC_%s' % song_id,
+                'type': 'convert_url',
+                'response': 'url'
+            }
+
             song_url = self._download_webpage(
-                'http://antiserver.kuwo.cn/anti.s?format=%s&br=%s&rid=MUSIC_%s&type=convert_url&response=url' %
-                (file_format['ext'], file_format.get('br', ''), song_id),
+                'http://antiserver.kuwo.cn/anti.s',
                 song_id, note='Download %s url info' % file_format['format'],
+                query=query, headers=headers,
             )
+
+            if song_url == 'IPDeny' and not tolerate_ip_deny:
+                raise ExtractorError('This song is blocked in this region', expected=True)
+
             if song_url.startswith('http://') or song_url.startswith('https://'):
                 formats.append({
                     'url': song_url,
@@ -39,14 +56,14 @@ class KuwoBaseIE(InfoExtractor):
                     'preference': file_format['preference'],
                     'abr': file_format.get('abr'),
                 })
-        self._sort_formats(formats)
+
         return formats
 
 
 class KuwoIE(KuwoBaseIE):
     IE_NAME = 'kuwo:song'
     IE_DESC = '酷我音乐'
-    _VALID_URL = r'http://www\.kuwo\.cn/yinyue/(?P<id>\d+?)/'
+    _VALID_URL = r'https?://www\.kuwo\.cn/yinyue/(?P<id>\d+)'
     _TESTS = [{
         'url': 'http://www.kuwo.cn/yinyue/635632/',
         'info_dict': {
@@ -57,18 +74,23 @@ class KuwoIE(KuwoBaseIE):
             'upload_date': '20080122',
             'description': 'md5:ed13f58e3c3bf3f7fd9fbc4e5a7aa75c'
         },
+        'skip': 'This song has been taken offline due to copyright issues',
     }, {
         'url': 'http://www.kuwo.cn/yinyue/6446136/',
         'info_dict': {
             'id': '6446136',
             'ext': 'mp3',
             'title': '心',
+            'description': 'md5:5d0e947b242c35dc0eb1d2fce9fbf02c',
             'creator': 'IU',
             'upload_date': '20150518',
         },
         'params': {
             'format': 'mp3-320'
         },
+    }, {
+        'url': 'http://www.kuwo.cn/yinyue/3197154?catalog=yueku2016',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
@@ -76,20 +98,23 @@ class KuwoIE(KuwoBaseIE):
         webpage = self._download_webpage(
             url, song_id, note='Download song detail info',
             errnote='Unable to get song detail info')
+        if '对不起,该歌曲由于版权问题已被下线,将返回网站首页' in webpage:
+            raise ExtractorError('This song has been taken offline due to copyright issues', expected=True)
 
         song_name = self._html_search_regex(
-            r'<h1[^>]+title="([^"]+)">', webpage, 'song name')
-        singer_name = self._html_search_regex(
-            r'<div[^>]+class="s_img">\s*<a[^>]+title="([^>]+)"',
-            webpage, 'singer name', fatal=False)
+            r'<p[^>]+id="lrcName">([^<]+)</p>', webpage, 'song name')
+        singer_name = remove_start(self._html_search_regex(
+            r'<a[^>]+href="http://www\.kuwo\.cn/artist/content\?name=([^"]+)">',
+            webpage, 'singer name', fatal=False), '歌手')
         lrc_content = clean_html(get_element_by_id('lrcContent', webpage))
         if lrc_content == '暂无':     # indicates no lyrics
             lrc_content = None
 
         formats = self._get_formats(song_id)
+        self._sort_formats(formats)
 
         album_id = self._html_search_regex(
-            r'<p[^>]+class="album"[^<]+<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"',
+            r'<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"',
             webpage, 'album id', fatal=False)
 
         publish_time = None
@@ -118,7 +143,7 @@ class KuwoIE(KuwoBaseIE):
 class KuwoAlbumIE(InfoExtractor):
     IE_NAME = 'kuwo:album'
     IE_DESC = '酷我音乐 - 专辑'
-    _VALID_URL = r'http://www\.kuwo\.cn/album/(?P<id>\d+?)/'
+    _VALID_URL = r'https?://www\.kuwo\.cn/album/(?P<id>\d+?)/'
     _TEST = {
         'url': 'http://www.kuwo.cn/album/502294/',
         'info_dict': {
@@ -154,13 +179,11 @@ class KuwoAlbumIE(InfoExtractor):
 class KuwoChartIE(InfoExtractor):
     IE_NAME = 'kuwo:chart'
     IE_DESC = '酷我音乐 - 排行榜'
-    _VALID_URL = r'http://yinyue\.kuwo\.cn/billboard_(?P<id>[^.]+).htm'
+    _VALID_URL = r'https?://yinyue\.kuwo\.cn/billboard_(?P<id>[^.]+).htm'
     _TEST = {
         'url': 'http://yinyue.kuwo.cn/billboard_香港中文龙虎榜.htm',
         'info_dict': {
             'id': '香港中文龙虎榜',
-            'title': '香港中文龙虎榜',
-            'description': 're:\d{4}第\d{2}期',
         },
         'playlist_mincount': 10,
     }
@@ -171,30 +194,24 @@ class KuwoChartIE(InfoExtractor):
             url, chart_id, note='Download chart info',
             errnote='Unable to get chart info')
 
-        chart_name = self._html_search_regex(
-            r'<h1[^>]+class="unDis">([^<]+)</h1>', webpage, 'chart name')
-
-        chart_desc = self._html_search_regex(
-            r'<p[^>]+class="tabDef">(\d{4}第\d{2}期)</p>', webpage, 'chart desc')
-
         entries = [
             self.url_result(song_url, 'Kuwo') for song_url in re.findall(
-                r'<a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+)/"', webpage)
+                r'<a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+)', webpage)
         ]
-        return self.playlist_result(entries, chart_id, chart_name, chart_desc)
+        return self.playlist_result(entries, chart_id)
 
 
 class KuwoSingerIE(InfoExtractor):
     IE_NAME = 'kuwo:singer'
     IE_DESC = '酷我音乐 - 歌手'
-    _VALID_URL = r'http://www\.kuwo\.cn/mingxing/(?P<id>[^/]+)'
+    _VALID_URL = r'https?://www\.kuwo\.cn/mingxing/(?P<id>[^/]+)'
     _TESTS = [{
         'url': 'http://www.kuwo.cn/mingxing/bruno+mars/',
         'info_dict': {
             'id': 'bruno+mars',
             'title': 'Bruno Mars',
         },
-        'playlist_count': 10,
+        'playlist_mincount': 329,
     }, {
         'url': 'http://www.kuwo.cn/mingxing/Ali/music.htm',
         'info_dict': {
@@ -202,8 +219,11 @@ class KuwoSingerIE(InfoExtractor):
             'title': 'Ali',
         },
         'playlist_mincount': 95,
+        'skip': 'Regularly stalls travis build',  # See https://travis-ci.org/rg3/youtube-dl/jobs/78878540
     }]
 
+    PAGE_SIZE = 15
+
     def _real_extract(self, url):
         singer_id = self._match_id(url)
         webpage = self._download_webpage(
@@ -211,25 +231,28 @@ class KuwoSingerIE(InfoExtractor):
             errnote='Unable to get singer info')
 
         singer_name = self._html_search_regex(
-            r'<div class="title clearfix">\s*<h1>([^<]+)<span', webpage, 'singer name'
-        )
+            r'<h1>([^<]+)</h1>', webpage, 'singer name')
+
+        artist_id = self._html_search_regex(
+            r'data-artistid="(\d+)"', webpage, 'artist id')
+
+        page_count = int(self._html_search_regex(
+            r'data-page="(\d+)"', webpage, 'page count'))
 
-        entries = []
-        first_page_only = False if re.search(r'/music(?:_\d+)?\.htm', url) else True
-        for page_num in itertools.count(1):
+        def page_func(page_num):
             webpage = self._download_webpage(
-                'http://www.kuwo.cn/mingxing/%s/music_%d.htm' % (singer_id, page_num),
-                singer_id, note='Download song list page #%d' % page_num,
-                errnote='Unable to get song list page #%d' % page_num)
+                'http://www.kuwo.cn/artist/contentMusicsAjax',
+                singer_id, note='Download song list page #%d' % (page_num + 1),
+                errnote='Unable to get song list page #%d' % (page_num + 1),
+                query={'artistId': artist_id, 'pn': page_num, 'rn': self.PAGE_SIZE})
 
-            entries.extend([
+            return [
                 self.url_result(song_url, 'Kuwo') for song_url in re.findall(
-                    r'<p[^>]+class="m_name"><a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+)/',
+                    r'<div[^>]+class="name"><a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+)',
                     webpage)
-            ][:10 if first_page_only else None])
+            ]
 
-            if first_page_only or not re.search(r'<a[^>]+href="[^"]+">下一页</a>', webpage):
-                break
+        entries = InAdvancePagedList(page_func, page_count, self.PAGE_SIZE)
 
         return self.playlist_result(entries, singer_id, singer_name)
 
@@ -237,7 +260,7 @@ class KuwoSingerIE(InfoExtractor):
 class KuwoCategoryIE(InfoExtractor):
     IE_NAME = 'kuwo:category'
     IE_DESC = '酷我音乐 - 分类'
-    _VALID_URL = r'http://yinyue\.kuwo\.cn/yy/cinfo_(?P<id>\d+?).htm'
+    _VALID_URL = r'https?://yinyue\.kuwo\.cn/yy/cinfo_(?P<id>\d+?).htm'
     _TEST = {
         'url': 'http://yinyue.kuwo.cn/yy/cinfo_86375.htm',
         'info_dict': {
@@ -245,7 +268,7 @@ class KuwoCategoryIE(InfoExtractor):
             'title': '八十年代精选',
             'description': '这些都是属于八十年代的回忆!',
         },
-        'playlist_count': 30,
+        'playlist_mincount': 24,
     }
 
     def _real_extract(self, url):
@@ -274,15 +297,21 @@ class KuwoCategoryIE(InfoExtractor):
 class KuwoMvIE(KuwoBaseIE):
     IE_NAME = 'kuwo:mv'
     IE_DESC = '酷我音乐 - MV'
-    _VALID_URL = r'http://www\.kuwo\.cn/mv/(?P<id>\d+?)/'
+    _VALID_URL = r'https?://www\.kuwo\.cn/mv/(?P<id>\d+?)/'
     _TEST = {
         'url': 'http://www.kuwo.cn/mv/6480076/',
         'info_dict': {
             'id': '6480076',
-            'ext': 'mkv',
-            'title': '我们家MV',
+            'ext': 'mp4',
+            'title': 'My HouseMV',
             'creator': '2PM',
         },
+        # In this video, music URLs (anti.s) are blocked outside China and
+        # USA, while the MV URL (mvurl) is available globally, so force the MV
+        # URL for consistent results in different countries
+        'params': {
+            'format': 'mv',
+        },
     }
     _FORMATS = KuwoBaseIE._FORMATS + [
         {'format': 'mkv', 'ext': 'mkv', 'preference': 250},
@@ -304,7 +333,17 @@ class KuwoMvIE(KuwoBaseIE):
         else:
             raise ExtractorError('Unable to find song or singer names')
 
-        formats = self._get_formats(song_id)
+        formats = self._get_formats(song_id, tolerate_ip_deny=True)
+
+        mv_url = self._download_webpage(
+            'http://www.kuwo.cn/yy/st/mvurl?rid=MUSIC_%s' % song_id,
+            song_id, note='Download %s MV URL' % song_id)
+        formats.append({
+            'url': mv_url,
+            'format_id': 'mv',
+        })
+
+        self._sort_formats(formats)
 
         return {
             'id': song_id,
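
A sketch, not part of the commit, of how _get_formats above assembles one anti.s query and reacts to the 'IPDeny' sentinel; the HTTP call is stubbed out, the song id comes from the tests, and the format entry is a hypothetical sample modeled on _FORMATS.

try:
    from urllib.parse import urlencode
except ImportError:
    from urllib import urlencode

song_id = '6446136'  # sample id from the tests above
file_format = {'format': 'mp3-320', 'ext': 'mp3', 'br': '320kmp3'}  # hypothetical entry
query = {
    'format': file_format['ext'],
    'br': file_format.get('br', ''),
    'rid': 'MUSIC_%s' % song_id,
    'type': 'convert_url',
    'response': 'url',
}
request_url = 'http://antiserver.kuwo.cn/anti.s?' + urlencode(sorted(query.items()))

song_url = 'IPDeny'  # stubbed response for a geo-blocked request
if song_url == 'IPDeny':
    print('This song is blocked in this region')
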
index b459559b0349bcae6c8d658c2fb120c2cb81d37e..2fab38079aac0c5f20a1772d52fa52642cb520bf 100644 (file)
-# -*- coding: utf-8 -*-
+# coding: utf-8
 from __future__ import unicode_literals
 
-import random
 import re
 
 from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_parse_urlencode,
+    compat_urlparse,
+)
 from ..utils import (
     ExtractorError,
+    sanitized_Request,
+    unified_strdate,
+    urlencode_postdata,
+    xpath_element,
     xpath_text,
 )
 
 
 class Laola1TvIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?laola1\.tv/(?P<lang>[a-z]+)-(?P<portal>[a-z]+)/.*?/(?P<id>[0-9]+)\.html'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?laola1\.tv/(?P<lang>[a-z]+)-(?P<portal>[a-z]+)/(?P<kind>[^/]+)/(?P<slug>[^/?#&]+)'
+    _TESTS = [{
         'url': 'http://www.laola1.tv/de-de/video/straubing-tigers-koelner-haie/227883.html',
         'info_dict': {
             'id': '227883',
-            'ext': 'mp4',
+            'display_id': 'straubing-tigers-koelner-haie',
+            'ext': 'flv',
             'title': 'Straubing Tigers - Kölner Haie',
+            'upload_date': '20140912',
+            'is_live': False,
             'categories': ['Eishockey'],
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.laola1.tv/de-de/video/straubing-tigers-koelner-haie',
+        'info_dict': {
+            'id': '464602',
+            'display_id': 'straubing-tigers-koelner-haie',
+            'ext': 'flv',
+            'title': 'Straubing Tigers - Kölner Haie',
+            'upload_date': '20160129',
             'is_live': False,
+            'categories': ['Eishockey'],
         },
         'params': {
             'skip_download': True,
-        }
-    }
+        },
+    }, {
+        'url': 'http://www.laola1.tv/de-de/livestream/2016-03-22-belogorie-belgorod-trentino-diatec-lde',
+        'info_dict': {
+            'id': '487850',
+            'display_id': '2016-03-22-belogorie-belgorod-trentino-diatec-lde',
+            'ext': 'flv',
+            'title': 'Belogorie BELGOROD - TRENTINO Diatec',
+            'upload_date': '20160322',
+            'uploader': 'CEV - Europäischer Volleyball Verband',
+            'is_live': True,
+            'categories': ['Volleyball'],
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'skip': 'This live stream has already finished.',
+    }]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        display_id = mobj.group('slug')
+        kind = mobj.group('kind')
         lang = mobj.group('lang')
         portal = mobj.group('portal')
 
-        webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage(url, display_id)
+
+        if 'Dieser Livestream ist bereits beendet.' in webpage:
+            raise ExtractorError('This live stream has already finished.', expected=True)
+
         iframe_url = self._search_regex(
-            r'<iframe[^>]*?class="main_tv_player"[^>]*?src="([^"]+)"',
-            webpage, 'iframe URL')
+            r'<iframe[^>]*?id="videoplayer"[^>]*?src="([^"]+)"',
+            webpage, 'iframe url')
 
-        iframe = self._download_webpage(
-            iframe_url, video_id, note='Downloading iframe')
-        flashvars_m = re.findall(
-            r'flashvars\.([_a-zA-Z0-9]+)\s*=\s*"([^"]*)";', iframe)
-        flashvars = dict((m[0], m[1]) for m in flashvars_m)
+        video_id = self._search_regex(
+            r'videoid=(\d+)', iframe_url, 'video id')
+
+        iframe = self._download_webpage(compat_urlparse.urljoin(
+            url, iframe_url), display_id, 'Downloading iframe')
 
         partner_id = self._search_regex(
-            r'partnerid\s*:\s*"([^"]+)"', iframe, 'partner id')
-
-        xml_url = ('http://www.laola1.tv/server/hd_video.php?' +
-                   'play=%s&partner=%s&portal=%s&v5ident=&lang=%s' % (
-                       video_id, partner_id, portal, lang))
-        hd_doc = self._download_xml(xml_url, video_id)
-
-        title = xpath_text(hd_doc, './/video/title', fatal=True)
-        flash_url = xpath_text(hd_doc, './/video/url', fatal=True)
-        uploader = xpath_text(hd_doc, './/video/meta_organistation')
-        is_live = xpath_text(hd_doc, './/video/islive') == 'true'
-
-        categories = xpath_text(hd_doc, './/video/meta_sports')
-        if categories:
-            categories = categories.split(',')
-
-        ident = random.randint(10000000, 99999999)
-        token_url = '%s&ident=%s&klub=0&unikey=0&timestamp=%s&auth=%s' % (
-            flash_url, ident, flashvars['timestamp'], flashvars['auth'])
-
-        token_doc = self._download_xml(
-            token_url, video_id, note='Downloading token')
-        token_attrib = token_doc.find('.//token').attrib
-        if token_attrib.get('auth') in ('blocked', 'restricted'):
+            r'partnerid\s*:\s*(["\'])(?P<partner_id>.+?)\1',
+            iframe, 'partner id', group='partner_id')
+
+        hd_doc = self._download_xml(
+            'http://www.laola1.tv/server/hd_video.php?%s'
+            % compat_urllib_parse_urlencode({
+                'play': video_id,
+                'partner': partner_id,
+                'portal': portal,
+                'lang': lang,
+                'v5ident': '',
+            }), display_id)
+
+        _v = lambda x, **k: xpath_text(hd_doc, './/video/' + x, **k)
+        title = _v('title', fatal=True)
+
+        VS_TARGETS = {
+            'video': '2',
+            'livestream': '17',
+        }
+
+        req = sanitized_Request(
+            'https://club.laola1.tv/sp/laola1/api/v3/user/session/premium/player/stream-access?%s' %
+            compat_urllib_parse_urlencode({
+                'videoId': video_id,
+                'target': VS_TARGETS.get(kind, '2'),
+                'label': _v('label'),
+                'area': _v('area'),
+            }),
+            urlencode_postdata(
+                dict((i, v) for i, v in enumerate(_v('req_liga_abos').split(',')))))
+
+        token_url = self._download_json(req, display_id)['data']['stream-access'][0]
+        token_doc = self._download_xml(token_url, display_id, 'Downloading token')
+
+        token_attrib = xpath_element(token_doc, './/token').attrib
+        token_auth = token_attrib['auth']
+
+        if token_auth in ('blocked', 'restricted', 'error'):
             raise ExtractorError(
-                'Token error: %s' % token_attrib.get('comment'), expected=True)
+                'Token error: %s' % token_attrib['comment'], expected=True)
+
+        formats = self._extract_f4m_formats(
+            '%s?hdnea=%s&hdcore=3.2.0' % (token_attrib['url'], token_auth),
+            video_id, f4m_id='hds')
+        self._sort_formats(formats)
 
-        video_url = '%s?hdnea=%s&hdcore=3.2.0' % (
-            token_attrib['url'], token_attrib['auth'])
+        categories_str = _v('meta_sports')
+        categories = categories_str.split(',') if categories_str else []
 
         return {
             'id': video_id,
-            'is_live': is_live,
+            'display_id': display_id,
             'title': title,
-            'url': video_url,
-            'uploader': uploader,
+            'upload_date': unified_strdate(_v('time_date')),
+            'uploader': _v('meta_organisation'),
             'categories': categories,
-            'ext': 'mp4',
+            'is_live': _v('islive') == 'true',
+            'formats': formats,
         }
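
Not from the patch: a minimal sketch of the stream-access POST body built above, where each comma-separated subscription id from req_liga_abos becomes a numbered form field.

try:
    from urllib.parse import urlencode
except ImportError:
    from urllib import urlencode

req_liga_abos = '12,34,56'  # hypothetical value from the hd_video.php XML
post_data = dict((i, v) for i, v in enumerate(req_liga_abos.split(',')))
print(urlencode(sorted(post_data.items())))  # 0=12&1=34&2=56
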
index 40a3d23468636877cc485ac9e064ee3527a3dcb2..81b5d41be4a676e55c795fe233591913e7a691c8 100644 (file)
@@ -6,6 +6,7 @@ import re
 from .common import InfoExtractor
 from ..utils import (
     determine_ext,
+    determine_protocol,
     parse_duration,
     int_or_none,
 )
@@ -18,10 +19,14 @@ class Lecture2GoIE(InfoExtractor):
         'md5': 'ac02b570883020d208d405d5a3fd2f7f',
         'info_dict': {
             'id': '17473',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': '2 - Endliche Automaten und reguläre Sprachen',
             'creator': 'Frank Heitmann',
             'duration': 5220,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
         }
     }
 
@@ -32,14 +37,18 @@ class Lecture2GoIE(InfoExtractor):
         title = self._html_search_regex(r'<em[^>]+class="title">(.+)</em>', webpage, 'title')
 
         formats = []
-        for url in set(re.findall(r'"src","([^"]+)"', webpage)):
+        for url in set(re.findall(r'var\s+playerUri\d+\s*=\s*"([^"]+)"', webpage)):
             ext = determine_ext(url)
+            protocol = determine_protocol({'url': url})
             if ext == 'f4m':
-                formats.extend(self._extract_f4m_formats(url, video_id))
+                formats.extend(self._extract_f4m_formats(url, video_id, f4m_id='hds'))
             elif ext == 'm3u8':
-                formats.extend(self._extract_m3u8_formats(url, video_id))
+                formats.extend(self._extract_m3u8_formats(url, video_id, ext='mp4', m3u8_id='hls'))
             else:
+                if protocol == 'rtmp':
+                    continue  # XXX: currently broken
                 formats.append({
+                    'format_id': protocol,
                     'url': url,
                 })
 
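
A sketch of the protocol dispatch above, with guess_protocol as a reduced stand-in for youtube_dl's determine_protocol: rtmp sources are skipped (currently broken upstream, per the XXX comment), everything else becomes a direct format.

def guess_protocol(url):
    # Reduced stand-in: the real helper inspects more than the URL scheme.
    if url.startswith('rtmp'):
        return 'rtmp'
    return 'http'

for url in ('rtmp://example.com/app/stream', 'https://example.com/v.mp4'):
    if guess_protocol(url) == 'rtmp':
        continue  # skipped, as in the hunk above
    print({'format_id': guess_protocol(url), 'url': url})
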
diff --git a/youtube_dl/extractor/leeco.py b/youtube_dl/extractor/leeco.py
new file mode 100644 (file)
index 0000000..375fdae
--- /dev/null
@@ -0,0 +1,361 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import base64
+import datetime
+import hashlib
+import re
+import time
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_ord,
+    compat_str,
+    compat_urllib_parse_urlencode,
+)
+from ..utils import (
+    determine_ext,
+    encode_data_uri,
+    ExtractorError,
+    int_or_none,
+    orderedSet,
+    parse_iso8601,
+    sanitized_Request,
+    str_or_none,
+    url_basename,
+)
+
+
+class LeIE(InfoExtractor):
+    IE_DESC = '乐视网'
+    _VALID_URL = r'https?://www\.le\.com/ptv/vplay/(?P<id>\d+)\.html'
+
+    _URL_TEMPLATE = 'http://www.le.com/ptv/vplay/%s.html'
+
+    _TESTS = [{
+        'url': 'http://www.le.com/ptv/vplay/22005890.html',
+        'md5': 'edadcfe5406976f42f9f266057ee5e40',
+        'info_dict': {
+            'id': '22005890',
+            'ext': 'mp4',
+            'title': '第87届奥斯卡颁奖礼完美落幕 《鸟人》成最大赢家',
+            'description': 'md5:a9cb175fd753e2962176b7beca21a47c',
+        },
+        'params': {
+            'hls_prefer_native': True,
+        },
+    }, {
+        'url': 'http://www.le.com/ptv/vplay/1415246.html',
+        'info_dict': {
+            'id': '1415246',
+            'ext': 'mp4',
+            'title': '美人天下01',
+            'description': 'md5:f88573d9d7225ada1359eaf0dbf8bcda',
+        },
+        'params': {
+            'hls_prefer_native': True,
+        },
+    }, {
+        'note': 'This video is available only in Mainland China, thus a proxy is needed',
+        'url': 'http://www.le.com/ptv/vplay/1118082.html',
+        'md5': '2424c74948a62e5f31988438979c5ad1',
+        'info_dict': {
+            'id': '1118082',
+            'ext': 'mp4',
+            'title': '与龙共舞 完整版',
+            'description': 'md5:7506a5eeb1722bb9d4068f85024e3986',
+        },
+        'params': {
+            'hls_prefer_native': True,
+        },
+        'skip': 'Only available in China',
+    }]
+
+    @staticmethod
+    def urshift(val, n):
+        return val >> n if val >= 0 else (val + 0x100000000) >> n
+
+    # ror() and calc_time_key() are reverse-engineered from an embedded swf file in KLetvPlayer.swf
+    def ror(self, param1, param2):
+        _loc3_ = 0
+        while _loc3_ < param2:
+            param1 = self.urshift(param1, 1) + ((param1 & 1) << 31)
+            _loc3_ += 1
+        return param1
+
+    def calc_time_key(self, param1):
+        _loc2_ = 773625421
+        _loc3_ = self.ror(param1, _loc2_ % 13)
+        _loc3_ = _loc3_ ^ _loc2_
+        _loc3_ = self.ror(_loc3_, _loc2_ % 17)
+        return _loc3_
+
+    # see M3U8Encryption class in KLetvPlayer.swf
+    @staticmethod
+    def decrypt_m3u8(encrypted_data):
+        if encrypted_data[:5].decode('utf-8').lower() != 'vc_01':
+            return encrypted_data
+        encrypted_data = encrypted_data[5:]
+
+        _loc4_ = bytearray(2 * len(encrypted_data))
+        for idx, val in enumerate(encrypted_data):
+            b = compat_ord(val)
+            _loc4_[2 * idx] = b // 16
+            _loc4_[2 * idx + 1] = b % 16
+        idx = len(_loc4_) - 11
+        _loc4_ = _loc4_[idx:] + _loc4_[:idx]
+        _loc7_ = bytearray(len(encrypted_data))
+        for i in range(len(encrypted_data)):
+            _loc7_[i] = _loc4_[2 * i] * 16 + _loc4_[2 * i + 1]
+
+        return bytes(_loc7_)
+
+    def _real_extract(self, url):
+        media_id = self._match_id(url)
+        page = self._download_webpage(url, media_id)
+        params = {
+            'id': media_id,
+            'platid': 1,
+            'splatid': 101,
+            'format': 1,
+            'tkey': self.calc_time_key(int(time.time())),
+            'domain': 'www.le.com'
+        }
+        play_json_req = sanitized_Request(
+            'http://api.le.com/mms/out/video/playJson?' + compat_urllib_parse_urlencode(params)
+        )
+        cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
+        if cn_verification_proxy:
+            play_json_req.add_header('Ytdl-request-proxy', cn_verification_proxy)
+
+        play_json = self._download_json(
+            play_json_req,
+            media_id, 'Downloading playJson data')
+
+        # Check for errors
+        playstatus = play_json['playstatus']
+        if playstatus['status'] == 0:
+            flag = playstatus['flag']
+            if flag == 1:
+                msg = 'Country %s auth error' % playstatus['country']
+            else:
+                msg = 'Generic error. flag = %d' % flag
+            raise ExtractorError(msg, expected=True)
+
+        playurl = play_json['playurl']
+
+        formats = ['350', '1000', '1300', '720p', '1080p']
+        dispatch = playurl['dispatch']
+
+        urls = []
+        for format_id in formats:
+            if format_id in dispatch:
+                media_url = playurl['domain'][0] + dispatch[format_id][0]
+                media_url += '&' + compat_urllib_parse_urlencode({
+                    'm3v': 1,
+                    'format': 1,
+                    'expect': 3,
+                    'rateid': format_id,
+                })
+
+                nodes_data = self._download_json(
+                    media_url, media_id,
+                    'Download JSON metadata for format %s' % format_id)
+
+                req = self._request_webpage(
+                    nodes_data['nodelist'][0]['location'], media_id,
+                    note='Downloading m3u8 information for format %s' % format_id)
+
+                m3u8_data = self.decrypt_m3u8(req.read())
+
+                url_info_dict = {
+                    'url': encode_data_uri(m3u8_data, 'application/vnd.apple.mpegurl'),
+                    'ext': determine_ext(dispatch[format_id][1]),
+                    'format_id': format_id,
+                    'protocol': 'm3u8',
+                }
+
+                if format_id[-1:] == 'p':
+                    url_info_dict['height'] = int_or_none(format_id[:-1])
+
+                urls.append(url_info_dict)
+
+        publish_time = parse_iso8601(self._html_search_regex(
+            r'发布时间&nbsp;([^<>]+) ', page, 'publish time', default=None),
+            delimiter=' ', timezone=datetime.timedelta(hours=8))
+        description = self._html_search_meta('description', page, fatal=False)
+
+        return {
+            'id': media_id,
+            'formats': urls,
+            'title': playurl['title'],
+            'thumbnail': playurl['pic'],
+            'description': description,
+            'timestamp': publish_time,
+        }
+
+
+class LePlaylistIE(InfoExtractor):
+    _VALID_URL = r'https?://[a-z]+\.le\.com/[a-z]+/(?P<id>[a-z0-9_]+)'
+
+    _TESTS = [{
+        'url': 'http://www.le.com/tv/46177.html',
+        'info_dict': {
+            'id': '46177',
+            'title': '美人天下',
+            'description': 'md5:395666ff41b44080396e59570dbac01c'
+        },
+        'playlist_count': 35
+    }, {
+        'url': 'http://tv.le.com/izt/wuzetian/index.html',
+        'info_dict': {
+            'id': 'wuzetian',
+            'title': '武媚娘传奇',
+            'description': 'md5:e12499475ab3d50219e5bba00b3cb248'
+        },
+        # This playlist contains some extra videos other than the drama itself
+        'playlist_mincount': 96
+    }, {
+        'url': 'http://tv.le.com/pzt/lswjzzjc/index.shtml',
+        # This series is moved to http://www.le.com/tv/10005297.html
+        'only_matching': True,
+    }, {
+        'url': 'http://www.le.com/comic/92063.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://list.le.com/listn/c1009_sc532002_d2_p1_o1.html',
+        'only_matching': True,
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if LeIE.suitable(url) else super(LePlaylistIE, cls).suitable(url)
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+        page = self._download_webpage(url, playlist_id)
+
+        # Currently old domain names are still used in playlists
+        media_ids = orderedSet(re.findall(
+            r'<a[^>]+href="http://www\.letv\.com/ptv/vplay/(\d+)\.html', page))
+        entries = [self.url_result(LeIE._URL_TEMPLATE % media_id, ie='Le')
+                   for media_id in media_ids]
+
+        title = self._html_search_meta('keywords', page,
+                                       fatal=False).split(',')[0]
+        description = self._html_search_meta('description', page, fatal=False)
+
+        return self.playlist_result(entries, playlist_id, playlist_title=title,
+                                    playlist_description=description)
+
+
+class LetvCloudIE(InfoExtractor):
+    # Most of *.letv.com was changed to *.le.com on 2016/01/02,
+    # but yuntv.letv.com is kept, so keep the extractor name as well
+    IE_DESC = '乐视云'
+    _VALID_URL = r'https?://yuntv\.letv\.com/bcloud.html\?.+'
+
+    _TESTS = [{
+        'url': 'http://yuntv.letv.com/bcloud.html?uu=p7jnfw5hw9&vu=467623dedf',
+        'md5': '26450599afd64c513bc77030ad15db44',
+        'info_dict': {
+            'id': 'p7jnfw5hw9_467623dedf',
+            'ext': 'mp4',
+            'title': 'Video p7jnfw5hw9_467623dedf',
+        },
+    }, {
+        'url': 'http://yuntv.letv.com/bcloud.html?uu=p7jnfw5hw9&vu=ec93197892&pu=2c7cd40209&auto_play=1&gpcflag=1&width=640&height=360',
+        'md5': 'e03d9cc8d9c13191e1caf277e42dbd31',
+        'info_dict': {
+            'id': 'p7jnfw5hw9_ec93197892',
+            'ext': 'mp4',
+            'title': 'Video p7jnfw5hw9_ec93197892',
+        },
+    }, {
+        'url': 'http://yuntv.letv.com/bcloud.html?uu=p7jnfw5hw9&vu=187060b6fd',
+        'md5': 'cb988699a776b22d4a41b9d43acfb3ac',
+        'info_dict': {
+            'id': 'p7jnfw5hw9_187060b6fd',
+            'ext': 'mp4',
+            'title': 'Video p7jnfw5hw9_187060b6fd',
+        },
+    }]
+
+    @staticmethod
+    def sign_data(obj):
+        if obj['cf'] == 'flash':
+            salt = '2f9d6924b33a165a6d8b5d3d42f4f987'
+            items = ['cf', 'format', 'ran', 'uu', 'ver', 'vu']
+        elif obj['cf'] == 'html5':
+            salt = 'fbeh5player12c43eccf2bec3300344'
+            items = ['cf', 'ran', 'uu', 'bver', 'vu']
+        input_data = ''.join([item + obj[item] for item in items]) + salt
+        obj['sign'] = hashlib.md5(input_data.encode('utf-8')).hexdigest()
+
+    def _get_formats(self, cf, uu, vu, media_id):
+        def get_play_json(cf, timestamp):
+            data = {
+                'cf': cf,
+                'ver': '2.2',
+                'bver': 'firefox44.0',
+                'format': 'json',
+                'uu': uu,
+                'vu': vu,
+                'ran': compat_str(timestamp),
+            }
+            self.sign_data(data)
+            return self._download_json(
+                'http://api.letvcloud.com/gpc.php?' + compat_urllib_parse_urlencode(data),
+                media_id, 'Downloading playJson data for type %s' % cf)
+
+        play_json = get_play_json(cf, time.time())
+        # The server time may be different from local time
+        if play_json.get('code') == 10071:
+            play_json = get_play_json(cf, play_json['timestamp'])
+
+        if not play_json.get('data'):
+            if play_json.get('message'):
+                raise ExtractorError('Letv cloud said: %s' % play_json['message'], expected=True)
+            elif play_json.get('code'):
+                raise ExtractorError('Letv cloud returned error %d' % play_json['code'], expected=True)
+            else:
+                raise ExtractorError('Letv cloud returned an unknown error')
+
+        def b64decode(s):
+            return base64.b64decode(s.encode('utf-8')).decode('utf-8')
+
+        formats = []
+        for media in play_json['data']['video_info']['media'].values():
+            play_url = media['play_url']
+            url = b64decode(play_url['main_url'])
+            decoded_url = b64decode(url_basename(url))
+            formats.append({
+                'url': url,
+                'ext': determine_ext(decoded_url),
+                'format_id': str_or_none(play_url.get('vtype')),
+                'format_note': str_or_none(play_url.get('definition')),
+                'width': int_or_none(play_url.get('vwidth')),
+                'height': int_or_none(play_url.get('vheight')),
+            })
+
+        return formats
+
+    def _real_extract(self, url):
+        uu_mobj = re.search('uu=([\w]+)', url)
+        vu_mobj = re.search('vu=([\w]+)', url)
+
+        if not uu_mobj or not vu_mobj:
+            raise ExtractorError('Invalid URL: %s' % url, expected=True)
+
+        uu = uu_mobj.group(1)
+        vu = vu_mobj.group(1)
+        media_id = uu + '_' + vu
+
+        formats = self._get_formats('flash', uu, vu, media_id) + self._get_formats('html5', uu, vu, media_id)
+        self._sort_formats(formats)
+
+        return {
+            'id': media_id,
+            'title': 'Video %s' % media_id,
+            'formats': formats,
+        }
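
A round-trip check, not in the patch, for the nibble shuffle performed by decrypt_m3u8 above. encrypt_m3u8 below is a hypothetical inverse (Python 3 bytes semantics assumed): split each byte into two nibbles, move the first 11 nibbles to the end, recombine, and prepend the 'VC_01' marker.

from youtube_dl.extractor.leeco import LeIE  # the class added above

def encrypt_m3u8(data):
    nibbles = bytearray()
    for b in data:  # Python 3: iterating bytes yields ints
        nibbles += bytearray([b // 16, b % 16])
    nibbles = nibbles[11:] + nibbles[:11]  # inverse of the decrypt rotation
    out = bytearray(len(data))
    for i in range(len(data)):
        out[i] = nibbles[2 * i] * 16 + nibbles[2 * i + 1]
    return b'VC_01' + bytes(out)

plaintext = b'#EXTM3U\n#EXT-X-VERSION:3\n'
assert LeIE.decrypt_m3u8(encrypt_m3u8(plaintext)) == plaintext
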
diff --git a/youtube_dl/extractor/lemonde.py b/youtube_dl/extractor/lemonde.py
new file mode 100644 (file)
index 0000000..be66fff
--- /dev/null
@@ -0,0 +1,34 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class LemondeIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:.+?\.)?lemonde\.fr/(?:[^/]+/)*(?P<id>[^/]+)\.html'
+    _TESTS = [{
+        'url': 'http://www.lemonde.fr/police-justice/video/2016/01/19/comprendre-l-affaire-bygmalion-en-cinq-minutes_4849702_1653578.html',
+        'md5': '01fb3c92de4c12c573343d63e163d302',
+        'info_dict': {
+            'id': 'lqm3kl',
+            'ext': 'mp4',
+            'title': "Comprendre l'affaire Bygmalion en 5 minutes",
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 320,
+            'upload_date': '20160119',
+            'timestamp': 1453194778,
+            'uploader_id': '3pmkp',
+        },
+    }, {
+        'url': 'http://redaction.actu.lemonde.fr/societe/video/2016/01/18/calais-debut-des-travaux-de-defrichement-dans-la-jungle_4849233_3224.html',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        digiteka_url = self._proto_relative_url(self._search_regex(
+            r'url\s*:\s*(["\'])(?P<url>(?:https?://)?//(?:www\.)?(?:digiteka\.net|ultimedia\.com)/deliver/.+?)\1',
+            webpage, 'digiteka url', group='url'))
+        return self.url_result(digiteka_url, 'Digiteka')
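
For reference (not part of the commit), the proto-relative handling above amounts to promoting a '//host/...' match to http before delegating to the Digiteka extractor; a minimal stand-in for _proto_relative_url:

def proto_relative_url(url, proto='http:'):
    return proto + url if url.startswith('//') else url

# hypothetical match from the regex above
print(proto_relative_url('//www.ultimedia.com/deliver/generic/iframe/xyz'))
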
diff --git a/youtube_dl/extractor/letv.py b/youtube_dl/extractor/letv.py
deleted file mode 100644 (file)
index a28abb0..0000000
+++ /dev/null
@@ -1,207 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import datetime
-import re
-import time
-
-from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
-    compat_urlparse,
-)
-from ..utils import (
-    determine_ext,
-    ExtractorError,
-    parse_iso8601,
-    int_or_none,
-)
-
-
-class LetvIE(InfoExtractor):
-    IE_DESC = '乐视网'
-    _VALID_URL = r'http://www\.letv\.com/ptv/vplay/(?P<id>\d+).html'
-
-    _TESTS = [{
-        'url': 'http://www.letv.com/ptv/vplay/22005890.html',
-        'md5': 'cab23bd68d5a8db9be31c9a222c1e8df',
-        'info_dict': {
-            'id': '22005890',
-            'ext': 'mp4',
-            'title': '第87届奥斯卡颁奖礼完美落幕 《鸟人》成最大赢家',
-            'timestamp': 1424747397,
-            'upload_date': '20150224',
-            'description': 'md5:a9cb175fd753e2962176b7beca21a47c',
-        }
-    }, {
-        'url': 'http://www.letv.com/ptv/vplay/1415246.html',
-        'info_dict': {
-            'id': '1415246',
-            'ext': 'mp4',
-            'title': '美人天下01',
-            'description': 'md5:f88573d9d7225ada1359eaf0dbf8bcda',
-        },
-    }, {
-        'note': 'This video is available only in Mainland China, thus a proxy is needed',
-        'url': 'http://www.letv.com/ptv/vplay/1118082.html',
-        'md5': 'f80936fbe20fb2f58648e81386ff7927',
-        'info_dict': {
-            'id': '1118082',
-            'ext': 'mp4',
-            'title': '与龙共舞 完整版',
-            'description': 'md5:7506a5eeb1722bb9d4068f85024e3986',
-        },
-        'skip': 'Only available in China',
-    }]
-
-    @staticmethod
-    def urshift(val, n):
-        return val >> n if val >= 0 else (val + 0x100000000) >> n
-
-    # ror() and calc_time_key() are reversed from a embedded swf file in KLetvPlayer.swf
-    def ror(self, param1, param2):
-        _loc3_ = 0
-        while _loc3_ < param2:
-            param1 = self.urshift(param1, 1) + ((param1 & 1) << 31)
-            _loc3_ += 1
-        return param1
-
-    def calc_time_key(self, param1):
-        _loc2_ = 773625421
-        _loc3_ = self.ror(param1, _loc2_ % 13)
-        _loc3_ = _loc3_ ^ _loc2_
-        _loc3_ = self.ror(_loc3_, _loc2_ % 17)
-        return _loc3_
-
-    def _real_extract(self, url):
-        media_id = self._match_id(url)
-        page = self._download_webpage(url, media_id)
-        params = {
-            'id': media_id,
-            'platid': 1,
-            'splatid': 101,
-            'format': 1,
-            'tkey': self.calc_time_key(int(time.time())),
-            'domain': 'www.letv.com'
-        }
-        play_json_req = compat_urllib_request.Request(
-            'http://api.letv.com/mms/out/video/playJson?' + compat_urllib_parse.urlencode(params)
-        )
-        cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
-        if cn_verification_proxy:
-            play_json_req.add_header('Ytdl-request-proxy', cn_verification_proxy)
-
-        play_json = self._download_json(
-            play_json_req,
-            media_id, 'Downloading playJson data')
-
-        # Check for errors
-        playstatus = play_json['playstatus']
-        if playstatus['status'] == 0:
-            flag = playstatus['flag']
-            if flag == 1:
-                msg = 'Country %s auth error' % playstatus['country']
-            else:
-                msg = 'Generic error. flag = %d' % flag
-            raise ExtractorError(msg, expected=True)
-
-        playurl = play_json['playurl']
-
-        formats = ['350', '1000', '1300', '720p', '1080p']
-        dispatch = playurl['dispatch']
-
-        urls = []
-        for format_id in formats:
-            if format_id in dispatch:
-                media_url = playurl['domain'][0] + dispatch[format_id][0]
-
-                # Mimic what flvxz.com do
-                url_parts = list(compat_urlparse.urlparse(media_url))
-                qs = dict(compat_urlparse.parse_qs(url_parts[4]))
-                qs.update({
-                    'platid': '14',
-                    'splatid': '1401',
-                    'tss': 'no',
-                    'retry': 1
-                })
-                url_parts[4] = compat_urllib_parse.urlencode(qs)
-                media_url = compat_urlparse.urlunparse(url_parts)
-
-                url_info_dict = {
-                    'url': media_url,
-                    'ext': determine_ext(dispatch[format_id][1]),
-                    'format_id': format_id,
-                }
-
-                if format_id[-1:] == 'p':
-                    url_info_dict['height'] = int_or_none(format_id[:-1])
-
-                urls.append(url_info_dict)
-
-        publish_time = parse_iso8601(self._html_search_regex(
-            r'发布时间&nbsp;([^<>]+) ', page, 'publish time', default=None),
-            delimiter=' ', timezone=datetime.timedelta(hours=8))
-        description = self._html_search_meta('description', page, fatal=False)
-
-        return {
-            'id': media_id,
-            'formats': urls,
-            'title': playurl['title'],
-            'thumbnail': playurl['pic'],
-            'description': description,
-            'timestamp': publish_time,
-        }
-
-
-class LetvTvIE(InfoExtractor):
-    _VALID_URL = r'http://www.letv.com/tv/(?P<id>\d+).html'
-    _TESTS = [{
-        'url': 'http://www.letv.com/tv/46177.html',
-        'info_dict': {
-            'id': '46177',
-            'title': '美人天下',
-            'description': 'md5:395666ff41b44080396e59570dbac01c'
-        },
-        'playlist_count': 35
-    }]
-
-    def _real_extract(self, url):
-        playlist_id = self._match_id(url)
-        page = self._download_webpage(url, playlist_id)
-
-        media_urls = list(set(re.findall(
-            r'http://www.letv.com/ptv/vplay/\d+.html', page)))
-        entries = [self.url_result(media_url, ie='Letv')
-                   for media_url in media_urls]
-
-        title = self._html_search_meta('keywords', page,
-                                       fatal=False).split(',')[0]
-        description = self._html_search_meta('description', page, fatal=False)
-
-        return self.playlist_result(entries, playlist_id, playlist_title=title,
-                                    playlist_description=description)
-
-
-class LetvPlaylistIE(LetvTvIE):
-    _VALID_URL = r'http://tv.letv.com/[a-z]+/(?P<id>[a-z]+)/index.s?html'
-    _TESTS = [{
-        'url': 'http://tv.letv.com/izt/wuzetian/index.html',
-        'info_dict': {
-            'id': 'wuzetian',
-            'title': '武媚娘传奇',
-            'description': 'md5:e12499475ab3d50219e5bba00b3cb248'
-        },
-        # This playlist contains some extra videos other than the drama itself
-        'playlist_mincount': 96
-    }, {
-        'url': 'http://tv.letv.com/pzt/lswjzzjc/index.shtml',
-        'info_dict': {
-            'id': 'lswjzzjc',
-            # The title should be "劲舞青春", but I can't find a simple way to
-            # determine the playlist title
-            'title': '乐视午间自制剧场',
-            'description': 'md5:b1eef244f45589a7b5b1af9ff25a4489'
-        },
-        'playlist_mincount': 7
-    }]
index 9ab1416f55e29d69681d0ccf3678957482a3e80c..d375695f5a26dbc072455777487ed239820c1ec6 100644 (file)
@@ -8,9 +8,9 @@ from ..utils import unified_strdate
 
 
 class LibsynIE(InfoExtractor):
-    _VALID_URL = r'https?://html5-player\.libsyn\.com/embed/episode/id/(?P<id>[0-9]+)'
+    _VALID_URL = r'(?P<mainurl>https?://html5-player\.libsyn\.com/embed/episode/id/(?P<id>[0-9]+))'
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://html5-player.libsyn.com/embed/episode/id/3377616/',
         'md5': '443360ee1b58007bc3dcf09b41d093bb',
         'info_dict': {
@@ -19,12 +19,24 @@ class LibsynIE(InfoExtractor):
             'title': "The Daily Show Podcast without Jon Stewart - Episode 12: Bassem Youssef: Egypt's Jon Stewart",
             'description': 'md5:601cb790edd05908957dae8aaa866465',
             'upload_date': '20150220',
+            'thumbnail': 're:^https?://.*',
         },
-    }
+    }, {
+        'url': 'https://html5-player.libsyn.com/embed/episode/id/3727166/height/75/width/200/theme/standard/direction/no/autoplay/no/autonext/no/thumbnail/no/preload/no/no_addthis/no/',
+        'md5': '6c5cb21acd622d754d3b1a92b582ce42',
+        'info_dict': {
+            'id': '3727166',
+            'ext': 'mp3',
+            'title': 'Clients From Hell Podcast - How a Sex Toy Company Kickstarted my Freelance Career',
+            'upload_date': '20150818',
+            'thumbnail': 're:^https?://.*',
+        }
+    }]
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
-
+        m = re.match(self._VALID_URL, url)
+        video_id = m.group('id')
+        url = m.group('mainurl')
         webpage = self._download_webpage(url, video_id)
 
         formats = [{
@@ -32,20 +44,18 @@ class LibsynIE(InfoExtractor):
         } for media_url in set(re.findall('var\s+mediaURL(?:Libsyn)?\s*=\s*"([^"]+)"', webpage))]
 
         podcast_title = self._search_regex(
-            r'<h2>([^<]+)</h2>', webpage, 'title')
+            r'<h2>([^<]+)</h2>', webpage, 'podcast title', default=None)
         episode_title = self._search_regex(
-            r'<h3>([^<]+)</h3>', webpage, 'title', default=None)
+            r'(?:<div class="episode-title">|<h3>)([^<]+)</', webpage, 'episode title')
 
         title = '%s - %s' % (podcast_title, episode_title) if podcast_title else episode_title
 
         description = self._html_search_regex(
             r'<div id="info_text_body">(.+?)</div>', webpage,
-            'description', fatal=False)
-
+            'description', default=None)
         thumbnail = self._search_regex(
             r'<img[^>]+class="info-show-icon"[^>]+src="([^"]+)"',
             webpage, 'thumbnail', fatal=False)
-
         release_date = unified_strdate(self._search_regex(
             r'<div class="release_date">Released: ([^<]+)<', webpage, 'release date', fatal=False))
 
index f8cbca7b36afab1890b71806d6761bbe67d7d924..ba2f80a757d071042b8d574721bde37a1b7006ba 100644 (file)
@@ -17,21 +17,21 @@ from ..utils import (
 class LifeNewsIE(InfoExtractor):
     IE_NAME = 'lifenews'
     IE_DESC = 'LIFE | NEWS'
-    _VALID_URL = r'http://lifenews\.ru/(?:mobile/)?(?P<section>news|video)/(?P<id>\d+)'
+    _VALID_URL = r'https?://lifenews\.ru/(?:mobile/)?(?P<section>news|video)/(?P<id>\d+)'
 
     _TESTS = [{
-        'url': 'http://lifenews.ru/news/126342',
-        'md5': 'e1b50a5c5fb98a6a544250f2e0db570a',
+        # single video embedded via video/source
+        'url': 'http://lifenews.ru/news/98736',
+        'md5': '77c95eaefaca216e32a76a343ad89d23',
         'info_dict': {
-            'id': '126342',
+            'id': '98736',
             'ext': 'mp4',
-            'title': 'МВД разыскивает мужчин, оставивших в IKEA сумку с автоматом',
-            'description': 'Камеры наблюдения гипермаркета зафиксировали троих мужчин, спрятавших оружейный арсенал в камере хранения.',
-            'thumbnail': 're:http://.*\.jpg',
-            'upload_date': '20140130',
+            'title': 'Мужчина нашел дома архив оборонного завода',
+            'description': 'md5:3b06b1b39b5e2bea548e403d99b8bf26',
+            'upload_date': '20120805',
         }
     }, {
-        # video in <iframe>
+        # single video embedded via iframe
         'url': 'http://lifenews.ru/news/152125',
         'md5': '77d19a6f0886cd76bdbf44b4d971a273',
         'info_dict': {
@@ -42,15 +42,33 @@ class LifeNewsIE(InfoExtractor):
             'upload_date': '20150402',
         }
     }, {
+        # two videos embedded via iframe
         'url': 'http://lifenews.ru/news/153461',
-        'md5': '9b6ef8bc0ffa25aebc8bdb40d89ab795',
         'info_dict': {
             'id': '153461',
-            'ext': 'mp4',
             'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве',
             'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.',
             'upload_date': '20150505',
-        }
+        },
+        'playlist': [{
+            'md5': '9b6ef8bc0ffa25aebc8bdb40d89ab795',
+            'info_dict': {
+                'id': '153461-video1',
+                'ext': 'mp4',
+                'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве (Видео 1)',
+                'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.',
+                'upload_date': '20150505',
+            },
+        }, {
+            'md5': 'ebb3bf3b1ce40e878d0d628e93eb0322',
+            'info_dict': {
+                'id': '153461-video2',
+                'ext': 'mp4',
+                'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве (Видео 2)',
+                'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.',
+                'upload_date': '20150505',
+            },
+        }],
     }, {
         'url': 'http://lifenews.ru/video/13035',
         'only_matching': True,
@@ -65,10 +83,14 @@ class LifeNewsIE(InfoExtractor):
             'http://lifenews.ru/%s/%s' % (section, video_id),
             video_id, 'Downloading page')
 
-        videos = re.findall(r'<video.*?poster="(?P<poster>[^"]+)".*?src="(?P<video>[^"]+)".*?></video>', webpage)
-        iframe_link = self._html_search_regex(
-            '<iframe[^>]+src=["\']([^"\']+)["\']', webpage, 'iframe link', default=None)
-        if not videos and not iframe_link:
+        video_urls = re.findall(
+            r'<video[^>]+><source[^>]+src=["\'](.+?)["\']', webpage)
+
+        iframe_links = re.findall(
+            r'<iframe[^>]+src=["\']((?:https?:)?//embed\.life\.ru/embed/.+?)["\']',
+            webpage)
+
+        if not video_urls and not iframe_links:
             raise ExtractorError('No media links available for %s' % video_id)
 
         title = remove_end(
@@ -95,36 +117,49 @@ class LifeNewsIE(InfoExtractor):
             'upload_date': upload_date,
         }
 
-        def make_entry(video_id, media, video_number=None):
+        def make_entry(video_id, video_url, index=None):
             cur_info = dict(common_info)
             cur_info.update({
-                'id': video_id,
-                'url': media[1],
-                'thumbnail': media[0],
-                'title': title if video_number is None else '%s-video%s' % (title, video_number),
+                'id': video_id if not index else '%s-video%s' % (video_id, index),
+                'url': video_url,
+                'title': title if not index else '%s (Видео %s)' % (title, index),
             })
             return cur_info
 
-        if iframe_link:
-            iframe_link = self._proto_relative_url(iframe_link, 'http:')
-            cur_info = dict(common_info)
-            cur_info.update({
-                '_type': 'url_transparent',
-                'id': video_id,
-                'title': title,
-                'url': iframe_link,
-            })
+        def make_video_entry(video_id, video_url, index=None):
+            video_url = compat_urlparse.urljoin(url, video_url)
+            return make_entry(video_id, video_url, index)
+
+        def make_iframe_entry(video_id, video_url, index=None):
+            video_url = self._proto_relative_url(video_url, 'http:')
+            cur_info = make_entry(video_id, video_url, index)
+            cur_info['_type'] = 'url_transparent'
             return cur_info
 
-        if len(videos) == 1:
-            return make_entry(video_id, videos[0])
-        else:
-            return [make_entry(video_id, media, video_number + 1) for video_number, media in enumerate(videos)]
+        if len(video_urls) == 1 and not iframe_links:
+            return make_video_entry(video_id, video_urls[0])
+
+        if len(iframe_links) == 1 and not video_urls:
+            return make_iframe_entry(video_id, iframe_links[0])
+
+        entries = []
+
+        if video_urls:
+            for num, video_url in enumerate(video_urls, 1):
+                entries.append(make_video_entry(video_id, video_url, num))
+
+        if iframe_links:
+            for num, iframe_link in enumerate(iframe_links, len(video_urls) + 1):
+                entries.append(make_iframe_entry(video_id, iframe_link, num))
+
+        playlist = common_info.copy()
+        playlist.update(self.playlist_result(entries, video_id, title, description))
+        return playlist
 
 
 class LifeEmbedIE(InfoExtractor):
     IE_NAME = 'life:embed'
-    _VALID_URL = r'http://embed\.life\.ru/embed/(?P<id>[\da-f]{32})'
+    _VALID_URL = r'https?://embed\.life\.ru/embed/(?P<id>[\da-f]{32})'
 
     _TEST = {
         'url': 'http://embed.life.ru/embed/e50c2dec2867350528e2574c899b8291',
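
Not in the patch: a sketch of the entry-numbering rule introduced above, with hypothetical inputs. Direct <video> sources are numbered first, iframe embeds continue the count, and a page with a single source of either kind keeps the bare page id.

video_id = '153461'
video_urls = ['http://cdn.example.com/1.mp4']   # hypothetical
iframe_links = ['//embed.life.ru/embed/abc']    # hypothetical

entries = []
for num, u in enumerate(video_urls, 1):
    entries.append('%s-video%s' % (video_id, num))
for num, u in enumerate(iframe_links, len(video_urls) + 1):
    entries.append('%s-video%s' % (video_id, num))
print(entries)  # ['153461-video1', '153461-video2']
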
diff --git a/youtube_dl/extractor/limelight.py b/youtube_dl/extractor/limelight.py
new file mode 100644 (file)
index 0000000..2599d45
--- /dev/null
@@ -0,0 +1,230 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    determine_ext,
+    float_or_none,
+    int_or_none,
+)
+
+
+class LimelightBaseIE(InfoExtractor):
+    _PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s'
+    _API_URL = 'http://api.video.limelight.com/rest/organizations/%s/%s/%s/%s.json'
+
+    def _call_playlist_service(self, item_id, method, fatal=True):
+        return self._download_json(
+            self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method),
+            item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal)
+
+    def _call_api(self, organization_id, item_id, method):
+        return self._download_json(
+            self._API_URL % (organization_id, self._API_PATH, item_id, method),
+            item_id, 'Downloading API %s JSON' % method)
+
+    def _extract(self, item_id, pc_method, mobile_method, meta_method):
+        pc = self._call_playlist_service(item_id, pc_method)
+        metadata = self._call_api(pc['orgId'], item_id, meta_method)
+        mobile = self._call_playlist_service(item_id, mobile_method, fatal=False)
+        return pc, mobile, metadata
+
+    def _extract_info(self, streams, mobile_urls, properties):
+        video_id = properties['media_id']
+        formats = []
+
+        for stream in streams:
+            stream_url = stream.get('url')
+            if not stream_url:
+                continue
+            if '.f4m' in stream_url:
+                formats.extend(self._extract_f4m_formats(
+                    stream_url, video_id, fatal=False))
+            else:
+                fmt = {
+                    'url': stream_url,
+                    'abr': float_or_none(stream.get('audioBitRate')),
+                    'vbr': float_or_none(stream.get('videoBitRate')),
+                    'fps': float_or_none(stream.get('videoFrameRate')),
+                    'width': int_or_none(stream.get('videoWidthInPixels')),
+                    'height': int_or_none(stream.get('videoHeightInPixels')),
+                    'ext': determine_ext(stream_url)
+                }
+                rtmp = re.search(r'^(?P<url>rtmpe?://[^/]+/(?P<app>.+))/(?P<playpath>mp4:.+)$', stream_url)
+                if rtmp:
+                    format_id = 'rtmp'
+                    if stream.get('videoBitRate'):
+                        format_id += '-%d' % int_or_none(stream['videoBitRate'])
+                    fmt.update({
+                        'url': rtmp.group('url'),
+                        'play_path': rtmp.group('playpath'),
+                        'app': rtmp.group('app'),
+                        'ext': 'flv',
+                        'format_id': format_id,
+                    })
+                formats.append(fmt)
+
+        for mobile_url in mobile_urls:
+            media_url = mobile_url.get('mobileUrl')
+            if not media_url:
+                continue
+            format_id = mobile_url.get('targetMediaPlatform')
+            if determine_ext(media_url) == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    media_url, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id=format_id, fatal=False))
+            else:
+                formats.append({
+                    'url': media_url,
+                    'format_id': format_id,
+                    'preference': -1,
+                })
+
+        self._sort_formats(formats)
+
+        title = properties['title']
+        description = properties.get('description')
+        timestamp = int_or_none(properties.get('publish_date') or properties.get('create_date'))
+        duration = float_or_none(properties.get('duration_in_milliseconds'), 1000)
+        filesize = int_or_none(properties.get('total_storage_in_bytes'))
+        categories = [properties.get('category')]
+        tags = properties.get('tags', [])
+        thumbnails = [{
+            'url': thumbnail['url'],
+            'width': int_or_none(thumbnail.get('width')),
+            'height': int_or_none(thumbnail.get('height')),
+        } for thumbnail in properties.get('thumbnails', []) if thumbnail.get('url')]
+
+        subtitles = {}
+        for caption in properties.get('captions', {}):
+            lang = caption.get('language_code')
+            subtitles_url = caption.get('url')
+            if lang and subtitles_url:
+                subtitles[lang] = [{
+                    'url': subtitles_url,
+                }]
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'formats': formats,
+            'timestamp': timestamp,
+            'duration': duration,
+            'filesize': filesize,
+            'categories': categories,
+            'tags': tags,
+            'thumbnails': thumbnails,
+            'subtitles': subtitles,
+        }
+
+
+class LimelightMediaIE(LimelightBaseIE):
+    IE_NAME = 'limelight'
+    _VALID_URL = r'(?:limelight:media:|https?://link\.videoplatform\.limelight\.com/media/\??\bmediaId=)(?P<id>[a-z0-9]{32})'
+    _TESTS = [{
+        'url': 'http://link.videoplatform.limelight.com/media/?mediaId=3ffd040b522b4485b6d84effc750cd86',
+        'info_dict': {
+            'id': '3ffd040b522b4485b6d84effc750cd86',
+            'ext': 'flv',
+            'title': 'HaP and the HB Prince Trailer',
+            'description': 'md5:8005b944181778e313d95c1237ddb640',
+            'thumbnail': 're:^https?://.*\.jpeg$',
+            'duration': 144.23,
+            'timestamp': 1244136834,
+            'upload_date': '20090604',
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }, {
+        # video with subtitles
+        'url': 'limelight:media:a3e00274d4564ec4a9b29b9466432335',
+        'info_dict': {
+            'id': 'a3e00274d4564ec4a9b29b9466432335',
+            'ext': 'flv',
+            'title': '3Play Media Overview Video',
+            'description': '',
+            'thumbnail': 're:^https?://.*\.jpeg$',
+            'duration': 78.101,
+            'timestamp': 1338929955,
+            'upload_date': '20120605',
+            'subtitles': 'mincount:9',
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }]
+    _PLAYLIST_SERVICE_PATH = 'media'
+    _API_PATH = 'media'
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        pc, mobile, metadata = self._extract(
+            video_id, 'getPlaylistByMediaId', 'getMobilePlaylistByMediaId', 'properties')
+
+        return self._extract_info(
+            pc['playlistItems'][0].get('streams', []),
+            mobile['mediaList'][0].get('mobileUrls', []) if mobile else [],
+            metadata)
+
+
+class LimelightChannelIE(LimelightBaseIE):
+    IE_NAME = 'limelight:channel'
+    _VALID_URL = r'(?:limelight:channel:|https?://link\.videoplatform\.limelight\.com/media/\??\bchannelId=)(?P<id>[a-z0-9]{32})'
+    _TEST = {
+        'url': 'http://link.videoplatform.limelight.com/media/?channelId=ab6a524c379342f9b23642917020c082',
+        'info_dict': {
+            'id': 'ab6a524c379342f9b23642917020c082',
+            'title': 'Javascript Sample Code',
+        },
+        'playlist_mincount': 3,
+    }
+    _PLAYLIST_SERVICE_PATH = 'channel'
+    _API_PATH = 'channels'
+
+    def _real_extract(self, url):
+        channel_id = self._match_id(url)
+
+        pc, mobile, medias = self._extract(
+            channel_id, 'getPlaylistByChannelId',
+            'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1', 'media')
+
+        entries = [
+            self._extract_info(
+                pc['playlistItems'][i].get('streams', []),
+                mobile['mediaList'][i].get('mobileUrls', []) if mobile else [],
+                medias['media_list'][i])
+            for i in range(len(medias['media_list']))]
+
+        return self.playlist_result(entries, channel_id, pc['title'])
+
+
+class LimelightChannelListIE(LimelightBaseIE):
+    IE_NAME = 'limelight:channel_list'
+    _VALID_URL = r'(?:limelight:channel_list:|https?://link\.videoplatform\.limelight\.com/media/\?.*?\bchannelListId=)(?P<id>[a-z0-9]{32})'
+    _TEST = {
+        'url': 'http://link.videoplatform.limelight.com/media/?channelListId=301b117890c4465c8179ede21fd92e2b',
+        'info_dict': {
+            'id': '301b117890c4465c8179ede21fd92e2b',
+            'title': 'Website - Hero Player',
+        },
+        'playlist_mincount': 2,
+    }
+    _PLAYLIST_SERVICE_PATH = 'channel_list'
+
+    def _real_extract(self, url):
+        channel_list_id = self._match_id(url)
+
+        channel_list = self._call_playlist_service(channel_list_id, 'getMobileChannelListById')
+
+        entries = [
+            self.url_result('limelight:channel:%s' % channel['id'], 'LimelightChannel')
+            for channel in channel_list['channelList']]
+
+        return self.playlist_result(entries, channel_list_id, channel_list['title'])
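
A quick sketch of the RTMP handling in LimelightBaseIE._extract_info() above: the regex splits a stream URL into the connection URL, application and play path expected by the RTMP downloader. The sample URL below is hypothetical:

import re

# Hypothetical Limelight RTMP URL of the shape the extractor expects.
stream_url = 'rtmpe://csl.delve.example.net/s/prodapp/mp4:media/clip_720.mp4'
rtmp = re.search(
    r'^(?P<url>rtmpe?://[^/]+/(?P<app>.+))/(?P<playpath>mp4:.+)$', stream_url)
print(rtmp.group('url'))       # rtmpe://csl.delve.example.net/s/prodapp
print(rtmp.group('app'))       # s/prodapp
print(rtmp.group('playpath'))  # mp4:media/clip_720.mp4
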
index 857edfde263196d9bf2811568cc9f9de90eed92b..29fba5f30b0cc4633dbc978e886c62eab0d4ac81 100644 (file)
@@ -47,12 +47,20 @@ class LiveLeakIE(InfoExtractor):
         'info_dict': {
             'id': '801_1409392012',
             'ext': 'mp4',
-            'description': "Happened on 27.7.2014. \r\nAt 0:53 you can see people still swimming at near beach.",
+            'description': 'Happened on 27.7.2014. \r\nAt 0:53 you can see people still swimming at near beach.',
             'uploader': 'bony333',
             'title': 'Crazy Hungarian tourist films close call waterspout in Croatia'
         }
     }]
 
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+src="https?://(?:\w+\.)?liveleak\.com/ll_embed\?(?:.*?)i=(?P<id>[\w_]+)(?:.*)',
+            webpage)
+        if mobj:
+            return 'http://www.liveleak.com/view?i=%s' % mobj.group('id')
+
     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
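
The new static _extract_url() helper above lets other extractors (notably the generic one) resolve LiveLeak embeds. A minimal sketch against hypothetical embed markup:

import re

# Hypothetical embed markup of the shape _extract_url() scans for.
webpage = '<iframe width="640" src="http://www.liveleak.com/ll_embed?f=ab12c&i=801_1409392012"></iframe>'
mobj = re.search(
    r'<iframe[^>]+src="https?://(?:\w+\.)?liveleak\.com/ll_embed\?(?:.*?)i=(?P<id>[\w_]+)(?:.*)',
    webpage)
if mobj:
    print('http://www.liveleak.com/view?i=%s' % mobj.group('id'))
    # -> http://www.liveleak.com/view?i=801_1409392012
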
index 6d7733e4111355a5011765336333f229596b8356..eada7c299238953baa9fd3d8219b2754aa7f9356 100644 (file)
@@ -1,27 +1,30 @@
 from __future__ import unicode_literals
 
 import re
-import json
 import itertools
 
 from .common import InfoExtractor
 from ..compat import (
     compat_str,
-    compat_urllib_parse_urlparse,
     compat_urlparse,
 )
 from ..utils import (
-    ExtractorError,
     find_xpath_attr,
-    int_or_none,
-    orderedSet,
+    xpath_attr,
     xpath_with_ns,
+    xpath_text,
+    orderedSet,
+    update_url_query,
+    int_or_none,
+    float_or_none,
+    parse_iso8601,
+    determine_ext,
 )
 
 
 class LivestreamIE(InfoExtractor):
     IE_NAME = 'livestream'
-    _VALID_URL = r'https?://(?:new\.)?livestream\.com/.*?/(?P<event_name>.*?)(/videos/(?P<id>[0-9]+)(?:/player)?)?/?(?:$|[?#])'
+    _VALID_URL = r'https?://(?:new\.)?livestream\.com/(?:accounts/(?P<account_id>\d+)|(?P<account_name>[^/]+))/(?:events/(?P<event_id>\d+)|(?P<event_name>[^/]+))(?:/videos/(?P<id>\d+))?'
     _TESTS = [{
         'url': 'http://new.livestream.com/CoheedandCambria/WebsterHall/videos/4719370',
         'md5': '53274c76ba7754fb0e8d072716f2292b',
@@ -29,7 +32,9 @@ class LivestreamIE(InfoExtractor):
             'id': '4719370',
             'ext': 'mp4',
             'title': 'Live from Webster Hall NYC',
+            'timestamp': 1350008072,
             'upload_date': '20121012',
+            'duration': 5968.0,
             'like_count': int,
             'view_count': int,
             'thumbnail': 're:^http://.*\.jpg$'
@@ -55,39 +60,23 @@ class LivestreamIE(InfoExtractor):
         'url': 'http://livestream.com/bsww/concacafbeachsoccercampeonato2015',
         'only_matching': True,
     }]
+    _API_URL_TEMPLATE = 'http://livestream.com/api/accounts/%s/events/%s'
+
+    def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
+        base_ele = find_xpath_attr(
+            smil, self._xpath_ns('.//meta', namespace), 'name', 'httpBase')
+        base = base_ele.get('content') if base_ele is not None else 'http://livestreamvod-f.akamaihd.net/'
 
-    def _parse_smil(self, video_id, smil_url):
         formats = []
-        _SWITCH_XPATH = (
-            './/{http://www.w3.org/2001/SMIL20/Language}body/'
-            '{http://www.w3.org/2001/SMIL20/Language}switch')
-        smil_doc = self._download_xml(
-            smil_url, video_id,
-            note='Downloading SMIL information',
-            errnote='Unable to download SMIL information',
-            fatal=False)
-        if smil_doc is False:  # Download failed
-            return formats
-        title_node = find_xpath_attr(
-            smil_doc, './/{http://www.w3.org/2001/SMIL20/Language}meta',
-            'name', 'title')
-        if title_node is None:
-            self.report_warning('Cannot find SMIL id')
-            switch_node = smil_doc.find(_SWITCH_XPATH)
-        else:
-            title_id = title_node.attrib['content']
-            switch_node = find_xpath_attr(
-                smil_doc, _SWITCH_XPATH, 'id', title_id)
-        if switch_node is None:
-            raise ExtractorError('Cannot find switch node')
-        video_nodes = switch_node.findall(
-            '{http://www.w3.org/2001/SMIL20/Language}video')
+        video_nodes = smil.findall(self._xpath_ns('.//video', namespace))
 
         for vn in video_nodes:
-            tbr = int_or_none(vn.attrib.get('system-bitrate'))
+            tbr = int_or_none(vn.attrib.get('system-bitrate'), 1000)
             furl = (
-                'http://livestream-f.akamaihd.net/%s?v=3.0.3&fp=WIN%%2014,0,0,145' %
-                (vn.attrib['src']))
+                update_url_query(compat_urlparse.urljoin(base, vn.attrib['src']), {
+                    'v': '3.0.3',
+                    'fp': 'WIN 14,0,0,145',
+                }))
             if 'clipBegin' in vn.attrib:
                 furl += '&ssek=' + vn.attrib['clipBegin']
             formats.append({
@@ -106,97 +95,141 @@ class LivestreamIE(InfoExtractor):
             ('sd', 'progressive_url'),
             ('hd', 'progressive_url_hd'),
         )
-        formats = [{
-            'format_id': format_id,
-            'url': video_data[key],
-            'quality': i + 1,
-        } for i, (format_id, key) in enumerate(FORMAT_KEYS)
-            if video_data.get(key)]
+
+        formats = []
+        for format_id, key in FORMAT_KEYS:
+            video_url = video_data.get(key)
+            if video_url:
+                ext = determine_ext(video_url)
+                if ext == 'm3u8':
+                    continue
+                bitrate = int_or_none(self._search_regex(
+                    r'(\d+)\.%s' % ext, video_url, 'bitrate', default=None))
+                formats.append({
+                    'url': video_url,
+                    'format_id': format_id,
+                    'tbr': bitrate,
+                    'ext': ext,
+                })
 
         smil_url = video_data.get('smil_url')
         if smil_url:
-            formats.extend(self._parse_smil(video_id, smil_url))
+            formats.extend(self._extract_smil_formats(smil_url, video_id))
+
+        m3u8_url = video_data.get('m3u8_url')
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+
+        f4m_url = video_data.get('f4m_url')
+        if f4m_url:
+            formats.extend(self._extract_f4m_formats(
+                f4m_url, video_id, f4m_id='hds', fatal=False))
         self._sort_formats(formats)
 
+        comments = [{
+            'author_id': comment.get('author_id'),
+            'author': comment.get('author', {}).get('full_name'),
+            'id': comment.get('id'),
+            'text': comment['text'],
+            'timestamp': parse_iso8601(comment.get('created_at')),
+        } for comment in video_data.get('comments', {}).get('data', [])]
+
         return {
             'id': video_id,
             'formats': formats,
             'title': video_data['caption'],
+            'description': video_data.get('description'),
             'thumbnail': video_data.get('thumbnail_url'),
-            'upload_date': video_data['updated_at'].replace('-', '')[:8],
+            'duration': float_or_none(video_data.get('duration'), 1000),
+            'timestamp': parse_iso8601(video_data.get('publish_at')),
             'like_count': video_data.get('likes', {}).get('total'),
+            'comment_count': video_data.get('comments', {}).get('total'),
             'view_count': video_data.get('views'),
+            'comments': comments,
         }
 
-    def _extract_event(self, info):
-        event_id = compat_str(info['id'])
-        account = compat_str(info['owner_account_id'])
-        root_url = (
-            'https://new.livestream.com/api/accounts/{account}/events/{event}/'
-            'feed.json'.format(account=account, event=event_id))
-
-        def _extract_videos():
-            last_video = None
-            for i in itertools.count(1):
-                if last_video is None:
-                    info_url = root_url
-                else:
-                    info_url = '{root}?&id={id}&newer=-1&type=video'.format(
-                        root=root_url, id=last_video)
-                videos_info = self._download_json(info_url, event_id, 'Downloading page {0}'.format(i))['data']
-                videos_info = [v['data'] for v in videos_info if v['type'] == 'video']
-                if not videos_info:
-                    break
-                for v in videos_info:
-                    yield self._extract_video_info(v)
-                last_video = videos_info[-1]['id']
-        return self.playlist_result(_extract_videos(), event_id, info['full_name'])
+    def _extract_stream_info(self, stream_info):
+        broadcast_id = stream_info['broadcast_id']
+        is_live = stream_info.get('is_live')
+
+        formats = []
+        smil_url = stream_info.get('play_url')
+        if smil_url:
+            formats.extend(self._extract_smil_formats(smil_url, broadcast_id))
+
+        entry_protocol = 'm3u8' if is_live else 'm3u8_native'
+        m3u8_url = stream_info.get('m3u8_url')
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, broadcast_id, 'mp4', entry_protocol, m3u8_id='hls', fatal=False))
+
+        rtsp_url = stream_info.get('rtsp_url')
+        if rtsp_url:
+            formats.append({
+                'url': rtsp_url,
+                'format_id': 'rtsp',
+            })
+        self._sort_formats(formats)
+
+        return {
+            'id': broadcast_id,
+            'formats': formats,
+            'title': self._live_title(stream_info['stream_title']) if is_live else stream_info['stream_title'],
+            'thumbnail': stream_info.get('thumbnail_url'),
+            'is_live': is_live,
+        }
+
+    def _extract_event(self, event_data):
+        event_id = compat_str(event_data['id'])
+        account_id = compat_str(event_data['owner_account_id'])
+        feed_root_url = self._API_URL_TEMPLATE % (account_id, event_id) + '/feed.json'
+
+        stream_info = event_data.get('stream_info')
+        if stream_info:
+            return self._extract_stream_info(stream_info)
+
+        last_video = None
+        entries = []
+        for i in itertools.count(1):
+            if last_video is None:
+                info_url = feed_root_url
+            else:
+                info_url = '{root}?&id={id}&newer=-1&type=video'.format(
+                    root=feed_root_url, id=last_video)
+            videos_info = self._download_json(
+                info_url, event_id, 'Downloading page {0}'.format(i))['data']
+            videos_info = [v['data'] for v in videos_info if v['type'] == 'video']
+            if not videos_info:
+                break
+            for v in videos_info:
+                entries.append(self.url_result(
+                    'http://livestream.com/accounts/%s/events/%s/videos/%s' % (account_id, event_id, v['id']),
+                    'Livestream', v['id'], v['caption']))
+            last_video = videos_info[-1]['id']
+        return self.playlist_result(entries, event_id, event_data['full_name'])
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
-        event_name = mobj.group('event_name')
-        webpage = self._download_webpage(url, video_id or event_name)
-
-        og_video = self._og_search_video_url(
-            webpage, 'player url', fatal=False, default=None)
-        if og_video is not None:
-            query_str = compat_urllib_parse_urlparse(og_video).query
-            query = compat_urlparse.parse_qs(query_str)
-            if 'play_url' in query:
-                api_url = query['play_url'][0].replace('.smil', '')
-                info = json.loads(self._download_webpage(
-                    api_url, video_id, 'Downloading video info'))
-                return self._extract_video_info(info)
-
-        config_json = self._search_regex(
-            r'window.config = ({.*?});', webpage, 'window config')
-        info = json.loads(config_json)['event']
-
-        def is_relevant(vdata, vid):
-            result = vdata['type'] == 'video'
-            if video_id is not None:
-                result = result and compat_str(vdata['data']['id']) == vid
-            return result
-
-        if video_id is None:
-            # This is an event page:
-            return self._extract_event(info)
+        event = mobj.group('event_id') or mobj.group('event_name')
+        account = mobj.group('account_id') or mobj.group('account_name')
+        api_url = self._API_URL_TEMPLATE % (account, event)
+        if video_id:
+            video_data = self._download_json(
+                api_url + '/videos/%s' % video_id, video_id)
+            return self._extract_video_info(video_data)
         else:
-            videos = [self._extract_video_info(video_data['data'])
-                      for video_data in info['feed']['data']
-                      if is_relevant(video_data, video_id)]
-            if not videos:
-                raise ExtractorError('Cannot find video %s' % video_id)
-            return videos[0]
+            event_data = self._download_json(api_url, video_id)
+            return self._extract_event(event_data)
 
 
 # The original version of Livestream uses a different system
 class LivestreamOriginalIE(InfoExtractor):
     IE_NAME = 'livestream:original'
     _VALID_URL = r'''(?x)https?://original\.livestream\.com/
-        (?P<user>[^/]+)/(?P<type>video|folder)
-        (?:\?.*?Id=|/)(?P<id>.*?)(&|$)
+        (?P<user>[^/\?#]+)(?:/(?P<type>video|folder)
+        (?:(?:\?.*?Id=|/)(?P<id>.*?)(&|$))?)?
         '''
     _TESTS = [{
         'url': 'http://original.livestream.com/dealbook/video?clipId=pla_8aa4a3f1-ba15-46a4-893b-902210e138fb',
@@ -204,6 +237,8 @@ class LivestreamOriginalIE(InfoExtractor):
             'id': 'pla_8aa4a3f1-ba15-46a4-893b-902210e138fb',
             'ext': 'mp4',
             'title': 'Spark 1 (BitCoin) with Cameron Winklevoss & Tyler Winklevoss of Winklevoss Capital',
+            'duration': 771.301,
+            'view_count': int,
         },
     }, {
         'url': 'https://original.livestream.com/newplay/folder?dirId=a07bf706-d0e4-4e75-a747-b021d84f2fd3',
@@ -211,26 +246,60 @@ class LivestreamOriginalIE(InfoExtractor):
             'id': 'a07bf706-d0e4-4e75-a747-b021d84f2fd3',
         },
         'playlist_mincount': 4,
+    }, {
+        # live stream
+        'url': 'http://original.livestream.com/znsbahamas',
+        'only_matching': True,
     }]
 
-    def _extract_video(self, user, video_id):
-        api_url = 'http://x{0}x.api.channel.livestream.com/2.0/clipdetails?extendedInfo=true&id={1}'.format(user, video_id)
-
+    def _extract_video_info(self, user, video_id):
+        api_url = 'http://x%sx.api.channel.livestream.com/2.0/clipdetails?extendedInfo=true&id=%s' % (user, video_id)
         info = self._download_xml(api_url, video_id)
-        # this url is used on mobile devices
-        stream_url = 'http://x{0}x.api.channel.livestream.com/3.0/getstream.json?id={1}'.format(user, video_id)
-        stream_info = self._download_json(stream_url, video_id)
+
         item = info.find('channel').find('item')
-        ns = {'media': 'http://search.yahoo.com/mrss'}
-        thumbnail_url = item.find(xpath_with_ns('media:thumbnail', ns)).attrib['url']
+        title = xpath_text(item, 'title')
+        media_ns = {'media': 'http://search.yahoo.com/mrss'}
+        thumbnail_url = xpath_attr(
+            item, xpath_with_ns('media:thumbnail', media_ns), 'url')
+        duration = float_or_none(xpath_attr(
+            item, xpath_with_ns('media:content', media_ns), 'duration'))
+        ls_ns = {'ls': 'http://api.channel.livestream.com/2.0'}
+        view_count = int_or_none(xpath_text(
+            item, xpath_with_ns('ls:viewsCount', ls_ns)))
 
         return {
             'id': video_id,
-            'title': item.find('title').text,
-            'url': stream_info['progressiveUrl'],
+            'title': title,
             'thumbnail': thumbnail_url,
+            'duration': duration,
+            'view_count': view_count,
         }
 
+    def _extract_video_formats(self, video_data, video_id, entry_protocol):
+        formats = []
+
+        progressive_url = video_data.get('progressiveUrl')
+        if progressive_url:
+            formats.append({
+                'url': progressive_url,
+                'format_id': 'http',
+            })
+
+        m3u8_url = video_data.get('httpUrl')
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', entry_protocol, m3u8_id='hls', fatal=False))
+
+        rtsp_url = video_data.get('rtspUrl')
+        if rtsp_url:
+            formats.append({
+                'url': rtsp_url,
+                'format_id': 'rtsp',
+            })
+
+        self._sort_formats(formats)
+        return formats
+
     def _extract_folder(self, url, folder_id):
         webpage = self._download_webpage(url, folder_id)
         paths = orderedSet(re.findall(
@@ -239,24 +308,45 @@ class LivestreamOriginalIE(InfoExtractor):
                 <a\s+href="(?=https?://livestre\.am/)
             )([^"]+)"''', webpage))
 
-        return {
-            '_type': 'playlist',
-            'id': folder_id,
-            'entries': [{
-                '_type': 'url',
-                'url': compat_urlparse.urljoin(url, p),
-            } for p in paths],
-        }
+        entries = [{
+            '_type': 'url',
+            'url': compat_urlparse.urljoin(url, p),
+        } for p in paths]
+
+        return self.playlist_result(entries, folder_id)
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        id = mobj.group('id')
         user = mobj.group('user')
         url_type = mobj.group('type')
+        content_id = mobj.group('id')
         if url_type == 'folder':
-            return self._extract_folder(url, id)
+            return self._extract_folder(url, content_id)
         else:
-            return self._extract_video(user, id)
+            # this url is used on mobile devices
+            stream_url = 'http://x%sx.api.channel.livestream.com/3.0/getstream.json' % user
+            info = {}
+            if content_id:
+                stream_url += '?id=%s' % content_id
+                info = self._extract_video_info(user, content_id)
+            else:
+                content_id = user
+                webpage = self._download_webpage(url, content_id)
+                info = {
+                    'title': self._og_search_title(webpage),
+                    'description': self._og_search_description(webpage),
+                    'thumbnail': self._search_regex(r'channelLogo\.src\s*=\s*"([^"]+)"', webpage, 'thumbnail', None),
+                }
+            video_data = self._download_json(stream_url, content_id)
+            is_live = video_data.get('isLive')
+            entry_protocol = 'm3u8' if is_live else 'm3u8_native'
+            info.update({
+                'id': content_id,
+                'title': self._live_title(info['title']) if is_live else info['title'],
+                'formats': self._extract_video_formats(video_data, content_id, entry_protocol),
+                'is_live': is_live,
+            })
+            return info
 
 
 # The server doesn't support HEAD request, the generic extractor can't detect
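
The livestream.com rewrite above drops window.config scraping in favour of the accounts API. A sketch of the endpoint shapes derived from _API_URL_TEMPLATE, with hypothetical IDs:

_API_URL_TEMPLATE = 'http://livestream.com/api/accounts/%s/events/%s'

account_id, event_id, video_id = '844142', '4719370', '12345678'
api_url = _API_URL_TEMPLATE % (account_id, event_id)

print(api_url)                               # event metadata (incl. stream_info when live)
print('%s/videos/%s' % (api_url, video_id))  # metadata for a single video
print(api_url + '/feed.json')                # paginated feed walked by _extract_event()
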
diff --git a/youtube_dl/extractor/lovehomeporn.py b/youtube_dl/extractor/lovehomeporn.py
new file mode 100644 (file)
index 0000000..8f65a3c
--- /dev/null
@@ -0,0 +1,37 @@
+from __future__ import unicode_literals
+
+import re
+
+from .nuevo import NuevoBaseIE
+
+
+class LoveHomePornIE(NuevoBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?lovehomeporn\.com/video/(?P<id>\d+)(?:/(?P<display_id>[^/?#&]+))?'
+    _TEST = {
+        'url': 'http://lovehomeporn.com/video/48483/stunning-busty-brunette-girlfriend-sucking-and-riding-a-big-dick#menu',
+        'info_dict': {
+            'id': '48483',
+            'display_id': 'stunning-busty-brunette-girlfriend-sucking-and-riding-a-big-dick',
+            'ext': 'mp4',
+            'title': 'Stunning busty brunette girlfriend sucking and riding a big dick',
+            'age_limit': 18,
+            'duration': 238.47,
+        },
+        'params': {
+            'skip_download': True,
+        }
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        display_id = mobj.group('display_id')
+
+        info = self._extract_nuevo(
+            'http://lovehomeporn.com/media/nuevo/config.php?key=%s' % video_id,
+            video_id)
+        info.update({
+            'display_id': display_id,
+            'age_limit': 18
+        })
+        return info
index e3236f7b5797ab80431ed11b7027249354015a33..1072405b30c7663d19ddc4df86f858d94952fda5 100644 (file)
@@ -1,12 +1,9 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
 from ..utils import (
-    determine_ext,
-    js_to_json,
+    int_or_none,
     parse_duration,
     remove_end,
 )
@@ -23,9 +20,11 @@ class LRTIE(InfoExtractor):
             'title': 'Septynios Kauno dienos',
             'description': 'md5:24d84534c7dc76581e59f5689462411a',
             'duration': 1783,
+            'view_count': int,
+            'like_count': int,
         },
         'params': {
-            'skip_download': True,  # HLS download
+            'skip_download': True,  # m3u8 download
         },
     }
 
@@ -34,29 +33,24 @@ class LRTIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
 
         title = remove_end(self._og_search_title(webpage), ' - LRT')
+        m3u8_url = self._search_regex(
+            r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*location\.hash\.substring\(1\)',
+            webpage, 'm3u8 url', group='url')
+        formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
+        self._sort_formats(formats)
+
         thumbnail = self._og_search_thumbnail(webpage)
         description = self._og_search_description(webpage)
         duration = parse_duration(self._search_regex(
-            r"'duration':\s*'([^']+)',", webpage,
-            'duration', fatal=False, default=None))
+            r'var\s+record_len\s*=\s*(["\'])(?P<duration>[0-9]+:[0-9]+:[0-9]+)\1',
+            webpage, 'duration', default=None, group='duration'))
 
-        formats = []
-        for js in re.findall(r'(?s)config:\s*(\{.*?\})', webpage):
-            data = self._parse_json(js, video_id, transform_source=js_to_json)
-            if 'provider' not in data:
-                continue
-            if data['provider'] == 'rtmp':
-                formats.append({
-                    'format_id': 'rtmp',
-                    'ext': determine_ext(data['file']),
-                    'url': data['streamer'],
-                    'play_path': 'mp4:%s' % data['file'],
-                    'preference': -1,
-                    'rtmp_real_time': True,
-                })
-            else:
-                formats.extend(
-                    self._extract_m3u8_formats(data['file'], video_id, 'mp4'))
+        view_count = int_or_none(self._html_search_regex(
+            r'<div[^>]+class=(["\']).*?record-desc-seen.*?\1[^>]*>(?P<count>.+?)</div>',
+            webpage, 'view count', fatal=False, group='count'))
+        like_count = int_or_none(self._search_regex(
+            r'<span[^>]+id=(["\'])flikesCount.*?\1>(?P<count>\d+)<',
+            webpage, 'like count', fatal=False, group='count'))
 
         return {
             'id': video_id,
@@ -65,4 +59,6 @@ class LRTIE(InfoExtractor):
             'thumbnail': thumbnail,
             'description': description,
             'duration': duration,
+            'view_count': view_count,
+            'like_count': like_count,
         }
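
For illustration, the new m3u8 regex for lrt.lt anchors on the player's 'file' assignment that appends location.hash; the webpage snippet below is hypothetical:

import re

# Hypothetical player snippet of the form the new regex targets.
webpage = "file: 'http://www.lrt.lt/stream/master.m3u8?token=' + location.hash.substring(1)"
m3u8_url = re.search(
    r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*location\.hash\.substring\(1\)',
    webpage).group('url')
print(m3u8_url)  # http://www.lrt.lt/stream/master.m3u8?token=
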
index a00f6e5e5eb1d398ef0776d8b20dcb1dd51ec082..86d47266f80affd7edaec53dba66ee40a3dd90b9 100644 (file)
@@ -4,20 +4,18 @@ import re
 import json
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_str,
-    compat_urllib_parse,
-    compat_urllib_request,
-)
+from ..compat import compat_str
 from ..utils import (
     ExtractorError,
+    clean_html,
     int_or_none,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
 class LyndaBaseIE(InfoExtractor):
     _LOGIN_URL = 'https://www.lynda.com/login/login.aspx'
-    _SUCCESSFUL_LOGIN_REGEX = r'isLoggedIn: true'
     _ACCOUNT_CREDENTIALS_HINT = 'Use --username and --password options to provide lynda.com account credentials.'
     _NETRC_MACHINE = 'lynda'
 
@@ -25,23 +23,23 @@ class LyndaBaseIE(InfoExtractor):
         self._login()
 
     def _login(self):
-        (username, password) = self._get_login_info()
+        username, password = self._get_login_info()
         if username is None:
             return
 
         login_form = {
-            'username': username.encode('utf-8'),
-            'password': password.encode('utf-8'),
+            'username': username,
+            'password': password,
             'remember': 'false',
             'stayPut': 'false'
         }
-        request = compat_urllib_request.Request(
-            self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
+        request = sanitized_Request(
+            self._LOGIN_URL, urlencode_postdata(login_form))
         login_page = self._download_webpage(
             request, None, 'Logging in as %s' % username)
 
         # Not (yet) logged in
-        m = re.search(r'loginResultJson = \'(?P<json>[^\']+)\';', login_page)
+        m = re.search(r'loginResultJson\s*=\s*\'(?P<json>[^\']+)\';', login_page)
         if m is not None:
             response = m.group('json')
             response_json = json.loads(response)
@@ -64,15 +62,33 @@ class LyndaBaseIE(InfoExtractor):
                     'remember': 'false',
                     'stayPut': 'false',
                 }
-                request = compat_urllib_request.Request(
-                    self._LOGIN_URL, compat_urllib_parse.urlencode(confirm_form).encode('utf-8'))
+                request = sanitized_Request(
+                    self._LOGIN_URL, urlencode_postdata(confirm_form))
                 login_page = self._download_webpage(
                     request, None,
                     'Confirming log in and log out from another device')
 
-        if re.search(self._SUCCESSFUL_LOGIN_REGEX, login_page) is None:
+        if all(not re.search(p, login_page) for p in (r'isLoggedIn\s*:\s*true', r'logout\.aspx', r'>Log out<')):
+            if 'login error' in login_page:
+                mobj = re.search(
+                    r'(?s)<h1[^>]+class="topmost">(?P<title>[^<]+)</h1>\s*<div>(?P<description>.+?)</div>',
+                    login_page)
+                if mobj:
+                    raise ExtractorError(
+                        'lynda returned error: %s - %s'
+                        % (mobj.group('title'), clean_html(mobj.group('description'))),
+                        expected=True)
             raise ExtractorError('Unable to log in')
 
+    def _logout(self):
+        username, _ = self._get_login_info()
+        if username is None:
+            return
+
+        self._download_webpage(
+            'http://www.lynda.com/ajax/logout.aspx', None,
+            'Logging out', 'Unable to log out', fatal=False)
+
 
 class LyndaIE(LyndaBaseIE):
     IE_NAME = 'lynda'
@@ -99,52 +115,47 @@ class LyndaIE(LyndaBaseIE):
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        page = self._download_webpage(
+        video = self._download_json(
             'http://www.lynda.com/ajax/player?videoId=%s&type=video' % video_id,
             video_id, 'Downloading video JSON')
-        video_json = json.loads(page)
 
-        if 'Status' in video_json:
+        if 'Status' in video:
             raise ExtractorError(
-                'lynda returned error: %s' % video_json['Message'], expected=True)
+                'lynda returned error: %s' % video['Message'], expected=True)
 
-        if video_json['HasAccess'] is False:
-            raise ExtractorError(
-                'Video %s is only available for members. '
-                % video_id + self._ACCOUNT_CREDENTIALS_HINT, expected=True)
+        if video.get('HasAccess') is False:
+            self.raise_login_required('Video %s is only available for members' % video_id)
 
-        video_id = compat_str(video_json['ID'])
-        duration = video_json['DurationInSeconds']
-        title = video_json['Title']
+        video_id = compat_str(video.get('ID') or video_id)
+        duration = int_or_none(video.get('DurationInSeconds'))
+        title = video['Title']
 
         formats = []
 
-        fmts = video_json.get('Formats')
+        fmts = video.get('Formats')
         if fmts:
-            formats.extend([
-                {
-                    'url': fmt['Url'],
-                    'ext': fmt['Extension'],
-                    'width': fmt['Width'],
-                    'height': fmt['Height'],
-                    'filesize': fmt['FileSize'],
-                    'format_id': str(fmt['Resolution'])
-                } for fmt in fmts])
-
-        prioritized_streams = video_json.get('PrioritizedStreams')
+            formats.extend([{
+                'url': f['Url'],
+                'ext': f.get('Extension'),
+                'width': int_or_none(f.get('Width')),
+                'height': int_or_none(f.get('Height')),
+                'filesize': int_or_none(f.get('FileSize')),
+                'format_id': compat_str(f.get('Resolution')) if f.get('Resolution') else None,
+            } for f in fmts if f.get('Url')])
+
+        prioritized_streams = video.get('PrioritizedStreams')
         if prioritized_streams:
-            formats.extend([
-                {
+            for prioritized_stream_id, prioritized_stream in prioritized_streams.items():
+                formats.extend([{
                     'url': video_url,
                     'width': int_or_none(format_id),
-                    'format_id': format_id,
-                } for format_id, video_url in prioritized_streams['0'].items()
-            ])
+                    'format_id': '%s-%s' % (prioritized_stream_id, format_id),
+                } for format_id, video_url in prioritized_stream.items()])
 
         self._check_formats(formats, video_id)
         self._sort_formats(formats)
 
-        subtitles = self.extract_subtitles(video_id, page)
+        subtitles = self.extract_subtitles(video_id)
 
         return {
             'id': video_id,
@@ -175,7 +186,7 @@ class LyndaIE(LyndaBaseIE):
         if srt:
             return srt
 
-    def _get_subtitles(self, video_id, webpage):
+    def _get_subtitles(self, video_id):
         url = 'http://www.lynda.com/ajax/player?videoId=%s&type=transcript' % video_id
         subs = self._download_json(url, None, False)
         if subs:
@@ -197,39 +208,43 @@ class LyndaCourseIE(LyndaBaseIE):
         course_path = mobj.group('coursepath')
         course_id = mobj.group('courseid')
 
-        page = self._download_webpage(
+        course = self._download_json(
             'http://www.lynda.com/ajax/player?courseId=%s&type=course' % course_id,
             course_id, 'Downloading course JSON')
-        course_json = json.loads(page)
 
-        if 'Status' in course_json and course_json['Status'] == 'NotFound':
+        self._logout()
+
+        if course.get('Status') == 'NotFound':
             raise ExtractorError(
                 'Course %s does not exist' % course_id, expected=True)
 
         unaccessible_videos = 0
-        videos = []
+        entries = []
 
         # Might want to extract videos right here from video['Formats'] as it seems 'Formats' is not provided
         # by single video API anymore
 
-        for chapter in course_json['Chapters']:
-            for video in chapter['Videos']:
-                if video['HasAccess'] is False:
+        for chapter in course['Chapters']:
+            for video in chapter.get('Videos', []):
+                if video.get('HasAccess') is False:
                     unaccessible_videos += 1
                     continue
-                videos.append(video['ID'])
+                video_id = video.get('ID')
+                if video_id:
+                    entries.append({
+                        '_type': 'url_transparent',
+                        'url': 'http://www.lynda.com/%s/%s-4.html' % (course_path, video_id),
+                        'ie_key': LyndaIE.ie_key(),
+                        'chapter': chapter.get('Title'),
+                        'chapter_number': int_or_none(chapter.get('ChapterIndex')),
+                        'chapter_id': compat_str(chapter.get('ID')),
+                    })
 
         if unaccessible_videos > 0:
             self._downloader.report_warning(
                 '%s videos are only available for members (or paid members) and will not be downloaded. '
                 % unaccessible_videos + self._ACCOUNT_CREDENTIALS_HINT)
 
-        entries = [
-            self.url_result(
-                'http://www.lynda.com/%s/%s-4.html' % (course_path, video_id),
-                'Lynda')
-            for video_id in videos]
-
-        course_title = course_json['Title']
+        course_title = course.get('Title')
 
         return self.playlist_result(entries, course_id, course_title)
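
A minimal sketch of the entries LyndaCourseIE now emits (the URL, titles and IDs are hypothetical). Because each entry is url_transparent, youtube-dl resolves the URL through LyndaIE and then overlays the chapter fields onto the extracted info dict, so per-chapter metadata survives the indirection:

entry = {
    '_type': 'url_transparent',
    'url': 'http://www.lynda.com/Some-Course-Path/12345-4.html',  # hypothetical
    'ie_key': 'Lynda',
    'chapter': 'Introduction',
    'chapter_number': 1,
    'chapter_id': '678',
}
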
index 7e025831b51d611f00e248bda637b4ae8f35efb6..d5945ad66b3a784263fb1c5106534081b1f04913 100644 (file)
@@ -8,7 +8,7 @@ from .common import InfoExtractor
 
 class M6IE(InfoExtractor):
     IE_NAME = 'm6'
-    _VALID_URL = r'http://(?:www\.)?m6\.fr/[^/]+/videos/(?P<id>\d+)-[^\.]+\.html'
+    _VALID_URL = r'https?://(?:www\.)?m6\.fr/[^/]+/videos/(?P<id>\d+)-[^\.]+\.html'
 
     _TEST = {
         'url': 'http://www.m6.fr/emission-les_reines_du_shopping/videos/11323908-emeline_est_la_reine_du_shopping_sur_le_theme_ma_fete_d_8217_anniversaire.html',
index 54a14cb94c93dad587a83c58d58ec3d262f0eed8..9a7098c43c600a3cc3ed697252bc784d9a9cf5b7 100644 (file)
@@ -4,12 +4,16 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    remove_end,
+)
 
 
 class MailRuIE(InfoExtractor):
     IE_NAME = 'mailru'
     IE_DESC = 'Видео@Mail.Ru'
-    _VALID_URL = r'http://(?:www\.)?my\.mail\.ru/(?:video/.*#video=/?(?P<idv1>(?:[^/]+/){3}\d+)|(?:(?P<idv2prefix>(?:[^/]+/){2})video/(?P<idv2suffix>[^/]+/\d+))\.html)'
+    _VALID_URL = r'https?://(?:(?:www|m)\.)?my\.mail\.ru/(?:video/.*#video=/?(?P<idv1>(?:[^/]+/){3}\d+)|(?:(?P<idv2prefix>(?:[^/]+/){2})video/(?P<idv2suffix>[^/]+/\d+))\.html)'
 
     _TESTS = [
         {
@@ -25,6 +29,7 @@ class MailRuIE(InfoExtractor):
                 'uploader_id': 'sonypicturesrus@mail.ru',
                 'duration': 184,
             },
+            'skip': 'Not accessible from Travis CI server',
         },
         {
             'url': 'http://my.mail.ru/corp/hitech/video/news_hi-tech_mail_ru/1263.html',
@@ -33,13 +38,34 @@ class MailRuIE(InfoExtractor):
                 'id': '46843144_1263',
                 'ext': 'mp4',
                 'title': 'Samsung Galaxy S5 Hammer Smash Fail Battery Explosion',
-                'timestamp': 1397217632,
-                'upload_date': '20140411',
-                'uploader': 'hitech',
+                'timestamp': 1397039888,
+                'upload_date': '20140409',
+                'uploader': 'hitech@corp.mail.ru',
                 'uploader_id': 'hitech@corp.mail.ru',
                 'duration': 245,
             },
+            'skip': 'Not accessible from Travis CI server',
         },
+        {
+            # only available via metaUrl API
+            'url': 'http://my.mail.ru/mail/720pizle/video/_myvideo/502.html',
+            'md5': '3b26d2491c6949d031a32b96bd97c096',
+            'info_dict': {
+                'id': '56664382_502',
+                'ext': 'mp4',
+                'title': ':8336',
+                'timestamp': 1449094163,
+                'upload_date': '20151202',
+                'uploader': '720pizle@mail.ru',
+                'uploader_id': '720pizle@mail.ru',
+                'duration': 6001,
+            },
+            'skip': 'Not accessible from Travis CI server',
+        },
+        {
+            'url': 'http://m.my.mail.ru/mail/3sktvtr/video/_myvideo/138.html',
+            'only_matching': True,
+        }
     ]
 
     def _real_extract(self, url):
@@ -49,33 +75,56 @@ class MailRuIE(InfoExtractor):
         if not video_id:
             video_id = mobj.group('idv2prefix') + mobj.group('idv2suffix')
 
-        video_data = self._download_json(
-            'http://api.video.mail.ru/videos/%s.json?new=1' % video_id, video_id, 'Downloading video JSON')
+        webpage = self._download_webpage(url, video_id)
 
-        author = video_data['author']
-        uploader = author['name']
-        uploader_id = author.get('id') or author.get('email')
-        view_count = video_data.get('views_count')
+        video_data = None
 
-        meta_data = video_data['meta']
-        content_id = '%s_%s' % (
-            meta_data.get('accId', ''), meta_data['itemId'])
-        title = meta_data['title']
-        if title.endswith('.mp4'):
-            title = title[:-4]
-        thumbnail = meta_data['poster']
-        duration = meta_data['duration']
-        timestamp = meta_data['timestamp']
-
-        formats = [
-            {
-                'url': video['url'],
-                'format_id': video['key'],
-                'height': int(video['key'].rstrip('p'))
-            } for video in video_data['videos']
-        ]
+        page_config = self._parse_json(self._search_regex(
+            r'(?s)<script[^>]+class="sp-video__page-config"[^>]*>(.+?)</script>',
+            webpage, 'page config', default='{}'), video_id, fatal=False)
+        if page_config:
+            meta_url = page_config.get('metaUrl') or page_config.get('video', {}).get('metaUrl')
+            if meta_url:
+                video_data = self._download_json(
+                    meta_url, video_id, 'Downloading video meta JSON', fatal=False)
+
+        # Fallback old approach
+        if not video_data:
+            video_data = self._download_json(
+                'http://api.video.mail.ru/videos/%s.json?new=1' % video_id,
+                video_id, 'Downloading video JSON')
+
+        formats = []
+        for f in video_data['videos']:
+            video_url = f.get('url')
+            if not video_url:
+                continue
+            format_id = f.get('key')
+            height = int_or_none(self._search_regex(
+                r'^(\d+)[pP]$', format_id, 'height', default=None)) if format_id else None
+            formats.append({
+                'url': video_url,
+                'format_id': format_id,
+                'height': height,
+            })
         self._sort_formats(formats)
 
+        meta_data = video_data['meta']
+        title = remove_end(meta_data['title'], '.mp4')
+
+        author = video_data.get('author')
+        uploader = author.get('name')
+        uploader_id = author.get('id') or author.get('email')
+        view_count = int_or_none(video_data.get('viewsCount') or video_data.get('views_count'))
+
+        acc_id = meta_data.get('accId')
+        item_id = meta_data.get('itemId')
+        content_id = '%s_%s' % (acc_id, item_id) if acc_id and item_id else video_id
+
+        thumbnail = meta_data.get('poster')
+        duration = int_or_none(meta_data.get('duration'))
+        timestamp = int_or_none(meta_data.get('timestamp'))
+
         return {
             'id': content_id,
             'title': title,
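
The rewritten mail.ru format loop no longer assumes every format key ends in 'p'. A sketch of the height parsing with hypothetical keys:

import re

def height_from_key(format_id):
    # 1080 for '1080p', None for non-pixel keys such as 'hls'.
    m = re.search(r'^(\d+)[pP]$', format_id or '')
    return int(m.group(1)) if m else None

for key in ('1080p', '720P', 'hls', None):
    print(key, height_from_key(key))
# 1080p 1080 / 720P 720 / hls None / None None
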
diff --git a/youtube_dl/extractor/makerschannel.py b/youtube_dl/extractor/makerschannel.py
new file mode 100644 (file)
index 0000000..f5d00e6
--- /dev/null
@@ -0,0 +1,40 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+
+
+class MakersChannelIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?makerschannel\.com/.*(?P<id_type>video|production)_id=(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://makerschannel.com/en/zoomin/community-highlights?video_id=849',
+        'md5': '624a512c6969236b5967bf9286345ad1',
+        'info_dict': {
+            'id': '849',
+            'ext': 'mp4',
+            'title': 'Landing a bus on a plane is an epic win',
+            'uploader': 'ZoomIn',
+            'description': 'md5:cd9cca2ea7b69b78be81d07020c97139',
+        }
+    }
+
+    def _real_extract(self, url):
+        id_type, url_id = re.match(self._VALID_URL, url).groups()
+        webpage = self._download_webpage(url, url_id)
+        video_data = self._html_search_regex(r'<div([^>]+data-%s-id="%s"[^>]+)>' % (id_type, url_id), webpage, 'video data')
+
+        def extract_data_val(attr, fatal=False):
+            return self._html_search_regex(r'data-%s\s*=\s*"([^"]+)"' % attr, video_data, attr, fatal=fatal)
+        minoto_id = self._search_regex(r'/id/([a-zA-Z0-9]+)', extract_data_val('video-src', True), 'minoto id')
+
+        return {
+            '_type': 'url_transparent',
+            'url': 'minoto:%s' % minoto_id,
+            'id': extract_data_val('video-id', True),
+            'title': extract_data_val('title', True),
+            'description': extract_data_val('description'),
+            'thumbnail': extract_data_val('image'),
+            'uploader': extract_data_val('channel'),
+        }
diff --git a/youtube_dl/extractor/makertv.py b/youtube_dl/extractor/makertv.py
new file mode 100644 (file)
index 0000000..3c34d46
--- /dev/null
@@ -0,0 +1,32 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class MakerTVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:(?:www\.)?maker\.tv/(?:[^/]+/)*video|makerplayer\.com/embed/maker)/(?P<id>[a-zA-Z0-9]{12})'
+    _TEST = {
+        'url': 'http://www.maker.tv/video/Fh3QgymL9gsc',
+        'md5': 'ca237a53a8eb20b6dc5bd60564d4ab3e',
+        'info_dict': {
+            'id': 'Fh3QgymL9gsc',
+            'ext': 'mp4',
+            'title': 'Maze Runner: The Scorch Trials Official Movie Review',
+            'description': 'md5:11ff3362d7ef1d679fdb649f6413975a',
+            'upload_date': '20150918',
+            'timestamp': 1442549540,
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        jwplatform_id = self._search_regex(r'jw_?id="([^"]+)"', webpage, 'jwplatform id')
+
+        return {
+            '_type': 'url_transparent',
+            'id': video_id,
+            'url': 'jwplatform:%s' % jwplatform_id,
+            'ie_key': 'JWPlatform',
+        }
diff --git a/youtube_dl/extractor/matchtv.py b/youtube_dl/extractor/matchtv.py
new file mode 100644 (file)
index 0000000..80a0d70
--- /dev/null
@@ -0,0 +1,56 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import random
+
+from .common import InfoExtractor
+from ..compat import compat_urllib_parse_urlencode
+from ..utils import (
+    sanitized_Request,
+    xpath_text,
+)
+
+
+class MatchTVIE(InfoExtractor):
+    _VALID_URL = r'https?://matchtv\.ru/?#live-player'
+    _TEST = {
+        'url': 'http://matchtv.ru/#live-player',
+        'info_dict': {
+            'id': 'matchtv-live',
+            'ext': 'flv',
+            'title': 're:^Матч ТВ - Прямой эфир \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
+            'is_live': True,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = 'matchtv-live'
+        request = sanitized_Request(
+            'http://player.matchtv.ntvplus.tv/player/smil?%s' % compat_urllib_parse_urlencode({
+                'ts': '',
+                'quality': 'SD',
+                'contentId': '561d2c0df7159b37178b4567',
+                'sign': '',
+                'includeHighlights': '0',
+                'userId': '',
+                'sessionId': random.randint(1, 1000000000),
+                'contentType': 'channel',
+                'timeShift': '0',
+                'platform': 'portal',
+            }),
+            headers={
+                'Referer': 'http://player.matchtv.ntvplus.tv/embed-player/NTVEmbedPlayer.swf',
+            })
+        video_url = self._download_json(request, video_id)['data']['videoUrl']
+        f4m_url = xpath_text(self._download_xml(video_url, video_id), './to')
+        formats = self._extract_f4m_formats(f4m_url, video_id)
+        self._sort_formats(formats)
+        return {
+            'id': video_id,
+            'title': self._live_title('Матч ТВ - Прямой эфир'),
+            'is_live': True,
+            'formats': formats,
+        }
index 5fdd19027db3ccad0265601b8d88452a0eaac525..2100583df46ab7955846f8e3b08467d13ed3440e 100644 (file)
+# coding: utf-8
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
+from ..compat import compat_urlparse
+from ..utils import (
+    determine_ext,
+    int_or_none,
+    parse_duration,
+    parse_iso8601,
+    xpath_text,
+)
 
 
 class MDRIE(InfoExtractor):
-    _VALID_URL = r'^(?P<domain>https?://(?:www\.)?mdr\.de)/(?:.*)/(?P<type>video|audio)(?P<video_id>[^/_]+)(?:_|\.html)'
+    IE_DESC = 'MDR.DE and KiKA'
+    _VALID_URL = r'https?://(?:www\.)?(?:mdr|kika)\.de/(?:.*)/[a-z]+-?(?P<id>\d+)(?:_.+?)?\.html'
 
-    # No tests, MDR regularily deletes its videos
-    _TEST = {
+    _TESTS = [{
+        # MDR regularly deletes its videos
         'url': 'http://www.mdr.de/fakt/video189002.html',
         'only_matching': True,
-    }
+    }, {
+        # audio
+        'url': 'http://www.mdr.de/kultur/audio1312272_zc-15948bad_zs-86171fdd.html',
+        'md5': '64c4ee50f0a791deb9479cd7bbe9d2fa',
+        'info_dict': {
+            'id': '1312272',
+            'ext': 'mp3',
+            'title': 'Feuilleton vom 30. Oktober 2015',
+            'duration': 250,
+            'uploader': 'MITTELDEUTSCHER RUNDFUNK',
+        },
+    }, {
+        'url': 'http://www.kika.de/baumhaus/videos/video19636.html',
+        'md5': '4930515e36b06c111213e80d1e4aad0e',
+        'info_dict': {
+            'id': '19636',
+            'ext': 'mp4',
+            'title': 'Baumhaus vom 30. Oktober 2015',
+            'duration': 134,
+            'uploader': 'KIKA',
+        },
+    }, {
+        'url': 'http://www.kika.de/sendungen/einzelsendungen/weihnachtsprogramm/videos/video8182.html',
+        'md5': '5fe9c4dd7d71e3b238f04b8fdd588357',
+        'info_dict': {
+            'id': '8182',
+            'ext': 'mp4',
+            'title': 'Beutolomäus und der geheime Weihnachtswunsch',
+            'description': 'md5:b69d32d7b2c55cbe86945ab309d39bbd',
+            'timestamp': 1450950000,
+            'upload_date': '20151224',
+            'duration': 4628,
+            'uploader': 'KIKA',
+        },
+    }, {
+        'url': 'http://www.kika.de/baumhaus/sendungen/video19636_zc-fea7f8a0_zs-4bf89c60.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.kika.de/sendungen/einzelsendungen/weihnachtsprogramm/einzelsendung2534.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.mdr.de/mediathek/mdr-videos/a/video-1334.html',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
-        m = re.match(self._VALID_URL, url)
-        video_id = m.group('video_id')
-        domain = m.group('domain')
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
 
-        # determine title and media streams from webpage
-        html = self._download_webpage(url, video_id)
+        data_url = self._search_regex(
+            r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1',
+            webpage, 'data url', group='url').replace(r'\/', '/')
 
-        title = self._html_search_regex(r'<h[12]>(.*?)</h[12]>', html, 'title')
-        xmlurl = self._search_regex(
-            r'dataURL:\'(/(?:.+)/(?:video|audio)[0-9]+-avCustom.xml)', html, 'XML URL')
+        doc = self._download_xml(
+            compat_urlparse.urljoin(url, data_url), video_id)
+
+        title = xpath_text(doc, ['./title', './broadcast/broadcastName'], 'title', fatal=True)
 
-        doc = self._download_xml(domain + xmlurl, video_id)
         formats = []
-        for a in doc.findall('./assets/asset'):
-            url_el = a.find('.//progressiveDownloadUrl')
-            if url_el is None:
-                continue
-            abr = int(a.find('bitrateAudio').text) // 1000
-            media_type = a.find('mediaType').text
-            format = {
-                'abr': abr,
-                'filesize': int(a.find('fileSize').text),
-                'url': url_el.text,
-            }
-
-            vbr_el = a.find('bitrateVideo')
-            if vbr_el is None:
-                format.update({
-                    'vcodec': 'none',
-                    'format_id': '%s-%d' % (media_type, abr),
-                })
-            else:
-                vbr = int(vbr_el.text) // 1000
-                format.update({
-                    'vbr': vbr,
-                    'width': int(a.find('frameWidth').text),
-                    'height': int(a.find('frameHeight').text),
-                    'format_id': '%s-%d' % (media_type, vbr),
-                })
-            formats.append(format)
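+    # Several asset sources may expose the same URL; track processed ones to skip duplicates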
+        processed_urls = []
+        for asset in doc.findall('./assets/asset'):
+            for source in (
+                    'progressiveDownload',
+                    'dynamicHttpStreamingRedirector',
+                    'adaptiveHttpStreamingRedirector'):
+                url_el = asset.find('./%sUrl' % source)
+                if url_el is None:
+                    continue
+
+                video_url = url_el.text
+                if video_url in processed_urls:
+                    continue
+
+                processed_urls.append(video_url)
+
+                vbr = int_or_none(xpath_text(asset, './bitrateVideo', 'vbr'), 1000)
+                abr = int_or_none(xpath_text(asset, './bitrateAudio', 'abr'), 1000)
+
+                ext = determine_ext(video_url)
+                if ext == 'm3u8':
+                    url_formats = self._extract_m3u8_formats(
+                        video_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                        preference=0, m3u8_id='HLS', fatal=False)
+                elif ext == 'f4m':
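+                    # HDS manifests need the hdcore parameter appended to be playable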
+                    url_formats = self._extract_f4m_formats(
+                        video_url + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id,
+                        preference=0, f4m_id='HDS', fatal=False)
+                else:
+                    media_type = xpath_text(asset, './mediaType', 'media type', default='MP4')
+                    filesize = int_or_none(xpath_text(asset, './fileSize', 'file size'))
+
+                    f = {
+                        'url': video_url,
+                        'format_id': '%s-%d' % (media_type, vbr or abr),
+                        'filesize': filesize,
+                        'abr': abr,
+                        'preference': 1,
+                    }
+
+                    if vbr:
+                        width = int_or_none(xpath_text(asset, './frameWidth', 'width'))
+                        height = int_or_none(xpath_text(asset, './frameHeight', 'height'))
+                        f.update({
+                            'vbr': vbr,
+                            'width': width,
+                            'height': height,
+                        })
+
+                    url_formats = [f]
+
+                if not url_formats:
+                    continue
+
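+                # No video bitrate means an audio-only asset: reuse the parsers'
+                # total bitrate as abr and mark the formats video-less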
+                if not vbr:
+                    for f in url_formats:
+                        abr = f.get('tbr') or abr
+                        if 'tbr' in f:
+                            del f['tbr']
+                        f.update({
+                            'abr': abr,
+                            'vcodec': 'none',
+                        })
+
+                formats.extend(url_formats)
+
         self._sort_formats(formats)
 
+        description = xpath_text(doc, './broadcast/broadcastDescription', 'description')
+        timestamp = parse_iso8601(
+            xpath_text(
+                doc, [
+                    './broadcast/broadcastDate',
+                    './broadcast/broadcastStartDate',
+                    './broadcast/broadcastEndDate'],
+                'timestamp', default=None))
+        duration = parse_duration(xpath_text(doc, './duration', 'duration'))
+        uploader = xpath_text(doc, './rights', 'uploader')
+
         return {
             'id': video_id,
             'title': title,
+            'description': description,
+            'timestamp': timestamp,
+            'duration': duration,
+            'uploader': uploader,
             'formats': formats,
         }
diff --git a/youtube_dl/extractor/megavideoz.py b/youtube_dl/extractor/megavideoz.py
deleted file mode 100644 (file)
index af7ff07..0000000
+++ /dev/null
@@ -1,56 +0,0 @@
-# encoding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    ExtractorError,
-    float_or_none,
-    xpath_text,
-)
-
-
-class MegaVideozIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?megavideoz\.eu/video/(?P<id>[^/]+)(?:/(?P<display_id>[^/]+))?'
-    _TEST = {
-        'url': 'http://megavideoz.eu/video/WM6UB919XMXH/SMPTE-Universal-Film-Leader',
-        'info_dict': {
-            'id': '48723',
-            'display_id': 'SMPTE-Universal-Film-Leader',
-            'ext': 'mp4',
-            'title': 'SMPTE Universal Film Leader',
-            'thumbnail': 're:https?://.*?\.jpg',
-            'duration': 10.93,
-        }
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        display_id = mobj.group('display_id') or video_id
-
-        webpage = self._download_webpage(url, display_id)
-
-        if any(p in webpage for p in ('>Video Not Found<', '>404 Error<')):
-            raise ExtractorError('Video %s does not exist' % video_id, expected=True)
-
-        config = self._download_xml(
-            self._search_regex(
-                r"var\s+cnf\s*=\s*'([^']+)'", webpage, 'cnf url'),
-            display_id)
-
-        video_url = xpath_text(config, './file', 'video url', fatal=True)
-        title = xpath_text(config, './title', 'title', fatal=True)
-        thumbnail = xpath_text(config, './image', 'thumbnail')
-        duration = float_or_none(xpath_text(config, './duration', 'duration'))
-        video_id = xpath_text(config, './mediaid', 'video id') or video_id
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'url': video_url,
-            'title': title,
-            'thumbnail': thumbnail,
-            'duration': duration
-        }
index 6e2e73a5162f10ea5818b636da579c932b4f2e7d..b6f00cc25ff9c2769176e551b33bd5901c121f64 100644 (file)
@@ -5,19 +5,19 @@ import re
 from .common import InfoExtractor
 from ..compat import (
     compat_parse_qs,
-    compat_urllib_parse,
     compat_urllib_parse_unquote,
-    compat_urllib_request,
 )
 from ..utils import (
     determine_ext,
     ExtractorError,
     int_or_none,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
 class MetacafeIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?metacafe\.com/watch/([^/]+)/([^/]+)/.*'
+    _VALID_URL = r'https?://(?:www\.)?metacafe\.com/watch/([^/]+)/([^/]+)/.*'
     _DISCLAIMER = 'http://www.metacafe.com/family_filter/'
     _FILTER_POST = 'http://www.metacafe.com/f/index.php?inputType=filter&controllerGroup=user'
     IE_NAME = 'metacafe'
@@ -81,6 +81,9 @@ class MetacafeIE(InfoExtractor):
                 'title': 'Open: This is Face the Nation, February 9',
                 'description': 'md5:8a9ceec26d1f7ed6eab610834cc1a476',
                 'duration': 96,
+                'uploader': 'CBSI-NEW',
+                'upload_date': '20140209',
+                'timestamp': 1391959800,
             },
             'params': {
                 # rtmp download
@@ -117,7 +120,7 @@ class MetacafeIE(InfoExtractor):
             'filters': '0',
             'submit': "Continue - I'm over 18",
         }
-        request = compat_urllib_request.Request(self._FILTER_POST, compat_urllib_parse.urlencode(disclaimer_form))
+        request = sanitized_Request(self._FILTER_POST, urlencode_postdata(disclaimer_form))
         request.add_header('Content-Type', 'application/x-www-form-urlencoded')
         self.report_age_confirmation()
         self._download_webpage(request, None, False, 'Unable to confirm age')
@@ -142,7 +145,7 @@ class MetacafeIE(InfoExtractor):
                 return self.url_result('theplatform:%s' % ext_id, 'ThePlatform')
 
         # Retrieve video webpage to extract further information
-        req = compat_urllib_request.Request('http://www.metacafe.com/watch/%s/' % video_id)
+        req = sanitized_Request('http://www.metacafe.com/watch/%s/' % video_id)
 
         # AnyClip videos require the flashversion cookie so that we get the link
         # to the mp4 file
@@ -154,10 +157,10 @@ class MetacafeIE(InfoExtractor):
         # Extract URL, uploader and title from webpage
         self.report_extraction(video_id)
         video_url = None
-        mobj = re.search(r'(?m)&mediaURL=([^&]+)', webpage)
+        mobj = re.search(r'(?m)&(?:media|video)URL=([^&]+)', webpage)
         if mobj is not None:
             mediaURL = compat_urllib_parse_unquote(mobj.group(1))
-            video_ext = mediaURL[-3:]
+            video_ext = determine_ext(mediaURL)
 
             # Extract gdaKey if available
             mobj = re.search(r'(?m)&gdaKey=(.*?)&', webpage)
@@ -229,7 +232,7 @@ class MetacafeIE(InfoExtractor):
 
         age_limit = (
             18
-            if re.search(r'"contentRating":"restricted"', webpage)
+            if re.search(r'(?:"contentRating":|"rating",)"restricted"', webpage)
             else 0)
 
         if isinstance(video_url, list):
index e30320569805aedaa6694ae54f9086909593f7a4..444ec0310877e8377f78e88b07fd110ca9e6aa0d 100644 (file)
@@ -11,7 +11,7 @@ from ..utils import (
 class MetacriticIE(InfoExtractor):
     _VALID_URL = r'https?://www\.metacritic\.com/.+?/trailers/(?P<id>\d+)'
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://www.metacritic.com/game/playstation-4/infamous-second-son/trailers/3698222',
         'info_dict': {
             'id': '3698222',
@@ -20,7 +20,17 @@ class MetacriticIE(InfoExtractor):
             'description': 'Take a peak behind-the-scenes to see how Sucker Punch brings smoke into the universe of inFAMOUS Second Son on the PS4.',
             'duration': 221,
         },
-    }
+        'skip': 'Not providing trailers anymore',
+    }, {
+        'url': 'http://www.metacritic.com/game/playstation-4/tales-from-the-borderlands-a-telltale-game-series/trailers/5740315',
+        'info_dict': {
+            'id': '5740315',
+            'ext': 'mp4',
+            'title': 'Tales from the Borderlands - Finale: The Vault of the Traveler',
+            'description': 'In the final episode of the season, all hell breaks loose. Jack is now in control of Helios\' systems, and he\'s ready to reclaim his rightful place as king of Hyperion (with or without you).',
+            'duration': 114,
+        },
+    }]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
diff --git a/youtube_dl/extractor/mgtv.py b/youtube_dl/extractor/mgtv.py
new file mode 100644 (file)
index 0000000..a14d176
--- /dev/null
@@ -0,0 +1,63 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class MGTVIE(InfoExtractor):
+    _VALID_URL = r'https?://www\.mgtv\.com/v/(?:[^/]+/)*(?P<id>\d+)\.html'
+    IE_DESC = '芒果TV'
+
+    _TEST = {
+        'url': 'http://www.mgtv.com/v/1/290525/f/3116640.html',
+        'info_dict': {
+            'id': '3116640',
+            'ext': 'mp4',
+            'title': '我是歌手第四季双年巅峰会:韩红李玟“双王”领军对抗',
+            'description': '我是歌手第四季双年巅峰会',
+            'duration': 7461,
+            'thumbnail': 're:^https?://.*\.jpg$',
+        },
+        'params': {
+            'skip_download': True,  # m3u8 download
+        },
+    }
+
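+    # Maps the API's Chinese quality labels (SD/HD/super HD) to format ids and preferences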
+    _FORMAT_MAP = {
+        '标清': ('Standard', 0),
+        '高清': ('High', 1),
+        '超清': ('SuperHigh', 2),
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        api_data = self._download_json(
+            'http://v.api.mgtv.com/player/video', video_id,
+            query={'video_id': video_id})['data']
+        info = api_data['info']
+
+        formats = []
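+        # Each stream entry's URL returns JSON whose 'info' field is an m3u8 URL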
+        for idx, stream in enumerate(api_data['stream']):
+            format_name = stream.get('name')
+            format_id, preference = self._FORMAT_MAP.get(format_name, (None, None))
+            format_info = self._download_json(
+                stream['url'], video_id,
+                note='Download video info for format %s' % (format_id or '#%d' % idx))
+            formats.append({
+                'format_id': format_id,
+                'url': format_info['info'],
+                'ext': 'mp4',  # These are m3u8 playlists
+                'preference': preference,
+            })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': info['title'].strip(),
+            'formats': formats,
+            'description': info.get('desc'),
+            'duration': int_or_none(info.get('duration')),
+            'thumbnail': info.get('thumb'),
+        }
index 14934b7ec5579d3b7cfb4b16e5308e81301ace63..e6730b75a68d27c16e694fedaac088d27a0ab1ec 100644 (file)
@@ -2,14 +2,12 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
-)
 from ..utils import (
     int_or_none,
     parse_duration,
     parse_filesize,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
@@ -39,9 +37,9 @@ class MinhatecaIE(InfoExtractor):
             ('fileId', video_id),
             ('__RequestVerificationToken', token),
         ]
-        req = compat_urllib_request.Request(
+        req = sanitized_Request(
             'http://minhateca.com.br/action/License/Download',
-            data=compat_urllib_parse.urlencode(token_data))
+            data=urlencode_postdata(token_data))
         req.add_header('Content-Type', 'application/x-www-form-urlencoded')
         data = self._download_json(
             req, video_id, note='Downloading metadata')
index 949ad11db2ecd0c53e5cb4c361bc43aa779cb1e6..e48eba3fa7343bbdf964be583a680affa5ad29fa 100644 (file)
@@ -1,8 +1,5 @@
 from __future__ import unicode_literals
 
-import json
-import re
-
 from .common import InfoExtractor
 from ..utils import (
     ExtractorError,
@@ -20,21 +17,28 @@ class MinistryGridIE(InfoExtractor):
             'id': '3453494717001',
             'ext': 'mp4',
             'title': 'The Gospel by Numbers',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'upload_date': '20140410',
             'description': 'Coming soon from T4G 2014!',
-            'uploader': 'LifeWay Christian Resources (MG)',
+            'uploader_id': '2034960640001',
+            'timestamp': 1397145591,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
         },
+        'add_ie': ['TDSLifeway'],
     }
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
 
         webpage = self._download_webpage(url, video_id)
-        portlets_json = self._search_regex(
-            r'Liferay\.Portlet\.list=(\[.+?\])', webpage, 'portlet list')
-        portlets = json.loads(portlets_json)
+        portlets = self._parse_json(self._search_regex(
+            r'Liferay\.Portlet\.list=(\[.+?\])', webpage, 'portlet list'),
+            video_id)
         pl_id = self._search_regex(
-            r'<!--\s*p_l_id - ([0-9]+)<br>', webpage, 'p_l_id')
+            r'getPlid:function\(\){return"(\d+)"}', webpage, 'p_l_id')
 
         for i, portlet in enumerate(portlets):
             portlet_url = 'http://www.ministrygrid.com/c/portal/render_portlet?p_l_id=%s&p_p_id=%s' % (pl_id, portlet)
@@ -46,12 +50,8 @@ class MinistryGridIE(InfoExtractor):
                 r'<iframe.*?src="([^"]+)"', portlet_code, 'video iframe',
                 default=None)
             if video_iframe_url:
-                surl = smuggle_url(
-                    video_iframe_url, {'force_videoid': video_id})
-                return {
-                    '_type': 'url',
-                    'id': video_id,
-                    'url': surl,
-                }
+                return self.url_result(
+                    smuggle_url(video_iframe_url, {'force_videoid': video_id}),
+                    video_id=video_id)
 
         raise ExtractorError('Could not find video iframe in any portlets')
diff --git a/youtube_dl/extractor/minoto.py b/youtube_dl/extractor/minoto.py
new file mode 100644 (file)
index 0000000..959a105
--- /dev/null
@@ -0,0 +1,56 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class MinotoIE(InfoExtractor):
+    _VALID_URL = r'(?:minoto:|https?://(?:play|iframe|embed)\.minoto-video\.com/(?P<player_id>[0-9]+)/)(?P<id>[a-zA-Z0-9]+)'
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        player_id = mobj.group('player_id') or '1'
+        video_id = mobj.group('id')
+        video_data = self._download_json('http://play.minoto-video.com/%s/%s.js' % (player_id, video_id), video_id)
+        video_metadata = video_data['video-metadata']
+        formats = []
+        for fmt in video_data['video-files']:
+            fmt_url = fmt.get('url')
+            if not fmt_url:
+                continue
+            container = fmt.get('container')
+            if container == 'hls':
+                formats.extend(self._extract_m3u8_formats(
+                    fmt_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+            else:
+                fmt_profile = fmt.get('profile') or {}
+                f = {
+                    'format_id': fmt_profile.get('name-short'),
+                    'format_note': fmt_profile.get('name'),
+                    'url': fmt_url,
+                    'container': container,
+                    'tbr': int_or_none(fmt.get('bitrate')),
+                    'filesize': int_or_none(fmt.get('filesize')),
+                    'width': int_or_none(fmt.get('width')),
+                    'height': int_or_none(fmt.get('height')),
+                }
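+                # 'codecs' arrives as a comma-separated pair: video codec, then audio codec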
+                codecs = fmt.get('codecs')
+                if codecs:
+                    codecs = codecs.split(',')
+                    if len(codecs) == 2:
+                        f.update({
+                            'vcodec': codecs[0],
+                            'acodec': codecs[1],
+                        })
+                formats.append(f)
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': video_metadata['title'],
+            'description': video_metadata.get('description'),
+            'thumbnail': video_metadata.get('video-poster', {}).get('url'),
+            'formats': formats,
+        }
index a784fc5fba41c5931f6b1f040042e1900a6ff791..170ebd9eb9e285f91e4b8bd85c05b13f745039a8 100644 (file)
@@ -8,6 +8,7 @@ from ..utils import (
     xpath_text,
     int_or_none,
     ExtractorError,
+    sanitized_Request,
 )
 
 
@@ -51,6 +52,8 @@ class MioMioIE(InfoExtractor):
         mioplayer_path = self._search_regex(
             r'src="(/mioplayer/[^"]+)"', webpage, 'ref_path')
 
+        http_headers = {'Referer': 'http://www.miomio.tv%s' % mioplayer_path}
+
         xml_config = self._search_regex(
             r'flashvars="type=(?:sina|video)&amp;(.+?)&amp;',
             webpage, 'xml config')
@@ -60,14 +63,12 @@ class MioMioIE(InfoExtractor):
             'http://www.miomio.tv/mioplayer/mioplayerconfigfiles/xml.php?id=%s&r=%s' % (id, random.randint(100, 999)),
             video_id)
 
-        # the following xml contains the actual configuration information on the video file(s)
-        vid_config = self._download_xml(
+        vid_config_request = sanitized_Request(
             'http://www.miomio.tv/mioplayer/mioplayerconfigfiles/sina.php?{0}'.format(xml_config),
-            video_id)
+            headers=http_headers)
 
-        http_headers = {
-            'Referer': 'http://www.miomio.tv%s' % mioplayer_path,
-        }
+        # the following xml contains the actual configuration information on the video file(s)
+        vid_config = self._download_xml(vid_config_request, video_id)
 
         if not int_or_none(xpath_text(vid_config, 'timelength')):
             raise ExtractorError('Unable to load videos!', expected=True)
index d7ab6a9aef23235d099175c7aff76ddd0ac0f84d..1aea78d118a84a135494214da54c3c2c21465bc9 100644 (file)
@@ -18,12 +18,12 @@ class TechTVMITIE(InfoExtractor):
 
     _TEST = {
         'url': 'http://techtv.mit.edu/videos/25418-mit-dna-learning-center-set',
-        'md5': '1f8cb3e170d41fd74add04d3c9330e5f',
+        'md5': '00a3a27ee20d44bcaa0933ccec4a2cf7',
         'info_dict': {
             'id': '25418',
             'ext': 'mp4',
-            'title': 'MIT DNA Learning Center Set',
-            'description': 'md5:82313335e8a8a3f243351ba55bc1b474',
+            'title': 'MIT DNA and Protein Sets',
+            'description': 'md5:46f5c69ce434f0a97e7c628cc142802d',
         },
     }
 
@@ -33,8 +33,8 @@ class TechTVMITIE(InfoExtractor):
             'http://techtv.mit.edu/videos/%s' % video_id, video_id)
         clean_page = re.compile(r'<!--.*?-->', re.S).sub('', raw_page)
 
-        base_url = self._search_regex(
-            r'ipadUrl: \'(.+?cloudfront.net/)', raw_page, 'base url')
+        base_url = self._proto_relative_url(self._search_regex(
+            r'ipadUrl: \'(.+?cloudfront.net/)', raw_page, 'base url'), 'http:')
         formats_json = self._search_regex(
             r'bitrates: (\[.+?\])', raw_page, 'video formats')
         formats_mit = json.loads(formats_json)
@@ -86,12 +86,12 @@ class MITIE(TechTVMITIE):
         webpage = self._download_webpage(url, page_title)
         embed_url = self._search_regex(
             r'<iframe .*?src="(.+?)"', webpage, 'embed url')
-        return self.url_result(embed_url, ie='TechTVMIT')
+        return self.url_result(embed_url)
 
 
 class OCWMITIE(InfoExtractor):
     IE_NAME = 'ocw.mit.edu'
-    _VALID_URL = r'^http://ocw\.mit\.edu/courses/(?P<topic>[a-z0-9\-]+)'
+    _VALID_URL = r'^https?://ocw\.mit\.edu/courses/(?P<topic>[a-z0-9\-]+)'
     _BASE_URL = 'http://ocw.mit.edu/'
 
     _TESTS = [
@@ -99,7 +99,7 @@ class OCWMITIE(InfoExtractor):
             'url': 'http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-041-probabilistic-systems-analysis-and-applied-probability-fall-2010/video-lectures/lecture-7-multiple-variables-expectations-independence/',
             'info_dict': {
                 'id': 'EObHWIEKGjA',
-                'ext': 'mp4',
+                'ext': 'webm',
                 'title': 'Lecture 7: Multiple Discrete Random Variables: Expectations, Conditioning, Independence',
                 'description': 'In this lecture, the professor discussed multiple random variables, expectations, and binomial distribution.',
                 'upload_date': '20121109',
index 852d722664a3d63aafed0f8246949335b4150c09..7b4581dc58415f508ca0d34d61a5cd96b0b08e31 100644 (file)
@@ -1,74 +1,89 @@
 from __future__ import unicode_literals
 
-import json
-
 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_parse_unquote,
+    compat_urllib_parse_urlencode,
     compat_urlparse,
 )
 from ..utils import (
     get_element_by_attribute,
-    parse_duration,
-    strip_jsonp,
+    int_or_none,
 )
 
 
 class MiTeleIE(InfoExtractor):
-    IE_NAME = 'mitele.es'
-    _VALID_URL = r'http://www\.mitele\.es/[^/]+/[^/]+/[^/]+/(?P<id>[^/]+)/'
+    IE_DESC = 'mitele.es'
+    _VALID_URL = r'https?://www\.mitele\.es/[^/]+/[^/]+/[^/]+/(?P<id>[^/]+)/'
 
     _TESTS = [{
         'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/',
+        'md5': '0ff1a13aebb35d9bc14081ff633dd324',
         'info_dict': {
-            'id': '0fce117d',
-            'ext': 'mp4',
-            'title': 'Programa 144 - Tor, la web invisible',
-            'description': 'md5:3b6fce7eaa41b2d97358726378d9369f',
+            'id': '0NF1jJnxS1Wu3pHrmvFyw2',
             'display_id': 'programa-144',
+            'ext': 'flv',
+            'title': 'Tor, la web invisible',
+            'description': 'md5:3b6fce7eaa41b2d97358726378d9369f',
+            'thumbnail': 're:(?i)^https?://.*\.jpg$',
             'duration': 2913,
         },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        },
     }]
 
     def _real_extract(self, url):
-        episode = self._match_id(url)
-        webpage = self._download_webpage(url, episode)
-        embed_data_json = self._search_regex(
-            r'(?s)MSV\.embedData\[.*?\]\s*=\s*({.*?});', webpage, 'embed data',
-        ).replace('\'', '"')
-        embed_data = json.loads(embed_data_json)
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        config_url = self._search_regex(
+            r'data-config\s*=\s*"([^"]+)"', webpage, 'data config url')
+        config_url = compat_urlparse.urljoin(url, config_url)
 
-        domain = embed_data['mediaUrl']
-        if not domain.startswith('http'):
-            # only happens in telecinco.es videos
-            domain = 'http://' + domain
-        info_url = compat_urlparse.urljoin(
-            domain,
-            compat_urllib_parse_unquote(embed_data['flashvars']['host'])
-        )
-        info_el = self._download_xml(info_url, episode).find('./video/info')
+        config = self._download_json(
+            config_url, display_id, 'Downloading config JSON')
 
-        video_link = info_el.find('videoUrl/link').text
-        token_query = compat_urllib_parse.urlencode({'id': video_link})
-        token_info = self._download_json(
-            embed_data['flashvars']['ov_tk'] + '?' + token_query,
-            episode,
-            transform_source=strip_jsonp
-        )
-        formats = self._extract_m3u8_formats(
-            token_info['tokenizedUrl'], episode, ext='mp4')
+        mmc = self._download_json(
+            config['services']['mmc'], display_id, 'Downloading mmc JSON')
+
+        formats = []
+        for location in mmc['locations']:
+            gat = self._proto_relative_url(location.get('gat'), 'http:')
+            bas = location.get('bas')
+            loc = location.get('loc')
+            ogn = location.get('ogn')
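+            # all four parameters are needed to build the token request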
+            if None in (gat, bas, loc, ogn):
+                continue
+            token_data = {
+                'bas': bas,
+                'icd': loc,
+                'ogn': ogn,
+                'sta': '0',
+            }
+            media = self._download_json(
+                '%s/?%s' % (gat, compat_urllib_parse_urlencode(token_data)),
+                display_id, 'Downloading %s JSON' % loc)
+            file_ = media.get('file')
+            if not file_:
+                continue
+            formats.extend(self._extract_f4m_formats(
+                file_ + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
+                display_id, f4m_id=loc))
+        self._sort_formats(formats)
+
+        title = self._search_regex(
+            r'class="Destacado-text"[^>]*>\s*<strong>([^<]+)</strong>', webpage, 'title')
+
+        video_id = self._search_regex(
+            r'data-media-id\s*=\s*"([^"]+)"', webpage,
+            'data media id', default=None) or display_id
+        thumbnail = config.get('poster', {}).get('imageUrl')
+        duration = int_or_none(mmc.get('duration'))
 
         return {
-            'id': embed_data['videoId'],
-            'display_id': episode,
-            'title': info_el.find('title').text,
-            'formats': formats,
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
             'description': get_element_by_attribute('class', 'text', webpage),
-            'thumbnail': info_el.find('thumb').text,
-            'duration': parse_duration(info_el.find('duration').text),
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'formats': formats,
         }
index d47aecedae388829babaed8642611c5a6b7d29fe..483f6925fda989fc5111694c8c82f1807a1f3d97 100644 (file)
@@ -1,25 +1,35 @@
 from __future__ import unicode_literals
 
+import base64
+import functools
+import itertools
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote
+from ..compat import (
+    compat_chr,
+    compat_ord,
+    compat_urllib_parse_unquote,
+    compat_urlparse,
+)
 from ..utils import (
+    clean_html,
     ExtractorError,
-    HEADRequest,
+    OnDemandPagedList,
+    parse_count,
     str_to_int,
 )
 
 
 class MixcloudIE(InfoExtractor):
-    _VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([^/]+)/([^/]+)'
+    _VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([^/]+)/(?!stream|uploads|favorites|listens|playlists)([^/]+)'
     IE_NAME = 'mixcloud'
 
     _TESTS = [{
         'url': 'http://www.mixcloud.com/dholbach/cryptkeeper/',
         'info_dict': {
             'id': 'dholbach-cryptkeeper',
-            'ext': 'mp3',
+            'ext': 'm4a',
             'title': 'Cryptkeeper',
             'description': 'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.',
             'uploader': 'Daniel Holbach',
@@ -37,22 +47,22 @@ class MixcloudIE(InfoExtractor):
             'description': 'md5:2b8aec6adce69f9d41724647c65875e8',
             'uploader': 'Gilles Peterson Worldwide',
             'uploader_id': 'gillespeterson',
-            'thumbnail': 're:https?://.*/images/',
+            'thumbnail': 're:https?://.*',
             'view_count': int,
             'like_count': int,
         },
     }]
 
-    def _check_url(self, url, track_id, ext):
-        try:
-            # We only want to know if the request succeed
-            # don't download the whole file
-            self._request_webpage(
-                HEADRequest(url), track_id,
-                'Trying %s URL' % ext)
-            return True
-        except ExtractorError:
-            return False
+    # See https://www.mixcloud.com/media/js2/www_js_2.9e23256562c080482435196ca3975ab5.js
+    @staticmethod
+    def _decrypt_play_info(play_info):
+        KEY = 'pleasedontdownloadourmusictheartistswontgetpaid'
+
+        play_info = base64.b64decode(play_info.encode('ascii'))
+
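+        # XOR the decoded bytes with the repeating key to recover the play info JSON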
+        return ''.join([
+            compat_chr(compat_ord(ch) ^ compat_ord(KEY[idx % len(KEY)]))
+            for idx, ch in enumerate(play_info)])
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
@@ -62,13 +72,19 @@ class MixcloudIE(InfoExtractor):
 
         webpage = self._download_webpage(url, track_id)
 
-        preview_url = self._search_regex(
-            r'\s(?:data-preview-url|m-preview)="([^"]+)"', webpage, 'preview url')
-        song_url = preview_url.replace('/previews/', '/c/originals/')
-        if not self._check_url(song_url, track_id, 'mp3'):
-            song_url = song_url.replace('.mp3', '.m4a').replace('originals/', 'm4a/64/')
-            if not self._check_url(song_url, track_id, 'm4a'):
-                raise ExtractorError('Unable to extract track url')
+        message = self._html_search_regex(
+            r'(?s)<div[^>]+class="global-message cloudcast-disabled-notice-light"[^>]*>(.+?)<(?:a|/div)',
+            webpage, 'error message', default=None)
+
+        encrypted_play_info = self._search_regex(
+            r'm-play-info="([^"]+)"', webpage, 'play info')
+        play_info = self._parse_json(
+            self._decrypt_play_info(encrypted_play_info), track_id)
+
+        if message and 'stream_url' not in play_info:
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
+
+        song_url = play_info['stream_url']
 
         PREFIX = (
             r'm-play-on-spacebar[^>]+'
@@ -84,8 +100,8 @@ class MixcloudIE(InfoExtractor):
         uploader_id = self._search_regex(
             r'\s+"profile": "([^"]+)",', webpage, 'uploader id', fatal=False)
         description = self._og_search_description(webpage)
-        like_count = str_to_int(self._search_regex(
-            r'\bbutton-favorite\b[^>]+m-ajax-toggle-count="([^"]+)"',
+        like_count = parse_count(self._search_regex(
+            r'\bbutton-favorite[^>]+>.*?<span[^>]+class=["\']toggle-number[^>]+>\s*([^<]+)',
             webpage, 'like count', fatal=False))
         view_count = str_to_int(self._search_regex(
             [r'<meta itemprop="interactionCount" content="UserPlays:([0-9]+)"',
@@ -103,3 +119,201 @@ class MixcloudIE(InfoExtractor):
             'view_count': view_count,
             'like_count': like_count,
         }
+
+
+class MixcloudPlaylistBaseIE(InfoExtractor):
+    _PAGE_SIZE = 24
+
+    def _find_urls_in_page(self, page):
+        for url in re.findall(r'm-play-button m-url="(?P<url>[^"]+)"', page):
+            yield self.url_result(
+                compat_urlparse.urljoin('https://www.mixcloud.com', clean_html(url)),
+                MixcloudIE.ie_key())
+
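+    # List pages are fetched through the site's AJAX interface (_ajax=1)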
+    def _fetch_tracks_page(self, path, video_id, page_name, current_page, real_page_number=None):
+        real_page_number = real_page_number or current_page + 1
+        return self._download_webpage(
+            'https://www.mixcloud.com/%s/' % path, video_id,
+            note='Download %s (page %d)' % (page_name, current_page + 1),
+            errnote='Unable to download %s' % page_name,
+            query={'page': real_page_number, 'list': 'main', '_ajax': '1'},
+            headers={'X-Requested-With': 'XMLHttpRequest'})
+
+    def _tracks_page_func(self, page, video_id, page_name, current_page):
+        resp = self._fetch_tracks_page(page, video_id, page_name, current_page)
+
+        for item in self._find_urls_in_page(resp):
+            yield item
+
+    def _get_user_description(self, page_content):
+        return self._html_search_regex(
+            r'<div[^>]+class="description-text"[^>]*>(.+?)</div>',
+            page_content, 'user description', fatal=False)
+
+
+class MixcloudUserIE(MixcloudPlaylistBaseIE):
+    _VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<user>[^/]+)/(?P<type>uploads|favorites|listens)?/?$'
+    IE_NAME = 'mixcloud:user'
+
+    _TESTS = [{
+        'url': 'http://www.mixcloud.com/dholbach/',
+        'info_dict': {
+            'id': 'dholbach_uploads',
+            'title': 'Daniel Holbach (uploads)',
+            'description': 'md5:327af72d1efeb404a8216c27240d1370',
+        },
+        'playlist_mincount': 11,
+    }, {
+        'url': 'http://www.mixcloud.com/dholbach/uploads/',
+        'info_dict': {
+            'id': 'dholbach_uploads',
+            'title': 'Daniel Holbach (uploads)',
+            'description': 'md5:327af72d1efeb404a8216c27240d1370',
+        },
+        'playlist_mincount': 11,
+    }, {
+        'url': 'http://www.mixcloud.com/dholbach/favorites/',
+        'info_dict': {
+            'id': 'dholbach_favorites',
+            'title': 'Daniel Holbach (favorites)',
+            'description': 'md5:327af72d1efeb404a8216c27240d1370',
+        },
+        'params': {
+            'playlist_items': '1-100',
+        },
+        'playlist_mincount': 100,
+    }, {
+        'url': 'http://www.mixcloud.com/dholbach/listens/',
+        'info_dict': {
+            'id': 'dholbach_listens',
+            'title': 'Daniel Holbach (listens)',
+            'description': 'md5:327af72d1efeb404a8216c27240d1370',
+        },
+        'params': {
+            'playlist_items': '1-100',
+        },
+        'playlist_mincount': 100,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        user_id = mobj.group('user')
+        list_type = mobj.group('type')
+
+        # if only a profile URL was supplied, default to downloading all uploads
+        if list_type is None:
+            list_type = 'uploads'
+
+        video_id = '%s_%s' % (user_id, list_type)
+
+        profile = self._download_webpage(
+            'https://www.mixcloud.com/%s/' % user_id, video_id,
+            note='Downloading user profile',
+            errnote='Unable to download user profile')
+
+        username = self._og_search_title(profile)
+        description = self._get_user_description(profile)
+
+        entries = OnDemandPagedList(
+            functools.partial(
+                self._tracks_page_func,
+                '%s/%s' % (user_id, list_type), video_id, 'list of %s' % list_type),
+            self._PAGE_SIZE, use_cache=True)
+
+        return self.playlist_result(
+            entries, video_id, '%s (%s)' % (username, list_type), description)
+
+
+class MixcloudPlaylistIE(MixcloudPlaylistBaseIE):
+    _VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<user>[^/]+)/playlists/(?P<playlist>[^/]+)/?$'
+    IE_NAME = 'mixcloud:playlist'
+
+    _TESTS = [{
+        'url': 'https://www.mixcloud.com/RedBullThre3style/playlists/tokyo-finalists-2015/',
+        'info_dict': {
+            'id': 'RedBullThre3style_tokyo-finalists-2015',
+            'title': 'National Champions 2015',
+            'description': 'md5:6ff5fb01ac76a31abc9b3939c16243a3',
+        },
+        'playlist_mincount': 16,
+    }, {
+        'url': 'https://www.mixcloud.com/maxvibes/playlists/jazzcat-on-ness-radio/',
+        'info_dict': {
+            'id': 'maxvibes_jazzcat-on-ness-radio',
+            'title': 'Jazzcat on Ness Radio',
+            'description': 'md5:7bbbf0d6359a0b8cda85224be0f8f263',
+        },
+        'playlist_mincount': 23
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        user_id = mobj.group('user')
+        playlist_id = mobj.group('playlist')
+        video_id = '%s_%s' % (user_id, playlist_id)
+
+        profile = self._download_webpage(
+            url, user_id,
+            note='Downloading playlist page',
+            errnote='Unable to download playlist page')
+
+        description = self._get_user_description(profile)
+        playlist_title = self._html_search_regex(
+            r'<span[^>]+class="[^"]*list-playlist-title[^"]*"[^>]*>(.*?)</span>',
+            profile, 'playlist title')
+
+        entries = OnDemandPagedList(
+            functools.partial(
+                self._tracks_page_func,
+                '%s/playlists/%s' % (user_id, playlist_id), video_id, 'tracklist'),
+            self._PAGE_SIZE)
+
+        return self.playlist_result(entries, video_id, playlist_title, description)
+
+
+class MixcloudStreamIE(MixcloudPlaylistBaseIE):
+    _VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<id>[^/]+)/stream/?$'
+    IE_NAME = 'mixcloud:stream'
+
+    _TEST = {
+        'url': 'https://www.mixcloud.com/FirstEar/stream/',
+        'info_dict': {
+            'id': 'FirstEar',
+            'title': 'First Ear',
+            'description': 'Curators of good music\nfirstearmusic.com',
+        },
+        'playlist_mincount': 192,
+    }
+
+    def _real_extract(self, url):
+        user_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, user_id)
+
+        entries = []
+        prev_page_url = None
+
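+        # Each page advertises its successor via m-next-page-url; follow the
+        # chain until it ends or stops changing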
+        def _handle_page(page):
+            entries.extend(self._find_urls_in_page(page))
+            return self._search_regex(
+                r'm-next-page-url="([^"]+)"', page,
+                'next page URL', default=None)
+
+        next_page_url = _handle_page(webpage)
+
+        for idx in itertools.count(0):
+            if not next_page_url or prev_page_url == next_page_url:
+                break
+
+            prev_page_url = next_page_url
+            current_page = int(self._search_regex(
+                r'\?page=(\d+)', next_page_url, 'next page number'))
+
+            next_page_url = _handle_page(self._fetch_tracks_page(
+                '%s/stream' % user_id, user_id, 'stream', idx,
+                real_page_number=current_page))
+
+        username = self._og_search_title(webpage)
+        description = self._get_user_description(webpage)
+
+        return self.playlist_result(entries, user_id, username, description)
diff --git a/youtube_dl/extractor/mnet.py b/youtube_dl/extractor/mnet.py
new file mode 100644 (file)
index 0000000..e3f42e7
--- /dev/null
@@ -0,0 +1,81 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_duration,
+    parse_iso8601,
+)
+
+
+class MnetIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?mnet\.(?:com|interest\.me)/tv/vod/(?:.*?\bclip_id=)?(?P<id>[0-9]+)'
+    _TESTS = [{
+        'url': 'http://www.mnet.com/tv/vod/171008',
+        'info_dict': {
+            'id': '171008',
+            'title': 'SS_이해인@히든박스',
+            'description': 'md5:b9efa592c3918b615ba69fe9f8a05c55',
+            'duration': 88,
+            'upload_date': '20151231',
+            'timestamp': 1451564040,
+            'age_limit': 0,
+            'thumbnails': 'mincount:5',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'ext': 'flv',
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://mnet.interest.me/tv/vod/172790',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.mnet.com/tv/vod/vod_view.asp?clip_id=172790&tabMenu=',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        info = self._download_json(
+            'http://content.api.mnet.com/player/vodConfig?id=%s&ctype=CLIP' % video_id,
+            video_id, 'Downloading vod config JSON')['data']['info']
+
+        title = info['title']
+
+        rtmp_info = self._download_json(
+            info['cdn'], video_id, 'Downloading vod cdn JSON')
+
+        formats = [{
+            'url': rtmp_info['serverurl'] + rtmp_info['fileurl'],
+            'ext': 'flv',
+            'page_url': url,
+            'player_url': 'http://flvfile.mnet.com/service/player/201602/cjem_player_tv.swf?v=201602191318',
+        }]
+
+        description = info.get('ment')
+        duration = parse_duration(info.get('time'))
+        timestamp = parse_iso8601(info.get('date'), delimiter=' ')
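+        # 'adult' appears to be an N/Y flag; anything but 'N' is treated as 18+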
+        age_limit = info.get('adult')
+        if age_limit is not None:
+            age_limit = 0 if age_limit == 'N' else 18
+        thumbnails = [{
+            'id': thumb_format,
+            'url': thumb['url'],
+            'width': int_or_none(thumb.get('width')),
+            'height': int_or_none(thumb.get('height')),
+        } for thumb_format, thumb in info.get('cover', {}).items() if thumb.get('url')]
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'timestamp': timestamp,
+            'age_limit': age_limit,
+            'thumbnails': thumbnails,
+            'formats': formats,
+        }
index 5a66302f6ec317f89c4153248565159ebd075010..978d5d5bfeaf5ff64b7279343876a3177c43339a 100644 (file)
@@ -5,13 +5,11 @@ import json
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
-)
 from ..utils import (
     ExtractorError,
     int_or_none,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
@@ -79,8 +77,8 @@ class MoeVideoIE(InfoExtractor):
             ],
         ]
         r_json = json.dumps(r)
-        post = compat_urllib_parse.urlencode({'r': r_json})
-        req = compat_urllib_request.Request(self._API_URL, post)
+        post = urlencode_postdata({'r': r_json})
+        req = sanitized_Request(self._API_URL, post)
         req.add_header('Content-type', 'application/x-www-form-urlencoded')
 
         response = self._download_json(req, video_id)
index 9bf99a54a98c4838c2b878db3ec165c867602110..e47c8011924cb0f5ecddefd33b35debd0324d5a9 100644 (file)
@@ -7,8 +7,8 @@ from .common import InfoExtractor
 from ..compat import (
     compat_urllib_parse_unquote,
     compat_urllib_parse_urlparse,
-    compat_urllib_request,
 )
+from ..utils import sanitized_Request
 
 
 class MofosexIE(InfoExtractor):
@@ -29,7 +29,7 @@ class MofosexIE(InfoExtractor):
         video_id = mobj.group('id')
         url = 'http://www.' + mobj.group('url')
 
-        req = compat_urllib_request.Request(url)
+        req = sanitized_Request(url)
         req.add_header('Cookie', 'age_verified=1')
         webpage = self._download_webpage(req, video_id)
 
@@ -38,7 +38,7 @@ class MofosexIE(InfoExtractor):
         path = compat_urllib_parse_urlparse(video_url).path
         extension = os.path.splitext(path)[1][1:]
         format = path.split('/')[5].split('_')[:2]
-        format = "-".join(format)
+        format = '-'.join(format)
 
         age_limit = self._rta_search(webpage)
 
index 88dcd4f737544356091220d53078bc1c2e222d76..b208820fe64970b3f7b362dba13c534a6955686d 100644 (file)
@@ -5,16 +5,17 @@ import os.path
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
+from ..utils import (
+    ExtractorError,
+    remove_start,
+    sanitized_Request,
+    urlencode_postdata,
 )
-from ..utils import ExtractorError
 
 
 class MonikerIE(InfoExtractor):
     IE_DESC = 'allmyvideos.net and vidspot.net'
-    _VALID_URL = r'https?://(?:www\.)?(?:allmyvideos|vidspot)\.net/(?P<id>[a-zA-Z0-9_-]+)'
+    _VALID_URL = r'https?://(?:www\.)?(?:allmyvideos|vidspot)\.net/(?:(?:2|v)/v-)?(?P<id>[a-zA-Z0-9_-]+)'
 
     _TESTS = [{
         'url': 'http://allmyvideos.net/jih3nce3x6wn',
@@ -24,6 +25,14 @@ class MonikerIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'youtube-dl test video',
         },
+    }, {
+        'url': 'http://allmyvideos.net/embed-jih3nce3x6wn',
+        'md5': '710883dee1bfc370ecf9fa6a89307c88',
+        'info_dict': {
+            'id': 'jih3nce3x6wn',
+            'ext': 'mp4',
+            'title': 'youtube-dl test video',
+        },
     }, {
         'url': 'http://vidspot.net/l2ngsmhs8ci5',
         'md5': '710883dee1bfc370ecf9fa6a89307c88',
@@ -35,10 +44,25 @@ class MonikerIE(InfoExtractor):
     }, {
         'url': 'https://www.vidspot.net/l2ngsmhs8ci5',
         'only_matching': True,
+    }, {
+        'url': 'http://vidspot.net/2/v-ywDf99',
+        'md5': '5f8254ce12df30479428b0152fb8e7ba',
+        'info_dict': {
+            'id': 'ywDf99',
+            'ext': 'mp4',
+            'title': 'IL FAIT LE MALIN EN PORSHE CAYENNE ( mais pas pour longtemps)',
+            'description': 'IL FAIT LE MALIN EN PORSHE CAYENNE.',
+        },
+    }, {
+        'url': 'http://allmyvideos.net/v/v-HXZm5t',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
+        orig_video_id = self._match_id(url)
+        video_id = remove_start(orig_video_id, 'embed-')
+        url = url.replace(orig_video_id, video_id)
+        assert re.match(self._VALID_URL, url) is not None
         orig_webpage = self._download_webpage(url, video_id)
 
         if '>File Not Found<' in orig_webpage:
@@ -50,18 +74,30 @@ class MonikerIE(InfoExtractor):
             raise ExtractorError(
                 '%s returned error: %s' % (self.IE_NAME, error), expected=True)
 
-        fields = re.findall(r'type="hidden" name="(.+?)"\s* value="?(.+?)">', orig_webpage)
-        data = dict(fields)
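+        # Newer pages embed the player in a builtin- iframe; otherwise fall
+        # back to the legacy hidden-form POST flow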
+        builtin_url = self._search_regex(
+            r'<iframe[^>]+src=(["\'])(?P<url>.+?/builtin-.+?)\1',
+            orig_webpage, 'builtin URL', default=None, group='url')
 
-        post = compat_urllib_parse.urlencode(data)
-        headers = {
-            b'Content-Type': b'application/x-www-form-urlencoded',
-        }
-        req = compat_urllib_request.Request(url, post, headers)
-        webpage = self._download_webpage(
-            req, video_id, note='Downloading video page ...')
+        if builtin_url:
+            req = sanitized_Request(builtin_url)
+            req.add_header('Referer', url)
+            webpage = self._download_webpage(req, video_id, 'Downloading builtin page')
+            title = self._og_search_title(orig_webpage).strip()
+            description = self._og_search_description(orig_webpage).strip()
+        else:
+            fields = re.findall(r'type="hidden" name="(.+?)"\s* value="?(.+?)">', orig_webpage)
+            data = dict(fields)
+
+            post = urlencode_postdata(data)
+            headers = {
+                b'Content-Type': b'application/x-www-form-urlencoded',
+            }
+            req = sanitized_Request(url, post, headers)
+            webpage = self._download_webpage(
+                req, video_id, note='Downloading video page ...')
 
-        title = os.path.splitext(data['fname'])[0]
+            title = os.path.splitext(data['fname'])[0]
+            description = None
 
         # Could be several links with different quality
         links = re.findall(r'"file" : "?(.+?)",', webpage)
@@ -75,5 +111,6 @@ class MonikerIE(InfoExtractor):
         return {
             'id': video_id,
             'title': title,
+            'description': description,
             'formats': formats,
         }
diff --git a/youtube_dl/extractor/mooshare.py b/youtube_dl/extractor/mooshare.py
deleted file mode 100644 (file)
index 7603af5..0000000
+++ /dev/null
@@ -1,112 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-    compat_urllib_parse,
-)
-from ..utils import (
-    ExtractorError,
-)
-
-
-class MooshareIE(InfoExtractor):
-    IE_NAME = 'mooshare'
-    IE_DESC = 'Mooshare.biz'
-    _VALID_URL = r'http://(?:www\.)?mooshare\.biz/(?P<id>[\da-z]{12})'
-
-    _TESTS = [
-        {
-            'url': 'http://mooshare.biz/8dqtk4bjbp8g',
-            'md5': '4e14f9562928aecd2e42c6f341c8feba',
-            'info_dict': {
-                'id': '8dqtk4bjbp8g',
-                'ext': 'mp4',
-                'title': 'Comedy Football 2011 - (part 1-2)',
-                'duration': 893,
-            },
-        },
-        {
-            'url': 'http://mooshare.biz/aipjtoc4g95j',
-            'info_dict': {
-                'id': 'aipjtoc4g95j',
-                'ext': 'mp4',
-                'title': 'Orange Caramel  Dashing Through the Snow',
-                'duration': 212,
-            },
-            'params': {
-                # rtmp download
-                'skip_download': True,
-            }
-        }
-    ]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        page = self._download_webpage(url, video_id, 'Downloading page')
-
-        if re.search(r'>Video Not Found or Deleted<', page) is not None:
-            raise ExtractorError('Video %s does not exist' % video_id, expected=True)
-
-        hash_key = self._html_search_regex(r'<input type="hidden" name="hash" value="([^"]+)">', page, 'hash')
-        title = self._html_search_regex(r'(?m)<div class="blockTitle">\s*<h2>Watch ([^<]+)</h2>', page, 'title')
-
-        download_form = {
-            'op': 'download1',
-            'id': video_id,
-            'hash': hash_key,
-        }
-
-        request = compat_urllib_request.Request(
-            'http://mooshare.biz/%s' % video_id, compat_urllib_parse.urlencode(download_form))
-        request.add_header('Content-Type', 'application/x-www-form-urlencoded')
-
-        self._sleep(5, video_id)
-
-        video_page = self._download_webpage(request, video_id, 'Downloading video page')
-
-        thumbnail = self._html_search_regex(r'image:\s*"([^"]+)",', video_page, 'thumbnail', fatal=False)
-        duration_str = self._html_search_regex(r'duration:\s*"(\d+)",', video_page, 'duration', fatal=False)
-        duration = int(duration_str) if duration_str is not None else None
-
-        formats = []
-
-        # SD video
-        mobj = re.search(r'(?m)file:\s*"(?P<url>[^"]+)",\s*provider:', video_page)
-        if mobj is not None:
-            formats.append({
-                'url': mobj.group('url'),
-                'format_id': 'sd',
-                'format': 'SD',
-            })
-
-        # HD video
-        mobj = re.search(r'\'hd-2\': { file: \'(?P<url>[^\']+)\' },', video_page)
-        if mobj is not None:
-            formats.append({
-                'url': mobj.group('url'),
-                'format_id': 'hd',
-                'format': 'HD',
-            })
-
-        # rtmp video
-        mobj = re.search(r'(?m)file: "(?P<playpath>[^"]+)",\s*streamer: "(?P<rtmpurl>rtmp://[^"]+)",', video_page)
-        if mobj is not None:
-            formats.append({
-                'url': mobj.group('rtmpurl'),
-                'play_path': mobj.group('playpath'),
-                'rtmp_live': False,
-                'ext': 'mp4',
-                'format_id': 'rtmp',
-                'format': 'HD',
-            })
-
-        return {
-            'id': video_id,
-            'title': title,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'formats': formats,
-        }
index 97d5da626a7a5d2555ac3107eb89d1a4fd11b510..5e1a8a71a93aa28962d7f260af966d10cf8e9f7a 100644 (file)
@@ -5,62 +5,73 @@ import re
 
 from .common import InfoExtractor
 from ..utils import (
+    ExtractorError,
     str_to_int,
     unified_strdate,
 )
 
 
 class MotherlessIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?motherless\.com/(?:g/[a-z0-9_]+/)?(?P<id>[A-Z0-9]+)'
-    _TESTS = [
-        {
-            'url': 'http://motherless.com/AC3FFE1',
-            'md5': '310f62e325a9fafe64f68c0bccb6e75f',
-            'info_dict': {
-                'id': 'AC3FFE1',
-                'ext': 'mp4',
-                'title': 'Fucked in the ass while playing PS3',
-                'categories': ['Gaming', 'anal', 'reluctant', 'rough', 'Wife'],
-                'upload_date': '20100913',
-                'uploader_id': 'famouslyfuckedup',
-                'thumbnail': 're:http://.*\.jpg',
-                'age_limit': 18,
-            }
-        },
-        {
-            'url': 'http://motherless.com/532291B',
-            'md5': 'bc59a6b47d1f958e61fbd38a4d31b131',
-            'info_dict': {
-                'id': '532291B',
-                'ext': 'mp4',
-                'title': 'Amazing girl playing the omegle game, PERFECT!',
-                'categories': ['Amateur', 'webcam', 'omegle', 'pink', 'young', 'masturbate', 'teen', 'game', 'hairy'],
-                'upload_date': '20140622',
-                'uploader_id': 'Sulivana7x',
-                'thumbnail': 're:http://.*\.jpg',
-                'age_limit': 18,
-            }
+    _VALID_URL = r'https?://(?:www\.)?motherless\.com/(?:g/[a-z0-9_]+/)?(?P<id>[A-Z0-9]+)'
+    _TESTS = [{
+        'url': 'http://motherless.com/AC3FFE1',
+        'md5': '310f62e325a9fafe64f68c0bccb6e75f',
+        'info_dict': {
+            'id': 'AC3FFE1',
+            'ext': 'mp4',
+            'title': 'Fucked in the ass while playing PS3',
+            'categories': ['Gaming', 'anal', 'reluctant', 'rough', 'Wife'],
+            'upload_date': '20100913',
+            'uploader_id': 'famouslyfuckedup',
+            'thumbnail': 're:http://.*\.jpg',
+            'age_limit': 18,
+        }
+    }, {
+        'url': 'http://motherless.com/532291B',
+        'md5': 'bc59a6b47d1f958e61fbd38a4d31b131',
+        'info_dict': {
+            'id': '532291B',
+            'ext': 'mp4',
+            'title': 'Amazing girl playing the omegle game, PERFECT!',
+            'categories': ['Amateur', 'webcam', 'omegle', 'pink', 'young', 'masturbate', 'teen',
+                           'game', 'hairy'],
+            'upload_date': '20140622',
+            'uploader_id': 'Sulivana7x',
+            'thumbnail': 're:http://.*\.jpg',
+            'age_limit': 18,
         },
-        {
-            'url': 'http://motherless.com/g/cosplay/633979F',
-            'md5': '0b2a43f447a49c3e649c93ad1fafa4a0',
-            'info_dict': {
-                'id': '633979F',
-                'ext': 'mp4',
-                'title': 'Turtlette',
-                'categories': ['superheroine heroine  superher'],
-                'upload_date': '20140827',
-                'uploader_id': 'shade0230',
-                'thumbnail': 're:http://.*\.jpg',
-                'age_limit': 18,
-            }
+        'skip': '404',
+    }, {
+        'url': 'http://motherless.com/g/cosplay/633979F',
+        'md5': '0b2a43f447a49c3e649c93ad1fafa4a0',
+        'info_dict': {
+            'id': '633979F',
+            'ext': 'mp4',
+            'title': 'Turtlette',
+            'categories': ['superheroine heroine  superher'],
+            'upload_date': '20140827',
+            'uploader_id': 'shade0230',
+            'thumbnail': 're:http://.*\.jpg',
+            'age_limit': 18,
         }
-    ]
+    }, {
+        # no keywords
+        'url': 'http://motherless.com/8B4BBC1',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
 
+        if any(p in webpage for p in (
+                '<title>404 - MOTHERLESS.COM<',
+                ">The page you're looking for cannot be found.<")):
+            raise ExtractorError('Video %s does not exist' % video_id, expected=True)
+
+        if '>The content you are trying to view is for friends only.' in webpage:
+            raise ExtractorError('Video %s is for friends only' % video_id, expected=True)
+
         title = self._html_search_regex(
             r'id="view-upload-title">\s+([^<]+)<', webpage, 'title')
         video_url = self._html_search_regex(
@@ -86,7 +97,7 @@ class MotherlessIE(InfoExtractor):
             r'"thumb-member-username">\s+<a href="/m/([^"]+)"',
             webpage, 'uploader_id')
 
-        categories = self._html_search_meta('keywords', webpage)
+        categories = self._html_search_meta('keywords', webpage, default=None)
         if categories:
             categories = [cat.strip() for cat in categories.split(',')]
 
diff --git a/youtube_dl/extractor/motorsport.py b/youtube_dl/extractor/motorsport.py
index c1a482dba39fb98efdb28e85b681565eb58e3f9e..370328b362c2a0661925d054be121a7216dc94c7 100644 (file)
@@ -9,7 +9,7 @@ from ..compat import (
 
 class MotorsportIE(InfoExtractor):
     IE_DESC = 'motorsport.com'
-    _VALID_URL = r'http://www\.motorsport\.com/[^/?#]+/video/(?:[^/?#]+/)(?P<id>[^/]+)/?(?:$|[?#])'
+    _VALID_URL = r'https?://www\.motorsport\.com/[^/?#]+/video/(?:[^/?#]+/)(?P<id>[^/]+)/?(?:$|[?#])'
     _TEST = {
         'url': 'http://www.motorsport.com/f1/video/main-gallery/red-bull-racing-2014-rules-explained/',
         'info_dict': {
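
This is one of many patterns in the commit relaxed from http:// to https?:// (myspass, myvideo, myvidster and others get the same treatment); only the scheme alternation changes. A quick stdlib check that both schemes now match:

import re

VALID_URL = r'https?://www\.motorsport\.com/[^/?#]+/video/(?:[^/?#]+/)(?P<id>[^/]+)/?(?:$|[?#])'

for url in (
    'http://www.motorsport.com/f1/video/main-gallery/red-bull-racing-2014-rules-explained/',
    'https://www.motorsport.com/f1/video/main-gallery/red-bull-racing-2014-rules-explained/',
):
    m = re.match(VALID_URL, url)
    print(m.group('id') if m else None)  # both now print the slug
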
diff --git a/youtube_dl/extractor/movieclips.py b/youtube_dl/extractor/movieclips.py
index 04e17d0551c7a46feff1822c4dc4be38d00cc520..d0cb8278e9860591a3bcb6e711998ae204c4962d 100644 (file)
@@ -1,80 +1,49 @@
+# coding: utf-8
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
-from ..compat import (
-    compat_str,
-)
 from ..utils import (
-    ExtractorError,
-    clean_html,
+    smuggle_url,
+    float_or_none,
+    parse_iso8601,
+    update_url_query,
 )
 
 
 class MovieClipsIE(InfoExtractor):
-    _VALID_URL = r'https?://movieclips\.com/(?P<id>[\da-zA-Z]+)(?:-(?P<display_id>[\da-z-]+))?'
+    _VALID_URL = r'https?://(?:www\.)?movieclips\.com/videos/.+-(?P<id>\d+)(?:\?|$)'
     _TEST = {
-        'url': 'http://movieclips.com/Wy7ZU-my-week-with-marilyn-movie-do-you-love-me/',
+        'url': 'http://www.movieclips.com/videos/warcraft-trailer-1-561180739597',
+        'md5': '42b5a0352d4933a7bd54f2104f481244',
         'info_dict': {
-            'id': 'Wy7ZU',
-            'display_id': 'my-week-with-marilyn-movie-do-you-love-me',
+            'id': 'pKIGmG83AqD9',
             'ext': 'mp4',
-            'title': 'My Week with Marilyn - Do You Love Me?',
-            'description': 'md5:e86795bd332fe3cff461e7c8dc542acb',
+            'title': 'Warcraft Trailer 1',
+            'description': 'Watch Trailer 1 from Warcraft (2016). Legendary’s WARCRAFT is a 3D epic adventure of world-colliding conflict based.',
             'thumbnail': 're:^https?://.*\.jpg$',
+            'timestamp': 1446843055,
+            'upload_date': '20151106',
+            'uploader': 'Movieclips',
         },
-        'params': {
-            # rtmp download
-            'skip_download': True,
-        }
+        'add_ie': ['ThePlatform'],
     }
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
-        show_id = display_id or video_id
-
-        config = self._download_xml(
-            'http://config.movieclips.com/player/config/%s' % video_id,
-            show_id, 'Downloading player config')
-
-        if config.find('./country-region').text == 'false':
-            raise ExtractorError(
-                '%s said: %s' % (self.IE_NAME, config.find('./region_alert').text), expected=True)
-
-        properties = config.find('./video/properties')
-        smil_file = properties.attrib['smil_file']
-
-        smil = self._download_xml(smil_file, show_id, 'Downloading SMIL')
-        base_url = smil.find('./head/meta').attrib['base']
-
-        formats = []
-        for video in smil.findall('./body/switch/video'):
-            vbr = int(video.attrib['system-bitrate']) / 1000
-            src = video.attrib['src']
-            formats.append({
-                'url': base_url,
-                'play_path': src,
-                'ext': src.split(':')[0],
-                'vbr': vbr,
-                'format_id': '%dk' % vbr,
-            })
-
-        self._sort_formats(formats)
-
-        title = '%s - %s' % (properties.attrib['clip_movie_title'], properties.attrib['clip_title'])
-        description = clean_html(compat_str(properties.attrib['clip_description']))
-        thumbnail = properties.attrib['image']
-        categories = properties.attrib['clip_categories'].split(',')
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        video = next(v for v in self._parse_json(self._search_regex(
+            r'var\s+__REACT_ENGINE__\s*=\s*({.+});',
+            webpage, 'react engine'), video_id)['playlist']['videos'] if v['id'] == video_id)
 
         return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'categories': categories,
-            'formats': formats,
+            '_type': 'url_transparent',
+            'ie_key': 'ThePlatform',
+            'url': smuggle_url(update_url_query(
+                video['contentUrl'], {'mbr': 'true'}), {'force_smil_url': True}),
+            'title': self._og_search_title(webpage),
+            'description': self._html_search_meta('description', webpage),
+            'duration': float_or_none(video.get('duration')),
+            'timestamp': parse_iso8601(video.get('dateCreated')),
+            'thumbnail': video.get('defaultImage'),
+            'uploader': video.get('provider'),
         }
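
The rewritten extractor no longer parses SMIL itself; it hands a ThePlatform URL back to the framework via a url_transparent result. update_url_query and smuggle_url are youtube-dl helpers; roughly, the first merges parameters into the query string and the second piggybacks extractor hints on the URL fragment. A Python 3 stdlib approximation (helper bodies paraphrased, not copied from youtube-dl; the content URL is hypothetical):

import json
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

def update_url_query(url, query):
    # Merge `query` into whatever query string the URL already has.
    parts = urlparse(url)
    qs = parse_qs(parts.query)
    qs.update({k: [v] for k, v in query.items()})
    return urlunparse(parts._replace(query=urlencode(qs, doseq=True)))

def smuggle_url(url, data):
    # Hide extractor-to-extractor hints in the fragment, where the
    # server never sees them; the target extractor unpacks them later.
    return url + '#__youtubedl_smuggle=' + json.dumps(data)

content_url = 'http://feed.theplatform.com/f/hypothetical/media/guid/1/pKIGmG83AqD9'
print(smuggle_url(update_url_query(content_url, {'mbr': 'true'}),
                  {'force_smil_url': True}))
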
diff --git a/youtube_dl/extractor/movshare.py b/youtube_dl/extractor/movshare.py
deleted file mode 100644 (file)
index 6101063..0000000
--- a/youtube_dl/extractor/movshare.py
+++ /dev/null
@@ -1,27 +0,0 @@
-from __future__ import unicode_literals
-
-from .novamov import NovaMovIE
-
-
-class MovShareIE(NovaMovIE):
-    IE_NAME = 'movshare'
-    IE_DESC = 'MovShare'
-
-    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'movshare\.(?:net|sx|ag)'}
-
-    _HOST = 'www.movshare.net'
-
-    _FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
-    _TITLE_REGEX = r'<strong>Title:</strong> ([^<]+)</p>'
-    _DESCRIPTION_REGEX = r'<strong>Description:</strong> ([^<]+)</p>'
-
-    _TEST = {
-        'url': 'http://www.movshare.net/video/559e28be54d96',
-        'md5': 'abd31a2132947262c50429e1d16c1bfd',
-        'info_dict': {
-            'id': '559e28be54d96',
-            'ext': 'flv',
-            'title': 'dissapeared image',
-            'description': 'optical illusion  dissapeared image  magic illusion',
-        }
-    }
diff --git a/youtube_dl/extractor/mtv.py b/youtube_dl/extractor/mtv.py
index b48fac5e3e434569642284d0b6388cab34696b01..640ee3d9339c48e2b3fef0ade15ee8ebcae8b292 100644 (file)
@@ -4,18 +4,20 @@ import re
 
 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
+    compat_urllib_parse_urlencode,
     compat_str,
 )
 from ..utils import (
     ExtractorError,
     find_xpath_attr,
     fix_xml_ampersands,
+    float_or_none,
     HEADRequest,
+    sanitized_Request,
     unescapeHTML,
     url_basename,
     RegexNotFoundError,
+    xpath_text,
 )
 
 
@@ -53,7 +55,7 @@ class MTVServicesInfoExtractor(InfoExtractor):
 
     def _extract_mobile_video_formats(self, mtvn_id):
         webpage_url = self._MOBILE_TEMPLATE % mtvn_id
-        req = compat_urllib_request.Request(webpage_url)
+        req = sanitized_Request(webpage_url)
         # Otherwise we get a webpage that would execute some javascript
         req.add_header('User-Agent', 'curl/7')
         webpage = self._download_webpage(req, mtvn_id,
@@ -67,7 +69,7 @@ class MTVServicesInfoExtractor(InfoExtractor):
         return [{'url': url, 'ext': 'mp4'}]
 
     def _extract_video_formats(self, mdoc, mtvn_id):
-        if re.match(r'.*/(error_country_block\.swf|geoblock\.mp4)$', mdoc.find('.//src').text) is not None:
+        if re.match(r'.*/(error_country_block\.swf|geoblock\.mp4|copyright_error\.flv(?:\?geo\b.+?)?)$', mdoc.find('.//src').text) is not None:
             if mtvn_id is not None and self._MOBILE_TEMPLATE is not None:
                 self.to_screen('The normal version is not available from your '
                                'country, trying with the mobile version')
@@ -110,11 +112,13 @@ class MTVServicesInfoExtractor(InfoExtractor):
         uri = itemdoc.find('guid').text
         video_id = self._id_from_uri(uri)
         self.report_extraction(video_id)
-        mediagen_url = itemdoc.find('%s/%s' % (_media_xml_tag('group'), _media_xml_tag('content'))).attrib['url']
+        content_el = itemdoc.find('%s/%s' % (_media_xml_tag('group'), _media_xml_tag('content')))
+        mediagen_url = content_el.attrib['url']
         # Remove the templates, like &device={device}
         mediagen_url = re.sub(r'&[^=]*?={.*?}(?=(&|$))', '', mediagen_url)
         if 'acceptMethods' not in mediagen_url:
-            mediagen_url += '&acceptMethods=fms'
+            mediagen_url += '&' if '?' in mediagen_url else '?'
+            mediagen_url += 'acceptMethods=fms'
 
         mediagen_doc = self._download_xml(mediagen_url, video_id,
                                           'Downloading video urls')
@@ -127,11 +131,7 @@ class MTVServicesInfoExtractor(InfoExtractor):
             message += item.text
             raise ExtractorError(message, expected=True)
 
-        description_node = itemdoc.find('description')
-        if description_node is not None:
-            description = description_node.text.strip()
-        else:
-            description = None
+        description = xpath_text(itemdoc, 'description')
 
         title_el = None
         if title_el is None:
@@ -141,7 +141,7 @@ class MTVServicesInfoExtractor(InfoExtractor):
         if title_el is None:
             title_el = itemdoc.find('.//{http://search.yahoo.com/mrss/}title')
         if title_el is None:
-            title_el = itemdoc.find('.//title')
+            title_el = itemdoc.find('.//title') or itemdoc.find('./title')
             if title_el.text is None:
                 title_el = None
 
@@ -164,25 +164,29 @@ class MTVServicesInfoExtractor(InfoExtractor):
             'id': video_id,
             'thumbnail': self._get_thumbnail_url(uri, itemdoc),
             'description': description,
+            'duration': float_or_none(content_el.attrib.get('duration')),
         }
 
+    def _get_feed_query(self, uri):
+        data = {'uri': uri}
+        if self._LANG:
+            data['lang'] = self._LANG
+        return compat_urllib_parse_urlencode(data)
+
     def _get_videos_info(self, uri):
         video_id = self._id_from_uri(uri)
         feed_url = self._get_feed_url(uri)
-        data = compat_urllib_parse.urlencode({'uri': uri})
-        info_url = feed_url + '?'
-        if self._LANG:
-            info_url += 'lang=%s&' % self._LANG
-        info_url += data
+        info_url = feed_url + '?' + self._get_feed_query(uri)
+        return self._get_videos_info_from_url(info_url, video_id)
+
+    def _get_videos_info_from_url(self, url, video_id):
         idoc = self._download_xml(
-            info_url, video_id,
+            url, video_id,
             'Downloading info', transform_source=fix_xml_ampersands)
         return self.playlist_result(
             [self._get_video_info(item) for item in idoc.findall('.//item')])
 
-    def _real_extract(self, url):
-        title = url_basename(url)
-        webpage = self._download_webpage(url, title)
+    def _extract_mgid(self, webpage):
         try:
             # the url can be http://media.mtvnservices.com/fb/{mgid}.swf
             # or http://media.mtvnservices.com/{mgid}
@@ -196,8 +200,19 @@ class MTVServicesInfoExtractor(InfoExtractor):
         if mgid is None or ':' not in mgid:
             mgid = self._search_regex(
                 [r'data-mgid="(.*?)"', r'swfobject.embedSWF\(".*?(mgid:.*?)"'],
-                webpage, 'mgid')
+                webpage, 'mgid', default=None)
+
+        if not mgid:
+            sm4_embed = self._html_search_meta(
+                'sm4:video:embed', webpage, 'sm4 embed', default='')
+            mgid = self._search_regex(
+                r'embed/(mgid:.+?)["\'&?/]', sm4_embed, 'mgid')
+        return mgid
 
+    def _real_extract(self, url):
+        title = url_basename(url)
+        webpage = self._download_webpage(url, title)
+        mgid = self._extract_mgid(webpage)
         videos_info = self._get_videos_info(mgid)
         return videos_info
 
@@ -218,6 +233,13 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
         },
     }
 
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
            r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//media\.mtvnservices\.com/embed/.+?)\1', webpage)
+        if mobj:
+            return mobj.group('url')
+
     def _get_feed_url(self, uri):
         video_id = self._id_from_uri(uri)
         site_id = uri.replace(video_id, '')
@@ -288,3 +310,65 @@ class MTVIggyIE(MTVServicesInfoExtractor):
         }
     }
     _FEED_URL = 'http://all.mtvworldverticals.com/feed-xml/'
+
+
+class MTVDEIE(MTVServicesInfoExtractor):
+    IE_NAME = 'mtv.de'
+    _VALID_URL = r'https?://(?:www\.)?mtv\.de/(?:artists|shows|news)/(?:[^/]+/)*(?P<id>\d+)-[^/#?]+/*(?:[#?].*)?$'
+    _TESTS = [{
+        'url': 'http://www.mtv.de/artists/10571-cro/videos/61131-traum',
+        'info_dict': {
+            'id': 'music_video-a50bc5f0b3aa4b3190aa',
+            'ext': 'mp4',
+            'title': 'MusicVideo_cro-traum',
+            'description': 'Cro - Traum',
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }, {
+        # mediagen URL without query (e.g. http://videos.mtvnn.com/mediagen/e865da714c166d18d6f80893195fcb97)
+        'url': 'http://www.mtv.de/shows/933-teen-mom-2/staffeln/5353/folgen/63565-enthullungen',
+        'info_dict': {
+            'id': 'local_playlist-f5ae778b9832cc837189',
+            'ext': 'mp4',
+            'title': 'Episode_teen-mom-2_shows_season-5_episode-1_full-episode_part1',
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }, {
+        # single video in pagePlaylist with different id
+        'url': 'http://www.mtv.de/news/77491-mtv-movies-spotlight-pixels-teil-3',
+        'info_dict': {
+            'id': 'local_playlist-4e760566473c4c8c5344',
+            'ext': 'mp4',
+            'title': 'Article_mtv-movies-spotlight-pixels-teil-3_short-clips_part1',
+            'description': 'MTV Movies Supercut',
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        playlist = self._parse_json(
+            self._search_regex(
+                r'window\.pagePlaylist\s*=\s*(\[.+?\]);\n', webpage, 'page playlist'),
+            video_id)
+
+        # news pages contain single video in playlist with different id
+        if len(playlist) == 1:
+            return self._get_videos_info_from_url(playlist[0]['mrss'], video_id)
+
+        for item in playlist:
+            item_id = item.get('id')
+            if item_id and compat_str(item_id) == video_id:
+                return self._get_videos_info_from_url(item['mrss'], video_id)
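
Two of the mediagen changes above are easy to miss: URL templates such as &device={device} are stripped, and acceptMethods=fms is now appended with '?' when the mediagen URL has no query string at all (the old code hardcoded '&', which broke URLs like the bare mtv.de mediagen one noted in the test). A standalone sketch of that normalization:

import re

def normalize_mediagen_url(mediagen_url):
    # Drop unexpanded templates such as "&device={device}".
    mediagen_url = re.sub(r'&[^=]*?={.*?}(?=(&|$))', '', mediagen_url)
    # Append acceptMethods with '?' or '&' depending on whether a
    # query string is already present.
    if 'acceptMethods' not in mediagen_url:
        mediagen_url += '&' if '?' in mediagen_url else '?'
        mediagen_url += 'acceptMethods=fms'
    return mediagen_url

print(normalize_mediagen_url('http://x/mediagen/abc'))
# -> http://x/mediagen/abc?acceptMethods=fms
print(normalize_mediagen_url('http://x/mediagen?uri=u&device={device}'))
# -> http://x/mediagen?uri=u&acceptMethods=fms
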
diff --git a/youtube_dl/extractor/musicplayon.py b/youtube_dl/extractor/musicplayon.py
index 50d92b50ae5ec2fa49e45cc64aea7f08cc21ccea..2174e5665778b590055c06255a91c030cb579d29 100644 (file)
@@ -1,17 +1,21 @@
 # encoding: utf-8
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
-from ..utils import int_or_none
+from ..compat import compat_urlparse
+from ..utils import (
+    int_or_none,
+    js_to_json,
+    mimetype2ext,
+)
 
 
 class MusicPlayOnIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:.+?\.)?musicplayon\.com/play(?:-touch)?\?(?:v|pl=100&play)=(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:.+?\.)?musicplayon\.com/play(?:-touch)?\?(?:v|pl=\d+&play)=(?P<id>\d+)'
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://en.musicplayon.com/play?v=433377',
+        'md5': '00cdcdea1726abdf500d1e7fd6dd59bb',
         'info_dict': {
             'id': '433377',
             'ext': 'mp4',
@@ -20,15 +24,16 @@ class MusicPlayOnIE(InfoExtractor):
             'duration': 342,
             'uploader': 'ultrafish',
         },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        },
-    }
+    }, {
+        'url': 'http://en.musicplayon.com/play?pl=102&play=442629',
+        'only_matching': True,
+    }]
+
+    _URL_TEMPLATE = 'http://en.musicplayon.com/play?v=%s'
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
+        url = self._URL_TEMPLATE % video_id
 
         page = self._download_webpage(url, video_id)
 
@@ -40,28 +45,14 @@ class MusicPlayOnIE(InfoExtractor):
         uploader = self._html_search_regex(
             r'<div>by&nbsp;<a href="[^"]+" class="purple">([^<]+)</a></div>', page, 'uploader', fatal=False)
 
-        formats = [
-            {
-                'url': 'http://media0-eu-nl.musicplayon.com/stream-mobile?id=%s&type=.mp4' % video_id,
-                'ext': 'mp4',
-            }
-        ]
-
-        manifest = self._download_webpage(
-            'http://en.musicplayon.com/manifest.m3u8?v=%s' % video_id, video_id, 'Downloading manifest')
-
-        for entry in manifest.split('#')[1:]:
-            if entry.startswith('EXT-X-STREAM-INF:'):
-                meta, url, _ = entry.split('\n')
-                params = dict(param.split('=') for param in meta.split(',')[1:])
-                formats.append({
-                    'url': url,
-                    'ext': 'mp4',
-                    'tbr': int(params['BANDWIDTH']),
-                    'width': int(params['RESOLUTION'].split('x')[1]),
-                    'height': int(params['RESOLUTION'].split('x')[-1]),
-                    'format_note': params['NAME'].replace('"', '').strip(),
-                })
+        sources = self._parse_json(
+            self._search_regex(r'setup\[\'_sources\'\]\s*=\s*([^;]+);', page, 'video sources'),
+            video_id, transform_source=js_to_json)
+        formats = [{
+            'url': compat_urlparse.urljoin(url, source['src']),
+            'ext': mimetype2ext(source.get('type')),
+            'format_note': source.get('data-res'),
+        } for source in sources]
 
         return {
             'id': video_id,
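
The manual M3U8 manifest parsing is gone; formats now come from the page's own setup['_sources'] JavaScript array. js_to_json and mimetype2ext are youtube-dl helpers; the stand-ins below are deliberately naive and the page snippet is hypothetical:

import json
import re
from urllib.parse import urljoin

def js_to_json(code):
    # Naive stand-in: only converts single-quoted strings to JSON strings.
    return re.sub(r"'([^']*)'", r'"\1"', code)

MIME_TO_EXT = {'video/mp4': 'mp4', 'application/x-mpegURL': 'm3u8'}

page = ("setup['_sources'] = [{'src': '/v/433377.mp4', "
        "'type': 'video/mp4', 'data-res': '480p'}];")
url = 'http://en.musicplayon.com/play?v=433377'

sources = json.loads(js_to_json(re.search(
    r"setup\['_sources'\]\s*=\s*([^;]+);", page).group(1)))
formats = [{
    'url': urljoin(url, source['src']),
    'ext': MIME_TO_EXT.get(source.get('type')),
    'format_note': source.get('data-res'),
} for source in sources]
print(formats)
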
diff --git a/youtube_dl/extractor/musicvault.py b/youtube_dl/extractor/musicvault.py
deleted file mode 100644 (file)
index 0e46ac7..0000000
--- a/youtube_dl/extractor/musicvault.py
+++ /dev/null
@@ -1,63 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-
-
-class MusicVaultIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.musicvault\.com/(?P<uploader_id>[^/?#]*)/video/(?P<display_id>[^/?#]*)_(?P<id>[0-9]+)\.html'
-    _TEST = {
-        'url': 'http://www.musicvault.com/the-allman-brothers-band/video/straight-from-the-heart_1010863.html',
-        'md5': '3adcbdb3dcc02d647539e53f284ba171',
-        'info_dict': {
-            'id': '1010863',
-            'ext': 'mp4',
-            'uploader_id': 'the-allman-brothers-band',
-            'title': 'Straight from the Heart',
-            'duration': 244,
-            'uploader': 'The Allman Brothers Band',
-            'thumbnail': 're:^https?://.*/thumbnail/.*',
-            'upload_date': '20131219',
-            'location': 'Capitol Theatre (Passaic, NJ)',
-            'description': 'Listen to The Allman Brothers Band perform Straight from the Heart at Capitol Theatre (Passaic, NJ) on Dec 16, 1981',
-            'timestamp': int,
-        }
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        display_id = mobj.group('display_id')
-        webpage = self._download_webpage(url, display_id)
-
-        thumbnail = self._search_regex(
-            r'<meta itemprop="thumbnail" content="([^"]+)"',
-            webpage, 'thumbnail', fatal=False)
-
-        data_div = self._search_regex(
-            r'(?s)<div class="data">(.*?)</div>', webpage, 'data fields')
-        uploader = self._html_search_regex(
-            r'<h1.*?>(.*?)</h1>', data_div, 'uploader', fatal=False)
-        title = self._html_search_regex(
-            r'<h2.*?>(.*?)</h2>', data_div, 'title')
-        location = self._html_search_regex(
-            r'<h4.*?>(.*?)</h4>', data_div, 'location', fatal=False)
-
-        kaltura_id = self._search_regex(
-            r'<div id="video-detail-player" data-kaltura-id="([^"]+)"',
-            webpage, 'kaltura ID')
-        wid = self._search_regex(r'/wid/_([0-9]+)/', webpage, 'wid')
-
-        return {
-            'id': mobj.group('id'),
-            '_type': 'url_transparent',
-            'url': 'kaltura:%s:%s' % (wid, kaltura_id),
-            'ie_key': 'Kaltura',
-            'display_id': display_id,
-            'uploader_id': mobj.group('uploader_id'),
-            'thumbnail': thumbnail,
-            'description': self._html_search_meta('description', webpage),
-            'location': location,
-            'title': title,
-            'uploader': uploader,
-        }
diff --git a/youtube_dl/extractor/muzu.py b/youtube_dl/extractor/muzu.py
index 1e9cf8de9174e086dd7c19525a7dc94025075683..cbc800481bc16528883a6be58357a38dcbd2c195 100644 (file)
@@ -1,9 +1,7 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-)
+from ..compat import compat_urllib_parse_urlencode
 
 
 class MuzuTVIE(InfoExtractor):
@@ -25,7 +23,7 @@ class MuzuTVIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        info_data = compat_urllib_parse.urlencode({
+        info_data = compat_urllib_parse_urlencode({
             'format': 'json',
             'url': url,
         })
@@ -41,7 +39,7 @@ class MuzuTVIE(InfoExtractor):
             if video_info.get('v%s' % quality):
                 break
 
-        data = compat_urllib_parse.urlencode({
+        data = compat_urllib_parse_urlencode({
             'ai': video_id,
             # Even if each time you watch a video the hash changes,
             # it seems to work for different videos, and it will work
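
The compat_urllib_parse.urlencode to compat_urllib_parse_urlencode rename seen here recurs across the commit (mtv, myvideo, naver, nba); both wrap the same stdlib call, exposed under one name for Python 2 and 3. On Python 3 the equivalent is simply (the video URL below is hypothetical):

from urllib.parse import urlencode

info_data = urlencode({'format': 'json', 'url': 'http://www.muzu.tv/some/video'})
print(info_data)  # format=json&url=http%3A%2F%2Fwww.muzu.tv%2Fsome%2Fvideo
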
diff --git a/youtube_dl/extractor/mwave.py b/youtube_dl/extractor/mwave.py
new file mode 100644 (file)
index 0000000..5c3c8d4
--- /dev/null
+++ b/youtube_dl/extractor/mwave.py
@@ -0,0 +1,58 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    int_or_none,
+    parse_duration,
+)
+
+
+class MwaveIE(InfoExtractor):
+    _VALID_URL = r'https?://mwave\.interest\.me/mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=168859',
+        # md5 is unstable
+        'info_dict': {
+            'id': '168859',
+            'ext': 'flv',
+            'title': '[M COUNTDOWN] SISTAR - SHAKE IT',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'uploader': 'M COUNTDOWN',
+            'duration': 206,
+            'view_count': int,
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        vod_info = self._download_json(
+            'http://mwave.interest.me/onair/vod_info.m?vodtype=CL&sectorid=&endinfo=Y&id=%s' % video_id,
+            video_id, 'Download vod JSON')
+
+        formats = []
+        for num, cdn_info in enumerate(vod_info['cdn']):
+            stream_url = cdn_info.get('url')
+            if not stream_url:
+                continue
+            stream_name = cdn_info.get('name') or compat_str(num)
+            f4m_stream = self._download_json(
+                stream_url, video_id,
+                'Download %s stream JSON' % stream_name)
+            f4m_url = f4m_stream.get('fileurl')
+            if not f4m_url:
+                continue
+            formats.extend(
+                self._extract_f4m_formats(f4m_url + '&hdcore=3.0.3', video_id, f4m_id=stream_name))
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': vod_info['title'],
+            'thumbnail': vod_info.get('cover'),
+            'uploader': vod_info.get('program_title'),
+            'duration': parse_duration(vod_info.get('time')),
+            'view_count': int_or_none(vod_info.get('hit')),
+            'formats': formats,
+        }
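
The new extractor tolerates sparse CDN entries: items without a URL are skipped, and unnamed ones fall back to their list index as the format id. A sketch over a hypothetical vod_info payload (the real code downloads each stream's JSON and feeds its 'fileurl', plus '&hdcore=3.0.3', to the F4M parser):

vod_info = {
    'title': '[M COUNTDOWN] SISTAR - SHAKE IT',
    'cdn': [
        {'name': 'akamai', 'url': 'http://example.invalid/stream1'},
        {'url': ''},                                # skipped: no stream URL
        {'url': 'http://example.invalid/stream3'},  # unnamed: uses index '2'
    ],
}

for num, cdn_info in enumerate(vod_info['cdn']):
    stream_url = cdn_info.get('url')
    if not stream_url:
        continue
    stream_name = cdn_info.get('name') or str(num)
    print(stream_name, stream_url)  # prints the akamai entry, then '2'
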
diff --git a/youtube_dl/extractor/myspace.py b/youtube_dl/extractor/myspace.py
index 83414a2325586d7319c06247fa037c42bb2b199a..0d5238d777ad00ab13e84a69474d42b360cdecc1 100644 (file)
@@ -2,13 +2,13 @@
 from __future__ import unicode_literals
 
 import re
-import json
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_str,
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    parse_iso8601,
 )
-from ..utils import ExtractorError
 
 
 class MySpaceIE(InfoExtractor):
@@ -24,6 +24,8 @@ class MySpaceIE(InfoExtractor):
                 'description': 'This country quartet was all smiles while playing a sold out show at the Pacific Amphitheatre in Orange County, California.',
                 'uploader': 'Five Minutes to the Stage',
                 'uploader_id': 'fiveminutestothestage',
+                'timestamp': 1414108751,
+                'upload_date': '20141023',
             },
             'params': {
                 # rtmp download
@@ -64,7 +66,7 @@ class MySpaceIE(InfoExtractor):
                 'ext': 'mp4',
                 'title': 'Starset - First Light',
                 'description': 'md5:2d5db6c9d11d527683bcda818d332414',
-                'uploader': 'Jacob Soren',
+                'uploader': 'Yumi K',
                 'uploader_id': 'SorenPromotions',
                 'upload_date': '20140725',
             }
@@ -78,6 +80,19 @@ class MySpaceIE(InfoExtractor):
         player_url = self._search_regex(
             r'playerSwf":"([^"?]*)', webpage, 'player URL')
 
+        def rtmp_format_from_stream_url(stream_url, width=None, height=None):
+            rtmp_url, play_path = stream_url.split(';', 1)
+            return {
+                'format_id': 'rtmp',
+                'url': rtmp_url,
+                'play_path': play_path,
+                'player_url': player_url,
+                'protocol': 'rtmp',
+                'ext': 'flv',
+                'width': width,
+                'height': height,
+            }
+
         if mobj.group('mediatype').startswith('music/song'):
             # songs don't store any useful info in the 'context' variable
             song_data = self._search_regex(
@@ -93,8 +108,8 @@ class MySpaceIE(InfoExtractor):
                 return self._search_regex(
                     r'''data-%s=([\'"])(?P<data>.*?)\1''' % name,
                     song_data, name, default='', group='data')
-            streamUrl = search_data('stream-url')
-            if not streamUrl:
+            stream_url = search_data('stream-url')
+            if not stream_url:
                 vevo_id = search_data('vevo-id')
                 youtube_id = search_data('youtube-id')
                 if vevo_id:
@@ -106,36 +121,47 @@ class MySpaceIE(InfoExtractor):
                 else:
                     raise ExtractorError(
                         'Found song but don\'t know how to download it')
-            info = {
+            return {
                 'id': video_id,
                 'title': self._og_search_title(webpage),
                 'uploader': search_data('artist-name'),
                 'uploader_id': search_data('artist-username'),
                 'thumbnail': self._og_search_thumbnail(webpage),
+                'duration': int_or_none(search_data('duration')),
+                'formats': [rtmp_format_from_stream_url(stream_url)]
             }
         else:
-            context = json.loads(self._search_regex(
-                r'context = ({.*?});', webpage, 'context'))
-            video = context['video']
-            streamUrl = video['streamUrl']
-            info = {
-                'id': compat_str(video['mediaId']),
+            video = self._parse_json(self._search_regex(
+                r'context = ({.*?});', webpage, 'context'),
+                video_id)['video']
+            formats = []
+            hls_stream_url = video.get('hlsStreamUrl')
+            if hls_stream_url:
+                formats.append({
+                    'format_id': 'hls',
+                    'url': hls_stream_url,
+                    'protocol': 'm3u8_native',
+                    'ext': 'mp4',
+                })
+            stream_url = video.get('streamUrl')
+            if stream_url:
+                formats.append(rtmp_format_from_stream_url(
+                    stream_url,
+                    int_or_none(video.get('width')),
+                    int_or_none(video.get('height'))))
+            self._sort_formats(formats)
+            return {
+                'id': video_id,
                 'title': video['title'],
-                'description': video['description'],
-                'thumbnail': video['imageUrl'],
-                'uploader': video['artistName'],
-                'uploader_id': video['artistUsername'],
+                'description': video.get('description'),
+                'thumbnail': video.get('imageUrl'),
+                'uploader': video.get('artistName'),
+                'uploader_id': video.get('artistUsername'),
+                'duration': int_or_none(video.get('duration')),
+                'timestamp': parse_iso8601(video.get('dateAdded')),
+                'formats': formats,
             }
 
-        rtmp_url, play_path = streamUrl.split(';', 1)
-        info.update({
-            'url': rtmp_url,
-            'play_path': play_path,
-            'player_url': player_url,
-            'ext': 'flv',
-        })
-        return info
-
 
 class MySpaceAlbumIE(InfoExtractor):
     IE_NAME = 'MySpace:album'
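
rtmp_format_from_stream_url is the common piece between the song and video branches above: MySpace packs the RTMP server and the play path into a single semicolon-separated field. A standalone copy with a hypothetical stream URL:

def rtmp_format_from_stream_url(stream_url, width=None, height=None):
    # Everything before the first ';' is the RTMP endpoint,
    # everything after it is the play path within that endpoint.
    rtmp_url, play_path = stream_url.split(';', 1)
    return {
        'format_id': 'rtmp',
        'url': rtmp_url,
        'play_path': play_path,
        'protocol': 'rtmp',
        'ext': 'flv',
        'width': width,
        'height': height,
    }

print(rtmp_format_from_stream_url(
    'rtmp://example.invalid/myspace;mp4:videos/clip.mp4', 640, 360))
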
diff --git a/youtube_dl/extractor/myspass.py b/youtube_dl/extractor/myspass.py
index 4557a2b13b3e47a75242ebd4a5c095bf17cbaacf..1ca7b1a9e958c221f44c48bced04c314c0957f8c 100644 (file)
@@ -11,15 +11,15 @@ from ..utils import (
 
 
 class MySpassIE(InfoExtractor):
-    _VALID_URL = r'http://www\.myspass\.de/.*'
+    _VALID_URL = r'https?://www\.myspass\.de/.*'
     _TEST = {
         'url': 'http://www.myspass.de/myspass/shows/tvshows/absolute-mehrheit/Absolute-Mehrheit-vom-17022013-Die-Highlights-Teil-2--/11741/',
         'md5': '0b49f4844a068f8b33f4b7c88405862b',
         'info_dict': {
             'id': '11741',
             'ext': 'mp4',
-            "description": "Wer kann in die Fu\u00dfstapfen von Wolfgang Kubicki treten und die Mehrheit der Zuschauer hinter sich versammeln? Wird vielleicht sogar die Absolute Mehrheit geknackt und der Jackpot von 200.000 Euro mit nach Hause genommen?",
-            "title": "Absolute Mehrheit vom 17.02.2013 - Die Highlights, Teil 2",
+            'description': 'Wer kann in die Fu\u00dfstapfen von Wolfgang Kubicki treten und die Mehrheit der Zuschauer hinter sich versammeln? Wird vielleicht sogar die Absolute Mehrheit geknackt und der Jackpot von 200.000 Euro mit nach Hause genommen?',
+            'title': 'Absolute Mehrheit vom 17.02.2013 - Die Highlights, Teil 2',
         },
     }
 
diff --git a/youtube_dl/extractor/myvideo.py b/youtube_dl/extractor/myvideo.py
index c96f472a39e569c7dfb88682d36fad9ed6ce2c10..6d447a4935e49cd3c4f7525fff6ffe5e9883656e 100644 (file)
@@ -9,17 +9,18 @@ import json
 from .common import InfoExtractor
 from ..compat import (
     compat_ord,
-    compat_urllib_parse,
     compat_urllib_parse_unquote,
-    compat_urllib_request,
+    compat_urllib_parse_urlencode,
 )
 from ..utils import (
     ExtractorError,
+    sanitized_Request,
 )
 
 
 class MyVideoIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?myvideo\.de/(?:[^/]+/)?watch/(?P<id>[0-9]+)/[^?/]+.*'
+    _WORKING = False
+    _VALID_URL = r'https?://(?:www\.)?myvideo\.de/(?:[^/]+/)?watch/(?P<id>[0-9]+)/[^?/]+.*'
     IE_NAME = 'myvideo'
     _TEST = {
         'url': 'http://www.myvideo.de/watch/8229274/bowling_fail_or_win',
@@ -83,7 +84,7 @@ class MyVideoIE(InfoExtractor):
 
         mobj = re.search(r'data-video-service="/service/data/video/%s/config' % video_id, webpage)
         if mobj is not None:
-            request = compat_urllib_request.Request('http://www.myvideo.de/service/data/video/%s/config' % video_id, '')
+            request = sanitized_Request('http://www.myvideo.de/service/data/video/%s/config' % video_id, '')
             response = self._download_webpage(request, video_id,
                                               'Downloading video info')
             info = json.loads(base64.b64decode(response).decode('utf-8'))
@@ -111,7 +112,7 @@ class MyVideoIE(InfoExtractor):
                 encxml = compat_urllib_parse_unquote(b)
         if not params.get('domain'):
             params['domain'] = 'www.myvideo.de'
-        xmldata_url = '%s?%s' % (encxml, compat_urllib_parse.urlencode(params))
+        xmldata_url = '%s?%s' % (encxml, compat_urllib_parse_urlencode(params))
         if 'flash_playertype=MTV' in xmldata_url:
             self._downloader.report_warning('avoiding MTV player')
             xmldata_url = (
diff --git a/youtube_dl/extractor/myvidster.py b/youtube_dl/extractor/myvidster.py
index a94ab8358cacc51094ab791ace648ec062eb5f94..731c245428103b3ea96f5c396b063afadac82702 100644 (file)
@@ -4,7 +4,7 @@ from .common import InfoExtractor
 
 
 class MyVidsterIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?myvidster\.com/video/(?P<id>\d+)/'
+    _VALID_URL = r'https?://(?:www\.)?myvidster\.com/video/(?P<id>\d+)/'
 
     _TEST = {
         'url': 'http://www.myvidster.com/video/32059805/Hot_chemistry_with_raw_love_making',
diff --git a/youtube_dl/extractor/nationalgeographic.py b/youtube_dl/extractor/nationalgeographic.py
index 6fc9e7b050c75ced3262ebad9ba92d74ed6b5d7d..72251866303885f6cd9b040ec9ad3f042d8add6a 100644 (file)
@@ -4,30 +4,40 @@ from .common import InfoExtractor
 from ..utils import (
     smuggle_url,
     url_basename,
+    update_url_query,
 )
 
 
 class NationalGeographicIE(InfoExtractor):
-    _VALID_URL = r'http://video\.nationalgeographic\.com/.*?'
+    IE_NAME = 'natgeo'
+    _VALID_URL = r'https?://video\.nationalgeographic\.com/.*?'
 
     _TESTS = [
         {
             'url': 'http://video.nationalgeographic.com/video/news/150210-news-crab-mating-vin?source=featuredvideo',
+            'md5': '730855d559abbad6b42c2be1fa584917',
             'info_dict': {
-                'id': '4DmDACA6Qtk_',
-                'ext': 'flv',
+                'id': '0000014b-70a1-dd8c-af7f-f7b559330001',
+                'ext': 'mp4',
                 'title': 'Mating Crabs Busted by Sharks',
                 'description': 'md5:16f25aeffdeba55aaa8ec37e093ad8b3',
+                'timestamp': 1423523799,
+                'upload_date': '20150209',
+                'uploader': 'NAGS',
             },
             'add_ie': ['ThePlatform'],
         },
         {
             'url': 'http://video.nationalgeographic.com/wild/when-sharks-attack/the-real-jaws',
+            'md5': '6a3105eb448c070503b3105fb9b320b5',
             'info_dict': {
-                'id': '_JeBD_D7PlS5',
-                'ext': 'flv',
+                'id': 'ngc-I0IauNSWznb_UV008GxSbwY35BZvgi2e',
+                'ext': 'mp4',
                 'title': 'The Real Jaws',
                 'description': 'md5:8d3e09d9d53a85cd397b4b21b2c77be6',
+                'timestamp': 1433772632,
+                'upload_date': '20150608',
+                'uploader': 'NAGS',
             },
             'add_ie': ['ThePlatform'],
         },
@@ -37,18 +47,67 @@ class NationalGeographicIE(InfoExtractor):
         name = url_basename(url)
 
         webpage = self._download_webpage(url, name)
-        feed_url = self._search_regex(
-            r'data-feed-url="([^"]+)"', webpage, 'feed url')
         guid = self._search_regex(
             r'id="(?:videoPlayer|player-container)"[^>]+data-guid="([^"]+)"',
             webpage, 'guid')
 
-        feed = self._download_xml('%s?byGuid=%s' % (feed_url, guid), name)
-        content = feed.find('.//{http://search.yahoo.com/mrss/}content')
-        theplatform_id = url_basename(content.attrib.get('url'))
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'ThePlatform',
+            'url': smuggle_url(
+                'http://link.theplatform.com/s/ngs/media/guid/2423130747/%s?mbr=true' % guid,
+                {'force_smil_url': True}),
+            'id': guid,
+        }
 
-        return self.url_result(smuggle_url(
-            'http://link.theplatform.com/s/ngs/%s?format=SMIL&formats=MPEG4&manifest=f4m' % theplatform_id,
-            # For some reason, the normal links don't work and we must force
-            # the use of f4m
-            {'force_smil_url': True}))
+
+class NationalGeographicChannelIE(InfoExtractor):
+    IE_NAME = 'natgeo:channel'
+    _VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?[^/]+/videos/(?P<id>[^/?]+)'
+
+    _TESTS = [
+        {
+            'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/videos/uncovering-a-universal-knowledge/',
+            'md5': '518c9aa655686cf81493af5cc21e2a04',
+            'info_dict': {
+                'id': 'nB5vIAfmyllm',
+                'ext': 'mp4',
+                'title': 'Uncovering a Universal Knowledge',
+                'description': 'md5:1a89148475bf931b3661fcd6ddb2ae3a',
+                'timestamp': 1458680907,
+                'upload_date': '20160322',
+                'uploader': 'NEWA-FNG-NGTV',
+            },
+            'add_ie': ['ThePlatform'],
+        },
+        {
+            'url': 'http://channel.nationalgeographic.com/wild/destination-wild/videos/the-stunning-red-bird-of-paradise/',
+            'md5': 'c4912f656b4cbe58f3e000c489360989',
+            'info_dict': {
+                'id': '3TmMv9OvGwIR',
+                'ext': 'mp4',
+                'title': 'The Stunning Red Bird of Paradise',
+                'description': 'md5:7bc8cd1da29686be4d17ad1230f0140c',
+                'timestamp': 1459362152,
+                'upload_date': '20160330',
+                'uploader': 'NEWA-FNG-NGTV',
+            },
+            'add_ie': ['ThePlatform'],
+        },
+    ]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        release_url = self._search_regex(
+            r'video_auth_playlist_url\s*=\s*"([^"]+)"',
+            webpage, 'release url')
+
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'ThePlatform',
+            'url': smuggle_url(
+                update_url_query(release_url, {'mbr': 'true', 'switch': 'http'}),
+                {'force_smil_url': True}),
+            'display_id': display_id,
+        }
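
Both NatGeo extractors now return '_type': 'url_transparent' rather than a plain url_result, so fields set here (the guid as id, the display_id) survive the hand-off to ThePlatform. Roughly paraphrased, not the actual YoutubeDL code, the merge works like this:

def resolve_url_transparent(transparent, delegate_result):
    # Resolve the smuggled URL with the delegate extractor first,
    # then overlay every non-None field the transparent result set.
    info = dict(delegate_result)
    for key, value in transparent.items():
        if key in ('_type', 'url', 'ie_key') or value is None:
            continue
        info[key] = value
    return info

print(resolve_url_transparent(
    {'_type': 'url_transparent', 'ie_key': 'ThePlatform',
     'url': 'http://link.theplatform.com/s/ngs/media/guid/2423130747/GUID?mbr=true',
     'id': '0000014b-70a1-dd8c-af7f-f7b559330001'},
    {'id': 'theplatform-internal-id', 'ext': 'mp4', 'title': 'The Real Jaws'}))
# -> ThePlatform's title/ext are kept, but the NatGeo guid wins as 'id'
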
diff --git a/youtube_dl/extractor/naver.py b/youtube_dl/extractor/naver.py
index 925967753bd12816005b5ed8929f0438e9ec0214..6d6f69b440a4b91d95c42210b2e597aca99144f6 100644 (file)
@@ -5,12 +5,11 @@ import re
 
 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse,
+    compat_urllib_parse_urlencode,
     compat_urlparse,
 )
 from ..utils import (
     ExtractorError,
-    clean_html,
 )
 
 
@@ -46,16 +45,16 @@ class NaverIE(InfoExtractor):
         m_id = re.search(r'var rmcPlayer = new nhn.rmcnmv.RMCVideoPlayer\("(.+?)", "(.+?)"',
                          webpage)
         if m_id is None:
-            m_error = re.search(
-                r'(?s)<div class="(?:nation_error|nation_box)">\s*(?:<!--.*?-->)?\s*<p class="[^"]+">(?P<msg>.+?)</p>\s*</div>',
-                webpage)
-            if m_error:
-                raise ExtractorError(clean_html(m_error.group('msg')), expected=True)
+            error = self._html_search_regex(
+                r'(?s)<div class="(?:nation_error|nation_box|error_box)">\s*(?:<!--.*?-->)?\s*<p class="[^"]+">(?P<msg>.+?)</p>\s*</div>',
+                webpage, 'error', default=None)
+            if error:
+                raise ExtractorError(error, expected=True)
             raise ExtractorError('couldn\'t extract vid and key')
         vid = m_id.group(1)
         key = m_id.group(2)
-        query = compat_urllib_parse.urlencode({'vid': vid, 'inKey': key, })
-        query_urls = compat_urllib_parse.urlencode({
+        query = compat_urllib_parse_urlencode({'vid': vid, 'inKey': key, })
+        query_urls = compat_urllib_parse_urlencode({
             'masterVid': vid,
             'protocol': 'p2p',
             'inKey': key,
diff --git a/youtube_dl/extractor/nba.py b/youtube_dl/extractor/nba.py
index 944096e1ca15de964fcdf896adf988c9aa2264bd..d896b0d04810655c1d7c993819b88e7b32029832 100644 (file)
 from __future__ import unicode_literals
 
+import functools
+import os.path
+import re
+
 from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_parse_urlencode,
+    compat_urlparse,
+)
 from ..utils import (
-    remove_end,
+    int_or_none,
+    OnDemandPagedList,
     parse_duration,
+    remove_start,
+    xpath_text,
+    xpath_attr,
 )
 
 
 class NBAIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:watch\.|www\.)?nba\.com/(?:nba/)?video(?P<id>/[^?]*?)/?(?:/index\.html)?(?:\?.*)?$'
+    _VALID_URL = r'https?://(?:watch\.|www\.)?nba\.com/(?P<path>(?:[^/]+/)+(?P<id>[^?]*?))/?(?:/index\.html)?(?:\?.*)?$'
     _TESTS = [{
         'url': 'http://www.nba.com/video/games/nets/2012/12/04/0021200253-okc-bkn-recap.nba/index.html',
-        'md5': 'c0edcfc37607344e2ff8f13c378c88a4',
+        'md5': '9e7729d3010a9c71506fd1248f74e4f4',
         'info_dict': {
-            'id': '0021200253-okc-bkn-recap.nba',
+            'id': '0021200253-okc-bkn-recap',
             'ext': 'mp4',
             'title': 'Thunder vs. Nets',
             'description': 'Kevin Durant scores 32 points and dishes out six assists as the Thunder beat the Nets in Brooklyn.',
             'duration': 181,
+            'timestamp': 1354638466,
+            'upload_date': '20121204',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
         },
     }, {
         'url': 'http://www.nba.com/video/games/hornets/2014/12/05/0021400276-nyk-cha-play5.nba/',
         'only_matching': True,
     }, {
-        'url': 'http://watch.nba.com/nba/video/channels/playoffs/2015/05/20/0041400301-cle-atl-recap.nba',
+        'url': 'http://watch.nba.com/video/channels/playoffs/2015/05/20/0041400301-cle-atl-recap.nba',
+        'md5': 'b2b39b81cf28615ae0c3360a3f9668c4',
         'info_dict': {
-            'id': '0041400301-cle-atl-recap.nba',
+            'id': '0041400301-cle-atl-recap',
             'ext': 'mp4',
-            'title': 'NBA GAME TIME | Video: Hawks vs. Cavaliers Game 1',
+            'title': 'Hawks vs. Cavaliers Game 1',
             'description': 'md5:8094c3498d35a9bd6b1a8c396a071b4d',
             'duration': 228,
+            'timestamp': 1432134543,
+            'upload_date': '20150520',
+        }
+    }, {
+        'url': 'http://www.nba.com/clippers/news/doc-rivers-were-not-trading-blake',
+        'info_dict': {
+            'id': '1455672027478-Doc_Feb16_720',
+            'ext': 'mp4',
+            'title': 'Practice: Doc Rivers - 2/16/16',
+            'description': 'Head Coach Doc Rivers addresses the media following practice.',
+            'upload_date': '20160217',
+            'timestamp': 1455672000,
         },
         'params': {
+            # m3u8 download
             'skip_download': True,
-        }
+        },
+    }, {
+        'url': 'http://www.nba.com/timberwolves/wiggins-shootaround#',
+        'info_dict': {
+            'id': 'timberwolves',
+            'title': 'Shootaround Access - Dec. 12 | Andrew Wiggins',
+        },
+        'playlist_count': 30,
+        'params': {
+            # Downloading the whole playlist takes too long
+            'playlist_items': '1-30',
+        },
+    }, {
+        'url': 'http://www.nba.com/timberwolves/wiggins-shootaround#',
+        'info_dict': {
+            'id': 'Wigginsmp4',
+            'ext': 'mp4',
+            'title': 'Shootaround Access - Dec. 12 | Andrew Wiggins',
+            'description': 'Wolves rookie Andrew Wiggins addresses the media after Friday\'s shootaround.',
+            'upload_date': '20141212',
+            'timestamp': 1418418600,
+        },
+        'params': {
+            'noplaylist': True,
+            # m3u8 download
+            'skip_download': True,
+        },
     }]
 
+    _PAGE_SIZE = 30
+
+    def _fetch_page(self, team, video_id, page):
+        search_url = 'http://searchapp2.nba.com/nba-search/query.jsp?' + compat_urllib_parse_urlencode({
+            'type': 'teamvideo',
+            'start': page * self._PAGE_SIZE + 1,
+            'npp': (page + 1) * self._PAGE_SIZE + 1,
+            'sort': 'recent',
+            'output': 'json',
+            'site': team,
+        })
+        results = self._download_json(
+            search_url, video_id, note='Download page %d of playlist data' % page)['results'][0]
+        for item in results:
+            yield self.url_result(compat_urlparse.urljoin('http://www.nba.com/', item['url']))
+
+    def _extract_playlist(self, orig_path, video_id, webpage):
+        team = orig_path.split('/')[0]
+
+        if self._downloader.params.get('noplaylist'):
+            self.to_screen('Downloading just video because of --no-playlist')
+            video_path = self._search_regex(
+                r'nbaVideoCore\.firstVideo\s*=\s*\'([^\']+)\';', webpage, 'video path')
+            video_url = 'http://www.nba.com/%s/video/%s' % (team, video_path)
+            return self.url_result(video_url)
+
+        self.to_screen('Downloading playlist - add --no-playlist to just download video')
+        playlist_title = self._og_search_title(webpage, fatal=False)
+        entries = OnDemandPagedList(
+            functools.partial(self._fetch_page, team, video_id),
+            self._PAGE_SIZE, use_cache=True)
+
+        return self.playlist_result(entries, team, playlist_title)
+
     def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
+        path, video_id = re.match(self._VALID_URL, url).groups()
+        orig_path = path
+        if path.startswith('nba/'):
+            path = path[3:]
+
+        if 'video/' not in path:
+            webpage = self._download_webpage(url, video_id)
+            path = remove_start(self._search_regex(r'data-videoid="([^"]+)"', webpage, 'video id'), '/')
+
+            if path == '{{id}}':
+                return self._extract_playlist(orig_path, video_id, webpage)
+
+            # See prepareContentId() of pkgCvp.js
+            if path.startswith('video/teams'):
+                path = 'video/channels/proxy/' + path[6:]
 
-        video_url = 'http://ht-mobile.cdn.turner.com/nba/big' + video_id + '_nba_1280x720.mp4'
+        video_info = self._download_xml('http://www.nba.com/%s.xml' % path, video_id)
+        video_id = os.path.splitext(xpath_text(video_info, 'slug'))[0]
+        title = xpath_text(video_info, 'headline')
+        description = xpath_text(video_info, 'description')
+        duration = parse_duration(xpath_text(video_info, 'length'))
+        timestamp = int_or_none(xpath_attr(video_info, 'dateCreated', 'uts'))
 
-        shortened_video_id = video_id.rpartition('/')[2]
-        title = remove_end(
-            self._og_search_title(webpage, default=shortened_video_id), ' : NBA.com')
+        thumbnails = []
+        for image in video_info.find('images'):
+            thumbnails.append({
+                'id': image.attrib.get('cut'),
+                'url': image.text,
+                'width': int_or_none(image.attrib.get('width')),
+                'height': int_or_none(image.attrib.get('height')),
+            })
 
-        description = self._og_search_description(webpage)
-        duration_str = self._html_search_meta(
-            'duration', webpage, 'duration', default=None)
-        if not duration_str:
-            duration_str = self._html_search_regex(
-                r'Duration:</b>\s*(\d+:\d+)', webpage, 'duration', fatal=False)
-        duration = parse_duration(duration_str)
+        formats = []
+        for video_file in video_info.findall('.//file'):
+            video_url = video_file.text
+            if video_url.startswith('/'):
+                continue
+            if video_url.endswith('.m3u8'):
+                formats.extend(self._extract_m3u8_formats(video_url, video_id, ext='mp4', m3u8_id='hls', fatal=False))
+            elif video_url.endswith('.f4m'):
+                formats.extend(self._extract_f4m_formats(video_url + '?hdcore=3.4.1.1', video_id, f4m_id='hds', fatal=False))
+            else:
+                key = video_file.attrib.get('bitrate')
+                format_info = {
+                    'format_id': key,
+                    'url': video_url,
+                }
+                mobj = re.search(r'(\d+)x(\d+)(?:_(\d+))?', key)
+                if mobj:
+                    format_info.update({
+                        'width': int(mobj.group(1)),
+                        'height': int(mobj.group(2)),
+                        'tbr': int_or_none(mobj.group(3)),
+                    })
+                formats.append(format_info)
+        self._sort_formats(formats)
 
         return {
-            'id': shortened_video_id,
-            'url': video_url,
+            'id': video_id,
             'title': title,
             'description': description,
             'duration': duration,
+            'timestamp': timestamp,
+            'thumbnails': thumbnails,
+            'formats': formats,
         }
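
Team pages are now paged through the NBA search API 30 results at a time; functools.partial pins the team so that OnDemandPagedList only has to supply the page number (and, with use_cache=True, each page is fetched at most once). The pagination arithmetic, isolated:

import functools

PAGE_SIZE = 30

def fetch_page(team, page):
    # Mirrors _fetch_page: 'start' and 'npp' are 1-based offsets into
    # the search results; the real code downloads JSON and yields URLs.
    return {
        'type': 'teamvideo',
        'start': page * PAGE_SIZE + 1,
        'npp': (page + 1) * PAGE_SIZE + 1,
        'sort': 'recent',
        'output': 'json',
        'site': team,
    }

get_page = functools.partial(fetch_page, 'timberwolves')
print(get_page(0)['start'], get_page(0)['npp'])  # 1 31
print(get_page(1)['start'], get_page(1)['npp'])  # 31 61
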
diff --git a/youtube_dl/extractor/nbc.py b/youtube_dl/extractor/nbc.py
index dc2091be0d0c8706b2f3b6d78d88fa22fcb8b6d1..46504cd5ff6aafa40d662caa58eaec08b1c88a48 100644 (file)
@@ -3,15 +3,16 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_str,
-    compat_HTTPError,
-)
+from .theplatform import ThePlatformIE
 from ..utils import (
-    ExtractorError,
     find_xpath_attr,
     lowercase_escape,
+    smuggle_url,
     unescapeHTML,
+    update_url_query,
+    int_or_none,
+    HEADRequest,
+    parse_iso8601,
 )
 
 
@@ -21,38 +22,51 @@ class NBCIE(InfoExtractor):
     _TESTS = [
         {
             'url': 'http://www.nbc.com/the-tonight-show/segments/112966',
-            # md5 checksum is not stable
             'info_dict': {
-                'id': 'c9xnCo0YPOPH',
-                'ext': 'flv',
+                'id': '112966',
+                'ext': 'mp4',
                 'title': 'Jimmy Fallon Surprises Fans at Ben & Jerry\'s',
                 'description': 'Jimmy gives out free scoops of his new "Tonight Dough" ice cream flavor by surprising customers at the Ben & Jerry\'s scoop shop.',
+                'timestamp': 1424246400,
+                'upload_date': '20150218',
+                'uploader': 'NBCU-COM',
+            },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
             },
         },
         {
             'url': 'http://www.nbc.com/the-tonight-show/episodes/176',
             'info_dict': {
-                'id': 'XwU9KZkp98TH',
+                'id': '176',
                 'ext': 'flv',
                 'title': 'Ricky Gervais, Steven Van Zandt, ILoveMakonnen',
                 'description': 'A brand new episode of The Tonight Show welcomes Ricky Gervais, Steven Van Zandt and ILoveMakonnen.',
             },
-            'skip': 'Only works from US',
+            'skip': '404 Not Found',
         },
         {
             'url': 'http://www.nbc.com/saturday-night-live/video/star-wars-teaser/2832821',
             'info_dict': {
-                'id': '8iUuyzWDdYUZ',
-                'ext': 'flv',
+                'id': '2832821',
+                'ext': 'mp4',
                 'title': 'Star Wars Teaser',
                 'description': 'md5:0b40f9cbde5b671a7ff62fceccc4f442',
+                'timestamp': 1417852800,
+                'upload_date': '20141206',
+                'uploader': 'NBCU-COM',
+            },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
             },
             'skip': 'Only works from US',
         },
         {
             # This video has expired but with an escaped embedURL
             'url': 'http://www.nbc.com/parenthood/episode-guide/season-5/just-like-at-home/515',
-            'skip': 'Expired'
+            'only_matching': True,
         }
     ]
 
@@ -62,12 +76,18 @@ class NBCIE(InfoExtractor):
         theplatform_url = unescapeHTML(lowercase_escape(self._html_search_regex(
             [
                 r'(?:class="video-player video-player-full" data-mpx-url|class="player" src)="(.*?)"',
+                r'<iframe[^>]+src="((?:https?:)?//player\.theplatform\.com/[^"]+)"',
                 r'"embedURL"\s*:\s*"([^"]+)"'
             ],
             webpage, 'theplatform url').replace('_no_endcard', '').replace('\\/', '/')))
         if theplatform_url.startswith('//'):
             theplatform_url = 'http:' + theplatform_url
-        return self.url_result(theplatform_url)
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'ThePlatform',
+            'url': smuggle_url(theplatform_url, {'source_url': url}),
+            'id': video_id,
+        }
 
 
 class NBCSportsVPlayerIE(InfoExtractor):
@@ -80,6 +100,9 @@ class NBCSportsVPlayerIE(InfoExtractor):
             'ext': 'flv',
             'description': 'md5:df390f70a9ba7c95ff1daace988f0d8d',
             'title': 'Tyler Kalinoski hits buzzer-beater to lift Davidson',
+            'timestamp': 1426270238,
+            'upload_date': '20150313',
+            'uploader': 'NBCU-SPORTS',
         }
     }, {
         'url': 'http://vplayer.nbcsports.com/p/BxmELC/nbc_embedshare/select/_hqLjQ95yx8Z',
@@ -101,8 +124,8 @@ class NBCSportsVPlayerIE(InfoExtractor):
 
 
 class NBCSportsIE(InfoExtractor):
-    # Does not include https becuase its certificate is invalid
-    _VALID_URL = r'http://www\.nbcsports\.com//?(?:[^/]+/)+(?P<id>[0-9a-z-]+)'
+    # Does not include https because its certificate is invalid
+    _VALID_URL = r'https?://www\.nbcsports\.com//?(?:[^/]+/)+(?P<id>[0-9a-z-]+)'
 
     _TEST = {
         'url': 'http://www.nbcsports.com//college-basketball/ncaab/tom-izzo-michigan-st-has-so-much-respect-duke',
@@ -111,6 +134,9 @@ class NBCSportsIE(InfoExtractor):
             'ext': 'flv',
             'title': 'Tom Izzo, Michigan St. has \'so much respect\' for Duke',
             'description': 'md5:ecb459c9d59e0766ac9c7d5d0eda8113',
+            'uploader': 'NBCU-SPORTS',
+            'upload_date': '20150330',
+            'timestamp': 1427726529,
         }
     }
 
@@ -121,10 +147,37 @@ class NBCSportsIE(InfoExtractor):
             NBCSportsVPlayerIE._extract_url(webpage), 'NBCSportsVPlayer')
 
 
-class NBCNewsIE(InfoExtractor):
-    _VALID_URL = r'''(?x)https?://(?:www\.)?nbcnews\.com/
+class CSNNEIE(InfoExtractor):
+    _VALID_URL = r'https?://www\.csnne\.com/video/(?P<id>[0-9a-z-]+)'
+
+    _TEST = {
+        'url': 'http://www.csnne.com/video/snc-evening-update-wright-named-red-sox-no-5-starter',
+        'info_dict': {
+            'id': 'yvBLLUgQ8WU0',
+            'ext': 'mp4',
+            'title': 'SNC evening update: Wright named Red Sox\' No. 5 starter.',
+            'description': 'md5:1753cfee40d9352b19b4c9b3e589b9e3',
+            'timestamp': 1459369979,
+            'upload_date': '20160330',
+            'uploader': 'NBCU-SPORTS',
+        }
+    }
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'ThePlatform',
+            'url': self._html_search_meta('twitter:player:stream', webpage),
+            'display_id': display_id,
+        }
+
+
+class NBCNewsIE(ThePlatformIE):
+    _VALID_URL = r'''(?x)https?://(?:www\.)?(?:nbcnews|today)\.com/
         (?:video/.+?/(?P<id>\d+)|
-        (?:feature|nightly-news)/[^/]+/(?P<title>.+))
+        ([^/]+/)*(?P<display_id>[^/?]+))
         '''
 
     _TESTS = [
@@ -139,15 +192,14 @@ class NBCNewsIE(InfoExtractor):
             },
         },
         {
-            'url': 'http://www.nbcnews.com/feature/edward-snowden-interview/how-twitter-reacted-snowden-interview-n117236',
-            'md5': 'b2421750c9f260783721d898f4c42063',
+            'url': 'http://www.nbcnews.com/watch/nbcnews-com/how-twitter-reacted-to-the-snowden-interview-269389891880',
+            'md5': 'af1adfa51312291a017720403826bb64',
             'info_dict': {
-                'id': 'I1wpAI_zmhsQ',
+                'id': '269389891880',
                 'ext': 'mp4',
                 'title': 'How Twitter Reacted To The Snowden Interview',
                 'description': 'md5:65a0bd5d76fe114f3c2727aa3a81fe64',
             },
-            'add_ie': ['ThePlatform'],
         },
         {
             'url': 'http://www.nbcnews.com/feature/dateline-full-episodes/full-episode-family-business-n285156',
@@ -158,17 +210,45 @@ class NBCNewsIE(InfoExtractor):
                 'title': 'FULL EPISODE: Family Business',
                 'description': 'md5:757988edbaae9d7be1d585eb5d55cc04',
             },
+            'skip': 'This page is unavailable.',
         },
         {
             'url': 'http://www.nbcnews.com/nightly-news/video/nightly-news-with-brian-williams-full-broadcast-february-4-394064451844',
-            'md5': 'b5dda8cddd8650baa0dcb616dd2cf60d',
+            'md5': '73135a2e0ef819107bbb55a5a9b2a802',
             'info_dict': {
-                'id': 'sekXqyTVnmN3',
+                'id': '394064451844',
                 'ext': 'mp4',
                 'title': 'Nightly News with Brian Williams Full Broadcast (February 4)',
                 'description': 'md5:1c10c1eccbe84a26e5debb4381e2d3c5',
             },
         },
+        {
+            'url': 'http://www.nbcnews.com/business/autos/volkswagen-11-million-vehicles-could-have-suspect-software-emissions-scandal-n431456',
+            'md5': 'a49e173825e5fcd15c13fc297fced39d',
+            'info_dict': {
+                'id': '529953347624',
+                'ext': 'mp4',
+                'title': 'Volkswagen U.S. Chief: We \'Totally Screwed Up\'',
+                'description': 'md5:d22d1281a24f22ea0880741bb4dd6301',
+            },
+            'expected_warnings': ['http-6000 is not available']
+        },
+        {
+            'url': 'http://www.today.com/video/see-the-aurora-borealis-from-space-in-stunning-new-nasa-video-669831235788',
+            'md5': '118d7ca3f0bea6534f119c68ef539f71',
+            'info_dict': {
+                'id': '669831235788',
+                'ext': 'mp4',
+                'title': 'See the aurora borealis from space in stunning new NASA video',
+                'description': 'md5:74752b7358afb99939c5f8bb2d1d04b1',
+                'upload_date': '20160420',
+                'timestamp': 1461152093,
+            },
+        },
+        {
+            'url': 'http://www.nbcnews.com/watch/dateline/full-episode--deadly-betrayal-386250819952',
+            'only_matching': True,
+        },
     ]
 
     def _real_extract(self, url):
@@ -183,52 +263,111 @@ class NBCNewsIE(InfoExtractor):
                 'title': info.find('headline').text,
                 'ext': 'flv',
                 'url': find_xpath_attr(info, 'media', 'type', 'flashVideo').text,
-                'description': compat_str(info.find('caption').text),
+                'description': info.find('caption').text,
                 'thumbnail': find_xpath_attr(info, 'media', 'type', 'thumbnail').text,
             }
         else:
             # "feature" and "nightly-news" pages use theplatform.com
-            title = mobj.group('title')
-            webpage = self._download_webpage(url, title)
+            display_id = mobj.group('display_id')
+            webpage = self._download_webpage(url, display_id)
+            info = None
             bootstrap_json = self._search_regex(
-                r'var\s+(?:bootstrapJson|playlistData)\s*=\s*({.+});?\s*$',
-                webpage, 'bootstrap json', flags=re.MULTILINE)
-            bootstrap = self._parse_json(bootstrap_json, video_id)
-            info = bootstrap['results'][0]['video']
-            mpxid = info['mpxId']
-
-            base_urls = [
-                info['fallbackPlaylistUrl'],
-                info['associatedPlaylistUrl'],
-            ]
-
-            for base_url in base_urls:
-                if not base_url:
-                    continue
-                playlist_url = base_url + '?form=MPXNBCNewsAPI'
+                r'(?m)var\s+(?:bootstrapJson|playlistData)\s*=\s*({.+});?\s*$',
+                webpage, 'bootstrap json', default=None)
+            if bootstrap_json:
+                bootstrap = self._parse_json(bootstrap_json, display_id)
+                info = bootstrap['results'][0]['video']
+            else:
+                player_instance_json = self._search_regex(
+                    r'videoObj\s*:\s*({.+})', webpage, 'player instance', default=None)
+                if not player_instance_json:
+                    player_instance_json = self._html_search_regex(
+                        r'data-video="([^"]+)"', webpage, 'video json')
+                info = self._parse_json(player_instance_json, display_id)
+            video_id = info['mpxId']
+            title = info['title']
 
-                try:
-                    all_videos = self._download_json(playlist_url, title)
-                except ExtractorError as ee:
-                    if isinstance(ee.cause, compat_HTTPError):
-                        continue
-                    raise
+            subtitles = {}
+            caption_links = info.get('captionLinks')
+            if caption_links:
+                for (sub_key, sub_ext) in (('smpte-tt', 'ttml'), ('web-vtt', 'vtt'), ('srt', 'srt')):
+                    sub_url = caption_links.get(sub_key)
+                    if sub_url:
+                        subtitles.setdefault('en', []).append({
+                            'url': sub_url,
+                            'ext': sub_ext,
+                        })
 
-                if not all_videos or 'videos' not in all_videos:
+            formats = []
+            for video_asset in info['videoAssets']:
+                video_url = video_asset.get('publicUrl')
+                if not video_url:
                     continue
-
-                try:
-                    info = next(v for v in all_videos['videos'] if v['mpxId'] == mpxid)
-                    break
-                except StopIteration:
+                container = video_asset.get('format')
+                asset_type = video_asset.get('assetType') or ''
+                if container == 'ISM' or asset_type == 'FireTV-Once':
                     continue
-
-            if info is None:
-                raise ExtractorError('Could not find video in playlists')
+                elif asset_type == 'OnceURL':
+                    tp_formats, tp_subtitles = self._extract_theplatform_smil(
+                        video_url, video_id)
+                    formats.extend(tp_formats)
+                    subtitles = self._merge_subtitles(subtitles, tp_subtitles)
+                else:
+                    tbr = int_or_none(video_asset.get('bitRate') or video_asset.get('bitrate'), 1000)
+                    format_id = 'http%s' % ('-%d' % tbr if tbr else '')
+                    video_url = update_url_query(
+                        video_url, {'format': 'redirect'})
+                    # resolve the url so that we can check availability and detect the correct extension
+                    head = self._request_webpage(
+                        HEADRequest(video_url), video_id,
+                        'Checking %s url' % format_id,
+                        '%s is not available' % format_id,
+                        fatal=False)
+                    if head:
+                        video_url = head.geturl()
+                        formats.append({
+                            'format_id': format_id,
+                            'url': video_url,
+                            'width': int_or_none(video_asset.get('width')),
+                            'height': int_or_none(video_asset.get('height')),
+                            'tbr': tbr,
+                            'container': video_asset.get('format'),
+                        })
+            self._sort_formats(formats)
 
             return {
-                '_type': 'url',
-                # We get the best quality video
-                'url': info['videoAssets'][-1]['publicUrl'],
-                'ie_key': 'ThePlatform',
+                'id': video_id,
+                'title': title,
+                'description': info.get('description'),
+                'thumbnail': info.get('thumbnail'),
+                'duration': int_or_none(info.get('duration')),
+                'timestamp': parse_iso8601(info.get('pubDate') or info.get('pub_date')),
+                'formats': formats,
+                'subtitles': subtitles,
             }
+
+
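
The http format probing above leans on HEADRequest from youtube_dl/utils.py, which only overrides the HTTP verb so _request_webpage can follow redirects without downloading the body; head.geturl() then yields the final resolved URL, whose extension can be trusted. The helper is essentially:

    class HEADRequest(compat_urllib_request.Request):
        def get_method(self):
            return 'HEAD'
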
+class MSNBCIE(InfoExtractor):
+    # https URLs redirect to corresponding http ones
+    _VALID_URL = r'https?://www\.msnbc\.com/[^/]+/watch/(?P<id>[^/]+)'
+    _TEST = {
+        'url': 'http://www.msnbc.com/all-in-with-chris-hayes/watch/the-chaotic-gop-immigration-vote-314487875924',
+        'md5': '6d236bf4f3dddc226633ce6e2c3f814d',
+        'info_dict': {
+            'id': 'n_hayes_Aimm_140801_272214',
+            'ext': 'mp4',
+            'title': 'The chaotic GOP immigration vote',
+            'description': 'The Republican House votes on a border bill that has no chance of getting through the Senate or signed by the President and is drawing criticism from all sides.',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'timestamp': 1406937606,
+            'upload_date': '20140802',
+            'uploader': 'NBCU-NEWS',
+            'categories': ['MSNBC/Topics/Franchise/Best of last night', 'MSNBC/Topics/General/Congress'],
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        embed_url = self._html_search_meta('embedURL', webpage)
+        return self.url_result(embed_url)
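
MSNBCIE, by contrast, returns a plain url result, so none of its own fields survive the hand-off; url_result in extractor/common.py amounts to:

    def url_result(url, ie=None, video_id=None):
        # Minimal "hand this URL to another extractor" dict
        video_info = {
            '_type': 'url',
            'url': url,
            'ie_key': ie,
        }
        if video_id is not None:
            video_info['id'] = video_id
        return video_info
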
index 79a13958b05e25a1c9e586168bb3a10742fbe01f..0cded6b5c3d0bbcb095de8672de70fa81b9f7fd1 100644 (file)
-# encoding: utf-8
+# coding: utf-8
 from __future__ import unicode_literals
 
 import re
 
 from .common import InfoExtractor
 from ..utils import (
-    ExtractorError,
+    determine_ext,
     int_or_none,
+    parse_iso8601,
     qualities,
-    parse_duration,
 )
 
 
 class NDRBaseIE(InfoExtractor):
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        display_id = next(group for group in mobj.groups() if group)
+        webpage = self._download_webpage(url, display_id)
+        return self._extract_embed(webpage, display_id)
 
-        page = self._download_webpage(url, video_id, 'Downloading page')
 
-        title = self._og_search_title(page).strip()
-        description = self._og_search_description(page)
-        if description:
-            description = description.strip()
+class NDRIE(NDRBaseIE):
+    IE_NAME = 'ndr'
+    IE_DESC = 'NDR.de - Norddeutscher Rundfunk'
+    _VALID_URL = r'https?://www\.ndr\.de/(?:[^/]+/)*(?P<id>[^/?#]+),[\da-z]+\.html'
+    _TESTS = [{
+        # httpVideo, same content id
+        'url': 'http://www.ndr.de/fernsehen/Party-Poette-und-Parade,hafengeburtstag988.html',
+        'md5': '6515bc255dc5c5f8c85bbc38e035a659',
+        'info_dict': {
+            'id': 'hafengeburtstag988',
+            'display_id': 'Party-Poette-und-Parade',
+            'ext': 'mp4',
+            'title': 'Party, Pötte und Parade',
+            'description': 'md5:ad14f9d2f91d3040b6930c697e5f6b4c',
+            'uploader': 'ndrtv',
+            'timestamp': 1431108900,
+            'upload_date': '20150510',
+            'duration': 3498,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # httpVideo, different content id
+        'url': 'http://www.ndr.de/sport/fussball/40-Osnabrueck-spielt-sich-in-einen-Rausch,osna270.html',
+        'md5': '1043ff203eab307f0c51702ec49e9a71',
+        'info_dict': {
+            'id': 'osna272',
+            'display_id': '40-Osnabrueck-spielt-sich-in-einen-Rausch',
+            'ext': 'mp4',
+            'title': 'Osnabrück - Wehen Wiesbaden: Die Highlights',
+            'description': 'md5:32e9b800b3d2d4008103752682d5dc01',
+            'uploader': 'ndrtv',
+            'timestamp': 1442059200,
+            'upload_date': '20150912',
+            'duration': 510,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # httpAudio, same content id
+        'url': 'http://www.ndr.de/info/La-Valette-entgeht-der-Hinrichtung,audio51535.html',
+        'md5': 'bb3cd38e24fbcc866d13b50ca59307b8',
+        'info_dict': {
+            'id': 'audio51535',
+            'display_id': 'La-Valette-entgeht-der-Hinrichtung',
+            'ext': 'mp3',
+            'title': 'La Valette entgeht der Hinrichtung',
+            'description': 'md5:22f9541913a40fe50091d5cdd7c9f536',
+            'uploader': 'ndrinfo',
+            'timestamp': 1290626100,
+            'upload_date': '20140729',
+            'duration': 884,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://www.ndr.de/Fettes-Brot-Ferris-MC-und-Thees-Uhlmann-live-on-stage,festivalsommer116.html',
+        'only_matching': True,
+    }]
+
+    def _extract_embed(self, webpage, display_id):
+        embed_url = self._html_search_meta(
+            'embedURL', webpage, 'embed URL', fatal=True)
+        description = self._search_regex(
+            r'<p[^>]+itemprop="description">([^<]+)</p>',
+            webpage, 'description', default=None) or self._og_search_description(webpage)
+        timestamp = parse_iso8601(
+            self._search_regex(
+                r'<span[^>]+itemprop="(?:datePublished|uploadDate)"[^>]+content="([^"]+)"',
+                webpage, 'upload date', fatal=False))
+        return {
+            '_type': 'url_transparent',
+            'url': embed_url,
+            'display_id': display_id,
+            'description': description,
+            'timestamp': timestamp,
+        }
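
parse_iso8601 from youtube_dl/utils.py converts the itemprop datetime attribute into a POSIX timestamp and passes None through, which is why pairing it with a fatal=False regex is safe here. For instance (illustrative value, offset chosen to match German summer time):

    from youtube_dl.utils import parse_iso8601

    parse_iso8601('2015-05-10T12:15:00+02:00')  # -> 1431252900 (12:15 CEST is 10:15 UTC)
    parse_iso8601(None)                         # -> None
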
 
-        duration = int_or_none(self._html_search_regex(r'duration: (\d+),\n', page, 'duration', default=None))
-        if not duration:
-            duration = parse_duration(self._html_search_regex(
-                r'(<span class="min">\d+</span>:<span class="sec">\d+</span>)',
-                page, 'duration', default=None))
 
-        formats = []
+class NJoyIE(NDRBaseIE):
+    IE_NAME = 'njoy'
+    IE_DESC = 'N-JOY'
+    _VALID_URL = r'https?://www\.n-joy\.de/(?:[^/]+/)*(?:(?P<display_id>[^/?#]+),)?(?P<id>[\da-z]+)\.html'
+    _TESTS = [{
+        # httpVideo, same content id
+        'url': 'http://www.n-joy.de/entertainment/comedy/comedy_contest/Benaissa-beim-NDR-Comedy-Contest,comedycontest2480.html',
+        'md5': 'cb63be60cd6f9dd75218803146d8dc67',
+        'info_dict': {
+            'id': 'comedycontest2480',
+            'display_id': 'Benaissa-beim-NDR-Comedy-Contest',
+            'ext': 'mp4',
+            'title': 'Benaissa beim NDR Comedy Contest',
+            'description': 'md5:f057a6c4e1c728b10d33b5ffd36ddc39',
+            'uploader': 'ndrtv',
+            'upload_date': '20141129',
+            'duration': 654,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # httpVideo, different content id
+        'url': 'http://www.n-joy.de/musik/Das-frueheste-DJ-Set-des-Nordens-live-mit-Felix-Jaehn-,felixjaehn168.html',
+        'md5': '417660fffa90e6df2fda19f1b40a64d8',
+        'info_dict': {
+            'id': 'dockville882',
+            'display_id': 'Das-frueheste-DJ-Set-des-Nordens-live-mit-Felix-Jaehn-',
+            'ext': 'mp4',
+            'title': '"Ich hab noch nie" mit Felix Jaehn',
+            'description': 'md5:85dd312d53be1b99e1f998a16452a2f3',
+            'uploader': 'njoy',
+            'upload_date': '20150822',
+            'duration': 211,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.n-joy.de/radio/webradio/morningshow209.html',
+        'only_matching': True,
+    }]
+
+    def _extract_embed(self, webpage, display_id):
+        video_id = self._search_regex(
+            r'<iframe[^>]+id="pp_([\da-z]+)"', webpage, 'embed id')
+        description = self._search_regex(
+            r'<div[^>]+class="subline"[^>]*>[^<]+</div>\s*<p>([^<]+)</p>',
+            webpage, 'description', fatal=False)
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'NDREmbedBase',
+            'url': 'ndr:%s' % video_id,
+            'display_id': display_id,
+            'description': description,
+        }
+
+
+class NDREmbedBaseIE(InfoExtractor):
+    IE_NAME = 'ndr:embed:base'
+    _VALID_URL = r'(?:ndr:(?P<id_s>[\da-z]+)|https?://www\.ndr\.de/(?P<id>[\da-z]+)-ppjson\.json)'
+    _TESTS = [{
+        'url': 'ndr:soundcheck3366',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.ndr.de/soundcheck3366-ppjson.json',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id') or mobj.group('id_s')
+
+        ppjson = self._download_json(
+            'http://www.ndr.de/%s-ppjson.json' % video_id, video_id)
 
-        mp3_url = re.search(r'''\{src:'(?P<audio>[^']+)', type:"audio/mp3"},''', page)
-        if mp3_url:
-            formats.append({
-                'url': mp3_url.group('audio'),
-                'format_id': 'mp3',
-            })
+        playlist = ppjson['playlist']
 
-        thumbnail = None
+        formats = []
+        quality_key = qualities(('xs', 's', 'm', 'l', 'xl'))
 
-        video_url = re.search(r'''3: \{src:'(?P<video>.+?)\.(lo|hi|hq)\.mp4', type:"video/mp4"},''', page)
-        if video_url:
-            thumbnails = re.findall(r'''\d+: \{src: "([^"]+)"(?: \|\| '[^']+')?, quality: '([^']+)'}''', page)
-            if thumbnails:
-                quality_key = qualities(['xs', 's', 'm', 'l', 'xl'])
-                largest = max(thumbnails, key=lambda thumb: quality_key(thumb[1]))
-                thumbnail = 'http://www.ndr.de' + largest[0]
+        for format_id, f in playlist.items():
+            src = f.get('src')
+            if not src:
+                continue
+            ext = determine_ext(src, None)
+            if ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    src + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id, f4m_id='hds'))
+            elif ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    src, video_id, 'mp4', m3u8_id='hls', entry_protocol='m3u8_native'))
+            else:
+                quality = f.get('quality')
+                ff = {
+                    'url': src,
+                    'format_id': quality or format_id,
+                    'quality': quality_key(quality),
+                }
+                type_ = f.get('type')
+                if type_ and type_.split('/')[0] == 'audio':
+                    ff['vcodec'] = 'none'
+                    ff['ext'] = ext or 'mp3'
+                formats.append(ff)
+        self._sort_formats(formats)
 
-            for format_id in 'lo', 'hi', 'hq':
-                formats.append({
-                    'url': '%s.%s.mp4' % (video_url.group('video'), format_id),
-                    'format_id': format_id,
-                })
+        config = playlist['config']
 
-        if not formats:
-            raise ExtractorError('No media links available for %s' % video_id)
+        live = playlist.get('config', {}).get('streamType') in ['httpVideoLive', 'httpAudioLive']
+        title = config['title']
+        if live:
+            title = self._live_title(title)
+        uploader = ppjson.get('config', {}).get('branding')
+        upload_date = ppjson.get('config', {}).get('publicationDate')
+        duration = int_or_none(config.get('duration'))
+
+        thumbnails = [{
+            'id': thumbnail.get('quality') or thumbnail_id,
+            'url': thumbnail['src'],
+            'preference': quality_key(thumbnail.get('quality')),
+        } for thumbnail_id, thumbnail in config.get('poster', {}).items() if thumbnail.get('src')]
 
         return {
             'id': video_id,
             'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
+            'is_live': live,
+            'uploader': uploader if uploader != '-' else None,
+            'upload_date': upload_date[0:8] if upload_date else None,
             'duration': duration,
+            'thumbnails': thumbnails,
             'formats': formats,
         }
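
The quality_key used above for both formats and thumbnails comes from qualities() in youtube_dl/utils.py, which turns a preference tuple into a ranking function; unknown or missing labels sort lowest:

    def qualities(quality_ids):
        def q(qid):
            try:
                return quality_ids.index(qid)
            except ValueError:
                return -1
        return q

    quality_key = qualities(('xs', 's', 'm', 'l', 'xl'))
    quality_key('m')   # -> 2
    quality_key(None)  # -> -1
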
 
 
-class NDRIE(NDRBaseIE):
-    IE_NAME = 'ndr'
-    IE_DESC = 'NDR.de - Mediathek'
-    _VALID_URL = r'https?://www\.ndr\.de/.+?(?P<id>\d+)\.html'
-
-    _TESTS = [
-        {
-            'url': 'http://www.ndr.de/fernsehen/sendungen/nordmagazin/Kartoffeltage-in-der-Lewitz,nordmagazin25866.html',
-            'md5': '5bc5f5b92c82c0f8b26cddca34f8bb2c',
-            'note': 'Video file',
-            'info_dict': {
-                'id': '25866',
-                'ext': 'mp4',
-                'title': 'Kartoffeltage in der Lewitz',
-                'description': 'md5:48c4c04dde604c8a9971b3d4e3b9eaa8',
-                'duration': 166,
-            },
-            'skip': '404 Not found',
-        },
-        {
-            'url': 'http://www.ndr.de/fernsehen/Party-Poette-und-Parade,hafengeburtstag988.html',
-            'md5': 'dadc003c55ae12a5d2f6bd436cd73f59',
-            'info_dict': {
-                'id': '988',
-                'ext': 'mp4',
-                'title': 'Party, Pötte und Parade',
-                'description': 'Hunderttausende feiern zwischen Speicherstadt und St. Pauli den 826. Hafengeburtstag. Die NDR Sondersendung zeigt die schönsten und spektakulärsten Bilder vom Auftakt.',
-                'duration': 3498,
-            },
-        },
-        {
-            'url': 'http://www.ndr.de/info/audio51535.html',
-            'md5': 'bb3cd38e24fbcc866d13b50ca59307b8',
-            'note': 'Audio file',
-            'info_dict': {
-                'id': '51535',
-                'ext': 'mp3',
-                'title': 'La Valette entgeht der Hinrichtung',
-                'description': 'md5:22f9541913a40fe50091d5cdd7c9f536',
-                'duration': 884,
-            }
-        }
-    ]
-
+class NDREmbedIE(NDREmbedBaseIE):
+    IE_NAME = 'ndr:embed'
+    _VALID_URL = r'https?://www\.ndr\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)\.html'
+    _TESTS = [{
+        'url': 'http://www.ndr.de/fernsehen/sendungen/ndr_aktuell/ndraktuell28488-player.html',
+        'md5': '8b9306142fe65bbdefb5ce24edb6b0a9',
+        'info_dict': {
+            'id': 'ndraktuell28488',
+            'ext': 'mp4',
+            'title': 'Norddeutschland begrüßt Flüchtlinge',
+            'is_live': False,
+            'uploader': 'ndrtv',
+            'upload_date': '20150907',
+            'duration': 132,
+        },
+    }, {
+        'url': 'http://www.ndr.de/ndr2/events/soundcheck/soundcheck3366-player.html',
+        'md5': '002085c44bae38802d94ae5802a36e78',
+        'info_dict': {
+            'id': 'soundcheck3366',
+            'ext': 'mp4',
+            'title': 'Ella Henderson braucht Vergleiche nicht zu scheuen',
+            'is_live': False,
+            'uploader': 'ndr2',
+            'upload_date': '20150912',
+            'duration': 3554,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.ndr.de/info/audio51535-player.html',
+        'md5': 'bb3cd38e24fbcc866d13b50ca59307b8',
+        'info_dict': {
+            'id': 'audio51535',
+            'ext': 'mp3',
+            'title': 'La Valette entgeht der Hinrichtung',
+            'is_live': False,
+            'uploader': 'ndrinfo',
+            'upload_date': '20140729',
+            'duration': 884,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.ndr.de/fernsehen/sendungen/visite/visite11010-externalPlayer.html',
+        'md5': 'ae57f80511c1e1f2fd0d0d3d31aeae7c',
+        'info_dict': {
+            'id': 'visite11010',
+            'ext': 'mp4',
+            'title': 'Visite - die ganze Sendung',
+            'is_live': False,
+            'uploader': 'ndrtv',
+            'upload_date': '20150902',
+            'duration': 3525,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # httpVideoLive
+        'url': 'http://www.ndr.de/fernsehen/livestream/livestream217-externalPlayer.html',
+        'info_dict': {
+            'id': 'livestream217',
+            'ext': 'flv',
+            'title': 're:^NDR Fernsehen Niedersachsen \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
+            'is_live': True,
+            'upload_date': '20150910',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.ndr.de/ndrkultur/audio255020-player.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.ndr.de/fernsehen/sendungen/nordtour/nordtour7124-player.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.ndr.de/kultur/film/videos/videoimport10424-player.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.ndr.de/fernsehen/sendungen/hamburg_journal/hamj43006-player.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.ndr.de/fernsehen/sendungen/weltbilder/weltbilder4518-player.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.ndr.de/fernsehen/doku952-player.html',
+        'only_matching': True,
+    }]
 
-class NJoyIE(NDRBaseIE):
-    IE_NAME = 'N-JOY'
-    _VALID_URL = r'https?://www\.n-joy\.de/.+?(?P<id>\d+)\.html'
 
-    _TEST = {
-        'url': 'http://www.n-joy.de/entertainment/comedy/comedy_contest/Benaissa-beim-NDR-Comedy-Contest,comedycontest2480.html',
-        'md5': 'cb63be60cd6f9dd75218803146d8dc67',
+class NJoyEmbedIE(NDREmbedBaseIE):
+    IE_NAME = 'njoy:embed'
+    _VALID_URL = r'https?://www\.n-joy\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)_[^/]+\.html'
+    _TESTS = [{
+        # httpVideo
+        'url': 'http://www.n-joy.de/events/reeperbahnfestival/doku948-player_image-bc168e87-5263-4d6d-bd27-bb643005a6de_theme-n-joy.html',
+        'md5': '8483cbfe2320bd4d28a349d62d88bd74',
         'info_dict': {
-            'id': '2480',
+            'id': 'doku948',
             'ext': 'mp4',
-            'title': 'Benaissa beim NDR Comedy Contest',
-            'description': 'Von seinem sehr "behaarten" Leben lässt sich Benaissa trotz aller Schwierigkeiten nicht unterkriegen.',
-            'duration': 654,
-        }
-    }
+            'title': 'Zehn Jahre Reeperbahn Festival - die Doku',
+            'is_live': False,
+            'upload_date': '20150807',
+            'duration': 1011,
+        },
+    }, {
+        # httpAudio
+        'url': 'http://www.n-joy.de/news_wissen/stefanrichter100-player_image-d5e938b1-f21a-4b9a-86b8-aaba8bca3a13_theme-n-joy.html',
+        'md5': 'd989f80f28ac954430f7b8a48197188a',
+        'info_dict': {
+            'id': 'stefanrichter100',
+            'ext': 'mp3',
+            'title': 'Interview mit einem Augenzeugen',
+            'is_live': False,
+            'uploader': 'njoy',
+            'upload_date': '20150909',
+            'duration': 140,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # httpAudioLive, no explicit ext
+        'url': 'http://www.n-joy.de/news_wissen/webradioweltweit100-player_image-3fec0484-2244-4565-8fb8-ed25fd28b173_theme-n-joy.html',
+        'info_dict': {
+            'id': 'webradioweltweit100',
+            'ext': 'mp3',
+            'title': 're:^N-JOY Weltweit \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
+            'is_live': True,
+            'uploader': 'njoy',
+            'upload_date': '20150810',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.n-joy.de/musik/dockville882-player_image-3905259e-0803-4764-ac72-8b7de077d80a_theme-n-joy.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.n-joy.de/radio/sendungen/morningshow/urlaubsfotos190-player_image-066a5df1-5c95-49ec-a323-941d848718db_theme-n-joy.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.n-joy.de/entertainment/comedy/krudetv290-player_image-ab261bfe-51bf-4bf3-87ba-c5122ee35b3d_theme-n-joy.html',
+        'only_matching': True,
+    }]
index dff78e4862390e4e6468a34d804001d2156221a7..9feccc6723395db129b0d79d2ad5035493b12fda 100644 (file)
@@ -18,14 +18,14 @@ class NerdCubedFeedIE(InfoExtractor):
     }
 
     def _real_extract(self, url):
-        feed = self._download_json(url, url, "Downloading NerdCubed JSON feed")
+        feed = self._download_json(url, url, 'Downloading NerdCubed JSON feed')
 
         entries = [{
             '_type': 'url',
             'title': feed_entry['title'],
             'uploader': feed_entry['source']['name'] if feed_entry['source'] else None,
             'upload_date': datetime.datetime.strptime(feed_entry['date'], '%Y-%m-%d').strftime('%Y%m%d'),
-            'url': "http://www.youtube.com/watch?v=" + feed_entry['youtube_id'],
+            'url': 'http://www.youtube.com/watch?v=' + feed_entry['youtube_id'],
         } for feed_entry in feed]
 
         return {
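
The upload_date juggling in the entry dict is a pure format conversion from the feed's ISO-style date to the compact form youtube-dl expects (the date value is illustrative):

    import datetime

    datetime.datetime.strptime('2015-04-25', '%Y-%m-%d').strftime('%Y%m%d')  # -> '20150425'
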
diff --git a/youtube_dl/extractor/nerdist.py b/youtube_dl/extractor/nerdist.py
deleted file mode 100644 (file)
index c6dc34b..0000000
+++ /dev/null
@@ -1,80 +0,0 @@
-# encoding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-
-from ..utils import (
-    determine_ext,
-    parse_iso8601,
-    xpath_text,
-)
-
-
-class NerdistIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?nerdist\.com/vepisode/(?P<id>[^/?#]+)'
-    _TEST = {
-        'url': 'http://www.nerdist.com/vepisode/exclusive-which-dc-characters-w',
-        'md5': '3698ed582931b90d9e81e02e26e89f23',
-        'info_dict': {
-            'display_id': 'exclusive-which-dc-characters-w',
-            'id': 'RPHpvJyr',
-            'ext': 'mp4',
-            'title': 'Your TEEN TITANS Revealed! Who\'s on the show?',
-            'thumbnail': 're:^https?://.*/thumbs/.*\.jpg$',
-            'description': 'Exclusive: Find out which DC Comics superheroes will star in TEEN TITANS Live-Action TV Show on Nerdist News with Jessica Chobot!',
-            'uploader': 'Eric Diaz',
-            'upload_date': '20150202',
-            'timestamp': 1422892808,
-        }
-    }
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-
-        video_id = self._search_regex(
-            r'''(?x)<script\s+(?:type="text/javascript"\s+)?
-                src="https?://content\.nerdist\.com/players/([a-zA-Z0-9_]+)-''',
-            webpage, 'video ID')
-        timestamp = parse_iso8601(self._html_search_meta(
-            'shareaholic:article_published_time', webpage, 'upload date'))
-        uploader = self._html_search_meta(
-            'shareaholic:article_author_name', webpage, 'article author')
-
-        doc = self._download_xml(
-            'http://content.nerdist.com/jw6/%s.xml' % video_id, video_id)
-        video_info = doc.find('.//item')
-        title = xpath_text(video_info, './title', fatal=True)
-        description = xpath_text(video_info, './description')
-        thumbnail = xpath_text(
-            video_info, './{http://rss.jwpcdn.com/}image', 'thumbnail')
-
-        formats = []
-        for source in video_info.findall('./{http://rss.jwpcdn.com/}source'):
-            vurl = source.attrib['file']
-            ext = determine_ext(vurl)
-            if ext == 'm3u8':
-                formats.extend(self._extract_m3u8_formats(
-                    vurl, video_id, entry_protocol='m3u8_native', ext='mp4',
-                    preference=0))
-            elif ext == 'smil':
-                formats.extend(self._extract_smil_formats(
-                    vurl, video_id, fatal=False
-                ))
-            else:
-                formats.append({
-                    'format_id': ext,
-                    'url': vurl,
-                })
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'timestamp': timestamp,
-            'formats': formats,
-            'uploader': uploader,
-        }
index a8e0a64ed4933644965fd07c3eb3216fc532c915..978a05841ce68161330f9db24169dd330e51efc1 100644 (file)
@@ -8,11 +8,14 @@ import re
 
 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_request,
-    compat_urllib_parse,
+    compat_urllib_parse_urlencode,
     compat_str,
     compat_itertools_count,
 )
+from ..utils import (
+    sanitized_Request,
+    float_or_none,
+)
 
 
 class NetEaseMusicBaseIE(InfoExtractor):
@@ -32,23 +35,32 @@ class NetEaseMusicBaseIE(InfoExtractor):
         result = b64encode(m.digest()).decode('ascii')
         return result.replace('/', '_').replace('+', '-')
 
-    @classmethod
-    def extract_formats(cls, info):
+    def extract_formats(self, info):
         formats = []
-        for song_format in cls._FORMATS:
+        for song_format in self._FORMATS:
             details = info.get(song_format)
             if not details:
                 continue
-            formats.append({
-                'url': 'http://m1.music.126.net/%s/%s.%s' %
-                       (cls._encrypt(details['dfsId']), details['dfsId'],
-                        details['extension']),
-                'ext': details.get('extension'),
-                'abr': details.get('bitrate', 0) / 1000,
-                'format_id': song_format,
-                'filesize': details.get('size'),
-                'asr': details.get('sr')
-            })
+            song_file_path = '/%s/%s.%s' % (
+                self._encrypt(details['dfsId']), details['dfsId'], details['extension'])
+
+            # 203.130.59.9, 124.40.233.182, 115.231.74.139, etc. are reverse-proxy-like
+            # endpoints from NetEase's CDN provider that can be used when m5.music.126.net
+            # does not work, especially for users outside of Mainland China
+            # via: https://github.com/JixunMoe/unblock-163/issues/3#issuecomment-163115880
+            for host in ('http://m5.music.126.net', 'http://115.231.74.139/m1.music.126.net',
+                         'http://124.40.233.182/m1.music.126.net', 'http://203.130.59.9/m1.music.126.net'):
+                song_url = host + song_file_path
+                if self._is_valid_url(song_url, info['id'], 'song'):
+                    formats.append({
+                        'url': song_url,
+                        'ext': details.get('extension'),
+                        'abr': float_or_none(details.get('bitrate'), scale=1000),
+                        'format_id': song_format,
+                        'filesize': details.get('size'),
+                        'asr': details.get('sr')
+                    })
+                    break
         return formats
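
_is_valid_url from extractor/common.py is what makes this mirror fallback work: it probes the URL and reports failure instead of raising, so the loop can move on to the next host and break at the first one that answers. In essence (the real method also re-raises errors that are not plain network failures):

    def _is_valid_url(self, url, video_id, item='video'):
        try:
            self._request_webpage(url, video_id, 'Checking %s URL' % item)
            return True
        except ExtractorError:
            self.to_screen('%s: %s URL is invalid, skipping' % (video_id, item))
            return False
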
 
     @classmethod
@@ -56,7 +68,7 @@ class NetEaseMusicBaseIE(InfoExtractor):
         return int(round(ms / 1000.0))
 
     def query_api(self, endpoint, video_id, note):
-        req = compat_urllib_request.Request('%s%s' % (self._API_BASE, endpoint))
+        req = sanitized_Request('%s%s' % (self._API_BASE, endpoint))
         req.add_header('Referer', self._API_BASE)
         return self._download_json(req, video_id, note)
 
@@ -77,6 +89,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
             'timestamp': 1431878400,
             'description': 'md5:a10a54589c2860300d02e1de821eb2ef',
         },
+        'skip': 'Blocked outside Mainland China',
     }, {
         'note': 'No lyrics translation.',
         'url': 'http://music.163.com/#/song?id=29822014',
@@ -89,6 +102,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
             'timestamp': 1419523200,
             'description': 'md5:a4d8d89f44656af206b7b2555c0bce6c',
         },
+        'skip': 'Blocked outside Mainland China',
     }, {
         'note': 'No lyrics.',
         'url': 'http://music.163.com/song?id=17241424',
@@ -100,6 +114,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
             'upload_date': '20080211',
             'timestamp': 1202745600,
         },
+        'skip': 'Blocked outside Mainland China',
     }, {
         'note': 'Has translated name.',
         'url': 'http://music.163.com/#/song?id=22735043',
@@ -112,7 +127,8 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
             'upload_date': '20100127',
             'timestamp': 1264608000,
             'alt_title': '说出愿望吧(Genie)',
-        }
+        },
+        'skip': 'Blocked outside Mainland China',
     }]
 
     def _process_lyrics(self, lyrics_info):
@@ -141,7 +157,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
             'ids': '[%s]' % song_id
         }
         info = self.query_api(
-            'song/detail?' + compat_urllib_parse.urlencode(params),
+            'song/detail?' + compat_urllib_parse_urlencode(params),
             song_id, 'Downloading song info')['songs'][0]
 
         formats = self.extract_formats(info)
@@ -180,6 +196,7 @@ class NetEaseMusicAlbumIE(NetEaseMusicBaseIE):
             'title': 'B\'day',
         },
         'playlist_count': 23,
+        'skip': 'Blocked outside Mainland China',
     }
 
     def _real_extract(self, url):
@@ -211,6 +228,7 @@ class NetEaseMusicSingerIE(NetEaseMusicBaseIE):
             'title': '张惠妹 - aMEI;阿密特',
         },
         'playlist_count': 50,
+        'skip': 'Blocked outside Mainland China',
     }, {
         'note': 'Singer has translated name.',
         'url': 'http://music.163.com/#/artist?id=124098',
@@ -219,6 +237,7 @@ class NetEaseMusicSingerIE(NetEaseMusicBaseIE):
             'title': '李昇基 - 이승기',
         },
         'playlist_count': 50,
+        'skip': 'Blocked outside Mainland China',
     }]
 
     def _real_extract(self, url):
@@ -254,6 +273,7 @@ class NetEaseMusicListIE(NetEaseMusicBaseIE):
             'description': 'md5:12fd0819cab2965b9583ace0f8b7b022'
         },
         'playlist_count': 99,
+        'skip': 'Blocked outside Mainland China',
     }, {
         'note': 'Toplist/Charts sample',
         'url': 'http://music.163.com/#/discover/toplist?id=3733003',
@@ -263,6 +283,7 @@ class NetEaseMusicListIE(NetEaseMusicBaseIE):
             'description': 'md5:73ec782a612711cadc7872d9c1e134fc',
         },
         'playlist_count': 50,
+        'skip': 'Blocked outside Mainland China',
     }]
 
     def _real_extract(self, url):
@@ -302,6 +323,7 @@ class NetEaseMusicMvIE(NetEaseMusicBaseIE):
             'creator': '白雅言',
             'upload_date': '20150520',
         },
+        'skip': 'Blocked outside Mainland China',
     }
 
     def _real_extract(self, url):
@@ -345,6 +367,7 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
             'upload_date': '20150613',
             'duration': 900,
         },
+        'skip': 'Blocked outside Mainland China',
     }, {
         'note': 'This program has accompanying songs.',
         'url': 'http://music.163.com/#/program?id=10141022',
@@ -354,6 +377,7 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
             'description': 'md5:8d594db46cc3e6509107ede70a4aaa3b',
         },
         'playlist_count': 4,
+        'skip': 'Blocked outside Mainland China',
     }, {
         'note': 'This program has accompanying songs.',
         'url': 'http://music.163.com/#/program?id=10141022',
@@ -367,7 +391,8 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
         },
         'params': {
             'noplaylist': True
-        }
+        },
+        'skip': 'Blocked outside Mainland China',
     }]
 
     def _real_extract(self, url):
@@ -426,6 +451,7 @@ class NetEaseMusicDjRadioIE(NetEaseMusicBaseIE):
             'description': 'md5:766220985cbd16fdd552f64c578a6b15'
         },
         'playlist_mincount': 40,
+        'skip': 'Blocked outside Mainland China',
     }
     _PAGE_SIZE = 1000
 
index cd117b04edeff88d90842f2ed8e15a8c43bde714..7059403239ce19ac8b2861fa4af0dde93c98467b 100644 (file)
@@ -7,8 +7,8 @@ from .common import InfoExtractor
 
 
 class NewgroundsIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?newgrounds\.com/audio/listen/(?P<id>[0-9]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?newgrounds\.com/(?:audio/listen|portal/view)/(?P<id>[0-9]+)'
+    _TESTS = [{
         'url': 'http://www.newgrounds.com/audio/listen/549479',
         'md5': 'fe6033d297591288fa1c1f780386f07a',
         'info_dict': {
@@ -17,7 +17,16 @@ class NewgroundsIE(InfoExtractor):
             'title': 'B7 - BusMode',
             'uploader': 'Burn7',
         }
-    }
+    }, {
+        'url': 'http://www.newgrounds.com/portal/view/673111',
+        'md5': '3394735822aab2478c31b1004fe5e5bc',
+        'info_dict': {
+            'id': '673111',
+            'ext': 'mp4',
+            'title': 'Dancin',
+            'uploader': 'Squirrelman82',
+        },
+    }]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
@@ -25,9 +34,11 @@ class NewgroundsIE(InfoExtractor):
         webpage = self._download_webpage(url, music_id)
 
         title = self._html_search_regex(
-            r',"name":"([^"]+)",', webpage, 'music title')
+            r'<title>([^>]+)</title>', webpage, 'title')
+
         uploader = self._html_search_regex(
-            r',"artist":"([^"]+)",', webpage, 'music uploader')
+            [r',"artist":"([^"]+)",', r'[\'"]owner[\'"]\s*:\s*[\'"]([^\'"]+)[\'"],'],
+            webpage, 'uploader')
 
         music_url_json_string = self._html_search_regex(
             r'({"url":"[^"]+"),', webpage, 'music url') + '}'
index 5a9e73cd66a1b1224bdec848722f5e9d14f65c38..0092b85ceaa27e9b190a05ea1d4dff299351b6c7 100644 (file)
@@ -4,24 +4,24 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..utils import ExtractorError
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+)
 
 
 class NewstubeIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?newstube\.ru/media/(?P<id>.+)'
     _TEST = {
         'url': 'http://www.newstube.ru/media/telekanal-cnn-peremestil-gorod-slavyansk-v-krym',
+        'md5': '801eef0c2a9f4089fa04e4fe3533abdc',
         'info_dict': {
             'id': '728e0ef2-e187-4012-bac0-5a081fdcb1f6',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Телеканал CNN переместил город Славянск в Крым',
             'description': 'md5:419a8c9f03442bc0b0a794d689360335',
             'duration': 31.05,
         },
-        'params': {
-            # rtmp download
-            'skip_download': True,
-        },
     }
 
     def _real_extract(self, url):
@@ -62,7 +62,6 @@ class NewstubeIE(InfoExtractor):
             server = media_location.find(ns('./Server')).text
             app = media_location.find(ns('./App')).text
             media_id = stream_info.find(ns('./Id')).text
-            quality_id = stream_info.find(ns('./QualityId')).text
             name = stream_info.find(ns('./Name')).text
             width = int(stream_info.find(ns('./Width')).text)
             height = int(stream_info.find(ns('./Height')).text)
@@ -74,12 +73,38 @@ class NewstubeIE(InfoExtractor):
                 'rtmp_conn': ['S:%s' % session_id, 'S:%s' % media_id, 'S:n2'],
                 'page_url': url,
                 'ext': 'flv',
-                'format_id': quality_id,
-                'format_note': name,
+                'format_id': 'rtmp' + ('-%s' % name if name else ''),
                 'width': width,
                 'height': height,
             })
 
+        sources_data = self._download_json(
+            'http://www.newstube.ru/player2/getsources?guid=%s' % video_guid,
+            video_guid, fatal=False)
+        if sources_data:
+            for source in sources_data.get('Sources', []):
+                source_url = source.get('Src')
+                if not source_url:
+                    continue
+                height = int_or_none(source.get('Height'))
+                f = {
+                    'format_id': 'http' + ('-%dp' % height if height else ''),
+                    'url': source_url,
+                    'width': int_or_none(source.get('Width')),
+                    'height': height,
+                }
+                source_type = source.get('Type')
+                if source_type:
+                    mobj = re.search(r'codecs="([^,]+),\s*([^"]+)"', source_type)
+                    if mobj:
+                        vcodec, acodec = mobj.groups()
+                        f.update({
+                            'vcodec': vcodec,
+                            'acodec': acodec,
+                        })
+                formats.append(f)
+
+        self._check_formats(formats, video_guid)
         self._sort_formats(formats)
 
         return {
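
The codec sniffing above assumes the Type field looks like an HTML5 source type; against a typical value (the string is illustrative) the regex splits it cleanly:

    import re

    source_type = 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"'
    mobj = re.search(r'codecs="([^,]+),\s*([^"]+)"', source_type)
    vcodec, acodec = mobj.groups()  # ('avc1.42E01E', 'mp4a.40.2')
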
index c10784f6b7321395e69c86ab06f91f8d3b37b655..aae7aeeebb8e2adebd2669bcd899caec3432275d 100644 (file)
@@ -7,7 +7,7 @@ from ..utils import parse_iso8601
 
 class NextMediaIE(InfoExtractor):
     IE_DESC = '蘋果日報'
-    _VALID_URL = r'http://hk.apple.nextmedia.com/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)'
+    _VALID_URL = r'https?://hk.apple.nextmedia.com/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)'
     _TESTS = [{
         'url': 'http://hk.apple.nextmedia.com/realtime/news/20141108/53109199',
         'md5': 'dff9fad7009311c421176d1ac90bfe4f',
@@ -68,7 +68,7 @@ class NextMediaIE(InfoExtractor):
 
 class NextMediaActionNewsIE(NextMediaIE):
     IE_DESC = '蘋果日報 - 動新聞'
-    _VALID_URL = r'http://hk.dv.nextmedia.com/actionnews/[^/]+/(?P<date>\d+)/(?P<id>\d+)/\d+'
+    _VALID_URL = r'https?://hk.dv.nextmedia.com/actionnews/[^/]+/(?P<date>\d+)/(?P<id>\d+)/\d+'
     _TESTS = [{
         'url': 'http://hk.dv.nextmedia.com/actionnews/hit/20150121/19009428/20061460',
         'md5': '05fce8ffeed7a5e00665d4b7cf0f9201',
@@ -93,7 +93,7 @@ class NextMediaActionNewsIE(NextMediaIE):
 
 class AppleDailyIE(NextMediaIE):
     IE_DESC = '臺灣蘋果日報'
-    _VALID_URL = r'http://(www|ent).appledaily.com.tw/(?:animation|appledaily|enews|realtimenews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
+    _VALID_URL = r'https?://(www|ent).appledaily.com.tw/(?:animation|appledaily|enews|realtimenews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
     _TESTS = [{
         'url': 'http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694',
         'md5': 'a843ab23d150977cc55ef94f1e2c1e4d',
@@ -126,7 +126,8 @@ class AppleDailyIE(NextMediaIE):
             'thumbnail': 're:^https?://.*\.jpg$',
             'description': 'md5:23c0aac567dc08c9c16a3161a2c2e3cd',
             'upload_date': '20150128',
-        }
+        },
+        'skip': 'redirect to http://www.appledaily.com.tw/animation/',
     }, {
         # No thumbnail
         'url': 'http://www.appledaily.com.tw/animation/realtimenews/new/20150128/5003673/',
@@ -140,10 +141,19 @@ class AppleDailyIE(NextMediaIE):
         },
         'expected_warnings': [
             'video thumbnail',
-        ]
+        ],
+        'skip': 'redirect to http://www.appledaily.com.tw/animation/',
     }, {
         'url': 'http://www.appledaily.com.tw/appledaily/article/supplement/20140417/35770334/',
-        'only_matching': True,
+        'md5': 'eaa20e6b9df418c912d7f5dec2ba734d',
+        'info_dict': {
+            'id': '35770334',
+            'ext': 'mp4',
+            'title': '咖啡占卜測 XU裝熟指數',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'description': 'md5:7b859991a6a4fedbdf3dd3b66545c748',
+            'upload_date': '20140417',
+        },
     }]
 
     _URL_PATTERN = r'\{url: \'(.+)\'\}'
diff --git a/youtube_dl/extractor/nextmovie.py b/youtube_dl/extractor/nextmovie.py
new file mode 100644 (file)
index 0000000..9ccd7d7
--- /dev/null
@@ -0,0 +1,30 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .mtv import MTVServicesInfoExtractor
+from ..compat import compat_urllib_parse_urlencode
+
+
+class NextMovieIE(MTVServicesInfoExtractor):
+    IE_NAME = 'nextmovie.com'
+    _VALID_URL = r'https?://(?:www\.)?nextmovie\.com/shows/[^/]+/\d{4}-\d{2}-\d{2}/(?P<id>[^/?#]+)'
+    _FEED_URL = 'http://lite.dextr.mtvi.com/service1/dispatch.htm'
+    _TESTS = [{
+        'url': 'http://www.nextmovie.com/shows/exclusives/2013-03-10/mgid:uma:videolist:nextmovie.com:1715019/',
+        'md5': '09a9199f2f11f10107d04fcb153218aa',
+        'info_dict': {
+            'id': '961726',
+            'ext': 'mp4',
+            'title': 'The Muppets\' Gravity',
+        },
+    }]
+
+    def _get_feed_query(self, uri):
+        return compat_urllib_parse_urlencode({
+            'feed': '1505',
+            'mgid': uri,
+        })
+
+    def _real_extract(self, url):
+        mgid = self._match_id(url)
+        return self._get_videos_info(mgid)
index ea077254b4320fe18e59eb9b67461b13c146b873..51e4a34f789f0e7e9dff2eeb9ec839e655632c75 100644 (file)
@@ -1,9 +1,9 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-    compat_urllib_parse,
+from ..utils import (
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
@@ -40,8 +40,9 @@ class NFBIE(InfoExtractor):
         uploader = self._html_search_regex(r'<em class="director-name" itemprop="name">([^<]+)</em>',
                                            page, 'director name', fatal=False)
 
-        request = compat_urllib_request.Request('https://www.nfb.ca/film/%s/player_config' % video_id,
-                                                compat_urllib_parse.urlencode({'getConfig': 'true'}).encode('ascii'))
+        request = sanitized_Request(
+            'https://www.nfb.ca/film/%s/player_config' % video_id,
+            urlencode_postdata({'getConfig': 'true'}))
         request.add_header('Content-Type', 'application/x-www-form-urlencoded')
         request.add_header('X-NFB-Referer', 'http://www.nfb.ca/medias/flash/NFBVideoPlayer.swf')
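
urlencode_postdata and sanitized_Request replace the hand-rolled compat calls here: the former wraps the urlencode-then-encode dance in one helper, and the latter builds a compat_urllib_request.Request around a sanitized URL. Roughly:

    def urlencode_postdata(*args, **kargs):
        return compat_urllib_parse_urlencode(*args, **kargs).encode('ascii')
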
 
index dc54634a58e440fc70ae9bcb3e7d5781981b2b1e..200874d68e765e43a6b9787473c6d2b5af54cfb2 100644 (file)
@@ -16,53 +16,118 @@ from ..utils import (
 
 class NFLIE(InfoExtractor):
     IE_NAME = 'nfl.com'
-    _VALID_URL = r'''(?x)https?://
-        (?P<host>(?:www\.)?(?:nfl\.com|.*?\.clubs\.nfl\.com))/
-        (?:.+?/)*
-        (?P<id>(?:[a-z0-9]{16}|\w{8}\-(?:\w{4}\-){3}\w{12}))'''
-    _TESTS = [
-        {
-            'url': 'http://www.nfl.com/videos/nfl-game-highlights/0ap3000000398478/Week-3-Redskins-vs-Eagles-highlights',
-            'md5': '394ef771ddcd1354f665b471d78ec4c6',
-            'info_dict': {
-                'id': '0ap3000000398478',
-                'ext': 'mp4',
-                'title': 'Week 3: Redskins vs. Eagles highlights',
-                'description': 'md5:56323bfb0ac4ee5ab24bd05fdf3bf478',
-                'upload_date': '20140921',
-                'timestamp': 1411337580,
-                'thumbnail': 're:^https?://.*\.jpg$',
-            }
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?P<host>
+                            (?:www\.)?
+                            (?:
+                                (?:
+                                    nfl|
+                                    buffalobills|
+                                    miamidolphins|
+                                    patriots|
+                                    newyorkjets|
+                                    baltimoreravens|
+                                    bengals|
+                                    clevelandbrowns|
+                                    steelers|
+                                    houstontexans|
+                                    colts|
+                                    jaguars|
+                                    titansonline|
+                                    denverbroncos|
+                                    kcchiefs|
+                                    raiders|
+                                    chargers|
+                                    dallascowboys|
+                                    giants|
+                                    philadelphiaeagles|
+                                    redskins|
+                                    chicagobears|
+                                    detroitlions|
+                                    packers|
+                                    vikings|
+                                    atlantafalcons|
+                                    panthers|
+                                    neworleanssaints|
+                                    buccaneers|
+                                    azcardinals|
+                                    stlouisrams|
+                                    49ers|
+                                    seahawks
+                                )\.com|
+                                .+?\.clubs\.nfl\.com
+                            )
+                        )/
+                        (?:.+?/)*
+                        (?P<id>[^/#?&]+)
+                    '''
+    _TESTS = [{
+        'url': 'http://www.nfl.com/videos/nfl-game-highlights/0ap3000000398478/Week-3-Redskins-vs-Eagles-highlights',
+        'md5': '394ef771ddcd1354f665b471d78ec4c6',
+        'info_dict': {
+            'id': '0ap3000000398478',
+            'ext': 'mp4',
+            'title': 'Week 3: Redskins vs. Eagles highlights',
+            'description': 'md5:56323bfb0ac4ee5ab24bd05fdf3bf478',
+            'upload_date': '20140921',
+            'timestamp': 1411337580,
+            'thumbnail': 're:^https?://.*\.jpg$',
+        }
+    }, {
+        'url': 'http://prod.www.steelers.clubs.nfl.com/video-and-audio/videos/LIVE_Post_Game_vs_Browns/9d72f26a-9e2b-4718-84d3-09fb4046c266',
+        'md5': 'cf85bdb4bc49f6e9d3816d130c78279c',
+        'info_dict': {
+            'id': '9d72f26a-9e2b-4718-84d3-09fb4046c266',
+            'ext': 'mp4',
+            'title': 'LIVE: Post Game vs. Browns',
+            'description': 'md5:6a97f7e5ebeb4c0e69a418a89e0636e8',
+            'upload_date': '20131229',
+            'timestamp': 1388354455,
+            'thumbnail': 're:^https?://.*\.jpg$',
+        }
+    }, {
+        'url': 'http://www.nfl.com/news/story/0ap3000000467586/article/patriots-seahawks-involved-in-lategame-skirmish',
+        'info_dict': {
+            'id': '0ap3000000467607',
+            'ext': 'mp4',
+            'title': 'Frustrations flare on the field',
+            'description': 'Emotions ran high at the end of the Super Bowl on both sides of the ball after a dramatic finish.',
+            'timestamp': 1422850320,
+            'upload_date': '20150202',
         },
-        {
-            'url': 'http://prod.www.steelers.clubs.nfl.com/video-and-audio/videos/LIVE_Post_Game_vs_Browns/9d72f26a-9e2b-4718-84d3-09fb4046c266',
-            'md5': 'cf85bdb4bc49f6e9d3816d130c78279c',
-            'info_dict': {
-                'id': '9d72f26a-9e2b-4718-84d3-09fb4046c266',
-                'ext': 'mp4',
-                'title': 'LIVE: Post Game vs. Browns',
-                'description': 'md5:6a97f7e5ebeb4c0e69a418a89e0636e8',
-                'upload_date': '20131229',
-                'timestamp': 1388354455,
-                'thumbnail': 're:^https?://.*\.jpg$',
-            }
+    }, {
+        'url': 'http://www.patriots.com/video/2015/09/18/10-days-gillette',
+        'md5': '4c319e2f625ffd0b481b4382c6fc124c',
+        'info_dict': {
+            'id': 'n-238346',
+            'ext': 'mp4',
+            'title': '10 Days at Gillette',
+            'description': 'md5:8cd9cd48fac16de596eadc0b24add951',
+            'timestamp': 1442618809,
+            'upload_date': '20150918',
         },
-        {
-            'url': 'http://www.nfl.com/news/story/0ap3000000467586/article/patriots-seahawks-involved-in-lategame-skirmish',
-            'info_dict': {
-                'id': '0ap3000000467607',
-                'ext': 'mp4',
-                'title': 'Frustrations flare on the field',
-                'description': 'Emotions ran high at the end of the Super Bowl on both sides of the ball after a dramatic finish.',
-                'timestamp': 1422850320,
-                'upload_date': '20150202',
-            },
+    }, {
+        # lowercase data-contentid
+        'url': 'http://www.steelers.com/news/article-1/Tomlin-on-Ben-getting-Vick-ready/56399c96-4160-48cf-a7ad-1d17d4a3aef7',
+        'info_dict': {
+            'id': '12693586-6ea9-4743-9c1c-02c59e4a5ef2',
+            'ext': 'mp4',
+            'title': 'Tomlin looks ahead to Ravens on a short week',
+            'description': 'md5:32f3f7b139f43913181d5cbb24ecad75',
+            'timestamp': 1443459651,
+            'upload_date': '20150928',
         },
-        {
-            'url': 'http://www.nfl.com/videos/nfl-network-top-ten/09000d5d810a6bd4/Top-10-Gutsiest-Performances-Jack-Youngblood',
-            'only_matching': True,
-        }
-    ]
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.nfl.com/videos/nfl-network-top-ten/09000d5d810a6bd4/Top-10-Gutsiest-Performances-Jack-Youngblood',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.buffalobills.com/video/videos/Rex_Ryan_Show_World_Wide_Rex/b1dcfab2-3190-4bb1-bfc0-d6e603d6601a',
+        'only_matching': True,
+    }]
 
     @staticmethod
     def prepend_host(host, url):
@@ -95,13 +160,14 @@ class NFLIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
 
         config_url = NFLIE.prepend_host(host, self._search_regex(
-            r'(?:config|configURL)\s*:\s*"([^"]+)"', webpage, 'config URL',
-            default='static/content/static/config/video/config.json'))
+            r'(?:(?:config|configURL)\s*:\s*|<nflcs:avplayer[^>]+data-config\s*=\s*)(["\'])(?P<config>.+?)\1',
+            webpage, 'config URL', default='static/content/static/config/video/config.json',
+            group='config'))
         # For articles, the id in the URL is not the video id
         video_id = self._search_regex(
-            r'contentId\s*:\s*"([^"]+)"', webpage, 'video id', default=video_id)
-        config = self._download_json(config_url, video_id,
-                                     note='Downloading player config')
+            r'(?:<nflcs:avplayer[^>]+data-content[Ii]d\s*=\s*|content[Ii]d\s*:\s*)(["\'])(?P<id>.+?)\1',
+            webpage, 'video id', default=video_id, group='id')
+        config = self._download_json(config_url, video_id, 'Downloading player config')
         url_template = NFLIE.prepend_host(
             host, '{contentURLTemplate:}'.format(**config))
         video_data = self._download_json(
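
The '{contentURLTemplate:}'.format(**config) call above is effectively a plain
dict lookup: str.format substitutes the config value verbatim and never
re-processes any {id}-style braces inside the substituted string. A minimal
sketch, with a purely hypothetical config value:

    # '{key:}'.format(**d) is equivalent to d['key'] for string values;
    # braces inside the substituted value are left untouched.
    config = {'contentURLTemplate': 'http://www.nfl.com/feeds-rs/videoAndCoreInfo/{id}.json'}  # hypothetical
    url_template = '{contentURLTemplate:}'.format(**config)
    print(url_template)  # -> http://www.nfl.com/feeds-rs/videoAndCoreInfo/{id}.json
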
index 279b18386197560346e1cbce716ecf7ff61af2f9..b04d2111312d5a9e956762b10a30422ae5a8cd64 100644 (file)
@@ -7,11 +7,16 @@ import os
 from .common import InfoExtractor
 from ..compat import (
     compat_urlparse,
-    compat_urllib_parse,
-    compat_urllib_parse_urlparse
+    compat_urllib_parse_urlencode,
+    compat_urllib_parse_urlparse,
+    compat_str,
 )
 from ..utils import (
     unified_strdate,
+    determine_ext,
+    int_or_none,
+    parse_iso8601,
+    parse_duration,
 )
 
 
@@ -38,7 +43,7 @@ class NHLBaseInfoExtractor(InfoExtractor):
             parsed_url = compat_urllib_parse_urlparse(initial_video_url)
             filename, ext = os.path.splitext(parsed_url.path)
             path = '%s_sd%s' % (filename, ext)
-            data = compat_urllib_parse.urlencode({
+            data = compat_urllib_parse_urlencode({
                 'type': 'fvod',
                 'path': compat_urlparse.urlunparse(parsed_url[:2] + (path,) + parsed_url[3:])
             })
@@ -70,9 +75,9 @@ class NHLBaseInfoExtractor(InfoExtractor):
         return ret
 
 
-class NHLIE(NHLBaseInfoExtractor):
-    IE_NAME = 'nhl.com'
-    _VALID_URL = r'https?://video(?P<team>\.[^.]*)?\.nhl\.com/videocenter/(?:console)?(?:\?(?:.*?[?&])?)(?:id|hlg)=(?P<id>[-0-9a-zA-Z,]+)'
+class NHLVideocenterIE(NHLBaseInfoExtractor):
+    IE_NAME = 'nhl.com:videocenter'
+    _VALID_URL = r'https?://video(?P<team>\.[^.]*)?\.nhl\.com/videocenter/(?:console|embed)?(?:\?(?:.*?[?&])?)(?:id|hlg|playlist)=(?P<id>[-0-9a-zA-Z,]+)'
 
     _TESTS = [{
         'url': 'http://video.canucks.nhl.com/videocenter/console?catid=6?id=453614',
@@ -136,6 +141,9 @@ class NHLIE(NHLBaseInfoExtractor):
         'params': {
             'skip_download': True,  # Requires rtmpdump
         }
+    }, {
+        'url': 'http://video.nhl.com/videocenter/embed?playlist=836127',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
@@ -146,9 +154,9 @@ class NHLIE(NHLBaseInfoExtractor):
 class NHLNewsIE(NHLBaseInfoExtractor):
     IE_NAME = 'nhl.com:news'
     IE_DESC = 'NHL news'
-    _VALID_URL = r'https?://(?:www\.)?nhl\.com/ice/news\.html?(?:\?(?:.*?[?&])?)id=(?P<id>[-0-9a-zA-Z]+)'
+    _VALID_URL = r'https?://(?:.+?\.)?nhl\.com/(?:ice|club)/news\.html?(?:\?(?:.*?[?&])?)id=(?P<id>[-0-9a-zA-Z]+)'
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://www.nhl.com/ice/news.htm?id=750727',
         'md5': '4b3d1262e177687a3009937bd9ec0be8',
         'info_dict': {
@@ -159,19 +167,32 @@ class NHLNewsIE(NHLBaseInfoExtractor):
             'duration': 37,
             'upload_date': '20150128',
         },
-    }
+    }, {
+        # iframe embed
+        'url': 'http://sabres.nhl.com/club/news.htm?id=780189',
+        'md5': '9f663d1c006c90ac9fb82777d4294e12',
+        'info_dict': {
+            'id': '836127',
+            'ext': 'mp4',
+            'title': 'Morning Skate: OTT vs. BUF (9/23/15)',
+            'description': "Brian Duff chats with Tyler Ennis prior to Buffalo's first preseason home game.",
+            'duration': 93,
+            'upload_date': '20150923',
+        },
+    }]
 
     def _real_extract(self, url):
         news_id = self._match_id(url)
         webpage = self._download_webpage(url, news_id)
         video_id = self._search_regex(
-            [r'pVid(\d+)', r"nlid\s*:\s*'(\d+)'"],
+            [r'pVid(\d+)', r"nlid\s*:\s*'(\d+)'",
+             r'<iframe[^>]+src=["\']https?://video.*?\.nhl\.com/videocenter/embed\?.*\bplaylist=(\d+)'],
             webpage, 'video id')
         return self._real_extract_video(video_id)
 
 
-class NHLVideocenterIE(NHLBaseInfoExtractor):
-    IE_NAME = 'nhl.com:videocenter'
+class NHLVideocenterCategoryIE(NHLBaseInfoExtractor):
+    IE_NAME = 'nhl.com:videocenter:category'
     IE_DESC = 'NHL videocenter category'
     _VALID_URL = r'https?://video\.(?P<team>[^.]*)\.nhl\.com/videocenter/(console\?[^(id=)]*catid=(?P<catid>[0-9]+)(?![&?]id=).*?)?$'
     _TEST = {
@@ -195,7 +216,7 @@ class NHLVideocenterIE(NHLBaseInfoExtractor):
             r'tab0"[^>]*?>(.*?)</td>',
             webpage, 'playlist title', flags=re.DOTALL).lower().capitalize()
 
-        data = compat_urllib_parse.urlencode({
+        data = compat_urllib_parse_urlencode({
             'cid': cat_id,
             # This is the default value
             'count': 12,
@@ -207,7 +228,7 @@ class NHLVideocenterIE(NHLBaseInfoExtractor):
         response = self._download_webpage(request_url, playlist_title)
         response = self._fix_json(response)
         if not response.strip():
-            self._downloader.report_warning('Got an empty reponse, trying '
+            self._downloader.report_warning('Got an empty response, trying '
                                             'adding the "newvideos" parameter')
             response = self._download_webpage(request_url + '&newvideos=true',
                                               playlist_title)
@@ -220,3 +241,86 @@ class NHLVideocenterIE(NHLBaseInfoExtractor):
             'id': cat_id,
             'entries': [self._extract_video(v) for v in videos],
         }
+
+
+class NHLIE(InfoExtractor):
+    IE_NAME = 'nhl.com'
+    _VALID_URL = r'https?://(?:www\.)?nhl\.com/([^/]+/)*c-(?P<id>\d+)'
+    _TESTS = [{
+        # type=video
+        'url': 'https://www.nhl.com/video/anisimov-cleans-up-mess/t-277752844/c-43663503',
+        'md5': '0f7b9a8f986fb4b4eeeece9a56416eaf',
+        'info_dict': {
+            'id': '43663503',
+            'ext': 'mp4',
+            'title': 'Anisimov cleans up mess',
+            'description': 'md5:a02354acdfe900e940ce40706939ca63',
+            'timestamp': 1461288600,
+            'upload_date': '20160422',
+        },
+    }, {
+        # type=article
+        'url': 'https://www.nhl.com/news/dennis-wideman-suspended/c-278258934',
+        'md5': '1f39f4ea74c1394dea110699a25b366c',
+        'info_dict': {
+            'id': '40784403',
+            'ext': 'mp4',
+            'title': 'Wideman suspended by NHL',
+            'description': 'Flames defenseman Dennis Wideman was banned 20 games for violation of Rule 40 (Physical Abuse of Officials)',
+            'upload_date': '20160204',
+            'timestamp': 1454544904,
+        },
+    }]
+
+    def _real_extract(self, url):
+        tmp_id = self._match_id(url)
+        video_data = self._download_json(
+            'https://nhl.bamcontent.com/nhl/id/v1/%s/details/web-v1.json' % tmp_id,
+            tmp_id)
+        if video_data.get('type') == 'article':
+            video_data = video_data['media']
+
+        video_id = compat_str(video_data['id'])
+        title = video_data['title']
+
+        formats = []
+        for playback in video_data.get('playbacks', []):
+            playback_url = playback.get('url')
+            if not playback_url:
+                continue
+            ext = determine_ext(playback_url)
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    playback_url, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id=playback.get('name', 'hls'), fatal=False))
+            else:
+                height = int_or_none(playback.get('height'))
+                formats.append({
+                    'format_id': playback.get('name', 'http' + ('-%dp' % height if height else '')),
+                    'url': playback_url,
+                    'width': int_or_none(playback.get('width')),
+                    'height': height,
+                })
+        self._sort_formats(formats, ('preference', 'width', 'height', 'tbr', 'format_id'))
+
+        thumbnails = []
+        for thumbnail_id, thumbnail_data in video_data.get('image', {}).get('cuts', {}).items():
+            thumbnail_url = thumbnail_data.get('src')
+            if not thumbnail_url:
+                continue
+            thumbnails.append({
+                'id': thumbnail_id,
+                'url': thumbnail_url,
+                'width': int_or_none(thumbnail_data.get('width')),
+                'height': int_or_none(thumbnail_data.get('height')),
+            })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('description'),
+            'timestamp': parse_iso8601(video_data.get('date')),
+            'duration': parse_duration(video_data.get('duration')),
+            'thumbnails': thumbnails,
+            'formats': formats,
+        }
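
The playbacks loop above branches purely on the URL extension: m3u8 manifests
are expanded into HLS formats, anything else becomes a single progressive
entry. A standalone sketch of that logic, with a crude determine_ext stand-in
and an invented bamcontent-style payload:

    import os

    def build_formats(playbacks):
        formats = []
        for playback in playbacks:
            url = playback.get('url')
            if not url:
                continue
            # Crude stand-in for determine_ext(): extension of the URL path.
            ext = os.path.splitext(url.split('?')[0])[1][1:].lower()
            if ext == 'm3u8':
                formats.append({'format_id': playback.get('name', 'hls'),
                                'url': url, 'protocol': 'm3u8'})
            else:
                height = playback.get('height')
                formats.append({
                    'format_id': playback.get('name') or ('http-%dp' % height if height else 'http'),
                    'url': url,
                    'height': height,
                })
        return formats

    # Hypothetical playbacks payload:
    print(build_formats([
        {'name': 'HTTP_CLOUD_WIRED', 'url': 'http://example.invalid/master.m3u8'},
        {'name': 'FLASH_1800K_896x504', 'url': 'http://example.invalid/clip_1800k.mp4', 'height': 504},
    ]))
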
diff --git a/youtube_dl/extractor/nick.py b/youtube_dl/extractor/nick.py
new file mode 100644 (file)
index 0000000..ce065f2
--- /dev/null
@@ -0,0 +1,63 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .mtv import MTVServicesInfoExtractor
+from ..compat import compat_urllib_parse_urlencode
+
+
+class NickIE(MTVServicesInfoExtractor):
+    IE_NAME = 'nick.com'
+    _VALID_URL = r'https?://(?:www\.)?nick\.com/videos/clip/(?P<id>[^/?#.]+)'
+    _FEED_URL = 'http://udat.mtvnservices.com/service1/dispatch.htm'
+    _TESTS = [{
+        'url': 'http://www.nick.com/videos/clip/alvinnn-and-the-chipmunks-112-full-episode.html',
+        'playlist': [
+            {
+                'md5': '6e5adc1e28253bbb1b28ab05403dd4d4',
+                'info_dict': {
+                    'id': 'be6a17b0-412d-11e5-8ff7-0026b9414f30',
+                    'ext': 'mp4',
+                    'title': 'ALVINNN!!! and The Chipmunks: "Mojo Missing/Who\'s The Animal" S1',
+                    'description': 'Alvin is convinced his mojo was in a cap he gave to a fan, and must find a way to get his hat back before the Chipmunks’ big concert.\nDuring a costume visit to the zoo, Alvin finds himself mistaken for the real Tasmanian devil.',
+
+                }
+            },
+            {
+                'md5': 'd7be441fc53a1d4882fa9508a1e5b3ce',
+                'info_dict': {
+                    'id': 'be6b8f96-412d-11e5-8ff7-0026b9414f30',
+                    'ext': 'mp4',
+                    'title': 'ALVINNN!!! and The Chipmunks: "Mojo Missing/Who\'s The Animal" S2',
+                    'description': 'Alvin is convinced his mojo was in a cap he gave to a fan, and must find a way to get his hat back before the Chipmunks’ big concert.\nDuring a costume visit to the zoo, Alvin finds himself mistaken for the real Tasmanian devil.',
+
+                }
+            },
+            {
+                'md5': 'efffe1728a234b2b0d2f2b343dd1946f',
+                'info_dict': {
+                    'id': 'be6cf7e6-412d-11e5-8ff7-0026b9414f30',
+                    'ext': 'mp4',
+                    'title': 'ALVINNN!!! and The Chipmunks: "Mojo Missing/Who\'s The Animal" S3',
+                    'description': 'Alvin is convinced his mojo was in a cap he gave to a fan, and must find a way to get his hat back before the Chipmunks’ big concert.\nDuring a costume visit to the zoo, Alvin finds himself mistaken for the real Tasmanian devil.',
+                }
+            },
+            {
+                'md5': '1ec6690733ab9f41709e274a1d5c7556',
+                'info_dict': {
+                    'id': 'be6e3354-412d-11e5-8ff7-0026b9414f30',
+                    'ext': 'mp4',
+                    'title': 'ALVINNN!!! and The Chipmunks: "Mojo Missing/Who\'s The Animal" S4',
+                    'description': 'Alvin is convinced his mojo was in a cap he gave to a fan, and must find a way to get his hat back before the Chipmunks’ big concert.\nDuring a costume visit to the zoo, Alvin finds himself mistaken for the real Tasmanian devil.',
+                }
+            },
+        ],
+    }]
+
+    def _get_feed_query(self, uri):
+        return compat_urllib_parse_urlencode({
+            'feed': 'nick_arc_player_prime',
+            'mgid': uri,
+        })
+
+    def _extract_mgid(self, webpage):
+        return self._search_regex(r'data-contenturi="([^"]+)', webpage, 'mgid')
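
Note that the data-contenturi pattern above lacks a closing quote; it still
captures the full mgid because [^"]+ already stops at the quote. A quick
sketch against a made-up page snippet:

    import re

    webpage = '<div data-contenturi="mgid:arc:video:nick.com:be6a17b0-412d-11e5-8ff7-0026b9414f30">'  # made up
    mgid = re.search(r'data-contenturi="([^"]+)', webpage).group(1)
    assert mgid == 'mgid:arc:video:nick.com:be6a17b0-412d-11e5-8ff7-0026b9414f30'
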
index 0f8aa5adad5b2247621ce00249f3bd03a33a104a..dd75a48afcc9dfa4a728c600c836741785056770 100644 (file)
@@ -7,8 +7,7 @@ import datetime
 
 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
+    compat_urllib_parse_urlencode,
     compat_urlparse,
 )
 from ..utils import (
@@ -16,8 +15,10 @@ from ..utils import (
     int_or_none,
     parse_duration,
     parse_iso8601,
+    sanitized_Request,
     xpath_text,
     determine_ext,
+    urlencode_postdata,
 )
 
 
@@ -100,11 +101,8 @@ class NiconicoIE(InfoExtractor):
             'mail': username,
             'password': password,
         }
-        # Convert to UTF-8 *before* urlencode because Python 2.x's urlencode
-        # chokes on unicode
-        login_form = dict((k.encode('utf-8'), v.encode('utf-8')) for k, v in login_form_strs.items())
-        login_data = compat_urllib_parse.urlencode(login_form).encode('utf-8')
-        request = compat_urllib_request.Request(
+        login_data = urlencode_postdata(login_form_strs)
+        request = sanitized_Request(
             'https://secure.nicovideo.jp/secure/login', login_data)
         login_results = self._download_webpage(
             request, None, note='Logging in', errnote='Unable to log in')
@@ -143,11 +141,11 @@ class NiconicoIE(InfoExtractor):
                 r'\'thumbPlayKey\'\s*:\s*\'(.*?)\'', ext_player_info, 'thumbPlayKey')
 
             # Get flv info
-            flv_info_data = compat_urllib_parse.urlencode({
+            flv_info_data = compat_urllib_parse_urlencode({
                 'k': thumb_play_key,
                 'v': video_id
             })
-            flv_info_request = compat_urllib_request.Request(
+            flv_info_request = sanitized_Request(
                 'http://ext.nicovideo.jp/thumb_watch', flv_info_data,
                 {'Content-Type': 'application/x-www-form-urlencoded'})
             flv_info_webpage = self._download_webpage(
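
The removed comment above refers to Python 2's urlencode rejecting non-ASCII
unicode values; urlencode_postdata wraps that encode-then-urlencode dance.
Assuming the helper boils down to urlencode plus a byte encode, a Python 3
sketch of the equivalent behaviour:

    from urllib.parse import urlencode

    def urlencode_postdata_sketch(d):
        # Produce application/x-www-form-urlencoded bytes for a POST body.
        return urlencode(d).encode('utf-8')

    print(urlencode_postdata_sketch({'mail': 'user@example.com', 'password': 'hunter2'}))
    # b'mail=user%40example.com&password=hunter2'
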
index 7f842b5c2560211cc88280e2b97cf107af588bfe..a06d38afde37a0f4ad3947776910e9c3b5a39286 100644 (file)
@@ -1,7 +1,6 @@
 from __future__ import unicode_literals
 
 import re
-import json
 
 from .common import InfoExtractor
 from ..utils import str_to_int
@@ -9,61 +8,93 @@ from ..utils import str_to_int
 
 class NineGagIE(InfoExtractor):
     IE_NAME = '9gag'
-    _VALID_URL = r'''(?x)^https?://(?:www\.)?9gag\.tv/
-        (?:
-            v/(?P<numid>[0-9]+)|
-            p/(?P<id>[a-zA-Z0-9]+)/(?P<display_id>[^?#/]+)
-        )
-    '''
+    _VALID_URL = r'https?://(?:www\.)?9gag(?:\.com/tv|\.tv)/(?:p|embed)/(?P<id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^?#/]+))?'
 
     _TESTS = [{
-        "url": "http://9gag.tv/v/1912",
-        "info_dict": {
-            "id": "1912",
-            "ext": "mp4",
-            "description": "This 3-minute video will make you smile and then make you feel untalented and insignificant. Anyway, you should share this awesomeness. (Thanks, Dino!)",
-            "title": "\"People Are Awesome 2013\" Is Absolutely Awesome",
+        'url': 'http://9gag.com/tv/p/Kk2X5/people-are-awesome-2013-is-absolutely-awesome',
+        'info_dict': {
+            'id': 'Kk2X5',
+            'ext': 'mp4',
+            'description': 'This 3-minute video will make you smile and then make you feel untalented and insignificant. Anyway, you should share this awesomeness. (Thanks, Dino!)',
+            'title': '\"People Are Awesome 2013\" Is Absolutely Awesome',
             'uploader_id': 'UCdEH6EjDKwtTe-sO2f0_1XA',
             'uploader': 'CompilationChannel',
             'upload_date': '20131110',
-            "view_count": int,
-            "thumbnail": "re:^https?://",
+            'view_count': int,
         },
-        'add_ie': ['Youtube']
+        'add_ie': ['Youtube'],
     }, {
-        'url': 'http://9gag.tv/p/KklwM/alternate-banned-opening-scene-of-gravity?ref=fsidebar',
+        'url': 'http://9gag.com/tv/p/aKolP3',
         'info_dict': {
-            'id': 'KklwM',
+            'id': 'aKolP3',
             'ext': 'mp4',
-            'display_id': 'alternate-banned-opening-scene-of-gravity',
-            "description": "While Gravity was a pretty awesome movie already, YouTuber Krishna Shenoi came up with a way to improve upon it, introducing a much better solution to Sandra Bullock's seemingly endless tumble in space. The ending is priceless.",
-            'title': "Banned Opening Scene Of \"Gravity\" That Changes The Whole Movie",
-            'uploader': 'Krishna Shenoi',
-            'upload_date': '20140401',
-            'uploader_id': 'krishnashenoi93',
+            'title': 'This Guy Travelled 11 countries In 44 days Just To Make This Amazing Video',
+            'description': "I just saw more in 1 minute than I've seen in 1 year. This guy's video is epic!!",
+            'uploader_id': 'rickmereki',
+            'uploader': 'Rick Mereki',
+            'upload_date': '20110803',
+            'view_count': int,
         },
+        'add_ie': ['Vimeo'],
+    }, {
+        'url': 'http://9gag.com/tv/p/KklwM',
+        'only_matching': True,
+    }, {
+        'url': 'http://9gag.tv/p/Kk2X5',
+        'only_matching': True,
+    }, {
+        'url': 'http://9gag.com/tv/embed/a5Dmvl',
+        'only_matching': True,
     }]
 
+    _EXTERNAL_VIDEO_PROVIDER = {
+        '1': {
+            'url': '%s',
+            'ie_key': 'Youtube',
+        },
+        '2': {
+            'url': 'http://player.vimeo.com/video/%s',
+            'ie_key': 'Vimeo',
+        },
+        '3': {
+            'url': 'http://instagram.com/p/%s',
+            'ie_key': 'Instagram',
+        },
+        '4': {
+            'url': 'http://vine.co/v/%s',
+            'ie_key': 'Vine',
+        },
+    }
+
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('numid') or mobj.group('id')
+        video_id = mobj.group('id')
         display_id = mobj.group('display_id') or video_id
 
         webpage = self._download_webpage(url, display_id)
 
-        post_view = json.loads(self._html_search_regex(
-            r'var postView = new app\.PostView\({\s*post:\s*({.+?}),\s*posts:\s*prefetchedCurrentPost', webpage, 'post view'))
+        post_view = self._parse_json(
+            self._search_regex(
+                r'var\s+postView\s*=\s*new\s+app\.PostView\({\s*post:\s*({.+?})\s*,\s*posts:\s*prefetchedCurrentPost',
+                webpage, 'post view'),
+            display_id)
 
-        youtube_id = post_view['videoExternalId']
+        ie_key = None
+        source_url = post_view.get('sourceUrl')
+        if not source_url:
+            external_video_id = post_view['videoExternalId']
+            external_video_provider = post_view['videoExternalProvider']
+            source_url = self._EXTERNAL_VIDEO_PROVIDER[external_video_provider]['url'] % external_video_id
+            ie_key = self._EXTERNAL_VIDEO_PROVIDER[external_video_provider]['ie_key']
         title = post_view['title']
-        description = post_view['description']
-        view_count = str_to_int(post_view['externalView'])
+        description = post_view.get('description')
+        view_count = str_to_int(post_view.get('externalView'))
         thumbnail = post_view.get('thumbnail_700w') or post_view.get('ogImageUrl') or post_view.get('thumbnail_300w')
 
         return {
             '_type': 'url_transparent',
-            'url': youtube_id,
-            'ie_key': 'Youtube',
+            'url': source_url,
+            'ie_key': ie_key,
             'id': video_id,
             'display_id': display_id,
             'title': title,
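
The provider table above turns a numeric videoExternalProvider into a URL
template plus an extractor key, with sourceUrl taking precedence when present.
A minimal standalone resolution (table copied from the diff, post data
invented):

    EXTERNAL_VIDEO_PROVIDER = {
        '1': {'url': '%s', 'ie_key': 'Youtube'},
        '2': {'url': 'http://player.vimeo.com/video/%s', 'ie_key': 'Vimeo'},
        '3': {'url': 'http://instagram.com/p/%s', 'ie_key': 'Instagram'},
        '4': {'url': 'http://vine.co/v/%s', 'ie_key': 'Vine'},
    }

    def resolve(post_view):
        # Prefer an explicit sourceUrl; otherwise template the external id.
        source_url = post_view.get('sourceUrl')
        if source_url:
            return source_url, None
        provider = EXTERNAL_VIDEO_PROVIDER[post_view['videoExternalProvider']]
        return provider['url'] % post_view['videoExternalId'], provider['ie_key']

    print(resolve({'videoExternalProvider': '2', 'videoExternalId': '130020913'}))
    # ('http://player.vimeo.com/video/130020913', 'Vimeo')
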
index a53e27b274eaa21ac15a1dc5077001d520832696..06f2bda07dd5db2c54e1e0492f244dbf0fc5a526 100644 (file)
@@ -8,8 +8,7 @@ import hashlib
 from .common import InfoExtractor
 from ..compat import (
     compat_str,
-    compat_urllib_parse,
-    compat_urllib_request,
+    compat_urlparse,
 )
 from ..utils import (
     clean_html,
@@ -17,11 +16,13 @@ from ..utils import (
     int_or_none,
     float_or_none,
     parse_iso8601,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
 class NocoIE(InfoExtractor):
-    _VALID_URL = r'http://(?:(?:www\.)?noco\.tv/emission/|player\.noco\.tv/\?idvideo=)(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:(?:www\.)?noco\.tv/emission/|player\.noco\.tv/\?idvideo=)(?P<id>\d+)'
     _LOGIN_URL = 'http://noco.tv/do.php'
     _API_URL_TEMPLATE = 'https://api.noco.tv/1.1/%s?ts=%s&tk=%s'
     _SUB_LANG_TEMPLATE = '&sub_lang=%s'
@@ -74,7 +75,7 @@ class NocoIE(InfoExtractor):
             'username': username,
             'password': password,
         }
-        request = compat_urllib_request.Request(self._LOGIN_URL, compat_urllib_parse.urlencode(login_form))
+        request = sanitized_Request(self._LOGIN_URL, urlencode_postdata(login_form))
         request.add_header('Content-Type', 'application/x-www-form-urlencoded; charset=UTF-8')
 
         login = self._download_json(request, None, 'Logging in as %s' % username)
@@ -82,14 +83,21 @@ class NocoIE(InfoExtractor):
         if 'erreur' in login:
             raise ExtractorError('Unable to login: %s' % clean_html(login['erreur']), expected=True)
 
+    @staticmethod
+    def _ts():
+        return int(time.time() * 1000)
+
     def _call_api(self, path, video_id, note, sub_lang=None):
-        ts = compat_str(int(time.time() * 1000))
+        ts = compat_str(self._ts() + self._ts_offset)
         tk = hashlib.md5((hashlib.md5(ts.encode('ascii')).hexdigest() + '#8S?uCraTedap6a').encode('ascii')).hexdigest()
         url = self._API_URL_TEMPLATE % (path, ts, tk)
         if sub_lang:
             url += self._SUB_LANG_TEMPLATE % sub_lang
 
-        resp = self._download_json(url, video_id, note)
+        request = sanitized_Request(url)
+        request.add_header('Referer', self._referer)
+
+        resp = self._download_json(request, video_id, note)
 
         if isinstance(resp, dict) and resp.get('error'):
             self._raise_error(resp['error'], resp['description'])
@@ -102,8 +110,22 @@ class NocoIE(InfoExtractor):
             expected=True)
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
+
+        # The offset between server time and local time must be calculated
+        # so that all API requests use timestamps as close as possible to
+        # the server's (see https://github.com/rg3/youtube-dl/issues/7864)
+        webpage = self._download_webpage(url, video_id)
+
+        player_url = self._search_regex(
+            r'(["\'])(?P<player>https?://noco\.tv/(?:[^/]+/)+NocoPlayer.+?\.swf.*?)\1',
+            webpage, 'noco player', group='player',
+            default='http://noco.tv/cdata/js/player/NocoPlayer-v1.2.40.swf')
+
+        qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(player_url).query)
+        ts = int_or_none(qs.get('ts', [None])[0])
+        self._ts_offset = ts - self._ts() if ts else 0
+        self._referer = player_url
 
         medias = self._call_api(
             'shows/%s/medias' % video_id,
@@ -155,8 +177,8 @@ class NocoIE(InfoExtractor):
                         'format_id': format_id_extended,
                         'width': int_or_none(fmt.get('res_width')),
                         'height': int_or_none(fmt.get('res_lines')),
-                        'abr': int_or_none(fmt.get('audiobitrate')),
-                        'vbr': int_or_none(fmt.get('videobitrate')),
+                        'abr': int_or_none(fmt.get('audiobitrate'), 1000),
+                        'vbr': int_or_none(fmt.get('videobitrate'), 1000),
                         'filesize': int_or_none(fmt.get('filesize')),
                         'format_note': qualities[format_id].get('quality_name'),
                         'quality': qualities[format_id].get('priority'),
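
The API token above is a double MD5 over the millisecond timestamp plus a
static salt, with the timestamp shifted by the offset recovered from the
player URL's ts parameter. A condensed sketch (salt and hashing taken from
the diff; the server timestamp value is invented):

    import hashlib
    import time

    SALT = '#8S?uCraTedap6a'

    def make_token(ts_offset=0):
        # Millisecond timestamp, adjusted toward server time, then md5'd twice.
        ts = str(int(time.time() * 1000) + ts_offset)
        tk = hashlib.md5(
            (hashlib.md5(ts.encode('ascii')).hexdigest() + SALT).encode('ascii')).hexdigest()
        return ts, tk

    server_ts = 1461500000000  # hypothetical ?ts= value from the player URL
    ts_offset = server_ts - int(time.time() * 1000)
    print(make_token(ts_offset))
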
index 5952d136f7b3efd3e9f91843ba12de6a13d989ba..77e09107299824f5ae4063817d73e505e893c2af 100644 (file)
@@ -9,7 +9,7 @@ from ..utils import (
 
 
 class NormalbootsIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?normalboots\.com/video/(?P<id>[0-9a-z-]*)/?$'
+    _VALID_URL = r'https?://(?:www\.)?normalboots\.com/video/(?P<id>[0-9a-z-]*)/?$'
     _TEST = {
         'url': 'http://normalboots.com/video/home-alone-games-jontron/',
         'md5': '8bf6de238915dd501105b44ef5f1e0f6',
index f5ef856db0155dd84f10d5db4a8cef8e6c08213c..eab816e4916bc2fae7d72cde598cb5b5f69bfde4 100644 (file)
@@ -4,11 +4,9 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-)
 from ..utils import (
     ExtractorError,
+    sanitized_Request,
     urlencode_postdata,
     xpath_text,
     xpath_with_ns,
@@ -41,7 +39,7 @@ class NosVideoIE(InfoExtractor):
             'op': 'download1',
             'method_free': 'Continue to Video',
         }
-        req = compat_urllib_request.Request(url, urlencode_postdata(fields))
+        req = sanitized_Request(url, urlencode_postdata(fields))
         req.add_header('Content-type', 'application/x-www-form-urlencoded')
         webpage = self._download_webpage(req, video_id,
                                          'Downloading download page')
index 3f9c776ef665ab47624eeab7ba60f5754dbf213e..17671ad398b9e9a8148bceff74db678969d26d3f 100644 (file)
@@ -12,7 +12,7 @@ from ..utils import (
 
 class NovaIE(InfoExtractor):
     IE_DESC = 'TN.cz, Prásk.tv, Nova.cz, Novaplus.cz, FANDA.tv, Krásná.cz and Doma.cz'
-    _VALID_URL = 'http://(?:[^.]+\.)?(?P<site>tv(?:noviny)?|tn|novaplus|vymena|fanda|krasna|doma|prask)\.nova\.cz/(?:[^/]+/)+(?P<id>[^/]+?)(?:\.html|/|$)'
+    _VALID_URL = r'https?://(?:[^.]+\.)?(?P<site>tv(?:noviny)?|tn|novaplus|vymena|fanda|krasna|doma|prask)\.nova\.cz/(?:[^/]+/)+(?P<id>[^/]+?)(?:\.html|/|$)'
     _TESTS = [{
         'url': 'http://tvnoviny.nova.cz/clanek/novinky/co-na-sebe-sportaci-praskli-vime-jestli-pujde-hrdlicka-na-materskou.html?utm_source=tvnoviny&utm_medium=cpfooter&utm_campaign=novaplus',
         'info_dict': {
index 04d779890af1960d65b070d0b2f80e429db21d07..3bbd4735502e113fcc46a07981ff5863c52fef15 100644 (file)
@@ -3,11 +3,12 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urlparse,
-)
+from ..compat import compat_urlparse
 from ..utils import (
     ExtractorError,
+    NO_DEFAULT,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
@@ -15,42 +16,70 @@ class NovaMovIE(InfoExtractor):
     IE_NAME = 'novamov'
     IE_DESC = 'NovaMov'
 
-    _VALID_URL_TEMPLATE = r'http://(?:(?:www\.)?%(host)s/(?:file|video)/|(?:(?:embed|www)\.)%(host)s/embed\.php\?(?:.*?&)?v=)(?P<id>[a-z\d]{13})'
+    _VALID_URL_TEMPLATE = r'''(?x)
+                            http://
+                                (?:
+                                    (?:www\.)?%(host)s/(?:file|video|mobile/\#/videos)/|
+                                    (?:(?:embed|www)\.)%(host)s/embed(?:\.php|/)?\?(?:.*?&)?\bv=
+                                )
+                                (?P<id>[a-z\d]{13})
+                            '''
     _VALID_URL = _VALID_URL_TEMPLATE % {'host': 'novamov\.com'}
 
     _HOST = 'www.novamov.com'
 
     _FILE_DELETED_REGEX = r'This file no longer exists on our servers!</h2>'
-    _FILEKEY_REGEX = r'flashvars\.filekey="(?P<filekey>[^"]+)";'
+    _FILEKEY_REGEX = r'flashvars\.filekey=(?P<filekey>"?[^"]+"?);'
     _TITLE_REGEX = r'(?s)<div class="v_tab blockborder rounded5" id="v_tab1">\s*<h3>([^<]+)</h3>'
     _DESCRIPTION_REGEX = r'(?s)<div class="v_tab blockborder rounded5" id="v_tab1">\s*<h3>[^<]+</h3><p>([^<]+)</p>'
+    _URL_TEMPLATE = 'http://%s/video/%s'
 
-    _TEST = {
-        'url': 'http://www.novamov.com/video/4rurhn9x446jj',
-        'md5': '7205f346a52bbeba427603ba10d4b935',
-        'info_dict': {
-            'id': '4rurhn9x446jj',
-            'ext': 'flv',
-            'title': 'search engine optimization',
-            'description': 'search engine optimization is used to rank the web page in the google search engine'
-        },
-        'skip': '"Invalid token" errors abound (in web interface as well as youtube-dl, there is nothing we can do about it.)'
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        page = self._download_webpage(
-            'http://%s/video/%s' % (self._HOST, video_id), video_id, 'Downloading video page')
+    _TEST = None
 
-        if re.search(self._FILE_DELETED_REGEX, page) is not None:
+    def _check_existence(self, webpage, video_id):
+        if re.search(self._FILE_DELETED_REGEX, webpage) is not None:
             raise ExtractorError('Video %s does not exist' % video_id, expected=True)
 
-        filekey = self._search_regex(self._FILEKEY_REGEX, page, 'filekey')
-
-        title = self._html_search_regex(self._TITLE_REGEX, page, 'title', fatal=False)
-        description = self._html_search_regex(self._DESCRIPTION_REGEX, page, 'description', default='', fatal=False)
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        url = self._URL_TEMPLATE % (self._HOST, video_id)
+
+        webpage = self._download_webpage(
+            url, video_id, 'Downloading video page')
+
+        self._check_existence(webpage, video_id)
+
+        def extract_filekey(default=NO_DEFAULT):
+            filekey = self._search_regex(
+                self._FILEKEY_REGEX, webpage, 'filekey', default=default)
+            if filekey is not default and (filekey[0] != '"' or filekey[-1] != '"'):
+                return self._search_regex(
+                    r'var\s+%s\s*=\s*"([^"]+)"' % re.escape(filekey), webpage, 'filekey', default=default)
+            else:
+                return filekey
+
+        filekey = extract_filekey(default=None)
+
+        if not filekey:
+            fields = self._hidden_inputs(webpage)
+            post_url = self._search_regex(
+                r'<form[^>]+action=(["\'])(?P<url>.+?)\1', webpage,
+                'post url', default=url, group='url')
+            if not post_url.startswith('http'):
+                post_url = compat_urlparse.urljoin(url, post_url)
+            request = sanitized_Request(
+                post_url, urlencode_postdata(fields))
+            request.add_header('Content-Type', 'application/x-www-form-urlencoded')
+            request.add_header('Referer', post_url)
+            webpage = self._download_webpage(
+                request, video_id, 'Downloading continue-to-video page')
+            self._check_existence(webpage, video_id)
+
+        filekey = extract_filekey()
+
+        title = self._html_search_regex(self._TITLE_REGEX, webpage, 'title')
+        description = self._html_search_regex(self._DESCRIPTION_REGEX, webpage, 'description', default='', fatal=False)
 
         api_response = self._download_webpage(
             'http://%s/api/player.api.php?key=%s&file=%s' % (self._HOST, filekey, video_id), video_id,
@@ -69,3 +98,115 @@ class NovaMovIE(InfoExtractor):
             'title': title,
             'description': description
         }
+
+
+class WholeCloudIE(NovaMovIE):
+    IE_NAME = 'wholecloud'
+    IE_DESC = 'WholeCloud'
+
+    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': '(?:wholecloud\.net|movshare\.(?:net|sx|ag))'}
+
+    _HOST = 'www.wholecloud.net'
+
+    _FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
+    _TITLE_REGEX = r'<strong>Title:</strong> ([^<]+)</p>'
+    _DESCRIPTION_REGEX = r'<strong>Description:</strong> ([^<]+)</p>'
+
+    _TEST = {
+        'url': 'http://www.wholecloud.net/video/559e28be54d96',
+        'md5': 'abd31a2132947262c50429e1d16c1bfd',
+        'info_dict': {
+            'id': '559e28be54d96',
+            'ext': 'flv',
+            'title': 'dissapeared image',
+            'description': 'optical illusion  dissapeared image  magic illusion',
+        }
+    }
+
+
+class NowVideoIE(NovaMovIE):
+    IE_NAME = 'nowvideo'
+    IE_DESC = 'NowVideo'
+
+    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'nowvideo\.(?:to|ch|ec|sx|eu|at|ag|co|li)'}
+
+    _HOST = 'www.nowvideo.to'
+
+    _FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
+    _TITLE_REGEX = r'<h4>([^<]+)</h4>'
+    _DESCRIPTION_REGEX = r'</h4>\s*<p>([^<]+)</p>'
+
+    _TEST = {
+        'url': 'http://www.nowvideo.sx/video/f1d6fce9a968b',
+        'md5': '12c82cad4f2084881d8bc60ee29df092',
+        'info_dict': {
+            'id': 'f1d6fce9a968b',
+            'ext': 'flv',
+            'title': 'youtubedl test video BaWjenozKc',
+            'description': 'Description',
+        },
+    }
+
+
+class VideoWeedIE(NovaMovIE):
+    IE_NAME = 'videoweed'
+    IE_DESC = 'VideoWeed'
+
+    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'videoweed\.(?:es|com)'}
+
+    _HOST = 'www.videoweed.es'
+
+    _FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
+    _TITLE_REGEX = r'<h1 class="text_shadow">([^<]+)</h1>'
+    _URL_TEMPLATE = 'http://%s/file/%s'
+
+    _TEST = {
+        'url': 'http://www.videoweed.es/file/b42178afbea14',
+        'md5': 'abd31a2132947262c50429e1d16c1bfd',
+        'info_dict': {
+            'id': 'b42178afbea14',
+            'ext': 'flv',
+            'title': 'optical illusion  dissapeared image magic illusion',
+            'description': ''
+        },
+    }
+
+
+class CloudTimeIE(NovaMovIE):
+    IE_NAME = 'cloudtime'
+    IE_DESC = 'CloudTime'
+
+    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'cloudtime\.to'}
+
+    _HOST = 'www.cloudtime.to'
+
+    _FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
+    _TITLE_REGEX = r'<div[^>]+class=["\']video_det["\'][^>]*>\s*<strong>([^<]+)</strong>'
+
+    _TEST = None
+
+
+class AuroraVidIE(NovaMovIE):
+    IE_NAME = 'auroravid'
+    IE_DESC = 'AuroraVid'
+
+    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'auroravid\.to'}
+
+    _HOST = 'www.auroravid.to'
+
+    _FILE_DELETED_REGEX = r'This file no longer exists on our servers!<'
+
+    _TESTS = [{
+        'url': 'http://www.auroravid.to/video/4rurhn9x446jj',
+        'md5': '7205f346a52bbeba427603ba10d4b935',
+        'info_dict': {
+            'id': '4rurhn9x446jj',
+            'ext': 'flv',
+            'title': 'search engine optimization',
+            'description': 'search engine optimization is used to rank the web page in the google search engine'
+        },
+        'skip': '"Invalid token" errors abound (in web interface as well as youtube-dl, there is nothing we can do about it.)'
+    }, {
+        'url': 'http://www.auroravid.to/embed/?v=4rurhn9x446jj',
+        'only_matching': True,
+    }]
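
The relaxed _FILEKEY_REGEX above matches either a quoted literal or a bare
variable name, and extract_filekey dereferences the latter with a second
regex. A sketch of that two-step lookup against made-up page snippets (quotes
are stripped here for readability):

    import re

    FILEKEY_REGEX = r'flashvars\.filekey=(?P<filekey>"?[^"]+"?);'

    def extract_filekey(webpage):
        # Step 1: grab either "literal" or a variable name.
        filekey = re.search(FILEKEY_REGEX, webpage).group('filekey')
        if filekey[0] != '"' or filekey[-1] != '"':
            # Step 2: the page assigned the key to a variable first; resolve it.
            return re.search(
                r'var\s+%s\s*=\s*"([^"]+)"' % re.escape(filekey), webpage).group(1)
        return filekey.strip('"')

    print(extract_filekey('flashvars.filekey="abc.123";'))                   # direct literal
    print(extract_filekey('var fkzd="abc.123"; flashvars.filekey=fkzd;'))    # indirection
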
index 6b2f3f55a60d19ff3b4735027a399b6c38ad1310..74860eb2054e4f685b4b52c89149e49563ffe230 100644 (file)
 # encoding: utf-8
 from __future__ import unicode_literals
 
-import re
-
-from .brightcove import BrightcoveIE
+from .brightcove import (
+    BrightcoveLegacyIE,
+    BrightcoveNewIE,
+)
 from .common import InfoExtractor
-from ..utils import ExtractorError
+from ..compat import compat_str
+from ..utils import (
+    ExtractorError,
+    sanitized_Request,
+)
+
+
+class NownessBaseIE(InfoExtractor):
+    def _extract_url_result(self, post):
+        if post['type'] == 'video':
+            for media in post['media']:
+                if media['type'] == 'video':
+                    video_id = media['content']
+                    source = media['source']
+                    if source == 'brightcove':
+                        player_code = self._download_webpage(
+                            'http://www.nowness.com/iframe?id=%s' % video_id, video_id,
+                            note='Downloading player JavaScript',
+                            errnote='Unable to download player JavaScript')
+                        bc_url = BrightcoveLegacyIE._extract_brightcove_url(player_code)
+                        if bc_url:
+                            return self.url_result(bc_url, BrightcoveLegacyIE.ie_key())
+                        bc_url = BrightcoveNewIE._extract_url(player_code)
+                        if bc_url:
+                            return self.url_result(bc_url, BrightcoveNewIE.ie_key())
+                        raise ExtractorError('Could not find player definition')
+                    elif source == 'vimeo':
+                        return self.url_result('http://vimeo.com/%s' % video_id, 'Vimeo')
+                    elif source == 'youtube':
+                        return self.url_result(video_id, 'Youtube')
+                    elif source == 'cinematique':
+                        # youtube-dl currently doesn't support cinematique
+                        # return self.url_result('http://cinematique.com/embed/%s' % video_id, 'Cinematique')
+                        pass
 
+    def _api_request(self, url, request_path):
+        display_id = self._match_id(url)
+        request = sanitized_Request(
+            'http://api.nowness.com/api/' + request_path % display_id,
+            headers={
+                'X-Nowness-Language': 'zh-cn' if 'cn.nowness.com' in url else 'en-us',
+            })
+        return display_id, self._download_json(request, display_id)
 
-class NownessIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:(?:www|cn)\.)?nowness\.com/[^?#]*?/(?P<id>[0-9]+)/(?P<slug>[^/]+?)(?:$|[?#])'
 
-    _TESTS = [
-        {
-            'url': 'http://www.nowness.com/day/2013/6/27/3131/candor--the-art-of-gesticulation',
-            'md5': '068bc0202558c2e391924cb8cc470676',
-            'info_dict': {
-                'id': '2520295746001',
-                'ext': 'mp4',
-                'title': 'Candor: The Art of Gesticulation',
-                'description': 'Candor: The Art of Gesticulation',
-                'thumbnail': 're:^https?://.*\.jpg',
-                'uploader': 'Nowness',
-            }
+class NownessIE(NownessBaseIE):
+    IE_NAME = 'nowness'
+    _VALID_URL = r'https?://(?:(?:www|cn)\.)?nowness\.com/(?:story|(?:series|category)/[^/]+)/(?P<id>[^/]+?)(?:$|[?#])'
+    _TESTS = [{
+        'url': 'https://www.nowness.com/story/candor-the-art-of-gesticulation',
+        'md5': '068bc0202558c2e391924cb8cc470676',
+        'info_dict': {
+            'id': '2520295746001',
+            'ext': 'mp4',
+            'title': 'Candor: The Art of Gesticulation',
+            'description': 'Candor: The Art of Gesticulation',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'timestamp': 1446745676,
+            'upload_date': '20151105',
+            'uploader_id': '2385340575001',
         },
-        {
-            'url': 'http://cn.nowness.com/day/2014/8/7/4069/kasper-bj-rke-ft-jaakko-eino-kalevi--tnr',
-            'md5': 'e79cf125e387216f86b2e0a5b5c63aa3',
-            'info_dict': {
-                'id': '3716354522001',
-                'ext': 'mp4',
-                'title': 'Kasper Bjørke ft. Jaakko Eino Kalevi: TNR',
-                'description': 'Kasper Bjørke ft. Jaakko Eino Kalevi: TNR',
-                'thumbnail': 're:^https?://.*\.jpg',
-                'uploader': 'Nowness',
-            }
+        'add_ie': ['BrightcoveNew'],
+    }, {
+        'url': 'https://cn.nowness.com/story/kasper-bjorke-ft-jaakko-eino-kalevi-tnr',
+        'md5': 'e79cf125e387216f86b2e0a5b5c63aa3',
+        'info_dict': {
+            'id': '3716354522001',
+            'ext': 'mp4',
+            'title': 'Kasper Bjørke ft. Jaakko Eino Kalevi: TNR',
+            'description': 'Kasper Bjørke ft. Jaakko Eino Kalevi: TNR',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'timestamp': 1407315371,
+            'upload_date': '20140806',
+            'uploader_id': '2385340575001',
         },
-    ]
+        'add_ie': ['BrightcoveNew'],
+    }, {
+        # vimeo
+        'url': 'https://www.nowness.com/series/nowness-picks/jean-luc-godard-supercut',
+        'md5': '9a5a6a8edf806407e411296ab6bc2a49',
+        'info_dict': {
+            'id': '130020913',
+            'ext': 'mp4',
+            'title': 'Bleu, Blanc, Rouge - A Godard Supercut',
+            'description': 'md5:f0ea5f1857dffca02dbd37875d742cec',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'upload_date': '20150607',
+            'uploader': 'Cinema Sem Lei',
+            'uploader_id': 'cinemasemlei',
+        },
+        'add_ie': ['Vimeo'],
+    }]
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('slug')
+        _, post = self._api_request(url, 'post/getBySlug/%s')
+        return self._extract_url_result(post)
 
-        webpage = self._download_webpage(url, video_id)
-        player_url = self._search_regex(
-            r'"([^"]+/content/issue-[0-9.]+.js)"', webpage, 'player URL')
-        real_id = self._search_regex(
-            r'\sdata-videoId="([0-9]+)"', webpage, 'internal video ID')
 
-        player_code = self._download_webpage(
-            player_url, video_id,
-            note='Downloading player JavaScript',
-            errnote='Player download failed')
-        player_code = player_code.replace("'+d+'", real_id)
+class NownessPlaylistIE(NownessBaseIE):
+    IE_NAME = 'nowness:playlist'
+    _VALID_URL = r'https?://(?:(?:www|cn)\.)?nowness\.com/playlist/(?P<id>\d+)'
+    _TEST = {
+        'url': 'https://www.nowness.com/playlist/3286/i-guess-thats-why-they-call-it-the-blues',
+        'info_dict': {
+            'id': '3286',
+        },
+        'playlist_mincount': 8,
+    }
 
-        bc_url = BrightcoveIE._extract_brightcove_url(player_code)
-        if bc_url is None:
-            raise ExtractorError('Could not find player definition')
-        return {
-            '_type': 'url',
-            'url': bc_url,
-            'ie_key': 'Brightcove',
-        }
+    def _real_extract(self, url):
+        playlist_id, playlist = self._api_request(url, 'post?PlaylistId=%s')
+        entries = [self._extract_url_result(item) for item in playlist['items']]
+        return self.playlist_result(entries, playlist_id)
+
+
+class NownessSeriesIE(NownessBaseIE):
+    IE_NAME = 'nowness:series'
+    _VALID_URL = r'https?://(?:(?:www|cn)\.)?nowness\.com/series/(?P<id>[^/]+?)(?:$|[?#])'
+    _TEST = {
+        'url': 'https://www.nowness.com/series/60-seconds',
+        'info_dict': {
+            'id': '60',
+            'title': '60 Seconds',
+            'description': 'One-minute wisdom in a new NOWNESS series',
+        },
+        'playlist_mincount': 4,
+    }
+
+    def _real_extract(self, url):
+        display_id, series = self._api_request(url, 'series/getBySlug/%s')
+        entries = [self._extract_url_result(post) for post in series['posts']]
+        series_title = None
+        series_description = None
+        translations = series.get('translations', [])
+        if translations:
+            series_title = translations[0].get('title') or translations[0]['seoTitle']
+            series_description = translations[0].get('seoDescription')
+        return self.playlist_result(
+            entries, compat_str(series['id']), series_title, series_description)
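
The _api_request helper above picks the API language from the requested host.
A minimal stdlib sketch of the same host-based header selection (endpoint and
slug come from the tests above; the plain urllib usage is illustrative):

    import json
    from urllib.request import Request, urlopen

    def nowness_api(url, request_path, display_id):
        # The Chinese mirror gets zh-cn content, everything else en-us.
        lang = 'zh-cn' if 'cn.nowness.com' in url else 'en-us'
        req = Request('http://api.nowness.com/api/' + request_path % display_id,
                      headers={'X-Nowness-Language': lang})
        return json.loads(urlopen(req).read().decode('utf-8'))

    # e.g. nowness_api('https://cn.nowness.com/story/kasper-bjorke-ft-jaakko-eino-kalevi-tnr',
    #                  'post/getBySlug/%s', 'kasper-bjorke-ft-jaakko-eino-kalevi-tnr')
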
index 0b5ff47600559e50a2282b184b23d3fc7263d46d..916a102bfc381cbfe9d2baf83ceb5d39241cd69d 100644 (file)
@@ -7,6 +7,7 @@ from .common import InfoExtractor
 from ..compat import compat_str
 from ..utils import (
     ExtractorError,
+    determine_ext,
     int_or_none,
     parse_iso8601,
     parse_duration,
@@ -14,8 +15,64 @@ from ..utils import (
 )
 
 
-class NowTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?nowtv\.de/(?P<station>rtl|rtl2|rtlnitro|superrtl|ntv|vox)/(?P<id>.+?)/player'
+class NowTVBaseIE(InfoExtractor):
+    _VIDEO_FIELDS = (
+        'id', 'title', 'free', 'geoblocked', 'articleLong', 'articleShort',
+        'broadcastStartDate', 'seoUrl', 'duration', 'files',
+        'format.defaultImage169Format', 'format.defaultImage169Logo')
+
+    def _extract_video(self, info, display_id=None):
+        video_id = compat_str(info['id'])
+
+        files = info['files']
+        if not files:
+            if info.get('geoblocked', False):
+                raise ExtractorError(
+                    'Video %s is not available from your location due to geo restriction' % video_id,
+                    expected=True)
+            if not info.get('free', True):
+                raise ExtractorError(
+                    'Video %s is not available for free' % video_id, expected=True)
+
+        formats = []
+        for item in files['items']:
+            if determine_ext(item['path']) != 'f4v':
+                continue
+            app, play_path = remove_start(item['path'], '/').split('/', 1)
+            formats.append({
+                'url': 'rtmpe://fms.rtl.de',
+                'app': app,
+                'play_path': 'mp4:%s' % play_path,
+                'ext': 'flv',
+                'page_url': 'http://rtlnow.rtl.de',
+                'player_url': 'http://cdn.static-fra.de/now/vodplayer.swf',
+                'tbr': int_or_none(item.get('bitrate')),
+            })
+        self._sort_formats(formats)
+
+        title = info['title']
+        description = info.get('articleLong') or info.get('articleShort')
+        timestamp = parse_iso8601(info.get('broadcastStartDate'), ' ')
+        duration = parse_duration(info.get('duration'))
+
+        f = info.get('format', {})
+        thumbnail = f.get('defaultImage169Format') or f.get('defaultImage169Logo')
+
+        return {
+            'id': video_id,
+            'display_id': display_id or info.get('seoUrl'),
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'timestamp': timestamp,
+            'duration': duration,
+            'formats': formats,
+        }
+
+
+class NowTVIE(NowTVBaseIE):
+    _WORKING = False
+    _VALID_URL = r'https?://(?:www\.)?nowtv\.(?:de|at|ch)/(?:rtl|rtl2|rtlnitro|superrtl|ntv|vox)/(?P<show_id>[^/]+)/(?:(?:list/[^/]+|jahr/\d{4}/\d{1,2})/)?(?P<id>[^/]+)/(?:player|preview)'
 
     _TESTS = [{
         # rtl
@@ -23,8 +80,8 @@ class NowTVIE(InfoExtractor):
         'info_dict': {
             'id': '203519',
             'display_id': 'bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit',
-            'ext': 'mp4',
-            'title': 'Die neuen Bauern und eine Hochzeit',
+            'ext': 'flv',
+            'title': 'Inka Bause stellt die neuen Bauern vor',
             'description': 'md5:e234e1ed6d63cf06be5c070442612e7e',
             'thumbnail': 're:^https?://.*\.jpg$',
             'timestamp': 1432580700,
@@ -32,7 +89,7 @@ class NowTVIE(InfoExtractor):
             'duration': 2786,
         },
         'params': {
-            # m3u8 download
+            # rtmp download
             'skip_download': True,
         },
     }, {
@@ -41,7 +98,7 @@ class NowTVIE(InfoExtractor):
         'info_dict': {
             'id': '203481',
             'display_id': 'berlin-tag-nacht/berlin-tag-nacht-folge-934',
-            'ext': 'mp4',
+            'ext': 'flv',
             'title': 'Berlin - Tag & Nacht (Folge 934)',
             'description': 'md5:c85e88c2e36c552dfe63433bc9506dd0',
             'thumbnail': 're:^https?://.*\.jpg$',
@@ -50,7 +107,7 @@ class NowTVIE(InfoExtractor):
             'duration': 2641,
         },
         'params': {
-            # m3u8 download
+            # rtmp download
             'skip_download': True,
         },
     }, {
@@ -59,7 +116,7 @@ class NowTVIE(InfoExtractor):
         'info_dict': {
             'id': '165780',
             'display_id': 'alarm-fuer-cobra-11-die-autobahnpolizei/hals-und-beinbruch-2014-08-23-21-10-00',
-            'ext': 'mp4',
+            'ext': 'flv',
             'title': 'Hals- und Beinbruch',
             'description': 'md5:b50d248efffe244e6f56737f0911ca57',
             'thumbnail': 're:^https?://.*\.jpg$',
@@ -68,7 +125,7 @@ class NowTVIE(InfoExtractor):
             'duration': 2742,
         },
         'params': {
-            # m3u8 download
+            # rtmp download
             'skip_download': True,
         },
     }, {
@@ -77,7 +134,7 @@ class NowTVIE(InfoExtractor):
         'info_dict': {
             'id': '99205',
             'display_id': 'medicopter-117/angst',
-            'ext': 'mp4',
+            'ext': 'flv',
             'title': 'Angst!',
             'description': 'md5:30cbc4c0b73ec98bcd73c9f2a8c17c4e',
             'thumbnail': 're:^https?://.*\.jpg$',
@@ -86,7 +143,7 @@ class NowTVIE(InfoExtractor):
             'duration': 3025,
         },
         'params': {
-            # m3u8 download
+            # rtmp download
             'skip_download': True,
         },
     }, {
@@ -95,7 +152,7 @@ class NowTVIE(InfoExtractor):
         'info_dict': {
             'id': '203521',
             'display_id': 'ratgeber-geld/thema-ua-der-erste-blick-die-apple-watch',
-            'ext': 'mp4',
+            'ext': 'flv',
             'title': 'Thema u.a.: Der erste Blick: Die Apple Watch',
             'description': 'md5:4312b6c9d839ffe7d8caf03865a531af',
             'thumbnail': 're:^https?://.*\.jpg$',
@@ -104,7 +161,7 @@ class NowTVIE(InfoExtractor):
             'duration': 1083,
         },
         'params': {
-            # m3u8 download
+            # rtmp download
             'skip_download': True,
         },
     }, {
@@ -113,7 +170,7 @@ class NowTVIE(InfoExtractor):
         'info_dict': {
             'id': '128953',
             'display_id': 'der-hundeprofi/buero-fall-chihuahua-joel',
-            'ext': 'mp4',
+            'ext': 'flv',
             'title': "Büro-Fall / Chihuahua 'Joel'",
             'description': 'md5:e62cb6bf7c3cc669179d4f1eb279ad8d',
             'thumbnail': 're:^https?://.*\.jpg$',
@@ -122,71 +179,83 @@ class NowTVIE(InfoExtractor):
             'duration': 3092,
         },
         'params': {
-            # m3u8 download
+            # rtmp download
             'skip_download': True,
         },
+    }, {
+        'url': 'http://www.nowtv.de/rtl/bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit/preview',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.nowtv.at/rtl/bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit/preview?return=/rtl/bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.nowtv.de/rtl2/echtzeit/list/aktuell/schnelles-geld-am-ende-der-welt/player',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.nowtv.de/rtl2/zuhause-im-glueck/jahr/2015/11/eine-erschuetternde-diagnose/player',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        display_id = mobj.group('id')
-        station = mobj.group('station')
+        display_id = '%s/%s' % (mobj.group('show_id'), mobj.group('id'))
 
         info = self._download_json(
-            'https://api.nowtv.de/v3/movies/%s?fields=id,title,free,geoblocked,articleLong,articleShort,broadcastStartDate,seoUrl,duration,format,files' % display_id,
-            display_id)
+            'https://api.nowtv.de/v3/movies/%s?fields=%s'
+            % (display_id, ','.join(self._VIDEO_FIELDS)), display_id)
 
-        video_id = compat_str(info['id'])
+        return self._extract_video(info, display_id)
 
-        files = info['files']
-        if not files:
-            if info.get('geoblocked', False):
-                raise ExtractorError(
-                    'Video %s is not available from your location due to geo restriction' % video_id,
-                    expected=True)
-            if not info.get('free', True):
-                raise ExtractorError(
-                    'Video %s is not available for free' % video_id, expected=True)
 
-        f = info.get('format', {})
-        station = f.get('station') or station
-
-        STATIONS = {
-            'rtl': 'rtlnow',
-            'rtl2': 'rtl2now',
-            'vox': 'voxnow',
-            'nitro': 'rtlnitronow',
-            'ntv': 'n-tvnow',
-            'superrtl': 'superrtlnow'
-        }
+class NowTVListIE(NowTVBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?nowtv\.(?:de|at|ch)/(?:rtl|rtl2|rtlnitro|superrtl|ntv|vox)/(?P<show_id>[^/]+)/list/(?P<id>[^?/#&]+)$'
 
-        formats = []
-        for item in files['items']:
-            item_path = remove_start(item['path'], '/')
-            tbr = int_or_none(item['bitrate'])
-            m3u8_url = 'http://hls.fra.%s.de/hls-vod-enc/%s.m3u8' % (STATIONS[station], item_path)
-            m3u8_url = m3u8_url.replace('now/', 'now/videos/')
-            formats.append({
-                'url': m3u8_url,
-                'format_id': '%s-%sk' % (item['id'], tbr),
-                'ext': 'mp4',
-                'tbr': tbr,
-            })
-        self._sort_formats(formats)
+    _SHOW_FIELDS = ('title', )
+    _SEASON_FIELDS = ('id', 'headline', 'seoheadline', )
 
-        title = info['title']
-        description = info.get('articleLong') or info.get('articleShort')
-        timestamp = parse_iso8601(info.get('broadcastStartDate'), ' ')
-        duration = parse_duration(info.get('duration'))
-        thumbnail = f.get('defaultImage169Format') or f.get('defaultImage169Logo')
+    _TESTS = [{
+        'url': 'http://www.nowtv.at/rtl/stern-tv/list/aktuell',
+        'info_dict': {
+            'id': '17006',
+            'title': 'stern TV - Aktuell',
+        },
+        'playlist_count': 1,
+    }, {
+        'url': 'http://www.nowtv.at/rtl/das-supertalent/list/free-staffel-8',
+        'info_dict': {
+            'id': '20716',
+            'title': 'Das Supertalent - FREE Staffel 8',
+        },
+        'playlist_count': 14,
+    }]
 
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'timestamp': timestamp,
-            'duration': duration,
-            'formats': formats,
-        }
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        show_id = mobj.group('show_id')
+        season_id = mobj.group('id')
+
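+        # Build one combined field list for the API query: show-level fields,
+        # season fields under formatTabs and per-video fields nested below them.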
+        fields = []
+        fields.extend(self._SHOW_FIELDS)
+        fields.extend('formatTabs.%s' % field for field in self._SEASON_FIELDS)
+        fields.extend(
+            'formatTabs.formatTabPages.container.movies.%s' % field
+            for field in self._VIDEO_FIELDS)
+
+        list_info = self._download_json(
+            'https://api.nowtv.de/v3/formats/seo?fields=%s&name=%s.php'
+            % (','.join(fields), show_id),
+            season_id)
+
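+        # Pick the season whose SEO headline matches the slug from the URL;
+        # next() without a default raises StopIteration if nothing matches.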
+        season = next(
+            season for season in list_info['formatTabs']['items']
+            if season.get('seoheadline') == season_id)
+
+        title = '%s - %s' % (list_info['title'], season['headline'])
+
+        entries = []
+        for container in season['formatTabPages']['items']:
+            for info in ((container.get('container') or {}).get('movies') or {}).get('items') or []:
+                entries.append(self._extract_video(info))
+
+        return self.playlist_result(
+            entries, compat_str(season.get('id') or season_id), title)
diff --git a/youtube_dl/extractor/nowvideo.py b/youtube_dl/extractor/nowvideo.py
deleted file mode 100644 (file)
index dec09cd..0000000
+++ /dev/null
@@ -1,28 +0,0 @@
-from __future__ import unicode_literals
-
-from .novamov import NovaMovIE
-
-
-class NowVideoIE(NovaMovIE):
-    IE_NAME = 'nowvideo'
-    IE_DESC = 'NowVideo'
-
-    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'nowvideo\.(?:ch|sx|eu|at|ag|co|li)'}
-
-    _HOST = 'www.nowvideo.ch'
-
-    _FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
-    _FILEKEY_REGEX = r'var fkzd="([^"]+)";'
-    _TITLE_REGEX = r'<h4>([^<]+)</h4>'
-    _DESCRIPTION_REGEX = r'</h4>\s*<p>([^<]+)</p>'
-
-    _TEST = {
-        'url': 'http://www.nowvideo.ch/video/0mw0yow7b6dxa',
-        'md5': 'f8fbbc8add72bd95b7850c6a02fc8817',
-        'info_dict': {
-            'id': '0mw0yow7b6dxa',
-            'ext': 'flv',
-            'title': 'youtubedl test video _BaW_jenozKc.mp4',
-            'description': 'Description',
-        }
-    }
diff --git a/youtube_dl/extractor/noz.py b/youtube_dl/extractor/noz.py
new file mode 100644 (file)
index 0000000..c47a33d
--- /dev/null
@@ -0,0 +1,89 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_parse_unquote,
+    compat_xpath,
+)
+from ..utils import (
+    int_or_none,
+    find_xpath_attr,
+    xpath_text,
+    update_url_query,
+)
+
+
+class NozIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?noz\.de/video/(?P<id>[0-9]+)/'
+    _TESTS = [{
+        'url': 'http://www.noz.de/video/25151/32-Deutschland-gewinnt-Badminton-Lnderspiel-in-Melle',
+        'info_dict': {
+            'id': '25151',
+            'ext': 'mp4',
+            'duration': 215,
+            'title': '3:2 - Deutschland gewinnt Badminton-Länderspiel in Melle',
+            'description': 'Vor rund 370 Zuschauern gewinnt die deutsche Badminton-Nationalmannschaft am Donnerstag ein EM-Vorbereitungsspiel gegen Frankreich in Melle. Video Moritz Frankenberg.',
+            'thumbnail': 're:^http://.*\.jpg',
+        },
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        description = self._og_search_description(webpage)
+
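+        # The page pulls in a videojs bootstrap script whose body carries the
+        # URL-encoded address of the actual XML video configuration.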
+        edge_url = self._html_search_regex(
+            r'<script\s+(?:type="text/javascript"\s+)?src="(.*?/videojs_.*?)"',
+            webpage, 'edge URL')
+        edge_content = self._download_webpage(edge_url, 'meta configuration')
+
+        config_url_encoded = self._search_regex(
+            r'so\.addVariable\("config_url","[^,]*,(.*?)"',
+            edge_content, 'config URL'
+        )
+        config_url = compat_urllib_parse_unquote(config_url_encoded)
+
+        doc = self._download_xml(config_url, 'video configuration')
+        title = xpath_text(doc, './/title')
+        thumbnail = xpath_text(doc, './/article/thumbnail/url')
+        duration = int_or_none(xpath_text(
+            doc, './/article/movie/file/duration'))
+        formats = []
+        for qnode in doc.findall(compat_xpath('.//article/movie/file/qualities/qual')):
+            http_url_ele = find_xpath_attr(
+                qnode, './html_urls/video_url', 'format', 'video/mp4')
+            http_url = http_url_ele.text if http_url_ele is not None else None
+            if http_url:
+                formats.append({
+                    'url': http_url,
+                    'format_name': xpath_text(qnode, './name'),
+                    'format_id': '%s-%s' % ('http', xpath_text(qnode, './id')),
+                    'height': int_or_none(xpath_text(qnode, './height')),
+                    'width': int_or_none(xpath_text(qnode, './width')),
+                    'tbr': int_or_none(xpath_text(qnode, './bitrate'), scale=1000),
+                })
+            else:
+                f4m_url = xpath_text(qnode, 'url_hd2')
+                if f4m_url:
+                    formats.extend(self._extract_f4m_formats(
+                        update_url_query(f4m_url, {'hdcore': '3.4.0'}),
+                        video_id, f4m_id='hds', fatal=False))
+                m3u8_url_ele = find_xpath_attr(
+                    qnode, './html_urls/video_url',
+                    'format', 'application/vnd.apple.mpegurl')
+                m3u8_url = m3u8_url_ele.text if m3u8_url_ele is not None else None
+                if m3u8_url:
+                    formats.extend(self._extract_m3u8_formats(
+                        m3u8_url, video_id, 'mp4', 'm3u8_native',
+                        m3u8_id='hls', fatal=False))
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'formats': formats,
+            'title': title,
+            'duration': duration,
+            'description': description,
+            'thumbnail': thumbnail,
+        }
diff --git a/youtube_dl/extractor/npo.py b/youtube_dl/extractor/npo.py
index 0c2d02c108ed425cf4688c34aa2a826ddaa400b4..87f5675c7ff8b14169291420feb9bcf85edf894d 100644 (file)
@@ -189,7 +189,7 @@ class NPOIE(NPOBaseIE):
                 if not video_url:
                     continue
                 if format_id == 'adaptive':
-                    formats.extend(self._extract_m3u8_formats(video_url, video_id))
+                    formats.extend(self._extract_m3u8_formats(video_url, video_id, 'mp4'))
                 else:
                     formats.append({
                         'url': video_url,
@@ -406,7 +406,40 @@ class NPORadioFragmentIE(InfoExtractor):
         }
 
 
+class SchoolTVIE(InfoExtractor):
+    IE_NAME = 'schooltv'
+    _VALID_URL = r'https?://(?:www\.)?schooltv\.nl/video/(?P<id>[^/?#&]+)'
+
+    _TEST = {
+        'url': 'http://www.schooltv.nl/video/ademhaling-de-hele-dag-haal-je-adem-maar-wat-gebeurt-er-dan-eigenlijk-in-je-lichaam/',
+        'info_dict': {
+            'id': 'WO_NTR_429477',
+            'display_id': 'ademhaling-de-hele-dag-haal-je-adem-maar-wat-gebeurt-er-dan-eigenlijk-in-je-lichaam',
+            'title': 'Ademhaling: De hele dag haal je adem. Maar wat gebeurt er dan eigenlijk in je lichaam?',
+            'ext': 'mp4',
+            'description': 'md5:abfa0ff690adb73fd0297fd033aaa631'
+        },
+        'params': {
+            # Skip because of m3u8 download
+            'skip_download': True
+        }
+    }
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        video_id = self._search_regex(
+            r'data-mid=(["\'])(?P<id>.+?)\1', webpage, 'video_id', group='id')
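+        # Defer the actual extraction to NPO via the npo: scheme; the
+        # url_transparent result keeps this page's display_id.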
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'NPO',
+            'url': 'npo:%s' % video_id,
+            'display_id': display_id
+        }
+
+
 class VPROIE(NPOIE):
+    IE_NAME = 'vpro'
     _VALID_URL = r'https?://(?:www\.)?(?:tegenlicht\.)?vpro\.nl/(?:[^/]+/){2,}(?P<id>[^/]+)\.html'
 
     _TESTS = [
diff --git a/youtube_dl/extractor/npr.py b/youtube_dl/extractor/npr.py
new file mode 100644 (file)
index 0000000..1777aa1
--- /dev/null
@@ -0,0 +1,82 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_urllib_parse_urlencode
+from ..utils import (
+    int_or_none,
+    qualities,
+)
+
+
+class NprIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?npr\.org/player/v2/mediaPlayer\.html\?.*\bid=(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://www.npr.org/player/v2/mediaPlayer.html?id=449974205',
+        'info_dict': {
+            'id': '449974205',
+            'title': 'New Music From Beach House, Chairlift, CMJ Discoveries And More'
+        },
+        'playlist_count': 7,
+    }, {
+        'url': 'http://www.npr.org/player/v2/mediaPlayer.html?action=1&t=1&islist=false&id=446928052&m=446929930&live=1',
+        'info_dict': {
+            'id': '446928052',
+            'title': "Songs We Love: Tigran Hamasyan, 'Your Mercy is Boundless'"
+        },
+        'playlist': [{
+            'md5': '12fa60cb2d3ed932f53609d4aeceabf1',
+            'info_dict': {
+                'id': '446929930',
+                'ext': 'mp3',
+                'title': 'Your Mercy is Boundless (Bazum en Qo gtutyunqd)',
+                'duration': 402,
+            },
+        }],
+    }]
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+
+        config = self._download_json(
+            'http://api.npr.org/query?%s' % compat_urllib_parse_urlencode({
+                'id': playlist_id,
+                'fields': 'titles,audio,show',
+                'format': 'json',
+                'apiKey': 'MDAzMzQ2MjAyMDEyMzk4MTU1MDg3ZmM3MQ010',
+            }), playlist_id)
+
+        story = config['list']['story'][0]
+
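+        # Known containers ordered from worst to best; qualities() turns this
+        # ranking into a sortable preference value per format_id.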
+        KNOWN_FORMATS = ('threegp', 'mp4', 'mp3')
+        quality = qualities(KNOWN_FORMATS)
+
+        entries = []
+        for audio in story.get('audio', []):
+            title = audio.get('title', {}).get('$text')
+            duration = int_or_none(audio.get('duration', {}).get('$text'))
+            formats = []
+            for format_id, formats_entry in audio.get('format', {}).items():
+                if not formats_entry:
+                    continue
+                if isinstance(formats_entry, list):
+                    formats_entry = formats_entry[0]
+                format_url = formats_entry.get('$text')
+                if not format_url:
+                    continue
+                if format_id in KNOWN_FORMATS:
+                    formats.append({
+                        'url': format_url,
+                        'format_id': format_id,
+                        'ext': formats_entry.get('type'),
+                        'quality': quality(format_id),
+                    })
+            self._sort_formats(formats)
+            entries.append({
+                'id': audio['id'],
+                'title': title,
+                'duration': duration,
+                'formats': formats,
+            })
+
+        playlist_title = story.get('title', {}).get('$text')
+        return self.playlist_result(entries, playlist_id, playlist_title)
diff --git a/youtube_dl/extractor/nrk.py b/youtube_dl/extractor/nrk.py
index d066a96db137ee3fb2c36f712803622e73b4aa40..9df20082224f84099657d2c2415cb9b2e66df8b6 100644 (file)
@@ -4,7 +4,12 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
+from ..compat import (
+    compat_urlparse,
+    compat_urllib_parse_unquote,
+)
 from ..utils import (
+    determine_ext,
     ExtractorError,
     float_or_none,
     parse_duration,
@@ -47,12 +52,23 @@ class NRKIE(InfoExtractor):
             'http://v8.psapi.nrk.no/mediaelement/%s' % video_id,
             video_id, 'Downloading media JSON')
 
-        if data['usageRights']['isGeoBlocked']:
-            raise ExtractorError(
-                'NRK har ikke rettig-heter til å vise dette programmet utenfor Norge',
-                expected=True)
+        media_url = data.get('mediaUrl')
+
+        if not media_url:
+            if data['usageRights']['isGeoBlocked']:
+                raise ExtractorError(
+                    'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
+                    expected=True)
 
-        video_url = data['mediaUrl'] + '?hdcore=3.5.0&plugin=aasp-3.5.0.151.81'
+        if determine_ext(media_url) == 'f4m':
+            formats = self._extract_f4m_formats(
+                media_url + '?hdcore=3.5.0&plugin=aasp-3.5.0.151.81', video_id, f4m_id='hds')
+            self._sort_formats(formats)
+        else:
+            formats = [{
+                'url': media_url,
+                'ext': 'flv',
+            }]
 
         duration = parse_duration(data.get('duration'))
 
@@ -66,17 +82,16 @@ class NRKIE(InfoExtractor):
 
         return {
             'id': video_id,
-            'url': video_url,
-            'ext': 'flv',
             'title': data['title'],
             'description': data['description'],
             'duration': duration,
             'thumbnail': thumbnail,
+            'formats': formats,
         }
 
 
 class NRKPlaylistIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?nrk\.no/(?!video)(?:[^/]+/)+(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?nrk\.no/(?!video|skole)(?:[^/]+/)+(?P<id>[^/]+)'
 
     _TESTS = [{
         'url': 'http://www.nrk.no/troms/gjenopplev-den-historiske-solformorkelsen-1.12270763',
@@ -115,6 +130,37 @@ class NRKPlaylistIE(InfoExtractor):
             entries, playlist_id, playlist_title, playlist_description)
 
 
+class NRKSkoleIE(InfoExtractor):
+    IE_DESC = 'NRK Skole'
+    _VALID_URL = r'https?://(?:www\.)?nrk\.no/skole/klippdetalj\?.*\btopic=(?P<id>[^/?#&]+)'
+
+    _TESTS = [{
+        'url': 'http://nrk.no/skole/klippdetalj?topic=nrk:klipp/616532',
+        'md5': '04cd85877cc1913bce73c5d28a47e00f',
+        'info_dict': {
+            'id': '6021',
+            'ext': 'flv',
+            'title': 'Genetikk og eneggede tvillinger',
+            'description': 'md5:3aca25dcf38ec30f0363428d2b265f8d',
+            'duration': 399,
+        },
+    }, {
+        'url': 'http://www.nrk.no/skole/klippdetalj?topic=nrk%3Aklipp%2F616532#embed',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.nrk.no/skole/klippdetalj?topic=urn:x-mediadb:21379',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = compat_urllib_parse_unquote(self._match_id(url))
+
+        webpage = self._download_webpage(url, video_id)
+
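+        # The clip page embeds the canonical NRK media id; hand it off via the
+        # nrk: scheme so the regular NRK extractor does the rest.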
+        nrk_id = self._search_regex(r'data-nrk-id=["\'](\d+)', webpage, 'nrk id')
+        return self.url_result('nrk:%s' % nrk_id)
+
+
 class NRKTVIE(InfoExtractor):
     IE_DESC = 'NRK TV and NRK Radio'
     _VALID_URL = r'(?P<baseurl>https?://(?:tv|radio)\.nrk(?:super)?\.no/)(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P<part_id>\d+))?'
@@ -122,26 +168,32 @@ class NRKTVIE(InfoExtractor):
     _TESTS = [
         {
             'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
-            'md5': 'adf2c5454fa2bf032f47a9f8fb351342',
             'info_dict': {
                 'id': 'MUHH48000314',
-                'ext': 'flv',
+                'ext': 'mp4',
                 'title': '20 spørsmål',
                 'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
                 'upload_date': '20140523',
                 'duration': 1741.52,
             },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            },
         },
         {
             'url': 'https://tv.nrk.no/program/mdfp15000514',
-            'md5': '383650ece2b25ecec996ad7b5bb2a384',
             'info_dict': {
                 'id': 'mdfp15000514',
-                'ext': 'flv',
-                'title': 'Kunnskapskanalen: Grunnlovsjubiléet - Stor ståhei for ingenting',
+                'ext': 'mp4',
+                'title': 'Grunnlovsjubiléet - Stor ståhei for ingenting',
                 'description': 'md5:654c12511f035aed1e42bdf5db3b206a',
                 'upload_date': '20140524',
-                'duration': 4605.0,
+                'duration': 4605.08,
+            },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
             },
         },
         {
@@ -196,20 +248,6 @@ class NRKTVIE(InfoExtractor):
         }
     ]
 
-    def _debug_print(self, txt):
-        if self._downloader.params.get('verbose', False):
-            self.to_screen('[debug] %s' % txt)
-
-    def _get_subtitles(self, subtitlesurl, video_id, baseurl):
-        url = "%s%s" % (baseurl, subtitlesurl)
-        self._debug_print('%s: Subtitle url: %s' % (video_id, url))
-        captions = self._download_xml(
-            url, video_id, 'Downloading subtitles')
-        lang = captions.get('lang', 'no')
-        return {lang: [
-            {'ext': 'ttml', 'url': url},
-        ]}
-
     def _extract_f4m(self, manifest_url, video_id):
         return self._extract_f4m_formats(
             manifest_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124', video_id, f4m_id='hds')
@@ -218,7 +256,7 @@ class NRKTVIE(InfoExtractor):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
         part_id = mobj.group('part_id')
-        baseurl = mobj.group('baseurl')
+        base_url = mobj.group('baseurl')
 
         webpage = self._download_webpage(url, video_id)
 
@@ -278,11 +316,14 @@ class NRKTVIE(InfoExtractor):
         self._sort_formats(formats)
 
         subtitles_url = self._html_search_regex(
-            r'data-subtitlesurl[ ]*=[ ]*"([^"]+)"',
-            webpage, 'subtitle URL', default=None)
-        subtitles = None
+            r'data-subtitlesurl\s*=\s*(["\'])(?P<url>.+?)\1',
+            webpage, 'subtitle URL', default=None, group='url')
+        subtitles = {}
         if subtitles_url:
-            subtitles = self.extract_subtitles(subtitles_url, video_id, baseurl)
+            subtitles['no'] = [{
+                'ext': 'ttml',
+                'url': compat_urlparse.urljoin(base_url, subtitles_url),
+            }]
 
         return {
             'id': video_id,
diff --git a/youtube_dl/extractor/ntvde.py b/youtube_dl/extractor/ntvde.py
index d2cfe096192f6a44fd0f9dad2ece1474a0d845ab..a83e85cb8109ef44468851355f2b522e22fc5831 100644 (file)
@@ -2,6 +2,7 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
+from ..compat import compat_urlparse
 from ..utils import (
     int_or_none,
     js_to_json,
@@ -34,7 +35,7 @@ class NTVDeIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
 
         info = self._parse_json(self._search_regex(
-            r'(?s)ntv.pageInfo.article =\s(\{.*?\});', webpage, 'info'),
+            r'(?s)ntv\.pageInfo\.article\s*=\s*(\{.*?\});', webpage, 'info'),
             video_id, transform_source=js_to_json)
         timestamp = int_or_none(info.get('publishedDateAsUnixTimeStamp'))
         vdata = self._parse_json(self._search_regex(
@@ -42,18 +43,24 @@ class NTVDeIE(InfoExtractor):
             webpage, 'player data'),
             video_id, transform_source=js_to_json)
         duration = parse_duration(vdata.get('duration'))
-        formats = [{
-            'format_id': 'flash',
-            'url': 'rtmp://fms.n-tv.de/' + vdata['video'],
-        }, {
-            'format_id': 'mobile',
-            'url': 'http://video.n-tv.de' + vdata['videoMp4'],
-            'tbr': 400,  # estimation
-        }]
-        m3u8_url = 'http://video.n-tv.de' + vdata['videoM3u8']
-        formats.extend(self._extract_m3u8_formats(
-            m3u8_url, video_id, ext='mp4',
-            entry_protocol='m3u8_native', preference=0))
+
+        formats = []
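+        # Not every article provides all delivery types, so only add the
+        # formats whose sources are actually present in the player data.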
+        if vdata.get('video'):
+            formats.append({
+                'format_id': 'flash',
+                'url': 'rtmp://fms.n-tv.de/%s' % vdata['video'],
+            })
+        if vdata.get('videoMp4'):
+            formats.append({
+                'format_id': 'mobile',
+                'url': compat_urlparse.urljoin('http://video.n-tv.de', vdata['videoMp4']),
+                'tbr': 400,  # estimation
+            })
+        if vdata.get('videoM3u8'):
+            m3u8_url = compat_urlparse.urljoin('http://video.n-tv.de', vdata['videoM3u8'])
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native',
+                preference=0, m3u8_id='hls', fatal=False))
         self._sort_formats(formats)
 
         return {
diff --git a/youtube_dl/extractor/ntvru.py b/youtube_dl/extractor/ntvru.py
index 2cd924d059dafd9aa3734697c9c4a396b2bb01f6..0895d7ea4cb88f805605a55cb0c1fe56ff1d475d 100644 (file)
@@ -11,7 +11,7 @@ from ..utils import (
 
 class NTVRuIE(InfoExtractor):
     IE_NAME = 'ntv.ru'
-    _VALID_URL = r'http://(?:www\.)?ntv\.ru/(?P<id>.+)'
+    _VALID_URL = r'https?://(?:www\.)?ntv\.ru/(?P<id>.+)'
 
     _TESTS = [
         {
diff --git a/youtube_dl/extractor/nuevo.py b/youtube_dl/extractor/nuevo.py
new file mode 100644 (file)
index 0000000..ef093de
--- /dev/null
@@ -0,0 +1,38 @@
+# encoding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+from ..utils import (
+    float_or_none,
+    xpath_text
+)
+
+
+class NuevoBaseIE(InfoExtractor):
+    def _extract_nuevo(self, config_url, video_id):
+        config = self._download_xml(
+            config_url, video_id, transform_source=lambda s: s.strip())
+
+        title = xpath_text(config, './title', 'title', fatal=True).strip()
+        video_id = xpath_text(config, './mediaid', default=video_id)
+        thumbnail = xpath_text(config, ['./image', './thumb'])
+        duration = float_or_none(xpath_text(config, './duration'))
+
+        formats = []
+        for element_name, format_id in (('file', 'sd'), ('filehd', 'hd')):
+            video_url = xpath_text(config, element_name)
+            if video_url:
+                formats.append({
+                    'url': video_url,
+                    'format_id': format_id,
+                })
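+        # Some configs list only one of the SD/HD files; probe the URLs and
+        # drop any that turn out to be unreachable.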
+        self._check_formats(formats, video_id)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'formats': formats
+        }
diff --git a/youtube_dl/extractor/nuvid.py b/youtube_dl/extractor/nuvid.py
index 57928f2aedcc0acfa5ba71d6e9f0a62af9d67b71..9fa7cefadc79ef1d8bda971dc52483a0b8d998eb 100644 (file)
@@ -3,11 +3,9 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-)
 from ..utils import (
     parse_duration,
+    sanitized_Request,
     unified_strdate,
 )
 
@@ -33,7 +31,7 @@ class NuvidIE(InfoExtractor):
         formats = []
 
         for dwnld_speed, format_id in [(0, '3gp'), (5, 'mp4')]:
-            request = compat_urllib_request.Request(
+            request = sanitized_Request(
                 'http://m.nuvid.com/play/%s' % video_id)
             request.add_header('Cookie', 'skip_download_page=1; dwnld_speed=%d; adv_show=1' % dwnld_speed)
             webpage = self._download_webpage(
diff --git a/youtube_dl/extractor/nytimes.py b/youtube_dl/extractor/nytimes.py
index 7f254b867da66f70a79ff7aac5d81eb6f37bd997..681683e86f54e796f1c954de2c0cb374016fe303 100644 (file)
@@ -18,8 +18,9 @@ class NYTimesBaseIE(InfoExtractor):
         description = video_data.get('summary')
         duration = float_or_none(video_data.get('duration'), 1000)
 
-        uploader = video_data['byline']
-        timestamp = parse_iso8601(video_data['publication_date'][:-8])
+        uploader = video_data.get('byline')
+        publication_date = video_data.get('publication_date')
+        timestamp = parse_iso8601(publication_date[:-8]) if publication_date else None
 
         def get_file_size(file_size):
             if isinstance(file_size, int):
@@ -37,7 +38,7 @@ class NYTimesBaseIE(InfoExtractor):
                 'width': int_or_none(video.get('width')),
                 'height': int_or_none(video.get('height')),
                 'filesize': get_file_size(video.get('fileSize')),
-            } for video in video_data['renditions']
+            } for video in video_data['renditions'] if video.get('url')
         ]
         self._sort_formats(formats)
 
@@ -46,7 +47,7 @@ class NYTimesBaseIE(InfoExtractor):
                 'url': 'http://www.nytimes.com/%s' % image['url'],
                 'width': int_or_none(image.get('width')),
                 'height': int_or_none(image.get('height')),
-            } for image in video_data['images']
+            } for image in video_data.get('images', []) if image.get('url')
         ]
 
         return {
diff --git a/youtube_dl/extractor/odnoklassniki.py b/youtube_dl/extractor/odnoklassniki.py
index 215ffe87b55db126300f0c18c98d9c5bfd920ed7..f9e064a60e445668200b759ca4e0ad1a6f7c28ab 100644 (file)
@@ -4,6 +4,7 @@ from __future__ import unicode_literals
 from .common import InfoExtractor
 from ..compat import compat_urllib_parse_unquote
 from ..utils import (
+    ExtractorError,
     unified_strdate,
     int_or_none,
     qualities,
@@ -12,20 +13,23 @@ from ..utils import (
 
 
 class OdnoklassnikiIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:odnoklassniki|ok)\.ru/(?:video|web-api/video/moviePlayer)/(?P<id>[\d-]+)'
+    _VALID_URL = r'https?://(?:(?:www|m|mobile)\.)?(?:odnoklassniki|ok)\.ru/(?:video(?:embed)?|web-api/video/moviePlayer)/(?P<id>[\d-]+)'
     _TESTS = [{
         # metadata in JSON
         'url': 'http://ok.ru/video/20079905452',
-        'md5': '8e24ad2da6f387948e7a7d44eb8668fe',
+        'md5': '6ba728d85d60aa2e6dd37c9e70fdc6bc',
         'info_dict': {
             'id': '20079905452',
             'ext': 'mp4',
             'title': 'Культура меняет нас (прекрасный ролик!))',
             'duration': 100,
+            'upload_date': '20141207',
             'uploader_id': '330537914540',
             'uploader': 'Виталий Добровольский',
             'like_count': int,
+            'age_limit': 0,
         },
+        'skip': 'Video has been blocked',
     }, {
         # metadataUrl
         'url': 'http://ok.ru/video/63567059965189-0',
@@ -35,13 +39,42 @@ class OdnoklassnikiIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'Девушка без комплексов ...',
             'duration': 191,
+            'upload_date': '20150518',
             'uploader_id': '534380003155',
-            'uploader': 'Андрей Мещанинов',
+            'uploader': '☭ Андрей Мещанинов ☭',
             'like_count': int,
+            'age_limit': 0,
+        },
+    }, {
+        # YouTube embed (metadataUrl, provider == USER_YOUTUBE)
+        'url': 'http://ok.ru/video/64211978996595-1',
+        'md5': '5d7475d428845cd2e13bae6f1a992278',
+        'info_dict': {
+            'id': '64211978996595-1',
+            'ext': 'mp4',
+            'title': 'Космическая среда от 26 августа 2015',
+            'description': 'md5:848eb8b85e5e3471a3a803dae1343ed0',
+            'duration': 440,
+            'upload_date': '20150826',
+            'uploader_id': '750099571',
+            'uploader': 'Алина П',
+            'age_limit': 0,
         },
     }, {
         'url': 'http://ok.ru/web-api/video/moviePlayer/20079905452',
         'only_matching': True,
+    }, {
+        'url': 'http://www.ok.ru/video/20648036891',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.ok.ru/videoembed/20648036891',
+        'only_matching': True,
+    }, {
+        'url': 'http://m.ok.ru/video/20079905452',
+        'only_matching': True,
+    }, {
+        'url': 'http://mobile.ok.ru/video/20079905452',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
@@ -50,9 +83,16 @@ class OdnoklassnikiIE(InfoExtractor):
         webpage = self._download_webpage(
             'http://ok.ru/video/%s' % video_id, video_id)
 
+        error = self._search_regex(
+            r'[^>]+class="vp_video_stub_txt"[^>]*>([^<]+)<',
+            webpage, 'error', default=None)
+        if error:
+            raise ExtractorError(error, expected=True)
+
         player = self._parse_json(
             unescapeHTML(self._search_regex(
-                r'data-attributes="([^"]+)"', webpage, 'player')),
+                r'data-options=(?P<quote>["\'])(?P<player>{.+?%s.+?})(?P=quote)' % video_id,
+                webpage, 'player', group='player')),
             video_id)
 
         flashvars = player['flashvars']
@@ -85,16 +125,7 @@ class OdnoklassnikiIE(InfoExtractor):
 
         like_count = int_or_none(metadata.get('likeCount'))
 
-        quality = qualities(('mobile', 'lowest', 'low', 'sd', 'hd'))
-
-        formats = [{
-            'url': f['url'],
-            'ext': 'mp4',
-            'format_id': f['name'],
-            'quality': quality(f['name']),
-        } for f in metadata['videos']]
-
-        return {
+        info = {
             'id': video_id,
             'title': title,
             'thumbnail': thumbnail,
@@ -104,5 +135,24 @@ class OdnoklassnikiIE(InfoExtractor):
             'uploader_id': uploader_id,
             'like_count': like_count,
             'age_limit': age_limit,
-            'formats': formats,
         }
+
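+        # provider USER_YOUTUBE marks videos re-hosted from YouTube; contentId
+        # then holds the original video id, so defer via url_transparent.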
+        if metadata.get('provider') == 'USER_YOUTUBE':
+            info.update({
+                '_type': 'url_transparent',
+                'url': movie['contentId'],
+            })
+            return info
+
+        quality = qualities(('mobile', 'lowest', 'low', 'sd', 'hd'))
+
+        formats = [{
+            'url': f['url'],
+            'ext': 'mp4',
+            'format_id': f['name'],
+            'quality': quality(f['name']),
+        } for f in metadata['videos']]
+        self._sort_formats(formats)
+
+        info['formats'] = formats
+        return info
diff --git a/youtube_dl/extractor/once.py b/youtube_dl/extractor/once.py
new file mode 100644 (file)
index 0000000..1bf96ea
--- /dev/null
@@ -0,0 +1,42 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+
+
+class OnceIE(InfoExtractor):
+    _VALID_URL = r'https?://.+?\.unicornmedia\.com/now/[^/]+/[^/]+/(?P<domain_id>[^/]+)/(?P<application_id>[^/]+)/(?:[^/]+/)?(?P<media_item_id>[^/]+)/content\.(?:once|m3u8|mp4)'
+    ADAPTIVE_URL_TEMPLATE = 'http://once.unicornmedia.com/now/master/playlist/%s/%s/%s/content.m3u8'
+    PROGRESSIVE_URL_TEMPLATE = 'http://once.unicornmedia.com/now/media/progressive/%s/%s/%s/%s/content.mp4'
+
+    def _extract_once_formats(self, url):
+        domain_id, application_id, media_item_id = re.match(
+            OnceIE._VALID_URL, url).groups()
+        formats = self._extract_m3u8_formats(
+            self.ADAPTIVE_URL_TEMPLATE % (
+                domain_id, application_id, media_item_id),
+            media_item_id, 'mp4', m3u8_id='hls', fatal=False)
+        progressive_formats = []
+        for adaptive_format in formats:
+            # Prevent advertisements from being embedded into the m3u8 playlist
+            # (see https://github.com/rg3/youtube-dl/issues/8893#issuecomment-199912684)
+            adaptive_format['url'] = re.sub(
+                r'\badsegmentlength=\d+', r'adsegmentlength=0', adaptive_format['url'])
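+            # Each HLS rendition has a progressive MP4 counterpart addressed
+            # by the rendition id parsed from the playlist URL.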
+            rendition_id = self._search_regex(
+                r'/now/media/playlist/[^/]+/[^/]+/([^/]+)',
+                adaptive_format['url'], 'rendition id', default=None)
+            if rendition_id:
+                progressive_format = adaptive_format.copy()
+                progressive_format.update({
+                    'url': self.PROGRESSIVE_URL_TEMPLATE % (
+                        domain_id, application_id, rendition_id, media_item_id),
+                    'format_id': adaptive_format['format_id'].replace(
+                        'hls', 'http'),
+                    'protocol': 'http',
+                })
+                progressive_formats.append(progressive_format)
+        self._check_formats(progressive_formats, media_item_id)
+        formats.extend(progressive_formats)
+        return formats
diff --git a/youtube_dl/extractor/onionstudios.py b/youtube_dl/extractor/onionstudios.py
index 0f1f448fe3126670932b371498b73b0bf0a7924e..d7b13a0f1fbc53deba74e47f6d1eefc0bdee8be2 100644 (file)
@@ -4,7 +4,10 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..utils import determine_ext
+from ..utils import (
+    determine_ext,
+    int_or_none,
+)
 
 
 class OnionStudiosIE(InfoExtractor):
@@ -17,7 +20,7 @@ class OnionStudiosIE(InfoExtractor):
             'id': '2937',
             'ext': 'mp4',
             'title': 'Hannibal charges forward, stops for a cocktail',
-            'description': 'md5:545299bda6abf87e5ec666548c6a9448',
+            'description': 'md5:e786add7f280b7f0fe237b64cc73df76',
             'thumbnail': 're:^https?://.*\.jpg$',
             'uploader': 'The A.V. Club',
             'uploader_id': 'TheAVClub',
@@ -42,9 +45,19 @@ class OnionStudiosIE(InfoExtractor):
 
         formats = []
         for src in re.findall(r'<source[^>]+src="([^"]+)"', webpage):
-            if determine_ext(src) != 'm3u8':  # m3u8 always results in 403
+            ext = determine_ext(src)
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    src, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+            else:
+                height = int_or_none(self._search_regex(
+                    r'/(\d+)\.%s' % ext, src, 'height', default=None))
                 formats.append({
+                    'format_id': ext + ('-%sp' % height if height else ''),
                     'url': src,
+                    'height': height,
+                    'ext': ext,
+                    'preference': 1,
                 })
         self._sort_formats(formats)
 
@@ -52,7 +65,7 @@ class OnionStudiosIE(InfoExtractor):
             r'share_title\s*=\s*(["\'])(?P<title>[^\1]+?)\1',
             webpage, 'title', group='title')
         description = self._search_regex(
-            r'share_description\s*=\s*(["\'])(?P<description>[^\1]+?)\1',
+            r'share_description\s*=\s*(["\'])(?P<description>[^\'"]+?)\1',
             webpage, 'description', default=None, group='description')
         thumbnail = self._search_regex(
             r'poster\s*=\s*(["\'])(?P<thumbnail>[^\1]+?)\1',
diff --git a/youtube_dl/extractor/ooyala.py b/youtube_dl/extractor/ooyala.py
index a262a9f6d4ec232e78ee34f3ce38b03ea27d01e9..16f040191aa31bd9e8dd49b37a42085c2b340582 100644 (file)
 from __future__ import unicode_literals
 import re
-import json
 import base64
 
 from .common import InfoExtractor
 from ..utils import (
-    unescapeHTML,
-    ExtractorError,
-    determine_ext,
     int_or_none,
+    float_or_none,
+    ExtractorError,
+    unsmuggle_url,
 )
+from ..compat import compat_urllib_parse_urlencode
 
 
 class OoyalaBaseIE(InfoExtractor):
-
-    def _extract_result(self, info, more_info):
-        embedCode = info['embedCode']
-        video_url = info.get('ipad_url') or info['url']
-
-        if determine_ext(video_url) == 'm3u8':
-            formats = self._extract_m3u8_formats(video_url, embedCode, ext='mp4')
-        else:
-            formats = [{
-                'url': video_url,
-                'ext': 'mp4',
-            }]
-
-        return {
-            'id': embedCode,
-            'title': unescapeHTML(info['title']),
-            'formats': formats,
-            'description': unescapeHTML(more_info['description']),
-            'thumbnail': more_info['promo'],
+    _PLAYER_BASE = 'http://player.ooyala.com/'
+    _CONTENT_TREE_BASE = _PLAYER_BASE + 'player_api/v1/content_tree/'
+    _AUTHORIZATION_URL_TEMPLATE = _PLAYER_BASE + 'sas/player_api/v1/authorization/embed_code/%s/%s?'
+
+    def _extract(self, content_tree_url, video_id, domain='example.org'):
+        content_tree = self._download_json(content_tree_url, video_id)['content_tree']
+        metadata = content_tree[list(content_tree)[0]]
+        embed_code = metadata['embed_code']
+        pcode = metadata.get('asset_pcode') or embed_code
+        video_info = {
+            'id': embed_code,
+            'title': metadata['title'],
+            'description': metadata.get('description'),
+            'thumbnail': metadata.get('thumbnail_image') or metadata.get('promo_image'),
+            'duration': float_or_none(metadata.get('duration'), 1000),
         }
 
-    def _extract(self, player_url, video_id):
-        player = self._download_webpage(player_url, video_id)
-        mobile_url = self._search_regex(r'mobile_player_url="(.+?)&device="',
-                                        player, 'mobile player url')
-        # Looks like some videos are only available for particular devices
-        # (e.g. http://player.ooyala.com/player.js?embedCode=x1b3lqZDq9y_7kMyC2Op5qo-p077tXD0
-        # is only available for ipad)
-        # Working around with fetching URLs for all the devices found starting with 'unknown'
-        # until we succeed or eventually fail for each device.
-        devices = re.findall(r'device\s*=\s*"([^"]+)";', player)
-        devices.remove('unknown')
-        devices.insert(0, 'unknown')
-        for device in devices:
-            mobile_player = self._download_webpage(
-                '%s&device=%s' % (mobile_url, device), video_id,
-                'Downloading mobile player JS for %s device' % device)
-            videos_info = self._search_regex(
-                r'var streams=window.oo_testEnv\?\[\]:eval\("\((\[{.*?}\])\)"\);',
-                mobile_player, 'info', fatal=False, default=None)
-            if videos_info:
-                break
-
-        if not videos_info:
-            formats = []
+        urls = []
+        formats = []
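+        # Authorization is granted per delivery type, so query each format
+        # separately and de-duplicate the returned stream URLs.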
+        for supported_format in ('mp4', 'm3u8', 'hds', 'rtmp'):
             auth_data = self._download_json(
-                'http://player.ooyala.com/sas/player_api/v1/authorization/embed_code/%s/%s?domain=www.example.org&supportedFormats=mp4,webm' % (video_id, video_id),
-                video_id)
-
-            cur_auth_data = auth_data['authorization_data'][video_id]
-
-            for stream in cur_auth_data['streams']:
-                formats.append({
-                    'url': base64.b64decode(stream['url']['data'].encode('ascii')).decode('utf-8'),
-                    'ext': stream.get('delivery_type'),
-                    'format': stream.get('video_codec'),
-                    'format_id': stream.get('profile'),
-                    'width': int_or_none(stream.get('width')),
-                    'height': int_or_none(stream.get('height')),
-                    'abr': int_or_none(stream.get('audio_bitrate')),
-                    'vbr': int_or_none(stream.get('video_bitrate')),
-                })
-            if formats:
-                return {
-                    'id': video_id,
-                    'formats': formats,
-                    'title': 'Ooyala video',
-                }
-
-            if not cur_auth_data['authorized']:
-                raise ExtractorError(cur_auth_data['message'], expected=True)
-
-        if not videos_info:
-            raise ExtractorError('Unable to extract info')
-        videos_info = videos_info.replace('\\"', '"')
-        videos_more_info = self._search_regex(
-            r'eval\("\(({.*?\\"promo\\".*?})\)"', mobile_player, 'more info').replace('\\"', '"')
-        videos_info = json.loads(videos_info)
-        videos_more_info = json.loads(videos_more_info)
-
-        if videos_more_info.get('lineup'):
-            videos = [self._extract_result(info, more_info) for (info, more_info) in zip(videos_info, videos_more_info['lineup'])]
-            return {
-                '_type': 'playlist',
-                'id': video_id,
-                'title': unescapeHTML(videos_more_info['title']),
-                'entries': videos,
-            }
-        else:
-            return self._extract_result(videos_info[0], videos_more_info)
+                self._AUTHORIZATION_URL_TEMPLATE % (pcode, embed_code) +
+                compat_urllib_parse_urlencode({
+                    'domain': domain,
+                    'supportedFormats': supported_format
+                }),
+                video_id, 'Downloading %s JSON' % supported_format)
+
+            cur_auth_data = auth_data['authorization_data'][embed_code]
+
+            if cur_auth_data['authorized']:
+                for stream in cur_auth_data['streams']:
+                    url = base64.b64decode(
+                        stream['url']['data'].encode('ascii')).decode('utf-8')
+                    if url in urls:
+                        continue
+                    urls.append(url)
+                    delivery_type = stream['delivery_type']
+                    if delivery_type == 'hls' or '.m3u8' in url:
+                        formats.extend(self._extract_m3u8_formats(
+                            url, embed_code, 'mp4', 'm3u8_native',
+                            m3u8_id='hls', fatal=False))
+                    elif delivery_type == 'hds' or '.f4m' in url:
+                        formats.extend(self._extract_f4m_formats(
+                            url + '?hdcore=3.7.0', embed_code, f4m_id='hds', fatal=False))
+                    elif '.smil' in url:
+                        formats.extend(self._extract_smil_formats(
+                            url, embed_code, fatal=False))
+                    else:
+                        formats.append({
+                            'url': url,
+                            'ext': stream.get('delivery_type'),
+                            'vcodec': stream.get('video_codec'),
+                            'format_id': delivery_type,
+                            'width': int_or_none(stream.get('width')),
+                            'height': int_or_none(stream.get('height')),
+                            'abr': int_or_none(stream.get('audio_bitrate')),
+                            'vbr': int_or_none(stream.get('video_bitrate')),
+                            'fps': float_or_none(stream.get('framerate')),
+                        })
+            else:
+                raise ExtractorError('%s said: %s' % (
+                    self.IE_NAME, cur_auth_data['message']), expected=True)
+        self._sort_formats(formats)
+
+        video_info['formats'] = formats
+        return video_info
 
 
 class OoyalaIE(OoyalaBaseIE):
@@ -117,6 +94,7 @@ class OoyalaIE(OoyalaBaseIE):
                 'ext': 'mp4',
                 'title': 'Explaining Data Recovery from Hard Drives and SSDs',
                 'description': 'How badly damaged does a drive have to be to defeat Russell and his crew? Apparently, smashed to bits.',
+                'duration': 853.386,
             },
         }, {
             # Only available for ipad
@@ -125,7 +103,7 @@ class OoyalaIE(OoyalaBaseIE):
                 'id': 'x1b3lqZDq9y_7kMyC2Op5qo-p077tXD0',
                 'ext': 'mp4',
                 'title': 'Simulation Overview - Levels of Simulation',
-                'description': '',
+                'duration': 194.948,
             },
         },
         {
@@ -136,7 +114,8 @@ class OoyalaIE(OoyalaBaseIE):
             'info_dict': {
                 'id': 'FiOG81ZTrvckcchQxmalf4aQj590qTEx',
                 'ext': 'mp4',
-                'title': 'Ooyala video',
+                'title': 'Divide Tool Path.mp4',
+                'duration': 204.405,
             }
         }
     ]
@@ -151,9 +130,11 @@ class OoyalaIE(OoyalaBaseIE):
                               ie=cls.ie_key())
 
     def _real_extract(self, url):
+        url, smuggled_data = unsmuggle_url(url, {})
         embed_code = self._match_id(url)
-        player_url = 'http://player.ooyala.com/player.js?embedCode=%s' % embed_code
-        return self._extract(player_url, embed_code)
+        domain = smuggled_data.get('domain')
+        content_tree_url = self._CONTENT_TREE_BASE + 'embed_code/%s/%s' % (embed_code, embed_code)
+        return self._extract(content_tree_url, embed_code, domain)
 
 
 class OoyalaExternalIE(OoyalaBaseIE):
@@ -170,7 +151,7 @@ class OoyalaExternalIE(OoyalaBaseIE):
                         .*?&pcode=
                     )
                     (?P<pcode>.+?)
-                    (&|$)
+                    (?:&|$)
                     '''
 
     _TEST = {
@@ -179,7 +160,7 @@ class OoyalaExternalIE(OoyalaBaseIE):
             'id': 'FkYWtmazr6Ed8xmvILvKLWjd4QvYZpzG',
             'ext': 'mp4',
             'title': 'dm_140128_30for30Shorts___JudgingJewellv2',
-            'description': '',
+            'duration': 1302.0,
         },
         'params': {
             # m3u8 download
@@ -188,9 +169,6 @@ class OoyalaExternalIE(OoyalaBaseIE):
     }
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        partner_id = mobj.group('partner_id')
-        video_id = mobj.group('id')
-        pcode = mobj.group('pcode')
-        player_url = 'http://player.ooyala.com/player.js?externalId=%s:%s&pcode=%s' % (partner_id, video_id, pcode)
-        return self._extract(player_url, video_id)
+        partner_id, video_id, pcode = re.match(self._VALID_URL, url).groups()
+        content_tree_url = self._CONTENT_TREE_BASE + 'external_id/%s/%s:%s' % (pcode, partner_id, video_id)
+        return self._extract(content_tree_url, video_id)
diff --git a/youtube_dl/extractor/openfilm.py b/youtube_dl/extractor/openfilm.py
deleted file mode 100644 (file)
index d2ceedd..0000000
+++ /dev/null
@@ -1,70 +0,0 @@
-from __future__ import unicode_literals
-
-import json
-
-from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote_plus
-from ..utils import (
-    parse_iso8601,
-    parse_age_limit,
-    int_or_none,
-)
-
-
-class OpenFilmIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)openfilm\.com/videos/(?P<id>.+)'
-    _TEST = {
-        'url': 'http://www.openfilm.com/videos/human-resources-remastered',
-        'md5': '42bcd88c2f3ec13b65edf0f8ad1cac37',
-        'info_dict': {
-            'id': '32736',
-            'display_id': 'human-resources-remastered',
-            'ext': 'mp4',
-            'title': 'Human Resources (Remastered)',
-            'description': 'Social Engineering in the 20th Century.',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'duration': 7164,
-            'timestamp': 1334756988,
-            'upload_date': '20120418',
-            'uploader_id': '41117',
-            'view_count': int,
-            'age_limit': 0,
-        },
-    }
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, display_id)
-
-        player = compat_urllib_parse_unquote_plus(
-            self._og_search_video_url(webpage))
-
-        video = json.loads(self._search_regex(
-            r'\bp=({.+?})(?:&|$)', player, 'video JSON'))
-
-        video_url = '%s1.mp4' % video['location']
-        video_id = video.get('video_id')
-        display_id = video.get('alias') or display_id
-        title = video.get('title')
-        description = video.get('description')
-        thumbnail = video.get('main_thumb')
-        duration = int_or_none(video.get('duration'))
-        timestamp = parse_iso8601(video.get('dt_published'), ' ')
-        uploader_id = video.get('user_id')
-        view_count = int_or_none(video.get('views_count'))
-        age_limit = parse_age_limit(video.get('age_limit'))
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'url': video_url,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'timestamp': timestamp,
-            'uploader_id': uploader_id,
-            'view_count': view_count,
-            'age_limit': age_limit,
-        }
diff --git a/youtube_dl/extractor/openload.py b/youtube_dl/extractor/openload.py
new file mode 100644 (file)
index 0000000..456561b
--- /dev/null
@@ -0,0 +1,127 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_chr
+from ..utils import (
+    determine_ext,
+    encode_base_n,
+    ExtractorError,
+    mimetype2ext,
+)
+
+
+class OpenloadIE(InfoExtractor):
+    _VALID_URL = r'https://openload\.(?:co|io)/(?:f|embed)/(?P<id>[a-zA-Z0-9-]+)'
+
+    _TESTS = [{
+        'url': 'https://openload.co/f/kUEfGclsU9o',
+        'md5': 'bf1c059b004ebc7a256f89408e65c36e',
+        'info_dict': {
+            'id': 'kUEfGclsU9o',
+            'ext': 'mp4',
+            'title': 'skyrim_no-audio_1080.mp4',
+            'thumbnail': 're:^https?://.*\.jpg$',
+        },
+    }, {
+        'url': 'https://openload.co/embed/kUEfGclsU9o/skyrim_no-audio_1080.mp4',
+        'only_matching': True,
+    }, {
+        'url': 'https://openload.io/f/ZAn6oz-VZGE/',
+        'only_matching': True,
+    }, {
+        # unavailable via https://openload.co/f/Sxz5sADo82g/, different layout
+        # for title and ext
+        'url': 'https://openload.co/embed/Sxz5sADo82g/',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def openload_level2_debase(m):
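+        # Each ǃ(a, b) pair encodes the number b in base a + 27; decode it
+        # back into a quoted string literal.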
+        radix, num = int(m.group(1)) + 27, int(m.group(2))
+        return '"' + encode_base_n(num, radix) + '"'
+
+    @classmethod
+    def openload_level2(cls, txt):
+        # The function name is ǃ \u01c3
+        # Using escaped unicode literals does not work in Python 3.2
+        return re.sub(r'ǃ\((\d+),(\d+)\)', cls.openload_level2_debase, txt, flags=re.UNICODE).replace('"+"', '')
+
+    # Openload uses a variant of aadecode
+    # openload_decode and related functions are originally written by
+    # vitas@matfyz.cz and released with public domain
+    # See https://github.com/rg3/youtube-dl/issues/8489
+    @classmethod
+    def openload_decode(cls, txt):
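+        # Translate the AAEncode emoticon tokens into digit characters, then
+        # decode each delimited chunk as an octal char code or \uXXXX escape.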
+        symbol_table = [
+            ('_', '(゚Д゚) [゚Θ゚]'),
+            ('a', '(゚Д゚) [゚ω゚ノ]'),
+            ('b', '(゚Д゚) [゚Θ゚ノ]'),
+            ('c', '(゚Д゚) [\'c\']'),
+            ('d', '(゚Д゚) [゚ー゚ノ]'),
+            ('e', '(゚Д゚) [゚Д゚ノ]'),
+            ('f', '(゚Д゚) [1]'),
+
+            ('o', '(゚Д゚) [\'o\']'),
+            ('u', '(o゚ー゚o)'),
+            ('c', '(゚Д゚) [\'c\']'),
+
+            ('7', '((゚ー゚) + (o^_^o))'),
+            ('6', '((o^_^o) +(o^_^o) +(c^_^o))'),
+            ('5', '((゚ー゚) + (゚Θ゚))'),
+            ('4', '(-~3)'),
+            ('3', '(-~-~1)'),
+            ('2', '(-~1)'),
+            ('1', '(-~0)'),
+            ('0', '((c^_^o)-(c^_^o))'),
+        ]
+        delim = '(゚Д゚)[゚ε゚]+'
+        ret = ''
+        for aachar in txt.split(delim):
+            for val, pat in symbol_table:
+                aachar = aachar.replace(pat, val)
+            aachar = aachar.replace('+ ', '')
+            m = re.match(r'^\d+', aachar)
+            if m:
+                ret += compat_chr(int(m.group(0), 8))
+            else:
+                m = re.match(r'^u([\da-f]+)', aachar)
+                if m:
+                    ret += compat_chr(int(m.group(1), 16))
+        return cls.openload_level2(ret)
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        if 'File not found' in webpage:
+            raise ExtractorError('File not found', expected=True)
+
+        code = self._search_regex(
+            r'<video[^>]+>\s*<script[^>]+>([^<]+)</script>',
+            webpage, 'JS code')
+
+        decoded = self.openload_decode(code)
+
+        video_url = self._search_regex(
+            r'return\s+"(https?://[^"]+)"', decoded, 'video URL')
+
+        title = self._og_search_title(webpage, default=None) or self._search_regex(
+            r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
+            'title', default=None) or self._html_search_meta(
+            'description', webpage, 'title', fatal=True)
+
+        ext = mimetype2ext(self._search_regex(
+            r'window\.vt\s*=\s*(["\'])(?P<mimetype>.+?)\1', decoded,
+            'mimetype', default=None, group='mimetype')) or determine_ext(
+            video_url, 'mp4')
+
+        return {
+            'id': video_id,
+            'title': title,
+            'ext': ext,
+            'thumbnail': self._og_search_thumbnail(webpage, default=None),
+            'url': video_url,
+        }
diff --git a/youtube_dl/extractor/ora.py b/youtube_dl/extractor/ora.py
new file mode 100644 (file)
index 0000000..8545fb1
--- /dev/null
@@ -0,0 +1,72 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+from .common import InfoExtractor
+from ..compat import compat_urlparse
+from ..utils import (
+    get_element_by_attribute,
+    qualities,
+    unescapeHTML,
+)
+
+
+class OraTVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?ora\.tv/([^/]+/)*(?P<id>[^/\?#]+)'
+    _TEST = {
+        'url': 'https://www.ora.tv/larrykingnow/2015/12/16/vine-youtube-stars-zach-king-king-bach-on-their-viral-videos-0_36jupg6090pq',
+        'md5': 'fa33717591c631ec93b04b0e330df786',
+        'info_dict': {
+            'id': '50178',
+            'ext': 'mp4',
+            'title': 'Vine & YouTube Stars Zach King & King Bach On Their Viral Videos!',
+            'description': 'md5:ebbc5b1424dd5dba7be7538148287ac1',
+        }
+    }
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+
+        video_data = self._search_regex(
+            r'"(?:video|current)"\s*:\s*({[^}]+?})', webpage, 'current video')
+        m3u8_url = self._search_regex(
+            r'hls_stream"?\s*:\s*"([^"]+)', video_data, 'm3u8 url', None)
+        if m3u8_url:
+            formats = self._extract_m3u8_formats(
+                m3u8_url, display_id, 'mp4', 'm3u8_native',
+                m3u8_id='hls', fatal=False)
+            # similar to GameSpotIE
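+            # the m3u8 path embeds the list of available quality names; strip
+            # them out to build progressive download URLs on the PMD host.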
+            m3u8_path = compat_urlparse.urlparse(m3u8_url).path
+            QUALITIES_RE = r'((,[a-z]+\d+)+,?)'
+            available_qualities = self._search_regex(
+                QUALITIES_RE, m3u8_path, 'qualities').strip(',').split(',')
+            http_path = m3u8_path[1:].split('/', 1)[1]
+            http_template = re.sub(QUALITIES_RE, r'%s', http_path)
+            http_template = http_template.replace('.csmil/master.m3u8', '')
+            http_template = compat_urlparse.urljoin(
+                'http://videocdn-pmd.ora.tv/', http_template)
+            preference = qualities(
+                ['mobile400', 'basic400', 'basic600', 'sd900', 'sd1200', 'sd1500', 'hd720', 'hd1080'])
+            for q in available_qualities:
+                formats.append({
+                    'url': http_template % q,
+                    'format_id': q,
+                    'preference': preference(q),
+                })
+            self._sort_formats(formats)
+        else:
+            return self.url_result(self._search_regex(
+                r'"youtube_id"\s*:\s*"([^"]+)', webpage, 'youtube id'), 'Youtube')
+
+        return {
+            'id': self._search_regex(
+                r'"id"\s*:\s*(\d+)', video_data, 'video id', default=display_id),
+            'display_id': display_id,
+            'title': unescapeHTML(self._og_search_title(webpage)),
+            'description': get_element_by_attribute(
+                'class', 'video_txt_decription', webpage),
+            'thumbnail': self._proto_relative_url(self._search_regex(
+                r'"thumb"\s*:\s*"([^"]+)', video_data, 'thumbnail', None)),
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/orf.py b/youtube_dl/extractor/orf.py
index 2e6c9872b5d251be4eb3c61addab113aad4d2416..66c75f8b3559752127c091d437e4764b7c722e9d 100644 (file)
@@ -112,6 +112,7 @@ class ORFTVthekIE(InfoExtractor):
                             % geo_str),
                         fatal=False)
 
+            self._check_formats(formats, video_id)
             self._sort_formats(formats)
 
             upload_date = unified_strdate(sd['created_date'])
@@ -136,7 +137,7 @@ class ORFTVthekIE(InfoExtractor):
 class ORFOE1IE(InfoExtractor):
     IE_NAME = 'orf:oe1'
     IE_DESC = 'Radio Österreich 1'
-    _VALID_URL = r'http://oe1\.orf\.at/(?:programm/|konsole.*?#\?track_id=)(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://oe1\.orf\.at/(?:programm/|konsole.*?#\?track_id=)(?P<id>[0-9]+)'
 
     # Audios on ORF radio are only available for 7 days, so we can't add tests.
     _TEST = {
@@ -170,7 +171,21 @@ class ORFOE1IE(InfoExtractor):
 class ORFFM4IE(InfoExtractor):
     IE_NAME = 'orf:fm4'
     IE_DESC = 'radio FM4'
-    _VALID_URL = r'http://fm4\.orf\.at/7tage/?#(?P<date>[0-9]+)/(?P<show>\w+)'
+    _VALID_URL = r'https?://fm4\.orf\.at/(?:7tage/?#|player/)(?P<date>[0-9]+)/(?P<show>\w+)'
+
+    _TEST = {
+        'url': 'http://fm4.orf.at/player/20160110/IS/',
+        'md5': '01e736e8f1cef7e13246e880a59ad298',
+        'info_dict': {
+            'id': '2016-01-10_2100_tl_54_7DaysSun13_11244',
+            'ext': 'mp3',
+            'title': 'Im Sumpf',
+            'description': 'md5:384c543f866c4e422a55f66a62d669cd',
+            'duration': 7173,
+            'timestamp': 1452456073,
+            'upload_date': '20160110',
+        },
+    }
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
@@ -207,7 +222,7 @@ class ORFFM4IE(InfoExtractor):
 class ORFIPTVIE(InfoExtractor):
     IE_NAME = 'orf:iptv'
     IE_DESC = 'iptv.ORF.at'
-    _VALID_URL = r'http://iptv\.orf\.at/(?:#/)?stories/(?P<id>\d+)'
+    _VALID_URL = r'https?://iptv\.orf\.at/(?:#/)?stories/(?P<id>\d+)'
 
     _TEST = {
         'url': 'http://iptv.orf.at/stories/2275236/',
diff --git a/youtube_dl/extractor/pandoratv.py b/youtube_dl/extractor/pandoratv.py
new file mode 100644 (file)
index 0000000..8d49f5c
--- /dev/null
@@ -0,0 +1,78 @@
+# encoding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_str,
+    compat_urlparse,
+)
+from ..utils import (
+    ExtractorError,
+    float_or_none,
+    parse_duration,
+    str_to_int,
+)
+
+
+class PandoraTVIE(InfoExtractor):
+    IE_NAME = 'pandora.tv'
+    IE_DESC = '판도라TV'
+    _VALID_URL = r'https?://(?:.+?\.)?channel\.pandora\.tv/channel/video\.ptv\?'
+    _TEST = {
+        'url': 'http://jp.channel.pandora.tv/channel/video.ptv?c1=&prgid=53294230&ch_userid=mikakim&ref=main&lot=cate_01_2',
+        'info_dict': {
+            'id': '53294230',
+            'ext': 'flv',
+            'title': '頭を撫でてくれる?',
+            'description': '頭を撫でてくれる?',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'duration': 39,
+            'upload_date': '20151218',
+            'uploader': 'カワイイ動物まとめ',
+            'uploader_id': 'mikakim',
+            'view_count': int,
+            'like_count': int,
+        }
+    }
+
+    def _real_extract(self, url):
+        qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+        video_id = qs.get('prgid', [None])[0]
+        user_id = qs.get('ch_userid', [None])[0]
+        if any(not f for f in (video_id, user_id,)):
+            raise ExtractorError('Invalid URL', expected=True)
+
+        data = self._download_json(
+            'http://m.pandora.tv/?c=view&m=viewJsonApi&ch_userid=%s&prgid=%s'
+            % (user_id, video_id), video_id)
+
+        info = data['data']['rows']['vod_play_info']['result']
+
+        formats = []
+        for format_id, format_url in info.items():
+            if not format_url:
+                continue
+            height = self._search_regex(
+                r'^v(\d+)[Uu]rl$', format_id, 'height', default=None)
+            if not height:
+                continue
+            formats.append({
+                'format_id': '%sp' % height,
+                'url': format_url,
+                'height': int(height),
+            })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': info['subject'],
+            'description': info.get('body'),
+            'thumbnail': info.get('thumbnail') or info.get('poster'),
+            'duration': float_or_none(info.get('runtime'), 1000) or parse_duration(info.get('time')),
+            'upload_date': info['fid'][:8] if isinstance(info.get('fid'), compat_str) else None,
+            'uploader': info.get('nickname'),
+            'uploader_id': info.get('upload_userid'),
+            'view_count': str_to_int(info.get('hit')),
+            'like_count': str_to_int(info.get('likecnt')),
+            'formats': formats,
+        }
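The viewJsonApi result keys progressive URLs by height ("v480Url", "v1080Url", ...), which the loop above turns into format dicts. A reduced sketch of that key parsing over made-up sample data:

    import re

    # Hypothetical subset of the 'vod_play_info' result
    info = {
        'v1080Url': 'http://example.invalid/1080.mp4',
        'v480Url': 'http://example.invalid/480.mp4',
        'subject': 'some title',  # non-format keys are skipped
    }
    formats = []
    for key, value in info.items():
        m = re.match(r'^v(\d+)[Uu]rl$', key)
        if not m or not value:
            continue
        formats.append({
            'format_id': '%sp' % m.group(1),
            'url': value,
            'height': int(m.group(1)),
        })
    print(sorted(f['format_id'] for f in formats))  # ['1080p', '480p']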
index 6cdc2638b4930dc92835d71f673b560dea99022d..22975066516a0d37e74c9c520dd4faf0a68305a6 100644 (file)
@@ -2,9 +2,7 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..utils import (
-    js_to_json,
-)
+from ..utils import js_to_json
 
 
 class PatreonIE(InfoExtractor):
@@ -65,9 +63,9 @@ class PatreonIE(InfoExtractor):
             'password': password,
         }
 
-        request = compat_urllib_request.Request(
+        request = sanitized_Request(
             'https://www.patreon.com/processLogin',
-            compat_urllib_parse.urlencode(login_form).encode('utf-8')
+            compat_urllib_parse_urlencode(login_form).encode('utf-8')
         )
         login_page = self._download_webpage(request, None, note='Logging in as %s' % username)
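This hunk is part of a tree-wide migration: sanitized_Request is youtube-dl's Request wrapper that additionally sanitizes the URL before building the request, and compat_urllib_parse_urlencode papers over the Python 2/3 urlencode split. The plain-stdlib equivalent of this login POST, sketched with dummy credentials:

    try:
        from urllib.request import Request  # Python 3
        from urllib.parse import urlencode
    except ImportError:
        from urllib2 import Request  # Python 2
        from urllib import urlencode

    login_form = {'username': 'user@example.invalid', 'password': 'hunter2'}
    request = Request(
        'https://www.patreon.com/processLogin',
        urlencode(login_form).encode('utf-8'))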
 
index a53479aad762d2fbe8095867a625df996bbf1473..f43e3a146e7bd35d9a99ab730289f4a1d4f5b91c 100644 (file)
@@ -4,26 +4,194 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
+from ..compat import compat_HTTPError
 from ..utils import (
     ExtractorError,
     determine_ext,
     int_or_none,
+    js_to_json,
+    strip_jsonp,
     unified_strdate,
     US_RATINGS,
 )
 
 
 class PBSIE(InfoExtractor):
+    _STATIONS = (
+        (r'(?:video|www|player)\.pbs\.org', 'PBS: Public Broadcasting Service'),  # http://www.pbs.org/
+        (r'video\.aptv\.org', 'APT - Alabama Public Television (WBIQ)'),  # http://aptv.org/
+        (r'video\.gpb\.org', 'GPB/Georgia Public Broadcasting (WGTV)'),  # http://www.gpb.org/
+        (r'video\.mpbonline\.org', 'Mississippi Public Broadcasting (WMPN)'),  # http://www.mpbonline.org
+        (r'video\.wnpt\.org', 'Nashville Public Television (WNPT)'),  # http://www.wnpt.org
+        (r'video\.wfsu\.org', 'WFSU-TV (WFSU)'),  # http://wfsu.org/
+        (r'video\.wsre\.org', 'WSRE (WSRE)'),  # http://www.wsre.org
+        (r'video\.wtcitv\.org', 'WTCI (WTCI)'),  # http://www.wtcitv.org
+        (r'video\.pba\.org', 'WPBA/Channel 30 (WPBA)'),  # http://pba.org/
+        (r'video\.alaskapublic\.org', 'Alaska Public Media (KAKM)'),  # http://alaskapublic.org/kakm
+        # (r'kuac\.org', 'KUAC (KUAC)'),  # http://kuac.org/kuac-tv/
+        # (r'ktoo\.org', '360 North (KTOO)'),  # http://www.ktoo.org/
+        # (r'azpm\.org', 'KUAT 6 (KUAT)'),  # http://www.azpm.org/
+        (r'video\.azpbs\.org', 'Arizona PBS (KAET)'),  # http://www.azpbs.org
+        (r'portal\.knme\.org', 'KNME-TV/Channel 5 (KNME)'),  # http://www.newmexicopbs.org/
+        (r'video\.vegaspbs\.org', 'Vegas PBS (KLVX)'),  # http://vegaspbs.org/
+        (r'watch\.aetn\.org', 'AETN/ARKANSAS ETV NETWORK (KETS)'),  # http://www.aetn.org/
+        (r'video\.ket\.org', 'KET (WKLE)'),  # http://www.ket.org/
+        (r'video\.wkno\.org', 'WKNO/Channel 10 (WKNO)'),  # http://www.wkno.org/
+        (r'video\.lpb\.org', 'LPB/LOUISIANA PUBLIC BROADCASTING (WLPB)'),  # http://www.lpb.org/
+        (r'videos\.oeta\.tv', 'OETA (KETA)'),  # http://www.oeta.tv
+        (r'video\.optv\.org', 'Ozarks Public Television (KOZK)'),  # http://www.optv.org/
+        (r'watch\.wsiu\.org', 'WSIU Public Broadcasting (WSIU)'),  # http://www.wsiu.org/
+        (r'video\.keet\.org', 'KEET TV (KEET)'),  # http://www.keet.org
+        (r'pbs\.kixe\.org', 'KIXE/Channel 9 (KIXE)'),  # http://kixe.org/
+        (r'video\.kpbs\.org', 'KPBS San Diego (KPBS)'),  # http://www.kpbs.org/
+        (r'video\.kqed\.org', 'KQED (KQED)'),  # http://www.kqed.org
+        (r'vids\.kvie\.org', 'KVIE Public Television (KVIE)'),  # http://www.kvie.org
+        (r'video\.pbssocal\.org', 'PBS SoCal/KOCE (KOCE)'),  # http://www.pbssocal.org/
+        (r'video\.valleypbs\.org', 'ValleyPBS (KVPT)'),  # http://www.valleypbs.org/
+        (r'video\.cptv\.org', 'CONNECTICUT PUBLIC TELEVISION (WEDH)'),  # http://cptv.org
+        (r'watch\.knpb\.org', 'KNPB Channel 5 (KNPB)'),  # http://www.knpb.org/
+        (r'video\.soptv\.org', 'SOPTV (KSYS)'),  # http://www.soptv.org
+        # (r'klcs\.org', 'KLCS/Channel 58 (KLCS)'),  # http://www.klcs.org
+        # (r'krcb\.org', 'KRCB Television & Radio (KRCB)'),  # http://www.krcb.org
+        # (r'kvcr\.org', 'KVCR TV/DT/FM :: Vision for the Future (KVCR)'),  # http://kvcr.org
+        (r'video\.rmpbs\.org', 'Rocky Mountain PBS (KRMA)'),  # http://www.rmpbs.org
+        (r'video\.kenw\.org', 'KENW-TV3 (KENW)'),  # http://www.kenw.org
+        (r'video\.kued\.org', 'KUED Channel 7 (KUED)'),  # http://www.kued.org
+        (r'video\.wyomingpbs\.org', 'Wyoming PBS (KCWC)'),  # http://www.wyomingpbs.org
+        (r'video\.cpt12\.org', 'Colorado Public Television / KBDI 12 (KBDI)'),  # http://www.cpt12.org/
+        (r'video\.kbyueleven\.org', 'KBYU-TV (KBYU)'),  # http://www.kbyutv.org/
+        (r'video\.thirteen\.org', 'Thirteen/WNET New York (WNET)'),  # http://www.thirteen.org
+        (r'video\.wgbh\.org', 'WGBH/Channel 2 (WGBH)'),  # http://wgbh.org
+        (r'video\.wgby\.org', 'WGBY (WGBY)'),  # http://www.wgby.org
+        (r'watch\.njtvonline\.org', 'NJTV Public Media NJ (WNJT)'),  # http://www.njtvonline.org/
+        # (r'ripbs\.org', 'Rhode Island PBS (WSBE)'),  # http://www.ripbs.org/home/
+        (r'watch\.wliw\.org', 'WLIW21 (WLIW)'),  # http://www.wliw.org/
+        (r'video\.mpt\.tv', 'mpt/Maryland Public Television (WMPB)'),  # http://www.mpt.org
+        (r'watch\.weta\.org', 'WETA Television and Radio (WETA)'),  # http://www.weta.org
+        (r'video\.whyy\.org', 'WHYY (WHYY)'),  # http://www.whyy.org
+        (r'video\.wlvt\.org', 'PBS 39 (WLVT)'),  # http://www.wlvt.org/
+        (r'video\.wvpt\.net', 'WVPT - Your Source for PBS and More! (WVPT)'),  # http://www.wvpt.net
+        (r'video\.whut\.org', 'Howard University Television (WHUT)'),  # http://www.whut.org
+        (r'video\.wedu\.org', 'WEDU PBS (WEDU)'),  # http://www.wedu.org
+        (r'video\.wgcu\.org', 'WGCU Public Media (WGCU)'),  # http://www.wgcu.org/
+        # (r'wjct\.org', 'WJCT Public Broadcasting (WJCT)'),  # http://www.wjct.org
+        (r'video\.wpbt2\.org', 'WPBT2 (WPBT)'),  # http://www.wpbt2.org
+        (r'video\.wucftv\.org', 'WUCF TV (WUCF)'),  # http://wucftv.org
+        (r'video\.wuft\.org', 'WUFT/Channel 5 (WUFT)'),  # http://www.wuft.org
+        (r'watch\.wxel\.org', 'WXEL/Channel 42 (WXEL)'),  # http://www.wxel.org/home/
+        (r'video\.wlrn\.org', 'WLRN/Channel 17 (WLRN)'),  # http://www.wlrn.org/
+        (r'video\.wusf\.usf\.edu', 'WUSF Public Broadcasting (WUSF)'),  # http://wusf.org/
+        (r'video\.scetv\.org', 'ETV (WRLK)'),  # http://www.scetv.org
+        (r'video\.unctv\.org', 'UNC-TV (WUNC)'),  # http://www.unctv.org/
+        # (r'pbsguam\.org', 'PBS Guam (KGTF)'),  # http://www.pbsguam.org/
+        (r'video\.pbshawaii\.org', 'PBS Hawaii - Oceanic Cable Channel 10 (KHET)'),  # http://www.pbshawaii.org/
+        (r'video\.idahoptv\.org', 'Idaho Public Television (KAID)'),  # http://idahoptv.org
+        (r'video\.ksps\.org', 'KSPS (KSPS)'),  # http://www.ksps.org/home/
+        (r'watch\.opb\.org', 'OPB (KOPB)'),  # http://www.opb.org
+        (r'watch\.nwptv\.org', 'KWSU/Channel 10 & KTNW/Channel 31 (KWSU)'),  # http://www.kwsu.org
+        (r'video\.will\.illinois\.edu', 'WILL-TV (WILL)'),  # http://will.illinois.edu/
+        (r'video\.networkknowledge\.tv', 'Network Knowledge - WSEC/Springfield (WSEC)'),  # http://www.wsec.tv
+        (r'video\.wttw\.com', 'WTTW11 (WTTW)'),  # http://www.wttw.com/
+        # (r'wtvp\.org', 'WTVP & WTVP.org, Public Media for Central Illinois (WTVP)'),  # http://www.wtvp.org/
+        (r'video\.iptv\.org', 'Iowa Public Television/IPTV (KDIN)'),  # http://www.iptv.org/
+        (r'video\.ninenet\.org', 'Nine Network (KETC)'),  # http://www.ninenet.org
+        (r'video\.wfwa\.org', 'PBS39 Fort Wayne (WFWA)'),  # http://wfwa.org/
+        (r'video\.wfyi\.org', 'WFYI Indianapolis (WFYI)'),  # http://www.wfyi.org
+        (r'video\.mptv\.org', 'Milwaukee Public Television (WMVS)'),  # http://www.mptv.org
+        (r'video\.wnin\.org', 'WNIN (WNIN)'),  # http://www.wnin.org/
+        (r'video\.wnit\.org', 'WNIT Public Television (WNIT)'),  # http://www.wnit.org/
+        (r'video\.wpt\.org', 'WPT (WPNE)'),  # http://www.wpt.org/
+        (r'video\.wvut\.org', 'WVUT/Channel 22 (WVUT)'),  # http://wvut.org/
+        (r'video\.weiu\.net', 'WEIU/Channel 51 (WEIU)'),  # http://www.weiu.net
+        (r'video\.wqpt\.org', 'WQPT-TV (WQPT)'),  # http://www.wqpt.org
+        (r'video\.wycc\.org', 'WYCC PBS Chicago (WYCC)'),  # http://www.wycc.org
+        # (r'lakeshorepublicmedia\.org', 'Lakeshore Public Television (WYIN)'),  # http://lakeshorepublicmedia.org/
+        (r'video\.wipb\.org', 'WIPB-TV (WIPB)'),  # http://wipb.org
+        (r'video\.indianapublicmedia\.org', 'WTIU (WTIU)'),  # http://indianapublicmedia.org/tv/
+        (r'watch\.cetconnect\.org', 'CET (WCET)'),  # http://www.cetconnect.org
+        (r'video\.thinktv\.org', 'ThinkTVNetwork (WPTD)'),  # http://www.thinktv.org
+        (r'video\.wbgu\.org', 'WBGU-TV (WBGU)'),  # http://wbgu.org
+        (r'video\.wgvu\.org', 'WGVU TV (WGVU)'),  # http://www.wgvu.org/
+        (r'video\.netnebraska\.org', 'NET1 (KUON)'),  # http://netnebraska.org
+        (r'video\.pioneer\.org', 'Pioneer Public Television (KWCM)'),  # http://www.pioneer.org
+        (r'watch\.sdpb\.org', 'SDPB Television (KUSD)'),  # http://www.sdpb.org
+        (r'video\.tpt\.org', 'TPT (KTCA)'),  # http://www.tpt.org
+        (r'watch\.ksmq\.org', 'KSMQ (KSMQ)'),  # http://www.ksmq.org/
+        (r'watch\.kpts\.org', 'KPTS/Channel 8 (KPTS)'),  # http://www.kpts.org/
+        (r'watch\.ktwu\.org', 'KTWU/Channel 11 (KTWU)'),  # http://ktwu.org
+        # (r'shptv\.org', 'Smoky Hills Public Television (KOOD)'),  # http://www.shptv.org
+        # (r'kcpt\.org', 'KCPT Kansas City Public Television (KCPT)'),  # http://kcpt.org/
+        # (r'blueridgepbs\.org', 'Blue Ridge PBS (WBRA)'),  # http://www.blueridgepbs.org/
+        (r'watch\.easttennesseepbs\.org', 'East Tennessee PBS (WSJK)'),  # http://easttennesseepbs.org
+        (r'video\.wcte\.tv', 'WCTE-TV (WCTE)'),  # http://www.wcte.org
+        (r'video\.wljt\.org', 'WLJT, Channel 11 (WLJT)'),  # http://wljt.org/
+        (r'video\.wosu\.org', 'WOSU TV (WOSU)'),  # http://wosu.org/
+        (r'video\.woub\.org', 'WOUB/WOUC (WOUB)'),  # http://woub.org/tv/index.php?section=5
+        (r'video\.wvpublic\.org', 'WVPB (WVPB)'),  # http://wvpublic.org/
+        (r'video\.wkyupbs\.org', 'WKYU-PBS (WKYU)'),  # http://www.wkyupbs.org
+        # (r'wyes\.org', 'WYES-TV/New Orleans (WYES)'),  # http://www.wyes.org
+        (r'video\.kera\.org', 'KERA 13 (KERA)'),  # http://www.kera.org/
+        (r'video\.mpbn\.net', 'MPBN (WCBB)'),  # http://www.mpbn.net/
+        (r'video\.mountainlake\.org', 'Mountain Lake PBS (WCFE)'),  # http://www.mountainlake.org/
+        (r'video\.nhptv\.org', 'NHPTV (WENH)'),  # http://nhptv.org/
+        (r'video\.vpt\.org', 'Vermont PBS (WETK)'),  # http://www.vpt.org
+        (r'video\.witf\.org', 'witf (WITF)'),  # http://www.witf.org
+        (r'watch\.wqed\.org', 'WQED Multimedia (WQED)'),  # http://www.wqed.org/
+        (r'video\.wmht\.org', 'WMHT Educational Telecommunications (WMHT)'),  # http://www.wmht.org/home/
+        (r'video\.deltabroadcasting\.org', 'Q-TV (WDCQ)'),  # http://www.deltabroadcasting.org
+        (r'video\.dptv\.org', 'WTVS Detroit Public TV (WTVS)'),  # http://www.dptv.org/
+        (r'video\.wcmu\.org', 'CMU Public Television (WCMU)'),  # http://www.wcmu.org
+        (r'video\.wkar\.org', 'WKAR-TV (WKAR)'),  # http://wkar.org/
+        (r'wnmuvideo\.nmu\.edu', 'WNMU-TV Public TV 13 (WNMU)'),  # http://wnmutv.nmu.edu
+        (r'video\.wdse\.org', 'WDSE - WRPT (WDSE)'),  # http://www.wdse.org/
+        (r'video\.wgte\.org', 'WGTE TV (WGTE)'),  # http://www.wgte.org
+        (r'video\.lptv\.org', 'Lakeland Public Television (KAWE)'),  # http://www.lakelandptv.org
+        # (r'prairiepublic\.org', 'PRAIRIE PUBLIC (KFME)'),  # http://www.prairiepublic.org/
+        (r'video\.kmos\.org', 'KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS)'),  # http://www.kmos.org/
+        (r'watch\.montanapbs\.org', 'MontanaPBS (KUSM)'),  # http://montanapbs.org
+        (r'video\.krwg\.org', 'KRWG/Channel 22 (KRWG)'),  # http://www.krwg.org
+        (r'video\.kacvtv\.org', 'KACV (KACV)'),  # http://www.panhandlepbs.org/home/
+        (r'video\.kcostv\.org', 'KCOS/Channel 13 (KCOS)'),  # http://www.kcostv.org
+        (r'video\.wcny\.org', 'WCNY/Channel 24 (WCNY)'),  # http://www.wcny.org
+        (r'video\.wned\.org', 'WNED (WNED)'),  # http://www.wned.org/
+        (r'watch\.wpbstv\.org', 'WPBS (WPBS)'),  # http://www.wpbstv.org
+        (r'video\.wskg\.org', 'WSKG Public TV (WSKG)'),  # http://wskg.org
+        (r'video\.wxxi\.org', 'WXXI (WXXI)'),  # http://wxxi.org
+        (r'video\.wpsu\.org', 'WPSU (WPSU)'),  # http://www.wpsu.org
+        # (r'wqln\.org', 'WQLN/Channel 54 (WQLN)'),  # http://www.wqln.org
+        (r'on-demand\.wvia\.org', 'WVIA Public Media Studios (WVIA)'),  # http://www.wvia.org/
+        (r'video\.wtvi\.org', 'WTVI (WTVI)'),  # http://www.wtvi.org/
+        # (r'whro\.org', 'WHRO (WHRO)'),  # http://whro.org
+        (r'video\.westernreservepublicmedia\.org', 'Western Reserve PBS (WNEO)'),  # http://www.WesternReservePublicMedia.org/
+        (r'video\.ideastream\.org', 'WVIZ/PBS ideastream (WVIZ)'),  # http://www.wviz.org/
+        (r'video\.kcts9\.org', 'KCTS 9 (KCTS)'),  # http://kcts9.org/
+        (r'video\.basinpbs\.org', 'Basin PBS (KPBT)'),  # http://www.basinpbs.org
+        (r'video\.houstonpbs\.org', 'KUHT / Channel 8 (KUHT)'),  # http://www.houstonpublicmedia.org/
+        # (r'tamu\.edu', 'KAMU - TV (KAMU)'),  # http://KAMU.tamu.edu
+        # (r'kedt\.org', 'KEDT/Channel 16 (KEDT)'),  # http://www.kedt.org
+        (r'video\.klrn\.org', 'KLRN (KLRN)'),  # http://www.klrn.org
+        (r'video\.klru\.tv', 'KLRU (KLRU)'),  # http://www.klru.org
+        # (r'kmbh\.org', 'KMBH-TV (KMBH)'),  # http://www.kmbh.org
+        # (r'knct\.org', 'KNCT (KNCT)'),  # http://www.knct.org
+        # (r'ktxt\.org', 'KTTZ-TV (KTXT)'),  # http://www.ktxt.org
+        (r'video\.wtjx\.org', 'WTJX Channel 12 (WTJX)'),  # http://www.wtjx.org/
+        (r'video\.ideastations\.org', 'WCVE PBS (WCVE)'),  # http://ideastations.org/
+        (r'video\.kbtc\.org', 'KBTC Public Television (KBTC)'),  # http://kbtc.org
+    )
+
+    IE_NAME = 'pbs'
+    IE_DESC = 'Public Broadcasting Service (PBS) and member stations: %s' % ', '.join(list(zip(*_STATIONS))[1])
+
     _VALID_URL = r'''(?x)https?://
         (?:
            # Direct video URL
-           video\.pbs\.org/(?:viralplayer|video)/(?P<id>[0-9]+)/? |
+           (?:%s)/(?:viralplayer|video)/(?P<id>[0-9]+)/? |
            # Article with embedded player (or direct video)
            (?:www\.)?pbs\.org/(?:[^/]+/){2,5}(?P<presumptive_id>[^/]+?)(?:\.html)?/?(?:$|[?\#]) |
            # Player
-           video\.pbs\.org/(?:widget/)?partnerplayer/(?P<player_id>[^/]+)/
+           (?:video|player)\.pbs\.org/(?:widget/)?partnerplayer/(?P<player_id>[^/]+)/
         )
-    '''
+    ''' % '|'.join(list(zip(*_STATIONS))[0])
 
     _TESTS = [
         {
@@ -33,7 +201,7 @@ class PBSIE(InfoExtractor):
                 'id': '2365006249',
                 'ext': 'mp4',
                 'title': 'Constitution USA with Peter Sagal - A More Perfect Union',
-                'description': 'md5:ba0c207295339c8d6eced00b7c363c6a',
+                'description': 'md5:36f341ae62e251b8f5bd2b754b95a071',
                 'duration': 3190,
             },
             'params': {
@@ -47,7 +215,7 @@ class PBSIE(InfoExtractor):
                 'id': '2365297690',
                 'ext': 'mp4',
                 'title': 'FRONTLINE - Losing Iraq',
-                'description': 'md5:f5bfbefadf421e8bb8647602011caf8e',
+                'description': 'md5:4d3eaa01f94e61b3e73704735f1196d9',
                 'duration': 5050,
             },
             'params': {
@@ -61,7 +229,7 @@ class PBSIE(InfoExtractor):
                 'id': '2201174722',
                 'ext': 'mp4',
                 'title': 'PBS NewsHour - Cyber Schools Gain Popularity, but Quality Questions Persist',
-                'description': 'md5:5871c15cba347c1b3d28ac47a73c7c28',
+                'description': 'md5:95a19f568689d09a166dff9edada3301',
                 'duration': 801,
             },
         },
@@ -71,8 +239,8 @@ class PBSIE(InfoExtractor):
             'info_dict': {
                 'id': '2365297708',
                 'ext': 'mp4',
-                'description': 'md5:68d87ef760660eb564455eb30ca464fe',
                 'title': 'Great Performances - Dudamel Conducts Verdi Requiem at the Hollywood Bowl - Full',
+                'description': 'md5:657897370e09e2bc6bf0f8d2cd313c6b',
                 'duration': 6559,
                 'thumbnail': 're:^https?://.*\.jpg$',
             },
@@ -92,6 +260,7 @@ class PBSIE(InfoExtractor):
                 'duration': 3172,
                 'thumbnail': 're:^https?://.*\.jpg$',
                 'upload_date': '20140122',
+                'age_limit': 10,
             },
             'params': {
                 'skip_download': True,  # requires ffmpeg
@@ -107,12 +276,27 @@ class PBSIE(InfoExtractor):
         {
             'url': 'http://www.pbs.org/wgbh/americanexperience/films/death/player/',
             'info_dict': {
-                'id': '2280706814',
+                'id': '2276541483',
                 'display_id': 'player',
                 'ext': 'mp4',
-                'title': 'American Experience - Death and the Civil War',
-                'description': 'American Experience, TV’s most-watched history series, brings to life the compelling stories from our past that inform our understanding of the world today.',
-                'duration': 6705,
+                'title': 'American Experience - Death and the Civil War, Chapter 1',
+                'description': 'md5:1b80a74e0380ed2a4fb335026de1600d',
+                'duration': 682,
+                'thumbnail': 're:^https?://.*\.jpg$',
+            },
+            'params': {
+                'skip_download': True,  # requires ffmpeg
+            },
+        },
+        {
+            'url': 'http://www.pbs.org/video/2365245528/',
+            'info_dict': {
+                'id': '2365245528',
+                'display_id': '2365245528',
+                'ext': 'mp4',
+                'title': 'FRONTLINE - United States of Secrets (Part One)',
+                'description': 'md5:55756bd5c551519cc4b7703e373e217e',
+                'duration': 6851,
                 'thumbnail': 're:^https?://.*\.jpg$',
             },
             'params': {
@@ -120,21 +304,69 @@ class PBSIE(InfoExtractor):
             },
         },
         {
-            'url': 'http://video.pbs.org/video/2365367186/',
+            # Video embedded in iframe containing angle brackets as an attribute's value (e.g.
+            # "<iframe style='position: absolute;<br />\ntop: 0; left: 0;' ...", see
+            # https://github.com/rg3/youtube-dl/issues/7059)
+            'url': 'http://www.pbs.org/food/features/a-chefs-life-season-3-episode-5-prickly-business/',
             'info_dict': {
-                'id': '2365367186',
-                'display_id': '2365367186',
+                'id': '2365546844',
+                'display_id': 'a-chefs-life-season-3-episode-5-prickly-business',
                 'ext': 'mp4',
-                'title': 'To Catch A Comet - Full Episode',
-                'description': 'On November 12, 2014, billions of kilometers from Earth, spacecraft orbiter Rosetta and lander Philae did what no other had dared to attempt \u2014 land on the volatile surface of a comet as it zooms around the sun at 67,000 km/hr. The European Space Agency hopes this mission can help peer into our past and unlock secrets of our origins.',
-                'duration': 3342,
+                'title': "A Chef's Life - Season 3, Ep. 5: Prickly Business",
+                'description': 'md5:54033c6baa1f9623607c6e2ed245888b',
+                'duration': 1480,
                 'thumbnail': 're:^https?://.*\.jpg$',
             },
             'params': {
                 'skip_download': True,  # requires ffmpeg
             },
+        },
+        {
+            # Frontline video embedded via flp2012.js
+            'url': 'http://www.pbs.org/wgbh/pages/frontline/the-atomic-artists',
+            'info_dict': {
+                'id': '2070868960',
+                'display_id': 'the-atomic-artists',
+                'ext': 'mp4',
+                'title': 'FRONTLINE - The Atomic Artists',
+                'description': 'md5:1a2481e86b32b2e12ec1905dd473e2c1',
+                'duration': 723,
+                'thumbnail': 're:^https?://.*\.jpg$',
+            },
+            'params': {
+                'skip_download': True,  # requires ffmpeg
+            },
+        },
+        {
+            # Serves HD only via widget/partnerplayer page
+            'url': 'http://www.pbs.org/video/2365641075/',
+            'info_dict': {
+                'id': '2365641075',
+                'ext': 'mp4',
+                'title': 'FRONTLINE - Netanyahu at War',
+                'duration': 6852,
+                'thumbnail': 're:^https?://.*\.jpg$',
+                'formats': 'mincount:8',
+            },
+            'params': {
+                'skip_download': True,  # requires ffmpeg
+            },
+        },
+        {
+            'url': 'http://player.pbs.org/widget/partnerplayer/2365297708/?start=0&end=0&chapterbar=false&endscreen=false&topbar=true',
+            'only_matching': True,
+        },
+        {
+            'url': 'http://watch.knpb.org/video/2365616055/',
+            'only_matching': True,
         }
     ]
+    _ERRORS = {
+        101: 'We\'re sorry, but this video is not yet available.',
+        403: 'We\'re sorry, but this video is not available in your region due to rights restrictions.',
+        404: 'We are experiencing technical difficulties that are preventing us from playing the video at this time. Please check back again soon.',
+        410: 'This video has expired and is no longer available for online streaming.',
+    }
 
     def _extract_webpage(self, url):
         mobj = re.match(self._VALID_URL, url)
@@ -149,14 +381,19 @@ class PBSIE(InfoExtractor):
                 webpage, 'upload date', default=None))
 
             # tabbed frontline videos
-            tabbed_videos = re.findall(
-                r'<div[^>]+class="videotab[^"]*"[^>]+vid="(\d+)"', webpage)
-            if tabbed_videos:
-                return tabbed_videos, presumptive_id, upload_date
+            MULTI_PART_REGEXES = (
+                r'<div[^>]+class="videotab[^"]*"[^>]+vid="(\d+)"',
+                r'<a[^>]+href=["\']#video-\d+["\'][^>]+data-coveid=["\'](\d+)',
+            )
+            for p in MULTI_PART_REGEXES:
+                tabbed_videos = re.findall(p, webpage)
+                if tabbed_videos:
+                    return tabbed_videos, presumptive_id, upload_date
 
             MEDIA_ID_REGEXES = [
                 r"div\s*:\s*'videoembed'\s*,\s*mediaid\s*:\s*'(\d+)'",  # frontline video embed
                 r'class="coveplayerid">([^<]+)<',                       # coveplayer
+                r'<section[^>]+data-coveid="(\d+)"',                    # coveplayer from http://www.pbs.org/wgbh/frontline/film/real-csi/
                 r'<input type="hidden" id="pbs_video_id_[0-9]+" value="([0-9]+)"/>',  # jwplayer
             ]
 
@@ -165,9 +402,30 @@ class PBSIE(InfoExtractor):
             if media_id:
                 return media_id, presumptive_id, upload_date
 
-            url = self._search_regex(
-                r'<iframe\s+[^>]*\s+src=["\']([^\'"]+partnerplayer[^\'"]+)["\']',
-                webpage, 'player URL')
+            # Frontline video embedded via flp2012.js
+            video_id = self._search_regex(
+                r'videoid\s*:\s*"([\d+a-z]{7,})"', webpage, 'videoid', default=None)
+            if video_id:
+                # prg_id calculation is reverse engineered from
+                # http://www.pbs.org/wgbh/pages/frontline/js/flp2012.js
+                prg_id = video_id[7:]
+                if 'q' in prg_id:
+                    prg_id = prg_id.split('q')[1]
+                prg_id = int(prg_id, 16)
+                getdir = self._download_json(
+                    'http://www.pbs.org/wgbh/pages/frontline/.json/getdir/getdir%d.json' % prg_id,
+                    presumptive_id, 'Downloading getdir JSON',
+                    transform_source=strip_jsonp)
+                return getdir['mid'], presumptive_id, upload_date
+
+            for iframe in re.findall(r'(?s)<iframe(.+?)></iframe>', webpage):
+                url = self._search_regex(
+                    r'src=(["\'])(?P<url>.+?partnerplayer.+?)\1', iframe,
+                    'player URL', default=None, group='url')
+                if url:
+                    break
+
             mobj = re.match(self._VALID_URL, url)
 
         player_id = mobj.group('player_id')
@@ -194,31 +452,61 @@ class PBSIE(InfoExtractor):
                 for vid_id in video_id]
             return self.playlist_result(entries, display_id)
 
-        info = self._download_json(
-            'http://video.pbs.org/videoInfo/%s?format=json&type=partner' % video_id,
-            display_id)
+        info = None
+        redirects = []
+        redirect_urls = set()
+
+        def extract_redirect_urls(info):
+            for encoding_name in ('recommended_encoding', 'alternate_encoding'):
+                redirect = info.get(encoding_name)
+                if not redirect:
+                    continue
+                redirect_url = redirect.get('url')
+                if redirect_url and redirect_url not in redirect_urls:
+                    redirects.append(redirect)
+                    redirect_urls.add(redirect_url)
+
+        try:
+            video_info = self._download_json(
+                'http://player.pbs.org/videoInfo/%s?format=json&type=partner' % video_id,
+                display_id, 'Downloading video info JSON')
+            extract_redirect_urls(video_info)
+            info = video_info
+        except ExtractorError as e:
+            # videoInfo API may not work for some videos
+            if not isinstance(e.cause, compat_HTTPError) or e.cause.code != 404:
+                raise
+
+        # Player pages may also serve different qualities
+        for page in ('widget/partnerplayer', 'portalplayer'):
+            player = self._download_webpage(
+                'http://player.pbs.org/%s/%s' % (page, video_id),
+                display_id, 'Downloading %s page' % page, fatal=False)
+            if player:
+                video_info = self._parse_json(
+                    self._search_regex(
+                        r'(?s)PBS\.videoData\s*=\s*({.+?});\n',
+                        player, '%s video data' % page, default='{}'),
+                    display_id, transform_source=js_to_json, fatal=False)
+                if video_info:
+                    extract_redirect_urls(video_info)
+                    if not info:
+                        info = video_info
 
         formats = []
-        for encoding_name in ('recommended_encoding', 'alternate_encoding'):
-            redirect = info.get(encoding_name)
-            if not redirect:
-                continue
-            redirect_url = redirect.get('url')
-            if not redirect_url:
-                continue
+        for num, redirect in enumerate(redirects):
+            redirect_id = redirect.get('eeid')
 
             redirect_info = self._download_json(
-                redirect_url + '?format=json', display_id,
-                'Downloading %s video url info' % encoding_name)
+                '%s?format=json' % redirect['url'], display_id,
+                'Downloading %s video url info' % (redirect_id or num))
 
             if redirect_info['status'] == 'error':
-                if redirect_info['http_code'] == 403:
-                    message = (
-                        'The video is not available in your region due to '
-                        'right restrictions')
-                else:
-                    message = redirect_info['message']
-                raise ExtractorError(message, expected=True)
+                raise ExtractorError(
+                    '%s said: %s' % (
+                        self.IE_NAME,
+                        self._ERRORS.get(redirect_info['http_code'], redirect_info['message'])),
+                    expected=True)
 
             format_url = redirect_info.get('url')
             if not format_url:
@@ -230,8 +518,9 @@ class PBSIE(InfoExtractor):
             else:
                 formats.append({
                     'url': format_url,
-                    'format_id': redirect.get('eeid'),
+                    'format_id': redirect_id,
                 })
+        self._remove_duplicate_formats(formats)
         self._sort_formats(formats)
 
         rating_str = info.get('rating')
@@ -257,7 +546,7 @@ class PBSIE(InfoExtractor):
             'id': video_id,
             'display_id': display_id,
             'title': info['title'],
-            'description': info['program'].get('description'),
+            'description': info.get('description') or info.get('program', {}).get('description'),
             'thumbnail': info.get('image_url'),
             'duration': int_or_none(info.get('duration')),
             'age_limit': age_limit,
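A note on the _STATIONS table driving this extractor: zip(*_STATIONS) transposes the (regex, name) pairs into one tuple of regexes and one of names, so a single table feeds both IE_DESC and the host alternation in _VALID_URL. The trick, reduced to two hypothetical stations:

    import re

    _STATIONS = (
        (r'video\.pbs\.org', 'PBS'),
        (r'watch\.example\.org', 'Example Station (WXMP)'),
    )
    regexes, names = zip(*_STATIONS)
    IE_DESC = 'PBS and member stations: %s' % ', '.join(names)
    _VALID_URL = r'https?://(?:%s)/video/(?P<id>[0-9]+)' % '|'.join(regexes)
    assert re.match(_VALID_URL, 'http://watch.example.org/video/12345')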
diff --git a/youtube_dl/extractor/people.py b/youtube_dl/extractor/people.py
new file mode 100644 (file)
index 0000000..9ecdbc1
--- /dev/null
@@ -0,0 +1,32 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class PeopleIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?people\.com/people/videos/0,,(?P<id>\d+),00\.html'
+
+    _TEST = {
+        'url': 'http://www.people.com/people/videos/0,,20995451,00.html',
+        'info_dict': {
+            'id': 'ref:20995451',
+            'ext': 'mp4',
+            'title': 'Astronaut Love Triangle Victim Speaks Out: “The Crime in 2007 Hasn’t Defined Us”',
+            'description': 'Colleen Shipman speaks to PEOPLE for the first time about life after the attack',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 246.318,
+            'timestamp': 1458720585,
+            'upload_date': '20160323',
+            'uploader_id': '416418724',
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'add_ie': ['BrightcoveNew'],
+    }
+
+    def _real_extract(self, url):
+        return self.url_result(
+            'http://players.brightcove.net/416418724/default_default/index.html?videoId=ref:%s'
+            % self._match_id(url), 'BrightcoveNew')
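PeopleIE never parses the page itself: it rewrites the URL into the standard Brightcove player embed for account 416418724 (with a "ref:"-prefixed reference ID) and hands it off to BrightcoveNewIE via url_result. The URL construction in isolation, with a hypothetical id:

    video_id = '20995451'  # hypothetical id captured by _VALID_URL
    bc_url = ('http://players.brightcove.net/416418724/default_default/'
              'index.html?videoId=ref:%s' % video_id)
    # In an InfoExtractor this would be: return self.url_result(bc_url, 'BrightcoveNew')
    print(bc_url)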
diff --git a/youtube_dl/extractor/periscope.py b/youtube_dl/extractor/periscope.py
new file mode 100644 (file)
index 0000000..514e9b4
--- /dev/null
@@ -0,0 +1,81 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import parse_iso8601
+
+
+class PeriscopeIE(InfoExtractor):
+    IE_DESC = 'Periscope'
+    _VALID_URL = r'https?://(?:www\.)?periscope\.tv/[^/]+/(?P<id>[^/?#]+)'
+    # Currently-live example URLs can be found at http://onperiscope.com/
+    _TESTS = [{
+        'url': 'https://www.periscope.tv/w/aJUQnjY3MjA3ODF8NTYxMDIyMDl2zCg2pECBgwTqRpQuQD352EMPTKQjT4uqlM3cgWFA-g==',
+        'md5': '65b57957972e503fcbbaeed8f4fa04ca',
+        'info_dict': {
+            'id': '56102209',
+            'ext': 'mp4',
+            'title': 'Bec Boop - 🚠✈️🇬🇧 Fly above #London in Emirates Air Line cable car at night 🇬🇧✈️🚠 #BoopScope 🎀💗',
+            'timestamp': 1438978559,
+            'upload_date': '20150807',
+            'uploader': 'Bec Boop',
+            'uploader_id': '1465763',
+        },
+        'skip': 'Expires in 24 hours',
+    }, {
+        'url': 'https://www.periscope.tv/w/1ZkKzPbMVggJv',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.periscope.tv/bastaakanoggano/1OdKrlkZZjOJX',
+        'only_matching': True,
+    }]
+
+    def _call_api(self, method, value):
+        return self._download_json(
+            'https://api.periscope.tv/api/v2/%s?broadcast_id=%s' % (method, value), value)
+
+    def _real_extract(self, url):
+        token = self._match_id(url)
+
+        broadcast_data = self._call_api('getBroadcastPublic', token)
+        broadcast = broadcast_data['broadcast']
+        status = broadcast['status']
+
+        uploader = broadcast.get('user_display_name') or broadcast_data.get('user', {}).get('display_name')
+        uploader_id = broadcast.get('user_id') or broadcast_data.get('user', {}).get('id')
+
+        title = '%s - %s' % (uploader, status) if uploader else status
+        state = (broadcast.get('state') or '').lower()
+        if state == 'running':
+            title = self._live_title(title)
+        timestamp = parse_iso8601(broadcast.get('created_at'))
+
+        thumbnails = [{
+            'url': broadcast[image],
+        } for image in ('image_url', 'image_url_small') if broadcast.get(image)]
+
+        stream = self._call_api('getAccessPublic', token)
+
+        formats = []
+        for format_id in ('replay', 'rtmp', 'hls', 'https_hls'):
+            video_url = stream.get(format_id + '_url')
+            if not video_url:
+                continue
+            f = {
+                'url': video_url,
+                'ext': 'flv' if format_id == 'rtmp' else 'mp4',
+            }
+            if format_id != 'rtmp':
+                f['protocol'] = 'm3u8_native' if state == 'ended' else 'm3u8'
+            formats.append(f)
+        self._sort_formats(formats)
+
+        return {
+            'id': broadcast.get('id') or token,
+            'title': title,
+            'timestamp': timestamp,
+            'uploader': uploader,
+            'uploader_id': uploader_id,
+            'thumbnails': thumbnails,
+            'formats': formats,
+        }
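The protocol switch above reflects how Periscope serves streams: finished broadcasts ('ended' state) come as static HLS playlists the native downloader can fetch, live ones need ffmpeg's m3u8 handling, and the RTMP variant is passed through untouched. A reduced sketch over a made-up access map:

    state = 'ended'  # 'running' for a live broadcast
    stream = {  # hypothetical getAccessPublic response subset
        'replay_url': 'http://example.invalid/replay/playlist.m3u8',
        'rtmp_url': 'rtmp://example.invalid/live/stream',
    }
    formats = []
    for format_id in ('replay', 'rtmp', 'hls', 'https_hls'):
        video_url = stream.get(format_id + '_url')
        if not video_url:
            continue
        f = {'url': video_url, 'ext': 'flv' if format_id == 'rtmp' else 'mp4'}
        if format_id != 'rtmp':
            f['protocol'] = 'm3u8_native' if state == 'ended' else 'm3u8'
        formats.append(f)
    print([(f['url'], f.get('protocol')) for f in formats])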
index 6e60e5fe98920c64d310c3610194d98b01790ceb..f1008ae514f78f6c843e399031135afb00f5f23f 100644 (file)
@@ -12,7 +12,7 @@ from ..utils import (
 
 class PhilharmonieDeParisIE(InfoExtractor):
     IE_DESC = 'Philharmonie de Paris'
-    _VALID_URL = r'http://live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|misc/Playlist\.ashx\?id=)(?P<id>\d+)'
+    _VALID_URL = r'https?://live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|misc/Playlist\.ashx\?id=)(?P<id>\d+)'
     _TESTS = [{
         'url': 'http://live.philharmoniedeparis.fr/concert/1032066.html',
         'info_dict': {
index 46cebc0d7b05080491d5f1d32ee8a709b549debc..ac009f60f7785ea4efaaa7b0c867c10a998e877e 100644 (file)
@@ -1,10 +1,10 @@
 from __future__ import unicode_literals
 
-from .common import InfoExtractor
-from .zdf import extract_from_xml_url
+from .zdf import ZDFIE
 
 
-class PhoenixIE(InfoExtractor):
+class PhoenixIE(ZDFIE):
+    IE_NAME = 'phoenix.de'
     _VALID_URL = r'''(?x)https?://(?:www\.)?phoenix\.de/content/
         (?:
             phoenix/die_sendungen/(?:[^/]+/)?
@@ -41,5 +41,5 @@ class PhoenixIE(InfoExtractor):
             r'<div class="phx_vod" id="phx_vod_([0-9]+)"',
             webpage, 'internal video ID')
 
-        api_url = 'http://www.phoenix.de/php/zdfplayer-v1.3/data/beitragsDetails.php?ak=web&id=%s' % internal_id
-        return extract_from_xml_url(self, video_id, api_url)
+        api_url = 'http://www.phoenix.de/php/mediaplayer/data/beitrags_details.php?ak=web&id=%s' % internal_id
+        return self.extract_from_xml_url(video_id, api_url)
index 788411ccc18082f59588d40704900c26dba1fe21..6c8bbe1d95c3c4972baa4c956ad1a62ef6518e2d 100644 (file)
@@ -8,7 +8,7 @@ from ..compat import compat_urllib_parse_unquote
 
 
 class PhotobucketIE(InfoExtractor):
-    _VALID_URL = r'http://(?:[a-z0-9]+\.)?photobucket\.com/.*(([\?\&]current=)|_)(?P<id>.*)\.(?P<ext>(flv)|(mp4))'
+    _VALID_URL = r'https?://(?:[a-z0-9]+\.)?photobucket\.com/.*(([\?\&]current=)|_)(?P<id>.*)\.(?P<ext>(flv)|(mp4))'
     _TEST = {
         'url': 'http://media.photobucket.com/user/rachaneronas/media/TiredofLinkBuildingTryBacklinkMyDomaincom_zpsc0c3b9fa.mp4.html?filters[term]=search&filters[primary]=videos&filters[secondary]=images&sort=1&o=0',
         'md5': '7dabfb92b0a31f6c16cebc0f8e60ff99',
index 551c8c9f0fef4566afd5691628b2c216c157fd0c..bc559d1df289fca39b96f5cfc5519bf6acb8bbb3 100644 (file)
@@ -1,6 +1,8 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import re
+
 from .common import InfoExtractor
 from ..utils import (
     ExtractorError,
@@ -44,6 +46,13 @@ class PladformIE(InfoExtractor):
         'only_matching': True,
     }]
 
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+src="(?P<url>(?:https?:)?//out\.pladform\.ru/player\?.+?)"', webpage)
+        if mobj:
+            return mobj.group('url')
+
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
diff --git a/youtube_dl/extractor/planetaplay.py b/youtube_dl/extractor/planetaplay.py
deleted file mode 100644 (file)
index 06505e9..0000000
+++ /dev/null
@@ -1,61 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import ExtractorError
-
-
-class PlanetaPlayIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?planetaplay\.com/\?sng=(?P<id>[0-9]+)'
-    _API_URL = 'http://planetaplay.com/action/playlist/?sng={0:}'
-    _THUMBNAIL_URL = 'http://planetaplay.com/img/thumb/{thumb:}'
-    _TEST = {
-        'url': 'http://planetaplay.com/?sng=3586',
-        'md5': '9d569dceb7251a4e01355d5aea60f9db',
-        'info_dict': {
-            'id': '3586',
-            'ext': 'flv',
-            'title': 'md5:e829428ee28b1deed00de90de49d1da1',
-        },
-        'skip': 'Not accessible from Travis CI server',
-    }
-
-    _SONG_FORMATS = {
-        'lq': (0, 'http://www.planetaplay.com/videoplayback/{med_hash:}'),
-        'hq': (1, 'http://www.planetaplay.com/videoplayback/hi/{med_hash:}'),
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        response = self._download_json(
-            self._API_URL.format(video_id), video_id)['response']
-        try:
-            data = response.get('data')[0]
-        except IndexError:
-            raise ExtractorError(
-                '%s: failed to get the playlist' % self.IE_NAME, expected=True)
-
-        title = '{song_artists:} - {sng_name:}'.format(**data)
-        thumbnail = self._THUMBNAIL_URL.format(**data)
-
-        formats = []
-        for format_id, (quality, url_template) in self._SONG_FORMATS.items():
-            formats.append({
-                'format_id': format_id,
-                'url': url_template.format(**data),
-                'quality': quality,
-                'ext': 'flv',
-            })
-
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'formats': formats,
-            'thumbnail': thumbnail,
-        }
index 8a1c296dda8b57611a0e464387be43ab0fc9a370..57c875ef05dd74a429ea55313b0d956ef7e5ec56 100644 (file)
@@ -5,12 +5,10 @@ import re
 import os.path
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
-)
 from ..utils import (
     ExtractorError,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
@@ -42,11 +40,11 @@ class PlayedIE(InfoExtractor):
 
         self._sleep(2, video_id)
 
-        post = compat_urllib_parse.urlencode(data)
+        post = urlencode_postdata(data)
         headers = {
             b'Content-Type': b'application/x-www-form-urlencoded',
         }
-        req = compat_urllib_request.Request(url, post, headers)
+        req = sanitized_Request(url, post, headers)
         webpage = self._download_webpage(
             req, video_id, note='Downloading video page ...')
 
diff --git a/youtube_dl/extractor/plays.py b/youtube_dl/extractor/plays.py
new file mode 100644 (file)
index 0000000..c3c38cf
--- /dev/null
@@ -0,0 +1,51 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class PlaysTVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?plays\.tv/video/(?P<id>[0-9a-f]{18})'
+    _TEST = {
+        'url': 'http://plays.tv/video/56af17f56c95335490/when-you-outplay-the-azir-wall',
+        'md5': 'dfeac1198506652b5257a62762cec7bc',
+        'info_dict': {
+            'id': '56af17f56c95335490',
+            'ext': 'mp4',
+            'title': 'When you outplay the Azir wall',
+            'description': 'Posted by Bjergsen',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        title = self._og_search_title(webpage)
+        content = self._parse_json(
+            self._search_regex(
+                r'R\.bindContent\(({.+?})\);', webpage,
+                'content'), video_id)['content']
+        mpd_url, sources = re.search(
+            r'(?s)<video[^>]+data-mpd="([^"]+)"[^>]*>(.+?)</video>',
+            content).groups()
+        formats = self._extract_mpd_formats(
+            self._proto_relative_url(mpd_url), video_id, mpd_id='DASH')
+        for format_id, height, format_url in re.findall(r'<source\s+res="((\d+)h?)"\s+src="([^"]+)"', sources):
+            formats.append({
+                'url': self._proto_relative_url(format_url),
+                'format_id': 'http-' + format_id,
+                'height': int_or_none(height),
+            })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': self._og_search_description(webpage),
+            'thumbnail': self._og_search_thumbnail(webpage),
+            'formats': formats,
+        }
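PlaysTVIE merges two sources: DASH formats from the data-mpd manifest and progressive fallbacks parsed from the <source> tags inside the same <video> markup. The <source> parsing in isolation, over made-up markup:

    import re

    sources = ('<source res="720h" src="//example.invalid/720.mp4">'
               '<source res="480" src="//example.invalid/480.mp4">')
    formats = []
    for format_id, height, format_url in re.findall(
            r'<source\s+res="((\d+)h?)"\s+src="([^"]+)"', sources):
        formats.append({
            'format_id': 'http-' + format_id,
            'height': int(height),
            'url': 'http:' + format_url,  # the page uses protocol-relative URLs
        })
    print([(f['format_id'], f['height']) for f in formats])
    # -> [('http-720h', 720), ('http-480', 480)]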
diff --git a/youtube_dl/extractor/playtvak.py b/youtube_dl/extractor/playtvak.py
new file mode 100644 (file)
index 0000000..1e8096a
--- /dev/null
@@ -0,0 +1,181 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_urlparse,
+    compat_urllib_parse_urlencode,
+)
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    parse_iso8601,
+    qualities,
+)
+
+
+class PlaytvakIE(InfoExtractor):
+    IE_DESC = 'Playtvak.cz, iDNES.cz, Lidovky.cz and Metro.cz'
+    _VALID_URL = r'https?://(?:.+?\.)?(?:playtvak|idnes|lidovky|metro)\.cz/.*\?(?:c|idvideo)=(?P<id>[^&]+)'
+    _TESTS = [{
+        'url': 'http://www.playtvak.cz/vyzente-vosy-a-srsne-ze-zahrady-dn5-/hodinovy-manzel.aspx?c=A150730_150323_hodinovy-manzel_kuko',
+        'md5': '4525ae312c324b4be2f4603cc78ceb4a',
+        'info_dict': {
+            'id': 'A150730_150323_hodinovy-manzel_kuko',
+            'ext': 'mp4',
+            'title': 'Vyžeňte vosy a sršně ze zahrady',
+            'description': 'md5:f93d398691044d303bc4a3de62f3e976',
+            'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$',
+            'duration': 279,
+            'timestamp': 1438732860,
+            'upload_date': '20150805',
+            'is_live': False,
+        }
+    }, {  # live video test
+        'url': 'http://slowtv.playtvak.cz/planespotting-0pr-/planespotting.aspx?c=A150624_164934_planespotting_cat',
+        'info_dict': {
+            'id': 'A150624_164934_planespotting_cat',
+            'ext': 'flv',
+            'title': 're:^Přímý přenos iDNES.cz [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+            'description': 'Sledujte provoz na ranveji Letiště Václava Havla v Praze',
+            'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$',
+            'is_live': True,
+        },
+        'params': {
+            'skip_download': True,  # requires rtmpdump
+        },
+    }, {  # idnes.cz
+        'url': 'http://zpravy.idnes.cz/pes-zavreny-v-aute-rozbijeni-okynek-v-aute-fj5-/domaci.aspx?c=A150809_104116_domaci_pku',
+        'md5': '819832ba33cd7016e58a6658577fe289',
+        'info_dict': {
+            'id': 'A150809_104116_domaci_pku',
+            'ext': 'mp4',
+            'title': 'Zavřeli jsme mraženou pizzu do auta. Upekla se',
+            'description': 'md5:01e73f02329e2e5760bd5eed4d42e3c2',
+            'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$',
+            'duration': 39,
+            'timestamp': 1438969140,
+            'upload_date': '20150807',
+            'is_live': False,
+        }
+    }, {  # lidovky.cz
+        'url': 'http://www.lidovky.cz/dalsi-demonstrace-v-praze-o-migraci-duq-/video.aspx?c=A150808_214044_ln-video_ELE',
+        'md5': 'c7209ac4ba9d234d4ad5bab7485bcee8',
+        'info_dict': {
+            'id': 'A150808_214044_ln-video_ELE',
+            'ext': 'mp4',
+            'title': 'Táhni! Demonstrace proti imigrantům budila emoce',
+            'description': 'md5:97c81d589a9491fbfa323c9fa3cca72c',
+            'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$',
+            'timestamp': 1439052180,
+            'upload_date': '20150808',
+            'is_live': False,
+        }
+    }, {  # metro.cz
+        'url': 'http://www.metro.cz/video-pod-billboardem-se-na-vltavske-roztocil-kolotoc-deti-vozil-jen-par-hodin-1hx-/metro-extra.aspx?c=A141111_173251_metro-extra_row',
+        'md5': '84fc1deedcac37b7d4a6ccae7c716668',
+        'info_dict': {
+            'id': 'A141111_173251_metro-extra_row',
+            'ext': 'mp4',
+            'title': 'Recesisté udělali z billboardu kolotoč',
+            'description': 'md5:7369926049588c3989a66c9c1a043c4c',
+            'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$',
+            'timestamp': 1415725500,
+            'upload_date': '20141111',
+            'is_live': False,
+        }
+    }, {
+        'url': 'http://www.playtvak.cz/embed.aspx?idvideo=V150729_141549_play-porad_kuko',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        info_url = self._html_search_regex(
+            r'Misc\.videoFLV\(\s*{\s*data\s*:\s*"([^"]+)"', webpage, 'info url')
+
+        parsed_url = compat_urlparse.urlparse(info_url)
+
+        qs = compat_urlparse.parse_qs(parsed_url.query)
+        qs.update({
+            'reklama': ['0'],
+            'type': ['js'],
+        })
+
+        info_url = compat_urlparse.urlunparse(
+            parsed_url._replace(query=compat_urllib_parse_urlencode(qs, True)))
+
+        json_info = self._download_json(
+            info_url, video_id,
+            transform_source=lambda s: s[s.index('{'):s.rindex('}') + 1])
+
+        item = None
+        for i in json_info['items']:
+            if i.get('type') in ('video', 'stream'):
+                item = i
+                break
+        if not item:
+            raise ExtractorError('No suitable stream found')
+
+        quality = qualities(('low', 'middle', 'high'))
+
+        formats = []
+        for fmt in item['video']:
+            video_url = fmt.get('file')
+            if not video_url:
+                continue
+
+            format_ = fmt['format']
+            format_id = '%s_%s' % (format_, fmt['quality'])
+            preference = None
+
+            if format_ in ('mp4', 'webm'):
+                ext = format_
+            elif format_ == 'rtmp':
+                ext = 'flv'
+            elif format_ == 'apple':
+                ext = 'mp4'
+                # Some streams have mp3 audio which does not play
+                # well with the ffmpeg filter aac_adtstoasc
+                preference = -1
+            elif format_ == 'adobe':  # f4m manifest fails with 404 in 80% of requests
+                continue
+            else:  # Other formats not supported yet
+                continue
+
+            formats.append({
+                'url': video_url,
+                'ext': ext,
+                'format_id': format_id,
+                'quality': quality(fmt.get('quality')),
+                'preference': preference,
+            })
+        self._sort_formats(formats)
+
+        title = item['title']
+        is_live = item['type'] == 'stream'
+        if is_live:
+            title = self._live_title(title)
+        description = self._og_search_description(webpage, default=None) or self._html_search_meta(
+            'description', webpage, 'description')
+        timestamp = None
+        duration = None
+        if not is_live:
+            duration = int_or_none(item.get('length'))
+            timestamp = item.get('published')
+            if timestamp:
+                timestamp = parse_iso8601(timestamp[:-5])
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'thumbnail': item.get('image'),
+            'duration': duration,
+            'timestamp': timestamp,
+            'is_live': is_live,
+            'formats': formats,
+        }
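The info URL scraped from the page is reused with its query string rewritten (ads off, JS-wrapped JSON on) before being fetched, which is what the parse_qs/urlunparse dance above does. The round trip in isolation, on a hypothetical URL:

    try:
        from urllib.parse import parse_qs, urlencode, urlparse, urlunparse  # Python 3
    except ImportError:
        from urlparse import parse_qs, urlparse, urlunparse  # Python 2
        from urllib import urlencode

    info_url = 'http://example.invalid/info.aspx?c=A150730_150323&reklama=1'
    parsed = urlparse(info_url)
    qs = parse_qs(parsed.query)
    qs.update({'reklama': ['0'], 'type': ['js']})
    print(urlunparse(parsed._replace(query=urlencode(qs, doseq=True))))
    # -> the same URL with reklama=0 and type=js added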
index bdc71017bc2b1ca12bf1fe70afb7bda95d9b8a2b..6d138ef25d2d5cec02a012f5a06af085a6c35d26 100644 (file)
@@ -19,7 +19,7 @@ class PlaywireIE(InfoExtractor):
             'id': '3353705',
             'ext': 'mp4',
             'title': 'S04_RM_UCL_Rus',
-            'thumbnail': 're:^http://.*\.png$',
+            'thumbnail': 're:^https?://.*\.png$',
             'duration': 145.94,
         },
     }, {
diff --git a/youtube_dl/extractor/pluralsight.py b/youtube_dl/extractor/pluralsight.py
new file mode 100644 (file)
index 0000000..9aab776
--- /dev/null
@@ -0,0 +1,296 @@
+from __future__ import unicode_literals
+
+import re
+import json
+import random
+import collections
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_str,
+    compat_urlparse,
+)
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    parse_duration,
+    qualities,
+    sanitized_Request,
+    urlencode_postdata,
+)
+
+
+class PluralsightBaseIE(InfoExtractor):
+    _API_BASE = 'http://app.pluralsight.com'
+
+
+class PluralsightIE(PluralsightBaseIE):
+    IE_NAME = 'pluralsight'
+    _VALID_URL = r'https?://(?:(?:www|app)\.)?pluralsight\.com/training/player\?'
+    _LOGIN_URL = 'https://app.pluralsight.com/id/'
+
+    _NETRC_MACHINE = 'pluralsight'
+
+    _TESTS = [{
+        'url': 'http://www.pluralsight.com/training/player?author=mike-mckeown&name=hosting-sql-server-windows-azure-iaas-m7-mgmt&mode=live&clip=3&course=hosting-sql-server-windows-azure-iaas',
+        'md5': '4d458cf5cf4c593788672419a8dd4cf8',
+        'info_dict': {
+            'id': 'hosting-sql-server-windows-azure-iaas-m7-mgmt-04',
+            'ext': 'mp4',
+            'title': 'Management of SQL Server - Demo Monitoring',
+            'duration': 338,
+        },
+        'skip': 'Requires pluralsight account credentials',
+    }, {
+        'url': 'https://app.pluralsight.com/training/player?course=angularjs-get-started&author=scott-allen&name=angularjs-get-started-m1-introduction&clip=0&mode=live',
+        'only_matching': True,
+    }, {
+        # available without a Pluralsight account
+        'url': 'http://app.pluralsight.com/training/player?author=scott-allen&name=angularjs-get-started-m1-introduction&mode=live&clip=0&course=angularjs-get-started',
+        'only_matching': True,
+    }]
+
+    def _real_initialize(self):
+        self._login()
+
+    def _login(self):
+        (username, password) = self._get_login_info()
+        if username is None:
+            return
+
+        login_page = self._download_webpage(
+            self._LOGIN_URL, None, 'Downloading login page')
+
+        login_form = self._hidden_inputs(login_page)
+
+        login_form.update({
+            'Username': username,
+            'Password': password,
+        })
+
+        post_url = self._search_regex(
+            r'<form[^>]+action=(["\'])(?P<url>.+?)\1', login_page,
+            'post url', default=self._LOGIN_URL, group='url')
+
+        if not post_url.startswith('http'):
+            post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
+
+        request = sanitized_Request(
+            post_url, urlencode_postdata(login_form))
+        request.add_header('Content-Type', 'application/x-www-form-urlencoded')
+
+        response = self._download_webpage(
+            request, None, 'Logging in as %s' % username)
+
+        error = self._search_regex(
+            r'<span[^>]+class="field-validation-error"[^>]*>([^<]+)</span>',
+            response, 'error message', default=None)
+        if error:
+            raise ExtractorError('Unable to login: %s' % error, expected=True)
+
+        if all(p not in response for p in ('__INITIAL_STATE__', '"currentUser"')):
+            raise ExtractorError('Unable to log in')
+
+    def _real_extract(self, url):
+        qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+
+        author = qs.get('author', [None])[0]
+        name = qs.get('name', [None])[0]
+        clip_id = qs.get('clip', [None])[0]
+        course = qs.get('course', [None])[0]
+
+        if any(not f for f in (author, name, clip_id, course,)):
+            raise ExtractorError('Invalid URL', expected=True)
+
+        display_id = '%s-%s' % (name, clip_id)
+
+        webpage = self._download_webpage(url, display_id)
+
+        modules = self._search_regex(
+            r'moduleCollection\s*:\s*new\s+ModuleCollection\((\[.+?\])\s*,\s*\$rootScope\)',
+            webpage, 'modules', default=None)
+
+        if modules:
+            collection = self._parse_json(modules, display_id)
+        else:
+            # The webpage may be served in a different layout (see
+            # https://github.com/rg3/youtube-dl/issues/7607)
+            collection = self._parse_json(
+                self._search_regex(
+                    r'var\s+initialState\s*=\s*({.+?});\n', webpage, 'initial state'),
+                display_id)['course']['modules']
+
+        module, clip = None, None
+
+        for module_ in collection:
+            if name in (module_.get('moduleName'), module_.get('name')):
+                module = module_
+                for clip_ in module_.get('clips', []):
+                    clip_index = clip_.get('clipIndex')
+                    if clip_index is None:
+                        clip_index = clip_.get('index')
+                    if clip_index is None:
+                        continue
+                    if compat_str(clip_index) == clip_id:
+                        clip = clip_
+                        break
+
+        if not clip:
+            raise ExtractorError('Unable to resolve clip')
+
+        QUALITIES = {
+            'low': {'width': 640, 'height': 480},
+            'medium': {'width': 848, 'height': 640},
+            'high': {'width': 1024, 'height': 768},
+            'high-widescreen': {'width': 1280, 'height': 720},
+        }
+
+        QUALITIES_PREFERENCE = ('low', 'medium', 'high', 'high-widescreen',)
+        quality_key = qualities(QUALITIES_PREFERENCE)
+
+        AllowedQuality = collections.namedtuple('AllowedQuality', ['ext', 'qualities'])
+
+        ALLOWED_QUALITIES = (
+            AllowedQuality('webm', ['high', ]),
+            AllowedQuality('mp4', ['low', 'medium', 'high', ]),
+        )
+
+        # Some courses also offer widescreen resolution for high quality (see
+        # https://github.com/rg3/youtube-dl/issues/7766)
+        widescreen = bool(re.search(
+            r'courseSupportsWidescreenVideoFormats\s*:\s*true', webpage))
+        best_quality = 'high-widescreen' if widescreen else 'high'
+        if widescreen:
+            for allowed_quality in ALLOWED_QUALITIES:
+                allowed_quality.qualities.append(best_quality)
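+        # (when supported, both webm and mp4 then additionally allow the
+        # 1280x720 'high-widescreen' quality)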
+
+        # In order to minimize the number of calls to the ViewClip API and reduce
+        # the probability of being throttled or banned by Pluralsight, request
+        # only a single format unless a formats listing was explicitly requested.
+        if self._downloader.params.get('listformats', False):
+            allowed_qualities = ALLOWED_QUALITIES
+        else:
+            def guess_allowed_qualities():
+                req_format = self._downloader.params.get('format') or 'best'
+                req_format_split = req_format.split('-', 1)
+                if len(req_format_split) > 1:
+                    req_ext, req_quality = req_format_split
+                    for allowed_quality in ALLOWED_QUALITIES:
+                        if req_ext == allowed_quality.ext and req_quality in allowed_quality.qualities:
+                            return (AllowedQuality(req_ext, (req_quality, )), )
+                req_ext = 'webm' if self._downloader.params.get('prefer_free_formats') else 'mp4'
+                return (AllowedQuality(req_ext, (best_quality, )), )
+            allowed_qualities = guess_allowed_qualities()
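+            # e.g. --format mp4-medium yields a single (mp4, ('medium',)) entry,
+            # while a generic request such as 'best' falls back to one mp4
+            # (or webm with --prefer-free-formats) at the best quality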
+
+        formats = []
+        for ext, qualities_ in allowed_qualities:
+            for quality in qualities_:
+                f = QUALITIES[quality].copy()
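+                # ViewClip request fields, apparently mirroring the web player:
+                # a=author, cap=captions, cn=clip index, course=course id,
+                # lc=locale, m=module name, mt=container format, q=resolution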
+                clip_post = {
+                    'a': author,
+                    'cap': 'false',
+                    'cn': clip_id,
+                    'course': course,
+                    'lc': 'en',
+                    'm': name,
+                    'mt': ext,
+                    'q': '%dx%d' % (f['width'], f['height']),
+                }
+                request = sanitized_Request(
+                    '%s/training/Player/ViewClip' % self._API_BASE,
+                    json.dumps(clip_post).encode('utf-8'))
+                request.add_header('Content-Type', 'application/json;charset=utf-8')
+                format_id = '%s-%s' % (ext, quality)
+                clip_url = self._download_webpage(
+                    request, display_id, 'Downloading %s URL' % format_id, fatal=False)
+
+                # Pluralsight tracks multiple sequential calls to the ViewClip API
+                # and starts to return 429 HTTP errors after some time (see
+                # https://github.com/rg3/youtube-dl/pull/6989). Moreover, it may
+                # even lead to an account ban (see
+                # https://github.com/rg3/youtube-dl/issues/6842). To somewhat
+                # reduce the probability of these consequences, sleep a random
+                # amount of time between calls to ViewClip.
+                self._sleep(
+                    random.randint(2, 5), display_id,
+                    '%(video_id)s: Waiting for %(timeout)s seconds to avoid throttling')
+
+                if not clip_url:
+                    continue
+                f.update({
+                    'url': clip_url,
+                    'ext': ext,
+                    'format_id': format_id,
+                    'quality': quality_key(quality),
+                })
+                formats.append(f)
+        self._sort_formats(formats)
+
+        # TODO: captions
+        # http://www.pluralsight.com/training/Player/ViewClip + cap = true
+        # or
+        # http://www.pluralsight.com/training/Player/Captions
+        # { a = author, cn = clip_id, lc = en, m = name }
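+        #
+        # A rough, untested sketch of what such a captions request might look
+        # like, mirroring the ViewClip call above:
+        #
+        #   captions_post = {'a': author, 'cn': clip_id, 'lc': 'en', 'm': name}
+        #   captions_req = sanitized_Request(
+        #       '%s/training/Player/Captions' % self._API_BASE,
+        #       json.dumps(captions_post).encode('utf-8'))
+        #   captions_req.add_header('Content-Type', 'application/json;charset=utf-8')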
+
+        return {
+            'id': clip.get('clipName') or clip['name'],
+            'title': '%s - %s' % (module['title'], clip['title']),
+            'duration': int_or_none(clip.get('duration')) or parse_duration(clip.get('formattedDuration')),
+            'creator': author,
+            'formats': formats
+        }
+
+
+class PluralsightCourseIE(PluralsightBaseIE):
+    IE_NAME = 'pluralsight:course'
+    _VALID_URL = r'https?://(?:(?:www|app)\.)?pluralsight\.com/(?:library/)?courses/(?P<id>[^/]+)'
+    _TESTS = [{
+        # Free course from Pluralsight Starter Subscription for Microsoft TechNet
+        # https://offers.pluralsight.com/technet?loc=zTS3z&prod=zOTprodz&tech=zOttechz&prog=zOTprogz&type=zSOz&media=zOTmediaz&country=zUSz
+        'url': 'http://www.pluralsight.com/courses/hosting-sql-server-windows-azure-iaas',
+        'info_dict': {
+            'id': 'hosting-sql-server-windows-azure-iaas',
+            'title': 'Hosting SQL Server in Microsoft Azure IaaS Fundamentals',
+            'description': 'md5:61b37e60f21c4b2f91dc621a977d0986',
+        },
+        'playlist_count': 31,
+    }, {
+        # available without pluralsight account
+        'url': 'https://www.pluralsight.com/courses/angularjs-get-started',
+        'only_matching': True,
+    }, {
+        'url': 'https://app.pluralsight.com/library/courses/understanding-microsoft-azure-amazon-aws/table-of-contents',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        course_id = self._match_id(url)
+
+        # TODO: PSM cookie
+
+        course = self._download_json(
+            '%s/data/course/%s' % (self._API_BASE, course_id),
+            course_id, 'Downloading course JSON')
+
+        title = course['title']
+        description = course.get('description') or course.get('shortDescription')
+
+        course_data = self._download_json(
+            '%s/data/course/content/%s' % (self._API_BASE, course_id),
+            course_id, 'Downloading course data JSON')
+
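+        # Clips are emitted as url_transparent entries so that the chapter
+        # metadata set here gets merged with whatever PluralsightIE extracts.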
+        entries = []
+        for num, module in enumerate(course_data, 1):
+            for clip in module.get('clips', []):
+                player_parameters = clip.get('playerParameters')
+                if not player_parameters:
+                    continue
+                entries.append({
+                    '_type': 'url_transparent',
+                    'url': '%s/training/player?%s' % (self._API_BASE, player_parameters),
+                    'ie_key': PluralsightIE.ie_key(),
+                    'chapter': module.get('title'),
+                    'chapter_number': num,
+                    'chapter_id': module.get('moduleRef'),
+                })
+
+        return self.playlist_result(entries, course_id, title, description)
index 72d1b2718692d902b16efa303f9a14dd7d456f1f..9894f32620c1692830df023423ae02a6199121b1 100644 (file)
@@ -1,7 +1,10 @@
 # encoding: utf-8
 from __future__ import unicode_literals
 
-from ..compat import compat_urllib_parse
+from ..compat import (
+    compat_urllib_parse_unquote,
+    compat_urllib_parse_urlencode,
+)
 from .common import InfoExtractor
 from ..utils import (
     parse_duration,
@@ -22,14 +25,16 @@ class Porn91IE(InfoExtractor):
             'title': '18岁大一漂亮学妹,水嫩性感,再爽一次!',
             'ext': 'mp4',
             'duration': 431,
+            'age_limit': 18,
         }
     }
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        url = 'http://91porn.com/view_video.php?viewkey=%s' % video_id
         self._set_cookie('91porn.com', 'language', 'cn_CN')
-        webpage = self._download_webpage(url, video_id, 'get HTML content')
+
+        webpage = self._download_webpage(
+            'http://91porn.com/view_video.php?viewkey=%s' % video_id, video_id)
 
         if '作为游客,你每天只可观看10个视频' in webpage:
             raise ExtractorError('91 Porn says: Daily limit 10 videos exceeded', expected=True)
@@ -45,7 +50,7 @@ class Porn91IE(InfoExtractor):
             r'so.addVariable\(\'seccode\',\'([^\']+)\'', webpage, 'sec code')
         max_vid = self._search_regex(
             r'so.addVariable\(\'max_vid\',\'(\d+)\'', webpage, 'max vid')
-        url_params = compat_urllib_parse.urlencode({
+        url_params = compat_urllib_parse_urlencode({
             'VID': file_id,
             'mp4': '1',
             'seccode': sec_code,
@@ -53,8 +58,9 @@ class Porn91IE(InfoExtractor):
         })
         info_cn = self._download_webpage(
             'http://91porn.com/getfile.php?' + url_params, video_id,
-            'get real video url')
-        video_url = self._search_regex(r'file=([^&]+)&', info_cn, 'url')
+            'Downloading real video URL')
+        video_url = compat_urllib_parse_unquote(self._search_regex(
+            r'file=([^&]+)&', info_cn, 'url'))
 
         duration = parse_duration(self._search_regex(
             r'时长:\s*</span>\s*(\d+:\d+)', webpage, 'duration', fatal=False))
@@ -68,4 +74,5 @@ class Porn91IE(InfoExtractor):
             'url': video_url,
             'duration': duration,
             'comment_count': comment_count,
+            'age_limit': self._rta_search(webpage),
         }
index dbb2c3bd95fdd88df1edb6ea7a1a416262076620..39b53ecf68c77786f18956040bf7ccac4fd6dbc5 100644 (file)
@@ -12,7 +12,7 @@ from ..utils import (
 
 
 class PornHdIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?pornhd\.com/(?:[a-z]{2,4}/)?videos/(?P<id>\d+)(?:/(?P<display_id>.+))?'
+    _VALID_URL = r'https?://(?:www\.)?pornhd\.com/(?:[a-z]{2,4}/)?videos/(?P<id>\d+)(?:/(?P<display_id>.+))?'
     _TEST = {
         'url': 'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
         'md5': '956b8ca569f7f4d8ec563e2c41598441',
@@ -36,7 +36,8 @@ class PornHdIE(InfoExtractor):
         webpage = self._download_webpage(url, display_id or video_id)
 
         title = self._html_search_regex(
-            r'<title>(.+) porn HD.+?</title>', webpage, 'title')
+            [r'<span[^>]+class=["\']video-name["\'][^>]*>([^<]+)',
+             r'<title>(.+?) - .*?[Pp]ornHD.*?</title>'], webpage, 'title')
         description = self._html_search_regex(
             r'<div class="description">([^<]+)</div>', webpage, 'description', fatal=False)
         view_count = int_or_none(self._html_search_regex(
index 0b7886840fbced3d9fa6fb219050f40ac709c080..407ea08d4350b52666150e2784652535625c5e31 100644 (file)
@@ -1,17 +1,21 @@
 from __future__ import unicode_literals
 
+import itertools
 import os
 import re
 
 from .common import InfoExtractor
 from ..compat import (
+    compat_HTTPError,
     compat_urllib_parse_unquote,
     compat_urllib_parse_unquote_plus,
     compat_urllib_parse_urlparse,
-    compat_urllib_request,
 )
 from ..utils import (
     ExtractorError,
+    int_or_none,
+    orderedSet,
+    sanitized_Request,
     str_to_int,
 )
 from ..aes import (
@@ -20,20 +24,28 @@ from ..aes import (
 
 
 class PornHubIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?pornhub\.com/(?:view_video\.php\?viewkey=|embed/)(?P<id>[0-9a-z]+)'
+    _VALID_URL = r'https?://(?:[a-z]+\.)?pornhub\.com/(?:view_video\.php\?viewkey=|embed/)(?P<id>[0-9a-z]+)'
     _TESTS = [{
         'url': 'http://www.pornhub.com/view_video.php?viewkey=648719015',
-        'md5': '882f488fa1f0026f023f33576004a2ed',
+        'md5': '1e19b41231a02eba417839222ac9d58e',
         'info_dict': {
             'id': '648719015',
             'ext': 'mp4',
-            "uploader": "Babes",
-            "title": "Seductive Indian beauty strips down and fingers her pink pussy",
-            "age_limit": 18
+            'title': 'Seductive Indian beauty strips down and fingers her pink pussy',
+            'uploader': 'Babes',
+            'duration': 361,
+            'view_count': int,
+            'like_count': int,
+            'dislike_count': int,
+            'comment_count': int,
+            'age_limit': 18,
         }
     }, {
         'url': 'http://www.pornhub.com/view_video.php?viewkey=ph557bbb6676d2d',
         'only_matching': True,
+    }, {
+        'url': 'http://fr.pornhub.com/view_video.php?viewkey=ph55ca2f9760862',
+        'only_matching': True,
     }]
 
     @classmethod
@@ -50,7 +62,7 @@ class PornHubIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        req = compat_urllib_request.Request(
+        req = sanitized_Request(
             'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id)
         req.add_header('Cookie', 'age_verified=1')
         webpage = self._download_webpage(req, video_id)
@@ -64,13 +76,23 @@ class PornHubIE(InfoExtractor):
                 'PornHub said: %s' % error_msg,
                 expected=True, video_id=video_id)
 
-        video_title = self._html_search_regex(r'<h1 [^>]+>([^<]+)', webpage, 'title')
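+        # Basic metadata (title, thumbnail, duration) is embedded in the page
+        # as a flashvars_<id> JSON object; fall back to scraping the title from
+        # the HTML when it is missing.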
+        flashvars = self._parse_json(
+            self._search_regex(
+                r'var\s+flashvars_\d+\s*=\s*({.+?});', webpage, 'flashvars', default='{}'),
+            video_id)
+        if flashvars:
+            video_title = flashvars.get('video_title')
+            thumbnail = flashvars.get('image_url')
+            duration = int_or_none(flashvars.get('video_duration'))
+        else:
+            video_title, thumbnail, duration = [None] * 3
+
+        if not video_title:
+            video_title = self._html_search_regex(r'<h1 [^>]+>([^<]+)', webpage, 'title')
+
         video_uploader = self._html_search_regex(
             r'(?s)From:&nbsp;.+?<(?:a href="/users/|a href="/channels/|span class="username)[^>]+>(.+?)<',
             webpage, 'uploader', fatal=False)
-        thumbnail = self._html_search_regex(r'"image_url":"([^"]+)', webpage, 'thumbnail', fatal=False)
-        if thumbnail:
-            thumbnail = compat_urllib_parse_unquote(thumbnail)
 
         view_count = self._extract_count(
             r'<span class="count">([\d,\.]+)</span> views', webpage, 'view')
@@ -81,7 +103,7 @@ class PornHubIE(InfoExtractor):
         comment_count = self._extract_count(
             r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment')
 
-        video_urls = list(map(compat_urllib_parse_unquote, re.findall(r'"quality_[0-9]{3}p":"([^"]+)', webpage)))
+        video_urls = list(map(compat_urllib_parse_unquote, re.findall(r"player_quality_[0-9]{3}p\s*=\s*'([^']+)'", webpage)))
         if webpage.find('"encrypted":true') != -1:
             password = compat_urllib_parse_unquote_plus(
                 self._search_regex(r'"video_title":"([^"]+)', webpage, 'password'))
@@ -92,9 +114,9 @@ class PornHubIE(InfoExtractor):
             path = compat_urllib_parse_urlparse(video_url).path
             extension = os.path.splitext(path)[1][1:]
             format = path.split('/')[5].split('_')[:2]
-            format = "-".join(format)
+            format = '-'.join(format)
 
-            m = re.match(r'^(?P<height>[0-9]+)P-(?P<tbr>[0-9]+)K$', format)
+            m = re.match(r'^(?P<height>[0-9]+)[pP]-(?P<tbr>[0-9]+)[kK]$', format)
             if m is None:
                 height = None
                 tbr = None
@@ -117,6 +139,7 @@ class PornHubIE(InfoExtractor):
             'uploader': video_uploader,
             'title': video_title,
             'thumbnail': thumbnail,
+            'duration': duration,
             'view_count': view_count,
             'like_count': like_count,
             'dislike_count': dislike_count,
@@ -126,26 +149,23 @@ class PornHubIE(InfoExtractor):
         }
 
 
-class PornHubPlaylistIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?pornhub\.com/playlist/(?P<id>\d+)'
-    _TESTS = [{
-        'url': 'http://www.pornhub.com/playlist/6201671',
-        'info_dict': {
-            'id': '6201671',
-            'title': 'P0p4',
-        },
-        'playlist_mincount': 35,
-    }]
+class PornHubPlaylistBaseIE(InfoExtractor):
+    def _extract_entries(self, webpage):
+        return [
+            self.url_result(
+                'http://www.pornhub.com/%s' % video_url,
+                PornHubIE.ie_key(), video_title=title)
+            for video_url, title in orderedSet(re.findall(
+                r'href="/?(view_video\.php\?.*\bviewkey=[\da-z]+[^"]*)"[^>]*\s+title="([^"]+)"',
+                webpage))
+        ]
 
     def _real_extract(self, url):
         playlist_id = self._match_id(url)
 
         webpage = self._download_webpage(url, playlist_id)
 
-        entries = [
-            self.url_result('http://www.pornhub.com/%s' % video_url, 'PornHub')
-            for video_url in set(re.findall('href="/?(view_video\.php\?viewkey=\d+[^"]*)"', webpage))
-        ]
+        entries = self._extract_entries(webpage)
 
         playlist = self._parse_json(
             self._search_regex(
@@ -154,3 +174,48 @@ class PornHubPlaylistIE(InfoExtractor):
 
         return self.playlist_result(
             entries, playlist_id, playlist.get('title'), playlist.get('description'))
+
+
+class PornHubPlaylistIE(PornHubPlaylistBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?pornhub\.com/playlist/(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://www.pornhub.com/playlist/6201671',
+        'info_dict': {
+            'id': '6201671',
+            'title': 'P0p4',
+        },
+        'playlist_mincount': 35,
+    }]
+
+
+class PornHubUserVideosIE(PornHubPlaylistBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?pornhub\.com/users/(?P<id>[^/]+)/videos'
+    _TESTS = [{
+        'url': 'http://www.pornhub.com/users/zoe_ph/videos/public',
+        'info_dict': {
+            'id': 'zoe_ph',
+        },
+        'playlist_mincount': 171,
+    }, {
+        'url': 'http://www.pornhub.com/users/rushandlia/videos',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        user_id = self._match_id(url)
+
+        entries = []
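+        # Walk ?page=1,2,... until a 404 or a page without entries marks the end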
+        for page_num in itertools.count(1):
+            try:
+                webpage = self._download_webpage(
+                    url, user_id, 'Downloading page %d' % page_num,
+                    query={'page': page_num})
+            except ExtractorError as e:
+                if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
+                    break
+                raise
+            page_entries = self._extract_entries(webpage)
+            if not page_entries:
+                break
+            entries.extend(page_entries)
+
+        return self.playlist_result(entries, user_id)
index 34735c51e19c7dbbb1c07f2fc4a203df4dda70a9..5398e708b68337b76739282abf6c00e8a39745ab 100644 (file)
@@ -3,11 +3,9 @@ from __future__ import unicode_literals
 import json
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-)
 from ..utils import (
     int_or_none,
+    sanitized_Request,
 )
 
 
@@ -46,7 +44,7 @@ class PornotubeIE(InfoExtractor):
             'authenticationSpaceKey': originAuthenticationSpaceKey,
             'credentials': 'Clip Application',
         }
-        token_req = compat_urllib_request.Request(
+        token_req = sanitized_Request(
             'https://api.aebn.net/auth/v1/token/primal',
             data=json.dumps(token_req_data).encode('utf-8'))
         token_req.add_header('Content-Type', 'application/json')
@@ -56,7 +54,7 @@ class PornotubeIE(InfoExtractor):
         token = token_answer['tokenKey']
 
         # Get video URL
-        delivery_req = compat_urllib_request.Request(
+        delivery_req = sanitized_Request(
             'https://api.aebn.net/delivery/v1/clips/%s/MP4' % video_id)
         delivery_req.add_header('Authorization', token)
         delivery_info = self._download_json(
@@ -64,7 +62,7 @@ class PornotubeIE(InfoExtractor):
         video_url = delivery_info['mediaUrl']
 
         # Get additional info (title etc.)
-        info_req = compat_urllib_request.Request(
+        info_req = sanitized_Request(
             'https://api.aebn.net/content/v1/clips/%s?expand='
             'title,description,primaryImageNumber,startSecond,endSecond,'
             'movie.title,movie.MovieId,movie.boxCoverFront,movie.stars,'
index eba4dfbb39576bff355b722c997dd31e07ce370f..6b51e5c5400ee59859eb0d29cb740a31f34f3a96 100644 (file)
@@ -13,7 +13,7 @@ from ..utils import (
 
 
 class PornoVoisinesIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?pornovoisines\.com/showvideo/(?P<id>\d+)/(?P<display_id>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?pornovoisines\.com/showvideo/(?P<id>\d+)/(?P<display_id>[^/]+)'
 
     _VIDEO_URL_TEMPLATE = 'http://stream%d.pornovoisines.com' \
         '/static/media/video/transcoded/%s-640x360-1000-trscded.mp4'
@@ -56,7 +56,7 @@ class PornoVoisinesIE(InfoExtractor):
             r'<h1>(.+?)</h1>', webpage, 'title', flags=re.DOTALL)
         description = self._html_search_regex(
             r'<article id="descriptif">(.+?)</article>',
-            webpage, "description", fatal=False, flags=re.DOTALL)
+            webpage, 'description', fatal=False, flags=re.DOTALL)
 
         thumbnail = self._search_regex(
             r'<div id="mediaspace%s">\s*<img src="/?([^"]+)"' % video_id,
diff --git a/youtube_dl/extractor/presstv.py b/youtube_dl/extractor/presstv.py
new file mode 100644 (file)
index 0000000..2da93ed
--- /dev/null
@@ -0,0 +1,74 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import remove_start
+
+
+class PressTVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?presstv\.ir/[^/]+/(?P<y>\d+)/(?P<m>\d+)/(?P<d>\d+)/(?P<id>\d+)/(?P<display_id>[^/]+)?'
+
+    _TEST = {
+        'url': 'http://www.presstv.ir/Detail/2016/04/09/459911/Australian-sewerage-treatment-facility-/',
+        'md5': '5d7e3195a447cb13e9267e931d8dd5a5',
+        'info_dict': {
+            'id': '459911',
+            'display_id': 'Australian-sewerage-treatment-facility-',
+            'ext': 'mp4',
+            'title': 'Organic mattresses used to clean waste water',
+            'upload_date': '20160409',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'description': 'md5:20002e654bbafb6908395a5c0cfcd125'
+        }
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        display_id = mobj.group('display_id') or video_id
+
+        webpage = self._download_webpage(url, display_id)
+
+        # extract video URL from webpage
+        video_url = self._hidden_inputs(webpage)['inpPlayback']
+
+        # build list of available formats
+        # specified in http://www.presstv.ir/Scripts/playback.js
+        base_url = 'http://192.99.219.222:82/presstv'
+        _formats = [
+            (180, '_low200.mp4'),
+            (360, '_low400.mp4'),
+            (720, '_low800.mp4'),
+            (1080, '.mp4')
+        ]
+
+        formats = [{
+            'url': base_url + video_url[:-4] + extension,
+            'format_id': '%dp' % height,
+            'height': height,
+        } for height, extension in _formats]
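+        # e.g. a (hypothetical) 'inpPlayback' value of '/foo.mp4' yields
+        # http://192.99.219.222:82/presstv/foo_low400.mp4 for the 360p format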
+
+        # extract video metadata
+        title = remove_start(
+            self._html_search_meta('title', webpage, fatal=True), 'PressTV-')
+
+        thumbnail = self._og_search_thumbnail(webpage)
+        description = self._og_search_description(webpage)
+
+        upload_date = '%04d%02d%02d' % (
+            int(mobj.group('y')),
+            int(mobj.group('m')),
+            int(mobj.group('d')),
+        )
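+        # e.g. the URL date path 2016/04/09 becomes '20160409'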
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'formats': formats,
+            'thumbnail': thumbnail,
+            'upload_date': upload_date,
+            'description': description
+        }
index 304359dc5b189b8ce27c967c2d369b26db334532..0c1024772c860404209bd3b3f2a183d90c41face 100644 (file)
@@ -1,11 +1,11 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
+from ..utils import (
+    ExtractorError,
+    sanitized_Request,
+    urlencode_postdata,
 )
-from ..utils import ExtractorError
 
 
 class PrimeShareTVIE(InfoExtractor):
@@ -41,8 +41,8 @@ class PrimeShareTVIE(InfoExtractor):
             webpage, 'wait time', default=7)) + 1
         self._sleep(wait_time, video_id)
 
-        req = compat_urllib_request.Request(
-            url, compat_urllib_parse.urlencode(fields), headers)
+        req = sanitized_Request(
+            url, urlencode_postdata(fields), headers)
         video_page = self._download_webpage(
             req, video_id, 'Downloading video page')
 
index 8190ed6766ce5c878fc82700524ec6d012d70a57..f93bd19ff6dde40c87672b4fd18a3f1aab11382e 100644 (file)
@@ -4,13 +4,11 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
-)
 from ..utils import (
     determine_ext,
     ExtractorError,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
@@ -36,8 +34,8 @@ class PromptFileIE(InfoExtractor):
                                  expected=True)
 
         fields = self._hidden_inputs(webpage)
-        post = compat_urllib_parse.urlencode(fields)
-        req = compat_urllib_request.Request(url, post)
+        post = urlencode_postdata(fields)
+        req = sanitized_Request(url, post)
         req.add_header('Content-type', 'application/x-www-form-urlencoded')
         webpage = self._download_webpage(
             req, video_id, 'Downloading video page')
index effcf1db37b06d1af40d99233359f0df295eaa03..07d49d489d6779b0f6bb7bd12bc610497c576c2e 100644 (file)
@@ -5,9 +5,7 @@ import re
 
 from hashlib import sha1
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-)
+from ..compat import compat_urllib_parse_urlencode
 from ..utils import (
     ExtractorError,
     determine_ext,
@@ -20,7 +18,7 @@ from ..utils import (
 class ProSiebenSat1IE(InfoExtractor):
     IE_NAME = 'prosiebensat1'
     IE_DESC = 'ProSiebenSat.1 Digital'
-    _VALID_URL = r'https?://(?:www\.)?(?:(?:prosieben|prosiebenmaxx|sixx|sat1|kabeleins|the-voice-of-germany)\.(?:de|at)|ran\.de|fem\.com)/(?P<id>.+)'
+    _VALID_URL = r'https?://(?:www\.)?(?:(?:prosieben|prosiebenmaxx|sixx|sat1|kabeleins|the-voice-of-germany|7tv)\.(?:de|at|ch)|ran\.de|fem\.com)/(?P<id>.+)'
 
     _TESTS = [
         {
@@ -32,7 +30,7 @@ class ProSiebenSat1IE(InfoExtractor):
             'url': 'http://www.prosieben.de/tv/circus-halligalli/videos/218-staffel-2-episode-18-jahresrueckblick-ganze-folge',
             'info_dict': {
                 'id': '2104602',
-                'ext': 'mp4',
+                'ext': 'flv',
                 'title': 'Episode 18 - Staffel 2',
                 'description': 'md5:8733c81b702ea472e069bc48bb658fc1',
                 'upload_date': '20131231',
@@ -138,14 +136,13 @@ class ProSiebenSat1IE(InfoExtractor):
             'url': 'http://www.the-voice-of-germany.de/video/31-andreas-kuemmert-rocket-man-clip',
             'info_dict': {
                 'id': '2572814',
-                'ext': 'mp4',
+                'ext': 'flv',
                 'title': 'Andreas Kümmert: Rocket Man',
                 'description': 'md5:6ddb02b0781c6adf778afea606652e38',
                 'upload_date': '20131017',
                 'duration': 469.88,
             },
             'params': {
-                # rtmp download
                 'skip_download': True,
             },
         },
@@ -153,13 +150,12 @@ class ProSiebenSat1IE(InfoExtractor):
             'url': 'http://www.fem.com/wellness/videos/wellness-video-clip-kurztripps-zum-valentinstag.html',
             'info_dict': {
                 'id': '2156342',
-                'ext': 'mp4',
+                'ext': 'flv',
                 'title': 'Kurztrips zum Valentinstag',
-                'description': 'Romantischer Kurztrip zum Valentinstag? Wir verraten, was sich hier wirklich lohnt.',
+                'description': 'Romantischer Kurztrip zum Valentinstag? Nina Heinemann verrät, was sich hier wirklich lohnt.',
                 'duration': 307.24,
             },
             'params': {
-                # rtmp download
                 'skip_download': True,
             },
         },
@@ -172,12 +168,26 @@ class ProSiebenSat1IE(InfoExtractor):
             },
             'playlist_count': 2,
         },
+        {
+            'url': 'http://www.7tv.de/circus-halligalli/615-best-of-circus-halligalli-ganze-folge',
+            'info_dict': {
+                'id': '4187506',
+                'ext': 'flv',
+                'title': 'Best of Circus HalliGalli',
+                'description': 'md5:8849752efd90b9772c9db6fdf87fb9e9',
+                'upload_date': '20151229',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
     ]
 
     _CLIPID_REGEXES = [
         r'"clip_id"\s*:\s+"(\d+)"',
         r'clipid: "(\d+)"',
         r'clip[iI]d=(\d+)',
+        r'clip[iI]d\s*=\s*["\'](\d+)',
         r"'itemImageUrl'\s*:\s*'/dynamic/thumbnails/full/\d+/(\d+)",
     ]
     _TITLE_REGEXES = [
@@ -186,12 +196,16 @@ class ProSiebenSat1IE(InfoExtractor):
         r'<!-- start video -->\s*<h1>(.+?)</h1>',
         r'<h1 class="att-name">\s*(.+?)</h1>',
         r'<header class="module_header">\s*<h2>([^<]+)</h2>\s*</header>',
+        r'<h2 class="video-title" itemprop="name">\s*(.+?)</h2>',
+        r'<div[^>]+id="veeseoTitle"[^>]*>(.+?)</div>',
     ]
     _DESCRIPTION_REGEXES = [
         r'<p itemprop="description">\s*(.+?)</p>',
         r'<div class="videoDecription">\s*<p><strong>Beschreibung</strong>: (.+?)</p>',
         r'<div class="g-plusone" data-size="medium"></div>\s*</div>\s*</header>\s*(.+?)\s*<footer>',
         r'<p class="att-description">\s*(.+?)\s*</p>',
+        r'<p class="video-description" itemprop="description">\s*(.+?)</p>',
+        r'<div[^>]+id="veeseoDescription"[^>]*>(.+?)</div>',
     ]
     _UPLOAD_DATE_REGEXES = [
         r'<meta property="og:published_time" content="(.+?)">',
@@ -219,7 +233,7 @@ class ProSiebenSat1IE(InfoExtractor):
         client_name = 'kolibri-2.0.19-splec4'
         client_location = url
 
-        videos_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos?%s' % compat_urllib_parse.urlencode({
+        videos_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos?%s' % compat_urllib_parse_urlencode({
             'access_token': access_token,
             'client_location': client_location,
             'client_name': client_name,
@@ -240,7 +254,7 @@ class ProSiebenSat1IE(InfoExtractor):
         client_id = g[:2] + sha1(''.join([clip_id, g, access_token, client_location, g, client_name])
                                  .encode('utf-8')).hexdigest()
 
-        sources_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources?%s' % (clip_id, compat_urllib_parse.urlencode({
+        sources_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources?%s' % (clip_id, compat_urllib_parse_urlencode({
             'access_token': access_token,
             'client_id': client_id,
             'client_location': client_location,
@@ -254,7 +268,7 @@ class ProSiebenSat1IE(InfoExtractor):
                                           client_location, source_ids_str, g, client_name])
                                  .encode('utf-8')).hexdigest()
 
-        url_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources/url?%s' % (clip_id, compat_urllib_parse.urlencode({
+        url_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources/url?%s' % (clip_id, compat_urllib_parse_urlencode({
             'access_token': access_token,
             'client_id': client_id,
             'client_location': client_location,
index cce84b9e4d95e53731f01d334830faac9f1e008d..fca30e1aae5b35f9ef439fccc8396b5127f79aa9 100644 (file)
@@ -40,7 +40,7 @@ class Puls4IE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
 
         error_message = self._html_search_regex(
-            r'<div class="message-error">(.+?)</div>',
+            r'<div[^>]+class="message-error"[^>]*>(.+?)</div>',
             webpage, 'error message', default=None)
         if error_message:
             raise ExtractorError(
index 6d5732d45c3d3e22d085319ff45449881ac73ad2..cc0416cb81eb23ed87d1dae0cdf2573a6df8936a 100644 (file)
@@ -7,19 +7,19 @@ from .common import InfoExtractor
 
 
 class PyvideoIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?pyvideo\.org/video/(?P<id>\d+)/(.*)'
+    _VALID_URL = r'https?://(?:www\.)?pyvideo\.org/video/(?P<id>\d+)/(.*)'
 
     _TESTS = [
         {
             'url': 'http://pyvideo.org/video/1737/become-a-logging-expert-in-30-minutes',
-            'md5': 'de317418c8bc76b1fd8633e4f32acbc6',
+            'md5': '520915673e53a5c5d487c36e0c4d85b5',
             'info_dict': {
                 'id': '24_4WWkSmNo',
-                'ext': 'mp4',
+                'ext': 'webm',
                 'title': 'Become a logging expert in 30 minutes',
                 'description': 'md5:9665350d466c67fb5b1598de379021f7',
                 'upload_date': '20130320',
-                'uploader': 'NextDayVideo',
+                'uploader': 'Next Day Video',
                 'uploader_id': 'NextDayVideo',
             },
             'add_ie': ['Youtube'],
index 1654a641f00db6833056571394b4dbe0b2856150..ff0af9543c2b5e5527f406958e9ae5ae4d1adbda 100644 (file)
@@ -7,17 +7,18 @@ import re
 
 from .common import InfoExtractor
 from ..utils import (
+    sanitized_Request,
     strip_jsonp,
     unescapeHTML,
     clean_html,
+    ExtractorError,
 )
-from ..compat import compat_urllib_request
 
 
 class QQMusicIE(InfoExtractor):
     IE_NAME = 'qqmusic'
     IE_DESC = 'QQ音乐'
-    _VALID_URL = r'http://y.qq.com/#type=song&mid=(?P<id>[0-9A-Za-z]+)'
+    _VALID_URL = r'https?://y\.qq\.com/#type=song&mid=(?P<id>[0-9A-Za-z]+)'
     _TESTS = [{
         'url': 'http://y.qq.com/#type=song&mid=004295Et37taLD',
         'md5': '9ce1c1c8445f561506d2e3cfb0255705',
@@ -25,7 +26,7 @@ class QQMusicIE(InfoExtractor):
             'id': '004295Et37taLD',
             'ext': 'mp3',
             'title': '可惜没如果',
-            'upload_date': '20141227',
+            'release_date': '20141227',
             'creator': '林俊杰',
             'description': 'md5:d327722d0361576fde558f1ac68a7065',
             'thumbnail': 're:^https?://.*\.jpg$',
@@ -38,11 +39,26 @@ class QQMusicIE(InfoExtractor):
             'id': '004MsGEo3DdNxV',
             'ext': 'mp3',
             'title': '如果',
-            'upload_date': '20050626',
+            'release_date': '20050626',
             'creator': '李季美',
             'description': 'md5:46857d5ed62bc4ba84607a805dccf437',
             'thumbnail': 're:^https?://.*\.jpg$',
         }
+    }, {
+        'note': 'lyrics not in .lrc format',
+        'url': 'http://y.qq.com/#type=song&mid=001JyApY11tIp6',
+        'info_dict': {
+            'id': '001JyApY11tIp6',
+            'ext': 'mp3',
+            'title': 'Shadows Over Transylvania',
+            'release_date': '19970225',
+            'creator': 'Dark Funeral',
+            'description': 'md5:ed14d5bd7ecec19609108052c25b2c11',
+            'thumbnail': 're:^https?://.*\.jpg$',
+        },
+        'params': {
+            'skip_download': True,
+        },
     }]
 
     _FORMATS = {
@@ -112,15 +128,27 @@ class QQMusicIE(InfoExtractor):
         self._check_formats(formats, mid)
         self._sort_formats(formats)
 
-        return {
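+        # Keep only valid LRC lines: timestamped lyrics like '[01:23.45] ...'
+        # and bare metadata tags like '[ti:...]'; everything else is dropped.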
+        actual_lrc_lyrics = ''.join(
+            line + '\n' for line in re.findall(
+                r'(?m)^(\[[0-9]{2}:[0-9]{2}(?:\.[0-9]{2,})?\][^\n]*|\[[^\]]*\])', lrc_content))
+
+        info_dict = {
             'id': mid,
             'formats': formats,
             'title': song_name,
-            'upload_date': publish_time,
+            'release_date': publish_time,
             'creator': singer,
             'description': lrc_content,
-            'thumbnail': thumbnail_url,
+            'thumbnail': thumbnail_url
         }
+        if actual_lrc_lyrics:
+            info_dict['subtitles'] = {
+                'origin': [{
+                    'ext': 'lrc',
+                    'data': actual_lrc_lyrics,
+                }]
+            }
+        return info_dict
 
 
 class QQPlaylistBaseIE(InfoExtractor):
@@ -144,13 +172,13 @@ class QQPlaylistBaseIE(InfoExtractor):
 class QQMusicSingerIE(QQPlaylistBaseIE):
     IE_NAME = 'qqmusic:singer'
     IE_DESC = 'QQ音乐 - 歌手'
-    _VALID_URL = r'http://y.qq.com/#type=singer&mid=(?P<id>[0-9A-Za-z]+)'
+    _VALID_URL = r'https?://y\.qq\.com/#type=singer&mid=(?P<id>[0-9A-Za-z]+)'
     _TEST = {
         'url': 'http://y.qq.com/#type=singer&mid=001BLpXF2DyJe2',
         'info_dict': {
             'id': '001BLpXF2DyJe2',
             'title': '林俊杰',
-            'description': 'md5:2a222d89ba4455a3af19940c0481bb78',
+            'description': 'md5:870ec08f7d8547c29c93010899103751',
         },
         'playlist_count': 12,
     }
@@ -174,7 +202,7 @@ class QQMusicSingerIE(QQPlaylistBaseIE):
         singer_desc = None
 
         if singer_id:
-            req = compat_urllib_request.Request(
+            req = sanitized_Request(
                 'http://s.plcloud.music.qq.com/fcgi-bin/fcg_get_singer_desc.fcg?utf8=1&outCharset=utf-8&format=xml&singerid=%s' % singer_id)
             req.add_header(
                 'Referer', 'http://s.plcloud.music.qq.com/xhr_proxy_utf8.html')
@@ -189,7 +217,7 @@ class QQMusicSingerIE(QQPlaylistBaseIE):
 class QQMusicAlbumIE(QQPlaylistBaseIE):
     IE_NAME = 'qqmusic:album'
     IE_DESC = 'QQ音乐 - 专辑'
-    _VALID_URL = r'http://y.qq.com/#type=album&mid=(?P<id>[0-9A-Za-z]+)'
+    _VALID_URL = r'https?://y\.qq\.com/#type=album&mid=(?P<id>[0-9A-Za-z]+)'
 
     _TESTS = [{
         'url': 'http://y.qq.com/#type=album&mid=000gXCTb2AhRR1',
@@ -232,7 +260,7 @@ class QQMusicAlbumIE(QQPlaylistBaseIE):
 class QQMusicToplistIE(QQPlaylistBaseIE):
     IE_NAME = 'qqmusic:toplist'
     IE_DESC = 'QQ音乐 - 排行榜'
-    _VALID_URL = r'http://y\.qq\.com/#type=toplist&p=(?P<id>(top|global)_[0-9]+)'
+    _VALID_URL = r'https?://y\.qq\.com/#type=toplist&p=(?P<id>(top|global)_[0-9]+)'
 
     _TESTS = [{
         'url': 'http://y.qq.com/#type=toplist&p=global_123',
@@ -245,7 +273,7 @@ class QQMusicToplistIE(QQPlaylistBaseIE):
         'url': 'http://y.qq.com/#type=toplist&p=top_3',
         'info_dict': {
             'id': 'top_3',
-            'title': 'QQ音乐巅峰榜·欧美',
+            'title': '巅峰榜·欧美',
             'description': 'QQ音乐巅峰榜·欧美根据用户收听行为自动生成,集结当下最流行的欧美新歌!:更新时间:每周四22点|统'
                            '计周期:一周(上周四至本周三)|统计对象:三个月内发行的欧美歌曲|统计数量:100首|统计算法:根据'
                            '歌曲在一周内的有效播放次数,由高到低取前100名(同一歌手最多允许5首歌曲同时上榜)|有效播放次数:'
@@ -286,9 +314,9 @@ class QQMusicToplistIE(QQPlaylistBaseIE):
 class QQMusicPlaylistIE(QQPlaylistBaseIE):
     IE_NAME = 'qqmusic:playlist'
     IE_DESC = 'QQ音乐 - 歌单'
-    _VALID_URL = r'http://y\.qq\.com/#type=taoge&id=(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://y\.qq\.com/#type=taoge&id=(?P<id>[0-9]+)'
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://y.qq.com/#type=taoge&id=3462654915',
         'info_dict': {
             'id': '3462654915',
@@ -296,7 +324,16 @@ class QQMusicPlaylistIE(QQPlaylistBaseIE):
             'description': 'md5:d2c9d758a96b9888cf4fe82f603121d4',
         },
         'playlist_count': 40,
-    }
+        'skip': 'playlist gone',
+    }, {
+        'url': 'http://y.qq.com/#type=taoge&id=1374105607',
+        'info_dict': {
+            'id': '1374105607',
+            'title': '易入人心的华语民谣',
+            'description': '民谣的歌曲易于传唱、、歌词朗朗伤口、旋律简单温馨。属于那种才入耳孔。却上心头的感觉。没有太多的复杂情绪。简单而直接地表达乐者的情绪,就是这样的简单才易入人心。',
+        },
+        'playlist_count': 20,
+    }]
 
     def _real_extract(self, url):
         list_id = self._match_id(url)
@@ -304,14 +341,21 @@ class QQMusicPlaylistIE(QQPlaylistBaseIE):
         list_json = self._download_json(
             'http://i.y.qq.com/qzone-music/fcg-bin/fcg_ucc_getcdinfo_byids_cp.fcg?type=1&json=1&utf8=1&onlysong=0&disstid=%s'
             % list_id, list_id, 'Download list page',
-            transform_source=strip_jsonp)['cdlist'][0]
-
+            transform_source=strip_jsonp)
+        if not list_json.get('cdlist'):
+            if list_json.get('code'):
+                raise ExtractorError(
+                    'QQ Music said: error %d in fetching playlist info' % list_json['code'],
+                    expected=True)
+            raise ExtractorError('Unable to get playlist info')
+
+        cdlist = list_json['cdlist'][0]
         entries = [
             self.url_result(
                 'http://y.qq.com/#type=song&mid=' + song['songmid'], 'QQMusic', song['songmid']
-            ) for song in list_json['songlist']
+            ) for song in cdlist['songlist']
         ]
 
-        list_name = list_json.get('dissname')
-        list_description = clean_html(unescapeHTML(list_json.get('desc')))
+        list_name = cdlist.get('dissname')
+        list_description = clean_html(unescapeHTML(cdlist.get('desc')))
         return self.playlist_result(entries, list_id, list_name, list_description)
diff --git a/youtube_dl/extractor/quickvid.py b/youtube_dl/extractor/quickvid.py
deleted file mode 100644 (file)
index f414e23..0000000
+++ /dev/null
@@ -1,54 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import (
-    compat_urlparse,
-)
-from ..utils import (
-    determine_ext,
-    int_or_none,
-)
-
-
-class QuickVidIE(InfoExtractor):
-    _VALID_URL = r'https?://(www\.)?quickvid\.org/watch\.php\?v=(?P<id>[a-zA-Z_0-9-]+)'
-    _TEST = {
-        'url': 'http://quickvid.org/watch.php?v=sUQT3RCG8dx',
-        'md5': 'c0c72dd473f260c06c808a05d19acdc5',
-        'info_dict': {
-            'id': 'sUQT3RCG8dx',
-            'ext': 'mp4',
-            'title': 'Nick Offerman\'s Summer Reading Recap',
-            'thumbnail': 're:^https?://.*\.(?:png|jpg|gif)$',
-            'view_count': int,
-        },
-        'skip': 'Not accessible from Travis CI server',
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        title = self._html_search_regex(r'<h2>(.*?)</h2>', webpage, 'title')
-        view_count = int_or_none(self._html_search_regex(
-            r'(?s)<div id="views">(.*?)</div>',
-            webpage, 'view count', fatal=False))
-        video_code = self._search_regex(
-            r'(?s)<video id="video"[^>]*>(.*?)</video>', webpage, 'video code')
-        formats = [
-            {
-                'url': compat_urlparse.urljoin(url, src),
-                'format_id': determine_ext(src, None),
-            } for src in re.findall('<source\s+src="([^"]+)"', video_code)
-        ]
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'formats': formats,
-            'thumbnail': self._og_search_thumbnail(webpage),
-            'view_count': view_count,
-        }
index 0d706312ea800709bea7156113da1f1f3315410e..0cbb15f086f4b3c747f2da80f8af813f8dbf50f0 100644 (file)
@@ -28,16 +28,16 @@ class RadioBremenIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        meta_url = "http://www.radiobremen.de/apps/php/mediathek/metadaten.php?id=%s" % video_id
+        meta_url = 'http://www.radiobremen.de/apps/php/mediathek/metadaten.php?id=%s' % video_id
         meta_doc = self._download_webpage(
             meta_url, video_id, 'Downloading metadata')
         title = self._html_search_regex(
-            r"<h1.*>(?P<title>.+)</h1>", meta_doc, "title")
+            r'<h1.*>(?P<title>.+)</h1>', meta_doc, 'title')
         description = self._html_search_regex(
-            r"<p>(?P<description>.*)</p>", meta_doc, "description", fatal=False)
+            r'<p>(?P<description>.*)</p>', meta_doc, 'description', fatal=False)
         duration = parse_duration(self._html_search_regex(
-            r"L&auml;nge:</td>\s+<td>(?P<duration>[0-9]+:[0-9]+)</td>",
-            meta_doc, "duration", fatal=False))
+            r'L&auml;nge:</td>\s+<td>(?P<duration>[0-9]+:[0-9]+)</td>',
+            meta_doc, 'duration', fatal=False))
 
         page_doc = self._download_webpage(
             url, video_id, 'Downloading video information')
@@ -51,7 +51,7 @@ class RadioBremenIE(InfoExtractor):
         formats = [{
             'url': video_url,
             'ext': 'mp4',
-            'width': int(mobj.group("width")),
+            'width': int(mobj.group('width')),
         }]
         return {
             'id': video_id,
index 09352ed8250819518be78e2d5cf8bb97108913e0..a8afc001460b2eedf5b0e29104ff0039ea5fe611 100644 (file)
@@ -16,9 +16,9 @@ class RadioFranceIE(InfoExtractor):
         'info_dict': {
             'id': 'one-one',
             'ext': 'ogg',
-            "title": "One to one",
-            "description": "Plutôt que d'imaginer la radio de demain comme technologie ou comme création de contenu, je veux montrer que quelles que soient ses évolutions, j'ai l'intime conviction que la radio continuera d'être un grand média de proximité pour les auditeurs.",
-            "uploader": "Thomas Hercouët",
+            'title': 'One to one',
+            'description': "Plutôt que d'imaginer la radio de demain comme technologie ou comme création de contenu, je veux montrer que quelles que soient ses évolutions, j'ai l'intime conviction que la radio continuera d'être un grand média de proximité pour les auditeurs.",
+            'uploader': 'Thomas Hercouët',
         },
     }
 
index 1631faf29f61c9cc15bca99394966c1917ca1a08..e36ce1aa1940deafd5a633bec814e7462008c3b1 100644 (file)
@@ -5,22 +5,27 @@ import re
 from .common import InfoExtractor
 from ..compat import (
     compat_urllib_parse,
+    compat_urlparse,
 )
 from ..utils import (
+    ExtractorError,
+    determine_ext,
     parse_duration,
     unified_strdate,
+    int_or_none,
+    xpath_text,
 )
 
 
-class RaiIE(InfoExtractor):
-    _VALID_URL = r'(?P<url>(?P<host>http://(?:.+?\.)?(?:rai\.it|rai\.tv|rainews\.it))/dl/.+?-(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})(?:-.+?)?\.html)'
+class RaiTVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:.+?\.)?(?:rai\.it|rai\.tv|rainews\.it)/dl/(?:[^/]+/)+media/.+?-(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})(?:-.+?)?\.html'
     _TESTS = [
         {
             'url': 'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-cb27157f-9dd0-4aee-b788-b1f67643a391.html',
-            'md5': 'c064c0b2d09c278fb293116ef5d0a32d',
+            'md5': '96382709b61dd64a6b88e0f791e6df4c',
             'info_dict': {
                 'id': 'cb27157f-9dd0-4aee-b788-b1f67643a391',
-                'ext': 'mp4',
+                'ext': 'flv',
                 'title': 'Report del 07/04/2014',
                 'description': 'md5:f27c544694cacb46a078db84ec35d2d9',
                 'upload_date': '20140407',
@@ -29,16 +34,14 @@ class RaiIE(InfoExtractor):
         },
         {
             'url': 'http://www.raisport.rai.it/dl/raiSport/media/rassegna-stampa-04a9f4bd-b563-40cf-82a6-aad3529cb4a9.html',
-            'md5': '8bb9c151924ce241b74dd52ef29ceafa',
+            'md5': 'd9751b78eac9710d62c2447b224dea39',
             'info_dict': {
                 'id': '04a9f4bd-b563-40cf-82a6-aad3529cb4a9',
-                'ext': 'mp4',
+                'ext': 'flv',
                 'title': 'TG PRIMO TEMPO',
-                'description': '',
                 'upload_date': '20140612',
                 'duration': 1758,
             },
-            'skip': 'Error 404',
         },
         {
             'url': 'http://www.rainews.it/dl/rainews/media/state-of-the-net-Antonella-La-Carpia-regole-virali-7aafdea9-0e5d-49d5-88a6-7e65da67ae13.html',
@@ -54,95 +57,103 @@ class RaiIE(InfoExtractor):
         },
         {
             'url': 'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-b4a49761-e0cc-4b14-8736-2729f6f73132-tg2.html',
-            'md5': '35694f062977fe6619943f08ed935730',
             'info_dict': {
                 'id': 'b4a49761-e0cc-4b14-8736-2729f6f73132',
                 'ext': 'mp4',
                 'title': 'Alluvione in Sardegna e dissesto idrogeologico',
                 'description': 'Edizione delle ore 20:30 ',
-            }
+            },
+            'skip': 'invalid urls',
         },
         {
             'url': 'http://www.ilcandidato.rai.it/dl/ray/media/Il-Candidato---Primo-episodio-Le-Primarie-28e5525a-b495-45e8-a7c3-bc48ba45d2b6.html',
-            'md5': '02b64456f7cc09f96ff14e7dd489017e',
+            'md5': '496ab63e420574447f70d02578333437',
             'info_dict': {
                 'id': '28e5525a-b495-45e8-a7c3-bc48ba45d2b6',
                 'ext': 'flv',
                 'title': 'Il Candidato - Primo episodio: "Le Primarie"',
-                'description': 'Primo appuntamento con "Il candidato" con Filippo Timi, alias Piero Zucca presidente!',
-                'uploader': 'RaiTre',
+                'description': 'md5:364b604f7db50594678f483353164fb8',
+                'upload_date': '20140923',
+                'duration': 386,
             }
-        }
+        },
     ]
 
-    def _extract_relinker_url(self, webpage):
-        return self._proto_relative_url(self._search_regex(
-            [r'name="videourl" content="([^"]+)"', r'var\s+videoURL(?:_MP4)?\s*=\s*"([^"]+)"'],
-            webpage, 'relinker url', default=None))
-
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        host = mobj.group('host')
+        video_id = self._match_id(url)
+        media = self._download_json(
+            'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-%s.html?json' % video_id,
+            video_id, 'Downloading video JSON')
 
-        webpage = self._download_webpage(url, video_id)
+        thumbnails = []
+        for image_type in ('image', 'image_medium', 'image_300'):
+            thumbnail_url = media.get(image_type)
+            if thumbnail_url:
+                thumbnails.append({
+                    'url': thumbnail_url,
+                })
 
-        relinker_url = self._extract_relinker_url(webpage)
+        subtitles = []
+        formats = []
+        media_type = media['type']
+        if 'Audio' in media_type:
+            formats.append({
+                'format_id': media.get('formatoAudio'),
+                'url': media['audioUrl'],
+                'ext': media.get('formatoAudio'),
+            })
+        elif 'Video' in media_type:
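+            # The relinker service returns malformed XML: a stray ' tag elementi'
+            # marker and closing tags written as '>/...' instead of '</...>', so
+            # repair the document before parsing it.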
+            def fix_xml(xml):
+                return xml.replace(' tag elementi', '').replace('>/', '</')
 
-        if not relinker_url:
-            iframe_path = self._search_regex(
-                r'<iframe[^>]+src="/?(dl/[^"]+\?iframe\b[^"]*)"',
-                webpage, 'iframe')
-            webpage = self._download_webpage(
-                '%s/%s' % (host, iframe_path), video_id)
-            relinker_url = self._extract_relinker_url(webpage)
+            relinker = self._download_xml(
+                media['mediaUri'] + '&output=43',
+                video_id, transform_source=fix_xml)
 
-        relinker = self._download_json(
-            '%s&output=47' % relinker_url, video_id)
+            has_subtitle = False
 
-        media_url = relinker['video'][0]
-        ct = relinker.get('ct')
-        if ct == 'f4m':
-            formats = self._extract_f4m_formats(
-                media_url + '&hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id)
-        else:
-            formats = [{
-                'url': media_url,
-                'format_id': ct,
-            }]
+            for element in relinker.findall('element'):
+                media_url = xpath_text(element, 'url')
+                ext = determine_ext(media_url)
+                content_type = xpath_text(element, 'content-type')
+                if ext == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        media_url, video_id, 'mp4', 'm3u8_native',
+                        m3u8_id='hls', fatal=False))
+                elif ext == 'f4m':
+                    formats.extend(self._extract_f4m_formats(
+                        media_url + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44',
+                        video_id, f4m_id='hds', fatal=False))
+                elif ext == 'stl':
+                    has_subtitle = True
+                elif content_type.startswith('video/'):
+                    bitrate = int_or_none(xpath_text(element, 'bitrate'))
+                    formats.append({
+                        'url': media_url,
+                        'tbr': bitrate or None,
+                        'format_id': 'http-%d' % bitrate if bitrate else 'http',
+                    })
+                elif content_type.startswith('image/'):
+                    thumbnails.append({
+                        'url': media_url,
+                    })
+
+            self._sort_formats(formats)
 
-        json_link = self._html_search_meta(
-            'jsonlink', webpage, 'JSON link', default=None)
-        if json_link:
-            media = self._download_json(
-                host + json_link, video_id, 'Downloading video JSON')
-            title = media.get('name')
-            description = media.get('desc')
-            thumbnail = media.get('image_300') or media.get('image_medium') or media.get('image')
-            duration = parse_duration(media.get('length'))
-            uploader = media.get('author')
-            upload_date = unified_strdate(media.get('date'))
+            if has_subtitle:
+                webpage = self._download_webpage(url, video_id)
+                subtitles = self._get_subtitles(video_id, webpage)
         else:
-            title = (self._search_regex(
-                r'var\s+videoTitolo\s*=\s*"(.+?)";',
-                webpage, 'title', default=None) or self._og_search_title(webpage)).replace('\\"', '"')
-            description = self._og_search_description(webpage)
-            thumbnail = self._og_search_thumbnail(webpage)
-            duration = None
-            uploader = self._html_search_meta('Editore', webpage, 'uploader')
-            upload_date = unified_strdate(self._html_search_meta(
-                'item-date', webpage, 'upload date', default=None))
-
-        subtitles = self.extract_subtitles(video_id, webpage)
+            raise ExtractorError('not a media file')
 
         return {
             'id': video_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'uploader': uploader,
-            'upload_date': upload_date,
-            'duration': duration,
+            'title': media['name'],
+            'description': media.get('desc'),
+            'thumbnails': thumbnails,
+            'uploader': media.get('author'),
+            'upload_date': unified_strdate(media.get('date')),
+            'duration': parse_duration(media.get('length')),
             'formats': formats,
             'subtitles': subtitles,
         }
@@ -161,3 +172,36 @@ class RaiIE(InfoExtractor):
                 'url': 'http://www.rai.tv%s' % compat_urllib_parse.quote(captions),
             }]
         return subtitles
+
+
+class RaiIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:.+?\.)?(?:rai\.it|rai\.tv|rainews\.it)/dl/.+?-(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})(?:-.+?)?\.html'
+    _TESTS = [
+        {
+            'url': 'http://www.report.rai.it/dl/Report/puntata/ContentItem-0c7a664b-d0f4-4b2c-8835-3f82e46f433e.html',
+            'md5': 'e0e7a8a131e249d1aa0ebf270d1d8db7',
+            'info_dict': {
+                'id': '59d69d28-6bb6-409d-a4b5-ed44096560af',
+                'ext': 'flv',
+                'title': 'Il pacco',
+                'description': 'md5:4b1afae1364115ce5d78ed83cd2e5b3a',
+                'upload_date': '20141221',
+            },
+        }
+    ]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if RaiTVIE.suitable(url) else super(RaiIE, cls).suitable(url)
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        iframe_url = self._search_regex(
+            [r'<iframe[^>]+src="([^"]*/dl/[^"]+\?iframe\b[^"]*)"',
+             r'drawMediaRaiTV\(["\'](.+?)["\']'],
+            webpage, 'iframe')
+        if not iframe_url.startswith('http'):
+            iframe_url = compat_urlparse.urljoin(url, iframe_url)
+        return self.url_result(iframe_url)
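
The suitable() override above is the standard youtube-dl idiom for overlapping _VALID_URL patterns: the broader extractor withdraws whenever the narrower one claims the URL first. A minimal, self-contained sketch of the idiom (class names and patterns here are illustrative, not the real Rai regexes):

    import re

    class NarrowIE(object):
        _VALID_URL = r'https?://example\.com/media/(?P<id>\d+)\.html'

        @classmethod
        def suitable(cls, url):
            return re.match(cls._VALID_URL, url) is not None

    class BroadIE(object):
        _VALID_URL = r'https?://example\.com/.+\.html'

        @classmethod
        def suitable(cls, url):
            # withdraw in favour of the more specific extractor
            if NarrowIE.suitable(url):
                return False
            return re.match(cls._VALID_URL, url) is not None

    assert NarrowIE.suitable('http://example.com/media/123.html')
    assert not BroadIE.suitable('http://example.com/media/123.html')
    assert BroadIE.suitable('http://example.com/other.html')
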
index 0f8f3ebde0999e8599eaa86516dd2b52524c9b40..7932af6ef7c599fdcce5c95bcdbf4e77f162d45d 100644 (file)
@@ -18,11 +18,11 @@ class RBMARadioIE(InfoExtractor):
         'info_dict': {
             'id': 'ford-lopatin-live-at-primavera-sound-2011',
             'ext': 'mp3',
-            "uploader_id": "ford-lopatin",
-            "location": "Spain",
-            "description": "Joel Ford and Daniel ’Oneohtrix Point Never’ Lopatin fly their midified pop extravaganza to Spain. Live at Primavera Sound 2011.",
-            "uploader": "Ford & Lopatin",
-            "title": "Live at Primavera Sound 2011",
+            'uploader_id': 'ford-lopatin',
+            'location': 'Spain',
+            'description': 'Joel Ford and Daniel ’Oneohtrix Point Never’ Lopatin fly their midified pop extravaganza to Spain. Live at Primavera Sound 2011.',
+            'uploader': 'Ford & Lopatin',
+            'title': 'Live at Primavera Sound 2011',
         },
     }
 
index d6054d7175fd49a22117dd357bea7905f6e739be..7ba41ba593295cdc7d2e28e6b64702321ed1ef08 100644 (file)
@@ -5,7 +5,7 @@ from ..utils import ExtractorError
 
 
 class RedTubeIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?redtube\.com/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?redtube\.com/(?P<id>[0-9]+)'
     _TEST = {
         'url': 'http://www.redtube.com/66418',
         'md5': '7b8c22b5e7098a3e1c09709df1126d2d',
diff --git a/youtube_dl/extractor/regiotv.py b/youtube_dl/extractor/regiotv.py
new file mode 100644 (file)
index 0000000..e250a52
--- /dev/null
@@ -0,0 +1,62 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+from ..utils import (
+    sanitized_Request,
+    xpath_text,
+    xpath_with_ns,
+)
+
+
+class RegioTVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?regio-tv\.de/video/(?P<id>[0-9]+)'
+    _TESTS = [{
+        'url': 'http://www.regio-tv.de/video/395808.html',
+        'info_dict': {
+            'id': '395808',
+            'ext': 'mp4',
+            'title': 'Wir in Ludwigsburg',
+            'description': 'Mit unseren zuckersüßen Adventskindern, außerdem besuchen wir die Abendsterne!',
+        }
+    }, {
+        'url': 'http://www.regio-tv.de/video/395808',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        key = self._search_regex(
+            r'key\s*:\s*(["\'])(?P<key>.+?)\1', webpage, 'key', group='key')
+        title = self._og_search_title(webpage)
+
+        SOAP_TEMPLATE = '<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><{0} xmlns="http://v.telvi.de/"><key xsi:type="xsd:string">{1}</key></{0}></soap:Body></soap:Envelope>'
+
+        request = sanitized_Request(
+            'http://v.telvi.de/',
+            SOAP_TEMPLATE.format('GetHTML5VideoData', key).encode('utf-8'))
+        video_data = self._download_xml(request, video_id, 'Downloading video XML')
+
+        NS_MAP = {
+            'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
+            'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
+        }
+
+        video_url = xpath_text(
+            video_data, xpath_with_ns('.//video', NS_MAP), 'video url', fatal=True)
+        thumbnail = xpath_text(
+            video_data, xpath_with_ns('.//image', NS_MAP), 'thumbnail')
+        description = self._og_search_description(
+            webpage) or self._html_search_meta('description', webpage)
+
+        return {
+            'id': video_id,
+            'url': video_url,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+        }
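
NS_MAP above is consumed by xpath_with_ns, which expands 'prefix:tag' path steps into ElementTree's {uri}tag Clark notation before the lookup. A sketch of what the youtube_dl.utils helper does:

    NS_MAP = {
        'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
        'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
    }

    def xpath_with_ns(path, ns_map):
        # expand each 'prefix:tag' path step into '{uri}tag'
        components = [c.split(':') for c in path.split('/')]
        replaced = []
        for c in components:
            if len(c) == 1:
                replaced.append(c[0])
            else:
                ns, tag = c
                replaced.append('{%s}%s' % (ns_map[ns], tag))
        return '/'.join(replaced)

    print(xpath_with_ns('.//soap:Body', NS_MAP))
    # .//{http://schemas.xmlsoap.org/soap/envelope/}Body
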
index b17c2bfc06b7bd63a1abc4bc112c6692d98b9756..fd50065d4ad05d668919f2c0392e99f3ab637032 100644 (file)
@@ -31,6 +31,7 @@ class RestudyIE(InfoExtractor):
         formats = self._extract_smil_formats(
             'https://www.restudy.dk/awsmedia/SmilDirectory/video_%s.xml' % video_id,
             video_id)
+        self._sort_formats(formats)
 
         return {
             'id': video_id,
index ec7e7df7bc1f7a6b8ffdb4fc46b24a9bf8cb5148..3c6725aeb42945ce7f4e07b49bcd0d629248fcac 100644 (file)
@@ -12,12 +12,12 @@ class ReverbNationIE(InfoExtractor):
         'url': 'http://www.reverbnation.com/alkilados/song/16965047-mona-lisa',
         'md5': '3da12ebca28c67c111a7f8b262d3f7a7',
         'info_dict': {
-            "id": "16965047",
-            "ext": "mp3",
-            "title": "MONA LISA",
-            "uploader": "ALKILADOS",
-            "uploader_id": "216429",
-            "thumbnail": "re:^https://gp1\.wac\.edgecastcdn\.net/.*?\.jpg$"
+            'id': '16965047',
+            'ext': 'mp3',
+            'title': 'MONA LISA',
+            'uploader': 'ALKILADOS',
+            'uploader_id': '216429',
+            'thumbnail': 're:^https://gp1\.wac\.edgecastcdn\.net/.*?\.jpg$'
         },
     }]
 
diff --git a/youtube_dl/extractor/revision3.py b/youtube_dl/extractor/revision3.py
new file mode 100644 (file)
index 0000000..99979eb
--- /dev/null
@@ -0,0 +1,176 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    int_or_none,
+    parse_iso8601,
+    unescapeHTML,
+    qualities,
+)
+
+
+class Revision3IE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:revision3|testtube|animalist)\.com)/(?P<id>[^/]+(?:/[^/?#]+)?)'
+    _TESTS = [{
+        'url': 'http://www.revision3.com/technobuffalo/5-google-predictions-for-2016',
+        'md5': 'd94a72d85d0a829766de4deb8daaf7df',
+        'info_dict': {
+            'id': '71089',
+            'display_id': 'technobuffalo/5-google-predictions-for-2016',
+            'ext': 'webm',
+            'title': '5 Google Predictions for 2016',
+            'description': 'Google had a great 2015, but it\'s already time to look ahead. Here are our five predictions for 2016.',
+            'upload_date': '20151228',
+            'timestamp': 1451325600,
+            'duration': 187,
+            'uploader': 'TechnoBuffalo',
+            'uploader_id': 'technobuffalo',
+        }
+    }, {
+        # Show
+        'url': 'http://testtube.com/brainstuff',
+        'info_dict': {
+            'id': '251',
+            'title': 'BrainStuff',
+            'description': 'Whether the topic is popcorn or particle physics, you can count on the HowStuffWorks team to explore-and explain-the everyday science in the world around us on BrainStuff.',
+        },
+        'playlist_mincount': 93,
+    }, {
+        'url': 'https://testtube.com/dnews/5-weird-ways-plants-can-eat-animals?utm_source=FB&utm_medium=DNews&utm_campaign=DNewsSocial',
+        'info_dict': {
+            'id': '58227',
+            'display_id': 'dnews/5-weird-ways-plants-can-eat-animals',
+            'duration': 275,
+            'ext': 'webm',
+            'title': '5 Weird Ways Plants Can Eat Animals',
+            'description': 'Why have some plants evolved to eat meat?',
+            'upload_date': '20150120',
+            'timestamp': 1421763300,
+            'uploader': 'DNews',
+            'uploader_id': 'dnews',
+        },
+    }, {
+        'url': 'http://testtube.com/tt-editors-picks/the-israel-palestine-conflict-explained-in-ten-min',
+        'info_dict': {
+            'id': '71618',
+            'ext': 'mp4',
+            'display_id': 'tt-editors-picks/the-israel-palestine-conflict-explained-in-ten-min',
+            'title': 'The Israel-Palestine Conflict Explained in Ten Minutes',
+            'description': 'If you\'d like to learn about the struggle between Israelis and Palestinians, this video is a great place to start',
+            'uploader': 'Editors\' Picks',
+            'uploader_id': 'tt-editors-picks',
+            'timestamp': 1453309200,
+            'upload_date': '20160120',
+        },
+        'add_ie': ['Youtube'],
+    }, {
+        # Tag
+        'url': 'http://testtube.com/tech-news',
+        'info_dict': {
+            'id': '21018',
+            'title': 'tech news',
+        },
+        'playlist_mincount': 9,
+    }]
+    _PAGE_DATA_TEMPLATE = 'http://www.%s/apiProxy/ddn/%s?domain=%s'
+    _API_KEY = 'ba9c741bce1b9d8e3defcc22193f3651b8867e62'
+
+    def _real_extract(self, url):
+        domain, display_id = re.match(self._VALID_URL, url).groups()
+        site = domain.split('.')[0]
+        page_info = self._download_json(
+            self._PAGE_DATA_TEMPLATE % (domain, display_id, domain), display_id)
+
+        page_data = page_info['data']
+        page_type = page_data['type']
+        if page_type in ('episode', 'embed'):
+            show_data = page_data['show']['data']
+            page_id = compat_str(page_data['id'])
+            video_id = compat_str(page_data['video']['data']['id'])
+
+            preference = qualities(['mini', 'small', 'medium', 'large'])
+            thumbnails = [{
+                'url': image_url,
+                'id': image_id,
+                'preference': preference(image_id)
+            } for image_id, image_url in page_data.get('images', {}).items()]
+
+            info = {
+                'id': page_id,
+                'display_id': display_id,
+                'title': unescapeHTML(page_data['name']),
+                'description': unescapeHTML(page_data.get('summary')),
+                'timestamp': parse_iso8601(page_data.get('publishTime'), ' '),
+                'author': page_data.get('author'),
+                'uploader': show_data.get('name'),
+                'uploader_id': show_data.get('slug'),
+                'thumbnails': thumbnails,
+                'extractor_key': site,
+            }
+
+            if page_type == 'embed':
+                info.update({
+                    '_type': 'url_transparent',
+                    'url': page_data['video']['data']['embed'],
+                })
+                return info
+
+            video_data = self._download_json(
+                'http://revision3.com/api/getPlaylist.json?api_key=%s&codecs=h264,vp8,theora&video_id=%s' % (self._API_KEY, video_id),
+                video_id)['items'][0]
+
+            formats = []
+            for vcodec, media in video_data['media'].items():
+                for quality_id, quality in media.items():
+                    if quality_id == 'hls':
+                        formats.extend(self._extract_m3u8_formats(
+                            quality['url'], video_id, 'mp4',
+                            'm3u8_native', m3u8_id='hls', fatal=False))
+                    else:
+                        formats.append({
+                            'url': quality['url'],
+                            'format_id': '%s-%s' % (vcodec, quality_id),
+                            'tbr': int_or_none(quality.get('bitrate')),
+                            'vcodec': vcodec,
+                        })
+            self._sort_formats(formats)
+
+            info.update({
+                'title': unescapeHTML(video_data['title']),
+                'description': unescapeHTML(video_data.get('summary')),
+                'uploader': video_data.get('show', {}).get('name'),
+                'uploader_id': video_data.get('show', {}).get('slug'),
+                'duration': int_or_none(video_data.get('duration')),
+                'formats': formats,
+            })
+            return info
+        else:
+            list_data = page_info[page_type]['data']
+            episodes_data = page_info['episodes']['data']
+            num_episodes = page_info['meta']['totalEpisodes']
+            processed_episodes = 0
+            entries = []
+            page_num = 1
+            while True:
+                entries.extend([{
+                    '_type': 'url',
+                    'url': 'http://%s%s' % (domain, episode['path']),
+                    'id': compat_str(episode['id']),
+                    'ie_key': 'Revision3',
+                    'extractor_key': site,
+                } for episode in episodes_data])
+                processed_episodes += len(episodes_data)
+                if processed_episodes >= num_episodes:
+                    break
+                page_num += 1
+                episodes_data = self._download_json(self._PAGE_DATA_TEMPLATE % (
+                    domain, display_id + '/' + compat_str(page_num), domain),
+                    display_id)['episodes']['data']
+
+            return self.playlist_result(
+                entries, compat_str(list_data['id']),
+                list_data.get('name'), list_data.get('summary'))
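
The thumbnail preference above comes from the qualities() helper: ids later in the ordered list rank higher, and unknown ids rank lowest. A sketch matching the index-based ranking of youtube_dl.utils.qualities:

    def qualities(quality_ids):
        def q(qid):
            try:
                return quality_ids.index(qid)
            except ValueError:
                return -1
        return q

    preference = qualities(['mini', 'small', 'medium', 'large'])
    assert preference('large') > preference('mini')
    assert preference('unknown') == -1
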
diff --git a/youtube_dl/extractor/rice.py b/youtube_dl/extractor/rice.py
new file mode 100644 (file)
index 0000000..f855719
--- /dev/null
@@ -0,0 +1,116 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_parse_qs
+from ..utils import (
+    xpath_text,
+    xpath_element,
+    int_or_none,
+    parse_iso8601,
+    ExtractorError,
+)
+
+
+class RICEIE(InfoExtractor):
+    _VALID_URL = r'https?://mediahub\.rice\.edu/app/[Pp]ortal/video\.aspx\?(?P<query>.+)'
+    _TEST = {
+        'url': 'https://mediahub.rice.edu/app/Portal/video.aspx?PortalID=25ffd62c-3d01-4b29-8c70-7c94270efb3e&DestinationID=66bc9434-03bd-4725-b47e-c659d8d809db&ContentID=YEWIvbhb40aqdjMD1ALSqw',
+        'md5': '9b83b4a2eead4912dc3b7fac7c449b6a',
+        'info_dict': {
+            'id': 'YEWIvbhb40aqdjMD1ALSqw',
+            'ext': 'mp4',
+            'title': 'Active Learning in Archeology',
+            'upload_date': '20140616',
+            'timestamp': 1402926346,
+        }
+    }
+    _NS = 'http://schemas.datacontract.org/2004/07/ensembleVideo.Data.Service.Contracts.Models.Player.Config'
+
+    def _real_extract(self, url):
+        qs = compat_parse_qs(re.match(self._VALID_URL, url).group('query'))
+        if not qs.get('PortalID') or not qs.get('DestinationID') or not qs.get('ContentID'):
+            raise ExtractorError('Invalid URL', expected=True)
+
+        portal_id = qs['PortalID'][0]
+        playlist_id = qs['DestinationID'][0]
+        content_id = qs['ContentID'][0]
+
+        content_data = self._download_xml('https://mediahub.rice.edu/api/portal/GetContentTitle', content_id, query={
+            'portalId': portal_id,
+            'playlistId': playlist_id,
+            'contentId': content_id
+        })
+        metadata = xpath_element(content_data, './/metaData', fatal=True)
+        title = xpath_text(metadata, 'primaryTitle', fatal=True)
+        encodings = xpath_element(content_data, './/encodings', fatal=True)
+        player_data = self._download_xml('https://mediahub.rice.edu/api/player/GetPlayerConfig', content_id, query={
+            'temporaryLinkId': xpath_text(encodings, 'temporaryLinkId', fatal=True),
+            'contentId': content_id,
+        })
+
+        common_fmt = {}
+        dimensions = xpath_text(encodings, 'dimensions')
+        if dimensions:
+            wh = dimensions.split('x')
+            if len(wh) == 2:
+                common_fmt.update({
+                    'width': int_or_none(wh[0]),
+                    'height': int_or_none(wh[1]),
+                })
+
+        formats = []
+        rtsp_path = xpath_text(player_data, self._xpath_ns('RtspPath', self._NS))
+        if rtsp_path:
+            fmt = {
+                'url': rtsp_path,
+                'format_id': 'rtsp',
+            }
+            fmt.update(common_fmt)
+            formats.append(fmt)
+        for source in player_data.findall(self._xpath_ns('.//Source', self._NS)):
+            video_url = xpath_text(source, self._xpath_ns('File', self._NS))
+            if not video_url:
+                continue
+            if '.m3u8' in video_url:
+                formats.extend(self._extract_m3u8_formats(video_url, content_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+            else:
+                fmt = {
+                    'url': video_url,
+                    'format_id': video_url.split(':')[0],
+                }
+                fmt.update(common_fmt)
+                rtmp = re.search(r'^(?P<url>rtmp://[^/]+/(?P<app>.+))/(?P<playpath>mp4:.+)$', video_url)
+                if rtmp:
+                    fmt.update({
+                        'url': rtmp.group('url'),
+                        'play_path': rtmp.group('playpath'),
+                        'app': rtmp.group('app'),
+                        'ext': 'flv',
+                    })
+                formats.append(fmt)
+        self._sort_formats(formats)
+
+        thumbnails = []
+        for content_asset in content_data.findall('.//contentAssets'):
+            asset_type = xpath_text(content_asset, 'type')
+            if asset_type == 'image':
+                image_url = xpath_text(content_asset, 'httpPath')
+                if not image_url:
+                    continue
+                thumbnails.append({
+                    'id': xpath_text(content_asset, 'ID'),
+                    'url': image_url,
+                })
+
+        return {
+            'id': content_id,
+            'title': title,
+            'description': xpath_text(metadata, 'abstract'),
+            'duration': int_or_none(xpath_text(metadata, 'duration')),
+            'timestamp': parse_iso8601(xpath_text(metadata, 'dateUpdated')),
+            'thumbnails': thumbnails,
+            'formats': formats,
+        }
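
The Source handling above splits RTMP URLs into the pieces rtmpdump expects (base URL, app, play path). For example, with a hypothetical URL (the address is a placeholder, not a real endpoint):

    import re

    video_url = 'rtmp://media.example.edu/ondemand/vod/mp4:lectures/archeology.mp4'
    rtmp = re.search(
        r'^(?P<url>rtmp://[^/]+/(?P<app>.+))/(?P<playpath>mp4:.+)$', video_url)
    print(rtmp.group('url'))       # rtmp://media.example.edu/ondemand/vod
    print(rtmp.group('app'))       # ondemand/vod
    print(rtmp.group('playpath'))  # mp4:lectures/archeology.mp4
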
index efa4afeb6a6615a4fa1e90781f27d3dd65083810..2c2c707bd36ad3f737072bf1f9011027e0514bd9 100644 (file)
@@ -6,15 +6,15 @@ from .common import InfoExtractor
 
 
 class RingTVIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?ringtv\.craveonline\.com/(?P<type>news|videos/video)/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?ringtv\.craveonline\.com/(?P<type>news|videos/video)/(?P<id>[^/?#]+)'
     _TEST = {
-        "url": "http://ringtv.craveonline.com/news/310833-luis-collazo-says-victor-ortiz-better-not-quit-on-jan-30",
-        "md5": "d25945f5df41cdca2d2587165ac28720",
-        "info_dict": {
+        'url': 'http://ringtv.craveonline.com/news/310833-luis-collazo-says-victor-ortiz-better-not-quit-on-jan-30',
+        'md5': 'd25945f5df41cdca2d2587165ac28720',
+        'info_dict': {
             'id': '857645',
             'ext': 'mp4',
-            "title": 'Video: Luis Collazo says Victor Ortiz "better not quit on Jan. 30" - Ring TV',
-            "description": 'Luis Collazo is excited about his Jan. 30 showdown with fellow former welterweight titleholder Victor Ortiz at Barclays Center in his hometown of Brooklyn. The SuperBowl week fight headlines a Golden Boy Live! card on Fox Sports 1.',
+            'title': 'Video: Luis Collazo says Victor Ortiz "better not quit on Jan. 30" - Ring TV',
+            'description': 'Luis Collazo is excited about his Jan. 30 showdown with fellow former welterweight titleholder Victor Ortiz at Barclays Center in his hometown of Brooklyn. The SuperBowl week fight headlines a Golden Boy Live! card on Fox Sports 1.',
         }
     }
 
@@ -32,8 +32,8 @@ class RingTVIE(InfoExtractor):
         description = self._html_search_regex(
             r'addthis:description="([^"]+)"',
             webpage, 'description', fatal=False)
-        final_url = "http://ringtv.craveonline.springboardplatform.com/storage/ringtv.craveonline.com/conversion/%s.mp4" % video_id
-        thumbnail_url = "http://ringtv.craveonline.springboardplatform.com/storage/ringtv.craveonline.com/snapshots/%s.jpg" % video_id
+        final_url = 'http://ringtv.craveonline.springboardplatform.com/storage/ringtv.craveonline.com/conversion/%s.mp4' % video_id
+        thumbnail_url = 'http://ringtv.craveonline.springboardplatform.com/storage/ringtv.craveonline.com/snapshots/%s.jpg' % video_id
 
         return {
             'id': video_id,
index e8bb20a0803700937875355d2f854d1de88cea1a..f9cd48790c3b4a92b82bf1880020d53a074b1434 100644 (file)
@@ -1,11 +1,11 @@
 from __future__ import unicode_literals
 
-from .videodetective import VideoDetectiveIE
+from .common import InfoExtractor
+from ..compat import compat_urlparse
+from .internetvideoarchive import InternetVideoArchiveIE
 
 
-# It just uses the same method as videodetective.com,
-# the internetvideoarchive.com is extracted from the og:video property
-class RottenTomatoesIE(VideoDetectiveIE):
+class RottenTomatoesIE(InfoExtractor):
     _VALID_URL = r'https?://www\.rottentomatoes\.com/m/[^/]+/trailers/(?P<id>\d+)'
 
     _TEST = {
@@ -13,7 +13,19 @@ class RottenTomatoesIE(VideoDetectiveIE):
         'info_dict': {
             'id': '613340',
             'ext': 'mp4',
-            'title': 'TOY STORY 3',
-            'description': 'From the creators of the beloved TOY STORY films, comes a story that will reunite the gang in a whole new way.',
+            'title': 'Toy Story 3',
         },
     }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        og_video = self._og_search_video_url(webpage)
+        query = compat_urlparse.urlparse(og_video).query
+
+        return {
+            '_type': 'url_transparent',
+            'url': InternetVideoArchiveIE._build_xml_url(query),
+            'ie_key': InternetVideoArchiveIE.ie_key(),
+            'title': self._og_search_title(webpage),
+        }
index e4215d546219bb95fe79abfb184da149148962db..28cc5522d89083cec2ad7631d51fb0aa0798ccbd 100644 (file)
@@ -4,60 +4,95 @@ from __future__ import unicode_literals
 from .common import InfoExtractor
 from ..utils import (
     int_or_none,
-    unescapeHTML,
+    ExtractorError,
 )
 
 
 class RTBFIE(InfoExtractor):
-    _VALID_URL = r'https?://www.rtbf.be/video/[^\?]+\?id=(?P<id>\d+)'
-    _TEST = {
+    _VALID_URL = r'''(?x)
+        https?://(?:www\.)?rtbf\.be/
+        (?:
+            video/[^?]+\?.*\bid=|
+            ouftivi/(?:[^/]+/)*[^?]+\?.*\bvideoId=|
+            auvio/[^/]+\?.*id=
+        )(?P<id>\d+)'''
+    _TESTS = [{
         'url': 'https://www.rtbf.be/video/detail_les-diables-au-coeur-episode-2?id=1921274',
         'md5': '799f334ddf2c0a582ba80c44655be570',
         'info_dict': {
             'id': '1921274',
             'ext': 'mp4',
             'title': 'Les Diables au coeur (épisode 2)',
+            'description': 'Football - Diables Rouges',
             'duration': 3099,
+            'upload_date': '20140425',
+            'timestamp': 1398456336,
+            'uploader': 'rtbfsport',
         }
+    }, {
+        # geo restricted
+        'url': 'http://www.rtbf.be/ouftivi/heros/detail_scooby-doo-mysteres-associes?id=1097&videoId=2057442',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.rtbf.be/ouftivi/niouzz?videoId=2055858',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.rtbf.be/auvio/detail_jeudi-en-prime-siegfried-bracke?id=2102996',
+        'only_matching': True,
+    }]
+    _IMAGE_HOST = 'http://ds1.ds.static.rtbf.be'
+    _PROVIDERS = {
+        'YOUTUBE': 'Youtube',
+        'DAILYMOTION': 'Dailymotion',
+        'VIMEO': 'Vimeo',
     }
-
     _QUALITIES = [
-        ('mobile', 'mobile'),
-        ('web', 'SD'),
-        ('url', 'MD'),
+        ('mobile', 'SD'),
+        ('web', 'MD'),
         ('high', 'HD'),
     ]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
+        data = self._download_json(
+            'http://www.rtbf.be/api/media/video?method=getVideoDetail&args[]=%s' % video_id, video_id)
+
+        error = data.get('error')
+        if error:
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
 
-        webpage = self._download_webpage(
-            'http://www.rtbf.be/video/embed?id=%s' % video_id, video_id)
+        data = data['data']
 
-        data = self._parse_json(
-            unescapeHTML(self._search_regex(
-                r'data-video="([^"]+)"', webpage, 'data video')),
-            video_id)
+        provider = data.get('provider')
+        if provider in self._PROVIDERS:
+            return self.url_result(data['url'], self._PROVIDERS[provider])
 
-        if data.get('provider').lower() == 'youtube':
-            video_url = data.get('downloadUrl') or data.get('url')
-            return self.url_result(video_url, 'Youtube')
         formats = []
         for key, format_id in self._QUALITIES:
-            format_url = data['sources'].get(key)
+            format_url = data.get(key + 'Url')
             if format_url:
                 formats.append({
                     'format_id': format_id,
                     'url': format_url,
                 })
 
+        thumbnails = []
+        for thumbnail_id, thumbnail_url in data.get('thumbnail', {}).items():
+            if thumbnail_id != 'default':
+                thumbnails.append({
+                    'url': self._IMAGE_HOST + thumbnail_url,
+                    'id': thumbnail_id,
+                })
+
         return {
             'id': video_id,
             'formats': formats,
             'title': data['title'],
             'description': data.get('description') or data.get('subtitle'),
-            'thumbnail': data.get('thumbnail'),
+            'thumbnails': thumbnails,
             'duration': data.get('duration') or data.get('realDuration'),
             'timestamp': int_or_none(data.get('created')),
             'view_count': int_or_none(data.get('viewCount')),
+            'uploader': data.get('channel'),
+            'tags': data.get('tags'),
         }
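
The reworked _QUALITIES table is keyed against the new getVideoDetail payload, which (inferred from the data.get(key + 'Url') lookup) exposes progressive URLs under mobileUrl/webUrl/highUrl. A minimal sketch with an invented payload:

    # field names inferred from the key + 'Url' lookup; values are placeholders
    data = {
        'mobileUrl': 'http://example.invalid/video_sd.mp4',
        'webUrl': 'http://example.invalid/video_md.mp4',
        # 'highUrl' may be absent when no HD rendition exists
    }
    formats = []
    for key, format_id in [('mobile', 'SD'), ('web', 'MD'), ('high', 'HD')]:
        format_url = data.get(key + 'Url')
        if format_url:
            formats.append({'format_id': format_id, 'url': format_url})
    assert [f['format_id'] for f in formats] == ['SD', 'MD']
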
index 04158b9931c09d9d028175dc895e90cf7da6a7f3..ebe563ebb89e86e28a6bf55669cd066aca44d851 100644 (file)
@@ -1,24 +1,29 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
-from .common import InfoExtractor
+import re
 
+from .common import InfoExtractor
 from ..utils import (
     float_or_none,
+    parse_iso8601,
+    unescapeHTML,
 )
 
 
 class RteIE(InfoExtractor):
-    _VALID_URL = r'http?://(?:www\.)?rte\.ie/player/[^/]{2,3}/show/(?P<id>[0-9]+)/'
+    IE_NAME = 'rte'
+    IE_DESC = 'Raidió Teilifís Éireann TV'
+    _VALID_URL = r'https?://(?:www\.)?rte\.ie/player/[^/]{2,3}/show/[^/]+/(?P<id>[0-9]+)'
     _TEST = {
-        'url': 'http://www.rte.ie/player/de/show/10363114/',
+        'url': 'http://www.rte.ie/player/ie/show/iwitness-862/10478715/',
         'info_dict': {
-            'id': '10363114',
-            'ext': 'mp4',
-            'title': 'One News',
+            'id': '10478715',
+            'ext': 'flv',
+            'title': 'Watch iWitness  online',
             'thumbnail': 're:^https?://.*\.jpg$',
-            'description': 'The One O\'Clock News followed by Weather.',
-            'duration': 436.844,
+            'description': 'iWitness : The spirit of Ireland, one voice and one minute at a time.',
+            'duration': 60.046,
         },
         'params': {
             'skip_download': 'f4m fails with --test atm'
@@ -34,23 +39,22 @@ class RteIE(InfoExtractor):
         duration = float_or_none(self._html_search_meta(
             'duration', webpage, 'duration', fatal=False), 1000)
 
-        thumbnail_id = self._search_regex(
-            r'<meta name="thumbnail" content="uri:irus:(.*?)" />', webpage, 'thumbnail')
-        thumbnail = 'http://img.rasset.ie/' + thumbnail_id + '.jpg'
+        thumbnail = None
+        thumbnail_meta = self._html_search_meta('thumbnail', webpage)
+        if thumbnail_meta:
+            thumbnail_id = self._search_regex(
+                r'uri:irus:(.+)', thumbnail_meta,
+                'thumbnail id', fatal=False)
+            if thumbnail_id:
+                thumbnail = 'http://img.rasset.ie/%s.jpg' % thumbnail_id
 
-        feeds_url = self._html_search_meta("feeds-prefix", webpage, 'feeds url') + video_id
+        feeds_url = self._html_search_meta('feeds-prefix', webpage, 'feeds url') + video_id
         json_string = self._download_json(feeds_url, video_id)
 
         # f4m_url = server + relative_url
         f4m_url = json_string['shows'][0]['media:group'][0]['rte:server'] + json_string['shows'][0]['media:group'][0]['url']
         f4m_formats = self._extract_f4m_formats(f4m_url, video_id)
-        f4m_formats = [{
-            'format_id': f['format_id'],
-            'url': f['url'],
-            'ext': 'mp4',
-            'width': f['width'],
-            'height': f['height'],
-        } for f in f4m_formats]
+        self._sort_formats(f4m_formats)
 
         return {
             'id': video_id,
@@ -60,3 +64,102 @@ class RteIE(InfoExtractor):
             'thumbnail': thumbnail,
             'duration': duration,
         }
+
+
+class RteRadioIE(InfoExtractor):
+    IE_NAME = 'rte:radio'
+    IE_DESC = 'Raidió Teilifís Éireann radio'
+    # Radioplayer URLs have two distinct specifier formats,
+    # the old format #!rii=<channel_id>:<id>:<playable_item_id>:<date>:
+    # the new format #!rii=b<channel_id>_<id>_<playable_item_id>_<date>_
+    # where the IDs are int/empty, the date is DD-MM-YYYY, and the specifier may be truncated.
+    # An <id> uniquely defines an individual recording, and is the only part we require.
+    _VALID_URL = r'https?://(?:www\.)?rte\.ie/radio/utils/radioplayer/rteradioweb\.html#!rii=(?:b?[0-9]*)(?:%3A|:|%5F|_)(?P<id>[0-9]+)'
+
+    _TESTS = [{
+        # Old-style player URL; HLS and RTMPE formats
+        'url': 'http://www.rte.ie/radio/utils/radioplayer/rteradioweb.html#!rii=16:10507902:2414:27-12-2015:',
+        'info_dict': {
+            'id': '10507902',
+            'ext': 'mp4',
+            'title': 'Gloria',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'description': 'md5:9ce124a7fb41559ec68f06387cabddf0',
+            'timestamp': 1451203200,
+            'upload_date': '20151227',
+            'duration': 7230.0,
+        },
+        'params': {
+            'skip_download': 'f4m fails with --test atm'
+        }
+    }, {
+        # New-style player URL; RTMPE formats only
+        'url': 'http://rte.ie/radio/utils/radioplayer/rteradioweb.html#!rii=b16_3250678_8861_06-04-2012_',
+        'info_dict': {
+            'id': '3250678',
+            'ext': 'flv',
+            'title': 'The Lyric Concert with Paul Herriott',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'description': '',
+            'timestamp': 1333742400,
+            'upload_date': '20120406',
+            'duration': 7199.016,
+        },
+        'params': {
+            'skip_download': 'f4m fails with --test atm'
+        }
+    }]
+
+    def _real_extract(self, url):
+        item_id = self._match_id(url)
+
+        json_string = self._download_json(
+            'http://www.rte.ie/rteavgen/getplaylist/?type=web&format=json&id=' + item_id,
+            item_id)
+
+        # NB the string values in the JSON are stored using XML escaping(!)
+        show = json_string['shows'][0]
+        title = unescapeHTML(show['title'])
+        description = unescapeHTML(show.get('description'))
+        thumbnail = show.get('thumbnail')
+        duration = float_or_none(show.get('duration'), 1000)
+        timestamp = parse_iso8601(show.get('published'))
+
+        mg = show['media:group'][0]
+
+        formats = []
+
+        if mg.get('url'):
+            m = re.match(r'(?P<url>rtmpe?://[^/]+)/(?P<app>.+)/(?P<playpath>mp4:.*)', mg['url'])
+            if m:
+                m = m.groupdict()
+                formats.append({
+                    'url': m['url'] + '/' + m['app'],
+                    'app': m['app'],
+                    'play_path': m['playpath'],
+                    'player_url': url,
+                    'ext': 'flv',
+                    'format_id': 'rtmp',
+                })
+
+        if mg.get('hls_server') and mg.get('hls_url'):
+            formats.extend(self._extract_m3u8_formats(
+                mg['hls_server'] + mg['hls_url'], item_id, 'mp4',
+                entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
+
+        if mg.get('hds_server') and mg.get('hds_url'):
+            formats.extend(self._extract_f4m_formats(
+                mg['hds_server'] + mg['hds_url'], item_id,
+                f4m_id='hds', fatal=False))
+
+        self._sort_formats(formats)
+
+        return {
+            'id': item_id,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'timestamp': timestamp,
+            'duration': duration,
+            'formats': formats,
+        }
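
Both specifier styles resolve to the same <id> capture; the two test URLs above demonstrate this:

    import re
    from youtube_dl.extractor.rte import RteRadioIE

    for url in (
            'http://www.rte.ie/radio/utils/radioplayer/rteradioweb.html#!rii=16:10507902:2414:27-12-2015:',
            'http://rte.ie/radio/utils/radioplayer/rteradioweb.html#!rii=b16_3250678_8861_06-04-2012_'):
        print(re.match(RteRadioIE._VALID_URL, url).group('id'))
    # 10507902
    # 3250678
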
index 72cd80498328ca4af9a9ac008afb16c5f0300c30..de004671d564eb455e45361666fd304f8ca040a6 100644 (file)
@@ -1,6 +1,7 @@
 # encoding: utf-8
 from __future__ import unicode_literals
 
+import re
 from .common import InfoExtractor
 
 
@@ -8,22 +9,28 @@ class RTL2IE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?rtl2\.de/[^?#]*?/(?P<id>[^?#/]*?)(?:$|/(?:$|[?#]))'
     _TESTS = [{
         'url': 'http://www.rtl2.de/sendung/grip-das-motormagazin/folge/folge-203-0',
-        'md5': 'bfcc179030535b08dc2b36b469b5adc7',
         'info_dict': {
             'id': 'folge-203-0',
             'ext': 'f4v',
             'title': 'GRIP sucht den Sommerkönig',
             'description': 'Matthias, Det und Helge treten gegeneinander an.'
         },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
     }, {
         'url': 'http://www.rtl2.de/sendung/koeln-50667/video/5512-anna/21040-anna-erwischt-alex/',
-        'md5': 'ffcd517d2805b57ce11a58a2980c2b02',
         'info_dict': {
             'id': '21040-anna-erwischt-alex',
             'ext': 'mp4',
             'title': 'Anna erwischt Alex!',
             'description': 'Anna ist Alex\' Tochter bei Köln 50667.'
         },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
     }]
 
     def _real_extract(self, url):
@@ -34,12 +41,18 @@ class RTL2IE(InfoExtractor):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
 
-        vico_id = self._html_search_regex(
-            r'vico_id\s*:\s*([0-9]+)', webpage, 'vico_id')
-        vivi_id = self._html_search_regex(
-            r'vivi_id\s*:\s*([0-9]+)', webpage, 'vivi_id')
+        mobj = re.search(
+            r'<div[^>]+data-collection="(?P<vico_id>\d+)"[^>]+data-video="(?P<vivi_id>\d+)"',
+            webpage)
+        if mobj:
+            vico_id = mobj.group('vico_id')
+            vivi_id = mobj.group('vivi_id')
+        else:
+            vico_id = self._html_search_regex(
+                r'vico_id\s*:\s*([0-9]+)', webpage, 'vico_id')
+            vivi_id = self._html_search_regex(
+                r'vivi_id\s*:\s*([0-9]+)', webpage, 'vivi_id')
         info_url = 'http://www.rtl2.de/video/php/get_video.php?vico_id=' + vico_id + '&vivi_id=' + vivi_id
-        webpage = self._download_webpage(info_url, '')
 
         info = self._download_json(info_url, video_id)
         video_info = info['video']
@@ -50,7 +63,7 @@ class RTL2IE(InfoExtractor):
         download_url = video_info['streamurl']
         download_url = download_url.replace('\\', '')
         stream_url = 'mp4:' + self._html_search_regex(r'ondemand/(.*)', download_url, 'stream URL')
-        rtmp_conn = ["S:connect", "O:1", "NS:pageUrl:" + url, "NB:fpad:0", "NN:videoFunction:1", "O:0"]
+        rtmp_conn = ['S:connect', 'O:1', 'NS:pageUrl:' + url, 'NB:fpad:0', 'NN:videoFunction:1', 'O:0']
 
         formats = [{
             'url': download_url,
index ecf4939cdc031683eca7ddd7240a2439f803947d..82b323cdd4e40b027d3a6c2c06e9ea9d58b171e2 100644 (file)
@@ -18,6 +18,10 @@ class RTPIE(InfoExtractor):
             'description': 'As paixões musicais de António Cartaxo e António Macedo',
             'thumbnail': 're:^https?://.*\.jpg',
         },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
     }, {
         'url': 'http://www.rtp.pt/play/p831/a-quimica-das-coisas',
         'only_matching': True,
index 12639f08bbc24b2c520b5d93a36b61dbb5e7d831..3cc32847b7d0ffb937465a4b5f2d9f33f864bc09 100644 (file)
@@ -3,7 +3,7 @@ from __future__ import unicode_literals
 
 import re
 
-from .common import InfoExtractor
+from .srgssr import SRGSSRIE
 from ..compat import (
     compat_str,
     compat_urllib_parse_urlparse,
@@ -17,23 +17,14 @@ from ..utils import (
 )
 
 
-class RTSIE(InfoExtractor):
+class RTSIE(SRGSSRIE):
     IE_DESC = 'RTS.ch'
-    _VALID_URL = r'''(?x)
-                    (?:
-                        rts:(?P<rts_id>\d+)|
-                        https?://
-                            (?:www\.)?rts\.ch/
-                            (?:
-                                (?:[^/]+/){2,}(?P<id>[0-9]+)-(?P<display_id>.+?)\.html|
-                                play/tv/[^/]+/video/(?P<display_id_new>.+?)\?id=(?P<id_new>[0-9]+)
-                            )
-                    )'''
+    _VALID_URL = r'rts:(?P<rts_id>\d+)|https?://(?:www\.)?rts\.ch/(?:[^/]+/){2,}(?P<id>[0-9]+)-(?P<display_id>.+?)\.html'
 
     _TESTS = [
         {
             'url': 'http://www.rts.ch/archives/tv/divers/3449373-les-enfants-terribles.html',
-            'md5': '753b877968ad8afaeddccc374d4256a5',
+            'md5': 'f254c4b26fb1d3c183793d52bc40d3e7',
             'info_dict': {
                 'id': '3449373',
                 'display_id': 'les-enfants-terribles',
@@ -47,13 +38,17 @@ class RTSIE(InfoExtractor):
                 'thumbnail': 're:^https?://.*\.image',
                 'view_count': int,
             },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            }
         },
         {
             'url': 'http://www.rts.ch/emissions/passe-moi-les-jumelles/5624067-entre-ciel-et-mer.html',
-            'md5': 'c148457a27bdc9e5b1ffe081a7a8337b',
+            'md5': 'f1077ac5af686c76528dc8d7c5df29ba',
             'info_dict': {
-                'id': '5624067',
-                'display_id': 'entre-ciel-et-mer',
+                'id': '5742494',
+                'display_id': '5742494',
                 'ext': 'mp4',
                 'duration': 3720,
                 'title': 'Les yeux dans les cieux - Mon homard au Canada',
@@ -64,6 +59,10 @@ class RTSIE(InfoExtractor):
                 'thumbnail': 're:^https?://.*\.image',
                 'view_count': int,
             },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            }
         },
         {
             'url': 'http://www.rts.ch/video/sport/hockey/5745975-1-2-kloten-fribourg-5-2-second-but-pour-gotteron-par-kwiatowski.html',
@@ -85,7 +84,7 @@ class RTSIE(InfoExtractor):
         },
         {
             'url': 'http://www.rts.ch/video/info/journal-continu/5745356-londres-cachee-par-un-epais-smog.html',
-            'md5': '9bb06503773c07ce83d3cbd793cebb91',
+            'md5': '9f713382f15322181bb366cc8c3a4ff0',
             'info_dict': {
                 'id': '5745356',
                 'display_id': 'londres-cachee-par-un-epais-smog',
@@ -99,6 +98,10 @@ class RTSIE(InfoExtractor):
                 'thumbnail': 're:^https?://.*\.image',
                 'view_count': int,
             },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            }
         },
         {
             'url': 'http://www.rts.ch/audio/couleur3/programmes/la-belle-video-de-stephane-laurenceau/5706148-urban-hippie-de-damien-krisl-03-04-2014.html',
@@ -114,23 +117,6 @@ class RTSIE(InfoExtractor):
                 'timestamp': 1396551600,
             },
         },
-        {
-            'url': 'http://www.rts.ch/play/tv/-/video/le-19h30?id=6348260',
-            'md5': '968777c8779e5aa2434be96c54e19743',
-            'info_dict': {
-                'id': '6348260',
-                'display_id': 'le-19h30',
-                'ext': 'mp4',
-                'duration': 1796,
-                'title': 'Le 19h30',
-                'description': '',
-                'uploader': 'Le 19h30',
-                'upload_date': '20141201',
-                'timestamp': 1417458600,
-                'thumbnail': 're:^https?://.*\.image',
-                'view_count': int,
-            },
-        },
         {
             # article with videos on rhs
             'url': 'http://www.rts.ch/sport/hockey/6693917-hockey-davos-decroche-son-31e-titre-de-champion-de-suisse.html',
@@ -139,42 +125,47 @@ class RTSIE(InfoExtractor):
                 'title': 'Hockey: Davos décroche son 31e titre de champion de Suisse',
             },
             'playlist_mincount': 5,
-        },
-        {
-            'url': 'http://www.rts.ch/play/tv/le-19h30/video/le-chantier-du-nouveau-parlement-vaudois-a-permis-une-trouvaille-historique?id=6348280',
-            'only_matching': True,
         }
     ]
 
     def _real_extract(self, url):
         m = re.match(self._VALID_URL, url)
-        video_id = m.group('rts_id') or m.group('id') or m.group('id_new')
-        display_id = m.group('display_id') or m.group('display_id_new')
+        media_id = m.group('rts_id') or m.group('id')
+        display_id = m.group('display_id') or media_id
 
         def download_json(internal_id):
             return self._download_json(
                 'http://www.rts.ch/a/%s.html?f=json/article' % internal_id,
                 display_id)
 
-        all_info = download_json(video_id)
+        all_info = download_json(media_id)
 
-        # video_id extracted out of URL is not always a real id
+        # media_id extracted out of URL is not always a real id
         if 'video' not in all_info and 'audio' not in all_info:
             page = self._download_webpage(url, display_id)
 
             # article with videos on rhs
             videos = re.findall(
-                r'<article[^>]+class="content-item"[^>]*>\s*<a[^>]+data-video-urn="urn:rts:video:(\d+)"',
+                r'<article[^>]+class="content-item"[^>]*>\s*<a[^>]+data-video-urn="urn:([^"]+)"',
                 page)
+            if not videos:
+                videos = re.findall(
+                    r'(?s)<iframe[^>]+class="srg-player"[^>]+src="[^"]+urn:([^"]+)"',
+                    page)
             if videos:
-                entries = [self.url_result('rts:%s' % video_urn, 'RTS') for video_urn in videos]
-                return self.playlist_result(entries, video_id, self._og_search_title(page))
+                entries = [self.url_result('srgssr:%s' % video_urn, 'SRGSSR') for video_urn in videos]
+                return self.playlist_result(entries, media_id, self._og_search_title(page))
 
             internal_id = self._html_search_regex(
                 r'<(?:video|audio) data-id="([0-9]+)"', page,
                 'internal video id')
             all_info = download_json(internal_id)
 
+        media_type = 'video' if 'video' in all_info else 'audio'
+
+        # check for errors and geo restrictions (get_media_data raises on blocked media)
+        self.get_media_data('rts', media_type, media_id)
+
         info = all_info['video']['JSONinfo'] if 'video' in all_info else all_info['audio']
 
         upload_timestamp = parse_iso8601(info.get('broadcast_date'))
@@ -190,19 +181,23 @@ class RTSIE(InfoExtractor):
 
         formats = []
         for format_id, format_url in info['streams'].items():
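+            # prefer the multi-bitrate hds/hls masters; skip the redundant
+            # single-bitrate *_sd variants when a master is present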
+            if format_id == 'hds_sd' and 'hds' in info['streams']:
+                continue
+            if format_id == 'hls_sd' and 'hls' in info['streams']:
+                continue
             if format_url.endswith('.f4m'):
                 token = self._download_xml(
                     'http://tp.srgssr.ch/token/akahd.xml?stream=%s/*' % compat_urllib_parse_urlparse(format_url).path,
-                    video_id, 'Downloading %s token' % format_id)
+                    media_id, 'Downloading %s token' % format_id)
                 auth_params = xpath_text(token, './/authparams', 'auth params')
                 if not auth_params:
                     continue
                 formats.extend(self._extract_f4m_formats(
                     '%s?%s&hdcore=3.4.0&plugin=aasp-3.4.0.132.66' % (format_url, auth_params),
-                    video_id, f4m_id=format_id))
+                    media_id, f4m_id=format_id, fatal=False))
             elif format_url.endswith('.m3u8'):
                 formats.extend(self._extract_m3u8_formats(
-                    format_url, video_id, 'mp4', m3u8_id=format_id))
+                    format_url, media_id, 'mp4', 'm3u8_native', m3u8_id=format_id, fatal=False))
             else:
                 formats.append({
                     'format_id': format_id,
@@ -217,11 +212,11 @@ class RTSIE(InfoExtractor):
                 'tbr': media['rate'] or extract_bitrate(media['url']),
             } for media in info['media'] if media.get('rate')])
 
-        self._check_formats(formats, video_id)
+        self._check_formats(formats, media_id)
         self._sort_formats(formats)
 
         return {
-            'id': video_id,
+            'id': media_id,
             'display_id': display_id,
             'formats': formats,
             'title': info['title'],
index 82cd98ac742bf436b24fbbc77cac9a6fb8a44ff6..79af477158630503078d86b117f960a36f5f1f73 100644 (file)
@@ -6,11 +6,12 @@ import re
 import time
 
 from .common import InfoExtractor
-from ..compat import compat_urlparse
 from ..utils import (
     ExtractorError,
     float_or_none,
     remove_end,
+    remove_start,
+    sanitized_Request,
     std_headers,
     struct_unpack,
 )
@@ -61,7 +62,7 @@ def _decrypt_url(png):
 class RTVEALaCartaIE(InfoExtractor):
     IE_NAME = 'rtve.es:alacarta'
     IE_DESC = 'RTVE a la carta'
-    _VALID_URL = r'http://www\.rtve\.es/(m/)?alacarta/videos/[^/]+/[^/]+/(?P<id>\d+)'
+    _VALID_URL = r'https?://www\.rtve\.es/(m/)?alacarta/videos/[^/]+/[^/]+/(?P<id>\d+)'
 
     _TESTS = [{
         'url': 'http://www.rtve.es/alacarta/videos/balonmano/o-swiss-cup-masculina-final-espana-suecia/2491869/',
@@ -102,18 +103,14 @@ class RTVEALaCartaIE(InfoExtractor):
         if info['state'] == 'DESPU':
             raise ExtractorError('The video is no longer available', expected=True)
         png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/%s/videos/%s.png' % (self._manager, video_id)
-        png = self._download_webpage(png_url, video_id, 'Downloading url information')
+        png_request = sanitized_Request(png_url)
+        png_request.add_header('Referer', url)
+        png = self._download_webpage(png_request, video_id, 'Downloading url information')
         video_url = _decrypt_url(png)
         if not video_url.endswith('.f4m'):
-            auth_url = video_url.replace(
+            video_url = video_url.replace(
                 'resources/', 'auth/resources/'
             ).replace('.net.rtve', '.multimedia.cdn.rtve')
-            video_path = self._download_webpage(
-                auth_url, video_id, 'Getting video url')
-            # Use mvod1.akcdn instead of flash.akamaihd.multimedia.cdn to get
-            # the right Content-Length header and the mp4 format
-            video_url = compat_urlparse.urljoin(
-                'http://mvod1.akcdn.rtve.es/', video_path)
 
         subtitles = None
         if info.get('sbtFile') is not None:
@@ -182,14 +179,14 @@ class RTVEInfantilIE(InfoExtractor):
 class RTVELiveIE(InfoExtractor):
     IE_NAME = 'rtve.es:live'
     IE_DESC = 'RTVE.es live streams'
-    _VALID_URL = r'http://www\.rtve\.es/(?:deportes/directo|noticias|television)/(?P<id>[a-zA-Z0-9-]+)'
+    _VALID_URL = r'https?://www\.rtve\.es/directo/(?P<id>[a-zA-Z0-9-]+)'
 
     _TESTS = [{
-        'url': 'http://www.rtve.es/noticias/directo-la-1/',
+        'url': 'http://www.rtve.es/directo/la-1/',
         'info_dict': {
-            'id': 'directo-la-1',
-            'ext': 'flv',
-            'title': 're:^La 1 de TVE [0-9]{4}-[0-9]{2}-[0-9]{2}Z[0-9]{6}$',
+            'id': 'la-1',
+            'ext': 'mp4',
+            'title': 're:^La 1 [0-9]{4}-[0-9]{2}-[0-9]{2}Z[0-9]{6}$',
         },
         'params': {
             'skip_download': 'live stream',
@@ -202,23 +199,21 @@ class RTVELiveIE(InfoExtractor):
         video_id = mobj.group('id')
 
         webpage = self._download_webpage(url, video_id)
-        player_url = self._search_regex(
-            r'<param name="movie" value="([^"]+)"/>', webpage, 'player URL')
-        title = remove_end(self._og_search_title(webpage), ' en directo')
+        title = remove_end(self._og_search_title(webpage), ' en directo en RTVE.es')
+        title = remove_start(title, 'Estoy viendo ')
         title += ' ' + time.strftime('%Y-%m-%dZ%H%M%S', start_time)
 
         vidplayer_id = self._search_regex(
-            r' id="vidplayer([0-9]+)"', webpage, 'internal video ID')
-        png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/default/videos/%s.png' % vidplayer_id
+            r'playerId=player([0-9]+)', webpage, 'internal video ID')
+        png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/amonet/videos/%s.png' % vidplayer_id
         png = self._download_webpage(png_url, video_id, 'Downloading url information')
-        video_url = _decrypt_url(png)
+        m3u8_url = _decrypt_url(png)
+        formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
+        self._sort_formats(formats)
 
         return {
             'id': video_id,
-            'ext': 'flv',
             'title': title,
-            'url': video_url,
-            'app': 'rtve-live-live?ovpfv=2.1.2',
-            'player_url': player_url,
-            'rtmp_live': True,
+            'formats': formats,
+            'is_live': True,
         }
diff --git a/youtube_dl/extractor/rtvnh.py b/youtube_dl/extractor/rtvnh.py
new file mode 100644 (file)
index 0000000..4896d09
--- /dev/null
@@ -0,0 +1,48 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import ExtractorError
+
+
+class RTVNHIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?rtvnh\.nl/video/(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.rtvnh.nl/video/131946',
+        'md5': '6e1d0ab079e2a00b6161442d3ceacfc1',
+        'info_dict': {
+            'id': '131946',
+            'ext': 'mp4',
+            'title': 'Grote zoektocht in zee bij Zandvoort naar vermiste vrouw',
+            'thumbnail': 're:^https?:.*\.jpg$'
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        meta = self._parse_json(self._download_webpage(
+            'http://www.rtvnh.nl/video/json?m=' + video_id, video_id), video_id)
+
+        status = meta.get('status')
+        if status != 200:
+            raise ExtractorError(
+                '%s returned error code %s' % (self.IE_NAME, status), expected=True)
+
+        formats = self._extract_smil_formats(
+            'http://www.rtvnh.nl/video/smil?m=' + video_id, video_id, fatal=False)
+
+        for item in meta['source']['fb']:
+            if item.get('type') == 'hls':
+                formats.extend(self._extract_m3u8_formats(
+                    item['file'], video_id, ext='mp4', entry_protocol='m3u8_native'))
+            elif item.get('type') == '':
+                formats.append({'url': item['file']})
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': meta['title'].strip(),
+            'thumbnail': meta.get('image'),
+            'formats': formats
+        }
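
The JSON endpoint wraps its payload in a status envelope; a sketch of the expected shape (field names from the code above, values invented) shows why the title is stripped and why empty-type entries are treated as plain progressive URLs:

    meta = {
        'status': 200,
        'title': 'Grote zoektocht in zee bij Zandvoort naar vermiste vrouw ',  # note trailing space
        'image': 'http://example.invalid/thumb.jpg',
        'source': {
            'fb': [
                {'type': 'hls', 'file': 'http://example.invalid/playlist.m3u8'},
                {'type': '', 'file': 'http://example.invalid/video.mp4'},
            ],
        },
    }
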
index 0e470e73f538fd60d7ed34cbe515042f6abc078b..1f7c262993c8ce7e0d602f612fc6316e80052f66 100644 (file)
@@ -5,7 +5,7 @@ from .common import InfoExtractor
 
 
 class RUHDIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?ruhd\.ru/play\.php\?vid=(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?ruhd\.ru/play\.php\?vid=(?P<id>\d+)'
     _TEST = {
         'url': 'http://www.ruhd.ru/play.php?vid=207',
         'md5': 'd1a9ec4edf8598e3fbd92bb16072ba83',
diff --git a/youtube_dl/extractor/ruleporn.py b/youtube_dl/extractor/ruleporn.py
new file mode 100644 (file)
index 0000000..ebf9808
--- /dev/null
@@ -0,0 +1,44 @@
+from __future__ import unicode_literals
+
+from .nuevo import NuevoBaseIE
+
+
+class RulePornIE(NuevoBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?ruleporn\.com/(?:[^/?#&]+/)*(?P<id>[^/?#&]+)'
+    _TEST = {
+        'url': 'http://ruleporn.com/brunette-nympho-chick-takes-her-boyfriend-in-every-angle/',
+        'md5': '86861ebc624a1097c7c10eaf06d7d505',
+        'info_dict': {
+            'id': '48212',
+            'display_id': 'brunette-nympho-chick-takes-her-boyfriend-in-every-angle',
+            'ext': 'mp4',
+            'title': 'Brunette Nympho Chick Takes Her Boyfriend In Every Angle',
+            'description': 'md5:6d28be231b981fff1981deaaa03a04d5',
+            'age_limit': 18,
+            'duration': 635.1,
+        }
+    }
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        video_id = self._search_regex(
+            r'lovehomeporn\.com/embed/(\d+)', webpage, 'video id')
+
+        title = self._search_regex(
+            r'<h2[^>]+title=(["\'])(?P<url>.+?)\1',
+            webpage, 'title', group='url')
+        description = self._html_search_meta('description', webpage)
+
+        info = self._extract_nuevo(
+            'http://lovehomeporn.com/media/nuevo/econfig.php?key=%s&rp=true' % video_id,
+            video_id)
+        info.update({
+            'display_id': display_id,
+            'title': title,
+            'description': description,
+            'age_limit': 18
+        })
+        return info
index 5b1c3577a02bb541d912faa6958533086534af01..9ca4ae147cb1e3c430de3abd9fd0927aaee2ed5a 100644 (file)
@@ -9,7 +9,7 @@ from ..compat import (
     compat_str,
 )
 from ..utils import (
-    ExtractorError,
+    determine_ext,
     unified_strdate,
 )
 
@@ -17,9 +17,9 @@ from ..utils import (
 class RutubeIE(InfoExtractor):
     IE_NAME = 'rutube'
     IE_DESC = 'Rutube videos'
-    _VALID_URL = r'https?://rutube\.ru/video/(?P<id>[\da-z]{32})'
+    _VALID_URL = r'https?://rutube\.ru/(?:video|play/embed)/(?P<id>[\da-z]{32})'
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://rutube.ru/video/3eac3b4561676c17df9132a9a1e62e3e/',
         'info_dict': {
             'id': '3eac3b4561676c17df9132a9a1e62e3e',
@@ -30,12 +30,16 @@ class RutubeIE(InfoExtractor):
             'uploader': 'NTDRussian',
             'uploader_id': '29790',
             'upload_date': '20131016',
+            'age_limit': 0,
         },
         'params': {
             # It requires ffmpeg (m3u8 download)
             'skip_download': True,
         },
-    }
+    }, {
+        'url': 'http://rutube.ru/play/embed/a10e53b86e8f349080f718582ce4c661',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
@@ -50,10 +54,21 @@ class RutubeIE(InfoExtractor):
             'http://rutube.ru/api/play/options/%s/?format=json' % video_id,
             video_id, 'Downloading options JSON')
 
-        m3u8_url = options['video_balancer'].get('m3u8')
-        if m3u8_url is None:
-            raise ExtractorError('Couldn\'t find m3u8 manifest url')
-        formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
+        formats = []
+        for format_id, format_url in options['video_balancer'].items():
+            ext = determine_ext(format_url)
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    format_url, video_id, 'mp4', m3u8_id=format_id, fatal=False))
+            elif ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    format_url, video_id, f4m_id=format_id, fatal=False))
+            else:
+                formats.append({
+                    'url': format_url,
+                    'format_id': format_id,
+                })
+        self._sort_formats(formats)
 
         return {
             'id': video['id'],
@@ -73,9 +88,9 @@ class RutubeIE(InfoExtractor):
 class RutubeEmbedIE(InfoExtractor):
     IE_NAME = 'rutube:embed'
     IE_DESC = 'Rutube embedded videos'
-    _VALID_URL = 'https?://rutube\.ru/video/embed/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://rutube\.ru/(?:video|play)/embed/(?P<id>[0-9]+)'
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://rutube.ru/video/embed/6722881?vk_puid37=&vk_puid38=',
         'info_dict': {
             'id': 'a10e53b86e8f349080f718582ce4c661',
@@ -89,7 +104,10 @@ class RutubeEmbedIE(InfoExtractor):
         'params': {
             'skip_download': 'Requires ffmpeg',
         },
-    }
+    }, {
+        'url': 'http://rutube.ru/play/embed/8083783',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         embed_id = self._match_id(url)
@@ -104,7 +122,7 @@ class RutubeEmbedIE(InfoExtractor):
 class RutubeChannelIE(InfoExtractor):
     IE_NAME = 'rutube:channel'
     IE_DESC = 'Rutube channels'
-    _VALID_URL = r'http://rutube\.ru/tags/video/(?P<id>\d+)'
+    _VALID_URL = r'https?://rutube\.ru/tags/video/(?P<id>\d+)'
     _TESTS = [{
         'url': 'http://rutube.ru/tags/video/1800/',
         'info_dict': {
@@ -138,7 +156,7 @@ class RutubeChannelIE(InfoExtractor):
 class RutubeMovieIE(RutubeChannelIE):
     IE_NAME = 'rutube:movie'
     IE_DESC = 'Rutube movies'
-    _VALID_URL = r'http://rutube\.ru/metainfo/tv/(?P<id>\d+)'
+    _VALID_URL = r'https?://rutube\.ru/metainfo/tv/(?P<id>\d+)'
     _TESTS = []
 
     _MOVIE_TEMPLATE = 'http://rutube.ru/api/metainfo/tv/%s/?format=json'
@@ -156,7 +174,7 @@ class RutubeMovieIE(RutubeChannelIE):
 class RutubePersonIE(RutubeChannelIE):
     IE_NAME = 'rutube:person'
     IE_DESC = 'Rutube person videos'
-    _VALID_URL = r'http://rutube\.ru/video/person/(?P<id>\d+)'
+    _VALID_URL = r'https?://rutube\.ru/video/person/(?P<id>\d+)'
     _TESTS = [{
         'url': 'http://rutube.ru/video/person/313878/',
         'info_dict': {
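
The new RutubeIE loop walks every entry in options['video_balancer'] and picks a handler from the URL extension instead of hard-requiring m3u8. A runnable sketch of that dispatch (determine_ext is re-implemented here so the snippet has no youtube-dl dependency; sample URLs invented):

    import posixpath
    try:
        from urllib.parse import urlparse  # Python 3
    except ImportError:
        from urlparse import urlparse      # Python 2

    def determine_ext(url):
        return posixpath.splitext(urlparse(url).path)[1].lstrip('.')

    def classify(video_balancer):
        formats = []
        for format_id, format_url in video_balancer.items():
            ext = determine_ext(format_url)
            if ext == 'm3u8':
                handler = 'hls'   # would go through _extract_m3u8_formats
            elif ext == 'f4m':
                handler = 'hds'   # would go through _extract_f4m_formats
            else:
                handler = 'http'  # plain progressive format
            formats.append({'format_id': format_id, 'url': format_url, 'handler': handler})
        return formats

    for f in classify({
        'm3u8': 'http://example.com/video.m3u8',
        'default': 'http://example.com/video.mp4',
    }):
        print(f)
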
diff --git a/youtube_dl/extractor/rutv.py b/youtube_dl/extractor/rutv.py
index d9df0686133a6772deb1e58260069857620afc58..a2379eb04c2e6744a49f315ebee2a0c9fb0170f6 100644 (file)
@@ -14,7 +14,7 @@ class RUTVIE(InfoExtractor):
     IE_DESC = 'RUTV.RU'
     _VALID_URL = r'''(?x)
         https?://player\.(?:rutv\.ru|vgtrk\.com)/
-            (?P<path>flash2v/container\.swf\?id=
+            (?P<path>flash\d+v/container\.swf\?id=
             |iframe/(?P<type>swf|video|live)/id/
             |index/iframe/cast_id/)
             (?P<id>\d+)'''
@@ -109,7 +109,7 @@ class RUTVIE(InfoExtractor):
             return mobj.group('url')
 
         mobj = re.search(
-            r'<meta[^>]+?property=(["\'])og:video\1[^>]+?content=(["\'])(?P<url>https?://player\.(?:rutv\.ru|vgtrk\.com)/flash2v/container\.swf\?id=.+?\2)',
+            r'<meta[^>]+?property=(["\'])og:video\1[^>]+?content=(["\'])(?P<url>https?://player\.(?:rutv\.ru|vgtrk\.com)/flash\d+v/container\.swf\?id=.+?\2)',
             webpage)
         if mobj:
             return mobj.group('url')
@@ -119,7 +119,7 @@ class RUTVIE(InfoExtractor):
         video_id = mobj.group('id')
         video_path = mobj.group('path')
 
-        if video_path.startswith('flash2v'):
+        if re.match(r'flash\d+v', video_path):
             video_type = 'video'
         elif video_path.startswith('iframe'):
             video_type = mobj.group('type')
@@ -131,7 +131,7 @@ class RUTVIE(InfoExtractor):
         is_live = video_type == 'live'
 
         json_data = self._download_json(
-            'http://player.rutv.ru/iframe/%splay/id/%s' % ('live-' if is_live else '', video_id),
+            'http://player.rutv.ru/iframe/data%s/id/%s' % ('live' if is_live else 'video', video_id),
             video_id, 'Downloading JSON')
 
         if json_data['errors']:
@@ -168,7 +168,7 @@ class RUTVIE(InfoExtractor):
                         'play_path': mobj.group('playpath'),
                         'app': mobj.group('app'),
                         'page_url': 'http://player.rutv.ru',
-                        'player_url': 'http://player.rutv.ru/flash2v/osmf.swf?i=22',
+                        'player_url': 'http://player.rutv.ru/flash3v/osmf.swf?i=22',
                         'rtmp_live': True,
                         'ext': 'flv',
                         'vbr': int(quality),
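
The RUTV change widens the hardcoded 'flash2v' path component to flash\d+v so newer player generations keep matching; a quick check of the relaxed pattern:

    import re

    for path in ('flash2v/container.swf?id=774016', 'flash3v/container.swf?id=774016'):
        assert re.match(r'flash\d+v', path)
    print('both player generations match')
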
diff --git a/youtube_dl/extractor/ruutu.py b/youtube_dl/extractor/ruutu.py
index 4e22628d031bc462f66394384f2d398465105baf..ffea438cc4645c267c87b54a761394e0c1eca247 100644 (file)
@@ -6,19 +6,19 @@ from ..compat import compat_urllib_parse_urlparse
 from ..utils import (
     determine_ext,
     int_or_none,
+    xpath_attr,
     xpath_text,
 )
 
 
 class RuutuIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?ruutu\.fi/ohjelmat/(?:[^/?#]+/)*(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?ruutu\.fi/video/(?P<id>\d+)'
     _TESTS = [
         {
-            'url': 'http://www.ruutu.fi/ohjelmat/oletko-aina-halunnut-tietaa-mita-tapahtuu-vain-hetki-ennen-lahetysta-nyt-se-selvisi',
+            'url': 'http://www.ruutu.fi/video/2058907',
             'md5': 'ab2093f39be1ca8581963451b3c0234f',
             'info_dict': {
                 'id': '2058907',
-                'display_id': 'oletko-aina-halunnut-tietaa-mita-tapahtuu-vain-hetki-ennen-lahetysta-nyt-se-selvisi',
                 'ext': 'mp4',
                 'title': 'Oletko aina halunnut tietää mitä tapahtuu vain hetki ennen lähetystä? - Nyt se selvisi!',
                 'description': 'md5:cfc6ccf0e57a814360df464a91ff67d6',
@@ -28,14 +28,13 @@ class RuutuIE(InfoExtractor):
             },
         },
         {
-            'url': 'http://www.ruutu.fi/ohjelmat/superpesis/superpesis-katso-koko-kausi-ruudussa',
+            'url': 'http://www.ruutu.fi/video/2057306',
             'md5': '065a10ae4d5b8cfd9d0c3d332465e3d9',
             'info_dict': {
                 'id': '2057306',
-                'display_id': 'superpesis-katso-koko-kausi-ruudussa',
                 'ext': 'mp4',
                 'title': 'Superpesis: katso koko kausi Ruudussa',
-                'description': 'md5:44c44a99fdbe5b380ab74ebd75f0af77',
+                'description': 'md5:da2736052fef3b2bd5e0005e63c25eac',
                 'thumbnail': 're:^https?://.*\.jpg$',
                 'duration': 40,
                 'age_limit': 0,
@@ -44,29 +43,10 @@ class RuutuIE(InfoExtractor):
     ]
 
     def _real_extract(self, url):
-        display_id = self._match_id(url)
+        video_id = self._match_id(url)
 
-        webpage = self._download_webpage(url, display_id)
-
-        video_id = self._search_regex(
-            r'data-media-id="(\d+)"', webpage, 'media id')
-
-        video_xml_url = None
-
-        media_data = self._search_regex(
-            r'jQuery\.extend\([^,]+,\s*(.+?)\);', webpage,
-            'media data', default=None)
-        if media_data:
-            media_json = self._parse_json(media_data, display_id, fatal=False)
-            if media_json:
-                xml_url = media_json.get('ruutuplayer', {}).get('xmlUrl')
-                if xml_url:
-                    video_xml_url = xml_url.replace('{ID}', video_id)
-
-        if not video_xml_url:
-            video_xml_url = 'http://gatling.ruutu.fi/media-xml-cache?id=%s' % video_id
-
-        video_xml = self._download_xml(video_xml_url, video_id)
+        video_xml = self._download_xml(
+            'http://gatling.ruutu.fi/media-xml-cache?id=%s' % video_id, video_id)
 
         formats = []
         processed_urls = []
@@ -77,16 +57,17 @@ class RuutuIE(InfoExtractor):
                     extract_formats(child)
                 elif child.tag.endswith('File'):
                     video_url = child.text
-                    if not video_url or video_url in processed_urls or 'NOT_USED' in video_url:
+                    if (not video_url or video_url in processed_urls or
+                            any(p in video_url for p in ('NOT_USED', 'NOT-USED'))):
                         return
                     processed_urls.append(video_url)
                     ext = determine_ext(video_url)
                     if ext == 'm3u8':
                         formats.extend(self._extract_m3u8_formats(
-                            video_url, video_id, 'mp4', m3u8_id='hls'))
+                            video_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
                     elif ext == 'f4m':
                         formats.extend(self._extract_f4m_formats(
-                            video_url, video_id, f4m_id='hds'))
+                            video_url, video_id, f4m_id='hds', fatal=False))
                     else:
                         proto = compat_urllib_parse_urlparse(video_url).scheme
                         if not child.tag.startswith('HTTP') and proto != 'rtmp':
@@ -94,9 +75,12 @@ class RuutuIE(InfoExtractor):
                         preference = -1 if proto == 'rtmp' else 1
                         label = child.get('label')
                         tbr = int_or_none(child.get('bitrate'))
-                        width, height = [int_or_none(x) for x in child.get('resolution', '').split('x')]
+                        format_id = '%s-%s' % (proto, label if label else tbr) if label or tbr else proto
+                        if not self._is_valid_url(video_url, video_id, format_id):
+                            continue
+                        width, height = [int_or_none(x) for x in child.get('resolution', 'x').split('x')[:2]]
                         formats.append({
-                            'format_id': '%s-%s' % (proto, label if label else tbr),
+                            'format_id': format_id,
                             'url': video_url,
                             'width': width,
                             'height': height,
@@ -109,10 +93,9 @@ class RuutuIE(InfoExtractor):
 
         return {
             'id': video_id,
-            'display_id': display_id,
-            'title': self._og_search_title(webpage),
-            'description': self._og_search_description(webpage),
-            'thumbnail': self._og_search_thumbnail(webpage),
+            'title': xpath_attr(video_xml, './/Behavior/Program', 'program_name', 'title', fatal=True),
+            'description': xpath_attr(video_xml, './/Behavior/Program', 'description', 'description'),
+            'thumbnail': xpath_attr(video_xml, './/Behavior/Startpicture', 'href', 'thumbnail'),
             'duration': int_or_none(xpath_text(video_xml, './/Runtime', 'duration')),
             'age_limit': int_or_none(xpath_text(video_xml, './/AgeLimit', 'age limit')),
             'formats': formats,
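
extract_formats() above recurses through Ruutu's media XML, collecting every *File leaf and skipping placeholder entries. A self-contained sketch of that traversal (sample XML invented):

    import xml.etree.ElementTree as ET

    SAMPLE = """<Clip>
      <SourceFiles>
        <HTTPMediaFile>http://example.com/video.mp4</HTTPMediaFile>
        <HLSMediaFile>http://example.com/video.m3u8</HLSMediaFile>
        <HTTPMediaFile>NOT_USED</HTTPMediaFile>
      </SourceFiles>
    </Clip>"""

    def collect_files(node, seen):
        urls = []
        for child in node:
            if len(child):  # container element -> recurse
                urls.extend(collect_files(child, seen))
            elif child.tag.endswith('File'):
                url = (child.text or '').strip()
                if (not url or url in seen or
                        any(p in url for p in ('NOT_USED', 'NOT-USED'))):
                    continue
                seen.add(url)
                urls.append(url)
        return urls

    print(collect_files(ET.fromstring(SAMPLE), set()))
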
diff --git a/youtube_dl/extractor/safari.py b/youtube_dl/extractor/safari.py
index f3c80708c86ab2fc29fbd029b245bbe894af2dfb..6ba91f202baadbfd72160cc739efde868a60d421 100644 (file)
@@ -4,49 +4,45 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from .brightcove import BrightcoveIE
 
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
-)
 from ..utils import (
     ExtractorError,
-    smuggle_url,
+    sanitized_Request,
     std_headers,
+    urlencode_postdata,
+    update_url_query,
 )
 
 
 class SafariBaseIE(InfoExtractor):
     _LOGIN_URL = 'https://www.safaribooksonline.com/accounts/login/'
     _SUCCESSFUL_LOGIN_REGEX = r'<a href="/accounts/logout/"[^>]*>Sign Out</a>'
-    _ACCOUNT_CREDENTIALS_HINT = 'Use --username and --password options to supply credentials for safaribooksonline.com'
     _NETRC_MACHINE = 'safari'
 
-    _API_BASE = 'https://www.safaribooksonline.com/api/v1/book'
+    _API_BASE = 'https://www.safaribooksonline.com/api/v1'
     _API_FORMAT = 'json'
 
     LOGGED_IN = False
 
     def _real_initialize(self):
-        # We only need to log in once for courses or individual videos
-        if not self.LOGGED_IN:
-            self._login()
-            SafariBaseIE.LOGGED_IN = True
+        self._login()
 
     def _login(self):
+        # We only need to log in once for courses or individual videos
+        if self.LOGGED_IN:
+            return
+
         (username, password) = self._get_login_info()
         if username is None:
-            raise ExtractorError(
-                self._ACCOUNT_CREDENTIALS_HINT,
-                expected=True)
+            return
 
-        headers = std_headers
+        headers = std_headers.copy()
         if 'Referer' not in headers:
             headers['Referer'] = self._LOGIN_URL
+        login_page_request = sanitized_Request(self._LOGIN_URL, headers=headers)
 
         login_page = self._download_webpage(
-            self._LOGIN_URL, None,
+            login_page_request, None,
             'Downloading login form')
 
         csrf = self._html_search_regex(
@@ -61,8 +57,8 @@ class SafariBaseIE(InfoExtractor):
             'next': '',
         }
 
-        request = compat_urllib_request.Request(
-            self._LOGIN_URL, compat_urllib_parse.urlencode(login_form), headers=headers)
+        request = sanitized_Request(
+            self._LOGIN_URL, urlencode_postdata(login_form), headers=headers)
         login_page = self._download_webpage(
             request, None, 'Logging in as %s' % username)
 
@@ -71,35 +67,27 @@ class SafariBaseIE(InfoExtractor):
                 'Login failed; make sure your credentials are correct and try again.',
                 expected=True)
 
+        SafariBaseIE.LOGGED_IN = True
+
         self.to_screen('Login successful')
 
 
 class SafariIE(SafariBaseIE):
     IE_NAME = 'safari'
     IE_DESC = 'safaribooksonline.com online video'
-    _VALID_URL = r'''(?x)https?://
-                            (?:www\.)?safaribooksonline\.com/
-                                (?:
-                                    library/view/[^/]+|
-                                    api/v1/book
-                                )/
-                                (?P<course_id>[^/]+)/
-                                    (?:chapter(?:-content)?/)?
-                                (?P<part>part\d+)\.html
-    '''
+    _VALID_URL = r'https?://(?:www\.)?safaribooksonline\.com/library/view/[^/]+/(?P<course_id>[^/]+)/(?P<part>part\d+)\.html'
 
     _TESTS = [{
         'url': 'https://www.safaribooksonline.com/library/view/hadoop-fundamentals-livelessons/9780133392838/part00.html',
-        'md5': '5b0c4cc1b3c1ba15dda7344085aa5592',
+        'md5': 'dcc5a425e79f2564148652616af1f2a3',
         'info_dict': {
-            'id': '2842601850001',
+            'id': '0_qbqx90ic',
             'ext': 'mp4',
-            'title': 'Introduction',
+            'title': 'Introduction to Hadoop Fundamentals LiveLessons',
+            'timestamp': 1437758058,
+            'upload_date': '20150724',
+            'uploader_id': 'stork',
         },
-        'skip': 'Requires safaribooksonline account credentials',
-    }, {
-        'url': 'https://www.safaribooksonline.com/api/v1/book/9780133392838/chapter/part00.html',
-        'only_matching': True,
     }, {
         # non-digits in course id
         'url': 'https://www.safaribooksonline.com/library/view/create-a-nodejs/100000006A0210/part00.html',
@@ -108,18 +96,55 @@ class SafariIE(SafariBaseIE):
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        course_id = mobj.group('course_id')
-        part = mobj.group('part')
+        video_id = '%s/%s' % (mobj.group('course_id'), mobj.group('part'))
+
+        webpage = self._download_webpage(url, video_id)
+        reference_id = self._search_regex(
+            r'data-reference-id=(["\'])(?P<id>.+?)\1',
+            webpage, 'kaltura reference id', group='id')
+        partner_id = self._search_regex(
+            r'data-partner-id=(["\'])(?P<id>.+?)\1',
+            webpage, 'kaltura widget id', group='id')
+        ui_id = self._search_regex(
+            r'data-ui-id=(["\'])(?P<id>.+?)\1',
+            webpage, 'kaltura uiconf id', group='id')
+
+        query = {
+            'wid': '_%s' % partner_id,
+            'uiconf_id': ui_id,
+            'flashvars[referenceId]': reference_id,
+        }
+
+        if self.LOGGED_IN:
+            kaltura_session = self._download_json(
+                '%s/player/kaltura_session/?reference_id=%s' % (self._API_BASE, reference_id),
+                video_id, 'Downloading kaltura session JSON',
+                'Unable to download kaltura session JSON', fatal=False)
+            if kaltura_session:
+                session = kaltura_session.get('session')
+                if session:
+                    query['flashvars[ks]'] = session
 
-        webpage = self._download_webpage(
-            '%s/%s/chapter-content/%s.html' % (self._API_BASE, course_id, part),
-            part)
+        return self.url_result(update_url_query(
+            'https://cdnapisec.kaltura.com/html5/html5lib/v2.37.1/mwEmbedFrame.php', query),
+            'Kaltura')
 
-        bc_url = BrightcoveIE._extract_brightcove_url(webpage)
-        if not bc_url:
-            raise ExtractorError('Could not extract Brightcove URL from %s' % url, expected=True)
 
-        return self.url_result(smuggle_url(bc_url, {'Referer': url}), 'Brightcove')
+class SafariApiIE(SafariBaseIE):
+    IE_NAME = 'safari:api'
+    _VALID_URL = r'https?://(?:www\.)?safaribooksonline\.com/api/v1/book/(?P<course_id>[^/]+)/chapter(?:-content)?/(?P<part>part\d+)\.html'
+
+    _TEST = {
+        'url': 'https://www.safaribooksonline.com/api/v1/book/9780133392838/chapter/part00.html',
+        'only_matching': True,
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        part = self._download_json(
+            url, '%s/%s' % (mobj.group('course_id'), mobj.group('part')),
+            'Downloading part JSON')
+        return self.url_result(part['web_url'], SafariIE.ie_key())
 
 
 class SafariCourseIE(SafariBaseIE):
@@ -145,7 +170,7 @@ class SafariCourseIE(SafariBaseIE):
         course_id = self._match_id(url)
 
         course_json = self._download_json(
-            '%s/%s/?override_format=%s' % (self._API_BASE, course_id, self._API_FORMAT),
+            '%s/book/%s/?override_format=%s' % (self._API_BASE, course_id, self._API_FORMAT),
             course_id, 'Downloading course JSON')
 
         if 'chapters' not in course_json:
@@ -153,7 +178,7 @@ class SafariCourseIE(SafariBaseIE):
                 'No chapters found for course %s' % course_id, expected=True)
 
         entries = [
-            self.url_result(chapter, 'Safari')
+            self.url_result(chapter, SafariApiIE.ie_key())
             for chapter in course_json['chapters']]
 
         course_title = course_json['title']
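
SafariIE now defers playback to Kaltura, folding the partner/uiconf ids and the optional session token into the embed-frame URL. A stdlib approximation of the update_url_query step (the real helper lives in youtube_dl/utils.py; the id values below are obviously fake):

    try:
        from urllib.parse import urlparse, urlunparse, urlencode, parse_qs
    except ImportError:  # Python 2
        from urlparse import urlparse, urlunparse, parse_qs
        from urllib import urlencode

    def update_url_query(url, query):
        parts = list(urlparse(url))
        qs = parse_qs(parts[4])
        qs.update({k: [v] for k, v in query.items()})
        parts[4] = urlencode(qs, doseq=True)
        return urlunparse(parts)

    print(update_url_query(
        'https://cdnapisec.kaltura.com/html5/html5lib/v2.37.1/mwEmbedFrame.php',
        {'wid': '_12345', 'uiconf_id': '67890', 'flashvars[referenceId]': 'abcdef'}))
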
diff --git a/youtube_dl/extractor/sandia.py b/youtube_dl/extractor/sandia.py
index 9c88167f002fd664df0e6cdcf7cd1eb76b10a7d5..759898a492f43c67179409c563be42e864deae5f 100644 (file)
@@ -6,14 +6,12 @@ import json
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-    compat_urlparse,
-)
+from ..compat import compat_urlparse
 from ..utils import (
     int_or_none,
     js_to_json,
     mimetype2ext,
+    sanitized_Request,
     unified_strdate,
 )
 
@@ -37,7 +35,7 @@ class SandiaIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        req = compat_urllib_request.Request(url)
+        req = sanitized_Request(url)
         req.add_header('Cookie', 'MediasitePlayerCaps=ClientPlugins=4')
         webpage = self._download_webpage(req, video_id)
 
diff --git a/youtube_dl/extractor/sbs.py b/youtube_dl/extractor/sbs.py
index d6ee2d9e2245475d236c12fb6967af68558d8598..96472fbc44e9a78654ae7c136e9f7e4a31751a13 100644 (file)
@@ -2,6 +2,10 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
+from ..utils import (
+    smuggle_url,
+    ExtractorError,
+)
 
 
 class SBSIE(InfoExtractor):
@@ -20,6 +24,9 @@ class SBSIE(InfoExtractor):
             'description': 'md5:f250a9856fca50d22dec0b5b8015f8a5',
             'thumbnail': 're:http://.*\.jpg',
             'duration': 308,
+            'timestamp': 1408613220,
+            'upload_date': '20140821',
+            'uploader': 'SBSC',
         },
     }, {
         'url': 'http://www.sbs.com.au/ondemand/video/320403011771/Dingo-Conservation-The-Feed',
@@ -31,21 +38,29 @@ class SBSIE(InfoExtractor):
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
+        player_params = self._download_json(
+            'http://www.sbs.com.au/api/video_pdkvars/id/%s?form=json' % video_id, video_id)
 
-        webpage = self._download_webpage(
-            'http://www.sbs.com.au/ondemand/video/single/%s?context=web' % video_id, video_id)
-
-        player_params = self._parse_json(
-            self._search_regex(
-                r'(?s)var\s+playerParams\s*=\s*({.+?});', webpage, 'playerParams'),
-            video_id)
+        error = player_params.get('error')
+        if error:
+            error_message = 'Sorry, the video you are looking for does not exist.'
+            video_data = error.get('results') or {}
+            error_code = error.get('errorCode')
+            if error_code == 'ComingSoon':
+                error_message = '%s is not yet available.' % video_data.get('title', '')
+            elif error_code in ('Forbidden', 'intranetAccessOnly'):
+                error_message = 'Sorry, this video cannot be accessed via this website.'
+            elif error_code == 'Expired':
+                error_message = 'Sorry, %s is no longer available.' % video_data.get('title', '')
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message), expected=True)
 
         urls = player_params['releaseUrls']
-        theplatform_url = (urls.get('progressive') or urls.get('standard') or
-                           urls.get('html') or player_params['relatedItemsURL'])
+        theplatform_url = (urls.get('progressive') or urls.get('html') or
+                           urls.get('standard') or player_params['relatedItemsURL'])
 
         return {
             '_type': 'url_transparent',
+            'ie_key': 'ThePlatform',
             'id': video_id,
-            'url': theplatform_url,
+            'url': smuggle_url(self._proto_relative_url(theplatform_url), {'force_smil_url': True}),
         }
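
smuggle_url() packs extra hints into the URL fragment so they survive the url_transparent hop to the ThePlatform extractor. A simplified re-implementation of the round trip (youtube-dl's real version in youtube_dl/utils.py differs in encoding details but works the same way):

    import json
    try:
        from urllib.parse import quote, unquote
    except ImportError:  # Python 2
        from urllib import quote, unquote

    def smuggle_url(url, data):
        return url + '#__youtubedl_smuggle=' + quote(json.dumps(data))

    def unsmuggle_url(smug_url):
        url, _, payload = smug_url.partition('#__youtubedl_smuggle=')
        return url, json.loads(unquote(payload))

    smuggled = smuggle_url('http://link.theplatform.com/s/xyz', {'force_smil_url': True})
    print(unsmuggle_url(smuggled))
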
diff --git a/youtube_dl/extractor/screencast.py b/youtube_dl/extractor/screencast.py
index dfd897ba3a3f0a7297164fb315e4543bb597d678..3566317008712d8e378eec36c437cbc39fdc1cdc 100644 (file)
@@ -12,7 +12,7 @@ from ..utils import (
 
 
 class ScreencastIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.screencast\.com/t/(?P<id>[a-zA-Z0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?screencast\.com/t/(?P<id>[a-zA-Z0-9]+)'
     _TESTS = [{
         'url': 'http://www.screencast.com/t/3ZEjQXlT',
         'md5': '917df1c13798a3e96211dd1561fded83',
@@ -53,8 +53,10 @@ class ScreencastIE(InfoExtractor):
             'description': 'md5:7b9f393bc92af02326a5c5889639eab0',
             'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
         }
-    },
-    ]
+    }, {
+        'url': 'http://screencast.com/t/aAB3iowa',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
@@ -94,8 +96,9 @@ class ScreencastIE(InfoExtractor):
         title = self._og_search_title(webpage, default=None)
         if title is None:
             title = self._html_search_regex(
-                [r'<b>Title:</b> ([^<]*)</div>',
-                 r'class="tabSeperator">></span><span class="tabText">(.*?)<'],
+                [r'<b>Title:</b> ([^<]+)</div>',
+                 r'class="tabSeperator">></span><span class="tabText">(.+?)<',
+                 r'<title>([^<]+)</title>'],
                 webpage, 'title')
         thumbnail = self._og_search_thumbnail(webpage)
         description = self._og_search_description(webpage, default=None)
diff --git a/youtube_dl/extractor/screencastomatic.py b/youtube_dl/extractor/screencastomatic.py
index 05337421ca4210af5a9a797f22c112bb663a0960..7a88a42cd84dbfd9f343567dffb5f462c10329b7 100644 (file)
@@ -1,15 +1,11 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
-from .common import InfoExtractor
-from ..compat import compat_urlparse
-from ..utils import (
-    ExtractorError,
-    js_to_json,
-)
+from .jwplatform import JWPlatformBaseIE
+from ..utils import js_to_json
 
 
-class ScreencastOMaticIE(InfoExtractor):
+class ScreencastOMaticIE(JWPlatformBaseIE):
     _VALID_URL = r'https?://screencast-o-matic\.com/watch/(?P<id>[0-9a-zA-Z]+)'
     _TEST = {
         'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl',
@@ -20,6 +16,7 @@ class ScreencastOMaticIE(InfoExtractor):
             'title': 'Welcome to 3-4 Philosophy @ DECV!',
             'thumbnail': 're:^https?://.*\.jpg$',
             'description': 'as the title says! also: some general info re 1) VCE philosophy and 2) distance learning.',
+            'duration': 369.163,
         }
     }
 
@@ -27,23 +24,14 @@ class ScreencastOMaticIE(InfoExtractor):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
 
-        setup_js = self._search_regex(
-            r"(?s)jwplayer\('mp4Player'\).setup\((\{.*?\})\);",
-            webpage, 'setup code')
-        data = self._parse_json(setup_js, video_id, transform_source=js_to_json)
-        try:
-            video_data = next(
-                m for m in data['modes'] if m.get('type') == 'html5')
-        except StopIteration:
-            raise ExtractorError('Could not find any video entries!')
-        video_url = compat_urlparse.urljoin(url, video_data['config']['file'])
-        thumbnail = data.get('image')
+        jwplayer_data = self._parse_json(
+            self._search_regex(
+                r"(?s)jwplayer\('mp4Player'\).setup\((\{.*?\})\);", webpage, 'setup code'),
+            video_id, transform_source=js_to_json)
 
-        return {
-            'id': video_id,
+        info_dict = self._parse_jwplayer_data(jwplayer_data, video_id, require_title=False)
+        info_dict.update({
             'title': self._og_search_title(webpage),
             'description': self._og_search_description(webpage),
-            'url': video_url,
-            'ext': 'mp4',
-            'thumbnail': thumbnail,
-        }
+        })
+        return info_dict
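
The rewrite drops the hand-rolled mode selection in favour of JWPlatformBaseIE._parse_jwplayer_data; the scraping step it still performs is pulling the setup object out of the embed page. In isolation (sample page invented; real pages need js_to_json before json.loads):

    import json
    import re

    PAGE = 'jwplayer(\'mp4Player\').setup({"file": "/video.mp4", "image": "/thumb.jpg"});'

    setup = re.search(
        r"(?s)jwplayer\('mp4Player'\)\.setup\((\{.*?\})\);", PAGE).group(1)
    data = json.loads(setup)
    print(data['file'], data['image'])
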
diff --git a/youtube_dl/extractor/screenjunkies.py b/youtube_dl/extractor/screenjunkies.py
new file mode 100644 (file)
index 0000000..dd0a6ba
--- /dev/null
@@ -0,0 +1,138 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    int_or_none,
+    parse_age_limit,
+)
+
+
+class ScreenJunkiesIE(InfoExtractor):
+    _VALID_URL = r'https?://www\.screenjunkies\.com/video/(?P<display_id>[^/]+?)(?:-(?P<id>\d+))?(?:[/?#&]|$)'
+    _TESTS = [{
+        'url': 'http://www.screenjunkies.com/video/best-quentin-tarantino-movie-2841915',
+        'md5': '5c2b686bec3d43de42bde9ec047536b0',
+        'info_dict': {
+            'id': '2841915',
+            'display_id': 'best-quentin-tarantino-movie',
+            'ext': 'mp4',
+            'title': 'Best Quentin Tarantino Movie',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 3671,
+            'age_limit': 13,
+            'tags': list,
+        },
+    }, {
+        'url': 'http://www.screenjunkies.com/video/honest-trailers-the-dark-knight',
+        'info_dict': {
+            'id': '2348808',
+            'display_id': 'honest-trailers-the-dark-knight',
+            'ext': 'mp4',
+            'title': "Honest Trailers: 'The Dark Knight'",
+            'thumbnail': 're:^https?://.*\.jpg',
+            'age_limit': 10,
+            'tags': list,
+        },
+    }, {
+        # requires subscription but worked around
+        'url': 'http://www.screenjunkies.com/video/knocking-dead-ep-1-the-show-so-far-3003285',
+        'info_dict': {
+            'id': '3003285',
+            'display_id': 'knocking-dead-ep-1-the-show-so-far',
+            'ext': 'mp4',
+            'title': 'Knocking Dead Ep 1: State of The Dead Recap',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 3307,
+            'age_limit': 13,
+            'tags': list,
+        },
+    }]
+
+    _DEFAULT_BITRATES = (48, 150, 496, 864, 2240)
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        display_id = mobj.group('display_id')
+
+        if not video_id:
+            webpage = self._download_webpage(url, display_id)
+            video_id = self._search_regex(
+                (r'src=["\']/embed/(\d+)', r'data-video-content-id=["\'](\d+)'),
+                webpage, 'video id')
+
+        webpage = self._download_webpage(
+            'http://www.screenjunkies.com/embed/%s' % video_id,
+            display_id, 'Downloading video embed page')
+        embed_vars = self._parse_json(
+            self._search_regex(
+                r'(?s)embedVars\s*=\s*({.+?})\s*</script>', webpage, 'embed vars'),
+            display_id)
+
+        title = embed_vars['contentName']
+
+        formats = []
+        bitrates = []
+        for f in embed_vars.get('media', []):
+            if not f.get('uri') or f.get('mediaPurpose') != 'play':
+                continue
+            bitrate = int_or_none(f.get('bitRate'))
+            if bitrate:
+                bitrates.append(bitrate)
+            formats.append({
+                'url': f['uri'],
+                'format_id': 'http-%d' % bitrate if bitrate else 'http',
+                'width': int_or_none(f.get('width')),
+                'height': int_or_none(f.get('height')),
+                'tbr': bitrate,
+                'format': 'mp4',
+            })
+
+        if not bitrates:
+            # When subscriptionLevel > 0, i.e. a plus subscription is required,
+            # the media list will be empty. However, the HDS and HLS URIs are
+            # still available, so grab them assuming the default bitrates.
+            bitrates = self._DEFAULT_BITRATES
+
+        auth_token = embed_vars.get('AuthToken')
+
+        def construct_manifest_url(base_url, ext):
+            pieces = [base_url]
+            pieces.extend([compat_str(b) for b in bitrates])
+            pieces.append('_kbps.mp4.%s?%s' % (ext, auth_token))
+            return ','.join(pieces)
+
+        if bitrates and auth_token:
+            hds_url = embed_vars.get('hdsUri')
+            if hds_url:
+                f4m_formats = self._extract_f4m_formats(
+                    construct_manifest_url(hds_url, 'f4m'),
+                    display_id, f4m_id='hds', fatal=False)
+                if len(f4m_formats) == len(bitrates):
+                    for f, bitrate in zip(f4m_formats, bitrates):
+                        if not f.get('tbr'):
+                            f['format_id'] = 'hds-%d' % bitrate
+                            f['tbr'] = bitrate
+                # TODO: fix f4m downloader to handle manifests without bitrates if possible
+                # formats.extend(f4m_formats)
+
+            hls_url = embed_vars.get('hlsUri')
+            if hls_url:
+                formats.extend(self._extract_m3u8_formats(
+                    construct_manifest_url(hls_url, 'm3u8'),
+                    display_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'thumbnail': embed_vars.get('thumbUri'),
+            'duration': int_or_none(embed_vars.get('videoLengthInSeconds')) or None,
+            'age_limit': parse_age_limit(embed_vars.get('audienceRating')),
+            'tags': embed_vars.get('tags', '').split(','),
+            'formats': formats,
+        }
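
construct_manifest_url() above relies on Akamai's multi-bitrate URL convention: the renditions are listed comma-separated inside a single manifest URL. Extracted as a standalone function (sample values invented):

    def construct_manifest_url(base_url, bitrates, ext, auth_token):
        pieces = [base_url]
        pieces.extend(str(b) for b in bitrates)
        pieces.append('_kbps.mp4.%s?%s' % (ext, auth_token))
        return ','.join(pieces)

    print(construct_manifest_url(
        'http://hds.example.com/z/video_', (48, 150, 496, 864, 2240), 'f4m', 'token=abc'))
    # -> http://hds.example.com/z/video_,48,150,496,864,2240,_kbps.mp4.f4m?token=abc
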
diff --git a/youtube_dl/extractor/screenwavemedia.py b/youtube_dl/extractor/screenwavemedia.py
index d1ab66b3216d5153a5480769fb0723919f3fdb37..44b0bbee68953a199c67e420fe1928048be5f2cf 100644 (file)
@@ -7,12 +7,13 @@ from .common import InfoExtractor
 from ..utils import (
     int_or_none,
     unified_strdate,
+    js_to_json,
 )
 
 
 class ScreenwaveMediaIE(InfoExtractor):
-    _VALID_URL = r'http://player\d?\.screenwavemedia\.com/(?:play/)?[a-zA-Z]+\.php\?[^"]*\bid=(?P<id>.+)'
-
+    _VALID_URL = r'https?://player\d?\.screenwavemedia\.com/(?:play/)?[a-zA-Z]+\.php\?.*\bid=(?P<id>[A-Za-z0-9-]+)'
+    EMBED_PATTERN = r'src=(["\'])(?P<url>(?:https?:)?//player\d?\.screenwavemedia\.com/(?:play/)?[a-zA-Z]+\.php\?.*\bid=.+?)\1'
     _TESTS = [{
         'url': 'http://player.screenwavemedia.com/play/play.php?playerdiv=videoarea&companiondiv=squareAd&id=Cinemassacre-19911',
         'only_matching': True,
@@ -22,60 +23,74 @@ class ScreenwaveMediaIE(InfoExtractor):
         video_id = self._match_id(url)
 
         playerdata = self._download_webpage(
-            'http://player.screenwavemedia.com/play/player.php?id=%s' % video_id,
+            'http://player.screenwavemedia.com/player.php?id=%s' % video_id,
             video_id, 'Downloading player webpage')
 
         vidtitle = self._search_regex(
             r'\'vidtitle\'\s*:\s*"([^"]+)"', playerdata, 'vidtitle').replace('\\/', '/')
-        vidurl = self._search_regex(
-            r'\'vidurl\'\s*:\s*"([^"]+)"', playerdata, 'vidurl').replace('\\/', '/')
-
-        videolist_url = None
-
-        mobj = re.search(r"'videoserver'\s*:\s*'(?P<videoserver>[^']+)'", playerdata)
-        if mobj:
-            videoserver = mobj.group('videoserver')
-            mobj = re.search(r'\'vidid\'\s*:\s*"(?P<vidid>[^\']+)"', playerdata)
-            vidid = mobj.group('vidid') if mobj else video_id
-            videolist_url = 'http://%s/vod/smil:%s.smil/jwplayer.smil' % (videoserver, vidid)
-        else:
-            mobj = re.search(r"file\s*:\s*'(?P<smil>http.+?/jwplayer\.smil)'", playerdata)
-            if mobj:
-                videolist_url = mobj.group('smil')
-
-        if videolist_url:
-            videolist = self._download_xml(videolist_url, video_id, 'Downloading videolist XML')
-            formats = []
-            baseurl = vidurl[:vidurl.rfind('/') + 1]
-            for video in videolist.findall('.//video'):
-                src = video.get('src')
-                if not src:
+
+        playerconfig = self._download_webpage(
+            'http://player.screenwavemedia.com/player.js',
+            video_id, 'Downloading playerconfig webpage')
+
+        videoserver = self._search_regex(r'SWMServer\s*=\s*"([\d\.]+)"', playerdata, 'videoserver')
+
+        sources = self._parse_json(
+            js_to_json(
+                re.sub(
+                    r'(?s)/\*.*?\*/', '',
+                    self._search_regex(
+                        r'sources\s*:\s*(\[[^\]]+?\])', playerconfig,
+                        'sources',
+                    ).replace(
+                        "' + thisObj.options.videoserver + '",
+                        videoserver
+                    ).replace(
+                        "' + playerVidId + '",
+                        video_id
+                    )
+                )
+            ),
+            video_id, fatal=False
+        )
+
+        # Fall back to hardcoded sources if the player JS changes again
+        if not sources:
+            self.report_warning('Falling back to a hardcoded list of streams')
+            sources = [{
+                'file': 'http://%s/vod/%s_%s.mp4' % (videoserver, video_id, format_id),
+                'type': 'mp4',
+                'label': format_label,
+            } for format_id, format_label in (
+                ('low', '144p Low'), ('med', '160p Med'), ('high', '360p High'), ('hd1', '720p HD1'))]
+            sources.append({
+                'file': 'http://%s/vod/smil:%s.smil/playlist.m3u8' % (videoserver, video_id),
+                'type': 'hls',
+            })
+
+        formats = []
+        for source in sources:
+            file_ = source.get('file')
+            if not file_:
+                continue
+            if source.get('type') == 'hls':
+                formats.extend(self._extract_m3u8_formats(file_, video_id, ext='mp4'))
+            else:
+                format_id = self._search_regex(
+                    r'_(.+?)\.[^.]+$', file_, 'format id', default=None)
+                if not self._is_valid_url(file_, video_id, format_id or 'video'):
                     continue
-                file_ = src.partition(':')[-1]
-                width = int_or_none(video.get('width'))
-                height = int_or_none(video.get('height'))
-                bitrate = int_or_none(video.get('system-bitrate'), scale=1000)
-                format = {
-                    'url': baseurl + file_,
-                    'format_id': src.rpartition('.')[0].rpartition('_')[-1],
-                }
-                if width or height:
-                    format.update({
-                        'tbr': bitrate,
-                        'width': width,
-                        'height': height,
-                    })
-                else:
-                    format.update({
-                        'abr': bitrate,
-                        'vcodec': 'none',
-                    })
-                formats.append(format)
-        else:
-            formats = [{
-                'url': vidurl,
-            }]
-        self._sort_formats(formats)
+                format_label = source.get('label')
+                height = int_or_none(self._search_regex(
+                    r'^(\d+)[pP]', format_label, 'height', default=None))
+                formats.append({
+                    'url': file_,
+                    'format_id': format_id,
+                    'format': format_label,
+                    'ext': source.get('type'),
+                    'height': height,
+                })
+        self._sort_formats(formats, field_preference=('height', 'width', 'tbr', 'format_id'))
 
         return {
             'id': video_id,
@@ -94,7 +109,11 @@ class TeamFourIE(InfoExtractor):
             'upload_date': '20130401',
             'description': 'Check out this and more on our website: http://teamfourstar.com\nTFS Store: http://sharkrobot.com/team-four-star\nFollow on Twitter: http://twitter.com/teamfourstar\nLike on FB: http://facebook.com/teamfourstar',
             'title': 'A Moment With TFS Episode 4',
-        }
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
     }
 
     def _real_extract(self, url):
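
The new ScreenwaveMedia code recovers the sources array from player.js by splicing the real server and video id into the JS string-concatenation template, stripping /* */ comments and JSON-decoding the result. A toy version of that pipeline (input invented; a naive quote fixer stands in for js_to_json):

    import json
    import re

    PLAYER_JS = """sources: [
        /* progressive */
        {'file': 'http://' + thisObj.options.videoserver + '/vod/' + playerVidId + '_high.mp4'}
    ]"""

    raw = re.search(r'sources\s*:\s*(\[[^\]]+\])', PLAYER_JS).group(1)
    raw = re.sub(r'(?s)/\*.*?\*/', '', raw)
    raw = raw.replace("' + thisObj.options.videoserver + '", '1.2.3.4')
    raw = raw.replace("' + playerVidId + '", 'Cinemassacre-19911')
    sources = json.loads(raw.replace("'", '"'))  # real code uses js_to_json here
    print(sources[0]['file'])
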
diff --git a/youtube_dl/extractor/senateisvp.py b/youtube_dl/extractor/senateisvp.py
index 9c53704ea383b1af34e8f8157e327b71c2c3865a..c5f474dd1d8a5040a5368de7f2aa050658f7a984 100644 (file)
@@ -15,55 +15,63 @@ from ..compat import (
 
 class SenateISVPIE(InfoExtractor):
     _COMM_MAP = [
-        ["ag", "76440", "http://ag-f.akamaihd.net"],
-        ["aging", "76442", "http://aging-f.akamaihd.net"],
-        ["approps", "76441", "http://approps-f.akamaihd.net"],
-        ["armed", "76445", "http://armed-f.akamaihd.net"],
-        ["banking", "76446", "http://banking-f.akamaihd.net"],
-        ["budget", "76447", "http://budget-f.akamaihd.net"],
-        ["cecc", "76486", "http://srs-f.akamaihd.net"],
-        ["commerce", "80177", "http://commerce1-f.akamaihd.net"],
-        ["csce", "75229", "http://srs-f.akamaihd.net"],
-        ["dpc", "76590", "http://dpc-f.akamaihd.net"],
-        ["energy", "76448", "http://energy-f.akamaihd.net"],
-        ["epw", "76478", "http://epw-f.akamaihd.net"],
-        ["ethics", "76449", "http://ethics-f.akamaihd.net"],
-        ["finance", "76450", "http://finance-f.akamaihd.net"],
-        ["foreign", "76451", "http://foreign-f.akamaihd.net"],
-        ["govtaff", "76453", "http://govtaff-f.akamaihd.net"],
-        ["help", "76452", "http://help-f.akamaihd.net"],
-        ["indian", "76455", "http://indian-f.akamaihd.net"],
-        ["intel", "76456", "http://intel-f.akamaihd.net"],
-        ["intlnarc", "76457", "http://intlnarc-f.akamaihd.net"],
-        ["jccic", "85180", "http://jccic-f.akamaihd.net"],
-        ["jec", "76458", "http://jec-f.akamaihd.net"],
-        ["judiciary", "76459", "http://judiciary-f.akamaihd.net"],
-        ["rpc", "76591", "http://rpc-f.akamaihd.net"],
-        ["rules", "76460", "http://rules-f.akamaihd.net"],
-        ["saa", "76489", "http://srs-f.akamaihd.net"],
-        ["smbiz", "76461", "http://smbiz-f.akamaihd.net"],
-        ["srs", "75229", "http://srs-f.akamaihd.net"],
-        ["uscc", "76487", "http://srs-f.akamaihd.net"],
-        ["vetaff", "76462", "http://vetaff-f.akamaihd.net"],
-        ["arch", "", "http://ussenate-f.akamaihd.net/"]
+        ['ag', '76440', 'http://ag-f.akamaihd.net'],
+        ['aging', '76442', 'http://aging-f.akamaihd.net'],
+        ['approps', '76441', 'http://approps-f.akamaihd.net'],
+        ['armed', '76445', 'http://armed-f.akamaihd.net'],
+        ['banking', '76446', 'http://banking-f.akamaihd.net'],
+        ['budget', '76447', 'http://budget-f.akamaihd.net'],
+        ['cecc', '76486', 'http://srs-f.akamaihd.net'],
+        ['commerce', '80177', 'http://commerce1-f.akamaihd.net'],
+        ['csce', '75229', 'http://srs-f.akamaihd.net'],
+        ['dpc', '76590', 'http://dpc-f.akamaihd.net'],
+        ['energy', '76448', 'http://energy-f.akamaihd.net'],
+        ['epw', '76478', 'http://epw-f.akamaihd.net'],
+        ['ethics', '76449', 'http://ethics-f.akamaihd.net'],
+        ['finance', '76450', 'http://finance-f.akamaihd.net'],
+        ['foreign', '76451', 'http://foreign-f.akamaihd.net'],
+        ['govtaff', '76453', 'http://govtaff-f.akamaihd.net'],
+        ['help', '76452', 'http://help-f.akamaihd.net'],
+        ['indian', '76455', 'http://indian-f.akamaihd.net'],
+        ['intel', '76456', 'http://intel-f.akamaihd.net'],
+        ['intlnarc', '76457', 'http://intlnarc-f.akamaihd.net'],
+        ['jccic', '85180', 'http://jccic-f.akamaihd.net'],
+        ['jec', '76458', 'http://jec-f.akamaihd.net'],
+        ['judiciary', '76459', 'http://judiciary-f.akamaihd.net'],
+        ['rpc', '76591', 'http://rpc-f.akamaihd.net'],
+        ['rules', '76460', 'http://rules-f.akamaihd.net'],
+        ['saa', '76489', 'http://srs-f.akamaihd.net'],
+        ['smbiz', '76461', 'http://smbiz-f.akamaihd.net'],
+        ['srs', '75229', 'http://srs-f.akamaihd.net'],
+        ['uscc', '76487', 'http://srs-f.akamaihd.net'],
+        ['vetaff', '76462', 'http://vetaff-f.akamaihd.net'],
+        ['arch', '', 'http://ussenate-f.akamaihd.net/']
     ]
     _IE_NAME = 'senate.gov'
-    _VALID_URL = r'http://www\.senate\.gov/isvp/?\?(?P<qs>.+)'
+    _VALID_URL = r'https?://www\.senate\.gov/isvp/?\?(?P<qs>.+)'
     _TESTS = [{
         'url': 'http://www.senate.gov/isvp/?comm=judiciary&type=live&stt=&filename=judiciary031715&auto_play=false&wmode=transparent&poster=http%3A%2F%2Fwww.judiciary.senate.gov%2Fthemes%2Fjudiciary%2Fimages%2Fvideo-poster-flash-fit.png',
         'info_dict': {
             'id': 'judiciary031715',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Integrated Senate Video Player',
             'thumbnail': 're:^https?://.*\.(?:jpg|png)$',
-        }
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
     }, {
         'url': 'http://www.senate.gov/isvp/?type=live&comm=commerce&filename=commerce011514.mp4&auto_play=false',
         'info_dict': {
             'id': 'commerce011514',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Integrated Senate Video Player'
-        }
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
     }, {
         'url': 'http://www.senate.gov/isvp/?type=arch&comm=intel&filename=intel090613&hc_location=ufi',
         # checksum differs each time
@@ -121,9 +129,9 @@ class SenateISVPIE(InfoExtractor):
                 'url': compat_urlparse.urljoin(domain, filename) + '?v=3.1.0&fp=&r=&g=',
             }]
         else:
-            hdcore_sign = '?hdcore=3.1.0'
+            hdcore_sign = 'hdcore=3.1.0'
             url_params = (domain, video_id, stream_num)
-            f4m_url = '%s/z/%s_1@%s/manifest.f4m' % url_params + hdcore_sign
+            f4m_url = '%s/z/%s_1@%s/manifest.f4m?' % url_params + hdcore_sign
             m3u8_url = '%s/i/%s_1@%s/master.m3u8' % url_params
             for entry in self._extract_f4m_formats(f4m_url, video_id, f4m_id='f4m'):
                 # URLs without the extra param induce an 404 error
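
The fix moves the '?' out of hdcore_sign and onto the f4m URL template, so the signature is appended as a proper query string. The resulting Akamai manifest URLs, built here with the judiciary committee's values from _COMM_MAP:

    domain, video_id, stream_num = 'http://judiciary-f.akamaihd.net', 'judiciary031715', '76459'

    url_params = (domain, video_id, stream_num)
    f4m_url = '%s/z/%s_1@%s/manifest.f4m?' % url_params + 'hdcore=3.1.0'
    m3u8_url = '%s/i/%s_1@%s/master.m3u8' % url_params
    print(f4m_url)
    print(m3u8_url)
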
diff --git a/youtube_dl/extractor/sexu.py b/youtube_dl/extractor/sexu.py
index 6365a8779d74e2ac9d82ce83c32c404d51e64b2e..a99b2a8e7be1bc9de8a01d6ae2de6fb36055703c 100644 (file)
@@ -1,7 +1,5 @@
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
 
 
@@ -14,7 +12,7 @@ class SexuIE(InfoExtractor):
             'id': '961791',
             'ext': 'mp4',
             'title': 'md5:4d05a19a5fc049a63dbbaf05fb71d91b',
-            'description': 'md5:c5ed8625eb386855d5a7967bd7b77a54',
+            'description': 'md5:2b75327061310a3afb3fbd7d09e2e403',
             'categories': list,  # NSFW
             'thumbnail': 're:https?://.*\.jpg$',
             'age_limit': 18,
@@ -25,13 +23,18 @@ class SexuIE(InfoExtractor):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
 
-        quality_arr = self._search_regex(
-            r'sources:\s*\[([^\]]+)\]', webpage, 'forrmat string')
+        jwvideo = self._parse_json(
+            self._search_regex(r'\.setup\(\s*({.+?})\s*\);', webpage, 'jwvideo'),
+            video_id)
+
+        sources = jwvideo['sources']
+
         formats = [{
-            'url': fmt[0].replace('\\', ''),
-            'format_id': fmt[1],
-            'height': int(fmt[1][:3]),
-        } for fmt in re.findall(r'"file":"([^"]+)","label":"([^"]+)"', quality_arr)]
+            'url': source['file'].replace('\\', ''),
+            'format_id': source.get('label'),
+            'height': self._search_regex(
+                r'^(\d+)[pP]', source.get('label', ''), 'height', default=None),
+        } for source in sources if source.get('file')]
         self._sort_formats(formats)
 
         title = self._html_search_regex(
@@ -40,9 +43,7 @@ class SexuIE(InfoExtractor):
         description = self._html_search_meta(
             'description', webpage, 'description')
 
-        thumbnail = self._html_search_regex(
-            r'image:\s*"([^"]+)"',
-            webpage, 'thumbnail', fatal=False)
+        thumbnail = jwvideo.get('image')
 
         categories_str = self._html_search_meta(
             'keywords', webpage, 'categories')
diff --git a/youtube_dl/extractor/sexykarma.py b/youtube_dl/extractor/sexykarma.py
index 6446d26dc416703da688386a578f904d24b102a4..e33483674439fb2fcfcdce60a9de6a6ca328dfdd 100644 (file)
@@ -29,6 +29,7 @@ class SexyKarmaIE(InfoExtractor):
             'view_count': int,
             'comment_count': int,
             'categories': list,
+            'age_limit': 18,
         }
     }, {
         'url': 'http://www.sexykarma.com/gonewild/video/pot-pixie-tribute-8Id6EZPbuHf.html',
diff --git a/youtube_dl/extractor/shahid.py b/youtube_dl/extractor/shahid.py
new file mode 100644 (file)
index 0000000..d95ea06
--- /dev/null
@@ -0,0 +1,111 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_urllib_parse_urlencode
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    parse_iso8601,
+)
+
+
+class ShahidIE(InfoExtractor):
+    _VALID_URL = r'https?://shahid\.mbc\.net/ar/episode/(?P<id>\d+)/?'
+    _TESTS = [{
+        'url': 'https://shahid.mbc.net/ar/episode/90574/%D8%A7%D9%84%D9%85%D9%84%D9%83-%D8%B9%D8%A8%D8%AF%D8%A7%D9%84%D9%84%D9%87-%D8%A7%D9%84%D8%A5%D9%86%D8%B3%D8%A7%D9%86-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-3.html',
+        'info_dict': {
+            'id': '90574',
+            'ext': 'mp4',
+            'title': 'الملك عبدالله الإنسان الموسم 1 كليب 3',
+            'description': 'الفيلم الوثائقي - الملك عبد الله الإنسان',
+            'duration': 2972,
+            'timestamp': 1422057420,
+            'upload_date': '20150123',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }, {
+        # shahid plus subscriber only
+        'url': 'https://shahid.mbc.net/ar/episode/90511/%D9%85%D8%B1%D8%A7%D9%8A%D8%A7-2011-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1.html',
+        'only_matching': True
+    }]
+
+    def _handle_error(self, response):
+        if not isinstance(response, dict):
+            return
+        error = response.get('error')
+        if error:
+            raise ExtractorError(
+                '%s returned error: %s' % (self.IE_NAME, '\n'.join(error.values())),
+                expected=True)
+
+    def _download_json(self, url, video_id, note='Downloading JSON metadata'):
+        response = super(ShahidIE, self)._download_json(url, video_id, note)['data']
+        self._handle_error(response)
+        return response
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        api_vars = {
+            'id': video_id,
+            'type': 'player',
+            'url': 'http://api.shahid.net/api/v1_1',
+            'playerType': 'episode',
+        }
+
+        flashvars = self._search_regex(
+            r'var\s+flashvars\s*=\s*({[^}]+})', webpage, 'flashvars', default=None)
+        if flashvars:
+            for key in api_vars.keys():
+                value = self._search_regex(
+                    r'\b%s\s*:\s*(?P<q>["\'])(?P<value>.+?)(?P=q)' % key,
+                    flashvars, 'type', default=None, group='value')
+                if value:
+                    api_vars[key] = value
+
+        player = self._download_json(
+            'https://shahid.mbc.net/arContent/getPlayerContent-param-.id-%s.type-%s.html'
+            % (video_id, api_vars['type']), video_id, 'Downloading player JSON')
+
+        if player.get('drm'):
+            raise ExtractorError('This video is DRM protected.', expected=True)
+
+        formats = self._extract_m3u8_formats(player['url'], video_id, 'mp4')
+        self._sort_formats(formats)
+
+        video = self._download_json(
+            '%s/%s/%s?%s' % (
+                api_vars['url'], api_vars['playerType'], api_vars['id'],
+                compat_urllib_parse_urlencode({
+                    'apiKey': 'sh@hid0nlin3',
+                    'hash': 'b2wMCTHpSmyxGqQjJFOycRmLSex+BpTK/ooxy6vHaqs=',
+                })),
+            video_id, 'Downloading video JSON')
+
+        video = video[api_vars['playerType']]
+
+        title = video['title']
+        description = video.get('description')
+        thumbnail = video.get('thumbnailUrl')
+        duration = int_or_none(video.get('duration'))
+        timestamp = parse_iso8601(video.get('referenceDate'))
+        categories = [
+            category['name']
+            for category in video.get('genres', []) if 'name' in category]
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'timestamp': timestamp,
+            'categories': categories,
+            'formats': formats,
+        }
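
ShahidIE overrides _download_json to unwrap the API's 'data' envelope and surface embedded error objects as ExtractorError. The same shape as a standalone sketch (the fetcher and payloads are invented for the demo):

    def handle_error(response):
        if isinstance(response, dict) and response.get('error'):
            raise RuntimeError(
                'shahid.net returned error: %s' % '\n'.join(response['error'].values()))

    def download_json(fetch, url):
        response = fetch(url)['data']  # every reply is wrapped in a 'data' envelope
        handle_error(response)
        return response

    print(download_json(lambda url: {'data': {'title': 'clip'}}, 'http://api.example')['title'])
    try:
        download_json(lambda url: {'data': {'error': {'code': 'not found'}}}, 'http://api.example')
    except RuntimeError as exc:
        print(exc)
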
diff --git a/youtube_dl/extractor/shared.py b/youtube_dl/extractor/shared.py
index a07677686a4ecc2923b310c3aeeeaab610bb0868..e7e5f653eb2117936f568195e508f9e7778b1085 100644 (file)
@@ -3,28 +3,37 @@ from __future__ import unicode_literals
 import base64
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
-)
 from ..utils import (
     ExtractorError,
     int_or_none,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
 class SharedIE(InfoExtractor):
-    _VALID_URL = r'http://shared\.sx/(?P<id>[\da-z]{10})'
+    IE_DESC = 'shared.sx and vivo.sx'
+    _VALID_URL = r'https?://(?:shared|vivo)\.sx/(?P<id>[\da-z]{10})'
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://shared.sx/0060718775',
         'md5': '106fefed92a8a2adb8c98e6a0652f49b',
         'info_dict': {
             'id': '0060718775',
             'ext': 'mp4',
             'title': 'Bmp4',
+            'filesize': 1720110,
+        },
+    }, {
+        'url': 'http://vivo.sx/d7ddda0e78',
+        'md5': '15b3af41be0b4fe01f4df075c2678b2c',
+        'info_dict': {
+            'id': 'd7ddda0e78',
+            'ext': 'mp4',
+            'title': 'Chicken',
+            'filesize': 528031,
         },
-    }
+    }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
@@ -35,8 +44,8 @@ class SharedIE(InfoExtractor):
                 'Video %s does not exist' % video_id, expected=True)
 
         download_form = self._hidden_inputs(webpage)
-        request = compat_urllib_request.Request(
-            url, compat_urllib_parse.urlencode(download_form))
+        request = sanitized_Request(
+            url, urlencode_postdata(download_form))
         request.add_header('Content-Type', 'application/x-www-form-urlencoded')
 
         video_page = self._download_webpage(
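
Functionally the SharedIE change still POSTs the page's hidden form fields back to the same URL; sanitized_Request and urlencode_postdata just replace the compat_urllib_* pair. The equivalent request construction with the standard library (nothing is sent; the form fields are hypothetical):

    try:
        from urllib.request import Request
        from urllib.parse import urlencode
    except ImportError:  # Python 2
        from urllib2 import Request
        from urllib import urlencode

    download_form = {'op': 'download1', 'id': '0060718775'}  # hypothetical hidden inputs

    request = Request('http://shared.sx/0060718775',
                      data=urlencode(download_form).encode('ascii'))
    request.add_header('Content-Type', 'application/x-www-form-urlencoded')
    print(request.get_method())  # POST, because a request body is set
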
diff --git a/youtube_dl/extractor/sharesix.py b/youtube_dl/extractor/sharesix.py
index ac3e3adf22ad194a8af3e833ae4d8acf7484e8b4..9cce5ceb43b71877202a77067458a39e5a810432 100644 (file)
@@ -4,12 +4,10 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
-)
 from ..utils import (
     parse_duration,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
@@ -49,8 +47,8 @@ class ShareSixIE(InfoExtractor):
         fields = {
             'method_free': 'Free'
         }
-        post = compat_urllib_parse.urlencode(fields)
-        req = compat_urllib_request.Request(url, post)
+        post = urlencode_postdata(fields)
+        req = sanitized_Request(url, post)
         req.add_header('Content-type', 'application/x-www-form-urlencoded')
 
         webpage = self._download_webpage(req, video_id,
diff --git a/youtube_dl/extractor/sina.py b/youtube_dl/extractor/sina.py
index 0891a441f85f42b75d91f1d267fabdd1b5e952ce..d03f1b1d4308d047e5b690a682587ac5655ce338 100644 (file)
@@ -4,10 +4,8 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-    compat_urllib_parse,
-)
+from ..compat import compat_urllib_parse_urlencode
+from ..utils import sanitized_Request
 
 
 class SinaIE(InfoExtractor):
@@ -41,7 +39,7 @@ class SinaIE(InfoExtractor):
     ]
 
     def _extract_video(self, video_id):
-        data = compat_urllib_parse.urlencode({'vid': video_id})
+        data = compat_urllib_parse_urlencode({'vid': video_id})
         url_doc = self._download_xml('http://v.iask.com/v_play.php?%s' % data,
                                      video_id, 'Downloading video url')
         image_page = self._download_webpage(
@@ -61,7 +59,7 @@ class SinaIE(InfoExtractor):
         if mobj.group('token') is not None:
             # The video id is in the redirected url
             self.to_screen('Getting video id')
-            request = compat_urllib_request.Request(url)
+            request = sanitized_Request(url)
             request.get_method = lambda: 'HEAD'
             (_, urlh) = self._download_webpage_handle(request, 'NA', False)
             return self._real_extract(urlh.geturl())
diff --git a/youtube_dl/extractor/skynewsarabia.py b/youtube_dl/extractor/skynewsarabia.py
new file mode 100644 (file)
index 0000000..05e1b02
--- /dev/null
@@ -0,0 +1,117 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    parse_iso8601,
+    parse_duration,
+)
+
+
+class SkyNewsArabiaBaseIE(InfoExtractor):
+    _IMAGE_BASE_URL = 'http://www.skynewsarabia.com/web/images'
+
+    def _call_api(self, path, value):
+        return self._download_json('http://api.skynewsarabia.com/web/rest/v2/%s/%s.json' % (path, value), value)
+
+    def _get_limelight_media_id(self, url):
+        return self._search_regex(r'/media/[^/]+/([a-z0-9]{32})', url, 'limelight media id')
+
+    def _get_image_url(self, image_path_template, width='1600', height='1200'):
+        return self._IMAGE_BASE_URL + image_path_template.format(width=width, height=height)
+
+    def _extract_video_info(self, video_data):
+        video_id = compat_str(video_data['id'])
+        topic = video_data.get('topicTitle')
+        return {
+            '_type': 'url_transparent',
+            'url': 'limelight:media:%s' % self._get_limelight_media_id(video_data['videoUrl'][0]['url']),
+            'id': video_id,
+            'title': video_data['headline'],
+            'description': video_data.get('summary'),
+            'thumbnail': self._get_image_url(video_data['mediaAsset']['imageUrl']),
+            'timestamp': parse_iso8601(video_data.get('date')),
+            'duration': parse_duration(video_data.get('runTime')),
+            'tags': video_data.get('tags', []),
+            'categories': [topic] if topic else [],
+            'webpage_url': 'http://www.skynewsarabia.com/web/video/%s' % video_id,
+            'ie_key': 'LimelightMedia',
+        }
+
+
+class SkyNewsArabiaIE(SkyNewsArabiaBaseIE):
+    IE_NAME = 'skynewsarabia:video'
+    _VALID_URL = r'https?://(?:www\.)?skynewsarabia\.com/web/video/(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.skynewsarabia.com/web/video/794902/%D9%86%D8%B5%D9%81-%D9%85%D9%84%D9%8A%D9%88%D9%86-%D9%85%D8%B5%D8%A8%D8%A7%D8%AD-%D8%B4%D8%AC%D8%B1%D8%A9-%D9%83%D8%B1%D9%8A%D8%B3%D9%85%D8%A7%D8%B3',
+        'info_dict': {
+            'id': '794902',
+            'ext': 'flv',
+            'title': 'نصف مليون مصباح على شجرة كريسماس',
+            'description': 'md5:22f1b27f0850eeb10c7e59b1f16eb7c6',
+            'upload_date': '20151128',
+            'timestamp': 1448697198,
+            'duration': 2119,
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        video_data = self._call_api('video', video_id)
+        return self._extract_video_info(video_data)
+
+
+class SkyNewsArabiaArticleIE(SkyNewsArabiaBaseIE):
+    IE_NAME = 'skynewsarabia:article'
+    _VALID_URL = r'https?://(?:www\.)?skynewsarabia\.com/web/article/(?P<id>[0-9]+)'
+    _TESTS = [{
+        'url': 'http://www.skynewsarabia.com/web/article/794549/%D8%A7%D9%94%D8%AD%D8%AF%D8%A7%D8%AB-%D8%A7%D9%84%D8%B4%D8%B1%D9%82-%D8%A7%D9%84%D8%A7%D9%94%D9%88%D8%B3%D8%B7-%D8%AE%D8%B1%D9%8A%D8%B7%D8%A9-%D8%A7%D9%84%D8%A7%D9%94%D9%84%D8%B9%D8%A7%D8%A8-%D8%A7%D9%84%D8%B0%D9%83%D9%8A%D8%A9',
+        'info_dict': {
+            'id': '794549',
+            'ext': 'flv',
+            'title': 'بالفيديو.. ألعاب ذكية تحاكي واقع المنطقة',
+            'description': 'md5:0c373d29919a851e080ee4edd0c5d97f',
+            'upload_date': '20151126',
+            'timestamp': 1448559336,
+            'duration': 281.6,
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.skynewsarabia.com/web/article/794844/%D8%A7%D8%B3%D8%AA%D9%87%D8%AF%D8%A7%D9%81-%D9%82%D9%88%D8%A7%D8%B1%D8%A8-%D8%A7%D9%94%D8%B3%D9%84%D8%AD%D8%A9-%D9%84%D9%85%D9%8A%D9%84%D9%8A%D8%B4%D9%8A%D8%A7%D8%AA-%D8%A7%D9%84%D8%AD%D9%88%D8%AB%D9%8A-%D9%88%D8%B5%D8%A7%D9%84%D8%AD',
+        'info_dict': {
+            'id': '794844',
+            'title': 'إحباط تهريب أسلحة لميليشيات الحوثي وصالح بجنوب اليمن',
+            'description': 'md5:5c927b8b2e805796e7f693538d96fc7e',
+        },
+        'playlist_mincount': 2,
+    }]
+
+    def _real_extract(self, url):
+        article_id = self._match_id(url)
+        article_data = self._call_api('article', article_id)
+        media_asset = article_data['mediaAsset']
+        if media_asset['type'] == 'VIDEO':
+            topic = article_data.get('topicTitle')
+            return {
+                '_type': 'url_transparent',
+                'url': 'limelight:media:%s' % self._get_limelight_media_id(media_asset['videoUrl'][0]['url']),
+                'id': article_id,
+                'title': article_data['headline'],
+                'description': article_data.get('summary'),
+                'thumbnail': self._get_image_url(media_asset['imageUrl']),
+                'timestamp': parse_iso8601(article_data.get('date')),
+                'tags': article_data.get('tags', []),
+                'categories': [topic] if topic else [],
+                'webpage_url': url,
+                'ie_key': 'LimelightMedia',
+            }
+        entries = [self._extract_video_info(item) for item in article_data.get('inlineItems', []) if item['type'] == 'VIDEO']
+        return self.playlist_result(entries, article_id, article_data['headline'], article_data.get('summary'))
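
_extract_video_info above returns a url_transparent result that hands the actual media extraction to the LimelightMedia extractor while keeping Sky News Arabia's metadata. The 32-character media id is pulled out of the asset URL by _get_limelight_media_id; a quick check of that regex against a made-up URL of the expected shape:

import re

url = 'http://api.skynewsarabia.com/media/limelight/0123456789abcdef0123456789abcdef'  # sample only
media_id = re.search(r'/media/[^/]+/([a-z0-9]{32})', url).group(1)
assert len(media_id) == 32
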
index 3df71304dafc9c9e353923f6769c88e1fcf8c5ff..7efb29f653b76b25c26d91aac16c6985255ee1d0 100644 (file)
@@ -13,8 +13,8 @@ class SlutloadIE(InfoExtractor):
         'info_dict': {
             'id': 'TD73btpBqSxc',
             'ext': 'mp4',
-            "title": "virginie baisee en cam",
-            "age_limit": 18,
+            'title': 'virginie baisee en cam',
+            'age_limit': 18,
             'thumbnail': 're:https?://.*?\.jpg'
         }
     }
index 93a7cfe15cc764bc61b912dd2e3283d950790565..5c3fd0fece8dc8b32a3d05bea8ed4dedf430f1c5 100644 (file)
@@ -7,14 +7,12 @@ import hashlib
 import uuid
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
-)
 from ..utils import (
     ExtractorError,
     int_or_none,
+    sanitized_Request,
     unified_strdate,
+    urlencode_postdata,
 )
 
 
@@ -172,12 +170,12 @@ class SmotriIE(InfoExtractor):
             'getvideoinfo': '1',
         }
 
-        video_password = self._downloader.params.get('videopassword', None)
+        video_password = self._downloader.params.get('videopassword')
         if video_password:
             video_form['pass'] = hashlib.md5(video_password.encode('utf-8')).hexdigest()
 
-        request = compat_urllib_request.Request(
-            'http://smotri.com/video/view/url/bot/', compat_urllib_parse.urlencode(video_form))
+        request = sanitized_Request(
+            'http://smotri.com/video/view/url/bot/', urlencode_postdata(video_form))
         request.add_header('Content-Type', 'application/x-www-form-urlencoded')
 
         video = self._download_json(request, video_id, 'Downloading video JSON')
@@ -330,10 +328,7 @@ class SmotriBroadcastIE(InfoExtractor):
 
             (username, password) = self._get_login_info()
             if username is None:
-                raise ExtractorError(
-                    'Erotic broadcasts allowed only for registered users, '
-                    'use --username and --password options to provide account credentials.',
-                    expected=True)
+                self.raise_login_required('Erotic broadcasts allowed only for registered users')
 
             login_form = {
                 'login-hint53': '1',
@@ -342,8 +337,8 @@ class SmotriBroadcastIE(InfoExtractor):
                 'password': password,
             }
 
-            request = compat_urllib_request.Request(
-                broadcast_url + '/?no_redirect=1', compat_urllib_parse.urlencode(login_form))
+            request = sanitized_Request(
+                broadcast_url + '/?no_redirect=1', urlencode_postdata(login_form))
             request.add_header('Content-Type', 'application/x-www-form-urlencoded')
             broadcast_page = self._download_webpage(
                 request, broadcast_id, 'Logging in and confirming age')
@@ -361,7 +356,7 @@ class SmotriBroadcastIE(InfoExtractor):
 
         url = 'http://smotri.com/broadcast/view/url/?ticket=%s' % ticket
 
-        broadcast_password = self._downloader.params.get('videopassword', None)
+        broadcast_password = self._downloader.params.get('videopassword')
         if broadcast_password:
             url += '&pass=%s' % hashlib.md5(broadcast_password.encode('utf-8')).hexdigest()
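
Both hunks above read the --video-password value and send its md5 hex digest rather than the plaintext, which is what smotri.com expects. The transformation in isolation (the password value is hypothetical):

import hashlib

video_password = 'hunter2'  # stand-in for the --video-password option
hashed = hashlib.md5(video_password.encode('utf-8')).hexdigest()
print(hashed)  # 32-character hex string sent as the 'pass' field
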
 
index da3b05a8dc8ca89345e755225ed7885fa580c973..0d1ab07f86ac4088b4fd1e56e9d1dfaa52514ddd 100644 (file)
@@ -43,7 +43,7 @@ class SnotrIE(InfoExtractor):
         title = self._og_search_title(webpage)
 
         description = self._og_search_description(webpage)
-        video_url = "http://cdn.videos.snotr.com/%s.flv" % video_id
+        video_url = 'http://cdn.videos.snotr.com/%s.flv' % video_id
 
         view_count = str_to_int(self._html_search_regex(
             r'<p>\n<strong>Views:</strong>\n([\d,\.]+)</p>',
index ba2d5e19bc0d1de322b4b12ed5b8c0dc31157f7f..49e5d09ae450d11bb567a2fe95ecba55998c8b42 100644 (file)
@@ -6,11 +6,11 @@ import re
 from .common import InfoExtractor
 from ..compat import (
     compat_str,
-    compat_urllib_request,
-    compat_urllib_parse,
+    compat_urllib_parse_urlencode,
 )
 from ..utils import (
     ExtractorError,
+    sanitized_Request,
 )
 
 
@@ -96,7 +96,7 @@ class SohuIE(InfoExtractor):
             else:
                 base_data_url = 'http://hot.vrs.sohu.com/vrs_flash.action?vid='
 
-            req = compat_urllib_request.Request(base_data_url + vid_id)
+            req = sanitized_Request(base_data_url + vid_id)
 
             cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
             if cn_verification_proxy:
@@ -158,6 +158,7 @@ class SohuIE(InfoExtractor):
                         'file': clips_url[i],
                         'new': su[i],
                         'prod': 'flash',
+                        'rb': 1,
                     }
 
                     if cdnId is not None:
@@ -169,7 +170,7 @@ class SohuIE(InfoExtractor):
                     if retries > 0:
                         download_note += ' (retry #%d)' % retries
                     part_info = self._parse_json(self._download_webpage(
-                        'http://%s/?%s' % (allot, compat_urllib_parse.urlencode(params)),
+                        'http://%s/?%s' % (allot, compat_urllib_parse_urlencode(params)),
                         video_id, download_note), video_id)
 
                     video_url = part_info['url']
diff --git a/youtube_dl/extractor/soompi.py b/youtube_dl/extractor/soompi.py
deleted file mode 100644 (file)
index 5da66ca..0000000
+++ /dev/null
@@ -1,146 +0,0 @@
-# encoding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .crunchyroll import CrunchyrollIE
-
-from .common import InfoExtractor
-from ..compat import compat_HTTPError
-from ..utils import (
-    ExtractorError,
-    int_or_none,
-    remove_start,
-    xpath_text,
-)
-
-
-class SoompiBaseIE(InfoExtractor):
-    def _get_episodes(self, webpage, episode_filter=None):
-        episodes = self._parse_json(
-            self._search_regex(
-                r'VIDEOS\s*=\s*(\[.+?\]);', webpage, 'episodes JSON'),
-            None)
-        return list(filter(episode_filter, episodes))
-
-
-class SoompiIE(SoompiBaseIE, CrunchyrollIE):
-    IE_NAME = 'soompi'
-    _VALID_URL = r'https?://tv\.soompi\.com/(?:en/)?watch/(?P<id>[0-9]+)'
-    _TESTS = [{
-        'url': 'http://tv.soompi.com/en/watch/29235',
-        'info_dict': {
-            'id': '29235',
-            'ext': 'mp4',
-            'title': 'Episode 1096',
-            'description': '2015-05-20'
-        },
-        'params': {
-            'skip_download': True,
-        },
-    }]
-
-    def _get_episode(self, webpage, video_id):
-        return self._get_episodes(webpage, lambda x: x['id'] == video_id)[0]
-
-    def _get_subtitles(self, config, video_id):
-        sub_langs = {}
-        for subtitle in config.findall('./{default}preload/subtitles/subtitle'):
-            sub_langs[subtitle.attrib['id']] = subtitle.attrib['title']
-
-        subtitles = {}
-        for s in config.findall('./{default}preload/subtitle'):
-            lang_code = sub_langs.get(s.attrib['id'])
-            if not lang_code:
-                continue
-            sub_id = s.get('id')
-            data = xpath_text(s, './data', 'data')
-            iv = xpath_text(s, './iv', 'iv')
-            if not id or not iv or not data:
-                continue
-            subtitle = self._decrypt_subtitles(data, iv, sub_id).decode('utf-8')
-            subtitles[lang_code] = self._extract_subtitles(subtitle)
-        return subtitles
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        try:
-            webpage = self._download_webpage(
-                url, video_id, 'Downloading episode page')
-        except ExtractorError as ee:
-            if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
-                webpage = ee.cause.read()
-                block_message = self._html_search_regex(
-                    r'(?s)<div class="block-message">(.+?)</div>', webpage,
-                    'block message', default=None)
-                if block_message:
-                    raise ExtractorError(block_message, expected=True)
-            raise
-
-        formats = []
-        config = None
-        for format_id in re.findall(r'\?quality=([0-9a-zA-Z]+)', webpage):
-            config = self._download_xml(
-                'http://tv.soompi.com/en/show/_/%s-config.xml?mode=hls&quality=%s' % (video_id, format_id),
-                video_id, 'Downloading %s XML' % format_id)
-            m3u8_url = xpath_text(
-                config, './{default}preload/stream_info/file',
-                '%s m3u8 URL' % format_id)
-            if not m3u8_url:
-                continue
-            formats.extend(self._extract_m3u8_formats(
-                m3u8_url, video_id, 'mp4', m3u8_id=format_id))
-        self._sort_formats(formats)
-
-        episode = self._get_episode(webpage, video_id)
-
-        title = episode['name']
-        description = episode.get('description')
-        duration = int_or_none(episode.get('duration'))
-
-        thumbnails = [{
-            'id': thumbnail_id,
-            'url': thumbnail_url,
-        } for thumbnail_id, thumbnail_url in episode.get('img_url', {}).items()]
-
-        subtitles = self.extract_subtitles(config, video_id)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'thumbnails': thumbnails,
-            'duration': duration,
-            'formats': formats,
-            'subtitles': subtitles
-        }
-
-
-class SoompiShowIE(SoompiBaseIE):
-    IE_NAME = 'soompi:show'
-    _VALID_URL = r'https?://tv\.soompi\.com/en/shows/(?P<id>[0-9a-zA-Z\-_]+)'
-    _TESTS = [{
-        'url': 'http://tv.soompi.com/en/shows/liar-game',
-        'info_dict': {
-            'id': 'liar-game',
-            'title': 'Liar Game',
-            'description': 'md5:52c02bce0c1a622a95823591d0589b66',
-        },
-        'playlist_count': 14,
-    }]
-
-    def _real_extract(self, url):
-        show_id = self._match_id(url)
-
-        webpage = self._download_webpage(
-            url, show_id, 'Downloading show page')
-
-        title = remove_start(self._og_search_title(webpage), 'SoompiTV | ')
-        description = self._og_search_description(webpage)
-
-        entries = [
-            self.url_result('http://tv.soompi.com/en/watch/%s' % episode['id'], 'Soompi')
-            for episode in self._get_episodes(webpage)]
-
-        return self.playlist_result(entries, show_id, title, description)
index 0a6c9fe727895b00c8830b2b3568d62f973541a7..194dabc71d84072fc64afd50baa3b80467c0808f 100644 (file)
@@ -4,11 +4,14 @@ from __future__ import unicode_literals
 import re
 import itertools
 
-from .common import InfoExtractor
+from .common import (
+    InfoExtractor,
+    SearchInfoExtractor
+)
 from ..compat import (
     compat_str,
     compat_urlparse,
-    compat_urllib_parse,
+    compat_urllib_parse_urlencode,
 )
 from ..utils import (
     ExtractorError,
@@ -29,7 +32,7 @@ class SoundcloudIE(InfoExtractor):
     _VALID_URL = r'''(?x)^(?:https?://)?
                     (?:(?:(?:www\.|m\.)?soundcloud\.com/
                             (?P<uploader>[\w\d-]+)/
-                            (?!sets/|(?:likes|tracks)/?(?:$|[?#]))
+                            (?!(?:tracks|sets(?:/[^/?#]+)?|reposts|likes|spotlight)/?(?:$|[?#]))
                             (?P<title>[\w\d-]+)/?
                             (?P<token>[^?]+?)?(?:[?].*)?$)
                        |(?:api\.soundcloud\.com/tracks/(?P<track_id>\d+)
@@ -113,7 +116,7 @@ class SoundcloudIE(InfoExtractor):
         },
     ]
 
-    _CLIENT_ID = 'b45b1aa10f1ac2941910a7f0d10f8e28'
+    _CLIENT_ID = '02gUJC0hH2ct1EGOcYXQIzRFU91c72Ea'
     _IPHONE_CLIENT_ID = '376f225bf427445fc4bfb6b99b72e0bf'
 
     def report_resolve(self, video_id):
@@ -218,7 +221,7 @@ class SoundcloudIE(InfoExtractor):
             full_title = track_id
             token = mobj.group('secret_token')
             if token:
-                info_json_url += "&secret_token=" + token
+                info_json_url += '&secret_token=' + token
         elif mobj.group('player'):
             query = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
             real_url = query['url'][0]
@@ -293,60 +296,139 @@ class SoundcloudSetIE(SoundcloudIE):
 
 
 class SoundcloudUserIE(SoundcloudIE):
-    _VALID_URL = r'https?://(?:(?:www|m)\.)?soundcloud\.com/(?P<user>[^/]+)/?((?P<rsrc>tracks|likes)/?)?(\?.*)?$'
+    _VALID_URL = r'''(?x)
+                        https?://
+                            (?:(?:www|m)\.)?soundcloud\.com/
+                            (?P<user>[^/]+)
+                            (?:/
+                                (?P<rsrc>tracks|sets|reposts|likes|spotlight)
+                            )?
+                            /?(?:[?#].*)?$
+                    '''
     IE_NAME = 'soundcloud:user'
     _TESTS = [{
-        'url': 'https://soundcloud.com/the-concept-band',
+        'url': 'https://soundcloud.com/the-akashic-chronicler',
         'info_dict': {
-            'id': '9615865',
-            'title': 'The Royal Concept',
+            'id': '114582580',
+            'title': 'The Akashic Chronicler (All)',
         },
-        'playlist_mincount': 12
+        'playlist_mincount': 111,
     }, {
-        'url': 'https://soundcloud.com/the-concept-band/likes',
+        'url': 'https://soundcloud.com/the-akashic-chronicler/tracks',
         'info_dict': {
-            'id': '9615865',
-            'title': 'The Royal Concept',
+            'id': '114582580',
+            'title': 'The Akashic Chronicler (Tracks)',
         },
-        'playlist_mincount': 1,
+        'playlist_mincount': 50,
     }, {
-        'url': 'https://soundcloud.com/the-akashic-chronicler/tracks',
-        'only_matching': True,
+        'url': 'https://soundcloud.com/the-akashic-chronicler/sets',
+        'info_dict': {
+            'id': '114582580',
+            'title': 'The Akashic Chronicler (Playlists)',
+        },
+        'playlist_mincount': 3,
+    }, {
+        'url': 'https://soundcloud.com/the-akashic-chronicler/reposts',
+        'info_dict': {
+            'id': '114582580',
+            'title': 'The Akashic Chronicler (Reposts)',
+        },
+        'playlist_mincount': 7,
+    }, {
+        'url': 'https://soundcloud.com/the-akashic-chronicler/likes',
+        'info_dict': {
+            'id': '114582580',
+            'title': 'The Akashic Chronicler (Likes)',
+        },
+        'playlist_mincount': 321,
+    }, {
+        'url': 'https://soundcloud.com/grynpyret/spotlight',
+        'info_dict': {
+            'id': '7098329',
+            'title': 'Grynpyret (Spotlight)',
+        },
+        'playlist_mincount': 1,
     }]
 
+    _API_BASE = 'https://api.soundcloud.com'
+    _API_V2_BASE = 'https://api-v2.soundcloud.com'
+
+    _BASE_URL_MAP = {
+        'all': '%s/profile/soundcloud:users:%%s' % _API_V2_BASE,
+        'tracks': '%s/users/%%s/tracks' % _API_BASE,
+        'sets': '%s/users/%%s/playlists' % _API_V2_BASE,
+        'reposts': '%s/profile/soundcloud:users:%%s/reposts' % _API_V2_BASE,
+        'likes': '%s/users/%%s/likes' % _API_V2_BASE,
+        'spotlight': '%s/users/%%s/spotlight' % _API_V2_BASE,
+    }
+
+    _TITLE_MAP = {
+        'all': 'All',
+        'tracks': 'Tracks',
+        'sets': 'Playlists',
+        'reposts': 'Reposts',
+        'likes': 'Likes',
+        'spotlight': 'Spotlight',
+    }
+
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         uploader = mobj.group('user')
-        resource = mobj.group('rsrc')
-        if resource is None:
-            resource = 'tracks'
-        elif resource == 'likes':
-            resource = 'favorites'
 
         url = 'http://soundcloud.com/%s/' % uploader
         resolv_url = self._resolv_url(url)
         user = self._download_json(
             resolv_url, uploader, 'Downloading user info')
-        base_url = 'http://api.soundcloud.com/users/%s/%s.json?' % (uploader, resource)
+
+        resource = mobj.group('rsrc') or 'all'
+        base_url = self._BASE_URL_MAP[resource] % user['id']
+
+        COMMON_QUERY = {
+            'limit': 50,
+            'client_id': self._CLIENT_ID,
+            'linked_partitioning': '1',
+        }
+
+        query = COMMON_QUERY.copy()
+        query['offset'] = 0
+
+        next_href = base_url + '?' + compat_urllib_parse_urlencode(query)
 
         entries = []
         for i in itertools.count():
-            data = compat_urllib_parse.urlencode({
-                'offset': i * 50,
-                'limit': 50,
-                'client_id': self._CLIENT_ID,
-            })
-            new_entries = self._download_json(
-                base_url + data, uploader, 'Downloading track page %s' % (i + 1))
-            if len(new_entries) == 0:
-                self.to_screen('%s: End page received' % uploader)
+            response = self._download_json(
+                next_href, uploader, 'Downloading track page %s' % (i + 1))
+
+            collection = response['collection']
+            if not collection:
+                break
+
+            def resolve_permalink_url(candidates):
+                for cand in candidates:
+                    if isinstance(cand, dict):
+                        permalink_url = cand.get('permalink_url')
+                        if permalink_url and permalink_url.startswith('http'):
+                            return permalink_url
+
+            for e in collection:
+                permalink_url = resolve_permalink_url((e, e.get('track'), e.get('playlist')))
+                if permalink_url:
+                    entries.append(self.url_result(permalink_url))
+
+            next_href = response.get('next_href')
+            if not next_href:
                 break
-            entries.extend(self.url_result(e['permalink_url'], 'Soundcloud') for e in new_entries)
+
+            parsed_next_href = compat_urlparse.urlparse(response['next_href'])
+            qs = compat_urlparse.parse_qs(parsed_next_href.query)
+            qs.update(COMMON_QUERY)
+            next_href = compat_urlparse.urlunparse(
+                parsed_next_href._replace(query=compat_urllib_parse_urlencode(qs, True)))
 
         return {
             '_type': 'playlist',
             'id': compat_str(user['id']),
-            'title': user['username'],
+            'title': '%s (%s)' % (user['username'], self._TITLE_MAP[resource]),
             'entries': entries,
         }
 
@@ -377,7 +459,7 @@ class SoundcloudPlaylistIE(SoundcloudIE):
         if token:
             data_dict['secret_token'] = token
 
-        data = compat_urllib_parse.urlencode(data_dict)
+        data = compat_urllib_parse_urlencode(data_dict)
         data = self._download_json(
             base_url + data, playlist_id, 'Downloading playlist')
 
@@ -390,3 +472,60 @@ class SoundcloudPlaylistIE(SoundcloudIE):
             'description': data.get('description'),
             'entries': entries,
         }
+
+
+class SoundcloudSearchIE(SearchInfoExtractor, SoundcloudIE):
+    IE_NAME = 'soundcloud:search'
+    IE_DESC = 'Soundcloud search'
+    _MAX_RESULTS = float('inf')
+    _TESTS = [{
+        'url': 'scsearch15:post-avant jazzcore',
+        'info_dict': {
+            'title': 'post-avant jazzcore',
+        },
+        'playlist_count': 15,
+    }]
+
+    _SEARCH_KEY = 'scsearch'
+    _MAX_RESULTS_PER_PAGE = 200
+    _DEFAULT_RESULTS_PER_PAGE = 50
+    _API_V2_BASE = 'https://api-v2.soundcloud.com'
+
+    def _get_collection(self, endpoint, collection_id, **query):
+        limit = min(
+            query.get('limit', self._DEFAULT_RESULTS_PER_PAGE),
+            self._MAX_RESULTS_PER_PAGE)
+        query['limit'] = limit
+        query['client_id'] = self._CLIENT_ID
+        query['linked_partitioning'] = '1'
+        query['offset'] = 0
+        data = compat_urllib_parse_urlencode(query)
+        next_url = '{0}{1}?{2}'.format(self._API_V2_BASE, endpoint, data)
+
+        collected_results = 0
+
+        for i in itertools.count(1):
+            response = self._download_json(
+                next_url, collection_id, 'Downloading page {0}'.format(i),
+                'Unable to download API page')
+
+            collection = response.get('collection', [])
+            if not collection:
+                break
+
+            collection = list(filter(bool, collection))
+            collected_results += len(collection)
+
+            for item in collection:
+                yield self.url_result(item['uri'], SoundcloudIE.ie_key())
+
+            if not collection or collected_results >= limit:
+                break
+
+            next_url = response.get('next_href')
+            if not next_url:
+                break
+
+    def _get_n_results(self, query, n):
+        tracks = self._get_collection('/search/tracks', query, limit=n, q=query)
+        return self.playlist_result(tracks, playlist_title=query)
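
Both the rewritten SoundcloudUserIE and the new SoundcloudSearchIE page through api-v2 the same way: request linked_partitioning results, consume the collection array, and follow next_href until it disappears or a page comes back empty. A generic sketch of that loop, with fetch_json standing in for _download_json:

import itertools

def paginate(fetch_json, first_url):
    # Yield items from a SoundCloud-style paginated collection.
    next_url = first_url
    for page in itertools.count(1):
        response = fetch_json(next_url)
        collection = response.get('collection') or []
        if not collection:
            break  # empty page: nothing left
        for item in collection:
            yield item
        next_url = response.get('next_href')
        if not next_url:
            break  # server signalled the last page
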
index 7fb165a872766f4c1917d2929ec8a73b8f74434e..87b6504682a4feb50ce65c3eea43d07041aae31c 100644 (file)
@@ -45,6 +45,14 @@ class SouthParkDeIE(SouthParkIE):
             'title': 'The Government Won\'t Respect My Privacy',
             'description': 'Cartman explains the benefits of "Shitter" to Stan, Kyle and Craig.',
         },
+    }, {
+        # non-ASCII characters in initial URL
+        'url': 'http://www.southpark.de/alle-episoden/s18e09-hashtag-aufwärmen',
+        'playlist_count': 4,
+    }, {
+        # non-ASCII characters in redirect URL
+        'url': 'http://www.southpark.de/alle-episoden/s18e09',
+        'playlist_count': 4,
     }]
 
 
diff --git a/youtube_dl/extractor/space.py b/youtube_dl/extractor/space.py
deleted file mode 100644 (file)
index c2d0d36..0000000
+++ /dev/null
@@ -1,38 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from .brightcove import BrightcoveIE
-from ..utils import RegexNotFoundError, ExtractorError
-
-
-class SpaceIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:(?:www|m)\.)?space\.com/\d+-(?P<title>[^/\.\?]*?)-video\.html'
-    _TEST = {
-        'add_ie': ['Brightcove'],
-        'url': 'http://www.space.com/23373-huge-martian-landforms-detail-revealed-by-european-probe-video.html',
-        'info_dict': {
-            'id': '2780937028001',
-            'ext': 'mp4',
-            'title': 'Huge Martian Landforms\' Detail Revealed By European Probe | Video',
-            'description': 'md5:db81cf7f3122f95ed234b631a6ea1e61',
-            'uploader': 'TechMedia Networks',
-        },
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        title = mobj.group('title')
-        webpage = self._download_webpage(url, title)
-        try:
-            # Some videos require the playerKey field, which isn't define in
-            # the BrightcoveExperience object
-            brightcove_url = self._og_search_video_url(webpage)
-        except RegexNotFoundError:
-            # Other videos works fine with the info from the object
-            brightcove_url = BrightcoveIE._extract_brightcove_url(webpage)
-        if brightcove_url is None:
-            raise ExtractorError(
-                'The webpage does not contain a video', expected=True)
-        return self.url_result(brightcove_url, BrightcoveIE.ie_key())
index 7f060b15b69908a71ff45a44f13e92ab0089ff5b..50433d0f678f27c348031dbe0d6fcc3774d021b7 100644 (file)
@@ -7,7 +7,7 @@ from .common import InfoExtractor
 
 class SpankBangIE(InfoExtractor):
     _VALID_URL = r'https?://(?:(?:www|[a-z]{2})\.)?spankbang\.com/(?P<id>[\da-z]+)/video'
-    _TEST = {
+    _TESTS = [{
         'url': 'http://spankbang.com/3vvn/video/fantasy+solo',
         'md5': '1cc433e1d6aa14bc376535b8679302f7',
         'info_dict': {
@@ -19,7 +19,11 @@ class SpankBangIE(InfoExtractor):
             'uploader': 'silly2587',
             'age_limit': 18,
         }
-    }
+    }, {
+        # 480p only
+        'url': 'http://spankbang.com/1vt0/video/solvane+gangbang',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
@@ -34,11 +38,12 @@ class SpankBangIE(InfoExtractor):
             'ext': 'mp4',
             'format_id': '%sp' % height,
             'height': int(height),
-        } for height in re.findall(r'<span[^>]+q_(\d+)p', webpage)]
+        } for height in re.findall(r'<(?:span|li|p)[^>]+[qb]_(\d+)p', webpage)]
+        self._check_formats(formats, video_id)
         self._sort_formats(formats)
 
         title = self._html_search_regex(
-            r'(?s)<h1>(.+?)</h1>', webpage, 'title')
+            r'(?s)<h1[^>]*>(.+?)</h1>', webpage, 'title')
         description = self._search_regex(
             r'class="desc"[^>]*>([^<]+)',
             webpage, 'description', default=None)
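
The widened regex above now also picks quality markers out of <li> and <p> tags and accepts a b_ prefix alongside q_. A check against hypothetical markup snippets:

import re

webpage = '<span class="q_720p">720p</span><li data-id="b_480p">480p</li>'  # made-up HTML
print(re.findall(r'<(?:span|li|p)[^>]+[qb]_(\d+)p', webpage))  # ['720', '480']
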
index 5fa6faf18b738aa32e384972bf65ad56188ad9b4..692fd78e886c0a6a932adce4659f2564beeab7e6 100644 (file)
@@ -6,9 +6,9 @@ from .common import InfoExtractor
 from ..compat import (
     compat_urllib_parse_unquote,
     compat_urllib_parse_urlparse,
-    compat_urllib_request,
 )
 from ..utils import (
+    sanitized_Request,
     str_to_int,
     unified_strdate,
 )
@@ -16,8 +16,9 @@ from ..aes import aes_decrypt_text
 
 
 class SpankwireIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>spankwire\.com/[^/]*/video(?P<videoid>[0-9]+)/?)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?(?P<url>spankwire\.com/[^/]*/video(?P<id>[0-9]+)/?)'
+    _TESTS = [{
+        # download URL pattern: */<height>P_<tbr>K_<video_id>.mp4
         'url': 'http://www.spankwire.com/Buckcherry-s-X-Rated-Music-Video-Crazy-Bitch/video103545/',
         'md5': '8bbfde12b101204b39e4b9fe7eb67095',
         'info_dict': {
@@ -30,14 +31,27 @@ class SpankwireIE(InfoExtractor):
             'upload_date': '20070507',
             'age_limit': 18,
         }
-    }
+    }, {
+        # download URL pattern: */mp4_<format_id>_<video_id>.mp4
+        'url': 'http://www.spankwire.com/Titcums-Compiloation-I/video1921551/',
+        'md5': '09b3c20833308b736ae8902db2f8d7e6',
+        'info_dict': {
+            'id': '1921551',
+            'ext': 'mp4',
+            'title': 'Titcums Compiloation I',
+            'description': 'cum on tits',
+            'uploader': 'dannyh78999',
+            'uploader_id': '3056053',
+            'upload_date': '20150822',
+            'age_limit': 18,
+        },
+    }]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('videoid')
-        url = 'http://www.' + mobj.group('url')
+        video_id = mobj.group('id')
 
-        req = compat_urllib_request.Request(url)
+        req = sanitized_Request('http://www.' + mobj.group('url'))
         req.add_header('Cookie', 'age_verified=1')
         webpage = self._download_webpage(req, video_id)
 
@@ -54,7 +68,7 @@ class SpankwireIE(InfoExtractor):
             r'by:\s*<a [^>]*>(.+?)</a>',
             webpage, 'uploader', fatal=False)
         uploader_id = self._html_search_regex(
-            r'by:\s*<a href="/Profile\.aspx\?.*?UserId=(\d+).*?"',
+            r'by:\s*<a href="/(?:user/viewProfile|Profile\.aspx)\?.*?UserId=(\d+).*?"',
             webpage, 'uploader id', fatal=False)
         upload_date = unified_strdate(self._html_search_regex(
             r'</a> on (.+?) at \d+:\d+',
@@ -67,9 +81,10 @@ class SpankwireIE(InfoExtractor):
             r'<span\s+id="spCommentCount"[^>]*>([\d,\.]+)</span>',
             webpage, 'comment count', fatal=False))
 
-        video_urls = list(map(
-            compat_urllib_parse_unquote,
-            re.findall(r'playerData\.cdnPath[0-9]{3,}\s*=\s*(?:encodeURIComponent\()?["\']([^"\']+)["\']', webpage)))
+        videos = re.findall(
+            r'playerData\.cdnPath([0-9]{3,})\s*=\s*(?:encodeURIComponent\()?["\']([^"\']+)["\']', webpage)
+        heights = [int(video[0]) for video in videos]
+        video_urls = list(map(compat_urllib_parse_unquote, [video[1] for video in videos]))
         if webpage.find('flashvars\.encrypted = "true"') != -1:
             password = self._search_regex(
                 r'flashvars\.video_title = "([^"]+)',
@@ -79,21 +94,22 @@ class SpankwireIE(InfoExtractor):
                 video_urls))
 
         formats = []
-        for video_url in video_urls:
+        for height, video_url in zip(heights, video_urls):
             path = compat_urllib_parse_urlparse(video_url).path
-            format = path.split('/')[4].split('_')[:2]
-            resolution, bitrate_str = format
-            format = "-".join(format)
-            height = int(resolution.rstrip('Pp'))
-            tbr = int(bitrate_str.rstrip('Kk'))
-            formats.append({
+            _, quality = path.split('/')[4].split('_')[:2]
+            f = {
                 'url': video_url,
-                'resolution': resolution,
-                'format': format,
-                'tbr': tbr,
                 'height': height,
-                'format_id': format,
-            })
+            }
+            tbr = self._search_regex(r'^(\d+)[Kk]$', quality, 'tbr', default=None)
+            if tbr:
+                f.update({
+                    'tbr': int(tbr),
+                    'format_id': '%dp' % height,
+                })
+            else:
+                f['format_id'] = quality
+            formats.append(f)
         self._sort_formats(formats)
 
         age_limit = self._rta_search(webpage)
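
The rewritten loop above pairs each CDN URL with the height captured from playerData.cdnPath<height>, and only trusts the second underscore-separated path component for tbr when it looks like <bitrate>K; otherwise that component becomes the format_id. How the split behaves on a sample URL of the first pattern (the URL itself is invented):

import re
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2

path = urlparse('http://cdn.example.com/a/b/c/720P_1500K_103545.mp4').path
_, quality = path.split('/')[4].split('_')[:2]  # -> '1500K'
tbr = re.search(r'^(\d+)[Kk]$', quality)
print(int(tbr.group(1)) if tbr else quality)  # -> 1500
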
index 5bd3c00875234c5efcf68772178b672261cc2a9f..39a7aaf9d630203dc1796b3b5621aad3c433f575 100644 (file)
@@ -58,7 +58,8 @@ class SpiegelIE(InfoExtractor):
         description = self._html_search_meta('description', webpage, 'description')
 
         base_url = self._search_regex(
-            r'var\s+server\s*=\s*"([^"]+)\"', webpage, 'server URL')
+            [r'server\s*:\s*(["\'])(?P<url>.+?)\1', r'var\s+server\s*=\s*"(?P<url>[^"]+)\"'],
+            webpage, 'server URL', group='url')
 
         xml_url = base_url + video_id + '.xml'
         idoc = self._download_xml(xml_url, video_id)
index 27f4033c547a9700db6af520c4fe4a957e3755c0..034bd47ff617bdc96d572b7065b3af03c7117468 100644 (file)
@@ -77,17 +77,21 @@ class SpiegeltvIE(InfoExtractor):
                     'rtmp_live': True,
                 })
             elif determine_ext(endpoint) == 'm3u8':
-                m3u8_formats = self._extract_m3u8_formats(
-                    endpoint.replace('[video]', play_path),
-                    video_id, 'm4v',
-                    preference=1,  # Prefer hls since it allows to workaround georestriction
-                    m3u8_id='hls', fatal=False)
-                if m3u8_formats is not False:
-                    formats.extend(m3u8_formats)
+                formats.append({
+                    'url': endpoint.replace('[video]', play_path),
+                    'ext': 'm4v',
+                    'format_id': 'hls',  # Prefer HLS since it allows working around georestriction
+                    'protocol': 'm3u8',
+                    'preference': 1,
+                    'http_headers': {
+                        'Accept-Encoding': 'deflate',  # gzip causes trouble on the server side
+                    },
+                })
             else:
                 formats.append({
                     'url': endpoint,
                 })
+        self._check_formats(formats, video_id)
 
         thumbnails = []
         for image in media_json['images']:
index dfe50ed4585b0fe876b8a300edd00a453ae4b690..7e67833062d0a21d2c663b1b5d24246d653f0116 100644 (file)
@@ -8,7 +8,7 @@ from ..utils import ExtractorError
 
 
 class Sport5IE(InfoExtractor):
-    _VALID_URL = r'http://(?:www|vod)?\.sport5\.co\.il/.*\b(?:Vi|docID)=(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www|vod)?\.sport5\.co\.il/.*\b(?:Vi|docID)=(?P<id>\d+)'
     _TESTS = [
         {
             'url': 'http://vod.sport5.co.il/?Vc=147&Vi=176331&Page=1',
index 86d509ae5351a3cc15be66dda9f485d63ec166ba..e5c28ae890ee61536052a5716677d486d0a5b43e 100644 (file)
@@ -6,6 +6,7 @@ import re
 from .common import InfoExtractor
 from ..compat import compat_urlparse
 from ..utils import (
+    js_to_json,
     unified_strdate,
 )
 
@@ -94,18 +95,32 @@ class SportBoxEmbedIE(InfoExtractor):
 
         webpage = self._download_webpage(url, video_id)
 
-        hls = self._search_regex(
-            r"sportboxPlayer\.jwplayer_common_params\.file\s*=\s*['\"]([^'\"]+)['\"]",
-            webpage, 'hls file')
+        formats = []
 
-        formats = self._extract_m3u8_formats(hls, video_id, 'mp4')
+        def cleanup_js(code):
+            # desktop_advert_config contains complex JavaScript that we don't need
+            return js_to_json(re.sub(r'desktop_advert_config.*', '', code))
 
-        title = self._search_regex(
-            r'sportboxPlayer\.node_title\s*=\s*"([^"]+)"', webpage, 'title')
+        jwplayer_data = self._parse_json(self._search_regex(
+            r'(?s)player\.setup\(({.+?})\);', webpage, 'jwplayer settings'), video_id,
+            transform_source=cleanup_js)
 
-        thumbnail = self._search_regex(
-            r'sportboxPlayer\.jwplayer_common_params\.image\s*=\s*"([^"]+)"',
-            webpage, 'thumbnail', default=None)
+        hls_url = jwplayer_data.get('hls_url')
+        if hls_url:
+            formats.extend(self._extract_m3u8_formats(
+                hls_url, video_id, ext='mp4', m3u8_id='hls'))
+
+        rtsp_url = jwplayer_data.get('rtsp_url')
+        if rtsp_url:
+            formats.append({
+                'url': rtsp_url,
+                'format_id': 'rtsp',
+            })
+
+        self._sort_formats(formats)
+
+        title = jwplayer_data['node_title']
+        thumbnail = jwplayer_data.get('image_url')
 
         return {
             'id': video_id,
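
The new code treats the page's player.setup({...}) argument as data: cleanup_js first drops the desktop_advert_config line, then js_to_json rewrites the remaining JS object literal (unquoted keys, single quotes) into strict JSON for _parse_json. The conversion step on a toy literal, assuming youtube_dl.utils.js_to_json's usual behavior:

import json

from youtube_dl.utils import js_to_json

code = "{hls_url: 'http://example.com/video.m3u8', node_title: 'Match highlights'}"  # toy JS literal
print(json.loads(js_to_json(code)))
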
index 1a57aebf16c4439174844b3e21bb42bf4331d02a..a9927f6e29d1d52463cefc3503414305bec0e919 100644 (file)
@@ -4,11 +4,9 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-)
 from ..utils import (
     parse_iso8601,
+    sanitized_Request,
 )
 
 
@@ -38,10 +36,12 @@ class SportDeutschlandIE(InfoExtractor):
             'upload_date': '20140825',
             'description': 'md5:60a20536b57cee7d9a4ec005e8687504',
             'timestamp': 1408976060,
+            'duration': 2732,
             'title': 'Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen: Herren Einzel, Wei Lee vs. Keun Lee',
             'thumbnail': 're:^https?://.*\.jpg$',
             'view_count': int,
             'categories': ['Li-Ning Badminton WM 2014'],
         }
     }]
 
@@ -50,20 +50,19 @@ class SportDeutschlandIE(InfoExtractor):
         video_id = mobj.group('id')
         sport_id = mobj.group('sport')
 
-        api_url = 'http://splink.tv/api/permalinks/%s/%s' % (
+        api_url = 'http://proxy.vidibusdynamic.net/sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
             sport_id, video_id)
-        req = compat_urllib_request.Request(api_url, headers={
+        req = sanitized_Request(api_url, headers={
             'Accept': 'application/vnd.vidibus.v2.html+json',
             'Referer': url,
         })
         data = self._download_json(req, video_id)
 
-        categories = list(data.get('section', {}).get('tags', {}).values())
         asset = data['asset']
-        assets_info = self._download_json(asset['url'], video_id)
+        categories = [data['section']['title']]
 
         formats = []
-        smil_url = assets_info['video']
+        smil_url = asset['video']
         if '.smil' in smil_url:
             m3u8_url = smil_url.replace('.smil', '.m3u8')
             formats.extend(
@@ -71,10 +70,12 @@ class SportDeutschlandIE(InfoExtractor):
 
             smil_doc = self._download_xml(
                 smil_url, video_id, note='Downloading SMIL metadata')
-            base_url = smil_doc.find('./head/meta').attrib['base']
+            base_url_el = smil_doc.find('./head/meta')
+            if base_url_el:
+                base_url = base_url_el.attrib['base']
             formats.extend([{
                 'format_id': 'rmtp',
-                'url': base_url,
+                'url': base_url if base_url_el else n.attrib['src'],
                 'play_path': n.attrib['src'],
                 'ext': 'flv',
                 'preference': -100,
@@ -91,6 +92,7 @@ class SportDeutschlandIE(InfoExtractor):
             'title': asset['title'],
             'thumbnail': asset.get('image'),
             'description': asset.get('teaser'),
+            'duration': asset.get('duration'),
             'categories': categories,
             'view_count': asset.get('views'),
             'rtmp_live': asset.get('live'),
diff --git a/youtube_dl/extractor/srf.py b/youtube_dl/extractor/srf.py
deleted file mode 100644 (file)
index 77eec0b..0000000
+++ /dev/null
@@ -1,104 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-from .common import InfoExtractor
-from ..utils import (
-    determine_ext,
-    parse_iso8601,
-    xpath_text,
-)
-
-
-class SrfIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.srf\.ch/play(?:er)?/tv/[^/]+/video/(?P<display_id>[^?]+)\?id=|tp\.srgssr\.ch/p/flash\?urn=urn:srf:ais:video:)(?P<id>[0-9a-f\-]{36})'
-    _TESTS = [{
-        'url': 'http://www.srf.ch/play/tv/10vor10/video/snowden-beantragt-asyl-in-russland?id=28e1a57d-5b76-4399-8ab3-9097f071e6c5',
-        'md5': '4cd93523723beff51bb4bee974ee238d',
-        'info_dict': {
-            'id': '28e1a57d-5b76-4399-8ab3-9097f071e6c5',
-            'display_id': 'snowden-beantragt-asyl-in-russland',
-            'ext': 'm4v',
-            'upload_date': '20130701',
-            'title': 'Snowden beantragt Asyl in Russland',
-            'timestamp': 1372713995,
-        }
-    }, {
-        # No Speichern (Save) button
-        'url': 'http://www.srf.ch/play/tv/top-gear/video/jaguar-xk120-shadow-und-tornado-dampflokomotive?id=677f5829-e473-4823-ac83-a1087fe97faa',
-        'md5': 'd97e236e80d1d24729e5d0953d276a4f',
-        'info_dict': {
-            'id': '677f5829-e473-4823-ac83-a1087fe97faa',
-            'display_id': 'jaguar-xk120-shadow-und-tornado-dampflokomotive',
-            'ext': 'flv',
-            'upload_date': '20130710',
-            'title': 'Jaguar XK120, Shadow und Tornado-Dampflokomotive',
-            'timestamp': 1373493600,
-        },
-    }, {
-        'url': 'http://www.srf.ch/player/tv/10vor10/video/snowden-beantragt-asyl-in-russland?id=28e1a57d-5b76-4399-8ab3-9097f071e6c5',
-        'only_matching': True,
-    }, {
-        'url': 'https://tp.srgssr.ch/p/flash?urn=urn:srf:ais:video:28e1a57d-5b76-4399-8ab3-9097f071e6c5',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        display_id = re.match(self._VALID_URL, url).group('display_id') or video_id
-
-        video_data = self._download_xml(
-            'http://il.srgssr.ch/integrationlayer/1.0/ue/srf/video/play/%s.xml' % video_id,
-            display_id)
-
-        title = xpath_text(
-            video_data, './AssetMetadatas/AssetMetadata/title', fatal=True)
-        thumbnails = [{
-            'url': s.text
-        } for s in video_data.findall('.//ImageRepresentation/url')]
-        timestamp = parse_iso8601(xpath_text(video_data, './createdDate'))
-        # The <duration> field in XML is different from the exact duration, skipping
-
-        formats = []
-        for item in video_data.findall('./Playlists/Playlist') + video_data.findall('./Downloads/Download'):
-            for url_node in item.findall('url'):
-                quality = url_node.attrib['quality']
-                full_url = url_node.text
-                original_ext = determine_ext(full_url)
-                format_id = '%s-%s' % (quality, item.attrib['protocol'])
-                if original_ext == 'f4m':
-                    formats.extend(self._extract_f4m_formats(
-                        full_url + '?hdcore=3.4.0', display_id, f4m_id=format_id))
-                elif original_ext == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        full_url, display_id, 'mp4', m3u8_id=format_id))
-                else:
-                    formats.append({
-                        'url': full_url,
-                        'ext': original_ext,
-                        'format_id': format_id,
-                        'quality': 0 if 'HD' in quality else -1,
-                        'preference': 1,
-                    })
-
-        self._sort_formats(formats)
-
-        subtitles = {}
-        subtitles_data = video_data.find('Subtitles')
-        if subtitles_data is not None:
-            subtitles_list = [{
-                'url': sub.text,
-                'ext': determine_ext(sub.text),
-            } for sub in subtitles_data]
-            if subtitles_list:
-                subtitles['de'] = subtitles_list
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'formats': formats,
-            'title': title,
-            'thumbnails': thumbnails,
-            'timestamp': timestamp,
-            'subtitles': subtitles,
-        }
diff --git a/youtube_dl/extractor/srgssr.py b/youtube_dl/extractor/srgssr.py
new file mode 100644 (file)
index 0000000..246970c
--- /dev/null
@@ -0,0 +1,155 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    parse_iso8601,
+    qualities,
+)
+
+
+class SRGSSRIE(InfoExtractor):
+    _VALID_URL = r'(?:https?://tp\.srgssr\.ch/p(?:/[^/]+)+\?urn=urn|srgssr):(?P<bu>srf|rts|rsi|rtr|swi):(?:[^:]+:)?(?P<type>video|audio):(?P<id>[0-9a-f\-]{36}|\d+)'
+
+    _ERRORS = {
+        'AGERATING12': 'To protect children under the age of 12, this video is only available between 8 p.m. and 6 a.m.',
+        'AGERATING18': 'To protect children under the age of 18, this video is only available between 11 p.m. and 5 a.m.',
+        # 'ENDDATE': 'For legal reasons, this video was only available for a specified period of time.',
+        'GEOBLOCK': 'For legal reasons, this video is only available in Switzerland.',
+        'LEGAL': 'The video cannot be transmitted for legal reasons.',
+        'STARTDATE': 'This video is not yet available. Please try again later.',
+    }
+
+    def get_media_data(self, bu, media_type, media_id):
+        media_data = self._download_json(
+            'http://il.srgssr.ch/integrationlayer/1.0/ue/%s/%s/play/%s.json' % (bu, media_type, media_id),
+            media_id)[media_type.capitalize()]
+
+        if media_data.get('block') and media_data['block'] in self._ERRORS:
+            raise ExtractorError('%s said: %s' % (
+                self.IE_NAME, self._ERRORS[media_data['block']]), expected=True)
+
+        return media_data
+
+    def _real_extract(self, url):
+        bu, media_type, media_id = re.match(self._VALID_URL, url).groups()
+
+        if bu == 'rts':
+            return self.url_result('rts:%s' % media_id, 'RTS')
+
+        media_data = self.get_media_data(bu, media_type, media_id)
+
+        metadata = media_data['AssetMetadatas']['AssetMetadata'][0]
+        title = metadata['title']
+        description = metadata.get('description')
+        created_date = media_data.get('createdDate') or metadata.get('createdDate')
+        timestamp = parse_iso8601(created_date)
+
+        thumbnails = [{
+            'id': image.get('id'),
+            'url': image['url'],
+        } for image in media_data.get('Image', {}).get('ImageRepresentations', {}).get('ImageRepresentation', [])]
+
+        preference = qualities(['LQ', 'MQ', 'SD', 'HQ', 'HD'])
+        formats = []
+        for source in media_data.get('Playlists', {}).get('Playlist', []) + media_data.get('Downloads', {}).get('Download', []):
+            protocol = source.get('@protocol')
+            for asset in source['url']:
+                asset_url = asset['text']
+                quality = asset['@quality']
+                format_id = '%s-%s' % (protocol, quality)
+                if protocol == 'HTTP-HDS':
+                    formats.extend(self._extract_f4m_formats(
+                        asset_url + '?hdcore=3.4.0', media_id,
+                        f4m_id=format_id, fatal=False))
+                elif protocol == 'HTTP-HLS':
+                    formats.extend(self._extract_m3u8_formats(
+                        asset_url, media_id, 'mp4', 'm3u8_native',
+                        m3u8_id=format_id, fatal=False))
+                else:
+                    formats.append({
+                        'format_id': format_id,
+                        'url': asset_url,
+                        'preference': preference(quality),
+                        'ext': 'flv' if protocol == 'RTMP' else None,
+                    })
+        self._sort_formats(formats)
+
+        return {
+            'id': media_id,
+            'title': title,
+            'description': description,
+            'timestamp': timestamp,
+            'thumbnails': thumbnails,
+            'formats': formats,
+        }
+
+
+class SRGSSRPlayIE(InfoExtractor):
+    IE_DESC = 'srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites'
+    _VALID_URL = r'https?://(?:(?:www|play)\.)?(?P<bu>srf|rts|rsi|rtr|swissinfo)\.ch/play/(?:tv|radio)/[^/]+/(?P<type>video|audio)/[^?]+\?id=(?P<id>[0-9a-f\-]{36}|\d+)'
+
+    _TESTS = [{
+        'url': 'http://www.srf.ch/play/tv/10vor10/video/snowden-beantragt-asyl-in-russland?id=28e1a57d-5b76-4399-8ab3-9097f071e6c5',
+        'md5': '4cd93523723beff51bb4bee974ee238d',
+        'info_dict': {
+            'id': '28e1a57d-5b76-4399-8ab3-9097f071e6c5',
+            'ext': 'm4v',
+            'upload_date': '20130701',
+            'title': 'Snowden beantragt Asyl in Russland',
+            'timestamp': 1372713995,
+        }
+    }, {
+        # No Speichern (Save) button
+        'url': 'http://www.srf.ch/play/tv/top-gear/video/jaguar-xk120-shadow-und-tornado-dampflokomotive?id=677f5829-e473-4823-ac83-a1087fe97faa',
+        'md5': '0a274ce38fda48c53c01890651985bc6',
+        'info_dict': {
+            'id': '677f5829-e473-4823-ac83-a1087fe97faa',
+            'ext': 'flv',
+            'upload_date': '20130710',
+            'title': 'Jaguar XK120, Shadow und Tornado-Dampflokomotive',
+            'description': 'md5:88604432b60d5a38787f152dec89cd56',
+            'timestamp': 1373493600,
+        },
+    }, {
+        'url': 'http://www.rtr.ch/play/radio/actualitad/audio/saira-tujetsch-tuttina-cuntinuar-cun-sedrun-muster-turissem?id=63cb0778-27f8-49af-9284-8c7a8c6d15fc',
+        'info_dict': {
+            'id': '63cb0778-27f8-49af-9284-8c7a8c6d15fc',
+            'ext': 'mp3',
+            'upload_date': '20151013',
+            'title': 'Saira: Tujetsch - tuttina cuntinuar cun Sedrun Mustér Turissem',
+            'timestamp': 1444750398,
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.rts.ch/play/tv/-/video/le-19h30?id=6348260',
+        'md5': '67a2a9ae4e8e62a68d0e9820cc9782df',
+        'info_dict': {
+            'id': '6348260',
+            'display_id': '6348260',
+            'ext': 'mp4',
+            'duration': 1796,
+            'title': 'Le 19h30',
+            'description': '',
+            'uploader': '19h30',
+            'upload_date': '20141201',
+            'timestamp': 1417458600,
+            'thumbnail': 're:^https?://.*\.image',
+            'view_count': int,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }]
+
+    def _real_extract(self, url):
+        bu, media_type, media_id = re.match(self._VALID_URL, url).groups()
+        # other info can be extracted from url + '&layout=json'
+        return self.url_result('srgssr:%s:%s:%s' % (bu[:3], media_type, media_id), 'SRGSSR')
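
SRGSSRIE ranks its formats with the qualities() helper, which maps a label to its index in the given list (unknown labels get -1), so HD outranks SD and the plain HTTP downloads sort sensibly against each other. A minimal demonstration:

from youtube_dl.utils import qualities

preference = qualities(['LQ', 'MQ', 'SD', 'HQ', 'HD'])
assert preference('HD') > preference('SD')  # 4 > 2
assert preference('unknown') == -1          # labels outside the list rank lowest
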
index 5d583c720bff22bee0b5af55699a378e3a5267ea..74d01183f5f396fb9499a8426775886faed5961d 100644 (file)
@@ -1,17 +1,18 @@
 # encoding: utf-8
 from __future__ import unicode_literals
 
-import json
+from .ard import ARDMediathekIE
+from ..utils import (
+    ExtractorError,
+    get_element_by_attribute,
+)
 
-from .common import InfoExtractor
-from ..utils import js_to_json
 
-
-class SRMediathekIE(InfoExtractor):
+class SRMediathekIE(ARDMediathekIE):
     IE_DESC = 'Saarländischer Rundfunk'
     _VALID_URL = r'https?://sr-mediathek\.sr-online\.de/index\.php\?.*?&id=(?P<id>[0-9]+)'
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://sr-mediathek.sr-online.de/index.php?seite=7&id=28455',
         'info_dict': {
             'id': '28455',
@@ -20,24 +21,36 @@ class SRMediathekIE(InfoExtractor):
             'description': 'Ringen: KSV Köllerbach gegen Aachen-Walheim; Frauen-Fußball: 1. FC Saarbrücken gegen Sindelfingen; Motorsport: Rallye in Losheim; dazu: Interview mit Timo Bernhard; Turnen: TG Saar; Reitsport: Deutscher Voltigier-Pokal; Badminton: Interview mit Michael Fuchs ',
             'thumbnail': 're:^https?://.*\.jpg$',
         },
-    }
+        'skip': 'no longer available',
+    }, {
+        'url': 'http://sr-mediathek.sr-online.de/index.php?seite=7&id=37682',
+        'info_dict': {
+            'id': '37682',
+            'ext': 'mp4',
+            'title': 'Love, Cakes and Rock\'n\'Roll',
+            'description': 'md5:18bf9763631c7d326c22603681e1123d',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'expected_warnings': ['Unable to download f4m manifest']
+    }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
 
-        murls = json.loads(js_to_json(self._search_regex(
-            r'var mediaURLs\s*=\s*(.*?);\n', webpage, 'video URLs')))
-        formats = [{'url': murl} for murl in murls]
-        self._sort_formats(formats)
-
-        title = json.loads(js_to_json(self._search_regex(
-            r'var mediaTitles\s*=\s*(.*?);\n', webpage, 'title')))[0]
+        if '>Der gew&uuml;nschte Beitrag ist leider nicht mehr verf&uuml;gbar.<' in webpage:
+            raise ExtractorError('Video %s is no longer available' % video_id, expected=True)
 
-        return {
+        media_collection_url = self._search_regex(
+            r'data-mediacollection-ardplayer="([^"]+)"', webpage, 'media collection url')
+        info = self._extract_media_info(media_collection_url, webpage, video_id)
+        info.update({
             'id': video_id,
-            'title': title,
-            'formats': formats,
+            'title': get_element_by_attribute('class', 'ardplayer-title', webpage),
             'description': self._og_search_description(webpage),
             'thumbnail': self._og_search_thumbnail(webpage),
-        }
+        })
+        return info
index 13101c7146244181f62a634b773da446e2f5e79a..54d1843f2200d0cef7fa2e7b192f673d316c5f18 100644 (file)
@@ -8,7 +8,7 @@ from ..utils import (
 
 
 class SSAIE(InfoExtractor):
-    _VALID_URL = r'http://ssa\.nls\.uk/film/(?P<id>\d+)'
+    _VALID_URL = r'https?://ssa\.nls\.uk/film/(?P<id>\d+)'
     _TEST = {
         'url': 'http://ssa.nls.uk/film/3561',
         'info_dict': {
index 183dcb03cccb61a2f843d5c1b511050fc4bce75d..1a831ef6da5f4076dbbab4c989562d7e183d1f43 100644 (file)
@@ -22,23 +22,23 @@ class SteamIE(InfoExtractor):
     _VIDEO_PAGE_TEMPLATE = 'http://store.steampowered.com/video/%s/'
     _AGECHECK_TEMPLATE = 'http://store.steampowered.com/agecheck/video/%s/?snr=1_agecheck_agecheck__age-gate&ageDay=1&ageMonth=January&ageYear=1970'
     _TESTS = [{
-        "url": "http://store.steampowered.com/video/105600/",
-        "playlist": [
+        'url': 'http://store.steampowered.com/video/105600/',
+        'playlist': [
             {
-                "md5": "f870007cee7065d7c76b88f0a45ecc07",
-                "info_dict": {
+                'md5': 'f870007cee7065d7c76b88f0a45ecc07',
+                'info_dict': {
                     'id': '81300',
                     'ext': 'flv',
-                    "title": "Terraria 1.1 Trailer",
+                    'title': 'Terraria 1.1 Trailer',
                     'playlist_index': 1,
                 }
             },
             {
-                "md5": "61aaf31a5c5c3041afb58fb83cbb5751",
-                "info_dict": {
+                'md5': '61aaf31a5c5c3041afb58fb83cbb5751',
+                'info_dict': {
                     'id': '80859',
                     'ext': 'flv',
-                    "title": "Terraria Trailer",
+                    'title': 'Terraria Trailer',
                     'playlist_index': 2,
                 }
             }
diff --git a/youtube_dl/extractor/stitcher.py b/youtube_dl/extractor/stitcher.py
new file mode 100644 (file)
index 0000000..d5c852f
--- /dev/null
@@ -0,0 +1,81 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    determine_ext,
+    int_or_none,
+    js_to_json,
+    unescapeHTML,
+)
+
+
+class StitcherIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?stitcher\.com/podcast/(?:[^/]+/)+e/(?:(?P<display_id>[^/#?&]+?)-)?(?P<id>\d+)(?:[/#?&]|$)'
+    _TESTS = [{
+        'url': 'http://www.stitcher.com/podcast/the-talking-machines/e/40789481?autoplay=true',
+        'md5': '391dd4e021e6edeb7b8e68fbf2e9e940',
+        'info_dict': {
+            'id': '40789481',
+            'ext': 'mp3',
+            'title': 'Machine Learning Mastery and Cancer Clusters',
+            'description': 'md5:55163197a44e915a14a1ac3a1de0f2d3',
+            'duration': 1604,
+            'thumbnail': 're:^https?://.*\.jpg',
+        },
+    }, {
+        'url': 'http://www.stitcher.com/podcast/panoply/vulture-tv/e/the-rare-hourlong-comedy-plus-40846275?autoplay=true',
+        'info_dict': {
+            'id': '40846275',
+            'display_id': 'the-rare-hourlong-comedy-plus',
+            'ext': 'mp3',
+            'title': "The CW's 'Crazy Ex-Girlfriend'",
+            'description': 'md5:04f1e2f98eb3f5cbb094cea0f9e19b17',
+            'duration': 2235,
+            'thumbnail': 're:^https?://.*\.jpg',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # escaped title
+        'url': 'http://www.stitcher.com/podcast/marketplace-on-stitcher/e/40910226?autoplay=true',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.stitcher.com/podcast/panoply/getting-in/e/episode-2a-how-many-extracurriculars-should-i-have-40876278?autoplay=true',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        audio_id = mobj.group('id')
+        display_id = mobj.group('display_id') or audio_id
+
+        webpage = self._download_webpage(url, display_id)
+
+        episode = self._parse_json(
+            js_to_json(self._search_regex(
+                r'(?s)var\s+stitcher\s*=\s*({.+?});\n', webpage, 'episode config')),
+            display_id)['config']['episode']
+
+        title = unescapeHTML(episode['title'])
+        formats = [{
+            'url': episode[episode_key],
+            'ext': determine_ext(episode[episode_key]) or 'mp3',
+            'vcodec': 'none',
+        } for episode_key in ('episodeURL',) if episode.get(episode_key)]
+        description = self._search_regex(
+            r'Episode Info:\s*</span>([^<]+)<', webpage, 'description', fatal=False)
+        duration = int_or_none(episode.get('duration'))
+        thumbnail = episode.get('episodeImage')
+
+        return {
+            'id': audio_id,
+            'display_id': display_id,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'thumbnail': thumbnail,
+            'formats': formats,
+        }
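
A minimal stand-alone sketch of the "var stitcher = {...};" parsing used by
StitcherIE above. The HTML snippet and values are invented; js_to_json is
youtube-dl's own helper for coercing a JavaScript object literal to JSON:

    import json
    import re

    from youtube_dl.utils import js_to_json

    # Invented page snippet mimicking the embedded episode config
    webpage = ("<script>var stitcher = {config: {episode: "
               "{title: 'Demo', episodeURL: 'http://example.com/a.mp3'}}};\n</script>")
    raw = re.search(r"(?s)var\s+stitcher\s*=\s*({.+?});\n", webpage).group(1)
    episode = json.loads(js_to_json(raw))['config']['episode']
    print(episode['title'], episode['episodeURL'])  # Demo http://example.com/a.mp3
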
index d4e1340158da92de09776b39a4dfebe8e38aeddf..712359885fde90fa3032aeff1b2cb74afb761f35 100644 (file)
@@ -4,9 +4,9 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
+from ..utils import (
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
@@ -37,13 +37,13 @@ class StreamcloudIE(InfoExtractor):
             (?:id="[^"]+"\s+)?
             value="([^"]*)"
             ''', orig_webpage)
-        post = compat_urllib_parse.urlencode(fields)
+        post = urlencode_postdata(fields)
 
         self._sleep(12, video_id)
         headers = {
             b'Content-Type': b'application/x-www-form-urlencoded',
         }
-        req = compat_urllib_request.Request(url, post, headers)
+        req = sanitized_Request(url, post, headers)
 
         webpage = self._download_webpage(
             req, video_id, note='Downloading video page ...')
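
A note on the urlencode_postdata swap above: it is urlencode plus an
.encode('ascii'), producing the bytes object a urllib POST body must be on
Python 3. A minimal check (the field names here are illustrative only):

    from youtube_dl.utils import urlencode_postdata

    # A list of pairs keeps field order, just like the form fields scraped above
    print(urlencode_postdata([('op', 'download1'), ('id', 'abc123')]))
    # -> b'op=download1&id=abc123'
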
index e92b93285c92ad9d049f2092fba9b70884057e8f..d3d2b7eb7a6fa9db4008365e62e046b83490b064 100644 (file)
@@ -5,11 +5,9 @@ import hashlib
 import time
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-)
 from ..utils import (
     int_or_none,
+    sanitized_Request,
 )
 
 
@@ -54,7 +52,7 @@ class StreamCZIE(InfoExtractor):
         video_id = self._match_id(url)
         api_path = '/episode/%s' % video_id
 
-        req = compat_urllib_request.Request(self._API_URL + api_path)
+        req = sanitized_Request(self._API_URL + api_path)
         req.add_header('Api-Password', _get_api_key(api_path))
         data = self._download_json(req, video_id)
 
index 6a57fa60a5a2ea877f65a1af045f14d05377c1a9..e529051d100b8024007229200648ea259b3d1677 100644 (file)
@@ -14,7 +14,6 @@ class StreetVoiceIE(InfoExtractor):
         'info_dict': {
             'id': '94440',
             'ext': 'mp3',
-            'filesize': 4167053,
             'title': '輸',
             'description': 'Crispy脆樂團 - 輸',
             'thumbnail': 're:^https?://.*\.jpg$',
@@ -32,20 +31,19 @@ class StreetVoiceIE(InfoExtractor):
         song_id = self._match_id(url)
 
         song = self._download_json(
-            'http://streetvoice.com/music/api/song/%s' % song_id, song_id)
+            'https://streetvoice.com/api/v1/public/song/%s/' % song_id, song_id, data=b'')
 
         title = song['name']
-        author = song['musician']['name']
+        author = song['user']['nickname']
 
         return {
             'id': song_id,
             'url': song['file'],
-            'filesize': song.get('size'),
             'title': title,
             'description': '%s - %s' % (author, title),
             'thumbnail': self._proto_relative_url(song.get('image'), 'http:'),
             'duration': song.get('length'),
             'upload_date': unified_strdate(song.get('created_at')),
             'uploader': author,
-            'uploader_id': compat_str(song['musician']['id']),
+            'uploader_id': compat_str(song['user']['id']),
         }
index fc20f664b7f4e1e6267e5cbad7a191e723e204e3..2ab30e45ff7c65ab7dd1d6cff7a1952764799cc0 100644 (file)
@@ -19,24 +19,37 @@ class SVTBaseIE(InfoExtractor):
         video_info = info['video']
         formats = []
         for vr in video_info['videoReferences']:
+            player_type = vr.get('playerType')
             vurl = vr['url']
             ext = determine_ext(vurl)
             if ext == 'm3u8':
                 formats.extend(self._extract_m3u8_formats(
                     vurl, video_id,
                     ext='mp4', entry_protocol='m3u8_native',
-                    m3u8_id=vr.get('playerType')))
+                    m3u8_id=player_type, fatal=False))
             elif ext == 'f4m':
                 formats.extend(self._extract_f4m_formats(
                     vurl + '?hdcore=3.3.0', video_id,
-                    f4m_id=vr.get('playerType')))
+                    f4m_id=player_type, fatal=False))
+            elif ext == 'mpd':
+                if player_type == 'dashhbbtv':
+                    formats.extend(self._extract_mpd_formats(
+                        vurl, video_id, mpd_id=player_type, fatal=False))
             else:
                 formats.append({
-                    'format_id': vr.get('playerType'),
+                    'format_id': player_type,
                     'url': vurl,
                 })
         self._sort_formats(formats)
 
+        subtitles = {}
+        subtitle_references = video_info.get('subtitleReferences')
+        if isinstance(subtitle_references, list):
+            for sr in subtitle_references:
+                subtitle_url = sr.get('url')
+                if subtitle_url:
+                    subtitles.setdefault('sv', []).append({'url': subtitle_url})
+
         duration = video_info.get('materialLength')
         age_limit = 18 if video_info.get('inappropriateForChildren') else 0
 
@@ -44,6 +57,7 @@ class SVTBaseIE(InfoExtractor):
             'id': video_id,
             'title': title,
             'formats': formats,
+            'subtitles': subtitles,
             'thumbnail': thumbnail,
             'duration': duration,
             'age_limit': age_limit,
@@ -83,30 +97,23 @@ class SVTIE(SVTBaseIE):
 class SVTPlayIE(SVTBaseIE):
     IE_DESC = 'SVT Play and Öppet arkiv'
     _VALID_URL = r'https?://(?:www\.)?(?P<host>svtplay|oppetarkiv)\.se/video/(?P<id>[0-9]+)'
-    _TESTS = [{
-        'url': 'http://www.svtplay.se/video/2609989/sm-veckan/sm-veckan-rally-final-sasong-1-sm-veckan-rally-final',
-        'md5': 'ade3def0643fa1c40587a422f98edfd9',
-        'info_dict': {
-            'id': '2609989',
-            'ext': 'flv',
-            'title': 'SM veckan vinter, Örebro - Rally, final',
-            'duration': 4500,
-            'thumbnail': 're:^https?://.*[\.-]jpg$',
-            'age_limit': 0,
-        },
-    }, {
-        'url': 'http://www.oppetarkiv.se/video/1058509/rederiet-sasong-1-avsnitt-1-av-318',
-        'md5': 'c3101a17ce9634f4c1f9800f0746c187',
+    _TEST = {
+        'url': 'http://www.svtplay.se/video/5996901/flygplan-till-haile-selassie/flygplan-till-haile-selassie-2',
+        'md5': '2b6704fe4a28801e1a098bbf3c5ac611',
         'info_dict': {
-            'id': '1058509',
-            'ext': 'flv',
-            'title': 'Farlig kryssning',
-            'duration': 2566,
+            'id': '5996901',
+            'ext': 'mp4',
+            'title': 'Flygplan till Haile Selassie',
+            'duration': 3527,
             'thumbnail': 're:^https?://.*[\.-]jpg$',
             'age_limit': 0,
+            'subtitles': {
+                'sv': [{
+                    'ext': 'wsrt',
+                }]
+            },
         },
-        'skip': 'Only works from Sweden',
-    }]
+    }
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
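
For reference, the subtitles structure assembled above follows youtube-dl's
convention of a language code mapped to a list of subtitle dicts. A minimal
sketch with invented reference data ('sv' is the hard-coded Swedish code):

    subtitles = {}
    # One valid reference and one without a url, as the .get() guard expects
    subtitle_references = [{'url': 'http://example.com/ep.wsrt'}, {}]
    for sr in subtitle_references:
        subtitle_url = sr.get('url')
        if subtitle_url:
            subtitles.setdefault('sv', []).append({'url': subtitle_url})
    assert subtitles == {'sv': [{'url': 'http://example.com/ep.wsrt'}]}
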
index aa5964acb6b3f40b0d663bd2169ac6aec0c210ae..f562aa6d386ee891f4ab3a724bef53e20a6cec92 100644 (file)
@@ -5,7 +5,7 @@ from .common import InfoExtractor
 
 
 class SztvHuIE(InfoExtractor):
-    _VALID_URL = r'http://(?:(?:www\.)?sztv\.hu|www\.tvszombathely\.hu)/(?:[^/]+)/.+-(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:(?:www\.)?sztv\.hu|www\.tvszombathely\.hu)/(?:[^/]+)/.+-(?P<id>[0-9]+)'
     _TEST = {
         'url': 'http://sztv.hu/hirek/cserkeszek-nepszerusitettek-a-kornyezettudatos-eletmodot-a-savaria-teren-20130909',
         'md5': 'a6df607b11fb07d0e9f2ad94613375cb',
index f1f43d0a7113cbf40e5dfd3ffb71af5e900fab78..ed560bd246f4e588b9d63be4dcc0f34388d46f89 100644 (file)
@@ -4,19 +4,17 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-)
 from ..utils import (
     clean_html,
     ExtractorError,
     float_or_none,
     parse_iso8601,
+    sanitized_Request,
 )
 
 
 class TapelyIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?tape\.ly/(?P<id>[A-Za-z0-9\-_]+)(?:/(?P<songnr>\d+))?'
+    _VALID_URL = r'https?://(?:www\.)?(?:tape\.ly|tapely\.com)/(?P<id>[A-Za-z0-9\-_]+)(?:/(?P<songnr>\d+))?'
     _API_URL = 'http://tape.ly/showtape?id={0:}'
     _S3_SONG_URL = 'http://mytape.s3.amazonaws.com/{0:}'
     _SOUNDCLOUD_SONG_URL = 'http://api.soundcloud.com{0:}'
@@ -42,6 +40,10 @@ class TapelyIE(InfoExtractor):
                 'ext': 'm4a',
             },
         },
+        {
+            'url': 'https://tapely.com/my-grief-as-told-by-water',
+            'only_matching': True,
+        },
     ]
 
     def _real_extract(self, url):
@@ -49,7 +51,7 @@ class TapelyIE(InfoExtractor):
         display_id = mobj.group('id')
 
         playlist_url = self._API_URL.format(display_id)
-        request = compat_urllib_request.Request(playlist_url)
+        request = sanitized_Request(playlist_url)
         request.add_header('X-Requested-With', 'XMLHttpRequest')
         request.add_header('Accept', 'application/json')
         request.add_header('Referer', url)
diff --git a/youtube_dl/extractor/tdslifeway.py b/youtube_dl/extractor/tdslifeway.py
new file mode 100644 (file)
index 0000000..4d1f5c8
--- /dev/null
@@ -0,0 +1,33 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class TDSLifewayIE(InfoExtractor):
+    _VALID_URL = r'https?://tds\.lifeway\.com/v1/trainingdeliverysystem/courses/(?P<id>\d+)/index\.html'
+
+    _TEST = {
+        # From http://www.ministrygrid.com/training-viewer/-/training/t4g-2014-conference/the-gospel-by-numbers-4/the-gospel-by-numbers
+        'url': 'http://tds.lifeway.com/v1/trainingdeliverysystem/courses/3453494717001/index.html?externalRegistration=AssetId%7C34F466F1-78F3-4619-B2AB-A8EFFA55E9E9%21InstanceId%7C0%21UserId%7Caaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa&grouping=http%3A%2F%2Flifeway.com%2Fvideo%2F3453494717001&activity_id=http%3A%2F%2Flifeway.com%2Fvideo%2F3453494717001&content_endpoint=http%3A%2F%2Ftds.lifeway.com%2Fv1%2Ftrainingdeliverysystem%2FScormEngineInterface%2FTCAPI%2Fcontent%2F&actor=%7B%22name%22%3A%5B%22Guest%20Guest%22%5D%2C%22account%22%3A%5B%7B%22accountServiceHomePage%22%3A%22http%3A%2F%2Fscorm.lifeway.com%2F%22%2C%22accountName%22%3A%22aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa%22%7D%5D%2C%22objectType%22%3A%22Agent%22%7D&content_token=462a50b2-b6f9-4970-99b1-930882c499fb&registration=93d6ec8e-7f7b-4ed3-bbc8-a857913c0b2a&externalConfiguration=access%7CFREE%21adLength%7C-1%21assignOrgId%7C4AE36F78-299A-425D-91EF-E14A899B725F%21assignOrgParentId%7C%21courseId%7C%21isAnonymous%7Cfalse%21previewAsset%7Cfalse%21previewLength%7C-1%21previewMode%7Cfalse%21royalty%7CFREE%21sessionId%7C671422F9-8E79-48D4-9C2C-4EE6111EA1CD%21trackId%7C&auth=Basic%20OjhmZjk5MDBmLTBlYTMtNDJhYS04YjFlLWE4MWQ3NGNkOGRjYw%3D%3D&endpoint=http%3A%2F%2Ftds.lifeway.com%2Fv1%2Ftrainingdeliverysystem%2FScormEngineInterface%2FTCAPI%2F',
+        'info_dict': {
+            'id': '3453494717001',
+            'ext': 'mp4',
+            'title': 'The Gospel by Numbers',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'upload_date': '20140410',
+            'description': 'Coming soon from T4G 2014!',
+            'uploader_id': '2034960640001',
+            'timestamp': 1397145591,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'add_ie': ['BrightcoveNew'],
+    }
+
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/2034960640001/default_default/index.html?videoId=%s'
+
+    def _real_extract(self, url):
+        brightcove_id = self._match_id(url)
+        return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
index 117afa9bf498eb063f325504c2301a3eb7ff0d56..e0477382ceabea0769bd0575ceb1f350ce8c0911 100644 (file)
@@ -16,6 +16,7 @@ class TeachingChannelIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'A History of Teaming',
             'description': 'md5:2a9033db8da81f2edffa4c99888140b3',
+            'duration': 422.255,
         },
         'params': {
             # m3u8 download
index d1b7264b4ca4a0cb72e491da26d7f5bbc1cc66b7..b49ab5f5b98c2d6219d1d17a1c0aea02eb534f61 100644 (file)
@@ -16,7 +16,7 @@ from ..compat import compat_ord
 
 
 class TeamcocoIE(InfoExtractor):
-    _VALID_URL = r'http://teamcoco\.com/video/(?P<video_id>[0-9]+)?/?(?P<display_id>.*)'
+    _VALID_URL = r'https?://teamcoco\.com/video/(?P<video_id>[0-9]+)?/?(?P<display_id>.*)'
     _TESTS = [
         {
             'url': 'http://teamcoco.com/video/80187/conan-becomes-a-mary-kay-beauty-consultant',
index a48d77c309dcd1f9984cd0a6c71b7af574ca5498..cf8851438bb74000abb2692c34607f3137505f1d 100644 (file)
@@ -73,7 +73,7 @@ class TEDIE(InfoExtractor):
         'add_ie': ['Youtube'],
         'info_dict': {
             'id': '_ZG8HBuDjgc',
-            'ext': 'mp4',
+            'ext': 'webm',
             'title': 'Douglas Adams: Parrots the Universe and Everything',
             'description': 'md5:01ad1e199c49ac640cb1196c0e9016af',
             'uploader': 'University of California Television (UCTV)',
diff --git a/youtube_dl/extractor/tele13.py b/youtube_dl/extractor/tele13.py
new file mode 100644 (file)
index 0000000..a29a64b
--- /dev/null
@@ -0,0 +1,88 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from .youtube import YoutubeIE
+from ..utils import (
+    js_to_json,
+    qualities,
+    determine_ext,
+)
+
+
+class Tele13IE(InfoExtractor):
+    _VALID_URL = r'^https?://(?:www\.)?t13\.cl/videos(?:/[^/]+)+/(?P<id>[\w-]+)'
+    _TESTS = [
+        {
+            'url': 'http://www.t13.cl/videos/actualidad/el-circulo-de-hierro-de-michelle-bachelet-en-su-regreso-a-la-moneda',
+            'md5': '4cb1fa38adcad8fea88487a078831755',
+            'info_dict': {
+                'id': 'el-circulo-de-hierro-de-michelle-bachelet-en-su-regreso-a-la-moneda',
+                'ext': 'mp4',
+                'title': 'El círculo de hierro de Michelle Bachelet en su regreso a La Moneda',
+            },
+            'params': {
+                # HTTP Error 404: Not Found
+                'skip_download': True,
+            },
+        },
+        {
+            'url': 'http://www.t13.cl/videos/mundo/tendencias/video-captan-misteriosa-bola-fuego-cielos-bangkok',
+            'md5': '867adf6a3b3fef932c68a71d70b70946',
+            'info_dict': {
+                'id': 'rOoKv2OMpOw',
+                'ext': 'mp4',
+                'title': 'Shooting star seen on 7-Sep-2015',
+                'description': 'md5:7292ff2a34b2f673da77da222ae77e1e',
+                'uploader': 'Porjai Jaturongkhakun',
+                'upload_date': '20150906',
+                'uploader_id': 'UCnLY_3ezwNcDSC_Wc6suZxw',
+            },
+            'add_ie': ['Youtube'],
+        }
+    ]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+
+        setup_js = self._search_regex(
+            r"(?s)jwplayer\('player-vivo'\).setup\((\{.*?\})\)",
+            webpage, 'setup code')
+        sources = self._parse_json(self._search_regex(
+            r'sources\s*:\s*(\[[^\]]+\])', setup_js, 'sources'),
+            display_id, js_to_json)
+
+        preference = qualities(['Móvil', 'SD', 'HD'])
+        formats = []
+        urls = []
+        for f in sources:
+            format_url = f['file']
+            if format_url and format_url not in urls:
+                ext = determine_ext(format_url)
+                if ext == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        format_url, display_id, 'mp4', 'm3u8_native',
+                        m3u8_id='hls', fatal=False))
+                elif YoutubeIE.suitable(format_url):
+                    return self.url_result(format_url, 'Youtube')
+                else:
+                    formats.append({
+                        'url': format_url,
+                        'format_id': f.get('label'),
+                        'preference': preference(f.get('label')),
+                        'ext': ext,
+                    })
+                urls.append(format_url)
+        self._sort_formats(formats)
+
+        return {
+            'id': display_id,
+            'title': self._search_regex(
+                r'title\s*:\s*"([^"]+)"', setup_js, 'title'),
+            'description': self._html_search_meta(
+                'description', webpage, 'description'),
+            'thumbnail': self._search_regex(
+                r'image\s*:\s*"([^"]+)"', setup_js, 'thumbnail', default=None),
+            'formats': formats,
+        }
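
A note on the qualities() helper used above: it returns a function that maps a
format label to its index in the supplied list, so later entries rank higher
and unknown labels sort last. A minimal check:

    from youtube_dl.utils import qualities

    preference = qualities(['Móvil', 'SD', 'HD'])
    # Index in the list is the preference; unknown labels get -1
    print(preference('HD'), preference('SD'), preference('Móvil'), preference('4K'))
    # -> 2 1 0 -1
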
index a3d05f97d681b6cb4da6adf179a4f0a5744e5123..eefecc490c5d13476259497e79f7a3ebe68caee7 100644 (file)
@@ -1,11 +1,13 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import re
+
 from .common import InfoExtractor
 
 
 class TeleBruxellesIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?telebruxelles\.be/(news|sport|dernier-jt)/?(?P<id>[^/#?]+)'
+    _VALID_URL = r'https?://(?:www\.)?(?:telebruxelles|bx1)\.be/(news|sport|dernier-jt)/?(?P<id>[^/#?]+)'
     _TESTS = [{
         'url': 'http://www.telebruxelles.be/news/auditions-devant-parlement-francken-galant-tres-attendus/',
         'md5': '59439e568c9ee42fb77588b2096b214f',
@@ -39,18 +41,18 @@ class TeleBruxellesIE(InfoExtractor):
         webpage = self._download_webpage(url, display_id)
 
         article_id = self._html_search_regex(
-            r"<article id=\"post-(\d+)\"", webpage, 'article ID')
+            r"<article id=\"post-(\d+)\"", webpage, 'article ID', default=None)
         title = self._html_search_regex(
             r'<h1 class=\"entry-title\">(.*?)</h1>', webpage, 'title')
-        description = self._og_search_description(webpage)
+        description = self._og_search_description(webpage, default=None)
 
         rtmp_url = self._html_search_regex(
-            r"file: \"(rtmp://\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}/vod/mp4:\" \+ \"\w+\" \+ \".mp4)\"",
+            r'file\s*:\s*"(rtmp://[^/]+/vod/mp4:"\s*\+\s*"[^"]+"\s*\+\s*".mp4)"',
             webpage, 'RTMP url')
-        rtmp_url = rtmp_url.replace("\" + \"", "")
+        rtmp_url = re.sub(r'"\s*\+\s*"', '', rtmp_url)
 
         return {
-            'id': article_id,
+            'id': article_id or display_id,
             'display_id': display_id,
             'title': title,
             'description': description,
index a0c744fd16b633b08e7bbb632b77cf40e8410710..4b4b740b44d325ffb8a8a5c6cba848b0c99ced13 100644 (file)
@@ -1,26 +1,95 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
-from .mitele import MiTeleIE
+import json
 
+from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_parse_unquote,
+    compat_urllib_parse_urlencode,
+    compat_urlparse,
+)
+from ..utils import (
+    get_element_by_attribute,
+    parse_duration,
+    strip_jsonp,
+)
 
-class TelecincoIE(MiTeleIE):
-    IE_NAME = 'telecinco.es'
-    _VALID_URL = r'https?://www\.telecinco\.es/[^/]+/[^/]+/(?:[^/]+/)?(?P<id>.*?)\.html'
+
+class TelecincoIE(InfoExtractor):
+    IE_DESC = 'telecinco.es, cuatro.com and mediaset.es'
+    _VALID_URL = r'https?://www\.(?:telecinco\.es|cuatro\.com|mediaset\.es)/(?:[^/]+/)+(?P<id>.+?)\.html'
 
     _TESTS = [{
         'url': 'http://www.telecinco.es/robinfood/temporada-01/t01xp14/Bacalao-cocochas-pil-pil_0_1876350223.html',
+        'md5': '5cbef3ad5ef17bf0d21570332d140729',
         'info_dict': {
             'id': 'MDSVID20141015_0058',
             'ext': 'mp4',
             'title': 'Con Martín Berasategui, hacer un bacalao al ...',
             'duration': 662,
         },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
+    }, {
+        'url': 'http://www.cuatro.com/deportes/futbol/barcelona/Leo_Messi-Champions-Roma_2_2052780128.html',
+        'md5': '0a5b9f3cc8b074f50a0578f823a12694',
+        'info_dict': {
+            'id': 'MDSVID20150916_0128',
+            'ext': 'mp4',
+            'title': '¿Quién es este ex futbolista con el que hablan ...',
+            'duration': 79,
+        },
+    }, {
+        'url': 'http://www.mediaset.es/12meses/campanas/doylacara/conlatratanohaytrato/Ayudame-dar-cara-trata-trato_2_1986630220.html',
+        'md5': 'ad1bfaaba922dd4a295724b05b68f86a',
+        'info_dict': {
+            'id': 'MDSVID20150513_0220',
+            'ext': 'mp4',
+            'title': '#DOYLACARA. Con la trata no hay trato',
+            'duration': 50,
         },
     }, {
         'url': 'http://www.telecinco.es/informativos/nacional/Pablo_Iglesias-Informativos_Telecinco-entrevista-Pedro_Piqueras_2_1945155182.html',
         'only_matching': True,
+    }, {
+        'url': 'http://www.telecinco.es/espanasinirmaslejos/Espana-gran-destino-turistico_2_1240605043.html',
+        'only_matching': True,
     }]
+
+    def _real_extract(self, url):
+        episode = self._match_id(url)
+        webpage = self._download_webpage(url, episode)
+        embed_data_json = self._search_regex(
+            r'(?s)MSV\.embedData\[.*?\]\s*=\s*({.*?});', webpage, 'embed data',
+        ).replace('\'', '"')
+        embed_data = json.loads(embed_data_json)
+
+        domain = embed_data['mediaUrl']
+        if not domain.startswith('http'):
+            # only happens in telecinco.es videos
+            domain = 'http://' + domain
+        info_url = compat_urlparse.urljoin(
+            domain,
+            compat_urllib_parse_unquote(embed_data['flashvars']['host'])
+        )
+        info_el = self._download_xml(info_url, episode).find('./video/info')
+
+        video_link = info_el.find('videoUrl/link').text
+        token_query = compat_urllib_parse_urlencode({'id': video_link})
+        token_info = self._download_json(
+            embed_data['flashvars']['ov_tk'] + '?' + token_query,
+            episode,
+            transform_source=strip_jsonp
+        )
+        formats = self._extract_m3u8_formats(
+            token_info['tokenizedUrl'], episode, ext='mp4', entry_protocol='m3u8_native')
+        self._sort_formats(formats)
+
+        return {
+            'id': embed_data['videoId'],
+            'display_id': episode,
+            'title': info_el.find('title').text,
+            'formats': formats,
+            'description': get_element_by_attribute('class', 'text', webpage),
+            'thumbnail': info_el.find('thumb').text,
+            'duration': parse_duration(info_el.find('duration').text),
+        }
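
strip_jsonp, passed as transform_source above, unwraps a JSONP callback before
the JSON is parsed. A minimal illustration with an invented payload:

    from youtube_dl.utils import strip_jsonp

    # The callback name and braces are stripped, leaving plain JSON
    print(strip_jsonp('cb({"tokenizedUrl": "http://example.com/master.m3u8"});'))
    # -> {"tokenizedUrl": "http://example.com/master.m3u8"}
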
diff --git a/youtube_dl/extractor/telegraaf.py b/youtube_dl/extractor/telegraaf.py
new file mode 100644 (file)
index 0000000..6f8333c
--- /dev/null
@@ -0,0 +1,35 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import remove_end
+
+
+class TelegraafIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?telegraaf\.nl/tv/(?:[^/]+/)+(?P<id>\d+)/[^/]+\.html'
+    _TEST = {
+        'url': 'http://www.telegraaf.nl/tv/nieuws/binnenland/24353229/__Tikibad_ontruimd_wegens_brand__.html',
+        'md5': '83245a9779bcc4a24454bfd53c65b6dc',
+        'info_dict': {
+            'id': '24353229',
+            'ext': 'mp4',
+            'title': 'Tikibad ontruimd wegens brand',
+            'description': 'md5:05ca046ff47b931f9b04855015e163a4',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'duration': 33,
+        },
+    }
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, playlist_id)
+
+        playlist_url = self._search_regex(
+            r"iframe\.loadPlayer\('([^']+)'", webpage, 'player')
+
+        entries = self._extract_xspf_playlist(playlist_url, playlist_id)
+        title = remove_end(self._og_search_title(webpage), ' - VIDEO')
+        description = self._og_search_description(webpage)
+
+        return self.playlist_result(entries, playlist_id, title, description)
diff --git a/youtube_dl/extractor/tenplay.py b/youtube_dl/extractor/tenplay.py
deleted file mode 100644 (file)
index f669414..0000000
+++ /dev/null
@@ -1,90 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    float_or_none,
-)
-
-
-class TenPlayIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?ten(play)?\.com\.au/.+'
-    _TEST = {
-        'url': 'http://tenplay.com.au/ten-insider/extra/season-2013/tenplay-tv-your-way',
-        'info_dict': {
-            'id': '2695695426001',
-            'ext': 'flv',
-            'title': 'TENplay: TV your way',
-            'description': 'Welcome to a new TV experience. Enjoy a taste of the TENplay benefits.',
-            'timestamp': 1380150606.889,
-            'upload_date': '20130925',
-            'uploader': 'TENplay',
-        },
-        'params': {
-            'skip_download': True,  # Requires rtmpdump
-        }
-    }
-
-    _video_fields = [
-        "id", "name", "shortDescription", "longDescription", "creationDate",
-        "publishedDate", "lastModifiedDate", "customFields", "videoStillURL",
-        "thumbnailURL", "referenceId", "length", "playsTotal",
-        "playsTrailingWeek", "renditions", "captioning", "startDate", "endDate"]
-
-    def _real_extract(self, url):
-        webpage = self._download_webpage(url, url)
-        video_id = self._html_search_regex(
-            r'videoID: "(\d+?)"', webpage, 'video_id')
-        api_token = self._html_search_regex(
-            r'apiToken: "([a-zA-Z0-9-_\.]+?)"', webpage, 'api_token')
-        title = self._html_search_regex(
-            r'<meta property="og:title" content="\s*(.*?)\s*"\s*/?\s*>',
-            webpage, 'title')
-
-        json = self._download_json('https://api.brightcove.com/services/library?command=find_video_by_id&video_id=%s&token=%s&video_fields=%s' % (video_id, api_token, ','.join(self._video_fields)), title)
-
-        formats = []
-        for rendition in json['renditions']:
-            url = rendition['remoteUrl'] or rendition['url']
-            protocol = 'rtmp' if url.startswith('rtmp') else 'http'
-            ext = 'flv' if protocol == 'rtmp' else rendition['videoContainer'].lower()
-
-            if protocol == 'rtmp':
-                url = url.replace('&mp4:', '')
-
-                tbr = int_or_none(rendition.get('encodingRate'), 1000)
-
-            formats.append({
-                'format_id': '_'.join(
-                    ['rtmp', rendition['videoContainer'].lower(),
-                     rendition['videoCodec'].lower(), '%sk' % tbr]),
-                'width': int_or_none(rendition['frameWidth']),
-                'height': int_or_none(rendition['frameHeight']),
-                'tbr': tbr,
-                'filesize': int_or_none(rendition['size']),
-                'protocol': protocol,
-                'ext': ext,
-                'vcodec': rendition['videoCodec'].lower(),
-                'container': rendition['videoContainer'].lower(),
-                'url': url,
-            })
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'display_id': json['referenceId'],
-            'title': json['name'],
-            'description': json['shortDescription'] or json['longDescription'],
-            'formats': formats,
-            'thumbnails': [{
-                'url': json['videoStillURL']
-            }, {
-                'url': json['thumbnailURL']
-            }],
-            'thumbnail': json['videoStillURL'],
-            'duration': float_or_none(json.get('length'), 1000),
-            'timestamp': float_or_none(json.get('creationDate'), 1000),
-            'uploader': json.get('customFields', {}).get('production_company_distributor') or 'TENplay',
-            'view_count': int_or_none(json.get('playsTotal')),
-        }
diff --git a/youtube_dl/extractor/testtube.py b/youtube_dl/extractor/testtube.py
deleted file mode 100644 (file)
index 26655d6..0000000
+++ /dev/null
@@ -1,90 +0,0 @@
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    qualities,
-)
-
-
-class TestTubeIE(InfoExtractor):
-    _VALID_URL = r'https?://testtube\.com/[^/?#]+/(?P<id>[^/?#]+)'
-    _TESTS = [{
-        'url': 'https://testtube.com/dnews/5-weird-ways-plants-can-eat-animals?utm_source=FB&utm_medium=DNews&utm_campaign=DNewsSocial',
-        'info_dict': {
-            'id': '60163',
-            'display_id': '5-weird-ways-plants-can-eat-animals',
-            'duration': 275,
-            'ext': 'webm',
-            'title': '5 Weird Ways Plants Can Eat Animals',
-            'description': 'Why have some plants evolved to eat meat?',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'uploader': 'DNews',
-            'uploader_id': 'dnews',
-        },
-    }, {
-        'url': 'https://testtube.com/iflscience/insane-jet-ski-flipping',
-        'info_dict': {
-            'id': 'fAGfJ4YjVus',
-            'ext': 'mp4',
-            'title': 'Flipping Jet-Ski Skills | Outrageous Acts of Science',
-            'uploader': 'Science Channel',
-            'uploader_id': 'ScienceChannel',
-            'upload_date': '20150203',
-            'description': 'md5:e61374030015bae1d2e22f096d4769d6',
-        }
-    }]
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, display_id)
-
-        youtube_url = self._html_search_regex(
-            r'<iframe[^>]+src="((?:https?:)?//www.youtube.com/embed/[^"]+)"',
-            webpage, 'youtube iframe', default=None)
-        if youtube_url:
-            return self.url_result(youtube_url, 'Youtube', video_id=display_id)
-
-        video_id = self._search_regex(
-            r"player\.loadRevision3Item\('video_id',\s*([0-9]+)\);",
-            webpage, 'video ID')
-
-        all_info = self._download_json(
-            'https://testtube.com/api/getPlaylist.json?api_key=ba9c741bce1b9d8e3defcc22193f3651b8867e62&codecs=h264,vp8,theora&video_id=%s' % video_id,
-            video_id)
-        info = all_info['items'][0]
-
-        formats = []
-        for vcodec, fdatas in info['media'].items():
-            for name, fdata in fdatas.items():
-                formats.append({
-                    'format_id': '%s-%s' % (vcodec, name),
-                    'url': fdata['url'],
-                    'vcodec': vcodec,
-                    'tbr': fdata.get('bitrate'),
-                })
-        self._sort_formats(formats)
-
-        duration = int_or_none(info.get('duration'))
-        images = info.get('images')
-        thumbnails = None
-        preference = qualities(['mini', 'small', 'medium', 'large'])
-        if images:
-            thumbnails = [{
-                'id': thumbnail_id,
-                'url': img_url,
-                'preference': preference(thumbnail_id)
-            } for thumbnail_id, img_url in images.items()]
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': info['title'],
-            'description': info.get('summary'),
-            'thumbnails': thumbnails,
-            'uploader': info.get('show', {}).get('name'),
-            'uploader_id': info.get('show', {}).get('slug'),
-            'duration': duration,
-            'formats': formats,
-        }
index c7d559315be7d2ceed094fab691e4141de5534e7..46918adb05fc77d45b480b138575afff9d69a086 100644 (file)
@@ -7,7 +7,7 @@ from ..utils import ExtractorError
 
 
 class TestURLIE(InfoExtractor):
-    """ Allows adressing of the test cases as test:yout.*be_1 """
+    """ Allows addressing of the test cases as test:yout.*be_1 """
 
     IE_DESC = False  # Do not list
     _VALID_URL = r'test(?:url)?:(?P<id>(?P<extractor>.+?)(?:_(?P<num>[0-9]+))?)$'
index 3a68eaa80ea6867e6806a4f242a8afc910b8ba06..3f54b2744cb16cd6385e5cb06919cbaf9628167a 100644 (file)
@@ -6,7 +6,7 @@ from .common import InfoExtractor
 
 class TF1IE(InfoExtractor):
     """TF1 uses the wat.tv player."""
-    _VALID_URL = r'http://(?:(?:videos|www|lci)\.tf1|www\.tfou)\.fr/.*?-(?P<id>\d+)(?:-\d+)?\.html'
+    _VALID_URL = r'https?://(?:(?:videos|www|lci)\.tf1|www\.tfou)\.fr/(?:[^/]+/)*(?P<id>.+?)\.html'
     _TESTS = [{
         'url': 'http://videos.tf1.fr/auto-moto/citroen-grand-c4-picasso-2013-presentation-officielle-8062060.html',
         'info_dict': {
@@ -22,7 +22,7 @@ class TF1IE(InfoExtractor):
     }, {
         'url': 'http://www.tfou.fr/chuggington/videos/le-grand-mysterioso-chuggington-7085291-739.html',
         'info_dict': {
-            'id': '12043945',
+            'id': 'le-grand-mysterioso-chuggington-7085291-739',
             'ext': 'mp4',
             'title': 'Le grand Mystérioso - Chuggington',
             'description': 'Le grand Mystérioso - Emery rêve qu\'un article lui soit consacré dans le journal.',
@@ -32,22 +32,22 @@ class TF1IE(InfoExtractor):
             # Sometimes wat serves the whole file with the --test option
             'skip_download': True,
         },
+        'skip': 'HTTP Error 410: Gone',
     }, {
         'url': 'http://www.tf1.fr/tf1/koh-lanta/videos/replay-koh-lanta-22-mai-2015.html',
         'only_matching': True,
     }, {
         'url': 'http://lci.tf1.fr/sept-a-huit/videos/sept-a-huit-du-24-mai-2015-8611550.html',
         'only_matching': True,
+    }, {
+        'url': 'http://www.tf1.fr/hd1/documentaire/videos/mylene-farmer-d-une-icone.html',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
-        embed_url = self._html_search_regex(
-            r'["\'](https?://www.wat.tv/embedframe/.*?)["\']', webpage, 'embed url')
-        embed_page = self._download_webpage(embed_url, video_id,
-                                            'Downloading embed player page')
-        wat_id = self._search_regex(r'UVID=(.*?)&', embed_page, 'wat id')
-        wat_info = self._download_json(
-            'http://www.wat.tv/interface/contentv3/%s' % wat_id, video_id)
-        return self.url_result(wat_info['media']['url'], 'Wat')
+        wat_id = self._html_search_regex(
+            r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})(?:#.*?)?\1',
+            webpage, 'wat id', group='id')
+        return self.url_result('wat:%s' % wat_id, 'Wat')
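
The new wat id regex above uses the capture-the-quote trick: the opening quote
is captured and \1 requires the same quote at the end, so both single- and
double-quoted attributes match. A miniature with an invented page snippet:

    import re

    page = '<iframe src="//www.wat.tv/embedframe/12345678#live"></iframe>'
    m = re.search(
        r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})(?:#.*?)?\1',
        page)
    print(m.group('id'))  # -> 12345678
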
diff --git a/youtube_dl/extractor/theintercept.py b/youtube_dl/extractor/theintercept.py
new file mode 100644 (file)
index 0000000..8cb3c36
--- /dev/null
@@ -0,0 +1,49 @@
+# encoding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    parse_iso8601,
+    int_or_none,
+    ExtractorError,
+)
+
+
+class TheInterceptIE(InfoExtractor):
+    _VALID_URL = r'https://theintercept.com/fieldofvision/(?P<id>[^/?#]+)'
+    _TESTS = [{
+        'url': 'https://theintercept.com/fieldofvision/thisisacoup-episode-four-surrender-or-die/',
+        'md5': '145f28b41d44aab2f87c0a4ac8ec95bd',
+        'info_dict': {
+            'id': '46214',
+            'ext': 'mp4',
+            'title': '#ThisIsACoup – Episode Four: Surrender or Die',
+            'description': 'md5:74dd27f0e2fbd50817829f97eaa33140',
+            'timestamp': 1450429239,
+            'upload_date': '20151218',
+            'comment_count': int,
+        }
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+
+        json_data = self._parse_json(self._search_regex(
+            r'initialStoreTree\s*=\s*(?P<json_data>{.+})', webpage,
+            'initialStoreTree'), display_id)
+
+        for post in json_data['resources']['posts'].values():
+            if post['slug'] == display_id:
+                return {
+                    '_type': 'url_transparent',
+                    'url': 'jwplatform:%s' % post['fov_videoid'],
+                    'id': compat_str(post['ID']),
+                    'display_id': display_id,
+                    'title': post['title'],
+                    'description': post.get('excerpt'),
+                    'timestamp': parse_iso8601(post.get('date')),
+                    'comment_count': int_or_none(post.get('comments_number')),
+                }
+        raise ExtractorError('Unable to find the current post')
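
A note on the '_type': 'url_transparent' result above: unlike a plain 'url'
result, youtube-dl resolves the inner URL with the matching extractor
(JWPlatform here) and then lets the fields supplied in this dict take
precedence over the delegated result. A sketch of the shape, with dummy values:

    info = {
        '_type': 'url_transparent',
        'url': 'jwplatform:abcd1234',  # hypothetical media id
        'id': '46214',
        'title': '#ThisIsACoup - Episode Four: Surrender or Die',
    }
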
diff --git a/youtube_dl/extractor/theonion.py b/youtube_dl/extractor/theonion.py
deleted file mode 100644 (file)
index 10239c9..0000000
+++ /dev/null
@@ -1,63 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-
-
-class TheOnionIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?theonion\.com/video/[^,]+,(?P<id>[0-9]+)/?'
-    _TEST = {
-        'url': 'http://www.theonion.com/video/man-wearing-mm-jacket-gods-image,36918/',
-        'md5': '19eaa9a39cf9b9804d982e654dc791ee',
-        'info_dict': {
-            'id': '2133',
-            'ext': 'mp4',
-            'title': 'Man Wearing M&M Jacket Apparently Made In God\'s Image',
-            'description': 'md5:cc12448686b5600baae9261d3e180910',
-            'thumbnail': 're:^https?://.*\.jpg\?\d+$',
-        }
-    }
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-
-        video_id = self._search_regex(
-            r'"videoId":\s(\d+),', webpage, 'video ID')
-        title = self._og_search_title(webpage)
-        description = self._og_search_description(webpage)
-        thumbnail = self._og_search_thumbnail(webpage)
-
-        sources = re.findall(r'<source src="([^"]+)" type="([^"]+)"', webpage)
-        formats = []
-        for src, type_ in sources:
-            if type_ == 'video/mp4':
-                formats.append({
-                    'format_id': 'mp4_sd',
-                    'preference': 1,
-                    'url': src,
-                })
-            elif type_ == 'video/webm':
-                formats.append({
-                    'format_id': 'webm_sd',
-                    'preference': 0,
-                    'url': src,
-                })
-            elif type_ == 'application/x-mpegURL':
-                formats.extend(
-                    self._extract_m3u8_formats(src, display_id, preference=-1))
-            else:
-                self.report_warning(
-                    'Encountered unexpected format: %s' % type_)
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
-            'formats': formats,
-            'thumbnail': thumbnail,
-            'description': description,
-        }
index 83d833e30dbeb60caa43aa272bfd4d35f4507a53..8272dd96936f501069edc0eb85d468c07ab5681b 100644 (file)
@@ -1,32 +1,88 @@
+# -*- coding: utf-8 -*-
 from __future__ import unicode_literals
 
 import re
-import json
 import time
 import hmac
 import binascii
 import hashlib
 
 
-from .common import InfoExtractor
+from .once import OnceIE
 from ..compat import (
-    compat_str,
+    compat_parse_qs,
+    compat_urllib_parse_urlparse,
 )
 from ..utils import (
-    determine_ext,
     ExtractorError,
-    xpath_with_ns,
-    unsmuggle_url,
+    float_or_none,
     int_or_none,
+    sanitized_Request,
+    unsmuggle_url,
+    xpath_with_ns,
+    mimetype2ext,
+    find_xpath_attr,
 )
 
-_x = lambda p: xpath_with_ns(p, {'smil': 'http://www.w3.org/2005/SMIL21/Language'})
+default_ns = 'http://www.w3.org/2005/SMIL21/Language'
+_x = lambda p: xpath_with_ns(p, {'smil': default_ns})
+
+
+class ThePlatformBaseIE(OnceIE):
+    def _extract_theplatform_smil(self, smil_url, video_id, note='Downloading SMIL data'):
+        meta = self._download_xml(smil_url, video_id, note=note, query={'format': 'SMIL'})
+        error_element = find_xpath_attr(meta, _x('.//smil:ref'), 'src')
+        if error_element is not None and error_element.attrib['src'].startswith(
+                'http://link.theplatform.com/s/errorFiles/Unavailable.'):
+            raise ExtractorError(error_element.attrib['abstract'], expected=True)
+
+        smil_formats = self._parse_smil_formats(
+            meta, smil_url, video_id, namespace=default_ns,
+            # the parameters are from syfy.com, other sites may use others,
+            # they also work for nbc.com
+            f4m_params={'g': 'UXWGVKRWHFSP', 'hdcore': '3.0.3'},
+            transform_rtmp_url=lambda streamer, src: (streamer, 'mp4:' + src))
+
+        formats = []
+        for _format in smil_formats:
+            if OnceIE.suitable(_format['url']):
+                formats.extend(self._extract_once_formats(_format['url']))
+            else:
+                formats.append(_format)
+
+        subtitles = self._parse_smil_subtitles(meta, default_ns)
 
+        return formats, subtitles
 
-class ThePlatformIE(InfoExtractor):
+    def get_metadata(self, path, video_id):
+        info_url = 'http://link.theplatform.com/s/%s?format=preview' % path
+        info = self._download_json(info_url, video_id)
+
+        subtitles = {}
+        captions = info.get('captions')
+        if isinstance(captions, list):
+            for caption in captions:
+                lang, src, mime = caption.get('lang', 'en'), caption.get('src'), caption.get('type')
+                subtitles[lang] = [{
+                    'ext': mimetype2ext(mime),
+                    'url': src,
+                }]
+
+        return {
+            'title': info['title'],
+            'subtitles': subtitles,
+            'description': info['description'],
+            'thumbnail': info['defaultThumbnailUrl'],
+            'duration': int_or_none(info.get('duration'), 1000),
+            'timestamp': int_or_none(info.get('pubDate'), 1000) or None,
+            'uploader': info.get('billingCode'),
+        }
+
+
+class ThePlatformIE(ThePlatformBaseIE):
     _VALID_URL = r'''(?x)
         (?:https?://(?:link|player)\.theplatform\.com/[sp]/(?P<provider_id>[^/]+)/
-           (?:(?P<media>(?:[^/]+/)+select/media/)|(?P<config>(?:[^/\?]+/(?:swf|config)|onsite)/select/))?
+           (?:(?:(?:[^/]+/)+select/)?(?P<media>media/(?:guid/\d+/)?)|(?P<config>(?:[^/\?]+/(?:swf|config)|onsite)/select/))?
          |theplatform:)(?P<id>[^/\?&]+)'''
 
     _TESTS = [{
@@ -38,6 +94,9 @@ class ThePlatformIE(InfoExtractor):
             'title': 'Blackberry\'s big, bold Z30',
             'description': 'The Z30 is Blackberry\'s biggest, baddest mobile messaging device yet.',
             'duration': 247,
+            'timestamp': 1383239700,
+            'upload_date': '20131031',
+            'uploader': 'CBSI-NEW',
         },
         'params': {
             # rtmp download
@@ -51,6 +110,9 @@ class ThePlatformIE(InfoExtractor):
             'ext': 'flv',
             'description': 'md5:ac330c9258c04f9d7512cf26b9595409',
             'title': 'Tesla Model S: A second step towards a cleaner motoring future',
+            'timestamp': 1426176191,
+            'upload_date': '20150312',
+            'uploader': 'CBSI-NEW',
         },
         'params': {
             # rtmp download
@@ -63,10 +125,30 @@ class ThePlatformIE(InfoExtractor):
             'ext': 'mp4',
             'description': 'md5:644ad9188d655b742f942bf2e06b002d',
             'title': 'HIGHLIGHTS: USA bag first ever series Cup win',
+            'uploader': 'EGSM',
         }
     }, {
         'url': 'http://player.theplatform.com/p/NnzsPC/widget/select/media/4Y0TlYUr_ZT7',
         'only_matching': True,
+    }, {
+        'url': 'http://player.theplatform.com/p/2E2eJC/nbcNewsOffsite?guid=tdy_or_siri_150701',
+        'md5': 'fb96bb3d85118930a5b055783a3bd992',
+        'info_dict': {
+            'id': 'tdy_or_siri_150701',
+            'ext': 'mp4',
+            'title': 'iPhone Siri’s sassy response to a math question has people talking',
+            'description': 'md5:a565d1deadd5086f3331d57298ec6333',
+            'duration': 83.0,
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'timestamp': 1435752600,
+            'upload_date': '20150701',
+            'uploader': 'NBCU-NEWS',
+        },
+    }, {
+        # From http://www.nbc.com/the-blacklist/video/sir-crispin-crandall/2928790?onid=137781#vc137781=1
+        # geo-restricted (US), HLS encrypted with AES-128
+        'url': 'http://player.theplatform.com/p/NnzsPC/onsite_universal/select/media/guid/2410887629/2928790?fwsitesection=nbc_the_blacklist_video_library&autoPlay=true&carouselID=137781',
+        'only_matching': True,
     }]
 
     @staticmethod
@@ -80,7 +162,7 @@ class ThePlatformIE(InfoExtractor):
         def hex_to_str(hex):
             return binascii.a2b_hex(hex)
 
-        relative_path = url.split('http://link.theplatform.com/s/')[1].split('?')[0]
+        relative_path = re.match(r'https?://link\.theplatform\.com/s/([^?]+)', url).group(1)
         clear_text = hex_to_str(flags + expiration_date + str_to_hex(relative_path))
         checksum = hmac.new(sig_key.encode('ascii'), clear_text, hashlib.sha1).hexdigest()
         sig = flags + expiration_date + checksum + str_to_hex(sig_secret)
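
A runnable walkthrough of the signature layout _sign_url assembles above:
flags + expiry + HMAC-SHA1 over (flags + expiry + hex(path)) + hex(secret).
Every input below is a placeholder, not a real thePlatform credential:

    import binascii
    import hashlib
    import hmac

    str_to_hex = lambda s: binascii.b2a_hex(s.encode('ascii')).decode('ascii')

    # Placeholder signing inputs
    flags, expiration_date = '00', '5820ad04'
    sig_key, sig_secret = 'key', 'secret'
    relative_path = 'dJ5BDC/media/12345'

    # Checksum covers the flags, expiry and hex-encoded media path
    clear_text = binascii.a2b_hex(flags + expiration_date + str_to_hex(relative_path))
    checksum = hmac.new(sig_key.encode('ascii'), clear_text, hashlib.sha1).hexdigest()
    sig = flags + expiration_date + checksum + str_to_hex(sig_secret)
    print(sig)  # '00' + expiry + 40 hex digits of checksum + '736563726574'
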
@@ -96,115 +178,147 @@ class ThePlatformIE(InfoExtractor):
         if not provider_id:
             provider_id = 'dJ5BDC'
 
-        path = provider_id
+        path = provider_id + '/'
         if mobj.group('media'):
-            path += '/media'
-        path += '/' + video_id
+            path += mobj.group('media')
+        path += video_id
+
+        qs_dict = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
+        if 'guid' in qs_dict:
+            webpage = self._download_webpage(url, video_id)
+            scripts = re.findall(r'<script[^>]+src="([^"]+)"', webpage)
+            feed_id = None
+            # The feed id is usually located in the last script.
+            # There is no obvious pattern to the relevant script's filename,
+            # so try them one by one, starting from the last.
+            for script in reversed(scripts):
+                feed_script = self._download_webpage(
+                    self._proto_relative_url(script, 'http:'),
+                    video_id, 'Downloading feed script')
+                feed_id = self._search_regex(
+                    r'defaultFeedId\s*:\s*"([^"]+)"', feed_script,
+                    'default feed id', default=None)
+                if feed_id is not None:
+                    break
+            if feed_id is None:
+                raise ExtractorError('Unable to find feed id')
+            return self.url_result('http://feed.theplatform.com/f/%s/%s?byGuid=%s' % (
+                provider_id, feed_id, qs_dict['guid'][0]))
 
         if smuggled_data.get('force_smil_url', False):
             smil_url = url
+        # Explicitly specified SMIL (see https://github.com/rg3/youtube-dl/issues/7385)
+        elif '/guid/' in url:
+            headers = {}
+            source_url = smuggled_data.get('source_url')
+            if source_url:
+                headers['Referer'] = source_url
+            request = sanitized_Request(url, headers=headers)
+            webpage = self._download_webpage(request, video_id)
+            smil_url = self._search_regex(
+                r'<link[^>]+href=(["\'])(?P<url>.+?)\1[^>]+type=["\']application/smil\+xml',
+                webpage, 'smil url', group='url')
+            path = self._search_regex(
+                r'link\.theplatform\.com/s/((?:[^/?#&]+/)+[^/?#&]+)', smil_url, 'path')
+            # parenthesized so the formats parameter is appended in both cases
+            smil_url += ('?' if '?' not in smil_url else '&') + 'formats=m3u,mpeg4'
         elif mobj.group('config'):
             config_url = url + '&form=json'
             config_url = config_url.replace('swf/', 'config/')
             config_url = config_url.replace('onsite/', 'onsite/config/')
             config = self._download_json(config_url, video_id, 'Downloading config')
-            smil_url = config['releaseUrl'] + '&format=SMIL&formats=MPEG4&manifest=f4m'
+            if 'releaseUrl' in config:
+                release_url = config['releaseUrl']
+            else:
+                release_url = 'http://link.theplatform.com/s/%s?mbr=true' % path
+            smil_url = release_url + '&formats=MPEG4&manifest=f4m'
         else:
-            smil_url = 'http://link.theplatform.com/s/%s/meta.smil?format=smil&mbr=true' % path
+            smil_url = 'http://link.theplatform.com/s/%s?mbr=true' % path
 
         sig = smuggled_data.get('sig')
         if sig:
             smil_url = self._sign_url(smil_url, sig['key'], sig['secret'])
 
-        meta = self._download_xml(smil_url, video_id)
-        try:
-            error_msg = next(
-                n.attrib['abstract']
-                for n in meta.findall(_x('.//smil:ref'))
-                if n.attrib.get('title') == 'Geographic Restriction' or n.attrib.get('title') == 'Expired')
-        except StopIteration:
-            pass
-        else:
-            raise ExtractorError(error_msg, expected=True)
+        formats, subtitles = self._extract_theplatform_smil(smil_url, video_id)
+        self._sort_formats(formats)
 
-        info_url = 'http://link.theplatform.com/s/%s?format=preview' % path
-        info_json = self._download_webpage(info_url, video_id)
-        info = json.loads(info_json)
+        ret = self.get_metadata(path, video_id)
+        combined_subtitles = self._merge_subtitles(ret.get('subtitles', {}), subtitles)
+        ret.update({
+            'id': video_id,
+            'formats': formats,
+            'subtitles': combined_subtitles,
+        })
+
+        return ret
+
+
+class ThePlatformFeedIE(ThePlatformBaseIE):
+    _URL_TEMPLATE = '%s//feed.theplatform.com/f/%s/%s?form=json&byGuid=%s'
+    _VALID_URL = r'https?://feed\.theplatform\.com/f/(?P<provider_id>[^/]+)/(?P<feed_id>[^?/]+)\?(?:[^&]+&)*byGuid=(?P<id>[a-zA-Z0-9_]+)'
+    _TEST = {
+        # From http://player.theplatform.com/p/7wvmTC/MSNBCEmbeddedOffSite?guid=n_hardball_5biden_140207
+        'url': 'http://feed.theplatform.com/f/7wvmTC/msnbc_video-p-test?form=json&pretty=true&range=-40&byGuid=n_hardball_5biden_140207',
+        'md5': '6e32495b5073ab414471b615c5ded394',
+        'info_dict': {
+            'id': 'n_hardball_5biden_140207',
+            'ext': 'mp4',
+            'title': 'The Biden factor: will Joe run in 2016?',
+            'description': 'Could Vice President Joe Biden be preparing a 2016 campaign? Mark Halperin and Sam Stein weigh in.',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'upload_date': '20140208',
+            'timestamp': 1391824260,
+            'duration': 467.0,
+            'categories': ['MSNBC/Issues/Democrats', 'MSNBC/Issues/Elections/Election 2016'],
+            'uploader': 'NBCU-NEWS',
+        },
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+
+        video_id = mobj.group('id')
+        provider_id = mobj.group('provider_id')
+        feed_id = mobj.group('feed_id')
+
+        real_url = self._URL_TEMPLATE % (self.http_scheme(), provider_id, feed_id, video_id)
+        feed = self._download_json(real_url, video_id)
+        entry = feed['entries'][0]
 
+        formats = []
         subtitles = {}
-        captions = info.get('captions')
-        if isinstance(captions, list):
-            for caption in captions:
-                lang, src, mime = caption.get('lang', 'en'), caption.get('src'), caption.get('type')
-                subtitles[lang] = [{
-                    'ext': 'srt' if mime == 'text/srt' else 'ttml',
-                    'url': src,
-                }]
+        first_video_id = None
+        duration = None
+        for item in entry['media$content']:
+            smil_url = item['plfile$url'] + '&mbr=true'
+            cur_video_id = ThePlatformIE._match_id(smil_url)
+            if first_video_id is None:
+                first_video_id = cur_video_id
+                duration = float_or_none(item.get('plfile$duration'))
+            cur_formats, cur_subtitles = self._extract_theplatform_smil(smil_url, video_id, 'Downloading SMIL data for %s' % cur_video_id)
+            formats.extend(cur_formats)
+            subtitles = self._merge_subtitles(subtitles, cur_subtitles)
 
-        head = meta.find(_x('smil:head'))
-        body = meta.find(_x('smil:body'))
+        self._sort_formats(formats)
 
-        f4m_node = body.find(_x('smil:seq//smil:video'))
-        if f4m_node is None:
-            f4m_node = body.find(_x('smil:seq/smil:video'))
-        if f4m_node is not None and '.f4m' in f4m_node.attrib['src']:
-            f4m_url = f4m_node.attrib['src']
-            if 'manifest.f4m?' not in f4m_url:
-                f4m_url += '?'
-            # the parameters are from syfy.com, other sites may use others,
-            # they also work for nbc.com
-            f4m_url += '&g=UXWGVKRWHFSP&hdcore=3.0.3'
-            formats = self._extract_f4m_formats(f4m_url, video_id)
-        else:
-            formats = []
-            switch = body.find(_x('smil:switch'))
-            if switch is None:
-                switch = body.find(_x('smil:par//smil:switch'))
-            if switch is None:
-                switch = body.find(_x('smil:par/smil:switch'))
-            if switch is None:
-                switch = body.find(_x('smil:par'))
-            if switch is not None:
-                base_url = head.find(_x('smil:meta')).attrib['base']
-                for f in switch.findall(_x('smil:video')):
-                    attr = f.attrib
-                    width = int_or_none(attr.get('width'))
-                    height = int_or_none(attr.get('height'))
-                    vbr = int_or_none(attr.get('system-bitrate'), 1000)
-                    format_id = '%dx%d_%dk' % (width, height, vbr)
-                    formats.append({
-                        'format_id': format_id,
-                        'url': base_url,
-                        'play_path': 'mp4:' + attr['src'],
-                        'ext': 'flv',
-                        'width': width,
-                        'height': height,
-                        'vbr': vbr,
-                    })
-            else:
-                switch = body.find(_x('smil:seq//smil:switch'))
-                if switch is None:
-                    switch = body.find(_x('smil:seq/smil:switch'))
-                for f in switch.findall(_x('smil:video')):
-                    attr = f.attrib
-                    vbr = int_or_none(attr.get('system-bitrate'), 1000)
-                    ext = determine_ext(attr['src'])
-                    if ext == 'once':
-                        ext = 'mp4'
-                    formats.append({
-                        'format_id': compat_str(vbr),
-                        'url': attr['src'],
-                        'vbr': vbr,
-                        'ext': ext,
-                    })
-            self._sort_formats(formats)
+        thumbnails = [{
+            'url': thumbnail['plfile$url'],
+            'width': int_or_none(thumbnail.get('plfile$width')),
+            'height': int_or_none(thumbnail.get('plfile$height')),
+        } for thumbnail in entry.get('media$thumbnails', [])]
 
-        return {
+        timestamp = int_or_none(entry.get('media$availableDate'), scale=1000)
+        categories = [item['media$name'] for item in entry.get('media$categories', [])]
+
+        ret = self.get_metadata('%s/%s' % (provider_id, first_video_id), video_id)
+        subtitles = self._merge_subtitles(subtitles, ret['subtitles'])
+        ret.update({
             'id': video_id,
-            'title': info['title'],
-            'subtitles': subtitles,
             'formats': formats,
-            'description': info['description'],
-            'thumbnail': info['defaultThumbnailUrl'],
-            'duration': int_or_none(info.get('duration'), 1000),
-        }
+            'subtitles': subtitles,
+            'thumbnails': thumbnails,
+            'duration': duration,
+            'timestamp': timestamp,
+            'categories': categories,
+        })
+
+        return ret
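
For context on the hunk above: the feed extractor accumulates formats and subtitles across every media$content item and merges subtitle tracks by language. A minimal standalone sketch of that merge strategy (a simplified stand-in for the extractor's _merge_subtitles helper; the track URLs are made up):

    def merge_subtitles(subs1, subs2):
        # Union two {lang: [track, ...]} mappings, concatenating the track
        # lists for languages present in both.
        merged = dict(subs1)
        for lang, tracks in subs2.items():
            merged[lang] = merged.get(lang, []) + tracks
        return merged

    merged = merge_subtitles(
        {'en': [{'url': 'http://example.com/en.srt', 'ext': 'srt'}]},
        {'en': [{'url': 'http://example.com/en.ttml', 'ext': 'ttml'}],
         'de': [{'url': 'http://example.com/de.srt', 'ext': 'srt'}]})
    # merged now holds two English tracks and one German track.
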
diff --git a/youtube_dl/extractor/thescene.py b/youtube_dl/extractor/thescene.py
new file mode 100644 (file)
index 0000000..3e4e140
--- /dev/null
@@ -0,0 +1,52 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+from ..compat import compat_urlparse
+from ..utils import qualities
+
+
+class TheSceneIE(InfoExtractor):
+    _VALID_URL = r'https://thescene\.com/watch/[^/]+/(?P<id>[^/#?]+)'
+
+    _TEST = {
+        'url': 'https://thescene.com/watch/vogue/narciso-rodriguez-spring-2013-ready-to-wear',
+        'info_dict': {
+            'id': '520e8faac2b4c00e3c6e5f43',
+            'ext': 'mp4',
+            'title': 'Narciso Rodriguez: Spring 2013 Ready-to-Wear',
+            'display_id': 'narciso-rodriguez-spring-2013-ready-to-wear',
+        },
+    }
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        player_url = compat_urlparse.urljoin(
+            url,
+            self._html_search_regex(
+                r'id=\'js-player-script\'[^>]+src=\'(.+?)\'', webpage, 'player url'))
+
+        player = self._download_webpage(player_url, display_id)
+        info = self._parse_json(
+            self._search_regex(
+                r'(?m)var\s+video\s+=\s+({.+?});$', player, 'info json'),
+            display_id)
+
+        qualities_order = qualities(('low', 'high'))
+        formats = [{
+            'format_id': '{0}-{1}'.format(f['type'].split('/')[0], f['quality']),
+            'url': f['src'],
+            'quality': qualities_order(f['quality']),
+        } for f in info['sources'][0]]
+        self._sort_formats(formats)
+
+        return {
+            'id': info['id'],
+            'display_id': display_id,
+            'title': info['title'],
+            'formats': formats,
+            'thumbnail': info.get('poster_frame'),
+        }
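
A note on the qualities() call above: it turns an ordered tuple of quality names into a numeric preference usable by _sort_formats. A minimal sketch of how that helper behaves (this mirrors the qualities() function in youtube_dl/utils.py; unknown names rank lowest):

    def qualities(quality_ids):
        def q(qid):
            try:
                return quality_ids.index(qid)
            except ValueError:
                return -1
        return q

    order = qualities(('low', 'high'))
    assert order('high') > order('low') > order('unknown')
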
index 5d09eb9a8b28cdd2f8bea0743d947f98415564a2..d8b1fd2813eadc3d17a17a6d46766b3c9c4ea37a 100644 (file)
@@ -48,22 +48,22 @@ class TheSixtyOneIE(InfoExtractor):
     ]
 
     _DECODE_MAP = {
-        "x": "a",
-        "m": "b",
-        "w": "c",
-        "q": "d",
-        "n": "e",
-        "p": "f",
-        "a": "0",
-        "h": "1",
-        "e": "2",
-        "u": "3",
-        "s": "4",
-        "i": "5",
-        "o": "6",
-        "y": "7",
-        "r": "8",
-        "c": "9"
+        'x': 'a',
+        'm': 'b',
+        'w': 'c',
+        'q': 'd',
+        'n': 'e',
+        'p': 'f',
+        'a': '0',
+        'h': '1',
+        'e': '2',
+        'u': '3',
+        's': '4',
+        'i': '5',
+        'o': '6',
+        'y': '7',
+        'r': '8',
+        'c': '9'
     }
 
     def _real_extract(self, url):
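
The hunk above only swaps double quotes for single quotes, but the table itself is thesixtyone's character-substitution cipher, mapping obfuscated letters onto hex digits. A standalone sketch of how such a map is typically applied (the sample token is made up for illustration):

    DECODE_MAP = {
        'x': 'a', 'm': 'b', 'w': 'c', 'q': 'd', 'n': 'e', 'p': 'f',
        'a': '0', 'h': '1', 'e': '2', 'u': '3', 's': '4', 'i': '5',
        'o': '6', 'y': '7', 'r': '8', 'c': '9',
    }

    def decode(token):
        # Substitute each obfuscated character; anything outside the map
        # passes through unchanged.
        return ''.join(DECODE_MAP.get(ch, ch) for ch in token)

    print(decode('xhec'))  # -> 'a129'
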
diff --git a/youtube_dl/extractor/thestar.py b/youtube_dl/extractor/thestar.py
new file mode 100644 (file)
index 0000000..ba1380a
--- /dev/null
@@ -0,0 +1,35 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from .brightcove import BrightcoveLegacyIE
+from ..compat import compat_parse_qs
+
+
+class TheStarIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?thestar\.com/(?:[^/]+/)*(?P<id>.+)\.html'
+    _TEST = {
+        'url': 'http://www.thestar.com/life/2016/02/01/mankind-why-this-woman-started-a-men-s-skincare-line.html',
+        'md5': '2c62dd4db2027e35579fefb97a8b6554',
+        'info_dict': {
+            'id': '4732393888001',
+            'ext': 'mp4',
+            'title': 'Mankind: Why this woman started a men\'s skin care line',
+            'description': 'Robert Cribb talks to Young Lee, the founder of Uncle Peter\'s MAN.',
+            'uploader_id': '794267642001',
+            'timestamp': 1454353482,
+            'upload_date': '20160201',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/794267642001/default_default/index.html?videoId=%s'
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        brightcove_legacy_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
+        brightcove_id = compat_parse_qs(brightcove_legacy_url)['@videoPlayer'][0]
+        return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
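
The two-step hop above (scraped legacy Brightcove URL, then its @videoPlayer query value, then a BrightcoveNew player URL) reduces to plain query-string parsing. A standalone Python 3 sketch with a made-up legacy URL of the usual shape:

    from urllib.parse import parse_qs, urlparse

    legacy_url = ('http://c.brightcove.com/services/viewer/htmlFederated'
                  '?playerID=794267642001&@videoPlayer=4732393888001')
    brightcove_id = parse_qs(urlparse(legacy_url).query)['@videoPlayer'][0]
    print('http://players.brightcove.net/794267642001/default_default/'
          'index.html?videoId=%s' % brightcove_id)
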
index 496f15d80b478f94bc2aac86c3d20417e2b09925..406f4a826623c0335e973a0f6dd79744fb32e982 100644 (file)
@@ -10,7 +10,7 @@ from ..utils import (
 
 
 class THVideoIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?thvideo\.tv/(?:v/th|mobile\.php\?cid=)(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?thvideo\.tv/(?:v/th|mobile\.php\?cid=)(?P<id>[0-9]+)'
     _TEST = {
         'url': 'http://thvideo.tv/v/th1987/',
         'md5': 'fa107b1f73817e325e9433505a70db50',
index e036b8cdf1e6ca6ad4277a4c3d22e79361322703..c43cace24d5bfd107328944d0bd290594ec06b3f 100644 (file)
@@ -9,7 +9,7 @@ from ..utils import ExtractorError
 class TinyPicIE(InfoExtractor):
     IE_NAME = 'tinypic'
     IE_DESC = 'tinypic.com videos'
-    _VALID_URL = r'http://(?:.+?\.)?tinypic\.com/player\.php\?v=(?P<id>[^&]+)&s=\d+'
+    _VALID_URL = r'https?://(?:.+?\.)?tinypic\.com/player\.php\?v=(?P<id>[^&]+)&s=\d+'
 
     _TESTS = [
         {
index 13263614cc06b099d929ee71564899ac3620f76a..abad3ff64b5e519414615d3dd3cf8da345e9a2f3 100644 (file)
@@ -3,36 +3,13 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from .brightcove import BrightcoveIE
-from .discovery import DiscoveryIE
-from ..compat import compat_urlparse
-
-
-class TlcIE(DiscoveryIE):
-    IE_NAME = 'tlc.com'
-    _VALID_URL = r'http://www\.tlc\.com\/[a-zA-Z0-9\-]*/[a-zA-Z0-9\-]*/videos/(?P<id>[a-zA-Z0-9\-]*)(.htm)?'
-
-    # DiscoveryIE has _TESTS
-    _TESTS = [{
-        'url': 'http://www.tlc.com/tv-shows/cake-boss/videos/too-big-to-fly.htm',
-        'info_dict': {
-            'id': '104493',
-            'ext': 'mp4',
-            'title': 'Too Big to Fly',
-            'description': 'Buddy has taken on a high flying task.',
-            'duration': 119,
-            'timestamp': 1393365060,
-            'upload_date': '20140225',
-        },
-        'params': {
-            'skip_download': True,  # requires ffmpeg
-        },
-    }]
+from .brightcove import BrightcoveLegacyIE
+from ..compat import compat_parse_qs
 
 
 class TlcDeIE(InfoExtractor):
     IE_NAME = 'tlc.de'
-    _VALID_URL = r'http://www\.tlc\.de/sendungen/[^/]+/videos/(?P<title>[^/?]+)'
+    _VALID_URL = r'https?://www\.tlc\.de/(?:[^/]+/)*videos/(?P<title>[^/?#]+)?(?:.*#(?P<id>\d+))?'
 
     _TEST = {
         'url': 'http://www.tlc.de/sendungen/breaking-amish/videos/#3235167922001',
@@ -40,32 +17,23 @@ class TlcDeIE(InfoExtractor):
             'id': '3235167922001',
             'ext': 'mp4',
             'title': 'Breaking Amish: Die Welt da draußen',
-            'uploader': 'Discovery Networks - Germany',
             'description': (
                 'Vier Amische und eine Mennonitin wagen in New York'
                 '  den Sprung in ein komplett anderes Leben. Begleitet sie auf'
                 ' ihrem spannenden Weg.'),
+            'timestamp': 1396598084,
+            'upload_date': '20140404',
+            'uploader_id': '1659832546',
         },
     }
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1659832546/default_default/index.html?videoId=%s'
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        title = mobj.group('title')
-        webpage = self._download_webpage(url, title)
-        iframe_url = self._search_regex(
-            '<iframe src="(http://www\.tlc\.de/wp-content/.+?)"', webpage,
-            'iframe url')
-        # Otherwise we don't get the correct 'BrightcoveExperience' element,
-        # example: http://www.tlc.de/sendungen/cake-boss/videos/cake-boss-cannoli-drama/
-        iframe_url = iframe_url.replace('.htm?', '.php?')
-        url_fragment = compat_urlparse.urlparse(url).fragment
-        if url_fragment:
-            # Since the fragment is not sent to the server, we always get the same iframe
-            iframe_url = re.sub(r'playlist=(\d+)', 'playlist=%s' % url_fragment, iframe_url)
-        iframe = self._download_webpage(iframe_url, title)
-
-        return {
-            '_type': 'url',
-            'url': BrightcoveIE._extract_brightcove_url(iframe),
-            'ie': BrightcoveIE.ie_key(),
-        }
+        brightcove_id = mobj.group('id')
+        if not brightcove_id:
+            title = mobj.group('title')
+            webpage = self._download_webpage(url, title)
+            brightcove_legacy_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
+            brightcove_id = compat_parse_qs(brightcove_legacy_url)['@videoPlayer'][0]
+        return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
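
Note that the new _VALID_URL does double duty here: it optionally captures the Brightcove id straight from the URL fragment, so the page is only downloaded when no fragment is present. A quick standalone check of that regex against the test URL:

    import re

    VALID_URL = r'https?://www\.tlc\.de/(?:[^/]+/)*videos/(?P<title>[^/?#]+)?(?:.*#(?P<id>\d+))?'
    m = re.match(VALID_URL, 'http://www.tlc.de/sendungen/breaking-amish/videos/#3235167922001')
    print(m.group('id'))     # -> 3235167922001
    print(m.group('title'))  # -> None (fragment-only URL, no page fetch needed)
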
index 49516abca690721a83dee5044bb2cdd6540d4a07..78174178e6ef69362462f96f997b7a37a640a275 100644 (file)
@@ -71,12 +71,16 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
+        display_id = mobj.group('display_id') if 'display_id' in mobj.groupdict() else video_id
 
         webpage = self._download_webpage(url, display_id)
 
         cfg_url = self._proto_relative_url(self._html_search_regex(
-            self._CONFIG_REGEX, webpage, 'flashvars.config'), 'http:')
+            self._CONFIG_REGEX, webpage, 'flashvars.config', default=None), 'http:')
+
+        if not cfg_url:
+            inputs = self._hidden_inputs(webpage)
+            cfg_url = 'https://cdn-fck.tnaflix.com/tnaflix/%s.fid?key=%s' % (inputs['vkey'], inputs['nkey'])
 
         cfg_xml = self._download_xml(
             cfg_url, display_id, 'Downloading metadata',
@@ -117,7 +121,7 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
         title = self._html_search_regex(
             self._TITLE_REGEX, webpage, 'title') if self._TITLE_REGEX else self._og_search_title(webpage)
 
-        age_limit = self._rta_search(webpage)
+        age_limit = self._rta_search(webpage) or 18
 
         duration = parse_duration(self._html_search_meta(
             'duration', webpage, 'duration', default=None))
@@ -132,7 +136,7 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
         average_rating = float_or_none(extract_field(self._AVERAGE_RATING_REGEX, 'average rating'))
 
         categories_str = extract_field(self._CATEGORIES_REGEX, 'categories')
-        categories = categories_str.split(', ') if categories_str is not None else []
+        categories = [c.strip() for c in categories_str.split(',')] if categories_str is not None else []
 
         return {
             'id': video_id,
@@ -152,17 +156,48 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
         }
 
 
+class TNAFlixNetworkEmbedIE(TNAFlixNetworkBaseIE):
+    _VALID_URL = r'https?://player\.(?:tna|emp)flix\.com/video/(?P<id>\d+)'
+
+    _TITLE_REGEX = r'<title>([^<]+)</title>'
+
+    _TESTS = [{
+        'url': 'https://player.tnaflix.com/video/6538',
+        'info_dict': {
+            'id': '6538',
+            'display_id': '6538',
+            'ext': 'mp4',
+            'title': 'Educational xxx video',
+            'thumbnail': 're:https?://.*\.jpg$',
+            'age_limit': 18,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://player.empflix.com/video/33051',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_urls(webpage):
+        return [url for _, url in re.findall(
+            r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.(?:tna|emp)flix\.com/video/\d+)\1',
+            webpage)]
+
+
 class TNAFlixIE(TNAFlixNetworkBaseIE):
     _VALID_URL = r'https?://(?:www\.)?tnaflix\.com/[^/]+/(?P<display_id>[^/]+)/video(?P<id>\d+)'
 
     _TITLE_REGEX = r'<title>(.+?) - TNAFlix Porn Videos</title>'
-    _DESCRIPTION_REGEX = r'<h3 itemprop="description">([^<]+)</h3>'
-    _UPLOADER_REGEX = r'(?s)<span[^>]+class="infoTitle"[^>]*>Uploaded By:</span>(.+?)<div'
+    _DESCRIPTION_REGEX = r'<meta[^>]+name="description"[^>]+content="([^"]+)"'
+    _UPLOADER_REGEX = r'<i>\s*Verified Member\s*</i>\s*<h1>(.+?)</h1>'
+    _CATEGORIES_REGEX = r'(?s)<span[^>]*>Categories:</span>(.+?)</div>'
 
     _TESTS = [{
         # anonymous uploader, no categories
         'url': 'http://www.tnaflix.com/porn-stars/Carmella-Decesare-striptease/video553878',
-        'md5': 'ecf3498417d09216374fc5907f9c6ec0',
+        'md5': '7e569419fe6d69543d01e6be22f5f7c4',
         'info_dict': {
             'id': '553878',
             'display_id': 'Carmella-Decesare-striptease',
@@ -171,17 +206,16 @@ class TNAFlixIE(TNAFlixNetworkBaseIE):
             'thumbnail': 're:https?://.*\.jpg$',
             'duration': 91,
             'age_limit': 18,
-            'uploader': 'Anonymous',
-            'categories': [],
+            'categories': ['Porn Stars'],
         }
     }, {
         # non-anonymous uploader, categories
         'url': 'https://www.tnaflix.com/teen-porn/Educational-xxx-video/video6538',
-        'md5': '0f5d4d490dbfd117b8607054248a07c0',
+        'md5': 'fcba2636572895aba116171a899a5658',
         'info_dict': {
             'id': '6538',
             'display_id': 'Educational-xxx-video',
-            'ext': 'mp4',
+            'ext': 'flv',
             'title': 'Educational xxx video',
             'description': 'md5:b4fab8f88a8621c8fabd361a173fe5b8',
             'thumbnail': 're:https?://.*\.jpg$',
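
The new TNAFlixNetworkEmbedIE._extract_urls hook above lets the generic extractor discover player.tnaflix.com and player.empflix.com iframes on third-party pages. A standalone sketch of that scan over a made-up page snippet:

    import re

    EMBED_RE = (r'<iframe[^>]+?src=(["\'])'
                r'(?P<url>(?:https?:)?//player\.(?:tna|emp)flix\.com/video/\d+)\1')

    page = '<div><iframe src="https://player.tnaflix.com/video/6538"></iframe></div>'
    print([url for _, url in re.findall(EMBED_RE, page)])
    # -> ['https://player.tnaflix.com/video/6538']
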
diff --git a/youtube_dl/extractor/toggle.py b/youtube_dl/extractor/toggle.py
new file mode 100644 (file)
index 0000000..c54b876
--- /dev/null
@@ -0,0 +1,192 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import json
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    determine_ext,
+    ExtractorError,
+    float_or_none,
+    int_or_none,
+    parse_iso8601,
+    sanitized_Request,
+)
+
+
+class ToggleIE(InfoExtractor):
+    IE_NAME = 'toggle'
+    _VALID_URL = r'https?://video\.toggle\.sg/(?:en|zh)/(?:series|clips|movies)/(?:[^/]+/)+(?P<id>[0-9]+)'
+    _TESTS = [{
+        'url': 'http://video.toggle.sg/en/series/lion-moms-tif/trailers/lion-moms-premier/343115',
+        'info_dict': {
+            'id': '343115',
+            'ext': 'mp4',
+            'title': 'Lion Moms Premiere',
+            'description': 'md5:aea1149404bff4d7f7b6da11fafd8e6b',
+            'upload_date': '20150910',
+            'timestamp': 1441858274,
+        },
+        'params': {
+            'skip_download': 'm3u8 download',
+        }
+    }, {
+        'note': 'DRM-protected video',
+        'url': 'http://video.toggle.sg/en/movies/dug-s-special-mission/341413',
+        'info_dict': {
+            'id': '341413',
+            'ext': 'wvm',
+            'title': 'Dug\'s Special Mission',
+            'description': 'md5:e86c6f4458214905c1772398fabc93e0',
+            'upload_date': '20150827',
+            'timestamp': 1440644006,
+        },
+        'params': {
+            'skip_download': 'DRM-protected wvm download',
+        }
+    }, {
+        # this also tests correct video id extraction
+        'note': 'm3u8 links are geo-restricted, but Android/mp4 is okay',
+        'url': 'http://video.toggle.sg/en/series/28th-sea-games-5-show/28th-sea-games-5-show-ep11/332861',
+        'info_dict': {
+            'id': '332861',
+            'ext': 'mp4',
+            'title': '28th SEA Games (5 Show) -  Episode  11',
+            'description': 'md5:3cd4f5f56c7c3b1340c50a863f896faa',
+            'upload_date': '20150605',
+            'timestamp': 1433480166,
+        },
+        'params': {
+            'skip_download': 'DRM-protected wvm download',
+        },
+        'skip': 'm3u8 links are geo-restricted'
+    }, {
+        'url': 'http://video.toggle.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
+        'only_matching': True,
+    }, {
+        'url': 'http://video.toggle.sg/zh/series/zero-calling-s2-hd/ep13/336367',
+        'only_matching': True,
+    }, {
+        'url': 'http://video.toggle.sg/en/series/vetri-s2/webisodes/jeeva-is-an-orphan-vetri-s2-webisode-7/342302',
+        'only_matching': True,
+    }, {
+        'url': 'http://video.toggle.sg/en/movies/seven-days/321936',
+        'only_matching': True,
+    }]
+
+    _FORMAT_PREFERENCES = {
+        'wvm-STBMain': -10,
+        'wvm-iPadMain': -20,
+        'wvm-iPhoneMain': -30,
+        'wvm-Android': -40,
+    }
+    _API_USER = 'tvpapi_147'
+    _API_PASS = '11111'
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(
+            url, video_id, note='Downloading video page')
+
+        api_user = self._search_regex(
+            r'apiUser\s*:\s*(["\'])(?P<user>.+?)\1', webpage, 'apiUser',
+            default=self._API_USER, group='user')
+        api_pass = self._search_regex(
+            r'apiPass\s*:\s*(["\'])(?P<pass>.+?)\1', webpage, 'apiPass',
+            default=self._API_PASS, group='pass')
+
+        params = {
+            'initObj': {
+                'Locale': {
+                    'LocaleLanguage': '',
+                    'LocaleCountry': '',
+                    'LocaleDevice': '',
+                    'LocaleUserState': 0
+                },
+                'Platform': 0,
+                'SiteGuid': 0,
+                'DomainID': '0',
+                'UDID': '',
+                'ApiUser': api_user,
+                'ApiPass': api_pass
+            },
+            'MediaID': video_id,
+            'mediaType': 0,
+        }
+
+        req = sanitized_Request(
+            'http://tvpapi.as.tvinci.com/v2_9/gateways/jsonpostgw.aspx?m=GetMediaInfo',
+            json.dumps(params).encode('utf-8'))
+        info = self._download_json(req, video_id, 'Downloading video info json')
+
+        title = info['MediaName']
+
+        formats = []
+        for video_file in info.get('Files', []):
+            video_url, vid_format = video_file.get('URL'), video_file.get('Format')
+            if not video_url or not vid_format:
+                continue
+            ext = determine_ext(video_url)
+            vid_format = vid_format.replace(' ', '')
+            # if geo-restricted, m3u8 is inaccessible, but mp4 is okay
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    video_url, video_id, ext='mp4', m3u8_id=vid_format,
+                    note='Downloading %s m3u8 information' % vid_format,
+                    errnote='Failed to download %s m3u8 information' % vid_format,
+                    fatal=False))
+            elif ext in ('mp4', 'wvm'):
+                # wvm are drm-protected files
+                formats.append({
+                    'ext': ext,
+                    'url': video_url,
+                    'format_id': vid_format,
+                    'preference': self._FORMAT_PREFERENCES.get(ext + '-' + vid_format) or -1,
+                    'format_note': 'DRM-protected video' if ext == 'wvm' else None
+                })
+        if not formats:
+            # Most likely because the video is geo-blocked
+            raise ExtractorError('No downloadable videos found', expected=True)
+        self._sort_formats(formats)
+
+        duration = int_or_none(info.get('Duration'))
+        description = info.get('Description')
+        created_at = parse_iso8601(info.get('CreationDate') or None)
+
+        average_rating = float_or_none(info.get('Rating'))
+        view_count = int_or_none(info.get('ViewCounter') or info.get('view_counter'))
+        like_count = int_or_none(info.get('LikeCounter') or info.get('like_counter'))
+
+        thumbnails = []
+        for picture in info.get('Pictures', []):
+            if not isinstance(picture, dict):
+                continue
+            pic_url = picture.get('URL')
+            if not pic_url:
+                continue
+            thumbnail = {
+                'url': pic_url,
+            }
+            pic_size = picture.get('PicSize', '')
+            m = re.search(r'(?P<width>\d+)[xX](?P<height>\d+)', pic_size)
+            if m:
+                thumbnail.update({
+                    'width': int(m.group('width')),
+                    'height': int(m.group('height')),
+                })
+            thumbnails.append(thumbnail)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'timestamp': created_at,
+            'average_rating': average_rating,
+            'view_count': view_count,
+            'like_count': like_count,
+            'thumbnails': thumbnails,
+            'formats': formats,
+        }
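
The GetMediaInfo call in the new extractor is a plain JSON POST; the only youtube-dl-specific piece is sanitized_Request. A standalone sketch of the same request shape using just the standard library (the MediaID value is taken from the first test; nothing is sent until urlopen is called):

    import json
    from urllib.request import Request

    params = {
        'initObj': {
            'Locale': {'LocaleLanguage': '', 'LocaleCountry': '',
                       'LocaleDevice': '', 'LocaleUserState': 0},
            'Platform': 0, 'SiteGuid': 0, 'DomainID': '0', 'UDID': '',
            'ApiUser': 'tvpapi_147', 'ApiPass': '11111',
        },
        'MediaID': '343115',
        'mediaType': 0,
    }
    req = Request(
        'http://tvpapi.as.tvinci.com/v2_9/gateways/jsonpostgw.aspx?m=GetMediaInfo',
        json.dumps(params).encode('utf-8'))
    # urllib.request.urlopen(req) would return the GetMediaInfo JSON document.
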
index 2756f56d3a94ae8f2bed64aa39acf4d45616366b..2579ba8c67498c91aa117c6853b83f391ccb3ba6 100644 (file)
@@ -41,7 +41,7 @@ class ToypicsIE(InfoExtractor):
 
 class ToypicsUserIE(InfoExtractor):
     IE_DESC = 'Toypics user profile'
-    _VALID_URL = r'http://videos\.toypics\.net/(?P<username>[^/?]+)(?:$|[?#])'
+    _VALID_URL = r'https?://videos\.toypics\.net/(?P<username>[^/?]+)(?:$|[?#])'
     _TEST = {
         'url': 'http://videos.toypics.net/Mikey',
         'info_dict': {
index 1c53a3fd09459f31fdf188dc852141ef000af4a6..747370d12d7fc8fd1c66b1ac101db0ba01c963e5 100644 (file)
@@ -7,7 +7,7 @@ from .common import InfoExtractor
 
 class TrailerAddictIE(InfoExtractor):
     _WORKING = False
-    _VALID_URL = r'(?:http://)?(?:www\.)?traileraddict\.com/(?:trailer|clip)/(?P<movie>.+?)/(?P<trailer_name>.+)'
+    _VALID_URL = r'(?:https?://)?(?:www\.)?traileraddict\.com/(?:trailer|clip)/(?P<movie>.+?)/(?P<trailer_name>.+)'
     _TEST = {
         'url': 'http://www.traileraddict.com/trailer/prince-avalanche/trailer',
         'md5': '41365557f3c8c397d091da510e73ceb4',
@@ -38,12 +38,12 @@ class TrailerAddictIE(InfoExtractor):
 
         # Presence of (no)watchplus function indicates HD quality is available
         if re.search(r'function (no)?watchplus()', webpage):
-            fvar = "fvarhd"
+            fvar = 'fvarhd'
         else:
-            fvar = "fvar"
+            fvar = 'fvar'
 
-        info_url = "http://www.traileraddict.com/%s.php?tid=%s" % (fvar, str(video_id))
-        info_webpage = self._download_webpage(info_url, video_id, "Downloading the info webpage")
+        info_url = 'http://www.traileraddict.com/%s.php?tid=%s' % (fvar, str(video_id))
+        info_webpage = self._download_webpage(info_url, video_id, 'Downloading the info webpage')
 
         final_url = self._search_regex(r'&fileurl=(.+)',
                                        info_webpage, 'Download url').replace('%3F', '?')
index 185accc4b6b6ebaad3f1aa9379fe8f7c8f6d33ea..a800449e9448d70a2900578d0afd9933d265730e 100644 (file)
 # coding: utf-8
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
-from ..utils import ExtractorError
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    parse_iso8601,
+)
 
 
 class TriluliluIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?trilulilu\.ro/(?:video-[^/]+/)?(?P<id>[^/#\?]+)'
-    _TEST = {
-        'url': 'http://www.trilulilu.ro/video-animatie/big-buck-bunny-1',
-        'md5': 'c1450a00da251e2769b74b9005601cac',
+    _VALID_URL = r'https?://(?:(?:www|m)\.)?trilulilu\.ro/(?:[^/]+/)?(?P<id>[^/#\?]+)'
+    _TESTS = [{
+        'url': 'http://www.trilulilu.ro/big-buck-bunny-1',
+        'md5': '68da087b676a6196a413549212f60cc6',
         'info_dict': {
             'id': 'ae2899e124140b',
             'ext': 'mp4',
             'title': 'Big Buck Bunny',
             'description': ':) pentru copilul din noi',
+            'uploader_id': 'chipy',
+            'upload_date': '20120304',
+            'timestamp': 1330830647,
+            'uploader': 'chipy',
+            'view_count': int,
+            'like_count': int,
+            'comment_count': int,
         },
-    }
+    }, {
+        'url': 'http://www.trilulilu.ro/adena-ft-morreti-inocenta',
+        'md5': '929dfb8729dc71750463af88bbbbf4a4',
+        'info_dict': {
+            'id': 'f299710e3c91c5',
+            'ext': 'mp4',
+            'title': 'Adena ft. Morreti - Inocenta',
+            'description': 'pop music',
+            'uploader_id': 'VEVOmixt',
+            'upload_date': '20151204',
+            'uploader': 'VEVOmixt',
+            'timestamp': 1449187937,
+            'view_count': int,
+            'like_count': int,
+            'comment_count': int,
+        },
+    }]
 
     def _real_extract(self, url):
         display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
+        media_info = self._download_json('http://m.trilulilu.ro/%s?format=json' % display_id, display_id)
 
-        if re.search(r'Fişierul nu este disponibil pentru vizionare în ţara dumneavoastră', webpage):
-            raise ExtractorError(
-                'This video is not available in your country.', expected=True)
-        elif re.search('Fişierul poate fi accesat doar de către prietenii lui', webpage):
+        age_limit = 0
+        errors = media_info.get('errors', {})
+        if errors.get('friends'):
             raise ExtractorError('This video is private.', expected=True)
+        elif errors.get('geoblock'):
+            raise ExtractorError('This video is not available in your country.', expected=True)
+        elif errors.get('xxx_unlogged'):
+            age_limit = 18
 
-        flashvars_str = self._search_regex(
-            r'block_flash_vars\s*=\s*(\{[^\}]+\})', webpage, 'flashvars', fatal=False, default=None)
+        media_class = media_info.get('class')
+        if media_class not in ('video', 'audio'):
+            raise ExtractorError('not a video or an audio file')
 
-        if flashvars_str:
-            flashvars = self._parse_json(flashvars_str, display_id)
-        else:
-            raise ExtractorError(
-                'This page does not contain videos', expected=True)
+        user = media_info.get('user', {})
 
-        if flashvars['isMP3'] == 'true':
-            raise ExtractorError(
-                'Audio downloads are currently not supported', expected=True)
+        thumbnail = media_info.get('cover_url')
+        if thumbnail:
+            thumbnail = thumbnail.format(width='1600', height='1200')
 
-        video_id = flashvars['hash']
-        title = self._og_search_title(webpage)
-        thumbnail = self._og_search_thumbnail(webpage)
-        description = self._og_search_description(webpage, default=None)
-
-        format_url = ('http://fs%(server)s.trilulilu.ro/%(hash)s/'
-                      'video-formats2' % flashvars)
-        format_doc = self._download_xml(
-            format_url, video_id,
-            note='Downloading formats',
-            errnote='Error while downloading formats')
-
-        video_url_template = (
-            'http://fs%(server)s.trilulilu.ro/stream.php?type=video'
-            '&source=site&hash=%(hash)s&username=%(userid)s&'
-            'key=ministhebest&format=%%s&sig=&exp=' %
-            flashvars)
-        formats = [
-            {
-                'format_id': fnode.text.partition('-')[2],
-                'url': video_url_template % fnode.text,
-                'ext': fnode.text.partition('-')[0]
-            }
-
-            for fnode in format_doc.findall('./formats/format')
-        ]
+        # TODO: get correct ext for audio files
+        stream_type = media_info.get('stream_type')
+        formats = [{
+            'url': media_info['href'],
+            'ext': stream_type,
+        }]
+        if media_info.get('is_hd'):
+            formats.append({
+                'format_id': 'hd',
+                'url': media_info['hrefhd'],
+                'ext': stream_type,
+            })
+        if media_class == 'audio':
+            formats[0]['vcodec'] = 'none'
+        else:
+            formats[0]['format_id'] = 'sd'
 
         return {
-            'id': video_id,
+            'id': media_info['identifier'].split('|')[1],
             'display_id': display_id,
             'formats': formats,
-            'title': title,
-            'description': description,
+            'title': media_info['title'],
+            'description': media_info.get('description'),
             'thumbnail': thumbnail,
+            'uploader_id': user.get('username'),
+            'uploader': user.get('fullname'),
+            'timestamp': parse_iso8601(media_info.get('published'), ' '),
+            'duration': int_or_none(media_info.get('duration')),
+            'view_count': int_or_none(media_info.get('count_views')),
+            'like_count': int_or_none(media_info.get('count_likes')),
+            'comment_count': int_or_none(media_info.get('count_comments')),
+            'age_limit': age_limit,
         }
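
A note on the thumbnail line above: str.format returns a new string rather than mutating in place, which is why the substituted URL has to be assigned back to thumbnail. A standalone illustration (the placeholder-style cover_url template is an assumption about the API response):

    cover_url = 'http://static.trilulilu.ro/covers/{width}x{height}/sample.jpg'
    thumbnail = cover_url.format(width='1600', height='1200')
    print(thumbnail)  # -> http://static.trilulilu.ro/covers/1600x1200/sample.jpg
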
diff --git a/youtube_dl/extractor/trollvids.py b/youtube_dl/extractor/trollvids.py
new file mode 100644 (file)
index 0000000..6577056
--- /dev/null
@@ -0,0 +1,36 @@
+# encoding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .nuevo import NuevoBaseIE
+
+
+class TrollvidsIE(NuevoBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?trollvids\.com/video/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
+    IE_NAME = 'trollvids'
+    _TEST = {
+        'url': 'http://trollvids.com/video/2349002/%E3%80%90MMD-R-18%E3%80%91%E3%82%AC%E3%83%BC%E3%83%AB%E3%83%95%E3%83%AC%E3%83%B3%E3%83%89-carrymeoff',
+        'md5': '1d53866b2c514b23ed69e4352fdc9839',
+        'info_dict': {
+            'id': '2349002',
+            'ext': 'mp4',
+            'title': '【MMD R-18】ガールフレンド carry_me_off',
+            'age_limit': 18,
+            'duration': 216.78,
+        },
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        display_id = mobj.group('display_id')
+
+        info = self._extract_nuevo(
+            'http://trollvids.com/nuevo/player/config.php?v=%s' % video_id,
+            video_id)
+        info.update({
+            'display_id': display_id,
+            'age_limit': 18
+        })
+        return info
index e7b79243a8fb9f091087f5c452c8192c49c81af2..d55e0c563998bb1f4c38b973519ec420b6b41a8c 100644 (file)
@@ -1,11 +1,10 @@
 from __future__ import unicode_literals
 
-from .common import InfoExtractor
-from ..utils import xpath_text
+from .nuevo import NuevoBaseIE
 
 
-class TruTubeIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?trutube\.tv/(?:video/|nuevo/player/embed\.php\?v=)(?P<id>[0-9]+)'
+class TruTubeIE(NuevoBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?trutube\.tv/(?:video/|nuevo/player/embed\.php\?v=)(?P<id>\d+)'
     _TESTS = [{
         'url': 'http://trutube.tv/video/14880/Ramses-II-Proven-To-Be-A-Red-Headed-Caucasoid-',
         'md5': 'c5b6e301b0a2040b074746cbeaa26ca1',
@@ -22,19 +21,6 @@ class TruTubeIE(InfoExtractor):
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
-
-        config = self._download_xml(
+        return self._extract_nuevo(
             'https://trutube.tv/nuevo/player/config.php?v=%s' % video_id,
-            video_id, transform_source=lambda s: s.strip())
-
-        # filehd is always 404
-        video_url = xpath_text(config, './file', 'video URL', fatal=True)
-        title = xpath_text(config, './title', 'title').strip()
-        thumbnail = xpath_text(config, './image', ' thumbnail')
-
-        return {
-            'id': video_id,
-            'url': video_url,
-            'title': title,
-            'thumbnail': thumbnail,
-        }
+            video_id)
index c9cb69333f7da0a9f4fe009e79b06433bca83726..1d9271d1e70c3f2b5ad30a6c69f6602bd58ac89a 100644 (file)
@@ -1,15 +1,12 @@
 from __future__ import unicode_literals
 
-import json
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse_urlparse,
-    compat_urllib_request,
-)
+from ..compat import compat_str
 from ..utils import (
     int_or_none,
+    sanitized_Request,
     str_to_int,
 )
 from ..aes import aes_decrypt_text
@@ -17,43 +14,55 @@ from ..aes import aes_decrypt_text
 
 class Tube8IE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?tube8\.com/(?:[^/]+/)+(?P<display_id>[^/]+)/(?P<id>\d+)'
-    _TESTS = [
-        {
-            'url': 'http://www.tube8.com/teen/kasia-music-video/229795/',
-            'md5': '44bf12b98313827dd52d35b8706a4ea0',
-            'info_dict': {
-                'id': '229795',
-                'display_id': 'kasia-music-video',
-                'ext': 'mp4',
-                'description': 'hot teen Kasia grinding',
-                'uploader': 'unknown',
-                'title': 'Kasia music video',
-                'age_limit': 18,
-            }
-        },
-        {
-            'url': 'http://www.tube8.com/shemale/teen/blonde-cd-gets-kidnapped-by-two-blacks-and-punished-for-being-a-slutty-girl/19569151/',
-            'only_matching': True,
-        },
-    ]
+    _TESTS = [{
+        'url': 'http://www.tube8.com/teen/kasia-music-video/229795/',
+        'md5': '65e20c48e6abff62ed0c3965fff13a39',
+        'info_dict': {
+            'id': '229795',
+            'display_id': 'kasia-music-video',
+            'ext': 'mp4',
+            'description': 'hot teen Kasia grinding',
+            'uploader': 'unknown',
+            'title': 'Kasia music video',
+            'age_limit': 18,
+            'duration': 230,
+        }
+    }, {
+        'url': 'http://www.tube8.com/shemale/teen/blonde-cd-gets-kidnapped-by-two-blacks-and-punished-for-being-a-slutty-girl/19569151/',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
         display_id = mobj.group('display_id')
 
-        req = compat_urllib_request.Request(url)
+        req = sanitized_Request(url)
         req.add_header('Cookie', 'age_verified=1')
         webpage = self._download_webpage(req, display_id)
 
-        flashvars = json.loads(self._html_search_regex(
-            r'flashvars\s*=\s*({.+?});\r?\n', webpage, 'flashvars'))
+        flashvars = self._parse_json(
+            self._search_regex(
+                r'flashvars\s*=\s*({.+?});\r?\n', webpage, 'flashvars'),
+            video_id)
 
-        video_url = flashvars['video_url']
-        if flashvars.get('encrypted') is True:
-            video_url = aes_decrypt_text(video_url, flashvars['video_title'], 32).decode('utf-8')
-        path = compat_urllib_parse_urlparse(video_url).path
-        format_id = '-'.join(path.split('/')[4].split('_')[:2])
+        formats = []
+        for key, video_url in flashvars.items():
+            if not isinstance(video_url, compat_str) or not video_url.startswith('http'):
+                continue
+            height = self._search_regex(
+                r'quality_(\d+)[pP]', key, 'height', default=None)
+            if not height:
+                continue
+            if flashvars.get('encrypted') is True:
+                video_url = aes_decrypt_text(
+                    video_url, flashvars['video_title'], 32).decode('utf-8')
+            formats.append({
+                'url': video_url,
+                'format_id': '%sp' % height,
+                'height': int(height),
+            })
+        self._sort_formats(formats)
 
         thumbnail = flashvars.get('image_url')
 
@@ -64,32 +73,31 @@ class Tube8IE(InfoExtractor):
         uploader = self._html_search_regex(
             r'<span class="username">\s*(.+?)\s*<',
             webpage, 'uploader', fatal=False)
+        duration = int_or_none(flashvars.get('video_duration'))
 
-        like_count = int_or_none(self._html_search_regex(
+        like_count = int_or_none(self._search_regex(
             r'rupVar\s*=\s*"(\d+)"', webpage, 'like count', fatal=False))
-        dislike_count = int_or_none(self._html_search_regex(
+        dislike_count = int_or_none(self._search_regex(
             r'rdownVar\s*=\s*"(\d+)"', webpage, 'dislike count', fatal=False))
-        view_count = self._html_search_regex(
-            r'<strong>Views: </strong>([\d,\.]+)\s*</li>', webpage, 'view count', fatal=False)
-        if view_count:
-            view_count = str_to_int(view_count)
-        comment_count = self._html_search_regex(
-            r'<span id="allCommentsCount">(\d+)</span>', webpage, 'comment count', fatal=False)
-        if comment_count:
-            comment_count = str_to_int(comment_count)
+        view_count = str_to_int(self._search_regex(
+            r'<strong>Views: </strong>([\d,\.]+)\s*</li>',
+            webpage, 'view count', fatal=False))
+        comment_count = str_to_int(self._search_regex(
+            r'<span id="allCommentsCount">(\d+)</span>',
+            webpage, 'comment count', fatal=False))
 
         return {
             'id': video_id,
             'display_id': display_id,
-            'url': video_url,
             'title': title,
             'description': description,
             'thumbnail': thumbnail,
             'uploader': uploader,
-            'format_id': format_id,
+            'duration': duration,
             'view_count': view_count,
             'like_count': like_count,
             'dislike_count': dislike_count,
             'comment_count': comment_count,
             'age_limit': 18,
+            'formats': formats,
         }
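
The rewritten Tube8 extractor derives its format list from flashvars keys of the form quality_720p instead of trusting a single video_url. A standalone sketch of that key scan over made-up flashvars (the real code uses compat_str and _sort_formats; plain str and a sort stand in here):

    import re

    flashvars = {
        'quality_480p': 'http://example.com/480.mp4',
        'quality_720p': 'http://example.com/720.mp4',
        'encrypted': False,
        'video_title': 'sample',
    }

    formats = []
    for key, video_url in flashvars.items():
        if not isinstance(video_url, str) or not video_url.startswith('http'):
            continue
        m = re.search(r'quality_(\d+)[pP]', key)
        if not m:
            continue
        formats.append({
            'url': video_url,
            'format_id': '%sp' % m.group(1),
            'height': int(m.group(1)),
        })
    formats.sort(key=lambda f: f['height'])
    print([f['format_id'] for f in formats])  # -> ['480p', '720p']
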
index 2c4b21807ce1dc276e8625463ac38fb63c8ff211..c6572defbcf7a732d58d5fa3003993d063a8be3e 100644 (file)
@@ -1,33 +1,32 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
-import codecs
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request
-)
 from ..utils import (
     ExtractorError,
     int_or_none,
+    sanitized_Request,
+    urlencode_postdata,
+    parse_iso8601,
 )
 
 
 class TubiTvIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?tubitv\.com/video\?id=(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?tubitv\.com/video/(?P<id>[0-9]+)'
     _LOGIN_URL = 'http://tubitv.com/login'
     _NETRC_MACHINE = 'tubitv'
     _TEST = {
-        'url': 'http://tubitv.com/video?id=54411&title=The_Kitchen_Musical_-_EP01',
+        'url': 'http://tubitv.com/video/283829/the_comedian_at_the_friday',
         'info_dict': {
-            'id': '54411',
+            'id': '283829',
             'ext': 'mp4',
-            'title': 'The Kitchen Musical - EP01',
-            'thumbnail': 're:^https?://.*\.png$',
-            'description': 'md5:37532716166069b353e8866e71fefae7',
-            'duration': 2407,
+            'title': 'The Comedian at The Friday',
+            'description': 'A stand up comedian is forced to look at the decisions in his life while on a one week trip to the west coast.',
+            'uploader': 'Indie Rights Films',
+            'upload_date': '20160111',
+            'timestamp': 1452555979,
         },
         'params': {
             'skip_download': 'HLS download',
@@ -43,8 +42,8 @@ class TubiTvIE(InfoExtractor):
             'username': username,
             'password': password,
         }
-        payload = compat_urllib_parse.urlencode(form_data).encode('utf-8')
-        request = compat_urllib_request.Request(self._LOGIN_URL, payload)
+        payload = urlencode_postdata(form_data)
+        request = sanitized_Request(self._LOGIN_URL, payload)
         request.add_header('Content-Type', 'application/x-www-form-urlencoded')
         login_page = self._download_webpage(
             request, None, False, 'Wrong login info')
@@ -57,28 +56,31 @@ class TubiTvIE(InfoExtractor):
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
+        video_data = self._download_json(
+            'http://tubitv.com/oz/videos/%s/content' % video_id, video_id)
+        title = video_data['n']
 
-        webpage = self._download_webpage(url, video_id)
-        if re.search(r"<(?:DIV|div) class='login-required-screen'>", webpage):
-            raise ExtractorError(
-                'This video requires login, use --username and --password '
-                'options to provide account credentials.', expected=True)
-
-        title = self._og_search_title(webpage)
-        description = self._og_search_description(webpage)
-        thumbnail = self._og_search_thumbnail(webpage)
-        duration = int_or_none(self._html_search_meta(
-            'video:duration', webpage, 'duration'))
+        formats = self._extract_m3u8_formats(
+            video_data['mh'], video_id, 'mp4', 'm3u8_native')
+        self._sort_formats(formats)
 
-        apu = self._search_regex(r"apu='([^']+)'", webpage, 'apu')
-        m3u8_url = codecs.decode(apu, 'rot_13')[::-1]
-        formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
+        subtitles = {}
+        for sub in video_data.get('sb', []):
+            sub_url = sub.get('u')
+            if not sub_url:
+                continue
+            subtitles.setdefault(sub.get('l', 'en'), []).append({
+                'url': sub_url,
+            })
 
         return {
             'id': video_id,
             'title': title,
             'formats': formats,
-            'thumbnail': thumbnail,
-            'description': description,
-            'duration': duration,
+            'subtitles': subtitles,
+            'thumbnail': video_data.get('ph'),
+            'description': video_data.get('d'),
+            'duration': int_or_none(video_data.get('s')),
+            'timestamp': parse_iso8601(video_data.get('u')),
+            'uploader': video_data.get('on'),
         }
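
One detail in the new TubiTV code worth calling out: subtitle tracks arrive as a flat 'sb' list and are regrouped by language with dict.setdefault. A standalone sketch over made-up API data:

    video_data = {'sb': [
        {'l': 'en', 'u': 'http://example.com/en.vtt'},
        {'l': 'es', 'u': 'http://example.com/es.vtt'},
        {'u': 'http://example.com/default.vtt'},  # no language tag
    ]}

    subtitles = {}
    for sub in video_data.get('sb', []):
        sub_url = sub.get('u')
        if not sub_url:
            continue
        # Tracks without a language tag fall back to 'en'.
        subtitles.setdefault(sub.get('l', 'en'), []).append({'url': sub_url})

    print(sorted(subtitles))  # -> ['en', 'es'] ('en' holds two tracks)
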
index c89de5ba4a46bb261987d8dbee5f55b3d05492da..bb8b8e23424e7943f2133028aca187d4fcffeab9 100644 (file)
@@ -2,14 +2,20 @@
 
 from __future__ import unicode_literals
 
-import re
-import json
-
 from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    InAdvancePagedList,
+    float_or_none,
+    unescapeHTML,
+)
 
 
 class TudouIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?tudou\.com/(?:listplay|programs(?:/view)?|albumplay)/.*?/(?P<id>[^/?#]+?)(?:\.html)?/?(?:$|[?#])'
+    IE_NAME = 'tudou'
+    _VALID_URL = r'https?://(?:www\.)?tudou\.com/(?:(?:programs|wlplay)/view|(?:listplay|albumplay)/[\w-]{11})/(?P<id>[\w-]{11})'
     _TESTS = [{
         'url': 'http://www.tudou.com/listplay/zzdE77v6Mmo/2xN2duXMxmw.html',
         'md5': '140a49ed444bd22f93330985d8475fcb',
@@ -18,6 +24,11 @@ class TudouIE(InfoExtractor):
             'ext': 'f4v',
             'title': '卡马乔国足开大脚长传冲吊集锦',
             'thumbnail': 're:^https?://.*\.jpg$',
+            'timestamp': 1372113489000,
+            'description': '卡马乔卡家军,开大脚先进战术不完全集锦!',
+            'duration': 289.04,
+            'view_count': int,
+            'filesize': int,
         }
     }, {
         'url': 'http://www.tudou.com/programs/view/ajX3gyhL0pc/',
@@ -26,62 +37,145 @@ class TudouIE(InfoExtractor):
             'ext': 'f4v',
             'title': 'La Sylphide-Bolshoi-Ekaterina Krysanova & Vyacheslav Lopatin 2012',
             'thumbnail': 're:^https?://.*\.jpg$',
+            'timestamp': 1349207518000,
+            'description': 'md5:294612423894260f2dcd5c6c04fe248b',
+            'duration': 5478.33,
+            'view_count': int,
+            'filesize': int,
         }
     }]
 
-    def _url_for_id(self, id, quality=None):
-        info_url = "http://v2.tudou.com/f?id=" + str(id)
+    _PLAYER_URL = 'http://js.tudouui.com/bin/lingtong/PortalPlayer_177.swf'
+
+    # Translated from tudou/tools/TVCHelper.as in PortalPlayer_193.swf
+    # 0001, 0002 and 4001 are not included as they indicate temporary issues
+    TVC_ERRORS = {
+        '0003': 'The video is deleted or does not exist',
+        '1001': 'This video is unavailable due to licensing issues',
+        '1002': 'This video is unavailable as it\'s under review',
+        '1003': 'This video is unavailable as it\'s under review',
+        '3001': 'Password required',
+        '5001': 'This video is available in Mainland China only due to licensing issues',
+        '7001': 'This video is unavailable',
+        '8001': 'This video is unavailable due to licensing issues',
+    }
+
+    def _url_for_id(self, video_id, quality=None):
+        info_url = 'http://v2.tudou.com/f?id=' + compat_str(video_id)
         if quality:
             info_url += '&hd' + quality
-        webpage = self._download_webpage(info_url, id, "Opening the info webpage")
-        final_url = self._html_search_regex('>(.+?)</f>', webpage, 'video url')
+        xml_data = self._download_xml(info_url, video_id, 'Opening the info XML page')
+        error = xml_data.attrib.get('error')
+        if error is not None:
+            raise ExtractorError('Tudou said: %s' % error, expected=True)
+        final_url = xml_data.text
         return final_url
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        m = re.search(r'vcode:\s*[\'"](.+?)[\'"]', webpage)
-        if m and m.group(1):
-            return {
-                '_type': 'url',
-                'url': 'youku:' + m.group(1),
-                'ie_key': 'Youku'
-            }
-
-        title = self._search_regex(
-            r",kw:\s*['\"](.+?)[\"']", webpage, 'title')
-        thumbnail_url = self._search_regex(
-            r",pic:\s*[\"'](.+?)[\"']", webpage, 'thumbnail URL', fatal=False)
-
-        segs_json = self._search_regex(r'segs: \'(.*)\'', webpage, 'segments')
-        segments = json.loads(segs_json)
+        item_data = self._download_json(
+            'http://www.tudou.com/tvp/getItemInfo.action?ic=%s' % video_id, video_id)
+
+        youku_vcode = item_data.get('vcode')
+        if youku_vcode:
+            return self.url_result('youku:' + youku_vcode, ie='Youku')
+
+        if not item_data.get('itemSegs'):
+            tvc_code = item_data.get('tvcCode')
+            if tvc_code:
+                err_msg = self.TVC_ERRORS.get(tvc_code)
+                if err_msg:
+                    raise ExtractorError('Tudou said: %s' % err_msg, expected=True)
+                raise ExtractorError('Unexpected error %s returned from Tudou' % tvc_code)
+            raise ExtractorError('Unexpected error returned from Tudou')
+
+        title = unescapeHTML(item_data['kw'])
+        description = item_data.get('desc')
+        thumbnail_url = item_data.get('pic')
+        view_count = int_or_none(item_data.get('playTimes'))
+        timestamp = int_or_none(item_data.get('pt'))
+
+        segments = self._parse_json(item_data['itemSegs'], video_id)
         # The keys are the quality values that have to be passed as the hd
         # field in the request URL; we pick the highest one available.
         # Also, filter non-number qualities (see issue #3643).
         quality = sorted(filter(lambda k: k.isdigit(), segments.keys()),
                          key=lambda k: int(k))[-1]
         parts = segments[quality]
-        result = []
         len_parts = len(parts)
         if len_parts > 1:
             self.to_screen('%s: found %s parts' % (video_id, len_parts))
-        for part in parts:
+
+        def part_func(partnum):
+            part = parts[partnum]
             part_id = part['k']
             final_url = self._url_for_id(part_id, quality)
             ext = (final_url.split('?')[0]).split('.')[-1]
-            part_info = {
+            return [{
                 'id': '%s' % part_id,
                 'url': final_url,
                 'ext': ext,
                 'title': title,
                 'thumbnail': thumbnail_url,
-            }
-            result.append(part_info)
+                'description': description,
+                'view_count': view_count,
+                'timestamp': timestamp,
+                'duration': float_or_none(part.get('seconds'), 1000),
+                'filesize': int_or_none(part.get('size')),
+                'http_headers': {
+                    'Referer': self._PLAYER_URL,
+                },
+            }]
+
+        entries = InAdvancePagedList(part_func, len_parts, 1)
 
         return {
             '_type': 'multi_video',
-            'entries': result,
+            'entries': entries,
             'id': video_id,
             'title': title,
         }
+
+
+class TudouPlaylistIE(InfoExtractor):
+    IE_NAME = 'tudou:playlist'
+    _VALID_URL = r'https?://(?:www\.)?tudou\.com/listplay/(?P<id>[\w-]{11})\.html'
+    _TESTS = [{
+        'url': 'http://www.tudou.com/listplay/zzdE77v6Mmo.html',
+        'info_dict': {
+            'id': 'zzdE77v6Mmo',
+        },
+        'playlist_mincount': 209,
+    }]
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+        playlist_data = self._download_json(
+            'http://www.tudou.com/tvp/plist.action?lcode=%s' % playlist_id, playlist_id)
+        entries = [self.url_result(
+            'http://www.tudou.com/programs/view/%s' % item['icode'],
+            'Tudou', item['icode'],
+            item['kw']) for item in playlist_data['items']]
+        return self.playlist_result(entries, playlist_id)
+
+
+class TudouAlbumIE(InfoExtractor):
+    IE_NAME = 'tudou:album'
+    _VALID_URL = r'https?://(?:www\.)?tudou\.com/album(?:cover|play)/(?P<id>[\w-]{11})'
+    _TESTS = [{
+        'url': 'http://www.tudou.com/albumplay/v5qckFJvNJg.html',
+        'info_dict': {
+            'id': 'v5qckFJvNJg',
+        },
+        'playlist_mincount': 45,
+    }]
+
+    def _real_extract(self, url):
+        album_id = self._match_id(url)
+        album_data = self._download_json(
+            'http://www.tudou.com/tvp/alist.action?acode=%s' % album_id, album_id)
+        entries = [self.url_result(
+            'http://www.tudou.com/programs/view/%s' % item['icode'],
+            'Tudou', item['icode'],
+            item['kw']) for item in album_data['items']]
+        return self.playlist_result(entries, album_id)
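
The multi-part rewrite above wraps per-part URL resolution in InAdvancePagedList so each part's final URL is only fetched once the downloader actually reaches it. A minimal sketch of that lazy-pages idea (a simplified stand-in for youtube_dl.utils.InAdvancePagedList, one entry per page):

    import itertools

    class LazyPages(object):
        # pagefunc(n) returns the list of entries for page n; pages are
        # evaluated on demand, in order.
        def __init__(self, pagefunc, pagecount):
            self._pagefunc = pagefunc
            self._pagecount = pagecount

        def __iter__(self):
            return itertools.chain.from_iterable(
                self._pagefunc(n) for n in range(self._pagecount))

    def part_func(n):
        # The real part_func would hit http://v2.tudou.com/f?id=... here.
        return [{'id': 'part-%d' % n}]

    for entry in LazyPages(part_func, 3):
        print(entry['id'])  # part-0, part-1, part-2, resolved lazily
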
index 3d3b635e4cb362515b365ccd8f9321e5124aadeb..4d8b57111897f3c936e11f55fcea60d6a6bd30d6 100644 (file)
@@ -4,10 +4,11 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
+from ..utils import int_or_none
 
 
 class TumblrIE(InfoExtractor):
-    _VALID_URL = r'http://(?P<blog_name>.*?)\.tumblr\.com/(?:post|video)/(?P<id>[0-9]+)(?:$|[/?#])'
+    _VALID_URL = r'https?://(?P<blog_name>[^/?#&]+)\.tumblr\.com/(?:post|video)/(?P<id>[0-9]+)(?:$|[/?#])'
     _TESTS = [{
         'url': 'http://tatianamaslanydaily.tumblr.com/post/54196191430/orphan-black-dvd-extra-behind-the-scenes',
         'md5': '479bb068e5b16462f5176a6828829767',
@@ -28,6 +29,19 @@ class TumblrIE(InfoExtractor):
             'description': 'md5:dba62ac8639482759c8eb10ce474586a',
             'thumbnail': 're:http://.*\.jpg',
         }
+    }, {
+        'url': 'http://hdvideotest.tumblr.com/post/130323439814/test-description-for-my-hd-video',
+        'md5': '7ae503065ad150122dc3089f8cf1546c',
+        'info_dict': {
+            'id': '130323439814',
+            'ext': 'mp4',
+            'title': 'HD Video Testing \u2014 Test description for my HD video',
+            'description': 'md5:97cc3ab5fcd27ee4af6356701541319c',
+            'thumbnail': 're:http://.*\.jpg',
+        },
+        'params': {
+            'format': 'hd',
+        },
     }, {
         'url': 'http://naked-yogi.tumblr.com/post/118312946248/naked-smoking-stretching',
         'md5': 'de07e5211d60d4f3a2c3df757ea9f6ab',
@@ -37,6 +51,9 @@ class TumblrIE(InfoExtractor):
             'title': 'naked smoking & stretching',
             'upload_date': '20150506',
             'timestamp': 1430931613,
+            'age_limit': 18,
+            'uploader_id': '1638622',
+            'uploader': 'naked-yogi',
         },
         'add_ie': ['Vidme'],
     }, {
@@ -50,6 +67,34 @@ class TumblrIE(InfoExtractor):
             'uploader_id': 'user32021558',
         },
         'add_ie': ['Vimeo'],
+    }, {
+        'url': 'http://sutiblr.tumblr.com/post/139638707273',
+        'md5': '2dd184b3669e049ba40563a7d423f95c',
+        'info_dict': {
+            'id': 'ir7qBEIKqvq',
+            'ext': 'mp4',
+            'title': 'Vine by sutiblr',
+            'alt_title': 'Vine by sutiblr',
+            'uploader': 'sutiblr',
+            'uploader_id': '1198993975374495744',
+            'upload_date': '20160220',
+            'like_count': int,
+            'comment_count': int,
+            'repost_count': int,
+        },
+        'add_ie': ['Vine'],
+    }, {
+        'url': 'http://vitasidorkina.tumblr.com/post/134652425014/joskriver-victoriassecret-invisibility-or',
+        'md5': '01c12ceb82cbf6b2fe0703aa56b3ad72',
+        'info_dict': {
+            'id': '-7LnUPGlSo',
+            'ext': 'mp4',
+            'title': 'Video by victoriassecret',
+            'description': 'Invisibility or flight…which superpower would YOU choose? #VSFashionShow #ThisOrThat',
+            'uploader_id': 'victoriassecret',
+            'thumbnail': 're:^https?://.*\.jpg'
+        },
+        'add_ie': ['Instagram'],
     }]
 
     def _real_extract(self, url):
@@ -66,10 +111,38 @@ class TumblrIE(InfoExtractor):
         if iframe_url is None:
             return self.url_result(urlh.geturl(), 'Generic')
 
-        iframe = self._download_webpage(iframe_url, video_id,
-                                        'Downloading iframe page')
-        video_url = self._search_regex(r'<source src="([^"]+)"',
-                                       iframe, 'video url')
+        iframe = self._download_webpage(iframe_url, video_id, 'Downloading iframe page')
+
+        duration = None
+        sources = []
+
+        sd_url = self._search_regex(
+            r'<source[^>]+src=(["\'])(?P<url>.+?)\1', iframe,
+            'sd video url', default=None, group='url')
+        if sd_url:
+            sources.append((sd_url, 'sd'))
+
+        options = self._parse_json(
+            self._search_regex(
+                r'data-crt-options=(["\'])(?P<options>.+?)\1', iframe,
+                'hd video url', default='', group='options'),
+            video_id, fatal=False)
+        if options:
+            duration = int_or_none(options.get('duration'))
+            hd_url = options.get('hdUrl')
+            if hd_url:
+                sources.append((hd_url, 'hd'))
+
+        formats = [{
+            'url': video_url,
+            'ext': 'mp4',
+            'format_id': format_id,
+            'height': int_or_none(self._search_regex(
+                r'/(\d{3,4})$', video_url, 'height', default=None)),
+            'quality': quality,
+        } for quality, (video_url, format_id) in enumerate(sources)]
+
+        self._sort_formats(formats)
 
         # The only place where you can get a title; it's not complete,
         # but searching in other places doesn't work for all videos
@@ -79,9 +152,9 @@ class TumblrIE(InfoExtractor):
 
         return {
             'id': video_id,
-            'url': video_url,
-            'ext': 'mp4',
             'title': video_title,
             'description': self._og_search_description(webpage, default=None),
             'thumbnail': self._og_search_thumbnail(webpage, default=None),
+            'duration': duration,
+            'formats': formats,
         }
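
For reference, a standalone sketch of the SD/HD source collection above, run against a hypothetical iframe snippet; the attribute names (src, data-crt-options, hdUrl, duration) mirror the regexes in the extractor, while the URLs are placeholders:

# Minimal sketch of the SD/HD extraction; the iframe snippet is made up.
import json
import re

iframe = '''<video><source src="http://vtt.tumblr.com/tumblr_abc/480" type="video/mp4">
</video><div data-crt-options='{"hdUrl":"http://vtt.tumblr.com/tumblr_abc/720","duration":42}'></div>'''

sources = []
sd_url = re.search(r'<source[^>]+src=(["\'])(?P<url>.+?)\1', iframe).group('url')
sources.append((sd_url, 'sd'))

options = json.loads(re.search(
    r'data-crt-options=(["\'])(?P<options>.+?)\1', iframe).group('options'))
if options.get('hdUrl'):
    sources.append((options['hdUrl'], 'hd'))

# Later sources win on quality, matching the enumerate() ranking above
formats = [{'url': url, 'format_id': fid, 'quality': q}
           for q, (url, fid) in enumerate(sources)]
print(formats)
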
diff --git a/youtube_dl/extractor/tunein.py b/youtube_dl/extractor/tunein.py
index b6b1f2568f23a6ea9fe8e12c86deb6b30d44a809..ae4cfaec29b493c3b8b8e11705629901a07a2bf2 100644 (file)
@@ -1,77 +1,35 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
-import json
 import re
 
 from .common import InfoExtractor
 from ..utils import ExtractorError
+from ..compat import compat_urlparse
 
 
-class TuneInIE(InfoExtractor):
-    _VALID_URL = r'''(?x)https?://(?:www\.)?
-    (?:
-        tunein\.com/
-        (?:
-            radio/.*?-s|
-            station/.*?StationId\=
-        )(?P<id>[0-9]+)
-        |tun\.in/(?P<redirect_id>[A-Za-z0-9]+)
-    )
-    '''
-    _API_URL_TEMPLATE = 'http://tunein.com/tuner/tune/?stationId={0:}&tuneType=Station'
-
-    _INFO_DICT = {
-        'id': '34682',
-        'title': 'Jazz 24 on 88.5 Jazz24 - KPLU-HD2',
-        'ext': 'aac',
-        'thumbnail': 're:^https?://.*\.png$',
-        'location': 'Tacoma, WA',
-    }
-    _TESTS = [
-        {
-            'url': 'http://tunein.com/radio/Jazz24-885-s34682/',
-            'info_dict': _INFO_DICT,
-            'params': {
-                'skip_download': True,  # live stream
-            },
-        },
-        {  # test redirection
-            'url': 'http://tun.in/ser7s',
-            'info_dict': _INFO_DICT,
-            'params': {
-                'skip_download': True,  # live stream
-            },
-        },
-    ]
+class TuneInBaseIE(InfoExtractor):
+    _API_BASE_URL = 'http://tunein.com/tuner/tune/'
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        redirect_id = mobj.group('redirect_id')
-        if redirect_id:
-            # The server doesn't support HEAD requests
-            urlh = self._request_webpage(
-                url, redirect_id, note='Downloading redirect page')
-            url = urlh.geturl()
-            self.to_screen('Following redirect: %s' % url)
-            mobj = re.match(self._VALID_URL, url)
-        station_id = mobj.group('id')
-
-        station_info = self._download_json(
-            self._API_URL_TEMPLATE.format(station_id),
-            station_id, note='Downloading station JSON')
-
-        title = station_info['Title']
-        thumbnail = station_info.get('Logo')
-        location = station_info.get('Location')
-        streams_url = station_info.get('StreamUrl')
+        content_id = self._match_id(url)
+
+        content_info = self._download_json(
+            self._API_BASE_URL + self._API_URL_QUERY % content_id,
+            content_id, note='Downloading JSON metadata')
+
+        title = content_info['Title']
+        thumbnail = content_info.get('Logo')
+        location = content_info.get('Location')
+        streams_url = content_info.get('StreamUrl')
         if not streams_url:
-            raise ExtractorError('No downloadable streams found',
-                                 expected=True)
-        stream_data = self._download_webpage(
-            streams_url, station_id, note='Downloading stream data')
-        streams = json.loads(self._search_regex(
-            r'\((.*)\);', stream_data, 'stream info'))['Streams']
+            raise ExtractorError('No downloadable streams found', expected=True)
+        if not streams_url.startswith('http://'):
+            streams_url = compat_urlparse.urljoin(url, streams_url)
+
+        streams = self._download_json(
+            streams_url, content_id, note='Downloading stream data',
+            transform_source=lambda s: re.sub(r'^\s*\((.*)\);\s*$', r'\1', s))['Streams']
 
         is_live = None
         formats = []
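
The transform_source lambda above unwraps a JSONP-style payload before JSON parsing; a minimal sketch with an assumed response shape:

# Sketch of the transform_source used above: the stream endpoint is assumed
# to answer with a JSONP-style wrapper, e.g. ({"Streams": [...]});
import json
import re

raw = ' ({"Streams": [{"Url": "http://example.com/stream.mp3", "MediaType": "mp3"}]}); '
streams = json.loads(re.sub(r'^\s*\((.*)\);\s*$', r'\1', raw))['Streams']
print(streams[0]['Url'])
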
@@ -97,10 +55,122 @@ class TuneInIE(InfoExtractor):
         self._sort_formats(formats)
 
         return {
-            'id': station_id,
+            'id': content_id,
             'title': title,
             'formats': formats,
             'thumbnail': thumbnail,
             'location': location,
             'is_live': is_live,
         }
+
+
+class TuneInClipIE(TuneInBaseIE):
+    IE_NAME = 'tunein:clip'
+    _VALID_URL = r'https?://(?:www\.)?tunein\.com/station/.*?audioClipId\=(?P<id>\d+)'
+    _API_URL_QUERY = '?tuneType=AudioClip&audioclipId=%s'
+
+    _TESTS = [
+        {
+            'url': 'http://tunein.com/station/?stationId=246119&audioClipId=816',
+            'md5': '99f00d772db70efc804385c6b47f4e77',
+            'info_dict': {
+                'id': '816',
+                'title': '32m',
+                'ext': 'mp3',
+            },
+        },
+    ]
+
+
+class TuneInStationIE(TuneInBaseIE):
+    IE_NAME = 'tunein:station'
+    _VALID_URL = r'https?://(?:www\.)?tunein\.com/(?:radio/.*?-s|station/.*?StationId\=)(?P<id>\d+)'
+    _API_URL_QUERY = '?tuneType=Station&stationId=%s'
+
+    @classmethod
+    def suitable(cls, url):
+        return False if TuneInClipIE.suitable(url) else super(TuneInStationIE, cls).suitable(url)
+
+    _TESTS = [
+        {
+            'url': 'http://tunein.com/radio/Jazz24-885-s34682/',
+            'info_dict': {
+                'id': '34682',
+                'title': 'Jazz 24 on 88.5 Jazz24 - KPLU-HD2',
+                'ext': 'mp3',
+                'location': 'Tacoma, WA',
+            },
+            'params': {
+                'skip_download': True,  # live stream
+            },
+        },
+    ]
+
+
+class TuneInProgramIE(TuneInBaseIE):
+    IE_NAME = 'tunein:program'
+    _VALID_URL = r'https?://(?:www\.)?tunein\.com/(?:radio/.*?-p|program/.*?ProgramId\=)(?P<id>\d+)'
+    _API_URL_QUERY = '?tuneType=Program&programId=%s'
+
+    _TESTS = [
+        {
+            'url': 'http://tunein.com/radio/Jazz-24-p2506/',
+            'info_dict': {
+                'id': '2506',
+                'title': 'Jazz 24 on 91.3 WUKY-HD3',
+                'ext': 'mp3',
+                'location': 'Lexington, KY',
+            },
+            'params': {
+                'skip_download': True,  # live stream
+            },
+        },
+    ]
+
+
+class TuneInTopicIE(TuneInBaseIE):
+    IE_NAME = 'tunein:topic'
+    _VALID_URL = r'https?://(?:www\.)?tunein\.com/topic/.*?TopicId\=(?P<id>\d+)'
+    _API_URL_QUERY = '?tuneType=Topic&topicId=%s'
+
+    _TESTS = [
+        {
+            'url': 'http://tunein.com/topic/?TopicId=101830576',
+            'md5': 'c31a39e6f988d188252eae7af0ef09c9',
+            'info_dict': {
+                'id': '101830576',
+                'title': 'Votez pour moi du 29 octobre 2015 (29/10/15)',
+                'ext': 'mp3',
+                'location': 'Belgium',
+            },
+        },
+    ]
+
+
+class TuneInShortenerIE(InfoExtractor):
+    IE_NAME = 'tunein:shortener'
+    IE_DESC = False  # Do not list
+    _VALID_URL = r'https?://tun\.in/(?P<id>[A-Za-z0-9]+)'
+
+    _TEST = {
+        # test redirection
+        'url': 'http://tun.in/ser7s',
+        'info_dict': {
+            'id': '34682',
+            'title': 'Jazz 24 on 88.5 Jazz24 - KPLU-HD2',
+            'ext': 'mp3',
+            'location': 'Tacoma, WA',
+        },
+        'params': {
+            'skip_download': True,  # live stream
+        },
+    }
+
+    def _real_extract(self, url):
+        redirect_id = self._match_id(url)
+        # The server doesn't support HEAD requests
+        urlh = self._request_webpage(
+            url, redirect_id, note='Downloading redirect page')
+        url = urlh.geturl()
+        self.to_screen('Following redirect: %s' % url)
+        return self.url_result(url)
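
The suitable() override in TuneInStationIE above exists because a station URL can also carry an audioClipId, in which case both patterns match and the clip extractor should win; a sketch with a hypothetical URL combining both parameters:

# Both _VALID_URLs match this hypothetical URL, hence the suitable() override.
import re

clip_re = r'https?://(?:www\.)?tunein\.com/station/.*?audioClipId\=(?P<id>\d+)'
station_re = r'https?://(?:www\.)?tunein\.com/(?:radio/.*?-s|station/.*?StationId\=)(?P<id>\d+)'

url = 'http://tunein.com/station/?StationId=246119&audioClipId=816'
print(bool(re.match(clip_re, url)))     # True -> the clip extractor handles it
print(bool(re.match(station_re, url)))  # also True without the override
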
diff --git a/youtube_dl/extractor/tutv.py b/youtube_dl/extractor/tutv.py
index fad720b681e125ac495b26ba3870dbd65340e3ce..822372ea14e52c08d0e95f81a4618c122b2232d8 100644 (file)
@@ -10,10 +10,10 @@ class TutvIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?tu\.tv/videos/(?P<id>[^/?]+)'
     _TEST = {
         'url': 'http://tu.tv/videos/robots-futbolistas',
-        'md5': '627c7c124ac2a9b5ab6addb94e0e65f7',
+        'md5': '0cd9e28ad270488911b0d2a72323395d',
         'info_dict': {
             'id': '2973058',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Robots futbolistas',
         },
     }
diff --git a/youtube_dl/extractor/tv2.py b/youtube_dl/extractor/tv2.py
index fa338b936de7d3fef15cf24bccc05255bc928ee6..86bb7915db170ecf4c75fdc2b960160a65daa1c0 100644 (file)
@@ -14,21 +14,24 @@ from ..utils import (
 
 
 class TV2IE(InfoExtractor):
-    _VALID_URL = 'http://(?:www\.)?tv2\.no/v/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?tv2\.no/v/(?P<id>\d+)'
     _TEST = {
         'url': 'http://www.tv2.no/v/916509/',
-        'md5': '9cb9e3410b18b515d71892f27856e9b1',
         'info_dict': {
             'id': '916509',
-            'ext': 'flv',
-            'title': 'Se Gryttens hyllest av Steven Gerrard',
+            'ext': 'mp4',
+            'title': 'Se Frode Gryttens hyllest av Steven Gerrard',
             'description': 'TV 2 Sportens huspoet tar avskjed med Liverpools kaptein Steven Gerrard.',
             'timestamp': 1431715610,
             'upload_date': '20150515',
             'duration': 156.967,
             'view_count': int,
             'categories': list,
-        }
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
     }
 
     def _real_extract(self, url):
@@ -97,7 +100,7 @@ class TV2IE(InfoExtractor):
 
 
 class TV2ArticleIE(InfoExtractor):
-    _VALID_URL = 'http://(?:www\.)?tv2\.no/(?:a|\d{4}/\d{2}/\d{2}(/[^/]+)+)/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?tv2\.no/(?:a|\d{4}/\d{2}/\d{2}(/[^/]+)+)/(?P<id>\d+)'
     _TESTS = [{
         'url': 'http://www.tv2.no/2015/05/16/nyheter/alesund/krim/pingvin/6930542',
         'info_dict': {
diff --git a/youtube_dl/extractor/tv3.py b/youtube_dl/extractor/tv3.py
new file mode 100644 (file)
index 0000000..3867ec9
--- /dev/null
@@ -0,0 +1,34 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class TV3IE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?tv3\.co\.nz/(?P<id>[^/]+)/tabid/\d+/articleID/\d+/MCat/\d+/Default\.aspx'
+    _TEST = {
+        'url': 'http://www.tv3.co.nz/MOTORSPORT-SRS-SsangYong-Hampton-Downs-Round-3/tabid/3692/articleID/121615/MCat/2915/Default.aspx',
+        'info_dict': {
+            'id': '4659127992001',
+            'ext': 'mp4',
+            'title': 'CRC Motorsport: SRS SsangYong Hampton Downs Round 3 - S2015 Ep3',
+            'description': 'SsangYong Racing Series returns for Round 3 with drivers from New Zealand and Australia taking to the grid at Hampton Downs raceway.',
+            'uploader_id': '3812193411001',
+            'upload_date': '20151213',
+            'timestamp': 1449975272,
+        },
+        'expected_warnings': [
+            'Failed to download MPD manifest'
+        ],
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/3812193411001/default_default/index.html?videoId=%s'
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        brightcove_id = self._search_regex(r'<param\s*name="@videoPlayer"\s*value="(\d+)"', webpage, 'brightcove id')
+        return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
diff --git a/youtube_dl/extractor/tv4.py b/youtube_dl/extractor/tv4.py
index 1c4b6d6353292dddbf3d67a42a24667ec3bb81d7..343edf20663d172a4e071a77f228fc4d9962003d 100644 (file)
@@ -67,7 +67,7 @@ class TV4IE(InfoExtractor):
         info = self._download_json(
             'http://www.tv4play.se/player/assets/%s.json' % video_id, video_id, 'Downloading video info JSON')
 
-        # If is_geo_restricted is true, it doesn't neceserally mean we can't download it
+        # If is_geo_restricted is true, it doesn't necessarily mean we can't download it
         if info['is_geo_restricted']:
             self.report_warning('This content might not be available in your country due to licensing restrictions.')
         if info['requires_subscription']:
diff --git a/youtube_dl/extractor/tvc.py b/youtube_dl/extractor/tvc.py
index 3a4f393fcf6d79f3f42970db7aab853d5efedf84..4065354ddde2c63698908dfac81dc98cac77e79d 100644 (file)
@@ -11,7 +11,7 @@ from ..utils import (
 
 
 class TVCIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?tvc\.ru/video/iframe/id/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?tvc\.ru/video/iframe/id/(?P<id>\d+)'
     _TEST = {
         'url': 'http://www.tvc.ru/video/iframe/id/74622/isPlay/false/id_stat/channel/?acc_video_id=/channel/brand/id/17/show/episodes/episode_id/39702',
         'md5': 'bbc5ff531d1e90e856f60fc4b3afd708',
@@ -64,7 +64,7 @@ class TVCIE(InfoExtractor):
 
 
 class TVCArticleIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?tvc\.ru/(?!video/iframe/id/)(?P<id>[^?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?tvc\.ru/(?!video/iframe/id/)(?P<id>[^?#]+)'
     _TESTS = [{
         'url': 'http://www.tvc.ru/channel/brand/id/29/show/episodes/episode_id/39702/',
         'info_dict': {
diff --git a/youtube_dl/extractor/tvigle.py b/youtube_dl/extractor/tvigle.py
index dc3a8334a6b335143dff417d805a26df412d8783..ead4c00c79bb453585b4ba18c67f7535bcc69254 100644 (file)
@@ -58,7 +58,9 @@ class TvigleIE(InfoExtractor):
         if not video_id:
             webpage = self._download_webpage(url, display_id)
             video_id = self._html_search_regex(
-                r'class="video-preview current_playing" id="(\d+)">',
+                (r'<div[^>]+class=["\']player["\'][^>]+id=["\'](\d+)',
+                 r'var\s+cloudId\s*=\s*["\'](\d+)',
+                 r'class="video-preview current_playing" id="(\d+)"'),
                 webpage, 'video id')
 
         video_data = self._download_json(
@@ -81,10 +83,10 @@ class TvigleIE(InfoExtractor):
 
         formats = []
         for vcodec, fmts in item['videos'].items():
+            if vcodec == 'hls':
+                continue
             for format_id, video_url in fmts.items():
                 if format_id == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        video_url, video_id, 'mp4', m3u8_id=vcodec))
                     continue
                 height = self._search_regex(
                     r'^(\d+)[pP]$', format_id, 'height', default=None)
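
The id extraction above relies on youtube-dl's _search_regex helpers accepting a tuple of patterns that are tried in order; a standalone sketch of that fallback behavior:

# Standalone sketch of the ordered-fallback matching used above.
import re

def search_first(patterns, haystack):
    # Return the first group of the first pattern that matches
    for pattern in patterns:
        mobj = re.search(pattern, haystack)
        if mobj:
            return mobj.group(1)
    return None

webpage = '<script>var cloudId = "1052";</script>'
video_id = search_first(
    (r'<div[^>]+class=["\']player["\'][^>]+id=["\'](\d+)',
     r'var\s+cloudId\s*=\s*["\'](\d+)',
     r'class="video-preview current_playing" id="(\d+)"'),
    webpage)
print(video_id)  # 1052
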
diff --git a/youtube_dl/extractor/tvland.py b/youtube_dl/extractor/tvland.py
new file mode 100644 (file)
index 0000000..b73279d
--- /dev/null
@@ -0,0 +1,64 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .mtv import MTVServicesInfoExtractor
+
+
+class TVLandIE(MTVServicesInfoExtractor):
+    IE_NAME = 'tvland.com'
+    _VALID_URL = r'https?://(?:www\.)?tvland\.com/(?:video-clips|episodes)/(?P<id>[^/?#.]+)'
+    _FEED_URL = 'http://www.tvland.com/feeds/mrss/'
+    _TESTS = [{
+        'url': 'http://www.tvland.com/episodes/hqhps2/everybody-loves-raymond-the-invasion-ep-048',
+        'playlist': [
+            {
+                'md5': '227e9723b9669c05bf51098b10287aa7',
+                'info_dict': {
+                    'id': 'bcbd3a83-3aca-4dca-809b-f78a87dcccdd',
+                    'ext': 'mp4',
+                    'title': 'Everybody Loves Raymond|Everybody Loves Raymond 048 HD, Part 1 of 5',
+                }
+            },
+            {
+                'md5': '9fa2b764ec0e8194fb3ebb01a83df88b',
+                'info_dict': {
+                    'id': 'f4279548-6e13-40dd-92e8-860d27289197',
+                    'ext': 'mp4',
+                    'title': 'Everybody Loves Raymond|Everybody Loves Raymond 048 HD, Part 2 of 5',
+                }
+            },
+            {
+                'md5': 'fde4c3bccd7cc7e3576b338734153cec',
+                'info_dict': {
+                    'id': '664e4a38-53ef-4115-9bc9-d0f789ec6334',
+                    'ext': 'mp4',
+                    'title': 'Everybody Loves Raymond|Everybody Loves Raymond 048 HD, Part 3 of 5',
+                }
+            },
+            {
+                'md5': '247f6780cda6891f2e49b8ae2b10e017',
+                'info_dict': {
+                    'id': '9146ecf5-b15a-4d78-879c-6679b77f4960',
+                    'ext': 'mp4',
+                    'title': 'Everybody Loves Raymond|Everybody Loves Raymond 048 HD, Part 4 of 5',
+                }
+            },
+            {
+                'md5': 'fd269f33256e47bad5eb6c40de089ff6',
+                'info_dict': {
+                    'id': '04334a2e-9a47-4214-a8c2-ae5792e2fab7',
+                    'ext': 'mp4',
+                    'title': 'Everybody Loves Raymond|Everybody Loves Raymond 048 HD, Part 5 of 5',
+                }
+            }
+        ],
+    }, {
+        'url': 'http://www.tvland.com/video-clips/zea2ev/younger-younger--hilary-duff---little-lies',
+        'md5': 'e2c6389401cf485df26c79c247b08713',
+        'info_dict': {
+            'id': 'b8697515-4bbe-4e01-83d5-fa705ce5fa88',
+            'ext': 'mp4',
+            'title': 'Younger|Younger: Hilary Duff - Little Lies',
+            'description': 'md5:7d192f56ca8d958645c83f0de8ef0269'
+        },
+    }]
diff --git a/youtube_dl/extractor/tvplay.py b/youtube_dl/extractor/tvplay.py
index 79863e781fd41101c76659ab3b43a85433d25665..df70a6b230a4217261f3f69a3e1213a88f07afbf 100644 (file)
@@ -13,7 +13,7 @@ from ..utils import (
 
 class TVPlayIE(InfoExtractor):
     IE_DESC = 'TV3Play and related services'
-    _VALID_URL = r'''(?x)http://(?:www\.)?
+    _VALID_URL = r'''(?x)https?://(?:www\.)?
         (?:tvplay\.lv/parraides|
            tv3play\.lt/programos|
            play\.tv3\.lt/programos|
@@ -104,6 +104,7 @@ class TVPlayIE(InfoExtractor):
                 'duration': 1492,
                 'timestamp': 1330522854,
                 'upload_date': '20120229',
+                'age_limit': 18,
             },
             'params': {
                 # rtmp download
diff --git a/youtube_dl/extractor/tweakers.py b/youtube_dl/extractor/tweakers.py
index c80ec15cf1170d2e307dfc851cd5a91c4126afc2..f3198fb85adb29b8081b9735899dd574cb504c67 100644 (file)
@@ -1,19 +1,13 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..utils import (
-    xpath_text,
-    xpath_with_ns,
-    int_or_none,
-    float_or_none,
-)
 
 
 class TweakersIE(InfoExtractor):
     _VALID_URL = r'https?://tweakers\.net/video/(?P<id>\d+)'
     _TEST = {
         'url': 'https://tweakers.net/video/9926/new-nintendo-3ds-xl-op-alle-fronten-beter.html',
-        'md5': '1b5afa817403bb5baa08359dca31e6df',
+        'md5': '3147e4ddad366f97476a93863e4557c8',
         'info_dict': {
             'id': '9926',
             'ext': 'mp4',
@@ -25,41 +19,7 @@ class TweakersIE(InfoExtractor):
     }
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        playlist = self._download_xml(
-            'https://tweakers.net/video/s1playlist/%s/playlist.xspf' % video_id,
-            video_id)
-
-        NS_MAP = {
-            'xspf': 'http://xspf.org/ns/0/',
-            's1': 'http://static.streamone.nl/player/ns/0',
-        }
-
-        track = playlist.find(xpath_with_ns('./xspf:trackList/xspf:track', NS_MAP))
-
-        title = xpath_text(
-            track, xpath_with_ns('./xspf:title', NS_MAP), 'title')
-        description = xpath_text(
-            track, xpath_with_ns('./xspf:annotation', NS_MAP), 'description')
-        thumbnail = xpath_text(
-            track, xpath_with_ns('./xspf:image', NS_MAP), 'thumbnail')
-        duration = float_or_none(
-            xpath_text(track, xpath_with_ns('./xspf:duration', NS_MAP), 'duration'),
-            1000)
-
-        formats = [{
-            'url': location.text,
-            'format_id': location.get(xpath_with_ns('s1:label', NS_MAP)),
-            'width': int_or_none(location.get(xpath_with_ns('s1:width', NS_MAP))),
-            'height': int_or_none(location.get(xpath_with_ns('s1:height', NS_MAP))),
-        } for location in track.findall(xpath_with_ns('./xspf:location', NS_MAP))]
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'formats': formats,
-        }
+        playlist_id = self._match_id(url)
+        entries = self._extract_xspf_playlist(
+            'https://tweakers.net/video/s1playlist/%s/playlist.xspf' % playlist_id, playlist_id)
+        return self.playlist_result(entries, playlist_id)
diff --git a/youtube_dl/extractor/twentyfourvideo.py b/youtube_dl/extractor/twentyfourvideo.py
index c1ee1decc433627ffa52196d44f7563b46d309cc..e03e2dbaa42f23a5107a50c67e7c12d9f378600b 100644 (file)
@@ -5,6 +5,8 @@ from .common import InfoExtractor
 from ..utils import (
     parse_iso8601,
     int_or_none,
+    xpath_attr,
+    xpath_element,
 )
 
 
@@ -15,7 +17,7 @@ class TwentyFourVideoIE(InfoExtractor):
     _TESTS = [
         {
             'url': 'http://www.24video.net/video/view/1044982',
-            'md5': 'd041af8b5b4246ea466226a0d6693345',
+            'md5': 'e09fc0901d9eaeedac872f154931deeb',
             'info_dict': {
                 'id': '1044982',
                 'ext': 'mp4',
@@ -64,33 +66,24 @@ class TwentyFourVideoIE(InfoExtractor):
             r'<div class="comments-title" id="comments-count">(\d+) комментари',
             webpage, 'comment count', fatal=False))
 
-        formats = []
+        # The init request sets cookies that the play request below relies on
+        self._download_xml(
+            'http://www.24video.net/video/xml/%s?mode=init' % video_id,
+            video_id, 'Downloading init XML')
 
-        pc_video = self._download_xml(
+        video_xml = self._download_xml(
             'http://www.24video.net/video/xml/%s?mode=play' % video_id,
-            video_id, 'Downloading PC video URL').find('.//video')
+            video_id, 'Downloading video XML')
 
-        formats.append({
-            'url': pc_video.attrib['url'],
-            'format_id': 'pc',
-            'quality': 1,
-        })
+        video = xpath_element(video_xml, './/video', 'video', fatal=True)
 
-        like_count = int_or_none(pc_video.get('ratingPlus'))
-        dislike_count = int_or_none(pc_video.get('ratingMinus'))
-        age_limit = 18 if pc_video.get('adult') == 'true' else 0
+        formats = [{
+            'url': xpath_attr(video, '', 'url', 'video URL', fatal=True),
+        }]
 
-        mobile_video = self._download_xml(
-            'http://www.24video.net/video/xml/%s' % video_id,
-            video_id, 'Downloading mobile video URL').find('.//video')
-
-        formats.append({
-            'url': mobile_video.attrib['url'],
-            'format_id': 'mobile',
-            'quality': 0,
-        })
-
-        self._sort_formats(formats)
+        like_count = int_or_none(video.get('ratingPlus'))
+        dislike_count = int_or_none(video.get('ratingMinus'))
+        age_limit = 18 if video.get('adult') == 'true' else 0
 
         return {
             'id': video_id,
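
The two-step fetch above works because youtube-dl's shared cookiejar replays the cookies set by the init request on the play request; a standalone sketch of the same pattern using only the standard library (URLs as in the extractor, response handling simplified):

# Standalone sketch of the two-step fetch: the first request is made only
# for its Set-Cookie headers, which a shared cookie jar then replays on the
# second request (youtube-dl gets this for free from its cookiejar).
try:
    from urllib.request import build_opener, HTTPCookieProcessor  # Python 3
    from http.cookiejar import CookieJar
except ImportError:
    from urllib2 import build_opener, HTTPCookieProcessor  # Python 2
    from cookielib import CookieJar

video_id = '1044982'
jar = CookieJar()
opener = build_opener(HTTPCookieProcessor(jar))

# First request: response body is ignored, cookies are kept in the jar
opener.open('http://www.24video.net/video/xml/%s?mode=init' % video_id).read()

# Second request: sent with the cookies acquired above
xml_data = opener.open('http://www.24video.net/video/xml/%s?mode=play' % video_id).read()
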
diff --git a/youtube_dl/extractor/twentymin.py b/youtube_dl/extractor/twentymin.py
new file mode 100644 (file)
index 0000000..ca7d953
--- /dev/null
@@ -0,0 +1,73 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import remove_end
+
+
+class TwentyMinutenIE(InfoExtractor):
+    IE_NAME = '20min'
+    _VALID_URL = r'https?://(?:www\.)?20min\.ch/(?:videotv/*\?.*\bvid=(?P<id>\d+)|(?:[^/]+/)*(?P<display_id>[^/#?]+))'
+    _TESTS = [{
+        # regular video
+        'url': 'http://www.20min.ch/videotv/?vid=469148&cid=2',
+        'md5': 'b52d6bc6ea6398e6a38f12cfd418149c',
+        'info_dict': {
+            'id': '469148',
+            'ext': 'flv',
+            'title': '85 000 Franken für 15 perfekte Minuten',
+            'description': 'Was die Besucher vom Silvesterzauber erwarten können. (Video: Alice Grosjean/Murat Temel)',
+            'thumbnail': 'http://thumbnails.20min-tv.ch/server063/469148/frame-72-469148.jpg'
+        }
+    }, {
+        # news article with video
+        'url': 'http://www.20min.ch/schweiz/news/story/-Wir-muessen-mutig-nach-vorne-schauen--22050469',
+        'md5': 'cd4cbb99b94130cff423e967cd275e5e',
+        'info_dict': {
+            'id': '469408',
+            'display_id': '-Wir-muessen-mutig-nach-vorne-schauen--22050469',
+            'ext': 'flv',
+            'title': '«Wir müssen mutig nach vorne schauen»',
+            'description': 'Kein Land sei innovativer als die Schweiz, sagte Johann Schneider-Ammann in seiner Neujahrsansprache. Das Land müsse aber seine Hausaufgaben machen.',
+            'thumbnail': 'http://www.20min.ch/images/content/2/2/0/22050469/10/teaserbreit.jpg'
+        }
+    }, {
+        'url': 'http://www.20min.ch/videotv/?cid=44&vid=468738',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.20min.ch/ro/sortir/cinema/story/Grandir-au-bahut--c-est-dur-18927411',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        display_id = mobj.group('display_id') or video_id
+
+        webpage = self._download_webpage(url, display_id)
+
+        title = self._html_search_regex(
+            r'<h1>.*?<span>(.+?)</span></h1>',
+            webpage, 'title', default=None)
+        if not title:
+            title = remove_end(re.sub(
+                r'^20 [Mm]inuten.*? -', '', self._og_search_title(webpage)), ' - News')
+
+        if not video_id:
+            video_id = self._search_regex(
+                r'"file\d?"\s*,\s*\"(\d+)', webpage, 'video id')
+
+        description = self._html_search_meta(
+            'description', webpage, 'description')
+        thumbnail = self._og_search_thumbnail(webpage)
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'url': 'http://speed.20min-tv.ch/%sm.flv' % video_id,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+        }
diff --git a/youtube_dl/extractor/twitch.py b/youtube_dl/extractor/twitch.py
index 73ce335b7f0a5b5790f8dd65e3ac170e9791b7d8..36ee1adff2288570fc39936640bacd3abafe9ed2 100644 (file)
@@ -7,13 +7,20 @@ import random
 
 from .common import InfoExtractor
 from ..compat import (
+    compat_parse_qs,
     compat_str,
-    compat_urllib_parse,
-    compat_urllib_request,
+    compat_urllib_parse_urlencode,
+    compat_urllib_parse_urlparse,
+    compat_urlparse,
 )
 from ..utils import (
     ExtractorError,
+    int_or_none,
+    orderedSet,
+    parse_duration,
     parse_iso8601,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
@@ -22,8 +29,7 @@ class TwitchBaseIE(InfoExtractor):
 
     _API_BASE = 'https://api.twitch.tv'
     _USHER_BASE = 'http://usher.twitch.tv'
-    _LOGIN_URL = 'https://secure.twitch.tv/login'
-    _LOGIN_POST_URL = 'https://passport.twitch.tv/authorize'
+    _LOGIN_URL = 'http://www.twitch.tv/login'
     _NETRC_MACHINE = 'twitch'
 
     def _handle_error(self, response):
@@ -43,7 +49,7 @@ class TwitchBaseIE(InfoExtractor):
         for cookie in self._downloader.cookiejar:
             if cookie.name == 'api_token':
                 headers['Twitch-Api-Token'] = cookie.value
-        request = compat_urllib_request.Request(url, headers=headers)
+        request = sanitized_Request(url, headers=headers)
         response = super(TwitchBaseIE, self)._download_json(request, video_id, note)
         self._handle_error(response)
         return response
@@ -56,19 +62,28 @@ class TwitchBaseIE(InfoExtractor):
         if username is None:
             return
 
-        login_page = self._download_webpage(
+        login_page, handle = self._download_webpage_handle(
             self._LOGIN_URL, None, 'Downloading login page')
 
         login_form = self._hidden_inputs(login_page)
 
         login_form.update({
-            'login': username.encode('utf-8'),
-            'password': password.encode('utf-8'),
+            'username': username,
+            'password': password,
         })
 
-        request = compat_urllib_request.Request(
-            self._LOGIN_POST_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
-        request.add_header('Referer', self._LOGIN_URL)
+        redirect_url = handle.geturl()
+
+        post_url = self._search_regex(
+            r'<form[^>]+action=(["\'])(?P<url>.+?)\1', login_page,
+            'post url', default=redirect_url, group='url')
+
+        if not post_url.startswith('http'):
+            post_url = compat_urlparse.urljoin(redirect_url, post_url)
+
+        request = sanitized_Request(
+            post_url, urlencode_postdata(login_form))
+        request.add_header('Referer', redirect_url)
         response = self._download_webpage(
             request, None, 'Logging in as %s' % username)
 
@@ -129,14 +144,14 @@ class TwitchItemBaseIE(TwitchBaseIE):
     def _extract_info(self, info):
         return {
             'id': info['_id'],
-            'title': info['title'],
-            'description': info['description'],
-            'duration': info['length'],
-            'thumbnail': info['preview'],
-            'uploader': info['channel']['display_name'],
-            'uploader_id': info['channel']['name'],
-            'timestamp': parse_iso8601(info['recorded_at']),
-            'view_count': info['views'],
+            'title': info.get('title') or 'Untitled Broadcast',
+            'description': info.get('description'),
+            'duration': int_or_none(info.get('length')),
+            'thumbnail': info.get('preview'),
+            'uploader': info.get('channel', {}).get('display_name'),
+            'uploader_id': info.get('channel', {}).get('name'),
+            'timestamp': parse_iso8601(info.get('recorded_at')),
+            'view_count': int_or_none(info.get('views')),
         }
 
     def _real_extract(self, url):
@@ -184,8 +199,8 @@ class TwitchVodIE(TwitchItemBaseIE):
     _ITEM_TYPE = 'vod'
     _ITEM_SHORTCUT = 'v'
 
-    _TEST = {
-        'url': 'http://www.twitch.tv/riotgames/v/6528877',
+    _TESTS = [{
+        'url': 'http://www.twitch.tv/riotgames/v/6528877?t=5m10s',
         'info_dict': {
             'id': 'v6528877',
             'ext': 'mp4',
@@ -197,25 +212,62 @@ class TwitchVodIE(TwitchItemBaseIE):
             'uploader': 'Riot Games',
             'uploader_id': 'riotgames',
             'view_count': int,
+            'start_time': 310,
         },
         'params': {
             # m3u8 download
             'skip_download': True,
         },
-    }
+    }, {
+        # Untitled broadcast (title is None)
+        'url': 'http://www.twitch.tv/belkao_o/v/11230755',
+        'info_dict': {
+            'id': 'v11230755',
+            'ext': 'mp4',
+            'title': 'Untitled Broadcast',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'duration': 1638,
+            'timestamp': 1439746708,
+            'upload_date': '20150816',
+            'uploader': 'BelkAO_o',
+            'uploader_id': 'belkao_o',
+            'view_count': int,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }]
 
     def _real_extract(self, url):
         item_id = self._match_id(url)
+
         info = self._download_info(self._ITEM_SHORTCUT, item_id)
         access_token = self._download_json(
             '%s/api/vods/%s/access_token' % (self._API_BASE, item_id), item_id,
             'Downloading %s access token' % self._ITEM_TYPE)
+
         formats = self._extract_m3u8_formats(
-            '%s/vod/%s?nauth=%s&nauthsig=%s&allow_source=true'
-            % (self._USHER_BASE, item_id, access_token['token'], access_token['sig']),
+            '%s/vod/%s?%s' % (
+                self._USHER_BASE, item_id,
+                compat_urllib_parse_urlencode({
+                    'allow_source': 'true',
+                    'allow_audio_only': 'true',
+                    'allow_spectre': 'true',
+                    'player': 'twitchweb',
+                    'nauth': access_token['token'],
+                    'nauthsig': access_token['sig'],
+                })),
             item_id, 'mp4')
+
         self._prefer_source(formats)
         info['formats'] = formats
+
+        parsed_url = compat_urllib_parse_urlparse(url)
+        query = compat_parse_qs(parsed_url.query)
+        if 't' in query:
+            info['start_time'] = parse_duration(query['t'][0])
+
         return info
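
A sketch of the start-time handling above: the t= query parameter of a VOD URL is converted from '5m10s'-style notation to seconds (the duration parser below is a simplified stand-in for youtube-dl's parse_duration):

import re

try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs  # Python 2

def duration_to_seconds(value):
    # Simplified stand-in for youtube-dl's parse_duration
    mobj = re.match(r'(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$', value)
    hours, minutes, seconds = (int(g or 0) for g in mobj.groups())
    return hours * 3600 + minutes * 60 + seconds

query = parse_qs(urlparse('http://www.twitch.tv/riotgames/v/6528877?t=5m10s').query)
if 't' in query:
    print(duration_to_seconds(query['t'][0]))  # 310
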
 
 
@@ -231,17 +283,37 @@ class TwitchPlaylistBaseIE(TwitchBaseIE):
         entries = []
         offset = 0
         limit = self._PAGE_LIMIT
+        broken_paging_detected = False
+        counter_override = None
         for counter in itertools.count(1):
             response = self._download_json(
                 self._PLAYLIST_URL % (channel_id, offset, limit),
-                channel_id, 'Downloading %s videos JSON page %d' % (self._PLAYLIST_TYPE, counter))
+                channel_id,
+                'Downloading %s videos JSON page %s'
+                % (self._PLAYLIST_TYPE, counter_override or counter))
             page_entries = self._extract_playlist_page(response)
             if not page_entries:
                 break
+            total = int_or_none(response.get('_total'))
+            # Since the beginning of March 2016 twitch's paging mechanism
+            # has been completely broken on the twitch side: it simply ignores
+            # the limit and returns the whole offset's worth of videos.
+            # Work around this by just requesting all videos at once.
+            # Upd: the pagination bug was fixed by twitch on 15.03.2016.
+            if not broken_paging_detected and total and len(page_entries) > limit:
+                self.report_warning(
+                    'Twitch pagination is broken on twitch side, requesting all videos at once',
+                    channel_id)
+                broken_paging_detected = True
+                offset = total
+                counter_override = '(all at once)'
+                continue
             entries.extend(page_entries)
+            if broken_paging_detected or total and len(page_entries) >= total:
+                break
             offset += limit
         return self.playlist_result(
-            [self.url_result(entry) for entry in set(entries)],
+            [self.url_result(entry) for entry in orderedSet(entries)],
             channel_id, channel_name)
 
     def _extract_playlist_page(self, response):
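
A sketch of the broken-paging workaround in _extract_playlist above; fetch_page(offset, limit) is a hypothetical stand-in for the paged JSON API call and returns (entries, total):

def collect_all(fetch_page, limit=100):
    entries, offset, broken = [], 0, False
    while True:
        page, total = fetch_page(offset, limit)
        if not page:
            break
        if not broken and total and len(page) > limit:
            # The broken server returns offset's worth of videos regardless
            # of limit, so ask for all of them in a single request
            broken = True
            offset = total
            continue
        entries.extend(page)
        if broken or (total and len(page) >= total):
            break
        offset += limit
    return entries
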
@@ -361,6 +433,7 @@ class TwitchStreamIE(TwitchBaseIE):
 
         query = {
             'allow_source': 'true',
+            'allow_audio_only': 'true',
             'p': random.randint(1000000, 10000000),
             'player': 'twitchweb',
             'segment_preference': '4',
@@ -369,7 +442,7 @@ class TwitchStreamIE(TwitchBaseIE):
         }
         formats = self._extract_m3u8_formats(
             '%s/api/channel/hls/%s.m3u8?%s'
-            % (self._USHER_BASE, channel_id, compat_urllib_parse.urlencode(query)),
+            % (self._USHER_BASE, channel_id, compat_urllib_parse_urlencode(query)),
             channel_id, 'mp4')
         self._prefer_source(formats)
 
diff --git a/youtube_dl/extractor/twitter.py b/youtube_dl/extractor/twitter.py
index 1aaa0630519fff52ed51f4964121552592ffbaa6..ea673054fdc7135a203cca8db00dc128344b0829 100644 (file)
+# coding: utf-8
 from __future__ import unicode_literals
 
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_urllib_request
 from ..utils import (
     float_or_none,
-    unescapeHTML,
+    xpath_text,
+    remove_end,
+    int_or_none,
+    ExtractorError,
 )
 
 
-class TwitterCardIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?twitter\.com/i/cards/tfw/v1/(?P<id>\d+)'
-    _TEST = {
-        'url': 'https://twitter.com/i/cards/tfw/v1/560070183650213889',
-        'md5': 'a74f50b310c83170319ba16de6955192',
-        'info_dict': {
-            'id': '560070183650213889',
-            'ext': 'mp4',
-            'title': 'TwitterCard',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'duration': 30.033,
+class TwitterBaseIE(InfoExtractor):
+    def _get_vmap_video_url(self, vmap_url, video_id):
+        vmap_data = self._download_xml(vmap_url, video_id)
+        return xpath_text(vmap_data, './/MediaFile').strip()
+
+
+class TwitterCardIE(TwitterBaseIE):
+    IE_NAME = 'twitter:card'
+    _VALID_URL = r'https?://(?:www\.)?twitter\.com/i/(?:cards/tfw/v1|videos/tweet)/(?P<id>\d+)'
+    _TESTS = [
+        {
+            'url': 'https://twitter.com/i/cards/tfw/v1/560070183650213889',
+            # MD5 checksums differ between download locations
+            'info_dict': {
+                'id': '560070183650213889',
+                'ext': 'mp4',
+                'title': 'Twitter Card',
+                'thumbnail': 're:^https?://.*\.jpg$',
+                'duration': 30.033,
+            }
         },
-    }
+        {
+            'url': 'https://twitter.com/i/cards/tfw/v1/623160978427936768',
+            'md5': '7ee2a553b63d1bccba97fbed97d9e1c8',
+            'info_dict': {
+                'id': '623160978427936768',
+                'ext': 'mp4',
+                'title': 'Twitter Card',
+                'thumbnail': 're:^https?://.*\.jpg',
+                'duration': 80.155,
+            },
+        },
+        {
+            'url': 'https://twitter.com/i/cards/tfw/v1/654001591733886977',
+            'md5': 'd4724ffe6d2437886d004fa5de1043b3',
+            'info_dict': {
+                'id': 'dq4Oj5quskI',
+                'ext': 'mp4',
+                'title': 'Ubuntu 11.10 Overview',
+                'description': 'Take a quick peek at what\'s new and improved in Ubuntu 11.10.\n\nOnce installed take a look at 10 Things to Do After Installing: http://www.omgubuntu.co.uk/2011/10/10-things-to-do-after-installing-ubuntu-11-10/',
+                'upload_date': '20111013',
+                'uploader': 'OMG! Ubuntu!',
+                'uploader_id': 'omgubuntu',
+            },
+            'add_ie': ['Youtube'],
+        },
+        {
+            'url': 'https://twitter.com/i/cards/tfw/v1/665289828897005568',
+            'md5': 'ab2745d0b0ce53319a534fccaa986439',
+            'info_dict': {
+                'id': 'iBb2x00UVlv',
+                'ext': 'mp4',
+                'upload_date': '20151113',
+                'uploader_id': '1189339351084113920',
+                'uploader': 'ArsenalTerje',
+                'title': 'Vine by ArsenalTerje',
+            },
+            'add_ie': ['Vine'],
+        }, {
+            'url': 'https://twitter.com/i/videos/tweet/705235433198714880',
+            'md5': '3846d0a07109b5ab622425449b59049d',
+            'info_dict': {
+                'id': '705235433198714880',
+                'ext': 'mp4',
+                'title': 'Twitter web player',
+                'thumbnail': 're:^https?://.*\.jpg',
+            },
+        },
+    ]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        # Different formats served for different User-Agents
-        USER_AGENTS = [
-            'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/20.0 (Chrome)',  # mp4
-            'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0',  # webm
-        ]
-
         config = None
         formats = []
-        for user_agent in USER_AGENTS:
-            request = compat_urllib_request.Request(url)
-            request.add_header('User-Agent', user_agent)
-            webpage = self._download_webpage(request, video_id)
+        duration = None
 
-            config = self._parse_json(
-                unescapeHTML(self._search_regex(
-                    r'data-player-config="([^"]+)"', webpage, 'data player config')),
-                video_id)
+        webpage = self._download_webpage(url, video_id)
 
-            video_url = config['playlist'][0]['source']
+        iframe_url = self._html_search_regex(
+            r'<iframe[^>]+src="((?:https?:)?//(?:www\.youtube\.com/embed/[^"]+|(?:www\.)?vine\.co/v/\w+/card))"',
+            webpage, 'video iframe', default=None)
+        if iframe_url:
+            return self.url_result(iframe_url)
 
-            f = {
-                'url': video_url,
-            }
+        config = self._parse_json(self._html_search_regex(
+            r'data-(?:player-)?config="([^"]+)"', webpage, 'data player config'),
+            video_id)
+
+        if config.get('source_type') == 'vine':
+            return self.url_result(config['player_url'], 'Vine')
 
+        def _search_dimensions_in_video_url(a_format, video_url):
             m = re.search(r'/(?P<width>\d+)x(?P<height>\d+)/', video_url)
             if m:
-                f.update({
+                a_format.update({
                     'width': int(m.group('width')),
                     'height': int(m.group('height')),
                 })
+
+        video_url = config.get('video_url') or config.get('playlist', [{}])[0].get('source')
+
+        if video_url:
+            f = {
+                'url': video_url,
+            }
+
+            _search_dimensions_in_video_url(f, video_url)
+
             formats.append(f)
+
+        vmap_url = config.get('vmapUrl') or config.get('vmap_url')
+        if vmap_url:
+            formats.append({
+                'url': self._get_vmap_video_url(vmap_url, video_id),
+            })
+
+        media_info = None
+
+        for entity in config.get('status', {}).get('entities', []):
+            if 'mediaInfo' in entity:
+                media_info = entity['mediaInfo']
+
+        if media_info:
+            for media_variant in media_info['variants']:
+                media_url = media_variant['url']
+                if media_url.endswith('.m3u8'):
+                    formats.extend(self._extract_m3u8_formats(media_url, video_id, ext='mp4', m3u8_id='hls'))
+                elif media_url.endswith('.mpd'):
+                    formats.extend(self._extract_mpd_formats(media_url, video_id, mpd_id='dash'))
+                else:
+                    vbr = int_or_none(media_variant.get('bitRate'), scale=1000)
+                    a_format = {
+                        'url': media_url,
+                        'format_id': 'http-%d' % vbr if vbr else 'http',
+                        'vbr': vbr,
+                    }
+                    # Reported bitRate may be zero
+                    if not a_format['vbr']:
+                        del a_format['vbr']
+
+                    _search_dimensions_in_video_url(a_format, media_url)
+
+                    formats.append(a_format)
+
+            duration = float_or_none(media_info.get('duration', {}).get('nanos'), scale=1e9)
+
         self._sort_formats(formats)
 
-        thumbnail = config.get('posterImageUrl')
-        duration = float_or_none(config.get('duration'))
+        title = self._search_regex(r'<title>([^<]+)</title>', webpage, 'title')
+        thumbnail = config.get('posterImageUrl') or config.get('image_src')
+        duration = float_or_none(config.get('duration')) or duration
 
         return {
             'id': video_id,
-            'title': 'TwitterCard',
+            'title': title,
             'thumbnail': thumbnail,
             'duration': duration,
             'formats': formats,
         }
+
+
+class TwitterIE(InfoExtractor):
+    IE_NAME = 'twitter'
+    _VALID_URL = r'https?://(?:www\.|m\.|mobile\.)?twitter\.com/(?P<user_id>[^/]+)/status/(?P<id>\d+)'
+    _TEMPLATE_URL = 'https://twitter.com/%s/status/%s'
+
+    _TESTS = [{
+        'url': 'https://twitter.com/freethenipple/status/643211948184596480',
+        'info_dict': {
+            'id': '643211948184596480',
+            'ext': 'mp4',
+            'title': 'FREE THE NIPPLE - FTN supporters on Hollywood Blvd today!',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'description': 'FREE THE NIPPLE on Twitter: "FTN supporters on Hollywood Blvd today! http://t.co/c7jHH749xJ"',
+            'uploader': 'FREE THE NIPPLE',
+            'uploader_id': 'freethenipple',
+        },
+        'params': {
+            'skip_download': True,  # requires ffmpeg
+        },
+    }, {
+        'url': 'https://twitter.com/giphz/status/657991469417025536/photo/1',
+        'md5': 'f36dcd5fb92bf7057f155e7d927eeb42',
+        'info_dict': {
+            'id': '657991469417025536',
+            'ext': 'mp4',
+            'title': 'Gifs - tu vai cai tu vai cai tu nao eh capaz disso tu vai cai',
+            'description': 'Gifs on Twitter: "tu vai cai tu vai cai tu nao eh capaz disso tu vai cai https://t.co/tM46VHFlO5"',
+            'thumbnail': 're:^https?://.*\.png',
+            'uploader': 'Gifs',
+            'uploader_id': 'giphz',
+        },
+        'expected_warnings': ['height', 'width'],
+    }, {
+        'url': 'https://twitter.com/starwars/status/665052190608723968',
+        'md5': '39b7199856dee6cd4432e72c74bc69d4',
+        'info_dict': {
+            'id': '665052190608723968',
+            'ext': 'mp4',
+            'title': 'Star Wars - A new beginning is coming December 18. Watch the official 60 second #TV spot for #StarWars: #TheForceAwakens.',
+            'description': 'Star Wars on Twitter: "A new beginning is coming December 18. Watch the official 60 second #TV spot for #StarWars: #TheForceAwakens."',
+            'uploader_id': 'starwars',
+            'uploader': 'Star Wars',
+        },
+    }, {
+        'url': 'https://twitter.com/BTNBrentYarina/status/705235433198714880',
+        'info_dict': {
+            'id': '705235433198714880',
+            'ext': 'mp4',
+            'title': 'Brent Yarina - Khalil Iverson\'s missed highlight dunk. And made highlight dunk. In one highlight.',
+            'description': 'Brent Yarina on Twitter: "Khalil Iverson\'s missed highlight dunk. And made highlight dunk. In one highlight."',
+            'uploader_id': 'BTNBrentYarina',
+            'uploader': 'Brent Yarina',
+        },
+        'params': {
+            # The same video as https://twitter.com/i/videos/tweet/705235433198714880
+            # Test case of TwitterCardIE
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://twitter.com/jaydingeer/status/700207533655363584',
+        'md5': '',
+        'info_dict': {
+            'id': '700207533655363584',
+            'ext': 'mp4',
+            'title': 'jay - BEAT PROD: @suhmeduh #Damndaniel',
+            'description': 'jay on Twitter: "BEAT PROD: @suhmeduh  https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'uploader': 'jay',
+            'uploader_id': 'jaydingeer',
+        },
+        'params': {
+            'skip_download': True,  # requires ffmpeg
+        },
+    }, {
+        'url': 'https://twitter.com/Filmdrunk/status/713801302971588609',
+        'md5': '89a15ed345d13b86e9a5a5e051fa308a',
+        'info_dict': {
+            'id': 'MIOxnrUteUd',
+            'ext': 'mp4',
+            'title': 'Dr.Pepperの飲み方 #japanese #バカ #ドクペ #電動ガン',
+            'uploader': 'TAKUMA',
+            'uploader_id': '1004126642786242560',
+            'upload_date': '20140615',
+        },
+        'add_ie': ['Vine'],
+    }, {
+        'url': 'https://twitter.com/captainamerica/status/719944021058060289',
+        # md5 constantly changes
+        'info_dict': {
+            'id': '719944021058060289',
+            'ext': 'mp4',
+            'title': 'Captain America - @King0fNerd Are you sure you made the right choice? Find out in theaters.',
+            'description': 'Captain America on Twitter: "@King0fNerd Are you sure you made the right choice? Find out in theaters. https://t.co/GpgYi9xMJI"',
+            'uploader_id': 'captainamerica',
+            'uploader': 'Captain America',
+        },
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        user_id = mobj.group('user_id')
+        twid = mobj.group('id')
+
+        webpage = self._download_webpage(self._TEMPLATE_URL % (user_id, twid), twid)
+
+        username = remove_end(self._og_search_title(webpage), ' on Twitter')
+
+        title = description = self._og_search_description(webpage).strip().replace('\n', ' ').strip('“”')
+
+        # strip 'https -_t.co_BJYgOjSeGA'-style t.co link junk from the title
+        title = re.sub(r'\s+(https?://[^ ]+)', '', title)
+
+        info = {
+            'uploader_id': user_id,
+            'uploader': username,
+            'webpage_url': url,
+            'description': '%s on Twitter: "%s"' % (username, description),
+            'title': username + ' - ' + title,
+        }
+
+        mobj = re.search(r'''(?x)
+            <video[^>]+class="animated-gif"(?P<more_info>[^>]+)>\s*
+                <source[^>]+video-src="(?P<url>[^"]+)"
+        ''', webpage)
+
+        if mobj:
+            more_info = mobj.group('more_info')
+            height = int_or_none(self._search_regex(
+                r'data-height="(\d+)"', more_info, 'height', fatal=False))
+            width = int_or_none(self._search_regex(
+                r'data-width="(\d+)"', more_info, 'width', fatal=False))
+            thumbnail = self._search_regex(
+                r'poster="([^"]+)"', more_info, 'poster', fatal=False)
+            info.update({
+                'id': twid,
+                'url': mobj.group('url'),
+                'height': height,
+                'width': width,
+                'thumbnail': thumbnail,
+            })
+            return info
+
+        if 'class="PlayableMedia' in webpage:
+            info.update({
+                '_type': 'url_transparent',
+                'ie_key': 'TwitterCard',
+                'url': '%s//twitter.com/i/videos/tweet/%s' % (self.http_scheme(), twid),
+            })
+
+            return info
+
+        raise ExtractorError('There\'s no video in this tweet.')
+
+
+class TwitterAmplifyIE(TwitterBaseIE):
+    IE_NAME = 'twitter:amplify'
+    _VALID_URL = r'https?://amp\.twimg\.com/v/(?P<id>[0-9a-f\-]{36})'
+
+    _TEST = {
+        'url': 'https://amp.twimg.com/v/0ba0c3c7-0af3-4c0a-bed5-7efd1ffa2951',
+        'md5': '7df102d0b9fd7066b86f3159f8e81bf6',
+        'info_dict': {
+            'id': '0ba0c3c7-0af3-4c0a-bed5-7efd1ffa2951',
+            'ext': 'mp4',
+            'title': 'Twitter Video',
+            'thumbnail': 're:^https?://.*',
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        vmap_url = self._html_search_meta(
+            'twitter:amplify:vmap', webpage, 'vmap url')
+        video_url = self._get_vmap_video_url(vmap_url, video_id)
+
+        thumbnails = []
+        thumbnail = self._html_search_meta(
+            'twitter:image:src', webpage, 'thumbnail', fatal=False)
+
+        def _find_dimension(target):
+            w = int_or_none(self._html_search_meta(
+                'twitter:%s:width' % target, webpage, fatal=False))
+            h = int_or_none(self._html_search_meta(
+                'twitter:%s:height' % target, webpage, fatal=False))
+            return w, h
+
+        if thumbnail:
+            thumbnail_w, thumbnail_h = _find_dimension('image')
+            thumbnails.append({
+                'url': thumbnail,
+                'width': thumbnail_w,
+                'height': thumbnail_h,
+            })
+
+        video_w, video_h = _find_dimension('player')
+        formats = [{
+            'url': video_url,
+            'width': video_w,
+            'height': video_h,
+        }]
+
+        return {
+            'id': video_id,
+            'title': 'Twitter Video',
+            'formats': formats,
+            'thumbnails': thumbnails,
+        }
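
The _get_vmap_video_url() helper introduced at the top of this file's diff assumes a VMAP document whose first MediaFile text node is the direct video URL; a sketch against a hypothetical snippet:

# The VMAP snippet below is made up for illustration.
import xml.etree.ElementTree as etree

vmap = '''<vmap:VMAP xmlns:vmap="http://www.iab.net/videosuite/vmap" version="1.0">
  <MediaFile>
    https://amp.twimg.com/example/video.mp4
  </MediaFile>
</vmap:VMAP>'''

vmap_data = etree.fromstring(vmap)
video_url = vmap_data.find('.//MediaFile').text.strip()
print(video_url)
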
diff --git a/youtube_dl/extractor/ubu.py b/youtube_dl/extractor/ubu.py
deleted file mode 100644 (file)
index d502377..0000000
+++ /dev/null
@@ -1,57 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    qualities,
-)
-
-
-class UbuIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?ubu\.com/film/(?P<id>[\da-z_-]+)\.html'
-    _TEST = {
-        'url': 'http://ubu.com/film/her_noise.html',
-        'md5': '138d5652618bf0f03878978db9bef1ee',
-        'info_dict': {
-            'id': 'her_noise',
-            'ext': 'm4v',
-            'title': 'Her Noise - The Making Of (2007)',
-            'duration': 3600,
-        },
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        title = self._html_search_regex(
-            r'<title>.+?Film &amp; Video: ([^<]+)</title>', webpage, 'title')
-
-        duration = int_or_none(self._html_search_regex(
-            r'Duration: (\d+) minutes', webpage, 'duration', fatal=False),
-            invscale=60)
-
-        formats = []
-        FORMAT_REGEXES = [
-            ('sq', r"'flashvars'\s*,\s*'file=([^']+)'"),
-            ('hq', r'href="(http://ubumexico\.centro\.org\.mx/video/[^"]+)"'),
-        ]
-        preference = qualities([fid for fid, _ in FORMAT_REGEXES])
-        for format_id, format_regex in FORMAT_REGEXES:
-            m = re.search(format_regex, webpage)
-            if m:
-                formats.append({
-                    'url': m.group(1),
-                    'format_id': format_id,
-                    'preference': preference(format_id),
-                })
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'duration': duration,
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/udemy.py b/youtube_dl/extractor/udemy.py
index 4a0eaf65f78be0dbac2b089aa064eec043b15e41..d1e6f2703e022dac0edc3ef0f16794a6285d2b8f 100644 (file)
@@ -4,17 +4,35 @@ import re
 
 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse,
+    compat_HTTPError,
+    compat_urllib_parse_urlencode,
     compat_urllib_request,
+    compat_urlparse,
 )
 from ..utils import (
+    determine_ext,
+    extract_attributes,
     ExtractorError,
+    float_or_none,
+    int_or_none,
+    sanitized_Request,
+    unescapeHTML,
+    urlencode_postdata,
 )
 
 
 class UdemyIE(InfoExtractor):
     IE_NAME = 'udemy'
-    _VALID_URL = r'https?://www\.udemy\.com/(?:[^#]+#/lecture/|lecture/view/?\?lectureId=)(?P<id>\d+)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        www\.udemy\.com/
+                        (?:
+                            [^#]+\#/lecture/|
+                            lecture/view/?\?lectureId=|
+                            [^/]+/learn/v4/t/lecture/
+                        )
+                        (?P<id>\d+)
+                    '''
     _LOGIN_URL = 'https://www.udemy.com/join/login-popup/?displayType=ajax&showSkipButton=1'
     _ORIGIN_URL = 'https://www.udemy.com'
     _NETRC_MACHINE = 'udemy'
@@ -30,8 +48,55 @@ class UdemyIE(InfoExtractor):
             'duration': 579.29,
         },
         'skip': 'Requires udemy account credentials',
+    }, {
+        # new URL schema
+        'url': 'https://www.udemy.com/electric-bass-right-from-the-start/learn/v4/t/lecture/4580906',
+        'only_matching': True,
     }]
 
+    def _extract_course_info(self, webpage, video_id):
+        course = self._parse_json(
+            unescapeHTML(self._search_regex(
+                r'ng-init=["\'].*\bcourse=({.+?});', webpage, 'course', default='{}')),
+            video_id, fatal=False) or {}
+        course_id = course.get('id') or self._search_regex(
+            (r'&quot;id&quot;\s*:\s*(\d+)', r'data-course-id=["\'](\d+)'),
+            webpage, 'course id')
+        return course_id, course.get('title')
+
+    def _enroll_course(self, base_url, webpage, course_id):
+        def combine_url(base_url, url):
+            return compat_urlparse.urljoin(base_url, url) if not url.startswith('http') else url
+
+        checkout_url = unescapeHTML(self._search_regex(
+            r'href=(["\'])(?P<url>(?:https?://(?:www\.)?udemy\.com)?/payment/checkout/.+?)\1',
+            webpage, 'checkout url', group='url', default=None))
+        if checkout_url:
+            raise ExtractorError(
+                'Course %s is not free. You have to pay for it before you can download. '
+                'Use this URL to confirm purchase: %s'
+                % (course_id, combine_url(base_url, checkout_url)),
+                expected=True)
+
+        enroll_url = unescapeHTML(self._search_regex(
+            r'href=(["\'])(?P<url>(?:https?://(?:www\.)?udemy\.com)?/course/subscribe/.+?)\1',
+            webpage, 'enroll url', group='url', default=None))
+        if enroll_url:
+            webpage = self._download_webpage(
+                combine_url(base_url, enroll_url),
+                course_id, 'Enrolling in the course')
+            if '>You have enrolled in' in webpage:
+                self.to_screen('%s: Successfully enrolled in the course' % course_id)
+
+    def _download_lecture(self, course_id, lecture_id):
+        return self._download_json(
+            'https://www.udemy.com/api-2.0/users/me/subscribed-courses/%s/lectures/%s?%s' % (
+                course_id, lecture_id, compat_urllib_parse_urlencode({
+                    'fields[lecture]': 'title,description,view_html,asset',
+                    'fields[asset]': 'asset_type,stream_url,thumbnail_url,download_urls,data',
+                })),
+            lecture_id, 'Downloading lecture JSON')
+
     def _handle_error(self, response):
         if not isinstance(response, dict):
             return
@@ -43,7 +108,7 @@ class UdemyIE(InfoExtractor):
                 error_str += ' - %s' % error_data.get('formErrors')
             raise ExtractorError(error_str, expected=True)
 
-    def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata'):
+    def _download_json(self, url_or_request, *args, **kwargs):
         headers = {
             'X-Udemy-Snail-Case': 'true',
             'X-Requested-With': 'XMLHttpRequest',
@@ -53,14 +118,15 @@ class UdemyIE(InfoExtractor):
                 headers['X-Udemy-Client-Id'] = cookie.value
             elif cookie.name == 'access_token':
                 headers['X-Udemy-Bearer-Token'] = cookie.value
+                headers['X-Udemy-Authorization'] = 'Bearer %s' % cookie.value
 
         if isinstance(url_or_request, compat_urllib_request.Request):
             for header, value in headers.items():
                 url_or_request.add_header(header, value)
         else:
-            url_or_request = compat_urllib_request.Request(url_or_request, headers=headers)
+            url_or_request = sanitized_Request(url_or_request, headers=headers)
 
-        response = super(UdemyIE, self)._download_json(url_or_request, video_id, note)
+        response = super(UdemyIE, self)._download_json(url_or_request, *args, **kwargs)
         self._handle_error(response)
         return response
 
@@ -70,9 +136,7 @@ class UdemyIE(InfoExtractor):
     def _login(self):
         (username, password) = self._get_login_info()
         if username is None:
-            raise ExtractorError(
-                'Udemy account is required, use --username and --password options to provide account credentials.',
-                expected=True)
+            return
 
         login_popup = self._download_webpage(
             self._LOGIN_URL, None, 'Downloading login popup')
@@ -87,12 +151,12 @@ class UdemyIE(InfoExtractor):
         login_form = self._form_hidden_inputs('login-form', login_popup)
 
         login_form.update({
-            'email': username.encode('utf-8'),
-            'password': password.encode('utf-8'),
+            'email': username,
+            'password': password,
         })
 
-        request = compat_urllib_request.Request(
-            self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
+        request = sanitized_Request(
+            self._LOGIN_URL, urlencode_postdata(login_form))
         request.add_header('Referer', self._ORIGIN_URL)
         request.add_header('Origin', self._ORIGIN_URL)
 
@@ -110,44 +174,124 @@ class UdemyIE(InfoExtractor):
     def _real_extract(self, url):
         lecture_id = self._match_id(url)
 
-        lecture = self._download_json(
-            'https://www.udemy.com/api-1.1/lectures/%s' % lecture_id,
-            lecture_id, 'Downloading lecture JSON')
-
-        asset_type = lecture.get('assetType') or lecture.get('asset_type')
-        if asset_type != 'Video':
-            raise ExtractorError(
-                'Lecture %s is not a video' % lecture_id, expected=True)
+        webpage = self._download_webpage(url, lecture_id)
 
-        asset = lecture['asset']
+        course_id, _ = self._extract_course_info(webpage, lecture_id)
 
-        stream_url = asset.get('streamUrl') or asset.get('stream_url')
-        mobj = re.search(r'(https?://www\.youtube\.com/watch\?v=.*)', stream_url)
-        if mobj:
-            return self.url_result(mobj.group(1), 'Youtube')
+        try:
+            lecture = self._download_lecture(course_id, lecture_id)
+        except ExtractorError as e:
+            # A 403 most likely means we are not yet enrolled in the course
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+                self._enroll_course(url, webpage, course_id)
+                lecture = self._download_lecture(course_id, lecture_id)
+            else:
+                raise
 
-        video_id = asset['id']
-        thumbnail = asset.get('thumbnailUrl') or asset.get('thumbnail_url')
-        duration = asset['data']['duration']
+        title = lecture['title']
+        description = lecture.get('description')
 
-        download_url = asset.get('downloadUrl') or asset.get('download_url')
+        asset = lecture['asset']
 
-        video = download_url.get('Video') or download_url.get('video')
-        video_480p = download_url.get('Video480p') or download_url.get('video_480p')
+        asset_type = asset.get('asset_type') or asset.get('assetType')
+        if asset_type != 'Video':
+            raise ExtractorError(
+                'Lecture %s is not a video' % lecture_id, expected=True)
 
-        formats = [
-            {
-                'url': video_480p[0],
-                'format_id': '360p',
-            },
-            {
-                'url': video[0],
-                'format_id': '720p',
-            },
-        ]
+        stream_url = asset.get('stream_url') or asset.get('streamUrl')
+        if stream_url:
+            youtube_url = self._search_regex(
+                r'(https?://www\.youtube\.com/watch\?v=.*)', stream_url, 'youtube URL', default=None)
+            if youtube_url:
+                return self.url_result(youtube_url, 'Youtube')
 
-        title = lecture['title']
-        description = lecture['description']
+        video_id = asset['id']
+        thumbnail = asset.get('thumbnail_url') or asset.get('thumbnailUrl')
+        duration = float_or_none(asset.get('data', {}).get('duration'))
+
+        formats = []
+
+        def extract_output_format(src, format_id):
+            return {
+                'url': src['url'],
+                'format_id': '%sp' % (src.get('height') or format_id),
+                'width': int_or_none(src.get('width')),
+                'height': int_or_none(src.get('height')),
+                'vbr': int_or_none(src.get('video_bitrate_in_kbps')),
+                'vcodec': src.get('video_codec'),
+                'fps': int_or_none(src.get('frame_rate')),
+                'abr': int_or_none(src.get('audio_bitrate_in_kbps')),
+                'acodec': src.get('audio_codec'),
+                'asr': int_or_none(src.get('audio_sample_rate')),
+                'tbr': int_or_none(src.get('total_bitrate_in_kbps')),
+                'filesize': int_or_none(src.get('file_size_in_bytes')),
+            }
+
+        outputs = asset.get('data', {}).get('outputs')
+        if not isinstance(outputs, dict):
+            outputs = {}
+
+        def add_output_format_meta(f, key):
+            output = outputs.get(key)
+            if isinstance(output, dict):
+                output_format = extract_output_format(output, key)
+                output_format.update(f)
+                return output_format
+            return f
+
+        download_urls = asset.get('download_urls')
+        if isinstance(download_urls, dict):
+            video = download_urls.get('Video')
+            if isinstance(video, list):
+                for format_ in video:
+                    video_url = format_.get('file')
+                    if not video_url:
+                        continue
+                    format_id = format_.get('label')
+                    f = {
+                        'url': format_['file'],
+                        'format_id': '%sp' % format_id,
+                        'height': int_or_none(format_id),
+                    }
+                    if format_id:
+                        # Some videos contain additional metadata (e.g.
+                        # https://www.udemy.com/ios9-swift/learn/#/lecture/3383208)
+                        f = add_output_format_meta(f, format_id)
+                    formats.append(f)
+
+        view_html = lecture.get('view_html')
+        if view_html:
+            view_html_urls = set()
+            for source in re.findall(r'<source[^>]+>', view_html):
+                attributes = extract_attributes(source)
+                src = attributes.get('src')
+                if not src:
+                    continue
+                res = attributes.get('data-res')
+                height = int_or_none(res)
+                if src in view_html_urls:
+                    continue
+                view_html_urls.add(src)
+                if attributes.get('type') == 'application/x-mpegURL' or determine_ext(src) == 'm3u8':
+                    m3u8_formats = self._extract_m3u8_formats(
+                        src, video_id, 'mp4', entry_protocol='m3u8_native',
+                        m3u8_id='hls', fatal=False)
+                    for f in m3u8_formats:
+                        m = re.search(r'/hls_(?P<height>\d{3,4})_(?P<tbr>\d{2,})/', f['url'])
+                        if m:
+                            if not f.get('height'):
+                                f['height'] = int(m.group('height'))
+                            if not f.get('tbr'):
+                                f['tbr'] = int(m.group('tbr'))
+                    formats.extend(m3u8_formats)
+                else:
+                    formats.append(add_output_format_meta({
+                        'url': src,
+                        'format_id': '%dp' % height if height else None,
+                        'height': height,
+                    }, res))
+
+        self._sort_formats(formats, field_preference=('height', 'width', 'tbr', 'format_id'))
 
         return {
             'id': video_id,
@@ -161,9 +305,7 @@ class UdemyIE(InfoExtractor):
 
 class UdemyCourseIE(UdemyIE):
     IE_NAME = 'udemy:course'
-    _VALID_URL = r'https?://www\.udemy\.com/(?P<coursepath>[\da-z-]+)'
-    _SUCCESSFULLY_ENROLLED = '>You have enrolled in this course!<'
-    _ALREADY_ENROLLED = '>You are already taking this course.<'
+    _VALID_URL = r'https?://www\.udemy\.com/(?P<id>[^/?#&]+)'
     _TESTS = []
 
     @classmethod
@@ -171,33 +313,47 @@ class UdemyCourseIE(UdemyIE):
         return False if UdemyIE.suitable(url) else super(UdemyCourseIE, cls).suitable(url)
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        course_path = mobj.group('coursepath')
-
-        response = self._download_json(
-            'https://www.udemy.com/api-1.1/courses/%s' % course_path,
-            course_path, 'Downloading course JSON')
+        course_path = self._match_id(url)
 
-        course_id = int(response['id'])
-        course_title = response['title']
+        webpage = self._download_webpage(url, course_path)
 
-        webpage = self._download_webpage(
-            'https://www.udemy.com/course/subscribe/?courseId=%s' % course_id,
-            course_id, 'Enrolling in the course')
+        course_id, title = self._extract_course_info(webpage, course_path)
 
-        if self._SUCCESSFULLY_ENROLLED in webpage:
-            self.to_screen('%s: Successfully enrolled in' % course_id)
-        elif self._ALREADY_ENROLLED in webpage:
-            self.to_screen('%s: Already enrolled in' % course_id)
+        self._enroll_course(url, webpage, course_id)
 
         response = self._download_json(
-            'https://www.udemy.com/api-1.1/courses/%s/curriculum' % course_id,
-            course_id, 'Downloading course curriculum')
-
-        entries = [
-            self.url_result(
-                'https://www.udemy.com/%s/#/lecture/%s' % (course_path, asset['id']), 'Udemy')
-            for asset in response if asset.get('assetType') or asset.get('asset_type') == 'Video'
-        ]
-
-        return self.playlist_result(entries, course_id, course_title)
+            'https://www.udemy.com/api-2.0/courses/%s/cached-subscriber-curriculum-items' % course_id,
+            course_id, 'Downloading course curriculum', query={
+                'fields[chapter]': 'title,object_index',
+                'fields[lecture]': 'title,asset',
+                'page_size': '1000',
+            })
+
+        entries = []
+        chapter, chapter_number = [None] * 2
+        for entry in response['results']:
+            clazz = entry.get('_class')
+            if clazz == 'lecture':
+                asset = entry.get('asset')
+                if isinstance(asset, dict):
+                    asset_type = asset.get('asset_type') or asset.get('assetType')
+                    if asset_type != 'Video':
+                        continue
+                lecture_id = entry.get('id')
+                if lecture_id:
+                    entry = {
+                        '_type': 'url_transparent',
+                        'url': 'https://www.udemy.com/%s/learn/v4/t/lecture/%s' % (course_path, entry['id']),
+                        'title': entry.get('title'),
+                        'ie_key': UdemyIE.ie_key(),
+                    }
+                    if chapter_number:
+                        entry['chapter_number'] = chapter_number
+                    if chapter:
+                        entry['chapter'] = chapter
+                    entries.append(entry)
+            elif clazz == 'chapter':
+                chapter_number = entry.get('object_index')
+                chapter = entry.get('title')
+
+        return self.playlist_result(entries, course_id, title)
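
Note: the rewrite above fetches everything through api-2.0 endpoints with sparse fieldsets. A self-contained sketch of how the lecture URL from _download_lecture() is composed (endpoint and field lists copied from the diff; the helper name is illustrative):

    try:
        from urllib.parse import urlencode  # Python 3
    except ImportError:
        from urllib import urlencode  # Python 2

    def lecture_api_url(course_id, lecture_id):
        query = urlencode({
            'fields[lecture]': 'title,description,view_html,asset',
            'fields[asset]': 'asset_type,stream_url,thumbnail_url,download_urls,data',
        })
        return ('https://www.udemy.com/api-2.0/users/me/subscribed-courses/'
                '%s/lectures/%s?%s' % (course_id, lecture_id, query))

    print(lecture_api_url(1234, 4580906))
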
diff --git a/youtube_dl/extractor/udn.py b/youtube_dl/extractor/udn.py
index 2151f83382d6b3185722b54de2d0eab2a988c6ae..ee35b7227372c0ddc128dfc694577578f9fc6009 100644 (file)
@@ -12,7 +12,8 @@ from ..compat import compat_urlparse
 
 class UDNEmbedIE(InfoExtractor):
     IE_DESC = '聯合影音'
-    _VALID_URL = r'https?://video\.udn\.com/(?:embed|play)/news/(?P<id>\d+)'
+    _PROTOCOL_RELATIVE_VALID_URL = r'//video\.udn\.com/(?:embed|play)/news/(?P<id>\d+)'
+    _VALID_URL = r'https?:' + _PROTOCOL_RELATIVE_VALID_URL
     _TESTS = [{
         'url': 'http://video.udn.com/embed/news/300040',
         'md5': 'de06b4c90b042c128395a88f0384817e',
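
Note: keeping the pattern in two pieces lets embedders match scheme-less URLs by reusing _PROTOCOL_RELATIVE_VALID_URL. A quick demonstration of the composition (patterns mirror the diff; the second sample URL is a fabricated protocol-relative variant of the test URL):

    import re

    PROTOCOL_RELATIVE = r'//video\.udn\.com/(?:embed|play)/news/(?P<id>\d+)'
    VALID_URL = r'https?:' + PROTOCOL_RELATIVE

    assert re.match(VALID_URL, 'http://video.udn.com/embed/news/300040')
    assert re.match(r'(?:https?:)?' + PROTOCOL_RELATIVE, '//video.udn.com/play/news/300040')
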
diff --git a/youtube_dl/extractor/ultimedia.py b/youtube_dl/extractor/ultimedia.py
deleted file mode 100644 (file)
index c475105..0000000
--- a/youtube_dl/extractor/ultimedia.py
+++ /dev/null
@@ -1,105 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import compat_urllib_parse_urlparse
-from ..utils import (
-    ExtractorError,
-    qualities,
-    unified_strdate,
-    clean_html,
-)
-
-
-class UltimediaIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?ultimedia\.com/default/index/video[^/]+/id/(?P<id>[\d+a-z]+)'
-    _TESTS = [{
-        # news
-        'url': 'https://www.ultimedia.com/default/index/videogeneric/id/s8uk0r',
-        'md5': '276a0e49de58c7e85d32b057837952a2',
-        'info_dict': {
-            'id': 's8uk0r',
-            'ext': 'mp4',
-            'title': 'Loi sur la fin de vie: le texte prévoit un renforcement des directives anticipées',
-            'description': 'md5:3e5c8fd65791487333dda5db8aed32af',
-            'thumbnail': 're:^https?://.*\.jpg',
-            'upload_date': '20150317',
-        },
-    }, {
-        # music
-        'url': 'https://www.ultimedia.com/default/index/videomusic/id/xvpfp8',
-        'md5': '2ea3513813cf230605c7e2ffe7eca61c',
-        'info_dict': {
-            'id': 'xvpfp8',
-            'ext': 'mp4',
-            'title': "Two - C'est la vie (Clip)",
-            'description': 'Two',
-            'thumbnail': 're:^https?://.*\.jpg',
-            'upload_date': '20150224',
-        },
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        deliver_url = self._proto_relative_url(self._search_regex(
-            r'<iframe[^>]+src="((?:https?:)?//(?:www\.)?ultimedia\.com/deliver/[^"]+)"',
-            webpage, 'deliver URL'), compat_urllib_parse_urlparse(url).scheme + ':')
-
-        deliver_page = self._download_webpage(
-            deliver_url, video_id, 'Downloading iframe page')
-
-        if '>This video is currently not available' in deliver_page:
-            raise ExtractorError(
-                'Video %s is currently not available' % video_id, expected=True)
-
-        player = self._parse_json(
-            self._search_regex(
-                r"jwplayer\('player(?:_temp)?'\)\.setup\(({.+?})\)\.on",
-                deliver_page, 'player'),
-            video_id)
-
-        quality = qualities(['flash', 'html5'])
-        formats = []
-        for mode in player['modes']:
-            video_url = mode.get('config', {}).get('file')
-            if not video_url:
-                continue
-            if re.match(r'https?://www\.youtube\.com/.+?', video_url):
-                return self.url_result(video_url, 'Youtube')
-            formats.append({
-                'url': video_url,
-                'format_id': mode.get('type'),
-                'quality': quality(mode.get('type')),
-            })
-        self._sort_formats(formats)
-
-        thumbnail = player.get('image')
-
-        title = clean_html((
-            self._html_search_regex(
-                r'(?s)<div\s+id="catArticle">.+?</div>(.+?)</h1>',
-                webpage, 'title', default=None) or
-            self._search_regex(
-                r"var\s+nameVideo\s*=\s*'([^']+)'",
-                deliver_page, 'title')))
-
-        description = clean_html(self._html_search_regex(
-            r'(?s)<span>Description</span>(.+?)</p>', webpage,
-            'description', fatal=False))
-
-        upload_date = unified_strdate(self._search_regex(
-            r'Ajouté le\s*<span>([^<]+)', webpage,
-            'upload date', fatal=False))
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'upload_date': upload_date,
-            'formats': formats,
-        }
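
Note: the core trick of the removed extractor was lifting the jwplayer setup object out of the deliver page and parsing it as JSON. A standalone sketch against a fabricated page snippet:

    import json
    import re

    page = "jwplayer('player').setup({\"modes\": [{\"type\": \"html5\"}]}).onReady(cb)"
    m = re.search(r"jwplayer\('player(?:_temp)?'\)\.setup\(({.+?})\)\.on", page)
    player = json.loads(m.group(1))
    assert player['modes'][0]['type'] == 'html5'
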
diff --git a/youtube_dl/extractor/unistra.py b/youtube_dl/extractor/unistra.py
index f70978299ac9e682f5cdb99a7396541fd08c115c..66d9f1bf3fc9ff8481fb55aa8045078244b11635 100644 (file)
@@ -7,7 +7,7 @@ from ..utils import qualities
 
 
 class UnistraIE(InfoExtractor):
-    _VALID_URL = r'http://utv\.unistra\.fr/(?:index|video)\.php\?id_video\=(?P<id>\d+)'
+    _VALID_URL = r'https?://utv\.unistra\.fr/(?:index|video)\.php\?id_video\=(?P<id>\d+)'
 
     _TESTS = [
         {
@@ -38,7 +38,7 @@ class UnistraIE(InfoExtractor):
 
         webpage = self._download_webpage(url, video_id)
 
-        files = set(re.findall(r'file\s*:\s*"([^"]+)"', webpage))
+        files = set(re.findall(r'file\s*:\s*"(/[^"]+)"', webpage))
 
         quality = qualities(['SD', 'HD'])
         formats = []
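
Note: the regex change above anchors matches to a leading slash, so only site-relative file paths are collected. A small demonstration with a fabricated page string:

    import re

    page = 'file: "/files/hd.mp4", file: "http://cdn.example.com/ad.mp4"'
    old = set(re.findall(r'file\s*:\s*"([^"]+)"', page))
    new = set(re.findall(r'file\s*:\s*"(/[^"]+)"', page))
    assert old == {'/files/hd.mp4', 'http://cdn.example.com/ad.mp4'}
    assert new == {'/files/hd.mp4'}
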
diff --git a/youtube_dl/extractor/usatoday.py b/youtube_dl/extractor/usatoday.py
new file mode 100644 (file)
index 0000000..e5678dc
--- /dev/null
+++ b/youtube_dl/extractor/usatoday.py
@@ -0,0 +1,48 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    get_element_by_attribute,
+    parse_duration,
+    update_url_query,
+    ExtractorError,
+)
+from ..compat import compat_str
+
+
+class USATodayIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?usatoday\.com/(?:[^/]+/)*(?P<id>[^?/#]+)'
+    _TEST = {
+        'url': 'http://www.usatoday.com/media/cinematic/video/81729424/us-france-warn-syrian-regime-ahead-of-new-peace-talks/',
+        'md5': '4d40974481fa3475f8bccfd20c5361f8',
+        'info_dict': {
+            'id': '81729424',
+            'ext': 'mp4',
+            'title': 'US, France warn Syrian regime ahead of new peace talks',
+            'timestamp': 1457891045,
+            'description': 'md5:7e50464fdf2126b0f533748d3c78d58f',
+            'uploader_id': '29906170001',
+            'upload_date': '20160313',
+        }
+    }
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/29906170001/38a9eecc-bdd8-42a3-ba14-95397e48b3f8_default/index.html?videoId=%s'
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(update_url_query(url, {'ajax': 'true'}), display_id)
+        ui_video_data = get_element_by_attribute('class', 'ui-video-data', webpage)
+        if not ui_video_data:
+            raise ExtractorError('No video found on the webpage', expected=True)
+        video_data = self._parse_json(ui_video_data, display_id)
+
+        return {
+            '_type': 'url_transparent',
+            'url': self.BRIGHTCOVE_URL_TEMPLATE % video_data['brightcove_id'],
+            'id': compat_str(video_data['id']),
+            'title': video_data['title'],
+            'thumbnail': video_data.get('thumbnail'),
+            'description': video_data.get('description'),
+            'duration': parse_duration(video_data.get('length')),
+            'ie_key': 'BrightcoveNew',
+        }
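
Note: the new extractor parses a JSON blob embedded in a ui-video-data element, then defers to the Brightcove extractor via a url_transparent result. A simplified sketch (the flat regex and sample payload are assumptions; the real get_element_by_attribute copes with nested markup):

    import json
    import re

    page = ('<div class="ui-video-data">'
            '{"id": 81729424, "title": "demo", "brightcove_id": "81729424"}'
            '</div>')
    video_data = json.loads(
        re.search(r'class="ui-video-data">(.+?)</div>', page).group(1))
    brightcove_url = ('http://players.brightcove.net/29906170001/'
                      '38a9eecc-bdd8-42a3-ba14-95397e48b3f8_default/'
                      'index.html?videoId=%s' % video_data['brightcove_id'])
    assert brightcove_url.endswith('videoId=81729424')
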
diff --git a/youtube_dl/extractor/ustream.py b/youtube_dl/extractor/ustream.py
index c39c278ab211c45809e594f64cc90f71304e9d92..54605d863027968a4a15c5358b9f98539c69c4b3 100644 (file)
@@ -1,17 +1,20 @@
 from __future__ import unicode_literals
 
-import json
 import re
 
 from .common import InfoExtractor
 from ..compat import (
     compat_urlparse,
 )
-from ..utils import ExtractorError
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    float_or_none,
+)
 
 
 class UstreamIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.ustream\.tv/(?P<type>recorded|embed|embed/recorded)/(?P<videoID>\d+)'
+    _VALID_URL = r'https?://www\.ustream\.tv/(?P<type>recorded|embed|embed/recorded)/(?P<id>\d+)'
     IE_NAME = 'ustream'
     _TESTS = [{
         'url': 'http://www.ustream.tv/recorded/20274954',
@@ -19,8 +22,12 @@ class UstreamIE(InfoExtractor):
         'info_dict': {
             'id': '20274954',
             'ext': 'flv',
-            'uploader': 'Young Americans for Liberty',
             'title': 'Young Americans for Liberty February 7, 2012 2:28 AM',
+            'description': 'Young Americans for Liberty February 7, 2012 2:28 AM',
+            'timestamp': 1328577035,
+            'upload_date': '20120207',
+            'uploader': 'yaliberty',
+            'uploader_id': '6780869',
         },
     }, {
         # From http://sportscanada.tv/canadagames/index.php/week2/figure-skating/444
@@ -32,73 +39,80 @@ class UstreamIE(InfoExtractor):
             'ext': 'flv',
             'title': '-CG11- Canada Games Figure Skating',
             'uploader': 'sportscanadatv',
-        }
+        },
+        'skip': 'This Pro Broadcaster has chosen to remove this video from the ustream.tv site.',
+    }, {
+        'url': 'http://www.ustream.tv/embed/10299409',
+        'info_dict': {
+            'id': '10299409',
+        },
+        'playlist_count': 3,
     }]
 
     def _real_extract(self, url):
         m = re.match(self._VALID_URL, url)
-        video_id = m.group('videoID')
+        video_id = m.group('id')
 
-        # some sites use this embed format (see: http://github.com/rg3/youtube-dl/issues/2990)
+        # some sites use this embed format (see: https://github.com/rg3/youtube-dl/issues/2990)
         if m.group('type') == 'embed/recorded':
-            video_id = m.group('videoID')
+            video_id = m.group('id')
             desktop_url = 'http://www.ustream.tv/recorded/' + video_id
             return self.url_result(desktop_url, 'Ustream')
         if m.group('type') == 'embed':
-            video_id = m.group('videoID')
+            video_id = m.group('id')
             webpage = self._download_webpage(url, video_id)
-            desktop_video_id = self._html_search_regex(
-                r'ContentVideoIds=\["([^"]*?)"\]', webpage, 'desktop_video_id')
-            desktop_url = 'http://www.ustream.tv/recorded/' + desktop_video_id
-            return self.url_result(desktop_url, 'Ustream')
+            content_video_ids = self._parse_json(self._search_regex(
+                r'ustream\.vars\.offAirContentVideoIds=([^;]+);', webpage,
+                'content video IDs'), video_id)
+            return self.playlist_result(
+                map(lambda u: self.url_result('http://www.ustream.tv/recorded/' + u, 'Ustream'), content_video_ids),
+                video_id)
 
         params = self._download_json(
-            'http://cdngw.ustream.tv/rgwjson/Viewer.getVideo/' + json.dumps({
-                'brandId': 1,
-                'videoId': int(video_id),
-                'autoplay': False,
-            }), video_id)
+            'https://api.ustream.tv/videos/%s.json' % video_id, video_id)
 
-        if 'error' in params:
-            raise ExtractorError(params['error']['message'], expected=True)
+        error = params.get('error')
+        if error:
+            raise ExtractorError(
+                '%s returned error: %s' % (self.IE_NAME, error), expected=True)
 
-        video_url = params['flv']
+        video = params['video']
 
-        webpage = self._download_webpage(url, video_id)
+        title = video['title']
+        filesize = float_or_none(video.get('file_size'))
 
-        self.report_extraction(video_id)
-
-        video_title = self._html_search_regex(r'data-title="(?P<title>.+)"',
-                                              webpage, 'title', default=None)
-
-        if not video_title:
-            try:
-                video_title = params['moduleConfig']['meta']['title']
-            except KeyError:
-                pass
-
-        if not video_title:
-            video_title = 'Ustream video ' + video_id
+        formats = [{
+            'format_id': format_id,
+            'url': video_url,
+            'ext': format_id,
+            'filesize': filesize,
+        } for format_id, video_url in video['media_urls'].items()]
+        self._sort_formats(formats)
 
-        uploader = self._html_search_regex(r'data-content-type="channel".*?>(?P<uploader>.*?)</a>',
-                                           webpage, 'uploader', fatal=False, flags=re.DOTALL, default=None)
+        description = video.get('description')
+        timestamp = int_or_none(video.get('created_at'))
+        duration = float_or_none(video.get('length'))
+        view_count = int_or_none(video.get('views'))
 
-        if not uploader:
-            try:
-                uploader = params['moduleConfig']['meta']['userName']
-            except KeyError:
-                uploader = None
+        uploader = video.get('owner', {}).get('username')
+        uploader_id = video.get('owner', {}).get('id')
 
-        thumbnail = self._html_search_regex(r'<link rel="image_src" href="(?P<thumb>.*?)"',
-                                            webpage, 'thumbnail', fatal=False)
+        thumbnails = [{
+            'id': thumbnail_id,
+            'url': thumbnail_url,
+        } for thumbnail_id, thumbnail_url in video.get('thumbnail', {}).items()]
 
         return {
             'id': video_id,
-            'url': video_url,
-            'ext': 'flv',
-            'title': video_title,
+            'title': title,
+            'description': description,
+            'thumbnails': thumbnails,
+            'timestamp': timestamp,
+            'duration': duration,
+            'view_count': view_count,
             'uploader': uploader,
-            'thumbnail': thumbnail,
+            'uploader_id': uploader_id,
+            'formats': formats,
         }
 
 
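
Note: the rewrite above derives one format per media_urls entry and one thumbnail per thumbnail entry from the api.ustream.tv JSON. A sketch against a hypothetical response:

    video = {
        'file_size': '1048576.0',
        'media_urls': {'flv': 'http://example.com/v.flv'},
        'thumbnail': {'image': 'http://example.com/t.jpg'},
    }
    formats = [{
        'format_id': format_id,
        'url': video_url,
        'ext': format_id,
        'filesize': float(video['file_size']),
    } for format_id, video_url in video['media_urls'].items()]
    thumbnails = [{'id': tid, 'url': turl}
                  for tid, turl in video.get('thumbnail', {}).items()]
    assert formats[0]['ext'] == 'flv' and thumbnails[0]['id'] == 'image'
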
diff --git a/youtube_dl/extractor/ustudio.py b/youtube_dl/extractor/ustudio.py
new file mode 100644 (file)
index 0000000..cafc082
--- /dev/null
+++ b/youtube_dl/extractor/ustudio.py
@@ -0,0 +1,67 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    unified_strdate,
+)
+
+
+class UstudioIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:(?:www|v1)\.)?ustudio\.com/video/(?P<id>[^/]+)/(?P<display_id>[^/?#&]+)'
+    _TEST = {
+        'url': 'http://ustudio.com/video/Uxu2my9bgSph/san_francisco_golden_gate_bridge',
+        'md5': '58bbfca62125378742df01fc2abbdef6',
+        'info_dict': {
+            'id': 'Uxu2my9bgSph',
+            'display_id': 'san_francisco_golden_gate_bridge',
+            'ext': 'mp4',
+            'title': 'San Francisco: Golden Gate Bridge',
+            'description': 'md5:23925500697f2c6d4830e387ba51a9be',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'upload_date': '20111107',
+            'uploader': 'Tony Farley',
+        }
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        display_id = mobj.group('display_id')
+
+        config = self._download_xml(
+            'http://v1.ustudio.com/embed/%s/ustudio/config.xml' % video_id,
+            display_id)
+
+        def extract(kind):
+            return [{
+                'url': item.attrib['url'],
+                'width': int_or_none(item.get('width')),
+                'height': int_or_none(item.get('height')),
+            } for item in config.findall('./qualities/quality/%s' % kind) if item.get('url')]
+
+        formats = extract('video')
+        self._sort_formats(formats)
+
+        webpage = self._download_webpage(url, display_id)
+
+        title = self._og_search_title(webpage)
+        upload_date = unified_strdate(self._search_regex(
+            r'(?s)Uploaded by\s*.+?\s*on\s*<span>([^<]+)</span>',
+            webpage, 'upload date', fatal=False))
+        uploader = self._search_regex(
+            r'Uploaded by\s*<a[^>]*>([^<]+)<',
+            webpage, 'uploader', fatal=False)
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'description': self._og_search_description(webpage),
+            'thumbnails': extract('image'),
+            'upload_date': upload_date,
+            'uploader': uploader,
+            'formats': formats,
+        }
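
Note: formats and thumbnails both come from the same config.xml layout. A runnable sketch against a fabricated document with the ./qualities/quality/{video,image} structure the code walks (the real code uses int_or_none for the dimensions):

    import xml.etree.ElementTree as etree

    config = etree.fromstring(
        '<config><qualities><quality>'
        '<video url="http://example.com/v.mp4" width="640" height="360"/>'
        '<image url="http://example.com/t.jpg" width="640" height="360"/>'
        '</quality></qualities></config>')

    def extract(kind):
        return [{'url': item.attrib['url'],
                 'width': int(item.get('width')),
                 'height': int(item.get('height'))}
                for item in config.findall('./qualities/quality/%s' % kind)
                if item.get('url')]

    assert extract('video')[0]['height'] == 360
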
diff --git a/youtube_dl/extractor/varzesh3.py b/youtube_dl/extractor/varzesh3.py
index 9369abaf8f7bdfa2b220c39d02f9460dbab711c2..84698371a8ab2daf77faae1684141eb32425f232 100644 (file)
@@ -2,11 +2,19 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_parse_urlparse,
+    compat_parse_qs,
+)
+from ..utils import (
+    clean_html,
+    remove_start,
+)
 
 
 class Varzesh3IE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?video\.varzesh3\.com/(?:[^/]+/)+(?P<id>[^/]+)/?'
-    _TEST = {
+    _TESTS = [{
         'url': 'http://video.varzesh3.com/germany/bundesliga/5-%D9%88%D8%A7%DA%A9%D9%86%D8%B4-%D8%A8%D8%B1%D8%AA%D8%B1-%D8%AF%D8%B1%D9%88%D8%A7%D8%B2%D9%87%E2%80%8C%D8%A8%D8%A7%D9%86%D8%A7%D9%86%D8%9B%D9%87%D9%81%D8%AA%D9%87-26-%D8%A8%D9%88%D9%86%D8%AF%D8%B3/',
         'md5': '2a933874cb7dce4366075281eb49e855',
         'info_dict': {
@@ -15,8 +23,19 @@ class Varzesh3IE(InfoExtractor):
             'title': '۵ واکنش برتر دروازه‌بانان؛هفته ۲۶ بوندسلیگا',
             'description': 'فصل ۲۰۱۵-۲۰۱۴',
             'thumbnail': 're:^https?://.*\.jpg$',
-        }
-    }
+        },
+        'skip': 'HTTP 404 Error',
+    }, {
+        'url': 'http://video.varzesh3.com/video/112785/%D8%AF%D9%84%D9%87-%D8%B9%D9%84%DB%8C%D8%9B-%D8%B3%D8%AA%D8%A7%D8%B1%D9%87-%D9%86%D9%88%D8%B8%D9%87%D9%88%D8%B1-%D9%84%DB%8C%DA%AF-%D8%A8%D8%B1%D8%AA%D8%B1-%D8%AC%D8%B2%DB%8C%D8%B1%D9%87',
+        'md5': '841b7cd3afbc76e61708d94e53a4a4e7',
+        'info_dict': {
+            'id': '112785',
+            'ext': 'mp4',
+            'title': 'دله علی؛ ستاره نوظهور لیگ برتر جزیره',
+            'description': 'فوتبال 120',
+        },
+        'expected_warnings': ['description'],
+    }]
 
     def _real_extract(self, url):
         display_id = self._match_id(url)
@@ -26,15 +45,30 @@ class Varzesh3IE(InfoExtractor):
         video_url = self._search_regex(
             r'<source[^>]+src="([^"]+)"', webpage, 'video url')
 
-        title = self._og_search_title(webpage)
+        title = remove_start(self._html_search_regex(
+            r'<title>([^<]+)</title>', webpage, 'title'), 'ویدیو ورزش 3 | ')
+
         description = self._html_search_regex(
             r'(?s)<div class="matn">(.+?)</div>',
-            webpage, 'description', fatal=False)
-        thumbnail = self._og_search_thumbnail(webpage)
+            webpage, 'description', default=None)
+        if description is None:
+            description = clean_html(self._html_search_meta('description', webpage))
+
+        thumbnail = self._og_search_thumbnail(webpage, default=None)
+        if thumbnail is None:
+            fb_sharer_url = self._search_regex(
+                r'<a[^>]+href="(https?://www\.facebook\.com/sharer/sharer\.php\?[^"]+)"',
+                webpage, 'facebook sharer URL', fatal=False)
+            sharer_params = compat_parse_qs(compat_urllib_parse_urlparse(fb_sharer_url or '').query)
+            thumbnail = sharer_params.get('p[images][0]', [None])[0]
 
         video_id = self._search_regex(
             r"<link[^>]+rel='(?:canonical|shortlink)'[^>]+href='/\?p=([^']+)'",
-            webpage, display_id, default=display_id)
+            webpage, display_id, default=None)
+        if video_id is None:
+            video_id = self._search_regex(
+                r'var\s+VideoId\s*=\s*(\d+);', webpage, 'video id',
+                default=display_id)
 
         return {
             'url': video_url,
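
Note: the thumbnail fallback above recovers the image from the p[images][0] query parameter of the Facebook share link. A standalone sketch with a fabricated sharer URL:

    try:
        from urllib.parse import urlparse, parse_qs  # Python 3
    except ImportError:
        from urlparse import urlparse, parse_qs  # Python 2

    sharer = ('https://www.facebook.com/sharer/sharer.php'
              '?u=http%3A%2F%2Fexample.com&p[images][0]=http%3A%2F%2Fexample.com%2Ft.jpg')
    params = parse_qs(urlparse(sharer).query)
    thumbnail = params.get('p[images][0]', [None])[0]
    assert thumbnail == 'http://example.com/t.jpg'
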
diff --git a/youtube_dl/extractor/vbox7.py b/youtube_dl/extractor/vbox7.py
index 722eb52368825b92c88506ff33d79bf1f2f91a32..dff1bb70281b9a1cd69d9507b963e4c622a29044 100644 (file)
@@ -2,18 +2,16 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
-    compat_urlparse,
-)
+from ..compat import compat_urlparse
 from ..utils import (
     ExtractorError,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
 class Vbox7IE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?vbox7\.com/play:(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?vbox7\.com/play:(?P<id>[^/]+)'
     _TEST = {
         'url': 'http://vbox7.com/play:249bb972c2',
         'md5': '99f65c0c9ef9b682b97313e052734c3f',
@@ -47,9 +45,9 @@ class Vbox7IE(InfoExtractor):
         title = self._html_search_regex(r'<title>(.*)</title>',
                                         webpage, 'title').split('/')[0].strip()
 
-        info_url = "http://vbox7.com/play/magare.do"
-        data = compat_urllib_parse.urlencode({'as3': '1', 'vid': video_id})
-        info_request = compat_urllib_request.Request(info_url, data)
+        info_url = 'http://vbox7.com/play/magare.do'
+        data = urlencode_postdata({'as3': '1', 'vid': video_id})
+        info_request = sanitized_Request(info_url, data)
         info_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
         info_response = self._download_webpage(info_request, video_id, 'Downloading info webpage')
         if info_response is None:
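
Note: the sanitized_Request/urlencode_postdata pair introduced here is essentially a bytes-encoded form POST; youtube-dl's helpers additionally sanitize the URL. A plain-stdlib approximation of the request being built (a sketch only, not executed against the site):

    try:
        from urllib.parse import urlencode   # Python 3
        from urllib.request import Request
    except ImportError:
        from urllib import urlencode         # Python 2
        from urllib2 import Request

    data = urlencode({'as3': '1', 'vid': '249bb972c2'}).encode('ascii')
    req = Request('http://vbox7.com/play/magare.do', data)
    req.add_header('Content-Type', 'application/x-www-form-urlencoded')
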
diff --git a/youtube_dl/extractor/veoh.py b/youtube_dl/extractor/veoh.py
index 01e258e32218c227c5de3caf60588baab56e9045..23ce0a0d1929febac87f789374d8411d7b7ddd00 100644 (file)
@@ -4,17 +4,15 @@ import re
 import json
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-)
 from ..utils import (
     int_or_none,
     ExtractorError,
+    sanitized_Request,
 )
 
 
 class VeohIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?veoh\.com/(?:watch|iphone/#_Watch)/(?P<id>(?:v|yapi-)[\da-zA-Z]+)'
+    _VALID_URL = r'https?://(?:www\.)?veoh\.com/(?:watch|iphone/#_Watch)/(?P<id>(?:v|yapi-)[\da-zA-Z]+)'
 
     _TESTS = [
         {
@@ -110,7 +108,7 @@ class VeohIE(InfoExtractor):
         if 'class="adultwarning-container"' in webpage:
             self.report_age_confirmation()
             age_limit = 18
-            request = compat_urllib_request.Request(url)
+            request = sanitized_Request(url)
             request.add_header('Cookie', 'confirmedAdult=true')
             webpage = self._download_webpage(request, video_id)
 
diff --git a/youtube_dl/extractor/vessel.py b/youtube_dl/extractor/vessel.py
index 3c8d2a9437af3021df921f5692ad5ae984b14ead..1a0ff3395598027ebd8de05a609faca987c14e9e 100644 (file)
@@ -4,10 +4,10 @@ from __future__ import unicode_literals
 import json
 
 from .common import InfoExtractor
-from ..compat import compat_urllib_request
 from ..utils import (
     ExtractorError,
     parse_iso8601,
+    sanitized_Request,
 )
 
 
@@ -33,7 +33,7 @@ class VesselIE(InfoExtractor):
     @staticmethod
     def make_json_request(url, data):
         payload = json.dumps(data).encode('utf-8')
-        req = compat_urllib_request.Request(url, payload)
+        req = sanitized_Request(url, payload)
         req.add_header('Content-Type', 'application/json; charset=utf-8')
         return req
 
diff --git a/youtube_dl/extractor/vesti.py b/youtube_dl/extractor/vesti.py
index a0c59a2e0e1cb8fca2e0e3eb3ec2e4edce2918bb..cb64ae0bd07cdca051eb3aa10550840a296ded85 100644 (file)
@@ -10,7 +10,7 @@ from .rutv import RUTVIE
 
 class VestiIE(InfoExtractor):
     IE_DESC = 'Вести.Ru'
-    _VALID_URL = r'http://(?:.+?\.)?vesti\.ru/(?P<id>.+)'
+    _VALID_URL = r'https?://(?:.+?\.)?vesti\.ru/(?P<id>.+)'
 
     _TESTS = [
         {
diff --git a/youtube_dl/extractor/vevo.py b/youtube_dl/extractor/vevo.py
index c17094f8193f7678cc3d0a912c3d970f38e6bf7c..147480f6465513066db58ce3cf32e194c4ff8490 100644 (file)
@@ -1,23 +1,22 @@
 from __future__ import unicode_literals
 
 import re
-import xml.etree.ElementTree
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-)
+from ..compat import compat_etree_fromstring
 from ..utils import (
     ExtractorError,
     int_or_none,
+    sanitized_Request,
+    parse_iso8601,
 )
 
 
 class VevoIE(InfoExtractor):
-    """
+    '''
     Accepts urls from vevo.com or in the format 'vevo:{id}'
     (currently used by MTVIE and MySpaceIE)
-    """
+    '''
     _VALID_URL = r'''(?x)
         (?:https?://www\.vevo\.com/watch/(?:[^/]+/(?:[^/]+/)?)?|
            https?://cache\.vevo\.com/m/html/embed\.html\?video=|
@@ -27,19 +26,15 @@ class VevoIE(InfoExtractor):
 
     _TESTS = [{
         'url': 'http://www.vevo.com/watch/hurts/somebody-to-die-for/GB1101300280',
-        "md5": "95ee28ee45e70130e3ab02b0f579ae23",
+        'md5': '95ee28ee45e70130e3ab02b0f579ae23',
         'info_dict': {
             'id': 'GB1101300280',
             'ext': 'mp4',
-            "upload_date": "20130624",
-            "uploader": "Hurts",
-            "title": "Somebody to Die For",
-            "duration": 230.12,
-            "width": 1920,
-            "height": 1080,
-            # timestamp and upload_date are often incorrect; seem to change randomly
-            'timestamp': int,
-        }
+            'title': 'Somebody to Die For',
+            'upload_date': '20130624',
+            'uploader': 'Hurts',
+            'timestamp': 1372057200,
+        },
     }, {
         'note': 'v3 SMIL format',
         'url': 'http://www.vevo.com/watch/cassadee-pope/i-wish-i-could-break-your-heart/USUV71302923',
@@ -47,82 +42,70 @@ class VevoIE(InfoExtractor):
         'info_dict': {
             'id': 'USUV71302923',
             'ext': 'mp4',
+            'title': 'I Wish I Could Break Your Heart',
             'upload_date': '20140219',
             'uploader': 'Cassadee Pope',
-            'title': 'I Wish I Could Break Your Heart',
-            'duration': 226.101,
-            'age_limit': 0,
-            'timestamp': int,
-        }
+            'timestamp': 1392796919,
+        },
     }, {
         'note': 'Age-limited video',
         'url': 'https://www.vevo.com/watch/justin-timberlake/tunnel-vision-explicit/USRV81300282',
         'info_dict': {
             'id': 'USRV81300282',
             'ext': 'mp4',
-            'age_limit': 18,
             'title': 'Tunnel Vision (Explicit)',
+            'upload_date': '20130703',
+            'age_limit': 18,
             'uploader': 'Justin Timberlake',
-            'upload_date': 're:2013070[34]',
-            'timestamp': int,
+            'timestamp': 1372888800,
+        },
+    }, {
+        'note': 'No video_info',
+        'url': 'http://www.vevo.com/watch/k-camp-1/Till-I-Die/USUV71503000',
+        'md5': '8b83cc492d72fc9cf74a02acee7dc1b0',
+        'info_dict': {
+            'id': 'USUV71503000',
+            'ext': 'mp4',
+            'title': 'Till I Die',
+            'upload_date': '20151207',
+            'age_limit': 18,
+            'uploader': 'K Camp',
+            'timestamp': 1449468000,
         },
-        'params': {
-            'skip_download': 'true',
-        }
     }]
-    _SMIL_BASE_URL = 'http://smil.lvl3.vevo.com/'
+    _SMIL_BASE_URL = 'http://smil.lvl3.vevo.com'
+    _SOURCE_TYPES = {
+        0: 'youtube',
+        1: 'brightcove',
+        2: 'http',
+        3: 'hls_ios',
+        4: 'hls',
+        5: 'smil',  # http
+        7: 'f4m_cc',
+        8: 'f4m_ak',
+        9: 'f4m_l3',
+        10: 'ism',
+        13: 'smil',  # rtmp
+        18: 'dash',
+    }
+    _VERSIONS = {
+        0: 'youtube',  # only in AuthenticateVideo videoVersions
+        1: 'level3',
+        2: 'akamai',
+        3: 'level3',
+        4: 'amazon',
+    }
 
-    def _real_initialize(self):
-        req = compat_urllib_request.Request(
-            'http://www.vevo.com/auth', data=b'')
-        webpage = self._download_webpage(
-            req, None,
-            note='Retrieving oauth token',
-            errnote='Unable to retrieve oauth token',
-            fatal=False)
-        if webpage is False:
-            self._oauth_token = None
-        else:
-            self._oauth_token = self._search_regex(
-                r'access_token":\s*"([^"]+)"',
-                webpage, 'access token', fatal=False)
-
-    def _formats_from_json(self, video_info):
-        last_version = {'version': -1}
-        for version in video_info['videoVersions']:
-            # These are the HTTP downloads, other types are for different manifests
-            if version['sourceType'] == 2:
-                if version['version'] > last_version['version']:
-                    last_version = version
-        if last_version['version'] == -1:
-            raise ExtractorError('Unable to extract last version of the video')
-
-        renditions = xml.etree.ElementTree.fromstring(last_version['data'])
+    def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
         formats = []
-        # Already sorted from worst to best quality
-        for rend in renditions.findall('rendition'):
-            attr = rend.attrib
-            format_note = '%(videoCodec)s@%(videoBitrate)4sk, %(audioCodec)s@%(audioBitrate)3sk' % attr
-            formats.append({
-                'url': attr['url'],
-                'format_id': attr['name'],
-                'format_note': format_note,
-                'height': int(attr['frameheight']),
-                'width': int(attr['frameWidth']),
-            })
-        return formats
-
-    def _formats_from_smil(self, smil_xml):
-        formats = []
-        smil_doc = xml.etree.ElementTree.fromstring(smil_xml.encode('utf-8'))
-        els = smil_doc.findall('.//{http://www.w3.org/2001/SMIL20/Language}video')
+        els = smil.findall('.//{http://www.w3.org/2001/SMIL20/Language}video')
         for el in els:
             src = el.attrib['src']
             m = re.match(r'''(?xi)
                 (?P<ext>[a-z0-9]+):
                 (?P<path>
                     [/a-z0-9]+     # The directory and main part of the URL
-                    _(?P<cbr>[0-9]+)k
+                    _(?P<tbr>[0-9]+)k
                     _(?P<width>[0-9]+)x(?P<height>[0-9]+)
                     _(?P<vcodec>[a-z0-9]+)
                     _(?P<vbr>[0-9]+)
@@ -136,9 +119,10 @@ class VevoIE(InfoExtractor):
             format_url = self._SMIL_BASE_URL + m.group('path')
             formats.append({
                 'url': format_url,
-                'format_id': 'SMIL_' + m.group('cbr'),
+                'format_id': 'smil_' + m.group('tbr'),
                 'vcodec': m.group('vcodec'),
                 'acodec': m.group('acodec'),
+                'tbr': int(m.group('tbr')),
                 'vbr': int(m.group('vbr')),
                 'abr': int(m.group('abr')),
                 'ext': m.group('ext'),
@@ -147,40 +131,154 @@ class VevoIE(InfoExtractor):
             })
         return formats
 
-    def _download_api_formats(self, video_id):
-        if not self._oauth_token:
-            self._downloader.report_warning(
-                'No oauth token available, skipping API HLS download')
-            return []
-
-        api_url = 'https://apiv2.vevo.com/video/%s/streams/hls?token=%s' % (
-            video_id, self._oauth_token)
-        api_data = self._download_json(
-            api_url, video_id,
-            note='Downloading HLS formats',
-            errnote='Failed to download HLS format list', fatal=False)
-        if api_data is None:
-            return []
-
-        m3u8_url = api_data[0]['url']
-        return self._extract_m3u8_formats(
-            m3u8_url, video_id, entry_protocol='m3u8_native', ext='mp4',
-            preference=0)
+    def _initialize_api(self, video_id):
+        req = sanitized_Request(
+            'http://www.vevo.com/auth', data=b'')
+        webpage = self._download_webpage(
+            req, None,
+            note='Retrieving oauth token',
+            errnote='Unable to retrieve oauth token')
+
+        if 'THIS PAGE IS CURRENTLY UNAVAILABLE IN YOUR REGION' in webpage:
+            raise ExtractorError(
+                '%s said: This page is currently unavailable in your region.' % self.IE_NAME, expected=True)
+
+        auth_info = self._parse_json(webpage, video_id)
+        self._api_url_template = self.http_scheme() + '//apiv2.vevo.com/%s?token=' + auth_info['access_token']
+
+    def _call_api(self, path, video_id, note, errnote, fatal=True):
+        return self._download_json(self._api_url_template % path, video_id, note, errnote, fatal=fatal)
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
 
-        json_url = 'http://videoplayer.vevo.com/VideoService/AuthenticateVideo?isrc=%s' % video_id
-        response = self._download_json(json_url, video_id)
-        video_info = response['video']
+        json_url = 'http://api.vevo.com/VideoService/AuthenticateVideo?isrc=%s' % video_id
+        response = self._download_json(
+            json_url, video_id, 'Downloading video info', 'Unable to download info')
+        video_info = response.get('video') or {}
+        video_versions = video_info.get('videoVersions')
+        uploader = None
+        timestamp = None
+        view_count = None
+        formats = []
 
         if not video_info:
-            if 'statusMessage' in response:
-                raise ExtractorError('%s said: %s' % (self.IE_NAME, response['statusMessage']), expected=True)
-            raise ExtractorError('Unable to extract videos')
+            if response.get('statusCode') != 909:
+                ytid = response.get('errorInfo', {}).get('ytid')
+                if ytid:
+                    self.report_warning(
+                        'Video is geoblocked, trying with the YouTube video %s' % ytid)
+                    return self.url_result(ytid, 'Youtube', ytid)
+
+                if 'statusMessage' in response:
+                    raise ExtractorError('%s said: %s' % (
+                        self.IE_NAME, response['statusMessage']), expected=True)
+                raise ExtractorError('Unable to extract videos')
 
-        formats = self._formats_from_json(video_info)
+            self._initialize_api(video_id)
+            video_info = self._call_api(
+                'video/%s' % video_id, video_id, 'Downloading api video info',
+                'Failed to download video info')
+
+            video_versions = self._call_api(
+                'video/%s/streams' % video_id, video_id,
+                'Downloading video versions info',
+                'Failed to download video versions info')
+
+            timestamp = parse_iso8601(video_info.get('releaseDate'))
+            artists = video_info.get('artists')
+            if artists:
+                uploader = artists[0]['name']
+            view_count = int_or_none(video_info.get('views', {}).get('total'))
+
+            for video_version in video_versions:
+                version = self._VERSIONS.get(video_version['version'])
+                version_url = video_version.get('url')
+                if not version_url:
+                    continue
+
+                if '.ism' in version_url:
+                    continue
+                elif '.mpd' in version_url:
+                    formats.extend(self._extract_mpd_formats(
+                        version_url, video_id, mpd_id='dash-%s' % version,
+                        note='Downloading %s MPD information' % version,
+                        errnote='Failed to download %s MPD information' % version,
+                        fatal=False))
+                elif '.m3u8' in version_url:
+                    formats.extend(self._extract_m3u8_formats(
+                        version_url, video_id, 'mp4', 'm3u8_native',
+                        m3u8_id='hls-%s' % version,
+                        note='Downloading %s m3u8 information' % version,
+                        errnote='Failed to download %s m3u8 information' % version,
+                        fatal=False))
+                else:
+                    m = re.search(r'''(?xi)
+                        _(?P<width>[0-9]+)x(?P<height>[0-9]+)
+                        _(?P<vcodec>[a-z0-9]+)
+                        _(?P<vbr>[0-9]+)
+                        _(?P<acodec>[a-z0-9]+)
+                        _(?P<abr>[0-9]+)
+                        \.(?P<ext>[a-z0-9]+)''', version_url)
+                    if not m:
+                        continue
+
+                    formats.append({
+                        'url': version_url,
+                        'format_id': 'http-%s-%s' % (version, video_version['quality']),
+                        'vcodec': m.group('vcodec'),
+                        'acodec': m.group('acodec'),
+                        'vbr': int(m.group('vbr')),
+                        'abr': int(m.group('abr')),
+                        'ext': m.group('ext'),
+                        'width': int(m.group('width')),
+                        'height': int(m.group('height')),
+                    })
+        else:
+            timestamp = int_or_none(self._search_regex(
+                r'/Date\((\d+)\)/',
+                video_info['releaseDate'], 'release date', fatal=False),
+                scale=1000)
+            artists = video_info.get('mainArtists')
+            if artists:
+                uploader = artists[0]['artistName']
+
+            smil_parsed = False
+            for video_version in video_info['videoVersions']:
+                version = self._VERSIONS.get(video_version['version'])
+                if version == 'youtube':
+                    continue
+                else:
+                    source_type = self._SOURCE_TYPES.get(video_version['sourceType'])
+                    renditions = compat_etree_fromstring(video_version['data'])
+                    if source_type == 'http':
+                        for rend in renditions.findall('rendition'):
+                            attr = rend.attrib
+                            formats.append({
+                                'url': attr['url'],
+                                'format_id': 'http-%s-%s' % (version, attr['name']),
+                                'height': int_or_none(attr.get('frameheight')),
+                                'width': int_or_none(attr.get('frameWidth')),
+                                'tbr': int_or_none(attr.get('totalBitrate')),
+                                'vbr': int_or_none(attr.get('videoBitrate')),
+                                'abr': int_or_none(attr.get('audioBitrate')),
+                                'vcodec': attr.get('videoCodec'),
+                                'acodec': attr.get('audioCodec'),
+                            })
+                    elif source_type == 'hls':
+                        formats.extend(self._extract_m3u8_formats(
+                            renditions.find('rendition').attrib['url'], video_id,
+                            'mp4', 'm3u8_native', m3u8_id='hls-%s' % version,
+                            note='Downloading %s m3u8 information' % version,
+                            errnote='Failed to download %s m3u8 information' % version,
+                            fatal=False))
+                    elif source_type == 'smil' and version == 'level3' and not smil_parsed:
+                        formats.extend(self._extract_smil_formats(
+                            renditions.find('rendition').attrib['url'], video_id, False))
+                        smil_parsed = True
+        self._sort_formats(formats)
+
+        title = video_info['title']
 
         is_explicit = video_info.get('isExplicit')
         if is_explicit is True:
@@ -190,40 +288,16 @@ class VevoIE(InfoExtractor):
         else:
             age_limit = None
 
-        # Download via HLS API
-        formats.extend(self._download_api_formats(video_id))
-
-        # Download SMIL
-        smil_blocks = sorted((
-            f for f in video_info['videoVersions']
-            if f['sourceType'] == 13),
-            key=lambda f: f['version'])
-        smil_url = '%s/Video/V2/VFILE/%s/%sr.smil' % (
-            self._SMIL_BASE_URL, video_id, video_id.lower())
-        if smil_blocks:
-            smil_url_m = self._search_regex(
-                r'url="([^"]+)"', smil_blocks[-1]['data'], 'SMIL URL',
-                default=None)
-            if smil_url_m is not None:
-                smil_url = smil_url_m
-        if smil_url:
-            smil_xml = self._download_webpage(
-                smil_url, video_id, 'Downloading SMIL info', fatal=False)
-            if smil_xml:
-                formats.extend(self._formats_from_smil(smil_xml))
-
-        self._sort_formats(formats)
-        timestamp_ms = int_or_none(self._search_regex(
-            r'/Date\((\d+)\)/',
-            video_info['launchDate'], 'launch date', fatal=False))
+        duration = video_info.get('duration')
 
         return {
             'id': video_id,
-            'title': video_info['title'],
+            'title': title,
             'formats': formats,
-            'thumbnail': video_info['imageUrl'],
-            'timestamp': timestamp_ms // 1000,
-            'uploader': video_info['mainArtists'][0]['artistName'],
-            'duration': video_info['duration'],
+            'thumbnail': video_info.get('imageUrl') or video_info.get('thumbnailUrl'),
+            'timestamp': timestamp,
+            'uploader': uploader,
+            'duration': duration,
+            'view_count': view_count,
             'age_limit': age_limit,
         }
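
The rendition loop added above routes every numeric XML attribute through youtube-dl's int_or_none helper so that missing or malformed values degrade to None instead of raising. A minimal standalone sketch of the same idea, with an invented renditions snippet and a simplified helper (the attribute names mirror the ones read above, including their mixed casing):

import xml.etree.ElementTree as ET

def int_or_none(v):
    # Simplified stand-in: return int(v) when v parses cleanly, else None.
    try:
        return int(v) if v is not None else None
    except (TypeError, ValueError):
        return None

# Invented sample in the shape the extractor walks.
SAMPLE = '''<renditions>
    <rendition name="high" url="http://example.com/v.mp4"
               frameheight="720" frameWidth="1280" totalBitrate="2500"/>
</renditions>'''

formats = []
for rend in ET.fromstring(SAMPLE).findall('rendition'):
    attr = rend.attrib
    formats.append({
        'url': attr['url'],
        'height': int_or_none(attr.get('frameheight')),
        'width': int_or_none(attr.get('frameWidth')),
        'tbr': int_or_none(attr.get('totalBitrate')),
    })
print(formats)
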
index f38a72fde8974a7a1ea290de04281f67079b1a16..b11cd254c7da9c8c780dedd2b2db120f8025c74b 100644 (file)
@@ -4,26 +4,49 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
+from .xstream import XstreamIE
 from ..utils import (
     ExtractorError,
     float_or_none,
 )
 
 
-class VGTVIE(InfoExtractor):
-    IE_DESC = 'VGTV and BTTV'
+class VGTVIE(XstreamIE):
+    IE_DESC = 'VGTV, BTTV, FTV, Aftenposten and Aftonbladet'
+
+    _HOST_TO_APPNAME = {
+        'vgtv.no': 'vgtv',
+        'bt.no/tv': 'bttv',
+        'aftenbladet.no/tv': 'satv',
+        'fvn.no/fvntv': 'fvntv',
+        'aftenposten.no/webtv': 'aptv',
+        'ap.vgtv.no/webtv': 'aptv',
+    }
+
+    _APP_NAME_TO_VENDOR = {
+        'vgtv': 'vgtv',
+        'bttv': 'bt',
+        'satv': 'sa',
+        'fvntv': 'fvn',
+        'aptv': 'ap',
+    }
+
     _VALID_URL = r'''(?x)
-                    (?:
-                        vgtv:|
-                        http://(?:www\.)?
+                    (?:https?://(?:www\.)?
+                    (?P<host>
+                        %s
                     )
-                    (?P<host>vgtv|bt)
+                    /?
                     (?:
-                        :|
-                        \.no/(?:tv/)?\#!/(?:video|live)/
-                    )
-                    (?P<id>[0-9]+)
-                    '''
+                        \#!/(?:video|live)/|
+                        embed?.*id=
+                    )|
+                    (?P<appname>
+                        %s
+                    ):)
+                    (?P<id>\d+)
+                    ''' % ('|'.join(_HOST_TO_APPNAME.keys()), '|'.join(_APP_NAME_TO_VENDOR.keys()))
+
     _TESTS = [
         {
             # streamType: vod
@@ -59,17 +82,18 @@ class VGTVIE(InfoExtractor):
                 # m3u8 download
                 'skip_download': True,
             },
+            'skip': 'Video is no longer available',
         },
         {
-            # streamType: live
+            # streamType: wasLive
             'url': 'http://www.vgtv.no/#!/live/113063/direkte-v75-fra-solvalla',
             'info_dict': {
                 'id': '113063',
-                'ext': 'flv',
-                'title': 're:^DIREKTE: V75 fra Solvalla [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+                'ext': 'mp4',
+                'title': 'V75 fra Solvalla 30.05.15',
                 'description': 'md5:b3743425765355855f88e096acc93231',
                 'thumbnail': 're:^https?://.*\.jpg',
-                'duration': 0,
+                'duration': 25966,
                 'timestamp': 1432975582,
                 'upload_date': '20150530',
                 'view_count': int,
@@ -79,31 +103,57 @@ class VGTVIE(InfoExtractor):
                 'skip_download': True,
             },
         },
+        {
+            'url': 'http://www.aftenposten.no/webtv/#!/video/21039/trailer-sweatshop-i-can-t-take-any-more',
+            'md5': 'fd828cd29774a729bf4d4425fe192972',
+            'info_dict': {
+                'id': '21039',
+                'ext': 'mp4',
+                'title': 'TRAILER: «SWEATSHOP» - I can´t take any more',
+                'description': 'md5:21891f2b0dd7ec2f78d84a50e54f8238',
+                'duration': 66,
+                'timestamp': 1417002452,
+                'upload_date': '20141126',
+                'view_count': int,
+            },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            },
+        },
         {
             'url': 'http://www.bt.no/tv/#!/video/100250/norling-dette-er-forskjellen-paa-1-divisjon-og-eliteserien',
             'only_matching': True,
         },
+        {
+            'url': 'http://ap.vgtv.no/webtv#!/video/111084/de-nye-bysyklene-lettere-bedre-gir-stoerre-hjul-og-feste-til-mobil',
+            'only_matching': True,
+        },
     ]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
         host = mobj.group('host')
-
-        HOST_WEBSITES = {
-            'vgtv': 'vgtv',
-            'bt': 'bttv',
-        }
+        appname = self._HOST_TO_APPNAME[host] if host else mobj.group('appname')
+        vendor = self._APP_NAME_TO_VENDOR[appname]
 
         data = self._download_json(
             'http://svp.vg.no/svp/api/v1/%s/assets/%s?appName=%s-website'
-            % (host, video_id, HOST_WEBSITES[host]),
+            % (vendor, video_id, appname),
             video_id, 'Downloading media JSON')
 
         if data.get('status') == 'inactive':
             raise ExtractorError(
                 'Video %s is no longer available' % video_id, expected=True)
 
+        info = {
+            'formats': [],
+        }
+        if len(video_id) == 5:
+            if appname == 'bttv':
+                info = self._extract_video_info('btno', video_id)
+
         streams = data['streamUrls']
         stream_type = data.get('streamType')
 
@@ -112,56 +162,62 @@ class VGTVIE(InfoExtractor):
         hls_url = streams.get('hls')
         if hls_url:
             formats.extend(self._extract_m3u8_formats(
-                hls_url, video_id, 'mp4', m3u8_id='hls'))
+                hls_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
 
         hds_url = streams.get('hds')
-        # wasLive hds are always 404
-        if hds_url and stream_type != 'wasLive':
-            formats.extend(self._extract_f4m_formats(
-                hds_url + '?hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
-                video_id, f4m_id='hds'))
+        if hds_url:
+            hdcore_sign = 'hdcore=3.7.0'
+            f4m_formats = self._extract_f4m_formats(
+                hds_url + '?%s' % hdcore_sign, video_id, f4m_id='hds', fatal=False)
+            if f4m_formats:
+                for entry in f4m_formats:
+                    # URLs without the extra param induce a 404 error
+                    entry.update({'extra_param_to_segment_url': hdcore_sign})
+                    formats.append(entry)
 
+        mp4_urls = streams.get('pseudostreaming') or []
         mp4_url = streams.get('mp4')
         if mp4_url:
-            _url = hls_url or hds_url
-            MP4_URL_TEMPLATE = '%s/%%s.%s' % (mp4_url.rpartition('/')[0], mp4_url.rpartition('.')[-1])
-            for mp4_format in _url.split(','):
-                m = re.search('(?P<width>\d+)_(?P<height>\d+)_(?P<vbr>\d+)', mp4_format)
-                if not m:
-                    continue
-                width = int(m.group('width'))
-                height = int(m.group('height'))
-                vbr = int(m.group('vbr'))
-                formats.append({
-                    'url': MP4_URL_TEMPLATE % mp4_format,
-                    'format_id': 'mp4-%s' % vbr,
-                    'width': width,
-                    'height': height,
-                    'vbr': vbr,
-                    'preference': 1,
+            mp4_urls.append(mp4_url)
+        for mp4_url in mp4_urls:
+            format_info = {
+                'url': mp4_url,
+            }
+            mobj = re.search(r'(\d+)_(\d+)_(\d+)', mp4_url)
+            if mobj:
+                tbr = int(mobj.group(3))
+                format_info.update({
+                    'width': int(mobj.group(1)),
+                    'height': int(mobj.group(2)),
+                    'tbr': tbr,
+                    'format_id': 'mp4-%s' % tbr,
                 })
-        self._sort_formats(formats)
+            formats.append(format_info)
+
+        info['formats'].extend(formats)
+
+        self._sort_formats(info['formats'])
 
-        return {
+        info.update({
             'id': video_id,
-            'title': self._live_title(data['title']),
+            'title': self._live_title(data['title']) if stream_type == 'live' else data['title'],
             'description': data['description'],
             'thumbnail': data['images']['main'] + '?t[]=900x506q80',
             'timestamp': data['published'],
             'duration': float_or_none(data['duration'], 1000),
             'view_count': data['displays'],
-            'formats': formats,
             'is_live': True if stream_type == 'live' else False,
-        }
+        })
+        return info
 
 
 class BTArticleIE(InfoExtractor):
     IE_NAME = 'bt:article'
     IE_DESC = 'Bergens Tidende Articles'
-    _VALID_URL = 'http://(?:www\.)?bt\.no/(?:[^/]+/)+(?P<id>[^/]+)-\d+\.html'
+    _VALID_URL = r'https?://(?:www\.)?bt\.no/(?:[^/]+/)+(?P<id>[^/]+)-\d+\.html'
     _TEST = {
         'url': 'http://www.bt.no/nyheter/lokalt/Kjemper-for-internatet-1788214.html',
-        'md5': 'd055e8ee918ef2844745fcfd1a4175fb',
+        'md5': '2acbe8ad129b3469d5ae51b1158878df',
         'info_dict': {
             'id': '23199',
             'ext': 'mp4',
@@ -178,15 +234,15 @@ class BTArticleIE(InfoExtractor):
     def _real_extract(self, url):
         webpage = self._download_webpage(url, self._match_id(url))
         video_id = self._search_regex(
-            r'SVP\.Player\.load\(\s*(\d+)', webpage, 'video id')
-        return self.url_result('vgtv:bt:%s' % video_id, 'VGTV')
+            r'<video[^>]+data-id="(\d+)"', webpage, 'video id')
+        return self.url_result('bttv:%s' % video_id, 'VGTV')
 
 
 class BTVestlendingenIE(InfoExtractor):
     IE_NAME = 'bt:vestlendingen'
     IE_DESC = 'Bergens Tidende - Vestlendingen'
-    _VALID_URL = 'http://(?:www\.)?bt\.no/spesial/vestlendingen/#!/(?P<id>\d+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?bt\.no/spesial/vestlendingen/#!/(?P<id>\d+)'
+    _TESTS = [{
         'url': 'http://www.bt.no/spesial/vestlendingen/#!/86588',
         'md5': 'd7d17e3337dc80de6d3a540aefbe441b',
         'info_dict': {
@@ -197,7 +253,19 @@ class BTVestlendingenIE(InfoExtractor):
             'timestamp': 1430473209,
             'upload_date': '20150501',
         },
-    }
+        'skip': '404 Error',
+    }, {
+        'url': 'http://www.bt.no/spesial/vestlendingen/#!/86255',
+        'md5': 'a2893f8632e96389f4bdf36aa9463ceb',
+        'info_dict': {
+            'id': '86255',
+            'ext': 'mov',
+            'title': 'Du må tåle å fryse og være sulten',
+            'description': 'md5:b8046f4d022d5830ddab04865791d063',
+            'upload_date': '20150321',
+            'timestamp': 1426942023,
+        },
+    }]
 
     def _real_extract(self, url):
-        return self.url_result('xstream:btno:%s' % self._match_id(url), 'Xstream')
+        return self.url_result('bttv:%s' % self._match_id(url), 'VGTV')
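
The new pseudostreaming branch in VGTVIE recovers width, height and total bitrate from a "<width>_<height>_<bitrate>" token embedded in the mp4 URL, and falls back to a bare URL entry when the token is absent. A self-contained sketch of that parsing (the sample URL is made up):

import re

def mp4_format_info(mp4_url):
    # Pull "<width>_<height>_<bitrate>" out of the URL when present,
    # as the loop above does; otherwise keep just the URL.
    info = {'url': mp4_url}
    m = re.search(r'(\d+)_(\d+)_(\d+)', mp4_url)
    if m:
        tbr = int(m.group(3))
        info.update({
            'width': int(m.group(1)),
            'height': int(m.group(2)),
            'tbr': tbr,
            'format_id': 'mp4-%s' % tbr,
        })
    return info

print(mp4_format_info('http://cdn.example.com/clip_1280_720_2500.mp4'))
# {'url': ..., 'width': 1280, 'height': 720, 'tbr': 2500, 'format_id': 'mp4-2500'}
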
index 01af7a99574401b38e487b01dd5104e674740bbc..95daf4dfdf2155dbbab26f2896cf3c42e0f33e2f 100644 (file)
@@ -1,30 +1,44 @@
 from __future__ import unicode_literals
 
+import re
+
 from .common import InfoExtractor
-from .ooyala import OoyalaIE
 from ..utils import ExtractorError
 
 
 class ViceIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:.+?\.)?vice\.com/(?:[^/]+/)+(?P<id>.+)'
-
-    _TESTS = [
-        {
-            'url': 'http://www.vice.com/Fringes/cowboy-capitalists-part-1',
-            'info_dict': {
-                'id': '43cW1mYzpia9IlestBjVpd23Yu3afAfp',
-                'ext': 'mp4',
-                'title': 'VICE_COWBOYCAPITALISTS_PART01_v1_VICE_WM_1080p.mov',
-            },
-            'params': {
-                # Requires ffmpeg (m3u8 manifest)
-                'skip_download': True,
-            },
-        }, {
-            'url': 'https://news.vice.com/video/experimenting-on-animals-inside-the-monkey-lab',
-            'only_matching': True,
-        }
-    ]
+    _VALID_URL = r'https?://(?:.+?\.)?vice\.com/(?:[^/]+/)?videos?/(?P<id>[^/?#&]+)'
+
+    _TESTS = [{
+        'url': 'http://www.vice.com/video/cowboy-capitalists-part-1',
+        'info_dict': {
+            'id': '43cW1mYzpia9IlestBjVpd23Yu3afAfp',
+            'ext': 'flv',
+            'title': 'VICE_COWBOYCAPITALISTS_PART01_v1_VICE_WM_1080p.mov',
+            'duration': 725.983,
+        },
+    }, {
+        'url': 'http://www.vice.com/video/how-to-hack-a-car',
+        'md5': '6fb2989a3fed069fb8eab3401fc2d3c9',
+        'info_dict': {
+            'id': '3jstaBeXgAs',
+            'ext': 'mp4',
+            'title': 'How to Hack a Car: Phreaked Out (Episode 2)',
+            'description': 'md5:ee95453f7ff495db8efe14ae8bf56f30',
+            'uploader_id': 'MotherboardTV',
+            'uploader': 'Motherboard',
+            'upload_date': '20140529',
+        },
+    }, {
+        'url': 'https://news.vice.com/video/experimenting-on-animals-inside-the-monkey-lab',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.vice.com/ru/video/big-night-out-ibiza-clive-martin-229',
+        'only_matching': True,
+    }, {
+        'url': 'https://munchies.vice.com/en/videos/watch-the-trailer-for-our-new-series-the-pizza-show',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
@@ -32,8 +46,43 @@ class ViceIE(InfoExtractor):
         try:
             embed_code = self._search_regex(
                 r'embedCode=([^&\'"]+)', webpage,
-                'ooyala embed code')
-            ooyala_url = OoyalaIE._url_for_embed_code(embed_code)
+                'ooyala embed code', default=None)
+            if embed_code:
+                return self.url_result('ooyala:%s' % embed_code, 'Ooyala')
+            youtube_id = self._search_regex(
+                r'data-youtube-id="([^"]+)"', webpage, 'youtube id')
+            return self.url_result(youtube_id, 'Youtube')
         except ExtractorError:
             raise ExtractorError('The page doesn\'t contain a video', expected=True)
-        return self.url_result(ooyala_url, ie='Ooyala')
+
+
+class ViceShowIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:.+?\.)?vice\.com/(?:[^/]+/)?show/(?P<id>[^/?#&]+)'
+
+    _TEST = {
+        'url': 'https://munchies.vice.com/en/show/fuck-thats-delicious-2',
+        'info_dict': {
+            'id': 'fuck-thats-delicious-2',
+            'title': "Fuck, That's Delicious",
+            'description': 'Follow the culinary adventures of rapper Action Bronson during his ongoing world tour.',
+        },
+        'playlist_count': 17,
+    }
+
+    def _real_extract(self, url):
+        show_id = self._match_id(url)
+        webpage = self._download_webpage(url, show_id)
+
+        entries = [
+            self.url_result(video_url, ViceIE.ie_key())
+            for video_url, _ in re.findall(
+                r'<h2[^>]+class="article-title"[^>]+data-id="\d+"[^>]*>\s*<a[^>]+href="(%s.*?)"'
+                % ViceIE._VALID_URL, webpage)]
+
+        title = self._search_regex(
+            r'<title>(.+?)</title>', webpage, 'title', default=None)
+        if title:
+            title = re.sub(r'(.+)\s*\|\s*.+$', r'\1', title).strip()
+        description = self._html_search_meta('description', webpage, 'description')
+
+        return self.playlist_result(entries, show_id, title, description)
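
The rewritten ViceIE tries the Ooyala embed code first and only then falls back to a YouTube id; passing default=None to the first probe is what lets it miss without raising. A toy sketch of that fallback chain, with a simplified stand-in for _search_regex and an invented page snippet:

import re

_MISSING = object()

def search_regex(pattern, text, default=_MISSING):
    # Rough stand-in for InfoExtractor._search_regex: with default=None
    # a miss returns None instead of raising.
    m = re.search(pattern, text)
    if m:
        return m.group(1)
    if default is _MISSING:
        raise ValueError('unable to extract')
    return default

PAGE = '<div class="player" data-youtube-id="3jstaBeXgAs"></div>'

embed_code = search_regex(r'embedCode=([^&\'"]+)', PAGE, default=None)
if embed_code:
    result = 'ooyala:%s' % embed_code
else:
    result = search_regex(r'data-youtube-id="([^"]+)"', PAGE)
print(result)  # 3jstaBeXgAs
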
index 8516a2940cb38c7e030e504e6a29f88fcd3946a1..8d92aee878d3ad0c0d5725db755451c88e527f66 100644 (file)
@@ -1,12 +1,14 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_parse_urlencode,
+    compat_urlparse,
+)
 from ..utils import (
     float_or_none,
     int_or_none,
-)
-from ..compat import (
-    compat_urllib_request
+    sanitized_Request,
 )
 
 
@@ -14,10 +16,10 @@ class ViddlerIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?viddler\.com/(?:v|embed|player)/(?P<id>[a-z0-9]+)'
     _TESTS = [{
         'url': 'http://www.viddler.com/v/43903784',
-        'md5': 'ae43ad7cb59431ce043f0ff7fa13cbf4',
+        'md5': '9eee21161d2c7f5b39690c3e325fab2f',
         'info_dict': {
             'id': '43903784',
-            'ext': 'mp4',
+            'ext': 'mov',
             'title': 'Video Made Easy',
             'description': 'md5:6a697ebd844ff3093bd2e82c37b409cd',
             'uploader': 'viddler',
@@ -31,10 +33,10 @@ class ViddlerIE(InfoExtractor):
         }
     }, {
         'url': 'http://www.viddler.com/v/4d03aad9/',
-        'md5': 'faa71fbf70c0bee7ab93076fd007f4b0',
+        'md5': 'f12c5a7fa839c47a79363bfdf69404fb',
         'info_dict': {
             'id': '4d03aad9',
-            'ext': 'mp4',
+            'ext': 'ts',
             'title': 'WALL-TO-GORTAT',
             'upload_date': '20150126',
             'uploader': 'deadspin',
@@ -44,10 +46,10 @@ class ViddlerIE(InfoExtractor):
         }
     }, {
         'url': 'http://www.viddler.com/player/221ebbbd/0/',
-        'md5': '0defa2bd0ea613d14a6e9bd1db6be326',
+        'md5': '740511f61d3d1bb71dc14a0fe01a1c10',
         'info_dict': {
             'id': '221ebbbd',
-            'ext': 'mp4',
+            'ext': 'mov',
             'title': 'LETeens-Grammar-snack-third-conditional',
             'description': ' ',
             'upload_date': '20140929',
@@ -56,16 +58,42 @@ class ViddlerIE(InfoExtractor):
             'view_count': int,
             'comment_count': int,
         }
+    }, {
+        # secret protected
+        'url': 'http://www.viddler.com/v/890c0985?secret=34051570',
+        'info_dict': {
+            'id': '890c0985',
+            'ext': 'mp4',
+            'title': 'Complete Property Training - Traineeships',
+            'description': ' ',
+            'upload_date': '20130606',
+            'uploader': 'TiffanyBowtell',
+            'timestamp': 1370496993,
+            'view_count': int,
+            'comment_count': int,
+        },
+        'params': {
+            'skip_download': True,
+        },
     }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        json_url = (
-            'http://api.viddler.com/api/v2/viddler.videos.getPlaybackDetails.json?video_id=%s&key=v0vhrt7bg2xq1vyxhkct' %
-            video_id)
+        query = {
+            'video_id': video_id,
+            'key': 'v0vhrt7bg2xq1vyxhkct',
+        }
+
+        qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+        secret = qs.get('secret', [None])[0]
+        if secret:
+            query['secret'] = secret
+
         headers = {'Referer': 'http://static.cdn-ec.viddler.com/js/arpeggio/v2/embed.html'}
-        request = compat_urllib_request.Request(json_url, None, headers)
+        request = sanitized_Request(
+            'http://api.viddler.com/api/v2/viddler.videos.getPlaybackDetails.json?%s'
+            % compat_urllib_parse_urlencode(query), None, headers)
         data = self._download_json(request, video_id)['video']
 
         formats = []
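
The new request builder above forwards an optional ?secret= from the page URL into the API query. A Python 3 equivalent of the compat calls (urllib.parse stands in for compat_urlparse and compat_urllib_parse_urlencode; the API key is the public one already shown in the diff):

from urllib.parse import parse_qs, urlencode, urlparse

API = 'http://api.viddler.com/api/v2/viddler.videos.getPlaybackDetails.json'

def playback_details_url(page_url, video_id):
    query = {'video_id': video_id, 'key': 'v0vhrt7bg2xq1vyxhkct'}
    # parse_qs yields lists, hence the [None] fallback and [0] index.
    secret = parse_qs(urlparse(page_url).query).get('secret', [None])[0]
    if secret:
        query['secret'] = secret
    return '%s?%s' % (API, urlencode(query))

print(playback_details_url(
    'http://www.viddler.com/v/890c0985?secret=34051570', '890c0985'))
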
diff --git a/youtube_dl/extractor/videobam.py b/youtube_dl/extractor/videobam.py
deleted file mode 100644 (file)
index 0eb3d94..0000000
+++ /dev/null
@@ -1,81 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-import json
-
-from .common import InfoExtractor
-from ..utils import int_or_none
-
-
-class VideoBamIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?videobam\.com/(?:videos/download/)?(?P<id>[a-zA-Z]+)'
-
-    _TESTS = [
-        {
-            'url': 'http://videobam.com/OiJQM',
-            'md5': 'db471f27763a531f10416a0c58b5a1e0',
-            'info_dict': {
-                'id': 'OiJQM',
-                'ext': 'mp4',
-                'title': 'Is Alcohol Worse Than Ecstasy?',
-                'description': 'md5:d25b96151515c91debc42bfbb3eb2683',
-                'uploader': 'frihetsvinge',
-            },
-        },
-        {
-            'url': 'http://videobam.com/pqLvq',
-            'md5': 'd9a565b5379a99126ef94e1d7f9a383e',
-            'note': 'HD video',
-            'info_dict': {
-                'id': 'pqLvq',
-                'ext': 'mp4',
-                'title': '_',
-            }
-        },
-    ]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        page = self._download_webpage('http://videobam.com/%s' % video_id, video_id, 'Downloading page')
-
-        formats = []
-
-        for preference, format_id in enumerate(['low', 'high']):
-            mobj = re.search(r"%s: '(?P<url>[^']+)'" % format_id, page)
-            if not mobj:
-                continue
-            formats.append({
-                'url': mobj.group('url'),
-                'ext': 'mp4',
-                'format_id': format_id,
-                'preference': preference,
-            })
-
-        if not formats:
-            player_config = json.loads(self._html_search_regex(r'var player_config = ({.+?});', page, 'player config'))
-            formats = [{
-                'url': item['url'],
-                'ext': 'mp4',
-            } for item in player_config['playlist'] if 'autoPlay' in item]
-
-        self._sort_formats(formats)
-
-        title = self._og_search_title(page, default='_', fatal=False)
-        description = self._og_search_description(page, default=None)
-        thumbnail = self._og_search_thumbnail(page)
-        uploader = self._html_search_regex(r'Upload by ([^<]+)</a>', page, 'uploader', fatal=False, default=None)
-        view_count = int_or_none(
-            self._html_search_regex(r'<strong>Views:</strong> (\d+) ', page, 'view count', fatal=False))
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'uploader': uploader,
-            'view_count': view_count,
-            'formats': formats,
-            'age_limit': 18,
-        }
index 0ffc7ff7dc9185a3a3ec5c0fd14d302872662dda..2ed5d964344211c22d2260b1946273772434db8b 100644 (file)
@@ -14,8 +14,11 @@ class VideoDetectiveIE(InfoExtractor):
             'id': '194487',
             'ext': 'mp4',
             'title': 'KICK-ASS 2',
-            'description': 'md5:65ba37ad619165afac7d432eaded6013',
-            'duration': 138,
+            'description': 'md5:c189d5b7280400630a1d3dd17eaa8d8a',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
         },
     }
 
@@ -24,4 +27,4 @@ class VideoDetectiveIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
         og_video = self._og_search_video_url(webpage)
         query = compat_urlparse.urlparse(og_video).query
-        return self.url_result(InternetVideoArchiveIE._build_url(query), ie=InternetVideoArchiveIE.ie_key())
+        return self.url_result(InternetVideoArchiveIE._build_json_url(query), ie=InternetVideoArchiveIE.ie_key())
index 94f9e9be94f9a420fd0207339085cbc35aba6805..cd3f50a63b70745b157dfd4d2f67549ead6d0de2 100644 (file)
@@ -2,8 +2,8 @@ from __future__ import unicode_literals
 
 from .common import InfoExtractor
 from ..utils import (
-    find_xpath_attr,
     int_or_none,
+    parse_iso8601,
 )
 
 
@@ -18,33 +18,35 @@ class VideofyMeIE(InfoExtractor):
             'id': '1100701',
             'ext': 'mp4',
             'title': 'This is VideofyMe',
-            'description': None,
+            'description': '',
+            'upload_date': '20130326',
+            'timestamp': 1364288959,
             'uploader': 'VideofyMe',
             'uploader_id': 'thisisvideofyme',
             'view_count': int,
+            'likes': int,
+            'comment_count': int,
         },
-
     }
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        config = self._download_xml('http://sunshine.videofy.me/?videoId=%s' % video_id,
-                                    video_id)
-        video = config.find('video')
-        sources = video.find('sources')
-        url_node = next(node for node in [find_xpath_attr(sources, 'source', 'id', 'HQ %s' % key)
-                                          for key in ['on', 'av', 'off']] if node is not None)
-        video_url = url_node.find('url').text
-        view_count = int_or_none(self._search_regex(
-            r'([0-9]+)', video.find('views').text, 'view count', fatal=False))
+
+        config = self._download_json('http://vf-player-info-loader.herokuapp.com/%s.json' % video_id, video_id)['videoinfo']
+
+        video = config.get('video')
+        blog = config.get('blog', {})
 
         return {
             'id': video_id,
-            'title': video.find('title').text,
-            'url': video_url,
-            'thumbnail': video.find('thumb').text,
-            'description': video.find('description').text,
-            'uploader': config.find('blog/name').text,
-            'uploader_id': video.find('identifier').text,
-            'view_count': view_count,
+            'title': video['title'],
+            'url': video['sources']['source']['url'],
+            'thumbnail': video.get('thumb'),
+            'description': video.get('description'),
+            'timestamp': parse_iso8601(video.get('date')),
+            'uploader': blog.get('name'),
+            'uploader_id': blog.get('identifier'),
+            'view_count': int_or_none(self._search_regex(r'([0-9]+)', video.get('views'), 'view count', fatal=False)),
+            'likes': int_or_none(video.get('likes')),
+            'comment_count': int_or_none(video.get('nrOfComments')),
         }
diff --git a/youtube_dl/extractor/videolecturesnet.py b/youtube_dl/extractor/videolecturesnet.py
deleted file mode 100644 (file)
index d6a7eb2..0000000
+++ /dev/null
@@ -1,86 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    find_xpath_attr,
-    int_or_none,
-    parse_duration,
-    unified_strdate,
-)
-
-
-class VideoLecturesNetIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?videolectures\.net/(?P<id>[^/#?]+)/'
-    IE_NAME = 'videolectures.net'
-
-    _TEST = {
-        'url': 'http://videolectures.net/promogram_igor_mekjavic_eng/',
-        'info_dict': {
-            'id': 'promogram_igor_mekjavic_eng',
-            'ext': 'mp4',
-            'title': 'Automatics, robotics and biocybernetics',
-            'description': 'md5:815fc1deb6b3a2bff99de2d5325be482',
-            'upload_date': '20130627',
-            'duration': 565,
-            'thumbnail': 're:http://.*\.jpg',
-        },
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        smil_url = 'http://videolectures.net/%s/video/1/smil.xml' % video_id
-        smil = self._download_xml(smil_url, video_id)
-
-        title = find_xpath_attr(smil, './/meta', 'name', 'title').attrib['content']
-        description_el = find_xpath_attr(smil, './/meta', 'name', 'abstract')
-        description = (
-            None if description_el is None
-            else description_el.attrib['content'])
-        upload_date = unified_strdate(
-            find_xpath_attr(smil, './/meta', 'name', 'date').attrib['content'])
-
-        switch = smil.find('.//switch')
-        duration = parse_duration(switch.attrib.get('dur'))
-        thumbnail_el = find_xpath_attr(switch, './image', 'type', 'thumbnail')
-        thumbnail = (
-            None if thumbnail_el is None else thumbnail_el.attrib.get('src'))
-
-        formats = []
-        for v in switch.findall('./video'):
-            proto = v.attrib.get('proto')
-            if proto not in ['http', 'rtmp']:
-                continue
-            f = {
-                'width': int_or_none(v.attrib.get('width')),
-                'height': int_or_none(v.attrib.get('height')),
-                'filesize': int_or_none(v.attrib.get('size')),
-                'tbr': int_or_none(v.attrib.get('systemBitrate')) / 1000.0,
-                'ext': v.attrib.get('ext'),
-            }
-            src = v.attrib['src']
-            if proto == 'http':
-                if self._is_valid_url(src, video_id):
-                    f['url'] = src
-                    formats.append(f)
-            elif proto == 'rtmp':
-                f.update({
-                    'url': v.attrib['streamer'],
-                    'play_path': src,
-                    'rtmp_real_time': True,
-                })
-                formats.append(f)
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'upload_date': upload_date,
-            'duration': duration,
-            'thumbnail': thumbnail,
-            'formats': formats,
-        }
index 78ff6310a07f6864abda658ca1e804e26594460b..4f0dcd18c7f28ab17aec58c814d53fd8ae21e7ac 100644 (file)
@@ -4,7 +4,10 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_urllib_request
+from ..utils import (
+    decode_packed_codes,
+    sanitized_Request,
+)
 
 
 class VideoMegaIE(InfoExtractor):
@@ -30,7 +33,7 @@ class VideoMegaIE(InfoExtractor):
         video_id = self._match_id(url)
 
         iframe_url = 'http://videomega.tv/cdn.php?ref=%s' % video_id
-        req = compat_urllib_request.Request(iframe_url)
+        req = sanitized_Request(iframe_url)
         req.add_header('Referer', url)
         req.add_header('Cookie', 'noadvtday=0')
         webpage = self._download_webpage(req, video_id)
@@ -41,8 +44,10 @@ class VideoMegaIE(InfoExtractor):
             r'(?:^[Vv]ideo[Mm]ega\.tv\s-\s*|\s*-\svideomega\.tv$)', '', title)
         thumbnail = self._search_regex(
             r'<video[^>]+?poster="([^"]+)"', webpage, 'thumbnail', fatal=False)
+
+        real_codes = decode_packed_codes(webpage)
         video_url = self._search_regex(
-            r'<source[^>]+?src="([^"]+)"', webpage, 'video URL')
+            r'"src"\s*,\s*"([^"]+)"', real_codes, 'video URL')
 
         return {
             'id': video_id,
diff --git a/youtube_dl/extractor/videomore.py b/youtube_dl/extractor/videomore.py
new file mode 100644 (file)
index 0000000..04e95c6
--- /dev/null
@@ -0,0 +1,244 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_age_limit,
+    parse_iso8601,
+    xpath_text,
+)
+
+
+class VideomoreIE(InfoExtractor):
+    IE_NAME = 'videomore'
+    _VALID_URL = r'videomore:(?P<sid>\d+)$|https?://videomore\.ru/(?:(?:embed|[^/]+/[^/]+)/|[^/]+\?.*\btrack_id=)(?P<id>\d+)(?:[/?#&]|\.(?:xml|json)|$)'
+    _TESTS = [{
+        'url': 'http://videomore.ru/kino_v_detalayah/5_sezon/367617',
+        'md5': '70875fbf57a1cd004709920381587185',
+        'info_dict': {
+            'id': '367617',
+            'ext': 'flv',
+            'title': 'В гостях Алексей Чумаков и Юлия Ковальчук',
+            'description': 'В гостях – лучшие романтические комедии года, «Выживший» Иньярриту и «Стив Джобс» Дэнни Бойла.',
+            'series': 'Кино в деталях',
+            'episode': 'В гостях Алексей Чумаков и Юлия Ковальчук',
+            'episode_number': None,
+            'season': 'Сезон 2015',
+            'season_number': 5,
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 2910,
+            'age_limit': 16,
+            'view_count': int,
+        },
+    }, {
+        'url': 'http://videomore.ru/embed/259974',
+        'info_dict': {
+            'id': '259974',
+            'ext': 'flv',
+            'title': '80 серия',
+            'description': '«Медведей» ждет решающий матч. Макеев выясняет отношения со Стрельцовым. Парни узнают подробности прошлого Макеева.',
+            'series': 'Молодежка',
+            'episode': '80 серия',
+            'episode_number': 40,
+            'season': '2 сезон',
+            'season_number': 2,
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 2809,
+            'age_limit': 16,
+            'view_count': int,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://videomore.ru/molodezhka/sezon_promo/341073',
+        'info_dict': {
+            'id': '341073',
+            'ext': 'flv',
+            'title': 'Команда проиграла из-за Бакина?',
+            'description': 'Молодежка 3 сезон скоро',
+            'series': 'Молодежка',
+            'episode': 'Команда проиграла из-за Бакина?',
+            'episode_number': None,
+            'season': 'Промо',
+            'season_number': 99,
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 29,
+            'age_limit': 16,
+            'view_count': int,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://videomore.ru/elki_3?track_id=364623',
+        'only_matching': True,
+    }, {
+        'url': 'http://videomore.ru/embed/364623',
+        'only_matching': True,
+    }, {
+        'url': 'http://videomore.ru/video/tracks/364623.xml',
+        'only_matching': True,
+    }, {
+        'url': 'http://videomore.ru/video/tracks/364623.json',
+        'only_matching': True,
+    }, {
+        'url': 'http://videomore.ru/video/tracks/158031/quotes/33248',
+        'only_matching': True,
+    }, {
+        'url': 'videomore:367617',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<object[^>]+data=(["\'])https?://videomore\.ru/player\.swf\?.*config=(?P<url>https?://videomore\.ru/(?:[^/]+/)+\d+\.xml).*\1',
+            webpage)
+        if mobj:
+            return mobj.group('url')
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('sid') or mobj.group('id')
+
+        video = self._download_xml(
+            'http://videomore.ru/video/tracks/%s.xml' % video_id,
+            video_id, 'Downloading video XML')
+
+        video_url = xpath_text(video, './/video_url', 'video url', fatal=True)
+        formats = self._extract_f4m_formats(video_url, video_id, f4m_id='hds')
+        self._sort_formats(formats)
+
+        data = self._download_json(
+            'http://videomore.ru/video/tracks/%s.json' % video_id,
+            video_id, 'Downloading video JSON')
+
+        title = data.get('title') or data['project_title']
+        description = data.get('description') or data.get('description_raw')
+        timestamp = parse_iso8601(data.get('published_at'))
+        duration = int_or_none(data.get('duration'))
+        view_count = int_or_none(data.get('views'))
+        age_limit = parse_age_limit(data.get('min_age'))
+        thumbnails = [{
+            'url': thumbnail,
+        } for thumbnail in data.get('big_thumbnail_urls', [])]
+
+        series = data.get('project_title')
+        episode = data.get('title')
+        episode_number = int_or_none(data.get('episode_of_season') or None)
+        season = data.get('season_title')
+        season_number = int_or_none(data.get('season_pos') or None)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'series': series,
+            'episode': episode,
+            'episode_number': episode_number,
+            'season': season,
+            'season_number': season_number,
+            'thumbnails': thumbnails,
+            'timestamp': timestamp,
+            'duration': duration,
+            'view_count': view_count,
+            'age_limit': age_limit,
+            'formats': formats,
+        }
+
+
+class VideomoreVideoIE(InfoExtractor):
+    IE_NAME = 'videomore:video'
+    _VALID_URL = r'https?://videomore\.ru/(?:(?:[^/]+/){2})?(?P<id>[^/?#&]+)[/?#&]*$'
+    _TESTS = [{
+        # single video with og:video:iframe
+        'url': 'http://videomore.ru/elki_3',
+        'info_dict': {
+            'id': '364623',
+            'ext': 'flv',
+            'title': 'Ёлки 3',
+            'description': '',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 5579,
+            'age_limit': 6,
+            'view_count': int,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # single season episode with og:video:iframe
+        'url': 'http://videomore.ru/poslednii_ment/1_sezon/14_seriya',
+        'only_matching': True,
+    }, {
+        'url': 'http://videomore.ru/sejchas_v_seti/serii_221-240/226_vypusk',
+        'only_matching': True,
+    }, {
+        # single video without og:video:iframe
+        'url': 'http://videomore.ru/marin_i_ego_druzya',
+        'info_dict': {
+            'id': '359073',
+            'ext': 'flv',
+            'title': '1 серия. Здравствуй, Аквавилль!',
+            'description': 'md5:c6003179538b5d353e7bcd5b1372b2d7',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 754,
+            'age_limit': 6,
+            'view_count': int,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if VideomoreIE.suitable(url) else super(VideomoreVideoIE, cls).suitable(url)
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        video_url = self._og_search_property(
+            'video:iframe', webpage, 'video url', default=None)
+
+        if not video_url:
+            video_id = self._search_regex(
+                (r'config\s*:\s*["\']https?://videomore\.ru/video/tracks/(\d+)\.xml',
+                 r'track-id=["\'](\d+)',
+                 r'xcnt_product_id\s*=\s*(\d+)'), webpage, 'video id')
+            video_url = 'videomore:%s' % video_id
+
+        return self.url_result(video_url, VideomoreIE.ie_key())
+
+
+class VideomoreSeasonIE(InfoExtractor):
+    IE_NAME = 'videomore:season'
+    _VALID_URL = r'https?://videomore\.ru/(?!embed)(?P<id>[^/]+/[^/?#&]+)[/?#&]*$'
+    _TESTS = [{
+        'url': 'http://videomore.ru/molodezhka/sezon_promo',
+        'info_dict': {
+            'id': 'molodezhka/sezon_promo',
+            'title': 'Молодежка Промо',
+        },
+        'playlist_mincount': 12,
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        title = self._og_search_title(webpage)
+
+        entries = [
+            self.url_result(item) for item in re.findall(
+                r'<a[^>]+href="((?:https?:)?//videomore\.ru/%s/[^/]+)"[^>]+class="widget-item-desc"'
+                % display_id, webpage)]
+
+        return self.playlist_result(entries, display_id, title)
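
VideomoreVideoIE.suitable() above is the usual youtube-dl device for overlapping _VALID_URL patterns: the broader extractor defers to the more specific one so each URL is claimed exactly once. A stripped-down sketch of the mechanism, with two illustrative classes:

import re

class TrackIE(object):
    _VALID_URL = r'https?://example\.com/tracks/\d+'

    @classmethod
    def suitable(cls, url):
        return re.match(cls._VALID_URL, url) is not None

class PageIE(object):
    _VALID_URL = r'https?://example\.com/.+'

    @classmethod
    def suitable(cls, url):
        # Defer to TrackIE for URLs it claims, so only one extractor matches.
        if TrackIE.suitable(url):
            return False
        return re.match(cls._VALID_URL, url) is not None

print(TrackIE.suitable('http://example.com/tracks/123'))  # True
print(PageIE.suitable('http://example.com/tracks/123'))   # False
print(PageIE.suitable('http://example.com/some-show'))    # True
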
index 3176e3b9dda8580ca4c693343508367d12b25751..5de8273c34aa61a6e1e79ce8b7d142ab9135d35d 100644 (file)
@@ -26,7 +26,7 @@ class VideoPremiumIE(InfoExtractor):
         webpage_url = 'http://videopremium.tv/' + video_id
         webpage = self._download_webpage(webpage_url, video_id)
 
-        if re.match(r"^<html><head><script[^>]*>window.location\s*=", webpage):
+        if re.match(r'^<html><head><script[^>]*>window.location\s*=', webpage):
             # Download again, we need a cookie
             webpage = self._download_webpage(
                 webpage_url, video_id,
@@ -37,10 +37,10 @@ class VideoPremiumIE(InfoExtractor):
 
         return {
             'id': video_id,
-            'url': "rtmp://e%d.md.iplay.md/play" % random.randint(1, 16),
-            'play_path': "mp4:%s.f4v" % video_id,
-            'page_url': "http://videopremium.tv/" + video_id,
-            'player_url': "http://videopremium.tv/uplayer/uppod.swf",
+            'url': 'rtmp://e%d.md.iplay.md/play' % random.randint(1, 16),
+            'play_path': 'mp4:%s.f4v' % video_id,
+            'page_url': 'http://videopremium.tv/' + video_id,
+            'player_url': 'http://videopremium.tv/uplayer/uppod.swf',
             'ext': 'f4v',
             'title': video_title,
         }
index 591024eaded0cdddbb0779bd942c0fa8f63d86a6..0f798711bca7ebc25c893f82746ed3b1a49ff778 100644 (file)
@@ -11,9 +11,10 @@ from ..utils import (
 
 
 class VideoTtIE(InfoExtractor):
+    _WORKING = False
     ID_NAME = 'video.tt'
     IE_DESC = 'video.tt - Your True Tube'
-    _VALID_URL = r'http://(?:www\.)?video\.tt/(?:(?:video|embed)/|watch_video\.php\?v=)(?P<id>[\da-zA-Z]{9})'
+    _VALID_URL = r'https?://(?:www\.)?video\.tt/(?:(?:video|embed)/|watch_video\.php\?v=)(?P<id>[\da-zA-Z]{9})'
 
     _TESTS = [{
         'url': 'http://www.video.tt/watch_video.php?v=amd5YujV8',
diff --git a/youtube_dl/extractor/videoweed.py b/youtube_dl/extractor/videoweed.py
deleted file mode 100644 (file)
index ca2e509..0000000
+++ /dev/null
@@ -1,26 +0,0 @@
-from __future__ import unicode_literals
-
-from .novamov import NovaMovIE
-
-
-class VideoWeedIE(NovaMovIE):
-    IE_NAME = 'videoweed'
-    IE_DESC = 'VideoWeed'
-
-    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'videoweed\.(?:es|com)'}
-
-    _HOST = 'www.videoweed.es'
-
-    _FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
-    _TITLE_REGEX = r'<h1 class="text_shadow">([^<]+)</h1>'
-
-    _TEST = {
-        'url': 'http://www.videoweed.es/file/b42178afbea14',
-        'md5': 'abd31a2132947262c50429e1d16c1bfd',
-        'info_dict': {
-            'id': 'b42178afbea14',
-            'ext': 'flv',
-            'title': 'optical illusion  dissapeared image magic illusion',
-            'description': ''
-        },
-    }
index e0b55078b2c9af8bf654ee6cd6982305074cb39b..b1156d531aba6793fc7ce7dda9649950d922f606 100644 (file)
@@ -1,15 +1,20 @@
 from __future__ import unicode_literals
 
+import itertools
+
 from .common import InfoExtractor
+from ..compat import compat_HTTPError
 from ..utils import (
+    ExtractorError,
     int_or_none,
     float_or_none,
-    str_to_int,
+    parse_iso8601,
 )
 
 
 class VidmeIE(InfoExtractor):
-    _VALID_URL = r'https?://vid\.me/(?:e/)?(?P<id>[\da-zA-Z]+)'
+    IE_NAME = 'vidme'
+    _VALID_URL = r'https?://vid\.me/(?:e/)?(?P<id>[\da-zA-Z]{,5})(?:[^\da-zA-Z]|$)'
     _TESTS = [{
         'url': 'https://vid.me/QNB',
         'md5': 'f42d05e7149aeaec5c037b17e5d3dc82',
@@ -18,49 +23,251 @@ class VidmeIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'Fishing for piranha - the easy way',
             'description': 'source: https://www.facebook.com/photo.php?v=312276045600871',
-            'duration': 119.92,
+            'thumbnail': 're:^https?://.*\.jpg',
             'timestamp': 1406313244,
             'upload_date': '20140725',
+            'age_limit': 0,
+            'duration': 119.92,
+            'view_count': int,
+            'like_count': int,
+            'comment_count': int,
+        },
+    }, {
+        'url': 'https://vid.me/Gc6M',
+        'md5': 'f42d05e7149aeaec5c037b17e5d3dc82',
+        'info_dict': {
+            'id': 'Gc6M',
+            'ext': 'mp4',
+            'title': 'O Mere Dil ke chain - Arnav and Khushi VM',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'timestamp': 1441211642,
+            'upload_date': '20150902',
+            'uploader': 'SunshineM',
+            'uploader_id': '3552827',
+            'age_limit': 0,
+            'duration': 223.72,
+            'view_count': int,
+            'like_count': int,
+            'comment_count': int,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # tests uploader field
+        'url': 'https://vid.me/4Iib',
+        'info_dict': {
+            'id': '4Iib',
+            'ext': 'mp4',
+            'title': 'The Carver',
+            'description': 'md5:e9c24870018ae8113be936645b93ba3c',
             'thumbnail': 're:^https?://.*\.jpg',
+            'timestamp': 1433203629,
+            'upload_date': '20150602',
+            'uploader': 'Thomas',
+            'uploader_id': '109747',
+            'age_limit': 0,
+            'duration': 97.859999999999999,
+            'view_count': int,
+            'like_count': int,
+            'comment_count': int,
+        },
+        'params': {
+            'skip_download': True,
         },
     }, {
-        # From http://naked-yogi.tumblr.com/post/118312946248/naked-smoking-stretching
+        # nsfw test from http://naked-yogi.tumblr.com/post/118312946248/naked-smoking-stretching
         'url': 'https://vid.me/e/Wmur',
+        'info_dict': {
+            'id': 'Wmur',
+            'ext': 'mp4',
+            'title': 'naked smoking & stretching',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'timestamp': 1430931613,
+            'upload_date': '20150506',
+            'uploader': 'naked-yogi',
+            'uploader_id': '1638622',
+            'age_limit': 18,
+            'duration': 653.26999999999998,
+            'view_count': int,
+            'like_count': int,
+            'comment_count': int,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # nsfw, user-disabled
+        'url': 'https://vid.me/dzGJ',
         'only_matching': True,
+    }, {
+        # suspended
+        'url': 'https://vid.me/Ox3G',
+        'only_matching': True,
+    }, {
+        # deleted
+        'url': 'https://vid.me/KTPm',
+        'only_matching': True,
+    }, {
+        # no formats in the API response
+        'url': 'https://vid.me/e5g',
+        'info_dict': {
+            'id': 'e5g',
+            'ext': 'mp4',
+            'title': 'Video upload (e5g)',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'timestamp': 1401480195,
+            'upload_date': '20140530',
+            'uploader': None,
+            'uploader_id': None,
+            'age_limit': 0,
+            'duration': 483,
+            'view_count': int,
+            'like_count': int,
+            'comment_count': int,
+        },
+        'params': {
+            'skip_download': True,
+        },
     }]
 
     def _real_extract(self, url):
-        url = url.replace('vid.me/e/', 'vid.me/')
         video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        video_url = self._html_search_regex(
-            r'<source src="([^"]+)"', webpage, 'video URL')
-
-        title = self._og_search_title(webpage)
-        description = self._og_search_description(webpage, default='')
-        thumbnail = self._og_search_thumbnail(webpage)
-        timestamp = int_or_none(self._og_search_property('updated_time', webpage, fatal=False))
-        width = int_or_none(self._og_search_property('video:width', webpage, fatal=False))
-        height = int_or_none(self._og_search_property('video:height', webpage, fatal=False))
-        duration = float_or_none(self._html_search_regex(
-            r'data-duration="([^"]+)"', webpage, 'duration', fatal=False))
-        view_count = str_to_int(self._html_search_regex(
-            r'<(?:li|span) class="video_views">\s*([\d,\.]+)\s*plays?', webpage, 'view count', fatal=False))
-        like_count = str_to_int(self._html_search_regex(
-            r'class="score js-video-vote-score"[^>]+data-score="([\d,\.\s]+)">',
-            webpage, 'like count', fatal=False))
+
+        try:
+            response = self._download_json(
+                'https://api.vid.me/videoByUrl/%s' % video_id, video_id)
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
+                response = self._parse_json(e.cause.read(), video_id)
+            else:
+                raise
+
+        error = response.get('error')
+        if error:
+            raise ExtractorError(
+                '%s returned error: %s' % (self.IE_NAME, error), expected=True)
+
+        video = response['video']
+
+        if video.get('state') == 'deleted':
+            raise ExtractorError(
+                'Vidme said: Sorry, this video has been deleted.',
+                expected=True)
+
+        if video.get('state') in ('user-disabled', 'suspended'):
+            raise ExtractorError(
+                'Vidme said: This video has been suspended either due to a copyright claim, '
+                'or for violating the terms of use.',
+                expected=True)
+
+        formats = [{
+            'format_id': f.get('type'),
+            'url': f['uri'],
+            'width': int_or_none(f.get('width')),
+            'height': int_or_none(f.get('height')),
+            'preference': 0 if f.get('type', '').endswith('clip') else 1,
+        } for f in video.get('formats', []) if f.get('uri')]
+
+        if not formats and video.get('complete_url'):
+            formats.append({
+                'url': video.get('complete_url'),
+                'width': int_or_none(video.get('width')),
+                'height': int_or_none(video.get('height')),
+            })
+
+        self._sort_formats(formats)
+
+        title = video['title']
+        description = video.get('description')
+        thumbnail = video.get('thumbnail_url')
+        timestamp = parse_iso8601(video.get('date_created'), ' ')
+        uploader = video.get('user', {}).get('username')
+        uploader_id = video.get('user', {}).get('user_id')
+        age_limit = 18 if video.get('nsfw') is True else 0
+        duration = float_or_none(video.get('duration'))
+        view_count = int_or_none(video.get('view_count'))
+        like_count = int_or_none(video.get('likes_count'))
+        comment_count = int_or_none(video.get('comment_count'))
 
         return {
             'id': video_id,
-            'url': video_url,
-            'title': title,
+            'title': title or 'Video upload (%s)' % video_id,
             'description': description,
             'thumbnail': thumbnail,
+            'uploader': uploader,
+            'uploader_id': uploader_id,
+            'age_limit': age_limit,
             'timestamp': timestamp,
-            'width': width,
-            'height': height,
             'duration': duration,
             'view_count': view_count,
             'like_count': like_count,
+            'comment_count': comment_count,
+            'formats': formats,
         }
+
+
+class VidmeListBaseIE(InfoExtractor):
+    # Max possible limit according to https://docs.vid.me/#api-Videos-List
+    _LIMIT = 100
+
+    def _entries(self, user_id, user_name):
+        for page_num in itertools.count(1):
+            page = self._download_json(
+                'https://api.vid.me/videos/%s?user=%s&limit=%d&offset=%d'
+                % (self._API_ITEM, user_id, self._LIMIT, (page_num - 1) * self._LIMIT),
+                user_name, 'Downloading user %s page %d' % (self._API_ITEM, page_num))
+
+            videos = page.get('videos', [])
+            if not videos:
+                break
+
+            for video in videos:
+                video_url = video.get('full_url') or video.get('embed_url')
+                if video_url:
+                    yield self.url_result(video_url, VidmeIE.ie_key())
+
+            total = int_or_none(page.get('page', {}).get('total'))
+            if total and self._LIMIT * page_num >= total:
+                break
+
+    def _real_extract(self, url):
+        user_name = self._match_id(url)
+
+        user_id = self._download_json(
+            'https://api.vid.me/userByUsername?username=%s' % user_name,
+            user_name)['user']['user_id']
+
+        return self.playlist_result(
+            self._entries(user_id, user_name), user_id,
+            '%s - %s' % (user_name, self._TITLE))
+
+
+class VidmeUserIE(VidmeListBaseIE):
+    IE_NAME = 'vidme:user'
+    _VALID_URL = r'https?://vid\.me/(?:e/)?(?P<id>[\da-zA-Z]{6,})(?!/likes)(?:[^\da-zA-Z]|$)'
+    _API_ITEM = 'list'
+    _TITLE = 'Videos'
+    _TEST = {
+        'url': 'https://vid.me/EFARCHIVE',
+        'info_dict': {
+            'id': '3834632',
+            'title': 'EFARCHIVE - %s' % _TITLE,
+        },
+        'playlist_mincount': 238,
+    }
+
+
+class VidmeUserLikesIE(VidmeListBaseIE):
+    IE_NAME = 'vidme:user:likes'
+    _VALID_URL = r'https?://vid\.me/(?:e/)?(?P<id>[\da-zA-Z]{6,})/likes'
+    _API_ITEM = 'likes'
+    _TITLE = 'Likes'
+    _TEST = {
+        'url': 'https://vid.me/ErinAlexis/likes',
+        'info_dict': {
+            'id': '6483530',
+            'title': 'ErinAlexis - %s' % _TITLE,
+        },
+        'playlist_mincount': 415,
+    }
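
The list extractors above page through the API with itertools.count, stopping either on an empty page or once limit * page_num reaches the reported total. A self-contained sketch of that loop, with fake_fetch standing in for _download_json:

import itertools

LIMIT = 3
DATA = list(range(8))  # pretend the API exposes 8 videos

def fake_fetch(offset, limit):
    # Stand-in for _download_json: one page of items plus the grand total.
    return {'videos': DATA[offset:offset + limit],
            'page': {'total': len(DATA)}}

def entries():
    for page_num in itertools.count(1):
        page = fake_fetch((page_num - 1) * LIMIT, LIMIT)
        videos = page.get('videos', [])
        if not videos:
            break
        for video in videos:
            yield video
        total = page.get('page', {}).get('total')
        if total and LIMIT * page_num >= total:
            break

print(list(entries()))  # [0, 1, 2, 3, 4, 5, 6, 7]
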
index 08a5a7b8ddbace34da0a4c6bb751562eb51a95ba..3c78fb3d5a071a6f49dec7467e620c0b8a01ded9 100644 (file)
@@ -1,10 +1,14 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
-from .common import InfoExtractor
+from .jwplatform import JWPlatformBaseIE
+from ..utils import (
+    decode_packed_codes,
+    js_to_json,
+)
 
 
-class VidziIE(InfoExtractor):
+class VidziIE(JWPlatformBaseIE):
     _VALID_URL = r'https?://(?:www\.)?vidzi\.tv/(?P<id>\w+)'
     _TEST = {
         'url': 'http://vidzi.tv/cghql9yq6emu.html',
@@ -14,19 +18,25 @@ class VidziIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'youtube-dl test video  1\\\\2\'3/4<5\\\\6ä7↭',
         },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
     }
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
         webpage = self._download_webpage(url, video_id)
-        video_url = self._html_search_regex(
-            r'{\s*file\s*:\s*"([^"]+)"\s*}', webpage, 'video url')
         title = self._html_search_regex(
             r'(?s)<h2 class="video-title">(.*?)</h2>', webpage, 'title')
 
-        return {
-            'id': video_id,
-            'title': title,
-            'url': video_url,
-        }
+        code = decode_packed_codes(webpage).replace('\\\'', '\'')
+        jwplayer_data = self._parse_json(
+            self._search_regex(r'setup\(([^)]+)\)', code, 'jwplayer data'),
+            video_id, transform_source=js_to_json)
+
+        info_dict = self._parse_jwplayer_data(jwplayer_data, video_id, require_title=False)
+        info_dict['title'] = title
+
+        return info_dict
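
Once decode_packed_codes() has unpacked the obfuscated player script, the extractor lifts the setup(...) argument out with a regex and parses it. A sketch of that last step on an already-unpacked, already-JSON-shaped snippet (real pages additionally need youtube-dl's js_to_json, as the diff shows):

import json
import re

# Invented, already-unpacked player code; real pages come out of
# decode_packed_codes() and are JavaScript, not strict JSON.
UNPACKED = 'jwplayer("vplayer").setup({"file": "http://example.com/master.m3u8"})'

jwplayer_data = json.loads(
    re.search(r'setup\(([^)]+)\)', UNPACKED).group(1))
print(jwplayer_data['file'])
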
index 15377097e658b20e75a08f19b370be3bef2158c7..6645c6186dbff315e850f22ae793677803cbbf9b 100644 (file)
@@ -2,6 +2,7 @@
 from __future__ import unicode_literals
 
 import re
+import itertools
 
 from .common import InfoExtractor
 
@@ -49,6 +50,7 @@ class VierIE(InfoExtractor):
 
         playlist_url = 'http://vod.streamcloud.be/%s/mp4:_definst_/%s.mp4/playlist.m3u8' % (application, filename)
         formats = self._extract_m3u8_formats(playlist_url, display_id, 'mp4')
+        self._sort_formats(formats)
 
         title = self._og_search_title(webpage, default=display_id)
         description = self._og_search_description(webpage, default=None)
@@ -91,31 +93,27 @@ class VierVideosIE(InfoExtractor):
         mobj = re.match(self._VALID_URL, url)
         program = mobj.group('program')
 
-        webpage = self._download_webpage(url, program)
-
         page_id = mobj.group('page')
         if page_id:
             page_id = int(page_id)
             start_page = page_id
-            last_page = start_page + 1
             playlist_id = '%s-page%d' % (program, page_id)
         else:
             start_page = 0
-            last_page = int(self._search_regex(
-                r'videos\?page=(\d+)">laatste</a>',
-                webpage, 'last page', default=0)) + 1
             playlist_id = program
 
         entries = []
-        for current_page_id in range(start_page, last_page):
+        for current_page_id in itertools.count(start_page):
             current_page = self._download_webpage(
                 'http://www.vier.be/%s/videos?page=%d' % (program, current_page_id),
                 program,
-                'Downloading page %d' % (current_page_id + 1)) if current_page_id != page_id else webpage
+                'Downloading page %d' % (current_page_id + 1))
             page_entries = [
                 self.url_result('http://www.vier.be' + video_url, 'Vier')
                 for video_url in re.findall(
                     r'<h3><a href="(/[^/]+/videos/[^/]+(?:/\d+)?)">', current_page)]
             entries.extend(page_entries)
+            if page_id or '>Meer<' not in current_page:
+                break
 
         return self.playlist_result(entries, playlist_id)
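
The reworked paging loop above drops the precomputed last-page number in favour of an open-ended itertools.count that stops once the '>Meer<' ("more") link disappears from the page. A toy sketch of that sentinel-driven loop, with PAGES faking the site and a simplified link regex:

import itertools
import re

PAGES = {
    0: '<h3><a href="/show/videos/clip-a/1">A</a></h3> >Meer<',
    1: '<h3><a href="/show/videos/clip-b/2">B</a></h3>',  # no >Meer<: last page
}

entries = []
for page_num in itertools.count(0):
    page = PAGES.get(page_num, '')  # stand-in for _download_webpage
    entries.extend(re.findall(r'<h3><a href="(/[^/]+/videos/[^"]+)">', page))
    if '>Meer<' not in page:
        break
print(entries)  # ['/show/videos/clip-a/1', '/show/videos/clip-b/2']
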
index 393b63618a9a872e2b5313e33a6c73eff5a5dccb..7839225d426ddc17f1f4ca948fa4af5e6685d37e 100644 (file)
@@ -1,27 +1,34 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import re
+
 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_request,
+    compat_HTTPError,
     compat_urllib_parse,
+    compat_urllib_parse_unquote,
 )
 from ..utils import (
     determine_ext,
+    ExtractorError,
     int_or_none,
     parse_iso8601,
+    sanitized_Request,
+    HEADRequest,
+    url_basename,
 )
 
 
 class ViewsterIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?viewster\.com/(?:serie|movie)/(?P<id>\d+-\d+-\d+)'
+    _VALID_URL = r'https?://(?:www\.)?viewster\.com/(?:serie|movie)/(?P<id>\d+-\d+-\d+)'
     _TESTS = [{
         # movie, Type=Movie
         'url': 'http://www.viewster.com/movie/1140-11855-000/the-listening-project/',
-        'md5': '14d3cfffe66d57b41ae2d9c873416f01',
+        'md5': 'e642d1b27fcf3a4ffa79f194f5adde36',
         'info_dict': {
             'id': '1140-11855-000',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'The listening Project',
             'description': 'md5:bac720244afd1a8ea279864e67baa071',
             'timestamp': 1214870400,
@@ -31,10 +38,10 @@ class ViewsterIE(InfoExtractor):
     }, {
         # series episode, Type=Episode
         'url': 'http://www.viewster.com/serie/1284-19427-001/the-world-and-a-wall/',
-        'md5': 'd5434c80fcfdb61651cc2199a88d6ba3',
+        'md5': '9243079a8531809efe1b089db102c069',
         'info_dict': {
             'id': '1284-19427-001',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'The World and a Wall',
             'description': 'md5:24814cf74d3453fdf5bfef9716d073e3',
             'timestamp': 1428192000,
@@ -59,12 +66,20 @@ class ViewsterIE(InfoExtractor):
             'description': 'md5:e7097a8fc97151e25f085c9eb7a1cdb1',
         },
         'playlist_mincount': 16,
+    }, {
+        # geo restricted series
+        'url': 'https://www.viewster.com/serie/1280-18794-002/',
+        'only_matching': True,
+    }, {
+        # geo restricted video
+        'url': 'https://www.viewster.com/serie/1280-18794-002/what-is-extraterritoriality-lawo/',
+        'only_matching': True,
     }]
 
     _ACCEPT_HEADER = 'application/json, text/javascript, */*; q=0.01'
 
     def _download_json(self, url, video_id, note='Downloading JSON metadata', fatal=True):
-        request = compat_urllib_request.Request(url)
+        request = sanitized_Request(url)
         request.add_header('Accept', self._ACCEPT_HEADER)
         request.add_header('Auth-token', self._AUTH_TOKEN)
         return super(ViewsterIE, self)._download_json(request, video_id, note, fatal=fatal)
@@ -72,9 +87,9 @@ class ViewsterIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = self._match_id(url)
         # Get 'api_token' cookie
-        self._request_webpage(url, video_id)
-        cookies = self._get_cookies(url)
-        self._AUTH_TOKEN = compat_urllib_parse.unquote(cookies['api_token'].value)
+        self._request_webpage(HEADRequest('http://www.viewster.com/'), video_id)
+        cookies = self._get_cookies('http://www.viewster.com/')
+        self._AUTH_TOKEN = compat_urllib_parse_unquote(cookies['api_token'].value)
 
         info = self._download_json(
             'https://public-api.viewster.com/search/%s' % video_id,
@@ -83,10 +98,16 @@ class ViewsterIE(InfoExtractor):
         entry_id = info.get('Id') or info['id']
 
         # an unfinished series has no Type
-        if info.get('Type') in ['Serie', None]:
-            episodes = self._download_json(
-                'https://public-api.viewster.com/series/%s/episodes' % entry_id,
-                video_id, 'Downloading series JSON')
+        if info.get('Type') in ('Serie', None):
+            try:
+                episodes = self._download_json(
+                    'https://public-api.viewster.com/series/%s/episodes' % entry_id,
+                    video_id, 'Downloading series JSON')
+            except ExtractorError as e:
+                if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
+                    self.raise_geo_restricted()
+                else:
+                    raise
             entries = [
                 self.url_result(
                     'http://www.viewster.com/movie/%s' % episode['OriginId'], 'Viewster')
@@ -96,7 +117,8 @@ class ViewsterIE(InfoExtractor):
             return self.playlist_result(entries, video_id, title, description)
 
         formats = []
-        for media_type in ('application/f4m+xml', 'application/x-mpegURL'):
+        manifest_url = None
+        for media_type in ('application/f4m+xml', 'application/x-mpegURL', 'video/mp4'):
             media = self._download_json(
                 'https://public-api.viewster.com/movies/%s/video?mediaType=%s'
                 % (entry_id, compat_urllib_parse.quote(media_type)),
@@ -108,25 +130,52 @@ class ViewsterIE(InfoExtractor):
                 continue
             ext = determine_ext(video_url)
             if ext == 'f4m':
+                manifest_url = video_url
                 video_url += '&' if '?' in video_url else '?'
                 video_url += 'hdcore=3.2.0&plugin=flowplayer-3.2.0.1'
                 formats.extend(self._extract_f4m_formats(
                     video_url, video_id, f4m_id='hds'))
             elif ext == 'm3u8':
-                formats.extend(self._extract_m3u8_formats(
+                manifest_url = video_url
+                m3u8_formats = self._extract_m3u8_formats(
                     video_url, video_id, 'mp4', m3u8_id='hls',
-                    fatal=False  # m3u8 sometimes fail
-                ))
+                    fatal=False)  # m3u8 sometimes fails
+                if m3u8_formats:
+                    formats.extend(m3u8_formats)
             else:
-                formats.append({
-                    'url': video_url,
-                })
+                qualities_basename = self._search_regex(
+                    r'/([^/]+)\.csmil/',
+                    manifest_url, 'qualities basename', default=None)
+                if not qualities_basename:
+                    continue
+                QUALITIES_RE = r'((,\d+k)+,?)'
+                qualities = self._search_regex(
+                    QUALITIES_RE, qualities_basename,
+                    'qualities', default=None)
+                if not qualities:
+                    continue
+                qualities = qualities.strip(',').split(',')
+                http_template = re.sub(QUALITIES_RE, r'%s', qualities_basename)
+                http_url_basename = url_basename(video_url)
+                for q in qualities:
+                    tbr = int_or_none(self._search_regex(
+                        r'(\d+)k', q, 'bitrate', default=None))
+                    formats.append({
+                        'url': video_url.replace(http_url_basename, http_template % q),
+                        'ext': 'mp4',
+                        'format_id': 'http' + ('-%d' % tbr if tbr else ''),
+                        'tbr': tbr,
+                    })
+
+        if not formats and not info.get('LanguageSets') and not info.get('VODSettings'):
+            self.raise_geo_restricted()
+
         self._sort_formats(formats)
 
-        synopsis = info.get('Synopsis', {})
+        synopsis = info.get('Synopsis') or {}
         # Prefer title outside synopsis since it's less messy
         title = (info.get('Title') or synopsis['Title']).strip()
-        description = synopsis.get('Detailed') or info.get('Synopsis', {}).get('Short')
+        description = synopsis.get('Detailed') or (info.get('Synopsis') or {}).get('Short')
         duration = int_or_none(info.get('Duration'))
         timestamp = parse_iso8601(info.get('ReleaseDate'))
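
A toy run of the csmil quality expansion introduced above, with an invented manifest URL; QUALITIES_RE and the '%s' substitution mirror the hunk:

    import re

    QUALITIES_RE = r'((,\d+k)+,?)'
    manifest_url = 'http://example.com/video_,300k,600k,.csmil/playlist.m3u8'

    basename = re.search(r'/([^/]+)\.csmil/', manifest_url).group(1)
    # basename == 'video_,300k,600k,'
    qualities = re.search(QUALITIES_RE, basename).group(1).strip(',').split(',')
    # qualities == ['300k', '600k']
    template = re.sub(QUALITIES_RE, '%s', basename)
    # template == 'video_%s', so each quality yields its own variant name
    variants = [template % q for q in qualities]  # ['video_300k', 'video_600k']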
 
diff --git a/youtube_dl/extractor/viidea.py b/youtube_dl/extractor/viidea.py
new file mode 100644 (file)
index 0000000..a4f914d
--- /dev/null
@@ -0,0 +1,193 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_urlparse,
+    compat_str,
+)
+from ..utils import (
+    parse_duration,
+    js_to_json,
+    parse_iso8601,
+)
+
+
+class ViideaIE(InfoExtractor):
+    _VALID_URL = r'''(?x)https?://(?:www\.)?(?:
+            videolectures\.net|
+            flexilearn\.viidea\.net|
+            presentations\.ocwconsortium\.org|
+            video\.travel-zoom\.si|
+            video\.pomp-forum\.si|
+            tv\.nil\.si|
+            video\.hekovnik\.com|
+            video\.szko\.si|
+            kpk\.viidea\.com|
+            inside\.viidea\.net|
+            video\.kiberpipa\.org|
+            bvvideo\.si|
+            kongres\.viidea\.net|
+            edemokracija\.viidea\.com
+        )(?:/lecture)?/(?P<id>[^/]+)(?:/video/(?P<part>\d+))?/*(?:[#?].*)?$'''
+
+    _TESTS = [{
+        'url': 'http://videolectures.net/promogram_igor_mekjavic_eng/',
+        'info_dict': {
+            'id': '20171',
+            'display_id': 'promogram_igor_mekjavic_eng',
+            'ext': 'mp4',
+            'title': 'Automatics, robotics and biocybernetics',
+            'description': 'md5:815fc1deb6b3a2bff99de2d5325be482',
+            'thumbnail': 're:http://.*\.jpg',
+            'timestamp': 1372349289,
+            'upload_date': '20130627',
+            'duration': 565,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }, {
+        # video with invalid direct format links (HTTP 403)
+        'url': 'http://videolectures.net/russir2010_filippova_nlp/',
+        'info_dict': {
+            'id': '14891',
+            'display_id': 'russir2010_filippova_nlp',
+            'ext': 'flv',
+            'title': 'NLP at Google',
+            'description': 'md5:fc7a6d9bf0302d7cc0e53f7ca23747b3',
+            'thumbnail': 're:http://.*\.jpg',
+            'timestamp': 1284375600,
+            'upload_date': '20100913',
+            'duration': 5352,
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }, {
+        # event playlist
+        'url': 'http://videolectures.net/deeplearning2015_montreal/',
+        'info_dict': {
+            'id': '23181',
+            'title': 'Deep Learning Summer School, Montreal 2015',
+            'description': 'md5:0533a85e4bd918df52a01f0e1ebe87b7',
+            'thumbnail': 're:http://.*\.jpg',
+            'timestamp': 1438560000,
+        },
+        'playlist_count': 30,
+    }, {
+        # multi part lecture
+        'url': 'http://videolectures.net/mlss09uk_bishop_ibi/',
+        'info_dict': {
+            'id': '9737',
+            'display_id': 'mlss09uk_bishop_ibi',
+            'title': 'Introduction To Bayesian Inference',
+            'thumbnail': 're:http://.*\.jpg',
+            'timestamp': 1251622800,
+        },
+        'playlist': [{
+            'info_dict': {
+                'id': '9737_part1',
+                'display_id': 'mlss09uk_bishop_ibi_part1',
+                'ext': 'wmv',
+                'title': 'Introduction To Bayesian Inference (Part 1)',
+                'thumbnail': 're:http://.*\.jpg',
+                'duration': 4622,
+                'timestamp': 1251622800,
+                'upload_date': '20090830',
+            },
+        }, {
+            'info_dict': {
+                'id': '9737_part2',
+                'display_id': 'mlss09uk_bishop_ibi_part2',
+                'ext': 'wmv',
+                'title': 'Introduction To Bayesian Inference (Part 2)',
+                'thumbnail': 're:http://.*\.jpg',
+                'duration': 5641,
+                'timestamp': 1251622800,
+                'upload_date': '20090830',
+            },
+        }],
+        'playlist_count': 2,
+    }]
+
+    def _real_extract(self, url):
+        lecture_slug, explicit_part_id = re.match(self._VALID_URL, url).groups()
+
+        webpage = self._download_webpage(url, lecture_slug)
+
+        cfg = self._parse_json(self._search_regex(
+            [r'cfg\s*:\s*({.+?})\s*,\s*[\da-zA-Z_]+\s*:\s*\(?\s*function',
+             r'cfg\s*:\s*({[^}]+})'],
+            webpage, 'cfg'), lecture_slug, js_to_json)
+
+        lecture_id = compat_str(cfg['obj_id'])
+
+        base_url = self._proto_relative_url(cfg['livepipe'], 'http:')
+
+        lecture_data = self._download_json(
+            '%s/site/api/lecture/%s?format=json' % (base_url, lecture_id),
+            lecture_id)['lecture'][0]
+
+        lecture_info = {
+            'id': lecture_id,
+            'display_id': lecture_slug,
+            'title': lecture_data['title'],
+            'timestamp': parse_iso8601(lecture_data.get('time')),
+            'description': lecture_data.get('description_wiki'),
+            'thumbnail': lecture_data.get('thumb'),
+        }
+
+        playlist_entries = []
+        lecture_type = lecture_data.get('type')
+        parts = [compat_str(video) for video in cfg.get('videos', [])]
+        if parts:
+            multipart = len(parts) > 1
+
+            def extract_part(part_id):
+                smil_url = '%s/%s/video/%s/smil.xml' % (base_url, lecture_slug, part_id)
+                smil = self._download_smil(smil_url, lecture_id)
+                info = self._parse_smil(smil, smil_url, lecture_id)
+                self._sort_formats(info['formats'])
+                info['id'] = lecture_id if not multipart else '%s_part%s' % (lecture_id, part_id)
+                info['display_id'] = lecture_slug if not multipart else '%s_part%s' % (lecture_slug, part_id)
+                if multipart:
+                    info['title'] += ' (Part %s)' % part_id
+                switch = smil.find('.//switch')
+                if switch is not None:
+                    info['duration'] = parse_duration(switch.attrib.get('dur'))
+                item_info = lecture_info.copy()
+                item_info.update(info)
+                return item_info
+
+            if explicit_part_id or not multipart:
+                result = extract_part(explicit_part_id or parts[0])
+            else:
+                result = {
+                    '_type': 'multi_video',
+                    'entries': [extract_part(part) for part in parts],
+                }
+                result.update(lecture_info)
+
+            # Immediately return an explicitly requested part or non-event item
+            if explicit_part_id or lecture_type != 'evt':
+                return result
+
+            playlist_entries.append(result)
+
+        # It's probably a playlist
+        if not parts or lecture_type == 'evt':
+            playlist_webpage = self._download_webpage(
+                '%s/site/ajax/drilldown/?id=%s' % (base_url, lecture_id), lecture_id)
+            entries = [
+                self.url_result(compat_urlparse.urljoin(url, video_url), 'Viidea')
+                for _, video_url in re.findall(
+                    r'<a[^>]+href=(["\'])(.+?)\1[^>]+id=["\']lec=\d+', playlist_webpage)]
+            playlist_entries.extend(entries)
+
+        playlist = self.playlist_result(playlist_entries, lecture_id)
+        playlist.update(lecture_info)
+        return playlist
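
For orientation, a sketch of the multi_video result the new extractor assembles for the mlss09uk_bishop_ibi test case above (values abbreviated from the test data):

    multipart_result = {
        '_type': 'multi_video',
        'id': '9737',
        'display_id': 'mlss09uk_bishop_ibi',
        'title': 'Introduction To Bayesian Inference',
        'timestamp': 1251622800,
        'entries': [
            {'id': '9737_part1', 'display_id': 'mlss09uk_bishop_ibi_part1',
             'title': 'Introduction To Bayesian Inference (Part 1)'},
            {'id': '9737_part2', 'display_id': 'mlss09uk_bishop_ibi_part2',
             'title': 'Introduction To Bayesian Inference (Part 2)'},
        ],
    }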
index ddbd395c89ee183dee719f549cbe534c64835769..e04b814c8cf27755bfe0a86af3d5bf43262bd0da 100644 (file)
@@ -7,14 +7,14 @@ import hmac
 import hashlib
 import itertools
 
+from .common import InfoExtractor
 from ..utils import (
     ExtractorError,
     int_or_none,
     parse_age_limit,
     parse_iso8601,
+    sanitized_Request,
 )
-from ..compat import compat_urllib_request
-from .common import InfoExtractor
 
 
 class VikiBaseIE(InfoExtractor):
@@ -30,6 +30,12 @@ class VikiBaseIE(InfoExtractor):
 
     _token = None
 
+    _ERRORS = {
+        'geo': 'Sorry, this content is not available in your region.',
+        'upcoming': 'Sorry, this content is not yet available.',
+        # 'paywall': 'paywall',
+    }
+
     def _prepare_call(self, path, timestamp=None, post_data=None):
         path += '?' if '?' not in path else '&'
         if not timestamp:
@@ -43,7 +49,7 @@ class VikiBaseIE(InfoExtractor):
             hashlib.sha1
         ).hexdigest()
         url = self._API_URL_TEMPLATE % (query, sig)
-        return compat_urllib_request.Request(
+        return sanitized_Request(
             url, json.dumps(post_data).encode('utf-8')) if post_data else url
 
     def _call_api(self, path, video_id, note, timestamp=None, post_data=None):
@@ -67,6 +73,12 @@ class VikiBaseIE(InfoExtractor):
             '%s returned error: %s' % (self.IE_NAME, error),
             expected=True)
 
+    def _check_errors(self, data):
+        for reason, status in data.get('blocking', {}).items():
+            if status and reason in self._ERRORS:
+                raise ExtractorError('%s said: %s' % (
+                    self.IE_NAME, self._ERRORS[reason]), expected=True)
+
     def _real_initialize(self):
         self._login()
 
@@ -164,13 +176,13 @@ class VikiIE(VikiBaseIE):
     }, {
         # youtube external
         'url': 'http://www.viki.com/videos/50562v-poor-nastya-complete-episode-1',
-        'md5': '216d1afdc0c64d1febc1e9f2bd4b864b',
+        'md5': '63f8600c1da6f01b7640eee7eca4f1da',
         'info_dict': {
             'id': '50562v',
-            'ext': 'mp4',
+            'ext': 'webm',
             'title': 'Poor Nastya [COMPLETE] - Episode 1',
             'description': '',
-            'duration': 607,
+            'duration': 606,
             'timestamp': 1274949505,
             'upload_date': '20101213',
             'uploader': 'ad14065n',
@@ -193,6 +205,7 @@ class VikiIE(VikiBaseIE):
             'timestamp': 1321985454,
             'description': 'md5:44b1e46619df3a072294645c770cef36',
             'title': 'Love In Magic',
+            'age_limit': 13,
         },
     }]
 
@@ -202,6 +215,8 @@ class VikiIE(VikiBaseIE):
         video = self._call_api(
             'videos/%s.json' % video_id, video_id, 'Downloading video JSON')
 
+        self._check_errors(video)
+
         title = self.dict_selection(video.get('titles', {}), 'en')
         if not title:
             title = 'Episode %d' % video.get('number') if video.get('type') == 'episode' else video.get('id') or video_id
@@ -262,8 +277,9 @@ class VikiIE(VikiBaseIE):
                 r'^(\d+)[pP]$', format_id, 'height', default=None))
             for protocol, format_dict in stream_dict.items():
                 if format_id == 'm3u8':
-                    formats = self._extract_m3u8_formats(
-                        format_dict['url'], video_id, 'mp4', m3u8_id='m3u8-%s' % protocol)
+                    formats.extend(self._extract_m3u8_formats(
+                        format_dict['url'], video_id, 'mp4', 'm3u8_native',
+                        m3u8_id='m3u8-%s' % protocol, fatal=False))
                 else:
                     formats.append({
                         'url': format_dict['url'],
@@ -315,6 +331,8 @@ class VikiChannelIE(VikiBaseIE):
             'containers/%s.json' % channel_id, channel_id,
             'Downloading channel JSON')
 
+        self._check_errors(channel)
+
         title = self.dict_selection(channel['titles'], 'en')
 
         description = self.dict_selection(channel['descriptions'], 'en')
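
The blocking-reason check added above, as a standalone sketch with an invented payload (ValueError stands in for ExtractorError):

    ERRORS = {
        'geo': 'Sorry, this content is not available in your region.',
        'upcoming': 'Sorry, this content is not yet available.',
    }

    def check_errors(data):
        # Raise on the first blocking reason we know how to describe.
        for reason, status in data.get('blocking', {}).items():
            if status and reason in ERRORS:
                raise ValueError('viki said: %s' % ERRORS[reason])

    check_errors({'blocking': {'paywall': True}})  # unknown reason: no error
    check_errors({'blocking': {'geo': True}})      # raises the geo message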
index 10d6745af703e00d6962d3e14c8b01f2419ad955..59f9cb1ae49ab0adb7fcc62ec81b12c30e652b28 100644 (file)
@@ -8,27 +8,29 @@ import itertools
 from .common import InfoExtractor
 from ..compat import (
     compat_HTTPError,
-    compat_urllib_parse,
-    compat_urllib_request,
     compat_urlparse,
 )
 from ..utils import (
+    determine_ext,
     ExtractorError,
     InAdvancePagedList,
     int_or_none,
     RegexNotFoundError,
+    sanitized_Request,
     smuggle_url,
     std_headers,
     unified_strdate,
     unsmuggle_url,
     urlencode_postdata,
     unescapeHTML,
+    parse_filesize,
 )
 
 
 class VimeoBaseInfoExtractor(InfoExtractor):
     _NETRC_MACHINE = 'vimeo'
     _LOGIN_REQUIRED = False
+    _LOGIN_URL = 'https://vimeo.com/log_in'
 
     def _login(self):
         (username, password) = self._get_login_info()
@@ -37,36 +39,60 @@ class VimeoBaseInfoExtractor(InfoExtractor):
                 raise ExtractorError('No login info available, needed for using %s.' % self.IE_NAME, expected=True)
             return
         self.report_login()
-        login_url = 'https://vimeo.com/log_in'
-        webpage = self._download_webpage(login_url, None, False)
-        token = self._search_regex(r'xsrft":"(.*?)"', webpage, 'login token')
+        webpage = self._download_webpage(self._LOGIN_URL, None, False)
+        token, vuid = self._extract_xsrft_and_vuid(webpage)
         data = urlencode_postdata({
+            'action': 'login',
             'email': username,
             'password': password,
-            'action': 'login',
             'service': 'vimeo',
             'token': token,
         })
-        login_request = compat_urllib_request.Request(login_url, data)
+        login_request = sanitized_Request(self._LOGIN_URL, data)
         login_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        login_request.add_header('Cookie', 'xsrft=%s' % token)
+        login_request.add_header('Referer', self._LOGIN_URL)
+        self._set_vimeo_cookie('vuid', vuid)
         self._download_webpage(login_request, None, False, 'Wrong login info')
 
+    def _extract_xsrft_and_vuid(self, webpage):
+        xsrft = self._search_regex(
+            r'(?:(?P<q1>["\'])xsrft(?P=q1)\s*:|xsrft\s*[=:])\s*(?P<q>["\'])(?P<xsrft>.+?)(?P=q)',
+            webpage, 'login token', group='xsrft')
+        vuid = self._search_regex(
+            r'["\']vuid["\']\s*:\s*(["\'])(?P<vuid>.+?)\1',
+            webpage, 'vuid', group='vuid')
+        return xsrft, vuid
+
+    def _set_vimeo_cookie(self, name, value):
+        self._set_cookie('vimeo.com', name, value)
+
 
 class VimeoIE(VimeoBaseInfoExtractor):
     """Information extractor for vimeo.com."""
 
     # _VALID_URL matches Vimeo URLs
     _VALID_URL = r'''(?x)
-        https?://
-        (?:(?:www|(?P<player>player))\.)?
-        vimeo(?P<pro>pro)?\.com/
-        (?!channels/[^/?#]+/?(?:$|[?#])|album/)
-        (?:.*?/)?
-        (?:(?:play_redirect_hls|moogaloop\.swf)\?clip_id=)?
-        (?:videos?/)?
-        (?P<id>[0-9]+)
-        /?(?:[?&].*)?(?:[#].*)?$'''
+                    https?://
+                        (?:
+                            (?:
+                                www|
+                                (?P<player>player)
+                            )
+                            \.
+                        )?
+                        vimeo(?P<pro>pro)?\.com/
+                        (?!channels/[^/?#]+/?(?:$|[?#])|[^/]+/review/|(?:album|ondemand)/)
+                        (?:.*?/)?
+                        (?:
+                            (?:
+                                play_redirect_hls|
+                                moogaloop\.swf)\?clip_id=
+                            )?
+                        (?:videos?/)?
+                        (?P<id>[0-9]+)
+                        (?:/[\da-f]+)?
+                        /?(?:[?&].*)?(?:[#].*)?$
+                    '''
     IE_NAME = 'vimeo'
     _TESTS = [
         {
@@ -75,12 +101,13 @@ class VimeoIE(VimeoBaseInfoExtractor):
             'info_dict': {
                 'id': '56015672',
                 'ext': 'mp4',
-                "upload_date": "20121220",
-                "description": "This is a test case for youtube-dl.\nFor more information, see github.com/rg3/youtube-dl\nTest chars: \u2605 \" ' \u5e78 / \\ \u00e4 \u21ad \U0001d550",
-                "uploader_id": "user7108434",
-                "uploader": "Filippo Valsorda",
-                "title": "youtube-dl test video - \u2605 \" ' \u5e78 / \\ \u00e4 \u21ad \U0001d550",
-                "duration": 10,
+                'title': "youtube-dl test video - \u2605 \" ' \u5e78 / \\ \u00e4 \u21ad \U0001d550",
+                'description': 'md5:2d3305bad981a06ff79f027f19865021',
+                'upload_date': '20121220',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user7108434',
+                'uploader_id': 'user7108434',
+                'uploader': 'Filippo Valsorda',
+                'duration': 10,
             },
         },
         {
@@ -90,10 +117,11 @@ class VimeoIE(VimeoBaseInfoExtractor):
             'info_dict': {
                 'id': '68093876',
                 'ext': 'mp4',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/openstreetmapus',
                 'uploader_id': 'openstreetmapus',
                 'uploader': 'OpenStreetMap US',
                 'title': 'Andy Allan - Putting the Carto into OpenStreetMap Cartography',
-                'description': 'md5:380943ec71b89736ff4bf27183233d09',
+                'description': 'md5:fd69a7b8d8c34a4e1d2ec2e4afd6ec30',
                 'duration': 1595,
             },
         },
@@ -106,6 +134,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'ext': 'mp4',
                 'title': 'Kathy Sierra: Building the minimum Badass User, Business of Software 2012',
                 'uploader': 'The BLN & Business of Software',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/theblnbusinessofsoftware',
                 'uploader_id': 'theblnbusinessofsoftware',
                 'duration': 3610,
                 'description': None,
@@ -120,10 +149,11 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'ext': 'mp4',
                 'title': 'youtube-dl password protected test video',
                 'upload_date': '20130614',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user18948128',
                 'uploader_id': 'user18948128',
                 'uploader': 'Jaime Marquínez Ferrándiz',
                 'duration': 10,
-                'description': 'This is "youtube-dl password protected test video" by Jaime Marquínez Ferrándiz on Vimeo, the home for high quality videos and the people who love them.',
+                'description': 'This is "youtube-dl password protected test video" by Jaime Marquínez Ferrándiz on Vimeo, the home for high quality videos and the people\u2026',
             },
             'params': {
                 'videopassword': 'youtube-dl',
@@ -139,6 +169,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'ext': 'mp4',
                 'title': 'Key & Peele: Terrorist Interrogation',
                 'description': 'md5:8678b246399b070816b12313e8b4eb5c',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/atencio',
                 'uploader_id': 'atencio',
                 'uploader': 'Peter Atencio',
                 'upload_date': '20130927',
@@ -147,7 +178,6 @@ class VimeoIE(VimeoBaseInfoExtractor):
         },
         {
             'url': 'http://vimeo.com/76979871',
-            'md5': '3363dd6ffebe3784d56f4132317fd446',
             'note': 'Video with subtitles',
             'info_dict': {
                 'id': '76979871',
@@ -155,6 +185,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'title': 'The New Vimeo Player (You Know, For Videos)',
                 'description': 'md5:2ec900bf97c3f389378a96aee11260ea',
                 'upload_date': '20131015',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/staff',
                 'uploader_id': 'staff',
                 'uploader': 'Vimeo Staff',
                 'duration': 62,
@@ -169,9 +200,43 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'ext': 'mp4',
                 'title': 'Pier Solar OUYA Official Trailer',
                 'uploader': 'Tulio Gonçalves',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user28849593',
                 'uploader_id': 'user28849593',
             },
         },
+        {
+            # contains original format
+            'url': 'https://vimeo.com/33951933',
+            'md5': '53c688fa95a55bf4b7293d37a89c5c53',
+            'info_dict': {
+                'id': '33951933',
+                'ext': 'mp4',
+                'title': 'FOX CLASSICS - Forever Classic ID - A Full Minute',
+                'uploader': 'The DMCI',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/dmci',
+                'uploader_id': 'dmci',
+                'upload_date': '20111220',
+                'description': 'md5:ae23671e82d05415868f7ad1aec21147',
+            },
+        },
+        {
+            'url': 'https://vimeo.com/109815029',
+            'note': 'Video not completely processed, "failed" seed status',
+            'only_matching': True,
+        },
+        {
+            'url': 'https://vimeo.com/groups/travelhd/videos/22439234',
+            'only_matching': True,
+        },
+        {
+            # source file returns 403: Forbidden
+            'url': 'https://vimeo.com/7809605',
+            'only_matching': True,
+        },
+        {
+            'url': 'https://vimeo.com/160743502/abd0e13fb4',
+            'only_matching': True,
+        }
     ]
 
     @staticmethod
@@ -181,7 +246,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
             r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.vimeo\.com/video/.+?)\1', webpage)
         if mobj:
             player_url = unescapeHTML(mobj.group('url'))
-            surl = smuggle_url(player_url, {'Referer': url})
+            surl = smuggle_url(player_url, {'http_headers': {'Referer': url}})
             return surl
         # Look for embedded (swf embed) Vimeo player
         mobj = re.search(
@@ -190,10 +255,10 @@ class VimeoIE(VimeoBaseInfoExtractor):
             return mobj.group(1)
 
     def _verify_video_password(self, url, video_id, webpage):
-        password = self._downloader.params.get('videopassword', None)
+        password = self._downloader.params.get('videopassword')
         if password is None:
             raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
-        token = self._search_regex(r'xsrft[\s=:"\']+([^"\']+)', webpage, 'login token')
+        token, vuid = self._extract_xsrft_and_vuid(webpage)
         data = urlencode_postdata({
             'password': password,
             'token': token,
@@ -201,35 +266,35 @@ class VimeoIE(VimeoBaseInfoExtractor):
         if url.startswith('http://'):
             # vimeo only supports https now, but the user can give an http url
             url = url.replace('http://', 'https://')
-        password_request = compat_urllib_request.Request(url + '/password', data)
+        password_request = sanitized_Request(url + '/password', data)
         password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        password_request.add_header('Cookie', 'xsrft=%s' % token)
+        password_request.add_header('Referer', url)
+        self._set_vimeo_cookie('vuid', vuid)
         return self._download_webpage(
             password_request, video_id,
             'Verifying the password', 'Wrong password')
 
     def _verify_player_video_password(self, url, video_id):
-        password = self._downloader.params.get('videopassword', None)
+        password = self._downloader.params.get('videopassword')
         if password is None:
             raise ExtractorError('This video is protected by a password, use the --video-password option')
-        data = compat_urllib_parse.urlencode({'password': password})
+        data = urlencode_postdata({'password': password})
         pass_url = url + '/check-password'
-        password_request = compat_urllib_request.Request(pass_url, data)
+        password_request = sanitized_Request(pass_url, data)
         password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
+        password_request.add_header('Referer', url)
         return self._download_json(
             password_request, video_id,
-            'Verifying the password',
-            'Wrong password')
+            'Verifying the password', 'Wrong password')
 
     def _real_initialize(self):
         self._login()
 
     def _real_extract(self, url):
-        url, data = unsmuggle_url(url)
-        headers = std_headers
-        if data is not None:
-            headers = headers.copy()
-            headers.update(data)
+        url, data = unsmuggle_url(url, {})
+        headers = std_headers.copy()
+        if 'http_headers' in data:
+            headers.update(data['http_headers'])
         if 'Referer' not in headers:
             headers['Referer'] = url
 
@@ -243,7 +308,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
             url = 'https://vimeo.com/' + video_id
 
         # Retrieve video webpage to extract further information
-        request = compat_urllib_request.Request(url, None, headers)
+        request = sanitized_Request(url, headers=headers)
         try:
             webpage = self._download_webpage(request, video_id)
         except ExtractorError as ee:
@@ -263,20 +328,30 @@ class VimeoIE(VimeoBaseInfoExtractor):
         self.report_extraction(video_id)
 
         vimeo_config = self._search_regex(
-            r'vimeo\.config\s*=\s*({.+?});', webpage,
+            r'vimeo\.config\s*=\s*(?:({.+?})|_extend\([^,]+,\s+({.+?})\));', webpage,
             'vimeo config', default=None)
         if vimeo_config:
             seed_status = self._parse_json(vimeo_config, video_id).get('seed_status', {})
             if seed_status.get('state') == 'failed':
                 raise ExtractorError(
-                    '%s returned error: %s' % (self.IE_NAME, seed_status['title']),
+                    '%s said: %s' % (self.IE_NAME, seed_status['title']),
                     expected=True)
 
         # Extract the config JSON
         try:
             try:
                 config_url = self._html_search_regex(
-                    r' data-config-url="(.+?)"', webpage, 'config URL')
+                    r' data-config-url="(.+?)"', webpage,
+                    'config URL', default=None)
+                if not config_url:
+                    # Sometimes a new react-based page is served instead of the old one, which
+                    # requires a different config URL extraction approach (see
+                    # https://github.com/rg3/youtube-dl/pull/7209)
+                    vimeo_clip_page_config = self._search_regex(
+                        r'vimeo\.clip_page_config\s*=\s*({.+?});', webpage,
+                        'vimeo clip page config')
+                    config_url = self._parse_json(
+                        vimeo_clip_page_config, video_id)['player']['config_url']
                 config_json = self._download_webpage(config_url, video_id)
                 config = json.loads(config_json)
             except RegexNotFoundError:
@@ -295,7 +370,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 raise ExtractorError('The author has restricted access to this video; try using the "--referer" option')
 
             if re.search(r'<form[^>]+?id="pw_form"', webpage) is not None:
-                if data and '_video_password_verified' in data:
+                if '_video_password_verified' in data:
                     raise ExtractorError('video password verification failed!')
                 self._verify_video_password(url, video_id, webpage)
                 return self._real_extract(
@@ -307,17 +382,25 @@ class VimeoIE(VimeoBaseInfoExtractor):
             if config.get('view') == 4:
                 config = self._verify_player_video_password(url, video_id)
 
+        if '>You rented this title.<' in webpage:
+            feature_id = config.get('video', {}).get('vod', {}).get('feature_id')
+            if feature_id and not data.get('force_feature_id', False):
+                return self.url_result(smuggle_url(
+                    'https://player.vimeo.com/player/%s' % feature_id,
+                    {'force_feature_id': True}), 'Vimeo')
+
         # Extract title
-        video_title = config["video"]["title"]
+        video_title = config['video']['title']
 
-        # Extract uploader and uploader_id
-        video_uploader = config["video"]["owner"]["name"]
-        video_uploader_id = config["video"]["owner"]["url"].split('/')[-1] if config["video"]["owner"]["url"] else None
+        # Extract uploader, uploader_url and uploader_id
+        video_uploader = config['video'].get('owner', {}).get('name')
+        video_uploader_url = config['video'].get('owner', {}).get('url')
+        video_uploader_id = video_uploader_url.split('/')[-1] if video_uploader_url else None
 
         # Extract video thumbnail
-        video_thumbnail = config["video"].get("thumbnail")
+        video_thumbnail = config['video'].get('thumbnail')
         if video_thumbnail is None:
-            video_thumbs = config["video"].get("thumbs")
+            video_thumbs = config['video'].get('thumbs')
             if video_thumbs and isinstance(video_thumbs, dict):
                 _, video_thumbnail = sorted((int(width if width.isdigit() else 0), t_url) for (width, t_url) in video_thumbs.items())[-1]
 
@@ -341,7 +424,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
             self._downloader.report_warning('Cannot find video description')
 
         # Extract video duration
-        video_duration = int_or_none(config["video"].get("duration"))
+        video_duration = int_or_none(config['video'].get('duration'))
 
         # Extract upload date
         video_upload_date = None
@@ -359,41 +442,47 @@ class VimeoIE(VimeoBaseInfoExtractor):
             like_count = None
             comment_count = None
 
-        # Vimeo specific: extract request signature and timestamp
-        sig = config['request']['signature']
-        timestamp = config['request']['timestamp']
-
-        # Vimeo specific: extract video codec and quality information
-        # First consider quality, then codecs, then take everything
-        codecs = [('vp6', 'flv'), ('vp8', 'flv'), ('h264', 'mp4')]
-        files = {'hd': [], 'sd': [], 'other': []}
-        config_files = config["video"].get("files") or config["request"].get("files")
-        for codec_name, codec_extension in codecs:
-            for quality in config_files.get(codec_name, []):
-                format_id = '-'.join((codec_name, quality)).lower()
-                key = quality if quality in files else 'other'
-                video_url = None
-                if isinstance(config_files[codec_name], dict):
-                    file_info = config_files[codec_name][quality]
-                    video_url = file_info.get('url')
-                else:
-                    file_info = {}
-                if video_url is None:
-                    video_url = "http://player.vimeo.com/play_redirect?clip_id=%s&sig=%s&time=%s&quality=%s&codecs=%s&type=moogaloop_local&embed_location=" \
-                        % (video_id, sig, timestamp, quality, codec_name.upper())
-
-                files[key].append({
-                    'ext': codec_extension,
-                    'url': video_url,
-                    'format_id': format_id,
-                    'width': file_info.get('width'),
-                    'height': file_info.get('height'),
-                })
         formats = []
-        for key in ('other', 'sd', 'hd'):
-            formats += files[key]
-        if len(formats) == 0:
-            raise ExtractorError('No known codec found')
+        download_request = sanitized_Request('https://vimeo.com/%s?action=load_download_config' % video_id, headers={
+            'X-Requested-With': 'XMLHttpRequest'})
+        download_data = self._download_json(download_request, video_id, fatal=False)
+        if download_data:
+            source_file = download_data.get('source_file')
+            if isinstance(source_file, dict):
+                download_url = source_file.get('download_url')
+                if download_url and not source_file.get('is_cold') and not source_file.get('is_defrosting'):
+                    source_name = source_file.get('public_name', 'Original')
+                    if self._is_valid_url(download_url, video_id, '%s video' % source_name):
+                        ext = source_file.get('extension', determine_ext(download_url)).lower()
+                        formats.append({
+                            'url': download_url,
+                            'ext': ext,
+                            'width': int_or_none(source_file.get('width')),
+                            'height': int_or_none(source_file.get('height')),
+                            'filesize': parse_filesize(source_file.get('size')),
+                            'format_id': source_name,
+                            'preference': 1,
+                        })
+        config_files = config['video'].get('files') or config['request'].get('files', {})
+        for f in config_files.get('progressive', []):
+            video_url = f.get('url')
+            if not video_url:
+                continue
+            formats.append({
+                'url': video_url,
+                'format_id': 'http-%s' % f.get('quality'),
+                'width': int_or_none(f.get('width')),
+                'height': int_or_none(f.get('height')),
+                'fps': int_or_none(f.get('fps')),
+                'tbr': int_or_none(f.get('bitrate')),
+            })
+        m3u8_url = config_files.get('hls', {}).get('url')
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+        # Bitrates are completely broken. A single m3u8 may contain entries in kbps and bps
+        # at the same time without actual units specified. This leads to wrong sorting.
+        self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'format_id'))
 
         subtitles = {}
         text_tracks = config['request'].get('text_tracks')
@@ -407,6 +496,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
         return {
             'id': video_id,
             'uploader': video_uploader,
+            'uploader_url': video_uploader_url,
             'uploader_id': video_uploader_id,
             'upload_date': video_upload_date,
             'title': video_title,
@@ -422,10 +512,43 @@ class VimeoIE(VimeoBaseInfoExtractor):
         }
 
 
-class VimeoChannelIE(InfoExtractor):
+class VimeoOndemandIE(VimeoBaseInfoExtractor):
+    IE_NAME = 'vimeo:ondemand'
+    _VALID_URL = r'https?://(?:www\.)?vimeo\.com/ondemand/(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        # ondemand video not available via https://vimeo.com/id
+        'url': 'https://vimeo.com/ondemand/20704',
+        'md5': 'c424deda8c7f73c1dfb3edd7630e2f35',
+        'info_dict': {
+            'id': '105442900',
+            'ext': 'mp4',
+            'title': 'המעבדה - במאי יותם פלדמן',
+            'uploader': 'גם סרטים',
+            'uploader_url': 're:https?://(?:www\.)?vimeo\.com/gumfilms',
+            'uploader_id': 'gumfilms',
+        },
+    }, {
+        'url': 'https://vimeo.com/ondemand/nazmaalik',
+        'only_matching': True,
+    }, {
+        'url': 'https://vimeo.com/ondemand/141692381',
+        'only_matching': True,
+    }, {
+        'url': 'https://vimeo.com/ondemand/thelastcolony/150274832',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        return self.url_result(self._og_search_video_url(webpage), VimeoIE.ie_key())
+
+
+class VimeoChannelIE(VimeoBaseInfoExtractor):
     IE_NAME = 'vimeo:channel'
     _VALID_URL = r'https://vimeo\.com/channels/(?P<id>[^/?#]+)/?(?:$|[?#])'
     _MORE_PAGES_INDICATOR = r'<a.+?rel="next"'
+    _TITLE = None
     _TITLE_RE = r'<link rel="alternate"[^>]+?title="(.*?)"'
     _TESTS = [{
         'url': 'https://vimeo.com/channels/tributes',
@@ -440,7 +563,7 @@ class VimeoChannelIE(InfoExtractor):
         return '%s/videos/page:%d/' % (base_url, pagenum)
 
     def _extract_list_title(self, webpage):
-        return self._html_search_regex(self._TITLE_RE, webpage, 'list title')
+        return self._TITLE or self._html_search_regex(self._TITLE_RE, webpage, 'list title')
 
     def _login_list_password(self, page_url, list_id, webpage):
         login_form = self._search_regex(
@@ -449,27 +572,27 @@ class VimeoChannelIE(InfoExtractor):
         if not login_form:
             return webpage
 
-        password = self._downloader.params.get('videopassword', None)
+        password = self._downloader.params.get('videopassword')
         if password is None:
             raise ExtractorError('This album is protected by a password, use the --video-password option', expected=True)
         fields = self._hidden_inputs(login_form)
-        token = self._search_regex(r'xsrft[\s=:"\']+([^"\']+)', webpage, 'login token')
+        token, vuid = self._extract_xsrft_and_vuid(webpage)
         fields['token'] = token
         fields['password'] = password
         post = urlencode_postdata(fields)
         password_path = self._search_regex(
             r'action="([^"]+)"', login_form, 'password URL')
         password_url = compat_urlparse.urljoin(page_url, password_path)
-        password_request = compat_urllib_request.Request(password_url, post)
+        password_request = sanitized_Request(password_url, post)
         password_request.add_header('Content-type', 'application/x-www-form-urlencoded')
-        self._set_cookie('vimeo.com', 'xsrft', token)
+        self._set_vimeo_cookie('vuid', vuid)
+        self._set_vimeo_cookie('xsrft', token)
 
         return self._download_webpage(
             password_request, list_id,
             'Verifying the password', 'Wrong password')
 
-    def _extract_videos(self, list_id, base_url):
-        video_ids = []
+    def _title_and_entries(self, list_id, base_url):
         for pagenum in itertools.count(1):
             page_url = self._page_url(base_url, pagenum)
             webpage = self._download_webpage(
@@ -478,18 +601,18 @@ class VimeoChannelIE(InfoExtractor):
 
             if pagenum == 1:
                 webpage = self._login_list_password(page_url, list_id, webpage)
+                yield self._extract_list_title(webpage)
+
+            for video_id in re.findall(r'id="clip_(\d+?)"', webpage):
+                yield self.url_result('https://vimeo.com/%s' % video_id, 'Vimeo')
 
-            video_ids.extend(re.findall(r'id="clip_(\d+?)"', webpage))
             if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
                 break
 
-        entries = [self.url_result('https://vimeo.com/%s' % video_id, 'Vimeo')
-                   for video_id in video_ids]
-        return {'_type': 'playlist',
-                'id': list_id,
-                'title': self._extract_list_title(webpage),
-                'entries': entries,
-                }
+    def _extract_videos(self, list_id, base_url):
+        title_and_entries = self._title_and_entries(list_id, base_url)
+        list_title = next(title_and_entries)
+        return self.playlist_result(title_and_entries, list_id, list_title)
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
@@ -499,7 +622,7 @@ class VimeoChannelIE(InfoExtractor):
 
 class VimeoUserIE(VimeoChannelIE):
     IE_NAME = 'vimeo:user'
-    _VALID_URL = r'https://vimeo\.com/(?![0-9]+(?:$|[?#/]))(?P<name>[^/]+)(?:/videos|[#?]|$)'
+    _VALID_URL = r'https://vimeo\.com/(?!(?:[0-9]+|watchlater)(?:$|[?#/]))(?P<name>[^/]+)(?:/videos|[#?]|$)'
     _TITLE_RE = r'<a[^>]+?class="user">([^<>]+?)</a>'
     _TESTS = [{
         'url': 'https://vimeo.com/nkistudio/videos',
@@ -550,7 +673,7 @@ class VimeoAlbumIE(VimeoChannelIE):
 
 class VimeoGroupsIE(VimeoAlbumIE):
     IE_NAME = 'vimeo:group'
-    _VALID_URL = r'https://vimeo\.com/groups/(?P<name>[^/]+)'
+    _VALID_URL = r'https://vimeo\.com/groups/(?P<name>[^/]+)(?:/(?!videos?/\d+)|$)'
     _TESTS = [{
         'url': 'https://vimeo.com/groups/rolexawards',
         'info_dict': {
@@ -603,14 +726,14 @@ class VimeoReviewIE(InfoExtractor):
         return self.url_result(player_url, 'Vimeo', video_id)
 
 
-class VimeoWatchLaterIE(VimeoBaseInfoExtractor, VimeoChannelIE):
+class VimeoWatchLaterIE(VimeoChannelIE):
     IE_NAME = 'vimeo:watchlater'
     IE_DESC = 'Vimeo watch later list, "vimeowatchlater" keyword (requires authentication)'
-    _VALID_URL = r'https://vimeo\.com/home/watchlater|:vimeowatchlater'
+    _VALID_URL = r'https://vimeo\.com/(?:home/)?watchlater|:vimeowatchlater'
+    _TITLE = 'Watch Later'
     _LOGIN_REQUIRED = True
-    _TITLE_RE = r'href="/home/watchlater".*?>(.*?)<'
     _TESTS = [{
-        'url': 'https://vimeo.com/home/watchlater',
+        'url': 'https://vimeo.com/watchlater',
         'only_matching': True,
     }]
 
@@ -619,14 +742,14 @@ class VimeoWatchLaterIE(VimeoBaseInfoExtractor, VimeoChannelIE):
 
     def _page_url(self, base_url, pagenum):
         url = '%s/page:%d/' % (base_url, pagenum)
-        request = compat_urllib_request.Request(url)
+        request = sanitized_Request(url)
         # Set the header to get a partial HTML page with the ids;
         # the normal page doesn't contain them.
         request.add_header('X-Requested-With', 'XMLHttpRequest')
         return request
 
     def _real_extract(self, url):
-        return self._extract_videos('watchlater', 'https://vimeo.com/home/watchlater')
+        return self._extract_videos('watchlater', 'https://vimeo.com/watchlater')
 
 
 class VimeoLikesIE(InfoExtractor):
@@ -636,10 +759,10 @@ class VimeoLikesIE(InfoExtractor):
     _TEST = {
         'url': 'https://vimeo.com/user755559/likes/',
         'playlist_mincount': 293,
-        "info_dict": {
+        'info_dict': {
             'id': 'user755559_likes',
-            "description": "See all the videos urza likes",
-            "title": 'Videos urza likes',
+            'description': 'See all the videos urza likes',
+            'title': 'Videos urza likes',
         },
     }
 
index c733a48fa26edce6b219d3bf2404267b8c346bf6..a6a6cc47955f6aae482d8022bf52d4d233fb955f 100644 (file)
@@ -1,10 +1,14 @@
+# coding: utf-8
 from __future__ import unicode_literals
 
 import re
 import itertools
 
 from .common import InfoExtractor
-from ..utils import unified_strdate
+from ..utils import (
+    int_or_none,
+    unified_strdate,
+)
 
 
 class VineIE(InfoExtractor):
@@ -17,10 +21,12 @@ class VineIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'Chicken.',
             'alt_title': 'Vine by Jack Dorsey',
-            'description': 'Chicken.',
             'upload_date': '20130519',
             'uploader': 'Jack Dorsey',
             'uploader_id': '76',
+            'like_count': int,
+            'comment_count': int,
+            'repost_count': int,
         },
     }, {
         'url': 'https://vine.co/v/MYxVapFvz2z',
@@ -29,11 +35,13 @@ class VineIE(InfoExtractor):
             'id': 'MYxVapFvz2z',
             'ext': 'mp4',
             'title': 'Fuck Da Police #Mikebrown #justice #ferguson #prayforferguson #protesting #NMOS14',
-            'alt_title': 'Vine by Luna',
-            'description': 'Fuck Da Police #Mikebrown #justice #ferguson #prayforferguson #protesting #NMOS14',
+            'alt_title': 'Vine by Mars Ruiz',
             'upload_date': '20140815',
-            'uploader': 'Luna',
+            'uploader': 'Mars Ruiz',
             'uploader_id': '1102363502380728320',
+            'like_count': int,
+            'comment_count': int,
+            'repost_count': int,
         },
     }, {
         'url': 'https://vine.co/v/bxVjBbZlPUH',
@@ -43,14 +51,33 @@ class VineIE(InfoExtractor):
             'ext': 'mp4',
             'title': '#mw3 #ac130 #killcam #angelofdeath',
             'alt_title': 'Vine by Z3k3',
-            'description': '#mw3 #ac130 #killcam #angelofdeath',
             'upload_date': '20130430',
             'uploader': 'Z3k3',
             'uploader_id': '936470460173008896',
+            'like_count': int,
+            'comment_count': int,
+            'repost_count': int,
         },
     }, {
         'url': 'https://vine.co/oembed/MYxVapFvz2z.json',
         'only_matching': True,
+    }, {
+        'url': 'https://vine.co/v/e192BnZnZ9V',
+        'info_dict': {
+            'id': 'e192BnZnZ9V',
+            'ext': 'mp4',
+            'title': 'ยิ้ม~ เขิน~ อาย~ น่าร้ากอ้ะ >//< @n_whitewo @orlameena #lovesicktheseries  #lovesickseason2',
+            'alt_title': 'Vine by Pimry_zaa',
+            'upload_date': '20150705',
+            'uploader': 'Pimry_zaa',
+            'uploader_id': '1135760698325307392',
+            'like_count': int,
+            'comment_count': int,
+            'repost_count': int,
+        },
+        'params': {
+            'skip_download': True,
+        },
     }]
 
     def _real_extract(self, url):
@@ -58,32 +85,33 @@ class VineIE(InfoExtractor):
         webpage = self._download_webpage('https://vine.co/v/' + video_id, video_id)
 
         data = self._parse_json(
-            self._html_search_regex(
-                r'window\.POST_DATA = { %s: ({.+?}) };\s*</script>' % video_id,
+            self._search_regex(
+                r'window\.POST_DATA\s*=\s*{\s*%s\s*:\s*({.+?})\s*};\s*</script>' % video_id,
                 webpage, 'vine data'),
             video_id)
 
         formats = [{
             'format_id': '%(format)s-%(rate)s' % f,
-            'vcodec': f['format'],
-            'quality': f['rate'],
+            'vcodec': f.get('format'),
+            'quality': f.get('rate'),
             'url': f['videoUrl'],
-        } for f in data['videoUrls']]
+        } for f in data['videoUrls'] if f.get('videoUrl')]
 
         self._sort_formats(formats)
 
+        username = data.get('username')
+
         return {
             'id': video_id,
-            'title': self._og_search_title(webpage),
-            'alt_title': self._og_search_description(webpage, default=None),
-            'description': data['description'],
-            'thumbnail': data['thumbnailUrl'],
-            'upload_date': unified_strdate(data['created']),
-            'uploader': data['username'],
-            'uploader_id': data['userIdStr'],
-            'like_count': data['likes']['count'],
-            'comment_count': data['comments']['count'],
-            'repost_count': data['reposts']['count'],
+            'title': data.get('description') or self._og_search_title(webpage),
+            'alt_title': 'Vine by %s' % username if username else self._og_search_description(webpage, default=None),
+            'thumbnail': data.get('thumbnailUrl'),
+            'upload_date': unified_strdate(data.get('created')),
+            'uploader': username,
+            'uploader_id': data.get('userIdStr'),
+            'like_count': int_or_none(data.get('likes', {}).get('count')),
+            'comment_count': int_or_none(data.get('comments', {}).get('count')),
+            'repost_count': int_or_none(data.get('reposts', {}).get('count')),
             'formats': formats,
         }
 
@@ -91,7 +119,7 @@ class VineIE(InfoExtractor):
 class VineUserIE(InfoExtractor):
     IE_NAME = 'vine:user'
     _VALID_URL = r'(?:https?://)?vine\.co/(?P<u>u/)?(?P<user>[^/]+)/?(\?.*)?$'
-    _VINE_BASE_URL = "https://vine.co/"
+    _VINE_BASE_URL = 'https://vine.co/'
     _TESTS = [
         {
             'url': 'https://vine.co/Visa',
@@ -111,7 +139,7 @@ class VineUserIE(InfoExtractor):
         user = mobj.group('user')
         u = mobj.group('u')
 
-        profile_url = "%sapi/users/profiles/%s%s" % (
+        profile_url = '%sapi/users/profiles/%s%s' % (
             self._VINE_BASE_URL, 'vanity/' if not u else '', user)
         profile_data = self._download_json(
             profile_url, user, note='Downloading user profile data')
@@ -119,7 +147,7 @@ class VineUserIE(InfoExtractor):
         user_id = profile_data['data']['userId']
         timeline_data = []
         for pagenum in itertools.count(1):
-            timeline_url = "%sapi/timelines/users/%s?page=%s&size=100" % (
+            timeline_url = '%sapi/timelines/users/%s?page=%s&size=100' % (
                 self._VINE_BASE_URL, user_id, pagenum)
             timeline_page = self._download_json(
                 timeline_url, user, note='Downloading page %d' % pagenum)
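
The defensive counter lookups used in the vine.py return dict above, in isolation; int_or_none is the real youtube_dl.utils helper, the payloads are invented:

    from youtube_dl.utils import int_or_none

    def count_of(data, key):
        # Missing keys and missing counts both degrade to None.
        return int_or_none(data.get(key, {}).get('count'))

    assert count_of({'likes': {'count': '12'}}, 'likes') == 12
    assert count_of({}, 'likes') is None
    assert count_of({'likes': {}}, 'likes') is None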
index c30c5a8e524324a29d7c4a2dad49d5b9d50dda30..67220f1b7a991e48494adf24c791317e29eda8cd 100644 (file)
@@ -5,18 +5,19 @@ import re
 import json
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_str,
-    compat_urllib_parse,
-    compat_urllib_request,
-)
+from ..compat import compat_str
 from ..utils import (
     ExtractorError,
+    int_or_none,
     orderedSet,
+    sanitized_Request,
     str_to_int,
     unescapeHTML,
     unified_strdate,
+    urlencode_postdata,
 )
+from .vimeo import VimeoIE
+from .pladform import PladformIE
 
 
 class VKIE(InfoExtractor):
@@ -139,16 +140,29 @@ class VKIE(InfoExtractor):
             'url': 'https://vk.com/video276849682_170681728',
             'info_dict': {
                 'id': 'V3K4mi0SYkc',
-                'ext': 'mp4',
+                'ext': 'webm',
                 'title': "DSWD Awards 'Children's Joy Foundation, Inc.' Certificate of Registration and License to Operate",
                 'description': 'md5:bf9c26cfa4acdfb146362682edd3827a',
-                'duration': 179,
+                'duration': 178,
                 'upload_date': '20130116',
                 'uploader': "Children's Joy Foundation",
                 'uploader_id': 'thecjf',
                 'view_count': int,
             },
         },
+        {
+            # video key is extra_data not url\d+
+            'url': 'http://vk.com/video-110305615_171782105',
+            'md5': 'e13fcda136f99764872e739d13fac1d1',
+            'info_dict': {
+                'id': '171782105',
+                'ext': 'mp4',
+                'title': 'S-Dance, репетиции к The way show',
+                'uploader': 'THE WAY SHOW | 17 апреля',
+                'upload_date': '20160207',
+                'view_count': int,
+            },
+        },
         {
             # removed video, just testing that we match the pattern
             'url': 'http://vk.com/feed?z=video-43215063_166094326%2Fbb50cacd3177146d7a',
@@ -163,6 +177,11 @@ class VKIE(InfoExtractor):
             # vk wrapper
             'url': 'http://www.biqle.ru/watch/847655_160197695',
             'only_matching': True,
+        },
+        {
+            # pladform embed
+            'url': 'https://vk.com/video-76116461_171554880',
+            'only_matching': True,
         }
     ]
 
@@ -181,9 +200,9 @@ class VKIE(InfoExtractor):
             'pass': password.encode('cp1251'),
         })
 
-        request = compat_urllib_request.Request(
+        request = sanitized_Request(
             'https://login.vk.com/?act=login',
-            compat_urllib_parse.urlencode(login_form).encode('utf-8'))
+            urlencode_postdata(login_form))
         login_page = self._download_webpage(
             request, None, note='Logging in as %s' % username)
 
@@ -249,10 +268,17 @@ class VKIE(InfoExtractor):
         if youtube_url:
             return self.url_result(youtube_url, 'Youtube')
 
+        vimeo_url = VimeoIE._extract_vimeo_url(url, info_page)
+        if vimeo_url is not None:
+            return self.url_result(vimeo_url)
+
+        pladform_url = PladformIE._extract_url(info_page)
+        if pladform_url:
+            return self.url_result(pladform_url)
+
         m_rutube = re.search(
-            r'\ssrc="((?:https?:)?//rutube\.ru\\?/video\\?/embed(?:.*?))\\?"', info_page)
+            r'\ssrc="((?:https?:)?//rutube\.ru\\?/(?:video|play)\\?/embed(?:.*?))\\?"', info_page)
         if m_rutube is not None:
-            self.to_screen('rutube video detected')
             rutube_url = self._proto_relative_url(
                 m_rutube.group(1).replace('\\', ''))
             return self.url_result(rutube_url)
@@ -276,16 +302,25 @@ class VKIE(InfoExtractor):
             mobj.group(1) + ' ' + mobj.group(2)
             upload_date = unified_strdate(mobj.group(1) + ' ' + mobj.group(2))
 
-        view_count = str_to_int(self._search_regex(
-            r'"mv_views_count_number"[^>]*>([\d,.]+) views<',
-            info_page, 'view count', fatal=False))
-
-        formats = [{
-            'format_id': k,
-            'url': v,
-            'width': int(k[len('url'):]),
-        } for k, v in data.items()
-            if k.startswith('url')]
+        view_count = None
+        views = self._html_search_regex(
+            r'"mv_views_count_number"[^>]*>(.+?\bviews?)<',
+            info_page, 'view count', fatal=False)
+        if views:
+            view_count = str_to_int(self._search_regex(
+                r'([\d,.]+)', views, 'view count', fatal=False))
+
+        formats = []
+        for k, v in data.items():
+            if (not k.startswith('url') and k != 'extra_data') or not v:
+                continue
+            height = int_or_none(self._search_regex(
+                r'^url(\d+)', k, 'height', default=None))
+            formats.append({
+                'format_id': k,
+                'url': v,
+                'height': height,
+            })
         self._sort_formats(formats)
 
         return {
@@ -303,7 +338,7 @@ class VKIE(InfoExtractor):
 class VKUserVideosIE(InfoExtractor):
     IE_NAME = 'vk:uservideos'
     IE_DESC = "VK - User's Videos"
-    _VALID_URL = r'https?://vk\.com/videos(?P<id>-?[0-9]+)$'
+    _VALID_URL = r'https?://vk\.com/videos(?P<id>-?[0-9]+)(?!\?.*\bz=video)(?:[/?#&]|$)'
     _TEMPLATE_URL = 'https://vk.com/videos'
     _TESTS = [{
         'url': 'http://vk.com/videos205387401',
@@ -315,6 +350,9 @@ class VKUserVideosIE(InfoExtractor):
     }, {
         'url': 'http://vk.com/videos-77521',
         'only_matching': True,
+    }, {
+        'url': 'http://vk.com/videos-97664626?section=all',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
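
The rewritten VK formats loop derives the frame height from the numeric
suffix of url<height> keys while also admitting the extra_data key, which
carries no height. A rough standalone equivalent over a fabricated player
payload:

    import re

    data = {  # hypothetical player payload
        'url240': 'http://cs.example.com/240.mp4',
        'url720': 'http://cs.example.com/720.mp4',
        'extra_data': 'http://cs.example.com/extra.mp4',
        'url1080': '',  # empty values are skipped
    }

    formats = []
    for k, v in data.items():
        if (not k.startswith('url') and k != 'extra_data') or not v:
            continue
        m = re.match(r'url(\d+)$', k)  # extra_data yields no height
        formats.append({
            'format_id': k,
            'url': v,
            'height': int(m.group(1)) if m else None,
        })

    formats.sort(key=lambda f: f['height'] or -1)
    print([(f['format_id'], f['height']) for f in formats])
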
diff --git a/youtube_dl/extractor/vlive.py b/youtube_dl/extractor/vlive.py
new file mode 100644 (file)
index 0000000..baf39bb
--- /dev/null
@@ -0,0 +1,88 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    dict_get,
+    float_or_none,
+    int_or_none,
+)
+from ..compat import compat_urllib_parse_urlencode
+
+
+class VLiveIE(InfoExtractor):
+    IE_NAME = 'vlive'
+    _VALID_URL = r'https?://(?:(?:www|m)\.)?vlive\.tv/video/(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.vlive.tv/video/1326',
+        'md5': 'cc7314812855ce56de70a06a27314983',
+        'info_dict': {
+            'id': '1326',
+            'ext': 'mp4',
+            'title': "[V] Girl's Day's Broadcast",
+            'creator': "Girl's Day",
+            'view_count': int,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(
+            'http://www.vlive.tv/video/%s' % video_id, video_id)
+
+        long_video_id = self._search_regex(
+            r'vlive\.tv\.video\.ajax\.request\.handler\.init\(\s*"[0-9]+"\s*,\s*"[^"]*"\s*,\s*"([^"]+)"',
+            webpage, 'long video id')
+
+        key = self._search_regex(
+            r'vlive\.tv\.video\.ajax\.request\.handler\.init\(\s*"[0-9]+"\s*,\s*"[^"]*"\s*,\s*"[^"]+"\s*,\s*"([^"]+)"',
+            webpage, 'key')
+
+        title = self._og_search_title(webpage)
+
+        playinfo = self._download_json(
+            'http://global.apis.naver.com/rmcnmv/rmcnmv/vod_play_videoInfo.json?%s'
+            % compat_urllib_parse_urlencode({
+                'videoId': long_video_id,
+                'key': key,
+                'ptc': 'http',
+                'doct': 'json',  # document type (xml or json)
+                'cpt': 'vtt',  # captions type (vtt or ttml)
+            }), video_id)
+
+        formats = [{
+            'url': vid['source'],
+            'format_id': vid.get('encodingOption', {}).get('name'),
+            'abr': float_or_none(vid.get('bitrate', {}).get('audio')),
+            'vbr': float_or_none(vid.get('bitrate', {}).get('video')),
+            'width': int_or_none(vid.get('encodingOption', {}).get('width')),
+            'height': int_or_none(vid.get('encodingOption', {}).get('height')),
+            'filesize': int_or_none(vid.get('size')),
+        } for vid in playinfo.get('videos', {}).get('list', []) if vid.get('source')]
+        self._sort_formats(formats)
+
+        thumbnail = self._og_search_thumbnail(webpage)
+        creator = self._html_search_regex(
+            r'<div[^>]+class="info_area"[^>]*>\s*<a\s+[^>]*>([^<]+)',
+            webpage, 'creator', fatal=False)
+
+        view_count = int_or_none(playinfo.get('meta', {}).get('count'))
+
+        subtitles = {}
+        for caption in playinfo.get('captions', {}).get('list', []):
+            lang = dict_get(caption, ('language', 'locale', 'country', 'label'))
+            if lang and caption.get('source'):
+                subtitles[lang] = [{
+                    'ext': 'vtt',
+                    'url': caption['source']}]
+
+        return {
+            'id': video_id,
+            'title': title,
+            'creator': creator,
+            'thumbnail': thumbnail,
+            'view_count': view_count,
+            'formats': formats,
+            'subtitles': subtitles,
+        }
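
vlive.py pulls long_video_id and key out of the same init(...) call with two
nearly identical regexes; a single pattern with two named groups would do the
same work in one pass. A sketch against a fabricated page excerpt (ids made
up):

    import re

    webpage = ('vlive.tv.video.ajax.request.handler.init('
               '"1326", "", "LONGID123", "KEY456")')

    mobj = re.search(
        r'init\(\s*"[0-9]+"\s*,\s*"[^"]*"\s*,\s*"(?P<long_id>[^"]+)"'
        r'\s*,\s*"(?P<key>[^"]+)"', webpage)
    if mobj:
        print(mobj.group('long_id'), mobj.group('key'))  # LONGID123 KEY456
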
index ccf1928b5d323f277b4e8a47bd4d008e821b147c..a938a4007ead91a25ca84b43307ce16e1787e2e6 100644 (file)
@@ -2,14 +2,16 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse,
-    compat_urllib_request,
+from ..utils import (
+    ExtractorError,
+    NO_DEFAULT,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
 class VodlockerIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?vodlocker\.com/(?P<id>[0-9a-zA-Z]+)(?:\..*?)?'
+    _VALID_URL = r'https?://(?:www\.)?vodlocker\.(?:com|city)/(?:embed-)?(?P<id>[0-9a-zA-Z]+)(?:\..*?)?'
 
     _TESTS = [{
         'url': 'http://vodlocker.com/e8wvyzz4sl42',
@@ -26,26 +28,47 @@ class VodlockerIE(InfoExtractor):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
 
+        if any(p in webpage for p in (
+                '>THIS FILE WAS DELETED<',
+                '>File Not Found<',
+                'The file you were looking for could not be found, sorry for any inconvenience.<')):
+            raise ExtractorError('Video %s does not exist' % video_id, expected=True)
+
         fields = self._hidden_inputs(webpage)
 
         if fields['op'] == 'download1':
             self._sleep(3, video_id)  # they do detect when requests happen too fast!
-            post = compat_urllib_parse.urlencode(fields)
-            req = compat_urllib_request.Request(url, post)
+            post = urlencode_postdata(fields)
+            req = sanitized_Request(url, post)
             req.add_header('Content-type', 'application/x-www-form-urlencoded')
             webpage = self._download_webpage(
                 req, video_id, 'Downloading video page')
 
+        def extract_file_url(html, default=NO_DEFAULT):
+            return self._search_regex(
+                r'file:\s*"(http[^\"]+)",', html, 'file url', default=default)
+
+        video_url = extract_file_url(webpage, default=None)
+
+        if not video_url:
+            embed_url = self._search_regex(
+                r'<iframe[^>]+src=(["\'])(?P<url>(?:https?://)?vodlocker\.(?:com|city)/embed-.+?)\1',
+                webpage, 'embed url', group='url')
+            embed_webpage = self._download_webpage(
+                embed_url, video_id, 'Downloading embed webpage')
+            video_url = extract_file_url(embed_webpage)
+            thumbnail_webpage = embed_webpage
+        else:
+            thumbnail_webpage = webpage
+
         title = self._search_regex(
             r'id="file_title".*?>\s*(.*?)\s*<(?:br|span)', webpage, 'title')
         thumbnail = self._search_regex(
-            r'image:\s*"(http[^\"]+)",', webpage, 'thumbnail')
-        url = self._search_regex(
-            r'file:\s*"(http[^\"]+)",', webpage, 'file url')
+            r'image:\s*"(http[^\"]+)",', thumbnail_webpage, 'thumbnail', fatal=False)
 
         formats = [{
             'format_id': 'sd',
-            'url': url,
+            'url': video_url,
         }]
 
         return {
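
extract_file_url(html, default=NO_DEFAULT) above leans on youtube-dl's
sentinel convention: with no default the search is fatal, with default=None
it degrades quietly so the extractor can fall back to the embed page. A small
local stand-in for the sentinel idea:

    import re

    NO_DEFAULT = object()  # stand-in for youtube_dl.utils.NO_DEFAULT

    def search(pattern, text, default=NO_DEFAULT):
        m = re.search(pattern, text)
        if m:
            return m.group(1)
        if default is NO_DEFAULT:
            raise ValueError('pattern not found and no default supplied')
        return default  # None (or any caller value) is a legitimate default

    page = 'no file url in this markup'
    print(search(r'file:\s*"(http[^"]+)"', page, default=None))  # None
    # search(r'file:\s*"(http[^"]+)"', page)  # would raise: lookup is fatal
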
index 254383d6cf0d6267e0423db5f9a1a3143161e314..93d15a556dedb6e0589dc5f393e0a01ad5b4a8a0 100644 (file)
@@ -3,14 +3,12 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-    compat_urlparse,
-)
+from ..compat import compat_urlparse
 from ..utils import (
     ExtractorError,
     determine_ext,
     int_or_none,
+    sanitized_Request,
 )
 
 
@@ -37,7 +35,7 @@ class VoiceRepublicIE(InfoExtractor):
     def _real_extract(self, url):
         display_id = self._match_id(url)
 
-        req = compat_urllib_request.Request(
+        req = sanitized_Request(
             compat_urlparse.urljoin(url, '/talks/%s' % display_id))
         # Older versions of Firefox get redirected to an "upgrade browser" page
         req.add_header('User-Agent', 'youtube-dl')
diff --git a/youtube_dl/extractor/voxmedia.py b/youtube_dl/extractor/voxmedia.py
new file mode 100644 (file)
index 0000000..0c6b1f0
--- /dev/null
@@ -0,0 +1,132 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_urllib_parse_unquote
+
+
+class VoxMediaIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?(?:theverge|vox|sbnation|eater|polygon|curbed|racked)\.com/(?:[^/]+/)*(?P<id>[^/?]+)'
+    _TESTS = [{
+        'url': 'http://www.theverge.com/2014/6/27/5849272/material-world-how-google-discovered-what-software-is-made-of',
+        'md5': '73856edf3e89a711e70d5cf7cb280b37',
+        'info_dict': {
+            'id': '11eXZobjrG8DCSTgrNjVinU-YmmdYjhe',
+            'ext': 'mp4',
+            'title': 'Google\'s new material design direction',
+            'description': 'md5:2f44f74c4d14a1f800ea73e1c6832ad2',
+        }
+    }, {
+        # data-ooyala-id
+        'url': 'http://www.theverge.com/2014/10/21/7025853/google-nexus-6-hands-on-photos-video-android-phablet',
+        'md5': 'd744484ff127884cd2ba09e3fa604e4b',
+        'info_dict': {
+            'id': 'RkZXU4cTphOCPDMZg5oEounJyoFI0g-B',
+            'ext': 'mp4',
+            'title': 'The Nexus 6: hands-on with Google\'s phablet',
+            'description': 'md5:87a51fe95ff8cea8b5bdb9ac7ae6a6af',
+        }
+    }, {
+        # volume embed
+        'url': 'http://www.vox.com/2016/3/31/11336640/mississippi-lgbt-religious-freedom-bill',
+        'md5': '375c483c5080ab8cd85c9c84cfc2d1e4',
+        'info_dict': {
+            'id': 'wydzk3dDpmRz7PQoXRsTIX6XTkPjYL0b',
+            'ext': 'mp4',
+            'title': 'The new frontier of LGBTQ civil rights, explained',
+            'description': 'md5:0dc58e94a465cbe91d02950f770eb93f',
+        }
+    }, {
+        # youtube embed
+        'url': 'http://www.vox.com/2016/3/24/11291692/robot-dance',
+        'md5': '83b3080489fb103941e549352d3e0977',
+        'info_dict': {
+            'id': 'FcNHTJU1ufM',
+            'ext': 'mp4',
+            'title': 'How "the robot" became the greatest novelty dance of all time',
+            'description': 'md5:b081c0d588b8b2085870cda55e6da176',
+            'upload_date': '20160324',
+            'uploader_id': 'voxdotcom',
+            'uploader': 'Vox',
+        }
+    }, {
+        # SBN.VideoLinkset.entryGroup multiple ooyala embeds
+        'url': 'http://www.sbnation.com/college-football-recruiting/2015/2/3/7970291/national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
+        'info_dict': {
+            'id': 'national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
+            'title': '25 lies you will tell yourself on National Signing Day',
+            'description': 'It\'s the most self-delusional time of the year, and everyone\'s gonna tell the same lies together!',
+        },
+        'playlist': [{
+            'md5': '721fededf2ab74ae4176c8c8cbfe092e',
+            'info_dict': {
+                'id': 'p3cThlMjE61VDi_SD9JlIteSNPWVDBB9',
+                'ext': 'mp4',
+                'title': 'Buddy Hield vs Steph Curry (and the world)',
+                'description': 'Let’s dissect only the most important Final Four storylines.',
+            },
+        }, {
+            'md5': 'bf0c5cc115636af028be1bab79217ea9',
+            'info_dict': {
+                'id': 'BmbmVjMjE6esPHxdALGubTrouQ0jYLHj',
+                'ext': 'mp4',
+                'title': 'Chasing Cinderella 2016: Syracuse basketball',
+                'description': 'md5:e02d56b026d51aa32c010676765a690d',
+            },
+        }],
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = compat_urllib_parse_unquote(self._download_webpage(url, display_id))
+
+        def create_entry(provider_video_id, provider_video_type, title=None, description=None):
+            return {
+                '_type': 'url_transparent',
+                'url': provider_video_id if provider_video_type == 'youtube' else '%s:%s' % (provider_video_type, provider_video_id),
+                'title': title or self._og_search_title(webpage),
+                'description': description or self._og_search_description(webpage),
+            }
+
+        entries = []
+        entries_data = self._search_regex([
+            r'Chorus\.VideoContext\.addVideo\((\[{.+}\])\);',
+            r'var\s+entry\s*=\s*({.+});',
+            r'SBN\.VideoLinkset\.entryGroup\(\s*(\[.+\])',
+        ], webpage, 'video data', default=None)
+        if entries_data:
+            entries_data = self._parse_json(entries_data, display_id)
+            if isinstance(entries_data, dict):
+                entries_data = [entries_data]
+            for video_data in entries_data:
+                provider_video_id = video_data.get('provider_video_id')
+                provider_video_type = video_data.get('provider_video_type')
+                if provider_video_id and provider_video_type:
+                    entries.append(create_entry(
+                        provider_video_id, provider_video_type,
+                        video_data.get('title'), video_data.get('description')))
+
+        provider_video_id = self._search_regex(
+            r'data-ooyala-id="([^"]+)"', webpage, 'ooyala id', default=None)
+        if provider_video_id:
+            entries.append(create_entry(provider_video_id, 'ooyala'))
+
+        volume_uuid = self._search_regex(
+            r'data-volume-uuid="([^"]+)"', webpage, 'volume uuid', default=None)
+        if volume_uuid:
+            volume_webpage = self._download_webpage(
+                'http://volume.vox-cdn.com/embed/%s' % volume_uuid, volume_uuid)
+            video_data = self._parse_json(self._search_regex(
+                r'Volume\.createVideo\(({.+})\s*,\s*{.*}\);', volume_webpage, 'video data'), volume_uuid)
+            for provider_video_type in ('ooyala', 'youtube'):
+                provider_video_id = video_data.get('%s_id' % provider_video_type)
+                if provider_video_id:
+                    description = video_data.get('description_long') or video_data.get('description_short')
+                    entries.append(create_entry(
+                        provider_video_id, provider_video_type, video_data.get('title_short'), description))
+                    break
+
+        if len(entries) == 1:
+            return entries[0]
+        else:
+            return self.playlist_result(entries, display_id, self._og_search_title(webpage), self._og_search_description(webpage))
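
VoxMedia gathers provider ids from several page shapes and only wraps them in
a playlist when more than one survives; each entry uses the url_transparent
convention, which delegates extraction to another IE while overlaying local
metadata. A reduced sketch with made-up ids:

    def create_entry(provider_video_id, provider_video_type, title=None):
        # url_transparent: hand the URL to the matching extractor, keep our title.
        url = (provider_video_id if provider_video_type == 'youtube'
               else '%s:%s' % (provider_video_type, provider_video_id))
        return {'_type': 'url_transparent', 'url': url,
                'title': title or 'page title'}

    entries = [create_entry('FcNHTJU1ufM', 'youtube'),
               create_entry('p3cThlMjE61V', 'ooyala', 'clip one')]

    result = entries[0] if len(entries) == 1 else {
        '_type': 'playlist', 'entries': entries, 'title': 'page title'}
    print(result['_type'])  # playlist
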
index bbd3bbf7bad98c787c0840ed0f302198ebb7932a..8e35f24e81e7e61240898ea3259914737365fd2f 100644 (file)
@@ -4,11 +4,14 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..utils import float_or_none
+from ..utils import (
+    determine_ext,
+    float_or_none,
+)
 
 
 class VRTIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:deredactie|sporza|cobra)\.be/cm/(?:[^/]+/)+(?P<id>[^/]+)/*'
+    _VALID_URL = r'https?://(?:deredactie|sporza|cobra(?:\.canvas)?)\.be/cm/(?:[^/]+/)+(?P<id>[^/]+)/*'
     _TESTS = [
         # deredactie.be
         {
@@ -52,6 +55,15 @@ class VRTIE(InfoExtractor):
                 'duration': 661,
             }
         },
+        {
+            # YouTube video
+            'url': 'http://deredactie.be/cm/vrtnieuws/videozone/nieuws/cultuurenmedia/1.2622957',
+            'only_matching': True,
+        },
+        {
+            'url': 'http://cobra.canvas.be/cm/cobra/videozone/rubriek/film-videozone/1.2377055',
+            'only_matching': True,
+        }
     ]
 
     def _real_extract(self, url):
@@ -62,18 +74,37 @@ class VRTIE(InfoExtractor):
         video_id = self._search_regex(
             r'data-video-id="([^"]+)_[^"]+"', webpage, 'video id', fatal=False)
 
+        src = self._search_regex(
+            r'data-video-src="([^"]+)"', webpage, 'video src', default=None)
+
+        video_type = self._search_regex(
+            r'data-video-type="([^"]+)"', webpage, 'video type', default=None)
+
+        if video_type == 'YouTubeVideo':
+            return self.url_result(src, 'Youtube')
+
         formats = []
+
         mobj = re.search(
             r'data-video-iphone-server="(?P<server>[^"]+)"\s+data-video-iphone-path="(?P<path>[^"]+)"',
             webpage)
         if mobj:
             formats.extend(self._extract_m3u8_formats(
                 '%s/%s' % (mobj.group('server'), mobj.group('path')),
-                video_id, 'mp4'))
-        mobj = re.search(r'data-video-src="(?P<src>[^"]+)"', webpage)
-        if mobj:
-            formats.extend(self._extract_f4m_formats(
-                '%s/manifest.f4m' % mobj.group('src'), video_id))
+                video_id, 'mp4', m3u8_id='hls', fatal=False))
+
+        if src:
+            if determine_ext(src) == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    src, video_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls', fatal=False))
+            else:
+                formats.extend(self._extract_f4m_formats(
+                    '%s/manifest.f4m' % src, video_id, f4m_id='hds', fatal=False))
+
+        if not formats and 'data-video-geoblocking="true"' in webpage:
+            self.raise_geo_restricted('This video is only available in Belgium')
+
         self._sort_formats(formats)
 
         title = self._og_search_title(webpage)
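
The reworked VRT flow dispatches on the extension of data-video-src: .m3u8
goes to the HLS parser, anything else is treated as an HDS manifest base, and
both calls are fatal=False so a dead manifest no longer aborts the whole
extraction. A sketch of the dispatch with a crude local determine_ext:

    def determine_ext(url):
        # Crude stand-in for youtube_dl.utils.determine_ext.
        tail = url.rpartition('/')[2]
        return tail.rpartition('.')[2] if '.' in tail else None

    def pick_manifest(src):
        if determine_ext(src) == 'm3u8':
            return 'hls', src
        return 'hds', src + '/manifest.f4m'

    print(pick_manifest('http://example.be/stream/playlist.m3u8'))
    print(pick_manifest('http://example.be/stream/clip'))
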
index 149e364677fcab4d0374479c4b96ff741277b17e..10ca6acb12469f85267405f9431b9508c0537e57 100644 (file)
@@ -15,7 +15,7 @@ from ..utils import (
 class VubeIE(InfoExtractor):
     IE_NAME = 'vube'
     IE_DESC = 'Vube.com'
-    _VALID_URL = r'http://vube\.com/(?:[^/]+/)+(?P<id>[\da-zA-Z]{10})\b'
+    _VALID_URL = r'https?://vube\.com/(?:[^/]+/)+(?P<id>[\da-zA-Z]{10})\b'
 
     _TESTS = [
         {
index a6d9b5fee1f4864d82c7f8bb83e87884c96afe3b..eaa888f005cc61c53b8f45c3e3b93633083b17ed 100644 (file)
@@ -14,7 +14,7 @@ from ..utils import (
 
 
 class VuClipIE(InfoExtractor):
-    _VALID_URL = r'http://(?:m\.)?vuclip\.com/w\?.*?cid=(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:m\.)?vuclip\.com/w\?.*?cid=(?P<id>[0-9]+)'
 
     _TEST = {
         'url': 'http://m.vuclip.com/w?cid=922692425&fid=70295&z=1010&nvar&frm=index.html',
index 24efbd6e6341ba5aa73e5df11cb9af36f941da43..8b9488340368ea0292fa2614336778099c9eb11e 100644 (file)
@@ -11,7 +11,7 @@ from ..utils import (
 
 
 class WallaIE(InfoExtractor):
-    _VALID_URL = r'http://vod\.walla\.co\.il/[^/]+/(?P<id>\d+)/(?P<display_id>.+)'
+    _VALID_URL = r'https?://vod\.walla\.co\.il/[^/]+/(?P<id>\d+)/(?P<display_id>.+)'
     _TEST = {
         'url': 'http://vod.walla.co.il/movie/2642630/one-direction-all-for-one',
         'info_dict': {
index 72eb010f8d1e480e37f133545ad6f8e1a64fef3f..ec8b999983f6ae89a3bf53909e9d70a463f87f52 100644 (file)
@@ -19,25 +19,25 @@ class WashingtonPostIE(InfoExtractor):
             'title': 'Sinkhole of bureaucracy',
         },
         'playlist': [{
-            'md5': '79132cc09ec5309fa590ae46e4cc31bc',
+            'md5': 'b9be794ceb56c7267d410a13f99d801a',
             'info_dict': {
                 'id': 'fc433c38-b146-11e3-b8b3-44b1d1cd4c1f',
                 'ext': 'mp4',
                 'title': 'Breaking Points: The Paper Mine',
-                'duration': 1287,
+                'duration': 1290,
                 'description': 'Overly complicated paper pushing is nothing new to government bureaucracy. But the way federal retirement applications are filed may be the most outdated. David Fahrenthold explains.',
                 'uploader': 'The Washington Post',
                 'timestamp': 1395527908,
                 'upload_date': '20140322',
             },
         }, {
-            'md5': 'e1d5734c06865cc504ad99dc2de0d443',
+            'md5': '1fff6a689d8770966df78c8cb6c8c17c',
             'info_dict': {
                 'id': '41255e28-b14a-11e3-b8b3-44b1d1cd4c1f',
                 'ext': 'mp4',
                 'title': 'The town bureaucracy sustains',
                 'description': 'Underneath the friendly town of Boyers is a sea of government paperwork. In a disused limestone mine, hundreds of locals now track, file and process retirement applications for the federal government. We set out to find out what it\'s like to do paperwork 230 feet underground.',
-                'duration': 2217,
+                'duration': 2220,
                 'timestamp': 1395528005,
                 'upload_date': '20140322',
                 'uploader': 'The Washington Post',
index affcc52f6e244c40bbca6381700c2f15e645580f..5227bb5ad9a2cd4f71c156cd8ca9bb3f5fbd5d17 100644 (file)
@@ -12,7 +12,7 @@ from ..utils import (
 
 
 class WatIE(InfoExtractor):
-    _VALID_URL = r'http://www\.wat\.tv/video/(?P<display_id>.*)-(?P<short_id>.*?)_.*?\.html'
+    _VALID_URL = r'(?:wat:(?P<real_id>\d{8})|https?://www\.wat\.tv/video/(?P<display_id>.*)-(?P<short_id>.*?)_.*?\.html)'
     IE_NAME = 'wat.tv'
     _TESTS = [
         {
@@ -54,10 +54,12 @@ class WatIE(InfoExtractor):
         def real_id_for_chapter(chapter):
             return chapter['tc_start'].split('-')[0]
         mobj = re.match(self._VALID_URL, url)
-        short_id = mobj.group('short_id')
         display_id = mobj.group('display_id')
-        webpage = self._download_webpage(url, display_id or short_id)
-        real_id = self._search_regex(r'xtpage = ".*-(.*?)";', webpage, 'real id')
+        real_id = mobj.group('real_id')
+        if not real_id:
+            short_id = mobj.group('short_id')
+            webpage = self._download_webpage(url, display_id or short_id)
+            real_id = self._search_regex(r'xtpage = ".*-(.*?)";', webpage, 'real id')
 
         video_info = self.download_video_info(real_id)
 
diff --git a/youtube_dl/extractor/wayofthemaster.py b/youtube_dl/extractor/wayofthemaster.py
deleted file mode 100644 (file)
index af7bb8b..0000000
+++ /dev/null
@@ -1,52 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-
-
-class WayOfTheMasterIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.wayofthemaster\.com/([^/?#]*/)*(?P<id>[^/?#]+)\.s?html(?:$|[?#])'
-
-    _TEST = {
-        'url': 'http://www.wayofthemaster.com/hbks.shtml',
-        'md5': '5316b57487ada8480606a93cb3d18d24',
-        'info_dict': {
-            'id': 'hbks',
-            'ext': 'mp4',
-            'title': 'Intelligent Design vs. Evolution',
-        },
-    }
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        webpage = self._download_webpage(url, video_id)
-
-        title = self._search_regex(
-            r'<img src="images/title_[^"]+".*?alt="([^"]+)"',
-            webpage, 'title', default=None)
-        if title is None:
-            title = self._html_search_regex(
-                r'<title>(.*?)</title>', webpage, 'page title')
-
-        url_base = self._search_regex(
-            r'<param\s+name="?movie"?\s+value=".*?/wotm_videoplayer_highlow[0-9]*\.swf\?vid=([^"]+)"',
-            webpage, 'URL base')
-        formats = [{
-            'format_id': 'low',
-            'quality': 1,
-            'url': url_base + '_low.mp4',
-        }, {
-            'format_id': 'high',
-            'quality': 2,
-            'url': url_base + '_high.mp4',
-        }]
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'formats': formats,
-        }
index b468023060d3476e0a0ede8e36ea23b25ffeb298..31c90430327da895ffc974c1d489cb4c92689d2f 100644 (file)
@@ -10,8 +10,8 @@ from ..compat import (
     compat_urlparse,
 )
 from ..utils import (
-    determine_ext,
     unified_strdate,
+    qualities,
 )
 
 
@@ -33,6 +33,7 @@ class WDRIE(InfoExtractor):
             'params': {
                 'skip_download': True,
             },
+            'skip': 'Page Not Found',
         },
         {
             'url': 'http://www1.wdr.de/themen/av/videomargaspiegelisttot101-videoplayer.html',
@@ -47,6 +48,7 @@ class WDRIE(InfoExtractor):
             'params': {
                 'skip_download': True,
             },
+            'skip': 'Page Not Found',
         },
         {
             'url': 'http://www1.wdr.de/themen/kultur/audioerlebtegeschichtenmargaspiegel100-audioplayer.html',
@@ -71,6 +73,7 @@ class WDRIE(InfoExtractor):
                 'upload_date': '20140717',
                 'is_live': False
             },
+            'skip': 'Page Not Found',
         },
         {
             'url': 'http://www1.wdr.de/mediathek/video/sendungen/quarks_und_co/filterseite-quarks-und-co100.html',
@@ -83,10 +86,10 @@ class WDRIE(InfoExtractor):
             'url': 'http://www1.wdr.de/mediathek/video/livestream/index.html',
             'info_dict': {
                 'id': 'mdb-103364',
-                'title': 're:^WDR Fernsehen [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+                'title': 're:^WDR Fernsehen Live [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
                 'description': 'md5:ae2ff888510623bf8d4b115f95a9b7c9',
                 'ext': 'flv',
-                'upload_date': '20150212',
+                'upload_date': '20150101',
                 'is_live': True
             },
             'params': {
@@ -105,7 +108,9 @@ class WDRIE(InfoExtractor):
         if mobj.group('player') is None:
             entries = [
                 self.url_result(page_url + href, 'WDR')
-                for href in re.findall(r'<a href="/?(.+?%s\.html)" rel="nofollow"' % self._PLAYER_REGEX, webpage)
+                for href in re.findall(
+                    r'<a href="/?(.+?%s\.html)" rel="nofollow"' % self._PLAYER_REGEX,
+                    webpage)
             ]
 
             if entries:  # Playlist page
@@ -130,8 +135,8 @@ class WDRIE(InfoExtractor):
                     note='Downloading playlist page %d' % page_num)
             return self.playlist_result(entries, page_id)
 
-        flashvars = compat_parse_qs(
-            self._html_search_regex(r'<param name="flashvars" value="([^"]+)"', webpage, 'flashvars'))
+        flashvars = compat_parse_qs(self._html_search_regex(
+            r'<param name="flashvars" value="([^"]+)"', webpage, 'flashvars'))
 
         page_id = flashvars['trackerClipId'][0]
         video_url = flashvars['dslSrc'][0]
@@ -145,30 +150,60 @@ class WDRIE(InfoExtractor):
         if 'trackerClipAirTime' in flashvars:
             upload_date = flashvars['trackerClipAirTime'][0]
         else:
-            upload_date = self._html_search_meta('DC.Date', webpage, 'upload date')
+            upload_date = self._html_search_meta(
+                'DC.Date', webpage, 'upload date')
 
         if upload_date:
             upload_date = unified_strdate(upload_date)
 
+        formats = []
+        preference = qualities(['S', 'M', 'L', 'XL'])
+
         if video_url.endswith('.f4m'):
-            video_url += '?hdcore=3.2.0&plugin=aasp-3.2.0.77.18'
-            ext = 'flv'
+            formats.extend(self._extract_f4m_formats(
+                video_url + '?hdcore=3.2.0&plugin=aasp-3.2.0.77.18', page_id,
+                f4m_id='hds', fatal=False))
         elif video_url.endswith('.smil'):
-            fmt = self._extract_smil_formats(video_url, page_id)[0]
-            video_url = fmt['url']
-            sep = '&' if '?' in video_url else '?'
-            video_url += sep
-            video_url += 'hdcore=3.3.0&plugin=aasp-3.3.0.99.43'
-            ext = fmt['ext']
+            formats.extend(self._extract_smil_formats(
+                video_url, page_id, False, {
+                    'hdcore': '3.3.0',
+                    'plugin': 'aasp-3.3.0.99.43',
+                }))
         else:
-            ext = determine_ext(video_url)
+            formats.append({
+                'url': video_url,
+                'http_headers': {
+                    'User-Agent': 'mobile',
+                },
+            })
+
+        m3u8_url = self._search_regex(
+            r'rel="adaptiv"[^>]+href="([^"]+)"',
+            webpage, 'm3u8 url', default=None)
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, page_id, 'mp4', 'm3u8_native',
+                m3u8_id='hls', fatal=False))
+
+        direct_urls = re.findall(
+            r'rel="web(S|M|L|XL)"[^>]+href="([^"]+)"', webpage)
+        if direct_urls:
+            for quality, video_url in direct_urls:
+                formats.append({
+                    'url': video_url,
+                    'preference': preference(quality),
+                    'http_headers': {
+                        'User-Agent': 'mobile',
+                    },
+                })
+
+        self._sort_formats(formats)
 
         description = self._html_search_meta('Description', webpage, 'description')
 
         return {
             'id': page_id,
-            'url': video_url,
-            'ext': ext,
+            'formats': formats,
             'title': title,
             'description': description,
             'thumbnail': thumbnail,
@@ -209,7 +244,7 @@ class WDRMobileIE(InfoExtractor):
 
 
 class WDRMausIE(InfoExtractor):
-    _VALID_URL = 'http://(?:www\.)?wdrmaus\.de/(?:[^/]+/){,2}(?P<id>[^/?#]+)(?:/index\.php5|(?<!index)\.php5|/(?:$|[?#]))'
+    _VALID_URL = r'https?://(?:www\.)?wdrmaus\.de/(?:[^/]+/){,2}(?P<id>[^/?#]+)(?:/index\.php5|(?<!index)\.php5|/(?:$|[?#]))'
     IE_DESC = 'Sendung mit der Maus'
     _TESTS = [{
         'url': 'http://www.wdrmaus.de/aktuelle-sendung/index.php5',
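
The new WDR format list ranks the direct webS/M/L/XL links with qualities(),
which maps a worst-to-best label list onto ascending integer preferences. A
local re-implementation of that helper for illustration:

    def qualities(quality_ids):
        # Index in the worst-to-best list becomes the preference;
        # unknown labels sort below everything.
        def q(qid):
            try:
                return quality_ids.index(qid)
            except ValueError:
                return -1
        return q

    preference = qualities(['S', 'M', 'L', 'XL'])
    for label in ('M', 'XL', 'HD'):
        print(label, preference(label))  # M 1 / XL 3 / HD -1
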
index 2037d9b3d57cd5876d85e9552ffcc9f387fcc975..7aea47ed52f7f64032034ab43d51dbe524bff2b3 100644 (file)
@@ -12,38 +12,52 @@ class WebOfStoriesIE(InfoExtractor):
     _VIDEO_DOMAIN = 'http://eu-mobile.webofstories.com/'
     _GREAT_LIFE_STREAMER = 'rtmp://eu-cdn1.webofstories.com/cfx/st/'
     _USER_STREAMER = 'rtmp://eu-users.webofstories.com/cfx/st/'
-    _TESTS = [
-        {
-            'url': 'http://www.webofstories.com/play/hans.bethe/71',
-            'md5': '373e4dd915f60cfe3116322642ddf364',
-            'info_dict': {
-                'id': '4536',
-                'ext': 'mp4',
-                'title': 'The temperature of the sun',
-                'thumbnail': 're:^https?://.*\.jpg$',
-                'description': 'Hans Bethe talks about calculating the temperature of the sun',
-                'duration': 238,
-            }
+    _TESTS = [{
+        'url': 'http://www.webofstories.com/play/hans.bethe/71',
+        'md5': '373e4dd915f60cfe3116322642ddf364',
+        'info_dict': {
+            'id': '4536',
+            'ext': 'mp4',
+            'title': 'The temperature of the sun',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'description': 'Hans Bethe talks about calculating the temperature of the sun',
+            'duration': 238,
+        }
+    }, {
+        'url': 'http://www.webofstories.com/play/55908',
+        'md5': '2985a698e1fe3211022422c4b5ed962c',
+        'info_dict': {
+            'id': '55908',
+            'ext': 'mp4',
+            'title': 'The story of Gemmata obscuriglobus',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'description': 'Planctomycete talks about The story of Gemmata obscuriglobus',
+            'duration': 169,
+        },
+        'skip': 'notfound',
+    }, {
+        # malformed og:title meta
+        'url': 'http://www.webofstories.com/play/54215?o=MS',
+        'info_dict': {
+            'id': '54215',
+            'ext': 'mp4',
+            'title': '"A Leg to Stand On"',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'description': 'Oliver Sacks talks about the death and resurrection of a limb',
+            'duration': 97,
         },
-        {
-            'url': 'http://www.webofstories.com/play/55908',
-            'md5': '2985a698e1fe3211022422c4b5ed962c',
-            'info_dict': {
-                'id': '55908',
-                'ext': 'mp4',
-                'title': 'The story of Gemmata obscuriglobus',
-                'thumbnail': 're:^https?://.*\.jpg$',
-                'description': 'Planctomycete talks about The story of Gemmata obscuriglobus',
-                'duration': 169,
-            }
+        'params': {
+            'skip_download': True,
         },
-    ]
+    }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
         webpage = self._download_webpage(url, video_id)
-        title = self._og_search_title(webpage)
+        # Sometimes og:title meta is malformed
+        title = self._og_search_title(webpage, default=None) or self._html_search_regex(
+            r'(?s)<strong>Title:\s*</strong>(.+?)<', webpage, 'title')
         description = self._html_search_meta('description', webpage)
         thumbnail = self._og_search_thumbnail(webpage)
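
The title fix above is the usual belt-and-braces chain: probe og:title
non-fatally, then fall back to scraping a labelled field. The same two-step
shape, standalone, over a fabricated page where the meta tag is missing:

    import re

    webpage = '<strong>Title: </strong>"A Leg to Stand On"<br>'

    def og_title(html):
        m = re.search(r'<meta property="og:title" content="([^"]+)"', html)
        return m.group(1) if m else None

    title = og_title(webpage) or re.search(
        r'(?s)<strong>Title:\s*</strong>(.+?)<', webpage).group(1).strip()
    print(title)  # "A Leg to Stand On"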
 
diff --git a/youtube_dl/extractor/weiqitv.py b/youtube_dl/extractor/weiqitv.py
new file mode 100644 (file)
index 0000000..3dafbee
--- /dev/null
@@ -0,0 +1,52 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class WeiqiTVIE(InfoExtractor):
+    IE_DESC = 'WQTV'
+    _VALID_URL = r'https?://www\.weiqitv\.com/index/video_play\?videoId=(?P<id>[A-Za-z0-9]+)'
+
+    _TESTS = [{
+        'url': 'http://www.weiqitv.com/index/video_play?videoId=53c744f09874f0e76a8b46f3',
+        'md5': '26450599afd64c513bc77030ad15db44',
+        'info_dict': {
+            'id': '53c744f09874f0e76a8b46f3',
+            'ext': 'mp4',
+            'title': '2013年度盘点',
+        },
+    }, {
+        'url': 'http://www.weiqitv.com/index/video_play?videoId=567379a2d4c36cca518b4569',
+        'info_dict': {
+            'id': '567379a2d4c36cca518b4569',
+            'ext': 'mp4',
+            'title': '民国围棋史',
+        },
+    }, {
+        'url': 'http://www.weiqitv.com/index/video_play?videoId=5430220a9874f088658b4567',
+        'info_dict': {
+            'id': '5430220a9874f088658b4567',
+            'ext': 'mp4',
+            'title': '二路托过的手段和运用',
+        },
+    }]
+
+    def _real_extract(self, url):
+        media_id = self._match_id(url)
+        page = self._download_webpage(url, media_id)
+
+        info_json_str = self._search_regex(
+            r'var\s+video\s*=\s*(.+});', page, 'info json str')
+        info_json = self._parse_json(info_json_str, media_id)
+
+        letvcloud_url = self._search_regex(
+            r'var\s+letvurl\s*=\s*"([^"]+)', page, 'letvcloud url')
+
+        return {
+            '_type': 'url_transparent',
+            'ie_key': 'LetvCloud',
+            'url': letvcloud_url,
+            'title': info_json['name'],
+            'id': media_id,
+        }
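
weiqitv.py is another instance of the "JSON blob assigned to a JS variable"
pattern: capture the object literal with a regex, then hand it to a JSON
parser. A standalone sketch over a fabricated page:

    import json
    import re

    page = 'var video = {"name": "2013 review", "vid": "53c744f0"};'

    info_json_str = re.search(r'var\s+video\s*=\s*(.+});', page).group(1)
    info = json.loads(info_json_str)
    print(info['name'])  # 2013 review
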
index f69d46a2858077ed76ec9c8fc86166668f27c705..828c03dc38c4d4d4668f6dfb66e4cc29c51fd7e5 100644 (file)
@@ -1,52 +1,50 @@
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
 from .youtube import YoutubeIE
 
 
 class WimpIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?wimp\.com/([^/]+)/'
+    _VALID_URL = r'https?://(?:www\.)?wimp\.com/(?P<id>[^/]+)'
     _TESTS = [{
         'url': 'http://www.wimp.com/maruexhausted/',
-        'md5': 'f1acced123ecb28d9bb79f2479f2b6a1',
+        'md5': 'ee21217ffd66d058e8b16be340b74883',
         'info_dict': {
             'id': 'maruexhausted',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Maru is exhausted.',
             'description': 'md5:57e099e857c0a4ea312542b684a869b8',
         }
     }, {
-        # youtube video
         'url': 'http://www.wimp.com/clowncar/',
+        'md5': '4e2986c793694b55b37cf92521d12bb4',
         'info_dict': {
-            'id': 'cG4CEr2aiSg',
-            'ext': 'mp4',
-            'title': 'Basset hound clown car...incredible!',
-            'description': 'md5:8d228485e0719898c017203f900b3a35',
-            'uploader': 'Gretchen Hoey',
-            'uploader_id': 'gretchenandjeff1',
-            'upload_date': '20140303',
+            'id': 'clowncar',
+            'ext': 'webm',
+            'title': 'It\'s like a clown car.',
+            'description': 'md5:0e56db1370a6e49c5c1d19124c0d2fb2',
         },
-        'add_ie': ['Youtube'],
     }]
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group(1)
+        video_id = self._match_id(url)
+
         webpage = self._download_webpage(url, video_id)
-        video_url = self._search_regex(
-            [r"[\"']file[\"']\s*[:,]\s*[\"'](.+?)[\"']", r"videoId\s*:\s*[\"']([^\"']+)[\"']"],
-            webpage, 'video URL')
-        if YoutubeIE.suitable(video_url):
-            self.to_screen('Found YouTube video')
+
+        youtube_id = self._search_regex(
+            r"videoId\s*:\s*[\"']([0-9A-Za-z_-]{11})[\"']",
+            webpage, 'video URL', default=None)
+        if youtube_id:
             return {
                 '_type': 'url',
-                'url': video_url,
+                'url': youtube_id,
                 'ie_key': YoutubeIE.ie_key(),
             }
 
+        video_url = self._search_regex(
+            r'<video[^>]+>\s*<source[^>]+src=(["\'])(?P<url>.+?)\1',
+            webpage, 'video URL', group='url')
+
         return {
             'id': video_id,
             'url': video_url,
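
The rewritten Wimp extractor first probes for an 11-character YouTube id in
the player config and only falls back to the HTML5 <source> URL when none is
found. The same two-step probe against fabricated markup:

    import re

    webpage = '<video autoplay>\n<source src="http://cdn.example.com/clip.webm"></video>'

    m = re.search(r'videoId\s*:\s*["\']([0-9A-Za-z_-]{11})["\']', webpage)
    if m:
        print('delegate to YoutubeIE:', m.group(1))
    else:
        m = re.search(
            r'<video[^>]*>\s*<source[^>]+src=(["\'])(?P<url>.+?)\1', webpage)
        print('direct URL:', m.group('url'))
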
index 13a079151c9c879561e3e538c49f3122f85b349b..8b14840a2dba606951f1f7d80694f1e7f0cca8d6 100644 (file)
@@ -1,8 +1,11 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..compat import compat_urllib_request
-from ..utils import ExtractorError
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    sanitized_Request,
+)
 
 
 class WistiaIE(InfoExtractor):
@@ -16,6 +19,9 @@ class WistiaIE(InfoExtractor):
             'id': 'sh7fpupwlt',
             'ext': 'mov',
             'title': 'Being Resourceful',
+            'description': 'a Clients From Hell Video Series video from worldwidewebhosting',
+            'upload_date': '20131204',
+            'timestamp': 1386185018,
             'duration': 117,
         },
     }
@@ -23,41 +29,50 @@ class WistiaIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        request = compat_urllib_request.Request(self._API_URL.format(video_id))
+        request = sanitized_Request(self._API_URL.format(video_id))
         request.add_header('Referer', url)  # Some videos require this.
         data_json = self._download_json(request, video_id)
         if data_json.get('error'):
             raise ExtractorError('Error while getting the playlist',
                                  expected=True)
         data = data_json['media']
+        title = data['name']
 
         formats = []
         thumbnails = []
-        for atype, a in data['assets'].items():
-            if atype == 'still':
+        for a in data['assets']:
+            astatus = a.get('status')
+            atype = a.get('type')
+            if (astatus is not None and astatus != 2) or atype == 'preview':
+                continue
+            elif atype in ('still', 'still_image'):
                 thumbnails.append({
                     'url': a['url'],
                     'resolution': '%dx%d' % (a['width'], a['height']),
                 })
-                continue
-            if atype == 'preview':
-                continue
-            formats.append({
-                'format_id': atype,
-                'url': a['url'],
-                'width': a['width'],
-                'height': a['height'],
-                'filesize': a['size'],
-                'ext': a['ext'],
-                'preference': 1 if atype == 'original' else None,
-            })
+            else:
+                formats.append({
+                    'format_id': atype,
+                    'url': a['url'],
+                    'tbr': int_or_none(a.get('bitrate')),
+                    'vbr': int_or_none(a.get('opt_vbitrate')),
+                    'width': int_or_none(a.get('width')),
+                    'height': int_or_none(a.get('height')),
+                    'filesize': int_or_none(a.get('size')),
+                    'vcodec': a.get('codec'),
+                    'container': a.get('container'),
+                    'ext': a.get('ext'),
+                    'preference': 1 if atype == 'original' else None,
+                })
 
         self._sort_formats(formats)
 
         return {
             'id': video_id,
-            'title': data['name'],
+            'title': title,
+            'description': data.get('seoDescription'),
             'formats': formats,
             'thumbnails': thumbnails,
-            'duration': data.get('duration'),
+            'duration': int_or_none(data.get('duration')),
+            'timestamp': int_or_none(data.get('createdAt')),
         }
index a3ea26feb38257071c8ae5d3c1702cf0fcd2650a..09415b5896b322b96eca098a61ec87af1a340593 100644 (file)
@@ -8,12 +8,12 @@ from .common import InfoExtractor
 class WorldStarHipHopIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www|m)\.worldstar(?:candy|hiphop)\.com/(?:videos|android)/video\.php\?v=(?P<id>.*)'
     _TESTS = [{
-        "url": "http://www.worldstarhiphop.com/videos/video.php?v=wshh6a7q1ny0G34ZwuIO",
-        "md5": "9d04de741161603bf7071bbf4e883186",
-        "info_dict": {
-            "id": "wshh6a7q1ny0G34ZwuIO",
-            "ext": "mp4",
-            "title": "KO Of The Week: MMA Fighter Gets Knocked Out By Swift Head Kick!"
+        'url': 'http://www.worldstarhiphop.com/videos/video.php?v=wshh6a7q1ny0G34ZwuIO',
+        'md5': '9d04de741161603bf7071bbf4e883186',
+        'info_dict': {
+            'id': 'wshh6a7q1ny0G34ZwuIO',
+            'ext': 'mp4',
+            'title': 'KO Of The Week: MMA Fighter Gets Knocked Out By Swift Head Kick!'
         }
     }, {
         'url': 'http://m.worldstarhiphop.com/android/video.php?v=wshh6a7q1ny0G34ZwuIO',
@@ -21,7 +21,7 @@ class WorldStarHipHopIE(InfoExtractor):
         'info_dict': {
             'id': 'wshh6a7q1ny0G34ZwuIO',
             'ext': 'mp4',
-            "title": "KO Of The Week: MMA Fighter Gets Knocked Out By Swift Head Kick!"
+            'title': 'KO Of The Week: MMA Fighter Gets Knocked Out By Swift Head Kick!'
         }
     }]
 
index 2ddf29a694ec6365e9089bc18536320489b4d2c3..5a897371d1d69a95e08f7b4da4d457b3236e09cc 100644 (file)
@@ -84,6 +84,5 @@ class WSJIE(InfoExtractor):
             'duration': duration,
             'upload_date': upload_date,
             'title': title,
-            'formats': formats,
             'categories': categories,
         }
index 4ff99e5ca37fb8f4f0b663cc99761c31e75f1cf4..e4a2baad22534d772a90b8ec5832c11833f10281 100644 (file)
@@ -5,7 +5,7 @@ from ..compat import compat_urllib_parse_unquote
 
 
 class XBefIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?xbef\.com/video/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?xbef\.com/video/(?P<id>[0-9]+)'
     _TEST = {
         'url': 'http://xbef.com/video/5119-glamourous-lesbians-smoking-drinking-and-fucking',
         'md5': 'a478b565baff61634a98f5e5338be995',
index 236ff403bd08f941a2eb023cd41c3bb21c49d4c3..b113ab1c4891fdf96898d359cf38779d61b394f8 100644 (file)
@@ -12,7 +12,7 @@ from ..utils import (
 class XboxClipsIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?xboxclips\.com/(?:video\.php\?.*vid=|[^/]+/)(?P<id>[\w-]{36})'
     _TEST = {
-        'url': 'https://xboxclips.com/video.php?uid=2533274823424419&gamertag=Iabdulelah&vid=074a69a9-5faf-46aa-b93b-9909c1720325',
+        'url': 'http://xboxclips.com/video.php?uid=2533274823424419&gamertag=Iabdulelah&vid=074a69a9-5faf-46aa-b93b-9909c1720325',
         'md5': 'fbe1ec805e920aeb8eced3c3e657df5d',
         'info_dict': {
             'id': '074a69a9-5faf-46aa-b93b-9909c1720325',
diff --git a/youtube_dl/extractor/xfileshare.py b/youtube_dl/extractor/xfileshare.py
new file mode 100644 (file)
index 0000000..2d1504e
--- /dev/null
@@ -0,0 +1,143 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    sanitized_Request,
+    urlencode_postdata,
+)
+
+
+class XFileShareIE(InfoExtractor):
+    IE_DESC = 'XFileShare based sites: GorillaVid.in, daclips.in, movpod.in, fastvideo.in, realvid.net, filehoot.com and vidto.me'
+    _VALID_URL = r'''(?x)
+        https?://(?P<host>(?:www\.)?
+            (?:daclips\.in|gorillavid\.in|movpod\.in|fastvideo\.in|realvid\.net|filehoot\.com|vidto\.me|powerwatch\.pw))/
+        (?:embed-)?(?P<id>[0-9a-zA-Z]+)(?:-[0-9]+x[0-9]+\.html)?
+    '''
+
+    _FILE_NOT_FOUND_REGEX = r'>(?:404 - )?File Not Found<'
+
+    _TESTS = [{
+        'url': 'http://gorillavid.in/06y9juieqpmi',
+        'md5': '5ae4a3580620380619678ee4875893ba',
+        'info_dict': {
+            'id': '06y9juieqpmi',
+            'ext': 'flv',
+            'title': 'Rebecca Black My Moment Official Music Video Reaction-6GK87Rc8bzQ',
+            'thumbnail': 're:http://.*\.jpg',
+        },
+    }, {
+        'url': 'http://gorillavid.in/embed-z08zf8le23c6-960x480.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://daclips.in/3rso4kdn6f9m',
+        'md5': '1ad8fd39bb976eeb66004d3a4895f106',
+        'info_dict': {
+            'id': '3rso4kdn6f9m',
+            'ext': 'mp4',
+            'title': 'Micro Pig piglets ready on 16th July 2009-bG0PdrCdxUc',
+            'thumbnail': 're:http://.*\.jpg',
+        }
+    }, {
+        # video with countdown timeout
+        'url': 'http://fastvideo.in/1qmdn1lmsmbw',
+        'md5': '8b87ec3f6564a3108a0e8e66594842ba',
+        'info_dict': {
+            'id': '1qmdn1lmsmbw',
+            'ext': 'mp4',
+            'title': 'Man of Steel - Trailer',
+            'thumbnail': 're:http://.*\.jpg',
+        },
+    }, {
+        'url': 'http://realvid.net/ctn2y6p2eviw',
+        'md5': 'b2166d2cf192efd6b6d764c18fd3710e',
+        'info_dict': {
+            'id': 'ctn2y6p2eviw',
+            'ext': 'flv',
+            'title': 'rdx 1955',
+            'thumbnail': 're:http://.*\.jpg',
+        },
+    }, {
+        'url': 'http://movpod.in/0wguyyxi1yca',
+        'only_matching': True,
+    }, {
+        'url': 'http://filehoot.com/3ivfabn7573c.html',
+        'info_dict': {
+            'id': '3ivfabn7573c',
+            'ext': 'mp4',
+            'title': 'youtube-dl test video \'äBaW_jenozKc.mp4.mp4',
+            'thumbnail': 're:http://.*\.jpg',
+        }
+    }, {
+        'url': 'http://vidto.me/ku5glz52nqe1.html',
+        'info_dict': {
+            'id': 'ku5glz52nqe1',
+            'ext': 'mp4',
+            'title': 'test'
+        }
+    }, {
+        'url': 'http://powerwatch.pw/duecjibvicbu',
+        'info_dict': {
+            'id': 'duecjibvicbu',
+            'ext': 'mp4',
+            'title': 'Big Buck Bunny trailer',
+        },
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+
+        url = 'http://%s/%s' % (mobj.group('host'), video_id)
+        webpage = self._download_webpage(url, video_id)
+
+        if re.search(self._FILE_NOT_FOUND_REGEX, webpage) is not None:
+            raise ExtractorError('Video %s does not exist' % video_id, expected=True)
+
+        fields = self._hidden_inputs(webpage)
+
+        if fields['op'] == 'download1':
+            countdown = int_or_none(self._search_regex(
+                r'<span id="countdown_str">(?:[Ww]ait)?\s*<span id="cxc">(\d+)</span>\s*(?:seconds?)?</span>',
+                webpage, 'countdown', default=None))
+            if countdown:
+                self._sleep(countdown, video_id)
+
+            post = urlencode_postdata(fields)
+
+            req = sanitized_Request(url, post)
+            req.add_header('Content-type', 'application/x-www-form-urlencoded')
+
+            webpage = self._download_webpage(req, video_id, 'Downloading video page')
+
+        title = (self._search_regex(
+            [r'style="z-index: [0-9]+;">([^<]+)</span>',
+             r'<td nowrap>([^<]+)</td>',
+             r'h4-fine[^>]*>([^<]+)<',
+             r'>Watch (.+) ',
+             r'<h2 class="video-page-head">([^<]+)</h2>'],
+            webpage, 'title', default=None) or self._og_search_title(webpage)).strip()
+        video_url = self._search_regex(
+            [r'file\s*:\s*["\'](http[^"\']+)["\'],',
+             r'file_link\s*=\s*\'(https?:\/\/[0-9a-zA-Z.\/\-_]+)'],
+            webpage, 'file url')
+        thumbnail = self._search_regex(
+            r'image\s*:\s*["\'](http[^"\']+)["\'],', webpage, 'thumbnail', default=None)
+
+        formats = [{
+            'format_id': 'sd',
+            'url': video_url,
+            'quality': 1,
+        }]
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': thumbnail,
+            'formats': formats,
+        }
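
xfileshare.py generalizes the countdown-then-POST dance these hosters share:
read the hidden form fields, honour the on-page countdown, re-submit the form,
then scrape the file URL. A non-networked sketch of the control flow (markup
fabricated, the sleep and POST only printed):

    import re

    webpage = ('<input type="hidden" name="op" value="download1">'
               '<span id="countdown_str">Wait <span id="cxc">5</span>'
               ' seconds</span>')

    fields = dict(re.findall(
        r'<input type="hidden" name="([^"]+)" value="([^"]*)">', webpage))

    if fields.get('op') == 'download1':
        m = re.search(r'<span id="cxc">(\d+)</span>', webpage)
        countdown = int(m.group(1)) if m else 0
        print('would sleep %ds, then POST %r back to the page'
              % (countdown, fields))
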
index b4ad513a0d18ecbae558deceb6234efaa5ce5415..b3547174dd92beffafaf8f220b50b94a25f2fa2b 100644 (file)
@@ -4,11 +4,10 @@ import re
 
 from .common import InfoExtractor
 from ..utils import (
-    ExtractorError,
-    unified_strdate,
-    str_to_int,
+    dict_get,
+    float_or_none,
     int_or_none,
-    parse_duration,
+    unified_strdate,
 )
 
 
@@ -22,8 +21,8 @@ class XHamsterIE(InfoExtractor):
                 'ext': 'mp4',
                 'title': 'FemaleAgent Shy beauty takes the bait',
                 'upload_date': '20121014',
-                'uploader_id': 'Ruseful2011',
-                'duration': 893,
+                'uploader': 'Ruseful2011',
+                'duration': 893.52,
                 'age_limit': 18,
             }
         },
@@ -34,8 +33,8 @@ class XHamsterIE(InfoExtractor):
                 'ext': 'mp4',
                 'title': 'Britney Spears  Sexy Booty',
                 'upload_date': '20130914',
-                'uploader_id': 'jojo747400',
-                'duration': 200,
+                'uploader': 'jojo747400',
+                'duration': 200.48,
                 'age_limit': 18,
             }
         },
@@ -46,12 +45,12 @@ class XHamsterIE(InfoExtractor):
     ]
 
     def _real_extract(self, url):
-        def extract_video_url(webpage):
-            mp4 = re.search(r'<video\s+.*?file="([^"]+)".*?>', webpage)
-            if mp4 is None:
-                raise ExtractorError('Unable to extract media URL')
-            else:
-                return mp4.group(1)
+        def extract_video_url(webpage, name):
+            return self._search_regex(
+                [r'''file\s*:\s*(?P<q>["'])(?P<mp4>.+?)(?P=q)''',
+                 r'''<a\s+href=(?P<q>["'])(?P<mp4>.+?)(?P=q)\s+class=["']mp4Thumb''',
+                 r'''<video[^>]+file=(?P<q>["'])(?P<mp4>.+?)(?P=q)[^>]*>'''],
+                webpage, name, group='mp4')
 
         def is_hd(webpage):
             return '<div class=\'icon iconHD\'' in webpage
@@ -64,28 +63,36 @@ class XHamsterIE(InfoExtractor):
         mrss_url = '%s://xhamster.com/movies/%s/%s.html' % (proto, video_id, seo)
         webpage = self._download_webpage(mrss_url, video_id)
 
-        title = self._html_search_regex(r'<title>(?P<title>.+?) - xHamster\.com</title>', webpage, 'title')
+        title = self._html_search_regex(
+            [r'<h1[^>]*>([^<]+)</h1>',
+             r'<meta[^>]+itemprop=".*?caption.*?"[^>]+content="(.+?)"',
+             r'<title[^>]*>(.+?)(?:,\s*[^,]*?\s*Porn\s*[^,]*?:\s*xHamster[^<]*| - xHamster\.com)</title>'],
+            webpage, 'title')
 
         # Only a few videos have a description
         mobj = re.search(r'<span>Description: </span>([^<]+)', webpage)
         description = mobj.group(1) if mobj else None
 
-        upload_date = self._html_search_regex(r'hint=\'(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} [A-Z]{3,4}\'',
-                                              webpage, 'upload date', fatal=False)
-        if upload_date:
-            upload_date = unified_strdate(upload_date)
+        upload_date = unified_strdate(self._search_regex(
+            r'hint=["\'](\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} [A-Z]{3,4}',
+            webpage, 'upload date', fatal=False))
 
-        uploader_id = self._html_search_regex(r'<a href=\'/user/[^>]+>(?P<uploader_id>[^<]+)',
-                                              webpage, 'uploader id', default='anonymous')
+        uploader = self._html_search_regex(
+            r'<span[^>]+itemprop=["\']author[^>]+><a[^>]+href=["\'].+?xhamster\.com/user/[^>]+>(?P<uploader>.+?)</a>',
+            webpage, 'uploader', default='anonymous')
 
-        thumbnail = self._html_search_regex(r'<video\s+.*?poster="([^"]+)".*?>', webpage, 'thumbnail', fatal=False)
+        thumbnail = self._search_regex(
+            [r'''thumb\s*:\s*(?P<q>["'])(?P<thumbnail>.+?)(?P=q)''',
+             r'''<video[^>]+poster=(?P<q>["'])(?P<thumbnail>.+?)(?P=q)[^>]*>'''],
+            webpage, 'thumbnail', fatal=False, group='thumbnail')
 
-        duration = parse_duration(self._html_search_regex(r'<span>Runtime:</span> (\d+:\d+)</div>',
-                                                          webpage, 'duration', fatal=False))
+        duration = float_or_none(self._search_regex(
+            r'(["\'])duration\1\s*:\s*(["\'])(?P<duration>.+?)\2',
+            webpage, 'duration', fatal=False, group='duration'))
 
-        view_count = self._html_search_regex(r'<span>Views:</span> ([^<]+)</div>', webpage, 'view count', fatal=False)
-        if view_count:
-            view_count = str_to_int(view_count)
+        view_count = int_or_none(self._search_regex(
+            r'content=["\']User(?:View|Play)s:(\d+)',
+            webpage, 'view count', fatal=False))
 
         mobj = re.search(r"hint='(?P<likecount>\d+) Likes / (?P<dislikecount>\d+) Dislikes'", webpage)
         (like_count, dislike_count) = (mobj.group('likecount'), mobj.group('dislikecount')) if mobj else (None, None)
@@ -97,7 +104,9 @@ class XHamsterIE(InfoExtractor):
 
         hd = is_hd(webpage)
 
-        video_url = extract_video_url(webpage)
+        format_id = 'hd' if hd else 'sd'
+
+        video_url = extract_video_url(webpage, format_id)
         formats = [{
             'url': video_url,
             'format_id': 'hd' if hd else 'sd',
@@ -108,7 +117,7 @@ class XHamsterIE(InfoExtractor):
             mrss_url = self._search_regex(r'<link rel="canonical" href="([^"]+)', webpage, 'mrss_url')
             webpage = self._download_webpage(mrss_url + '?hd', video_id, note='Downloading HD webpage')
             if is_hd(webpage):
-                video_url = extract_video_url(webpage)
+                video_url = extract_video_url(webpage, 'hd')
                 formats.append({
                     'url': video_url,
                     'format_id': 'hd',
@@ -122,7 +131,7 @@ class XHamsterIE(InfoExtractor):
             'title': title,
             'description': description,
             'upload_date': upload_date,
-            'uploader_id': uploader_id,
+            'uploader': uploader,
             'thumbnail': thumbnail,
             'duration': duration,
             'view_count': view_count,
@@ -162,6 +171,12 @@ class XHamsterEmbedIE(InfoExtractor):
 
         video_url = self._search_regex(
             r'href="(https?://xhamster\.com/movies/%s/[^"]+\.html[^"]*)"' % video_id,
-            webpage, 'xhamster url')
+            webpage, 'xhamster url', default=None)
+
+        if not video_url:
+            vars = self._parse_json(
+                self._search_regex(r'vars\s*:\s*({.+?})\s*,\s*\n', webpage, 'vars'),
+                video_id)
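+            # any of these player vars points back at the xhamster.com watch page;
+            # dict_get picks the first one that is present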
+            video_url = dict_get(vars, ('downloadLink', 'homepageLink', 'commentsLink', 'shareUrl'))
 
         return self.url_result(video_url, 'XHamster')
index 7c9d8af6f2585207347d58d08fc607ebf4d28900..36e5ead1e690db9bb0c1c1a64650a69c784bbe76 100644 (file)
@@ -2,15 +2,15 @@
 from __future__ import unicode_literals
 
 import re
+import time
 
 from .common import InfoExtractor
 from ..compat import (
-    compat_chr,
     compat_ord,
 )
 from ..utils import (
     int_or_none,
-    parse_filesize,
+    parse_duration,
 )
 
 
@@ -22,7 +22,7 @@ class XMinusIE(InfoExtractor):
         'info_dict': {
             'id': '4542',
             'ext': 'mp3',
-            'title': 'Леонид Агутин-Песенка шофера',
+            'title': 'Леонид Агутин-Песенка шофёра',
             'duration': 156,
             'tbr': 320,
             'filesize_approx': 5900000,
@@ -36,38 +36,41 @@ class XMinusIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
 
         artist = self._html_search_regex(
-            r'minus_track\.artist="(.+?)"', webpage, 'artist')
+            r'<a[^>]+href="/artist/\d+">([^<]+)</a>', webpage, 'artist')
         title = artist + '-' + self._html_search_regex(
-            r'minus_track\.title="(.+?)"', webpage, 'title')
-        duration = int_or_none(self._html_search_regex(
-            r'minus_track\.dur_sec=\'([0-9]*?)\'',
+            r'<span[^>]+class="minustrack-full-title(?:\s+[^"]+)?"[^>]*>([^<]+)', webpage, 'title')
+        duration = parse_duration(self._html_search_regex(
+            r'<span[^>]+class="player-duration(?:\s+[^"]+)?"[^>]*>([^<]+)',
             webpage, 'duration', fatal=False))
-        filesize_approx = parse_filesize(self._html_search_regex(
-            r'<div id="finfo"[^>]*>\s*↓\s*([0-9.]+\s*[a-zA-Z][bB])',
-            webpage, 'approximate filesize', fatal=False))
-        tbr = int_or_none(self._html_search_regex(
-            r'<div class="quality[^"]*"></div>\s*([0-9]+)\s*kbps',
-            webpage, 'bitrate', fatal=False))
+        mobj = re.search(
+            r'<div[^>]+class="dw-info(?:\s+[^"]+)?"[^>]*>(?P<tbr>\d+)\s*кбит/c\s+(?P<filesize>[0-9.]+)\s*мб</div>',
+            webpage)
+        tbr = filesize_approx = None
+        if mobj:
+            filesize_approx = float(mobj.group('filesize')) * 1000000
+            tbr = float(mobj.group('tbr'))
         view_count = int_or_none(self._html_search_regex(
-            r'<div class="quality.*?► ([0-9]+)',
+            r'<span><[^>]+class="icon-chart-bar".*?>(\d+)</span>',
             webpage, 'view count', fatal=False))
         description = self._html_search_regex(
-            r'(?s)<div id="song_texts">(.*?)</div><br',
+            r'(?s)<pre[^>]+id="lyrics-original"[^>]*>(.*?)</pre>',
             webpage, 'song lyrics', fatal=False)
         if description:
             description = re.sub(' *\r *', '\n', description)
 
-        enc_token = self._html_search_regex(
-            r'minus_track\.s?tkn="(.+?)"', webpage, 'enc_token')
-        token = ''.join(
-            c if pos == 3 else compat_chr(compat_ord(c) - 1)
-            for pos, c in enumerate(reversed(enc_token)))
-        video_url = 'http://x-minus.org/dwlf/%s/%s.mp3' % (video_id, token)
+        k = self._search_regex(
+            r'<div[^>]+id="player-bottom"[^>]+data-k="([^"]+)">', webpage,
+            'encoded data')
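+        # tkn2 combines the char-code sum of the data-k attribute, the numeric
+        # video ID and the current time in hours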
+        h = time.time() / 3600
+        a = sum(map(int, [compat_ord(c) for c in k])) + int(video_id) + h
+        video_url = 'http://x-minus.me/dl/minus?id=%s&tkn2=%df%d' % (video_id, a, h)
 
         return {
             'id': video_id,
             'title': title,
             'url': video_url,
+            # The real extension is unknown until download; default to mp3
+            'ext': 'mp3',
             'duration': duration,
             'filesize_approx': filesize_approx,
             'tbr': tbr,
index 71584c291f9134ac2444a1ac48bba9ae514c2981..76c91bd92c6906f28412aa60e0cb6bf8ace10422 100644 (file)
@@ -42,11 +42,7 @@ class XstreamIE(InfoExtractor):
         'only_matching': True,
     }]
 
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        partner_id = mobj.group('partner_id')
-        video_id = mobj.group('id')
-
+    def _extract_video_info(self, partner_id, video_id):
         data = self._download_xml(
             'http://frontend.xstream.dk/%s/feed/video/?platform=web&id=%s'
             % (partner_id, video_id),
@@ -97,6 +93,7 @@ class XstreamIE(InfoExtractor):
             formats.append({
                 'url': link.get('href'),
                 'format_id': link.get('rel'),
+                'preference': 1,
             })
 
         thumbnails = [{
@@ -113,3 +110,10 @@ class XstreamIE(InfoExtractor):
             'formats': formats,
             'thumbnails': thumbnails,
         }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        partner_id = mobj.group('partner_id')
+        video_id = mobj.group('id')
+
+        return self._extract_video_info(partner_id, video_id)
index 779e4f46a1dd5315c6a9be3dad09e65c07a205b2..4075b8a4f8a705cf29aa1430656146350a8d07aa 100644 (file)
@@ -1,21 +1,23 @@
 from __future__ import unicode_literals
 
+import itertools
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_request,
-    compat_urllib_parse_unquote,
-)
+from ..compat import compat_urllib_parse_unquote
 from ..utils import (
-    parse_duration,
+    int_or_none,
+    orderedSet,
+    sanitized_Request,
     str_to_int,
 )
 
 
 class XTubeIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>xtube\.com/watch\.php\?v=(?P<id>[^/?&#]+))'
-    _TEST = {
+    _VALID_URL = r'(?:xtube:|https?://(?:www\.)?xtube\.com/(?:watch\.php\?.*\bv=|video-watch/(?P<display_id>[^/]+)-))(?P<id>[^/?&#]+)'
+
+    _TESTS = [{
+        # old URL schema
         'url': 'http://www.xtube.com/watch.php?v=kVTUy_G222_',
         'md5': '092fbdd3cbe292c920ef6fc6a8a9cdab',
         'info_dict': {
@@ -27,108 +29,104 @@ class XTubeIE(InfoExtractor):
             'duration': 450,
             'age_limit': 18,
         }
-    }
+    }, {
+        # new URL schema
+        'url': 'http://www.xtube.com/video-watch/strange-erotica-625837',
+        'only_matching': True,
+    }, {
+        'url': 'xtube:625837',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        display_id = mobj.group('display_id')
+
+        if not display_id:
+            display_id = video_id
+            url = 'http://www.xtube.com/watch.php?v=%s' % video_id
+
+        req = sanitized_Request(url)
+        req.add_header('Cookie', 'age_verified=1; cookiesAccepted=1')
+        webpage = self._download_webpage(req, display_id)
 
-        req = compat_urllib_request.Request(url)
-        req.add_header('Cookie', 'age_verified=1')
-        webpage = self._download_webpage(req, video_id)
+        flashvars = self._parse_json(
+            self._search_regex(
+                r'xt\.playerOps\s*=\s*({.+?});', webpage, 'player ops'),
+            video_id)['flashvars']
 
-        video_title = self._html_search_regex(
-            r'<p class="title">([^<]+)', webpage, 'title')
-        video_uploader = self._html_search_regex(
-            [r"var\s+contentOwnerId\s*=\s*'([^']+)",
-             r'By:\s*<a href="/community/profile\.php\?user=([^"]+)'],
+        title = flashvars.get('title') or self._search_regex(
+            r'<h1>([^<]+)</h1>', webpage, 'title')
+        video_url = compat_urllib_parse_unquote(flashvars['video_url'])
+        duration = int_or_none(flashvars.get('video_duration'))
+
+        uploader = self._search_regex(
+            r'<input[^>]+name="contentOwnerId"[^>]+value="([^"]+)"',
             webpage, 'uploader', fatal=False)
-        video_description = self._html_search_regex(
-            r'<p class="fieldsDesc">([^<]+)',
-            webpage, 'description', fatal=False)
-        duration = parse_duration(self._html_search_regex(
-            r'<span class="bold">Runtime:</span> ([^<]+)</p>',
-            webpage, 'duration', fatal=False))
-        view_count = str_to_int(self._html_search_regex(
-            r'<span class="bold">Views:</span> ([\d,\.]+)</p>',
+        description = self._search_regex(
+            r'</h1>\s*<p>([^<]+)', webpage, 'description', fatal=False)
+        view_count = str_to_int(self._search_regex(
+            r'<dt>Views:</dt>\s*<dd>([\d,\.]+)</dd>',
             webpage, 'view count', fatal=False))
         comment_count = str_to_int(self._html_search_regex(
-            r'<div id="commentBar">([\d,\.]+) Comments</div>',
+            r'>Comments? \(([\d,\.]+)\)<',
             webpage, 'comment count', fatal=False))
 
-        formats = []
-        for format_id, video_url in re.findall(
-                r'flashvars\.quality_(.+?)\s*=\s*"([^"]+)"', webpage):
-            fmt = {
-                'url': compat_urllib_parse_unquote(video_url),
-                'format_id': format_id,
-            }
-            m = re.search(r'^(?P<height>\d+)[pP]', format_id)
-            if m:
-                fmt['height'] = int(m.group('height'))
-            formats.append(fmt)
-
-        if not formats:
-            video_url = compat_urllib_parse_unquote(self._search_regex(
-                r'flashvars\.video_url\s*=\s*"([^"]+)"',
-                webpage, 'video URL'))
-            formats.append({'url': video_url})
-
-        self._sort_formats(formats)
-
         return {
             'id': video_id,
-            'title': video_title,
-            'uploader': video_uploader,
-            'description': video_description,
+            'display_id': display_id,
+            'url': video_url,
+            'title': title,
+            'description': description,
+            'uploader': uploader,
             'duration': duration,
             'view_count': view_count,
             'comment_count': comment_count,
-            'formats': formats,
             'age_limit': 18,
         }
 
 
 class XTubeUserIE(InfoExtractor):
     IE_DESC = 'XTube user profile'
-    _VALID_URL = r'https?://(?:www\.)?xtube\.com/community/profile\.php\?(.*?)user=(?P<username>[^&#]+)(?:$|[&#])'
+    _VALID_URL = r'https?://(?:www\.)?xtube\.com/profile/(?P<id>[^/]+-\d+)'
     _TEST = {
-        'url': 'http://www.xtube.com/community/profile.php?user=greenshowers',
+        'url': 'http://www.xtube.com/profile/greenshowers-4056496',
         'info_dict': {
-            'id': 'greenshowers',
+            'id': 'greenshowers-4056496',
             'age_limit': 18,
         },
         'playlist_mincount': 155,
     }
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        username = mobj.group('username')
-
-        profile_page = self._download_webpage(
-            url, username, note='Retrieving profile page')
-
-        video_count = int(self._search_regex(
-            r'<strong>%s\'s Videos \(([0-9]+)\)</strong>' % username, profile_page,
-            'video count'))
-
-        PAGE_SIZE = 25
-        urls = []
-        page_count = (video_count + PAGE_SIZE + 1) // PAGE_SIZE
-        for n in range(1, page_count + 1):
-            lpage_url = 'http://www.xtube.com/user_videos.php?page=%d&u=%s' % (n, username)
-            lpage = self._download_webpage(
-                lpage_url, username,
-                note='Downloading page %d/%d' % (n, page_count))
-            urls.extend(
-                re.findall(r'addthis:url="([^"]+)"', lpage))
-
-        return {
-            '_type': 'playlist',
-            'id': username,
-            'age_limit': 18,
-            'entries': [{
-                '_type': 'url',
-                'url': eurl,
-                'ie_key': 'XTube',
-            } for eurl in urls]
-        }
+        user_id = self._match_id(url)
+
+        entries = []
+        for pagenum in itertools.count(1):
+            request = sanitized_Request(
+                'http://www.xtube.com/profile/%s/videos/%d' % (user_id, pagenum),
+                headers={
+                    'Cookie': 'popunder=4',
+                    'X-Requested-With': 'XMLHttpRequest',
+                    'Referer': url,
+                })
+
+            page = self._download_json(
+                request, user_id, 'Downloading videos JSON page %d' % pagenum)
+
+            html = page.get('html')
+            if not html:
+                break
+
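+            # data-plid values may repeat across the page; orderedSet
+            # de-duplicates them while preserving order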
+            for video_id in orderedSet([video_id for _, video_id in re.findall(
+                    r'data-plid=(["\'])(.+?)\1', html)]):
+                entries.append(self.url_result('xtube:%s' % video_id, XTubeIE.ie_key()))
+
+            page_count = int_or_none(page.get('pageCount'))
+            if not page_count or pagenum == page_count:
+                break
+
+        playlist = self.playlist_result(entries, user_id)
+        playlist['age_limit'] = 18
+        return playlist
index 5aac8adb36e2ad12e798cb4f0c77e5b204c7b91b..2466410faaba4e0047fe26099ee936b68dcb9e34 100644 (file)
@@ -19,7 +19,7 @@ class XuiteIE(InfoExtractor):
     _TESTS = [{
         # Audio
         'url': 'http://vlog.xuite.net/play/RGkzc1ZULTM4NjA5MTQuZmx2',
-        'md5': '63a42c705772aa53fd4c1a0027f86adf',
+        'md5': 'e79284c87b371424885448d11f6398c8',
         'info_dict': {
             'id': '3860914',
             'ext': 'mp3',
@@ -34,19 +34,20 @@ class XuiteIE(InfoExtractor):
         },
     }, {
         # Video with only one format
-        'url': 'http://vlog.xuite.net/play/TkRZNjhULTM0NDE2MjkuZmx2',
-        'md5': 'c45737fc8ac5dc8ac2f92ecbcecf505e',
+        'url': 'http://vlog.xuite.net/play/WUxxR2xCLTI1OTI1MDk5LmZsdg==',
+        'md5': '21f7b39c009b5a4615b4463df6eb7a46',
         'info_dict': {
-            'id': '3441629',
+            'id': '25925099',
             'ext': 'mp4',
-            'title': '孫燕姿 - 眼淚成詩',
+            'title': 'BigBuckBunny_320x180',
             'thumbnail': 're:^https?://.*\.jpg$',
-            'duration': 217.399,
-            'timestamp': 1299383640,
-            'upload_date': '20110306',
-            'uploader': 'Valen',
-            'uploader_id': '10400126',
-            'categories': ['影視娛樂'],
+            'duration': 596.458,
+            'timestamp': 1454242500,
+            'upload_date': '20160131',
+            'uploader': 'yan12125',
+            'uploader_id': '12158353',
+            'categories': ['個人短片'],
+            'description': 'http://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4',
         },
     }, {
         # Video with two formats
index 5dcf2fdd12f9140f0bd373fd5db41c93f4b18b38..710ad5041988b0e1c932b135af91a27036dfd664 100644 (file)
@@ -3,14 +3,12 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse_unquote,
-    compat_urllib_request,
-)
+from ..compat import compat_urllib_parse_unquote
 from ..utils import (
     clean_html,
     ExtractorError,
     determine_ext,
+    sanitized_Request,
 )
 
 
@@ -48,7 +46,7 @@ class XVideosIE(InfoExtractor):
             'url': video_url,
         }]
 
-        android_req = compat_urllib_request.Request(url)
+        android_req = sanitized_Request(url)
         android_req.add_header('User-Agent', self._ANDROID_USER_AGENT)
         android_webpage = self._download_webpage(android_req, video_id, fatal=False)
 
index f9afbdbab611e233c7f7014ae7d66e996f2b7c31..b376f2b935da82d5b031bdf5b6910593f51add65 100644 (file)
@@ -8,6 +8,7 @@ import re
 from .common import InfoExtractor, SearchInfoExtractor
 from ..compat import (
     compat_urllib_parse,
+    compat_urllib_parse_urlencode,
     compat_urlparse,
 )
 from ..utils import (
@@ -23,7 +24,7 @@ from .nbc import NBCSportsVPlayerIE
 
 class YahooIE(InfoExtractor):
     IE_DESC = 'Yahoo screen and movies'
-    _VALID_URL = r'(?P<url>(?P<host>https?://(?:[a-zA-Z]{2}\.)?[\da-zA-Z_-]+\.yahoo\.com)/(?:[^/]+/)*(?P<display_id>.+)?-(?P<id>[0-9]+)(?:-[a-z]+)?\.html)'
+    _VALID_URL = r'(?P<url>(?P<host>https?://(?:[a-zA-Z]{2}\.)?[\da-zA-Z_-]+\.yahoo\.com)/(?:[^/]+/)*(?P<display_id>.+)?-(?P<id>[0-9]+)(?:-[a-z]+)?(?:\.html)?)'
     _TESTS = [
         {
             'url': 'http://screen.yahoo.com/julian-smith-travis-legg-watch-214727115.html',
@@ -37,7 +38,7 @@ class YahooIE(InfoExtractor):
         },
         {
             'url': 'http://screen.yahoo.com/wired/codefellas-s1-ep12-cougar-lies-103000935.html',
-            'md5': 'd6e6fc6e1313c608f316ddad7b82b306',
+            'md5': 'c3466d2b6d5dd6b9f41ba9ed04c24b23',
             'info_dict': {
                 'id': 'd1dedf8c-d58c-38c3-8963-e899929ae0a9',
                 'ext': 'mp4',
@@ -48,7 +49,7 @@ class YahooIE(InfoExtractor):
         },
         {
             'url': 'https://screen.yahoo.com/community/community-sizzle-reel-203225340.html?format=embed',
-            'md5': '60e8ac193d8fb71997caa8fce54c6460',
+            'md5': '75ffabdb87c16d4ffe8c036dc4d1c136',
             'info_dict': {
                 'id': '4fe78544-8d48-39d8-97cd-13f205d9fcdb',
                 'ext': 'mp4',
@@ -58,15 +59,15 @@ class YahooIE(InfoExtractor):
             }
         },
         {
-            'url': 'https://tw.screen.yahoo.com/election-2014-askmayor/敢問市長-黃秀霜批賴清德-非常高傲-033009720.html',
-            'md5': '3a09cf59349cfaddae1797acc3c087fc',
+            'url': 'https://tw.news.yahoo.com/%E6%95%A2%E5%95%8F%E5%B8%82%E9%95%B7%20%E9%BB%83%E7%A7%80%E9%9C%9C%E6%89%B9%E8%B3%B4%E6%B8%85%E5%BE%B7%20%E9%9D%9E%E5%B8%B8%E9%AB%98%E5%82%B2-034024051.html',
+            'md5': '9035d38f88b1782682a3e89f985be5bb',
             'info_dict': {
                 'id': 'cac903b3-fcf4-3c14-b632-643ab541712f',
                 'ext': 'mp4',
                 'title': '敢問市長/黃秀霜批賴清德「非常高傲」',
                 'description': '直言台南沒捷運 交通居五都之末',
                 'duration': 396,
-            }
+            },
         },
         {
             'url': 'https://uk.screen.yahoo.com/editor-picks/cute-raccoon-freed-drain-using-091756545.html',
@@ -88,20 +89,35 @@ class YahooIE(InfoExtractor):
                 'title': 'Program that makes hockey more affordable not offered in Manitoba',
                 'description': 'md5:c54a609f4c078d92b74ffb9bf1f496f4',
                 'duration': 121,
-            }
+            },
+            'skip': 'Video gone',
         }, {
             'url': 'https://ca.finance.yahoo.com/news/hackers-sony-more-trouble-well-154609075.html',
-            'md5': '226a895aae7e21b0129e2a2006fe9690',
             'info_dict': {
-                'id': 'e624c4bc-3389-34de-9dfc-025f74943409',
-                'ext': 'mp4',
-                'title': '\'The Interview\' TV Spot: War',
-                'description': 'The Interview',
-                'duration': 30,
-            }
+                'id': '154609075',
+            },
+            'playlist': [{
+                'md5': 'f8e336c6b66f503282e5f719641d6565',
+                'info_dict': {
+                    'id': 'e624c4bc-3389-34de-9dfc-025f74943409',
+                    'ext': 'mp4',
+                    'title': '\'The Interview\' TV Spot: War',
+                    'description': 'The Interview',
+                    'duration': 30,
+                },
+            }, {
+                'md5': '958bcb90b4d6df71c56312137ee1cd5a',
+                'info_dict': {
+                    'id': '1fc8ada0-718e-3abe-a450-bf31f246d1a9',
+                    'ext': 'mp4',
+                    'title': '\'The Interview\' TV Spot: Guys',
+                    'description': 'The Interview',
+                    'duration': 30,
+                },
+            }],
         }, {
             'url': 'http://news.yahoo.com/video/china-moses-crazy-blues-104538833.html',
-            'md5': '67010fdf3a08d290e060a4dd96baa07b',
+            'md5': '88e209b417f173d86186bef6e4d1f160',
             'info_dict': {
                 'id': 'f885cf7f-43d4-3450-9fac-46ac30ece521',
                 'ext': 'mp4',
@@ -118,10 +134,11 @@ class YahooIE(InfoExtractor):
                 'title': 'Connect the Dots: Dark Side of Virgo',
                 'description': 'md5:1428185051cfd1949807ad4ff6d3686a',
                 'duration': 201,
-            }
+            },
+            'skip': 'Domain name in.lifestyle.yahoo.com gone',
         }, {
             'url': 'https://www.yahoo.com/movies/v/true-story-trailer-173000497.html',
-            'md5': '989396ae73d20c6f057746fb226aa215',
+            'md5': 'b17ac378b1134fa44370fb27db09a744',
             'info_dict': {
                 'id': '071c4013-ce30-3a93-a5b2-e0413cd4a9d1',
                 'ext': 'mp4',
@@ -140,11 +157,45 @@ class YahooIE(InfoExtractor):
                 'ext': 'flv',
                 'description': 'md5:df390f70a9ba7c95ff1daace988f0d8d',
                 'title': 'Tyler Kalinoski hits buzzer-beater to lift Davidson',
+                'upload_date': '20150313',
+                'uploader': 'NBCU-SPORTS',
+                'timestamp': 1426270238,
             }
         }, {
             'url': 'https://tw.news.yahoo.com/-100120367.html',
             'only_matching': True,
-        }
+        }, {
+            # Query result is embedded in webpage, but explicit request to video API fails with geo restriction
+            'url': 'https://screen.yahoo.com/community/communitary-community-episode-1-ladders-154501237.html',
+            'md5': '1ddbf7c850777548438e5c4f147c7b8c',
+            'info_dict': {
+                'id': '1f32853c-a271-3eef-8cb6-f6d6872cb504',
+                'ext': 'mp4',
+                'title': 'Communitary - Community Episode 1: Ladders',
+                'description': 'md5:8fc39608213295748e1e289807838c97',
+                'duration': 1646,
+            },
+        }, {
+            # it uses an alias to get the video_id
+            'url': 'https://www.yahoo.com/movies/the-stars-of-daddys-home-have-very-different-212843197.html',
+            'info_dict': {
+                'id': '40eda9c8-8e5f-3552-8745-830f67d0c737',
+                'ext': 'mp4',
+                'title': 'Will Ferrell & Mark Wahlberg Are Pro-Spanking',
+                'description': 'While they play feuding fathers in \'Daddy\'s Home,\' star Will Ferrell & Mark Wahlberg share their true feelings on parenthood.',
+            },
+        },
+        {
+            # config['models']['applet_model']['data']['sapi'] has no query
+            'url': 'https://www.yahoo.com/music/livenation/event/galactic-2016',
+            'md5': 'dac0c72d502bc5facda80c9e6d5c98db',
+            'info_dict': {
+                'id': 'a6015640-e9e5-3efb-bb60-05589a183919',
+                'ext': 'mp4',
+                'description': 'Galactic',
+                'title': 'Dolla Diva (feat. Maggie Koerner)',
+            },
+        },
     ]
 
     def _real_extract(self, url):
@@ -153,35 +204,66 @@ class YahooIE(InfoExtractor):
         page_id = mobj.group('id')
         url = mobj.group('url')
         host = mobj.group('host')
-        webpage = self._download_webpage(url, display_id)
+        webpage, urlh = self._download_webpage_handle(url, display_id)
+        if 'err=404' in urlh.geturl():
+            raise ExtractorError('Video gone', expected=True)
 
         # Look for iframed media first
-        iframe_m = re.search(r'<iframe[^>]+src="(/video/.+?-\d+\.html\?format=embed.*?)"', webpage)
-        if iframe_m:
+        entries = []
+        iframe_urls = re.findall(r'<iframe[^>]+src="(/video/.+?-\d+\.html\?format=embed.*?)"', webpage)
+        for idx, iframe_url in enumerate(iframe_urls):
             iframepage = self._download_webpage(
-                host + iframe_m.group(1), display_id, 'Downloading iframe webpage')
+                host + iframe_url, display_id,
+                note='Downloading iframe webpage for video #%d' % idx)
             items_json = self._search_regex(
                 r'mediaItems: (\[.+?\])$', iframepage, 'items', flags=re.MULTILINE, default=None)
             if items_json:
                 items = json.loads(items_json)
                 video_id = items[0]['id']
-                return self._get_info(video_id, display_id, webpage)
+                entries.append(self._get_info(video_id, display_id, webpage))
+        if entries:
+            return self.playlist_result(entries, page_id)
+
         # Look for NBCSports iframes
         nbc_sports_url = NBCSportsVPlayerIE._extract_url(webpage)
         if nbc_sports_url:
             return self.url_result(nbc_sports_url, 'NBCSportsVPlayer')
 
+        # The query result is often embedded in the webpage as JSON. Explicit requests
+        # to the video API sometimes fail with a geo restriction error, so prefer the
+        # embedded query result whenever it is present.
+        config_json = self._search_regex(
+            r'window\.Af\.bootstrap\[[^\]]+\]\s*=\s*({.*?"applet_type"\s*:\s*"td-applet-videoplayer".*?});(?:</script>|$)',
+            webpage, 'videoplayer applet', default=None)
+        if config_json:
+            config = self._parse_json(config_json, display_id, fatal=False)
+            if config:
+                sapi = config.get('models', {}).get('applet_model', {}).get('data', {}).get('sapi')
+                if sapi and 'query' in sapi:
+                    return self._extract_info(display_id, sapi, webpage)
+
         items_json = self._search_regex(
             r'mediaItems: ({.*?})$', webpage, 'items', flags=re.MULTILINE,
             default=None)
         if items_json is None:
-            CONTENT_ID_REGEXES = [
-                r'YUI\.namespace\("Media"\)\.CONTENT_ID\s*=\s*"([^"]+)"',
-                r'root\.App\.Cache\.context\.videoCache\.curVideo = \{"([^"]+)"',
-                r'"first_videoid"\s*:\s*"([^"]+)"',
-                r'%s[^}]*"ccm_id"\s*:\s*"([^"]+)"' % re.escape(page_id),
-            ]
-            video_id = self._search_regex(CONTENT_ID_REGEXES, webpage, 'content ID')
+            alias = self._search_regex(
+                r'"aliases":{"video":"(.*?)"', webpage, 'alias', default=None)
+            if alias is not None:
+                alias_info = self._download_json(
+                    'https://www.yahoo.com/_td/api/resource/VideoService.videos;video_aliases=["%s"]' % alias,
+                    display_id, 'Downloading alias info')
+                video_id = alias_info[0]['id']
+            else:
+                CONTENT_ID_REGEXES = [
+                    r'YUI\.namespace\("Media"\)\.CONTENT_ID\s*=\s*"([^"]+)"',
+                    r'root\.App\.Cache\.context\.videoCache\.curVideo = \{"([^"]+)"',
+                    r'"first_videoid"\s*:\s*"([^"]+)"',
+                    r'%s[^}]*"ccm_id"\s*:\s*"([^"]+)"' % re.escape(page_id),
+                    r'<article[^>]data-uuid=["\']([^"\']+)',
+                    r'yahoo://article/view\?.*\buuid=([^&"\']+)',
+                ]
+                video_id = self._search_regex(
+                    CONTENT_ID_REGEXES, webpage, 'content ID')
         else:
             items = json.loads(items_json)
             info = items['mediaItems']['query']['results']['mediaObj'][0]
@@ -190,22 +272,10 @@ class YahooIE(InfoExtractor):
             video_id = info['id']
         return self._get_info(video_id, display_id, webpage)
 
-    def _get_info(self, video_id, display_id, webpage):
-        region = self._search_regex(
-            r'\\?"region\\?"\s*:\s*\\?"([^"]+?)\\?"',
-            webpage, 'region', fatal=False, default='US')
-        data = compat_urllib_parse.urlencode({
-            'protocol': 'http',
-            'region': region,
-        })
-        query_url = (
-            'https://video.media.yql.yahoo.com/v1/video/sapi/streams/'
-            '{id}?{data}'.format(id=video_id, data=data))
-        query_result = self._download_json(
-            query_url, display_id, 'Downloading video info')
-
-        info = query_result['query']['results']['mediaObj'][0]
+    def _extract_info(self, display_id, query, webpage):
+        info = query['query']['results']['mediaObj'][0]
         meta = info.get('meta')
+        video_id = info.get('id')
 
         if not meta:
             msg = info['status'].get('msg')
@@ -231,6 +301,9 @@ class YahooIE(InfoExtractor):
                     'ext': 'flv',
                 })
             else:
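+                # HLS playlist entries are handed to the native m3u8 downloader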
+                if s.get('format') == 'm3u8_playlist':
+                    format_info['protocol'] = 'm3u8_native'
+                    format_info['ext'] = 'mp4'
                 format_url = compat_urlparse.urljoin(host, path)
                 format_info['url'] = format_url
             formats.append(format_info)
@@ -264,6 +337,21 @@ class YahooIE(InfoExtractor):
             'subtitles': subtitles,
         }
 
+    def _get_info(self, video_id, display_id, webpage):
+        region = self._search_regex(
+            r'\\?"region\\?"\s*:\s*\\?"([^"]+?)\\?"',
+            webpage, 'region', fatal=False, default='US')
+        data = compat_urllib_parse_urlencode({
+            'protocol': 'http',
+            'region': region,
+        })
+        query_url = (
+            'https://video.media.yql.yahoo.com/v1/video/sapi/streams/'
+            '{id}?{data}'.format(id=video_id, data=data))
+        query_result = self._download_json(
+            query_url, display_id, 'Downloading video info')
+        return self._extract_info(display_id, query_result, webpage)
+
 
 class YahooSearchIE(SearchInfoExtractor):
     IE_DESC = 'Yahoo screen search'
index 001ee17b6f93d457bdc2fbdaf802b61ef19e1b41..63bbc06346a04b385c722eaae22d0ff5c41445f4 100644 (file)
@@ -15,7 +15,7 @@ from ..utils import (
 
 class YamIE(InfoExtractor):
     IE_DESC = '蕃薯藤yam天空部落'
-    _VALID_URL = r'http://mymedia.yam.com/m/(?P<id>\d+)'
+    _VALID_URL = r'https?://mymedia.yam.com/m/(?P<id>\d+)'
 
     _TESTS = [{
         # An audio hosted on Yam
index f4c0f5702e59bea80046438606dd9a28271d8a30..7a90cc60cfa61a880ba8a0d05cd841017a24bfa8 100644 (file)
@@ -1,4 +1,4 @@
-# coding=utf-8
+# coding: utf-8
 from __future__ import unicode_literals
 
 import re
@@ -7,12 +7,49 @@ import hashlib
 from .common import InfoExtractor
 from ..compat import compat_str
 from ..utils import (
+    ExtractorError,
     int_or_none,
     float_or_none,
+    sanitized_Request,
+    urlencode_postdata,
 )
 
 
 class YandexMusicBaseIE(InfoExtractor):
+    @staticmethod
+    def _handle_error(response):
+        error = response.get('error')
+        if error:
+            raise ExtractorError(error, expected=True)
+
+    def _download_json(self, *args, **kwargs):
+        response = super(YandexMusicBaseIE, self)._download_json(*args, **kwargs)
+        self._handle_error(response)
+        return response
+
+
+class YandexMusicTrackIE(YandexMusicBaseIE):
+    IE_NAME = 'yandexmusic:track'
+    IE_DESC = 'Яндекс.Музыка - Трек'
+    _VALID_URL = r'https?://music\.yandex\.(?:ru|kz|ua|by)/album/(?P<album_id>\d+)/track/(?P<id>\d+)'
+
+    _TEST = {
+        'url': 'http://music.yandex.ru/album/540508/track/4878838',
+        'md5': 'f496818aa2f60b6c0062980d2e00dc20',
+        'info_dict': {
+            'id': '4878838',
+            'ext': 'mp3',
+            'title': 'Carlo Ambrosio & Fabio Di Bari, Carlo Ambrosio - Gypsy Eyes 1',
+            'filesize': 4628061,
+            'duration': 193.04,
+            'track': 'Gypsy Eyes 1',
+            'album': 'Gypsy Soul',
+            'album_artist': 'Carlo Ambrosio',
+            'artist': 'Carlo Ambrosio & Fabio Di Bari, Carlo Ambrosio',
+            'release_year': '2009',
+        }
+    }
+
     def _get_track_url(self, storage_dir, track_id):
         data = self._download_json(
             'http://music.yandex.ru/api/v1.5/handlers/api-jsonp.jsx?action=getTrackSrc&p=download-info/%s'
@@ -26,32 +63,50 @@ class YandexMusicBaseIE(InfoExtractor):
                 % (data['host'], key, data['ts'] + data['path'], storage[1]))
 
     def _get_track_info(self, track):
-        return {
+        thumbnail = None
+        cover_uri = track.get('albums', [{}])[0].get('coverUri')
+        if cover_uri:
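+            # coverUri carries a "%%" size placeholder; substitute "orig" to
+            # request the full-size image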
+            thumbnail = cover_uri.replace('%%', 'orig')
+            if not thumbnail.startswith('http'):
+                thumbnail = 'http://' + thumbnail
+
+        track_title = track['title']
+        track_info = {
             'id': track['id'],
             'ext': 'mp3',
             'url': self._get_track_url(track['storageDir'], track['id']),
-            'title': '%s - %s' % (track['artists'][0]['name'], track['title']),
             'filesize': int_or_none(track.get('fileSize')),
             'duration': float_or_none(track.get('durationMs'), 1000),
+            'thumbnail': thumbnail,
+            'track': track_title,
         }
 
-
-class YandexMusicTrackIE(YandexMusicBaseIE):
-    IE_NAME = 'yandexmusic:track'
-    IE_DESC = 'Яндекс.Музыка - Трек'
-    _VALID_URL = r'https?://music\.yandex\.(?:ru|kz|ua|by)/album/(?P<album_id>\d+)/track/(?P<id>\d+)'
-
-    _TEST = {
-        'url': 'http://music.yandex.ru/album/540508/track/4878838',
-        'md5': 'f496818aa2f60b6c0062980d2e00dc20',
-        'info_dict': {
-            'id': '4878838',
-            'ext': 'mp3',
-            'title': 'Carlo Ambrosio - Gypsy Eyes 1',
-            'filesize': 4628061,
-            'duration': 193.04,
-        }
-    }
+        def extract_artist(artist_list):
+            if artist_list and isinstance(artist_list, list):
+                artists_names = [a['name'] for a in artist_list if a.get('name')]
+                if artists_names:
+                    return ', '.join(artists_names)
+
+        albums = track.get('albums')
+        if albums and isinstance(albums, list):
+            album = albums[0]
+            if isinstance(album, dict):
+                year = album.get('year')
+                track_info.update({
+                    'album': album.get('title'),
+                    'album_artist': extract_artist(album.get('artists')),
+                    'release_year': compat_str(year) if year else None,
+                })
+
+        track_artist = extract_artist(track.get('artists'))
+        if track_artist:
+            track_info.update({
+                'artist': track_artist,
+                'title': '%s - %s' % (track_artist, track_title),
+            })
+        else:
+            track_info['title'] = track_title
+        return track_info
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
@@ -64,7 +119,15 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
         return self._get_track_info(track)
 
 
-class YandexMusicAlbumIE(YandexMusicBaseIE):
+class YandexMusicPlaylistBaseIE(YandexMusicBaseIE):
+    def _build_playlist(self, tracks):
+        return [
+            self.url_result(
+                'http://music.yandex.ru/album/%s/track/%s' % (track['albums'][0]['id'], track['id']))
+            for track in tracks if track.get('albums') and isinstance(track.get('albums'), list)]
+
+
+class YandexMusicAlbumIE(YandexMusicPlaylistBaseIE):
     IE_NAME = 'yandexmusic:album'
     IE_DESC = 'Яндекс.Музыка - Альбом'
     _VALID_URL = r'https?://music\.yandex\.(?:ru|kz|ua|by)/album/(?P<id>\d+)/?(\?|$)'
@@ -85,7 +148,7 @@ class YandexMusicAlbumIE(YandexMusicBaseIE):
             'http://music.yandex.ru/handlers/album.jsx?album=%s' % album_id,
             album_id, 'Downloading album JSON')
 
-        entries = [self._get_track_info(track) for track in album['volumes'][0]]
+        entries = self._build_playlist(album['volumes'][0])
 
         title = '%s - %s' % (album['artists'][0]['name'], album['title'])
         year = album.get('year')
@@ -95,12 +158,12 @@ class YandexMusicAlbumIE(YandexMusicBaseIE):
         return self.playlist_result(entries, compat_str(album['id']), title)
 
 
-class YandexMusicPlaylistIE(YandexMusicBaseIE):
+class YandexMusicPlaylistIE(YandexMusicPlaylistBaseIE):
     IE_NAME = 'yandexmusic:playlist'
     IE_DESC = 'Яндекс.Музыка - Плейлист'
     _VALID_URL = r'https?://music\.yandex\.(?:ru|kz|ua|by)/users/[^/]+/playlists/(?P<id>\d+)'
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://music.yandex.ru/users/music.partners/playlists/1245',
         'info_dict': {
             'id': '1245',
@@ -108,20 +171,54 @@ class YandexMusicPlaylistIE(YandexMusicBaseIE):
             'description': 'md5:3b9f27b0efbe53f2ee1e844d07155cc9',
         },
         'playlist_count': 6,
-    }
+    }, {
+        # playlist exceeding the 150-track limit shipped with the webpage (see
+        # https://github.com/rg3/youtube-dl/issues/6666)
+        'url': 'https://music.yandex.ru/users/ya.playlist/playlists/1036',
+        'info_dict': {
+            'id': '1036',
+            'title': 'Музыка 90-х',
+        },
+        'playlist_count': 310,
+    }]
 
     def _real_extract(self, url):
         playlist_id = self._match_id(url)
 
         webpage = self._download_webpage(url, playlist_id)
 
-        playlist = self._parse_json(
+        mu = self._parse_json(
             self._search_regex(
                 r'var\s+Mu\s*=\s*({.+?});\s*</script>', webpage, 'player'),
-            playlist_id)['pageData']['playlist']
-
-        entries = [self._get_track_info(track) for track in playlist['tracks']]
+            playlist_id)
+
+        playlist = mu['pageData']['playlist']
+        tracks, track_ids = playlist['tracks'], playlist['trackIds']
+
+        # The tracks dictionary shipped with the webpage is limited to 150 tracks;
+        # any missing tracks have to be retrieved manually.
+        if len(tracks) < len(track_ids):
+            present_track_ids = set([compat_str(track['id']) for track in tracks if track.get('id')])
+            missing_track_ids = set(map(compat_str, track_ids)) - set(present_track_ids)
+            request = sanitized_Request(
+                'https://music.yandex.ru/handlers/track-entries.jsx',
+                urlencode_postdata({
+                    'entries': ','.join(missing_track_ids),
+                    'lang': mu.get('settings', {}).get('lang', 'en'),
+                    'external-domain': 'music.yandex.ru',
+                    'overembed': 'false',
+                    'sign': mu.get('authData', {}).get('user', {}).get('sign'),
+                    'strict': 'true',
+                }))
+            request.add_header('Referer', url)
+            request.add_header('X-Requested-With', 'XMLHttpRequest')
+
+            missing_tracks = self._download_json(
+                request, playlist_id, 'Downloading missing tracks JSON', fatal=False)
+            if missing_tracks:
+                tracks.extend(missing_tracks)
 
         return self.playlist_result(
-            entries, compat_str(playlist_id),
+            self._build_playlist(tracks),
+            compat_str(playlist_id),
             playlist['title'], playlist.get('description'))
index 869f3e8190ca0b751366a85f142a0b49fe294fa1..0d943c3432a57570afc229ab4efce66b2118f763 100644 (file)
@@ -9,7 +9,7 @@ from ..compat import compat_urllib_parse_unquote_plus
 
 
 class YnetIE(InfoExtractor):
-    _VALID_URL = r'http://(?:.+?\.)?ynet\.co\.il/(?:.+?/)?0,7340,(?P<id>L(?:-[0-9]+)+),00\.html'
+    _VALID_URL = r'https?://(?:.+?\.)?ynet\.co\.il/(?:.+?/)?0,7340,(?P<id>L(?:-[0-9]+)+),00\.html'
     _TESTS = [
         {
             'url': 'http://hot.ynet.co.il/home/0,7340,L-11659-99244,00.html',
@@ -41,10 +41,12 @@ class YnetIE(InfoExtractor):
         m = re.search(r'ynet - HOT -- (["\']+)(?P<title>.+?)\1', title)
         if m:
             title = m.group('title')
+        formats = self._extract_f4m_formats(f4m_url, video_id)
+        self._sort_formats(formats)
 
         return {
             'id': video_id,
             'title': title,
-            'formats': self._extract_f4m_formats(f4m_url, video_id),
+            'formats': formats,
             'thumbnail': self._og_search_thumbnail(webpage),
         }
index c642075dcfabbfb025d64b92e392d614578f42b1..4150b28daffad5c8cae227c4bf76a125733ced73 100644 (file)
@@ -16,8 +16,8 @@ class YouJizzIE(InfoExtractor):
         'info_dict': {
             'id': '2189178',
             'ext': 'flv',
-            "title": "Zeichentrick 1",
-            "age_limit": 18,
+            'title': 'Zeichentrick 1',
+            'age_limit': 18,
         }
     }
 
index 78caeb8b36e0be8cf4e97365d9e28251723059b7..349ce09414b765060ac5f06121b87b529e287a12 100644 (file)
@@ -2,14 +2,18 @@
 from __future__ import unicode_literals
 
 import base64
+import random
+import string
+import time
 
 from .common import InfoExtractor
-from ..utils import ExtractorError
-
 from ..compat import (
-    compat_urllib_parse,
+    compat_urllib_parse_urlencode,
     compat_ord,
-    compat_urllib_request,
+)
+from ..utils import (
+    ExtractorError,
+    sanitized_Request,
 )
 
 
@@ -24,8 +28,8 @@ class YoukuIE(InfoExtractor):
     '''
 
     _TESTS = [{
+        # MD5 is unstable
         'url': 'http://v.youku.com/v_show/id_XMTc1ODE5Njcy.html',
-        'md5': '5f3af4192eabacc4501508d54a8cabd7',
         'info_dict': {
             'id': 'XMTc1ODE5Njcy_part1',
             'title': '★Smile﹗♡ Git Fresh -Booty Music舞蹈.',
@@ -41,6 +45,7 @@ class YoukuIE(InfoExtractor):
             'title': '武媚娘传奇 85',
         },
         'playlist_count': 11,
+        'skip': 'Available in China only',
     }, {
         'url': 'http://v.youku.com/v_show/id_XMTI1OTczNDM5Mg==.html',
         'info_dict': {
@@ -48,10 +53,28 @@ class YoukuIE(InfoExtractor):
             'title': '花千骨 04',
         },
         'playlist_count': 13,
-        'skip': 'Available in China only',
+    }, {
+        'url': 'http://v.youku.com/v_show/id_XNjA1NzA2Njgw.html',
+        'note': 'Video protected with password',
+        'info_dict': {
+            'id': 'XNjA1NzA2Njgw',
+            'title': '邢義田复旦讲座之想象中的胡人—从“左衽孔子”说起',
+        },
+        'playlist_count': 19,
+        'params': {
+            'videopassword': '100600',
+        },
+    }, {
+        # /play/get.json contains streams with "channel_type":"tail"
+        'url': 'http://v.youku.com/v_show/id_XOTUxMzg4NDMy.html',
+        'info_dict': {
+            'id': 'XOTUxMzg4NDMy',
+            'title': '我的世界☆明月庄主☆车震猎杀☆杀人艺术Minecraft',
+        },
+        'playlist_count': 6,
     }]
 
-    def construct_video_urls(self, data1, data2):
+    def construct_video_urls(self, data):
         # get sid, token
         def yk_t(s1, s2):
             ls = list(range(256))
@@ -69,34 +92,26 @@ class YoukuIE(InfoExtractor):
             return bytes(s)
 
         sid, token = yk_t(
-            b'becaf9be', base64.b64decode(data2['ep'].encode('ascii'))
+            b'becaf9be', base64.b64decode(data['security']['encrypt_string'].encode('ascii'))
         ).decode('ascii').split('_')
 
         # get oip
-        oip = data2['ip']
-
-        # get fileid
-        string_ls = list(
-            'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ/\:._-1234567890')
-        shuffled_string_ls = []
-        seed = data1['seed']
-        N = len(string_ls)
-        for ii in range(N):
-            seed = (seed * 0xd3 + 0x754f) % 0x10000
-            idx = seed * len(string_ls) // 0x10000
-            shuffled_string_ls.append(string_ls[idx])
-            del string_ls[idx]
+        oip = data['security']['ip']
 
         fileid_dict = {}
-        for format in data1['streamtypes']:
-            streamfileid = [
-                int(i) for i in data1['streamfileids'][format].strip('*').split('*')]
-            fileid = ''.join(
-                [shuffled_string_ls[i] for i in streamfileid])
-            fileid_dict[format] = fileid[:8] + '%s' + fileid[10:]
+        for stream in data['stream']:
+            if stream.get('channel_type') == 'tail':
+                continue
+            format = stream.get('stream_type')
+            fileid = stream['stream_fileid']
+            fileid_dict[format] = fileid
 
         def get_fileid(format, n):
-            fileid = fileid_dict[format] % hex(int(n))[2:].upper().zfill(2)
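+            # splice the segment index, as a two-digit uppercase hex number,
+            # into the fileid at offsets 8-9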
+            number = hex(int(str(n), 10))[2:].upper()
+            if len(number) == 1:
+                number = '0' + number
+            streamfileids = fileid_dict[format]
+            fileid = streamfileids[0:8] + number + streamfileids[10:]
             return fileid
 
         # get ep
@@ -111,15 +126,17 @@ class YoukuIE(InfoExtractor):
 
         # generate video_urls
         video_urls_dict = {}
-        for format in data1['streamtypes']:
+        for stream in data['stream']:
+            if stream.get('channel_type') == 'tail':
+                continue
+            format = stream.get('stream_type')
             video_urls = []
-            for dt in data1['segs'][format]:
-                n = str(int(dt['no']))
+            for dt in stream['segs']:
+                n = str(stream['segs'].index(dt))
                 param = {
-                    'K': dt['k'],
+                    'K': dt['key'],
                     'hd': self.get_hd(format),
                     'myp': 0,
-                    'ts': dt['seconds'],
                     'ypp': 0,
                     'ctype': 12,
                     'ev': 1,
@@ -130,34 +147,47 @@ class YoukuIE(InfoExtractor):
                 video_url = \
                     'http://k.youku.com/player/getFlvPath/' + \
                     'sid/' + sid + \
-                    '_' + str(int(n) + 1).zfill(2) + \
+                    '_00' + \
                     '/st/' + self.parse_ext_l(format) + \
                     '/fileid/' + get_fileid(format, n) + '?' + \
-                    compat_urllib_parse.urlencode(param)
+                    compat_urllib_parse_urlencode(param)
                 video_urls.append(video_url)
             video_urls_dict[format] = video_urls
 
         return video_urls_dict
 
+    @staticmethod
+    def get_ysuid():
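+        # __ysuid cookie value: current Unix timestamp followed by three random letters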
+        return '%d%s' % (int(time.time()), ''.join([
+            random.choice(string.ascii_letters) for i in range(3)]))
+
     def get_hd(self, fm):
         hd_id_dict = {
+            '3gp': '0',
+            '3gphd': '1',
             'flv': '0',
+            'flvhd': '0',
             'mp4': '1',
+            'mp4hd': '1',
+            'mp4hd2': '1',
+            'mp4hd3': '1',
             'hd2': '2',
             'hd3': '3',
-            '3gp': '0',
-            '3gphd': '1'
         }
         return hd_id_dict[fm]
 
     def parse_ext_l(self, fm):
         ext_dict = {
+            '3gp': 'flv',
+            '3gphd': 'mp4',
             'flv': 'flv',
+            'flvhd': 'flv',
             'mp4': 'mp4',
+            'mp4hd': 'mp4',
+            'mp4hd2': 'flv',
+            'mp4hd3': 'flv',
             'hd2': 'flv',
             'hd3': 'flv',
-            '3gp': 'flv',
-            '3gphd': 'mp4'
         }
         return ext_dict[fm]
 
@@ -166,49 +196,65 @@ class YoukuIE(InfoExtractor):
             '3gp': 'h6',
             '3gphd': 'h5',
             'flv': 'h4',
+            'flvhd': 'h4',
             'mp4': 'h3',
+            'mp4hd': 'h3',
+            'mp4hd2': 'h4',
+            'mp4hd3': 'h4',
             'hd2': 'h2',
-            'hd3': 'h1'
+            'hd3': 'h1',
         }
         return _dict[fm]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
+        self._set_cookie('youku.com', '__ysuid', self.get_ysuid())
+
         def retrieve_data(req_url, note):
-            req = compat_urllib_request.Request(req_url)
+            headers = {
+                'Referer': req_url,
+            }
+            self._set_cookie('youku.com', 'xreferrer', 'http://www.youku.com')
+            req = sanitized_Request(req_url, headers=headers)
 
             cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
             if cn_verification_proxy:
                 req.add_header('Ytdl-request-proxy', cn_verification_proxy)
 
             raw_data = self._download_json(req, video_id, note=note)
-            return raw_data['data'][0]
+
+            return raw_data['data']
+
+        video_password = self._downloader.params.get('videopassword')
 
         # request basic data
-        data1 = retrieve_data(
-            'http://v.youku.com/player/getPlayList/VideoIDS/%s' % video_id,
-            'Downloading JSON metadata 1')
-        data2 = retrieve_data(
-            'http://v.youku.com/player/getPlayList/VideoIDS/%s/Pf/4/ctype/12/ev/1' % video_id,
-            'Downloading JSON metadata 2')
-
-        error_code = data1.get('error_code')
-        if error_code:
-            error = data1.get('error')
-            if error is not None and '因版权原因无法观看此视频' in error:
+        basic_data_url = 'http://play.youku.com/play/get.json?vid=%s&ct=12' % video_id
+        if video_password:
+            basic_data_url += '&pwd=%s' % video_password
+
+        data = retrieve_data(basic_data_url, 'Downloading JSON metadata')
+
+        error = data.get('error')
+        if error:
+            error_note = error.get('note')
+            if error_note is not None and '因版权原因无法观看此视频' in error_note:
                 raise ExtractorError(
                     'Youku said: Sorry, this video is available in China only', expected=True)
+            elif error_note and '该视频被设为私密' in error_note:
+                raise ExtractorError(
+                    'Youku said: Sorry, this video is private', expected=True)
             else:
-                msg = 'Youku server reported error %i' % error_code
-                if error is not None:
-                    msg += ': ' + error
+                msg = 'Youku server reported error %i' % error.get('code')
+                if error_note is not None:
+                    msg += ': ' + error_note
                 raise ExtractorError(msg)
 
-        title = data1['title']
+        # get video title
+        title = data['video']['title']
 
         # generate video_urls_dict
-        video_urls_dict = self.construct_video_urls(data1, data2)
+        video_urls_dict = self.construct_video_urls(data)
 
         # construct info
         entries = [{
@@ -217,10 +263,13 @@ class YoukuIE(InfoExtractor):
             'formats': [],
             # some formats are not available for all parts, we have to detect
             # which one has all
-        } for i in range(max(len(v) for v in data1['segs'].values()))]
-        for fm in data1['streamtypes']:
+        } for i in range(max(len(v.get('segs')) for v in data['stream']))]
+        for stream in data['stream']:
+            if stream.get('channel_type') == 'tail':
+                continue
+            fm = stream.get('stream_type')
             video_urls = video_urls_dict[fm]
-            for video_url, seg, entry in zip(video_urls, data1['segs'][fm], entries):
+            for video_url, seg, entry in zip(video_urls, stream['segs'], entries):
                 entry['formats'].append({
                     'url': video_url,
                     'format_id': self.get_format_name(fm),
index 4ba7c36db78fb457b63e05fe161a75b00383c78c..1124fe6c280cb0e23bee3a41ea323165ec714dce 100644 (file)
 from __future__ import unicode_literals
 
-
-import json
 import re
-import sys
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse_urlparse,
-    compat_urllib_request,
-)
 from ..utils import (
-    ExtractorError,
+    int_or_none,
+    sanitized_Request,
+    str_to_int,
     unescapeHTML,
     unified_strdate,
 )
-from ..aes import (
-    aes_decrypt_text
-)
+from ..aes import aes_decrypt_text
 
 
 class YouPornIE(InfoExtractor):
-    _VALID_URL = r'^(?P<proto>https?://)(?:www\.)?(?P<url>youporn\.com/watch/(?P<videoid>[0-9]+)/(?P<title>[^/]+))'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?youporn\.com/watch/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
+    _TESTS = [{
         'url': 'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
+        'md5': '71ec5fcfddacf80f495efa8b6a8d9a89',
         'info_dict': {
             'id': '505835',
+            'display_id': 'sex-ed-is-it-safe-to-masturbate-daily',
             'ext': 'mp4',
-            'upload_date': '20101221',
+            'title': 'Sex Ed: Is It Safe To Masturbate Daily?',
             'description': 'Love & Sex Answers: http://bit.ly/DanAndJenn -- Is It Unhealthy To Masturbate Daily?',
+            'thumbnail': 're:^https?://.*\.jpg$',
             'uploader': 'Ask Dan And Jennifer',
-            'title': 'Sex Ed: Is It Safe To Masturbate Daily?',
+            'upload_date': '20101221',
+            'average_rating': int,
+            'view_count': int,
+            'comment_count': int,
+            'categories': list,
+            'tags': list,
             'age_limit': 18,
-        }
-    }
+        },
+    }, {
+        # Anonymous User uploader
+        'url': 'http://www.youporn.com/watch/561726/big-tits-awesome-brunette-on-amazing-webcam-show/?from=related3&al=2&from_id=561726&pos=4',
+        'info_dict': {
+            'id': '561726',
+            'display_id': 'big-tits-awesome-brunette-on-amazing-webcam-show',
+            'ext': 'mp4',
+            'title': 'Big Tits Awesome Brunette On amazing webcam show',
+            'description': 'http://sweetlivegirls.com Big Tits Awesome Brunette On amazing webcam show.mp4',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'uploader': 'Anonymous User',
+            'upload_date': '20111125',
+            'average_rating': int,
+            'view_count': int,
+            'comment_count': int,
+            'categories': list,
+            'tags': list,
+            'age_limit': 18,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('videoid')
-        url = mobj.group('proto') + 'www.' + mobj.group('url')
+        video_id = mobj.group('id')
+        display_id = mobj.group('display_id')
 
-        req = compat_urllib_request.Request(url)
-        req.add_header('Cookie', 'age_verified=1')
-        webpage = self._download_webpage(req, video_id)
-        age_limit = self._rta_search(webpage)
+        request = sanitized_Request(url)
+        request.add_header('Cookie', 'age_verified=1')
+        webpage = self._download_webpage(request, display_id)
+
+        title = self._search_regex(
+            [r'(?:video_titles|videoTitle)\s*[:=]\s*(["\'])(?P<title>.+?)\1',
+             r'<h1[^>]+class=["\']heading\d?["\'][^>]*>(?P<title>[^<]+)<'],
+            webpage, 'title', group='title')
 
-        # Get JSON parameters
-        json_params = self._search_regex(
-            [r'videoJa?son\s*=\s*({.+})',
-             r'var\s+currentVideo\s*=\s*new\s+Video\((.+?)\)[,;]'],
-            webpage, 'JSON parameters')
-        try:
-            params = json.loads(json_params)
-        except ValueError:
-            raise ExtractorError('Invalid JSON')
-
-        self.report_extraction(video_id)
-        try:
-            video_title = params['title']
-            upload_date = unified_strdate(params['release_date_f'])
-            video_description = params['description']
-            video_uploader = params['submitted_by']
-            thumbnail = params['thumbnails'][0]['image']
-        except KeyError:
-            raise ExtractorError('Missing JSON parameter: ' + sys.exc_info()[1])
-
-        # Get all of the links from the page
-        DOWNLOAD_LIST_RE = r'(?s)<ul class="downloadList">(?P<download_list>.*?)</ul>'
-        download_list_html = self._search_regex(DOWNLOAD_LIST_RE,
-                                                webpage, 'download list').strip()
-        LINK_RE = r'<a href="([^"]+)">'
-        links = re.findall(LINK_RE, download_list_html)
-
-        # Get all encrypted links
-        encrypted_links = re.findall(r'var encryptedQuality[0-9]{3}URL = \'([a-zA-Z0-9+/]+={0,2})\';', webpage)
-        for encrypted_link in encrypted_links:
-            link = aes_decrypt_text(encrypted_link, video_title, 32).decode('utf-8')
+        links = []
+
+        sources = self._search_regex(
+            r'(?s)sources\s*:\s*({.+?})', webpage, 'sources', default=None)
+        if sources:
+            for _, link in re.findall(r'[^:]+\s*:\s*(["\'])(http.+?)\1', sources):
+                links.append(link)
+
+        # Fallback #1
+        for _, link in re.findall(
+                r'(?:videoUrl|videoSrc|videoIpadUrl|html5PlayerSrc)\s*[:=]\s*(["\'])(http.+?)\1', webpage):
+            links.append(link)
+
+        # Fallback #2, this also contains extra low quality 180p format
+        for _, link in re.findall(r'<a[^>]+href=(["\'])(http.+?)\1[^>]+title=["\']Download [Vv]ideo', webpage):
             links.append(link)
 
+        # Fallback #3, encrypted links
+        for _, encrypted_link in re.findall(
+                r'encryptedQuality\d{3,4}URL\s*=\s*(["\'])([\da-zA-Z+/=]+)\1', webpage):
+            links.append(aes_decrypt_text(encrypted_link, title, 32).decode('utf-8'))
+
         formats = []
-        for link in links:
-            # A link looks like this:
-            # http://cdn1.download.youporn.phncdn.com/201210/31/8004515/480p_370k_8004515/YouPorn%20-%20Nubile%20Films%20The%20Pillow%20Fight.mp4?nvb=20121113051249&nva=20121114051249&ir=1200&sr=1200&hash=014b882080310e95fb6a0
-            # A path looks like this:
-            # /201210/31/8004515/480p_370k_8004515/YouPorn%20-%20Nubile%20Films%20The%20Pillow%20Fight.mp4
-            video_url = unescapeHTML(link)
-            path = compat_urllib_parse_urlparse(video_url).path
-            format_parts = path.split('/')[4].split('_')[:2]
-
-            dn = compat_urllib_parse_urlparse(video_url).netloc.partition('.')[0]
-
-            resolution = format_parts[0]
-            height = int(resolution[:-len('p')])
-            bitrate = int(format_parts[1][:-len('k')])
-            format = '-'.join(format_parts) + '-' + dn
-
-            formats.append({
+        for video_url in set(unescapeHTML(link) for link in links):
+            f = {
                 'url': video_url,
-                'format': format,
-                'format_id': format,
-                'height': height,
-                'tbr': bitrate,
-                'resolution': resolution,
-            })
-
+            }
+            # Video URL's path looks like this:
+            #  /201012/17/505835/720p_1500k_505835/YouPorn%20-%20Sex%20Ed%20Is%20It%20Safe%20To%20Masturbate%20Daily.mp4
+            #  /201012/17/505835/vl_240p_240k_505835/YouPorn%20-%20Sex%20Ed%20Is%20It%20Safe%20To%20Masturbate%20Daily.mp4
+            # We can extract some useful metadata from it
+            mobj = re.search(r'(?P<height>\d{3,4})[pP]_(?P<bitrate>\d+)[kK]_\d+/', video_url)
+            if mobj:
+                height = int(mobj.group('height'))
+                bitrate = int(mobj.group('bitrate'))
+                f.update({
+                    'format_id': '%dp-%dk' % (height, bitrate),
+                    'height': height,
+                    'tbr': bitrate,
+                })
+            formats.append(f)
         self._sort_formats(formats)
 
-        if not formats:
-            raise ExtractorError('ERROR: no known formats available for video')
+        description = self._og_search_description(webpage, default=None)
+        thumbnail = self._search_regex(
+            r'(?:imageurl\s*=|poster\s*:)\s*(["\'])(?P<thumbnail>.+?)\1',
+            webpage, 'thumbnail', fatal=False, group='thumbnail')
+
+        uploader = self._html_search_regex(
+            r'(?s)<div[^>]+class=["\']videoInfoBy(?:\s+[^"\']+)?["\'][^>]*>\s*By:\s*</div>(.+?)</(?:a|div)>',
+            webpage, 'uploader', fatal=False)
+        upload_date = unified_strdate(self._html_search_regex(
+            r'(?s)<div[^>]+class=["\']videoInfoTime["\'][^>]*>(.+?)</div>',
+            webpage, 'upload date', fatal=False))
+
+        age_limit = self._rta_search(webpage)
+
+        average_rating = int_or_none(self._search_regex(
+            r'<div[^>]+class=["\']videoInfoRating["\'][^>]*>\s*<div[^>]+class=["\']videoRatingPercentage["\'][^>]*>(\d+)%</div>',
+            webpage, 'average rating', fatal=False))
+
+        view_count = str_to_int(self._search_regex(
+            r'(?s)<div[^>]+class=["\']videoInfoViews["\'][^>]*>.*?([\d,.]+)\s*</div>',
+            webpage, 'view count', fatal=False))
+        comment_count = str_to_int(self._search_regex(
+            r'>All [Cc]omments? \(([\d,.]+)\)',
+            webpage, 'comment count', fatal=False))
+
+        def extract_tag_box(title):
+            tag_box = self._search_regex(
+                (r'<div[^>]+class=["\']tagBoxTitle["\'][^>]*>\s*%s\b.*?</div>\s*'
+                 '<div[^>]+class=["\']tagBoxContent["\']>(.+?)</div>') % re.escape(title),
+                webpage, '%s tag box' % title, default=None)
+            if not tag_box:
+                return []
+            return re.findall(r'<a[^>]+href=[^>]+>([^<]+)', tag_box)
+
+        categories = extract_tag_box('Category')
+        tags = extract_tag_box('Tags')
 
         return {
             'id': video_id,
-            'uploader': video_uploader,
-            'upload_date': upload_date,
-            'title': video_title,
+            'display_id': display_id,
+            'title': title,
+            'description': description,
             'thumbnail': thumbnail,
-            'description': video_description,
+            'uploader': uploader,
+            'upload_date': upload_date,
+            'average_rating': average_rating,
+            'view_count': view_count,
+            'comment_count': comment_count,
+            'categories': categories,
+            'tags': tags,
             'age_limit': age_limit,
             'formats': formats,
         }
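
For reference, the height/bitrate regex in the formats loop recovers its metadata straight from the CDN path quoted in the code comment. A self-contained check using that sample path (filename shortened; nothing is assumed beyond the regex itself):

import re

# Sample path taken from the comment in the extractor above.
path = '/201012/17/505835/720p_1500k_505835/YouPorn%20-%20Sex%20Ed.mp4'
mobj = re.search(r'(?P<height>\d{3,4})[pP]_(?P<bitrate>\d+)[kK]_\d+/', path)
assert (int(mobj.group('height')), int(mobj.group('bitrate'))) == (720, 1500)

Similarly, str_to_int strips ',' and '.' separators before converting, which is why the view and comment count regexes can capture '1,234'-style strings directly.
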
youtube_dl/extractor/youtube.py
index 67a1df9a0a1bebeaea4577411ff0c65f99d0166f..44f98d294909a75f44f9c01e3a2ce0e7c66d86b5 100644 (file)
@@ -6,6 +6,7 @@ from __future__ import unicode_literals
 import itertools
 import json
 import os.path
+import random
 import re
 import time
 import traceback
@@ -16,29 +17,34 @@ from ..swfinterp import SWFInterpreter
 from ..compat import (
     compat_chr,
     compat_parse_qs,
-    compat_urllib_parse,
     compat_urllib_parse_unquote,
     compat_urllib_parse_unquote_plus,
+    compat_urllib_parse_urlencode,
     compat_urllib_parse_urlparse,
-    compat_urllib_request,
     compat_urlparse,
     compat_str,
 )
 from ..utils import (
     clean_html,
+    error_to_compat_str,
     ExtractorError,
     float_or_none,
     get_element_by_attribute,
     get_element_by_id,
     int_or_none,
+    mimetype2ext,
     orderedSet,
     parse_duration,
+    remove_quotes,
+    remove_start,
+    sanitized_Request,
     smuggle_url,
     str_to_int,
     unescapeHTML,
     unified_strdate,
     unsmuggle_url,
     uppercase_escape,
+    urlencode_postdata,
     ISO3166Utils,
 )
 
@@ -46,7 +52,7 @@ from ..utils import (
 class YoutubeBaseInfoExtractor(InfoExtractor):
     """Provide base functions for Youtube extractors"""
     _LOGIN_URL = 'https://accounts.google.com/ServiceLogin'
-    _TWOFACTOR_URL = 'https://accounts.google.com/SecondFactor'
+    _TWOFACTOR_URL = 'https://accounts.google.com/signin/challenge'
     _NETRC_MACHINE = 'youtube'
     # If True it will raise an error if no login info is provided
     _LOGIN_REQUIRED = False
@@ -110,62 +116,48 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
             'hl': 'en_US',
         }
 
-        # Convert to UTF-8 *before* urlencode because Python 2.x's urlencode
-        # chokes on unicode
-        login_form = dict((k.encode('utf-8'), v.encode('utf-8')) for k, v in login_form_strs.items())
-        login_data = compat_urllib_parse.urlencode(login_form).encode('ascii')
+        login_data = urlencode_postdata(login_form_strs)
 
-        req = compat_urllib_request.Request(self._LOGIN_URL, login_data)
+        req = sanitized_Request(self._LOGIN_URL, login_data)
         login_results = self._download_webpage(
             req, None,
             note='Logging in', errnote='unable to log in', fatal=False)
         if login_results is False:
             return False
 
+        error_msg = self._html_search_regex(
+            r'<[^>]+id="errormsg_0_Passwd"[^>]*>([^<]+)<',
+            login_results, 'error message', default=None)
+        if error_msg:
+            raise ExtractorError('Unable to login: %s' % error_msg, expected=True)
+
         if re.search(r'id="errormsg_0_Passwd"', login_results) is not None:
             raise ExtractorError('Please use your account password and a two-factor code instead of an application-specific password.', expected=True)
 
         # Two-Factor
         # TODO add SMS and phone call support - these require making a request and then prompting the user
 
-        if re.search(r'(?i)<form[^>]* id="gaia_secondfactorform"', login_results) is not None:
-            tfa_code = self._get_tfa_info()
+        if re.search(r'(?i)<form[^>]* id="challenge"', login_results) is not None:
+            tfa_code = self._get_tfa_info('2-step verification code')
 
-            if tfa_code is None:
-                self._downloader.report_warning('Two-factor authentication required. Provide it with --twofactor <code>')
-                self._downloader.report_warning('(Note that only TOTP (Google Authenticator App) codes work at this time.)')
+            if not tfa_code:
+                self._downloader.report_warning(
+                    'Two-factor authentication required. Provide it either interactively or with --twofactor <code>'
+                    '(Note that only TOTP (Google Authenticator App) codes work at this time.)')
                 return False
 
-            # Unlike the first login form, secTok and timeStmp are both required for the TFA form
-
-            match = re.search(r'id="secTok"\n\s+value=\'(.+)\'/>', login_results, re.M | re.U)
-            if match is None:
-                self._downloader.report_warning('Failed to get secTok - did the page structure change?')
-            secTok = match.group(1)
-            match = re.search(r'id="timeStmp"\n\s+value=\'(.+)\'/>', login_results, re.M | re.U)
-            if match is None:
-                self._downloader.report_warning('Failed to get timeStmp - did the page structure change?')
-            timeStmp = match.group(1)
-
-            tfa_form_strs = {
-                'continue': 'https://www.youtube.com/signin?action_handle_signin=true&feature=sign_in_button&hl=en_US&nomobiletemp=1',
-                'smsToken': '',
-                'smsUserPin': tfa_code,
-                'smsVerifyPin': 'Verify',
-
-                'PersistentCookie': 'yes',
-                'checkConnection': '',
-                'checkedDomains': 'youtube',
-                'pstMsg': '1',
-                'secTok': secTok,
-                'timeStmp': timeStmp,
-                'service': 'youtube',
-                'hl': 'en_US',
-            }
-            tfa_form = dict((k.encode('utf-8'), v.encode('utf-8')) for k, v in tfa_form_strs.items())
-            tfa_data = compat_urllib_parse.urlencode(tfa_form).encode('ascii')
+            tfa_code = remove_start(tfa_code, 'G-')
+
+            tfa_form_strs = self._form_hidden_inputs('challenge', login_results)
+
+            tfa_form_strs.update({
+                'Pin': tfa_code,
+                'TrustDevice': 'on',
+            })
+
+            tfa_data = urlencode_postdata(tfa_form_strs)
 
-            tfa_req = compat_urllib_request.Request(self._TWOFACTOR_URL, tfa_data)
+            tfa_req = sanitized_Request(self._TWOFACTOR_URL, tfa_data)
             tfa_results = self._download_webpage(
                 tfa_req, None,
                 note='Submitting TFA code', errnote='unable to submit tfa', fatal=False)
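
The reworked TFA step simply re-posts the challenge form's hidden inputs with the code filled in, instead of hand-scraping secTok and timeStmp. A rough stdlib-only sketch of that pattern; the HTML fragment and field names are invented for illustration, and urlencode(...).encode('ascii') approximates what urlencode_postdata does:

import re
from urllib.parse import urlencode

# Hypothetical challenge page; real pages carry server-generated hidden inputs.
login_results = '''
<form id="challenge" action="/signin/challenge">
  <input type="hidden" name="challengeId" value="42"/>
  <input type="hidden" name="gxf" value="token123"/>
</form>
'''

# Collect hidden inputs, roughly what _form_hidden_inputs('challenge', ...) does.
tfa_form = dict(re.findall(
    r'<input[^>]+name="([^"]+)"[^>]+value="([^"]*)"', login_results))
tfa_form.update({'Pin': '123456', 'TrustDevice': 'on'})  # user-supplied TOTP code

tfa_data = urlencode(tfa_form).encode('ascii')  # ~ urlencode_postdata(tfa_form)
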
@@ -173,8 +165,8 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
             if tfa_results is False:
                 return False
 
-            if re.search(r'(?i)<form[^>]* id="gaia_secondfactorform"', tfa_results) is not None:
-                self._downloader.report_warning('Two-factor code expired. Please try again, or use a one-use backup code instead.')
+            if re.search(r'(?i)<form[^>]* id="challenge"', tfa_results) is not None:
+                self._downloader.report_warning('Two-factor code expired or invalid. Please try again, or use a one-use backup code instead.')
                 return False
             if re.search(r'(?i)<form[^>]* id="gaia_loginform"', tfa_results) is not None:
                 self._downloader.report_warning('unable to log in - did the page structure change?')
@@ -196,6 +188,71 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
             return
 
 
+class YoutubeEntryListBaseInfoExtractor(YoutubeBaseInfoExtractor):
+    # Extract entries from page with "Load more" button
+    def _entries(self, page, playlist_id):
+        more_widget_html = content_html = page
+        for page_num in itertools.count(1):
+            for entry in self._process_page(content_html):
+                yield entry
+
+            mobj = re.search(r'data-uix-load-more-href="/?(?P<more>[^"]+)"', more_widget_html)
+            if not mobj:
+                break
+
+            more = self._download_json(
+                'https://youtube.com/%s' % mobj.group('more'), playlist_id,
+                'Downloading page #%s' % page_num,
+                transform_source=uppercase_escape)
+            content_html = more['content_html']
+            if not content_html.strip():
+                # Some webpages show a "Load more" button but they don't
+                # have more videos
+                break
+            more_widget_html = more['load_more_widget_html']
+
+
+class YoutubePlaylistBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor):
+    def _process_page(self, content):
+        for video_id, video_title in self.extract_videos_from_page(content):
+            yield self.url_result(video_id, 'Youtube', video_id, video_title)
+
+    def extract_videos_from_page(self, page):
+        ids_in_page = []
+        titles_in_page = []
+        for mobj in re.finditer(self._VIDEO_RE, page):
+            # The link with index 0 is not the first video of the playlist (not sure if this is still the case)
+            if 'index' in mobj.groupdict() and mobj.group('id') == '0':
+                continue
+            video_id = mobj.group('id')
+            video_title = unescapeHTML(mobj.group('title'))
+            if video_title:
+                video_title = video_title.strip()
+            try:
+                idx = ids_in_page.index(video_id)
+                if video_title and not titles_in_page[idx]:
+                    titles_in_page[idx] = video_title
+            except ValueError:
+                ids_in_page.append(video_id)
+                titles_in_page.append(video_title)
+        return zip(ids_in_page, titles_in_page)
+
+
+class YoutubePlaylistsBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor):
+    def _process_page(self, content):
+        for playlist_id in orderedSet(re.findall(
+                r'<h3[^>]+class="[^"]*yt-lockup-title[^"]*"[^>]*><a[^>]+href="/?playlist\?list=([0-9A-Za-z-_]{10,})"',
+                content)):
+            yield self.url_result(
+                'https://www.youtube.com/playlist?list=%s' % playlist_id, 'YoutubePlaylist')
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+        webpage = self._download_webpage(url, playlist_id)
+        title = self._og_search_title(webpage, fatal=False)
+        return self.playlist_result(self._entries(webpage, playlist_id), playlist_id, title)
+
+
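
The new YoutubeEntryListBaseInfoExtractor centralizes the "Load more" pagination used by the playlist and channel extractors. A condensed sketch of that loop, with a stubbed fetch_json standing in for _download_json:

import itertools
import re

def fetch_json(path):
    # Stub for _download_json('https://youtube.com/%s' % path, ...);
    # a real response carries 'content_html' and 'load_more_widget_html'.
    return {'content_html': '', 'load_more_widget_html': ''}

def entries(page, process_page):
    more_widget_html = content_html = page
    for page_num in itertools.count(1):
        for entry in process_page(content_html):
            yield entry
        mobj = re.search(
            r'data-uix-load-more-href="/?(?P<more>[^"]+)"', more_widget_html)
        if not mobj:
            break
        more = fetch_json(mobj.group('more'))
        content_html = more['content_html']
        if not content_html.strip():
            break  # "Load more" button shown, but no more videos
        more_widget_html = more['load_more_widget_html']

# A page without a load-more link yields its own entries, then stops:
print(list(entries('page without load-more', lambda html: ['entry'])))
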
 class YoutubeIE(YoutubeBaseInfoExtractor):
     IE_DESC = 'YouTube.com'
     _VALID_URL = r"""(?x)^
@@ -213,11 +270,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                              |(?:                                             # or the v= param in all its forms
                                  (?:(?:watch|movie)(?:_popup)?(?:\.php)?/?)?  # preceding watch(_popup|.php) or nothing (like /?v=xxxx)
                                  (?:\?|\#!?)                                  # the params delimiter ? or # or #!
-                                 (?:.*?&)?                                    # any other preceding param (like /?s=tuff&v=xxxx)
+                                 (?:.*?[&;])??                                # any other preceding param (like /?s=tuff&v=xxxx or ?s=tuff&amp;v=V36LpHqtcDY)
                                  v=
                              )
                          ))
-                         |youtu\.be/                                          # just youtu.be/xxxx
+                         |(?:
+                            youtu\.be|                                        # just youtu.be/xxxx
+                            vid\.plus|                                        # or vid.plus/xxxx
+                            zwearz\.com/watch|                                # or zwearz.com/watch/xxxx
+                         )/
                          |(?:www\.)?cleanvideosearch\.com/media/action/yt/watch\?videoId=
                          )
                      )?                                                       # all until now is optional -> you can pass the naked ID
@@ -227,108 +288,114 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                      $"""
     _NEXT_URL_RE = r'[\?&]next_url=([^&]+)'
     _formats = {
-        '5': {'ext': 'flv', 'width': 400, 'height': 240},
-        '6': {'ext': 'flv', 'width': 450, 'height': 270},
-        '13': {'ext': '3gp'},
-        '17': {'ext': '3gp', 'width': 176, 'height': 144},
-        '18': {'ext': 'mp4', 'width': 640, 'height': 360},
-        '22': {'ext': 'mp4', 'width': 1280, 'height': 720},
-        '34': {'ext': 'flv', 'width': 640, 'height': 360},
-        '35': {'ext': 'flv', 'width': 854, 'height': 480},
-        '36': {'ext': '3gp', 'width': 320, 'height': 240},
-        '37': {'ext': 'mp4', 'width': 1920, 'height': 1080},
-        '38': {'ext': 'mp4', 'width': 4096, 'height': 3072},
-        '43': {'ext': 'webm', 'width': 640, 'height': 360},
-        '44': {'ext': 'webm', 'width': 854, 'height': 480},
-        '45': {'ext': 'webm', 'width': 1280, 'height': 720},
-        '46': {'ext': 'webm', 'width': 1920, 'height': 1080},
-        '59': {'ext': 'mp4', 'width': 854, 'height': 480},
-        '78': {'ext': 'mp4', 'width': 854, 'height': 480},
-
-
-        # 3d videos
-        '82': {'ext': 'mp4', 'height': 360, 'format_note': '3D', 'preference': -20},
-        '83': {'ext': 'mp4', 'height': 480, 'format_note': '3D', 'preference': -20},
-        '84': {'ext': 'mp4', 'height': 720, 'format_note': '3D', 'preference': -20},
-        '85': {'ext': 'mp4', 'height': 1080, 'format_note': '3D', 'preference': -20},
-        '100': {'ext': 'webm', 'height': 360, 'format_note': '3D', 'preference': -20},
-        '101': {'ext': 'webm', 'height': 480, 'format_note': '3D', 'preference': -20},
-        '102': {'ext': 'webm', 'height': 720, 'format_note': '3D', 'preference': -20},
+        '5': {'ext': 'flv', 'width': 400, 'height': 240, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'},
+        '6': {'ext': 'flv', 'width': 450, 'height': 270, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'},
+        '13': {'ext': '3gp', 'acodec': 'aac', 'vcodec': 'mp4v'},
+        '17': {'ext': '3gp', 'width': 176, 'height': 144, 'acodec': 'aac', 'abr': 24, 'vcodec': 'mp4v'},
+        '18': {'ext': 'mp4', 'width': 640, 'height': 360, 'acodec': 'aac', 'abr': 96, 'vcodec': 'h264'},
+        '22': {'ext': 'mp4', 'width': 1280, 'height': 720, 'acodec': 'aac', 'abr': 192, 'vcodec': 'h264'},
+        '34': {'ext': 'flv', 'width': 640, 'height': 360, 'acodec': 'aac', 'abr': 128, 'vcodec': 'h264'},
+        '35': {'ext': 'flv', 'width': 854, 'height': 480, 'acodec': 'aac', 'abr': 128, 'vcodec': 'h264'},
+        # itag 36 videos are either 320x180 (BaW_jenozKc) or 320x240 (__2ABJjxzNo), abr varies as well
+        '36': {'ext': '3gp', 'width': 320, 'acodec': 'aac', 'vcodec': 'mp4v'},
+        '37': {'ext': 'mp4', 'width': 1920, 'height': 1080, 'acodec': 'aac', 'abr': 192, 'vcodec': 'h264'},
+        '38': {'ext': 'mp4', 'width': 4096, 'height': 3072, 'acodec': 'aac', 'abr': 192, 'vcodec': 'h264'},
+        '43': {'ext': 'webm', 'width': 640, 'height': 360, 'acodec': 'vorbis', 'abr': 128, 'vcodec': 'vp8'},
+        '44': {'ext': 'webm', 'width': 854, 'height': 480, 'acodec': 'vorbis', 'abr': 128, 'vcodec': 'vp8'},
+        '45': {'ext': 'webm', 'width': 1280, 'height': 720, 'acodec': 'vorbis', 'abr': 192, 'vcodec': 'vp8'},
+        '46': {'ext': 'webm', 'width': 1920, 'height': 1080, 'acodec': 'vorbis', 'abr': 192, 'vcodec': 'vp8'},
+        '59': {'ext': 'mp4', 'width': 854, 'height': 480, 'acodec': 'aac', 'abr': 128, 'vcodec': 'h264'},
+        '78': {'ext': 'mp4', 'width': 854, 'height': 480, 'acodec': 'aac', 'abr': 128, 'vcodec': 'h264'},
+
+
+        # 3D videos
+        '82': {'ext': 'mp4', 'height': 360, 'format_note': '3D', 'acodec': 'aac', 'abr': 128, 'vcodec': 'h264', 'preference': -20},
+        '83': {'ext': 'mp4', 'height': 480, 'format_note': '3D', 'acodec': 'aac', 'abr': 128, 'vcodec': 'h264', 'preference': -20},
+        '84': {'ext': 'mp4', 'height': 720, 'format_note': '3D', 'acodec': 'aac', 'abr': 192, 'vcodec': 'h264', 'preference': -20},
+        '85': {'ext': 'mp4', 'height': 1080, 'format_note': '3D', 'acodec': 'aac', 'abr': 192, 'vcodec': 'h264', 'preference': -20},
+        '100': {'ext': 'webm', 'height': 360, 'format_note': '3D', 'acodec': 'vorbis', 'abr': 128, 'vcodec': 'vp8', 'preference': -20},
+        '101': {'ext': 'webm', 'height': 480, 'format_note': '3D', 'acodec': 'vorbis', 'abr': 192, 'vcodec': 'vp8', 'preference': -20},
+        '102': {'ext': 'webm', 'height': 720, 'format_note': '3D', 'acodec': 'vorbis', 'abr': 192, 'vcodec': 'vp8', 'preference': -20},
 
         # Apple HTTP Live Streaming
-        '92': {'ext': 'mp4', 'height': 240, 'format_note': 'HLS', 'preference': -10},
-        '93': {'ext': 'mp4', 'height': 360, 'format_note': 'HLS', 'preference': -10},
-        '94': {'ext': 'mp4', 'height': 480, 'format_note': 'HLS', 'preference': -10},
-        '95': {'ext': 'mp4', 'height': 720, 'format_note': 'HLS', 'preference': -10},
-        '96': {'ext': 'mp4', 'height': 1080, 'format_note': 'HLS', 'preference': -10},
-        '132': {'ext': 'mp4', 'height': 240, 'format_note': 'HLS', 'preference': -10},
-        '151': {'ext': 'mp4', 'height': 72, 'format_note': 'HLS', 'preference': -10},
+        '91': {'ext': 'mp4', 'height': 144, 'format_note': 'HLS', 'acodec': 'aac', 'abr': 48, 'vcodec': 'h264', 'preference': -10},
+        '92': {'ext': 'mp4', 'height': 240, 'format_note': 'HLS', 'acodec': 'aac', 'abr': 48, 'vcodec': 'h264', 'preference': -10},
+        '93': {'ext': 'mp4', 'height': 360, 'format_note': 'HLS', 'acodec': 'aac', 'abr': 128, 'vcodec': 'h264', 'preference': -10},
+        '94': {'ext': 'mp4', 'height': 480, 'format_note': 'HLS', 'acodec': 'aac', 'abr': 128, 'vcodec': 'h264', 'preference': -10},
+        '95': {'ext': 'mp4', 'height': 720, 'format_note': 'HLS', 'acodec': 'aac', 'abr': 256, 'vcodec': 'h264', 'preference': -10},
+        '96': {'ext': 'mp4', 'height': 1080, 'format_note': 'HLS', 'acodec': 'aac', 'abr': 256, 'vcodec': 'h264', 'preference': -10},
+        '132': {'ext': 'mp4', 'height': 240, 'format_note': 'HLS', 'acodec': 'aac', 'abr': 48, 'vcodec': 'h264', 'preference': -10},
+        '151': {'ext': 'mp4', 'height': 72, 'format_note': 'HLS', 'acodec': 'aac', 'abr': 24, 'vcodec': 'h264', 'preference': -10},
 
         # DASH mp4 video
-        '133': {'ext': 'mp4', 'height': 240, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
-        '134': {'ext': 'mp4', 'height': 360, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
-        '135': {'ext': 'mp4', 'height': 480, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
-        '136': {'ext': 'mp4', 'height': 720, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
-        '137': {'ext': 'mp4', 'height': 1080, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
-        '138': {'ext': 'mp4', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},  # Height can vary (https://github.com/rg3/youtube-dl/issues/4559)
-        '160': {'ext': 'mp4', 'height': 144, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
-        '264': {'ext': 'mp4', 'height': 1440, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
-        '298': {'ext': 'mp4', 'height': 720, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40, 'fps': 60, 'vcodec': 'h264'},
-        '299': {'ext': 'mp4', 'height': 1080, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40, 'fps': 60, 'vcodec': 'h264'},
-        '266': {'ext': 'mp4', 'height': 2160, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40, 'vcodec': 'h264'},
+        '133': {'ext': 'mp4', 'height': 240, 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},
+        '134': {'ext': 'mp4', 'height': 360, 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},
+        '135': {'ext': 'mp4', 'height': 480, 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},
+        '136': {'ext': 'mp4', 'height': 720, 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},
+        '137': {'ext': 'mp4', 'height': 1080, 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},
+        '138': {'ext': 'mp4', 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},  # Height can vary (https://github.com/rg3/youtube-dl/issues/4559)
+        '160': {'ext': 'mp4', 'height': 144, 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},
+        '264': {'ext': 'mp4', 'height': 1440, 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},
+        '298': {'ext': 'mp4', 'height': 720, 'format_note': 'DASH video', 'vcodec': 'h264', 'fps': 60, 'preference': -40},
+        '299': {'ext': 'mp4', 'height': 1080, 'format_note': 'DASH video', 'vcodec': 'h264', 'fps': 60, 'preference': -40},
+        '266': {'ext': 'mp4', 'height': 2160, 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},
 
         # Dash mp4 audio
-        '139': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'vcodec': 'none', 'abr': 48, 'preference': -50, 'container': 'm4a_dash'},
-        '140': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'vcodec': 'none', 'abr': 128, 'preference': -50, 'container': 'm4a_dash'},
-        '141': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'vcodec': 'none', 'abr': 256, 'preference': -50, 'container': 'm4a_dash'},
+        '139': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'abr': 48, 'preference': -50, 'container': 'm4a_dash'},
+        '140': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'abr': 128, 'preference': -50, 'container': 'm4a_dash'},
+        '141': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'abr': 256, 'preference': -50, 'container': 'm4a_dash'},
 
         # Dash webm
-        '167': {'ext': 'webm', 'height': 360, 'width': 640, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'vp8', 'preference': -40},
-        '168': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'vp8', 'preference': -40},
-        '169': {'ext': 'webm', 'height': 720, 'width': 1280, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'vp8', 'preference': -40},
-        '170': {'ext': 'webm', 'height': 1080, 'width': 1920, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'vp8', 'preference': -40},
-        '218': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'vp8', 'preference': -40},
-        '219': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'vp8', 'preference': -40},
-        '278': {'ext': 'webm', 'height': 144, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40, 'container': 'webm', 'vcodec': 'vp9'},
-        '242': {'ext': 'webm', 'height': 240, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
-        '243': {'ext': 'webm', 'height': 360, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
-        '244': {'ext': 'webm', 'height': 480, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
-        '245': {'ext': 'webm', 'height': 480, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
-        '246': {'ext': 'webm', 'height': 480, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
-        '247': {'ext': 'webm', 'height': 720, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
-        '248': {'ext': 'webm', 'height': 1080, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
-        '271': {'ext': 'webm', 'height': 1440, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
-        '272': {'ext': 'webm', 'height': 2160, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
-        '302': {'ext': 'webm', 'height': 720, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40, 'fps': 60, 'vcodec': 'vp9'},
-        '303': {'ext': 'webm', 'height': 1080, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40, 'fps': 60, 'vcodec': 'vp9'},
-        '308': {'ext': 'webm', 'height': 1440, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40, 'fps': 60, 'vcodec': 'vp9'},
-        '313': {'ext': 'webm', 'height': 2160, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40, 'vcodec': 'vp9'},
-        '315': {'ext': 'webm', 'height': 2160, 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40, 'fps': 60, 'vcodec': 'vp9'},
+        '167': {'ext': 'webm', 'height': 360, 'width': 640, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'vp8', 'preference': -40},
+        '168': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'vp8', 'preference': -40},
+        '169': {'ext': 'webm', 'height': 720, 'width': 1280, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'vp8', 'preference': -40},
+        '170': {'ext': 'webm', 'height': 1080, 'width': 1920, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'vp8', 'preference': -40},
+        '218': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'vp8', 'preference': -40},
+        '219': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'vp8', 'preference': -40},
+        '278': {'ext': 'webm', 'height': 144, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'vp9', 'preference': -40},
+        '242': {'ext': 'webm', 'height': 240, 'format_note': 'DASH video', 'vcodec': 'vp9', 'preference': -40},
+        '243': {'ext': 'webm', 'height': 360, 'format_note': 'DASH video', 'vcodec': 'vp9', 'preference': -40},
+        '244': {'ext': 'webm', 'height': 480, 'format_note': 'DASH video', 'vcodec': 'vp9', 'preference': -40},
+        '245': {'ext': 'webm', 'height': 480, 'format_note': 'DASH video', 'vcodec': 'vp9', 'preference': -40},
+        '246': {'ext': 'webm', 'height': 480, 'format_note': 'DASH video', 'vcodec': 'vp9', 'preference': -40},
+        '247': {'ext': 'webm', 'height': 720, 'format_note': 'DASH video', 'vcodec': 'vp9', 'preference': -40},
+        '248': {'ext': 'webm', 'height': 1080, 'format_note': 'DASH video', 'vcodec': 'vp9', 'preference': -40},
+        '271': {'ext': 'webm', 'height': 1440, 'format_note': 'DASH video', 'vcodec': 'vp9', 'preference': -40},
+        # itag 272 videos are either 3840x2160 (e.g. RtoitU2A-3E) or 7680x4320 (sLprVF6d7Ug)
+        '272': {'ext': 'webm', 'height': 2160, 'format_note': 'DASH video', 'vcodec': 'vp9', 'preference': -40},
+        '302': {'ext': 'webm', 'height': 720, 'format_note': 'DASH video', 'vcodec': 'vp9', 'fps': 60, 'preference': -40},
+        '303': {'ext': 'webm', 'height': 1080, 'format_note': 'DASH video', 'vcodec': 'vp9', 'fps': 60, 'preference': -40},
+        '308': {'ext': 'webm', 'height': 1440, 'format_note': 'DASH video', 'vcodec': 'vp9', 'fps': 60, 'preference': -40},
+        '313': {'ext': 'webm', 'height': 2160, 'format_note': 'DASH video', 'vcodec': 'vp9', 'preference': -40},
+        '315': {'ext': 'webm', 'height': 2160, 'format_note': 'DASH video', 'vcodec': 'vp9', 'fps': 60, 'preference': -40},
 
         # Dash webm audio
-        '171': {'ext': 'webm', 'vcodec': 'none', 'format_note': 'DASH audio', 'abr': 128, 'preference': -50},
-        '172': {'ext': 'webm', 'vcodec': 'none', 'format_note': 'DASH audio', 'abr': 256, 'preference': -50},
+        '171': {'ext': 'webm', 'acodec': 'vorbis', 'format_note': 'DASH audio', 'abr': 128, 'preference': -50},
+        '172': {'ext': 'webm', 'acodec': 'vorbis', 'format_note': 'DASH audio', 'abr': 256, 'preference': -50},
 
         # Dash webm audio with opus inside
-        '249': {'ext': 'webm', 'vcodec': 'none', 'format_note': 'DASH audio', 'acodec': 'opus', 'abr': 50, 'preference': -50},
-        '250': {'ext': 'webm', 'vcodec': 'none', 'format_note': 'DASH audio', 'acodec': 'opus', 'abr': 70, 'preference': -50},
-        '251': {'ext': 'webm', 'vcodec': 'none', 'format_note': 'DASH audio', 'acodec': 'opus', 'abr': 160, 'preference': -50},
+        '249': {'ext': 'webm', 'format_note': 'DASH audio', 'acodec': 'opus', 'abr': 50, 'preference': -50},
+        '250': {'ext': 'webm', 'format_note': 'DASH audio', 'acodec': 'opus', 'abr': 70, 'preference': -50},
+        '251': {'ext': 'webm', 'format_note': 'DASH audio', 'acodec': 'opus', 'abr': 160, 'preference': -50},
 
         # RTMP (unnamed)
         '_rtmp': {'protocol': 'rtmp'},
     }
+    _SUBTITLE_FORMATS = ('ttml', 'vtt')
 
     IE_NAME = 'youtube'
     _TESTS = [
         {
-            'url': 'http://www.youtube.com/watch?v=BaW_jenozKcj&t=1s&end=9',
+            'url': 'http://www.youtube.com/watch?v=BaW_jenozKc&t=1s&end=9',
             'info_dict': {
                 'id': 'BaW_jenozKc',
                 'ext': 'mp4',
                 'title': 'youtube-dl test video "\'/\\ä↭𝕐',
                 'uploader': 'Philipp Hagemeister',
                 'uploader_id': 'phihag',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/phihag',
                 'upload_date': '20121002',
+                'license': 'Standard YouTube License',
                 'description': 'test chars:  "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .',
                 'categories': ['Science & Technology'],
                 'tags': ['youtube-dl'],
@@ -346,12 +413,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'ext': 'mp4',
                 'upload_date': '20120506',
                 'title': 'Icona Pop - I Love It (feat. Charli XCX) [OFFICIAL VIDEO]',
-                'description': 'md5:782e8651347686cba06e58f71ab51773',
+                'alt_title': 'I Love It (feat. Charli XCX)',
+                'description': 'md5:f3ceb5ef83a08d95b9d146f973157cc8',
                 'tags': ['Icona Pop i love it', 'sweden', 'pop music', 'big beat records', 'big beat', 'charli',
                          'xcx', 'charli xcx', 'girls', 'hbo', 'i love it', "i don't care", 'icona', 'pop',
                          'iconic ep', 'iconic', 'love', 'it'],
                 'uploader': 'Icona Pop',
                 'uploader_id': 'IconaPop',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/IconaPop',
+                'license': 'Standard YouTube License',
+                'creator': 'Icona Pop',
             }
         },
         {
@@ -362,9 +433,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'ext': 'mp4',
                 'upload_date': '20130703',
                 'title': 'Justin Timberlake - Tunnel Vision (Explicit)',
+                'alt_title': 'Tunnel Vision',
                 'description': 'md5:64249768eec3bc4276236606ea996373',
                 'uploader': 'justintimberlakeVEVO',
                 'uploader_id': 'justintimberlakeVEVO',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/justintimberlakeVEVO',
+                'license': 'Standard YouTube License',
+                'creator': 'Justin Timberlake',
+                'age_limit': 18,
             }
         },
         {
@@ -377,9 +453,34 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'title': 'Principal Sexually Assaults A Teacher - Episode 117 - 8th June 2012',
                 'description': 'md5:09b78bd971f1e3e289601dfba15ca4f7',
                 'uploader': 'SET India',
-                'uploader_id': 'setindia'
+                'uploader_id': 'setindia',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/setindia',
+                'license': 'Standard YouTube License',
+                'age_limit': 18,
             }
         },
+        {
+            'url': 'http://www.youtube.com/watch?v=BaW_jenozKc&v=UxxajLWwzqY',
+            'note': 'Use the first video ID in the URL',
+            'info_dict': {
+                'id': 'BaW_jenozKc',
+                'ext': 'mp4',
+                'title': 'youtube-dl test video "\'/\\ä↭𝕐',
+                'uploader': 'Philipp Hagemeister',
+                'uploader_id': 'phihag',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/phihag',
+                'upload_date': '20121002',
+                'license': 'Standard YouTube License',
+                'description': 'test chars:  "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .',
+                'categories': ['Science & Technology'],
+                'tags': ['youtube-dl'],
+                'like_count': int,
+                'dislike_count': int,
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
         {
             'url': 'http://www.youtube.com/watch?v=a9LDPn-MO4I',
             'note': '256k DASH audio (format 141) via DASH manifest',
@@ -388,8 +489,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'ext': 'm4a',
                 'upload_date': '20121002',
                 'uploader_id': '8KVIDEO',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/8KVIDEO',
                 'description': '',
                 'uploader': '8KVIDEO',
+                'license': 'Standard YouTube License',
                 'title': 'UHDTV TEST 8K VIDEO.mp4'
             },
             'params': {
@@ -408,6 +511,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'uploader': 'AfrojackVEVO',
                 'uploader_id': 'AfrojackVEVO',
                 'upload_date': '20131011',
+                'license': 'Standard YouTube License',
             },
             'params': {
                 'youtube_include_dash_manifest': True,
@@ -421,10 +525,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'id': 'nfWlot6h_JM',
                 'ext': 'm4a',
                 'title': 'Taylor Swift - Shake It Off',
-                'description': 'md5:2acfda1b285bdd478ccec22f9918199d',
+                'alt_title': 'Shake It Off',
+                'description': 'md5:95f66187cd7c8b2c13eb78e1223b63c3',
                 'uploader': 'TaylorSwiftVEVO',
                 'uploader_id': 'TaylorSwiftVEVO',
                 'upload_date': '20140818',
+                'license': 'Standard YouTube License',
+                'creator': 'Taylor Swift',
             },
             'params': {
                 'youtube_include_dash_manifest': True,
@@ -440,6 +547,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'upload_date': '20100909',
                 'uploader': 'The Amazing Atheist',
                 'uploader_id': 'TheAmazingAtheist',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/TheAmazingAtheist',
+                'license': 'Standard YouTube License',
                 'title': 'Burning Everyone\'s Koran',
                 'description': 'SUBSCRIBE: http://www.youtube.com/saturninefilms\n\nEven Obama has taken a stand against freedom on this issue: http://www.huffingtonpost.com/2010/09/09/obama-gma-interview-quran_n_710282.html',
             }
@@ -454,7 +563,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'description': 're:(?s).{100,}About the Game\n.*?The Witcher 3: Wild Hunt.{100,}',
                 'uploader': 'The Witcher',
                 'uploader_id': 'WitcherGame',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/WitcherGame',
                 'upload_date': '20140605',
+                'license': 'Standard YouTube License',
+                'age_limit': 18,
             },
         },
         # Age-gate video with encrypted signature
@@ -467,7 +579,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'description': 'md5:33765bb339e1b47e7e72b5490139bb41',
                 'uploader': 'LloydVEVO',
                 'uploader_id': 'LloydVEVO',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/LloydVEVO',
                 'upload_date': '20110629',
+                'license': 'Standard YouTube License',
+                'age_limit': 18,
             },
         },
         # video_info is None (https://github.com/rg3/youtube-dl/issues/4421)
@@ -478,9 +593,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'ext': 'mp4',
                 'upload_date': '20100430',
                 'uploader_id': 'deadmau5',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/deadmau5',
+                'creator': 'deadmau5',
                 'description': 'md5:12c56784b8032162bb936a5f76d55360',
                 'uploader': 'deadmau5',
+                'license': 'Standard YouTube License',
                 'title': 'Deadmau5 - Some Chords (HD)',
+                'alt_title': 'Some Chords',
             },
             'expected_warnings': [
                 'DASH manifest missing',
@@ -492,8 +611,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'info_dict': {
                 'id': 'lqQg6PlCWgI',
                 'ext': 'mp4',
-                'upload_date': '20120731',
+                'upload_date': '20150827',
                 'uploader_id': 'olympic',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/olympic',
+                'license': 'Standard YouTube License',
                 'description': 'HO09  - Women -  GER-AUS - Hockey - 31 July 2012 - London 2012 Olympic Games',
                 'uploader': 'Olympics',
                 'title': 'Hockey - Women -  GER-AUS - London 2012 Olympic Games',
@@ -511,8 +632,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'stretched_ratio': 16 / 9.,
                 'upload_date': '20110310',
                 'uploader_id': 'AllenMeow',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/AllenMeow',
                 'description': 'made by Wacom from Korea | 字幕&加油添醋 by TY\'s Allen | 感謝heylisa00cavey1001同學熱情提供梗及翻譯',
                 'uploader': '孫艾倫',
+                'license': 'Standard YouTube License',
                 'title': '[A-made] 變態妍字幕版 太妍 我就是這樣的人',
             },
         },
@@ -521,7 +644,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'url': 'qEJwOuvDf7I',
             'info_dict': {
                 'id': 'qEJwOuvDf7I',
-                'ext': 'mp4',
+                'ext': 'webm',
                 'title': 'Обсуждение судебной практики по выборам 14 сентября 2014 года в Санкт-Петербурге',
                 'description': '',
                 'upload_date': '20150404',
@@ -530,7 +653,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             },
             'params': {
                 'skip_download': 'requires avconv',
-            }
+            },
+            'skip': 'This live event has ended.',
         },
         # Extraction from multiple DASH manifests (https://github.com/rg3/youtube-dl/pull/6097)
         {
@@ -542,7 +666,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'description': 'md5:116377fd2963b81ec4ce64b542173306',
                 'upload_date': '20150625',
                 'uploader_id': 'dorappi2000',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/dorappi2000',
                 'uploader': 'dorappi2000',
+                'license': 'Standard YouTube License',
                 'formats': 'mincount:33',
             },
         },
@@ -557,6 +683,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'uploader': 'Airtek',
                 'description': 'Retransmisión en directo de la XVIII media maratón de Zaragoza.',
                 'uploader_id': 'UCzTzUmjXxxacNnL8I3m4LnQ',
+                'license': 'Standard YouTube License',
                 'title': 'Retransmisión XVIII Media maratón Zaragoza 2015',
             },
             'params': {
@@ -581,6 +708,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                     'upload_date': '20150721',
                     'uploader': 'Beer Games Beer',
                     'uploader_id': 'beergamesbeer',
+                    'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
+                    'license': 'Standard YouTube License',
                 },
             }, {
                 'info_dict': {
@@ -591,6 +720,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                     'upload_date': '20150721',
                     'uploader': 'Beer Games Beer',
                     'uploader_id': 'beergamesbeer',
+                    'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
+                    'license': 'Standard YouTube License',
                 },
             }, {
                 'info_dict': {
@@ -601,6 +732,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                     'upload_date': '20150721',
                     'uploader': 'Beer Games Beer',
                     'uploader_id': 'beergamesbeer',
+                    'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
+                    'license': 'Standard YouTube License',
                 },
             }, {
                 'info_dict': {
@@ -611,11 +744,114 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                     'upload_date': '20150721',
                     'uploader': 'Beer Games Beer',
                     'uploader_id': 'beergamesbeer',
+                    'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
+                    'license': 'Standard YouTube License',
                 },
             }],
             'params': {
                 'skip_download': True,
             },
+        },
+        {
+            # Multifeed video with comma in title (see https://github.com/rg3/youtube-dl/issues/8536)
+            'url': 'https://www.youtube.com/watch?v=gVfLd0zydlo',
+            'info_dict': {
+                'id': 'gVfLd0zydlo',
+                'title': 'DevConf.cz 2016 Day 2 Workshops 1 14:00 - 15:30',
+            },
+            'playlist_count': 2,
+        },
+        {
+            'url': 'http://vid.plus/FlRa-iH7PGw',
+            'only_matching': True,
+        },
+        {
+            'url': 'http://zwearz.com/watch/9lWxNJF-ufM/electra-woman-dyna-girl-official-trailer-grace-helbig.html',
+            'only_matching': True,
+        },
+        {
+            # Title with JS-like syntax "};" (see https://github.com/rg3/youtube-dl/issues/7468)
+            # Also tests cut-off URL expansion in video description (see
+            # https://github.com/rg3/youtube-dl/issues/1892,
+            # https://github.com/rg3/youtube-dl/issues/8164)
+            'url': 'https://www.youtube.com/watch?v=lsguqyKfVQg',
+            'info_dict': {
+                'id': 'lsguqyKfVQg',
+                'ext': 'mp4',
+                'title': '{dark walk}; Loki/AC/Dishonored; collab w/Elflover21',
+                'alt_title': 'Dark Walk',
+                'description': 'md5:8085699c11dc3f597ce0410b0dcbb34a',
+                'upload_date': '20151119',
+                'uploader_id': 'IronSoulElf',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/IronSoulElf',
+                'uploader': 'IronSoulElf',
+                'license': 'Standard YouTube License',
+                'creator': 'Todd Haberman, Daniel Law Heath & Aaron Kaplan',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
+        {
+            # Tags with '};' (see https://github.com/rg3/youtube-dl/issues/7468)
+            'url': 'https://www.youtube.com/watch?v=Ms7iBXnlUO8',
+            'only_matching': True,
+        },
+        {
+            # Video with yt:stretch=17:0
+            'url': 'https://www.youtube.com/watch?v=Q39EVAstoRM',
+            'info_dict': {
+                'id': 'Q39EVAstoRM',
+                'ext': 'mp4',
+                'title': 'Clash Of Clans#14 Dicas De Ataque Para CV 4',
+                'description': 'md5:ee18a25c350637c8faff806845bddee9',
+                'upload_date': '20151107',
+                'uploader_id': 'UCCr7TALkRbo3EtFzETQF1LA',
+                'uploader': 'CH GAMER DROID',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
+        {
+            # Video licensed under Creative Commons
+            'url': 'https://www.youtube.com/watch?v=M4gD1WSo5mA',
+            'info_dict': {
+                'id': 'M4gD1WSo5mA',
+                'ext': 'mp4',
+                'title': 'md5:e41008789470fc2533a3252216f1c1d1',
+                'description': 'md5:a677553cf0840649b731a3024aeff4cc',
+                'upload_date': '20150127',
+                'uploader_id': 'BerkmanCenter',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/BerkmanCenter',
+                'uploader': 'BerkmanCenter',
+                'license': 'Creative Commons Attribution license (reuse allowed)',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
+        {
+            # Channel-like uploader_url
+            'url': 'https://www.youtube.com/watch?v=eQcmzGIKrzg',
+            'info_dict': {
+                'id': 'eQcmzGIKrzg',
+                'ext': 'mp4',
+                'title': 'Democratic Socialism and Foreign Policy | Bernie Sanders',
+                'description': 'md5:dda0d780d5a6e120758d1711d062a867',
+                'upload_date': '20151119',
+                'uploader': 'Bernie 2016',
+                'uploader_id': 'UCH1dpzjCEiGAt8CXkryhkZg',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/channel/UCH1dpzjCEiGAt8CXkryhkZg',
+                'license': 'Creative Commons Attribution license (reuse allowed)',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
+        {
+            'url': 'https://www.youtube.com/watch?feature=player_embedded&amp;amp;v=V36LpHqtcDY',
+            'only_matching': True,
         }
     ]
 
@@ -645,7 +881,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
 
     def _extract_signature_function(self, video_id, player_url, example_sig):
         id_m = re.match(
-            r'.*?-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player)?\.(?P<ext>[a-z]+)$',
+            r'.*?-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player(?:-new)?|/base)?\.(?P<ext>[a-z]+)$',
             player_url)
         if not id_m:
             raise ExtractorError('Cannot identify player %r' % player_url)
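
The widened pattern also recognizes the newer html5player-new and base.js player names. For example (the player URL below is made up, but follows the base.js layout the regex targets):

import re

player_url = 'https://s.ytimg.com/yts/jsbin/player-en_US-vflXGBaUt/base.js'
id_m = re.match(
    r'.*?-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player(?:-new)?|/base)?\.(?P<ext>[a-z]+)$',
    player_url)
assert id_m.group('id') == 'en_US-vflXGBaUt' and id_m.group('ext') == 'js'
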
@@ -774,7 +1010,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'https://video.google.com/timedtext?hl=en&type=list&v=%s' % video_id,
                 video_id, note=False)
         except ExtractorError as err:
-            self._downloader.report_warning('unable to download video subtitles: %s' % compat_str(err))
+            self._downloader.report_warning('unable to download video subtitles: %s' % error_to_compat_str(err))
             return {}
 
         sub_lang_list = {}
@@ -783,8 +1019,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             if lang in sub_lang_list:
                 continue
             sub_formats = []
-            for ext in ['sbv', 'vtt', 'srt']:
-                params = compat_urllib_parse.urlencode({
+            for ext in self._SUBTITLE_FORMATS:
+                params = compat_urllib_parse_urlencode({
                     'lang': lang,
                     'v': video_id,
                     'fmt': ext,
@@ -800,49 +1036,96 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             return {}
         return sub_lang_list
 
+    def _get_ytplayer_config(self, video_id, webpage):
+        patterns = (
+            # User data may contain arbitrary character sequences that can break
+            # regex-based JSON extraction, e.g. when the data contains '};' the
+            # second regex won't capture the whole JSON. For now, work around this
+            # by trying the more specific regex first; proper quoted-string
+            # handling should eventually replace this workaround (see
+            # https://github.com/rg3/youtube-dl/issues/7468,
+            # https://github.com/rg3/youtube-dl/pull/7599)
+            r';ytplayer\.config\s*=\s*({.+?});ytplayer',
+            r';ytplayer\.config\s*=\s*({.+?});',
+        )
+        config = self._search_regex(
+            patterns, webpage, 'ytplayer.config', default=None)
+        if config:
+            return self._parse_json(
+                uppercase_escape(config), video_id, fatal=False)
+
     def _get_automatic_captions(self, video_id, webpage):
         """We need the webpage for getting the captions url, pass it as an
            argument to speed up the process."""
         self.to_screen('%s: Looking for automatic captions' % video_id)
-        mobj = re.search(r';ytplayer.config = ({.*?});', webpage)
+        player_config = self._get_ytplayer_config(video_id, webpage)
         err_msg = 'Couldn\'t find automatic captions for %s' % video_id
-        if mobj is None:
+        if not player_config:
             self._downloader.report_warning(err_msg)
             return {}
-        player_config = json.loads(mobj.group(1))
         try:
             args = player_config['args']
-            caption_url = args['ttsurl']
-            timestamp = args['timestamp']
-            # We get the available subtitles
-            list_params = compat_urllib_parse.urlencode({
-                'type': 'list',
-                'tlangs': 1,
-                'asrs': 1,
-            })
-            list_url = caption_url + '&' + list_params
-            caption_list = self._download_xml(list_url, video_id)
-            original_lang_node = caption_list.find('track')
-            if original_lang_node is None:
-                self._downloader.report_warning('Video doesn\'t have automatic captions')
-                return {}
-            original_lang = original_lang_node.attrib['lang_code']
-            caption_kind = original_lang_node.attrib.get('kind', '')
+            caption_url = args.get('ttsurl')
+            if caption_url:
+                timestamp = args['timestamp']
+                # We get the available subtitles
+                list_params = compat_urllib_parse_urlencode({
+                    'type': 'list',
+                    'tlangs': 1,
+                    'asrs': 1,
+                })
+                list_url = caption_url + '&' + list_params
+                caption_list = self._download_xml(list_url, video_id)
+                original_lang_node = caption_list.find('track')
+                if original_lang_node is None:
+                    self._downloader.report_warning('Video doesn\'t have automatic captions')
+                    return {}
+                original_lang = original_lang_node.attrib['lang_code']
+                caption_kind = original_lang_node.attrib.get('kind', '')
+
+                sub_lang_list = {}
+                for lang_node in caption_list.findall('target'):
+                    sub_lang = lang_node.attrib['lang_code']
+                    sub_formats = []
+                    for ext in self._SUBTITLE_FORMATS:
+                        params = compat_urllib_parse_urlencode({
+                            'lang': original_lang,
+                            'tlang': sub_lang,
+                            'fmt': ext,
+                            'ts': timestamp,
+                            'kind': caption_kind,
+                        })
+                        sub_formats.append({
+                            'url': caption_url + '&' + params,
+                            'ext': ext,
+                        })
+                    sub_lang_list[sub_lang] = sub_formats
+                return sub_lang_list
+
+            # Some videos don't provide ttsurl but rather caption_tracks and
+            # caption_translation_languages (e.g. 20LmZk1hakA)
+            caption_tracks = args['caption_tracks']
+            caption_translation_languages = args['caption_translation_languages']
+            caption_url = compat_parse_qs(caption_tracks.split(',')[0])['u'][0]
+            parsed_caption_url = compat_urllib_parse_urlparse(caption_url)
+            caption_qs = compat_parse_qs(parsed_caption_url.query)
 
             sub_lang_list = {}
-            for lang_node in caption_list.findall('target'):
-                sub_lang = lang_node.attrib['lang_code']
+            for lang in caption_translation_languages.split(','):
+                lang_qs = compat_parse_qs(compat_urllib_parse_unquote_plus(lang))
+                sub_lang = lang_qs.get('lc', [None])[0]
+                if not sub_lang:
+                    continue
                 sub_formats = []
-                for ext in ['sbv', 'vtt', 'srt']:
-                    params = compat_urllib_parse.urlencode({
-                        'lang': original_lang,
-                        'tlang': sub_lang,
-                        'fmt': ext,
-                        'ts': timestamp,
-                        'kind': caption_kind,
+                for ext in self._SUBTITLE_FORMATS:
+                    caption_qs.update({
+                        'tlang': [sub_lang],
+                        'fmt': [ext],
                     })
+                    sub_url = compat_urlparse.urlunparse(parsed_caption_url._replace(
+                        query=compat_urllib_parse_urlencode(caption_qs, True)))
                     sub_formats.append({
-                        'url': caption_url + '&' + params,
+                        'url': sub_url,
                         'ext': ext,
                     })
                 sub_lang_list[sub_lang] = sub_formats
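
The caption_tracks / caption_translation_languages branch above boils down to splitting a comma-separated list of percent-encoded query strings. A sketch with a hypothetical value, using the Python 3 equivalents of the compat_* helpers:

from urllib.parse import parse_qs, unquote_plus

caption_translation_languages = 'lc%3Den%26n%3DEnglish,lc%3Dde%26n%3DGerman'

for lang in caption_translation_languages.split(','):
    lang_qs = parse_qs(unquote_plus(lang))
    print(lang_qs.get('lc', [None])[0])
# -> en
# -> de
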
@@ -853,6 +1136,29 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             self._downloader.report_warning(err_msg)
             return {}
 
+    def _mark_watched(self, video_id, video_info):
+        playback_url = video_info.get('videostats_playback_base_url', [None])[0]
+        if not playback_url:
+            return
+        parsed_playback_url = compat_urlparse.urlparse(playback_url)
+        qs = compat_urlparse.parse_qs(parsed_playback_url.query)
+
+        # The cpn generation algorithm is reverse-engineered from base.js;
+        # in practice it works even with a dummy cpn.
+        CPN_ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_'
+        cpn = ''.join((CPN_ALPHABET[random.randint(0, 256) & 63] for _ in range(0, 16)))
+
+        qs.update({
+            'ver': ['2'],
+            'cpn': [cpn],
+        })
+        playback_url = compat_urlparse.urlunparse(
+            parsed_playback_url._replace(query=compat_urllib_parse_urlencode(qs, True)))
+
+        self._download_webpage(
+            playback_url, video_id, 'Marking watched',
+            'Unable to mark watched', fatal=False)
+
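
The masking in _mark_watched works because the alphabet has exactly 64 characters, so '& 63' reduces any integer to a valid index; the precise randint bound does not matter. A standalone sketch:

import random

CPN_ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_'
assert len(CPN_ALPHABET) == 64

cpn = ''.join(CPN_ALPHABET[random.randint(0, 256) & 63] for _ in range(16))
print(cpn)  # 16 characters drawn from the alphabet
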
     @classmethod
     def extract_id(cls, url):
         mobj = re.match(cls._VALID_URL, url, re.VERBOSE)
@@ -880,73 +1186,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         url = 'https://www.youtube.com/annotations_invideo?features=1&legacy=1&video_id=%s' % video_id
         return self._download_webpage(url, video_id, note='Searching for annotations.', errnote='Unable to download video annotations.')
 
-    def _parse_dash_manifest(
-            self, video_id, dash_manifest_url, player_url, age_gate, fatal=True):
-        def decrypt_sig(mobj):
-            s = mobj.group(1)
-            dec_s = self._decrypt_signature(s, video_id, player_url, age_gate)
-            return '/signature/%s' % dec_s
-        dash_manifest_url = re.sub(r'/s/([a-fA-F0-9\.]+)', decrypt_sig, dash_manifest_url)
-        dash_doc = self._download_xml(
-            dash_manifest_url, video_id,
-            note='Downloading DASH manifest',
-            errnote='Could not download DASH manifest',
-            fatal=fatal)
-
-        if dash_doc is False:
-            return []
-
-        formats = []
-        for a in dash_doc.findall('.//{urn:mpeg:DASH:schema:MPD:2011}AdaptationSet'):
-            mime_type = a.attrib.get('mimeType')
-            for r in a.findall('{urn:mpeg:DASH:schema:MPD:2011}Representation'):
-                url_el = r.find('{urn:mpeg:DASH:schema:MPD:2011}BaseURL')
-                if url_el is None:
-                    continue
-                if mime_type == 'text/vtt':
-                    # TODO implement WebVTT downloading
-                    pass
-                elif mime_type.startswith('audio/') or mime_type.startswith('video/'):
-                    segment_list = r.find('{urn:mpeg:DASH:schema:MPD:2011}SegmentList')
-                    format_id = r.attrib['id']
-                    video_url = url_el.text
-                    filesize = int_or_none(url_el.attrib.get('{http://youtube.com/yt/2012/10/10}contentLength'))
-                    f = {
-                        'format_id': format_id,
-                        'url': video_url,
-                        'width': int_or_none(r.attrib.get('width')),
-                        'height': int_or_none(r.attrib.get('height')),
-                        'tbr': int_or_none(r.attrib.get('bandwidth'), 1000),
-                        'asr': int_or_none(r.attrib.get('audioSamplingRate')),
-                        'filesize': filesize,
-                        'fps': int_or_none(r.attrib.get('frameRate')),
-                    }
-                    if segment_list is not None:
-                        f.update({
-                            'initialization_url': segment_list.find('{urn:mpeg:DASH:schema:MPD:2011}Initialization').attrib['sourceURL'],
-                            'segment_urls': [segment.attrib.get('media') for segment in segment_list.findall('{urn:mpeg:DASH:schema:MPD:2011}SegmentURL')],
-                            'protocol': 'http_dash_segments',
-                        })
-                    try:
-                        existing_format = next(
-                            fo for fo in formats
-                            if fo['format_id'] == format_id)
-                    except StopIteration:
-                        full_info = self._formats.get(format_id, {}).copy()
-                        full_info.update(f)
-                        codecs = r.attrib.get('codecs')
-                        if codecs:
-                            if full_info.get('acodec') == 'none' and 'vcodec' not in full_info:
-                                full_info['vcodec'] = codecs
-                            elif full_info.get('vcodec') == 'none' and 'acodec' not in full_info:
-                                full_info['acodec'] = codecs
-                        formats.append(full_info)
-                    else:
-                        existing_format.update(f)
-                else:
-                    self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
-        return formats
-
     def _real_extract(self, url):
         url, smuggled_data = unsmuggle_url(url, {})
 
@@ -999,7 +1238,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             # this can be viewed without login into Youtube
             url = proto + '://www.youtube.com/embed/%s' % video_id
             embed_webpage = self._download_webpage(url, video_id, 'Downloading embed webpage')
-            data = compat_urllib_parse.urlencode({
+            data = compat_urllib_parse_urlencode({
                 'video_id': video_id,
                 'eurl': 'https://youtube.googleapis.com/v/' + video_id,
                 'sts': self._search_regex(
@@ -1016,10 +1255,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             age_gate = False
             video_info = None
             # Try looking directly into the video webpage
-            mobj = re.search(r';ytplayer\.config\s*=\s*({.*?});', video_webpage)
-            if mobj:
-                json_code = uppercase_escape(mobj.group(1))
-                ytplayer_config = json.loads(json_code)
+            ytplayer_config = self._get_ytplayer_config(video_id, video_webpage)
+            if ytplayer_config:
                 args = ytplayer_config['args']
                 if args.get('url_encoded_fmt_stream_map'):
                     # Convert to the same format returned by compat_parse_qs
@@ -1049,6 +1286,17 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                     if not video_info:
                         video_info = get_video_info
                     if 'token' in get_video_info:
+                        # Different get_video_info requests may report different results, e.g.
+                        # some may report video unavailability, but some may serve it without
+                        # any complaint (see https://github.com/rg3/youtube-dl/issues/7362,
+                        # the original webpage as well as el=info and el=embedded get_video_info
+                        # requests report video unavailability due to geo restriction while
+                        # el=detailpage succeeds and returns valid data). This is probably
+                        # due to YouTube measures against IP ranges of hosting providers.
+                        # Work around this by preferring the first successful video_info
+                        # containing the token if none has been found yet.
+                        if 'token' not in video_info:
+                            video_info = get_video_info
                         break
         if 'token' not in video_info:
             if 'reason' in video_info:
@@ -1079,10 +1327,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             video_description = re.sub(r'''(?x)
                 <a\s+
                     (?:[a-zA-Z-]+="[^"]+"\s+)*?
-                    title="([^"]+)"\s+
+                    (?:title|href)="([^"]+)"\s+
                     (?:[a-zA-Z-]+="[^"]+"\s+)*?
-                    class="yt-uix-redirect-link"\s*>
-                [^<]+
+                    class="(?:yt-uix-redirect-link|yt-uix-sessionlink[^"]*)"[^>]*>
+                [^<]+\.{3}\s*
                 </a>
             ''', r'\1', video_description)
             video_description = clean_html(video_description)
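
The relaxed pattern now also unwraps session-link anchors whose visible text was shortened with an ellipsis. A sketch on a hypothetical description fragment:

import re

desc = ('<a href="https://example.invalid/full-target" '
        'class="yt-uix-sessionlink spf-link" data-x="y">https://example.inva...</a>')
print(re.sub(r'''(?x)
    <a\s+
        (?:[a-zA-Z-]+="[^"]+"\s+)*?
        (?:title|href)="([^"]+)"\s+
        (?:[a-zA-Z-]+="[^"]+"\s+)*?
        class="(?:yt-uix-redirect-link|yt-uix-sessionlink[^"]*)"[^>]*>
    [^<]+\.{3}\s*
    </a>
''', r'\1', desc))  # -> https://example.invalid/full-target
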
@@ -1097,9 +1345,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             if not self._downloader.params.get('noplaylist'):
                 entries = []
                 feed_ids = []
-                multifeed_metadata_list = compat_urllib_parse_unquote_plus(video_info['multifeed_metadata_list'][0])
+                multifeed_metadata_list = video_info['multifeed_metadata_list'][0]
                 for feed in multifeed_metadata_list.split(','):
-                    feed_data = compat_parse_qs(feed)
+                    # Unquoting must only happen after the split on comma (,), since
+                    # unquoted textual fields may contain commas themselves (see
+                    # https://github.com/rg3/youtube-dl/issues/8536)
+                    feed_data = compat_parse_qs(compat_urllib_parse_unquote_plus(feed))
                     entries.append({
                         '_type': 'url_transparent',
                         'ie_key': 'Youtube',
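
The ordering matters because the feed list itself is comma-delimited while commas inside each feed's fields arrive percent-encoded; unquoting first would leak those commas into the delimiter, which is exactly the bug in issue 8536. A minimal demonstration with a hypothetical value:

from urllib.parse import parse_qs, unquote_plus

multifeed = 'id%3Dabc%26title%3DCam%2C%20main,id%3Ddef%26title%3DCam%202'

bad = unquote_plus(multifeed).split(',')   # 3 chunks -- the first title is split apart
good = [parse_qs(unquote_plus(feed)) for feed in multifeed.split(',')]
print(len(bad), len(good))  # 3 2
print(good[0]['title'][0])  # Cam, main
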
@@ -1134,9 +1385,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
 
         # uploader_id
         video_uploader_id = None
-        mobj = re.search(r'<link itemprop="url" href="http://www.youtube.com/(?:user|channel)/([^"]+)">', video_webpage)
+        video_uploader_url = None
+        mobj = re.search(
+            r'<link itemprop="url" href="(?P<uploader_url>https?://www.youtube.com/(?:user|channel)/(?P<uploader_id>[^"]+))">',
+            video_webpage)
         if mobj is not None:
-            video_uploader_id = mobj.group(1)
+            video_uploader_id = mobj.group('uploader_id')
+            video_uploader_url = mobj.group('uploader_url')
         else:
             self._downloader.report_warning('unable to extract uploader nickname')
 
@@ -1164,6 +1419,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 upload_date = ' '.join(re.sub(r'[/,-]', r' ', mobj.group(1)).split())
         upload_date = unified_strdate(upload_date)
 
+        video_license = self._html_search_regex(
+            r'<h4[^>]+class="title"[^>]*>\s*License\s*</h4>\s*<ul[^>]*>\s*<li>(.+?)</li',
+            video_webpage, 'license', default=None)
+
+        m_music = re.search(
+            r'<h4[^>]+class="title"[^>]*>\s*Music\s*</h4>\s*<ul[^>]*>\s*<li>(?P<title>.+?) by (?P<creator>.+?)(?:\(.+?\))?</li',
+            video_webpage)
+        if m_music:
+            video_alt_title = remove_quotes(unescapeHTML(m_music.group('title')))
+            video_creator = clean_html(m_music.group('creator'))
+        else:
+            video_alt_title = video_creator = None
+
         m_cat_container = self._search_regex(
             r'(?s)<h4[^>]*>\s*Category\s*</h4>\s*<ul[^>]*>(.*?)</ul>',
             video_webpage, 'categories', default=None)
@@ -1228,7 +1496,20 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             encoded_url_map = video_info.get('url_encoded_fmt_stream_map', [''])[0] + ',' + video_info.get('adaptive_fmts', [''])[0]
             if 'rtmpe%3Dyes' in encoded_url_map:
                 raise ExtractorError('rtmpe downloads are not supported, see https://github.com/rg3/youtube-dl/issues/343 for more information.', expected=True)
-            url_map = {}
+            formats_spec = {}
+            fmt_list = video_info.get('fmt_list', [''])[0]
+            if fmt_list:
+                for fmt in fmt_list.split(','):
+                    spec = fmt.split('/')
+                    if len(spec) > 1:
+                        width_height = spec[1].split('x')
+                        if len(width_height) == 2:
+                            formats_spec[spec[0]] = {
+                                'resolution': spec[1],
+                                'width': int_or_none(width_height[0]),
+                                'height': int_or_none(width_height[1]),
+                            }
+            formats = []
             for url_data_str in encoded_url_map.split(','):
                 url_data = compat_parse_qs(url_data_str)
                 if 'itag' not in url_data or 'url' not in url_data:
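
A standalone walk-through of the fmt_list parsing above, assuming a typical-looking value (itag, then 'WIDTHxHEIGHT', then further fields):

fmt_list = '22/1280x720/9/0/115,43/640x360/99/0/0'

formats_spec = {}
for fmt in fmt_list.split(','):
    spec = fmt.split('/')
    if len(spec) > 1:
        width_height = spec[1].split('x')
        if len(width_height) == 2:
            formats_spec[spec[0]] = {
                'resolution': spec[1],
                'width': int(width_height[0]),
                'height': int(width_height[1]),
            }
print(formats_spec['43'])  # {'resolution': '640x360', 'width': 640, 'height': 360}
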
@@ -1274,7 +1555,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                                 player_desc = 'flash player %s' % player_version
                             else:
                                 player_version = self._search_regex(
-                                    r'html5player-([^/]+?)(?:/html5player)?\.js',
+                                    [r'html5player-([^/]+?)(?:/html5player(?:-new)?)?\.js', r'(?:www|player)-([^/]+)/base\.js'],
                                     player_url,
                                     'html5 player', fatal=False)
                                 player_desc = 'html5 player %s' % player_version
@@ -1288,23 +1569,90 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                     url += '&signature=' + signature
                 if 'ratebypass' not in url:
                     url += '&ratebypass=yes'
-                url_map[format_id] = url
-            formats = _map_to_format_list(url_map)
+
+                dct = {
+                    'format_id': format_id,
+                    'url': url,
+                    'player_url': player_url,
+                }
+                if format_id in self._formats:
+                    dct.update(self._formats[format_id])
+                if format_id in formats_spec:
+                    dct.update(formats_spec[format_id])
+
+                # Some itags are not included in the DASH manifest, so the corresponding
+                # formats lack metadata (see https://github.com/rg3/youtube-dl/pull/5993).
+                # Try to extract the metadata from the url_encoded_fmt_stream_map entry.
+                mobj = re.search(r'^(?P<width>\d+)[xX](?P<height>\d+)$', url_data.get('size', [''])[0])
+                width, height = (int(mobj.group('width')), int(mobj.group('height'))) if mobj else (None, None)
+
+                more_fields = {
+                    'filesize': int_or_none(url_data.get('clen', [None])[0]),
+                    'tbr': float_or_none(url_data.get('bitrate', [None])[0], 1000),
+                    'width': width,
+                    'height': height,
+                    'fps': int_or_none(url_data.get('fps', [None])[0]),
+                    'format_note': url_data.get('quality_label', [None])[0] or url_data.get('quality', [None])[0],
+                }
+                for key, value in more_fields.items():
+                    if value:
+                        dct[key] = value
+                type_ = url_data.get('type', [None])[0]
+                if type_:
+                    type_split = type_.split(';')
+                    kind_ext = type_split[0].split('/')
+                    if len(kind_ext) == 2:
+                        kind, _ = kind_ext
+                        dct['ext'] = mimetype2ext(type_split[0])
+                        if kind in ('audio', 'video'):
+                            codecs = None
+                            for mobj in re.finditer(
+                                    r'(?P<key>[a-zA-Z_-]+)=(?P<quote>["\']?)(?P<val>.+?)(?P=quote)(?:;|$)', type_):
+                                if mobj.group('key') == 'codecs':
+                                    codecs = mobj.group('val')
+                                    break
+                            if codecs:
+                                codecs = codecs.split(',')
+                                if len(codecs) == 2:
+                                    acodec, vcodec = codecs[1], codecs[0]
+                                else:
+                                    acodec, vcodec = (codecs[0], 'none') if kind == 'audio' else ('none', codecs[0])
+                                dct.update({
+                                    'acodec': acodec,
+                                    'vcodec': vcodec,
+                                })
+                formats.append(dct)
         elif video_info.get('hlsvp'):
             manifest_url = video_info['hlsvp'][0]
             url_map = self._extract_from_m3u8(manifest_url, video_id)
             formats = _map_to_format_list(url_map)
+            # Accept-Encoding header causes failures in live streams on Youtube and Youtube Gaming
+            for a_format in formats:
+                a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True'
         else:
+            unavailable_message = self._html_search_regex(
+                r'(?s)<h1[^>]+id="unavailable-message"[^>]*>(.+?)</h1>',
+                video_webpage, 'unavailable message', default=None)
+            if unavailable_message:
+                raise ExtractorError(unavailable_message, expected=True)
             raise ExtractorError('no conn, hlsvp or url_encoded_fmt_stream_map information found in video info')
 
         # Look for the DASH manifest
         if self._downloader.params.get('youtube_include_dash_manifest', True):
             dash_mpd_fatal = True
-            for dash_manifest_url in dash_mpds:
+            for mpd_url in dash_mpds:
                 dash_formats = {}
                 try:
-                    for df in self._parse_dash_manifest(
-                            video_id, dash_manifest_url, player_url, age_gate, dash_mpd_fatal):
+                    def decrypt_sig(mobj):
+                        s = mobj.group(1)
+                        dec_s = self._decrypt_signature(s, video_id, player_url, age_gate)
+                        return '/signature/%s' % dec_s
+
+                    mpd_url = re.sub(r'/s/([a-fA-F0-9\.]+)', decrypt_sig, mpd_url)
+
+                    for df in self._extract_mpd_formats(
+                            mpd_url, video_id, fatal=dash_mpd_fatal,
+                            formats_dict=self._formats):
                         # Do not overwrite DASH format found in some previous DASH manifest
                         if df['format_id'] not in dash_formats:
                             dash_formats[df['format_id']] = df
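
The codecs handling above scans the MIME 'type' field for a possibly quoted codecs parameter and then splits it into video and audio codec. On a hypothetical entry:

import re

type_ = 'video/mp4; codecs="avc1.42001E, mp4a.40.2"'

codecs = None
for mobj in re.finditer(
        r'(?P<key>[a-zA-Z_-]+)=(?P<quote>["\']?)(?P<val>.+?)(?P=quote)(?:;|$)', type_):
    if mobj.group('key') == 'codecs':
        codecs = mobj.group('val')
        break

vcodec, acodec = codecs.split(',')
print(vcodec, repr(acodec))  # avc1.42001E ' mp4a.40.2' (the leading space survives)
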
@@ -1331,19 +1679,30 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             r'<meta\s+property="og:video:tag".*?content="yt:stretch=(?P<w>[0-9]+):(?P<h>[0-9]+)">',
             video_webpage)
         if stretched_m:
-            ratio = float(stretched_m.group('w')) / float(stretched_m.group('h'))
-            for f in formats:
-                if f.get('vcodec') != 'none':
-                    f['stretched_ratio'] = ratio
+            w = float(stretched_m.group('w'))
+            h = float(stretched_m.group('h'))
+            # yt:stretch may hold invalid ratio data (e.g. for Q39EVAstoRM ratio is 17:0).
+            # We will only process correct ratios.
+            if w > 0 and h > 0:
+                ratio = w / h
+                for f in formats:
+                    if f.get('vcodec') != 'none':
+                        f['stretched_ratio'] = ratio
 
         self._sort_formats(formats)
 
+        self.mark_watched(video_id, video_info)
+
         return {
             'id': video_id,
             'uploader': video_uploader,
             'uploader_id': video_uploader_id,
+            'uploader_url': video_uploader_url,
             'upload_date': upload_date,
+            'license': video_license,
+            'creator': video_creator,
             'title': video_title,
+            'alt_title': video_alt_title,
             'thumbnail': video_thumbnail,
             'description': video_description,
             'categories': video_categories,
@@ -1365,7 +1724,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         }
 
 
-class YoutubePlaylistIE(YoutubeBaseInfoExtractor):
+class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
     IE_DESC = 'YouTube.com playlists'
     _VALID_URL = r"""(?x)(?:
                         (?:https?://)?
@@ -1373,7 +1732,7 @@ class YoutubePlaylistIE(YoutubeBaseInfoExtractor):
                         youtube\.com/
                         (?:
                            (?:course|view_play_list|my_playlists|artist|playlist|watch|embed/videoseries)
-                           \? (?:.*?&)*? (?:p|a|list)=
+                           \? (?:.*?[&;])*? (?:p|a|list)=
                         |  p/
                         )
                         (
@@ -1386,7 +1745,7 @@ class YoutubePlaylistIE(YoutubeBaseInfoExtractor):
                         ((?:PL|LL|EC|UU|FL|RD|UL)[0-9A-Za-z-_]{10,})
                      )"""
     _TEMPLATE_URL = 'https://www.youtube.com/playlist?list=%s'
-    _VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})&amp;[^"]*?index=(?P<index>\d+)'
+    _VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})&amp;[^"]*?index=(?P<index>\d+)(?:[^>]+>(?P<title>[^<]+))?'
     IE_NAME = 'youtube:playlist'
     _TESTS = [{
         'url': 'https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re',
@@ -1465,20 +1824,32 @@ class YoutubePlaylistIE(YoutubeBaseInfoExtractor):
     def _extract_mix(self, playlist_id):
         # The mixes are generated from a single video;
         # the id of the playlist is just 'RD' + video_id
-        url = 'https://youtube.com/watch?v=%s&list=%s' % (playlist_id[-11:], playlist_id)
-        webpage = self._download_webpage(
-            url, playlist_id, 'Downloading Youtube mix')
+        ids = []
+        last_id = playlist_id[-11:]
+        for n in itertools.count(1):
+            url = 'https://youtube.com/watch?v=%s&list=%s' % (last_id, playlist_id)
+            webpage = self._download_webpage(
+                url, playlist_id, 'Downloading page {0} of Youtube mix'.format(n))
+            new_ids = orderedSet(re.findall(
+                r'''(?xs)data-video-username=".*?".*?
+                           href="/watch\?v=([0-9A-Za-z_-]{11})&amp;[^"]*?list=%s''' % re.escape(playlist_id),
+                webpage))
+            # Fetch new pages until all the videos repeat; it seems that
+            # there are always 51 unique videos.
+            new_ids = [_id for _id in new_ids if _id not in ids]
+            if not new_ids:
+                break
+            ids.extend(new_ids)
+            last_id = ids[-1]
+
+        url_results = self._ids_to_results(ids)
+
         search_title = lambda class_name: get_element_by_attribute('class', class_name, webpage)
         title_span = (
             search_title('playlist-title') or
             search_title('title long-title') or
             search_title('title'))
         title = clean_html(title_span)
-        ids = orderedSet(re.findall(
-            r'''(?xs)data-video-username=".*?".*?
-                       href="/watch\?v=([0-9A-Za-z_-]{11})&amp;[^"]*?list=%s''' % re.escape(playlist_id),
-            webpage))
-        url_results = self._ids_to_results(ids)
 
         return self.playlist_result(url_results, playlist_id, title)
 
@@ -1503,45 +1874,13 @@ class YoutubePlaylistIE(YoutubeBaseInfoExtractor):
             else:
                 self.report_warning('Youtube gives an alert message: ' + match)
 
-        # Extract the video ids from the playlist pages
-        def _entries():
-            more_widget_html = content_html = page
-            for page_num in itertools.count(1):
-                matches = re.finditer(self._VIDEO_RE, content_html)
-                # We remove the duplicates and the link with index 0
-                # (it's not the first video of the playlist)
-                new_ids = orderedSet(m.group('id') for m in matches if m.group('index') != '0')
-                for vid_id in new_ids:
-                    yield self.url_result(vid_id, 'Youtube', video_id=vid_id)
-
-                mobj = re.search(r'data-uix-load-more-href="/?(?P<more>[^"]+)"', more_widget_html)
-                if not mobj:
-                    break
-
-                more = self._download_json(
-                    'https://youtube.com/%s' % mobj.group('more'), playlist_id,
-                    'Downloading page #%s' % page_num,
-                    transform_source=uppercase_escape)
-                content_html = more['content_html']
-                if not content_html.strip():
-                    # Some webpages show a "Load more" button but they don't
-                    # have more videos
-                    break
-                more_widget_html = more['load_more_widget_html']
-
         playlist_title = self._html_search_regex(
-            r'(?s)<h1 class="pl-header-title[^"]*">\s*(.*?)\s*</h1>',
+            r'(?s)<h1 class="pl-header-title[^"]*"[^>]*>\s*(.*?)\s*</h1>',
             page, 'title')
 
-        return self.playlist_result(_entries(), playlist_id, playlist_title)
-
-    def _real_extract(self, url):
-        # Extract playlist id
-        mobj = re.match(self._VALID_URL, url)
-        if mobj is None:
-            raise ExtractorError('Invalid URL: %s' % url)
-        playlist_id = mobj.group(1) or mobj.group(2)
+        return self.playlist_result(self._entries(page, playlist_id), playlist_id, playlist_title)
 
+    def _check_download_just_video(self, url, playlist_id):
         # Check if it's a video-specific URL
         query_dict = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
         if 'v' in query_dict:
@@ -1552,42 +1891,53 @@ class YoutubePlaylistIE(YoutubeBaseInfoExtractor):
             else:
                 self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id))
 
-        if playlist_id.startswith('RD') or playlist_id.startswith('UL'):
+    def _real_extract(self, url):
+        # Extract playlist id
+        mobj = re.match(self._VALID_URL, url)
+        if mobj is None:
+            raise ExtractorError('Invalid URL: %s' % url)
+        playlist_id = mobj.group(1) or mobj.group(2)
+
+        video = self._check_download_just_video(url, playlist_id)
+        if video:
+            return video
+
+        if playlist_id.startswith(('RD', 'UL', 'PU')):
             # Mixes require a custom extraction process
             return self._extract_mix(playlist_id)
 
         return self._extract_playlist(playlist_id)
 
 
-class YoutubeChannelIE(InfoExtractor):
+class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
     IE_DESC = 'YouTube.com channels'
     _VALID_URL = r'https?://(?:youtu\.be|(?:\w+\.)?youtube(?:-nocookie)?\.com)/channel/(?P<id>[0-9A-Za-z_-]+)'
     _TEMPLATE_URL = 'https://www.youtube.com/channel/%s/videos'
+    _VIDEO_RE = r'(?:title="(?P<title>[^"]+)"[^>]+)?href="/watch\?v=(?P<id>[0-9A-Za-z_-]+)&?'
     IE_NAME = 'youtube:channel'
     _TESTS = [{
         'note': 'paginated channel',
         'url': 'https://www.youtube.com/channel/UCKfVa3S1e4PHvxWcwyMMg8w',
         'playlist_mincount': 91,
         'info_dict': {
-            'id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
+            'id': 'UUKfVa3S1e4PHvxWcwyMMg8w',
+            'title': 'Uploads from lex will',
         }
+    }, {
+        'note': 'Age restricted channel',
+        # from https://www.youtube.com/user/DeusExOfficial
+        'url': 'https://www.youtube.com/channel/UCs0ifCMCm1icqRbqhUINa0w',
+        'playlist_mincount': 64,
+        'info_dict': {
+            'id': 'UUs0ifCMCm1icqRbqhUINa0w',
+            'title': 'Uploads from Deus Ex',
+        },
     }]
 
-    @staticmethod
-    def extract_videos_from_page(page):
-        ids_in_page = []
-        titles_in_page = []
-        for mobj in re.finditer(r'(?:title="(?P<title>[^"]+)"[^>]+)?href="/watch\?v=(?P<id>[0-9A-Za-z_-]+)&?', page):
-            video_id = mobj.group('id')
-            video_title = unescapeHTML(mobj.group('title'))
-            try:
-                idx = ids_in_page.index(video_id)
-                if video_title and not titles_in_page[idx]:
-                    titles_in_page[idx] = video_title
-            except ValueError:
-                ids_in_page.append(video_id)
-                titles_in_page.append(video_title)
-        return zip(ids_in_page, titles_in_page)
+    @classmethod
+    def suitable(cls, url):
+        return (False if YoutubePlaylistsIE.suitable(url) or YoutubeLiveIE.suitable(url)
+                else super(YoutubeChannelIE, cls).suitable(url))
 
     def _real_extract(self, url):
         channel_id = self._match_id(url)
@@ -1600,12 +1950,15 @@ class YoutubeChannelIE(InfoExtractor):
         channel_page = self._download_webpage(
             url + '?view=57', channel_id,
             'Downloading channel page', fatal=False)
-        channel_playlist_id = self._html_search_meta(
-            'channelId', channel_page, 'channel id', default=None)
-        if not channel_playlist_id:
-            channel_playlist_id = self._search_regex(
-                r'data-channel-external-id="([^"]+)"',
-                channel_page, 'channel id', default=None)
+        if channel_page is False:
+            channel_playlist_id = False
+        else:
+            channel_playlist_id = self._html_search_meta(
+                'channelId', channel_page, 'channel id', default=None)
+            if not channel_playlist_id:
+                channel_playlist_id = self._search_regex(
+                    r'data-(?:channel-external-|yt)id="([^"]+)"',
+                    channel_page, 'channel id', default=None)
         if channel_playlist_id and channel_playlist_id.startswith('UC'):
             playlist_id = 'UU' + channel_playlist_id[2:]
             return self.url_result(
@@ -1628,34 +1981,12 @@ class YoutubeChannelIE(InfoExtractor):
                 for video_id, video_title in self.extract_videos_from_page(channel_page)]
             return self.playlist_result(entries, channel_id)
 
-        def _entries():
-            more_widget_html = content_html = channel_page
-            for pagenum in itertools.count(1):
-
-                for video_id, video_title in self.extract_videos_from_page(content_html):
-                    yield self.url_result(
-                        video_id, 'Youtube', video_id=video_id,
-                        video_title=video_title)
-
-                mobj = re.search(
-                    r'data-uix-load-more-href="/?(?P<more>[^"]+)"',
-                    more_widget_html)
-                if not mobj:
-                    break
-
-                more = self._download_json(
-                    'https://youtube.com/%s' % mobj.group('more'), channel_id,
-                    'Downloading page #%s' % (pagenum + 1),
-                    transform_source=uppercase_escape)
-                content_html = more['content_html']
-                more_widget_html = more['load_more_widget_html']
-
-        return self.playlist_result(_entries(), channel_id)
+        return self.playlist_result(self._entries(channel_page, channel_id), channel_id)
 
 
 class YoutubeUserIE(YoutubeChannelIE):
     IE_DESC = 'YouTube.com user videos (URL or "ytuser" keyword)'
-    _VALID_URL = r'(?:(?:(?:https?://)?(?:\w+\.)?youtube\.com/(?:user/)?(?!(?:attribution_link|watch|results)(?:$|[^a-z_A-Z0-9-])))|ytuser:)(?!feed/)(?P<id>[A-Za-z0-9_-]+)'
+    _VALID_URL = r'(?:(?:https?://(?:\w+\.)?youtube\.com/(?:user/)?(?!(?:attribution_link|watch|results)(?:$|[^a-z_A-Z0-9-])))|ytuser:)(?!feed/)(?P<id>[A-Za-z0-9_-]+)'
     _TEMPLATE_URL = 'https://www.youtube.com/user/%s/videos'
     IE_NAME = 'youtube:user'
 
@@ -1674,13 +2005,88 @@ class YoutubeUserIE(YoutubeChannelIE):
     def suitable(cls, url):
         # Don't return True if the url can be extracted with another youtube
         # extractor; the regex is too permissive and it would match.
-        other_ies = iter(klass for (name, klass) in globals().items() if name.endswith('IE') and klass is not cls)
-        if any(ie.suitable(url) for ie in other_ies):
+        other_yt_ies = iter(klass for (name, klass) in globals().items() if name.startswith('Youtube') and name.endswith('IE') and klass is not cls)
+        if any(ie.suitable(url) for ie in other_yt_ies):
             return False
         else:
             return super(YoutubeUserIE, cls).suitable(url)
 
 
+class YoutubeLiveIE(YoutubeBaseInfoExtractor):
+    IE_DESC = 'YouTube.com live streams'
+    _VALID_URL = r'(?P<base_url>https?://(?:\w+\.)?youtube\.com/(?:user|channel)/(?P<id>[^/]+))/live'
+    IE_NAME = 'youtube:live'
+
+    _TESTS = [{
+        'url': 'http://www.youtube.com/user/TheYoungTurks/live',
+        'info_dict': {
+            'id': 'a48o2S1cPoo',
+            'ext': 'mp4',
+            'title': 'The Young Turks - Live Main Show',
+            'uploader': 'The Young Turks',
+            'uploader_id': 'TheYoungTurks',
+            'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/TheYoungTurks',
+            'upload_date': '20150715',
+            'license': 'Standard YouTube License',
+            'description': 'md5:438179573adcdff3c97ebb1ee632b891',
+            'categories': ['News & Politics'],
+            'tags': ['Cenk Uygur (TV Program Creator)', 'The Young Turks (Award-Winning Work)', 'Talk Show (TV Genre)'],
+            'like_count': int,
+            'dislike_count': int,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.youtube.com/channel/UC1yBKRuGpC1tSM73A0ZjYjQ/live',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        channel_id = mobj.group('id')
+        base_url = mobj.group('base_url')
+        webpage = self._download_webpage(url, channel_id, fatal=False)
+        if webpage:
+            page_type = self._og_search_property(
+                'type', webpage, 'page type', default=None)
+            video_id = self._html_search_meta(
+                'videoId', webpage, 'video id', default=None)
+            if page_type == 'video' and video_id and re.match(r'^[0-9A-Za-z_-]{11}$', video_id):
+                return self.url_result(video_id, YoutubeIE.ie_key())
+        return self.url_result(base_url)
+
+
+class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
+    IE_DESC = 'YouTube.com user/channel playlists'
+    _VALID_URL = r'https?://(?:\w+\.)?youtube\.com/(?:user|channel)/(?P<id>[^/]+)/playlists'
+    IE_NAME = 'youtube:playlists'
+
+    _TESTS = [{
+        'url': 'http://www.youtube.com/user/ThirstForScience/playlists',
+        'playlist_mincount': 4,
+        'info_dict': {
+            'id': 'ThirstForScience',
+            'title': 'Thirst for Science',
+        },
+    }, {
+        # with "Load more" button
+        'url': 'http://www.youtube.com/user/igorkle1/playlists?view=1&sort=dd',
+        'playlist_mincount': 70,
+        'info_dict': {
+            'id': 'igorkle1',
+            'title': 'Игорь Клейнер',
+        },
+    }, {
+        'url': 'https://www.youtube.com/channel/UCiU1dHvZObB2iP6xkJ__Icw/playlists',
+        'playlist_mincount': 17,
+        'info_dict': {
+            'id': 'UCiU1dHvZObB2iP6xkJ__Icw',
+            'title': 'Chem Player',
+        },
+    }]
+
+
 class YoutubeSearchIE(SearchInfoExtractor, YoutubePlaylistIE):
     IE_DESC = 'YouTube.com searches'
     # there doesn't appear to be a real limit, for example if you search for
@@ -1704,7 +2110,7 @@ class YoutubeSearchIE(SearchInfoExtractor, YoutubePlaylistIE):
                 'spf': 'navigate',
             }
             url_query.update(self._EXTRA_QUERY_ARGS)
-            result_url = 'https://www.youtube.com/results?' + compat_urllib_parse.urlencode(url_query)
+            result_url = 'https://www.youtube.com/results?' + compat_urllib_parse_urlencode(url_query)
             data = self._download_json(
                 result_url, video_id='query "%s"' % query,
                 note='Downloading page %s' % pagenum,
@@ -1736,13 +2142,16 @@ class YoutubeSearchDateIE(YoutubeSearchIE):
 class YoutubeSearchURLIE(InfoExtractor):
     IE_DESC = 'YouTube.com search URLs'
     IE_NAME = 'youtube:search_url'
-    _VALID_URL = r'https?://(?:www\.)?youtube\.com/results\?(.*?&)?search_query=(?P<query>[^&]+)(?:[&]|$)'
+    _VALID_URL = r'https?://(?:www\.)?youtube\.com/results\?(.*?&)?(?:search_query|q)=(?P<query>[^&]+)(?:[&]|$)'
     _TESTS = [{
         'url': 'https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video',
         'playlist_mincount': 5,
         'info_dict': {
             'title': 'youtube-dl test video',
         }
+    }, {
+        'url': 'https://www.youtube.com/results?q=test&sp=EgQIBBgB',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
@@ -1754,7 +2163,7 @@ class YoutubeSearchURLIE(InfoExtractor):
             r'(?s)<ol[^>]+class="item-section"(.*?)</ol>', webpage, 'result HTML')
 
         part_codes = re.findall(
-            r'(?s)<h3 class="yt-lockup-title">(.*?)</h3>', result_code)
+            r'(?s)<h3[^>]+class="[^"]*yt-lockup-title[^"]*"[^>]*>(.*?)</h3>', result_code)
         entries = []
         for part_code in part_codes:
             part_title = self._html_search_regex(
@@ -1776,13 +2185,13 @@ class YoutubeSearchURLIE(InfoExtractor):
         }
 
 
-class YoutubeShowIE(InfoExtractor):
+class YoutubeShowIE(YoutubePlaylistsBaseInfoExtractor):
     IE_DESC = 'YouTube.com (multi-season) shows'
     _VALID_URL = r'https?://www\.youtube\.com/show/(?P<id>[^?#]*)'
     IE_NAME = 'youtube:show'
     _TESTS = [{
-        'url': 'http://www.youtube.com/show/airdisasters',
-        'playlist_mincount': 3,
+        'url': 'https://www.youtube.com/show/airdisasters',
+        'playlist_mincount': 5,
         'info_dict': {
             'id': 'airdisasters',
             'title': 'Air Disasters',
@@ -1790,26 +2199,9 @@ class YoutubeShowIE(InfoExtractor):
     }]
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        playlist_id = mobj.group('id')
-        webpage = self._download_webpage(
-            url, playlist_id, 'Downloading show webpage')
-        # There's one playlist for each season of the show
-        m_seasons = list(re.finditer(r'href="(/playlist\?list=.*?)"', webpage))
-        self.to_screen('%s: Found %s seasons' % (playlist_id, len(m_seasons)))
-        entries = [
-            self.url_result(
-                'https://www.youtube.com' + season.group(1), 'YoutubePlaylist')
-            for season in m_seasons
-        ]
-        title = self._og_search_title(webpage, fatal=False)
-
-        return {
-            '_type': 'playlist',
-            'id': playlist_id,
-            'title': title,
-            'entries': entries,
-        }
+        playlist_id = self._match_id(url)
+        return super(YoutubeShowIE, self)._real_extract(
+            'https://www.youtube.com/show/%s/playlists' % playlist_id)
 
 
 class YoutubeFeedsInfoExtractor(YoutubeBaseInfoExtractor):
@@ -1864,11 +2256,20 @@ class YoutubeFeedsInfoExtractor(YoutubeBaseInfoExtractor):
 class YoutubeWatchLaterIE(YoutubePlaylistIE):
     IE_NAME = 'youtube:watchlater'
     IE_DESC = 'Youtube watch later list, ":ytwatchlater" for short (requires authentication)'
-    _VALID_URL = r'https?://www\.youtube\.com/(?:feed/watch_later|playlist\?list=WL)|:ytwatchlater'
+    _VALID_URL = r'https?://www\.youtube\.com/(?:feed/watch_later|(?:playlist|watch)\?(?:.+&)?list=WL)|:ytwatchlater'
 
-    _TESTS = []  # override PlaylistIE tests
+    _TESTS = [{
+        'url': 'https://www.youtube.com/playlist?list=WL',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.youtube.com/watch?v=bCNU9TrbiRk&index=1&list=WL',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
+        video = self._check_download_just_video(url, 'WL')
+        if video:
+            return video
         return self._extract_playlist('WL')
 
 
@@ -1916,6 +2317,7 @@ class YoutubeTruncatedURLIE(InfoExtractor):
             annotation_id=annotation_[^&]+|
             x-yt-cl=[0-9]+|
             hl=[^&]*|
+            t=[0-9]+
         )?
         |
             attribution_link\?a=[^&]+
@@ -1938,6 +2340,9 @@ class YoutubeTruncatedURLIE(InfoExtractor):
     }, {
         'url': 'https://www.youtube.com/watch?hl=en-GB',
         'only_matching': True,
+    }, {
+        'url': 'https://www.youtube.com/watch?t=2372',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
index 98f15177bd6665bd1c6b96a071d59d4b67e5d918..2ef17727592405b7bb20b378403d82470b52ce2f 100644 (file)
@@ -9,89 +9,18 @@ from ..utils import (
     int_or_none,
     unified_strdate,
     OnDemandPagedList,
+    xpath_text,
+    determine_ext,
+    qualities,
+    float_or_none,
+    ExtractorError,
 )
 
 
-def extract_from_xml_url(ie, video_id, xml_url):
-    doc = ie._download_xml(
-        xml_url, video_id,
-        note='Downloading video info',
-        errnote='Failed to download video info')
-
-    title = doc.find('.//information/title').text
-    description = doc.find('.//information/detail').text
-    duration = int(doc.find('.//details/lengthSec').text)
-    uploader_node = doc.find('.//details/originChannelTitle')
-    uploader = None if uploader_node is None else uploader_node.text
-    uploader_id_node = doc.find('.//details/originChannelId')
-    uploader_id = None if uploader_id_node is None else uploader_id_node.text
-    upload_date = unified_strdate(doc.find('.//details/airtime').text)
-
-    def xml_to_format(fnode):
-        video_url = fnode.find('url').text
-        is_available = 'http://www.metafilegenerator' not in video_url
-
-        format_id = fnode.attrib['basetype']
-        format_m = re.match(r'''(?x)
-            (?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_
-            (?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+)
-        ''', format_id)
-
-        ext = format_m.group('container')
-        proto = format_m.group('proto').lower()
-
-        quality = fnode.find('./quality').text
-        abr = int(fnode.find('./audioBitrate').text) // 1000
-        vbr_node = fnode.find('./videoBitrate')
-        vbr = None if vbr_node is None else int(vbr_node.text) // 1000
-
-        width_node = fnode.find('./width')
-        width = None if width_node is None else int_or_none(width_node.text)
-        height_node = fnode.find('./height')
-        height = None if height_node is None else int_or_none(height_node.text)
-
-        format_note = ''
-        if not format_note:
-            format_note = None
-
-        return {
-            'format_id': format_id + '-' + quality,
-            'url': video_url,
-            'ext': ext,
-            'acodec': format_m.group('acodec'),
-            'vcodec': format_m.group('vcodec'),
-            'abr': abr,
-            'vbr': vbr,
-            'width': width,
-            'height': height,
-            'filesize': int_or_none(fnode.find('./filesize').text),
-            'format_note': format_note,
-            'protocol': proto,
-            '_available': is_available,
-        }
-
-    format_nodes = doc.findall('.//formitaeten/formitaet')
-    formats = list(filter(
-        lambda f: f['_available'],
-        map(xml_to_format, format_nodes)))
-    ie._sort_formats(formats)
-
-    return {
-        'id': video_id,
-        'title': title,
-        'description': description,
-        'duration': duration,
-        'uploader': uploader,
-        'uploader_id': uploader_id,
-        'upload_date': upload_date,
-        'formats': formats,
-    }
-
-
 class ZDFIE(InfoExtractor):
     _VALID_URL = r'(?:zdf:|zdf:video:|https?://www\.zdf\.de/ZDFmediathek(?:#)?/(.*beitrag/(?:video/)?))(?P<id>[0-9]+)(?:/[^/?]+)?(?:\?.*)?'
 
-    _TEST = {
+    _TESTS = [{
         'url': 'http://www.zdf.de/ZDFmediathek/beitrag/video/2037704/ZDFspezial---Ende-des-Machtpokers--?bc=sts;stt',
         'info_dict': {
             'id': '2037704',
@@ -104,23 +33,197 @@ class ZDFIE(InfoExtractor):
             'upload_date': '20131127',
         },
         'skip': 'Videos on ZDF.de are depublicised in short order',
-    }
+    }]
+
+    def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
+        param_groups = {}
+        for param_group in smil.findall(self._xpath_ns('./head/paramGroup', namespace)):
+            group_id = param_group.attrib.get(self._xpath_ns('id', 'http://www.w3.org/XML/1998/namespace'))
+            params = {}
+            for param in param_group:
+                params[param.get('name')] = param.get('value')
+            param_groups[group_id] = params
+
+        formats = []
+        for video in smil.findall(self._xpath_ns('.//video', namespace)):
+            src = video.get('src')
+            if not src:
+                continue
+            bitrate = float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
+            group_id = video.get('paramGroup')
+            param_group = param_groups[group_id]
+            for proto in param_group['protocols'].split(','):
+                formats.append({
+                    'url': '%s://%s' % (proto, param_group['host']),
+                    'app': param_group['app'],
+                    'play_path': src,
+                    'ext': 'flv',
+                    'format_id': '%s-%d' % (proto, bitrate),
+                    'tbr': bitrate,
+                })
+        self._sort_formats(formats)
+        return formats
+
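
_parse_smil_formats resolves each <video> against its paramGroup to build one RTMP format per protocol. A namespace-free sketch of the same walk (the real feed uses an xml:id attribute and a SMIL namespace, which _xpath_ns handles; the values here are hypothetical):

import xml.etree.ElementTree as ET

smil = ET.fromstring(
    '<smil><head>'
    '<paramGroup id="gl-vod">'
    '<param name="host" value="fms.example.invalid"/>'
    '<param name="app" value="zdf/"/>'
    '<param name="protocols" value="rtmp,rtmpt"/>'
    '</paramGroup>'
    '</head><body>'
    '<video src="mp4:clip_vh.mp4" paramGroup="gl-vod" system-bitrate="1500000"/>'
    '</body></smil>')

param_groups = {
    pg.get('id'): {p.get('name'): p.get('value') for p in pg}
    for pg in smil.findall('./head/paramGroup')
}
for video in smil.findall('.//video'):
    group = param_groups[video.get('paramGroup')]
    for proto in group['protocols'].split(','):
        print('%s://%s  app=%s  play_path=%s' % (
            proto, group['host'], group['app'], video.get('src')))
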
+    def extract_from_xml_url(self, video_id, xml_url):
+        doc = self._download_xml(
+            xml_url, video_id,
+            note='Downloading video info',
+            errnote='Failed to download video info')
+
+        status_code = doc.find('./status/statuscode')
+        if status_code is not None and status_code.text != 'ok':
+            code = status_code.text
+            if code == 'notVisibleAnymore':
+                message = 'Video %s is not available' % video_id
+            else:
+                message = '%s returned error: %s' % (self.IE_NAME, code)
+            raise ExtractorError(message, expected=True)
+
+        title = doc.find('.//information/title').text
+        description = xpath_text(doc, './/information/detail', 'description')
+        duration = int_or_none(xpath_text(doc, './/details/lengthSec', 'duration'))
+        uploader = xpath_text(doc, './/details/originChannelTitle', 'uploader')
+        uploader_id = xpath_text(doc, './/details/originChannelId', 'uploader id')
+        upload_date = unified_strdate(xpath_text(doc, './/details/airtime', 'upload date'))
+        subtitles = {}
+        captions_url = doc.find('.//caption/url')
+        if captions_url is not None:
+            subtitles['de'] = [{
+                'url': captions_url.text,
+                'ext': 'ttml',
+            }]
+
+        def xml_to_thumbnails(fnode):
+            thumbnails = []
+            for node in fnode:
+                thumbnail_url = node.text
+                if not thumbnail_url:
+                    continue
+                thumbnail = {
+                    'url': thumbnail_url,
+                }
+                if 'key' in node.attrib:
+                    m = re.match('^([0-9]+)x([0-9]+)$', node.attrib['key'])
+                    if m:
+                        thumbnail['width'] = int(m.group(1))
+                        thumbnail['height'] = int(m.group(2))
+                thumbnails.append(thumbnail)
+            return thumbnails
+
+        thumbnails = xml_to_thumbnails(doc.findall('.//teaserimages/teaserimage'))
+
+        format_nodes = doc.findall('.//formitaeten/formitaet')
+        quality = qualities(['veryhigh', 'high', 'med', 'low'])
+
+        def get_quality(elem):
+            return quality(xpath_text(elem, 'quality'))
+        format_nodes.sort(key=get_quality)
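+        # Note: qualities() maps each name to its index in this list
+        # ('veryhigh' -> 0, 'low' -> 3, unknown -> -1), so the ascending sort
+        # puts the best variants first and the format_ids check below keeps
+        # only the first node per duplicated format id.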
+        format_ids = []
+        formats = []
+        for fnode in format_nodes:
+            video_url = fnode.find('url').text
+            is_available = 'http://www.metafilegenerator' not in video_url
+            if not is_available:
+                continue
+            format_id = fnode.attrib['basetype']
+            quality = xpath_text(fnode, './quality', 'quality')
+            format_m = re.match(r'''(?x)
+                (?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_
+                (?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+)
+            ''', format_id)
+
+            ext = determine_ext(video_url, None) or format_m.group('container')
+            if ext not in ('smil', 'f4m', 'm3u8'):
+                format_id = format_id + '-' + quality
+            if format_id in format_ids:
+                continue
+
+            if ext == 'meta':
+                continue
+            elif ext == 'smil':
+                formats.extend(self._extract_smil_formats(
+                    video_url, video_id, fatal=False))
+            elif ext == 'm3u8':
+                # the certificates are misconfigured (see
+                # https://github.com/rg3/youtube-dl/issues/8665)
+                if video_url.startswith('https://'):
+                    continue
+                formats.extend(self._extract_m3u8_formats(
+                    video_url, video_id, 'mp4', m3u8_id=format_id, fatal=False))
+            elif ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    video_url, video_id, f4m_id=format_id, fatal=False))
+            else:
+                proto = format_m.group('proto').lower()
+
+                abr = int_or_none(xpath_text(fnode, './audioBitrate', 'abr'), 1000)
+                vbr = int_or_none(xpath_text(fnode, './videoBitrate', 'vbr'), 1000)
+
+                width = int_or_none(xpath_text(fnode, './width', 'width'))
+                height = int_or_none(xpath_text(fnode, './height', 'height'))
+
+                filesize = int_or_none(xpath_text(fnode, './filesize', 'filesize'))
+
+                format_note = None  # nothing extra to report for these formats
+
+                formats.append({
+                    'format_id': format_id,
+                    'url': video_url,
+                    'ext': ext,
+                    'acodec': format_m.group('acodec'),
+                    'vcodec': format_m.group('vcodec'),
+                    'abr': abr,
+                    'vbr': vbr,
+                    'width': width,
+                    'height': height,
+                    'filesize': filesize,
+                    'format_note': format_note,
+                    'protocol': proto,
+                    '_available': is_available,
+                })
+            format_ids.append(format_id)
+
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'thumbnails': thumbnails,
+            'uploader': uploader,
+            'uploader_id': uploader_id,
+            'upload_date': upload_date,
+            'formats': formats,
+            'subtitles': subtitles,
+        }
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
         xml_url = 'http://www.zdf.de/ZDFmediathek/xmlservice/web/beitragsDetails?ak=web&id=%s' % video_id
-        return extract_from_xml_url(self, video_id, xml_url)
+        return self.extract_from_xml_url(video_id, xml_url)
 
 
 class ZDFChannelIE(InfoExtractor):
-    _VALID_URL = r'(?:zdf:topic:|https?://www\.zdf\.de/ZDFmediathek(?:#)?/.*kanaluebersicht/)(?P<id>[0-9]+)'
-    _TEST = {
+    _VALID_URL = r'(?:zdf:topic:|https?://www\.zdf\.de/ZDFmediathek(?:#)?/.*kanaluebersicht/(?:[^/]+/)?)(?P<id>[0-9]+)'
+    _TESTS = [{
         'url': 'http://www.zdf.de/ZDFmediathek#/kanaluebersicht/1586442/sendung/Titanic',
         'info_dict': {
             'id': '1586442',
         },
         'playlist_count': 3,
-    }
+    }, {
+        'url': 'http://www.zdf.de/ZDFmediathek/kanaluebersicht/aktuellste/332',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.zdf.de/ZDFmediathek/kanaluebersicht/meist-gesehen/332',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.zdf.de/ZDFmediathek/kanaluebersicht/_/1798716?bc=nrt;nrm?flash=off',
+        'only_matching': True,
+    }]
     _PAGE_SIZE = 50
 
     def _fetch_page(self, channel_id, page):
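
The ZDF format loop above keys de-duplication on a quality-suffixed format_id, drops 'meta' stubs, skips HTTPS m3u8 URLs because of the misconfigured certificates, and expands smil/f4m/m3u8 manifests in place. A minimal standalone sketch of the de-duplication part (data hypothetical):

    seen = []
    candidates = [('h264_aac_mp4_http', 'high'), ('h264_aac_mp4_http', 'high')]
    formats = []
    for base_id, quality in candidates:
        format_id = base_id + '-' + quality
        if format_id in seen:
            continue  # same container/codec/quality combination already collected
        formats.append({'format_id': format_id})
        seen.append(format_id)
    assert [f['format_id'] for f in formats] == ['h264_aac_mp4_http-high']
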
index 7dc1e2f2bd3f36e2b71e199fb5e5f6f2cc4e18e9..437eecb6737161c9d730bf9a93eaed1bdb541799 100644 (file)
@@ -9,9 +9,11 @@ from ..utils import ExtractorError
 
 class ZingMp3BaseInfoExtractor(InfoExtractor):
 
-    def _extract_item(self, item):
+    def _extract_item(self, item, fatal=True):
         error_message = item.find('./errormessage').text
         if error_message:
+            if not fatal:
+                return
             raise ExtractorError(
                 '%s returned error: %s' % (self.IE_NAME, error_message),
                 expected=True)
@@ -43,7 +45,9 @@ class ZingMp3BaseInfoExtractor(InfoExtractor):
             entries = []
 
             for i, item in enumerate(items, 1):
-                entry = self._extract_item(item)
+                entry = self._extract_item(item, fatal=False)
+                if not entry:
+                    continue
                 entry['id'] = '%s-%d' % (id, i)
                 entries.append(entry)
 
@@ -85,7 +89,7 @@ class ZingMp3SongIE(ZingMp3BaseInfoExtractor):
 
 
 class ZingMp3AlbumIE(ZingMp3BaseInfoExtractor):
-    _VALID_URL = r'https?://mp3\.zing\.vn/album/(?P<slug>[^/]+)/(?P<album_id>\w+)\.html'
+    _VALID_URL = r'https?://mp3\.zing\.vn/(?:album|playlist)/(?P<slug>[^/]+)/(?P<album_id>\w+)\.html'
     _TESTS = [{
         'url': 'http://mp3.zing.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
         'info_dict': {
@@ -94,6 +98,9 @@ class ZingMp3AlbumIE(ZingMp3BaseInfoExtractor):
             'title': 'Lâu Đài Tình Ái - Bằng Kiều ft. Minh Tuyết | Album 320 lossless',
         },
         'playlist_count': 10,
+    }, {
+        'url': 'http://mp3.zing.vn/playlist/Duong-Hong-Loan-apollobee/IWCAACCB.html',
+        'only_matching': True,
     }]
     IE_NAME = 'zingmp3:album'
     IE_DESC = 'mp3.zing.vn albums'
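
With fatal=False, _extract_item turns a per-entry error message into a skipped entry instead of aborting the whole album. The pattern in isolation (hypothetical items, not the extractor's real XML handling):

    def extract_item(item, fatal=True):
        if item.get('errormessage'):
            if not fatal:
                return None  # caller drops this entry
            raise ValueError(item['errormessage'])
        return {'title': item['title']}

    items = [{'title': 'ok'}, {'errormessage': 'removed'}, {'title': 'also ok'}]
    entries = [e for e in (extract_item(i, fatal=False) for i in items) if e]
    assert len(entries) == 2
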
diff --git a/youtube_dl/extractor/zippcast.py b/youtube_dl/extractor/zippcast.py
new file mode 100644 (file)
index 0000000..de81937
--- /dev/null
@@ -0,0 +1,94 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    determine_ext,
+    str_to_int,
+)
+
+
+class ZippCastIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?zippcast\.com/(?:video/|videoview\.php\?.*\bvplay=)(?P<id>[0-9a-zA-Z]+)'
+    _TESTS = [{
+        # m3u8, hq direct link
+        'url': 'http://www.zippcast.com/video/c9cfd5c7e44dbc29c81',
+        'md5': '5ea0263b5606866c4d6cda0fc5e8c6b6',
+        'info_dict': {
+            'id': 'c9cfd5c7e44dbc29c81',
+            'ext': 'mp4',
+            'title': '[Vinesauce] Vinny - Digital Space Traveler',
+            'description': 'Muted on youtube, but now uploaded in it\'s original form.',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'uploader': 'vinesauce',
+            'view_count': int,
+            'categories': ['Entertainment'],
+            'tags': list,
+        },
+    }, {
+        # f4m, lq ipod direct link
+        'url': 'http://www.zippcast.com/video/b79c0a233e9c6581775',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.zippcast.com/videoview.php?vplay=c9cfd5c7e44dbc29c81&auto=no',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(
+            'http://www.zippcast.com/video/%s' % video_id, video_id)
+
+        formats = []
+        video_url = self._search_regex(
+            r'<source[^>]+src=(["\'])(?P<url>.+?)\1', webpage,
+            'video url', default=None, group='url')
+        if video_url:
+            formats.append({
+                'url': video_url,
+                'format_id': 'http',
+                'preference': 0,  # direct link is almost always of worse quality
+            })
+        src_url = self._search_regex(
+            r'src\s*:\s*(?:escape\()?(["\'])(?P<url>http://.+?)\1',
+            webpage, 'src', default=None, group='url')
+        ext = determine_ext(src_url)
+        if ext == 'm3u8':
+            formats.extend(self._extract_m3u8_formats(
+                src_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                m3u8_id='hls', fatal=False))
+        elif ext == 'f4m':
+            formats.extend(self._extract_f4m_formats(
+                src_url, video_id, f4m_id='hds', fatal=False))
+        self._sort_formats(formats)
+
+        title = self._og_search_title(webpage)
+        description = self._og_search_description(webpage) or self._html_search_meta(
+            'description', webpage)
+        uploader = self._search_regex(
+            r'<a[^>]+href="https?://[^/]+/profile/[^>]+>([^<]+)</a>',
+            webpage, 'uploader', fatal=False)
+        thumbnail = self._og_search_thumbnail(webpage)
+        view_count = str_to_int(self._search_regex(
+            r'>([\d,.]+) views!', webpage, 'view count', fatal=False))
+
+        categories = re.findall(
+            r'<a[^>]+href="https?://[^/]+/categories/[^"]+">([^<]+),?<',
+            webpage)
+        tags = re.findall(
+            r'<a[^>]+href="https?://[^/]+/search/tags/[^"]+">([^<]+),?<',
+            webpage)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'uploader': uploader,
+            'view_count': view_count,
+            'categories': categories,
+            'tags': tags,
+            'formats': formats,
+        }
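
ZippCast's view counter is scraped as a human-formatted string ('1,234 views!'), hence str_to_int rather than a bare int(). Roughly what str_to_int does with such input (reimplemented here as a standalone sketch for illustration):

    import re

    def str_to_int(int_str):
        # Relaxed int(): drop digit-grouping characters, then convert.
        if int_str is None:
            return None
        return int(re.sub(r'[,\.\+]', '', int_str))

    assert str_to_int('1,234') == 1234
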
index 0e0c7d90d5aa2fbb8039dddf642ac4692f2974a7..a7440c58242079ea1c6874e1bed0abe756fdc814 100644 (file)
@@ -214,7 +214,7 @@ class JSInterpreter(object):
         obj = {}
         obj_m = re.search(
             (r'(?:var\s+)?%s\s*=\s*\{' % re.escape(objname)) +
-            r'\s*(?P<fields>([a-zA-Z$0-9]+\s*:\s*function\(.*?\)\s*\{.*?\})*)' +
+            r'\s*(?P<fields>([a-zA-Z$0-9]+\s*:\s*function\(.*?\)\s*\{.*?\}(?:,\s*)?)*)' +
             r'\}\s*;',
             self.code)
         fields = obj_m.group('fields')
@@ -232,10 +232,10 @@ class JSInterpreter(object):
     def extract_function(self, funcname):
         func_m = re.search(
             r'''(?x)
-                (?:function\s+%s|[{;]%s\s*=\s*function)\s*
+                (?:function\s+%s|[{;,]%s\s*=\s*function|var\s+%s\s*=\s*function)\s*
                 \((?P<args>[^)]*)\)\s*
                 \{(?P<code>[^}]+)\}''' % (
-                re.escape(funcname), re.escape(funcname)),
+                re.escape(funcname), re.escape(funcname), re.escape(funcname)),
             self.code)
         if func_m is None:
             raise ExtractorError('Could not find JS function %r' % funcname)
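
Both jsinterp changes widen what the interpreter can locate in minified player code: extract_object now tolerates a comma between object fields, and extract_function accepts a third declaration style, var f = function(...){...}, in addition to function f(...) and f = function(...). Checking the new extract_function pattern against all three forms (standalone sketch):

    import re

    funcname = 'deobf'
    pattern = r'''(?x)
        (?:function\s+%s|[{;,]%s\s*=\s*function|var\s+%s\s*=\s*function)\s*
        \((?P<args>[^)]*)\)\s*
        \{(?P<code>[^}]+)\}''' % ((re.escape(funcname),) * 3)

    for snippet in ('function deobf(a){return a;}',
                    ';deobf=function(a){return a;}',
                    'var deobf=function(a){return a;}'):
        assert re.search(pattern, snippet), snippet
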
index 9016e34983d3fed5e0fab72e9a8626124cdee859..d1f8d1331cf153a58a42b4220ebe37b441f3df4b 100644 (file)
@@ -2,7 +2,6 @@ from __future__ import unicode_literals
 
 import os.path
 import optparse
-import shlex
 import sys
 
 from .downloader.external import list_external_downloaders
@@ -11,6 +10,7 @@ from .compat import (
     compat_get_terminal_size,
     compat_getenv,
     compat_kwargs,
+    compat_shlex_split,
 )
 from .utils import (
     preferredencoding,
@@ -28,7 +28,7 @@ def parseOpts(overrideArguments=None):
         try:
             res = []
             for l in optionf:
-                res += shlex.split(l, comments=True)
+                res += compat_shlex_split(l, comments=True)
         finally:
             optionf.close()
         return res
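
compat_shlex_split replaces shlex.split here because Python 2's shlex operates on bytes and mangles unicode input (see http://bugs.python.org/issue1548891), which matters for config files containing non-ASCII paths. Roughly what such a wrapper does (a sketch; the real one lives in youtube_dl/compat.py):

    import shlex
    import sys

    def shlex_split_sketch(s, comments=False, posix=True):
        # On Python 2, round-trip through UTF-8 so shlex sees bytes, not unicode.
        if sys.version_info < (3, 0) and not isinstance(s, bytes):
            return [t.decode('utf-8')
                    for t in shlex.split(s.encode('utf-8'), comments, posix)]
        return shlex.split(s, comments, posix)
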
@@ -85,7 +85,7 @@ def parseOpts(overrideArguments=None):
         if option.takes_value():
             opts.append(' %s' % option.metavar)
 
-        return "".join(opts)
+        return ''.join(opts)
 
     def _comma_separated_values_options_callback(option, opt_str, value, parser):
         setattr(parser.values, option.dest, value.split(','))
@@ -170,6 +170,14 @@ def parseOpts(overrideArguments=None):
         action='store_const', dest='extract_flat', const='in_playlist',
         default=False,
         help='Do not extract the videos of a playlist, only list them.')
+    general.add_option(
+        '--mark-watched',
+        action='store_true', dest='mark_watched', default=False,
+        help='Mark videos watched (YouTube only)')
+    general.add_option(
+        '--no-mark-watched',
+        action='store_false', dest='mark_watched', default=False,
+        help='Do not mark videos watched (YouTube only)')
     general.add_option(
         '--no-color', '--no-colors',
         action='store_true', dest='no_color',
@@ -276,7 +284,7 @@ def parseOpts(overrideArguments=None):
             'For example, to only match videos that have been liked more than '
             '100 times and disliked less than 50 times (or the dislike '
             'functionality is not available at the given service), but who '
-            'also have a description, use  --match-filter '
+            'also have a description, use --match-filter '
             '"like_count > 100 & dislike_count <? 50 & description" .'
         ))
     selection.add_option(
@@ -320,7 +328,7 @@ def parseOpts(overrideArguments=None):
     authentication.add_option(
         '--video-password',
         dest='videopassword', metavar='PASSWORD',
-        help='Video password (vimeo, smotri)')
+        help='Video password (vimeo, smotri, youku)')
 
     video_format = optparse.OptionGroup(parser, 'Video Format Options')
     video_format.add_option(
@@ -338,7 +346,7 @@ def parseOpts(overrideArguments=None):
     video_format.add_option(
         '-F', '--list-formats',
         action='store_true', dest='listformats',
-        help='List all available formats')
+        help='List all available formats of requested videos')
     video_format.add_option(
         '--youtube-include-dash-manifest',
         action='store_true', dest='youtube_include_dash_manifest', default=True,
@@ -363,7 +371,7 @@ def parseOpts(overrideArguments=None):
     subtitles.add_option(
         '--write-auto-sub', '--write-automatic-sub',
         action='store_true', dest='writeautomaticsub', default=False,
-        help='Write automatic subtitle file (YouTube only)')
+        help='Write automatically generated subtitle file (YouTube only)')
     subtitles.add_option(
         '--all-subs',
         action='store_true', dest='allsubtitles', default=False,
@@ -380,7 +388,7 @@ def parseOpts(overrideArguments=None):
         '--sub-lang', '--sub-langs', '--srt-lang',
         action='callback', dest='subtitleslangs', metavar='LANGS', type='str',
         default=[], callback=_comma_separated_values_options_callback,
-        help='Languages of the subtitles to download (optional) separated by commas, use IETF language tags like \'en,pt\'')
+        help='Languages of the subtitles to download (optional) separated by commas, use --list-subs for available language tags')
 
     downloader = optparse.OptionGroup(parser, 'Download Options')
     downloader.add_option(
@@ -391,6 +399,10 @@ def parseOpts(overrideArguments=None):
         '-R', '--retries',
         dest='retries', metavar='RETRIES', default=10,
         help='Number of retries (default is %default), or "infinite".')
+    downloader.add_option(
+        '--fragment-retries',
+        dest='fragment_retries', metavar='RETRIES', default=10,
+        help='Number of retries for a fragment (default is %default), or "infinite" (DASH only)')
     downloader.add_option(
         '--buffer-size',
         dest='buffersize', metavar='SIZE', default='1024',
@@ -413,8 +425,17 @@ def parseOpts(overrideArguments=None):
         help='Set file xattribute ytdl.filesize with expected filesize (experimental)')
     downloader.add_option(
         '--hls-prefer-native',
-        dest='hls_prefer_native', action='store_true',
-        help='Use the native HLS downloader instead of ffmpeg (experimental)')
+        dest='hls_prefer_native', action='store_true', default=None,
+        help='Use the native HLS downloader instead of ffmpeg')
+    downloader.add_option(
+        '--hls-prefer-ffmpeg',
+        dest='hls_prefer_native', action='store_false', default=None,
+        help='Use ffmpeg instead of the native HLS downloader')
+    downloader.add_option(
+        '--hls-use-mpegts',
+        dest='hls_use_mpegts', action='store_true',
+        help='Use the mpegts container for HLS videos, allowing the video to '
+             'be played while downloading (some players may not be able to play it)')
     downloader.add_option(
         '--external-downloader',
         dest='external_downloader', metavar='COMMAND',
@@ -602,7 +623,7 @@ def parseOpts(overrideArguments=None):
     filesystem.add_option(
         '-A', '--auto-number',
         action='store_true', dest='autonumber', default=False,
-        help='[deprecated; use  -o "%(autonumber)s-%(title)s.%(ext)s" ] Number downloaded files starting from 00000')
+        help='[deprecated; use -o "%(autonumber)s-%(title)s.%(ext)s" ] Number downloaded files starting from 00000')
     filesystem.add_option(
         '-t', '--title',
         action='store_true', dest='usetitle', default=False,
@@ -707,7 +728,7 @@ def parseOpts(overrideArguments=None):
     postproc.add_option(
         '--embed-subs',
         action='store_true', dest='embedsubtitles', default=False,
-        help='Embed subtitles in the video (only for mkv and mp4 videos)')
+        help='Embed subtitles in the video (only for mp4, webm and mkv videos)')
     postproc.add_option(
         '--embed-thumbnail',
         action='store_true', dest='embedthumbnail', default=False,
@@ -752,7 +773,7 @@ def parseOpts(overrideArguments=None):
         metavar='CMD', dest='exec_cmd',
         help='Execute a command on the file after downloading, similar to find\'s -exec syntax. Example: --exec \'adb push {} /sdcard/Music/ && rm {}\'')
     postproc.add_option(
-        '--convert-subtitles', '--convert-subs',
+        '--convert-subs', '--convert-subtitles',
         metavar='FORMAT', dest='convertsubtitles', default=None,
         help='Convert the subtitles to other format (currently supported: srt|ass|vtt)')
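
Worth noting among the option changes above: --hls-prefer-native and --hls-prefer-ffmpeg now write to the same destination with default=None, yielding a tri-state (undecided/native/ffmpeg) where the last flag given wins. The pattern in isolation:

    import optparse

    parser = optparse.OptionParser()
    parser.add_option('--hls-prefer-native', dest='hls_prefer_native',
                      action='store_true', default=None)
    parser.add_option('--hls-prefer-ffmpeg', dest='hls_prefer_native',
                      action='store_false', default=None)

    opts, _ = parser.parse_args([])
    assert opts.hls_prefer_native is None  # no preference: the program decides
    opts, _ = parser.parse_args(['--hls-prefer-ffmpeg'])
    assert opts.hls_prefer_native is False
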
 
index 0d8ef6ca26c6ef7f1b7b402b387d20eebd3f8a8f..3ea5183999d5ed2adacbb05bccc6af93e8ac6750 100644 (file)
@@ -6,6 +6,7 @@ from .ffmpeg import (
     FFmpegEmbedSubtitlePP,
     FFmpegExtractAudioPP,
     FFmpegFixupStretchedPP,
+    FFmpegFixupM3u8PP,
     FFmpegFixupM4aPP,
     FFmpegMergerPP,
     FFmpegMetadataPP,
@@ -26,6 +27,7 @@ __all__ = [
     'ExecAfterDownloadPP',
     'FFmpegEmbedSubtitlePP',
     'FFmpegExtractAudioPP',
+    'FFmpegFixupM3u8PP',
     'FFmpegFixupM4aPP',
     'FFmpegFixupStretchedPP',
     'FFmpegMergerPP',
index 4191d040bb1da468e248d57b78df1afdacf64cd0..599dd1df2b2fa16b05548ef847d41b3d6d7c9560 100644 (file)
@@ -4,6 +4,7 @@ import os
 
 from ..utils import (
     PostProcessingError,
+    cli_configuration_args,
     encodeFilename,
 )
 
@@ -61,11 +62,7 @@ class PostProcessor(object):
             self._downloader.report_warning(errnote)
 
     def _configuration_args(self, default=[]):
-        pp_args = self._downloader.params.get('postprocessor_args')
-        if pp_args is None:
-            return default
-        assert isinstance(pp_args, list)
-        return pp_args
+        return cli_configuration_args(self._downloader.params, 'postprocessor_args', default)
 
 
 class AudioConversionError(PostProcessingError):
index e19dbf73d5fe36c602d9ffb83cd2d02ab39cb5e1..3bad5a266b6d51aaf0c92224a94986957da230f2 100644 (file)
@@ -40,7 +40,7 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
                 'Skipping embedding the thumbnail because the file is missing.')
             return [], info
 
-        if info['ext'] == 'mp3':
+        if info['ext'] in ('mp3', 'mkv'):
             options = [
                 '-c', 'copy', '-map', '0', '-map', '1',
                 '-metadata:s:v', 'title="Album cover"', '-metadata:s:v', 'comment="Cover (Front)"']
index 13794b7ba8653b179a08a348744441ae5c296852..74f66d669c0679a9eece06b1924ecc9f5dae00d2 100644 (file)
@@ -19,7 +19,7 @@ class ExecAfterDownloadPP(PostProcessor):
 
         cmd = cmd.replace('{}', shlex_quote(information['filepath']))
 
-        self._downloader.to_screen("[exec] Executing command: %s" % cmd)
+        self._downloader.to_screen('[exec] Executing command: %s' % cmd)
         retCode = subprocess.call(cmd, shell=True)
         if retCode != 0:
             raise PostProcessingError(
index 1f723908be8d4ff0247affc5aed9ffc44e777602..1793a878cb57b60ab7e59cb55f17eb4127c32f92 100644 (file)
@@ -25,6 +25,19 @@ from ..utils import (
 )
 
 
+EXT_TO_OUT_FORMATS = {
+    "aac": "adts",
+    "m4a": "ipod",
+    "mka": "matroska",
+    "mkv": "matroska",
+    "mpg": "mpeg",
+    "ogv": "ogg",
+    "ts": "mpegts",
+    "wma": "asf",
+    "wmv": "asf",
+}
+
+
 class FFmpegPostProcessorError(PostProcessingError):
     pass
 
@@ -52,7 +65,7 @@ class FFmpegPostProcessor(PostProcessor):
 
     def _determine_executables(self):
         programs = ['avprobe', 'avconv', 'ffmpeg', 'ffprobe']
-        prefer_ffmpeg = self._downloader.params.get('prefer_ffmpeg', False)
+        prefer_ffmpeg = False
 
         self.basename = None
         self.probe_basename = None
@@ -60,6 +73,7 @@ class FFmpegPostProcessor(PostProcessor):
         self._paths = None
         self._versions = None
         if self._downloader:
+            prefer_ffmpeg = self._downloader.params.get('prefer_ffmpeg', False)
             location = self._downloader.params.get('ffmpeg_location')
             if location is not None:
                 if not os.path.exists(location):
@@ -135,7 +149,10 @@ class FFmpegPostProcessor(PostProcessor):
 
         files_cmd = []
         for path in input_paths:
-            files_cmd.extend([encodeArgument('-i'), encodeFilename(path, True)])
+            files_cmd.extend([
+                encodeArgument('-i'),
+                encodeFilename(self._ffmpeg_filename_argument(path), True)
+            ])
         cmd = ([encodeFilename(self.executable, True), encodeArgument('-y')] +
                files_cmd +
                [encodeArgument(o) for o in opts] +
@@ -155,10 +172,11 @@ class FFmpegPostProcessor(PostProcessor):
         self.run_ffmpeg_multiple_files([path], out_path, opts)
 
     def _ffmpeg_filename_argument(self, fn):
-        # ffmpeg broke --, see https://ffmpeg.org/trac/ffmpeg/ticket/2127 for details
-        if fn.startswith('-'):
-            return './' + fn
-        return fn
+        # Always use 'file:' because the filename may contain ':' (ffmpeg
+        # interprets that as a protocol) or can start with '-' (-- is broken in
+        # ffmpeg, see https://ffmpeg.org/trac/ffmpeg/ticket/2127 for details)
+        # Also leave '-' intact in order not to break streaming to stdout.
+        return 'file:' + fn if fn != '-' else fn
 
 
 class FFmpegExtractAudioPP(FFmpegPostProcessor):
@@ -269,7 +287,7 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
             return [], information
 
         try:
-            self._downloader.to_screen('[' + self.basename + '] Destination: ' + new_path)
+            self._downloader.to_screen('[ffmpeg] Destination: ' + new_path)
             self.run_ffmpeg(path, new_path, acodec, more_opts)
         except AudioConversionError as e:
             raise PostProcessingError(
@@ -314,17 +332,34 @@ class FFmpegVideoConvertorPP(FFmpegPostProcessor):
 
 class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
     def run(self, information):
-        if information['ext'] not in ['mp4', 'mkv']:
-            self._downloader.to_screen('[ffmpeg] Subtitles can only be embedded in mp4 or mkv files')
+        if information['ext'] not in ('mp4', 'webm', 'mkv'):
+            self._downloader.to_screen('[ffmpeg] Subtitles can only be embedded in mp4, webm or mkv files')
             return [], information
         subtitles = information.get('requested_subtitles')
         if not subtitles:
             self._downloader.to_screen('[ffmpeg] There aren\'t any subtitles to embed')
             return [], information
 
-        sub_langs = list(subtitles.keys())
         filename = information['filepath']
-        sub_filenames = [subtitles_filename(filename, lang, sub_info['ext']) for lang, sub_info in subtitles.items()]
+
+        ext = information['ext']
+        sub_langs = []
+        sub_filenames = []
+        webm_vtt_warn = False
+
+        for lang, sub_info in subtitles.items():
+            sub_ext = sub_info['ext']
+            if ext != 'webm' or sub_ext == 'vtt':
+                sub_langs.append(lang)
+                sub_filenames.append(subtitles_filename(filename, lang, sub_ext))
+            else:
+                if not webm_vtt_warn:
+                    webm_vtt_warn = True
+                    self._downloader.to_screen('[ffmpeg] Only WebVTT subtitles can be embedded in webm files')
+
+        if not sub_langs:
+            return [], information
+
         input_files = [filename] + sub_filenames
 
         opts = [
@@ -459,6 +494,21 @@ class FFmpegFixupM4aPP(FFmpegPostProcessor):
         return [], info
 
 
+class FFmpegFixupM3u8PP(FFmpegPostProcessor):
+    def run(self, info):
+        filename = info['filepath']
+        temp_filename = prepend_extension(filename, 'temp')
+
+        options = ['-c', 'copy', '-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
+        self._downloader.to_screen('[ffmpeg] Fixing malformed aac bitstream in "%s"' % filename)
+        self.run_ffmpeg(filename, temp_filename, options)
+
+        os.remove(encodeFilename(filename))
+        os.rename(encodeFilename(temp_filename), encodeFilename(filename))
+
+        return [], info
+
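
HLS downloads carry AAC audio in ADTS framing, which mp4 containers reject as malformed; the aac_adtstoasc bitstream filter converts it during the stream copy. The ffmpeg invocation the class above effectively assembles, for reference (filenames hypothetical; the real code routes through run_ffmpeg with encoded, 'file:'-prefixed paths):

    import subprocess

    subprocess.call([
        'ffmpeg', '-y', '-i', 'file:video.mp4',
        '-c', 'copy', '-f', 'mp4', '-bsf:a', 'aac_adtstoasc',
        'file:video.temp.mp4',
    ])
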
+
 class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
     def __init__(self, downloader=None, format=None):
         super(FFmpegSubtitlesConvertorPP, self).__init__(downloader)
@@ -475,6 +525,7 @@ class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
             self._downloader.to_screen('[ffmpeg] There aren\'t any subtitles to convert')
             return [], info
         self._downloader.to_screen('[ffmpeg] Converting subtitles')
+        sub_filenames = []
         for lang, sub in subs.items():
             ext = sub['ext']
             if ext == new_ext:
@@ -482,14 +533,16 @@ class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
                     '[ffmpeg] Subtitle file for %s is already in the requested '
                     'format' % new_ext)
                 continue
+            old_file = subtitles_filename(filename, lang, ext)
+            sub_filenames.append(old_file)
             new_file = subtitles_filename(filename, lang, new_ext)
 
-            if ext == 'dfxp' or ext == 'ttml':
+            if ext == 'dfxp' or ext == 'ttml' or ext == 'tt':
                 self._downloader.report_warning(
                     'You have requested to convert dfxp (TTML) subtitles into another format, '
                     'which results in style information loss')
 
-                dfxp_file = subtitles_filename(filename, lang, ext)
+                dfxp_file = old_file
                 srt_file = subtitles_filename(filename, lang, 'srt')
 
                 with io.open(dfxp_file, 'rt', encoding='utf-8') as f:
@@ -497,8 +550,8 @@ class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
 
                 with io.open(srt_file, 'wt', encoding='utf-8') as f:
                     f.write(srt_data)
+                old_file = srt_file
 
-                ext = 'srt'
                 subs[lang] = {
                     'ext': 'srt',
                     'data': srt_data
@@ -506,15 +559,15 @@ class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
 
                 if new_ext == 'srt':
                     continue
+                else:
+                    sub_filenames.append(srt_file)
 
-            self.run_ffmpeg(
-                subtitles_filename(filename, lang, ext),
-                new_file, ['-f', new_format])
+            self.run_ffmpeg(old_file, new_file, ['-f', new_format])
 
             with io.open(new_file, 'rt', encoding='utf-8') as f:
                 subs[lang] = {
-                    'ext': ext,
+                    'ext': new_ext,
                     'data': f.read(),
                 }
 
-        return [], info
+        return sub_filenames, info
index a56077f206b5133f2fae3fadbcad10523353ed5a..42377fa0f0bde0d3fa6ae578c6eaa0fe4a73474d 100644 (file)
@@ -24,7 +24,7 @@ class MetadataFromTitlePP(PostProcessor):
            '(?P<title>.+)\ \-\ (?P<artist>.+)'
         """
         lastpos = 0
-        regex = ""
+        regex = ''
         # replace %(..)s with regex group and escape other string parts
         for match in re.finditer(r'%\((\w+)\)s', fmt):
             regex += re.escape(fmt[lastpos:match.start()])
index 7d88e130820e073af1b6fd527390cb1cb5dc8dec..e39ca60aa08326b6f05814ff800bb09c75755e48 100644 (file)
@@ -6,6 +6,7 @@ import sys
 import errno
 
 from .common import PostProcessor
+from ..compat import compat_os_name
 from ..utils import (
     check_executable,
     hyphenate_date,
@@ -73,22 +74,22 @@ class XAttrMetadataPP(PostProcessor):
                     raise XAttrMetadataError(e.errno, e.strerror)
 
         except ImportError:
-            if os.name == 'nt':
+            if compat_os_name == 'nt':
                 # Write xattrs to NTFS Alternate Data Streams:
                 # http://en.wikipedia.org/wiki/NTFS#Alternate_data_streams_.28ADS.29
                 def write_xattr(path, key, value):
                     assert ':' not in key
                     assert os.path.exists(path)
 
-                    ads_fn = path + ":" + key
+                    ads_fn = path + ':' + key
                     try:
-                        with open(ads_fn, "wb") as f:
+                        with open(ads_fn, 'wb') as f:
                             f.write(value)
                     except EnvironmentError as e:
                         raise XAttrMetadataError(e.errno, e.strerror)
             else:
-                user_has_setfattr = check_executable("setfattr", ['--version'])
-                user_has_xattr = check_executable("xattr", ['-h'])
+                user_has_setfattr = check_executable('setfattr', ['--version'])
+                user_has_xattr = check_executable('xattr', ['-h'])
 
                 if user_has_setfattr or user_has_xattr:
 
@@ -150,7 +151,7 @@ class XAttrMetadataPP(PostProcessor):
                 value = info.get(infoname)
 
                 if value:
-                    if infoname == "upload_date":
+                    if infoname == 'upload_date':
                         value = hyphenate_date(value)
 
                     byte_value = value.encode('utf-8')
@@ -168,7 +169,7 @@ class XAttrMetadataPP(PostProcessor):
                     'Unable to write extended attributes due to too long values.')
             else:
                 msg = 'This filesystem doesn\'t support extended attributes. '
-                if os.name == 'nt':
+                if compat_os_name == 'nt':
                     msg += 'You need to use NTFS.'
                 else:
                     msg += '(You may have to enable them in your /etc/fstab)'
index e60505ace8b8451666f2aeebea3277bc58cb6297..06c1d6cc1755ef022aa78967d4b651e21fd66618 100644 (file)
@@ -689,7 +689,7 @@ class SWFInterpreter(object):
                     elif mname in _builtin_classes:
                         res = _builtin_classes[mname]
                     else:
-                        # Assume unitialized
+                        # Assume uninitialized
                         # TODO warn here
                         res = undefined
                     stack.append(res)
index fc7ac8305d71c8cce077ef3040cd0903ac9f09c5..676ebe1c42d1d6b54eb50bfc3f087e6fee8e20f0 100644 (file)
@@ -9,65 +9,43 @@ import subprocess
 import sys
 from zipimport import zipimporter
 
-from .compat import (
-    compat_str,
-    compat_urllib_request,
-)
-from .utils import make_HTTPS_handler
+from .utils import encode_compat_str
+
 from .version import __version__
 
 
 def rsa_verify(message, signature, key):
-    from struct import pack
     from hashlib import sha256
-
     assert isinstance(message, bytes)
-    block_size = 0
-    n = key[0]
-    while n:
-        block_size += 1
-        n >>= 8
-    signature = pow(int(signature, 16), key[1], key[0])
-    raw_bytes = []
-    while signature:
-        raw_bytes.insert(0, pack("B", signature & 0xFF))
-        signature >>= 8
-    signature = (block_size - len(raw_bytes)) * b'\x00' + b''.join(raw_bytes)
-    if signature[0:2] != b'\x00\x01':
-        return False
-    signature = signature[2:]
-    if b'\x00' not in signature:
+    byte_size = (len(bin(key[0])) - 2 + 8 - 1) // 8
+    signature = ('%x' % pow(int(signature, 16), key[1], key[0])).encode()
+    signature = (byte_size * 2 - len(signature)) * b'0' + signature
+    asn1 = b'3031300d060960864801650304020105000420'
+    asn1 += sha256(message).hexdigest().encode()
+    if byte_size < len(asn1) // 2 + 11:
         return False
-    signature = signature[signature.index(b'\x00') + 1:]
-    if not signature.startswith(b'\x30\x31\x30\x0D\x06\x09\x60\x86\x48\x01\x65\x03\x04\x02\x01\x05\x00\x04\x20'):
-        return False
-    signature = signature[19:]
-    if signature != sha256(message).digest():
-        return False
-    return True
+    expected = b'0001' + (byte_size - len(asn1) // 2 - 3) * b'ff' + b'00' + asn1
+    return expected == signature
 
 
-def update_self(to_screen, verbose):
+def update_self(to_screen, verbose, opener):
     """Update the program file with the latest version from the repository"""
 
-    UPDATE_URL = "https://rg3.github.io/youtube-dl/update/"
+    UPDATE_URL = 'https://rg3.github.io/youtube-dl/update/'
     VERSION_URL = UPDATE_URL + 'LATEST_VERSION'
     JSON_URL = UPDATE_URL + 'versions.json'
     UPDATES_RSA_KEY = (0x9d60ee4d8f805312fdb15a62f87b95bd66177b91df176765d13514a0f1754bcd2057295c5b6f1d35daa6742c3ffc9a82d3e118861c207995a8031e151d863c9927e304576bc80692bc8e094896fcf11b66f3e29e04e3a71e9a11558558acea1840aec37fc396fb6b65dc81a1c4144e03bd1c011de62e3f1357b327d08426fe93, 65537)
 
-    if not isinstance(globals().get('__loader__'), zipimporter) and not hasattr(sys, "frozen"):
+    if not isinstance(globals().get('__loader__'), zipimporter) and not hasattr(sys, 'frozen'):
         to_screen('It looks like you installed youtube-dl with a package manager, pip, setup.py or a tarball. Please use that to update.')
         return
 
-    https_handler = make_HTTPS_handler({})
-    opener = compat_urllib_request.build_opener(https_handler)
-
     # Check if there is a new version
     try:
         newversion = opener.open(VERSION_URL).read().decode('utf-8').strip()
     except Exception:
         if verbose:
-            to_screen(compat_str(traceback.format_exc()))
+            to_screen(encode_compat_str(traceback.format_exc()))
         to_screen('ERROR: can\'t find the current version. Please try again later.')
         return
     if newversion == __version__:
@@ -80,7 +58,7 @@ def update_self(to_screen, verbose):
         versions_info = json.loads(versions_info)
     except Exception:
         if verbose:
-            to_screen(compat_str(traceback.format_exc()))
+            to_screen(encode_compat_str(traceback.format_exc()))
         to_screen('ERROR: can\'t obtain versions info. Please try again later.')
         return
     if 'signature' not in versions_info:
@@ -107,7 +85,7 @@ def update_self(to_screen, verbose):
 
     filename = sys.argv[0]
     # Py2EXE: Filename could be different
-    if hasattr(sys, "frozen") and not os.path.isfile(filename):
+    if hasattr(sys, 'frozen') and not os.path.isfile(filename):
         if os.path.isfile(filename + '.exe'):
             filename += '.exe'
 
@@ -116,7 +94,7 @@ def update_self(to_screen, verbose):
         return
 
     # Py2EXE
-    if hasattr(sys, "frozen"):
+    if hasattr(sys, 'frozen'):
         exe = os.path.abspath(filename)
         directory = os.path.dirname(exe)
         if not os.access(directory, os.W_OK):
@@ -129,7 +107,7 @@ def update_self(to_screen, verbose):
             urlh.close()
         except (IOError, OSError):
             if verbose:
-                to_screen(compat_str(traceback.format_exc()))
+                to_screen(encode_compat_str(traceback.format_exc()))
             to_screen('ERROR: unable to download latest version')
             return
 
@@ -143,7 +121,7 @@ def update_self(to_screen, verbose):
                 outf.write(newcontent)
         except (IOError, OSError):
             if verbose:
-                to_screen(compat_str(traceback.format_exc()))
+                to_screen(encode_compat_str(traceback.format_exc()))
             to_screen('ERROR: unable to write the new version')
             return
 
@@ -163,7 +141,7 @@ start /b "" cmd /c del "%%~f0"&exit /b"
             return  # Do not show premature success messages
         except (IOError, OSError):
             if verbose:
-                to_screen(compat_str(traceback.format_exc()))
+                to_screen(encode_compat_str(traceback.format_exc()))
             to_screen('ERROR: unable to overwrite current version')
             return
 
@@ -175,7 +153,7 @@ start /b "" cmd /c del "%%~f0"&exit /b"
             urlh.close()
         except (IOError, OSError):
             if verbose:
-                to_screen(compat_str(traceback.format_exc()))
+                to_screen(encode_compat_str(traceback.format_exc()))
             to_screen('ERROR: unable to download latest version')
             return
 
@@ -189,7 +167,7 @@ start /b "" cmd /c del "%%~f0"&exit /b"
                 outf.write(newcontent)
         except (IOError, OSError):
             if verbose:
-                to_screen(compat_str(traceback.format_exc()))
+                to_screen(encode_compat_str(traceback.format_exc()))
             to_screen('ERROR: unable to overwrite current version')
             return
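
The rewritten rsa_verify checks the same PKCS#1 v1.5 / SHA-256 signature as before, but builds the expected encoded message as a hex string (0001, ff padding, 00, ASN.1 DigestInfo prefix, digest) and compares it against the hex form of the RSA-decrypted signature, replacing the old byte-packing loop. Constructing that expected value for a 1024-bit key, in isolation:

    from hashlib import sha256

    message = b'versions info'
    byte_size = 128  # 1024-bit modulus

    asn1 = b'3031300d060960864801650304020105000420'  # SHA-256 DigestInfo prefix
    asn1 += sha256(message).hexdigest().encode()
    expected = b'0001' + (byte_size - len(asn1) // 2 - 3) * b'ff' + b'00' + asn1

    assert len(expected) == byte_size * 2  # hex characters, two per byte
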
 
index 88f9f90707fbd04966638125e94b264af95fedcb..7bcc85e2b530cb2eadb714e100b56d8f4637b87d 100644 (file)
@@ -3,6 +3,8 @@
 
 from __future__ import unicode_literals
 
+import base64
+import binascii
 import calendar
 import codecs
 import contextlib
@@ -33,8 +35,10 @@ import xml.etree.ElementTree
 import zlib
 
 from .compat import (
+    compat_HTMLParser,
     compat_basestring,
     compat_chr,
+    compat_etree_fromstring,
     compat_html_entities,
     compat_http_client,
     compat_kwargs,
@@ -43,9 +47,11 @@ from .compat import (
     compat_str,
     compat_urllib_error,
     compat_urllib_parse,
+    compat_urllib_parse_urlencode,
     compat_urllib_parse_urlparse,
     compat_urllib_request,
     compat_urlparse,
+    compat_xpath,
     shlex_quote,
 )
 
@@ -54,7 +60,7 @@ from .compat import (
 compiled_regex_type = type(re.compile(''))
 
 std_headers = {
-    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/20.0 (Chrome)',
+    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/44.0 (Chrome)',
     'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
     'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
     'Accept-Encoding': 'gzip, deflate',
@@ -68,6 +74,21 @@ ENGLISH_MONTH_NAMES = [
     'January', 'February', 'March', 'April', 'May', 'June',
     'July', 'August', 'September', 'October', 'November', 'December']
 
+KNOWN_EXTENSIONS = (
+    'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v', 'aac',
+    'flv', 'f4v', 'f4a', 'f4b',
+    'webm', 'ogg', 'ogv', 'oga', 'ogx', 'spx', 'opus',
+    'mkv', 'mka', 'mk3d',
+    'avi', 'divx',
+    'mov',
+    'asf', 'wmv', 'wma',
+    '3gp', '3g2',
+    'mp3',
+    'flac',
+    'ape',
+    'wav',
+    'f4f', 'f4m', 'm3u8', 'smil')
+
 
 def preferredencoding():
     """Get preferred encoding.
@@ -139,21 +160,17 @@ def write_json_file(obj, fn):
 
 
 if sys.version_info >= (2, 7):
-    def find_xpath_attr(node, xpath, key, val):
+    def find_xpath_attr(node, xpath, key, val=None):
         """ Find the xpath xpath[@key=val] """
-        assert re.match(r'^[a-zA-Z-]+$', key)
-        assert re.match(r'^[a-zA-Z0-9@\s:._-]*$', val)
-        expr = xpath + "[@%s='%s']" % (key, val)
+        assert re.match(r'^[a-zA-Z_-]+$', key)
+        expr = xpath + ('[@%s]' % key if val is None else "[@%s='%s']" % (key, val))
         return node.find(expr)
 else:
-    def find_xpath_attr(node, xpath, key, val):
-        # Here comes the crazy part: In 2.6, if the xpath is a unicode,
-        # .//node does not match if a node is a direct child of . !
-        if isinstance(xpath, compat_str):
-            xpath = xpath.encode('ascii')
-
-        for f in node.findall(xpath):
-            if f.attrib.get(key) == val:
+    def find_xpath_attr(node, xpath, key, val=None):
+        for f in node.findall(compat_xpath(xpath)):
+            if key not in f.attrib:
+                continue
+            if val is None or f.attrib.get(key) == val:
                 return f
         return None
 
@@ -173,12 +190,19 @@ def xpath_with_ns(path, ns_map):
     return '/'.join(replaced)
 
 
-def xpath_text(node, xpath, name=None, fatal=False, default=NO_DEFAULT):
-    if sys.version_info < (2, 7):  # Crazy 2.6
-        xpath = xpath.encode('ascii')
+def xpath_element(node, xpath, name=None, fatal=False, default=NO_DEFAULT):
+    def _find_xpath(xpath):
+        return node.find(compat_xpath(xpath))
+
+    if isinstance(xpath, (str, compat_str)):
+        n = _find_xpath(xpath)
+    else:
+        for xp in xpath:
+            n = _find_xpath(xp)
+            if n is not None:
+                break
 
-    n = node.find(xpath)
-    if n is None or n.text is None:
+    if n is None:
         if default is not NO_DEFAULT:
             return default
         elif fatal:
@@ -186,12 +210,40 @@ def xpath_text(node, xpath, name=None, fatal=False, default=NO_DEFAULT):
             raise ExtractorError('Could not find XML element %s' % name)
         else:
             return None
+    return n
+
+
+def xpath_text(node, xpath, name=None, fatal=False, default=NO_DEFAULT):
+    n = xpath_element(node, xpath, name, fatal=fatal, default=default)
+    if n is None or n == default:
+        return n
+    if n.text is None:
+        if default is not NO_DEFAULT:
+            return default
+        elif fatal:
+            name = xpath if name is None else name
+            raise ExtractorError('Could not find XML element\'s text %s' % name)
+        else:
+            return None
     return n.text
 
 
+def xpath_attr(node, xpath, key, name=None, fatal=False, default=NO_DEFAULT):
+    n = find_xpath_attr(node, xpath, key)
+    if n is None:
+        if default is not NO_DEFAULT:
+            return default
+        elif fatal:
+            name = '%s[@%s]' % (xpath, key) if name is None else name
+            raise ExtractorError('Could not find XML attribute %s' % name)
+        else:
+            return None
+    return n.attrib[key]
+
+
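
The XML helpers are now layered: xpath_element locates a node (and accepts a list of xpaths as fallbacks), xpath_text adds the .text access, xpath_attr the attribute access, and all three share the default/fatal semantics. Typical extractor-style usage, assuming the helpers above are in scope (document contents hypothetical):

    import xml.etree.ElementTree as etree

    doc = etree.fromstring(
        '<video><title>Test</title><src url="http://example.com/v.mp4"/></video>')

    title = xpath_text(doc, './title', 'title', fatal=True)          # 'Test'
    duration = xpath_text(doc, './duration', default=None)           # None, no error
    url = xpath_attr(doc, './src', 'url', 'video URL', fatal=True)   # the src url
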
 def get_element_by_id(id, html):
     """Return the content of the tag with the specified ID in the passed HTML document"""
-    return get_element_by_attribute("id", id, html)
+    return get_element_by_attribute('id', id, html)
 
 
 def get_element_by_attribute(attribute, value, html):
@@ -217,6 +269,38 @@ def get_element_by_attribute(attribute, value, html):
     return unescapeHTML(res)
 
 
+class HTMLAttributeParser(compat_HTMLParser):
+    """Trivial HTML parser to gather the attributes for a single element"""
+    def __init__(self):
+        self.attrs = {}
+        compat_HTMLParser.__init__(self)
+
+    def handle_starttag(self, tag, attrs):
+        self.attrs = dict(attrs)
+
+
+def extract_attributes(html_element):
+    """Given a string for an HTML element such as
+    <el
+         a="foo" B="bar" c="&#98;az" d=boz
+         empty= noval entity="&amp;"
+         sq='"' dq="'"
+    >
+    Decode and return a dictionary of attributes.
+    {
+        'a': 'foo', 'b': 'bar', c: 'baz', d: 'boz',
+        'empty': '', 'noval': None, 'entity': '&',
+        'sq': '"', 'dq': '\''
+    }.
+    NB HTMLParser is stricter in Python 2.6 & 3.2 than in later versions,
+    but the cases in the unit test will work for all of 2.6, 2.7, 3.2-3.5.
+    """
+    parser = HTMLAttributeParser()
+    parser.feed(html_element)
+    parser.close()
+    return parser.attrs
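
extract_attributes lets an extractor pull every attribute out of one matched tag instead of chaining a regex per attribute. For instance (the element string would normally come from a _search_regex match on the webpage):

    attrs = extract_attributes('<source src="video.mp4" type="video/mp4" data-res="720">')
    assert attrs == {'src': 'video.mp4', 'type': 'video/mp4', 'data-res': '720'}
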
+
+
 def clean_html(html):
     """Clean an HTML snippet into a readable string"""
 
@@ -324,13 +408,23 @@ def sanitize_path(s):
     if drive_or_unc:
         norm_path.pop(0)
     sanitized_path = [
-        path_part if path_part in ['.', '..'] else re.sub('(?:[/<>:"\\|\\\\?\\*]|\.$)', '#', path_part)
+        path_part if path_part in ['.', '..'] else re.sub('(?:[/<>:"\\|\\\\?\\*]|[\s.]$)', '#', path_part)
         for path_part in norm_path]
     if drive_or_unc:
         sanitized_path.insert(0, drive_or_unc + os.path.sep)
     return os.path.join(*sanitized_path)
 
 
+# Prepend protocol-less URLs with `http:` scheme in order to reduce the number of
+# unwanted failures due to a missing protocol
+def sanitize_url(url):
+    return 'http:%s' % url if url.startswith('//') else url
+
+
+def sanitized_Request(url, *args, **kwargs):
+    return compat_urllib_request.Request(sanitize_url(url), *args, **kwargs)
+
+
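
sanitize_url currently handles only the scheme-relative case, and sanitized_Request is a drop-in replacement for compat_urllib_request.Request, so call sites pick up the fix simply by switching constructors:

    assert sanitize_url('//example.com/video.mp4') == 'http://example.com/video.mp4'
    assert sanitize_url('https://example.com/x') == 'https://example.com/x'  # untouched
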
 def orderedSet(iterable):
     """ Remove all duplicates from the input iterable """
     res = []
@@ -354,10 +448,14 @@ def _htmlentity_transform(entity):
             numstr = '0%s' % numstr
         else:
             base = 10
-        return compat_chr(int(numstr, base))
+        # See https://github.com/rg3/youtube-dl/issues/7518
+        try:
+            return compat_chr(int(numstr, base))
+        except ValueError:
+            pass
 
     # Unknown entity in name, return its literal representation
-    return ('&%s;' % entity)
+    return '&%s;' % entity
 
 
 def unescapeHTML(s):
@@ -398,6 +496,10 @@ def encodeFilename(s, for_subprocess=False):
     if not for_subprocess and sys.platform == 'win32' and sys.getwindowsversion()[0] >= 5:
         return s
 
+    # Jython assumes filenames are Unicode strings even though it reports itself as Python 2.x compatible
+    if sys.platform.startswith('java'):
+        return s
+
     return s.encode(get_subprocess_encoding(), 'ignore')
 
 
@@ -584,6 +686,11 @@ class ContentTooShortError(Exception):
 
 
 def _create_http_connection(ydl_handler, http_class, is_https, *args, **kwargs):
+    # Work around a python 2 bug (see http://bugs.python.org/issue17849) by requiring
+    # expected HTTP responses to be HTTP/1.0 or later (see also
+    # https://github.com/rg3/youtube-dl/issues/6727)
+    if sys.version_info < (3, 0):
+        kwargs[b'strict'] = True
     hc = http_class(*args, **kwargs)
     source_address = ydl_handler._params.get('source_address')
     if source_address is not None:
@@ -605,6 +712,16 @@ def _create_http_connection(ydl_handler, http_class, is_https, *args, **kwargs):
     return hc
 
 
+def handle_youtubedl_headers(headers):
+    filtered_headers = headers
+
+    if 'Youtubedl-no-compression' in filtered_headers:
+        filtered_headers = dict((k, v) for k, v in filtered_headers.items() if k.lower() != 'accept-encoding')
+        del filtered_headers['Youtubedl-no-compression']
+
+    return filtered_headers
+
+
 class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
     """Handler for HTTP requests and responses.
 
@@ -612,7 +729,7 @@ class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
     the standard headers to every HTTP request and handles gzipped and
     deflated responses from web servers. If compression is to be avoided in
     a particular request, the original request in the program code only has
-    to include the HTTP header "Youtubedl-No-Compression", which will be
+    to include the HTTP header "Youtubedl-no-compression", which will be
     removed before making the real request.
 
     Part of this code was copied from:
@@ -648,15 +765,28 @@ class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
         return ret
 
     def http_request(self, req):
+        # According to RFC 3986, URLs cannot contain non-ASCII characters; however, this is
+        # not always respected by websites: some tend to give out URLs with non-percent-encoded
+        # non-ASCII characters (see telemb.py, ard.py [#3412]).
+        # urllib chokes on URLs with non-ASCII characters (see http://bugs.python.org/issue3991).
+        # To work around the aforementioned issue we replace the request's original URL with
+        # a percent-encoded one.
+        # Since redirects are also affected (e.g. http://www.southpark.de/alle-episoden/s18e09),
+        # the code of this workaround has been moved here from YoutubeDL.urlopen().
+        url = req.get_full_url()
+        url_escaped = escape_url(url)
+
+        # Substitute URL if any change after escaping
+        if url != url_escaped:
+            req = update_Request(req, url=url_escaped)
+
         for h, v in std_headers.items():
             # Capitalize is needed because of Python bug 2275: http://bugs.python.org/issue2275
             # The dict keys are capitalized because of this bug by urllib
             if h.capitalize() not in req.headers:
                 req.add_header(h, v)
-        if 'Youtubedl-no-compression' in req.headers:
-            if 'Accept-encoding' in req.headers:
-                del req.headers['Accept-encoding']
-            del req.headers['Youtubedl-no-compression']
+
+        req.headers = handle_youtubedl_headers(req.headers)
 
         if sys.version_info < (2, 7) and '#' in req.get_full_url():
             # Python 2.6 is brain-dead when it comes to fragments
@@ -687,11 +817,25 @@ class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
                     raise original_ioerror
             resp = self.addinfourl_wrapper(uncompressed, old_resp.headers, old_resp.url, old_resp.code)
             resp.msg = old_resp.msg
+            del resp.headers['Content-encoding']
         # deflate
         if resp.headers.get('Content-encoding', '') == 'deflate':
             gz = io.BytesIO(self.deflate(resp.read()))
             resp = self.addinfourl_wrapper(gz, old_resp.headers, old_resp.url, old_resp.code)
             resp.msg = old_resp.msg
+            del resp.headers['Content-encoding']
+        # Percent-encode redirect URL of Location HTTP header to satisfy RFC 3986 (see
+        # https://github.com/rg3/youtube-dl/issues/6457).
+        if 300 <= resp.code < 400:
+            location = resp.headers.get('Location')
+            if location:
+                # Per RFC 2616, the default charset is iso-8859-1, which python 3 respects
+                if sys.version_info >= (3, 0):
+                    location = location.encode('iso-8859-1').decode('utf-8')
+                location_escaped = escape_url(location)
+                if location != location_escaped:
+                    del resp.headers['Location']
+                    resp.headers['Location'] = location_escaped
         return resp
 
     https_request = http_request
@@ -715,15 +859,41 @@ class YoutubeDLHTTPSHandler(compat_urllib_request.HTTPSHandler):
             req, **kwargs)
 
 
+class YoutubeDLCookieProcessor(compat_urllib_request.HTTPCookieProcessor):
+    def __init__(self, cookiejar=None):
+        compat_urllib_request.HTTPCookieProcessor.__init__(self, cookiejar)
+
+    def http_response(self, request, response):
+        # Python 2 will choke on the next HTTP request if there are non-ASCII
+        # characters in the Set-Cookie HTTP header of the last response (see
+        # https://github.com/rg3/youtube-dl/issues/6769).
+        # In order to at least prevent crashing we percent-encode the Set-Cookie
+        # header before HTTPCookieProcessor starts processing it.
+        # if sys.version_info < (3, 0) and response.headers:
+        #     for set_cookie_header in ('Set-Cookie', 'Set-Cookie2'):
+        #         set_cookie = response.headers.get(set_cookie_header)
+        #         if set_cookie:
+        #             set_cookie_escaped = compat_urllib_parse.quote(set_cookie, b"%/;:@&=+$,!~*'()?#[] ")
+        #             if set_cookie != set_cookie_escaped:
+        #                 del response.headers[set_cookie_header]
+        #                 response.headers[set_cookie_header] = set_cookie_escaped
+        return compat_urllib_request.HTTPCookieProcessor.http_response(self, request, response)
+
+    https_request = compat_urllib_request.HTTPCookieProcessor.http_request
+    https_response = http_response
+
+
 def parse_iso8601(date_str, delimiter='T', timezone=None):
     """ Return a UNIX timestamp from the given date """
 
     if date_str is None:
         return None
 
+    date_str = re.sub(r'\.[0-9]+', '', date_str)
+
     if timezone is None:
         m = re.search(
-            r'(\.[0-9]+)?(?:Z$| ?(?P<sign>\+|-)(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$)',
+            r'(?:Z$| ?(?P<sign>\+|-)(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$)',
             date_str)
         if not m:
             timezone = datetime.timedelta()
@@ -736,9 +906,12 @@ def parse_iso8601(date_str, delimiter='T', timezone=None):
                 timezone = datetime.timedelta(
                     hours=sign * int(m.group('hours')),
                     minutes=sign * int(m.group('minutes')))
-    date_format = '%Y-%m-%d{0}%H:%M:%S'.format(delimiter)
-    dt = datetime.datetime.strptime(date_str, date_format) - timezone
-    return calendar.timegm(dt.timetuple())
+    try:
+        date_format = '%Y-%m-%d{0}%H:%M:%S'.format(delimiter)
+        dt = datetime.datetime.strptime(date_str, date_format) - timezone
+        return calendar.timegm(dt.timetuple())
+    except ValueError:
+        pass
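
Two behavioural changes here: fractional seconds are now stripped up front instead of being matched as part of the timezone expression, and a strptime failure yields None rather than an uncaught ValueError. With the helper as defined after this change:

    assert parse_iso8601('2016-04-25T13:02:02.123+08:00') == 1461560522
    assert parse_iso8601('garbage') is None  # previously raised ValueError
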
 
 
 def unified_strdate(date_str, day_first=True):
@@ -760,9 +933,9 @@ def unified_strdate(date_str, day_first=True):
         '%d %b %Y',
         '%B %d %Y',
         '%b %d %Y',
-        '%b %dst %Y %I:%M%p',
-        '%b %dnd %Y %I:%M%p',
-        '%b %dth %Y %I:%M%p',
+        '%b %dst %Y %I:%M',
+        '%b %dnd %Y %I:%M',
+        '%b %dth %Y %I:%M',
         '%Y %m %d',
         '%Y-%m-%d',
         '%Y/%m/%d',
@@ -803,7 +976,8 @@ def unified_strdate(date_str, day_first=True):
         timetuple = email.utils.parsedate_tz(date_str)
         if timetuple:
             upload_date = datetime.datetime(*timetuple[:6]).strftime('%Y%m%d')
-    return upload_date
+    if upload_date is not None:
+        return compat_str(upload_date)
 
 
 def determine_ext(url, default_ext='unknown_video'):
@@ -812,6 +986,9 @@ def determine_ext(url, default_ext='unknown_video'):
     guess = url.partition('?')[0].rpartition('.')[2]
     if re.match(r'^[A-Za-z0-9]+$', guess):
         return guess
+    # Try to extract ext from URLs like http://example.com/foo/bar.mp4/?download
+    elif guess.rstrip('/') in KNOWN_EXTENSIONS:
+        return guess.rstrip('/')
     else:
         return default_ext
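
With the new branch, a trailing slash no longer defeats extension detection as long as the guess is a known media extension:

    assert determine_ext('http://example.com/foo/bar.mp4/?download') == 'mp4'
    assert determine_ext('http://example.com/page/') == 'unknown_video'
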
 
@@ -836,7 +1013,7 @@ def date_from_str(date_str):
         if sign == '-':
             time = -time
         unit = match.group('unit')
-        # A bad aproximation?
+        # A bad approximation?
         if unit == 'month':
             unit = 'day'
             time *= 30
@@ -846,7 +1023,7 @@ def date_from_str(date_str):
         unit += 's'
         delta = datetime.timedelta(**{unit: time})
         return today + delta
-    return datetime.datetime.strptime(date_str, "%Y%m%d").date()
+    return datetime.datetime.strptime(date_str, '%Y%m%d').date()
 
 
 def hyphenate_date(date_str):
@@ -926,22 +1103,22 @@ def _windows_write_string(s, out):
 
     GetStdHandle = ctypes.WINFUNCTYPE(
         ctypes.wintypes.HANDLE, ctypes.wintypes.DWORD)(
-        (b"GetStdHandle", ctypes.windll.kernel32))
+        (b'GetStdHandle', ctypes.windll.kernel32))
     h = GetStdHandle(WIN_OUTPUT_IDS[fileno])
 
     WriteConsoleW = ctypes.WINFUNCTYPE(
         ctypes.wintypes.BOOL, ctypes.wintypes.HANDLE, ctypes.wintypes.LPWSTR,
         ctypes.wintypes.DWORD, ctypes.POINTER(ctypes.wintypes.DWORD),
-        ctypes.wintypes.LPVOID)((b"WriteConsoleW", ctypes.windll.kernel32))
+        ctypes.wintypes.LPVOID)((b'WriteConsoleW', ctypes.windll.kernel32))
     written = ctypes.wintypes.DWORD(0)
 
-    GetFileType = ctypes.WINFUNCTYPE(ctypes.wintypes.DWORD, ctypes.wintypes.DWORD)((b"GetFileType", ctypes.windll.kernel32))
+    GetFileType = ctypes.WINFUNCTYPE(ctypes.wintypes.DWORD, ctypes.wintypes.DWORD)((b'GetFileType', ctypes.windll.kernel32))
     FILE_TYPE_CHAR = 0x0002
     FILE_TYPE_REMOTE = 0x8000
     GetConsoleMode = ctypes.WINFUNCTYPE(
         ctypes.wintypes.BOOL, ctypes.wintypes.HANDLE,
         ctypes.POINTER(ctypes.wintypes.DWORD))(
-        (b"GetConsoleMode", ctypes.windll.kernel32))
+        (b'GetConsoleMode', ctypes.windll.kernel32))
     INVALID_HANDLE_VALUE = ctypes.wintypes.DWORD(-1).value
 
     def not_a_console(handle):
@@ -1068,13 +1245,23 @@ if sys.platform == 'win32':
             raise OSError('Unlocking file failed: %r' % ctypes.FormatError())
 
 else:
-    import fcntl
+    # Some platforms, such as Jython, lack fcntl
+    try:
+        import fcntl
 
-    def _lock_file(f, exclusive):
-        fcntl.flock(f, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
+        def _lock_file(f, exclusive):
+            fcntl.flock(f, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
 
-    def _unlock_file(f):
-        fcntl.flock(f, fcntl.LOCK_UN)
+        def _unlock_file(f):
+            fcntl.flock(f, fcntl.LOCK_UN)
+    except ImportError:
+        UNSUPPORTED_MSG = 'file locking is not supported on this platform'
+
+        def _lock_file(f, exclusive):
+            raise IOError(UNSUPPORTED_MSG)
+
+        def _unlock_file(f):
+            raise IOError(UNSUPPORTED_MSG)
 
 
 class locked_file(object):
@@ -1127,7 +1314,7 @@ def shell_quote(args):
 def smuggle_url(url, data):
     """ Pass additional data in a URL for internal use. """
 
-    sdata = compat_urllib_parse.urlencode(
+    sdata = compat_urllib_parse_urlencode(
         {'__youtubedl_smuggle': json.dumps(data)})
     return url + '#' + sdata
 
@@ -1155,11 +1342,22 @@ def format_bytes(bytes):
     return '%.2f%s' % (converted, suffix)
 
 
+def lookup_unit_table(unit_table, s):
+    units_re = '|'.join(re.escape(u) for u in unit_table)
+    m = re.match(
+        r'(?P<num>[0-9]+(?:[,.][0-9]*)?)\s*(?P<unit>%s)\b' % units_re, s)
+    if not m:
+        return None
+    num_str = m.group('num').replace(',', '.')
+    mult = unit_table[m.group('unit')]
+    return int(float(num_str) * mult)
+
+
 def parse_filesize(s):
     if s is None:
         return None
 
-    # The lower-case forms are of course incorrect and inofficial,
+    # The lower-case forms are of course incorrect and unofficial,
     # but we support those too
     _UNIT_TABLE = {
         'B': 1,
@@ -1198,15 +1396,28 @@ def parse_filesize(s):
         'Yb': 1000 ** 8,
     }
 
-    units_re = '|'.join(re.escape(u) for u in _UNIT_TABLE)
-    m = re.match(
-        r'(?P<num>[0-9]+(?:[,.][0-9]*)?)\s*(?P<unit>%s)' % units_re, s)
-    if not m:
+    return lookup_unit_table(_UNIT_TABLE, s)
+
+
+def parse_count(s):
+    if s is None:
         return None
 
-    num_str = m.group('num').replace(',', '.')
-    mult = _UNIT_TABLE[m.group('unit')]
-    return int(float(num_str) * mult)
+    s = s.strip()
+
+    if re.match(r'^[\d,.]+$', s):
+        return str_to_int(s)
+
+    _UNIT_TABLE = {
+        'k': 1000,
+        'K': 1000,
+        'm': 1000 ** 2,
+        'M': 1000 ** 2,
+        'kk': 1000 ** 2,
+        'KK': 1000 ** 2,
+    }
+
+    return lookup_unit_table(_UNIT_TABLE, s)
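
lookup_unit_table factors the number-plus-unit matching out of parse_filesize so parse_count can reuse it; the added \b keeps a unit from matching inside a longer token. With the helpers above:

    assert parse_count('1.2M') == 1200000
    assert parse_count('123,456') == 123456  # plain numbers bypass the unit table
    assert parse_count(None) is None
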
 
 
 def month_by_name(name):
@@ -1238,8 +1449,14 @@ def fix_xml_ampersands(xml_str):
 
 def setproctitle(title):
     assert isinstance(title, compat_str)
+
+    # ctypes in Jython is not complete
+    # http://bugs.jython.org/issue2148
+    if sys.platform.startswith('java'):
+        return
+
     try:
-        libc = ctypes.cdll.LoadLibrary("libc.so.6")
+        libc = ctypes.cdll.LoadLibrary('libc.so.6')
     except OSError:
         return
     title_bytes = title.encode('utf-8')
@@ -1263,6 +1480,15 @@ def remove_end(s, end):
     return s
 
 
+def remove_quotes(s):
+    if s is None or len(s) < 2:
+        return s
+    for quote in ('"', "'", ):
+        if s[0] == quote and s[-1] == quote:
+            return s[1:-1]
+    return s
+
+
 def url_basename(url):
     path = compat_urlparse.urlparse(url).path
     return path.strip('/').split('/')[-1]
@@ -1270,7 +1496,7 @@ def url_basename(url):
 
 class HEADRequest(compat_urllib_request.Request):
     def get_method(self):
-        return "HEAD"
+        return 'HEAD'
 
 
 def int_or_none(v, scale=1, default=None, get_attr=None, invscale=1):
@@ -1279,7 +1505,12 @@ def int_or_none(v, scale=1, default=None, get_attr=None, invscale=1):
             v = getattr(v, get_attr, None)
     if v == '':
         v = None
-    return default if v is None else (int(v) * invscale // scale)
+    if v is None:
+        return default
+    try:
+        return int(v) * invscale // scale
+    except ValueError:
+        return default
 
 
 def str_or_none(v, default=None):
@@ -1295,7 +1526,12 @@ def str_to_int(int_str):
 
 
 def float_or_none(v, scale=1, invscale=1, default=None):
-    return default if v is None else (float(v) * invscale / scale)
+    if v is None:
+        return default
+    try:
+        return float(v) * invscale / scale
+    except ValueError:
+        return default
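
Both coercion helpers now swallow ValueError, so malformed numeric strings fall back to the default instead of raising:

    from youtube_dl.utils import int_or_none, float_or_none

    int_or_none('1024')                # -> 1024
    int_or_none('N/A')                 # -> None (previously raised ValueError)
    float_or_none('23.5', scale=1000)  # -> 0.0235
    float_or_none('abc', default=0.0)  # -> 0.0
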
 
 
 def parse_duration(s):
@@ -1304,44 +1540,46 @@ def parse_duration(s):
 
     s = s.strip()
 
-    m = re.match(
-        r'''(?ix)(?:P?T)?
-        (?:
-            (?P<only_mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*|
-            (?P<only_hours>[0-9.]+)\s*(?:hours?)|
-
-            \s*(?P<hours_reversed>[0-9]+)\s*(?:[:h]|hours?)\s*(?P<mins_reversed>[0-9]+)\s*(?:[:m]|mins?\.?|minutes?)\s*|
-            (?:
+    days, hours, mins, secs, ms = [None] * 5
+    m = re.match(r'(?:(?:(?:(?P<days>[0-9]+):)?(?P<hours>[0-9]+):)?(?P<mins>[0-9]+):)?(?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?$', s)
+    if m:
+        days, hours, mins, secs, ms = m.groups()
+    else:
+        m = re.match(
+            r'''(?ix)(?:P?T)?
                 (?:
-                    (?:(?P<days>[0-9]+)\s*(?:[:d]|days?)\s*)?
-                    (?P<hours>[0-9]+)\s*(?:[:h]|hours?)\s*
+                    (?P<days>[0-9]+)\s*d(?:ays?)?\s*
                 )?
-                (?P<mins>[0-9]+)\s*(?:[:m]|mins?|minutes?)\s*
-            )?
-            (?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?\s*(?:s|secs?|seconds?)?
-        )$''', s)
-    if not m:
-        return None
-    res = 0
-    if m.group('only_mins'):
-        return float_or_none(m.group('only_mins'), invscale=60)
-    if m.group('only_hours'):
-        return float_or_none(m.group('only_hours'), invscale=60 * 60)
-    if m.group('secs'):
-        res += int(m.group('secs'))
-    if m.group('mins_reversed'):
-        res += int(m.group('mins_reversed')) * 60
-    if m.group('mins'):
-        res += int(m.group('mins')) * 60
-    if m.group('hours'):
-        res += int(m.group('hours')) * 60 * 60
-    if m.group('hours_reversed'):
-        res += int(m.group('hours_reversed')) * 60 * 60
-    if m.group('days'):
-        res += int(m.group('days')) * 24 * 60 * 60
-    if m.group('ms'):
-        res += float(m.group('ms'))
-    return res
+                (?:
+                    (?P<hours>[0-9]+)\s*h(?:ours?)?\s*
+                )?
+                (?:
+                    (?P<mins>[0-9]+)\s*m(?:in(?:ute)?s?)?\s*
+                )?
+                (?:
+                    (?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?\s*s(?:ec(?:ond)?s?)?\s*
+                )?$''', s)
+        if m:
+            days, hours, mins, secs, ms = m.groups()
+        else:
+            m = re.match(r'(?i)(?:(?P<hours>[0-9.]+)\s*(?:hours?)|(?P<mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*)$', s)
+            if m:
+                hours, mins = m.groups()
+            else:
+                return None
+
+    duration = 0
+    if secs:
+        duration += float(secs)
+    if mins:
+        duration += float(mins) * 60
+    if hours:
+        duration += float(hours) * 60 * 60
+    if days:
+        duration += float(days) * 24 * 60 * 60
+    if ms:
+        duration += float(ms)
+    return duration
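
The rewrite splits matching into three passes: plain [[[D:]H:]M:]S timestamps, unit-suffixed components, and bare hours/minutes. Some illustrative inputs:

    from youtube_dl.utils import parse_duration

    parse_duration('1:23:45')     # -> 5025.0  (H:MM:SS)
    parse_duration('3h 11m 53s')  # -> 11513.0 (unit suffixes)
    parse_duration('PT1H30M')     # -> 5400.0  (ISO 8601 style prefix)
    parse_duration('23 min')      # -> 1380.0  (bare minutes)
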
 
 
 def prepend_extension(filename, ext, expected_real_ext=None):
@@ -1402,9 +1640,12 @@ class PagedList(object):
 
 
 class OnDemandPagedList(PagedList):
-    def __init__(self, pagefunc, pagesize):
+    def __init__(self, pagefunc, pagesize, use_cache=False):
         self._pagefunc = pagefunc
         self._pagesize = pagesize
+        self._use_cache = use_cache
+        if use_cache:
+            self._cache = {}
 
     def getslice(self, start=0, end=None):
         res = []
@@ -1414,7 +1655,13 @@ class OnDemandPagedList(PagedList):
             if start >= nextfirstid:
                 continue
 
-            page_results = list(self._pagefunc(pagenum))
+            page_results = None
+            if self._use_cache:
+                page_results = self._cache.get(pagenum)
+            if page_results is None:
+                page_results = list(self._pagefunc(pagenum))
+            if self._use_cache:
+                self._cache[pagenum] = page_results
 
             startv = (
                 start % self._pagesize
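
With the new use_cache flag, repeated slices reuse already-fetched pages instead of calling pagefunc again. A small sketch:

    from youtube_dl.utils import OnDemandPagedList

    def fetch_page(pagenum):
        print('fetching page %d' % pagenum)
        return ['item-%d' % i for i in range(pagenum * 10, (pagenum + 1) * 10)]

    pl = OnDemandPagedList(fetch_page, 10, use_cache=True)
    pl.getslice(0, 5)  # prints 'fetching page 0'
    pl.getslice(0, 5)  # second call is served from the cache
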
@@ -1500,6 +1747,7 @@ def escape_url(url):
     """Escape URL as suggested by RFC 3986"""
     url_parsed = compat_urllib_parse_urlparse(url)
     return url_parsed._replace(
+        netloc=url_parsed.netloc.encode('idna').decode('ascii'),
         path=escape_rfc3986(url_parsed.path),
         params=escape_rfc3986(url_parsed.params),
         query=escape_rfc3986(url_parsed.query),
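
The added netloc line punycodes non-ASCII hostnames, e.g. (hypothetical URL):

    from youtube_dl.utils import escape_url

    escape_url('http://www.münchen.de/pfad')
    # -> 'http://www.xn--mnchen-3ya.de/pfad'
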
@@ -1509,7 +1757,8 @@ def escape_url(url):
 try:
     struct.pack('!I', 0)
 except TypeError:
-    # In Python 2.6 (and some 2.7 versions), struct requires a bytes argument
+    # In Python 2.6 and 2.7.x < 2.7.7, struct requires a bytes argument
+    # See https://bugs.python.org/issue19099
     def struct_pack(spec, *args):
         if isinstance(spec, compat_str):
             spec = spec.encode('ascii')
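
The bug only bites when the format spec is a unicode string, which is the default under "from __future__ import unicode_literals". On an affected interpreter the failure and the shim look roughly like this:

    import struct
    from youtube_dl.utils import struct_pack

    struct.pack(u'!I', 0)  # TypeError on Python 2.6 and 2.7.x < 2.7.7
    struct_pack(u'!I', 0)  # works: the spec is re-encoded to bytes first
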
@@ -1541,30 +1790,45 @@ def read_batch_urls(batch_fd):
 
 
 def urlencode_postdata(*args, **kargs):
-    return compat_urllib_parse.urlencode(*args, **kargs).encode('ascii')
-
+    return compat_urllib_parse_urlencode(*args, **kargs).encode('ascii')
 
-try:
-    etree_iter = xml.etree.ElementTree.Element.iter
-except AttributeError:  # Python <=2.6
-    etree_iter = lambda n: n.findall('.//*')
 
+def update_url_query(url, query):
+    if not query:
+        return url
+    parsed_url = compat_urlparse.urlparse(url)
+    qs = compat_parse_qs(parsed_url.query)
+    qs.update(query)
+    return compat_urlparse.urlunparse(parsed_url._replace(
+        query=compat_urllib_parse_urlencode(qs, True)))
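
update_url_query merges new parameters into an existing query string, replacing duplicate keys. A quick example (parameter order may vary across Python versions):

    from youtube_dl.utils import update_url_query

    update_url_query('http://example.com/path?a=1&b=2', {'b': '3', 'c': '4'})
    # -> 'http://example.com/path?a=1&b=3&c=4'
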
+
+
+def update_Request(req, url=None, data=None, headers={}, query={}):
+    req_headers = req.headers.copy()
+    req_headers.update(headers)
+    req_data = data or req.data
+    req_url = update_url_query(url or req.get_full_url(), query)
+    req_type = HEADRequest if req.get_method() == 'HEAD' else compat_urllib_request.Request
+    new_req = req_type(
+        req_url, data=req_data, headers=req_headers,
+        origin_req_host=req.origin_req_host, unverifiable=req.unverifiable)
+    if hasattr(req, 'timeout'):
+        new_req.timeout = req.timeout
+    return new_req
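
update_Request rebuilds a request so callers can tweak URL, data, headers or query without losing the HEAD method or timeout. Roughly:

    from youtube_dl.compat import compat_urllib_request
    from youtube_dl.utils import update_Request

    req = compat_urllib_request.Request(
        'http://example.com/api', headers={'X-Token': 'abc'})
    new_req = update_Request(req, query={'page': '2'})
    new_req.get_full_url()  # -> 'http://example.com/api?page=2'
    # the original X-Token header is carried over to new_req
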
+
+
+def dict_get(d, key_or_keys, default=None, skip_false_values=True):
+    if isinstance(key_or_keys, (list, tuple)):
+        for key in key_or_keys:
+            if key not in d or d[key] is None or skip_false_values and not d[key]:
+                continue
+            return d[key]
+        return default
+    return d.get(key_or_keys, default)
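
dict_get tries keys in order and, by default, skips falsy values such as empty strings:

    from youtube_dl.utils import dict_get

    meta = {'title': '', 'name': 'Example'}
    dict_get(meta, ('title', 'name'))                           # -> 'Example'
    dict_get(meta, ('title', 'name'), skip_false_values=False)  # -> ''
    dict_get(meta, 'missing', default='n/a')                    # -> 'n/a'
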
 
-def parse_xml(s):
-    class TreeBuilder(xml.etree.ElementTree.TreeBuilder):
-        def doctype(self, name, pubid, system):
-            pass  # Ignore doctypes
 
-    parser = xml.etree.ElementTree.XMLParser(target=TreeBuilder())
-    kwargs = {'parser': parser} if sys.version_info >= (2, 7) else {}
-    tree = xml.etree.ElementTree.XML(s.encode('utf-8'), **kwargs)
-    # Fix up XML parser in Python 2.x
-    if sys.version_info < (3, 0):
-        for n in etree_iter(tree):
-            if n.text is not None:
-                if not isinstance(n.text, compat_str):
-                    n.text = n.text.decode('utf-8')
-    return tree
+def encode_compat_str(string, encoding=preferredencoding(), errors='strict'):
+    return string if isinstance(string, compat_str) else compat_str(string, encoding, errors)
 
 
 US_RATINGS = {
@@ -1580,12 +1844,12 @@ def parse_age_limit(s):
     if s is None:
         return None
     m = re.match(r'^(?P<age>\d{1,2})\+?$', s)
-    return int(m.group('age')) if m else US_RATINGS.get(s, None)
+    return int(m.group('age')) if m else US_RATINGS.get(s)
 
 
 def strip_jsonp(code):
     return re.sub(
-        r'(?s)^[a-zA-Z0-9_]+\s*\(\s*(.*)\);?\s*?(?://[^\n]*)*$', r'\1', code)
+        r'(?s)^[a-zA-Z0-9_.]+\s*\(\s*(.*)\);?\s*?(?://[^\n]*)*$', r'\1', code)
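
The dot added to the identifier class lets namespaced JSONP callbacks through, e.g.:

    from youtube_dl.utils import strip_jsonp

    strip_jsonp('window.cb({"id": 1});')  # -> '{"id": 1}'
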
 
 
 def js_to_json(code):
@@ -1594,8 +1858,8 @@ def js_to_json(code):
         if v in ('true', 'false', 'null'):
             return v
         if v.startswith('"'):
-            return v
-        if v.startswith("'"):
+            v = re.sub(r"\\'", "'", v[1:-1])
+        elif v.startswith("'"):
             v = v[1:-1]
             v = re.sub(r"\\\\|\\'|\"", lambda m: {
                 '\\\\': '\\\\',
@@ -1661,13 +1925,37 @@ def args_to_str(args):
     return ' '.join(shlex_quote(a) for a in args)
 
 
+def error_to_compat_str(err):
+    err_str = str(err)
+    # On Python 2, an error's byte-string message must be decoded with the
+    # preferred encoding rather than ascii
+    if sys.version_info[0] < 3:
+        err_str = err_str.decode(preferredencoding())
+    return err_str
+
+
 def mimetype2ext(mt):
+    if mt is None:
+        return None
+
+    ext = {
+        'audio/mp4': 'm4a',
+    }.get(mt)
+    if ext is not None:
+        return ext
+
     _, _, res = mt.rpartition('/')
 
     return {
-        'x-ms-wmv': 'wmv',
-        'x-mp4-fragmented': 'mp4',
+        '3gpp': '3gp',
+        'smptett+xml': 'tt',
+        'srt': 'srt',
+        'ttaf+xml': 'dfxp',
         'ttml+xml': 'ttml',
+        'vtt': 'vtt',
+        'x-flv': 'flv',
+        'x-mp4-fragmented': 'mp4',
+        'x-ms-wmv': 'wmv',
     }.get(res, res)
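
The lookup now special-cases full MIME types, then consults a subtype map, and finally falls back to the raw subtype:

    from youtube_dl.utils import mimetype2ext

    mimetype2ext('audio/mp4')          # -> 'm4a'  (full-type special case)
    mimetype2ext('application/x-flv')  # -> 'flv'  (subtype map)
    mimetype2ext('video/webm')         # -> 'webm' (raw subtype fallback)
    mimetype2ext(None)                 # -> None
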
 
 
@@ -1689,6 +1977,10 @@ def urlhandle_detect_ext(url_handle):
     return mimetype2ext(getheader('Content-Type'))
 
 
+def encode_data_uri(data, mime_type):
+    return 'data:%s;base64,%s' % (mime_type, base64.b64encode(data).decode('ascii'))
+
+
 def age_restricted(content_limit, age_limit):
     """ Returns True iff the content should be blocked """
 
@@ -1827,15 +2119,15 @@ def match_filter_func(filter_str):
 
 def parse_dfxp_time_expr(time_expr):
     if not time_expr:
-        return 0.0
+        return
 
     mobj = re.match(r'^(?P<time_offset>\d+(?:\.\d+)?)s?$', time_expr)
     if mobj:
         return float(mobj.group('time_offset'))
 
-    mobj = re.match(r'^(\d+):(\d\d):(\d\d(?:\.\d+)?)$', time_expr)
+    mobj = re.match(r'^(\d+):(\d\d):(\d\d(?:(?:\.|:)\d+)?)$', time_expr)
     if mobj:
-        return 3600 * int(mobj.group(1)) + 60 * int(mobj.group(2)) + float(mobj.group(3))
+        return 3600 * int(mobj.group(1)) + 60 * int(mobj.group(2)) + float(mobj.group(3).replace(':', '.'))
 
 
 def srt_subtitles_timecode(seconds):
@@ -1846,35 +2138,48 @@ def dfxp2srt(dfxp_data):
     _x = functools.partial(xpath_with_ns, ns_map={
         'ttml': 'http://www.w3.org/ns/ttml',
         'ttaf1': 'http://www.w3.org/2006/10/ttaf1',
+        'ttaf1_0604': 'http://www.w3.org/2006/04/ttaf1',
     })
 
-    def parse_node(node):
-        str_or_empty = functools.partial(str_or_none, default='')
+    class TTMLPElementParser(object):
+        out = ''
 
-        out = str_or_empty(node.text)
+        def start(self, tag, attrib):
+            if tag in (_x('ttml:br'), _x('ttaf1:br'), 'br'):
+                self.out += '\n'
 
-        for child in node:
-            if child.tag in (_x('ttml:br'), _x('ttaf1:br'), 'br'):
-                out += '\n' + str_or_empty(child.tail)
-            elif child.tag in (_x('ttml:span'), _x('ttaf1:span'), 'span'):
-                out += str_or_empty(parse_node(child))
-            else:
-                out += str_or_empty(xml.etree.ElementTree.tostring(child))
+        def end(self, tag):
+            pass
 
-        return out
+        def data(self, data):
+            self.out += data
 
-    dfxp = xml.etree.ElementTree.fromstring(dfxp_data.encode('utf-8'))
+        def close(self):
+            return self.out.strip()
+
+    def parse_node(node):
+        target = TTMLPElementParser()
+        parser = xml.etree.ElementTree.XMLParser(target=target)
+        parser.feed(xml.etree.ElementTree.tostring(node))
+        return parser.close()
+
+    dfxp = compat_etree_fromstring(dfxp_data.encode('utf-8'))
     out = []
-    paras = dfxp.findall(_x('.//ttml:p')) or dfxp.findall(_x('.//ttaf1:p')) or dfxp.findall('.//p')
+    paras = dfxp.findall(_x('.//ttml:p')) or dfxp.findall(_x('.//ttaf1:p')) or dfxp.findall(_x('.//ttaf1_0604:p')) or dfxp.findall('.//p')
 
     if not paras:
         raise ValueError('Invalid dfxp/TTML subtitle')
 
     for para, index in zip(paras, itertools.count(1)):
-        begin_time = parse_dfxp_time_expr(para.attrib['begin'])
+        begin_time = parse_dfxp_time_expr(para.attrib.get('begin'))
         end_time = parse_dfxp_time_expr(para.attrib.get('end'))
+        dur = parse_dfxp_time_expr(para.attrib.get('dur'))
+        if begin_time is None:
+            continue
         if not end_time:
-            end_time = begin_time + parse_dfxp_time_expr(para.attrib['dur'])
+            if not dur:
+                continue
+            end_time = begin_time + dur
         out.append('%d\n%s --> %s\n%s\n\n' % (
             index,
             srt_subtitles_timecode(begin_time),
@@ -1884,6 +2189,32 @@ def dfxp2srt(dfxp_data):
     return ''.join(out)
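
Put together, the stream-based parser turns a minimal TTML document into SRT like so (sample document hypothetical):

    from youtube_dl.utils import dfxp2srt

    dfxp = ('<tt xmlns="http://www.w3.org/ns/ttml"><body><div>'
            '<p begin="0.0" end="1.5">Hello<br/>world</p>'
            '</div></body></tt>')
    print(dfxp2srt(dfxp))
    # 1
    # 00:00:00,000 --> 00:00:01,500
    # Hello
    # world
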
 
 
+def cli_option(params, command_option, param):
+    param = params.get(param)
+    return [command_option, param] if param is not None else []
+
+
+def cli_bool_option(params, command_option, param, true_value='true', false_value='false', separator=None):
+    param = params.get(param)
+    assert isinstance(param, bool)
+    if separator:
+        return [command_option + separator + (true_value if param else false_value)]
+    return [command_option, true_value if param else false_value]
+
+
+def cli_valueless_option(params, command_option, param, expected_value=True):
+    param = params.get(param)
+    return [command_option] if param == expected_value else []
+
+
+def cli_configuration_args(params, param, default=[]):
+    ex_args = params.get(param)
+    if ex_args is None:
+        return default
+    assert isinstance(ex_args, list)
+    return ex_args
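
These helpers translate youtube-dl option dicts into argv fragments for external downloaders. Rough examples:

    from youtube_dl.utils import (
        cli_option, cli_bool_option, cli_valueless_option)

    params = {'nocheckcertificate': True, 'ratelimit': '50K', 'continuedl': False}
    cli_bool_option(params, '--check-certificate', 'nocheckcertificate',
                    'false', 'true')  # -> ['--check-certificate', 'false']
    cli_option(params, '--limit-rate', 'ratelimit')
    # -> ['--limit-rate', '50K']
    cli_valueless_option(params, '--no-continue', 'continuedl',
                         expected_value=False)  # -> ['--no-continue']
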
+
+
 class ISO639Utils(object):
     # See http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt
     _lang_map = {
@@ -2365,3 +2696,58 @@ class PerRequestProxyHandler(compat_urllib_request.ProxyHandler):
             return None  # No Proxy
         return compat_urllib_request.ProxyHandler.proxy_open(
             self, req, proxy, type)
+
+
+def ohdave_rsa_encrypt(data, exponent, modulus):
+    '''
+    Implement OHDave's RSA algorithm. See http://www.ohdave.com/rsa/
+
+    Input:
+        data: data to encrypt, bytes-like object
+        exponent, modulus: parameters e and N of the RSA algorithm, both integers
+    Output: hex string of encrypted data
+
+    Limitation: only single-block encryption is supported
+    '''
+
+    payload = int(binascii.hexlify(data[::-1]), 16)
+    encrypted = pow(payload, exponent, modulus)
+    return '%x' % encrypted
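
Note the little-endian interpretation (data[::-1]) before exponentiation. A toy check with tiny parameters (real sites use large e and N):

    from youtube_dl.utils import ohdave_rsa_encrypt

    # payload = 0x02 = 2; pow(2, 3, 101) = 8
    ohdave_rsa_encrypt(b'\x02', 3, 101)  # -> '8'
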
+
+
+def encode_base_n(num, n, table=None):
+    FULL_TABLE = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
+    if not table:
+        table = FULL_TABLE[:n]
+
+    if n > len(table):
+        raise ValueError('base %d exceeds table length %d' % (n, len(table)))
+
+    if num == 0:
+        return table[0]
+
+    ret = ''
+    while num:
+        ret = table[num % n] + ret
+        num = num // n
+    return ret
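
encode_base_n defaults to a 62-character table (digits, then lowercase, then uppercase), truncated to the requested base:

    from youtube_dl.utils import encode_base_n

    encode_base_n(255, 16)  # -> 'ff'
    encode_base_n(61, 62)   # -> 'Z' (last character of the full table)
    encode_base_n(0, 36)    # -> '0'
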
+
+
+def decode_packed_codes(code):
+    mobj = re.search(
+        r"}\('(.+)',(\d+),(\d+),'([^']+)'\.split\('\|'\)",
+        code)
+    obfuscated_code, base, count, symbols = mobj.groups()
+    base = int(base)
+    count = int(count)
+    symbols = symbols.split('|')
+    symbol_table = {}
+
+    while count:
+        count -= 1
+        base_n_count = encode_base_n(count, base)
+        symbol_table[base_n_count] = symbols[count] or base_n_count
+
+    return re.sub(
+        r'\b(\w+)\b', lambda mobj: symbol_table[mobj.group(0)],
+        obfuscated_code)
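
decode_packed_codes reverses Dean Edwards-style "packer" obfuscation by rebuilding the symbol table with encode_base_n. A minimal hand-made sample:

    from youtube_dl.utils import decode_packed_codes

    packed = "eval(function(p,a,c,k,e,d){return p}('0 1',62,2,'var|x'.split('|'),0,{}))"
    decode_packed_codes(packed)  # -> 'var x'
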
diff --git a/youtube_dl/version.py b/youtube_dl/version.py
index fa157cadb232c7238206f568986f055d2960073f..8befd9607e109851db1dde4e5a186c590aa8160f 100644
@@ -1,3 +1,3 @@
 from __future__ import unicode_literals
 
-__version__ = '2015.07.28'
+__version__ = '2016.04.24'